# Assignment 2 - import and explore the data

## Analyse provided data

1. How many locations are there in the data set?
2. What are the five locations with the highest number of records?
3. How many service settings, context types, national categories, and appointment statuses are there?

### Import Pandas and prepare DataFrames

In [82]:
# Import packages with standard conventions
import numpy as np
import pandas as pd

ad = pd.read_csv('actual_duration.csv')
ar = pd.read_csv('appointments_regional.csv')
nc = pd.read_excel('national_categories.xlsx')

# View the DataFrames.
print(ad.shape)
print(ad.columns)

print(ar.shape)
print(ar.columns)

print(nc.shape)
print(nc.columns)

(137793, 8)
Index(['sub_icb_location_code', 'sub_icb_location_ons_code',
       'sub_icb_location_name', 'icb_ons_code', 'region_ons_code',
       'appointment_date', 'actual_duration', 'count_of_appointments'],
      dtype='object')
(596821, 7)
Index(['icb_ons_code', 'appointment_month', 'appointment_status', 'hcp_type',
       'appointment_mode', 'time_between_book_and_appointment',
       'count_of_appointments'],
      dtype='object')
(817394, 8)
Index(['appointment_date', 'icb_ons_code', 'sub_icb_location_name',
       'service_setting', 'context_type', 'national_category',
       'count_of_appointments', 'appointment_month'],
      dtype='object')


### Determine descriptive statistics

In [83]:
ad.describe()

Unnamed: 0,count_of_appointments
count,137793.0
mean,1219.080011
std,1546.902956
min,1.0
25%,194.0
50%,696.0
75%,1621.0
max,15400.0


In [84]:
ar.describe()

Unnamed: 0,count_of_appointments
count,596821.0
mean,1244.601857
std,5856.887042
min,1.0
25%,7.0
50%,47.0
75%,308.0
max,211265.0


In [85]:
nc.describe()

Unnamed: 0,count_of_appointments
count,817394.0
mean,362.183684
std,1084.5766
min,1.0
25%,7.0
50%,25.0
75%,128.0
max,16590.0


### How many locations are there in the data set?

In [86]:
# Count the number of locations using the nc DataFrame
locations = nc['sub_icb_location_name'].value_counts()

print(locations, '\n')
print(f"There are {len(locations)} locations in total.")

NHS North West London ICB - W2U3Z              13007
NHS Kent and Medway ICB - 91Q                  12637
NHS Devon ICB - 15N                            12526
NHS Hampshire and Isle Of Wight ICB - D9Y0V    12171
NHS North East London ICB - A3A8R              11837
                                               ...  
NHS North East and North Cumbria ICB - 00N      4210
NHS Lancashire and South Cumbria ICB - 02G      4169
NHS Cheshire and Merseyside ICB - 01V           3496
NHS Cheshire and Merseyside ICB - 01T           3242
NHS Greater Manchester ICB - 00V                2170
Name: sub_icb_location_name, Length: 106, dtype: int64 

There are 106 locations in total.


### What are the five locations with the highest number of records?

In [87]:
# Count which locations have the highest number of records
nc['sub_icb_location_name'].value_counts().to_frame()

Unnamed: 0,sub_icb_location_name
NHS North West London ICB - W2U3Z,13007
NHS Kent and Medway ICB - 91Q,12637
NHS Devon ICB - 15N,12526
NHS Hampshire and Isle Of Wight ICB - D9Y0V,12171
NHS North East London ICB - A3A8R,11837
...,...
NHS North East and North Cumbria ICB - 00N,4210
NHS Lancashire and South Cumbria ICB - 02G,4169
NHS Cheshire and Merseyside ICB - 01V,3496
NHS Cheshire and Merseyside ICB - 01T,3242


There are 106 locations.

The five locations with the most records are:

* NHS North West London ICB - W2U3Z: 13007
* NHS Kent and Medway ICB - 91Q: 12637
* NHS Devon ICB - 15N: 12526
* NHS Hampshire and Isle Of Wight ICB - D9Y0V: 12171
* NHS North East London ICB - A3A8R: 11837
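
Records (rows) and appointments (the sum of `count_of_appointments`) can rank locations differently. The sketch below shows both rankings on a tiny made-up frame. `nc_demo` and its single-letter location names are hypothetical stand-ins for `nc`, not values from the data set.

```python
import pandas as pd

# Hypothetical miniature of the nc DataFrame; location names are made up.
nc_demo = pd.DataFrame({
    'sub_icb_location_name': ['A', 'A', 'A', 'B', 'B', 'C', 'D', 'E', 'E', 'F'],
    'count_of_appointments': [10, 20, 30, 5, 5, 100, 1, 2, 3, 4],
})

# value_counts() sorts by frequency (descending), so head(5) keeps the
# five locations with the most records.
top5_records = nc_demo['sub_icb_location_name'].value_counts().head(5)

# Summing count_of_appointments per location ranks by appointments instead.
top5_appointments = (
    nc_demo.groupby('sub_icb_location_name')['count_of_appointments']
    .sum()
    .nlargest(5)
)

print(top5_records, '\n')
print(top5_appointments)
```

In this toy example location `A` has the most records while location `C` has the most appointments, which is why the two questions are answered separately.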

**Which locations have the highest number of appointments?**

In [88]:
# Create subset for ease of use
locations2 = nc[['sub_icb_location_name','count_of_appointments']]

# Total the appointments per location, largest first
loc_sum = locations2.groupby('sub_icb_location_name')['count_of_appointments']\
.sum().sort_values(ascending=False)

# View the DataFrame
print(loc_sum)

sub_icb_location_name
NHS North West London ICB - W2U3Z               12142390
NHS North East London ICB - A3A8R                9588891
NHS Kent and Medway ICB - 91Q                    9286167
NHS Hampshire and Isle Of Wight ICB - D9Y0V      8288102
NHS South East London ICB - 72Q                  7850170
                                                  ...   
NHS Cheshire and Merseyside ICB - 01V             641149
NHS Nottingham and Nottinghamshire ICB - 02Q      639660
NHS Greater Manchester ICB - 00V                  639211
NHS Cheshire and Merseyside ICB - 01T             606606
NHS Lancashire and South Cumbria ICB - 02G        554694
Name: count_of_appointments, Length: 106, dtype: int64


### How many service settings, context types, national categories, and appointment statuses are there?

**Service settings:**

In [89]:
# Count how many service settings are there
nc['service_setting'].value_counts().to_frame()

Unnamed: 0,service_setting
General Practice,359274
Primary Care Network,183790
Other,138789
Extended Access Provision,108122
Unmapped,27419


In [90]:
# Count how many service settings are there
serv_set = nc['service_setting'].value_counts().to_frame()

print(serv_set, '\n')
print(f"There are {len(serv_set)} service settings used.")

                           service_setting
General Practice                    359274
Primary Care Network                183790
Other                               138789
Extended Access Provision           108122
Unmapped                             27419 

There are 5 service settings used.


**Context types:**

In [91]:
# Count how many context types are there
nc['context_type'].value_counts().to_frame()

Unnamed: 0,context_type
Care Related Encounter,700481
Inconsistent Mapping,89494
Unmapped,27419


In [92]:
# Count how many context types are there
con_types = nc['context_type'].value_counts().to_frame()

print(con_types, '\n')
print(f"There are {len(con_types)} context types used.")

                        context_type
Care Related Encounter        700481
Inconsistent Mapping           89494
Unmapped                       27419 

There are 3 context types used.


**National categories:**

In [93]:
# Count how many national categories are there
nc['national_category'].value_counts().to_frame()

Unnamed: 0,national_category
Inconsistent Mapping,89494
General Consultation Routine,89329
General Consultation Acute,84874
Planned Clinics,76429
Clinical Triage,74539
Planned Clinical Procedure,59631
Structured Medication Review,44467
Service provided by organisation external to the practice,43095
Home Visit,41850
Unplanned Clinical Activity,40415


In [94]:
# Count how many national categories are there
nat_cat = nc['national_category'].value_counts().to_frame()

print(nat_cat, '\n')
print(f"There are {len(nat_cat)} national categories used.")

                                                    national_category
Inconsistent Mapping                                            89494
General Consultation Routine                                    89329
General Consultation Acute                                      84874
Planned Clinics                                                 76429
Clinical Triage                                                 74539
Planned Clinical Procedure                                      59631
Structured Medication Review                                    44467
Service provided by organisation external to th...              43095
Home Visit                                                      41850
Unplanned Clinical Activity                                     40415
Patient contact during Care Home Round                          28795
Unmapped                                                        27419
Care Home Visit                                                 26644
Social Prescribing S

**Appointment statuses:**

In [95]:
# Count how many appointment statuses are there
ar['appointment_status'].value_counts().to_frame()

Unnamed: 0,appointment_status
Attended,232137
Unknown,201324
DNA,163360


In [96]:
# Count how many appointment statuses are there
app_stat = ar['appointment_status'].value_counts().to_frame()

print(app_stat, '\n')
print(f"There are {len(app_stat)} appointment statuses used.")

          appointment_status
Attended              232137
Unknown               201324
DNA                   163360 

There are 3 appointment statuses used.
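
If only the number of distinct values is needed, `nunique()` gives it for several columns at once. This is a minimal sketch on a made-up frame; `nc_demo` is a hypothetical stand-in for `nc`, and the rows are illustrative only.

```python
import pandas as pd

# Hypothetical rows using the same column names as nc.
nc_demo = pd.DataFrame({
    'service_setting': ['General Practice', 'Other', 'General Practice', 'Unmapped'],
    'context_type': ['Care Related Encounter', 'Inconsistent Mapping',
                     'Care Related Encounter', 'Unmapped'],
    'national_category': ['Home Visit', 'Inconsistent Mapping',
                          'Planned Clinics', 'Unmapped'],
})

# nunique() returns one distinct-value count per selected column.
category_counts = nc_demo[['service_setting', 'context_type',
                           'national_category']].nunique()
print(category_counts)
```

`value_counts()` remains the better choice when the frequency of each value matters, as in the cells above.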


In [97]:
app_stat_subset = ar[['icb_ons_code', 'appointment_status', 'count_of_appointments']]
print(app_stat_subset)

       icb_ons_code appointment_status  count_of_appointments
0         E54000034           Attended                   8107
1         E54000034           Attended                   6791
2         E54000034           Attended                  20686
3         E54000034           Attended                   4268
4         E54000034           Attended                  11971
...             ...                ...                    ...
596816    E54000050            Unknown                     21
596817    E54000050            Unknown                      8
596818    E54000050            Unknown                     28
596819    E54000050            Unknown                     17
596820    E54000050            Unknown                     10

[596821 rows x 3 columns]


In [98]:
# Total the appointments per status to understand opportunities in the data
app_stat_sum = app_stat_subset.groupby('appointment_status')['count_of_appointments']\
.sum().sort_values(ascending=False)

# View the DataFrame
print(app_stat_sum)

appointment_status
Attended    677755876
Unknown      34137416
DNA          30911233
Name: count_of_appointments, dtype: int64


### Are there any insights and trends identified while determining the results?

As already indicated, the data has many quality issues, and that is apparent here. Looking at the content itself:
1. On appointment status, which is the focus of this project:
    - About 91% of the regional appointments were attended. Roughly 4% are Did Not Attend (DNA) and roughly 5% are Unknown, which we have an opportunity to look into further. This is where the unnecessary costs are.

2. On national categories and context types:
    - From a data sanitation and streamlining perspective, it would be beneficial to standardise the national categories so that time spent and future prioritisation can be reviewed more accurately.
    - Inconsistent Mapping is in 5th position and Unmapped is currently 7th (by volume) out of 18 categories. This is a sizeable 13% of transactions that we cannot trace at the moment.
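
The appointment status shares can be recomputed directly from the `app_stat_sum` totals printed earlier. In this sketch the three totals are typed in by hand from that output, and `share_pct` is a name introduced here for illustration.

```python
import pandas as pd

# Totals copied from the app_stat_sum output shown earlier.
app_stat_sum = pd.Series(
    {'Attended': 677755876, 'Unknown': 34137416, 'DNA': 30911233},
    name='count_of_appointments',
)

# Each status as a percentage of all appointments, to one decimal place.
share_pct = (app_stat_sum / app_stat_sum.sum() * 100).round(1)
print(share_pct)
```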

# Assignment 3 - analyse the data

## Analyse provided data

1. Between what dates were appointments scheduled? 
2. Which service setting reported the most appointments in North West London from 1 January to 1 June 2022?
3. Which month had the highest number of appointments?
4. What was the total number of records per month?

In [99]:
# View the first five rows of appointment_date for the ad DataFrame to determine the date format.
ad.head()

Unnamed: 0,sub_icb_location_code,sub_icb_location_ons_code,sub_icb_location_name,icb_ons_code,region_ons_code,appointment_date,actual_duration,count_of_appointments
0,00L,E38000130,NHS North East and North Cumbria ICB - 00L,E54000050,E40000012,01-Dec-21,31-60 Minutes,364
1,00L,E38000130,NHS North East and North Cumbria ICB - 00L,E54000050,E40000012,01-Dec-21,21-30 Minutes,619
2,00L,E38000130,NHS North East and North Cumbria ICB - 00L,E54000050,E40000012,01-Dec-21,6-10 Minutes,1698
3,00L,E38000130,NHS North East and North Cumbria ICB - 00L,E54000050,E40000012,01-Dec-21,Unknown / Data Quality,1277
4,00L,E38000130,NHS North East and North Cumbria ICB - 00L,E54000050,E40000012,01-Dec-21,16-20 Minutes,730


In [100]:
# View the first five rows of the ar DataFrame to determine the format of appointment_month.
ar.head()

Unnamed: 0,icb_ons_code,appointment_month,appointment_status,hcp_type,appointment_mode,time_between_book_and_appointment,count_of_appointments
0,E54000034,2020-01,Attended,GP,Face-to-Face,1 Day,8107
1,E54000034,2020-01,Attended,GP,Face-to-Face,15 to 21 Days,6791
2,E54000034,2020-01,Attended,GP,Face-to-Face,2 to 7 Days,20686
3,E54000034,2020-01,Attended,GP,Face-to-Face,22 to 28 Days,4268
4,E54000034,2020-01,Attended,GP,Face-to-Face,8 to 14 Days,11971


The date format of ad['appointment_date'] is DD-Mmm-YY, and ar['appointment_month'] is YYYY-MM.

In [101]:
# View the first five rows of appointment_date for the nc DataFrame to determine the date format.
nc.head()

Unnamed: 0,appointment_date,icb_ons_code,sub_icb_location_name,service_setting,context_type,national_category,count_of_appointments,appointment_month
0,2021-08-02,E54000050,NHS North East and North Cumbria ICB - 00L,Primary Care Network,Care Related Encounter,Patient contact during Care Home Round,3,2021-08
1,2021-08-02,E54000050,NHS North East and North Cumbria ICB - 00L,Other,Care Related Encounter,Planned Clinics,7,2021-08
2,2021-08-02,E54000050,NHS North East and North Cumbria ICB - 00L,General Practice,Care Related Encounter,Home Visit,79,2021-08
3,2021-08-02,E54000050,NHS North East and North Cumbria ICB - 00L,General Practice,Care Related Encounter,General Consultation Acute,725,2021-08
4,2021-08-02,E54000050,NHS North East and North Cumbria ICB - 00L,General Practice,Care Related Encounter,Structured Medication Review,2,2021-08


Date format of nc is YYYY-MM-DD
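
Since both formats are now known, they can be passed to `pd.to_datetime()` explicitly so that parsing does not rely on inference. This is a sketch on a few sample strings in the two formats; the Series names `ad_dates` and `nc_dates` are made up for the example.

```python
import pandas as pd

# Sample strings in the two formats observed above.
ad_dates = pd.Series(['01-Dec-21', '30-Jun-22'])    # DD-Mmm-YY, as in ad
nc_dates = pd.Series(['2021-08-02', '2022-06-30'])  # YYYY-MM-DD, as in nc

# An explicit format makes the intended interpretation unambiguous.
# Note: %b (abbreviated month name) depends on the locale; English is assumed.
ad_parsed = pd.to_datetime(ad_dates, format='%d-%b-%y')
nc_parsed = pd.to_datetime(nc_dates, format='%Y-%m-%d')

print(ad_parsed.dt.strftime('%Y-%m-%d').tolist())
print(nc_parsed.dt.strftime('%Y-%m-%d').tolist())
```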

In [102]:
# Change the date format of ad['appointment_date'].
ad['appointment_date'] = pd.to_datetime(ad['appointment_date'])
ad['appointment_date'] = ad['appointment_date'].dt.strftime('%Y-%m-%d')

# View the DataFrame.
ad

Unnamed: 0,sub_icb_location_code,sub_icb_location_ons_code,sub_icb_location_name,icb_ons_code,region_ons_code,appointment_date,actual_duration,count_of_appointments
0,00L,E38000130,NHS North East and North Cumbria ICB - 00L,E54000050,E40000012,2021-12-01,31-60 Minutes,364
1,00L,E38000130,NHS North East and North Cumbria ICB - 00L,E54000050,E40000012,2021-12-01,21-30 Minutes,619
2,00L,E38000130,NHS North East and North Cumbria ICB - 00L,E54000050,E40000012,2021-12-01,6-10 Minutes,1698
3,00L,E38000130,NHS North East and North Cumbria ICB - 00L,E54000050,E40000012,2021-12-01,Unknown / Data Quality,1277
4,00L,E38000130,NHS North East and North Cumbria ICB - 00L,E54000050,E40000012,2021-12-01,16-20 Minutes,730
...,...,...,...,...,...,...,...,...
137788,X2C4Y,E38000254,NHS West Yorkshire ICB - X2C4Y,E54000054,E40000012,2022-06-30,31-60 Minutes,430
137789,X2C4Y,E38000254,NHS West Yorkshire ICB - X2C4Y,E54000054,E40000012,2022-06-30,21-30 Minutes,751
137790,X2C4Y,E38000254,NHS West Yorkshire ICB - X2C4Y,E54000054,E40000012,2022-06-30,16-20 Minutes,921
137791,X2C4Y,E38000254,NHS West Yorkshire ICB - X2C4Y,E54000054,E40000012,2022-06-30,11-15 Minutes,1439


In [103]:
# Change the date format of ar['appointment_month'].
ar['appointment_month'] = pd.to_datetime(ar['appointment_month'])
ar['appointment_month'] = ar['appointment_month'].dt.strftime('%Y-%m-%d')

# View the DataFrame.
ar

Unnamed: 0,icb_ons_code,appointment_month,appointment_status,hcp_type,appointment_mode,time_between_book_and_appointment,count_of_appointments
0,E54000034,2020-01-01,Attended,GP,Face-to-Face,1 Day,8107
1,E54000034,2020-01-01,Attended,GP,Face-to-Face,15 to 21 Days,6791
2,E54000034,2020-01-01,Attended,GP,Face-to-Face,2 to 7 Days,20686
3,E54000034,2020-01-01,Attended,GP,Face-to-Face,22 to 28 Days,4268
4,E54000034,2020-01-01,Attended,GP,Face-to-Face,8 to 14 Days,11971
...,...,...,...,...,...,...,...
596816,E54000050,2022-06-01,Unknown,Unknown,Unknown,2 to 7 Days,21
596817,E54000050,2022-06-01,Unknown,Unknown,Unknown,22 to 28 Days,8
596818,E54000050,2022-06-01,Unknown,Unknown,Unknown,8 to 14 Days,28
596819,E54000050,2022-06-01,Unknown,Unknown,Unknown,More than 28 Days,17


In [104]:
# Change the date format of nc['appointment_date'].
nc['appointment_date'] = pd.to_datetime(nc['appointment_date'])
nc['appointment_date'] = nc['appointment_date'].dt.strftime('%Y-%m-%d')

# View the DataFrame.
nc

Unnamed: 0,appointment_date,icb_ons_code,sub_icb_location_name,service_setting,context_type,national_category,count_of_appointments,appointment_month
0,2021-08-02,E54000050,NHS North East and North Cumbria ICB - 00L,Primary Care Network,Care Related Encounter,Patient contact during Care Home Round,3,2021-08
1,2021-08-02,E54000050,NHS North East and North Cumbria ICB - 00L,Other,Care Related Encounter,Planned Clinics,7,2021-08
2,2021-08-02,E54000050,NHS North East and North Cumbria ICB - 00L,General Practice,Care Related Encounter,Home Visit,79,2021-08
3,2021-08-02,E54000050,NHS North East and North Cumbria ICB - 00L,General Practice,Care Related Encounter,General Consultation Acute,725,2021-08
4,2021-08-02,E54000050,NHS North East and North Cumbria ICB - 00L,General Practice,Care Related Encounter,Structured Medication Review,2,2021-08
...,...,...,...,...,...,...,...,...
817389,2022-06-30,E54000054,NHS West Yorkshire ICB - X2C4Y,Extended Access Provision,Care Related Encounter,Unplanned Clinical Activity,12,2022-06
817390,2022-06-30,E54000054,NHS West Yorkshire ICB - X2C4Y,Extended Access Provision,Care Related Encounter,Planned Clinics,4,2022-06
817391,2022-06-30,E54000054,NHS West Yorkshire ICB - X2C4Y,Extended Access Provision,Care Related Encounter,Planned Clinical Procedure,92,2022-06
817392,2022-06-30,E54000054,NHS West Yorkshire ICB - X2C4Y,Extended Access Provision,Care Related Encounter,General Consultation Routine,4,2022-06
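
One thing to keep in mind: `dt.strftime()` returns strings, so a column converted this way no longer supports the `.dt` accessor. A possible alternative, sketched below on made-up rows, is to keep the datetime column and store the formatted text separately. The column name `appointment_date_str` is hypothetical.

```python
import pandas as pd

# Hypothetical two-row frame standing in for ad, ar or nc.
demo = pd.DataFrame({'appointment_date': ['2021-08-02', '2022-06-30']})

# Keep appointment_date as datetime64 so .dt accessors keep working.
demo['appointment_date'] = pd.to_datetime(demo['appointment_date'])

# Store the formatted text in a separate (hypothetical) column for display.
demo['appointment_date_str'] = demo['appointment_date'].dt.strftime('%Y-%m-%d')

print(demo.dtypes, '\n')
print(demo['appointment_date'].dt.month.tolist())
```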


### Between what dates were appointments scheduled? 

In [105]:
# Determine the minimum and maximum dates in the ad DataFrame.
print(ad['appointment_date'].min(), '\n')
print(ad['appointment_date'].max(), '\n')

print(f"Appointments are between {(ad['appointment_date'].min())} and \
{(ad['appointment_date'].max())} in the actual duration database.")

2021-12-01 

2022-06-30 

Appointments are between 2021-12-01 and 2022-06-30 in the actual duration database.


In [106]:
# Determine the minimum and maximum dates in the nc DataFrame.
print(nc['appointment_date'].min(), '\n')
print(nc['appointment_date'].max(), '\n')

print(f"Appointments are between {(nc['appointment_date'].min())} and \
{(nc['appointment_date'].max())} in the national categories database.")

2021-08-01 

2022-06-30 

Appointments are between 2021-08-01 and 2022-06-30 in the national categories database.



### Which service setting reported the most appointments in North West London from 1 January to 1 June 2022?

In [107]:
# Select all rows with ICB code 'E54000027' (North West London)
nwlondon = nc.loc[nc['icb_ons_code'].str.contains("E54000027",case=False)]

nwlondon

Unnamed: 0,appointment_date,icb_ons_code,sub_icb_location_name,service_setting,context_type,national_category,count_of_appointments,appointment_month
794321,2021-08-01,E54000027,NHS North West London ICB - W2U3Z,Unmapped,Unmapped,Unmapped,607,2021-08
794322,2021-08-01,E54000027,NHS North West London ICB - W2U3Z,Other,Inconsistent Mapping,Inconsistent Mapping,6,2021-08
794323,2021-08-01,E54000027,NHS North West London ICB - W2U3Z,General Practice,Inconsistent Mapping,Inconsistent Mapping,47,2021-08
794324,2021-08-01,E54000027,NHS North West London ICB - W2U3Z,General Practice,Care Related Encounter,Walk-in,74,2021-08
794325,2021-08-01,E54000027,NHS North West London ICB - W2U3Z,General Practice,Care Related Encounter,Planned Clinics,98,2021-08
...,...,...,...,...,...,...,...,...
807323,2022-06-30,E54000027,NHS North West London ICB - W2U3Z,Extended Access Provision,Care Related Encounter,Planned Clinical Procedure,6,2022-06
807324,2022-06-30,E54000027,NHS North West London ICB - W2U3Z,Extended Access Provision,Care Related Encounter,General Consultation Routine,25,2022-06
807325,2022-06-30,E54000027,NHS North West London ICB - W2U3Z,Extended Access Provision,Care Related Encounter,General Consultation Acute,217,2022-06
807326,2022-06-30,E54000027,NHS North West London ICB - W2U3Z,Extended Access Provision,Care Related Encounter,Clinical Triage,103,2022-06


In [135]:
# Filter to the date range under review (1 January to 1 June 2022, inclusive)
nwlondon_filtered = nwlondon[(nwlondon['appointment_date'] >= '2022-01-01') &
                             (nwlondon['appointment_date'] < '2022-06-02')]
nwlondon_filtered

Unnamed: 0,appointment_date,icb_ons_code,sub_icb_location_name,service_setting,context_type,national_category,count_of_appointments,appointment_month
800289,2022-01-01,E54000027,NHS North West London ICB - W2U3Z,Unmapped,Unmapped,Unmapped,496,2022-01
800290,2022-01-01,E54000027,NHS North West London ICB - W2U3Z,Primary Care Network,Care Related Encounter,Clinical Triage,19,2022-01
800291,2022-01-01,E54000027,NHS North West London ICB - W2U3Z,Other,Inconsistent Mapping,Inconsistent Mapping,1,2022-01
800292,2022-01-01,E54000027,NHS North West London ICB - W2U3Z,General Practice,Inconsistent Mapping,Inconsistent Mapping,16,2022-01
800293,2022-01-01,E54000027,NHS North West London ICB - W2U3Z,Primary Care Network,Care Related Encounter,Planned Clinics,29,2022-01
...,...,...,...,...,...,...,...,...
806220,2022-06-01,E54000027,NHS North West London ICB - W2U3Z,Extended Access Provision,Care Related Encounter,Home Visit,4,2022-06
806221,2022-06-01,E54000027,NHS North West London ICB - W2U3Z,Extended Access Provision,Care Related Encounter,General Consultation Routine,27,2022-06
806222,2022-06-01,E54000027,NHS North West London ICB - W2U3Z,General Practice,Care Related Encounter,Unplanned Clinical Activity,626,2022-06
806223,2022-06-01,E54000027,NHS North West London ICB - W2U3Z,Extended Access Provision,Care Related Encounter,General Consultation Acute,224,2022-06


In [136]:
# Using the filtered data, determine the highest number of appointments based on the
# service setting being reviewed
nwlondon_filtered.groupby('service_setting')[['count_of_appointments']]\
.sum().sort_values('count_of_appointments', ascending=False)

Unnamed: 0_level_0,count_of_appointments
service_setting,Unnamed: 1_level_1
General Practice,4804239
Unmapped,391106
Other,152897
Primary Care Network,109840
Extended Access Provision,98159


General Practice had the most appointments in North West London between 1 January and 1 June 2022.
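
The same filter, total and pick-the-largest steps can be chained so the answer is returned directly. This is a minimal sketch on a made-up frame, assuming `appointment_date` is held as a datetime column. `demo`, `in_window` and `totals` are names introduced for the example.

```python
import pandas as pd

# Hypothetical rows; two fall outside the 1 January to 1 June 2022 window.
demo = pd.DataFrame({
    'appointment_date': pd.to_datetime(
        ['2021-12-31', '2022-01-01', '2022-03-15', '2022-06-01', '2022-06-02']),
    'service_setting': ['General Practice', 'General Practice', 'Other',
                        'General Practice', 'Other'],
    'count_of_appointments': [1000, 50, 30, 20, 500],
})

# between() includes both end points by default.
in_window = demo['appointment_date'].between(
    pd.Timestamp('2022-01-01'), pd.Timestamp('2022-06-01'))

# Total appointments per service setting inside the window.
totals = demo[in_window].groupby('service_setting')['count_of_appointments'].sum()

# idxmax() returns the label of the largest total.
print(totals.idxmax(), totals.max())
```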

### Which month had the highest number of appointments?

**Which month had the highest number of appointments in North West London?**

In [138]:
# Using the filtered data, total the monthly appointments.
# pd.to_datetime() guards against appointment_date being stored as strings.
nwlondon_filtered.groupby(pd.to_datetime(nwlondon_filtered['appointment_date']).dt.month)\
['count_of_appointments'].sum()

appointment_date
1    1050517
2    1053468
3    1232596
4    1006387
5    1163863
6      49410
Name: count_of_appointments, dtype: int64

March has the highest number of appointments, followed by May.

In descending order: March, May, February, January, April and June (June covers only one day, 1 June).

**Which month had the highest number of appointments using all available months?**

In [152]:
# Using the original data total the monthly appointments
nc.groupby(nc.appointment_date.dt.month)\
['count_of_appointments'].sum()

appointment_date
1.0     1050517
2.0     1053468
3.0     1232596
4.0     1006387
5.0     1163863
6.0     1102597
8.0      981385
9.0     1144590
10.0    1180674
11.0    1204118
12.0    1022195
Name: count_of_appointments, dtype: int64

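March has the highest number of appointments, followed by November.

Note that the January to May totals are identical to the North West London totals above, the month index is a float (which pandas produces when some dates are missing), and the `nc` preview at the end of this section shows `NaT` in `appointment_date`. These figures therefore appear to cover North West London rows only rather than all locations, and the ranking should be treated with caution until `appointment_date` is re-parsed for the full DataFrame.

*Caveat: as August is the start of the data, its figure should be treated with caution because there may be data collection issues at the start of an initiative like this. Projections may also be made for July, given that it is not yet available.*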

| Overall Rank |
| :- |
| March |
| November |
| October |
| May |
| September |
| June |
| February |
| January |
| December |
| April |
| August |

From a project perspective, there is an opportunity in the second half of the year to focus on the high-volume months and implement changes where the potential for improvement is greatest.
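
As a cross-check, `nc` already carries an `appointment_month` column (YYYY-MM in the `nc.head()` output), so monthly totals can be built without touching `appointment_date`. Grouping on it also keeps year and month together. The sketch below uses made-up numbers, and `demo` and `monthly` are hypothetical names.

```python
import pandas as pd

# Hypothetical rows using nc's appointment_month format (YYYY-MM).
demo = pd.DataFrame({
    'appointment_month': ['2021-08', '2021-08', '2022-03', '2022-03', '2022-06'],
    'count_of_appointments': [10, 20, 50, 25, 40],
})

# Total per calendar month, then rank from highest to lowest.
monthly = (
    demo.groupby('appointment_month')['count_of_appointments']
    .sum()
    .sort_values(ascending=False)
)
print(monthly)
```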

### What was the total number of records per month?

**What was the total number of records per month for North West London?**

In [139]:
# Using the filtered data, count the records per month.
# pd.to_datetime() guards against appointment_date being stored as strings.
nwlondon_filtered.groupby(pd.to_datetime(nwlondon_filtered['appointment_date']).dt.month)\
['count_of_appointments'].count()

appointment_date
1    1183
2    1113
3    1271
4    1121
5    1201
6      47
Name: count_of_appointments, dtype: int64

The entries per month for this area are as follows:

* January - 1183
* February - 1113
* March - 1271
* April - 1121
* May - 1201
* June - 47
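
A small caution on counting records: `count()` counts non-null values in the selected column, while `size()` counts rows. The two agree when `count_of_appointments` has no missing values, which the `describe()` output suggests is the case here. The sketch below uses made-up data with one missing value to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data; one count_of_appointments value is missing on purpose.
demo = pd.DataFrame({
    'month': [1, 1, 1, 2, 2],
    'count_of_appointments': [5, np.nan, 7, 3, 4],
})

grouped = demo.groupby('month')['count_of_appointments']
non_null = grouped.count()  # non-null values per month
rows = grouped.size()       # rows per month, missing values included

print(non_null, '\n')
print(rows)
```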

**What was the total number of records per month using all months?**

In [155]:
# Using the full nc DataFrame, count the records per month
nc.groupby(nc.appointment_date.dt.month)\
['count_of_appointments'].count()

appointment_date
1.0     1183
2.0     1113
3.0     1271
4.0     1121
5.0     1201
6.0     1150
8.0     1147
9.0     1219
10.0    1190
11.0    1212
12.0    1200
Name: count_of_appointments, dtype: int64

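The entries per month returned for the full `nc` DataFrame are listed below. They sum to 13,007, which equals the North West London record count found earlier rather than the 817,394 rows in `nc`, so they appear to cover North West London only: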

* January - 1183
* February - 1113
* March - 1271
* April - 1121
* May - 1201
* June - 1150
* July - no data
* August - 1147
* September - 1219
* October - 1190
* November - 1212
* December - 1200

In [156]:
nc

Unnamed: 0,appointment_date,icb_ons_code,sub_icb_location_name,service_setting,context_type,national_category,count_of_appointments,appointment_month
0,NaT,E54000050,NHS North East and North Cumbria ICB - 00L,Primary Care Network,Care Related Encounter,Patient contact during Care Home Round,3,2021-08
1,NaT,E54000050,NHS North East and North Cumbria ICB - 00L,Other,Care Related Encounter,Planned Clinics,7,2021-08
2,NaT,E54000050,NHS North East and North Cumbria ICB - 00L,General Practice,Care Related Encounter,Home Visit,79,2021-08
3,NaT,E54000050,NHS North East and North Cumbria ICB - 00L,General Practice,Care Related Encounter,General Consultation Acute,725,2021-08
4,NaT,E54000050,NHS North East and North Cumbria ICB - 00L,General Practice,Care Related Encounter,Structured Medication Review,2,2021-08
...,...,...,...,...,...,...,...,...
817389,NaT,E54000054,NHS West Yorkshire ICB - X2C4Y,Extended Access Provision,Care Related Encounter,Unplanned Clinical Activity,12,2022-06
817390,NaT,E54000054,NHS West Yorkshire ICB - X2C4Y,Extended Access Provision,Care Related Encounter,Planned Clinics,4,2022-06
817391,NaT,E54000054,NHS West Yorkshire ICB - X2C4Y,Extended Access Provision,Care Related Encounter,Planned Clinical Procedure,92,2022-06
817392,NaT,E54000054,NHS West Yorkshire ICB - X2C4Y,Extended Access Provision,Care Related Encounter,General Consultation Routine,4,2022-06
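
The preview above shows `NaT` in `appointment_date`. A quick way to see how many rows are affected, and which months they belong to, is sketched below on made-up rows. `demo` and `missing` are hypothetical names, and the check relies on `appointment_month` still being populated, as it is in the preview.

```python
import pandas as pd

# Hypothetical rows: two have lost their appointment_date (NaT).
demo = pd.DataFrame({
    'appointment_date': pd.to_datetime(['2022-01-01', None, None, '2022-06-01']),
    'appointment_month': ['2022-01', '2021-08', '2021-09', '2022-06'],
})

# isna() flags NaT values; summing the flags counts the affected rows.
missing = demo['appointment_date'].isna()
print(f"{missing.sum()} of {len(demo)} rows have no appointment_date")

# appointment_month shows which periods the affected rows belong to.
affected_months = sorted(demo.loc[missing, 'appointment_month'].unique())
print(affected_months)
```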
