### Bartholomew_Luke_DA201_Assignment_Notebook <a id='top'></a>

# Diagnostic Analysis of NHS Data Using Python

This notebook requires the `actual_duration.csv`, `appointments_regional.csv` and `national_categories.xlsx` files. Upload these files to the same directory as this notebook before you begin.

The CSV files, Jupyter notebook and report are available in the [GitHub repository][id2].

[id2]:https://github.com/lukebart/Bartholomew_Luke_DA201_Assignment

Notebook Sections:

1. <a href='#1'>Describe The Data</a>
2. <a href='#2'>Analyse The Data</a>
3. <a href='#3'>Visualise & Identify Trends</a>
4. <a href='#4'>Analyse The Twitter Data</a>
5. <a href='#5'>Recommendations</a>

<table width="100%">
<thead>
    <tr style="background-color:#D6EEEE">
        <th><h2 style="text-align:left">1. Describe The Data</h2></th>
        <th><a id='1'></a></th>
    </tr>
</thead>
</table>

Import and sense-check the data from the CSV and Excel files using pandas DataFrames. Determine (a) the column names, number of rows and columns, data types and number of missing values, and (b) the descriptive statistics and metadata of each DataFrame.

In [7]:
# Import packages with standard conventions
import numpy as np
import pandas as pd

In [2]:
# Function to describe the data in a DataFrame
# df = DataFrame object
# df_name = Name of the file the DataFrame was created from (string)
# df_columns = List of columns to show unique values and sums for (list)
# df_sum = Column to sum (string)
# df_head = Number of rows to show in each sum table (integer)
def describe_data(df, df_name, df_columns, df_sum, df_head):
    print(f"Shape of DataFrame {df_name}:")
    print(df.shape)
    print('\n')
    print(f"Info of DataFrame {df_name}:")
    df.info()  # info() prints its own summary and returns None, so it is not wrapped in print()
    print('\n')
    print(f"Head of DataFrame {df_name}:")
    print(df.head())
    print('\n')
    print(f"Tail of DataFrame {df_name}:")
    print(df.tail())
    print('\n')
    print(f"Describe DataFrame {df_name}:")
    print(df.describe())
    print('\n')
    df_na = df[df.isna().any(axis=1)]  # rows with at least one missing value
    print(f"Missing values in DataFrame {df_name}:")
    print(df_na.shape)
    print('\n')
    # Loop over df_columns
    for col_name in df_columns:
        # Show the unique values of col_name and how many there are
        print(f"Unique count for {col_name} of DataFrame {df_name}: ")
        print(df[col_name].unique())
        print(df[col_name].nunique())
        print('\n')
        # Show the sum of column df_sum by col_name, largest first
        print(f"Sum {df_sum} by {col_name} of DataFrame {df_name} (top {df_head}): ")
        print(df.groupby(col_name).aggregate({df_sum: 'sum'})
              .sort_values(by=[df_sum], ascending=False).head(df_head))
        print('\n')
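A compact per-column overview (dtype, non-null count, missing count and number of unique values) can complement the printed output of `describe_data`. The sketch below is a hypothetical helper (`column_overview`), not part of the original analysis, and the demo frame is illustrative rather than NHS data.

```python
import pandas as pd

def column_overview(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, non-null count, missing count, unique count."""
    return pd.DataFrame({
        'dtype': df.dtypes.astype(str),
        'non_null': df.notna().sum(),
        'missing': df.isna().sum(),
        'n_unique': df.nunique(),  # nunique() ignores missing values
    })

# Hypothetical demo frame with one missing value
demo = pd.DataFrame({
    'icb_ons_code': ['E54000050', 'E54000050', 'E54000027'],
    'count_of_appointments': [364, None, 1698],
})
print(column_overview(demo))
```

With the notebook's data, `column_overview(ad)`, `column_overview(ar)` and `column_overview(nc)` would give one summary table per file.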

In [3]:
# Read the CSV files
ad = pd.read_csv('actual_duration.csv')
ar = pd.read_csv('appointments_regional.csv')

In [4]:
# Call function to describe data in actual_duration.csv
describe_data(ad,'actual_duration.csv',['sub_icb_location_ons_code',
                                        'icb_ons_code',
                                        'region_ons_code',
                                        'actual_duration'],
                                        'count_of_appointments',10)

Shape of DataFrame actual_duration.csv:
(137793, 8)


Info of DataFrame actual_duration.csv:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 137793 entries, 0 to 137792
Data columns (total 8 columns):
 #   Column                     Non-Null Count   Dtype 
---  ------                     --------------   ----- 
 0   sub_icb_location_code      137793 non-null  object
 1   sub_icb_location_ons_code  137793 non-null  object
 2   sub_icb_location_name      137793 non-null  object
 3   icb_ons_code               137793 non-null  object
 4   region_ons_code            137793 non-null  object
 5   appointment_date           137793 non-null  object
 6   actual_duration            137793 non-null  object
 7   count_of_appointments      137793 non-null  int64 
dtypes: int64(1), object(7)
memory usage: 8.4+ MB


Head of DataFrame actual_duration.csv:
  sub_icb_location_code sub_icb_location_ons_code   
0                   00L                 E38000130  \
1                   00L            

In [5]:
# Call function to describe data in appointments_regional.csv
describe_data(ar,'appointments_regional.csv',['icb_ons_code',
                                              'appointment_status', 
                                              'hcp_type', 
                                              'appointment_mode',
                                              'time_between_book_and_appointment'],
                                              'count_of_appointments',10)

Shape of DataFrame appointments_regional.csv:
(596821, 7)


Info of DataFrame appointments_regional.csv:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 596821 entries, 0 to 596820
Data columns (total 7 columns):
 #   Column                             Non-Null Count   Dtype 
---  ------                             --------------   ----- 
 0   icb_ons_code                       596821 non-null  object
 1   appointment_month                  596821 non-null  object
 2   appointment_status                 596821 non-null  object
 3   hcp_type                           596821 non-null  object
 4   appointment_mode                   596821 non-null  object
 5   time_between_book_and_appointment  596821 non-null  object
 6   count_of_appointments              596821 non-null  int64 
dtypes: int64(1), object(6)
memory usage: 31.9+ MB


Head of DataFrame appointments_regional.csv:
  icb_ons_code appointment_month appointment_status hcp_type appointment_mode   
0    E54000034           

In [8]:
# Read the Excel file
nc = pd.read_excel('national_categories.xlsx')

In [9]:
# Call function to describe data in national_categories.xlsx
describe_data(nc,'national_categories.xlsx',['sub_icb_location_name',
                                             'icb_ons_code',
                                             'service_setting', 
                                             'context_type', 
                                             'national_category'],
                                             'count_of_appointments',10)

Shape of DataFrame national_categories.xlsx:
(817394, 8)


Info of DataFrame national_categories.xlsx:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 817394 entries, 0 to 817393
Data columns (total 8 columns):
 #   Column                 Non-Null Count   Dtype         
---  ------                 --------------   -----         
 0   appointment_date       817394 non-null  datetime64[ns]
 1   icb_ons_code           817394 non-null  object        
 2   sub_icb_location_name  817394 non-null  object        
 3   service_setting        817394 non-null  object        
 4   context_type           817394 non-null  object        
 5   national_category      817394 non-null  object        
 6   count_of_appointments  817394 non-null  int64         
 7   appointment_month      817394 non-null  object        
dtypes: datetime64[ns](1), int64(1), object(6)
memory usage: 49.9+ MB


Head of DataFrame national_categories.xlsx:
  appointment_date icb_ons_code                       sub_icb_loca

                                             count_of_appointments
sub_icb_location_name                                             
NHS North West London ICB - W2U3Z                         12142390
NHS North East London ICB - A3A8R                          9588891
NHS Kent and Medway ICB - 91Q                              9286167
NHS Hampshire and Isle Of Wight ICB - D9Y0V                8288102
NHS South East London ICB - 72Q                            7850170
NHS Devon ICB - 15N                                        7447758
NHS South West London ICB - 36L                            7155030
NHS Black Country ICB - D2P2L                              7033637
NHS North Central London ICB - 93C                         6747958
NHS Birmingham and Solihull ICB - 15E                      6383746


Unique count for icb_ons_code of DataFrame national_categories.xlsx: 
['E54000050' 'E54000048' 'E54000057' 'E54000008' 'E54000061' 'E54000060'
 'E54000054' 'E54000051' 'E54000015' 'E54000010' 'E

<span style="font-family:Helvetica">
<h2><u>Answer the Questions</u></h2>
<br>
<b>Question 1. How many locations are there in the data set?</b>
   
<ul>
    <li>Sub-ICB Locations = 106</li>
    <li>ICB = 42</li>
    <li>Region = 7</li>
</ul>

<blockquote>The number of locations by Sub-ICB, ICB & Region was cross-referenced 
    with official statistics on the UK Government 
    <a href="https://geoportal.statistics.gov.uk/datasets/2bca16d4f8e4426d80137213fce90bbd_0/explore">website</a>.
    </blockquote>
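The location counts can be reproduced with `nunique()` on the ONS code columns. This is a sketch that assumes the counts come from the `sub_icb_location_ons_code`, `icb_ons_code` and `region_ons_code` columns of `actual_duration.csv`; `count_locations` is a hypothetical helper and the sample rows use placeholder codes.

```python
import pandas as pd

def count_locations(df: pd.DataFrame) -> pd.Series:
    """Number of distinct Sub-ICB, ICB and region codes in df."""
    cols = ['sub_icb_location_ons_code', 'icb_ons_code', 'region_ons_code']
    return df[cols].nunique()

# Placeholder codes only; with the notebook's data, call count_locations(ad)
sample = pd.DataFrame({
    'sub_icb_location_ons_code': ['S1', 'S1', 'S2'],
    'icb_ons_code': ['I1', 'I1', 'I2'],
    'region_ons_code': ['R1', 'R1', 'R1'],
})
print(count_locations(sample))
```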
    

<b>Question 2. What are the five locations with the highest number of records?</b>

<blockquote>This was calculated as the sum of count_of_appointments by sub_icb_location_name from the dataset national_categories.xlsx.</blockquote>

<table>
<thead>
    <tr><th>sub_icb_location_name</th><th>count_of_appointments</th></tr>
</thead>
<tbody>
    <tr><td>NHS North West London ICB - W2U3Z</td><td>12142390</td></tr>
    <tr><td>NHS North East London ICB - A3A8R</td><td>9588891</td></tr>
    <tr><td>NHS Kent and Medway ICB - 91Q</td><td>9286167</td></tr>
    <tr><td>NHS Hampshire and Isle Of Wight ICB - D9Y0V</td><td>8288102</td></tr>
    <tr><td>NHS South East London ICB - 72Q</td><td>7850170</td></tr>
</tbody>
</table>   

<blockquote><div class="alert alert-block alert-info">The sum was used rather than the count because the total count_of_appointments per sub_icb_location_name measures appointment volume, whereas a record (row) is one combination of date and category values and can represent any number of appointments.</div></blockquote>
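To make the difference concrete, the sketch below ranks locations both ways: `size()` counts records (rows), while `sum()` totals `count_of_appointments`. `top_locations` is a hypothetical helper and the toy data is illustrative only.

```python
import pandas as pd

def top_locations(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Records (rows) and total appointments per location, ranked by total appointments."""
    grouped = df.groupby('sub_icb_location_name')['count_of_appointments']
    summary = pd.DataFrame({
        'records': grouped.size(),       # number of rows per location
        'appointments': grouped.sum(),   # total appointments per location
    })
    return summary.sort_values('appointments', ascending=False).head(n)

toy = pd.DataFrame({
    'sub_icb_location_name': ['A', 'A', 'A', 'B'],
    'count_of_appointments': [1, 2, 3, 100],
})
print(top_locations(toy))
```

In the toy data, location B has fewer records than A (1 vs 3) but more appointments (100 vs 6), so the two rankings disagree.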
                              
<b>Question 3. How many service settings, context types, national categories and appointment statuses are there?</b>

<table>
<thead>
    <tr><th>Category</th><th>Sub-categories</th><th>Count of Sub-categories</th></tr>
</thead>
<tbody>
    <tr><td>Service Settings</td><td>Primary Care Network<br>Other<br>General Practice<br>
      Unmapped<br>Extended Access Provision</td><td>5</td></tr>
    <tr><td>Context Types</td><td>Care Related Encounter<br>Unmapped<br>Inconsistent Mapping</td><td>3</td></tr>
    <tr><td>National Categories</td><td>Patient contact during Care Home Round<br>Planned Clinics<br>
      Home Visit<br>General Consultation Acute<br>Structured Medication Review<br>
      Care Home Visit<br>Unmapped<br>Clinical Triage<br>
      Planned Clinical Procedure<br>Inconsistent Mapping<br>
      Care Home Needs Assessment &amp; Personalised Care and Support Planning<br>
      General Consultation Routine<br>
      Service provided by organisation external to the practice<br>
      Unplanned Clinical Activity<br>Social Prescribing Service<br>
      Non-contractual chargeable work<br>
      Group Consultation and Group Education<br>Walk-in</td><td>18</td></tr>
    <tr><td>Appointment Status</td><td>Attended<br>DNA<br>Unknown</td><td>3</td></tr>
</tbody>
</table>
    
> The categories above match the metadata provided, with one exception: the "Inconsistent Mapping" and "Unmapped" sub-categories of National Categories do not exactly match the source metadata description text.
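The counts in the table can be regenerated with `nunique()`. This sketch assumes `nc` holds `national_categories.xlsx` and `ar` holds `appointments_regional.csv` as loaded above; `category_counts` is a hypothetical helper, and the stand-in frames only illustrate the shape of the result.

```python
import pandas as pd

def category_counts(nc: pd.DataFrame, ar: pd.DataFrame) -> pd.Series:
    """Number of distinct values per category column across the two datasets."""
    counts = nc[['service_setting', 'context_type', 'national_category']].nunique()
    # appointment_status only exists in appointments_regional.csv
    counts['appointment_status'] = ar['appointment_status'].nunique()
    return counts

# Stand-in frames; with the notebook's data, call category_counts(nc, ar)
nc_demo = pd.DataFrame({
    'service_setting': ['General Practice', 'Other', 'General Practice'],
    'context_type': ['Care Related Encounter', 'Unmapped', 'Care Related Encounter'],
    'national_category': ['Home Visit', 'Walk-in', 'Home Visit'],
})
ar_demo = pd.DataFrame({'appointment_status': ['Attended', 'DNA', 'Unknown', 'Attended']})
print(category_counts(nc_demo, ar_demo))
```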
<hr>


<table width="100%">
<thead>
    <tr style="background-color:#D6EEEE">
        <th><h2 style="text-align:left">2. Analyse The Data</h2></th>
        <th><a id='2'></a><a href='#top'>Back to Top</a></th>
    </tr>
</thead>
</table>

#### Based on the exploratory analysis:

1. The datatype of ad['appointment_date'] needs to be changed from object to datetime.
2. A new column ar['appointment_date'] with datetime datatype needs to be created from ar['appointment_month'].
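Both changes rely on `pd.to_datetime`. A short sketch of converting month strings with an explicit `format`: stating the format documents the expected layout instead of leaving pandas to infer it. The sample values are illustrative and assume the `YYYY-MM` layout shown in the `appointment_month` output below.

```python
import pandas as pd

# Month strings in the same form as appointment_month (e.g. '2020-01')
months = pd.Series(['2020-01', '2022-06'])
dates = pd.to_datetime(months, format='%Y-%m')
print(dates.dt.day.tolist())  # [1, 1]: the missing day defaults to the 1st

# errors='coerce' turns values that do not match the format into NaT,
# so malformed rows can be counted instead of stopping the conversion
checked = pd.to_datetime(pd.Series(['2021-12', 'not a month']), format='%Y-%m', errors='coerce')
print(checked.isna().sum())  # 1
```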

In [45]:
# Change datatype of appointment_date from object to datetime in dataframe ad
ad["appointment_date"] = pd.to_datetime(ad["appointment_date"], format="%d-%m-%Y")

# Show info for ad dataframe to confirm datatype for appointment_date has changed to datetime
ad.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 137793 entries, 0 to 137792
Data columns (total 8 columns):
 #   Column                     Non-Null Count   Dtype         
---  ------                     --------------   -----         
 0   sub_icb_location_code      137793 non-null  object        
 1   sub_icb_location_ons_code  137793 non-null  object        
 2   sub_icb_location_name      137793 non-null  object        
 3   icb_ons_code               137793 non-null  object        
 4   region_ons_code            137793 non-null  object        
 5   appointment_date           137793 non-null  datetime64[ns]
 6   actual_duration            137793 non-null  object        
 7   count_of_appointments      137793 non-null  int64         
dtypes: datetime64[ns](1), int64(1), object(6)
memory usage: 8.4+ MB


In [13]:
# Show head of ad dataframe
ad.head(3)

| | sub_icb_location_code | sub_icb_location_ons_code | sub_icb_location_name | icb_ons_code | region_ons_code | appointment_date | actual_duration | count_of_appointments |
|---|---|---|---|---|---|---|---|---|
| 0 | 00L | E38000130 | NHS North East and North Cumbria ICB - 00L | E54000050 | E40000012 | 2021-12-01 | 31-60 Minutes | 364 |
| 1 | 00L | E38000130 | NHS North East and North Cumbria ICB - 00L | E54000050 | E40000012 | 2021-12-01 | 21-30 Minutes | 619 |
| 2 | 00L | E38000130 | NHS North East and North Cumbria ICB - 00L | E54000050 | E40000012 | 2021-12-01 | 6-10 Minutes | 1698 |


In [17]:
# Create new column appointment_date in the ar dataframe from appointment_month
# appointment_date has datatype datetime; the day is set to 1 for every row
ar["appointment_date"] = pd.to_datetime(ar["appointment_month"], format="%Y-%m")

# Show info for ar dataframe to confirm appointment_date has been created with datatype datetime
ar.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 596821 entries, 0 to 596820
Data columns (total 8 columns):
 #   Column                             Non-Null Count   Dtype         
---  ------                             --------------   -----         
 0   icb_ons_code                       596821 non-null  object        
 1   appointment_month                  596821 non-null  object        
 2   appointment_status                 596821 non-null  object        
 3   hcp_type                           596821 non-null  object        
 4   appointment_mode                   596821 non-null  object        
 5   time_between_book_and_appointment  596821 non-null  object        
 6   count_of_appointments              596821 non-null  int64         
 7   appointment_date                   596821 non-null  datetime64[ns]
dtypes: datetime64[ns](1), int64(1), object(6)
memory usage: 36.4+ MB


In [16]:
# Show head of ar dataframe
ar.head(3)

# NOTE: The 'day' attribute for appointment_date has been set to '01' for all rows

| | icb_ons_code | appointment_month | appointment_status | hcp_type | appointment_mode | time_between_book_and_appointment | count_of_appointments | appointment_date |
|---|---|---|---|---|---|---|---|---|
| 0 | E54000034 | 2020-01 | Attended | GP | Face-to-Face | 1 Day | 8107 | 2020-01-01 |
| 1 | E54000034 | 2020-01 | Attended | GP | Face-to-Face | 15 to 21 Days | 6791 | 2020-01-01 |
| 2 | E54000034 | 2020-01 | Attended | GP | Face-to-Face | 2 to 7 Days | 20686 | 2020-01-01 |


In [18]:
# Show info for nc dataframe
nc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 817394 entries, 0 to 817393
Data columns (total 8 columns):
 #   Column                 Non-Null Count   Dtype         
---  ------                 --------------   -----         
 0   appointment_date       817394 non-null  datetime64[ns]
 1   icb_ons_code           817394 non-null  object        
 2   sub_icb_location_name  817394 non-null  object        
 3   service_setting        817394 non-null  object        
 4   context_type           817394 non-null  object        
 5   national_category      817394 non-null  object        
 6   count_of_appointments  817394 non-null  int64         
 7   appointment_month      817394 non-null  object        
dtypes: datetime64[ns](1), int64(1), object(6)
memory usage: 49.9+ MB


## <u>Question 1</u>

### Between what dates were appointments scheduled?

In [20]:
# Function to calculate aggregates for a column of the same datatype
# from multiple DataFrames and concatenate the results side by side

# dict_in = dictionary in the format {col_new : {df_col : df}}
# col_new = name of the output column (string)
# df_col = column of df to aggregate (string)
# df = DataFrame object containing df_col
# list_agg = aggregations to apply, e.g. ['min', 'max'] (list)

def agg_columns(dict_in, list_agg):
    df_list = []
    for col_new, df_dict in dict_in.items():
        for df_col, df in df_dict.items():
            # Series indexed by aggregation name ('min', 'max', ...), renamed to col_new
            df_list.append(df[df_col].agg(list_agg).rename(col_new))
    # Place the aggregates side by side; the inner join keeps shared aggregation names
    result = pd.concat(df_list, axis=1, join='inner')
    result.rename_axis('aggregate', inplace=True)
    return result
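The same table can be built without the nested dictionary by passing a dict of Series to `pd.DataFrame`, which aligns them on the aggregation name. This is a sketch of an alternative, not the notebook's method; `agg_by_frame` is a hypothetical helper and the toy frames are illustrative.

```python
import pandas as pd

def agg_by_frame(frames: dict, column: str, funcs: list) -> pd.DataFrame:
    """Apply the same aggregations to `column` in each DataFrame in `frames`."""
    result = pd.DataFrame({name: df[column].agg(funcs) for name, df in frames.items()})
    return result.rename_axis('aggregate')

# With the notebook's data:
# agg_by_frame({'ad_date': ad, 'ar_date': ar, 'nc_date': nc}, 'appointment_date', ['min', 'max'])
a = pd.DataFrame({'appointment_date': pd.to_datetime(['2021-12-01', '2022-06-30'])})
b = pd.DataFrame({'appointment_date': pd.to_datetime(['2020-01-01', '2022-06-01'])})
print(agg_by_frame({'a_date': a, 'b_date': b}, 'appointment_date', ['min', 'max']))
```

Because all frames share one column name here, a single `column` argument is enough; `agg_columns` remains the more flexible option when the source column names differ.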

In [56]:
# Create dictionary to pass into agg_columns function
dict_source = {'ad_date' : {'appointment_date' : ad},
               'ar_date' : {'appointment_date' : ar},
               'nc_date' : {'appointment_date' : nc}}

# Pass dictionary and aggregations to function
agg_columns(dict_source, ['min', 'max']).reset_index()

| | aggregate | ad_date | ar_date | nc_date |
|---|---|---|---|---|
| 0 | min | 2021-12-01 | 2020-01-01 | 2021-08-01 |
| 1 | max | 2022-06-30 | 2022-06-01 | 2022-06-30 |


In [22]:
# Create dictionary to pass into agg_columns function
dict_source = {'ad_appointments' : {'count_of_appointments' : ad},
               'ar_appointments' : {'count_of_appointments' : ar},
               'nc_appointments' : {'count_of_appointments' : nc}}

# Pass dictionary and aggregations to function
agg_columns(dict_source, ['min', 'max', 'sum']).reset_index()

| | aggregate | nc_appointments | ad_appointments | ar_appointments |
|---|---|---|---|---|
| 0 | min | 1 | 1 | 1 |
| 1 | max | 16590 | 15400 | 211265 |
| 2 | sum | 296046770 | 167980692 | 742804525 |


## <u>Question 2</u>

### Which service setting reported the most appointments in North West London from 1 January to 1 June 2022?

In [23]:
# Service settings are in the nc dataframe only
nc.head(2)

| | appointment_date | icb_ons_code | sub_icb_location_name | service_setting | context_type | national_category | count_of_appointments | appointment_month |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-08-02 | E54000050 | NHS North East and North Cumbria ICB - 00L | Primary Care Network | Care Related Encounter | Patient contact during Care Home Round | 3 | 2021-08 |
| 1 | 2021-08-02 | E54000050 | NHS North East and North Cumbria ICB - 00L | Other | Care Related Encounter | Planned Clinics | 7 | 2021-08 |


In [24]:
# Create subset of nc dataframe with only sub_icb_location_name = 'NHS North West London ICB - W2U3Z'
nc_subset = nc.loc[nc['sub_icb_location_name'] == 'NHS North West London ICB - W2U3Z']

# Drop columns not required
nc_subset = nc_subset.drop(columns=['national_category', 'appointment_month'])

print(nc_subset.shape)
nc_subset.head(5)

(13007, 6)


| | appointment_date | icb_ons_code | sub_icb_location_name | service_setting | context_type | count_of_appointments |
|---|---|---|---|---|---|---|
| 794321 | 2021-08-01 | E54000027 | NHS North West London ICB - W2U3Z | Unmapped | Unmapped | 607 |
| 794322 | 2021-08-01 | E54000027 | NHS North West London ICB - W2U3Z | Other | Inconsistent Mapping | 6 |
| 794323 | 2021-08-01 | E54000027 | NHS North West London ICB - W2U3Z | General Practice | Inconsistent Mapping | 47 |
| 794324 | 2021-08-01 | E54000027 | NHS North West London ICB - W2U3Z | General Practice | Care Related Encounter | 74 |
| 794325 | 2021-08-01 | E54000027 | NHS North West London ICB - W2U3Z | General Practice | Care Related Encounter | 98 |


In [28]:
# Filter from 01/01/2022 (inclusive) up to 01/06/2022 (exclusive), i.e. January to May 2022
filtered_nc_subset = nc_subset.loc[(nc_subset['appointment_date'] >= '2022-01-01')
                                 & (nc_subset['appointment_date'] < '2022-06-01')]

print(filtered_nc_subset.shape)
filtered_nc_subset.head(5)

(5889, 6)


| | appointment_date | icb_ons_code | sub_icb_location_name | service_setting | context_type | count_of_appointments |
|---|---|---|---|---|---|---|
| 800289 | 2022-01-01 | E54000027 | NHS North West London ICB - W2U3Z | Unmapped | Unmapped | 496 |
| 800290 | 2022-01-01 | E54000027 | NHS North West London ICB - W2U3Z | Primary Care Network | Care Related Encounter | 19 |
| 800291 | 2022-01-01 | E54000027 | NHS North West London ICB - W2U3Z | Other | Inconsistent Mapping | 1 |
| 800292 | 2022-01-01 | E54000027 | NHS North West London ICB - W2U3Z | General Practice | Inconsistent Mapping | 16 |
| 800293 | 2022-01-01 | E54000027 | NHS North West London ICB - W2U3Z | Primary Care Network | Care Related Encounter | 29 |


In [29]:
# Show the tail to double check the date filter worked correctly
filtered_nc_subset.tail(5)

| | appointment_date | icb_ons_code | sub_icb_location_name | service_setting | context_type | count_of_appointments |
|---|---|---|---|---|---|---|
| 806173 | 2022-05-31 | E54000027 | NHS North West London ICB - W2U3Z | General Practice | Care Related Encounter | 10092 |
| 806174 | 2022-05-31 | E54000027 | NHS North West London ICB - W2U3Z | General Practice | Care Related Encounter | 2541 |
| 806175 | 2022-05-31 | E54000027 | NHS North West London ICB - W2U3Z | General Practice | Care Related Encounter | 20 |
| 806176 | 2022-05-31 | E54000027 | NHS North West London ICB - W2U3Z | General Practice | Care Related Encounter | 11 |
| 806177 | 2022-05-31 | E54000027 | NHS North West London ICB - W2U3Z | General Practice | Care Related Encounter | 106 |
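The half-open date filter (1 January included, 1 June excluded) can also be written with `Series.between`, whose `inclusive` argument (pandas 1.3 and later) states which ends of the range are kept. A sketch on toy dates:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2021-12-31', '2022-01-01', '2022-05-31', '2022-06-01']))
start, end = pd.Timestamp('2022-01-01'), pd.Timestamp('2022-06-01')

# Same as (dates >= start) & (dates < end)
left_only = dates.between(start, end, inclusive='left')
print(left_only.tolist())  # [False, True, True, False]

# inclusive='both' would also keep 1 June, if the question is read as inclusive
both = dates.between(start, end, inclusive='both')
print(both.tolist())  # [False, True, True, True]
```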


In [60]:
# Groupby Service Settings and calculate sum of count_of_appointments
answer = filtered_nc_subset.groupby('service_setting').\
         agg({'count_of_appointments' : 'sum'}).\
         sort_values(by=['count_of_appointments'], ascending=False)

print(answer)

                           count_of_appointments
service_setting                                 
General Practice                         4760966
Unmapped                                  387939
Other                                     151616
Primary Care Network                      108901
Extended Access Provision                  97409
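The subset, date filter and groupby steps for Question 2 can be folded into one reusable function. This is a sketch assuming the column names of `national_categories.xlsx`; `service_setting_totals` is a hypothetical name, the end date is exclusive to match the filter above, and the toy frame is illustrative.

```python
import pandas as pd

def service_setting_totals(nc, location, start, end):
    """Total appointments per service setting for one location, start <= date < end."""
    in_location = nc['sub_icb_location_name'] == location
    in_window = nc['appointment_date'].between(pd.Timestamp(start), pd.Timestamp(end),
                                               inclusive='left')
    return (nc.loc[in_location & in_window]
              .groupby('service_setting')['count_of_appointments']
              .sum()
              .sort_values(ascending=False))

toy_nc = pd.DataFrame({
    'sub_icb_location_name': ['X', 'X', 'X', 'X', 'Y'],
    'appointment_date': pd.to_datetime(['2022-01-01', '2022-05-31', '2022-06-01',
                                        '2022-03-01', '2022-02-01']),
    'service_setting': ['General Practice', 'General Practice', 'Other', 'Other', 'Other'],
    'count_of_appointments': [10, 5, 100, 3, 50],
})
totals = service_setting_totals(toy_nc, 'X', '2022-01-01', '2022-06-01')
print(totals.idxmax())  # General Practice
```

With the notebook's data the call would be `service_setting_totals(nc, 'NHS North West London ICB - W2U3Z', '2022-01-01', '2022-06-01')`.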


## <u>Question 3</u>
### Which month had the highest number of appointments?

In [33]:
# Sum of count_of_appointments by Year
nc_groupby_year = nc.groupby(nc['appointment_date'].dt.year)['count_of_appointments'].sum()

nc_groupby_year.head()

appointment_date
2021    138224352
2022    157822418
Name: count_of_appointments, dtype: int64

In [55]:
# Sum of count_of_appointments by Month
nc_groupby_month = nc.groupby(nc['appointment_date'].dt.month)['count_of_appointments'].sum()

nc_groupby_month.head(20)

appointment_date
1     25635474
2     25355260
3     29595038
4     23913060
5     27495508
6     25828078
8     23852171
9     28522501
10    30303834
11    30405070
12    25140776
Name: count_of_appointments, dtype: int64

In [57]:
# Sum of count_of_appointments by Year & Month sorted by sum
nc_groupby_month_year = nc.groupby(
                                   [(nc['appointment_date'].dt.year).rename('year'),
                                    (nc['appointment_date'].dt.month).rename('month')]
                                  )['count_of_appointments'].sum().\
                                   sort_values(ascending=False).reset_index(
                                   name='sum_of_appointments')
nc_groupby_month_year.head(20)

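| | year | month | sum_of_appointments |
|---|---|---|---|
| 0 | 2021 | 11 | 30405070 |
| 1 | 2021 | 10 | 30303834 |
| 2 | 2022 | 3 | 29595038 |
| 3 | 2021 | 9 | 28522501 |
| 4 | 2022 | 5 | 27495508 |
| 5 | 2022 | 6 | 25828078 |
| 6 | 2022 | 1 | 25635474 |
| 7 | 2022 | 2 | 25355260 |
| 8 | 2021 | 12 | 25140776 |
| 9 | 2022 | 4 | 23913060 |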


In [53]:
# Sum of count_of_appointments by Year & Month sorted by date
nc_groupby_month_year = nc.groupby(
                                   [(nc['appointment_date'].dt.year).rename('year'),
                                    (nc['appointment_date'].dt.month).rename('month')]
                                  )['count_of_appointments'].sum().reset_index(
                                   name='sum_of_appointments')
nc_groupby_month_year.head(20)

| | year | month | sum_of_appointments |
|---|---|---|---|
| 0 | 2021 | 8 | 23852171 |
| 1 | 2021 | 9 | 28522501 |
| 2 | 2021 | 10 | 30303834 |
| 3 | 2021 | 11 | 30405070 |
| 4 | 2021 | 12 | 25140776 |
| 5 | 2022 | 1 | 25635474 |
| 6 | 2022 | 2 | 25355260 |
| 7 | 2022 | 3 | 29595038 |
| 8 | 2022 | 4 | 23913060 |
| 9 | 2022 | 5 | 27495508 |
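Grouping by year and month as two keys works; an alternative sketch uses `dt.to_period('M')`, which produces a single month label (e.g. `2021-11`) that sorts chronologically and never merges the same calendar month from different years. `idxmax()` then returns the busiest month directly. `monthly_totals` is a hypothetical helper and the data is a toy example.

```python
import pandas as pd

def monthly_totals(df):
    """Sum of count_of_appointments per calendar month (year and month together)."""
    month = df['appointment_date'].dt.to_period('M').rename('month')
    return df.groupby(month)['count_of_appointments'].sum()

toy = pd.DataFrame({
    'appointment_date': pd.to_datetime(['2021-11-01', '2021-11-15', '2022-11-01', '2022-03-01']),
    'count_of_appointments': [10, 20, 5, 40],
})
totals = monthly_totals(toy)
print(totals.idxmax())  # 2022-03
```

Grouping this toy data by `dt.month` alone would combine November 2021 and November 2022 into one total of 35, which is why the year has to be part of the key.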


## <u>Question 4</u>
### What is the total number of records per month?

In [40]:
# count of records by Year & Month
nc_groupby_month_year = nc.groupby(
                                   [(nc['appointment_date'].dt.year).rename('year'),
                                    (nc['appointment_date'].dt.month).rename('month')]
                                  )['count_of_appointments'].count().reset_index(
                                    name='count_of_records')

nc_groupby_month_year.head(20)

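| | year | month | count_of_records |
|---|---|---|---|
| 0 | 2021 | 8 | 69999 |
| 1 | 2021 | 9 | 74922 |
| 2 | 2021 | 10 | 74078 |
| 3 | 2021 | 11 | 77652 |
| 4 | 2021 | 12 | 72651 |
| 5 | 2022 | 1 | 71896 |
| 6 | 2022 | 2 | 71769 |
| 7 | 2022 | 3 | 82822 |
| 8 | 2022 | 4 | 70012 |
| 9 | 2022 | 5 | 77425 |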


In [43]:
# count of records by Year & Month sorted by count
nc_groupby_month_year = nc.groupby(
                                   [(nc['appointment_date'].dt.year).rename('year'),
                                    (nc['appointment_date'].dt.month).rename('month')]
                                  )['count_of_appointments'].count().\
                                    sort_values(ascending=False).reset_index(
                                    name='count_of_records')

nc_groupby_month_year.head(20)

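| | year | month | count_of_records |
|---|---|---|---|
| 0 | 2022 | 3 | 82822 |
| 1 | 2021 | 11 | 77652 |
| 2 | 2022 | 5 | 77425 |
| 3 | 2021 | 9 | 74922 |
| 4 | 2022 | 6 | 74168 |
| 5 | 2021 | 10 | 74078 |
| 6 | 2021 | 12 | 72651 |
| 7 | 2022 | 1 | 71896 |
| 8 | 2022 | 2 | 71769 |
| 9 | 2022 | 4 | 70012 |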
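Questions 3 and 4 use the same year/month grouping, so the number of records and the total appointments can be computed in a single pass with named aggregation. A sketch on toy data; `monthly_summary` is a hypothetical helper, and `'count'` mirrors the `.count()` used above (it counts non-null values).

```python
import pandas as pd

def monthly_summary(df):
    """Records and total appointments per (year, month)."""
    keys = [df['appointment_date'].dt.year.rename('year'),
            df['appointment_date'].dt.month.rename('month')]
    return df.groupby(keys).agg(
        count_of_records=('count_of_appointments', 'count'),
        sum_of_appointments=('count_of_appointments', 'sum'),
    )

toy = pd.DataFrame({
    'appointment_date': pd.to_datetime(['2021-11-01', '2021-11-15', '2022-03-01']),
    'count_of_appointments': [10, 20, 40],
})
summary = monthly_summary(toy)
print(summary)
```

Sorting `summary` by either column then answers Question 3 or Question 4 without regrouping.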


<span style="font-family:Helvetica">
<h2><u>Answer the Questions</u></h2>
<br>
<b>Question 1. Between what dates were appointments scheduled?</b>
   
<table>
<thead>
    <tr><th>Dataset</th><th>Dates (from/to)</th></tr>
</thead>
<tbody>
    <tr><td>actual_duration.csv</td><td>from 01/12/2021 to 30/06/2022</td></tr>
    <tr><td>appointments_regional.csv</td><td>from 01/01/2020 to 01/06/2022</td></tr>
    <tr><td>national_categories.xlsx</td><td>from 01/08/2021 to 30/06/2022</td></tr>
</tbody>
</table>

<blockquote>NOTE: appointments_regional.csv is aggregated by month, so every appointment_date in that dataset has the form '01/mm/yyyy'. All three datasets end in June 2022.</blockquote>
    
<b>Question 2: Which service setting reported the most appointments in North West London from 1 January to 1 June 2022?</b>

<table>
<thead>
    <tr><th>service_setting</th><th>count_of_appointments</th></tr>
</thead>
<tbody>
    <tr><td>General Practice</td><td>4760966</td></tr>
    <tr><td>Unmapped</td><td>387939</td></tr>
    <tr><td>Other</td><td>151616</td></tr>
    <tr><td>Primary Care Network</td><td>108901</td></tr>
    <tr><td>Extended Access Provision</td><td>97409</td></tr>
</tbody>
</table>

<b>Question 3: Which month had the highest number of appointments?</b>

<table>
<thead>
    <tr><th>Month/Year</th><th>sum_of_appointments</th></tr>
</thead>
<tbody>
    <tr><td>November 2021</td><td>30,405,070</td></tr>
    <tr><td>October 2021</td><td>30,303,834</td></tr>
</tbody>
</table>
    
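<b>Question 4: What is the total number of records per month?</b>

<table>
<thead>
    <tr><th>Month/Year</th><th>count_of_records</th></tr>
</thead>
<tbody>
    <tr><td>August 2021</td><td>69,999</td></tr>
    <tr><td>September 2021</td><td>74,922</td></tr>
    <tr><td>October 2021</td><td>74,078</td></tr>
    <tr><td>November 2021</td><td>77,652</td></tr>
    <tr><td>December 2021</td><td>72,651</td></tr>
    <tr><td>January 2022</td><td>71,896</td></tr>
    <tr><td>February 2022</td><td>71,769</td></tr>
    <tr><td>March 2022</td><td>82,822</td></tr>
    <tr><td>April 2022</td><td>70,012</td></tr>
    <tr><td>May 2022</td><td>77,425</td></tr>
    <tr><td>June 2022</td><td>74,168</td></tr>
</tbody>
</table>

<blockquote>March 2022 had the most records (82,822), followed by November 2021 (77,652). The monthly counts total 817,394, which matches the number of rows in national_categories.xlsx.</blockquote>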

<table width="100%">
<thead>
    <tr style="background-color:#D6EEEE">
        <th><h2 style="text-align:left">3. Visualise &amp; Identify Trends</h2></th>
        <th><a id='3'></a><a href='#top'>Back to Top</a></th>
    </tr>
</thead>
</table>

<table width="100%">
<thead>
    <tr style="background-color:#D6EEEE">
        <th><h2 style="text-align:left">4. Analyse The Twitter Data</h2></th>
        <th><a id='4'></a><a href='#top'>Back to Top</a></th>
    </tr>
</thead>
</table>

<table width="100%">
<thead>
    <tr style="background-color:#D6EEEE">
        <th><h2 style="text-align:left">5. Recommendations</h2></th>
        <th><a id='5'></a><a href='#top'>Back to Top</a></th>
    </tr>
</thead>
</table>