<div class='alert' style='background-color: #1c1a1e; color: #f5f4f0; padding:16px 26px; border-radius:20px; font-size:40px;'><B>Project Name</b> - First Notebook </div>
<div style='margin:0px 26px; color:#1c1a1e; font-size:16px;'>
<ol>
<li><B>Notebook Documentation</B>: Documenting a Jupyter Notebook is crucial for ensuring that the analysis it contains is understandable and reproducible, not only by others but also by your future self.  
Best practices in documentation start with a clear, informative introduction: what the notebook aims to achieve, the dataset being used, and any prerequisite knowledge or context needed to follow the analysis.  
Each cell, especially those containing key computations or decisions, should be accompanied by Markdown cells explaining the rationale behind the code, any assumptions made, and a summary of the results. Inline comments within the code cells can clarify complex lines of code or non-obvious steps.  
Visualizations should include titles, axis labels, and legends where appropriate to make them self-explanatory. Finally, the notebook should conclude with a summary of findings, any conclusions drawn, and possible next steps. This narrative structure turns your notebook from a mere collection of code cells into a coherent story about your data analysis journey.</li>
</ol>
</div>

# Libraries & Data

In [1]:
# Import default libraries
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
import warnings
import datetime
import os

# Set the maximum number of rows and columns to be displayed
pd.options.display.max_rows = 1000
pd.options.display.max_columns = 1000

# High-resolution plots and Matplotlib inline
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

# Suppress all warnings (note: this also hides pandas warnings such as SettingWithCopyWarning)
warnings.filterwarnings('ignore')

# "magic commands" to enable autoreload of your imported packages
%load_ext autoreload
%autoreload 2

## Loading Data

In [39]:
data = pd.read_csv('../data/data.csv')
data.head(2)

Unnamed: 0,Vaccination type,Patient ID,Date of birth,Surname,Event date,Event done at ID,Patient Count
0,Cholera 2,27861537,18-Nov-1992,Stanley,16-Mar-2015,,1
1,Measles/Mumps/Rubella 1,41183583,29-Mar-2013,Ferreira,22-Apr-2014,E87750,1


In [62]:
gms = pd.read_csv('../data/gms.csv')
reg_list = gms['Patient ID'].to_list()
reg_list

[52702,
 65293,
 2457891,
 1201808,
 1237456,
 1678856,
 1662154,
 1690694,
 1776589,
 1818640,
 1499541,
 1409894,
 2017998,
 3317543,
 3709292,
 3689361,
 3922932,
 10776440,
 9978428,
 10453520,
 11705499,
 11649497,
 11807477,
 4404601,
 4850765,
 4851120,
 4633876,
 4984559,
 6816605,
 7128980,
 5753135,
 6217264,
 13073262,
 14911780,
 14916520,
 15031590,
 15711584,
 14218021,
 16196817,
 34093107,
 34304517,
 33939151,
 35850226,
 36383580,
 36055714,
 35099658,
 35115361,
 36882197,
 37241138,
 37299255,
 42134636,
 42342978,
 42367664,
 44325331,
 43549424,
 43635223,
 43408435,
 43359182,
 43879409,
 43888000,
 45257790,
 45285847,
 45606448,
 45910918,
 38590189,
 38685152,
 39888829,
 39931993,
 40567329,
 39361717,
 39546808,
 39582030,
 39823161,
 40903412,
 41437091,
 41576169,
 46777785,
 46919404,
 46914629,
 46914617,
 46427679,
 46455773,
 46924390,
 47130170,
 48238713,
 48381176,
 48381704,
 48332029,
 48874815,
 48544396,
 48522449,
 49172205,
 49115309,
 4924719

In [64]:
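# Keep only the rows whose Patient ID appears in the registration list;
# .copy() makes `new` an independent frame so later changes do not act on a view of `data`
new = data[data['Patient ID'].isin(reg_list)].copy()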

In [65]:
new

Unnamed: 0,Vaccination type,Patient ID,Date of birth,Surname,Event date,Event done at ID,Patient Count
602,Typhim VI - Single Dose Single,44328194,13-Oct-1978,Davies,10-Oct-2018,E87063,1
1045,Influenza Vaccine 1,44328194,13-Oct-1978,Davies,07-Nov-2005,E87063,1
1046,Influenza Vaccine 1,44328194,13-Oct-1978,Davies,10-Aug-2012,E87063,1
1250,Avaxim 1,44328194,13-Oct-1978,Davies,10-Oct-2018,E87063,1
1895,Influenza Vaccine 1,44328194,13-Oct-1978,Davies,02-Nov-2007,E87063,1
2234,Influenza Vaccine 1,44328194,13-Oct-1978,Davies,13-Oct-2009,E87063,1
2235,Influenza Vaccine 1,44328194,13-Oct-1978,Davies,18-Nov-2022,E87762,1
2315,Priorix 1,30733688,04-Oct-1987,Hoxha,24-Mar-2022,E87762,1
2322,Influenza Vaccine 1,44328194,13-Oct-1978,Davies,10-Oct-2008,E87063,1
2633,Influenza Vaccine 1,44328194,13-Oct-1978,Davies,12-Feb-2007,E87063,1


In [66]:
new.info()

<class 'pandas.core.frame.DataFrame'>
Index: 23 entries, 602 to 10745
Data columns (total 7 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Vaccination type  23 non-null     object
 1   Patient ID        23 non-null     int64 
 2   Date of birth     23 non-null     object
 3   Surname           23 non-null     object
 4   Event date        23 non-null     object
 5   Event done at ID  23 non-null     object
 6   Patient Count     23 non-null     int64 
dtypes: int64(2), object(5)
memory usage: 1.4+ KB


In [67]:
new.shape

(23, 7)
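
The filter above keeps 23 rows, but the row count alone does not show how many registered patients were actually found in the vaccination data. A minimal sketch of that check follows. The frames here are small stand-ins built from a few IDs that appear in the outputs above, not the real `data.csv` or `gms.csv`, and the names `reg_ids`, `matched_ids`, and `unmatched_ids` are illustrative.

```python
import pandas as pd

# Stand-in for `data`: a few vaccination rows (illustrative values only)
data = pd.DataFrame({
    "Patient ID": [44328194, 44328194, 30733688, 27861537],
    "Vaccination type": ["Influenza Vaccine 1", "Avaxim 1", "Priorix 1", "Cholera 2"],
})

# Stand-in for gms['Patient ID']: the registration list (illustrative values only)
reg_ids = pd.Series([44328194, 30733688, 52702], name="Patient ID")

# A set gives fast membership tests and removes duplicate IDs
reg_set = set(reg_ids)

# Same filter as in the notebook, with .copy() to get an independent frame
new = data.loc[data["Patient ID"].isin(reg_set)].copy()

# Which registered patients have at least one row, and which have none?
matched_ids = set(new["Patient ID"])
unmatched_ids = reg_set - matched_ids

print(f"{len(new)} rows for {len(matched_ids)} of {len(reg_set)} registered patients")
print("Registered IDs with no vaccination rows:", sorted(unmatched_ids))
```

If many registered IDs turn out to be unmatched, it may be worth checking whether the two files store `Patient ID` with the same dtype and formatting before drawing conclusions.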

In [68]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12556 entries, 0 to 12555
Data columns (total 7 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Vaccination type  12556 non-null  object
 1   Patient ID        12556 non-null  int64 
 2   Date of birth     12556 non-null  object
 3   Surname           12556 non-null  object
 4   Event date        12556 non-null  object
 5   Event done at ID  11712 non-null  object
 6   Patient Count     12556 non-null  int64 
dtypes: int64(2), object(5)
memory usage: 686.8+ KB
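
The `info()` output lists `Date of birth` and `Event date` as `object`, meaning they are still plain strings. A minimal sketch of converting them to datetimes follows, assuming every value uses the `DD-Mon-YYYY` pattern seen in the preview rows. The `sample` frame is a stand-in built from those two preview rows, not the full dataset.

```python
import pandas as pd

# Stand-in frame using the two preview rows shown by data.head(2)
sample = pd.DataFrame({
    "Patient ID": [27861537, 41183583],
    "Date of birth": ["18-Nov-1992", "29-Mar-2013"],
    "Event date": ["16-Mar-2015", "22-Apr-2014"],
})

date_cols = ["Date of birth", "Event date"]
for col in date_cols:
    # An explicit format avoids day/month ambiguity;
    # errors="coerce" turns unparseable strings into NaT instead of raising
    sample[col] = pd.to_datetime(sample[col], format="%d-%b-%Y", errors="coerce")

# Count values that failed to parse, so bad dates are not silently lost
n_unparsed = int(sample[date_cols].isna().sum().sum())
print(sample.dtypes)
print("Unparsed date values:", n_unparsed)
```

With real datetimes, the columns can be sorted, compared, and subtracted directly, which string columns do not support.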


In [69]:
new.isna().sum()

Vaccination type    0
Patient ID          0
Date of birth       0
Surname             0
Event date          0
Event done at ID    0
Patient Count       0
dtype: int64

In [70]:
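# Drop rows with any missing value; reassigning avoids modifying a filtered slice in place
# (the check above found no missing values, so no rows are removed here)
new = new.dropna()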

In [71]:
new.isna().sum()

Vaccination type    0
Patient ID          0
Date of birth       0
Surname             0
Event date          0
Event done at ID    0
Patient Count       0
dtype: int64
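
In the full `data` frame only `Event done at ID` has missing values, so a bare `dropna()` there would drop a row for any gap in any column. A minimal sketch of a more targeted clean-up follows, assuming the goal is to require only a known event location. The `sample` frame and the names `missing_before` and `cleaned` are illustrative stand-ins.

```python
import numpy as np
import pandas as pd

# Stand-in frame with one missing 'Event done at ID', as in the first preview row
sample = pd.DataFrame({
    "Patient ID": [27861537, 41183583, 44328194],
    "Surname": ["Stanley", "Ferreira", "Davies"],
    "Event done at ID": [np.nan, "E87750", "E87063"],
})

# Record missing counts per column before dropping anything
missing_before = sample.isna().sum()

# Drop rows only when the chosen column is missing, and return a new frame
cleaned = sample.dropna(subset=["Event done at ID"]).reset_index(drop=True)

print(missing_before)
print(f"Rows before: {len(sample)}, rows after: {len(cleaned)}")
```

Comparing the row counts before and after makes it explicit how much data the clean-up step removed.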

In [32]:
new.shape

(23, 8)

# Exploratory Analysis