<div class='alert' style='background-color: #1c1a1e; color: #f5f4f0; padding:16px 26px; border-radius:20px; font-size:40px;'><B>Project Name</b> - First Notebook </div>
<div style='margin:0px 26px; color:#1c1a1e; font-size:16px;'>
<ol>
<li><B>Notebook Documentation</B>: Documenting a Jupyter Notebook is crucial for ensuring that the analysis it contains is understandable and reproducible, not only by others but also by your future self.  
Best practices in documentation start with a clear, informative introduction: what the notebook aims to achieve, the dataset being used, and any prerequisite knowledge or context needed to follow the analysis.  
Each cell, especially those containing key computations or decisions, should be accompanied by Markdown cells explaining the rationale behind the code, any assumptions made, and a summary of the results. Inline comments within the code cells can clarify complex lines of code or non-obvious steps.  
Visualizations should include titles, axis labels, and legends where appropriate to make them self-explanatory. Finally, the notebook should conclude with a summary of findings, any conclusions drawn, and possible next steps. This narrative structure turns your notebook from a mere collection of code cells into a coherent story about your data analysis journey.
</ol>
</div>
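As a small illustration of the visualization guidance above, the following sketch labels a plot with a title, axis labels, and a legend. The values and the site names are made up for illustration and are not taken from the project data; the `Agg` backend is set only so the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt

# Made-up illustrative values, not taken from the project data
years = [2000, 2001, 2002, 2003]
doses_site_a = [120, 135, 150, 160]
doses_site_b = [90, 110, 105, 130]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(years, doses_site_a, marker="o", label="Site A (hypothetical)")
ax.plot(years, doses_site_b, marker="s", label="Site B (hypothetical)")

# Title, axis labels, and legend make the figure self-explanatory
ax.set_title("Vaccine doses per year (illustrative)")
ax.set_xlabel("Year")
ax.set_ylabel("Doses administered")
ax.set_xticks(years)
ax.legend(title="Location")
fig.tight_layout()
```

Inside the notebook itself the `matplotlib.use("Agg")` line can be dropped, since `%matplotlib inline` already selects a backend.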

# Libraries & Data

In [1]:
# Importing default Libraries
import matplotlib.pyplot as plt
import pandas as pd 
import numpy as np
import seaborn as sns
import warnings
import datetime 
import os 

# Set the maximum number of rows and columns to be displayed
pd.options.display.max_rows = 1000
pd.options.display.max_columns = 1000

# Hi-resolution Plots and Matplotlib inline
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

# Suppress warnings to keep cell output readable
warnings.filterwarnings('ignore')

# "magic commands" to enable autoreload of your imported packages
%load_ext autoreload
%autoreload 2

In [2]:
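# Project-specific modules; the wildcard imports provide the names used below
# (RAW_DATA, list_all_files, preprocess_dataframe, TimeSeriesHelper)
from vf.utils import *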
from vf.params import *
from vf.interface.main import *
from vf.timeseries_helper import *
from vf.ml_functions.data import *

In [3]:
list_all_files(RAW_DATA)

['Influenza_Predictor_KMC.csv',
 'influenzaTCP.csv',
 'influenza_predictorECS.csv',
 'sampledata_all.csv',
 'influenzaSMW.csv',
 'influenzaTGP.csv',
 'sampledata.csv']
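The implementation of `list_all_files` lives in the `vf` package and is not shown in this notebook. As a rough idea of what such a helper could look like, here is a minimal sketch using only the standard library. The name `list_files_sketch` is hypothetical, and sorting the result is a choice made here (the output above is not sorted, so the real helper probably behaves differently).

```python
import os
import tempfile


def list_files_sketch(directory):
    """Return the names of regular files directly inside `directory`, sorted.

    Hypothetical stand-in for the project's list_all_files helper.
    """
    return sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )


# Demonstrate on a throwaway directory with two files and one subdirectory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.csv", "a.csv"):
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write("x\n")
    os.mkdir(os.path.join(tmp, "subdir"))
    found = list_files_sketch(tmp)  # subdirectories are skipped
```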

## Loading Data

In [9]:
data = pd.read_csv(f'{RAW_DATA}/influenzaTGP.csv')
data = preprocess_dataframe(data)
data.head(2)

Unnamed: 0  timestamp   location  vaccine              dob         pt_count  age_at_vaccine
1           2000-10-02  E87762    Influenza Vaccine 1  1931-08-02  1         69
2           2000-10-26  E87043    Influenza Vaccine 1  1970-12-26  1         30
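The two rows displayed above suggest that `age_at_vaccine` is the difference in calendar years between `timestamp` and `dob`, not the exact age in completed years: the second patient had not yet reached their birthday on the vaccination date. How the column is actually derived is not shown in this notebook, so the sketch below is only a sanity check on those two rows, re-typed by hand.

```python
import pandas as pd

# The two rows displayed above, re-typed by hand
sample = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2000-10-02", "2000-10-26"]),
        "dob": pd.to_datetime(["1931-08-02", "1970-12-26"]),
        "age_at_vaccine": [69, 30],
    },
    index=[1, 2],
)

# Calendar-year difference between vaccination date and date of birth
year_diff = sample["timestamp"].dt.year - sample["dob"].dt.year
matches = (year_diff == sample["age_at_vaccine"]).all()

# Exact age in completed years: subtract one if the birthday has not
# yet occurred in the vaccination year (compare month/day as MMDD integers)
vacc_mmdd = sample["timestamp"].dt.month * 100 + sample["timestamp"].dt.day
dob_mmdd = sample["dob"].dt.month * 100 + sample["dob"].dt.day
exact_age = year_diff - (vacc_mmdd < dob_mmdd).astype(int)
```

On these two rows the calendar-year difference reproduces the stored column, while the exact age of the second patient comes out one year lower. Running the same comparison on the full `data` frame would show whether this holds throughout.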


In [6]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10851 entries, 1 to 11367
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   timestamp       10851 non-null  datetime64[ns]
 1   location        10851 non-null  object        
 2   vaccine         10851 non-null  object        
 3   dob             10851 non-null  datetime64[ns]
 4   pt_count        10851 non-null  int64         
 5   age_at_vaccine  10851 non-null  int32         
dtypes: datetime64[ns](2), int32(1), int64(1), object(2)
memory usage: 551.0+ KB


In [7]:
data.isna().sum()

timestamp         0
location          0
vaccine           0
dob               0
pt_count          0
age_at_vaccine    0
dtype: int64

# Exploratory Analysis

In [None]:
ts = TimeSeriesHelper() 