Prevention is better than cure?
# Predicting who will get the flu shot
### Business Problem
#### As the world struggles to vaccinate the global population against COVID-19, an understanding of how people’s backgrounds, opinions, and health behaviors relate to their personal vaccination patterns can guide future public health efforts. The challenge: can we predict whether people received the H1N1 and seasonal flu vaccines using data collected in the National 2009 H1N1 Flu Survey?

### Business Understanding 
Understanding the behaviors that influence vaccination decisions is critical for designing effective public health interventions, especially in the face of global health threats like influenza and pandemics such as COVID-19. Vaccination is one of the most powerful tools in preventing infectious diseases, yet uptake remains inconsistent across populations. This inconsistency is often not due to lack of access alone, but also to a complex interplay of individual beliefs, risk perceptions, trust in healthcare systems, and social influences. By studying these behavioral patterns, public health professionals can identify barriers to vaccine acceptance and tailor communication strategies, outreach programs, and policy decisions to address specific concerns and motivations within different communities.

The importance of this model lies in its ability to predict who is likely to receive a seasonal flu vaccine based on behavioral and demographic data. Using data from the National 2009 H1N1 Flu Survey, the model analyzes variables such as health beliefs, preventive behaviors, healthcare access, and socio-demographic characteristics to classify individuals as vaccinated or not. This predictive capability supports public health planning. For instance, if the model identifies that individuals with low perceived risk or limited health knowledge are less likely to get vaccinated, targeted education campaigns can be developed to address these gaps. Similarly, if certain age groups or income brackets are underrepresented among the vaccinated, resources can be allocated to improve outreach and accessibility in those segments.

Moreover, predictive models like this one enable proactive rather than reactive public health strategies. Instead of waiting for low vaccination rates to manifest in outbreaks, health authorities can use model insights to anticipate and mitigate risks. This is especially crucial in resource-limited settings, where efficient allocation of vaccines and personnel can make a significant difference in outcomes. The model also supports equity in healthcare by highlighting disparities in vaccine uptake, allowing for interventions that ensure vulnerable populations are not left behind.

In the broader context of pandemic preparedness, understanding vaccination behavior through data science gives decision-makers evidence-based tools. It bridges the gap between behavioral science and epidemiology, offering a scalable way to monitor and influence public health behavior. As misinformation and vaccine hesitancy continue to challenge global health efforts, models like this provide a scientific foundation for building trust and improving health literacy. Ultimately, this work contributes to a more resilient healthcare system, one that not only responds to disease but also anticipates and prevents it through informed, data-driven action.

In [1]:
# Import libraries for data manipulation, preprocessing (imputation, encoding, scaling),
# model building, and evaluation.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
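
The imports above hint at a preprocessing-plus-classifier workflow. As a rough sketch of how those pieces might fit together (not necessarily the pipeline this notebook ends up using), the example below imputes and scales numeric columns, imputes and one-hot encodes categorical columns, and feeds the result to a random forest. The `toy_X` and `toy_y` frames are made-up stand-ins for the survey data, and the column split, imputation strategies, and hyperparameters are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny synthetic stand-in for the survey features (values are invented).
toy_X = pd.DataFrame({
    "h1n1_concern": [1.0, 2.0, np.nan, 3.0, 0.0, 2.0],
    "opinion_seas_risk": [1.0, 4.0, 2.0, np.nan, 1.0, 5.0],
    "age_group": ["18 - 34 Years", "65+ Years", np.nan,
                  "35 - 44 Years", "18 - 34 Years", "65+ Years"],
    "sex": ["Female", "Male", "Female", "Male", "Female", "Male"],
})
# Keep the text columns as plain object dtype so NaN marks the missing entries.
toy_X = toy_X.astype({"age_group": object, "sex": object})
toy_y = pd.Series([0, 1, 0, 1, 0, 1], name="seasonal_vaccine")

numeric_cols = ["h1n1_concern", "opinion_seas_risk"]  # assumed split
categorical_cols = ["age_group", "sex"]               # assumed split

# Numeric columns: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill gaps with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", RandomForestClassifier(n_estimators=50, random_state=42)),
])

model.fit(toy_X, toy_y)
toy_pred = model.predict(toy_X)
```

Wrapping preprocessing and the classifier in one `Pipeline` means the imputers and scaler are fitted only on the data passed to `fit`, which helps avoid leaking information from held-out rows.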


In [None]:
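# Load the dataset. The name flu_data is illustrative; train_features, train_labels
# and test_features used below are presumably loaded the same way from their own files.
flu_data = pd.read_csv("H1N1_Flu_Vaccines.csv")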


Data exploration to understand all three datasets

In [4]:
train_features.columns


Index(['respondent_id', 'h1n1_concern', 'h1n1_knowledge',
       'behavioral_antiviral_meds', 'behavioral_avoidance',
       'behavioral_face_mask', 'behavioral_wash_hands',
       'behavioral_large_gatherings', 'behavioral_outside_home',
       'behavioral_touch_face', 'doctor_recc_h1n1', 'doctor_recc_seasonal',
       'chronic_med_condition', 'child_under_6_months', 'health_worker',
       'health_insurance', 'opinion_h1n1_vacc_effective', 'opinion_h1n1_risk',
       'opinion_h1n1_sick_from_vacc', 'opinion_seas_vacc_effective',
       'opinion_seas_risk', 'opinion_seas_sick_from_vacc', 'age_group',
       'education', 'race', 'sex', 'income_poverty', 'marital_status',
       'rent_or_own', 'employment_status', 'hhs_geo_region', 'census_msa',
       'household_adults', 'household_children', 'employment_industry',
       'employment_occupation'],
      dtype='object')
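
The column list mixes numeric ratings with text categories such as `age_group` and `education`. A quick way to continue the exploration is to measure how much of each column is missing and to separate numeric from categorical columns. The sketch below does this on a small invented frame called `toy`; the values and the amount of missingness are assumptions, not facts about the survey.

```python
import numpy as np
import pandas as pd

# Invented sample rows using three of the column names listed above.
toy = pd.DataFrame({
    "h1n1_concern": [1.0, np.nan, 2.0, 3.0],
    "health_insurance": [1.0, np.nan, np.nan, 0.0],
    "education": ["College Graduate", "12 Years", np.nan, "12 Years"],
})

# Fraction of missing values per column, largest first.
missing_share = toy.isna().mean().sort_values(ascending=False)

# Split columns by dtype so each group can get its own preprocessing.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(missing_share)
print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```

On the real data, the same calls would be applied to `train_features` instead of `toy`.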

In [5]:
train_labels.columns

Index(['respondent_id', 'h1n1_vaccine', 'seasonal_vaccine'], dtype='object')
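
Since the labels share `respondent_id` with the features, one plausible next step is to join the two tables on that key and look at how balanced each target is. The sketch below uses small invented frames (`toy_features`, `toy_labels`) in place of `train_features` and `train_labels`; the numbers are made up for illustration.

```python
import pandas as pd

# Invented stand-ins for train_features and train_labels.
toy_features = pd.DataFrame({
    "respondent_id": [0, 1, 2, 3],
    "h1n1_concern": [1.0, 3.0, 1.0, 2.0],
})
toy_labels = pd.DataFrame({
    "respondent_id": [0, 1, 2, 3],
    "h1n1_vaccine": [0, 0, 0, 1],
    "seasonal_vaccine": [0, 1, 0, 1],
})

# validate="one_to_one" raises an error if respondent_id repeats on either side.
toy_merged = toy_features.merge(
    toy_labels, on="respondent_id", how="inner", validate="one_to_one"
)

# The mean of a 0/1 column is the share of vaccinated respondents.
balance = toy_merged[["h1n1_vaccine", "seasonal_vaccine"]].mean()
print(balance)
```

A strongly imbalanced target would be a reason to look beyond plain accuracy when evaluating the model.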

In [6]:
test_features.columns

Index(['respondent_id', 'h1n1_concern', 'h1n1_knowledge',
       'behavioral_antiviral_meds', 'behavioral_avoidance',
       'behavioral_face_mask', 'behavioral_wash_hands',
       'behavioral_large_gatherings', 'behavioral_outside_home',
       'behavioral_touch_face', 'doctor_recc_h1n1', 'doctor_recc_seasonal',
       'chronic_med_condition', 'child_under_6_months', 'health_worker',
       'health_insurance', 'opinion_h1n1_vacc_effective', 'opinion_h1n1_risk',
       'opinion_h1n1_sick_from_vacc', 'opinion_seas_vacc_effective',
       'opinion_seas_risk', 'opinion_seas_sick_from_vacc', 'age_group',
       'education', 'race', 'sex', 'income_poverty', 'marital_status',
       'rent_or_own', 'employment_status', 'hhs_geo_region', 'census_msa',
       'household_adults', 'household_children', 'employment_industry',
       'employment_occupation'],
      dtype='object')
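
The two feature listings above look identical, but comparing them programmatically is less error-prone than reading them side by side. A minimal sketch, using empty invented frames (`toy_train`, `toy_test`) with a handful of the column names as stand-ins for `train_features` and `test_features`:

```python
import pandas as pd

# Empty frames that only carry column names, standing in for the real data.
toy_train = pd.DataFrame(columns=["respondent_id", "h1n1_concern", "age_group"])
toy_test = pd.DataFrame(columns=["respondent_id", "h1n1_concern", "age_group"])

# True only if both frames have the same columns in the same order.
same_columns = list(toy_train.columns) == list(toy_test.columns)

# Columns present in one frame but not the other.
only_in_train = set(toy_train.columns) - set(toy_test.columns)
only_in_test = set(toy_test.columns) - set(toy_train.columns)

print(same_columns, only_in_train, only_in_test)
```

If either difference set is non-empty, a model fitted on the training features could fail or silently misalign columns when predicting on the test features.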