## Support Vector Machines

### Table of Contents

- [Packages and Notebook Properties](#notebook)
- [Theory Discussion](#theory)
- [Dataset Exploration](#dataset)
    - [Background](#databackground)
- [Implementation](#implementation)

### Packages and Notebook Properties

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
import sys
import os

Set notebook properties

In [19]:
warnings.filterwarnings('ignore')
pd.options.display.max_columns = None

In [7]:
DATA_PATH = r'../data_source'

### Theory Discussion

### Dataset Exploration

#### Background

In this exercise, we will take a look at vaccination, a key public health measure used to fight infectious diseases. Vaccines provide immunization for individuals, and enough immunization in a community can further reduce the spread of diseases through "herd immunity".

A phone survey asked respondents whether they had received the H1N1 and seasonal flu vaccines, in conjunction with questions about themselves. These additional questions covered their social, economic, and demographic background, opinions on risks of illness and vaccine effectiveness, and behaviors towards mitigating transmission. A better understanding of how these characteristics are associated with personal vaccination patterns can provide guidance for future public health efforts.

The goal is to predict how likely individuals are to receive their H1N1 and seasonal flu vaccines. Specifically, we will be predicting two probabilities: one for h1n1_vaccine and one for seasonal_vaccine. Each row in the dataset represents one person who responded to the National 2009 H1N1 Flu Survey.

The dataset is taken from the competition page on [DrivenData](https://www.drivendata.org/competitions/66/flu-shot-learning/page/210/).

#### Exploratory Data Analysis

In [14]:
training_set_features = pd.read_csv(os.path.join(DATA_PATH, 'training_set_features.csv'))
training_set_labels = pd.read_csv(os.path.join(DATA_PATH, 'training_set_labels.csv'))
test_set_features = pd.read_csv(os.path.join(DATA_PATH, 'test_set_features.csv'))

In [16]:
training_set_features.sample(4)

,respondent_id,h1n1_concern,h1n1_knowledge,behavioral_antiviral_meds,behavioral_avoidance,behavioral_face_mask,behavioral_wash_hands,behavioral_large_gatherings,behavioral_outside_home,behavioral_touch_face,...,income_poverty,marital_status,rent_or_own,employment_status,hhs_geo_region,census_msa,household_adults,household_children,employment_industry,employment_occupation
21296,21296,2.0,2.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,...,"> $75,000",Married,Own,Employed,fpwskwrf,"MSA, Not Principle City",1.0,2.0,fcxhlnwr,cmhcxjea
25674,25674,2.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0,...,"> $75,000",Married,Own,Not in Labor Force,bhuqouqj,"MSA, Not Principle City",1.0,0.0,,
21920,21920,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,...,Below Poverty,Married,Own,Not in Labor Force,oxchjgsf,"MSA, Not Principle City",1.0,3.0,,
23619,23619,2.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,...,"<= $75,000, Above Poverty",Married,Own,Not in Labor Force,qufhixun,"MSA, Principle City",1.0,0.0,,
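
Several of the sampled rows have empty values in `employment_industry` and `employment_occupation`, so a natural next check is how much of each column is missing. The following is a minimal sketch of that check; `toy_features` is a hypothetical stand-in for `training_set_features`, and its values are illustrative rather than taken from the full dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for training_set_features; values are illustrative only.
toy_features = pd.DataFrame({
    'respondent_id': [0, 1, 2, 3],
    'h1n1_concern': [2.0, 2.0, 0.0, np.nan],
    'employment_industry': ['fcxhlnwr', np.nan, np.nan, np.nan],
})

# Fraction of missing values per column, sorted so the sparsest columns come first.
missing_share = toy_features.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Running the same two calls on `training_set_features` should show which columns may need imputation or dropping before a model is fitted.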


In [17]:
training_set_labels.sample(4)

,respondent_id,h1n1_vaccine,seasonal_vaccine
25535,25535,0,1
19499,19499,0,1
1479,1479,0,0
19273,19273,0,0
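
Since there are two binary targets, it may help to look at how often each one is positive and how the two co-occur. The sketch below does this on the four sampled rows shown above, retyped by hand as `toy_labels` (a hypothetical name); four rows are far too few to say anything about the real class balance, so the same calls would need to be run on `training_set_labels`.

```python
import pandas as pd

# The four sampled label rows from the output above, retyped for illustration.
toy_labels = pd.DataFrame({
    'respondent_id': [25535, 19499, 1479, 19273],
    'h1n1_vaccine': [0, 0, 0, 0],
    'seasonal_vaccine': [1, 1, 0, 0],
})

# Share of positive responses per target (the mean of a 0/1 column).
positive_rate = toy_labels[['h1n1_vaccine', 'seasonal_vaccine']].mean()

# Joint counts of the two targets: rows are h1n1_vaccine, columns are seasonal_vaccine.
joint = pd.crosstab(toy_labels['h1n1_vaccine'], toy_labels['seasonal_vaccine'])

print(positive_rate)
print(joint)
```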


Merge the training set features with labels

In [18]:
train_df = training_set_features.merge(training_set_labels, on=['respondent_id'], how='left')
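
A left merge silently keeps feature rows that have no matching label and duplicates rows if a key repeats. One possible safeguard, sketched here on small hypothetical frames (`toy_features` and `toy_labels` are not part of the notebook), is to pass `validate` and `indicator` to `DataFrame.merge`.

```python
import pandas as pd

# Hypothetical frames; respondent 2 deliberately has no label row.
toy_features = pd.DataFrame({
    'respondent_id': [0, 1, 2],
    'h1n1_concern': [2.0, 1.0, 0.0],
})
toy_labels = pd.DataFrame({
    'respondent_id': [0, 1],
    'h1n1_vaccine': [0, 1],
    'seasonal_vaccine': [1, 0],
})

# validate='one_to_one' raises MergeError if respondent_id repeats on either side;
# indicator=True adds a '_merge' column recording where each row came from.
toy_merged = toy_features.merge(
    toy_labels,
    on='respondent_id',
    how='left',
    validate='one_to_one',
    indicator=True,
)

# Feature rows that found no label.
unmatched = toy_merged[toy_merged['_merge'] == 'left_only']
print(unmatched)
```

If `unmatched` is empty for the real data, the `_merge` column can be dropped and the merged frame used as is.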