## Support Vector Machines

### Table of Contents <a name="toc"></a>


- [Packages and Notebook Properties](#notebook)
- [Theory Discussion](#theory)
- [Dataset Exploration](#dataset)
    - [Background](#databackground)
    - [Exploratory Data Analysis](#eda)
- [Implementation](#implementation)

### Packages and Notebook Properties <a name="notebook"></a>

In [21]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
import sys
import os

Set notebook properties

In [22]:
warnings.filterwarnings('ignore')
pd.options.display.max_columns = None

Set data path

In [23]:
DATA_PATH = r'../data_source'

### Theory Discussion <a name="theory"></a>

### Dataset Exploration <a name="dataset"></a>

#### Background <a name="databackground"></a>

In this exercise, we look at vaccination, a key public health measure used to fight infectious diseases. Vaccines immunize individuals, and when enough people in a community are immunized, the spread of disease is reduced further through "herd immunity".

A phone survey asked respondents whether they had received the H1N1 and seasonal flu vaccines, along with questions about themselves. These additional questions covered their social, economic, and demographic background, their opinions on the risks of illness and on vaccine effectiveness, and their behaviors for mitigating transmission. A better understanding of how these characteristics are associated with personal vaccination patterns can provide guidance for future public health efforts.

The goal is to predict how likely individuals are to receive their H1N1 and seasonal flu vaccines. Specifically, we will be predicting two probabilities: one for h1n1_vaccine and one for seasonal_vaccine. Each row in the dataset represents one person who responded to the National 2009 H1N1 Flu Survey.

The dataset is taken from the competition page on [DrivenData](https://www.drivendata.org/competitions/66/flu-shot-learning/page/210/).

#### Exploratory Data Analysis <a name="eda"></a>

[back to top](#toc)

In [24]:
training_set_features = pd.read_csv(os.path.join(DATA_PATH, 'training_set_features.csv'))
training_set_labels = pd.read_csv(os.path.join(DATA_PATH, 'training_set_labels.csv'))
test_set_features = pd.read_csv(os.path.join(DATA_PATH, 'test_set_features.csv'))

In [25]:
training_set_features.sample(4)

Unnamed: 0,respondent_id,h1n1_concern,h1n1_knowledge,behavioral_antiviral_meds,behavioral_avoidance,behavioral_face_mask,behavioral_wash_hands,behavioral_large_gatherings,behavioral_outside_home,behavioral_touch_face,doctor_recc_h1n1,doctor_recc_seasonal,chronic_med_condition,child_under_6_months,health_worker,health_insurance,opinion_h1n1_vacc_effective,opinion_h1n1_risk,opinion_h1n1_sick_from_vacc,opinion_seas_vacc_effective,opinion_seas_risk,opinion_seas_sick_from_vacc,age_group,education,race,sex,income_poverty,marital_status,rent_or_own,employment_status,hhs_geo_region,census_msa,household_adults,household_children,employment_industry,employment_occupation
18824,18824,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,,,,,,,2.0,,3.0,,65+ Years,,White,Male,,,,,mlyzmhmf,Non-MSA,0.0,0.0,,
13098,13098,2.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,2.0,2.0,4.0,2.0,2.0,55 - 64 Years,Some College,White,Male,"<= $75,000, Above Poverty",Married,Rent,Unemployed,fpwskwrf,"MSA, Not Principle City",1.0,0.0,,
18695,18695,1.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,4.0,1.0,1.0,4.0,2.0,1.0,35 - 44 Years,College Graduate,White,Male,"> $75,000",Married,Own,Employed,lrircsnp,"MSA, Not Principle City",1.0,3.0,atmlpfrs,mxkfnird
19031,19031,3.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,5.0,1.0,5.0,5.0,2.0,2.0,18 - 34 Years,12 Years,White,Female,"<= $75,000, Above Poverty",Married,Rent,Not in Labor Force,mlyzmhmf,"MSA, Principle City",1.0,1.0,,
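The sampled rows above contain several empty cells, so it may be worth quantifying missingness per column before modeling. The following is a minimal sketch on a small hypothetical frame (`toy_features` and its values are illustrative, not taken from the survey); the same expression could be applied to `training_set_features`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for training_set_features; values are illustrative only.
toy_features = pd.DataFrame({
    "respondent_id": [0, 1, 2, 3],
    "h1n1_concern": [1.0, 2.0, np.nan, 3.0],
    "health_insurance": [np.nan, 0.0, np.nan, 1.0],
    "employment_industry": [None, None, "atmlpfrs", None],
})

# Fraction of missing values in each column, largest first.
missing_share = toy_features.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Columns with a high missing share would need an explicit decision (imputation, a separate "missing" category, or removal) before fitting a model that cannot handle `NaN` values.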


In [26]:
training_set_labels.sample(4)

Unnamed: 0,respondent_id,h1n1_vaccine,seasonal_vaccine
13378,13378,0,1
4977,4977,0,0
2923,2923,0,1
126,126,0,0
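Because there are two binary targets, a quick look at how often each one is positive, and how the two co-occur, can help when choosing evaluation metrics later. This is a sketch on a hypothetical frame (`toy_labels` is made up for illustration); on the real data the same calls could be run on `training_set_labels`.

```python
import pandas as pd

# Hypothetical stand-in for training_set_labels; values are illustrative only.
toy_labels = pd.DataFrame({
    "respondent_id": [0, 1, 2, 3],
    "h1n1_vaccine": [0, 0, 0, 1],
    "seasonal_vaccine": [1, 0, 1, 0],
})

target_cols = ["h1n1_vaccine", "seasonal_vaccine"]

# The mean of a 0/1 column is the share of positive cases.
positive_rate = toy_labels[target_cols].mean()

# Joint counts of the two targets (rows: h1n1_vaccine, columns: seasonal_vaccine).
joint_counts = pd.crosstab(toy_labels["h1n1_vaccine"], toy_labels["seasonal_vaccine"])

print(positive_rate)
print(joint_counts)
```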


Merge the training set features with labels

In [18]:
train_df = training_set_features.merge(training_set_labels, on=['respondent_id'], how='left')
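A left merge silently keeps feature rows that have no matching label and duplicates rows if a key repeats. One possible safeguard, sketched below on hypothetical toy frames, is to pass `validate` and `indicator` to `merge`; the frame names and values here are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the feature and label frames.
toy_features = pd.DataFrame({
    "respondent_id": [0, 1, 2],
    "h1n1_concern": [1.0, 2.0, 0.0],
})
toy_labels = pd.DataFrame({
    "respondent_id": [0, 1, 2],
    "h1n1_vaccine": [0, 0, 1],
    "seasonal_vaccine": [1, 0, 1],
})

# validate="one_to_one" raises MergeError if respondent_id repeats on either side;
# indicator=True adds a "_merge" column recording where each row was found.
toy_train = toy_features.merge(
    toy_labels,
    on="respondent_id",
    how="left",
    validate="one_to_one",
    indicator=True,
)

# Count feature rows that did not find a label, then drop the helper column.
unmatched = int((toy_train["_merge"] != "both").sum())
toy_train = toy_train.drop(columns="_merge")
print(unmatched, toy_train.shape)
```

If `unmatched` is greater than zero on the real data, those rows would carry `NaN` labels and should be inspected before training.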