# Model Comparison

I created two classes: one for the dataset and one for the model.
These are the steps to successfully run training, testing and prediction:

1. Load the datasets
2. Apply transformations and feature engineering to the dataset (optional)
    1. Choose the variables to be used for training the model (optional)
3. Load a model from SKLearn
4. Run the model: train, validate and save the predictions

Below is an example using the model I had to test, a Support Vector Machine (SVM).

The preprocessed dataset contains the following columns:
 
1. `'Family_Case_ID'`
2. `'Severity'`
3. `'Birthday_year'`
4. `'Parents or siblings infected'`
5. `'Wife/Husband or children infected'`
6. `'Medical_Expenses_Family'`
7. `'Medical_Tent_A'`
8. `'Medical_Tent_B'`
9. `'Medical_Tent_C'`
10. `'Medical_Tent_D'`
11. `'Medical_Tent_E'`
12. `'Medical_Tent_F'`
13. `'Medical_Tent_G'`
14. `'Medical_Tent_T'`
15. `'Medical_Tent_n/a'`
16. `'City_Albuquerque'`
17. `'City_Santa Fe'`
18. `'City_Taos'`
19. `'Gender_M'`
20. `'family_size'`
21. `'Sev_by_city'`: Average severity in the city of the patient.
22. `'Sev_by_tent'`: Average severity in the medical tent of the patient.
23. `'Sev_by_gender'`: Average severity within the gender of the patient.
24. `'Sev_family'`: Average severity in the family of the patient.
25. `'spending_vs_severity'`: `Medical_Expenses_Family` / patient's `Severity`
26. `'spending_family_member'`: `Medical_Expenses_Family` / number of cases in the family
27. `'severity_against_avg_city'`: patient's `Severity` / `Sev_by_city`
28. `'severity_against_avg_tent'`: patient's `Severity` / `Sev_by_tent`
29. `'severity_against_avg_gender'`: patient's `Severity` / `Sev_by_gender`
30. `'spending_family_severity'`: patient's `Severity` / `Sev_family`
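The group-average and ratio features above can be built with pandas `groupby(...).transform(...)`, which broadcasts each group's statistic back to every row of the group. The sketch below is a minimal illustration, not the code used by the `Dataset` class: it assumes a hypothetical raw `City` column (before one-hot encoding into `City_*`), uses made-up values, and treats the number of rows sharing a `Family_Case_ID` as the "number of cases in the family". Because each ratio is taken against its own group mean, `severity_against_avg_city` averages exactly 1 over the rows it was computed on, which is consistent with the mean of 1.0 that `describe()` reports for that column.

```python
import pandas as pd

# Hypothetical raw data: `City` stands in for the column that was later
# one-hot encoded into City_*; all values are made up for illustration.
raw = pd.DataFrame({
    "Family_Case_ID": [10, 10, 20, 30, 30, 30],
    "City": ["Santa Fe", "Santa Fe", "Taos", "Albuquerque", "Santa Fe", "Taos"],
    "Severity": [3, 1, 2, 3, 3, 1],
    "Medical_Expenses_Family": [600, 600, 300, 1200, 1200, 1200],
})

# Group averages, broadcast back to every row of the group
raw["Sev_by_city"] = raw.groupby("City")["Severity"].transform("mean")
raw["Sev_family"] = raw.groupby("Family_Case_ID")["Severity"].transform("mean")

# Assumption: number of cases = number of rows sharing the same Family_Case_ID
cases_in_family = raw.groupby("Family_Case_ID")["Severity"].transform("count")

# Ratio features
raw["spending_vs_severity"] = raw["Medical_Expenses_Family"] / raw["Severity"]
raw["spending_family_member"] = raw["Medical_Expenses_Family"] / cases_in_family
raw["severity_against_avg_city"] = raw["Severity"] / raw["Sev_by_city"]

print(raw[["City", "Sev_by_city", "severity_against_avg_city"]])
print(raw["severity_against_avg_city"].mean())  # ≈ 1.0 by construction
```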

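```python
from dataset import Dataset  # Project-local class that loads the preprocessed data
from model import Model      # Project-local wrapper around an SKLearn classifier
```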

## First model - Support Vector Machine - Alejandro

### Step 1: Load datasets

```python
dataset = Dataset()             # Loads the preprocessed dataset
train_set = dataset.train_data  # Training set without labels (train.csv)
target = dataset.target         # Labels for the training set (train.csv[Deceased])
test_set = dataset.test_data    # Unlabeled test set (test.csv)

train_set.describe()
```

| | Family_Case_ID | Severity | Birthday_year | Parents or siblings infected | Wife/Husband or children infected | Medical_Expenses_Family | Sev_by_city | Sev_by_tent | Sev_by_gender | Sev_family | ... | City_Santa Fe | City_Taos | Gender_M | family_size | spending_vs_severity | spending_family_member | severity_against_avg_city | severity_against_avg_tent | severity_against_avg_gender | spending_family_severity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 898.0 | 898.0 | 898.0 | 898.0 | 898.0 | 898.0 | 898.0 | 898.0 | 898.0 | 898.0 | ... | 898.0 | 898.0 | 898.0 | 898.0 | 898.0 | 898.0 | 898.0 | 898.0 | 898.0 | 898.0 |
| mean | 14286.119154 | 2.316258 | 1597.824053 | 0.380846 | 0.522272 | 892.749443 | 2.316258 | 2.316258 | 2.313653 | 2.316258 | ... | 0.722717 | 0.089087 | 0.648107 | 1.826281 | 692.063103 | 550.403471 | 1.0 | 1.0 | 1.001201 | 430.066268 |
| std | 25443.036379 | 0.832842 | 792.720095 | 0.803941 | 1.099333 | 1385.91799 | 0.25518 | 0.615844 | 0.10605 | 0.825019 | ... | 0.447907 | 0.285028 | 0.477827 | 1.369723 | 1428.606552 | 997.077121 | 0.359305 | 0.240359 | 0.35925 | 1021.963995 |
| min | 345.0 | 1.0 | -1.0 | 0.0 | 0.0 | 0.0 | 1.893491 | 1.0 | 2.169811 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.344828 | 0.381107 | 0.418103 | 0.0 |
| 25% | 8195.0 | 2.0 | 1966.0 | 0.0 | 0.0 | 221.0 | 2.354391 | 2.623932 | 2.169811 | 2.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 73.666667 | 203.0 | 0.849476 | 0.762215 | 0.836207 | 70.0 |
| 50% | 13587.5 | 3.0 | 1988.0 | 0.0 | 0.0 | 405.0 | 2.354391 | 2.623932 | 2.391753 | 3.0 | ... | 1.0 | 0.0 | 1.0 | 1.0 | 173.0 | 228.0 | 1.034483 | 1.143322 | 1.25431 | 81.0 |
| 75% | 18891.0 | 3.0 | 1998.0 | 0.0 | 1.0 | 857.75 | 2.354391 | 2.623932 | 2.391753 | 3.0 | ... | 1.0 | 0.0 | 1.0 | 2.0 | 573.0 | 553.75 | 1.274215 | 1.143322 | 1.25431 | 343.0 |
| max | 742836.0 | 3.0 | 2019.0 | 6.0 | 8.0 | 14345.0 | 2.9 | 3.0 | 2.391753 | 3.0 | ... | 1.0 | 1.0 | 1.0 | 7.0 | 14345.0 | 14345.0 | 1.584375 | 2.898305 | 1.382609 | 14345.0 |


### Step 2: Apply transformations and select variables

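```python
from sklearn.preprocessing import MinMaxScaler

selected_variables_SVC = [
    'Severity',
    'Gender_M',
    'City_Albuquerque',
    'City_Santa Fe',
    'severity_against_avg_gender',
    'Medical_Tent_n/a',
    'spending_family_member',
    'family_size',
    'Sev_family',
]

# Fit the scaler on the training set only, then apply the same
# transformation to both sets so no test-set statistics leak into training
scaler = MinMaxScaler().fit(train_set[selected_variables_SVC])
train_set[selected_variables_SVC] = scaler.transform(train_set[selected_variables_SVC])
test_set[selected_variables_SVC] = scaler.transform(test_set[selected_variables_SVC])
```

In the run shown here, this cell raised `NameError: name 'MinMaxScaler' is not defined` because only `RobustScaler` had been imported. No scaling was applied, so the Step 4 results below were obtained on unscaled features, as the raw values in `model.train_data` confirm.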
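`MinMaxScaler` learns each column's minimum and maximum during `fit` and maps values with (x - min) / (max - min). When it is fitted on the training set only and reused for the test set, test values outside the training range land outside [0, 1]. The alternative `RobustScaler` centers on the median and divides by the interquartile range, which makes it less sensitive to heavy tails such as `Medical_Expenses_Family` (75th percentile 857.75, maximum 14345 in the `describe()` output). The toy arrays below are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler

# Made-up training and test rows with two numeric features
X_train = np.array([[1.0, 100.0],
                    [2.0, 300.0],
                    [3.0, 500.0]])
X_test = np.array([[2.0, 700.0]])  # second value exceeds the training maximum

# Learn min/max from the training rows only, then reuse them for the test rows
minmax = MinMaxScaler().fit(X_train)
train_minmax = minmax.transform(X_train)  # each column mapped to [0, 1]
test_minmax = minmax.transform(X_test)    # (700 - 100) / (500 - 100) = 1.5

# RobustScaler: (x - median) / IQR, also learned from the training rows only
robust = RobustScaler().fit(X_train)
test_robust = robust.transform(X_test)    # (700 - 300) / (400 - 200) = 2.0

print(train_minmax)
print(test_minmax, test_robust)
```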

### Step 3: Load model from SKLearn

```python
from sklearn import svm

# Create a Nu-Support Vector Classifier with default hyperparameters (nu=0.5, RBF kernel)
svm_model = svm.NuSVC()
```
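Unlike `SVC`, which is regularized through `C`, `NuSVC` uses `nu` in (0, 1], an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors. The sketch below checks the second property on synthetic data from `make_classification` (a stand-in for the real dataset, with nine features to mirror `selected_variables_SVC`). It also shows a practical pitfall: a large `nu` is infeasible when one class is much smaller than the other, in which case scikit-learn raises a `ValueError` at fit time.

```python
from sklearn.datasets import make_classification
from sklearn.svm import NuSVC

# Synthetic, roughly balanced binary problem with nine features
X, y = make_classification(n_samples=300, n_features=9, random_state=0)

sv_fraction = {}
for nu in (0.1, 0.3, 0.5):
    clf = NuSVC(nu=nu).fit(X, y)
    # support_ holds the indices of the support vectors
    sv_fraction[nu] = len(clf.support_) / len(X)
    print(f"nu={nu}: fraction of support vectors = {sv_fraction[nu]:.2f}")

# With roughly 10% positives, nu=0.5 cannot be satisfied
X_imb, y_imb = make_classification(n_samples=300, n_features=9,
                                   weights=[0.9], random_state=0)
try:
    NuSVC(nu=0.5).fit(X_imb, y_imb)
    infeasible = False
except ValueError as err:
    infeasible = True
    print("Fit failed:", err)
```

If the real labels turn out to be strongly imbalanced, lowering `nu` is the usual way around that error.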

### Step 4: Run model

```python
model = Model(
    model=svm_model,                   # Initialized SKLearn classifier
    variables=selected_variables_SVC,  # Subset of columns used for training;
                                       # if variables=None, all columns are used
    train_set=train_set,               # Samples X for training and validation
    target=target,                     # Labels y for training and validation
    test_set=test_set,                 # Unlabeled samples used for the prediction
)

model.run_model(path="results/svc_results.csv")
model.train_data
```

```text
Model - NuSVC(break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
      decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
      max_iter=-1, nu=0.5, probability=False, random_state=None, shrinking=True,
      tol=0.001, verbose=False)
Average model accuracy: 59.37%
Highest model accuracy: 67.60%
Solution set saved as 'results/svc_results.csv'.
```


| Patient_ID | Severity | Gender_M | City_Albuquerque | City_Santa Fe | severity_against_avg_gender | Medical_Tent_n/a | spending_family_member | family_size | Sev_family |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 0 | 0 | 1 | 1.382609 | 1 | 225.000000 | 1 | 3.0 |
| 2 | 1 | 0 | 1 | 0 | 0.460870 | 1 | 831.500000 | 1 | 1.0 |
| 3 | 3 | 1 | 0 | 1 | 1.254310 | 1 | 221.000000 | 1 | 3.0 |
| 4 | 3 | 1 | 0 | 1 | 1.254310 | 1 | 220.000000 | 1 | 3.0 |
| 5 | 3 | 0 | 0 | 1 | 1.382609 | 1 | 222.000000 | 1 | 3.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 896 | 3 | 0 | 0 | 1 | 1.382609 | 1 | 114.666667 | 2 | 3.0 |
| 897 | 3 | 1 | 0 | 1 | 1.254310 | 1 | 258.000000 | 1 | 3.0 |
| 898 | 3 | 0 | 0 | 0 | 1.382609 | 1 | 214.000000 | 1 | 3.0 |
| 899 | 2 | 1 | 0 | 1 | 0.836207 | 1 | 270.666667 | 3 | 2.0 |
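The `Model` class itself is not shown here, so how `run_model` computes its "average" and "highest" accuracy is not documented. Assuming it scores the classifier over several validation folds, a comparable figure can be obtained with scikit-learn's `cross_val_score`. The sketch below uses synthetic data, a hypothetical 5-fold stratified split, and a `Pipeline` that refits `MinMaxScaler` inside each training fold so the validation fold never influences the scaling. The output columns `Patient_ID` and `Deceased` follow the index and label names used above, but the exact file layout that `run_model` writes is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import NuSVC

# Synthetic stand-ins for train_set/target and test_set (nine features each)
X, y = make_classification(n_samples=500, n_features=9, random_state=42)
X_train, y_train, X_test = X[:400], y[:400], X[400:]

# Scaling + classifier, so each CV fold fits its own scaler
pipe = make_pipeline(MinMaxScaler(), NuSVC())

# Hypothetical choice: 5 stratified, shuffled folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X_train, y_train, cv=cv, scoring="accuracy")
print(f"Average model accuracy: {scores.mean():.2%}")
print(f"Highest model accuracy: {scores.max():.2%}")

# Refit on all labeled rows, then predict the unlabeled rows
pipe.fit(X_train, y_train)
solution = pd.DataFrame({
    "Patient_ID": np.arange(1, len(X_test) + 1),  # hypothetical IDs
    "Deceased": pipe.predict(X_test),
})
# Returns the CSV as a string; pass a path to write a file instead
csv_text = solution.to_csv(index=False)
```

The highest single-fold score is mostly useful as a sign of variance between folds; the average is the figure to compare across models.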
