<p align="center"><img width="50%" src="https://aimodelsharecontent.s3.amazonaws.com/aimodshare_banner.jpg" /></p>


---

## Model Submission Guide: Titanic Survival Competition
Let's submit our models to a shared leaderboard so that we can collaborate and learn from each other's model experiments.

**Instructions:**
1. Load the data and set up X_train / X_test / y_train
2. Preprocess the data using sklearn's ColumnTransformer, then write and save a preprocessor function
3. Fit a model on the preprocessed data and save the preprocessor function and model
4. Generate predictions from the X_test data and submit the model to the competition
5. Repeat the submission process to improve your place on the leaderboard



## 1. Get data in and set up X_train, X_test, y_train objects

In [None]:
#install aimodelshare library
! pip install aimodelshare --upgrade

In [None]:
# Get competition data
# Separate data into X_train, y_train, and X_test
import pandas as pd
training_data=pd.read_csv("training_data.csv")
y_train_labels = training_data['survived']
X_train = training_data.drop(['survived'], axis=1)

X_test=pd.read_csv("test_data.csv")

X_train.head()

Unnamed: 0,pclass,sex,age,fare,embarked
0,3,male,,15.5,Q
1,2,female,18.0,13.0,S
2,2,male,36.0,27.75,S
3,2,female,22.0,41.5792,C
4,2,female,24.0,27.7208,C
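
The first row above has no `age`, which is why the next step imputes missing values. As a quick sketch, we can count missing values per column; the frame below is just the five displayed rows re-typed by hand (the name `sample` is ours), so in the notebook you would run the same call on `X_train` instead.

```python
import numpy as np
import pandas as pd

# The five rows displayed above, re-typed so the snippet runs on its own.
sample = pd.DataFrame({
    "pclass": [3, 2, 2, 2, 2],
    "sex": ["male", "female", "male", "female", "female"],
    "age": [np.nan, 18.0, 36.0, 22.0, 24.0],
    "fare": [15.5, 13.0, 27.75, 41.5792, 27.7208],
    "embarked": ["Q", "S", "S", "C", "C"],
})

# Count missing values in each column to see which ones need imputation.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Column dtypes help decide which features are numeric and which are categorical.
print(sample.dtypes)
```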


## 2. Preprocess data using Sklearn Column Transformer / Write and Save Preprocessor function


In [None]:
# In this case we use Sklearn's Column transformer in our preprocessor function

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Create separate preprocessing pipelines for the numeric and the categorical columns.
numeric_features = ['age', 'fare']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')), #'imputer' names the step
    ('scaler', StandardScaler())])

categorical_features = ['embarked', 'sex', 'pclass']

# Replace missing values with the most frequent value (the mode), then one-hot encode.
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

# Final preprocessor object set up with ColumnTransformer...

preprocess = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

# fit preprocessor to your data
preprocess = preprocess.fit(X_train)

In [None]:
# Here is where we actually write the preprocessor function.
# It applies the fitted ColumnTransformer (preprocess) to any new data we pass in.

def preprocessor(data):
    preprocessed_data=preprocess.transform(data)
    return preprocessed_data
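
Because the preprocessor is fitted on `X_train` only, test data may contain a category it never saw. The `handle_unknown='ignore'` setting used above covers that case: an unseen category is encoded as all zeros instead of raising an error. Here is a small standalone sketch on made-up port codes (the value `"X"` is invented for illustration):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Fit a one-hot encoder on three known categories.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(np.array([["S"], ["C"], ["Q"]], dtype=object))

# A category seen during fitting gets exactly one 1 in its row.
seen = encoder.transform(np.array([["S"]], dtype=object)).toarray()

# A category never seen during fitting becomes a row of zeros.
unseen = encoder.transform(np.array([["X"]], dtype=object)).toarray()

print(encoder.categories_)
print(seen)
print(unseen)
```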

In [None]:
# check shape of X data after preprocessing it using our new function
preprocessor(X_train).shape

(1047, 10)
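
Where do the 10 columns come from? The two numeric features stay as one column each, and each categorical feature expands into one column per distinct value. The sketch below rebuilds the same kind of transformer on a tiny hand-made frame (`toy` and `toy_preprocess` are our own names) and checks that arithmetic. That the real data has 3 + 2 + 3 categories is our assumption, although it is consistent with the shape above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up frame with the same columns as the competition data.
toy = pd.DataFrame({
    "pclass": [3, 2, 1, 2],
    "sex": ["male", "female", "male", "female"],
    "age": [np.nan, 18.0, 36.0, 22.0],
    "fare": [15.5, 13.0, 27.75, 41.5792],
    "embarked": ["Q", "S", "S", "C"],
})

numeric = Pipeline([("imputer", SimpleImputer(strategy="median")),
                    ("scaler", StandardScaler())])
categorical = Pipeline([("imputer", SimpleImputer(strategy="most_frequent")),
                        ("onehot", OneHotEncoder(handle_unknown="ignore"))])

toy_preprocess = ColumnTransformer([
    ("num", numeric, ["age", "fare"]),
    ("cat", categorical, ["embarked", "sex", "pclass"]),
])
toy_matrix = toy_preprocess.fit_transform(toy)

# 2 numeric columns + one column per distinct category value.
expected_width = 2 + sum(toy[c].nunique() for c in ["embarked", "sex", "pclass"])
print(toy_matrix.shape, expected_width)
```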

## 3. Fit model on preprocessed data and save preprocessor function and model


In [None]:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(C=10, penalty='l1', solver = 'liblinear')
model.fit(preprocessor(X_train), y_train_labels) # Fit the model to the training set.
model.score(preprocessor(X_train), y_train_labels) # Accuracy on the training set (0-1 scale).

0.6991404011461319
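
The score above is measured on the same rows the model was trained on, so it can be optimistic about leaderboard performance. One common option is to hold out part of the training data for validation. The sketch below shows the idea on synthetic data from `make_classification`, not the Titanic data; all variable names are ours.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for preprocessed features and labels.
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out 20% of the rows; stratify keeps the class balance similar in both parts.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0)

demo_model = LogisticRegression(C=10, penalty="l1", solver="liblinear")
demo_model.fit(X_fit, y_fit)

train_acc = demo_model.score(X_fit, y_fit)  # accuracy on rows used for fitting
val_acc = demo_model.score(X_val, y_val)    # accuracy on held-out rows
print(f"train accuracy: {train_acc:.3f}, validation accuracy: {val_acc:.3f}")
```

In the notebook you would split `X_train` and `y_train_labels` the same way before calling `preprocessor`.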

#### Save preprocessor function to local "preprocessor.zip" file

In [None]:
import aimodelshare as ai
ai.export_preprocessor(preprocessor,"")

Your preprocessor is now saved to 'preprocessor.zip'


## 4. Generate predictions from X_test data and submit model to competition


In [None]:
#Set credentials using your Modelshare.ai username/password

from aimodelshare.aws import set_credentials

# This is the unique REST API URL that powers this Titanic Survival Playground.
# Make sure to update apiurl for new competition deployments.
apiurl='https://y32zyyf6t5.execute-api.us-east-2.amazonaws.com/prod/m'

set_credentials(apiurl=apiurl)

Modelshare.ai Username:··········
Modelshare.ai Password:··········
Modelshare.ai login credentials set successfully.


In [None]:
#Instantiate Competition
import aimodelshare as ai
mycompetition= ai.ModelPlayground(playground_url=apiurl)

In [None]:
#Submit Model 1:

#-- Generate predicted values (a list of predicted labels "survived" or "died") (Model 1)
prediction_labels = model.predict(preprocessor(X_test))

# Submit Model 1 to Competition Leaderboard
mycompetition.submit_model(model=model,
                           preprocessor="preprocessor.zip",
                           prediction_submission=prediction_labels)

This ORT build has ['AzureExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['AzureExecutionProvider', 'CPUExecutionProvider'], ...)

Insert search tags to help users find your model (optional): 
Provide any useful notes about your model (optional): 
Your model has been submitted to experiment as model version 5.

Visit your Model Playground Page for more.
https://www.modelshare.ai/detail/model:3860
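
Before submitting, it can help to sanity-check the predictions locally. The helper below is a hypothetical sketch of our own (`check_predictions` is not part of aimodelshare): it checks that there is one prediction per test row and that every label is one we expect. The label names follow the "survived" / "died" wording in the cell above, which we assume matches the competition's labels.

```python
import numpy as np

def check_predictions(predictions, n_test_rows, allowed_labels):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    predictions = np.asarray(predictions)
    if predictions.ndim != 1:
        problems.append(f"expected a 1-D array, got {predictions.ndim} dimensions")
    if len(predictions) != n_test_rows:
        problems.append(f"expected {n_test_rows} predictions, got {len(predictions)}")
    # Flatten before building the set so the check also works on 2-D input.
    unexpected = set(predictions.ravel().tolist()) - set(allowed_labels)
    if unexpected:
        problems.append(f"unexpected labels: {sorted(unexpected)}")
    return problems

# Made-up predictions for three test rows.
good = check_predictions(["survived", "died", "died"], 3, {"survived", "died"})
bad = check_predictions(["survived", "maybe"], 3, {"survived", "died"})
print(good)
print(bad)
```

In the notebook the call would look like `check_predictions(prediction_labels, len(X_test), set(y_train_labels))`.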


In [None]:
# Get leaderboard to explore current best model architectures

# Get raw data in pandas data frame
data = mycompetition.get_leaderboard()

# Stylize leaderboard data
mycompetition.stylize_leaderboard(data)

Unnamed: 0,accuracy,f1_score,precision,recall,ml_framework,model_type,num_params,optimizer,username,version
0,82.82%,81.29%,81.70%,80.96%,sklearn,LogisticRegression,10,liblinear,mikedparrott,1
1,82.82%,81.29%,81.70%,80.96%,sklearn,LogisticRegression,10,liblinear,newusertest,3
2,82.82%,81.29%,81.70%,80.96%,sklearn,LogisticRegression,10,liblinear,newusertest,4
3,76.72%,69.27%,85.32%,68.45%,sklearn,LogisticRegression,10,liblinear,newusertest,5
4,50.38%,45.56%,45.51%,45.69%,sklearn,LogisticRegression,10,liblinear,newusertest,2
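
Since the leaderboard comes back as a pandas data frame, we can also explore it with ordinary pandas code. The sketch below re-types a few columns of the table above and assumes the metrics arrive as percent strings, as displayed; the raw frame from `get_leaderboard()` may already hold numbers, in which case the parsing step is unnecessary.

```python
import pandas as pd

# A few columns of the leaderboard shown above, re-typed by hand.
leaderboard = pd.DataFrame({
    "accuracy": ["82.82%", "82.82%", "82.82%", "76.72%", "50.38%"],
    "f1_score": ["81.29%", "81.29%", "81.29%", "69.27%", "45.56%"],
    "username": ["mikedparrott", "newusertest", "newusertest",
                 "newusertest", "newusertest"],
    "version": [1, 3, 4, 5, 2],
})

# Turn "82.82%" style strings into floats so they can be sorted and aggregated.
for column in ["accuracy", "f1_score"]:
    leaderboard[column] = leaderboard[column].str.rstrip("%").astype(float)

# Best accuracy achieved by each user.
best_per_user = leaderboard.groupby("username")["accuracy"].max()
print(best_per_user)
```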


## 5. Repeat submission process to improve place on leaderboard


In [None]:
# Train and submit model 2 using same preprocessor (note that you could save a new preprocessor, but we will use the same one for this example).
from sklearn.linear_model import LogisticRegression

model_2 = LogisticRegression(C=.01, penalty='l2')
model_2.fit(preprocessor(X_train), y_train_labels) # Fit the model to the training set.
model_2.score(preprocessor(X_train), y_train_labels) # Accuracy on the training set (0-1 scale).

0.7583572110792741
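
Model 2 changes `C`, which in sklearn's `LogisticRegression` is the inverse of the regularization strength: a smaller `C` means a stronger penalty and smaller coefficients. The sketch below illustrates this on synthetic data (not the Titanic data) using the default L2 penalty; the variable names are ours.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for preprocessed features and labels.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

coef_norms = {}
for C in [10, 1, 0.01]:
    # Default penalty is L2; only the strength (via C) changes between fits.
    clf = LogisticRegression(C=C, max_iter=1000).fit(X_demo, y_demo)
    coef_norms[C] = float(np.linalg.norm(clf.coef_))

# Smaller C should give a smaller coefficient norm.
for C, norm in coef_norms.items():
    print(f"C={C}: coefficient norm = {norm:.3f}")
```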

In [None]:
#Submit Model 2:

#-- Generate predicted y values (Model 2)
prediction_labels = model_2.predict(preprocessor(X_test))

# Submit Model 2 to Competition Leaderboard
mycompetition.submit_model(model=model_2,
                           preprocessor="preprocessor.zip",
                           prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): 
Provide any useful notes about your model (optional): 
Your model has been submitted to experiment as model version 6.

Visit your Model Playground Page for more.
https://www.modelshare.ai/detail/model:3860


In [None]:
# Compare two or more models
data=mycompetition.compare_models([1,5], verbose=1)
mycompetition.stylize_compare(data)

Unnamed: 0,param_name,default_value,model_version_1,model_version_5
0,C,1.000000,10,0.010000
1,class_weight,,,
2,dual,False,False,False
3,fit_intercept,True,True,True
4,intercept_scaling,1,1,1
5,l1_ratio,,,
6,max_iter,100,100,100
7,multi_class,auto,auto,auto
8,n_jobs,,,
9,penalty,l2,l1,l1
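
A similar comparison can be made locally for sklearn models with `get_params()`, which returns a model's hyperparameters as a dictionary. This is only a local sketch and does not use the playground; the two models mirror the settings of Model 1 and Model 2 from this notebook, and the variable names are ours.

```python
from sklearn.linear_model import LogisticRegression

# Same settings as Model 1 and Model 2 in this notebook.
model_a = LogisticRegression(C=10, penalty="l1", solver="liblinear")
model_b = LogisticRegression(C=0.01, penalty="l2")

params_a = model_a.get_params()
params_b = model_b.get_params()

# Keep only the hyperparameters whose values differ between the two models.
differences = {
    name: (params_a[name], params_b[name])
    for name in params_a
    if params_a[name] != params_b[name]
}
for name, (value_a, value_b) in sorted(differences.items()):
    print(f"{name}: {value_a!r} vs {value_b!r}")
```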







In [None]:
# Submit a third model using GridSearchCV

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
import numpy as np

# np.arange(start, stop, step): with step=500 this yields only [100], so a single
# n_estimators value is searched; use a smaller step to try more values.
param_grid = {'n_estimators': np.arange(100, 300, 500),'max_depth':[1, 3, 5]}

gridmodel = GridSearchCV(RandomForestClassifier(), param_grid=param_grid, cv=10)

# GridSearchCV wraps the estimator, so we fit, score, and predict with it like any other model:
gridmodel.fit(preprocessor(X_train), y_train_labels)

# Read the best score and parameters from the "best_score_" and "best_params_" attributes
print("best mean cross-validation score: {:.3f}".format(gridmodel.best_score_))
print("best parameters: {}".format(gridmodel.best_params_))


best mean cross-validation score: 0.796
best parameters: {'max_depth': 5, 'n_estimators': 100}
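
Note that the grid above only ever tries `n_estimators=100`. `np.arange(start, stop, step)` stops before `stop`, and a step of 500 overshoots 300 right away. A quick check, with a wider alternative whose values we picked only for illustration:

```python
import numpy as np

# The call used in the grid above: start=100, stop=300, step=500.
narrow = np.arange(100, 300, 500)
print(narrow)  # [100]

# An illustrative alternative that actually varies n_estimators.
wider = np.arange(100, 500, 100)
print(wider)  # [100 200 300 400]

# Number of parameter combinations GridSearchCV would try with three max_depth values.
print(len(narrow) * 3, len(wider) * 3)  # 3 12
```

With `cv=10`, each combination is fitted ten times, so a wider grid costs proportionally more time.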


In [None]:
#Submit Model 3:

#-- Generate predicted values
prediction_labels = gridmodel.predict(preprocessor(X_test))

# Submit to Competition Leaderboard
mycompetition.submit_model(model = gridmodel,
                                 preprocessor="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): 
Provide any useful notes about your model (optional): 
Your model has been submitted to experiment as model version 7.

Visit your Model Playground Page for more.
https://www.modelshare.ai/detail/model:3860


In [None]:
# Get leaderboard

data = mycompetition.get_leaderboard()
mycompetition.stylize_leaderboard(data)

Unnamed: 0,accuracy,f1_score,precision,recall,ml_framework,model_type,num_params,optimizer,username,version
0,85.11%,83.17%,85.67%,81.88%,sklearn,RandomForestClassifier,,,newusertest,7
1,82.82%,81.29%,81.70%,80.96%,sklearn,LogisticRegression,10.0,liblinear,mikedparrott,1
2,82.82%,81.29%,81.70%,80.96%,sklearn,LogisticRegression,10.0,liblinear,newusertest,3
3,82.82%,81.29%,81.70%,80.96%,sklearn,LogisticRegression,10.0,liblinear,newusertest,4
4,83.59%,81.10%,84.76%,79.58%,sklearn,LogisticRegression,10.0,lbfgs,newusertest,6
5,82.82%,81.11%,81.94%,80.52%,sklearn,GradientBoostingClassifier,,,newusertest,8
6,76.72%,69.27%,85.32%,68.45%,sklearn,LogisticRegression,10.0,liblinear,newusertest,5
7,50.38%,45.56%,45.51%,45.69%,sklearn,LogisticRegression,10.0,liblinear,newusertest,2


In [None]:
# Compare two or more models
data=mycompetition.compare_models([2,3], verbose=1)
mycompetition.stylize_compare(data)

Unnamed: 0,param_name,default_value,model_version_2,model_version_3
0,C,1.000000,10,0.010000
1,class_weight,,,
2,dual,False,False,False
3,fit_intercept,True,True,True
4,intercept_scaling,1,1,1
5,l1_ratio,,,
6,max_iter,100,100,100
7,multi_class,auto,auto,auto
8,n_jobs,,,
9,penalty,l2,l1,l2







In [None]:
# Here are several classic ML model types you can experiment with next:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import GradientBoostingClassifier

#Example code to fit model:
model = GradientBoostingClassifier(n_estimators=50, learning_rate=1.0,
    max_depth=1, random_state=0).fit(preprocessor(X_train), y_train_labels)
model.score(preprocessor(X_train), y_train_labels)

#-- Generate predicted values (a list of predicted labels "survived" or "died")
prediction_labels = model.predict(preprocessor(X_test))

# Submit model to Competition Leaderboard
mycompetition.submit_model(model=model,
                           preprocessor="preprocessor.zip",
                           prediction_submission=prediction_labels)


Insert search tags to help users find your model (optional): 
Provide any useful notes about your model (optional): 
Your model has been submitted to experiment as model version 8.

Visit your Model Playground Page for more.
https://www.modelshare.ai/detail/model:3860
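
To decide which of these model types is worth submitting next, one option is to compare them locally with cross-validation first. The sketch below does this on synthetic data from `make_classification`; the candidate settings and names are our own illustrative choices, not tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for preprocessed features and labels.
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

candidates = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "svc": SVC(),
    "tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate.
cv_scores = {
    name: cross_val_score(estimator, X_demo, y_demo, cv=5).mean()
    for name, estimator in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)

for name, score in cv_scores.items():
    print(f"{name}: {score:.3f}")
print("best candidate:", best_name)
```

In the notebook you would pass `preprocessor(X_train)` and `y_train_labels` instead of the synthetic arrays, then fit and submit the best candidate as in the cells above.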
