### Creating and deploying a machine learning model with scikit-learn and Google AI Platform

In this notebook we train a simple linear classifier on the [penguins](https://github.com/allisonhorst/palmerpenguins) dataset, predicting the species from four body measurements, and deploy it.

In [1]:
import seaborn as sns

# Load the penguins dataset and drop rows with missing values
df = sns.load_dataset('penguins')
df = df.dropna()

In [2]:
df

,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
0,Adelie,Torgersen,39.1,18.7,181.0,3750.0,Male
1,Adelie,Torgersen,39.5,17.4,186.0,3800.0,Female
2,Adelie,Torgersen,40.3,18.0,195.0,3250.0,Female
4,Adelie,Torgersen,36.7,19.3,193.0,3450.0,Female
5,Adelie,Torgersen,39.3,20.6,190.0,3650.0,Male
...,...,...,...,...,...,...,...
338,Gentoo,Biscoe,47.2,13.7,214.0,4925.0,Female
340,Gentoo,Biscoe,46.8,14.3,215.0,4850.0,Female
341,Gentoo,Biscoe,50.4,15.7,222.0,5750.0,Male
342,Gentoo,Biscoe,45.2,14.8,212.0,5200.0,Female


In [3]:
df['species'].unique()

array(['Adelie', 'Chinstrap', 'Gentoo'], dtype=object)

In [4]:
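# Features: the four numeric body measurements
x = df[['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']]
# Target: species encoded as an integer class label
y = df['species'].map({'Adelie': 0, 'Chinstrap': 1, 'Gentoo': 2})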

In [5]:
from sklearn.model_selection import train_test_split

# Hold out 10% of the rows for testing; the fixed seed makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(x.values, y.values, test_size=0.1, random_state=42)

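Let's train a logistic regression classifier on the training split. Setting `C=100` applies only weak regularization, and `max_iter=1000` gives the solver more room to converge on the unscaled features.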

In [6]:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000,C=100).fit(x_train,y_train)

In [7]:
print('Test accuracy: ',model.score(x_test,y_test))

Test accuracy:  0.9705882352941176
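Because the target was encoded as integers, the model predicts class codes rather than species names. The following is a minimal sketch of mapping those codes back to names by inverting the encoding used for `y`. The helper `decode_predictions` and the example codes are hypothetical and only illustrate the idea; in this notebook the input would be the output of `model.predict`.

```python
# Same encoding as the mapping used to build `y`
species_to_code = {'Adelie': 0, 'Chinstrap': 1, 'Gentoo': 2}

# Invert the mapping so integer codes can be turned back into names
code_to_species = {code: name for name, code in species_to_code.items()}


def decode_predictions(codes):
    """Translate integer class codes back to species names (hypothetical helper)."""
    return [code_to_species[int(code)] for code in codes]


# Hypothetical codes standing in for model.predict(x_test[:3])
example_codes = [0, 2, 1]
decoded = decode_predictions(example_codes)
print(decoded)  # ['Adelie', 'Gentoo', 'Chinstrap']
```

Keeping the encoding in a single dictionary and deriving the inverse from it avoids the two mappings drifting apart.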


The next cell saves the trained model to a joblib file, `model.joblib`. To deploy the model, you will need to upload this file to a Google Cloud Storage bucket.

In [8]:
import joblib
joblib.dump(model, 'model.joblib')

['model.joblib']
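Before uploading, it can be worth checking that the saved file reloads and predicts the same as the in-memory model. The sketch below does this on a small stand-in model trained on the Adelie and Gentoo rows displayed earlier, so that it runs on its own; in this notebook you would use `model` and `x_test` instead. It also defines `upload_model`, a hypothetical helper that assumes the `google-cloud-storage` client library is installed and credentials are configured. The bucket name and destination path are placeholders, not values from this notebook.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in data: rows shown in the table above (0 = Adelie, 2 = Gentoo)
features = np.array([
    [39.1, 18.7, 181.0, 3750.0],
    [39.5, 17.4, 186.0, 3800.0],
    [40.3, 18.0, 195.0, 3250.0],
    [36.7, 19.3, 193.0, 3450.0],
    [39.3, 20.6, 190.0, 3650.0],
    [47.2, 13.7, 214.0, 4925.0],
    [46.8, 14.3, 215.0, 4850.0],
    [50.4, 15.7, 222.0, 5750.0],
    [45.2, 14.8, 212.0, 5200.0],
])
labels = np.array([0, 0, 0, 0, 0, 2, 2, 2, 2])

stand_in_model = LogisticRegression(max_iter=1000, C=100).fit(features, labels)

# Save to a temporary directory and load the file back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'model.joblib')
    joblib.dump(stand_in_model, path)
    restored = joblib.load(path)

# The reloaded model should give identical predictions
round_trip_ok = np.array_equal(
    stand_in_model.predict(features), restored.predict(features)
)
print('Predictions match after reload:', round_trip_ok)


def upload_model(local_path, bucket_name, destination):
    """Upload a file to Cloud Storage (hypothetical helper, not called here)."""
    # Imported lazily so the check above runs without the client library
    from google.cloud import storage

    blob = storage.Client().bucket(bucket_name).blob(destination)
    blob.upload_from_filename(local_path)
    return f'gs://{bucket_name}/{destination}'
```

A call might look like `upload_model('model.joblib', 'your-bucket-name', 'penguins/model.joblib')`, where both the bucket and the destination path are placeholders to replace with your own.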