This notebook presents an example of retrieving a dataset from <a href="https://cloud.google.com/storage/">Google Cloud Storage</a>, training a simple model using <a href="https://cloud.google.com/datalab/">Datalab</a>, uploading the classifier to Cloud Storage, and using <a href="https://cloud.google.com/ml-engine/">Cloud Machine Learning Engine</a> to make online predictions.

The dataset used is the <a href="https://archive.ics.uci.edu/ml/datasets/iris">Iris data set</a>. It contains 3 classes of 50 instances each, where each class refers to a type of iris plant. The predicted attribute is the class of iris plant.

<b>Prerequisites</b> <br>
<ul>
  <li>A Cloud Storage bucket has been created</li>
  <li>The Iris CSV file (data.csv) has been uploaded to the bucket's data directory</li>
</ul>


In [7]:
# Libraries 
import datetime
import os
import subprocess
import sys
import pandas as pd
from sklearn import svm
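try:
    import joblib  # standalone package, needed on newer scikit-learn releases
except ImportError:
    from sklearn.externals import joblib  # fallback for older scikit-learn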

#### Storage settings

In [9]:
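# Cloud Storage
BUCKET_NAME = '<YOUR_BUCKET_NAME>'
FILENAME = 'data.csv'
# Build the URIs from BUCKET_NAME so the bucket only has to be set once.
# gs:// paths always use '/', so join them explicitly instead of with os.path.join.
DIRECTORY = 'gs://{}/data'.format(BUCKET_NAME)
STORAGE = '{}/{}'.format(DIRECTORY, FILENAME)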
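Because gs:// URIs always use forward slashes, it can help to build them with `posixpath` rather than `os.path`, which follows the separator of the local platform. The sketch below is one possible way to do that. `gcs_uri` is a hypothetical helper, not part of any Google library, and `demo-iris` is simply the bucket name that appears in the command output later in this notebook.

```python
import posixpath


def gcs_uri(bucket, *parts):
    """Build a gs:// URI from a bucket name and path components."""
    if not bucket or '<' in bucket or '>' in bucket:
        # Catches the unedited '<YOUR_BUCKET_NAME>' placeholder.
        raise ValueError('Set BUCKET_NAME to a real bucket name first')
    return 'gs://' + posixpath.join(bucket, *parts)


print(gcs_uri('demo-iris', 'data', 'data.csv'))
```

Failing early on the placeholder avoids a less obvious error from gsutil later on.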

#### Download data

The gsutil cp command copies data between your local file system and Cloud Storage, and can also copy data within the cloud.

In [11]:
!gsutil cp $STORAGE .



Copying gs://demo-iris/data/data.csv...
/ [1 files][  3.8 KiB/  3.8 KiB]                                                
Operation completed over 1 objects/3.8 KiB.                                      
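The `!gsutil` line relies on IPython shell syntax. Outside a notebook, roughly the same copy can be run through `subprocess`, which the notebook already imports. This is a sketch that assumes gsutil is installed and authenticated; `build_copy_command` and `copy_from_gcs` are hypothetical helper names.

```python
import subprocess


def build_copy_command(source, destination='.'):
    """Return the argv list for `gsutil cp SOURCE DESTINATION`."""
    return ['gsutil', 'cp', source, destination]


def copy_from_gcs(source, destination='.'):
    """Run gsutil cp and raise CalledProcessError if it fails."""
    subprocess.check_call(build_copy_command(source, destination))


# Show the command without running it, so this cell also works offline.
print(' '.join(build_copy_command('gs://demo-iris/data/data.csv')))
```

Passing the command as a list avoids shell quoting problems if a path contains spaces.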



#### Read data 

In [12]:
iris_data = pd.read_csv(FILENAME)
iris_data.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa
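Before training, it can be worth confirming that the file matches the description given at the top of the notebook: three classes with 50 rows each. The sketch below does this with only the standard library. `count_classes` is a hypothetical helper, and the inline sample is a small stand-in for the real data.csv, whose column layout is assumed to match the table above.

```python
import csv
import io
from collections import Counter


def count_classes(csv_file, label_column='species'):
    """Count rows per class label in an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


# Tiny stand-in for data.csv (assumed layout: index column, four features, label).
sample = io.StringIO(
    ',sepal_length,sepal_width,petal_length,petal_width,species\n'
    '0,5.1,3.5,1.4,0.2,setosa\n'
    '1,4.9,3.0,1.4,0.2,setosa\n'
)
print(count_classes(sample))

# With the real file, each of the three classes should appear 50 times:
# with open('data.csv') as f:
#     assert set(count_classes(f).values()) == {50}
```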


#### Get labels

In [13]:
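# Drop the CSV's saved index column so it is not used as a feature
iris_data = iris_data.drop('Unnamed: 0', axis=1)
# labels
iris_label = iris_data.pop('species')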

#### Train model

In [14]:
classifier = svm.SVC(gamma='auto')
classifier.fit(iris_data, iris_label)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [15]:
# Predict some instances
classifier.predict(iris_data.iloc[147:149])

array(['virginica', 'virginica'], dtype=object)

In [16]:
# Classes
classifier.classes_

array(['setosa', 'versicolor', 'virginica'], dtype=object)

#### Export classifier to file

In [18]:
model_filename = 'model.joblib'
joblib.dump(classifier, model_filename)

['model.joblib']
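Cloud ML Engine's scikit-learn runtime looks for a model file with a specific name, such as model.joblib or model.pkl, inside the model directory, which is why the file is saved as model.joblib here. One common pattern is to place each export in a timestamped directory so that earlier models are not overwritten. The sketch below assumes that pattern; `versioned_model_uri` is a hypothetical helper and the `iris_` prefix is only illustrative.

```python
import datetime
import posixpath

ALLOWED_MODEL_NAMES = ('model.joblib', 'model.pkl')


def versioned_model_uri(bucket, model_filename, now=None):
    """Return gs://BUCKET/iris_YYYYmmdd_HHMMSS/MODEL_FILENAME."""
    if model_filename not in ALLOWED_MODEL_NAMES:
        raise ValueError('Unexpected model file name: %s' % model_filename)
    now = now or datetime.datetime.now()
    directory = now.strftime('iris_%Y%m%d_%H%M%S')
    return 'gs://' + posixpath.join(bucket, directory, model_filename)


# A fixed timestamp keeps the example reproducible.
print(versioned_model_uri('demo-iris', 'model.joblib',
                          now=datetime.datetime(2019, 1, 2, 3, 4, 5)))
```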

#### Copy classifier to Cloud Storage

In [19]:
# Keep the model.joblib file name inside the model directory
!gsutil cp $model_filename gs://$BUCKET_NAME/model/$model_filename

Copying file://model.joblib [Content-Type=application/octet-stream]...
/ [1 files][  4.4 KiB/  4.4 KiB]                                                
Operation completed over 1 objects/4.4 KiB.                                      
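Once the model file is in Cloud Storage, Cloud ML Engine can serve online predictions from it. For scikit-learn models, each instance is typically sent as a plain list of feature values, in the same column order the classifier was fitted on. The sketch below writes such instances as newline-delimited JSON, a format that `gcloud ml-engine predict --json-instances` is expected to accept. The file name input.json and the helper `write_instances` are assumptions, and the two rows are illustrative values taken from the table shown earlier.

```python
import json


def write_instances(rows, path='input.json'):
    """Write one JSON list of feature values per line."""
    with open(path, 'w') as f:
        for row in rows:
            f.write(json.dumps([float(v) for v in row]) + '\n')
    return path


# Illustrative feature rows; their length must match the training columns.
rows = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]]
write_instances(rows)
```

Deploying the model and version in Cloud ML Engine is a separate step that is not covered by this sketch.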


#### Resources

Code reference at: https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/sklearn/iris_training.py