Using scikit-learn to recognize digits in handwritten images

First, let's import the necessary libraries.

In [1]:
import pandas as pd
import pickle
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import sys

We need data to train the model. Ideally you would train on real-world images, but scikit-learn ships small sample datasets for experimentation and learning. We'll load its bundled handwritten digits dataset, which does not require a download.

In [2]:
digits = datasets.load_digits()
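
Before training, it can help to look at what was loaded. The following is a minimal sketch that only inspects the dataset: each sample is an 8x8 grayscale image, stored both as a flat row of 64 values in `digits.data` and as an 8x8 array in `digits.images`.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# Each row of `data` is one 8x8 image flattened to 64 pixel intensities.
print(digits.data.shape)         # (n_samples, 64)
print(digits.images.shape)       # (n_samples, 8, 8)

# The labels are the digits 0 through 9.
classes = np.unique(digits.target)
print(classes)
```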

Split the dataset into 75% training data and 25% test data. Fixing random_state makes the split reproducible.

In [3]:
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size = 0.25, random_state = 0)
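
As a quick sanity check on the split, the sketch below repeats it and prints how many samples end up in each part. Nothing here goes beyond the calls already used in the notebook.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# Training and test sizes should add up to the full dataset.
print(len(x_train), len(x_test))
```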

Train a logistic regression model on the training data

In [4]:
model = LogisticRegression(solver = 'liblinear', multi_class = 'auto')
model.fit(x_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='auto',
          n_jobs=None, penalty='l2', random_state=None, solver='liblinear',
          tol=0.0001, verbose=0, warm_start=False)
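
The notebook holds out test data but does not show an evaluation step. A minimal sketch of one is given below, assuming the same split and solver as above. The `multi_class` argument is left out here because recent scikit-learn releases deprecate it; that omission is a choice made for this sketch, not part of the original cell.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

model = LogisticRegression(solver='liblinear')
model.fit(x_train, y_train)

# score() returns the mean accuracy on the held-out test set.
accuracy = model.score(x_test, y_test)
print('Test accuracy: {:.3f}'.format(accuracy))
```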

Save the model

In [5]:
# Use a context manager so the file is flushed and closed after writing.
with open('model.joblib', 'wb') as f:
    pickle.dump(model, f)
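
Before uploading a saved model, it is worth checking that it loads back and predicts the same labels. The sketch below does a round trip through pickle. It writes to a temporary directory (an assumption made here so the example does not overwrite the notebook's own file) and again omits `multi_class`.

```python
import os
import pickle
import tempfile

from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)
model = LogisticRegression(solver='liblinear').fit(x_train, y_train)

# Hypothetical scratch location; the notebook saves to the working directory.
path = os.path.join(tempfile.mkdtemp(), 'model.joblib')
with open(path, 'wb') as f:
    pickle.dump(model, f)

with open(path, 'rb') as f:
    restored = pickle.load(f)

# The restored model should reproduce the original predictions exactly.
same = bool((restored.predict(x_test) == model.predict(x_test)).all())
print(same)
```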

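Publish the model to Google Cloud ML Engine. The command below takes the model file from gs://digit_recognition/, so the saved model presumably has to be uploaded to that bucket first, and the digit_recognition model resource has to exist already. Neither step is shown in this notebook.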

In [6]:
!gcloud ml-engine versions create v1 --model digit_recognition --origin gs://digit_recognition/ --runtime-version 1.12 --python-version 3.5

Creating version (this might take a few minutes)......done.                    


Request a prediction from the published version, passing a JSON-formatted image of the digit 2:

In [7]:
!gcloud ml-engine predict --model digit_recognition --json-instances digit_2.json

[2]
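
The notebook does not show how digit_2.json was created. One possible way, sketched below, is to take a sample labeled 2 from the same dataset and write its 64 pixel values as a single JSON list on one line. This assumes the file passed to --json-instances holds one instance per line; the choice of sample is arbitrary and purely illustrative.

```python
import json

from sklearn import datasets

digits = datasets.load_digits()

# Pick the first sample whose label is 2 (an arbitrary, illustrative choice).
index = int((digits.target == 2).argmax())
instance = digits.data[index].tolist()   # 64 pixel intensities as plain floats

# Write one instance per line, each as a JSON list of feature values.
with open('digit_2.json', 'w') as f:
    f.write(json.dumps(instance) + '\n')
```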
