# CoreML Demo

Let's make an ML model to import into a Swift application! We'll make a very simple model that is trained to test whether a given value is greater than five.

Note: As of 2017-08-13, coremltools is only available for Python 2.7, so the code below uses Python 2 syntax.

To start, we'll import the tools we need to create our data, train the model, and export it for Swift.

In [1]:
import coremltools
import numpy as np
from sklearn.tree import DecisionTreeClassifier

## Create the Data

First, we'll create our training data. In machine learning, we call our inputs "features" and our expected outputs "labels". For our model, the features are a 2D array (ten rows, one column) of the integers 1 through 10. The labels are a 2D array of five False values followed by five True values, where each label indicates whether the feature at the same index is greater than five.

In [2]:
# range() excludes its end value, so range(1, 11) yields 1 through 10
features = np.array([range(1, 11)]).reshape(-1, 1)
labels = np.array([False]*5 + [True]*5).reshape(-1, 1)

print '2D array of training data:\n', features
print '2D array of expected values:\n', labels

2D array of training data:
[[ 1]
 [ 2]
 [ 3]
 [ 4]
 [ 5]
 [ 6]
 [ 7]
 [ 8]
 [ 9]
 [10]]
2D array of expected values:
[[False]
 [False]
 [False]
 [False]
 [False]
 [ True]
 [ True]
 [ True]
 [ True]
 [ True]]
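
Because each label follows directly from its feature, the same layout can also be derived with a comparison instead of being written out by hand. The sketch below uses plain nested lists as a stand-in for the NumPy arrays above (an assumption made to keep it dependency-free); with NumPy arrays, the same comparison can be written as `features > 5`.

```python
# Features: ten rows with one column each, holding the integers 1 through 10.
features = [[n] for n in range(1, 11)]

# Labels: one row per feature row, True when that feature is greater than 5.
labels = [[row[0] > 5] for row in features]

# Show each feature next to the label at the same index.
for row, label in zip(features, labels):
    print('%d -> %s' % (row[0], label[0]))
```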


## Train the model

Next we'll create and test our model.

In [3]:
tree = DecisionTreeClassifier()
tree.fit(features, labels)
tree.predict(features)

array([False, False, False, False, False,  True,  True,  True,  True,  True], dtype=bool)

The first five values should be False, followed by five True values. scikit-learn uses NumPy under the hood, so the output is a boolean-typed NumPy array rather than a plain list. Since Python relies on duck typing (it cares about the methods an object implements, not its declared type), we can index and iterate over it like any other list.
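
To get a feel for what the tree had to learn, here is a minimal sketch of a single split chosen by Gini impurity. It is not scikit-learn's implementation; `gini` and `best_threshold` are hypothetical helpers written for illustration. It tries the midpoint between each pair of neighbouring values and keeps the one that separates the labels most cleanly.

```python
def gini(labels):
    """Gini impurity of a list of booleans (0.0 means all labels agree)."""
    if not labels:
        return 0.0
    p_true = sum(labels) / float(len(labels))
    return 2.0 * p_true * (1.0 - p_true)


def best_threshold(values, labels):
    """Return the midpoint split with the lowest weighted Gini impurity."""
    pairs = sorted(zip(values, labels))
    best = None
    for (low, _), (high, _) in zip(pairs, pairs[1:]):
        if low == high:
            continue  # no room for a split between equal values
        threshold = (low + high) / 2.0
        left = [lab for val, lab in pairs if val <= threshold]
        right = [lab for val, lab in pairs if val > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / float(len(pairs))
        if best is None or score < best[0]:
            best = (score, threshold)
    return best[1]


values = list(range(1, 11))
labels = [v > 5 for v in values]
threshold = best_threshold(values, labels)
print(threshold)
```

On this data the only split that leaves both sides pure sits between 5 and 6, which is why one decision is enough to reproduce every label.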

## Export to .mlmodel

Now we're going to convert our sklearn model into a CoreML model and add metadata.

You might need to run this command to switch to the beta version of Xcode command line tools:

`sudo xcode-select --switch /Applications/Xcode-beta.app/Contents/Developer`

In [4]:
mlmodel = coremltools.converters.sklearn.convert(
    tree,
    input_features=['number'],
    output_feature_names='isgt5'
)

mlmodel.author = 'Michael duPont'
mlmodel.license = 'MIT'
mlmodel.short_description = 'Determines whether a number is greater than 5'

# Set feature descriptions manually
mlmodel.input_description['number'] = 'Number'

# Set the output descriptions
mlmodel.output_description['isgt5'] = '0-False 1-True: The value is greater than 5'

And finally we save the model.

In [5]:
mlmodel.save('GreaterThanFive.mlmodel')

Here's a quick look at the converters we can use:

In [6]:
[s for s in dir(coremltools.converters) if not s.startswith('__')]

['caffe', 'keras', 'libsvm', 'sklearn', 'xgboost']

## Test the .mlmodel

Now let's load the exported model and test it here with a value from the original data.

In [7]:
imported_model = coremltools.models.MLModel('GreaterThanFive.mlmodel')

# Make predictions
prediction = imported_model.predict({'number': 8})
print(prediction)
print
print '8 is greater than 5:', bool(prediction['isgt5'])
print
print 'Output probabilities:'
for k, v in prediction['classProbability'].iteritems():
    print 'Value:', k, ' - Probability:', v

{u'__feature_vector__': array([ 8.]), u'isgt5': 1L, u'number': 8L, u'classProbability': {0L: 0.0, 1L: 1.0}}

8 is greater than 5: True

Output probabilities:
Value: 0  - Probability: 0.0
Value: 1  - Probability: 1.0


Two important things to notice:

1. Our input and output values are specified by the name we gave them
2. Our output type was changed from a Boolean to an Int

The second point is worth keeping in mind: coremltools has to convert the data types your Python model expects into types that CoreML supports, so the values you get back in Swift may not have the types you trained with.
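
Since the label comes back as an integer, it can help to convert it back to a Boolean in one place. The sketch below is one way to do that; `interpret_prediction` is a hypothetical helper, and `sample` is a hand-written dictionary mirroring the prediction output shown above rather than a live call to the model.

```python
def interpret_prediction(prediction, label_key='isgt5', prob_key='classProbability'):
    """Turn the integer class label back into a bool and look up its probability."""
    label = int(prediction[label_key])
    return {
        'is_greater': bool(label),
        'confidence': prediction[prob_key][label],
    }


# Hand-written stand-in for the dictionary returned by imported_model.predict().
sample = {'isgt5': 1, 'number': 8, 'classProbability': {0: 0.0, 1: 1.0}}
result = interpret_prediction(sample)
print(result['is_greater'], result['confidence'])
```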

Let's try a non-integer input first, then run our training data back through the imported model to make sure the values still make sense.

In [8]:
imported_model.predict({'number': 6.9})

{u'__feature_vector__': array([ 6.9]),
 u'classProbability': {0L: 0.0, 1L: 1.0},
 u'isgt5': 1L,
 u'number': 6.9}

In [9]:
print 'Key is greater than 5:'
# Map each training value to whether the imported model labels it as greater than 5
dict(zip(
    features.reshape(-1),
    [imported_model.predict({'number': f[0]})['isgt5'] == 1L for f in features]
))

Key is greater than 5:


{1: False,
 2: False,
 3: False,
 4: False,
 5: False,
 6: True,
 7: True,
 8: True,
 9: True,
 10: True}
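
A round-trip check like this can be wrapped in a small helper that compares any two prediction functions. The sketch below is an illustration only: `find_mismatches` is a hypothetical helper, and the two predict functions are stand-ins for `tree.predict` and `imported_model.predict`, one returning a Boolean and the other an integer, as seen above.

```python
def find_mismatches(inputs, predict_a, predict_b):
    """Return the inputs on which the two predictors disagree once cast to bool."""
    return [x for x in inputs if bool(predict_a(x)) != bool(predict_b(x))]


def original_predict(x):
    # Stand-in for the scikit-learn model: returns a Boolean.
    return x > 5.5


def exported_predict(x):
    # Stand-in for the exported model: returns 0 or 1 instead of a Boolean.
    return 1 if x > 5.5 else 0


mismatches = find_mismatches(range(1, 11), original_predict, exported_predict)
print(mismatches)
```

An empty list means the two predictors agree on every input, despite the difference in output types.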

## Introspection

Let's get a quick look at the fields available to us:

In [10]:
[s for s in dir(imported_model) if not s.startswith('_')]

['author',
 'get_spec',
 'input_description',
 'license',
 'output_description',
 'predict',
 'save',
 'short_description',
 'user_defined_metadata']

We can also get a full breakdown of the entire spec of our .mlmodel:

In [11]:
imported_model.get_spec()

specificationVersion: 1
description {
  input {
    name: "number"
    shortDescription: "Number"
    type {
      doubleType {
      }
    }
  }
  output {
    name: "isgt5"
    shortDescription: "0-False 1-True: The value is greater than 5"
    type {
      int64Type {
      }
    }
  }
  output {
    name: "classProbability"
    type {
      dictionaryType {
        int64KeyType {
        }
      }
    }
  }
  predictedFeatureName: "isgt5"
  predictedProbabilitiesName: "classProbability"
  metadata {
    shortDescription: "Determines whether a number is greater than 5"
    author: "Michael duPont"
    license: "MIT"
  }
}
pipelineClassifier {
  pipeline {
    models {
      specificationVersion: 1
      description {
        input {
          name: "number"
          type {
            doubleType {
            }
          }
        }
        output {
          name: "__feature_vector__"
          type {
            multiArrayType {
              shape: 1
              dataType: DOUB