This template shows how to set up a model for deployment to FastScore. The Jupyter integration is one of several ways to deploy to FastScore, and one of the most familiar to data scientists.

We'll start with a familiar workflow: import the libraries we intend to use for the model, read in the data, transform the non-numeric features, and train the model.

In [1]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.ensemble import RandomForestRegressor

In [2]:
df = pd.read_csv('synthetic.csv')

In [3]:
!pip install fastscore
!pip install fastscoredeploy
!pip3 uninstall -y avro

Skipping avro as it is not installed.


In [4]:
df.head()

Unnamed: 0,feat_1,feat_2,id,target,word
0,0.077368,-0.574927,824,1.012205,bar
1,-0.30125,-0.101516,815,0.069122,bar
2,-6.202221,1.639673,5921,-10.827503,baz
3,-0.265197,-2.459763,5730,-4.570818,foo
4,-2.867035,-2.37799,4855,-2.559931,qux


In [5]:
train, test = train_test_split(df, train_size = 0.8)



In [6]:
train.shape

(400, 5)

In [7]:
enc = OneHotEncoder()

In [8]:
enc.fit(train.word.values.reshape(-1,1))

OneHotEncoder(categorical_features=None, categories=None,
       dtype=<class 'numpy.float64'>, handle_unknown='error',
       n_values=None, sparse=True)

In [9]:
array = enc.transform(train.word.values.reshape(-1,1)).toarray()

columns = enc.get_feature_names().tolist()

index = train.index

In [10]:
one_hot = pd.DataFrame(array, columns = columns, index=index)

In [11]:
train = pd.concat([train,one_hot], axis=1)

In [12]:
train.head()

Unnamed: 0,feat_1,feat_2,id,target,word,x0_bar,x0_baz,x0_foo,x0_qux
203,5.437514,4.692416,2639,25.594783,foo,0.0,0.0,1.0,0.0
289,-1.999135,-3.236163,5571,-5.550551,qux,0.0,0.0,0.0,1.0
221,-3.483115,-2.142961,561,5.903794,bar,1.0,0.0,0.0,0.0
180,-3.352308,-0.826056,1092,-0.515608,baz,0.0,1.0,0.0,0.0
41,-4.952039,-1.007757,9747,-2.597816,baz,0.0,1.0,0.0,0.0


In [13]:
model = RandomForestRegressor()

In [14]:
model.fit(train.drop(['id', 'target', 'word'], axis=1).values, train.target.values)



RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
           oob_score=False, random_state=None, verbose=0, warm_start=False)

Now let's score one sample from the test dataframe. This will help us in constructing our action function later.

In [15]:
one_hot = enc.transform(np.array(test.word.iloc[0]).reshape(-1,1)).toarray()

In [16]:
one_hot.shape

(1, 4)

In [17]:
values = test.loc[:,['feat_1', 'feat_2']].iloc[0,:].values.reshape(1,-1)

In [18]:
values.shape

(1, 2)

In [19]:
single_test = np.concatenate([values, one_hot],axis=1)

In [20]:
single_test

array([[ 1.83577329, -3.75473259,  1.        ,  0.        ,  0.        ,
         0.        ]])

In [21]:
model.predict(single_test)[0]

-10.638266971614481

Now that the model is trained, we must serialize the fitted model and the fitted one-hot encoder for use inside the FastScore engine.

In [22]:
import pickle
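# Use context managers so both files are flushed and closed before they are archived
with open('example_model.pkl', 'wb') as f:
    pickle.dump(model, f)
with open('onehotenc.pkl', 'wb') as f:
    pickle.dump(enc, f)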
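
Before shipping the pickles, it can be worth confirming that they load back cleanly, since the engine will read them the same way in its begin function. The sketch below is only an illustration: it uses a plain dictionary as a stand-in for the fitted model and a temporary directory instead of the notebook's working directory, both of which are assumptions made so the example runs on its own.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted estimator (assumption); any picklable object behaves the same way.
fitted_stand_in = {"categories": ["bar", "baz", "foo", "qux"], "n_estimators": 10}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "stand_in.pkl")

    # Write with a context manager so the file is flushed and closed.
    with open(path, "wb") as f:
        pickle.dump(fitted_stand_in, f)

    # Read it back the same way begin() will inside the engine.
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored object should compare equal to the original.
round_trip_ok = restored == fitted_stand_in
print(round_trip_ok)
```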

In [23]:
#To start, we import the FastScore Deploy library, which leverages the FastScore API for deploying assets and models
from fastscoredeploy import ipmagic

SyntaxError: invalid syntax (io.py, line 213)

Schemas define the input and output contract between the model execution code and the data transport. We will add one for input and one for output. The schemas use the Avro system: https://avro.apache.org/docs/1.8.1/spec.html. The cell magic command %%schema (name) at the top of the cell sets the name of the schema. This name must match the corresponding schema name referenced in the model.

In [25]:
%%schema example_input
{
    "type":"record",
    "name":"example_input",
    "fields":[
        {"type":"double", "name":"feat_1"},
        {"type":"double", "name":"feat_2"},
        {"type":"int", "name":"id"},
        {"type":"string", "name":"word"}
    ]
}

UsageError: Cell magic `%%schema` not found.


In [26]:
%%schema tagged_double
{
    "type":"record",
    "name":"tagged_double",
    "fields":[
        {"type":"int", "name":"id"},
        {"type":"double", "name":"pred"}
    ]
}

UsageError: Cell magic `%%schema` not found.
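
To make the idea of a schema as a contract more concrete, here is a rough sketch that checks a record against the example_input schema using only the standard library. It is not Avro validation and not part of FastScore: the `conforms` helper and the `PYTHON_TYPES` mapping are hypothetical, and the check is deliberately strict (for example, it rejects an integer where a double is declared).

```python
import json

# Copy of the example_input schema from the cell above.
EXAMPLE_INPUT = json.loads("""
{
    "type": "record",
    "name": "example_input",
    "fields": [
        {"type": "double", "name": "feat_1"},
        {"type": "double", "name": "feat_2"},
        {"type": "int", "name": "id"},
        {"type": "string", "name": "word"}
    ]
}
""")

# Assumed mapping from Avro primitive names to Python types (illustrative only).
PYTHON_TYPES = {"double": (float,), "int": (int,), "string": (str,)}


def conforms(record, schema):
    """Hypothetical helper: True if record has exactly the schema's fields with matching types."""
    fields = {f["name"]: f["type"] for f in schema["fields"]}
    if set(record) != set(fields):
        return False
    for name, avro_type in fields.items():
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, PYTHON_TYPES[avro_type]):
            return False
    return True


good = {"feat_1": 0.077368, "feat_2": -0.574927, "id": 824, "word": "bar"}
bad = {"feat_1": 0.077368, "feat_2": -0.574927, "id": "824", "word": "bar"}
print(conforms(good, EXAMPLE_INPUT), conforms(bad, EXAMPLE_INPUT))  # True False
```

The second record fails because its id is a string, which is the kind of mismatch the engine's schema check is meant to catch before the record reaches the model.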


A schema can also be inferred from sample data using Schema.infer, but the samples must be given as dictionaries.

In [27]:
samples = train.drop(['id', 'word', 'target'], axis=1).to_dict(orient='records')[0:20]
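
This notebook does not show a call to Schema.infer, so its exact signature and behavior are not reproduced here. As a rough illustration of what inferring a schema from dictionaries involves, the sketch below uses a hypothetical helper, `infer_record_schema`, that maps Python types to Avro primitive names and widens int to double when a field mixes the two. The sample values are made up for the example.

```python
import json


def infer_record_schema(name, samples):
    """Hypothetical helper: build an Avro-style record schema from dict samples."""
    type_names = {bool: "boolean", int: "int", float: "double", str: "string"}
    fields = {}
    for sample in samples:
        for key, value in sample.items():
            inferred = type_names.get(type(value))
            if inferred is None:
                raise TypeError(f"unsupported type for field {key!r}: {type(value).__name__}")
            previous = fields.setdefault(key, inferred)
            if previous != inferred:
                # Widen int to double when a field contains both; reject anything else.
                if {previous, inferred} == {"int", "double"}:
                    fields[key] = "double"
                else:
                    raise ValueError(f"conflicting types for field {key!r}")
    return {
        "type": "record",
        "name": name,
        "fields": [{"type": t, "name": k} for k, t in fields.items()],
    }


# Made-up samples in the same shape as to_dict(orient='records') output.
example_samples = [
    {"feat_1": 5.437514, "feat_2": 4.692416, "x0_foo": 1.0},
    {"feat_1": -2, "feat_2": -3.236163, "x0_foo": 0.0},
]
inferred_schema = infer_record_schema("inferred_input", example_samples)
print(json.dumps(inferred_schema, indent=4))
```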

Next we need to provide the model execution code. This code will be deployed into the engine and used to score the model. The cell magic command %%model at the top defines the name of the model (rfr_model). The smart comments that follow map the schemas to the inputs and outputs. Again, the names of the schemas in these smart comments _must_ match the names of the schemas in the cell magic commands above (the name after %%schema).

Next, use import statements to pull in the dependencies. Since the engine is containerized, you must include these import statements again even though you included them at the beginning of the notebook. Any libraries that are not included in the default engine will need to be added to the FastScore Engine's Dockerfile and import policy.

Two functions will be called to execute code. The *begin* function is run when the model is deployed; the *action* function is called each time a record is scored (that is, when data arrives on the input stream). The *begin* function typically assigns the model coefficients or model object to a variable for use in the *action* function. The *action* function scores the model and yields the output to the output stream.

In [28]:
%%model rfr_model

# fastscore.schema.0: example_input
# fastscore.schema.1: tagged_double

import numpy as np
import pickle
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OneHotEncoder




def begin():
    #It's necessary to set these variables as global so they will be scoped to use in the "action" function.
    global enc
    global rfr
    enc = pickle.load(open('onehotenc.pkl', 'rb'))
    rfr = pickle.load(open('example_model.pkl', 'rb'))

def action(x):
    #In this example, FastScore will parse x as a Python dictionary since x is a single record.
    ID = x['id']
    word = x['word']
    one_hot = enc.transform(np.array(word).reshape(-1,1)).toarray()[0].reshape(1,-1)
    feats = np.array([x['feat_1'], x['feat_2']]).reshape(1,-1)
    #print(one_hot.shape)
    #print(feats.shape)
    sample = np.concatenate([feats,one_hot], axis=1)
    pred = rfr.predict(sample)[0]
    yield dict(id=ID, pred=pred)


UsageError: Cell magic `%%model` not found.
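
The call order described above (begin once, then action per record) can be imitated outside the engine with a few lines of plain Python. This is only a sketch under stated assumptions: `run_locally` is a hypothetical harness, not a FastScore function, and a small lookup table stands in for the pickled model so the example has no dependencies.

```python
def begin():
    # Runs once at deployment: load state into globals for action() to use.
    global offsets
    # Stand-in for the unpickled model and encoder (assumption).
    offsets = {"bar": 0.0, "baz": 1.0, "foo": 2.0, "qux": 3.0}


def action(x):
    # Runs once per input record; x is a dict matching the input schema.
    pred = x["feat_1"] + x["feat_2"] + offsets[x["word"]]
    yield dict(id=x["id"], pred=pred)


def run_locally(records):
    """Hypothetical harness: call begin() once, then action() for each record."""
    begin()
    outputs = []
    for record in records:
        # action() is a generator, so collect everything it yields.
        outputs.extend(action(record))
    return outputs


results = run_locally([
    {"feat_1": 1.0, "feat_2": 2.0, "id": 824, "word": "bar"},
    {"feat_1": 0.5, "feat_2": 0.5, "id": 815, "word": "qux"},
])
print(results)  # [{'id': 824, 'pred': 3.0}, {'id': 815, 'pred': 4.0}]
```

Because action is a generator, it may yield zero, one, or several outputs per input record, which is why the harness extends the output list instead of appending a single value.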


Now we can test that our model works before loading it into a FastScore engine by using FastScore Deploy's score function. Note again that we're passing dictionaries into our model. FastScore's default data encoding is JSON, though other encodings are supported. The print statements that are commented out in the action function will print when run through the score function below, which can be an important debugging tool.

In [29]:
rfr_model.score(test.drop(['target'], axis=1).to_dict(orient='records')[0:5])

NameError: name 'rfr_model' is not defined

The input $x$ to the action function is a JSON object, typically parsed in Python as a dictionary. Another common pattern is to score multiple records at once. To use recordsets, add the smart comment `#fastscore.recordset.<slot_no>`, where `<slot_no>` is the slot number you want to use. The schema will be checked against each individual record in the set, but $x$ will be parsed as a Pandas DataFrame by default.

Now we're ready to deploy our FastScore-conformant model to a FastScore engine.

In [30]:
#Next we deploy the model to FastScore, using basic auth, to validate that it works within the Engine
from fastscoredeploy.suite import Connect
from base64 import b64encode
from six.moves.urllib.parse import quote

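def encode(username, password):
    #Build the value of a basic auth header from a username and password
    if ':' in username:
        #FastScoreError was never imported in this notebook, so raise a built-in exception instead
        raise ValueError('invalid username')
    username_password = '%s:%s' % (quote(username), quote(password))
    return 'Basic ' + b64encode(username_password.encode()).decode()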

secret = encode("fastscore","fastscore")
c = Connect('https://a358847f378e211e9ae160a680e94179-1690110141.us-east-1.elb.amazonaws.com/dashboard')
c.set_basic_auth_secret(secret)

#Then we specify the Model Manage instance to add the model assets to
mm = c.lookup('model-manage')
#And then the Engine we will deploy to
eng = c.get('engine-3')

SyntaxError: invalid syntax (io.py, line 213)
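
To see what the encode helper produces without connecting to anything, the sketch below rebuilds it with the standard library only (urllib.parse in place of six) and adds a hypothetical `decode` function that reverses it for inspection. The ValueError is an assumption made so the example runs without the FastScore SDK.

```python
from base64 import b64decode, b64encode
from urllib.parse import quote, unquote


def encode(username, password):
    # A colon in the username would make the user:password pair ambiguous.
    if ":" in username:
        raise ValueError("invalid username")
    username_password = "%s:%s" % (quote(username), quote(password))
    return "Basic " + b64encode(username_password.encode()).decode()


def decode(header):
    """Hypothetical inverse of encode(), for inspection only."""
    scheme, _, token = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic auth header")
    # quote() percent-encodes ':' in the password, so the first ':' is the separator.
    username, _, password = b64decode(token).decode().partition(":")
    return unquote(username), unquote(password)


secret = encode("fastscore", "fastscore")
print(secret)
print(decode(secret))  # ('fastscore', 'fastscore')
```

Base64 is an encoding, not encryption, so the header value should be treated as sensitive as the password itself.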

In [121]:
#Now we add or update the model in Model Manage to make it available for deployment
#Returns True when the model was updated, False when it is unchanged in Model Manage
rfr_model.update(model_manage=mm)

True

The pickle files containing the trained model and trained one-hot encoder must be given to the FastScore engine as an attachment. Since each model can only have one attachment, we must include them in the same tar.gz archive or zip file. Note that tar.gz and zip are the only two file formats supported by FastScore attachments. Also note that anything the model needs to run can be included in an attachment, including libraries you've developed yourself.

In [122]:
!tar cvfz att.tar.gz example_model.pkl onehotenc.pkl

a example_model.pkl
a onehotenc.pkl
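
If a shell is not available, the same kind of archive can be built from Python with the standard tarfile module. The sketch below is an illustration only: it writes placeholder files into a temporary directory (an assumption, so the example does not touch the real pickles) and then lists the archive contents to confirm that both files sit at the archive root.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Placeholder files standing in for the real pickles (assumption).
    names = ["example_model.pkl", "onehotenc.pkl"]
    for name in names:
        with open(os.path.join(tmp_dir, name), "wb") as f:
            f.write(b"placeholder")

    archive_path = os.path.join(tmp_dir, "att.tar.gz")

    # "w:gz" writes a gzip-compressed tar, matching the tar cvfz command above.
    with tarfile.open(archive_path, "w:gz") as archive:
        for name in names:
            # arcname stores each file under its bare name, without the directory path.
            archive.add(os.path.join(tmp_dir, name), arcname=name)

    # Reopen the archive and list what it contains.
    with tarfile.open(archive_path, "r:gz") as archive:
        members = archive.getnames()

print(members)  # ['example_model.pkl', 'onehotenc.pkl']
```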


In [31]:
from fastscore.attachment import Attachment

att = Attachment('att.tar.gz', datafile='att.tar.gz')
att.upload(rfr_model)

NameError: name 'rfr_model' is not defined

In [72]:
#Now we deploy to the engine. If there are errors, view the container logs for details
rfr_model.deploy(eng)

In [74]:
#Now we score with our sample data
eng.score(test.drop(['target'], axis=1).to_dict(orient='records')[0:5])

[{'id': 9415, 'pred': 7.631455583625618},
 {'id': 5137, 'pred': 10.955568468491528},
 {'id': 9398, 'pred': -11.19811311297549},
 {'id': 6565, 'pred': -1.601069399894607},
 {'id': 5929, 'pred': -3.8559539736696413}]

In [35]:
eng.reset()

At this point, we have tested the model locally and within the engine. It is ready to pass to model ops for promotion into UAT and further operationalization.