# Saving and Loading Python Models

Scikit-learn models can be saved with one line of code and loaded with another. This lets client apps recreate a model in its trained state and use it without training it again. You can save and load models with [pickle](https://docs.python.org/3/library/pickle.html), or you can save and load them with [ONNX](https://onnx.ai/). Here are some examples of both.

The following code trains a binary-classification model on the Titanic dataset and saves it to a pickle file:

In [1]:
import pickle
import pandas as pd
from sklearn.linear_model import LogisticRegression
 
df = pd.read_csv('Data/titanic.csv')
df = df[['Survived', 'Age', 'Sex', 'Pclass']]
df = pd.get_dummies(df, columns=['Sex', 'Pclass'])
df.dropna(inplace=True)
 
x = df.drop('Survived', axis=1)
y = df['Survived']
 
model = LogisticRegression(random_state=0)
model.fit(x, y)
 
# A context manager ensures the file is flushed and closed after the write
with open('titanic.pkl', 'wb') as f:
    pickle.dump(model, f)

The next example loads the trained model and uses it to predict the probability that a 30-year-old female traveling in first class will survive the Titanic's final voyage. The columns must match the ones the model was trained on, in the same order:

In [2]:
import pickle
import pandas as pd

# A context manager ensures the file is closed after the read
with open('titanic.pkl', 'rb') as f:
    model = pickle.load(f)

female = pd.DataFrame({ 'Age': [30], 'Sex_female': [1], 'Sex_male': [0],
                        'Pclass_1': [1], 'Pclass_2': [0], 'Pclass_3': [0] })

probability = model.predict_proba(female)[0][1]
print(f'Probability of survival: {probability:.1%}')

Probability of survival: 92.8%


The following code trains and saves a sentiment-analysis model. This time, a pipeline containing a `CountVectorizer` and a `LogisticRegression` classifier is saved:

In [3]:
import pickle
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
 
df = pd.read_csv('Data/reviews.csv', encoding="ISO-8859-1")
df = df.drop_duplicates()
 
x = df['Text']
y = df['Sentiment']
 
vectorizer = CountVectorizer(ngram_range=(1, 2), stop_words='english',
                             min_df=20)

model = LogisticRegression(max_iter=1000, random_state=0)
pipe = make_pipeline(vectorizer, model)
pipe.fit(x, y)
 
# Pickling the pipeline saves the fitted vectorizer and classifier together
with open('sentiment.pkl', 'wb') as f:
    pickle.dump(pipe, f)
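
A pickled model is tied to the library versions that wrote it: scikit-learn's documentation does not guarantee that a model pickled with one version will load correctly in another. The following is a minimal sketch of one way to guard against that, by storing the version next to the pipeline and comparing it on load. The toy corpus, the `sentiment_bundle.pkl` file name, and the dictionary layout are assumptions made for illustration:

```python
import os
import pickle
import tempfile

import sklearn
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus standing in for Data/reviews.csv (made up for illustration)
texts = ['great food', 'excellent service', 'terrible food', 'awful service']
labels = [1, 1, 0, 0]

pipe = make_pipeline(CountVectorizer(), LogisticRegression(random_state=0))
pipe.fit(texts, labels)

# Hypothetical file name, written to a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'sentiment_bundle.pkl')

# Save the pipeline together with the version that trained it
with open(path, 'wb') as f:
    pickle.dump({'sklearn_version': sklearn.__version__, 'pipeline': pipe}, f)

# Load the bundle and stop if the installed version differs
with open(path, 'rb') as f:
    bundle = pickle.load(f)

if bundle['sklearn_version'] != sklearn.__version__:
    raise RuntimeError(
        f"Model was saved with scikit-learn {bundle['sklearn_version']}, "
        f"but {sklearn.__version__} is installed")

loaded = bundle['pipeline']
print(loaded.predict(['great service']))
```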

A Python client can deserialize the pipeline and call `predict_proba` to score a line of text for sentiment with a few simple lines of code:

In [4]:
import pickle

# A context manager ensures the file is closed after the read
with open('sentiment.pkl', 'rb') as f:
    pipe = pickle.load(f)
pipe.predict_proba(['Great food and excellent service!'])[0][1]

0.8826906828210357
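
Because `predict_proba` accepts a list, the same call can score several strings at once. It returns one row per string, with columns ordered as in the pipeline's `classes_` attribute. The following is a minimal sketch on a made-up corpus that also checks that a pickled copy reproduces the original's probabilities. It uses `pickle.dumps` and `pickle.loads` in memory rather than a file, and the corpus and labels are assumptions for illustration:

```python
import pickle

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up corpus: 1 = positive, 0 = negative
texts = ['great food', 'excellent service', 'great service and excellent food',
         'terrible food', 'awful service', 'terrible service and awful food']
labels = [1, 1, 1, 0, 0, 0]

pipe = make_pipeline(CountVectorizer(),
                     LogisticRegression(max_iter=1000, random_state=0))
pipe.fit(texts, labels)

# Round-trip the fitted pipeline through pickle without touching the disk
restored = pickle.loads(pickle.dumps(pipe))

samples = ['Great food and excellent service!',
           'Terrible food and awful service']
original_probs = pipe.predict_proba(samples)
restored_probs = restored.predict_proba(samples)

# Look up which column holds the positive class instead of assuming it is 1
positive_col = list(restored.classes_).index(1)

print(restored_probs[:, positive_col])
print(np.allclose(original_probs, restored_probs))
```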

A model saved with the pickle module can only be loaded in Python. But you can also save Scikit-learn models in ONNX format and consume them from other languages. The following statements use the `skl2onnx` package to save the sentiment-analysis pipeline in an ONNX file:

In [5]:
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import StringTensorType

# The pipeline takes one string per row, so declare a [batch, 1] string input
initial_type = [('string_input', StringTensorType([None, 1]))]
onnx_model = convert_sklearn(pipe, initial_types=initial_type)

with open('sentiment.onnx', 'wb') as f:
    f.write(onnx_model.SerializeToString())



These statements use ONNX Runtime for Python to load the model and read the output that corresponds to `predict_proba`:

In [6]:
import numpy as np
import onnxruntime as rt

session = rt.InferenceSession('sentiment.onnx')
input_name = session.get_inputs()[0].name
prob_name = session.get_outputs()[1].name # output 0 = predict, output 1 = predict_proba

# Shape (1, 1) matches the [None, 1] string input declared at conversion time
text = np.array('Great food and excellent service!').reshape(1, -1)
session.run([prob_name], { input_name: text })[0][0][1]

0.8826906681060791

A model saved in an ONNX file can also be loaded and consumed from other languages, including C, C++, C#, Java, and JavaScript.