When you are trying to deploy a machine learning model in live trading, you will need to answer some practical questions such as:

* How do I save a model for later use?

* How do I use an already saved model?

* How do I handle the data?

* When should I retrain the model?

Let us go through them one by one.

#### How to save a model?

Training machine learning models takes a lot of time and resources. So, if the model works on daily data, many traders train it on weekends. If the model works on intraday data, they train it after the market closes.

In both cases, traders use the latest data available to train their models and then save them. These models are later retrieved and used to make predictions while trading. This process saves both time and resources. You can use the pickle library to save a model once it is trained.

The `pickle` module implements binary protocols for saving and loading a Python object structure. The process of saving is known as “serialization” and the process of loading is called “de-serialization”.

Saving ⇒ Serialization<br>
Loading ⇒ De-serialization
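
As a small illustration of these two operations, the sketch below serializes an object into bytes in memory and then restores it. The dictionary `toy_model` is only a hypothetical stand-in for a trained model; any picklable Python object would behave similarly.

```python
import pickle

# Hypothetical stand-in for a trained model: any picklable Python object
toy_model = {"name": "toy_classifier", "weights": [0.4, -1.2, 0.7]}

# Serialization: convert the object into a byte stream
byte_stream = pickle.dumps(toy_model)
print(type(byte_stream))

# De-serialization: rebuild an equal object from the byte stream
restored_model = pickle.loads(byte_stream)
print(restored_model == toy_model)
```

The restored object is a new object with the same contents, not the original object itself.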

When you save an object using pickle, the object is converted into a byte stream of 1s and 0s and then saved. When you load a pickled object, the inverse operation takes place: the byte stream is converted back into an object. There are a couple of things that you need to remember when saving an object using pickle.
<br>

---



Things to remember during serialization or deserialization

<b>Python Version Compatibility</b>

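When (de)serializing objects, it is safest to use the same version of Python, and of the library that defines the model, for both saving and loading. The pickle format and the definition of the pickled object can differ between versions, and this might result in errors.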

<b>Security</b>

Pickle data can be altered to insert malicious code, which runs when the data is loaded. So it is not recommended to restore data from untrusted or unauthenticated sources.
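
One possible way to detect tampering is to sign the pickled bytes and check the signature before loading them. The following is a minimal sketch using the standard `hmac` module. The key value and the helper names `sign` and `safe_loads` are hypothetical, and a real setup would keep the key outside the source code.

```python
import hashlib
import hmac
import pickle

# Hypothetical secret key; in practice it would be stored outside the code
SECRET_KEY = b"replace-with-your-own-secret"

def sign(data: bytes) -> bytes:
    """Return an HMAC-SHA256 signature for the given bytes."""
    return hmac.new(SECRET_KEY, data, hashlib.sha256).digest()

def safe_loads(data: bytes, signature: bytes):
    """Unpickle the data only if its signature matches."""
    if not hmac.compare_digest(sign(data), signature):
        raise ValueError("Signature mismatch: refusing to unpickle")
    return pickle.loads(data)

payload = pickle.dumps({"weights": [0.1, 0.2]})
signature = sign(payload)

# The untouched payload loads normally
restored = safe_loads(payload, signature)

# A payload altered after signing is rejected before pickle.loads is called
tampered = payload + b"x"
try:
    safe_loads(tampered, signature)
    tamper_detected = False
except ValueError:
    tamper_detected = True
print(tamper_detected)
```

This only shows that the file has not changed since it was signed. It does not make a pickle from an unknown author safe to load.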

Let us save the model. Before you save the model, you need to decide on two parameters:

<b>What do you want to save? <br>
How do you want to save it? </b>

In a simulation, we will be saving the trained model multiple times. So we create a simple function called `save_model` to handle this repetitive step. This function takes these two parameters as its inputs, the model and the name under which it is saved, and writes the model to a binary file. The destination could be a local machine, a remote machine or a file system located elsewhere.

Right now, we will assume we are saving the file on a local system. 

The `save_model` function opens a file on the local machine using the variable `model_saved_name`. Here the mode ‘wb’, or ‘write binary’, implies that Python will overwrite the file if it already exists, or create a new one if it doesn’t. Then the `dump` function of the `pickle` library is used to write the model to the specified destination.

Apart from the `pickle` library, you can also use the `joblib` and `json` libraries to save your models. The `joblib` library is usually more efficient than `pickle` when saving objects that contain large arrays of data. On the other hand, JSON stores data as text, which is easier for humans to read, although the model first has to be converted into basic types such as numbers, strings, lists and dictionaries.

In [None]:
import pickle

def save_model(model_name, model_saved_name):
    # Open a file with the mentioned name on the local machine
    with open(model_saved_name, 'wb') as model_save:
        # Use the dump command to save it
        pickle.dump(model_name, model_save)

In this example, we run the `save_model` function, and it saves the model on the local machine under the name `model_save.pkl`, as shown.

In [None]:
model_saved_name = 'model_save.pkl'
save_model(model, model_saved_name)
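
For comparison with `pickle`, here is a minimal sketch of the JSON alternative mentioned earlier. It assumes the model can be described by a few plain numbers. The parameter names and values are hypothetical, and the file is written to a temporary directory.

```python
import json
import os
import tempfile

# Hypothetical parameters of a simple linear model (illustrative values only)
model_params = {"coefficients": [0.25, -0.1, 0.6], "intercept": 0.05}

# Write the parameters to a human-readable JSON file
json_path = os.path.join(tempfile.mkdtemp(), "model_params.json")
with open(json_path, "w") as json_file:
    json.dump(model_params, json_file, indent=2)

# Read them back; the model object must then be rebuilt from these values
with open(json_path) as json_file:
    loaded_params = json.load(json_file)

print(loaded_params == model_params)
```

Unlike a pickled file, the JSON file holds only the values, so the code that turns them back into a working model has to be written separately.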



---


#### Load a Model

Once you have saved the model, you can access the model on your local machine by using the `load_model` function. This function takes the name of the pickled model as its input and loads that model. 

In the simulation, we will be loading the trained model at every data point.

In [None]:
def load_model(model_saved_name):
    # Open the file containing the model with the mentioned name
    # on the local machine
    with open(model_saved_name, 'rb') as file:
        # Load the model and assign it to a variable
        model = pickle.load(file)
        # Return the model
        return model
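
Saving and loading can be checked together with a quick round trip. The sketch below repeats both functions so that it runs on its own. It assumes a plain dictionary as a stand-in for the trained model and uses a temporary directory so that no existing file is overwritten.

```python
import os
import pickle
import tempfile

def save_model(model_name, model_saved_name):
    # Write the model to a binary file
    with open(model_saved_name, 'wb') as model_save:
        pickle.dump(model_name, model_save)

def load_model(model_saved_name):
    # Read the model back from the binary file
    with open(model_saved_name, 'rb') as file:
        return pickle.load(file)

# Hypothetical stand-in for a trained model
model = {"threshold": 0.5, "features": ["returns", "volatility"]}

# Save into a temporary directory
model_saved_name = os.path.join(tempfile.mkdtemp(), "model_save.pkl")
save_model(model, model_saved_name)

# Load it back and compare with the original object
loaded_model = load_model(model_saved_name)
print(loaded_model == model)
```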



---


#### How to handle the data?

To train machine learning-based trading models, we require a lot of data. Downloading data from an online source every time you want to train a model takes a lot of time. To avoid this, save the old data that you used to train the initial model on your local machine, and add the new data to this file at the end of every trading day. The new data can be appended to the existing data using the pandas `concat` function, since the `DataFrame.append` method has been removed from recent pandas versions. The code below assumes that `Old_data` and `current_day_OHLC` are both DataFrames with the same columns.

In [None]:
import pandas as pd
Updated_data = pd.concat([Old_data, current_day_OHLC], ignore_index=True)
Updated_data.to_csv("Data.csv")
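
Rewriting the whole file every day is not the only option. As an alternative sketch, which is not the pandas approach used above, the standard `csv` module can add one row to the end of an existing file in append mode. The column names, dates and prices here are hypothetical, and the file lives in a temporary directory.

```python
import csv
import os
import tempfile

columns = ["Date", "Open", "High", "Low", "Close"]
data_path = os.path.join(tempfile.mkdtemp(), "Data.csv")

# Create a small file that plays the role of the saved historical data
with open(data_path, "w", newline="") as data_file:
    writer = csv.DictWriter(data_file, fieldnames=columns)
    writer.writeheader()
    writer.writerow({"Date": "2023-01-02", "Open": 100.0, "High": 102.0,
                     "Low": 99.5, "Close": 101.0})

# Hypothetical OHLC row for the trading day that has just ended
current_day_ohlc = {"Date": "2023-01-03", "Open": 101.0, "High": 103.5,
                    "Low": 100.5, "Close": 103.0}

# Mode 'a' adds the row at the end without rewriting the existing rows
with open(data_path, "a", newline="") as data_file:
    csv.DictWriter(data_file, fieldnames=columns).writerow(current_day_ohlc)

# Read the file back to confirm that both rows are present
with open(data_path, newline="") as data_file:
    rows = list(csv.DictReader(data_file))

print(len(rows))
```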

---



#### When do you retrain the model?

We need to retrain a model whenever its performance deteriorates.

You can decide when to retrain a model based on its performance metrics such as:

#### 1. Capital Loss

Let us say that you want to retrain a model based on its capital loss. Then you need to track the profit and loss (or PnL) of the strategy at every time period, such as every day or every minute.

If the PnL falls below a certain limit, then you will retrain it.

For example, suppose the model initially made a profit of 100 dollars and then lost 5 dollars, and a loss of 5 dollars is the cutoff criterion in this case.

After the cutoff criterion is triggered, we will stop trading and then retrain the model.

This cutoff criterion is decided by the trader, depending on his or her own risk appetite.
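
The capital-loss rule can be sketched as a small function. This is only one possible reading of the rule: it assumes the loss is measured from the highest cumulative PnL reached so far. The function name `should_retrain_on_loss` and the PnL figures are hypothetical.

```python
def should_retrain_on_loss(period_pnl, max_loss):
    """Return True once cumulative PnL has dropped by max_loss from its peak."""
    cumulative_pnl = 0.0
    peak_pnl = 0.0
    for pnl in period_pnl:
        cumulative_pnl += pnl
        # Remember the best cumulative PnL seen so far
        peak_pnl = max(peak_pnl, cumulative_pnl)
        # Trigger when the fall from the peak reaches the cutoff
        if peak_pnl - cumulative_pnl >= max_loss:
            return True
    return False

# Hypothetical daily PnL in dollars: the strategy gains 100, then loses 5
daily_pnl = [40.0, 60.0, -2.0, -3.0]
retrain_after_loss = should_retrain_on_loss(daily_pnl, max_loss=5.0)

# A loss of 3 dollars from the peak does not reach the 5-dollar cutoff
retrain_after_small_loss = should_retrain_on_loss([40.0, 60.0, -3.0], max_loss=5.0)

print(retrain_after_loss, retrain_after_small_loss)
```

A trader with a different risk appetite would only need to change `max_loss`.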

#### 2. Accuracy

This is another criterion that can be used to decide whether to retrain a model or not.

Let us say that you have set 55% accuracy as the criterion for retraining a model.

Whenever the model’s accuracy falls below the 55% mark, you retrain it.
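
A minimal sketch of the accuracy rule follows. It assumes accuracy is measured over the most recent `window` predictions, which is a choice the text leaves open. The function names and the signal values are hypothetical.

```python
def rolling_accuracy(predictions, actuals, window):
    """Share of correct predictions among the most recent `window` pairs."""
    recent_pairs = list(zip(predictions, actuals))[-window:]
    if not recent_pairs:
        raise ValueError("At least one prediction is required")
    correct = sum(1 for predicted, actual in recent_pairs if predicted == actual)
    return correct / len(recent_pairs)

def should_retrain_on_accuracy(predictions, actuals, window, min_accuracy=0.55):
    """Return True when recent accuracy falls below the chosen threshold."""
    return rolling_accuracy(predictions, actuals, window) < min_accuracy

# Hypothetical signals: 1 = up move, 0 = down move
predictions = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
actuals = [1, 0, 0, 1, 1, 0, 0, 1, 1, 1]

# Over all 10 points, 6 predictions are correct
retrain_long_window = should_retrain_on_accuracy(predictions, actuals, window=10)

# Over the last 4 points, only 2 predictions are correct
retrain_short_window = should_retrain_on_accuracy(predictions, actuals, window=4)

print(retrain_long_window, retrain_short_window)
```

The same data can give different decisions for different window lengths, so the window is worth choosing as carefully as the threshold itself.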

In addition to these two approaches, you can retrain your model as often as possible, regardless of the model’s performance. This will create a model that is trained on the latest available data at all times. However, make sure your model is not overfitted.

When you want to retrain a model, you need to perform many tasks such as creating the features, training the model and saving it.

To do these multiple tasks, we created a simple function called `create_new_model`. This function takes the raw data and the saved name of the model as input.

In [None]:
def create_new_model(data, model_saved_name):
    # Create the features and the target from the raw data
    X, y = create_features(data)
    # Train the model on the features generated
    model = train_model(X, y)
    # Save the model on the local machine
    save_model(model, model_saved_name)
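
To see the retraining steps run end to end, the sketch below fills in the helpers with toy versions. The bodies of `create_features` and `train_model` are hypothetical stand-ins written only for this illustration, not the functions used elsewhere in the book. The prices are made up as well.

```python
import os
import pickle
import tempfile

def create_features(data):
    # Hypothetical stand-in: the feature is the current return,
    # the target is 1 if the next return is positive
    returns = [data[i] / data[i - 1] - 1 for i in range(1, len(data))]
    X = [[r] for r in returns[:-1]]
    y = [1 if r > 0 else 0 for r in returns[1:]]
    return X, y

def train_model(X, y):
    # Hypothetical stand-in: a "model" that stores the most frequent class
    majority_class = 1 if sum(y) * 2 >= len(y) else 0
    return {"majority_class": majority_class, "n_samples": len(y)}

def save_model(model_name, model_saved_name):
    with open(model_saved_name, 'wb') as model_save:
        pickle.dump(model_name, model_save)

def create_new_model(data, model_saved_name):
    X, y = create_features(data)
    model = train_model(X, y)
    save_model(model, model_saved_name)

# Hypothetical closing prices
close_prices = [100.0, 101.0, 100.5, 102.0, 103.0, 102.5]
model_saved_name = os.path.join(tempfile.mkdtemp(), "model_save.pkl")
create_new_model(close_prices, model_saved_name)

# Load the freshly saved model to confirm that the retraining step worked
with open(model_saved_name, 'rb') as file:
    retrained_model = pickle.load(file)
print(retrained_model["n_samples"])
```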

In this way, you can make sure that your machine learning model keeps performing according to your expectations. Remember, there might be occasions when your model's performance starts deteriorating. Do not hesitate to pause your trading until you have modified the strategy to perform as per your expectations.

Great! We have finally implemented a machine learning model from start to end.

So far, you have studied the classification-based machine learning model. This is a type of supervised learning algorithm. This brings us to the end of the second part of the book. In the next part, you will see other types of machine learning algorithms.



---



#### Additional Reading

1. A Practical Guide to Feature Engineering in Python - https://heartbeat.fritz.ai/a-practical-guide-to-feature-engineering-in-python-8326e40747c8
2. Data & Feature Engineering for Trading [Course] - https://quantra.quantinsti.com/course/data-and-feature-engineering-for-trading
3. Best Input for Financial Models - https://davidzhao12.medium.com/advances-in-financial-machine-learning-for-dummies-part-1-7913aa7226f5
4. Top 9 Feature Engineering Techniques with Python - https://rubikscode.net/2021/06/29/top-9-feature-engineering-techniques/
5. Data Labelling: The Triple-barrier Method - https://towardsdatascience.com/the-triple-barrier-method-251268419dcd
6. How to Use StandardScaler and MinMaxScaler Transforms in Python - https://machinelearningmastery.com/standardscaler-and-minmaxscaler-transforms-in-python/
7. What is the ideal ratio of in-sample length to out-of-sample length? - https://quant.stackexchange.com/questions/1480/what-is-the-ideal-ratio-of-in-sample-length-to-out-of-sample-length
8. Data normalization before or after train-test split? - https://datascience.stackexchange.com/questions/54908/data-normalization-before-or-after-train-test-split
9. Cross Validation In Machine Learning Trading Models - https://blog.quantinsti.com/cross-validation-machine-learning-trading-models/
10. Cross Validation in Finance: Purging, Embargoing, Combination - https://blog.quantinsti.com/cross-validation-embargo-purging-combinatorial/
11. How to Choose Right Metric for Evaluating ML Model - https://www.kaggle.com/vipulgandhi/how-to-choose-right-metric-for-evaluating-ml-model
12. Choosing the Right Metric for Evaluating Machine Learning Models — Part 2 - https://www.kdnuggets.com/2018/06/right-metric-evaluating-machine-learning-models-2.html
13. How do I evaluate models that predict stock performance? - https://quant.stackexchange.com/questions/33074/strategy-for-backtesting
14. What is an acceptable Sharpe Ratio for a prop desk? - https://quant.stackexchange.com/questions/21120/what-is-an-acceptable-sharpe-ratio-for-a-prop-desk/21123#21123
15. Doing opposite of what the model says - https://quant.stackexchange.com/questions/35905/doing-opposite-of-what-the-model-says/35906#35906
16. Should a model be re-trained if new observations are available? - https://datascience.stackexchange.com/questions/12761/should-a-model-be-re-trained-if-new-observations-are-available