# Exercise Instructions: Panel Data Modeling with Machine Learning Models

**Objective:**
The goal of this exercise is to practice panel data modeling skills using three machine learning models (Random Forest, Single Decision Tree, and Linear Regression with Elastic Net) that have not been utilized in the project so far. Completing the entire task or a significant portion during the class will earn you an additional 7 points (above what is outlined in the syllabus) towards your final grade.

**Tasks:**

1. **GitHub Setup:**
   - If you haven't done so already, [create](https://github.com/join) a GitHub account.
   - [Download](https://desktop.github.com) and [configure](https://docs.github.com/en/desktop/configuring-and-customizing-github-desktop/configuring-basic-settings-in-github-desktop) GitHub Desktop on your laptop. (A good introduction to the GitHub Desktop app is available [here](https://joshuadull.github.io/GitHub-Desktop/02-getting-started/index.html).) If you prefer the git command line, follow these [instructions](https://github.com/michaelwozniak/ml2_tools?tab=readme-ov-file#git).
2. **Repository Forking:**
   - [Fork](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo) the following repository to your projects: [https://github.com/michaelwozniak/ML-in-Finance-I-case-study-forecasting-tax-avoidance-rates](https://github.com/michaelwozniak/ML-in-Finance-I-case-study-forecasting-tax-avoidance-rates)

3. **Repository Cloning:**
   - [Clone](https://docs.github.com/en/desktop/adding-and-cloning-repositories/cloning-a-repository-from-github-to-github-desktop) the forked repository to your local computer using GitHub Desktop.

4. **Notebook Exploration:**
   - Open the file `notebooks/10.exercise.ipynb` to begin the ML tasks.

5. **Model Creation:**

   In the file `notebooks/10.exercise.ipynb`:
   - Create the following models:
      1. Random Forest ([RandomForestRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html))
      2. Decision Tree ([DecisionTreeRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html))
      3. Linear Regression with Elastic Net ([ElasticNet](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html))
   
   Follow a similar process to the models presented in class (e.g., KNN - `notebooks/07.knn-model.ipynb`):
      - Load the prepared training data.
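      - Perform feature engineering if deemed necessary (note: the tree-based models, Random Forest and Decision Tree, do not require data standardization, unlike SVM and KNN; Elastic Net's penalty is scale-sensitive, so standardizing its inputs is usually advisable).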
      - Conduct feature selection.
      - Perform hyperparameter tuning.
      - Identify a local champion for each model class (the best model for RF, DT, Elastic Net).
      - Save local champions to a pickle file.
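
The steps above can be sketched end to end on synthetic data. This is a minimal illustration rather than the course solution: the data comes from `make_regression`, the grids are deliberately tiny, and the names `candidates` and `local_champions` are hypothetical.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the prepared training data (assumption)
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# One (estimator, grid) pair per model class; the grids are illustrative only
candidates = {
    "rf": (RandomForestRegressor(n_estimators=30, random_state=0),
           {"max_depth": [3, None], "min_samples_split": [2, 10]}),
    "dt": (DecisionTreeRegressor(random_state=0),
           {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5]}),
    "en": (ElasticNet(max_iter=10_000),
           {"alpha": [0.01, 0.1, 1.0], "l1_ratio": [0.2, 0.5, 0.8]}),
}

local_champions = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=3, scoring="neg_root_mean_squared_error")
    search.fit(X, y)
    # best_estimator_ is refit on all of X, y with the winning parameters
    local_champions[name] = search.best_estimator_
    print(f"{name}: CV RMSE = {-search.best_score_:.2f}, params = {search.best_params_}")
```

Each entry of `local_champions` is a fitted estimator, so it can be pickled directly. Plain k-fold cross-validation is used here only for brevity; it ignores the time dimension of a panel.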

6. **Model Evaluation:**
   - In the notebook `notebooks/09.final-comparison-and-summary.ipynb`, load the models you created and check if they outperform the previously used models.
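
A hedged sketch of the comparison step: pickled models are loaded from a folder and ranked by RMSE on a common hold-out set. The folder layout, file names, and synthetic data are assumptions for illustration; the actual comparison notebook may organise this differently.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=6, noise=5.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

# Hypothetical model folder; a temporary directory keeps the sketch self-contained
models_dir = Path(tempfile.mkdtemp())
demo_models = {
    "dt": DecisionTreeRegressor(max_depth=4, random_state=1),
    "en": ElasticNet(alpha=0.1),
}
for name, model in demo_models.items():
    with open(models_dir / f"{name}.pkl", "wb") as f:
        pickle.dump(model.fit(X_train, y_train), f)

# Load every pickled model and score it on the same test set
rmse = {}
for path in sorted(models_dir.glob("*.pkl")):
    with open(path, "rb") as f:
        loaded = pickle.load(f)
    rmse[path.stem] = float(np.sqrt(mean_squared_error(y_test, loaded.predict(X_test))))

# Lowest RMSE first
ranking = sorted(rmse.items(), key=lambda item: item[1])
for name, value in ranking:
    print(f"{name}: RMSE = {value:.3f}")
```

Scoring all models on one shared test set keeps the comparison fair; only unpickle files you created yourself, since `pickle.load` can execute arbitrary code.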

7. **Version Control:**
   - At the end of the class, even if the tasks are incomplete, [commit](https://docs.github.com/en/desktop/making-changes-in-a-branch/committing-and-reviewing-changes-to-your-project-in-github-desktop) your changes using GitHub Desktop.
   - [Push](https://docs.github.com/en/desktop/making-changes-in-a-branch/pushing-changes-to-github-from-github-desktop) your changes to your remote GitHub repository.

8. **Submission:**
   - Send me the link to your GitHub project (my email: *mj.wozniak9@uw.edu.pl*).

Good luck with the exercise! If you have any questions, feel free to ask.

In [7]:
pip install pandas

Collecting pandas
Note: you may need to restart the kernel to use updated packages.

  Downloading pandas-2.2.0-cp310-cp310-win_amd64.whl (11.6 MB)
     --------------------------------------- 11.6/11.6 MB 38.5 MB/s eta 0:00:00
Collecting pytz>=2020.1
  Downloading pytz-2024.1-py2.py3-none-any.whl (505 kB)
     ------------------------------------- 505.5/505.5 KB 31.0 MB/s eta 0:00:00
Collecting numpy<2,>=1.22.4
  Downloading numpy-1.26.3-cp310-cp310-win_amd64.whl (15.8 MB)
     --------------------------------------- 15.8/15.8 MB 54.7 MB/s eta 0:00:00
Collecting tzdata>=2022.7
  Downloading tzdata-2023.4-py2.py3-none-any.whl (346 kB)
     ------------------------------------- 346.6/346.6 KB 22.4 MB/s eta 0:00:00
Installing collected packages: pytz, tzdata, numpy, pandas
Successfully installed numpy-1.26.3 pandas-2.2.0 pytz-2024.1 tzdata-2023.4


You should consider upgrading via the 'c:\Users\Oybek\AppData\Local\Programs\Python\Python310\python.exe -m pip install --upgrade pip' command.


In [8]:
import pandas as pd

Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
        
  import pandas as pd


In [11]:
pip install scikit-learn

Collecting scikit-learn
  Downloading scikit_learn-1.4.0-1-cp310-cp310-win_amd64.whl (10.6 MB)
     --------------------------------------- 10.6/10.6 MB 25.2 MB/s eta 0:00:00
Collecting scipy>=1.6.0
  Downloading scipy-1.12.0-cp310-cp310-win_amd64.whl (46.2 MB)
     --------------------------------------- 46.2/46.2 MB 32.8 MB/s eta 0:00:00
Collecting joblib>=1.2.0
  Downloading joblib-1.3.2-py3-none-any.whl (302 kB)
     -------------------------------------- 302.2/302.2 KB 9.4 MB/s eta 0:00:00
Collecting threadpoolctl>=2.0.0
  Downloading threadpoolctl-3.2.0-py3-none-any.whl (15 kB)
Installing collected packages: threadpoolctl, scipy, joblib, scikit-learn
Successfully installed joblib-1.3.2 scikit-learn-1.4.0 scipy-1.12.0 threadpoolctl-3.2.0
Note: you may need to restart the kernel to use updated packages.




In [62]:
pip install matplotlib

Collecting matplotlib
  Downloading matplotlib-3.8.2-cp310-cp310-win_amd64.whl (7.6 MB)
     ---------------------------------------- 7.6/7.6 MB 12.9 MB/s eta 0:00:00
Collecting pyparsing>=2.3.1
  Downloading pyparsing-3.1.1-py3-none-any.whl (103 kB)
     -------------------------------------- 103.1/103.1 KB 6.2 MB/s eta 0:00:00
Collecting contourpy>=1.0.1
  Downloading contourpy-1.2.0-cp310-cp310-win_amd64.whl (186 kB)
     ------------------------------------- 186.7/186.7 KB 11.8 MB/s eta 0:00:00
Collecting kiwisolver>=1.3.1
  Downloading kiwisolver-1.4.5-cp310-cp310-win_amd64.whl (56 kB)
     ---------------------------------------- 56.1/56.1 KB 3.1 MB/s eta 0:00:00
Collecting pillow>=8
  Downloading pillow-10.2.0-cp310-cp310-win_amd64.whl (2.6 MB)
     ---------------------------------------- 2.6/2.6 MB 41.4 MB/s eta 0:00:00
Collecting fonttools>=4.22.0
  Downloading fonttools-4.47.2-cp310-cp310-win_amd64.whl (2.2 MB)
     ---------------------------------------- 2.2/2.2 MB 69.9 MB



In [64]:
pip install seaborn

Collecting seaborn
  Downloading seaborn-0.13.2-py3-none-any.whl (294 kB)
     ------------------------------------ 294.9/294.9 KB 520.4 kB/s eta 0:00:00
Installing collected packages: seaborn
Successfully installed seaborn-0.13.2
Note: you may need to restart the kernel to use updated packages.




In [66]:
pip install statsmodels

Collecting statsmodels
  Downloading statsmodels-0.14.1-cp310-cp310-win_amd64.whl (9.8 MB)
     ---------------------------------------- 9.8/9.8 MB 8.0 MB/s eta 0:00:00
Collecting patsy>=0.5.4
  Downloading patsy-0.5.6-py2.py3-none-any.whl (233 kB)
     ------------------------------------- 233.9/233.9 KB 14.0 MB/s eta 0:00:00
Installing collected packages: patsy, statsmodels
Successfully installed patsy-0.5.6 statsmodels-0.14.1
Note: you may need to restart the kernel to use updated packages.




In [67]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.formula.api import ols
import scipy.stats as stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from pathlib import Path

pd.set_option("display.max_columns", 500)

# np.random.seed(1916) #uncomment if you want your code to be reproducible; for the purposes of our activity, let's add some randomness to the results

In [68]:
raw_input_data_path = "../data/input"
preprocessed_output_data_path = "../data/output"

In [69]:
df = pd.read_stata(f"{raw_input_data_path}/tax_avoidance.dta")

In [70]:
df.sample(10)

Unnamed: 0,index,Ticker,Nazwa2,sektor,rok,gielda,ta,txt,pi,str,xrd,ni,ppent,intant,dlc,dltt,capex,revenue,cce,adv,etr,diff,roa,lev,intan,rd,ppe,sale,cash_holdings,adv_expenditure,capex2,cfc,dta,capex2_scaled,firm_id,firma_id,rok2005,rok2006,rok2007,rok2008,rok2009,rok2010,rok2011,rok2012,rok2013,rok2014,rok2015,rok2016,rok2017,industry,industry1,capex1,roa1,country1,country2,country3,country4,country5,industry11,industry12,industry13,industry14,industry15,industry16,industry17,industry18,industry19,industry20,diff1,diff2,diff3,_est_random,_est_fixed
734,2710,CBK GY Equity,CBK GY Equity,consumer discretionary,2011,2,661763.0,-240.0,507.0,0.2937,0.0,638.0,1608.0,3038.0,157224.0,55321.0,296.0,21026.0,6075.0,0.0,0.0,0.767073,0.000964,0.32118,0.004591,0.0,0.00243,0.031278,0.00918,0.0,0.18408,1,0,0.000171,CBK GY Equity,CBK GY Equity,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,consumer discretionary,consumer discretionary,5.693732,0.000964,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.767073,0.767073,0.767073,1,1
221,142,AMB PW Equity,Ambra SA,consumer staples,2005,1,247.315002,6.075,25.191,0.19,0.0,15.561,43.807999,10.981,37.894001,50.742001,12.894,370.747986,5.224,0.0,0.241158,-0.051158,0.06292,0.358393,0.044401,0.0,0.177134,0.915928,0.021123,0.0,0.29433,0,0,0.000273,Ambra SA,Ambra SA,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,consumer staples,consumer staples,2.631457,0.06292,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.051158,-0.051158,-0.051158,1,1
2268,1124,KER PW Equity,Kernel Holding SA,consumer staples,2011,1,1572.609009,-17.629,208.417999,0.23,0.0,226.272003,502.752014,151.552002,156.057007,271.444,50.271999,1899.118042,115.897003,0.0,0.0,0.314585,0.143883,0.271842,0.09637,0.0,0.319693,0.791916,0.073697,0.0,0.099994,0,0,9.3e-05,Kernel Holding SA,Kernel Holding SA,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,consumer staples,consumer staples,3.937145,0.143883,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.314585,0.314585,0.314585,1,1
277,4337,MT NA Equity,ArcelorMittal,materials,2009,4,127697.0,-4432.0,-4261.0,0.25,253.0,157.0,60385.0,17034.0,20677.0,4135.0,2709.0,61021.0,5919.0,0.0,1.0,-0.790131,0.001229,0.194304,0.133394,0.001981,0.472877,0.390594,0.046352,0.0,0.044862,1,1,4.2e-05,ArcelorMittal,ArcelorMittal,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,materials,materials,7.904704,0.001229,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,-0.790131,-0.790131,-0.790131,1,1
1627,2922,G1A GY Equity,GEA Group AG,industrials,2007,2,4747.953125,64.417999,301.588013,0.295544,72.385002,282.399994,486.036987,1395.519043,20.874001,218.897995,129.281998,4855.970215,275.825012,0.0,0.213596,0.081948,0.059478,0.0505,0.29392,0.015246,0.102368,0.704458,0.058093,0.0,0.265992,1,0,0.000247,GEA Group AG,GEA Group AG,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,industrials,industrials,4.869701,0.059478,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.081948,0.081948,0.081948,1,1
905,3527,CCH LN Equity,Coca-Cola HBC AG,consumer staples,2013,3,7274.799805,72.900002,294.100006,0.23,221.199997,2901.899902,0.0,1921.300049,1853.599976,346.200012,380.200012,6874.0,737.5,0.0,0.247875,-0.017875,0.398898,0.302386,0.264103,0.030406,0.0,0.665214,0.101377,0.0,0.0,1,1,0.0,Coca-Cola HBC AG,Coca-Cola HBC AG,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,consumer staples,consumer staples,5.943324,0.398898,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.017875,-0.017875,-0.017875,1,1
3966,2182,SPH PW Equity,Sopharma AD/Sofia,health care,2006,1,346.31601,5.471,29.356001,0.19,0.0,28.216999,113.152,9.034,35.055,65.364998,20.818001,197.839996,13.938,0.0,0.186367,0.003633,0.081478,0.289966,0.026086,0.0,0.32673,0.451884,0.040246,0.0,0.183983,0,0,0.000171,Sopharma AD/Sofia,Sopharma AD/Sofia,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,health care,health care,3.082735,0.081478,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.003633,0.003633,0.003633,1,1
1784,3012,HOT GY Equity,HOCHTIEF AG,industrials,2008,2,12064.064453,173.042007,496.924011,0.2951,4.987,156.744003,1120.392944,482.660004,1678.463989,1248.352051,645.492004,18703.134766,1787.713013,0.0,0.348226,-0.053126,0.012993,0.242606,0.040008,0.000413,0.09287,0.936218,0.148185,0.0,0.57613,1,0,0.000535,HOCHTIEF AG,HOCHTIEF AG,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,industrials,industrials,6.471561,0.012993,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,-0.053126,-0.053126,-0.053126,1,1
2792,1497,MON PW Equity,Monnari Trade SA,consumer discretionary,2015,1,194.964996,-15.522,30.785,0.19,0.0,46.306999,38.082001,0.525,0.0,0.0,9.608,213.695999,49.105999,0.0,0.0,0.694207,0.237514,0.0,0.002693,0.0,0.195327,0.740066,0.251871,0.0,0.252298,1,0,0.000234,Monnari Trade SA,Monnari Trade SA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,consumer discretionary,consumer discretionary,2.361609,0.237514,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.694207,0.694207,0.694207,1,1
198,295,AWM PW Equity,Airway Medix SA,health care,2008,1,1180.783569,-0.1786,-0.465017,0.19,0.0,-0.316183,0.086833,2.591833,0.5066,0.0,0.0166,0.0095,5.6189,0.0,0.384072,-0.194072,-0.000268,0.000429,0.002195,0.0,7.4e-05,8e-06,0.004759,0.0,0.191171,0,0,0.000177,Airway Medix SA,Airway Medix SA,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,health care,health care,0.016464,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,-0.194072,-0.194072,-0.194072,1,1


In [71]:
print(df.columns)

Index(['index', 'Ticker', 'Nazwa2', 'sektor', 'rok', 'gielda', 'ta', 'txt',
       'pi', 'str', 'xrd', 'ni', 'ppent', 'intant', 'dlc', 'dltt', 'capex',
       'revenue', 'cce', 'adv', 'etr', 'diff', 'roa', 'lev', 'intan', 'rd',
       'ppe', 'sale', 'cash_holdings', 'adv_expenditure', 'capex2', 'cfc',
       'dta', 'capex2_scaled', 'firm_id', 'firma_id', 'rok2005', 'rok2006',
       'rok2007', 'rok2008', 'rok2009', 'rok2010', 'rok2011', 'rok2012',
       'rok2013', 'rok2014', 'rok2015', 'rok2016', 'rok2017', 'industry',
       'industry1', 'capex1', 'roa1', 'country1', 'country2', 'country3',
       'country4', 'country5', 'industry11', 'industry12', 'industry13',
       'industry14', 'industry15', 'industry16', 'industry17', 'industry18',
       'industry19', 'industry20', 'diff1', 'diff2', 'diff3', '_est_random',
       '_est_fixed'],
      dtype='object')


In [72]:
print(df.head())

   index         Ticker            Nazwa2                  sektor   rok  \
0   2781  DRI GY Equity  1&1 Drillisch AG  consumer discretionary  2005   
1   2780  DRI GY Equity  1&1 Drillisch AG  consumer discretionary  2006   
2   2779  DRI GY Equity  1&1 Drillisch AG  consumer discretionary  2007   
3   2778  DRI GY Equity  1&1 Drillisch AG  consumer discretionary  2008   
4   2777  DRI GY Equity  1&1 Drillisch AG  consumer discretionary  2009   

   gielda          ta     txt          pi       str  xrd          ni  ppent  \
0       2  110.718002  10.616   25.056000  0.295544  0.0   14.440000  1.801   
1       2  250.901993  10.866   28.056999  0.295544  0.0   17.191000  2.005   
2       2  385.980988   3.377   27.707001  0.295544  0.0   24.330000  1.934   
3       2  182.259003  11.702 -172.373001  0.295544  0.0 -184.074997  1.723   
4       2  305.266998   9.687  110.886002  0.294400  0.0  101.123001  1.274   

      intant        dlc       dltt       capex     revenue        cce    a

In [73]:
print('etr' in df.columns)

True


In [85]:
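# Handling missing values
# X_train and y_train are assumed to come from a train/test split run in an earlier cell not shown here
X_train = X_train.fillna(0)  # Replace NaN values with zeros (reassigned rather than modified in place)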


In [86]:
# Selecting top features using RandomForestRegressor
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor()
rf.fit(X_train, y_train)

# Get feature importances
feature_importances = pd.Series(rf.feature_importances_, index=X_train.columns)
top_features = feature_importances.nlargest(10)  # Select top 10 features

# Use top features for further modeling
X_train_selected = X_train[top_features.index]


In [87]:
# Hyperparameter tuning for RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10],
}

grid_search = GridSearchCV(RandomForestRegressor(), param_grid, cv=5)
grid_search.fit(X_train_selected, y_train)

# Get the best hyperparameters
best_params = grid_search.best_params_
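
The cell above stores `best_params_` but does not use it afterwards. As a small sketch on synthetic data (the data and the tiny grid are assumptions), the fitted search object also exposes the refitted model and its cross-validated score. `GridSearchCV` scores regressors by R² unless `scoring` is set, so `neg_mean_squared_error` is passed here to make the score comparable with an MSE.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for X_train_selected / y_train (assumption)
X_demo, y_demo = make_regression(n_samples=150, n_features=5, noise=5.0, random_state=0)

demo_search = GridSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=0),
    {"max_depth": [3, None], "min_samples_split": [2, 10]},
    cv=3,
    scoring="neg_mean_squared_error",  # higher is better, hence the negated MSE
)
demo_search.fit(X_demo, y_demo)

best_rf = demo_search.best_estimator_  # already refit on all of X_demo, y_demo
cv_mse = -demo_search.best_score_      # flip the sign back to a plain MSE
print(demo_search.best_params_, round(cv_mse, 2))
```

With the default `refit=True`, `best_estimator_` can be used for prediction or pickled without fitting it again.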


Running the models

In [98]:
# Random Forest:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

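# Assuming 'df' is your DataFrame
# Caution: in the sampled rows above, 'etr' equals txt / pi whenever it lies strictly between 0 and 1,
# so keeping 'txt' and 'pi' among the features may leak the target into the model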
features = ['ta', 'txt', 'pi', 'str', 'xrd', 'ni', 'ppent', 'intant', 'dlc', 'dltt', 'capex', 'revenue', 'cce', 'adv', 'roa', 'lev', 'intan', 'rd', 'ppe', 'sale', 'cash_holdings', 'adv_expenditure', 'capex2', 'cfc', 'dta', 'capex2_scaled']
target = 'etr'

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df[features], df[target], test_size=0.2, random_state=42)

# Creating the Random Forest Regressor model
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)

# Training the model
rf_model.fit(X_train, y_train)

# Making predictions on the test set
rf_predictions = rf_model.predict(X_test)

# Evaluating the model
mse_rf = mean_squared_error(y_test, rf_predictions)
print(f'Mean Squared Error (Random Forest): {mse_rf}')

Mean Squared Error (Random Forest): 0.005909447919445462
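
`train_test_split` shuffles rows at random, so observations of the same firm from earlier and later years can land on both sides of the split. For panel data, one common alternative is to hold out the most recent years. The sketch below shows the idea on a toy panel; the column names `firm_id` and `rok` mirror the dataset, while the cut-off year and the helper `split_by_year` are hypothetical choices.

```python
import numpy as np
import pandas as pd

def split_by_year(panel, year_col, last_train_year):
    """Return (train, test): rows up to last_train_year, and rows after it."""
    train = panel[panel[year_col] <= last_train_year]
    test = panel[panel[year_col] > last_train_year]
    return train, test

# Toy panel: 3 firms observed yearly from 2005 to 2017 (synthetic values)
rng = np.random.default_rng(0)
years = np.arange(2005, 2018)
toy = pd.DataFrame({
    "firm_id": np.repeat(["A", "B", "C"], len(years)),
    "rok": np.tile(years, 3),
    "etr": rng.uniform(0, 1, size=3 * len(years)),
})

train_df, test_df = split_by_year(toy, "rok", last_train_year=2014)
print(len(train_df), len(test_df))
```

A model evaluated this way is always tested on years it has not seen, which is closer to a forecasting setting than a random split.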


In [99]:
# Decision Tree

from sklearn.tree import DecisionTreeRegressor

# Creating the Decision Tree Regressor model
dt_model = DecisionTreeRegressor(random_state=42)

# Training the model
dt_model.fit(X_train, y_train)

# Making predictions on the test set
dt_predictions = dt_model.predict(X_test)

# Evaluating the model
mse_dt = mean_squared_error(y_test, dt_predictions)
print(f'Mean Squared Error (Decision Tree): {mse_dt}')

Mean Squared Error (Decision Tree): 0.010573671644804421


In [100]:
# Linear Regression with Elastic Net:

from sklearn.linear_model import ElasticNet

# Creating the Elastic Net Regressor model
en_model = ElasticNet(alpha=0.1, l1_ratio=0.5, random_state=42)

# Training the model
en_model.fit(X_train, y_train)

# Making predictions on the test set
en_predictions = en_model.predict(X_test)

# Evaluating the model
mse_en = mean_squared_error(y_test, en_predictions)
print(f'Mean Squared Error (Elastic Net): {mse_en}')


Mean Squared Error (Elastic Net): 0.022719660490774018
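
Elastic Net penalises coefficient size, so its fit depends on the scale of each feature, and the features here range from ratios below 1 to balance-sheet totals in the hundreds of thousands. Below is a minimal sketch, on synthetic data, of wrapping the model in a pipeline so that scaling is learned from the training rows only; the parameter values are placeholders rather than tuned choices.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (assumption, for illustration)
rng = np.random.default_rng(0)
X_demo = np.column_stack([
    rng.normal(0, 1e5, size=200),  # e.g. a balance-sheet total
    rng.normal(0, 0.1, size=200),  # e.g. a ratio
])
y_demo = 1e-5 * X_demo[:, 0] + 2.0 * X_demo[:, 1] + rng.normal(0, 0.05, size=200)

# The scaler is fit inside the pipeline, so held-out rows never influence the scaling
en_pipeline = make_pipeline(StandardScaler(), ElasticNet(alpha=0.01, l1_ratio=0.5))
en_pipeline.fit(X_demo[:150], y_demo[:150])

# Coefficients are expressed per standard deviation of each feature
print(en_pipeline.named_steps["elasticnet"].coef_)
print(en_pipeline.score(X_demo[150:], y_demo[150:]))
```

The whole pipeline can be passed to `GridSearchCV` or pickled as a single object, so the scaler travels with the model.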


Identify Local Champions

In [104]:
# Compare test MSE across the three models and identify the local champion
mse_by_model = {'Random Forest': mse_rf, 'Decision Tree': mse_dt, 'Elastic Net': mse_en}
champion_model = min(mse_by_model, key=mse_by_model.get)  # name of the model with the lowest MSE
min_mse = mse_by_model[champion_model]

print(f'Local Champion: {champion_model} with MSE {min_mse}')


Local Champion: Random Forest with MSE 0.005909447919445462


Save local champions to a pickle file.

In [106]:
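import pickle
# champion_model holds only the model's name, so look up the fitted estimator before pickling
fitted_models = {'Random Forest': rf_model, 'Decision Tree': dt_model, 'Elastic Net': en_model}
# Save local champion to a pickle file
with open('local_champion_model.pkl', 'wb') as file:
    pickle.dump(fitted_models[champion_model], file)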
