# Setup

In [1]:
%pip install -q modelscan
!modelscan -v

Note: you may need to restart the kernel to use updated packages.
modelscan, version 0.0.0


In [2]:
%pip install -q xgboost==2.0.3
%pip install -U -q scikit-learn==1.4.2

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [3]:
import pickle
from pathlib import Path
import os
import numpy as np
from utils.pickle_codeinjection import generate_unsafe_file
from utils.xgboost_diabetes_model import train_model, get_predictions

# Save an XGBoost Model

The model is trained on the [PIMA Indians diabetes dataset](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database) and predicts whether a person has diabetes. The trained model is saved at ```./XGBoostModels/safe_model.pkl```.

In [4]:
model_directory = os.path.join(os.getcwd(), "XGBoostModels")
os.makedirs(model_directory, exist_ok=True)

safe_model_path_pickle = os.path.join(model_directory, "safe_model.pkl")
model = train_model()
with open(safe_model_path_pickle, "wb") as fo:
    pickle.dump(model, fo)

# Predict using Safe Model

In [5]:
number_of_predictions = 3
get_predictions(number_of_predictions, model)

The model predicts: [0, 1, 1]
The true labels are: [0. 1. 1.]


# Scan the Safe Model

The scan results list the files scanned and any issues found. For the safe model, modelscan finds no code injection, as expected.

In [6]:
!modelscan -p XGBoostModels/safe_model.pkl

No settings file detected at /workspaces/modelscan/notebooks/modelscan-settings.toml. Using defaults. 

Scanning /workspaces/modelscan/notebooks/XGBoostModels/safe_model.pkl using modelscan.scanners.PickleUnsafeOpScan model scan

--- Summary ---

 No issues found! 🎉

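The same scan can be driven from Python when several model files need checking. The sketch below only wraps the `modelscan -p` command used above; `scan_commands` and `scan_directory` are hypothetical helper names, not part of modelscan, and the `*.pkl` pattern is an assumption about how the models are named.

```python
import shutil
import subprocess
from pathlib import Path


def scan_commands(model_dir, pattern="*.pkl"):
    """Build one `modelscan -p <file>` command per matching file in model_dir."""
    paths = sorted(Path(model_dir).glob(pattern))
    return [["modelscan", "-p", str(path)] for path in paths]


def scan_directory(model_dir):
    """Run modelscan on each matching file and return {file path: console output}."""
    if shutil.which("modelscan") is None:
        raise FileNotFoundError("modelscan executable not found on PATH")
    results = {}
    for command in scan_commands(model_dir):
        # check=False: collect the output of every scan instead of stopping early
        completed = subprocess.run(command, capture_output=True, text=True, check=False)
        results[command[-1]] = completed.stdout
    return results
```

Calling `scan_directory("XGBoostModels")` would then return the console report for each pickle file in that folder.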

# Model Serialization Attack

Here, code that reads AWS secret keys is injected into the safe model. The unsafe model is saved at ```./XGBoostModels/unsafe_model.pkl```.

In [7]:
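# Operator to inject: "system" runs a shell command when the pickle is loaded
command = "system"
# Shell command passed to the operator: print the AWS secrets file
malicious_code = """cat ~/.aws/secrets
    """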

In [8]:
# Reload the safe model from disk, then inject the malicious code into it
with open(safe_model_path_pickle, "rb") as fo:
    safe_model_pickle = pickle.load(fo)

unsafe_model_path = os.path.join(model_directory, "unsafe_model.pkl")
generate_unsafe_file(safe_model_pickle, command, malicious_code, unsafe_model_path)

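The internals of `generate_unsafe_file` are not shown in this notebook. One common way a pickle can be made to run a command on load, sketched here as an assumption and not as the helper's actual implementation, is the `__reduce__` hook: it tells pickle which callable to invoke, and with which arguments, when the object is reconstructed. `Payload` is a hypothetical class, and a harmless `echo` stands in for the malicious command.

```python
import os
import pickle


class Payload:
    """Hypothetical object whose unpickling calls os.system(command)."""

    def __init__(self, command):
        self.command = command

    def __reduce__(self):
        # pickle stores "call os.system with these arguments" in place of the
        # object's state; the call happens inside pickle.load / pickle.loads.
        return (os.system, (self.command,))


blob = pickle.dumps(Payload("echo injected command ran"))

# Loading runs the shell command. The value returned is whatever os.system
# returned (an exit status), not a Payload instance.
result = pickle.loads(blob)
print(type(result).__name__)
```

The pickled bytes contain only a reference to the callable and its arguments, which is why the injected behaviour travels with the file and not with the code that loads it.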
# Predict using Unsafe Model

The malicious code executes when the model is loaded, and the AWS secret keys are displayed.

The unsafe model also predicts just as well as the safe model; that is, the code injection attack does not affect model performance. Because nothing in the predictions reveals the tampering, ML models are an effective attack vector.

In [9]:
with open(unsafe_model_path, "rb") as fo:
    unsafe_model = pickle.load(fo)

get_predictions(number_of_predictions, unsafe_model)

aws_access_key_id=AKIAIOSFODNN7EXAMPLE
aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
The model predicts: [0, 1, 1]
The true labels are: [0. 1. 1.]


# Scan the Unsafe Model

The scan results list the files scanned and any issues found. In this case, one issue of critical severity is found in the unsafe model.

modelscan also reports the operator(s) and module(s) it deems unsafe.

In [10]:
!modelscan -p XGBoostModels/unsafe_model.pkl

No settings file detected at /workspaces/modelscan/notebooks/modelscan-settings.toml. Using defaults. 

Scanning /workspaces/modelscan/notebooks/XGBoostModels/unsafe_model.pkl using modelscan.scanners.PickleUnsafeOpScan model scan

--- Summary ---

Total Issues: 1

Total Issues By Severity:

    - LOW: 0
    - MEDIUM: 0
    - HIGH: 0
    - CRITICAL: 1

--- Issues by Severity ---

--- CRITICAL ---

Unsafe operator found:
  - Severity: CRITICAL
  - Description: Use of unsafe operator 'system' from module 'posix'
  - Source: /workspaces/modelscan/notebooks/XGBoostModels/unsafe_model.pkl

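The report names the operator and module even though the scan never loads the file. As a rough illustration of how that is possible, the sketch below walks the pickle opcodes with the standard library's `pickletools` and collects the imports a pickle would perform. It is a simplified assumption-laden example, not modelscan's implementation; `find_imports` and `Payload` are hypothetical names, and the string tracking ignores memo lookups, so it only handles simple pickles.

```python
import os
import pickle
import pickletools

# Opcodes that push a text string onto the pickle stack
STRING_OPCODES = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def find_imports(data):
    """Return (module, name) pairs a pickle would import, without loading it."""
    imports = []
    strings = []  # string constants seen so far, in order
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in STRING_OPCODES:
            strings.append(arg)
        elif opcode.name == "GLOBAL":
            # Protocols 0-3: the argument is "module name" in one string
            module, name = arg.split(" ", 1)
            imports.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Protocol 4+: module and name are the two strings pushed just before
            imports.append((strings[-2], strings[-1]))
    return imports


class Payload:
    """Hypothetical object that would call os.system if it were unpickled."""

    def __reduce__(self):
        return (os.system, ("echo hello",))


# The bytes are only inspected here, never passed to pickle.loads.
blob = pickle.dumps(Payload(), protocol=4)
imports = find_imports(blob)
print(imports)
```

On Linux, `os.system` is defined in the `posix` module, which is consistent with the module named in the scan report above.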

# Reporting Format
ModelScan can report scan results to the console (default), as JSON, or in a custom format (defined by the user in the settings file). For more details, see `modelscan -h`.

## JSON Report

For JSON reporting: `modelscan -p ./path-to/file -r json -o output-file-name.json` 

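A report written with `-o` can then be checked from Python, for example to fail a pipeline step when a serious issue is present. This is a minimal sketch that assumes the file holds the same JSON object modelscan echoes to the console, with a `summary` containing `total_issues` and `total_issues_by_severity`. `summarize_report` and `has_blocking_issues` are hypothetical helpers, and the sample report is hand-written so the example runs without modelscan.

```python
import json
import tempfile
from pathlib import Path


def summarize_report(report_path):
    """Return (total_issues, issues_by_severity) from a JSON scan report."""
    report = json.loads(Path(report_path).read_text(encoding="utf-8"))
    summary = report["summary"]
    return summary["total_issues"], summary["total_issues_by_severity"]


def has_blocking_issues(report_path, blocking=("HIGH", "CRITICAL")):
    """True if the report lists at least one issue at a blocking severity."""
    _total, by_severity = summarize_report(report_path)
    return any(by_severity.get(level, 0) > 0 for level in blocking)


# Hand-written stand-in for a real report (keys mirror the console JSON output).
sample = {
    "summary": {
        "total_issues": 1,
        "total_issues_by_severity": {"LOW": 0, "MEDIUM": 0, "HIGH": 0, "CRITICAL": 1},
    }
}

with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample-report.json"
    sample_path.write_text(json.dumps(sample), encoding="utf-8")
    total, by_severity = summarize_report(sample_path)
    blocked = has_blocking_issues(sample_path)

print(total, by_severity, blocked)
```

Which severities count as blocking is a policy choice; `HIGH` and `CRITICAL` are used here only as an example default.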
In [11]:
# This saves the scan results to XGBoostModels/xgboost-model-scan-results.json
!modelscan --path XGBoostModels/unsafe_model.pkl -r json -o XGBoostModels/xgboost-model-scan-results.json

No settings file detected at /workspaces/modelscan/notebooks/modelscan-settings.toml. Using defaults. 

Scanning /workspaces/modelscan/notebooks/XGBoostModels/unsafe_model.pkl using modelscan.scanners.PickleUnsafeOpScan model scan
[1m{[0m[32m"summary"[0m: [1m{[0m[32m"total_issues_by_severity"[0m: [1m{[0m[32m"LOW"[0m: [1;36m0[0m, [32m"MEDIUM"[0m: [1;36m0[0m, [32m"HIGH"[0m: [1;36m0[0m, 
[32m"CRITICAL"[0m: [1;36m1[0m[1m}[0m, [32m"total_issues"[0m: [1;36m1[0m, [32m"input_path"[0m: 
[32m"XGBoostModels/unsafe_model.pkl"[0m, [32m"absolute_path"[0m: 
[32m"/workspaces/modelscan/notebooks/XGBoostModels"[0m, [32m"modelscan_version"[0m: [32m"0.0.0"[0m, 
[32m"timestamp"[0m: [32m"2024-04-21T12:13:42.698872"[0m, [32m"scanned"[0m: [1m{[0m[32m"total_scanned"[0m: [1;36m1[0m, 
[32m"scanned_files"[0m: [1m[[0m[32m"unsafe_model.pkl"[0m[1m][0m[1m}[0m[1m}[0m, [32m"issues"[0m: [1m[[0m[1m{[0m[32m"description"[0m: [32m"Use of [0m
[32m