Commit
Showing 2 changed files with 292 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,292 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "209d2b58", | ||
"metadata": {}, | ||
"source": [ | ||
"## Part 5: Boosted Decision Trees\n", | ||
"\n", | ||
"The `conifer` package was created out of `hls4ml`, providing a similar set of features but specifically targeting inference of Boosted Decision Trees. In this notebook we will train a `GradientBoostingClassifier` with scikit-learn, using the same jet tagging dataset as in the other tutorial notebooks. Then we will convert the model using `conifer`, and run bit-accurate prediction and synthesis as we did with `hls4ml` before.\n", | ||
"\n", | ||
"`conifer` is available from GitHub [here](https://github.com/thesps/conifer), and we have a publication describing the inference implementation and performance in detail [here](https://iopscience.iop.org/article/10.1088/1748-0221/15/05/P05026/pdf).\n", | ||
"\n", | ||
"<img src=\"images/conifer_v1.png\" width=\"250\" alt=\"conifer\">" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "eda9b784", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import numpy as np\n", | ||
"from sklearn.ensemble import GradientBoostingClassifier\n", | ||
"from sklearn.preprocessing import LabelEncoder, OneHotEncoder\n", | ||
"from sklearn.metrics import accuracy_score\n", | ||
"import joblib\n", | ||
"import conifer\n", | ||
"import plotting\n", | ||
"import matplotlib.pyplot as plt\n", | ||
"import os\n", | ||
"os.environ['PATH'] = '/opt/Xilinx/Vivado/2019.2/bin:' + os.environ['PATH']\n", | ||
"np.random.seed(0)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "18354699", | ||
"metadata": {}, | ||
"source": [ | ||
"## Load the dataset\n", | ||
"Note you need to have gone through `part1_getting_started` to download the data." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "1574ed18", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"X_train_val = np.load('X_train_val.npy')\n", | ||
"X_test = np.load('X_test.npy')\n", | ||
"y_train_val = np.load('y_train_val.npy')\n", | ||
"y_test = np.load('y_test.npy', allow_pickle=True)\n", | ||
"classes = np.load('classes.npy', allow_pickle=True)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "24658fb4", | ||
"metadata": {}, | ||
"source": [ | ||
"We need to transform the test labels from the one-hot encoded values to labels" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "00f304bd", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"le = LabelEncoder().fit(classes)\n", | ||
"ohe = OneHotEncoder().fit(le.transform(classes).reshape(-1,1))\n", | ||
"y_train_val = ohe.inverse_transform(y_train_val.astype(np.int))\n", | ||
"y_test = ohe.inverse_transform(y_test)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "8305e22c", | ||
"metadata": {}, | ||
"source": [ | ||
"## Train a `GradientBoostingClassifier`\n", | ||
"We will use 20 estimators with a maximum depth of 3. The number of decision trees will be `n_estimators * n_classes`, so 100 for this dataset. If you are returning to this notebook having already trained the BDT once, set `train = False` to load the model rather than retrain." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "f5044231", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"train = True\n", | ||
"if train:\n", | ||
" clf = GradientBoostingClassifier(n_estimators=20, learning_rate=1.0,\n", | ||
" max_depth=3, random_state=0, verbose=1).fit(X_train_val, y_train_val.ravel())\n", | ||
" if not os.path.exists('model_5'):\n", | ||
" os.makedirs('model_5')\n", | ||
" joblib.dump(clf, 'model_5/bdt.joblib')\n", | ||
"else:\n", | ||
" clf = joblib.load('model_5/bdt.joblib')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "5e9857c2", | ||
"metadata": {}, | ||
"source": [ | ||
"## Create a conifer configuration\n", | ||
"\n", | ||
"Similarly to `hls4ml`, we can use a utility method to get a template for the configuration dictionary that we can modify." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "5bab868f", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"cfg = conifer.backends.xilinxhls.auto_config()\n", | ||
"cfg['OutputDir'] = 'model_5/conifer_prj'\n", | ||
"cfg['XilinxPart'] = 'xcu250-figd2104-2L-e'\n", | ||
"plotting.print_dict(cfg)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "9e3ca740", | ||
"metadata": {}, | ||
"source": [ | ||
"## Convert the model\n", | ||
"The syntax for model conversion with `conifer` is a little different to `hls4ml`. We construct a `conifer.model` object, providing the trained BDT, the converter corresponding to the library we used, the conifer 'backend' that we wish to target, and the configuration.\n", | ||
"\n", | ||
"`conifer` has converters for:\n", | ||
"- `sklearn`\n", | ||
"- `xgboost`\n", | ||
"- `tmva`\n", | ||
"\n", | ||
"And backends:\n", | ||
"- `vivadohls`\n", | ||
"- `vitishls`\n", | ||
"- `xilinxhls` (use whichever `vivado` or `vitis` is on the path\n", | ||
"- `vhdl`\n", | ||
"\n", | ||
"Here we will use the `sklearn` converter, since that's how we trained our model, and the `vivadohls` backend. For larger BDTs with many more trees or depth, it may be preferable to generate VHDL directly using the `vhdl` backend to get best performance. See [our paper](https://iopscience.iop.org/article/10.1088/1748-0221/15/05/P05026/pdf) for the performance comparison between those backends." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "7ebf5b06", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"cnf = conifer.model(clf, conifer.converters.sklearn, conifer.backends.vivadohls, cfg)\n", | ||
"cnf.compile()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "dc5e487b", | ||
"metadata": {}, | ||
"source": [ | ||
"## profile\n", | ||
"Similarly to hls4ml, we can visualize the distribution of the parameters of the BDT to guide the choice of precision" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "993fef56", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"cnf.profile()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "9c840ca4", | ||
"metadata": {}, | ||
"source": [ | ||
"## Run inference\n", | ||
"Now we can execute the BDT inference with `sklearn`, and also the bit exact simulation using Vivado HLS. The output that the `conifer` BDT produces is equivalent to the `decision_function` method." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "b9fd0fee", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"y_skl = clf.decision_function(X_test)\n", | ||
"y_cnf = cnf.decision_function(X_test)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "c486535e", | ||
"metadata": {}, | ||
"source": [ | ||
"## Check performance\n", | ||
"\n", | ||
"Print the accuracy from `sklearn` and `conifer` evaluations, and plot the ROC curves. We should see that we can get quite close to the accuracy of the Neural Networks from parts 1-4." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "3a87c1b8", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"yt = ohe.transform(y_test).toarray().astype(np.int)\n", | ||
"print(\"Accuracy sklearn: {}\".format(accuracy_score(np.argmax(yt, axis=1), np.argmax(y_skl, axis=1))))\n", | ||
"print(\"Accuracy conifer: {}\".format(accuracy_score(np.argmax(yt, axis=1), np.argmax(y_cnf, axis=1))))\n", | ||
"fig, ax = plt.subplots(figsize=(9, 9))\n", | ||
"_ = plotting.makeRoc(yt, y_skl, classes)\n", | ||
"plt.gca().set_prop_cycle(None) # reset the colors\n", | ||
"_ = plotting.makeRoc(yt, y_cnf, classes, linestyle='--')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "70c43d82", | ||
"metadata": {}, | ||
"source": [ | ||
"## Synthesize\n", | ||
"Now run the Vivado HLS C Synthesis step to produce an IP that we can use, and inspect the estimate resources and latency.\n", | ||
"You can see some live output while the synthesis is running by opening a terminal from the Jupyter home page and executing:\n", | ||
"`tail -f model_5/conifer_prj/vivado_hls.log`" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "721814ef", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"cnf.build()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "ad1efe07", | ||
"metadata": {}, | ||
"source": [ | ||
"## Read report\n", | ||
"We can use an hls4ml utility to read the Vivado report" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "578a62c3", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import hls4ml\n", | ||
"hls4ml.report.read_vivado_report('model_5/conifer_prj/')" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.7.10" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |