Skip to content

Commit

Permalink
Add a BDT/conifer notebook
Browse files Browse the repository at this point in the history
  • Loading branch information
thesps committed Apr 16, 2021
1 parent b806996 commit a23a30c
Show file tree
Hide file tree
Showing 2 changed files with 292 additions and 0 deletions.
Binary file added images/conifer_v1.png
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
292 changes: 292 additions & 0 deletions part5_bdt.ipynb
@@ -0,0 +1,292 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "209d2b58",
"metadata": {},
"source": [
"## Part 5: Boosted Decision Trees\n",
"\n",
"The `conifer` package was created out of `hls4ml`, providing a similar set of features but specifically targeting inference of Boosted Decision Trees. In this notebook we will train a `GradientBoostingClassifier` with scikit-learn, using the same jet tagging dataset as in the other tutorial notebooks. Then we will convert the model using `conifer`, and run bit-accurate prediction and synthesis as we did with `hls4ml` before.\n",
"\n",
"`conifer` is available from GitHub [here](https://github.com/thesps/conifer), and we have a publication describing the inference implementation and performance in detail [here](https://iopscience.iop.org/article/10.1088/1748-0221/15/05/P05026/pdf).\n",
"\n",
"<img src=\"images/conifer_v1.png\" width=\"250\" alt=\"conifer\">"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eda9b784",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.ensemble import GradientBoostingClassifier\n",
"from sklearn.preprocessing import LabelEncoder, OneHotEncoder\n",
"from sklearn.metrics import accuracy_score\n",
"import joblib\n",
"import conifer\n",
"import plotting\n",
"import matplotlib.pyplot as plt\n",
"import os\n",
"os.environ['PATH'] = '/opt/Xilinx/Vivado/2019.2/bin:' + os.environ['PATH']\n",
"np.random.seed(0)"
]
},
{
"cell_type": "markdown",
"id": "18354699",
"metadata": {},
"source": [
"## Load the dataset\n",
"Note you need to have gone through `part1_getting_started` to download the data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1574ed18",
"metadata": {},
"outputs": [],
"source": [
"X_train_val = np.load('X_train_val.npy')\n",
"X_test = np.load('X_test.npy')\n",
"y_train_val = np.load('y_train_val.npy')\n",
"y_test = np.load('y_test.npy', allow_pickle=True)\n",
"classes = np.load('classes.npy', allow_pickle=True)"
]
},
{
"cell_type": "markdown",
"id": "24658fb4",
"metadata": {},
"source": [
"We need to transform the test labels from the one-hot encoded values to labels"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "00f304bd",
"metadata": {},
"outputs": [],
"source": [
"le = LabelEncoder().fit(classes)\n",
"ohe = OneHotEncoder().fit(le.transform(classes).reshape(-1,1))\n",
"y_train_val = ohe.inverse_transform(y_train_val.astype(np.int))\n",
"y_test = ohe.inverse_transform(y_test)"
]
},
{
"cell_type": "markdown",
"id": "8305e22c",
"metadata": {},
"source": [
"## Train a `GradientBoostingClassifier`\n",
"We will use 20 estimators with a maximum depth of 3. The number of decision trees will be `n_estimators * n_classes`, so 100 for this dataset. If you are returning to this notebook having already trained the BDT once, set `train = False` to load the model rather than retrain."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f5044231",
"metadata": {},
"outputs": [],
"source": [
"train = True\n",
"if train:\n",
" clf = GradientBoostingClassifier(n_estimators=20, learning_rate=1.0,\n",
" max_depth=3, random_state=0, verbose=1).fit(X_train_val, y_train_val.ravel())\n",
" if not os.path.exists('model_5'):\n",
" os.makedirs('model_5')\n",
" joblib.dump(clf, 'model_5/bdt.joblib')\n",
"else:\n",
" clf = joblib.load('model_5/bdt.joblib')"
]
},
{
"cell_type": "markdown",
"id": "5e9857c2",
"metadata": {},
"source": [
"## Create a conifer configuration\n",
"\n",
"Similarly to `hls4ml`, we can use a utility method to get a template for the configuration dictionary that we can modify."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5bab868f",
"metadata": {},
"outputs": [],
"source": [
"cfg = conifer.backends.xilinxhls.auto_config()\n",
"cfg['OutputDir'] = 'model_5/conifer_prj'\n",
"cfg['XilinxPart'] = 'xcu250-figd2104-2L-e'\n",
"plotting.print_dict(cfg)"
]
},
{
"cell_type": "markdown",
"id": "9e3ca740",
"metadata": {},
"source": [
"## Convert the model\n",
"The syntax for model conversion with `conifer` is a little different to `hls4ml`. We construct a `conifer.model` object, providing the trained BDT, the converter corresponding to the library we used, the conifer 'backend' that we wish to target, and the configuration.\n",
"\n",
"`conifer` has converters for:\n",
"- `sklearn`\n",
"- `xgboost`\n",
"- `tmva`\n",
"\n",
"And backends:\n",
"- `vivadohls`\n",
"- `vitishls`\n",
"- `xilinxhls` (use whichever `vivado` or `vitis` is on the path\n",
"- `vhdl`\n",
"\n",
"Here we will use the `sklearn` converter, since that's how we trained our model, and the `vivadohls` backend. For larger BDTs with many more trees or depth, it may be preferable to generate VHDL directly using the `vhdl` backend to get best performance. See [our paper](https://iopscience.iop.org/article/10.1088/1748-0221/15/05/P05026/pdf) for the performance comparison between those backends."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7ebf5b06",
"metadata": {},
"outputs": [],
"source": [
"cnf = conifer.model(clf, conifer.converters.sklearn, conifer.backends.vivadohls, cfg)\n",
"cnf.compile()"
]
},
{
"cell_type": "markdown",
"id": "dc5e487b",
"metadata": {},
"source": [
"## profile\n",
"Similarly to hls4ml, we can visualize the distribution of the parameters of the BDT to guide the choice of precision"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "993fef56",
"metadata": {},
"outputs": [],
"source": [
"cnf.profile()"
]
},
{
"cell_type": "markdown",
"id": "9c840ca4",
"metadata": {},
"source": [
"## Run inference\n",
"Now we can execute the BDT inference with `sklearn`, and also the bit exact simulation using Vivado HLS. The output that the `conifer` BDT produces is equivalent to the `decision_function` method."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b9fd0fee",
"metadata": {},
"outputs": [],
"source": [
"y_skl = clf.decision_function(X_test)\n",
"y_cnf = cnf.decision_function(X_test)"
]
},
{
"cell_type": "markdown",
"id": "c486535e",
"metadata": {},
"source": [
"## Check performance\n",
"\n",
"Print the accuracy from `sklearn` and `conifer` evaluations, and plot the ROC curves. We should see that we can get quite close to the accuracy of the Neural Networks from parts 1-4."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3a87c1b8",
"metadata": {},
"outputs": [],
"source": [
"yt = ohe.transform(y_test).toarray().astype(np.int)\n",
"print(\"Accuracy sklearn: {}\".format(accuracy_score(np.argmax(yt, axis=1), np.argmax(y_skl, axis=1))))\n",
"print(\"Accuracy conifer: {}\".format(accuracy_score(np.argmax(yt, axis=1), np.argmax(y_cnf, axis=1))))\n",
"fig, ax = plt.subplots(figsize=(9, 9))\n",
"_ = plotting.makeRoc(yt, y_skl, classes)\n",
"plt.gca().set_prop_cycle(None) # reset the colors\n",
"_ = plotting.makeRoc(yt, y_cnf, classes, linestyle='--')"
]
},
{
"cell_type": "markdown",
"id": "70c43d82",
"metadata": {},
"source": [
"## Synthesize\n",
"Now run the Vivado HLS C Synthesis step to produce an IP that we can use, and inspect the estimate resources and latency.\n",
"You can see some live output while the synthesis is running by opening a terminal from the Jupyter home page and executing:\n",
"`tail -f model_5/conifer_prj/vivado_hls.log`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "721814ef",
"metadata": {},
"outputs": [],
"source": [
"cnf.build()"
]
},
{
"cell_type": "markdown",
"id": "ad1efe07",
"metadata": {},
"source": [
"## Read report\n",
"We can use an hls4ml utility to read the Vivado report"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "578a62c3",
"metadata": {},
"outputs": [],
"source": [
"import hls4ml\n",
"hls4ml.report.read_vivado_report('model_5/conifer_prj/')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

0 comments on commit a23a30c

Please sign in to comment.