<a href="https://colab.research.google.com/github/gorkemozkaya/Data-Science-Notes/blob/master/reproducing_bugs/XGBoost_multi_label_classification_workaround_for_hummingbird_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Description:

Hummingbird does not support XGBoost multi-label classifiers by default. This notebook suggests a workaround: since a multi-label classifier is essentially a separate binary classifier for each class, we extract a binary classifier for each class and convert each one to PyTorch JIT (`torch.jit`) separately.

### In this example, the multi-label model is a tree ensemble with 100 trees for each of the 5 classes (500 trees in total).
### We extract a binary classifier with 100 trees, corresponding to the first class.
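
The slicing used later in this notebook depends on how the trees are ordered in the saved model. As a minimal sketch, assuming the trees are stored round by round with one tree per class in each round (which is what the `[::5]` stride in the cells below implies), the index arithmetic looks like this. The names `n_classes`, `n_rounds`, and `trees_for_class` are illustrative and not part of XGBoost.

```python
# Illustrative only: mimic the flat tree list of a 5-class, 100-round model.
n_classes = 5
n_rounds = 100

# Assumed layout: tree i belongs to class i % n_classes.
tree_ids = list(range(n_classes * n_rounds))


def trees_for_class(ids, class_idx, n_classes):
    """Return the entries that belong to one class under the assumed layout."""
    return ids[class_idx::n_classes]


first_class = trees_for_class(tree_ids, 0, n_classes)
print(len(first_class))  # 100
print(first_class[:4])   # [0, 5, 10, 15]
```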

### Installations:

In [1]:
!pip install seaborn hummingbird_ml==0.4.8 torch==1.10.2
!pip install xgboost==1.6.0
!pip install xgboost-ray==0.1.16
!pip install hummingbird-ml==0.4.8

Collecting hummingbird_ml==0.4.8
  Downloading hummingbird_ml-0.4.8-py2.py3-none-any.whl (164 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 164.6/164.6 kB 3.8 MB/s eta 0:00:00
ERROR: Could not find a version that satisfies the requirement torch==1.10.2 (from versions: 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1)
ERROR: No matching distribution found for torch==1.10.2
Collecting xgboost==1.6.0
  Downloading xgboost-1.6.0-py3-none-manylinux2014_x86_64.whl (193.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 193.7/193.7 MB 2.8 MB/s eta 0:00:00
Installing collected packages: xgboost
  Attempting uninstall: xgboost
    Found existing installation: xgboost 1.7.6
    Uninstalling xgboost-1.7.6:
      Successfully uninstalled xgboost-1.7.6
Successfully installed xgboost-1.6.0
Collecting xgboost-ray==0.1.16
  Downloading xgboost_ray-0.1.16-py3-none-any.whl (13

In [2]:
!pip freeze | grep ray
!pip freeze | grep xgboost
!pip freeze | grep hummingbird

array-record==0.4.0
ray==2.5.1
xarray==2022.12.0
xarray-einstats==0.5.1
xgboost-ray==0.1.16
xgboost==1.6.0
xgboost-ray==0.1.16
hummingbird-ml==0.4.8


### Imports and Setup

In [3]:
import os
import sys
from matplotlib import pyplot as plt


import xgboost as xgb
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np
import hummingbird
from hummingbird.ml import convert
print(xgb.__version__)
print(hummingbird.__version__)

1.6.0
0.4.8


### Multi-label classification

In [4]:
from sklearn.datasets import make_multilabel_classification
import numpy as np

X, y = make_multilabel_classification(
    n_samples=32, n_classes=5, n_labels=3, random_state=0
)
clf = xgb.XGBClassifier(tree_method="hist")

In [5]:
clf.fit(X, y)


In [6]:
# Save the underlying Booster as JSON so the tree list can be edited.
clf.get_booster().save_model("model.json")

In [9]:
import json
from copy import deepcopy

# Load the saved model as a plain dict; the context manager closes the file.
with open("model.json") as f:
    model_json = json.load(f)

In [10]:
model_json_modified = deepcopy(model_json)

In [11]:
# The extracted model has a single output target instead of 5.
model_json_modified['learner']['learner_model_param']['num_target'] = '1'

In [12]:
# Keep every 5th tree starting at index 0: these are the trees of the first class.
model_json_modified['learner']['gradient_booster']['model']['trees'] = model_json_modified['learner']['gradient_booster']['model']['trees'][::5]

In [13]:
# Re-number the kept trees so their ids are consecutive again.
for i, tree in enumerate(model_json_modified['learner']['gradient_booster']['model']['trees']):
  tree["id"] = i

In [14]:
# Keep the tree_info entries that match the selected trees.
model_json_modified['learner']['gradient_booster']['model']['tree_info'] = model_json_modified['learner']['gradient_booster']['model']['tree_info'][::5]

In [15]:
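# Record the new tree count (100 here) instead of hard-coding it.
model_json_modified['learner']['gradient_booster']['model']['gbtree_model_param']['num_trees'] = str(len(model_json_modified['learner']['gradient_booster']['model']['trees']))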

In [16]:
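# Write the edited model; the context manager flushes and closes the file before it is reloaded.
with open('modified.json', 'w') as f:
    json.dump(model_json_modified, f)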
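
The edits in the preceding cells can be gathered into one function so that the same steps apply to any class. This is a sketch, not part of XGBoost or Hummingbird: `extract_binary_model` is a hypothetical helper, and using the stride `class_idx::n_classes` for classes other than the first is an assumption that extends the `[::5]` slice above. Whether `tree_info` needs further adjustment for those other classes has not been checked here. The toy dictionary only imitates the keys touched in this notebook, and its `tree_info` values are placeholders.

```python
from copy import deepcopy


def extract_binary_model(model_json, class_idx, n_classes):
    """Return a copy of the model dict that keeps only one class's trees."""
    out = deepcopy(model_json)
    learner = out["learner"]
    model = learner["gradient_booster"]["model"]

    # A single output target instead of n_classes.
    learner["learner_model_param"]["num_target"] = "1"

    # Assumed layout: every n_classes-th tree belongs to the same class.
    model["trees"] = model["trees"][class_idx::n_classes]
    model["tree_info"] = model["tree_info"][class_idx::n_classes]

    # Re-number the kept trees and record the new tree count.
    for i, tree in enumerate(model["trees"]):
        tree["id"] = i
    model["gbtree_model_param"]["num_trees"] = str(len(model["trees"]))
    return out


# Toy stand-in for model.json: 2 boosting rounds x 3 classes = 6 trees.
toy = {
    "learner": {
        "learner_model_param": {"num_target": "3"},
        "gradient_booster": {
            "model": {
                "trees": [{"id": i} for i in range(6)],
                "tree_info": [0] * 6,  # placeholder values
                "gbtree_model_param": {"num_trees": "6"},
            }
        },
    }
}

# One reduced model dict per class.
per_class = [extract_binary_model(toy, k, 3) for k in range(3)]
```

On the real model, each resulting dictionary could be written with `json.dump` and loaded through `xgb.Booster().load_model`, as the following cells do for the first class.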

In [17]:
bst = xgb.Booster()

In [18]:
bst.load_model('modified.json')

In [19]:
from copy import copy

In [20]:
# Shallow copy keeps the scikit-learn wrapper's settings; only the Booster is replaced.
clf_modified = copy(clf)

In [21]:
clf_modified._Booster = bst

In [22]:
clf_modified.predict_proba(X)

array([[0.7452279 , 0.25477213],
       [0.02497673, 0.97502327],
       [0.99091506, 0.00908493],
       [0.8204981 , 0.17950186],
       [0.02166033, 0.9783397 ],
       [0.7978771 , 0.20212294],
       [0.06165993, 0.93834007],
       [0.9484383 , 0.05156172],
       [0.9843033 , 0.01569671],
       [0.0100835 , 0.9899165 ],
       [0.9890338 , 0.01096616],
       [0.06005919, 0.9399408 ],
       [0.1165427 , 0.8834573 ],
       [0.14652663, 0.85347337],
       [0.8990026 , 0.10099739],
       [0.86152565, 0.13847438],
       [0.12062764, 0.87937236],
       [0.17292649, 0.8270735 ],
       [0.04956883, 0.95043117],
       [0.99713695, 0.00286302],
       [0.91014534, 0.08985464],
       [0.94596297, 0.05403705],
       [0.1191988 , 0.8808012 ],
       [0.01439768, 0.9856023 ],
       [0.16040564, 0.83959436],
       [0.04793268, 0.9520673 ],
       [0.02277887, 0.97722113],
       [0.95172095, 0.04827907],
       [0.89401376, 0.10598624],
       [0.0933488 , 0.9066512 ],
       [0.

In [24]:
import torch

# "torch.jit" is the value of torch.jit.__name__; this backend needs a sample input.
xgb_binary_torch = convert(clf_modified, "torch.jit", X[0:1])

In [25]:
xgb_binary_torch.predict_proba(X)

array([[0.7452278 , 0.25477216],
       [0.02497673, 0.97502327],
       [0.99091506, 0.00908493],
       [0.82049817, 0.17950185],
       [0.02166033, 0.9783397 ],
       [0.7978771 , 0.2021229 ],
       [0.06165987, 0.9383401 ],
       [0.9484383 , 0.05156171],
       [0.9843033 , 0.01569672],
       [0.0100835 , 0.9899165 ],
       [0.9890338 , 0.01096616],
       [0.06005919, 0.9399408 ],
       [0.11654264, 0.88345736],
       [0.14652663, 0.85347337],
       [0.8990027 , 0.10099736],
       [0.86152565, 0.13847438],
       [0.12062776, 0.87937224],
       [0.17292655, 0.82707345],
       [0.04956871, 0.9504313 ],
       [0.99713695, 0.00286302],
       [0.91014534, 0.08985466],
       [0.9459629 , 0.05403709],
       [0.1191988 , 0.8808012 ],
       [0.01439768, 0.9856023 ],
       [0.16040558, 0.8395944 ],
       [0.04793268, 0.9520673 ],
       [0.02277887, 0.97722113],
       [0.9517209 , 0.0482791 ],
       [0.8940137 , 0.10598628],
       [0.09334868, 0.9066513 ],
       [0.
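
The two printed arrays agree to about six decimal places. A small check like the following makes that comparison explicit instead of relying on visual inspection. It is a sketch that uses only the first three rows copied from the outputs above; the tolerance of `1e-6` is an assumption chosen for float32 round-off, not a value taken from Hummingbird.

```python
# First three rows copied from the two outputs above.
native = [
    [0.7452279, 0.25477213],
    [0.02497673, 0.97502327],
    [0.99091506, 0.00908493],
]
converted = [
    [0.7452278, 0.25477216],
    [0.02497673, 0.97502327],
    [0.99091506, 0.00908493],
]

# Largest element-wise absolute difference between the two predictions.
max_abs_diff = max(
    abs(a - b)
    for row_a, row_b in zip(native, converted)
    for a, b in zip(row_a, row_b)
)

tolerance = 1e-6  # assumed tolerance for float32 round-off
print(max_abs_diff <= tolerance)  # True
```

With the full arrays from the notebook, `np.allclose(clf_modified.predict_proba(X), xgb_binary_torch.predict_proba(X), atol=1e-6)` performs a similar check in one call.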