# Giza XGBoost benchmarking

- This notebook walks through the minimum steps needed to reproduce the results presented in this article.
- First, install [giza-cli](https://github.com/gizatechxyz/giza-cli) and [giza-benchmark](https://github.com/gizatechxyz/giza-benchmark). The README of each repository describes in detail how to install the tool and how to start using it.
- Once both are installed and you have created a user, reproducing the results is straightforward:

In [None]:
# Install dependencies.

!pip install scikit-learn
!pip install giza-cli
!pip install xgboost
# Install scarb 2.6.4: https://docs.swmansion.com/scarb/download.html
# Install giza-benchmark: https://github.com/gizatechxyz/giza-benchmark

### Train your model

- Benchmark results do not depend on the dataset used, only on the complexity of the XGBoost model (number of trees and their depth).
- Here we use the load_diabetes dataset, but any other dataset can be substituted.

In [1]:
import xgboost as xgb
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

data = load_diabetes()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
xgb_reg = xgb.XGBRegressor(n_estimators=2, max_depth=6)
xgb_reg.fit(X_train, y_train)

### Save model
Save the model in JSON format.

In [2]:
xgb_reg.save_model("model.json")
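
Since the benchmark depends on the number of trees and their depth, it can be useful to confirm what was actually saved. The following is a minimal sketch, not part of the Giza tooling. It assumes the JSON layout written by XGBoost's default tree booster (`learner` → `gradient_booster` → `model` → `trees`, with `left_children` and `right_children` arrays where -1 marks a leaf); this layout may differ for other boosters or XGBoost versions. The helper names are hypothetical.

```python
import json


def max_depth_of_tree(tree):
    """Depth of one tree, walked from the root via its child-index arrays."""
    left, right = tree["left_children"], tree["right_children"]
    depth = 0
    stack = [(0, 0)]  # (node index, depth of that node); node 0 is the root
    while stack:
        node, node_depth = stack.pop()
        depth = max(depth, node_depth)
        if left[node] != -1:  # assumption: -1 marks a leaf node
            stack.append((left[node], node_depth + 1))
            stack.append((right[node], node_depth + 1))
    return depth


def model_complexity(model):
    """Return (number of trees, deepest tree depth) for a parsed model dict."""
    # Assumed layout of the saved JSON for the default tree booster.
    trees = model["learner"]["gradient_booster"]["model"]["trees"]
    return len(trees), max(max_depth_of_tree(t) for t in trees)


def load_model_complexity(path="model.json"):
    """Read a saved model file and report its complexity."""
    with open(path) as f:
        return model_complexity(json.load(f))
```

Calling `load_model_complexity("model.json")` after the cell above should report two trees; the reported depth can be smaller than `max_depth` if XGBoost stops splitting early.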

### Transpile it

- From the terminal, log in with "giza users login". If you have not created a user yet, see our documentation.
- Once logged in, transpile the model:

In [4]:
!giza transpile model.json --output-path my_first_xg

[giza][2024-04-30 16:02:30.537] No model id provided, checking if model exists ✅
[giza][2024-04-30 16:02:30.538] Model name is: model
[giza][2024-04-30 16:02:30.684] Model already exists, using existing model ✅
[giza][2024-04-30 16:02:30.685] Model found with id -> 532! ✅
[giza][2024-04-30 16:02:31.181] Version Created with id -> 2! ✅
[giza][2024-04-30 16:02:31.

### Creating the Sierra file

Once the project has been created, run the following from the terminal:


In [5]:
!cd my_first_xg && scarb build

[32m[1m   Compiling[0m model v0.1.0 (/Users/alejandromartinez/projects/Giza-Hub/benchmark/my_first_xg/Scarb.toml)
[32m[1m    Finished[0m release target(s) in 1 second


This generates a Sierra file at “my_first_xg/target/dev/model.sierra.json”.

### Executing the benchmark

In [7]:
!giza-benchmark -p ./my_first_xg/target/dev/model.sierra.json -i input.txt -b benches

Directories created: benches
  Time spent: 1.272663416s 

Making proof ...
- Started round 0: Air Initialization
  Time spent: 242.322083ms
- Started round 1: RAP
  Time spent: 2.992626625s
- Started round 2: Compute composition polynomial
     Evaluating periodic columns on lde: 41ns
     Created boundary polynomials: 167.045333ms
     Evaluated boundary polynomials on LDE: 30.888917ms
     Evaluated transition zerofiers: 267.336875ms
     Evaluated transitions and accumulated results: 573.445208ms
  Time spent: 1.859553458s
- Started round 3: Evaluate polynomial in out of domain elements
  Time spent: 408.42225ms
- Started round 4: FRI
  Time spent: 537.735916ms
 Fraction of proving time per round: 0.0418 0.5161 0.3207 0.0704 0.0927
  Time spent in proving: 6.074616167s 

Proof written to benches/program.proof
Verifying ...
- Started step 1: Recover challenges
  Time spent: 246.004958ms
- Started step 2: Verify claimed polynomial
  Time spent: 193.166µs
- Started step 3: Verify FRI
 

You can now reproduce any result from the article. Just change the "max_depth" and "n_estimators" parameters to match the test you want to run, then repeat the steps above.
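
When repeating the run for several configurations, it helps to extract the timings from the benchmark output instead of reading them by eye. The sketch below assumes the output format shown in this notebook (lines such as "Time spent in proving: 6.074616167s", with units ns, µs, ms or s); the function names are hypothetical and the format may change between giza-benchmark versions.

```python
import re

# Conversion factors for the units that appear in the output above.
_UNIT_TO_SECONDS = {"ns": 1e-9, "µs": 1e-6, "ms": 1e-3, "s": 1.0}
_DURATION = re.compile(r"([0-9]+(?:\.[0-9]+)?)(ns|µs|ms|s)\b")


def parse_duration(text):
    """Convert the first duration found in `text` (e.g. '242.322083ms') to seconds."""
    match = _DURATION.search(text)
    if match is None:
        raise ValueError(f"no duration found in {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_TO_SECONDS[unit]


def proving_time_seconds(log):
    """Return the total proving time in seconds, or None if the line is absent."""
    for line in log.splitlines():
        if "Time spent in proving:" in line:
            return parse_duration(line)
    return None
```

The log text can be captured, for example, with `subprocess.run(..., capture_output=True, text=True)`; whether giza-benchmark writes these lines to stdout or stderr is not verified here.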

The results of the execution will be inside the benches folder in the files program.memory, program.proof and program.trace.
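
A quick way to confirm that a run produced all three files is sketched below. It relies only on the file names listed above and on the "benches" folder passed with `-b`; the helper name is hypothetical.

```python
from pathlib import Path

# File names taken from the description above.
EXPECTED_ARTIFACTS = ("program.memory", "program.proof", "program.trace")


def artifact_sizes(bench_dir="benches"):
    """Map each expected file name to its size in bytes, or None if it is missing."""
    root = Path(bench_dir)
    return {
        name: ((root / name).stat().st_size if (root / name).is_file() else None)
        for name in EXPECTED_ARTIFACTS
    }
```

A `None` entry in the returned dictionary means the corresponding file was not written, which usually indicates that the run stopped before finishing.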

### Checking results


In [8]:
features_example = X_test[1, :].reshape(1, -1)
xgb_reg.predict(features_example)

array([175.58783], dtype=float32)

The last thing to check is whether the result of our custom implementation matches the XGBoost one. Let's go step by step:

- We passed an input.txt file to the “giza-benchmark” run. This file contains an array representing the row “X_test[1 , :].reshape(1, -1)”, with every value multiplied by 1e5.
- giza-benchmark returns "Program result: 17558781". Dividing this number by 1e5 gives 175.58781.
- The XGBoost predict() call in the cell above returns 175.58783.

Great, the two results agree to four decimal places! The remaining difference of about 2e-5 is consistent with the rounding introduced by the 1e5 scaling and by float32 precision, so the predictive performance of our implementation will be essentially the same as XGBoost's. But why did we have to multiply and divide by 1e5?
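
The comparison in these steps can be sketched as follows. The two numbers are the ones reported in this notebook; the tolerance of 1e-4 is an assumption chosen for illustration, not a value prescribed by Giza.

```python
import math

SCALE = 10**5                 # scaling factor used for inputs and outputs
program_result = 17558781     # "Program result" reported by giza-benchmark
xgb_prediction = 175.58783    # value printed by xgb_reg.predict in the cell above

# Undo the integer scaling to recover a decimal prediction.
decoded = program_result / SCALE

# Compare with an absolute tolerance (assumed here) rather than exact equality,
# because the integer representation and float32 both round the value.
difference = abs(decoded - xgb_prediction)
matches = math.isclose(decoded, xgb_prediction, abs_tol=1e-4)
print(decoded, matches)
```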

This is because Cairo has no native floating-point type, so we use an integer representation of these values. End users will not need to perform these steps, though: the full Giza pipeline will handle them. In this notebook we only want to show how to replicate the results. A full tutorial on generating predictions with this verifiable XGBoost, without any of these manual operations, is coming soon.
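
For reference, the scaling described above amounts to a simple fixed-point encoding. The sketch below shows only the arithmetic: the feature values are hypothetical, rounding to the nearest integer is an assumption (truncation is another possibility), and the exact layout that giza-benchmark expects inside input.txt is not shown in this notebook, so no file is written here.

```python
SCALE = 10**5  # same factor as in the steps above


def to_fixed_point(values, scale=SCALE):
    """Scale each decimal value and round it to the nearest integer (assumed rounding)."""
    return [round(v * scale) for v in values]


example_row = [0.5, -0.25, 0.03125]   # hypothetical feature values
encoded = to_fixed_point(example_row)
print(encoded)
```

With a factor of 1e5, any detail beyond the fifth decimal place of a feature is lost, which is one source of the small difference between the two predictions.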