Training is not reproducible between Intel and ARM (M2) #92

Open
lusis-ai opened this issue Apr 30, 2024 · 0 comments

Comments


lusis-ai commented Apr 30, 2024

When training the same simple RF classifier on an Intel Xeon and on a Mac M2, the metrics are different (and of course the trees are different as well).

import ydf  # Yggdrasil Decision Forests
import pandas as pd  # We use Pandas to load small datasets

# Download a classification dataset and load it as a Pandas DataFrame.
ds_path = "https://raw.githubusercontent.com/google/yggdrasil-decision-forests/main/yggdrasil_decision_forests/test_data/dataset"
train_ds = pd.read_csv(f"{ds_path}/adult_train.csv")
test_ds = pd.read_csv(f"{ds_path}/adult_test.csv")


model = ydf.RandomForestLearner(label="income",
                                task=ydf.Task.CLASSIFICATION).train(train_ds)


evaluation = model.evaluate(test_ds)

print(evaluation)

Accuracy is:

  • Intel: 0.866005
  • M2 Max: 0.866107

The other metrics and the trees are also different.

The same problem happens with Scikit-Learn and XGBoost when some hyperparameters rely on random number generation. It acts as if the random sequence is not the same between Intel and ARM.
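
One quick way to check whether the random sequence itself diverges between the two machines is to hash the first draws of a seeded generator on each CPU. This is only a minimal sketch using NumPy's default_rng as an illustration (it is not what YDF uses internally):

import hashlib
import numpy as np

# Draw a fixed number of values from a seeded generator and hash them.
# Running this on both the Intel and the M2 machine should print the same
# digest if the seeded random sequence is identical across architectures.
rng = np.random.default_rng(seed=1234)
draws = rng.random(10_000)
print(hashlib.sha256(draws.tobytes()).hexdigest())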

But it does not happen with LightGBM, which is the only one providing perfect reproducibility regardless of the CPU.
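
A more direct comparison of the two machines would be to pin the seed explicitly and fingerprint the predictions on the shared test set. A minimal sketch, reusing train_ds and test_ds from the snippet above and assuming RandomForestLearner accepts a random_seed hyper-parameter and model.predict returns a NumPy array, as documented:

import hashlib
import numpy as np

# Train with an explicit seed so both machines start from the same
# hyper-parameters, then hash the predictions on the shared test set.
# If the digests differ between Intel and M2, training itself diverges.
model = ydf.RandomForestLearner(label="income",
                                task=ydf.Task.CLASSIFICATION,
                                random_seed=1234).train(train_ds)
predictions = np.asarray(model.predict(test_ds), dtype=np.float32)
print(hashlib.sha256(predictions.tobytes()).hexdigest())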
