The GLM ordinal model has an interesting bug: it returns an incorrect prediction label whenever no single label has a probability > 0.50.
It takes a few tries to reproduce because you need to build a model and then feed it a row of data for which no label probability exceeds 0.5. I've managed to reproduce it reliably with the code snippet below, which builds and predicts on a model in a loop of 100 iterations. Eventually, an iteration fails.
{code:java}import h2o
import pandas as pd
from h2o.estimators import H2OGeneralizedLinearEstimator

h2o.init()
cars = h2o.import_file(
    "https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv"
)

def test_model():
    cars["cylinders"] = cars["cylinders"].asfactor()
    cars.rename(columns={"year": "year_make"})
    r = cars[0].runif()
    train = cars[r > 0.2]
    valid = cars[r <= 0.2]
    response = "cylinders"
    predictors = [
        "displacement",
        "power",
        "weight",
        "acceleration",
        "year_make",
    ]
    model = H2OGeneralizedLinearEstimator(seed=1234, family="ordinal")
    model.train(
        x=predictors, y=response, training_frame=train, validation_frame=valid
    )
    features = h2o.H2OFrame(pd.DataFrame([[18, 101, 22, 23.142, 1]], columns=predictors))
    model_raw_preds = model.predict(features).as_data_frame().values.tolist()[0]
    model_pred = model_raw_preds[0]  # Label
    probs = model_raw_preds[1:]  # Probabilities
    labels = [3, 4, 5, 6, 8]
    max_prob = max(probs)
    max_prob_index = probs.index(max_prob)
    prob_pred = labels[max_prob_index]
    label_probs = dict(zip(labels, probs))
    print(f'Model pred: {model_pred}, probabilities: {label_probs}')
    assert prob_pred == model_pred, f'Predictions are wrong, model gave {model_pred} but max prob was {prob_pred} with probability {max_prob}. All probs: {label_probs}'

for _ in range(100):
    test_model(){code}
An example error output:
{noformat}---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
in
49
50 for _ in range(100):
---> 51 test_model()
in test_model()
46 print(f'Model pred: {model_pred}, probabilities: {label_probs}')
47
---> 48 assert prob_pred==model_pred, f'Predictions are wrong, model gave {model_pred} but max prob was {prob_pred} with probability {max_prob}. All probs: {label_probs}'
49
50 for _ in range(100):
AssertionError: Predictions are wrong, model gave 6.0 but max prob was 8 with probability 0.4285287291223623. All probs: {3: 0.0010826152119126613, 4: 0.2415455664644905, 5: 0.011730752683124124, 6: 0.3171123365181104, 8: 0.4285287291223623}
{noformat}
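One plausible explanation (an assumption on my part, not confirmed against the H2O source) is that the ordinal predictor picks the first label whose cumulative probability crosses 0.5 rather than the label with the highest individual probability. A minimal sketch in plain Python, using the probabilities from the failing row above, shows that this rule would land on 6, exactly matching the buggy output:

```python
# Per-label probabilities taken from the assertion message above.
probs = {3: 0.0010826152119126613, 4: 0.2415455664644905,
         5: 0.011730752683124124, 6: 0.3171123365181104,
         8: 0.4285287291223623}

# Expected behaviour: predict the label with the highest probability.
argmax_label = max(probs, key=probs.get)

# Hypothesised buggy behaviour: walk the labels in order and return the
# first one whose cumulative probability exceeds 0.5.
cumulative = 0.0
threshold_label = None
for label in sorted(probs):
    cumulative += probs[label]
    if cumulative > 0.5:
        threshold_label = label
        break

print(argmax_label)     # 8 - what the assertion expects
print(threshold_label)  # 6 - what the model actually returned
```

When some label's individual probability exceeds 0.5 the two rules happen to agree, which would explain why the bug only surfaces on rows where no probability passes 0.5.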