GLM with interactions and Lambda = 0 produces "Categorical value out of bounds" error
#8050
Comments
Neema Mashayekhi commented: Fails on Py Client 3.30.0.4. End of logs: {noformat}06-01 13:34:22.767 127.0.0.1:54321 64759 FJ-1-5 INFO: Building H2O GLM model with these parameters:
Neema Mashayekhi commented: Training occurs without error when the jar file is launched manually. End of logs: {noformat}06-01 13:38:20.670 192.168.1.73:54321 64768 #98217-22 INFO: GET /3/Frames/adult.hex, parms: {column_offset=0, full_column_count=-1, row_count=10, row_offset=0, column_count=-1}
Neema Mashayekhi commented: Check whether the assertions are wrong, since training fails only when -ea is enabled.
Wendy commented: When Lambda=[0], use_all_factor_levels = False, and this creates a problem for the generation of the interaction vectors. In this case, the interacting columns are gender (FEMALE, MALE) and income (<50K, >=50K). The full interaction domain is FEMALE_<50K, FEMALE_>=50K, MALE_<50K, MALE_>=50K. With use_all_factor_levels=False, the domain of the final interaction vector is only MALE_>=50K. This seems incorrect. However, when checked against R, R does exactly the same.
Wendy commented: When lambda=0, use_all_factor_levels = False. Hence, if the original factor has 4 levels, it will now have 3. That is what the assert checks. However, when you have an interaction and lambda=0 (so use_all_factor_levels=False), the combined categorical column drops more than 1 level! Consider gender (2 levels) and race (2 levels): the final combined gender_race interaction column has only 1 level! This is because, with use_all_factor_levels=False, we combine gender (1 level) and race (1 level). Hence, it fails the assert. Concrete example: gender=<Female, Male>, race=<Red, Green>. The complete interaction domain is Female_Red, Female_Green, Male_Red, Male_Green. If use_all_factor_levels=False, the interaction domain contains only Male_Green. Basically, you only take the interaction column into account when gender=Male and race=Green.
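The level counting described in the comment above can be illustrated with a small pure-Python sketch. This is not H2O's implementation; the helper `interaction_domain` is a hypothetical name, and the sketch assumes that use_all_factor_levels=False drops the first level of each column before the columns are crossed.

```python
from itertools import product


def interaction_domain(domains, use_all_factor_levels):
    """Cross per-column domains into an interaction domain.

    Assumption for illustration only: when use_all_factor_levels is False,
    the first level of every column is dropped before crossing.
    """
    kept = [d if use_all_factor_levels else d[1:] for d in domains]
    return ["_".join(combo) for combo in product(*kept)]


gender = ["Female", "Male"]
race = ["Red", "Green"]

full = interaction_domain([gender, race], use_all_factor_levels=True)
reduced = interaction_domain([gender, race], use_all_factor_levels=False)

print(full)     # ['Female_Red', 'Female_Green', 'Male_Red', 'Male_Green']
print(reduced)  # ['Male_Green']
```

Under this assumption the crossed column goes from 4 levels to 1, so a check that expects exactly one level fewer (4 to 3) would not hold.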
JIRA Issue Migration Info
Jira Issue: PUBDEV-7588
Linked PRs from JIRA
To reproduce, run this Python code:
{code:python}import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

h2o.init()  # start or connect to a local H2O cluster

data = h2o.import_file("https://raw.githubusercontent.com/guru99-edu/R-Programming/master/adult.csv")
# Response column: hours worked per week divided by age
data['y'] = data['hours-per-week'] / data['age']
model_cust = H2OGeneralizedLinearEstimator(interactions=["income", "gender"],
                                           family="tweedie",
                                           tweedie_variance_power=1.7, tweedie_link_power=0,
                                           Lambda=0,  # no regularization
                                           intercept=True,
                                           compute_p_values=True,
                                           remove_collinear_columns=True,
                                           standardize=True, weights_column="age", solver="IRLSM")
# Training fails with "Categorical value out of bounds"
model_cust.train(x=["income", "gender"], y="y", training_frame=data){code}
No error occurs if you comment out {{interactions}}, or if you comment out both {{Lambda}} and {{compute_p_values}}.
Stacktrace: