H2O version, Operating System and Environment
Tested on the latest h2o (3.42.0.3) in R on both Windows and Linux.
Actual behavior
h2o appears to be resetting the values of the column, not the underlying index, which results in values essentially being recoded incorrectly.
Expected behavior
For a given record, its value should not change when releveling, only the underlying index should change.
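For comparison, here is a minimal base R sketch of the expected semantics. It uses a made-up toy vector rather than the h2o frame, and the variable names are illustrative only. Reordering the levels by frequency changes the integer codes but leaves each record's label untouched.

```r
# Hypothetical toy data, not the HairEyeColor frame from the reproduction.
x <- factor(c("Black", "Brown", "Brown", "Red", "Brown", "Black"))
levels(x)      # "Black" "Brown" "Red"
as.integer(x)  # 1 2 2 3 2 1

# Order the levels by descending frequency and rebuild the factor with that order.
freq_order <- names(sort(table(x), decreasing = TRUE))
x_relevel <- factor(x, levels = freq_order)

levels(x_relevel)      # "Brown" "Black" "Red"
as.integer(x_relevel)  # 2 1 1 3 1 2

# The per-record labels are identical before and after releveling.
identical(as.character(x), as.character(x_relevel))  # TRUE
```

Only the codes move here. A frequency table of `x_relevel` still reports 3 Brown, 2 Black and 1 Red.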
Steps to reproduce
library(data.table)
library(h2o)
h2o.init()

hair_dt <- as.data.table(HairEyeColor)
# expand the frequency table back out so there are N records for each combination
hair_dt <- hair_dt[rep(seq(.N), N), ][, N := NULL]
hair <- as.h2o(hair_dt)
hair <- h2o.asfactor(hair)
hair
#    Hair   Eye  Sex
# 1 Black Brown Male
# 2 Black Brown Male
# 3 Black Brown Male
# 4 Black Brown Male
# 5 Black Brown Male
# 6 Black Brown Male
#
# [592 rows x 3 columns]
h2o.levels(hair$Hair)
# [1] "Black" "Blond" "Brown" "Red"
h2o.group_by(data = hair, by = "Hair", nrow(1))
#    Hair nrow
# 1 Black  108
# 2 Blond  127
# 3 Brown  286
# 4   Red   71
#
# [4 rows x 2 columns]
hair$Hair_relevel <- h2o.relevel_by_frequency(hair$Hair)
h2o.levels(hair$Hair_relevel)
# [1] "Brown" "Blond" "Black" "Red"
h2o.group_by(data = hair, by = "Hair_relevel", nrow(1))
#   Hair_relevel nrow
# 1        Brown  108
# 2        Blond   71
# 3        Black  286
# 4          Red  127
#
# [4 rows x 2 columns]
hair
#    Hair   Eye  Sex Hair_relevel
# 1 Black Brown Male        Brown
# 2 Black Brown Male        Brown
# 3 Black Brown Male        Brown
# 4 Black Brown Male        Brown
# 5 Black Brown Male        Brown
# 6 Black Brown Male        Brown
#
# [592 rows x 4 columns]
We can see that the labels moved but the underlying indices did not, so nrow for index 1 is 108 in both cases (Black in the original, and mistakenly Brown in the releveled column). In the transformed hair data, Hair_relevel holds a different value from Hair for the same record, which is clearly wrong.
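The same symptom can be reproduced in base R as an analogy, assuming a hypothetical toy vector. This is only an illustration of what "labels moved, indices did not" looks like. It is not a claim about how `h2o.relevel_by_frequency` is implemented internally.

```r
# Hypothetical toy data, not the HairEyeColor frame from the reproduction.
x <- factor(c("Black", "Brown", "Brown", "Red", "Brown", "Black"))
freq_order <- names(sort(table(x), decreasing = TRUE))  # "Brown" "Black" "Red"

# Overwriting the level labels in place keeps the integer codes as they were,
# so every record silently takes the label now sitting at its old index.
bad <- x
levels(bad) <- freq_order

identical(as.integer(x), as.integer(bad))  # TRUE: the codes are unchanged
as.character(x)[1]    # "Black"
as.character(bad)[1]  # "Brown": the record has been recoded

table(x)    # Black 2, Brown 3, Red 1
table(bad)  # Brown 2, Black 3, Red 1: counts stay with the index, not the label
```

The counts per index are preserved while the labels attached to them change, which matches the pattern in the group-by output above, where 108 is reported for Black before releveling and for Brown afterwards.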
A relevant issue that I believe was incorrectly closed: #6853
Its associated Stack Overflow post: https://stackoverflow.com/questions/74294256/h2o-python-relevel-vs-relevel-by-frequency-for-factor-columns