Wrong target! #54
Comments
Thank you for catching this error! Besides that, I think it should still be a classification task rather than a regression task because:
I think we should remove the train/test selector column completely to avoid serious downstream errors. Besides, this column (despite being "not entirely random") does not add meaningful information to the data. This survey (pdf) discusses the papers listed on the UCI page that cite the dataset; many of those studies used it incorrectly. A few other studies (Turney, Tang et al.) dichotomize the 6th variable (number of alcoholic drinks) using 3 as the numeric threshold (x6 < 3 vs. x6 >= 3). If we want to keep the task as classification, I think we should use this threshold and note it in the metadata description. (Note that the original study used different threshold values in different experiments.) Alternatively, we can have both datasets.
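To make the proposed classification variant concrete, here is a minimal sketch of the dichotomization described above. It assumes the rows are already loaded as dictionaries and that the columns are named `drinks` and `selector`; those names, the helper `dichotomize_drinks`, and the sample rows are all hypothetical and only for illustration, not the actual PMLB code.

```python
def dichotomize_drinks(rows, threshold=3, drinks_key="drinks", selector_key="selector"):
    """Return new rows with a binary target and without the selector column.

    Column names are assumptions; adjust them to the real file header.
    """
    out = []
    for row in rows:
        # Keep every feature except the selector and the raw drinks count,
        # since drinks becomes the target rather than a predictor.
        new_row = {k: v for k, v in row.items() if k not in (drinks_key, selector_key)}
        # 0 when x6 < threshold, 1 when x6 >= threshold.
        new_row["target"] = int(float(row[drinks_key]) >= threshold)
        out.append(new_row)
    return out


# Made-up rows for illustration only, not real records from the dataset.
sample = [
    {"mcv": 85, "drinks": 0.5, "selector": 1},
    {"mcv": 90, "drinks": 3.0, "selector": 2},
    {"mcv": 88, "drinks": 6.0, "selector": 1},
]
binary_rows = dichotomize_drinks(sample)
```

The threshold is a parameter because, as noted, the original study used different cut-offs in different experiments.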
I fixed this issue with the commits above.
The current target is a train/test selector, not a dependent variable.
According to OpenML, most of the analyses that were done properly on this dataset use the 6th feature, "Drinks", as the target.
The PMLB dataset should be changed to reflect this: treat the 6th feature, currently "Drinks", as the new target, and remove the 7th feature.
P.S. The metadata.yaml file already reflects this change, apart from a TODO that needs to be removed.
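For reference, a rough sketch of the requested change using only the standard library. It assumes a comma-separated file whose header has a `Drinks` column and a `target` column that currently holds the selector; the delimiter, the helper name `retarget`, and the sample data are assumptions for illustration, and the real file layout may differ.

```python
import csv
import io


def retarget(csv_text, new_target="Drinks", old_target="target"):
    """Drop the old target (the selector) and rename new_target to 'target'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep every column except the selector and the column being promoted.
    features = [c for c in reader.fieldnames if c not in (new_target, old_target)]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=features + ["target"], lineterminator="\n")
    writer.writeheader()
    for row in reader:
        new_row = {c: row[c] for c in features}
        # The drinks value becomes the dependent variable.
        new_row["target"] = row[new_target]
        writer.writerow(new_row)
    return out.getvalue()


# Made-up two-row sample, not real records from the dataset.
sample_csv = "mcv,Drinks,target\n85,0.5,1\n90,3.0,2\n"
fixed_csv = retarget(sample_csv)
```

The values are passed through as strings, so the drinks column stays numeric in the output and the task remains regression unless it is binarized separately.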