Train with and without unbiasing procedure for unbalanced datasets #470

paxcema · 2021-03-24T17:41:19Z

Long story short: on every OpenML suite dataset where we perform worse than a constant predictor (i.e. one that always outputs the most frequent class), results improve significantly if we set equal_accuracy_for_all_output_categories to False.
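For reference, here is a minimal sketch of the constant-predictor baseline mentioned above, next to plain and class-balanced accuracy. It is not code from the library: the helper names are hypothetical, and it assumes (going by the flag's name) that the unbiasing procedure targets per-class accuracy rather than overall accuracy.

```python
from collections import Counter, defaultdict


def majority_baseline_accuracy(labels):
    """Accuracy of a constant predictor that always outputs the most frequent class."""
    if not labels:
        raise ValueError("labels must be non-empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of rows predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so every class counts equally regardless of its size."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy unbalanced target: 90 rows of class "a", 10 rows of class "b".
y_true = ["a"] * 90 + ["b"] * 10
constant_pred = ["a"] * len(y_true)

baseline = majority_baseline_accuracy(y_true)          # plain accuracy of the constant predictor
constant_balanced = balanced_accuracy(y_true, constant_pred)  # same predictor, balanced metric
```

On this toy target the constant predictor looks strong on plain accuracy but weak on balanced accuracy, which is the kind of gap that can make a model trained for equal per-class accuracy lose to the baseline on the plain metric.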

Since the right setting still depends heavily on the use case, we might want to enable a small grid search that tries both options within a single predictor, even if only for benchmarking or competition purposes. Alternatively, we could add an auto mode for the flag that enables or disables the unbiasing procedure automatically.
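A rough sketch of what trying both options could look like. Everything here is an assumption for illustration: `pick_best_setting` and `train_and_score` are hypothetical names, not part of the existing API, and the scores in the usage stub are made up.

```python
def pick_best_setting(train_and_score, settings=(True, False)):
    """Train once per flag value and keep the value with the best validation score.

    `train_and_score` is a hypothetical callable: it takes the value for
    equal_accuracy_for_all_output_categories, trains a predictor with it,
    and returns a validation score where higher is better.
    """
    scores = {value: train_and_score(value) for value in settings}
    best = max(scores, key=scores.get)  # on a tie, the first setting listed wins
    return best, scores


# Usage with a stub in place of real training (placeholder scores, not results).
def fake_train_and_score(flag_value):
    return {True: 0.61, False: 0.83}[flag_value]


best_flag, all_scores = pick_best_setting(fake_train_and_score)
```

An auto mode could reuse the same selection step, at the cost of training twice per predictor.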


paxcema · 2021-05-03T14:07:58Z

Closed by #501

paxcema added the enhancement (New feature or request) label Mar 24, 2021

paxcema self-assigned this Mar 24, 2021

paxcema closed this as completed May 3, 2021
