
Employ a fallback str2bool mapping from the feature column's distinct values when the feature's values aren't boolean-like. #1469

Merged 6 commits on Nov 9, 2021

Conversation

@justinxzhao (Collaborator) commented Nov 5, 2021

In hyperopt, we infer a BINARY feature type if there are two distinct values.

Currently, we use a rather limited whitelist to automatically map string values to booleans. This means that for any binary feature column whose values aren't explicitly in that whitelist (high/low, good/bad, human/bot), all values are mapped to False.
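To make the failure mode concrete, here's an illustrative sketch (not Ludwig's actual implementation) of how a fixed whitelist collapses unrecognized labels:

```python
# Illustrative sketch of whitelist-based string-to-bool conversion.
# Values outside the whitelist fall through to False, so a column
# like {'bot', 'human'} becomes constant False.
BOOL_WHITELIST = {
    "true": True, "t": True, "yes": True, "y": True, "1": True,
    "false": False, "f": False, "no": False, "n": False, "0": False,
}

def str2bool_whitelist(value: str) -> bool:
    # Both 'bot' and 'human' are absent from the whitelist,
    # so both map to False and the feature loses all signal.
    return BOOL_WHITELIST.get(value.strip().lower(), False)
```

A constant-False target trivially explains the "perfect" metrics: the model only ever needs to predict one class.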

This is the culprit behind the oddly perfect training curves, 0% loss, and 100% accuracy seen on multiple datasets on staging and king (see slack threads 1, 2).

The training_set_metadata also reveals this issue, e.g. twitterbots:

  'str2bool': {'bot': False, 'human': False},
  'bool2str': ['bot', 'human'],

The proposed solution is to use a fallback str2bool mapping derived from the feature column's distinct values when the feature's values aren't boolean-like, taking the first distinct value in alphabetical order as the label for True.

  'str2bool': {'bot': True, 'human': False},
  'bool2str': ['human', 'bot'], 
  'fallback_true_label': 'bot',
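The fallback described above can be sketched as follows (function and variable names here are illustrative, not Ludwig's internals):

```python
# Hedged sketch of the fallback mapping: sort the column's two
# distinct values and take the alphabetically-first one as True.
def build_fallback_str2bool(distinct_values):
    assert len(distinct_values) == 2, "fallback only applies to binary columns"
    true_label, false_label = sorted(distinct_values)
    str2bool = {true_label: True, false_label: False}
    bool2str = [false_label, true_label]  # indexed by int(bool)
    return str2bool, bool2str, true_label

str2bool, bool2str, fallback_true_label = build_fallback_str2bool({"human", "bot"})
# Reproduces the training_set_metadata shown above:
#   str2bool == {'bot': True, 'human': False}
#   bool2str == ['human', 'bot']
#   fallback_true_label == 'bot'
```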

This appears to fix the training loss curves and accuracy metrics (no longer 100%) 👍

There may still be some performance differences between representing the feature as a binary vs. category, which also use slightly different metrics for loss (BWCE vs. SoftmaxCrossEntropy) and accuracy (CategoryAccuracy vs. Accuracy). Binary could be an honest description of a feature with only two distinct possible values, but category may be more semantically correct for what the feature is supposed to represent. It’s also not impossible that some specific configuration of the model performs better or produces more useful metrics with the output feature represented as a binary, category, or textual representation.

We improve the default preprocessing behavior for binary features, but we still leave this configuration choice up to the user, who can set preprocessing.fallback_true_label to explicitly specify which label to treat as true.
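For example, the override might look like this in a Ludwig-style config (expressed here as a Python dict; the feature names are illustrative):

```python
# Hypothetical config fragment: pin the True label explicitly
# instead of relying on the alphabetical fallback.
config = {
    "input_features": [
        {"name": "account_type", "type": "binary",
         "preprocessing": {"fallback_true_label": "bot"}},
    ],
    "output_features": [
        {"name": "label", "type": "binary"},
    ],
}
```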

@github-actions (bot) commented Nov 5, 2021

Unit Test Results

  8 files ±0, 8 suites ±0, 1h 31m 36s ⏱️ (-7m 29s)
  2 921 tests +1: 2 372 ✔️ +2, 549 💤 ±0, 0 failed -1
  11 684 runs +4: 9 488 ✔️ +5, 2 196 💤 ±0, 0 failed -1

Results for commit 4b67bc1. ± Comparison against base commit 0af79dd.

♻️ This comment has been updated with latest results.
