
Add support for transforming numeric predictions that were normalized #1015

Conversation

jimthompson5802
Collaborator


Proposed solution for the following situation: if a numerical output feature is normalized (e.g., zscore or minmax), predictions are returned in the normalized value space instead of the output feature's original value space. This PR adds functions that apply the inverse transformation to output features that were normalized.

@jimthompson5802 marked this pull request as ready for review on November 20, 2020, 21:35
@jimthompson5802
Collaborator Author

jimthompson5802 commented Nov 20, 2020

@w4nderlust just marked this ready for review. Summary of changes:

  • added inverse transformations for zscore and minmax normalizations for output features
  • added a new normalization, log1p, that performs ln(1 + x) and its inverse exp(x) - 1
  • wrapped the above in an object structure with transform() and inverse_transform() methods (see the sketch below)
  • created a registry for the transformation objects

Let me know what you think. If this looks good, I'll add a unit test for the transformation/inverse transformations.
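
A minimal sketch of the object structure described above, for readers following along; the class and registry names are illustrative and may not match the ones used in this PR's version of ludwig/features/numerical_feature.py:

import numpy as np

class ZScoreTransformer:
    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def transform(self, x):
        return (x - self.mean) / self.std

    def inverse_transform(self, x):
        # map zscore-normalized predictions back to the original value space
        return x * self.std + self.mean

class MinMaxTransformer:
    def __init__(self, min_val, max_val):
        self.min_val = min_val
        self.max_val = max_val

    def transform(self, x):
        return (x - self.min_val) / (self.max_val - self.min_val)

    def inverse_transform(self, x):
        return x * (self.max_val - self.min_val) + self.min_val

class Log1pTransformer:
    def transform(self, x):
        return np.log1p(x)   # ln(1 + x)

    def inverse_transform(self, x):
        return np.expm1(x)   # exp(x) - 1

# registry keyed by the value of the 'normalization' preprocessing parameter
numeric_transformation_registry = {
    'zscore': ZScoreTransformer,
    'minmax': MinMaxTransformer,
    'log1p': Log1pTransformer,
}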

@jimthompson5802
Collaborator Author

@w4nderlust your comments were perfect; the code looks cleaner now. Once I add the unit test, that should finish off the PR.

@w4nderlust
Collaborator

@tgaddair we are almost done on this, but before merging I'd first merge your PRs, as this one may need to be adapted accordingly. Do you agree?

@tgaddair
Collaborator

@w4nderlust yeah that sounds good. It should be fine to merge #1014 followed by this PR. The other should not conflict.

@w4nderlust
Collaborator

This looks good to me now. Let's hold a sec before merging as @tgaddair suggested. In the meantime we could add a test ;)

@jimthompson5802
Collaborator Author

Added a unit test for the numeric transformers.
While not related to the numeric transformers, I also adjusted internal variable names in the cached checksum test.

@ifokeev
Contributor

ifokeev commented Nov 23, 2020

Thanks for the feature. We use this code to postprocess numeric predictions:

def postprocess_value(feature_name, value, train_set_metadata):
    # strip the '_predictions' suffix to recover the feature name used in the metadata
    norm_feature_name = feature_name.replace('_predictions', '')
    post_value = value

    if norm_feature_name in train_set_metadata:
        feature_data = train_set_metadata[norm_feature_name]

        idx2str = feature_data.get('idx2str')
        normalization = feature_data.get('preprocessing', {}).get('normalization')

        if idx2str:
            # categorical feature: map the predicted index back to its label string
            post_value = idx2str[value]
        elif normalization == 'zscore':
            # undo zscore normalization: x * std + mean
            mean = feature_data['mean']
            std = feature_data['std']
            post_value = float(post_value) * std + mean
        elif normalization == 'minmax':
            # undo minmax normalization: x * (max - min) + min
            min_val = feature_data['min']
            max_val = feature_data['max']
            post_value = float(post_value) * (max_val - min_val) + min_val

    return post_value

and that's pretty awkward. Your solution is more robust.
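
For illustration, a hypothetical invocation of the helper above; the feature name and metadata values are made up:

# hypothetical metadata for a numerical feature named 'price'
train_set_metadata = {
    'price': {
        'preprocessing': {'normalization': 'zscore'},
        'mean': 250.0,
        'std': 75.0,
    }
}

# undo the zscore normalization on a raw prediction: 1.2 * 75.0 + 250.0 = 340.0
postprocess_value('price_predictions', 1.2, train_set_metadata)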

@jimthompson5802
Collaborator Author

@ifokeev Thank you. Glad the feature is helpful.

@jimthompson5802
Collaborator Author

@ifokeev I forgot to ask. In your postprocess_value() function, I noticed this code fragment:


        idx2str = feature_data.get('idx2str')
        normalization = feature_data.get('preprocessing', {}).get('normalization')

        if idx2str:
            post_value = idx2str[value]

This code implies a non-numeric output feature type. Would a similar capability be useful for that other feature type? What is your use case there?

@ifokeev
Contributor

ifokeev commented Nov 23, 2020

@jimthompson5802

Would a similar capability be useful in that other feature type?

Yeah. That's very useful.

What is the use case around that other feature type?

As I understand it, there is still no support for category postprocessing, and maybe some other feature types.
I had to do the postprocessing myself all the time.

https://github.com/uber/ludwig/blob/ecbcf334f762484b22cd171a666bea75921dc202/tests/integration_tests/test_savedmodel.py#L170

@w4nderlust
Collaborator

The category postprocessing should already do that:
https://github.com/uber/ludwig/blob/master/ludwig/features/category_feature.py#L407-L411
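
For reference, a minimal sketch of what that linked idx2str lookup does, assuming the training metadata carries an idx2str list built during preprocessing; the function name here is illustrative, not the exact Ludwig internals:

def postprocess_category_predictions(predicted_indices, metadata):
    # idx2str is built during preprocessing and stored in the training metadata;
    # it maps each class index back to its original label string
    idx2str = metadata['idx2str']
    return [idx2str[idx] for idx in predicted_indices]

# example: postprocess_category_predictions([2, 0], {'idx2str': ['cat', 'dog', 'bird']}) -> ['bird', 'cat']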

…redictions_transformations

# Conflicts:
#	ludwig/features/numerical_feature.py
@w4nderlust merged commit fcb01e3 into ludwig-ai:master on Dec 4, 2020
@jimthompson5802 deleted the feature_numeric_predictions_transformations branch on December 4, 2020, 04:31