
Add support for transforming numeric predictions that were normalized #1015

Conversation

jimthompson5802
Collaborator


Proposed solution for the following situation: if a numerical output feature is normalized (e.g., zscore or minmax), predictions are returned in the normalized value space instead of the output feature's original value space. This PR adds functions that apply the inverse transformation to output features that were normalized.

@jimthompson5802 marked this pull request as ready for review on November 20, 2020, 21:35
@jimthompson5802
Collaborator Author

jimthompson5802 commented Nov 20, 2020

@w4nderlust just marked this ready for review. Summary of changes:

  • added inverse transformations for zscore and minmax normalizations for output features
  • added a new normalization, log1p, that performs ln(1 + x) and its inverse exp(x) - 1
  • wrapped the above in an object structure with transform() and inverse_transform() methods (see the sketch below)
  • created a registry for the transformation objects

Let me know what you think. If this looks good, I'll add a unit test for the transformation/inverse transformations.
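
A minimal sketch of the object structure described above, for readers following along; the class and registry names are illustrative and may not match the ones used in this PR's version of ludwig/features/numerical_feature.py:

import numpy as np

class ZScoreTransformer:
    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def transform(self, x):
        return (x - self.mean) / self.std

    def inverse_transform(self, x):
        # map zscore-normalized predictions back to the original value space
        return x * self.std + self.mean

class MinMaxTransformer:
    def __init__(self, min_val, max_val):
        self.min_val = min_val
        self.max_val = max_val

    def transform(self, x):
        return (x - self.min_val) / (self.max_val - self.min_val)

    def inverse_transform(self, x):
        return x * (self.max_val - self.min_val) + self.min_val

class Log1pTransformer:
    def transform(self, x):
        return np.log1p(x)   # ln(1 + x)

    def inverse_transform(self, x):
        return np.expm1(x)   # exp(x) - 1

# registry keyed by the value of the 'normalization' preprocessing parameter
numeric_transformation_registry = {
    'zscore': ZScoreTransformer,
    'minmax': MinMaxTransformer,
    'log1p': Log1pTransformer,
}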

@jimthompson5802
Collaborator Author

@w4nderlust your comments were perfect; the code looks cleaner now. Once I add the unit test, that should finish off the PR.

@w4nderlust
Collaborator

@tgaddair we are almost done on this, but before merging I'd first merge your PRs, as this one may need to be adapted accordingly. Do you agree?

@tgaddair
Collaborator

@w4nderlust yeah that sounds good. It should be fine to merge #1014 followed by this PR. The other should not conflict.

@w4nderlust
Collaborator

This looks good to me now. Let's hold a sec before merging as @tgaddair suggested. In the meantime we could add a test ;)

@jimthompson5802
Collaborator Author

Added a unit test for the numeric transformers.
While not related to the numeric transformers, I also adjusted internal variable names in the cached checksum test.

@ifokeev
Contributor

ifokeev commented Nov 23, 2020

Thanks for the feature. We use this code to postprocess numeric predictions:

def postprocess_value(feature_name, value, train_set_metadata):
    # strip the '_predictions' suffix to recover the feature name used in the metadata
    norm_feature_name = feature_name.replace('_predictions', '')
    post_value = value

    if norm_feature_name in train_set_metadata:
        feature_data = train_set_metadata[norm_feature_name]

        idx2str = feature_data.get('idx2str')
        normalization = feature_data.get('preprocessing', {}).get('normalization')

        if idx2str:
            # categorical feature: map the predicted index back to its label string
            post_value = idx2str[value]
        elif normalization == 'zscore':
            # undo zscore normalization: x * std + mean
            mean = feature_data['mean']
            std = feature_data['std']
            post_value = float(post_value) * std + mean
        elif normalization == 'minmax':
            # undo minmax normalization: x * (max - min) + min
            min_val = feature_data['min']
            max_val = feature_data['max']
            post_value = float(post_value) * (max_val - min_val) + min_val

    return post_value

and that's pretty awkward. Your solution is more robust.
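
For illustration, a hypothetical invocation of the helper above; the feature name and metadata values are made up:

# hypothetical metadata for a numerical feature named 'price'
train_set_metadata = {
    'price': {
        'preprocessing': {'normalization': 'zscore'},
        'mean': 250.0,
        'std': 75.0,
    }
}

# undo the zscore normalization on a raw prediction: 1.2 * 75.0 + 250.0 = 340.0
postprocess_value('price_predictions', 1.2, train_set_metadata)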

@jimthompson5802
Collaborator Author

@ifokeev Thank you. Glad the feature is helpful.

@jimthompson5802
Collaborator Author

@ifokeev I forgot to ask. In your postprocess_value() function, I noticed this code fragment:


        idx2str = feature_data.get('idx2str')
        normalization = feature_data.get('preprocessing', {}).get('normalization')

        if idx2str:
            post_value = idx2str[value]

This code implies a non-numeric output feature type. Would a similar capability be useful for that other feature type? What is your use case there?

@ifokeev
Contributor

ifokeev commented Nov 23, 2020

@jimthompson5802

Would a similar capability be useful in that other feature type?

Yeah. That's very useful.

What is the use case around that other feature type?

As I understand it, there is still no support for category postprocessing, and maybe some other feature types.
I had to do the postprocessing myself all the time.

https://github.com/uber/ludwig/blob/ecbcf334f762484b22cd171a666bea75921dc202/tests/integration_tests/test_savedmodel.py#L170

@w4nderlust
Collaborator

The category postprocessing should already do that:
https://github.com/uber/ludwig/blob/master/ludwig/features/category_feature.py#L407-L411
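
For reference, a minimal sketch of what that linked idx2str lookup does, assuming the training metadata carries an idx2str list built during preprocessing; the function name here is illustrative, not the exact Ludwig internals:

def postprocess_category_predictions(predicted_indices, metadata):
    # idx2str is built during preprocessing and stored in the training metadata;
    # it maps each class index back to its original label string
    idx2str = metadata['idx2str']
    return [idx2str[idx] for idx in predicted_indices]

# example: postprocess_category_predictions([2, 0], {'idx2str': ['cat', 'dog', 'bird']}) -> ['bird', 'cat']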

…redictions_transformations

# Conflicts:
#	ludwig/features/numerical_feature.py
@w4nderlust merged commit fcb01e3 into ludwig-ai:master on Dec 4, 2020
@jimthompson5802 deleted the feature_numeric_predictions_transformations branch on December 4, 2020, 04:31