Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add configuration section for global feature parameters #2208

Conversation

arnavgarg1
Copy link
Contributor

@arnavgarg1 arnavgarg1 commented Jun 29, 2022

This PR enables users to specify type-specific default preprocessing, encoder, decoder and loss related parameters by introducing a new top-level config section called defaults.

In particular, this PR:

  1. Moves existing feature preprocessing globals out of the preprocessing section such that the preprocessing section is only for things like splitting, etc.
  2. Add a new defaults section that covers feature type defaults across all features.

Example:

preprocessing:
    split:
        type: random
        probabilities: [0.7, 0.1, 0.2]
    oversample_minority: 0.5
defaults:
    category:
        preprocessing:        
            missing_value_strategy: fill_with_const
            fill_value: <UNK>
        encoder:
            type: sparse
        decoder:
            norm_params: null
            dropout: 0
            use_bias: true
        loss:
            type: softmax_cross_entropy
            confidence_penalty: 0

@arnavgarg1 arnavgarg1 linked an issue Jun 29, 2022 that may be closed by this pull request
@github-actions
Copy link

github-actions bot commented Jun 29, 2022

Unit Test Results

       5 files   -     1         5 suites   - 1   2h 10m 33s ⏱️ - 21m 28s
2 943 tests +    8  2 894 ✔️ +  8    49 💤 ±  0  0 ±0 
8 704 runs   - 101  8 542 ✔️  - 80  162 💤  - 21  0 ±0 

Results for commit 2979cad. ± Comparison against base commit 640821a.

♻️ This comment has been updated with latest results.

@arnavgarg1 arnavgarg1 self-assigned this Jun 30, 2022
@arnavgarg1 arnavgarg1 added feature New feature or request v0.6 labels Jun 30, 2022
@arnavgarg1 arnavgarg1 marked this pull request as ready for review June 30, 2022 17:35
@arnavgarg1 arnavgarg1 marked this pull request as draft July 8, 2022 08:18
@arnavgarg1 arnavgarg1 marked this pull request as ready for review July 8, 2022 17:54
@@ -58,6 +63,21 @@
SCHEDULER_DICT = {"type": "async_hyperband", "time_attr": "time_total_s"}


def _merge_preprocessing(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we delete/update the merge_preprocessing function in preprocessing.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinxzhao I don't think that's necessary because https://github.com/ludwig-ai/ludwig/pull/2208/files#diff-671042bc7390c5a898a22f285e44a717dc6276bc0fd3e81142e8fc468e2e6e23R35 function does the work to pass in the data just like how it was being done before, which I also think is the cleanest way to do this.

This _merge_preprocessing is unnecessary and I've updated the code with all that's needed for this test.

Copy link
Contributor

@justinxzhao justinxzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one comment, otherwise LGTM.

Copy link
Contributor

@justinxzhao justinxzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arnavgarg1 could you also take a stab at updating the documentation for 0.6? https://ludwig-ai.github.io/ludwig-docs/0.5/configuration/

@arnavgarg1
Copy link
Contributor Author

@justinxzhao One thing I noticed from Connor's PR is that he's updated the tests to reflect the new nesting of encoders and decoders. Right now everything works because of backward compatibility, but should I create a follow up PR that modifies the tests to use the updated Ludwig config with defaults?

@arnavgarg1
Copy link
Contributor Author

@justinxzhao One thing I noticed from Connor's PR is that he's updated the tests to reflect the new nesting of encoders and decoders. Right now everything works because of backward compatibility, but should I create a follow up PR that modifies the tests to use the updated Ludwig config with defaults?

Talked to @justinxzhao - will create a follow up PR with modified tests. Closing this issue for now.

@arnavgarg1 arnavgarg1 merged commit abca7b3 into master Jul 28, 2022
@arnavgarg1 arnavgarg1 deleted the 2130-rfc-add-configuration-section-for-global-feature-parameters branch July 28, 2022 03:33
@arnavgarg1 arnavgarg1 changed the title 2130 rfc add configuration section for global feature parameters Add configuration section for global feature parameters Jul 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFC: Add configuration section for global feature parameters
2 participants