
[Config] Validation in CI + CLI utility #468

Merged 13 commits into main, Mar 15, 2024
Conversation

@RdoubleA (Contributor) commented Mar 8, 2024

Context

There is currently no testing around any of the config files we expose to users. E2E recipe tests typically use their own test configs. To have fully tested configs, we need to add the following to CI:

  • Ensuring configs are well-formed: components can be instantiated correctly
  • Ensuring configs can run the recipe as intended

This PR addresses the first gap.

Addresses #466.

Changelog

  • Create a validate function that simply iterates through a given config and tries to instantiate every component (a sketch follows below)
  • Use this as an added test that loops through all of our configs and validates them
  • Expose this as a convenient CLI utility that users can employ to quickly confirm that their custom configs are well-formed: tune validate --config my_config.yaml
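
For reference, a minimal sketch of what such a validate function could look like, pieced together from the snippets quoted later in this thread (the module paths and the flat traversal are assumptions, not the PR's literal implementation):

import inspect

from omegaconf import DictConfig

from torchtune.config._errors import ConfigError  # assumed module path
from torchtune.config._utils import _get_component_from_path  # assumed module path


def validate(cfg: DictConfig) -> None:
    """Check that every _component_ node in the config accepts its kwargs."""
    errors = []
    for node in cfg.values():
        if isinstance(node, DictConfig) and "_component_" in node:
            _component_ = _get_component_from_path(node.get("_component_"))
            kwargs = {k: v for k, v in node.items() if k != "_component_"}
            try:
                # bind() raises TypeError on unexpected or missing arguments
                inspect.signature(_component_).bind(**kwargs)
            except TypeError as e:
                errors.append(e)
    if errors:
        # collect everything found rather than failing at the first problem
        raise ConfigError(errors)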

Test plan

Added unit tests
pytest tests
tune validate --config recipes/configs/alpaca_llama2_full_finetune.yaml

@facebook-github-bot added the CLA Signed label Mar 8, 2024
netlify bot commented Mar 8, 2024

Deploy Preview for torchtune-preview ready!

Latest commit: c17834f
Latest deploy log: https://app.netlify.com/sites/torchtune-preview/deploys/65f4d60d964daa00089e2572
Deploy Preview: https://deploy-preview-468--torchtune-preview.netlify.app

@rohan-varma self-requested a review March 8, 2024 20:25
@rohan-varma (Member) left a comment

Looks great overall, thanks for adding this additional checking! A couple of comments inline.

conf = OmegaConf.create(self.invalid_config_string())
OmegaConf.save(conf, dest)

args = f"tune validate --config {dest}".split()
Member:

Wonder if tune validate will throw at the first invalid component it detects, or if it should wait and find all of them, then throw an error with all of the issues.

Contributor Author:

This is an interesting idea. I created a custom error to handle raising multiple exceptions. Here's what it would look like to the user:

[screenshot of the aggregated error output]
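
A hedged sketch of what such an aggregating error could look like, given that the tests below access a .errors attribute (the real class may differ):

class ConfigError(Exception):
    """Collects every instantiation error found in a config and raises them together."""

    def __init__(self, errors: list[Exception]):
        self.errors = errors
        details = "\n".join(f"{type(e).__name__}: {e}" for e in errors)
        super().__init__(f"Config is not well-formed, found the following errors:\n{details}")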

dummy: 3
"""

def test_validate(self):
Member:

how does this test differ from the one testing the tune CLI?

Contributor Author:

It doesn't. Maybe I can just mock the actual validate call, since it's already covered in its own unit test (a sketch of that approach below).
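
A sketch of that mocking approach (hypothetical: run_tune_cli stands in for however the tune entry point is actually invoked, and the patch target assumes validate is exposed as torchtune.config.validate):

from unittest import mock

from torchtune import config  # validate is exposed here per this PR


def test_validate_cmd(tmp_path):
    # validate itself already has its own unit test, so this test only
    # checks that the CLI command wires through to it
    cfg_path = tmp_path / "my_config.yaml"
    cfg_path.write_text("dummy: 3\n")
    with mock.patch.object(config, "validate") as mock_validate:
        run_tune_cli(f"tune validate --config {cfg_path}".split())  # hypothetical CLI runner
        mock_validate.assert_called_once()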

from torchtune.config._instantiate import _has_component, _instantiate_node


def validate(cfg: DictConfig) -> None:
Member:

If I pass in a config that has:

"blah: 8", where "blah" is just some random string that doesn't mean anything to torchtune's configs, will the system throw an invalid-parameter error or similar? Basically the case I'm thinking of is: if the user makes a typo and sets "dtyype" instead of "dtype", do we still silently continue, not informing the user that their dtype isn't what they may think it is?

Contributor Author:

It's hard to distinguish a random, unused parameter from a parameter that already has a default value you're overriding. For random strings, if the recipe does not use the parameter, there will be no error. It is difficult to enforce not having random parameters because we don't have defined dataclasses/params/structured configs for the recipes, so I don't think it's worth it. In the case of typos, if the parameter is actually used in the recipe, the recipe itself should throw. (The snippet below illustrates what validation does and doesn't see.)
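
For concreteness, a hypothetical config illustrating that distinction (reusing the get_dtype component from the test config discussed later):

dtyype: bf16    # top-level typo: not attached to any _component_, so validation never checks it
test:
  _component_: torchtune.utils.get_dtype
  dtype: fp32   # component kwargs, by contrast, are bound against get_dtype's signature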

pytorch-bot bot commented Mar 14, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/468

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c17834f with merge base 9c75d48:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@ebsmothers (Contributor) left a comment

Thanks for creating this PR. I think this is a good first step for config validation. I do think we should be very explicit about which errors this catches vs. which ones it doesn't, specifically around missing fields. But I acknowledge that's probably quite a bit harder, and some validation is a lot better than none. No major concerns from my side on landing this version of validation.


https://docs.python.org/3/library/argparse.html#the-add-argument-method
"""
assert not kwargs.get("required", False), "Required not supported"
Contributor:

Why did we have this to begin with?


_component_ = _get_component_from_path(nodedict.get("_component_"))
kwargs = {k: v for k, v in nodedict.items() if k != "_component_"}
sig = inspect.signature(_component_)
sig.bind(**kwargs)
Contributor:

This is cool

Comment on lines +35 to +37
except TypeError as e:
    if "missing a required argument" in str(e):
        sig.bind_partial(**kwargs)
Contributor:

Just curious, does this actually help catch missing arguments? Seems we are giving a lot of leeway here. E.g. if I have:

class ModelClass:
  def __init__(self, a: float, b: str):
    self.a = a
    self.b = b
...

Then a config like

model: 
  _component_: model.path.ModelClass
  a: something

will pass this check, right? I guess in general we would need to know whether the instantiation in the recipe is instantiate(cfg.model) or instantiate(cfg.model, b='something i construct inside the recipe'), which we can't really know from the config side alone. (Not sure there's a way around this, just pointing it out; see the demo below.)
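
To make the leeway concrete, a standalone demo of how bind and bind_partial differ (standard inspect behavior, reusing the hypothetical ModelClass above):

import inspect


class ModelClass:
    def __init__(self, a: float, b: str):
        self.a = a
        self.b = b


sig = inspect.signature(ModelClass)
sig.bind_partial(a=1.0)  # succeeds: b may still be supplied by the recipe at runtime
sig.bind(a=1.0)          # raises TypeError: missing a required argument: 'b'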

Contributor Author:

Yeah, I don't think we'll be able to know, as you pointed out. We'll have to rely on an instantiation error from the recipe itself to catch arguments that are genuinely missing, as opposed to ones the recipe supplies at instantiation time.

@@ -0,0 +1,84 @@
#!/usr/bin/env python3
Contributor:

Maybe a dumb q: why do we need the shebang here?

Contributor Author:

I don't think this is needed, just copied over from another file's license stuff

Comment on lines 33 to 36
test2:
  _component_: torchtune.utils.get_dtype
  dtype: fp32
  dummy: 3
Contributor:

Isn't this redundant? (Maybe I am missing something)

Contributor:

OK, I see how this is copied from config/test_validate.py. I agree with the discussion below; I would consolidate these in one place.

with pytest.raises(ConfigError) as excinfo:
    config.validate(conf)
exc_config = excinfo.value
for e in exc_config.errors:
Contributor:

Maybe assert length here too
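
For instance (a sketch; expected_num_errors is a placeholder for however many invalid components the test config defines):

assert len(exc_config.errors) == expected_num_errors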

@RdoubleA merged commit daf5467 into main on Mar 15, 2024 (21 checks passed)
@RdoubleA deleted the rafiayub/validate_config branch March 15, 2024 23:24