Move default operations and metrics to variables #249
Merged
+11
−29
Description of the change
This pull request moves the defaults for metrics / operations into variables instead of yaml files. The main reason is convenience of distribution: the current build in the Dockerfile does not bundle the default yaml files into the wheel, which crashes the controller framework upon startup.
There are workarounds on the distribution side, e.g., we could write a setup.py to include assets that aren't Python files, but IMO it makes more sense to just keep the defaults in variables that can be easily found, since the yaml files are only read once at initialization time anyway.
Note that as part of this, we're also changing default metrics to {} instead of None to make the linter happier.
Related issue number
How to verify the PR
docker build -t fms-hf-tuning:dev . -f build/Dockerfile
a. Make a directory named tc_test_resources.
b. Create the following files:
i. tc_test_resources/config.json
ii. /tc_test_resources/twitter_complaints_small.json
iii. /tc_test_resources/exit_if_epoch_num_exceeds_one.yaml
c. Start an interactive session mounting the absolute path to the dir holding the files you just made
TC_RESOURCE_DIR={...} docker run -it -v $TC_RESOURCE_DIR:/tc_test_resources docker.io/library/fms-hf-tuning:dev bash
d. Point to the new config and run accelerate launch
You'll see the controller framework is working properly because it will terminate training when the condition is satisfied, i.e., 2 epochs have passed, even though we asked to train for 20.
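For readers unfamiliar with what the controller is expected to do here, the behavior described above can be approximated with a small standalone sketch. Everything in it (TrainingState, exit_if_epoch_exceeds_one, train) is a hypothetical stand-in written for illustration, not the framework's actual API, and the rule is only a guess at what exit_if_epoch_num_exceeds_one.yaml expresses, based on its file name.

```python
from dataclasses import dataclass


@dataclass
class TrainingState:
    """Hypothetical stand-in for the trainer state a controller would inspect."""
    epoch: int = 0
    should_stop: bool = False


def exit_if_epoch_exceeds_one(state: TrainingState) -> None:
    """Hypothetical rule: request a stop once more than one epoch has completed."""
    if state.epoch > 1:
        state.should_stop = True


def train(num_epochs: int) -> int:
    """Run a dummy training loop and return how many epochs actually completed."""
    state = TrainingState()
    for epoch in range(1, num_epochs + 1):
        state.epoch = epoch               # one epoch of (omitted) training work finished
        exit_if_epoch_exceeds_one(state)  # evaluate the rule at the end of each epoch
        if state.should_stop:
            break
    return state.epoch


completed = train(num_epochs=20)
print(f"requested 20 epochs, completed {completed}")
```

Under these assumptions the loop stops once the second epoch has finished, which mirrors the early termination you should observe in the real run.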
Was the PR tested
Ran instructions above ^
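As a rough illustration of the change described in this PR, keeping defaults in module-level variables rather than in bundled yaml files might look like the sketch below. The variable names, the placeholder operation entry, and the build_registry helper are all assumptions made for illustration; the actual defaults and loading code in the repository may differ.

```python
import copy

# Hypothetical defaults kept in code, so they ship with the wheel automatically.
# The single entry below is a placeholder, not one of the project's real operations.
DEFAULT_OPERATIONS = {"hypothetical_stop_op": {"class": "HypotheticalStop"}}

# An empty dict instead of None: callers can iterate or update it without a None check.
DEFAULT_METRICS = {}


def build_registry(defaults, overrides=None):
    """Return a copy of the defaults with any user-supplied overrides applied."""
    merged = copy.deepcopy(defaults)  # never mutate the module-level defaults
    merged.update(overrides or {})
    return merged


metrics = build_registry(DEFAULT_METRICS)
operations = build_registry(DEFAULT_OPERATIONS, {"custom_op": {"class": "Custom"}})
```

Because the defaults are plain Python objects, no non-Python assets need to be packaged, and copying them before merging keeps user overrides from leaking into the shared defaults.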