Move default operations and metrics to variables #249

Merged

Collaborator

@alex-jw-brooks alex-jw-brooks commented Jul 17, 2024

Description of the change

This pull request moves the default metrics and operations out of YAML files and into Python variables. The main reason is ease of distribution: the current build in the Dockerfile (here) does not bundle the default YAML files into the wheel, which crashes the controller framework on startup.

There are workarounds on the distribution side, e.g., we could write a setup.py that includes non-Python assets, but IMO it makes more sense to keep the defaults in variables that are easy to find, since the YAML files are only read once at initialization time anyway.

Note that as part of this, we're also changing the default metrics to {} instead of None to make the linter happier.
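
As a rough illustration of the approach described above, defaults held in module-level variables can be merged with a user-supplied controller config without reading any packaged YAML. This is a minimal sketch, not the code in this PR: `DEFAULT_OPERATIONS`, `DEFAULT_METRICS`, `merge_with_defaults`, and the `HFControls` class name are all assumptions made for the example.

```python
import copy

# Hypothetical defaults kept as plain Python variables rather than YAML files.
# The names and the "HFControls" class are illustrative assumptions.
DEFAULT_OPERATIONS = {
    "operations": [{"name": "hfcontrols", "class": "HFControls"}]
}
DEFAULT_METRICS = {}  # an empty dict instead of None, so lookups need no None check


def merge_with_defaults(user_config):
    """Return a new config with the defaults placed ahead of user-defined entries."""
    merged = copy.deepcopy(user_config)  # never mutate the caller's config
    merged["operations"] = (
        copy.deepcopy(DEFAULT_OPERATIONS.get("operations", []))
        + merged.get("operations", [])
    )
    merged["controller_metrics"] = (
        copy.deepcopy(DEFAULT_METRICS.get("controller_metrics", []))
        + merged.get("controller_metrics", [])
    )
    return merged
```

Because the defaults live in the module itself, they ship with the wheel automatically and no extra packaging configuration is needed.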

Related issue number

How to verify the PR

  1. Build the Dockerfile with docker build -t fms-hf-tuning:dev . -f build/Dockerfile
  2. Try running the controller framework on a tiny model. For example...

a. Make a directory named tc_test_resources.
b. Create the following files:

i. tc_test_resources/config.json

{
    "model_name_or_path": "Maykeye/TinyLLama-v0",
    "training_data_path": "/tc_test_resources/twitter_complaints_small.json",
    "validation_data_path": "/tc_test_resources/twitter_complaints_small.json",
    "output_dir": "/tc_test_resources/tc_test_model",
    "num_train_epochs": 20.0,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "gradient_accumulation_steps": 1,
    "evaluation_strategy": "epoch",
    "save_strategy": "epoch",
    "learning_rate": 1e-5,
    "response_template": "\n### Label:",
    "dataset_text_field": "output",
    "peft_method": "pt",
    "num_virtual_tokens": 100,
    "prompt_tuning_init_text": "Classify if the tweet is a complaint or not:",
    "trainer_controller_config_file": "/tc_test_resources/exit_if_epoch_num_exceeds_one.yaml"
}

ii. tc_test_resources/twitter_complaints_small.json

{"Tweet text":"@HMRCcustomers No this is my first job","ID":0,"Label":2,"text_label":"no complaint","output":"### Text: @HMRCcustomers No this is my first job\n\n### Label: no complaint"}
{"Tweet text":"@KristaMariePark Thank you for your interest! If you decide to cancel, you can call Customer Care at 1-800-NYTIMES.","ID":1,"Label":2,"text_label":"no complaint","output":"### Text: @KristaMariePark Thank you for your interest! If you decide to cancel, you can call Customer Care at 1-800-NYTIMES.\n\n### Label: no complaint"}
{"Tweet text":"If I can't get my 3rd pair of @beatsbydre powerbeats to work today I'm doneski man. This is a slap in my balls. Your next @Bose @BoseService","ID":2,"Label":1,"text_label":"complaint","output":"### Text: If I can't get my 3rd pair of @beatsbydre powerbeats to work today I'm doneski man. This is a slap in my balls. Your next @Bose @BoseService\n\n### Label: complaint"}
{"Tweet text":"@EE On Rosneath Arial having good upload and download speeds but terrible latency 200ms. Why is this.","ID":3,"Label":1,"text_label":"complaint","output":"### Text: @EE On Rosneath Arial having good upload and download speeds but terrible latency 200ms. Why is this.\n\n### Label: complaint"}
{"Tweet text":"Couples wallpaper, so cute. :) #BrothersAtHome","ID":4,"Label":2,"text_label":"no complaint","output":"### Text: Couples wallpaper, so cute. :) #BrothersAtHome\n\n### Label: no complaint"}
{"Tweet text":"@mckelldogs This might just be me, but-- eyedrops? Artificial tears are so useful when you're sleep-deprived and sp\u2026 https:\/\/t.co\/WRtNsokblG","ID":5,"Label":2,"text_label":"no complaint","output":"### Text: @mckelldogs This might just be me, but-- eyedrops? Artificial tears are so useful when you're sleep-deprived and sp\u2026 https:\/\/t.co\/WRtNsokblG\n\n### Label: no complaint"}
{"Tweet text":"@Yelp can we get the exact calculations for a business rating (for example if its 4 stars but actually 4.2) or do we use a 3rd party site?","ID":6,"Label":2,"text_label":"no complaint","output":"### Text: @Yelp can we get the exact calculations for a business rating (for example if its 4 stars but actually 4.2) or do we use a 3rd party site?\n\n### Label: no complaint"}
{"Tweet text":"@nationalgridus I have no water and the bill is current and paid. Can you do something about this?","ID":7,"Label":1,"text_label":"complaint","output":"### Text: @nationalgridus I have no water and the bill is current and paid. Can you do something about this?\n\n### Label: complaint"}
{"Tweet text":"Never shopping at @MACcosmetics again. Every time I go in there, their employees are super rude\/condescending. I'll take my $$ to @Sephora","ID":8,"Label":1,"text_label":"complaint","output":"### Text: Never shopping at @MACcosmetics again. Every time I go in there, their employees are super rude\/condescending. I'll take my $$ to @Sephora\n\n### Label: complaint"}
{"Tweet text":"@JenniferTilly Merry Christmas to as well. You get more stunning every year \ufffd\ufffd","ID":9,"Label":2,"text_label":"no complaint","output":"### Text: @JenniferTilly Merry Christmas to as well. You get more stunning every year \ufffd\ufffd\n\n### Label: no complaint"}
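
Before mounting the directory, it can help to confirm that every record in the data file parses and that its "output" field contains the response template from config.json. The helper below is a hypothetical sketch for that check; `check_jsonl_lines` is not part of fms-hf-tuning.

```python
import json

RESPONSE_TEMPLATE = "\n### Label:"  # matches "response_template" in config.json
TEXT_FIELD = "output"               # matches "dataset_text_field" in config.json


def check_jsonl_lines(lines):
    """Return (line_number, problem) tuples; an empty list means no problems found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        text = record.get(TEXT_FIELD) if isinstance(record, dict) else None
        if not isinstance(text, str):
            problems.append((number, f"missing string field {TEXT_FIELD!r}"))
        elif RESPONSE_TEMPLATE not in text:
            problems.append((number, "response template not found"))
    return problems
```

To use it, open the data file with `encoding="utf-8"` and pass the file handle to `check_jsonl_lines`, since iterating a file handle yields one line at a time.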

iii. tc_test_resources/exit_if_epoch_num_exceeds_one.yaml

controller_metrics:
  - name: trainer_state
    class: TrainingState
  - name: evalmetric
    class: EvalMetrics
controllers:
  - name: exit_if_epoch_num_exceeds_one
    triggers:
      - on_epoch_end
    rule: trainer_state["epoch"] > 1
    operations:
      - hfcontrols.should_training_stop

c. Start an interactive session, mounting the absolute path of the directory holding the files you just made

TC_RESOURCE_DIR={...}
docker run -it -v $TC_RESOURCE_DIR:/tc_test_resources docker.io/library/fms-hf-tuning:dev bash

d. Point to the new config and run accelerate launch

export SFT_TRAINER_CONFIG_JSON_PATH=/tc_test_resources/config.json
python accelerate_launch.py 

You'll see that the controller framework is working properly because it terminates training once the condition is satisfied, i.e., after 2 epochs have passed, even though we asked to train for 20.
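
To see why training stops after 2 epochs, the rule from the YAML file can be simulated in isolation. This is a simplified sketch that assumes the rule is evaluated once at each epoch end against a `trainer_state` dict; `should_stop` and `run_epochs` are hypothetical names, and the framework's real evaluation mechanism may differ.

```python
def should_stop(rule, metrics):
    """Evaluate a rule string against a dict of named metrics.

    eval() with empty builtins only keeps this sketch short; it is not a
    security boundary and is not claimed to be how the framework works.
    """
    return bool(eval(rule, {"__builtins__": {}}, dict(metrics)))


def run_epochs(num_train_epochs, rule):
    """Simulate on_epoch_end events and return the number of epochs completed."""
    for epoch in range(1, num_train_epochs + 1):
        metrics = {"trainer_state": {"epoch": float(epoch)}}
        if should_stop(rule, metrics):
            return epoch  # the stop operation would fire here
    return num_train_epochs


print(run_epochs(20, 'trainer_state["epoch"] > 1'))  # prints 2
```

At the end of epoch 1 the rule is false (1.0 > 1), and at the end of epoch 2 it becomes true, so the simulated run stops there instead of continuing to epoch 20.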

Was the PR tested

Ran instructions above ^

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Collaborator

@anhuong anhuong left a comment

LGTM, thanks for catching this Alex!

@anhuong anhuong merged commit ed619ed into foundation-model-stack:main Jul 17, 2024
7 checks passed