
Add eleuther_eval as recipe #549

Merged: 26 commits into main on Mar 24, 2024

Conversation

joecummings (Contributor) commented Mar 21, 2024

Context

Adding EleutherAI Eval Harness as a recipe and removing it as a hard dependency.

Changelog

  • Add eval recipe interface
  • Remove hard dependency on Eleuther
  • Move eval to recipes/
  • Update tests to account for additional functionality
  • Update GitHub workflow tests to install lm-eval

Testing

  1. CI

  2. Local testing

(torchtune-2) [jrcummings@devvm050.nha0 ~/projects/torchtune (update-eleuther-eval)]$ pytest tests/recipes/test_eleuther_eval.py
========================================================================================== 2 passed, 1 warning in 26.51s ===========================================================================================
  3. Speed & accuracy

Ours with meta-llama/Llama-2-7b: 1.41s, 0.39 acc, command: tune eleuther_eval --config eleuther_eval tasks=["truthfulqa_mc2"]
Eleuther Harness directly with meta-llama/Llama-2-7b-hf: 1.50s, 0.39 acc, command: lm_eval --model hf --model_args pretrained=meta-llama/Llama-2-7b-hf --tasks truthfulqa_mc2 --device cuda:0 --batch_size 32


pytorch-bot bot commented Mar 21, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/549

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 79d1c55 with merge base 81d93bb:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 21, 2024
@joecummings joecummings marked this pull request as draft March 21, 2024 19:56

netlify bot commented Mar 21, 2024

Deploy Preview for torchtune-preview ready!

Name Link
🔨 Latest commit 79d1c55
🔍 Latest deploy log https://app.netlify.com/sites/torchtune-preview/deploys/65ff7dc304f10800085e1034
😎 Deploy Preview https://deploy-preview-549--torchtune-preview.netlify.app

@joecummings joecummings marked this pull request as ready for review March 22, 2024 00:43
@joecummings joecummings changed the title [WIP] Update evals Add eleuther_eval as recipe Mar 22, 2024
@joecummings joecummings changed the title Add eleuther_eval as recipe [WIP] Add eleuther_eval as recipe Mar 22, 2024
try:
    from lm_eval.evaluator import evaluate
    from lm_eval.models.huggingface import HFLM
    from lm_eval.tasks import get_task_dict
except ImportError:
Contributor Author

This catches if the user has an incorrect version installed or if they don't have any version installed.

Contributor

So this is basically our workaround so that (a) we can still run eleuther eval as a recipe and (b) we do not have to take every dep on god's green earth in our package?

Contributor Author

Oui - I think it's reasonable that certain recipes may require other dependencies and we can make sure it's called out, but we ourselves don't have to depend on it in our torchtune pkg.
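The guarded-import pattern being discussed can be sketched end to end as follows; the function name `import_eval_harness` and the exact error text are illustrative assumptions, not torchtune's actual code:

```python
def import_eval_harness():
    """Import the Eleuther harness lazily, raising a descriptive error
    if it is missing or an incompatible version is installed."""
    try:
        # These symbols exist in lm-eval 0.4.x; older releases lack HFLM.
        from lm_eval.evaluator import evaluate
        from lm_eval.models.huggingface import HFLM
        from lm_eval.tasks import get_task_dict
    except ImportError as e:
        # Re-raise with an actionable message instead of a bare traceback.
        raise ImportError(
            "This recipe requires EleutherAI's Eval Harness v0.4. "
            "Install it with: pip install lm-eval==0.4.*"
        ) from e
    return evaluate, HFLM, get_task_dict
```

This keeps lm-eval out of the core package requirements while still surfacing a clear instruction at recipe launch time.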

self,
model: TransformerDecoder,
tokenizer: Tokenizer,
*,
Contributor Author

KWARGS!

return self._model(inps)

def _model_generate(self, *args, **kwargs):
raise RuntimeError(
Contributor Author

Found this out the hard way. In a rough estimate, 85% of all tasks in Eleuther are not free generation so we have the majority of our bases covered. However, if people open a bunch of issues asking for this, we can add a generation method.

Contributor

What's the reason to fail on this?

Contributor Author

Not sure I understand the question completely, but here are some possible responses.

Why raise an error here instead of letting it fail in Eleuther? Better UX and a more descriptive message.
Why not implement something for generation now? To keep this PR as simple as possible; and because of the limited number of free-generation tasks, I don't think it's a priority.

Contributor

The second one - got you!

max_seq_length=self._cfg.max_seq_length,
)

# Task initialization API changed between v0.4.1 and 0.4.2
Contributor Author

Copied this from gpt-fast

Contributor

Can we give a bit more detail here? And maybe an explicit type of exception?
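One way to address this review ask is to make the version branch explicit and catch a specific exception type; whether lm-eval 0.4.2 resolves tasks through a `TaskManager` exactly like this is an assumption, so treat this as a sketch rather than the final code:

```python
def build_task_dict(tasks):
    """Resolve task names to task objects across lm-eval 0.4.1 and 0.4.2.

    Sketch only: assumes the 0.4.2+ API exposes TaskManager and that
    get_task_dict accepts it as a second argument.
    """
    try:
        # v0.4.2+: tasks are resolved through a TaskManager instance.
        from lm_eval.tasks import TaskManager, get_task_dict
        return get_task_dict(tasks, TaskManager())
    except ImportError:
        # v0.4.1: get_task_dict takes only the task list.
        from lm_eval.tasks import get_task_dict
        return get_task_dict(tasks)
```

Catching `ImportError` on the newer symbol is narrower and more self-documenting than a bare `except`.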

@joecummings joecummings changed the title [WIP] Add eleuther_eval as recipe Add eleuther_eval as recipe Mar 22, 2024
@@ -65,6 +65,7 @@ jobs:
run: |
python -m pip install -r requirements.txt
python -m pip install -r dev-requirements.txt
python -m pip install lm-eval==0.4.*
Contributor

How did we decide on this particular version?

Contributor Author

It is used by lit-gpt and gpt-fast and is the most up-to-date release; plus, significant "hacks" would be needed to support both 0.3 and 0.4, e.g. using BaseLM instead of HFLM.

@@ -25,7 +25,8 @@ The library provides:
- Native-PyTorch implementations of popular LLMs
- Support for checkpoints in various formats, including checkpoints in HF format
- Training recipes for popular fine-tuning techniques with reference benchmarks and comprehensive correctness checks
- Integration with HuggingFace Datasets for training and EleutherAI's Eval Harness for evaluation
- Evaluation of trained models with EleutherAI Eval Harness
Contributor

😎

Comment on lines 20 to 23
pkg_path = Path(torchtune.__file__).parent.parent.absolute()
EVAL_CONFIG_PATH = Path.joinpath(
pkg_path, "recipes", "configs", "llama2_eleuther_eval.yaml"
)
Contributor

Why do we need this now? Just use the recipe name only, no?

pkg_path, "recipes", "configs", "llama2_eleuther_eval.yaml"
)

models.small_test_ckpt_tune = llama2_small_test_ckpt
Contributor

Merge with the testing PR changes in #537 and you will live a happy and fulfilling life.

Contributor Author

Merge bad, rebase good.

Contributor

wait, y'all don't use rebase?

torchtune/__init__.py (outdated; resolved)
assert "'acc,none': 0.3" in log_out

@pytest.fixture
def hide_available_pkg(self, monkeypatch):
Contributor

nice trick

Contributor Author

TIHI
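The import-hiding trick being praised here can be sketched without pytest by patching `builtins.__import__` directly; the helper name is illustrative, and in the actual test this is done through pytest's `monkeypatch` fixture so the patch is undone automatically:

```python
import builtins

def make_import_blocker(pkg_name):
    """Return a replacement __import__ that raises ImportError for pkg_name."""
    real_import = builtins.__import__

    def blocked_import(name, *args, **kwargs):
        if name == pkg_name or name.startswith(pkg_name + "."):
            raise ImportError(f"No module named '{name}'")
        return real_import(name, *args, **kwargs)

    return blocked_import

# Simulate what the fixture does: patch, exercise the import, then restore.
original_import = builtins.__import__
builtins.__import__ = make_import_blocker("lm_eval")
try:
    import lm_eval  # noqa: F401
    hidden = False
except ImportError:
    hidden = True
finally:
    builtins.__import__ = original_import

print(hidden)  # True even when lm_eval is actually installed
```

This lets a test assert the recipe's descriptive ImportError path fires without uninstalling the package from the environment.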

checkpointer:
_component_: torchtune.utils.FullModelTorchTuneCheckpointer
checkpoint_dir: /tmp/llama/
checkpoint_files: [finetuned_model.pt]
Contributor

Where is this coming from? Might be nice to align with our output checkpoint file format so that it works out of the box

Contributor Author

It depends on the epoch, so not sure what the nice solution is here.

def device(self):
return self._device

def tok_encode(self, string: str, **kwargs):
Contributor

This choice of param name hurts me

Contributor Author

copy pasta from gpt-fast

Contributor

I know, I think it's also coming from lm_eval tbh

)


_DEFAULT_TASKS = ["hellaswag"]
Contributor

Tbh I am wondering if we should even have this. Like yes it's convenient to not have to write it out but it's already caused us some confusion. Like in what case is a user just gonna say "yolo I'll just run eval without even thinking about the task"

@joecummings joecummings merged commit 49b523c into main Mar 24, 2024
22 checks passed
@joecummings joecummings deleted the update-eleuther-eval branch March 24, 2024 02:42