
[llm] Create separate Predictor for LLMs and enable flash attention on CUDA #3409

Merged
29 commits merged into master from llm-predictor on May 22, 2023

Conversation

@tgaddair (Collaborator) commented May 18, 2023

This PR introduces a new LlmPredictor class, used only during batch prediction, that runs text generation instead of simply outputting logits from the forward pass (as happens during training). At predict time we want the fully generated sequence, not just the logits.
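
In rough terms, the idea looks like the following minimal sketch, assuming a Hugging Face-style model that exposes a generate() method; the class and method names here are illustrative, not Ludwig's actual code:

import torch


class LlmPredictorSketch:
    """Illustrative only: at predict time, run generation rather than a bare forward pass."""

    def __init__(self, model, tokenizer, max_new_tokens=64):
        self.model = model
        self.tokenizer = tokenizer
        self.max_new_tokens = max_new_tokens

    @torch.no_grad()
    def predict(self, texts):
        # Training consumes per-step logits from forward() for the loss;
        # prediction instead decodes complete output sequences with generate().
        inputs = self.tokenizer(texts, return_tensors="pt", padding=True)
        output_ids = self.model.generate(**inputs, max_new_tokens=self.max_new_tokens)
        return self.tokenizer.batch_decode(output_ids, skip_special_tokens=True)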

Other fixes included in this PR:

  • Fix DeepSpeed inflight errors by moving unused modules and metrics out of the top-level LLM module.
  • Remove the hack that disabled eval mode when using DeepSpeed (fixed upstream by microsoft/DeepSpeed#3068, "[BUG] INFLIGHT parameters after evaluation").
  • Fix bfloat16 metrics computation.
  • Add support for fine-tuning without an adapter.
  • Use flash attention for training and prediction when the model is on CUDA (see the sketch below).

This PR also disables prompt_tuning as an adapter type for now, as generation mode does not currently work when this adapter is applied (see tests/integration_tests/test_llm.py::test_llm_finetuning_strategies[prompt_tuning_init_random-local]).
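
The PR description does not spell out the mechanism for the CUDA-only flash attention path; as an assumption about the general approach (not the PR's actual code), gating on device placement can be sketched with PyTorch 2.x primitives:

import torch
import torch.nn.functional as F


def sdpa(query, key, value):
    # Assumed approach: prefer the fused flash / memory-efficient kernels on CUDA
    # (flash attention requires fp16/bf16 inputs on supported GPUs) and fall back
    # to the plain math implementation everywhere else.
    if query.is_cuda:
        with torch.backends.cuda.sdp_kernel(
            enable_flash=True, enable_mem_efficient=True, enable_math=False
        ):
            return F.scaled_dot_product_attention(query, key, value)
    return F.scaled_dot_product_attention(query, key, value)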

@github-actions bot commented May 18, 2023

Unit Test Results

6 files ±0    6 suites ±0    1h 28m 20s ⏱️ +16m 55s
2 760 tests +2 727:  2 747 ✔️ passed +2 718,  12 💤 skipped +8,  1 ❌ failed +1
2 814 runs  +2 715:  2 796 ✔️ passed +2 709,  17 💤 skipped +5,  1 ❌ failed +1

For more details on these failures, see this check.

Results for commit bd0ddc2, compared against base commit cb37535.

♻️ This comment has been updated with latest results.

@tgaddair tgaddair requested a review from arnavgarg1 May 21, 2023 20:05
@tgaddair tgaddair marked this pull request as ready for review May 21, 2023 20:05
@tgaddair changed the title from "WIP: Create separate predictors for LLMs" to "[llm] Create separate Predictor for LLMs" May 21, 2023
Review comment on ludwig/models/llm.py, lines +40 to +45:
class DictWrapper:
"""Wrapper for a LudwigFeatureDict module that allows for iteration over keys.

The purpose of this class is to avoid exposing input and output features as modules of the LLM. This is because we
only wish to train the underlying model, and having these additional modules can confuse systems like DeepSpeed.
"""
A contributor replied:
Very clever way of getting around initializing all of the other modules
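
For context, the trick works because nn.Module.__setattr__ only registers Parameter and Module values as children; a plain Python object assigned as an attribute is invisible to module traversal, so DeepSpeed never visits the wrapped features. A minimal sketch of the mechanism (illustrative, not the PR's full implementation):

import torch.nn as nn


class DictWrapper:
    # Deliberately NOT an nn.Module: the wrapped dict stays hidden from
    # named_modules(), parameters(), and DeepSpeed's module traversal.
    def __init__(self, module_dict):
        self.module_dict = module_dict

    def keys(self):
        return self.module_dict.keys()

    def __iter__(self):
        return iter(self.module_dict.keys())


class LLM(nn.Module):
    def __init__(self, model, input_features):
        super().__init__()
        self.model = model  # the only registered child: the model being trained
        self.input_features = DictWrapper(input_features)  # hidden from traversal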

@arnavgarg1 changed the title from "[llm] Create separate Predictor for LLMs" to "[llm] Create separate Predictor for LLMs and enable flash attention on CUDA" May 22, 2023
@arnavgarg1 (Contributor) left a comment:

LGTM 🚀

@arnavgarg1 (Contributor) commented:
The failed test looks like a transient issue; it should be safe to merge.

@tgaddair tgaddair merged commit ec194d9 into master May 22, 2023
14 of 16 checks passed
@tgaddair tgaddair deleted the llm-predictor branch May 22, 2023 16:10