
Update models.transformers to use SequenceGeneratorAdapter and OutlinesLogitsProcessors #966

Merged 1 commit into dottxt-ai:main on Jul 15, 2024

Conversation

lapp0 (Contributor) commented Jun 13, 2024

In draft until huggingface/transformers#31448 makes it into a new transformers release

Fixes #1021

Fixes #789

Fixes #806 (addresses everything except the issue's requirement to "remove torch")

Closes #910 (device inconsistency issue handled through other means)

Problem

Solution

  • Use `SequenceGeneratorAdapter` for `transformers` instead of `SequenceGenerator`
  • Implement `Transformers.generate` and `Transformers.stream`, which pass an `outlines.processors.OutlinesLogitsProcessor` via the `model.generate(logits_processor=...)` argument
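To make the second bullet concrete, the contract of a logits processor can be sketched in plain Python: a callable that takes the token ids generated so far plus the next-token logits, and returns adjusted logits. Everything below (the class name, list-based logits) is a simplified illustration, not Outlines' actual `OutlinesLogitsProcessor` interface, which operates on `torch.Tensor` batches:

```python
import math

class AllowedTokensProcessor:
    """Toy logits processor: mask every token not in an allow-list
    by setting its logit to -inf, so sampling can never pick it."""

    def __init__(self, allowed_ids):
        self.allowed_ids = set(allowed_ids)

    def __call__(self, input_ids, scores):
        # input_ids: tokens generated so far; scores: next-token logits
        return [s if i in self.allowed_ids else -math.inf
                for i, s in enumerate(scores)]

proc = AllowedTokensProcessor(allowed_ids={1, 3})
masked = proc([0], [0.5, 1.2, -0.3, 2.0])
# only token ids 1 and 3 keep finite logits
```

Structured generation then reduces to computing the allow-list at each step from an FSM over the regex or JSON schema.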

Additional Changes

TODO:

@lapp0 changed the title from "Transformers use logits processor" to "Update outlines.models.transformers to use SequenceGeneratorAdapter and OutlinesLogitsProcessors" Jun 13, 2024
```
@@ -39,8 +40,9 @@ def regex(model, regex_str: str, sampler: Sampler = multinomial()):


@regex.register(MLXLM)
def regex_mlxlm(
    model: MLXLM,
@regex.register(Transformers)
```
lapp0 (Contributor, author) commented:
The `_unified` dispatchers will become the default dispatchers as a next step; in this PR they are only used by MLXLM and Transformers.
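The `@regex.register(...)` lines in the hunk above follow Python's `functools.singledispatch` pattern: one generic function plus per-type implementations, where a single implementation can be registered for several model classes. A minimal sketch with stand-in classes (all names here are illustrative only, not Outlines' actual signatures):

```python
from functools import singledispatch

class MLXLM: ...
class Transformers: ...

@singledispatch
def regex(model, regex_str):
    # Fallback when no implementation is registered for type(model)
    raise NotImplementedError(type(model).__name__)

# register() returns the function, so decorators can be stacked to
# share one "unified" implementation across several model types.
@regex.register(MLXLM)
@regex.register(Transformers)
def regex_unified(model, regex_str):
    return f"{type(model).__name__}:{regex_str}"

assert regex(MLXLM(), "a+") == "MLXLM:a+"
assert regex(Transformers(), "a+") == "Transformers:a+"
```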

@lapp0 lapp0 force-pushed the transformers-use-logits-processor branch 18 times, most recently from d94527a to 5bd8832 Compare June 13, 2024 23:20
@lapp0 lapp0 marked this pull request as ready for review June 13, 2024 23:21
@lapp0 lapp0 marked this pull request as draft June 13, 2024 23:36
@lapp0 lapp0 force-pushed the transformers-use-logits-processor branch from 5bd8832 to 1537695 Compare June 13, 2024 23:59
@lapp0 lapp0 marked this pull request as ready for review June 14, 2024 00:00
@lapp0 lapp0 marked this pull request as draft June 14, 2024 02:08
@lapp0 lapp0 force-pushed the transformers-use-logits-processor branch 4 times, most recently from d0fb1a6 to 0167700 Compare June 14, 2024 22:39
@lapp0 lapp0 force-pushed the transformers-use-logits-processor branch 2 times, most recently from 8f9c317 to 6ea583e Compare June 18, 2024 04:46
````
@@ -30,4 +30,55 @@ tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = models.Transformers(llm, tokenizer)
```

# Using Logits Processors
````
A project Member commented:
We'll need to improve the documentation to reach something similar to the llamacpp integration's.

lapp0 (Contributor, author) replied:
We should plan a restructuring and cleanup of the documentation in a separate issue. I could share some ideas on a call about how we might approach this.

In this case, a lot of the information documented for llamacpp applies to all other models, including transformers. We shouldn't repeat ourselves: we should explain the behavior of all models generally, highlight the models' differences with a feature table, and document only transformers-specific information on its documentation page.

The Member replied:
What I meant was listing the main arguments you can pass when initialising and calling the model, cf. https://outlines-dev.github.io/outlines/reference/models/llamacpp/

@lapp0 lapp0 force-pushed the transformers-use-logits-processor branch 2 times, most recently from f5ae15e to b75beeb Compare June 21, 2024 14:59
rlouf pushed a commit that referenced this pull request Jun 30, 2024
….py (#998)

A lot of these fixes were intended for
#966; however, that PR is blocked
until there's a new `transformers` release.

These improvements are general to all models and will enable PRs
resolving #806 and
#965

# Structure of `OutlinesLogitsProcessor`

The goal is to create a base class which allows a logits processor to
be implemented once and used with any `outlines.models` inference
library.

To accomplish this we must normalize the input array. It must have a
consistent type (`torch.Tensor`) and consistent dimensionality (2). We
can normalize both of these simply, and without any copy operations.

`mlx.core.array`, `numpy.array`, and `torch.Tensor` all support [Python's
array-API standard
`__dlpack__`](https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack__.html).
This standard allows casting between array types without copying.

`torch.Tensor` is the only input type which cannot always be cast to any
other type because torch tensors may live in GPU memory. Therefore, we
cast all arrays to `torch.Tensor`, implement logits processors using
torch methods, and convert back to the original array type in
`OutlinesLogitsProcessor`. See docstring of
`OutlinesLogitsProcessor.__call__()` for more details.
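The zero-copy property of `__dlpack__` can be demonstrated with NumPy alone; `np.from_dlpack` consumes any object exposing the protocol, and a CPU `torch.Tensor` or an `mlx.core.array` would behave the same way:

```python
import numpy as np

# Build a small array and re-import it through the DLPack protocol.
src = np.arange(6, dtype=np.float32).reshape(2, 3)
view = np.from_dlpack(src)

# No bytes were copied: both names refer to the same underlying buffer.
assert np.shares_memory(src, view)
assert view.shape == (2, 3)
```

This is why the normalization described above costs nothing at inference time: only a GPU-resident `torch.Tensor` would force an actual device transfer.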

# Detailed Changes
- Rename `BaseLogitsProcessor` to `OutlinesLogitsProcessor`
- Ensure `OutlinesLogitsProcessor.process_logits()` is always passed a
2D batch request with `torch.Tensor` logits and `List` input_ids. Also
clean up `OutlinesLogitsProcessor.__call__()` to be more readable
- Ensure `FSMLogitsProcessor` tolerates unstable sequence ordering (beam
search in `transformers` and vLLM changes the order of sequences)
- Update `tests/generate/test_generate.py` to cover more permutations of
  - regex / text 
  - batch / single
  - greedy / multinomial / beam search
  - `stream()` / `generate()`
- Ensure performance stability across different array libraries through
`benchmark_processors.py`
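The "unstable sequence ordering" point can be illustrated by keying per-sequence state on the token history rather than on the batch index, so a beam-search reorder is harmless. This is a hypothetical sketch, not the real `FSMLogitsProcessor`:

```python
class StatefulProcessor:
    """Toy per-sequence state tracker keyed by token history.

    Because state is looked up by the tuple of generated tokens,
    sequences can be reordered between steps (as beam search does)
    and each one still recovers its own state.
    """

    def __init__(self, start_state, transition):
        self.transition = transition          # (state, token) -> state
        self._states = {(): start_state}      # history -> state

    def state_for(self, input_ids):
        key = tuple(input_ids)
        if key not in self._states:
            # Recover the parent state from the shorter history.
            prev = self.state_for(input_ids[:-1])
            self._states[key] = self.transition(prev, input_ids[-1])
        return self._states[key]

# Trivial "FSM": the state is the running sum of token ids.
proc = StatefulProcessor(0, lambda s, t: s + t)
assert proc.state_for([1, 2]) == 3
assert proc.state_for([2]) == 2   # order of queries doesn't matter
```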
@lapp0 lapp0 force-pushed the transformers-use-logits-processor branch from b75beeb to cdc78f0 Compare June 30, 2024 21:14
@lapp0 lapp0 force-pushed the transformers-use-logits-processor branch 4 times, most recently from b495c2c to 326ab77 Compare June 30, 2024 21:49
@lapp0 lapp0 marked this pull request as ready for review June 30, 2024 21:59
@lapp0 lapp0 force-pushed the transformers-use-logits-processor branch 2 times, most recently from c24b1fa to 32319df Compare July 2, 2024 20:11
rlouf (Member) commented Jul 15, 2024

Very good work, thank you!

@rlouf rlouf merged commit 8224855 into dottxt-ai:main Jul 15, 2024
7 checks passed