# Update `models.transformers` to use `SequenceGeneratorAdapter` and `OutlinesLogitsProcessors` (#966)
## Conversation
```diff
@@ -39,8 +40,9 @@ def regex(model, regex_str: str, sampler: Sampler = multinomial()):


 @regex.register(MLXLM)
 def regex_mlxlm(
     model: MLXLM,
+@regex.register(Transformers)
```
The `_unified` dispatchers will become the default dispatchers as a next step. In this PR they're only used by `MLXLM` and `Transformers`.
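For illustration, a hedged sketch of what such a unified dispatcher could look like; the class stubs and the body of `regex_unified` below are illustrative stand-ins, not the PR's actual code:

```python
from functools import singledispatch

class MLXLM: ...          # stand-in for outlines.models.MLXLM
class Transformers: ...   # stand-in for outlines.models.Transformers

@singledispatch
def regex(model, regex_str: str):
    raise NotImplementedError(f"no regex generator for {type(model).__name__}")

# One shared implementation registered for every model that goes through the
# logits-processor path: MLXLM and Transformers in this PR, all models later.
@regex.register(MLXLM)
@regex.register(Transformers)
def regex_unified(model, regex_str: str):
    return f"adapter for {type(model).__name__} constrained by {regex_str!r}"

print(regex(Transformers(), r"[0-9]{4}"))  # dispatches to regex_unified
```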
````diff
@@ -30,4 +30,55 @@ tokenizer = AutoTokenizer.from_pretrained("gpt2")
 model = models.Transformers(llm, tokenizer)
 ```

+# Using Logits Processors
````
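For context, a minimal sketch of the kind of example the new "Using Logits Processors" docs section could contain, using the public high-level API (the exact content of the added section isn't shown in this hunk):

```python
import outlines

# Load a transformers model through outlines; constrained generation is
# applied via a logits processor under the hood.
model = outlines.models.transformers("gpt2")
generator = outlines.generate.regex(model, r"[0-9]{4}")
print(generator("The first moon landing was in the year "))
```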
We'll need to improve the documentation to reach something similar to the llamacpp integration's.
We should plan a restructuring and cleanup of the documentation in a separate issue. I could share some ideas on a call about how we might approach this.

In this case, a lot of the information documented for `llamacpp` applies to all other models, including `transformers`. We shouldn't repeat ourselves. We should explain the behavior of all models generally, highlight the models' differences with a feature table, and document only `transformers`-specific information on its documentation page.
What I meant was listing the main arguments you can pass when initialising and calling the model, cf. https://outlines-dev.github.io/outlines/reference/models/llamacpp/
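Something like the following, mirroring the llamacpp page's structure; the argument names below reflect the API as I understand it and should be checked against the source:

```python
import outlines

# Initialisation arguments: model name, device, and kwargs forwarded to
# transformers' AutoModelForCausalLM / AutoTokenizer.
model = outlines.models.transformers(
    "gpt2",
    device="cpu",
    model_kwargs={"trust_remote_code": False},
    tokenizer_kwargs={},
)

# Call arguments on the resulting generator.
generator = outlines.generate.text(model)
answer = generator("Question: what is 2+2?\nAnswer:", max_tokens=16, seed=42)
print(answer)
```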
….py (#998)

A lot of these fixes were intended for #966, however that's blocked until there's a new `transformers` release. These improvements are general to all models and will enable PRs resolving #806 and #965.

# Structure of `OutlinesLogitsProcessor`

The goal is to create a base class which allows a logits processor to be implemented once and used for any `outlines.models` inference library.

To accomplish this we must normalize the input array: it must have a consistent type (`torch.Tensor`) and consistent dimensionality (2). We can normalize both of these simply, and without any copy operations. `mlx.core.array`, `numpy.array`, and `torch.Tensor` all support [Python's array-standard `__dlpack__`](https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack__.html), which allows casting between array types without copying.

`torch.Tensor` is the only input type which cannot always be cast to any other type, because torch tensors may live in GPU memory. Therefore, we cast all arrays to `torch.Tensor`, implement logits processors using torch methods, and convert back to the original array type in `OutlinesLogitsProcessor`. See the docstring of `OutlinesLogitsProcessor.__call__()` for more details.

# Detailed Changes

- Rename `BaseLogitsProcessor` to `OutlinesLogitsProcessor`
- Ensure `OutlinesLogitsProcessor.process_logits()` is always passed a 2D batch request with `torch.Tensor` logits and `List` input_ids. Also clean up the code in `OutlinesLogitsProcessor.__call__()` to be more readable
- Ensure `FSMLogitsProcessor` allows unstable sequence ordering (beam search in transformers and vLLM changes the order of sequences)
- Update `tests/generate/test_generate.py` to cover more permutations of
  - regex / text
  - batch / single
  - greedy / multinomial / beam search
  - `stream()` / `generate()`
- Ensure performance stability with different array libraries through `benchmark_processors.py`
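A condensed sketch of the normalization flow described above (illustrative, not the actual outlines source; numpy stands in here for the non-torch array types):

```python
import numpy as np
import torch

class OutlinesLogitsProcessor:
    """Subclasses implement process_logits() once, in torch, for all libraries."""

    def process_logits(self, input_ids: list, logits: torch.Tensor) -> torch.Tensor:
        raise NotImplementedError

    def __call__(self, input_ids, logits):
        is_torch = isinstance(logits, torch.Tensor)
        # DLPack views the incoming array as a torch tensor without copying.
        torch_logits = logits if is_torch else torch.from_dlpack(logits)

        # Normalize to a 2D batch of shape (batch_size, vocab_size).
        unbatched = torch_logits.ndim == 1
        if unbatched:
            torch_logits = torch_logits.unsqueeze(0)
            input_ids = [input_ids]

        out = self.process_logits(input_ids, torch_logits)

        if unbatched:
            out = out.squeeze(0)
        # Hand the result back in the caller's array type, again via DLPack
        # (the real class also handles mlx.core.array; numpy shown here).
        return out if is_torch else np.from_dlpack(out.contiguous().cpu())

class BlockTokenZero(OutlinesLogitsProcessor):
    def process_logits(self, input_ids, logits):
        logits = logits.clone()
        logits[:, 0] = -float("inf")  # forbid token id 0 in every sequence
        return logits

proc = BlockTokenZero()
print(proc([1, 2, 3], np.zeros(8, dtype=np.float32)))  # numpy in, numpy out
```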
Very good work, thank you!
In draft until huggingface/transformers#31448 makes it into a new `transformers` release.

Fixes #1021
Fixes #789
Fixes #806 (does everything except the issue's requirement of "remove `torch`")
Closes #910 (device inconsistency issue handled through other means)
# Problem

- In `SequenceGenerator`, if `input_ids` is empty, it fails.
- `models.transformers` has an implementation inconsistent with the other modules, making the codebase harder to manage.

# Solution
- Use `SequenceGeneratorAdapter` for transformers instead of `SequenceGenerator`
- Implement `Transformers.generate` and `Transformers.stream`, which use the `model.generate(logits_processor=...)` argument with `outlines.processors.OutlinesLogitsProcessor` (see the sketch below)
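To illustrate the second bullet, a hedged sketch of how an Outlines-style processor plugs into Hugging Face's `generate()`; the toy processor below is a stand-in, not `OutlinesLogitsProcessor` itself:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, LogitsProcessorList

tokenizer = AutoTokenizer.from_pretrained("gpt2")
hf_model = AutoModelForCausalLM.from_pretrained("gpt2")

class BlockEOS:
    """Toy stand-in: any callable taking (input_ids, scores) works here."""
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor):
        scores[:, tokenizer.eos_token_id] = -float("inf")
        return scores

inputs = tokenizer("Outlines is", return_tensors="pt")
output_ids = hf_model.generate(
    **inputs,
    max_new_tokens=16,
    logits_processor=LogitsProcessorList([BlockEOS()]),
)
print(tokenizer.decode(output_ids[0]))
```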
# Additional Changes

- Temporarily disables the `stop_at` argument for transformers and implements a test to determine whether upstream is fixed ("`stop_strings` Argument in `model.generate()` Results in Exception if Generation Completes Without `stop_string` Being Generated", huggingface/transformers#31435)
- Uses `seed` instead of `rng`
- Disables `outlines.generate.cfg` for now ("Using context-free grammars to guide generation does not work", #959)

# TODO
- Handle `GenerationConfig`'s default `max_tokens` being 20 (see the note below)
- Replace `rng` with `seed` in all tests and all documentation.
- Await the `transformers` fix for "`stop_strings` Argument in `model.generate()` Results in Exception if Generation Completes Without `stop_string` Being Generated" (huggingface/transformers#31435) so that `stop_at` works.
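Regarding the first TODO item: `transformers`' `GenerationConfig` defaults to a total `max_length` of 20 tokens when no limit is set, which silently truncates output. A quick illustration:

```python
from transformers import GenerationConfig

config = GenerationConfig()
print(config.max_length)  # 20: prompt + completion capped at 20 tokens by default

# Passing an explicit budget avoids the surprise truncation.
config = GenerationConfig(max_new_tokens=256)
print(config.max_new_tokens)  # 256
```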