
Add integration with Hugging Face transformers #713

Closed
saattrupdan opened this issue Feb 28, 2024 · 14 comments · Fixed by #728

Comments

@saattrupdan
Contributor

Presentation of the new feature

It should be possible to use the transformers package for inference with generative models and simply add structured generation from outlines as a "plugin", rather than needing to wrap all models in outlines-specific classes, as the current approach seems to require.

Instead, transformers supports a prefix_allowed_tokens_fn argument in the generate method: a function that returns the tokens allowed to be generated at a given step. The outlines package could thus provide a simple function/class that can be passed as this argument, analogous to the current vLLM integration with logits_processors.
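For reference, the transformers-side hook looks roughly like this (a minimal sketch with a toy whitelist constraint; "gpt2" is just a placeholder model):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is a placeholder; any causal LM works the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Toy constraint: only ever allow a small whitelist of tokens.
allowed_ids = tokenizer(" yes no maybe", add_special_tokens=False).input_ids

def prefix_allowed_tokens_fn(batch_id: int, input_ids: torch.Tensor) -> list[int]:
    # Called at every decoding step; returns the token ids that may be
    # generated next, given everything decoded so far in `input_ids`.
    return allowed_ids

inputs = tokenizer("Answer:", return_tensors="pt")
output = model.generate(
    **inputs,
    prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
    max_new_tokens=5,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

An outlines-provided callable with this signature could then be dropped straight into existing generate calls.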

Where does it fit in Outlines?

This allows easier integration into the inference frameworks that people already use, making outlines more useful to more people.

Are you willing to open a PR?

I would be willing to implement this in a PR, yes. The implementation would be very similar to the current vLLM integration. If I am going to do this, I would need some guidance on preferred directory structures, however. The vLLM integration is inside the serve directory. Should vLLM and this transformers integration be moved into a separate integrations directory, perhaps?

@rlouf
Member

rlouf commented Feb 28, 2024

We have a very good reason to not use the generate method. Why would you want to do that?

@saattrupdan
Contributor Author

We have a very good reason to not use the generate method. Why would you want to do that?

Oh, I see, how come?

And my use case is part of a larger evaluation framework, where one of the tasks requires structured generation. I use vLLM but fall back to transformers if the model architecture is not supported by vLLM. It would thus be quite convenient to plug support into the existing code, rather than work with a new model abstraction.

The lmfe structured generation package also features a convenience function for integrating with transformers, so I thought you might not mind supporting a similar integration as well?

@rlouf
Member

rlouf commented Feb 28, 2024

Oh, I see, how come?

Among other things, we need to implement more sampling algorithms than transformers provides, see #673.

And my use case is part of a larger evaluation framework, where one of the tasks requires structured generation. I use vLLM but fall back to transformers if the model architecture is not supported by vLLM. It would thus be quite convenient to plug support into the existing code, rather than work with a new model abstraction.

How do you currently use vLLM in your code?

@saattrupdan
Contributor Author

saattrupdan commented Feb 29, 2024

Among other things, we need to implement more sampling algorithms than transformers provides, see #673.

I see. For my case, I would either need to use outlines for all non-vLLM generation (including non-structured), or have outlines plug into the generate method of Hugging Face. Would outlines be able to be used for all generation?

How do you currently use vLLM in your code?

I wrap vLLM models in a VLLMModel class here, making their API compatible with regular PreTrainedModel from transformers. I then have a convenience function get_ner_logits_processors, which uses outlines to build the logits processors to be included in the vllm.LLM.generate call.
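Concretely, the vLLM side looks roughly like the following sketch (the model and schema are placeholders, and the exact JSONLogitsProcessor constructor arguments may differ between outlines versions):

```python
from pydantic import BaseModel
from vllm import LLM, SamplingParams
from outlines.serve.vllm import JSONLogitsProcessor  # import path under the serve directory

class Person(BaseModel):
    name: str
    age: int

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # placeholder model

# Constructor arguments are assumed here; check the outlines version in use.
logits_processor = JSONLogitsProcessor(Person, llm)

sampling_params = SamplingParams(max_tokens=128, logits_processors=[logits_processor])
outputs = llm.generate(["Describe a fictional person as JSON."], sampling_params)
print(outputs[0].outputs[0].text)
```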

I've now added my own wrapper for the prefix_allowed_tokens_fn in the PreTrainedModel.generate call, using the convenience function get_ner_prefix_allowed_tokens_fn. This uses a custom JSONPrefixAllowedTokens, adapted from your JSONLogitsProcessor, and something like that is what I was hoping to import directly from outlines rather than copying your source code and tweaking it.

Does that make sense? If this seems too out of scope for outlines then feel free to close this issue, and I'll just use my own wrapper 🙂

@rlouf
Member

rlouf commented Mar 1, 2024

Would outlines be able to be used for all generation?

It should, via generate.text. What kwargs do you pass to generate?
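Roughly like this (the model name is a placeholder, and the call shape may differ slightly across outlines versions):

```python
import outlines

# Placeholder model; any transformers-compatible causal LM should work.
model = outlines.models.transformers("mistralai/Mistral-7B-v0.1")

# Unstructured generation goes through the same interface as the structured generators.
generator = outlines.generate.text(model)
answer = generator("Write a short haiku about hedgehogs.", max_tokens=50)
print(answer)
```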

I wrap vLLM models in a VLLMModel class here, making their API compatible with regular PreTrainedModel from transformers. I then have a convenience function get_ner_logits_processors, which uses outlines to build the logits processors to be included in the vllm.LLM.generate call.

I've now added my own wrapper for the prefix_allowed_tokens_fn in the PreTrainedModel.generate call, using the convenience function get_ner_prefix_allowed_tokens_fn. This uses a custom JSONPrefixAllowedTokens, adapted from your JSONLogitsProcessor, and something like that is what I was hoping to import directly from outlines rather than copying your source code and tweaking it.

Would an outlines.models.vllm alias for all this help?

@saattrupdan
Contributor Author

@rlouf Just looked a bit more into the code base.

If the generation speed is at least as fast as with transformers, then I think it could be useful to use your SequenceGenerator for the generation. I would need to be able to have it return logprobs as well, however (and turn this on and off, depending on the data). Basically, if the generation function supports everything in a Hugging Face GenerationConfig, that'd be great.

But it sounds like there's no interest from your side in integrating into transformers, only in wrapping around it? In that case I think I'll stick with my own prefix_allowed_tokens_fn I linked to above.

@rlouf
Member

rlouf commented Mar 1, 2024

If the generation speed is at least as fast as with transformers, then I think it could be useful to use your SequenceGenerator for the generation.

As far as I know, yes.

But it sounds like there's no interest from your side in integrating into transformers, only in wrapping around it? In that case I think I'll stick with my own prefix_allowed_tokens_fn I linked to above.

I'm not sure what you mean here.

@saattrupdan
Contributor Author

saattrupdan commented Mar 1, 2024

I'm not sure what you mean here.

I just mean using the API from Hugging Face transformers, but plugging in your structured generation functionality, without using your Transformer abstraction or anything like that. Literally do everything exactly as you normally would with transformers, but plug in support for structured generation from outlines.

That's what I love about your vLLM integration, as that's precisely what you do: you allow me to continue working with the vLLM API, and all I have to change in my code is to import your JSONLogitsProcessor and plug it into my vLLM config. Two lines of code changed in the code base.

But to do that for Hugging Face transformers is not as easy, since your suggestion was essentially to go from using their API to your custom outlines API. If you had a simple JSONPrefixAllowedFn class (or something like that), then I could do the same as I do with vLLM: Simply import it and plug it into my generate call. Two lines of code changed, and it just works.
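Something like the sketch below is the shape I have in mind; the JSONPrefixAllowedTokens import is hypothetical (the module path, class name, and constructor arguments are all illustrative, mirroring my own wrapper above):

```python
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical: this class does not exist in outlines yet; its name, module
# path and constructor arguments are illustrative only.
from outlines.processors.transformers import JSONPrefixAllowedTokens

class Person(BaseModel):
    name: str
    age: int

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Apart from the import, these are the only changes to an existing code base:
prefix_fn = JSONPrefixAllowedTokens(schema=Person, tokenizer=tokenizer)

inputs = tokenizer("Describe a fictional person as JSON:", return_tensors="pt")
output = model.generate(
    **inputs,
    prefix_allowed_tokens_fn=prefix_fn,
    max_new_tokens=128,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```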

Using your outlines.generate could also work, but in that case I'd love to just pass in my (Hugging Face) model along with the exact same arguments that I would pass to a generate call. This would just require a lot more maintenance from your side though, since Hugging Face changes their API all the time. That's why I think viewing outlines as a "plug-in", as described above, seems a lot more robust in the long run.

@rlouf
Member

rlouf commented Mar 1, 2024

That's why I think viewing outlines as a "plug-in", as described above, seems a lot more robust in the long run.

I would agree if we were not working on features that are not available in transformers, such as #667 #657 #673. We can still integrate the way you suggest with transformers, via logits processors. A middle ground solution would be to add transformers-specific logits processors to the library. Would that work for you?
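For concreteness, this is roughly the transformers-side hook such processors would plug into (a minimal sketch with a toy processor; an outlines processor would instead compute the mask from its structured-generation state):

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    LogitsProcessor,
    LogitsProcessorList,
)

class BlockTokenProcessor(LogitsProcessor):
    """Toy processor that forbids a single token id at every decoding step."""

    def __init__(self, blocked_token_id: int):
        self.blocked_token_id = blocked_token_id

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        # Mask out the blocked token by setting its logit to -inf.
        scores[:, self.blocked_token_id] = -float("inf")
        return scores

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Hello", return_tensors="pt")
output = model.generate(
    **inputs,
    logits_processor=LogitsProcessorList([BlockTokenProcessor(tokenizer.eos_token_id)]),
    max_new_tokens=10,
)
```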

@saattrupdan
Contributor Author

I would agree if we were not working on features that are not available in transformers, such as #667 #657 #673.

Yeah, I completely get that. Down the line this might very well mean that outlines will be the go-to generation API for me!

We can still integrate the way you suggest with transformers, via logits processors. A middle ground solution would be to add transformers-specific logits processors to the library. Would that work for you?

Yeah, I think this is exactly what I would like! The closest analogue of logits processors in transformers' generate call is the prefix-allowed-tokens function, and the two are very similar. I could open a PR that adds this. Whereabouts in the code base would it fit? serve/transformers.py?

@rlouf
Member

rlouf commented Mar 1, 2024

Yeah, I think this is exactly what I would like! The closest analogue of logits processors in transformers' generate call is the prefix-allowed-tokens function, and the two are very similar. I could open a PR that adds this. Whereabouts in the code base would it fit? serve/transformers.py?

Yes. I'm not sure where it would make the most sense, maybe in a high-level processors module?

@saattrupdan
Contributor Author

I'm not sure where it would make the most sense, maybe in a high-level processors module?

Sounds good to me. Should I move the vLLM processors as well in that case? I could keep a reference to them in serve/vllm.py to ensure backwards compatibility.

@rlouf
Member

rlouf commented Mar 2, 2024

Yes that would be great, thank you!

@saattrupdan
Contributor Author

PR open now: #728
