How do I use `trust_remote_code=True` for Mosaic models? #361
Comments
#363 is a first step to support this.
hi! I am able to get the server to run properly with an MPT model (converted to HF format using the scripts in their llm-foundry repo). I can run generation fine using plain Python and HF `generate()`, but the generate server (using the same generation parameters) doesn't work, sadly; it just returns an empty string. Wondering if you ran into the same problem? Thank you!

Edit: it seems like the issue is that decoding is stopping prematurely due to the EOS token. Is there a way to make it just generate up to the sequence length? Looking through the code, there is something in Rust-land to turn this on (it looks like it's for testing), but I can't pass anything through the REST endpoint to trigger this behavior.
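For reference, a request against TGI's `/generate` endpoint looks like the sketch below (the URL, prompt, and parameter values are placeholders). Note that nothing in this payload is documented to disable early stopping on EOS, which is the behavior described above:

```python
import requests

# TGI /generate endpoint (URL is a placeholder for your deployment).
url = "http://localhost:8080/generate"

payload = {
    "inputs": "My prompt",
    "parameters": {"max_new_tokens": 256},
}

resp = requests.post(url, json=payload)
resp.raise_for_status()
# With the EOS issue described above, generated_text comes back empty.
print(resp.json()["generated_text"])
```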
@metemadi did you ever get any further with this? I managed to stand up MPT-7B in the container but I was also only getting a single returned token.
@OlivierDehaene Do you know exactly what's causing the issues with the MPT model? I'm looking at making a fix.
Hi @harryjulian and others! So unfortunately no: I can run the code (I take an MPT model I trained and then use their helper script in the llm-foundry repo to convert it), but I haven't gotten any further on the server side.
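For reference, the local path that does work is plain `transformers` generation with `trust_remote_code=True`; a minimal sketch, with the model path and prompt as placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path to the MPT checkpoint after conversion to HF format.
model_path = "path/to/converted-mpt"

tokenizer = AutoTokenizer.from_pretrained(model_path)
# trust_remote_code=True is needed because MPT ships custom modeling code.
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)

inputs = tokenizer("My prompt", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```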
PR #514 should help run MPT models on TGI. It doesn't use flash (yet) because that requires forking and extending the flash attention kernel.
# What does this PR do?

This adds a non-flash version of MPT. Flash is harder because we need to create a bias-ready CUDA kernel of flash attention.

Fixes huggingface/text-generation-inference#361
Fixes huggingface/text-generation-inference#491
Fixes huggingface/text-generation-inference#290
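Once that support lands, an MPT deployment should be queryable like any other TGI model; a sketch using the `text-generation` Python client (the endpoint URL and prompt are placeholders):

```python
from text_generation import Client

# Base URL of the running TGI server (placeholder).
client = Client("http://localhost:8080")

response = client.generate("My prompt", max_new_tokens=64)
print(response.generated_text)
```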
Feature request
https://discuss.huggingface.co/t/how-to-use-trust-remote-code-true-with-load-checkpoint-and-dispatch/39849/1
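The linked thread is about combining `trust_remote_code` with Accelerate's `load_checkpoint_and_dispatch`; a minimal sketch of that pattern, with the checkpoint path as a placeholder:

```python
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

# Config for a model that ships custom code, so trust_remote_code is required.
config = AutoConfig.from_pretrained("mosaicml/mpt-7b", trust_remote_code=True)

# Build the model skeleton without allocating real weights.
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

# Load the actual weights and spread them across available devices.
# "path/to/checkpoint" is a placeholder for a local checkpoint directory.
model = load_checkpoint_and_dispatch(model, "path/to/checkpoint", device_map="auto")
```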
Motivation
Model-specific params such as `trust_remote_code=True` need to be passed through when loading models like MPT.
Your contribution
Sure.