Run production grade private models #26

Hello,

Following discussion with @Narsil, he told me that the model hub is actually not meant to be able to load private models (as of now), since api-inference-community was originally intended to promote community libraries that use the hub.

However, in the scope of running production grade private models, would it be possible to discuss internally the possibility of supporting this on the model hub?

Thanks!
For clarity, the ability to load private models exists for any model in the `transformers` library. Here, which library does your model run in?
@julien-c thanks for the clarification. To be more precise about the exchanges I had by email with @Narsil and @jeffboudier, the problem is rather about using the Hugging Face Inference API with a private model on the model hub.

My wish is to use the Hugging Face Inference API on this sentence-transformers model: https://huggingface.co/sentence-transformers/stsb-xlm-r-multilingual

However, the output of this model has to pass through mean pooling in order to get sentence embeddings. That's why I customized the model to include this step within the model: https://huggingface.co/Matthieu/stsb-xlm-r-multilingual-custom

Still, the output of the Inference API for this custom model hasn't passed through mean pooling, although I added this step within the model.

Thanks!
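For readers landing here: the mean pooling step referred to above is just an attention-mask-weighted average of the token embeddings. Below is a minimal sketch with transformers, using the base checkpoint linked above; the custom model may implement the step differently, so treat this as illustrative only.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/stsb-xlm-r-multilingual")
model = AutoModel.from_pretrained("sentence-transformers/stsb-xlm-r-multilingual")

inputs = tokenizer("Hello, my name is John", return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**inputs).last_hidden_state  # shape (1, seq_len, 768)

# Mean pooling: average the token embeddings, ignoring padding via the attention mask
mask = inputs["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 768])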
Hey Matthieu, thanks for reaching out!

One of the key benefits of the 🤗 Accelerated Inference API is that you can serve any compatible model from the Model Hub, whether shared publicly or uploaded privately to your Hugging Face account. The subtlety here is that the model you are using is a model from the `sentence-transformers` library.

For more on this I will let @Narsil comment here or in the email thread.

Cheers,
I'm not sure if there's a problem. I just tried this and I got the sentence embedding:

import json
import requests

API_URL = "https://api-inference.huggingface.co/models/Matthieu/stsb-xlm-r-multilingual-custom"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

def query(payload):
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

data = query({"inputs": "Hello, my name is John and I live in New York"})
print(len(data))
# 768 (the embedding dimension of XLM-R base)

From your email, you got […]. Note that you can also use the widget on the model page. For future reference, the exact code that does this is in […].
Hi @osanseviero, thanks for your feedback. Indeed the widget works! It seems it was just a question of time before it became available.

@jeffboudier @Narsil since the API seems to work, do you need to do anything additional on your side to enable access to the CPU+GPU accelerated Inference API for this custom model?

Best,
Matthieu
Hi Matthieu,

To get access to accelerated CPU and/or GPU, you just need to start a Lab (CPU) or Startup (CPU+GPU) trial. To specify GPU, make sure to add the `use_gpu` parameter, which is False by default:
https://api-inference.huggingface.co/docs/python/html/detailed_parameters.html

To help future users with similar questions, do you mind commenting on the GitHub issue and closing it?

Cheers,
Jeff
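For completeness, passing the GPU option through the query helper from the earlier comment could look roughly like the sketch below. The "options" payload shape follows the detailed parameters documentation linked above; treat the exact field names and plan requirements as assumptions to verify against the current docs.

import json
import requests

API_URL = "https://api-inference.huggingface.co/models/Matthieu/stsb-xlm-r-multilingual-custom"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

payload = {
    "inputs": "Hello, my name is John and I live in New York",
    # "use_gpu" is False by default and requires a plan with GPU access;
    # "wait_for_model" blocks until the model is loaded instead of erroring
    "options": {"use_gpu": True, "wait_for_model": True},
}
response = requests.post(API_URL, headers=headers, data=json.dumps(payload))
print(response.json())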
Hi, it's mostly linked to the fact that models from community libraries are served through this repository. Because authentication is not enforced by this repository, the Docker container will not be able to see any private models and will fail at load time (the load mechanism itself will work).

The reason for this issue is to raise awareness and gather a consensus on the direction. I am in favor of option 1 (or 3), but in my eyes the question of acceleration + GPU will come soon enough, and the amount of features will start piling up and become less community friendly.

Edit: as expected, #34 and #31 expect acceleration + GPU, which are currently not supported (by design).
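As an illustration of that load-time failure: the sketch below shows roughly what a community Docker image does at startup, written with the sentence-transformers loader and reusing the custom repo name from this thread as an example. How the container actually loads models is an assumption here, not confirmed by this thread.

from sentence_transformers import SentenceTransformer

# Works for public repos; for a private repo with no auth token available
# inside the container, the download raises an HTTP error (401/404) and
# the container fails at load time, before serving any request.
model = SentenceTransformer("Matthieu/stsb-xlm-r-multilingual-custom")
print(model.encode("Hello, my name is John").shape)  # e.g. (768,)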
Tea por
@david429429 I'm closing this very old thread. If you want production grade inference, you should try […].