
Run production grade private models #26

Closed
Matthieu-Tinycoaching opened this issue Jun 8, 2021 · 9 comments

@Matthieu-Tinycoaching

Hello,

Following a discussion with @Narsil, he told me that the model hub is not currently meant to load private models, since api-inference-community was originally intended to promote community libraries that use the hub.

However, for running production-grade private models, would it be possible to discuss this possibility internally within the model hub team?

Thanks!

@julien-c
Member

julien-c commented Jun 8, 2021

For clarity, the ability to load private models exists for any model in the transformers library.

Here, which library does your model run in?

@Matthieu-Tinycoaching
Author

@julien-c thanks for the clarification.

To be more precise: in the email exchanges I had with @Narsil and @jeffboudier, the problem is rather about using the Hugging Face Inference API with a private model on the model hub.

My wish is to use hugging face API inference on this sentence transformer model: https://huggingface.co/sentence-transformers/stsb-xlm-r-multilingual

However, the output of this model has to go through mean pooling in order to get sentence embeddings. That's why I customized the model to include this step directly: https://huggingface.co/Matthieu/stsb-xlm-r-multilingual-custom
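For readers unfamiliar with the step being discussed, the mean-pooling operation can be sketched as follows. This is a minimal pure-Python illustration, not the actual sentence-transformers implementation (which operates on torch tensors): token embeddings for real tokens are averaged, while padding tokens (attention mask 0) are ignored.

```python
def mean_pooling(token_embeddings, attention_mask):
    """Average the embeddings of non-padding tokens.

    token_embeddings: list of per-token vectors, shape [seq_len][dim]
    attention_mask:   list of 0/1 flags, 1 marking a real token
    """
    dim = len(token_embeddings[0])
    sums = [0.0] * dim
    count = 0
    for emb, mask in zip(token_embeddings, attention_mask):
        if mask:
            count += 1
            for i, v in enumerate(emb):
                sums[i] += v
    return [s / max(count, 1) for s in sums]

embeddings = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]  # the last token is padding and is excluded from the average
print(mean_pooling(embeddings, mask))  # → [2.0, 3.0]
```

This is what turns the per-token output of the transformer into a single fixed-size sentence embedding.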

Still, the output of the Inference API for this custom model hasn't gone through mean pooling, although I added the sentence-transformers tag. I have made this model public, since @Narsil stated that private models can't natively take this tag into account.

Thanks!

@jeffboudier
Member

Hey Matthieu, thanks for reaching out!

One of the key benefits of the 🤗 Accelerated Inference API is that you can serve any compatible model from the Model Hub, whether shared publicly or uploaded privately to your Hugging Face account.

The subtlety here is that your model comes from the sentence-transformers library, which is integrated with the Model Hub, and with the Inference API through api-inference-community. The custom post-processing behavior you are implementing may not be integrated with the pipeline currently used to serve the model.

For more on this I will let @Narsil comment here or in the email thread.

Cheers,
Jeff

@osanseviero
Member

Hi @Matthieu-Tinycoaching.

I'm not sure if there's a problem. I just tried this and I got the sentence embedding:

import requests

API_URL = "https://api-inference.huggingface.co/models/Matthieu/stsb-xlm-r-multilingual-custom"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

data = query({"inputs": "Hello, my name is John and I live in New York"})
print(len(data))
# 748

From your email, you got {'error': 'Model Matthieu/stsb-xlm-r-multilingual-custom is currently loading', 'estimated_time': 44.490336920000004}. Running the query again after waiting a few seconds should return the actual response.
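That "currently loading" response can be handled programmatically instead of re-running the script by hand. Below is a sketch of a retry loop; `send` is a hypothetical zero-argument callable wrapping the `query` function above (e.g. `lambda: query(payload)`), and `max_wait` is an assumption, not an API parameter.

```python
import time

def query_with_retry(send, max_retries=5, max_wait=10.0):
    """Retry while the API reports the model is still loading.

    `send` performs the request and returns the decoded JSON.
    """
    for _ in range(max_retries):
        data = send()
        # A cold model is signalled by an error dict carrying an
        # 'estimated_time' field, as in the message quoted above.
        if isinstance(data, dict) and "estimated_time" in data:
            time.sleep(min(data["estimated_time"], max_wait))
            continue
        return data
    raise RuntimeError("model did not become available in time")
```

Capping the sleep at `max_wait` keeps the client from blocking for the full estimate when the model often loads sooner than predicted.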

Note that you can also use the feature-extraction widget in your repo to get the sentence embedding, which means things are working as expected. Please let me know if I misunderstood anything.

For future reference, the exact code that does feature-extraction for sentence-transformers can be found here.

@Matthieu-Tinycoaching
Author

Hi @osanseviero, thanks for your feedback. Indeed, the widget works! It seems it was just a matter of time before the model became available.

@jeffboudier @Narsil since the API seems to work, do you need to do anything additional on your side to enable the CPU+GPU accelerated Inference API for this custom model?

Best,
Matthieu

@jeffboudier
Member

jeffboudier commented Jun 8, 2021 via email

@Narsil
Contributor

Narsil commented Jun 9, 2021

Hi, it's mostly linked to the fact that models served through api-inference-community do not pass an auth token, i.e. they do not call:

.from_pretrained(..., use_auth_token=XXXXX).

Because this is not enforced by this repository, the docker images will not be able to see any private models and will fail at load time (even though the load mechanism itself works).
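For reference, loading a private model with an auth token in transformers looks like the sketch below. The model id and token are placeholders, and the call requires valid credentials and network access, so treat it as a configuration fragment rather than a runnable snippet.

```python
import os
from transformers import AutoModel, AutoTokenizer

# Placeholders: replace with your private repo id; read the token from the
# environment rather than hard-coding it.
model_id = "your-username/your-private-model"
token = os.environ.get("HF_TOKEN")

tokenizer = AutoTokenizer.from_pretrained(model_id, use_auth_token=token)
model = AutoModel.from_pretrained(model_id, use_auth_token=token)
```

Without the `use_auth_token` argument, the Hub treats the request as anonymous and private repos appear to not exist, which is exactly the load-time failure described above.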

The reason for this issue is to raise awareness and reach a consensus on the direction.
I can see 3 directions:

  • Force docker images to use the auth token, enabling private models. Caveat: no acceleration on those models, and probably not the same level of support either.
  • Do not force the images to support auth tokens, and make it clearer that private models on community frameworks do not work.
  • Make sentence-transformers an exception: it is so closely aligned with transformers that making it core is much simpler than for other community models (I'm thinking of the spaCy design, for instance).

I am in favor of option 1 (or 3), but in my eyes the question of acceleration + GPU will come up soon enough, and the number of features will start piling up and become less community-friendly.

Edit: as expected, #34 and #31 expect acceleration + GPU, which are not currently supported (by design).

@david429429

Tea por

@Narsil
Contributor

Narsil commented Dec 4, 2023

@david429429 I'm closing this very old thread.

If you want production-grade inference, you should try Spaces (free-form inference) or hf-endpoints (something closer to what is here, also free-form but much simpler to set up if you just want to deploy a given model).

@Narsil Narsil closed this as completed Dec 4, 2023