
[Startup Plan] Don't manage to get CPU optimized inference API #31

Matthieu-Tinycoaching opened this issue Jun 9, 2021 · 6 comments

@Matthieu-Tinycoaching

Hi community,

I have subscribed to a 7-day free trial of the Startup Plan, and I wish to test the CPU-optimized Inference API on this model: https://huggingface.co/Matthieu/stsb-xlm-r-multilingual-custom

However, when using the code below:

import json
import requests

API_URL = "https://api-inference.huggingface.co/models/Matthieu/stsb-xlm-r-multilingual-custom"
headers = {"Authorization": "Bearer API_ORG_TOKEN"}  # replace with your org token

def query(payload):
    # POST the JSON payload and return the parsed body together with the
    # x-compute-type header, which reports the compute used for inference.
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8")), response.headers.get("x-compute-type")

payload1 = {"inputs": "Navigateur Web : Ce logiciel permet d'accéder à des pages web depuis votre ordinateur. Il en existe plusieurs téléchargeables gratuitement comme Google Chrome ou Mozilla. Certains sont même déjà installés comme Safari sur Mac OS et Edge sur Microsoft.", "options": {"use_cache": False}}

sentence_embeddings1, x_compute_type1 = query(payload1)
print(sentence_embeddings1)
print(x_compute_type1)

I get the sentence embeddings, but the x-compute-type header of my request returns cpu, not cpu+optimized. Do I have to request something to get CPU-optimized inference?

Thanks!

@LysandreJik
Member

Maybe of interest to @Narsil

@Narsil
Contributor

Narsil commented Jun 9, 2021

Hi @Matthieu-Tinycoaching, this is linked to #26.

Community images do not implement:

  • private models
  • GPU inference
  • Acceleration

So what you are seeing is normal and expected.
If you don't mind, let's keep the discussion over there, as all three are correlated.

@Matthieu-Tinycoaching
Author

Matthieu-Tinycoaching commented Jun 9, 2021

Hi @Narsil thanks for the feedback.

However, I don't understand how I can then test the accelerated CPU Inference API on my custom public model.

What is testable on the Accelerated Inference API, and what should I expect to gain from the Startup Plan free trial?

@Narsil
Contributor

Narsil commented Jun 9, 2021

Hi, you can test transformers-based models with all the API features, but not sentence-transformers models at the moment.

Also, feature-extraction, even in transformers, does not have every optimization enabled by default.
feature-extraction extracts raw hidden states, so it might be more sensitive to quantization than other pipelines, and we don't know how sensitive end users are to that. It is available for every architecture in transformers, which might also lead to smaller speedups than expected (or sometimes slowdowns) on some architectures if we simply use the defaults.
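For context, the kind of optimization in question is along the lines of PyTorch dynamic quantization; the sketch below is a generic illustration, not necessarily what the hosted API does server-side:

import torch
from transformers import AutoModel

# Generic dynamic-quantization sketch (illustrative only, not the hosted
# API's actual setup): nn.Linear weights are converted to int8, which
# speeds up CPU inference but slightly perturbs the raw hidden states
# that feature-extraction returns.
model = AutoModel.from_pretrained("xlm-roberta-base")
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)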

But if you pin your model, we will be able to run a few tests and optimize this pipeline so you can test performance.

Anticipating a bit: feature-extraction and sentence embeddings are usually very fast, so maybe try batching part of the inputs; it will reduce the HTTP + network overhead of the overall computation. (Simply send a list of strings within inputs instead of a single sentence.)
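
For example, a batched request could look like the following (an untested sketch reusing the query helper from above, and assuming the endpoint accepts a list of strings; see the next comment regarding current batch support):

# Batch several sentences into one request to amortize HTTP + network
# overhead. Untested sketch: assumes the API accepts a list under
# "inputs" and returns one embedding per input.
batch_payload = {
    "inputs": [
        "First sentence to embed.",
        "Second sentence to embed.",
        "Third sentence to embed.",
    ],
    "options": {"use_cache": False},
}

embeddings, compute_type = query(batch_payload)
print(len(embeddings))  # expected: one embedding per input sentence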

@osanseviero
Member

osanseviero commented Jun 9, 2021

Hi @Narsil.

Anticipating a bit: feature-extraction and sentence embeddings are usually very fast, so maybe try batching part of the inputs; it will reduce the HTTP + network overhead of the overall computation. (Simply send a list of strings within inputs instead of a single sentence.)

Please correct me if I'm wrong, but there is no batch support at the moment (although it should be almost trivial to add; it was also requested by @Kvit in UKPLab/sentence-transformers#925 (comment)).

@Matthieu-Tinycoaching
Author

Hi @Narsil

You can test transformers-based models with all the API features, but not sentence-transformers models at the moment.

Thank you for the clarification. Do you have an approximate timeline for when sentence-transformers will be available with all the API features?

I ran some load testing on my public model on the model hub. Since I don't have access to accelerated (CPU or GPU) inference for the moment, I am curious which architecture actually served my CPU load tests on my public custom model. Could you specify the physical characteristics/architecture used, and which pricing tier this corresponds to, since I could test it even with the free plan? This would help me compare my benchmark against different cloud service solutions.

But if you pin your model, we will be able to run a few tests and optimize this pipeline so you can test performance.

I have pinned my custom model on both CPU and GPU devices. Thanks in advance for the optimization on your side, so that I can test performance before the end of my Startup Plan trial!

Anticipating a bit: feature-extraction and sentence embeddings are usually very fast, so maybe try batching part of the inputs; it will reduce the HTTP + network overhead of the overall computation. (Simply send a list of strings within inputs instead of a single sentence.)

As highlighted by @osanseviero, is there no batch support at the moment? Is there any practical tutorial on how to easily batch parts of the inputs and retrieve the corresponding outputs when dealing with a real-time application where each input is a request from a different user? For instance, I imagine something like the client-side micro-batching sketched below.
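
Here is a rough, untested sketch of what I mean (helper names are hypothetical; it reuses the query function from my first message and assumes the API accepts a list of strings):

import queue
import threading

# Hypothetical client-side micro-batching: gather requests from many
# users into one API call, then hand each caller back its own result.
request_queue = queue.Queue()

def batch_worker(batch_size=8, linger=0.05):
    while True:
        batch = [request_queue.get()]  # block until at least one request
        try:
            while len(batch) < batch_size:
                batch.append(request_queue.get(timeout=linger))
        except queue.Empty:
            pass  # flush a partial batch once the linger window expires
        texts = [text for text, _ in batch]
        # One API call for the whole batch (assumes list inputs work).
        embeddings, _ = query({"inputs": texts, "options": {"use_cache": False}})
        for (_, result_slot), embedding in zip(batch, embeddings):
            result_slot.put(embedding)  # deliver each user's embedding

def embed(text):
    # Called once per user request; blocks until the batched result arrives.
    result_slot = queue.Queue(maxsize=1)
    request_queue.put((text, result_slot))
    return result_slot.get()

threading.Thread(target=batch_worker, daemon=True).start()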

Thanks for your time!

@LysandreJik LysandreJik transferred this issue from huggingface/huggingface_hub Mar 16, 2022