Document how to send batched inputs #222

Merged (2 commits) on Apr 2, 2024. Showing changes from 1 commit.

26 changes: 23 additions & 3 deletions docs/source/en/quick_tour.md
@@ -39,12 +39,12 @@ docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingf

<Tip>

Here we pass a `revision=refs/pr/5` because the `safetensors` variant of this model is currently in a pull request.
We also recommend sharing a volume with the Docker container (`volume=$PWD/data`) to avoid downloading weights every run.

</Tip>
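
As a concrete sketch of the Tip above, a deployment that pins a revision and shares a volume can look like this; the model id below is a placeholder, not part of the original guide:

```bash
# Sketch only: substitute the model you are deploying for the placeholder id.
# The shared volume keeps downloaded weights across container runs.
model="<model-id>"
revision=refs/pr/5
volume=$PWD/data
docker run --gpus all -p 8080:80 -v $volume:/data --pull always \
    ghcr.io/huggingface/text-embeddings-inference:1.2 --model-id $model --revision $revision
```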

Once you have deployed a model, you can use the `embed` endpoint by sending requests:

```bash
curl 127.0.0.1:8080/embed \
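    -X POST \
    -d '{"inputs":"Today is a nice day"}' \
    -H 'Content-Type: application/json'
# The three lines above are a sketch that completes the request in the same
# shape as the other examples on this page; the input text is illustrative.
```
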
@@ -72,7 +72,7 @@ volume=$PWD/data
docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.2 --model-id $model --revision $revision
```

Once you have deployed a model, you can use the `rerank` endpoint to rank the similarity between a query and a list
of texts:

```bash
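curl 127.0.0.1:8080/rerank \
    -X POST \
    -d '{"query": "I like you", "texts": ["Today is a nice day", "I hate pineapples"]}' \
    -H 'Content-Type: application/json'
# Sketch only: the request body above assumes the rerank endpoint takes a
# "query" string and a "texts" array; the example strings are illustrative.
```
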
@@ -101,3 +101,23 @@ curl 127.0.0.1:8080/predict \
-d '{"inputs":"I like you."}' \
-H 'Content-Type: application/json'
```
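
If `jq` is available, a quick way to look at just the top prediction is to pipe the response through it; this is a convenience sketch, not part of the original guide, and it assumes the endpoint returns a JSON list of score/label objects:

```bash
# Assumes the /predict response is a JSON array of {"score": ..., "label": ...}
# objects; prints the label of the first (highest-scoring) entry.
curl -s 127.0.0.1:8080/predict \
    -X POST \
    -d '{"inputs":"I like you."}' \
    -H 'Content-Type: application/json' | jq -r '.[0].label'
```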

## Batching

You can send multiple inputs in a batch. For example, for embeddings:

```bash
curl 127.0.0.1:8080/embed \
-X POST \
-d '{"inputs":[["Today is a nice day"], ["I like you"]]}' \
-H 'Content-Type: application/json'
```
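
To confirm that a batched call returns one embedding per input, you can count the top-level elements of the response; this sketch assumes `jq` is installed and that the response is a JSON array with one embedding per input:

```bash
# Should print 2 for the two inputs above, assuming the response is a JSON
# array containing one embedding per input.
curl -s 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":[["Today is a nice day"], ["I like you"]]}' \
    -H 'Content-Type: application/json' | jq 'length'
```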

And for Sequence Classification:

```bash
curl 127.0.0.1:8080/predict \
-X POST \
-d '{"inputs":[["I like you."], ["I hate pineapples"]]}' \
-H 'Content-Type: application/json'
```
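
The same kind of check works for batched classification; this sketch assumes `jq` is installed and that the response is a JSON array with one set of predictions per input:

```bash
# Should print 2, one entry per input in the batch, under the same assumption
# about the response shape.
curl -s 127.0.0.1:8080/predict \
    -X POST \
    -d '{"inputs":[["I like you."], ["I hate pineapples"]]}' \
    -H 'Content-Type: application/json' | jq 'length'
```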