feat: support roberta #62

Merged 2 commits on Nov 6, 2023
README.md (5 additions, 6 deletions)

@@ -53,8 +53,7 @@ such as:

### Supported Models

-You can use any JinaBERT model with Alibi or absolute positions or any BERT, CamemBERT or XLM-RoBERTa model with
-absolute positions in `text-embeddings-inference`.
+You can use any JinaBERT model with Alibi or absolute positions or any BERT, CamemBERT, RoBERTa, or XLM-RoBERTa model with absolute positions in `text-embeddings-inference`.

**Support for other model types will be added in the future.**

@@ -96,8 +95,8 @@ curl 127.0.0.1:8080/embed \
-H 'Content-Type: application/json'
```

**Note:** To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).
We also recommend using NVIDIA drivers with CUDA version 12.0 or higher.

To see all options to serve your models:

@@ -236,7 +235,7 @@ Text Embeddings Inference ships with multiple Docker images that you can use to
| Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-0.3.0 |
| Hopper (H100) | ghcr.io/huggingface/text-embeddings-inference:hopper-0.3.0 (experimental) |

**Warning**: Flash Attention is turned off by default for the Turing image as it suffers from precision issues.
You can turn Flash Attention v1 ON by using the `USE_FLASH_ATTENTION=True` environment variable.

### API documentation
@@ -329,7 +328,7 @@ cargo install --path router -F candle-cuda-turing --no-default-features
cargo install --path router -F candle-cuda --no-default-features
```

You can now launch Text Embeddings Inference on GPU with:

```shell
model=BAAI/bge-large-en-v1.5
```

backends/candle/src/lib.rs (1 addition, 0 deletions)

@@ -41,6 +41,7 @@ impl CandleBackend {
if config.model_type != Some("bert".to_string())
    && config.model_type != Some("xlm-roberta".to_string())
    && config.model_type != Some("camembert".to_string())
+   && config.model_type != Some("roberta".to_string())
{
    return Err(BackendError::Start(format!(
        "Model {:?} is not supported",

router/src/main.rs (8 additions, 6 deletions)

@@ -215,12 +215,14 @@ async fn main() -> Result<()> {
tokenizer.with_padding(None);

// Position IDs offset. Used for Roberta and camembert.
-let position_offset =
-    if &config.model_type == "xlm-roberta" || &config.model_type == "camembert" {
-        config.pad_token_id + 1
-    } else {
-        0
-    };
+let position_offset = if &config.model_type == "xlm-roberta"
+    || &config.model_type == "camembert"
+    || &config.model_type == "roberta"
+{
+    config.pad_token_id + 1
+} else {
+    0
+};
let max_input_length = config.max_position_embeddings - position_offset;

let tokenization_workers = args
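
RoBERTa-family checkpoints reserve position IDs `0..=pad_token_id` for padding, so real token positions start at `pad_token_id + 1`; that is why the router now subtracts the offset for `roberta` as well when computing `max_input_length`. A worked example of the arithmetic, assuming the config values of a typical `roberta-base` checkpoint (`pad_token_id = 1`, `max_position_embeddings = 514`):

```rust
// Worked example of the position-offset arithmetic above, not repository code.
// The config values assume a typical `roberta-base` checkpoint.
fn main() {
    let model_type = "roberta";
    let pad_token_id: usize = 1;
    let max_position_embeddings: usize = 514;

    // RoBERTa-family models reserve position IDs 0..=pad_token_id,
    // so token positions start at pad_token_id + 1.
    let position_offset = match model_type {
        "xlm-roberta" | "camembert" | "roberta" => pad_token_id + 1,
        _ => 0,
    };

    let max_input_length = max_position_embeddings - position_offset;
    assert_eq!(position_offset, 2);
    assert_eq!(max_input_length, 512); // the familiar 512-token limit
}
```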