Closed
Description
In my case (serving a model whose architecture is the same as BLOOM), it looks like truncation is not applied. I think this behavior started after 9987960. How can I fix this?
The server is launched with (the model at `--model-id` uses the same architecture as BLOOM):

```shell
text-generation-launcher \
    --model-id /mount/lm_storage/checkpoints/alibi_2048_1.3b_v2 \
    --num-shard 4 \
    --port 6006 \
    --max-input-length 2048 \
    --max-total-tokens 2560
```

The client request:

```python
from text_generation import Client

client = Client(base_url=f"{my_url}")
text = "how are you?" * 1000
a = client.generate(text, max_new_tokens=1, truncate=2048)
```

This raises:

```
text_generation.errors.ValidationError: Input validation error: `inputs` tokens + `max_new_tokens` must be <= 2560. Given: 5000 `inputs` tokens and 1 `max_new_tokens`
```
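To illustrate the mismatch, here is a minimal sketch of the validation arithmetic implied by the error message. This is not the actual server code; `validate_request` is a hypothetical helper, and the assumption being tested is that `truncate` should cap the input token count *before* the `inputs + max_new_tokens <= max-total-tokens` check runs (the behavior the error suggests is missing):

```python
def validate_request(n_input_tokens, max_new_tokens,
                     max_total_tokens=2560, truncate=None):
    """Sketch of the length check implied by the ValidationError.

    If `truncate` were applied first, the input would be capped before
    the total-token budget is checked. Hypothetical helper, not TGI code.
    """
    if truncate is not None:
        # Cap the effective input length, as `truncate=2048` is expected to do.
        n_input_tokens = min(n_input_tokens, truncate)
    return n_input_tokens + max_new_tokens <= max_total_tokens

# With truncation applied first, the request in this issue would pass:
print(validate_request(5000, 1, truncate=2048))  # True  (2048 + 1 <= 2560)
# Without truncation, it fails exactly as reported:
print(validate_request(5000, 1))                 # False (5000 + 1 > 2560)
```

This matches the reported error: the server counted all 5000 input tokens, so `truncate=2048` evidently never took effect before validation.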