
Some inferences take forever to complete #450

Open
gaspard-dv opened this issue Dec 19, 2023 · 5 comments
Labels
enhancement, optimization (Related to performance optimizations), structured generation (Linked to structured generation)

Comments


gaspard-dv commented Dec 19, 2023

Issue description

The issue was raised by other people on Discord too.

To quote one of them:

I'm running the same query 10 times (with equivalent prompts and output sizes), but some inferences are taking abnormally longer than others.

their screenshot: [image]

Repro

I made a reproduction code snippet that can run in Google Colab (w/ free T4 GPU):

💻 Code snippet
# Install dependencies first (shell): pip install outlines==0.0.13 transformers datasets optimum auto-gptq accelerate
from outlines import models
from outlines.text.generate import json, continuation
from json import dumps
from time import perf_counter
import torch


prompt = """<|system|>
You are a friendly AI assistant.
You're specialized in mathematics and open source Github repositories.
Your answers must be concise and factual.</s>
<|user|>
Write a very long poem</s>
<|assistant|>
"""
output_format = {
    "type": "object",
    "properties": {
        "poem": {"type": "string"}
    }
}
model = models.transformers("TheBloke/zephyr-7B-beta-GPTQ", device="cuda")
rng = torch.Generator(device="cuda")
rng.manual_seed(789001)

errors = []
for i in range(20):
  start_time = perf_counter()
  try:
    sequence = json(model, dumps(output_format))(prompt, rng=rng)
    poem = sequence.get('poem')
    elapsed_time = round(perf_counter() - start_time)
    n_characters_per_second = len(poem) // elapsed_time
    print(f"{i}\t{elapsed_time}\t{n_characters_per_second}\t{poem[:30]}..")
  except Exception as e:
    errors.append(e)
    elapsed_time = round(perf_counter() - start_time)  # recompute so the failure time isn't stale from the previous iteration
    print(f"{i}\t{elapsed_time}\tINFERENCE FAILED")
📃 Output
0	14	76	In the vastness of cosmic spac..
1	14	INFERENCE FAILED
2	769	0	In this universe, a vast expan..
3	389	0	In ancient lands, where skies ..
4	16	67	In the depths of the cosmos, w..
5	35	70	In the stillness of the mornin..
6	32	60	In a universe vast and unceasi..
7	13	77	75000 lines of blank verse, hi..
8	22	69	In a land of purest light, Who..
9	34	59	A cosmic dance of stars, a sym..
10	49	68	In the land of the digit, wher..
11	34	78	In a world vast and unknown,  ..
12	43	68	There was a time when words we..
13	54	70	In a world where chaos reigns..
14	12	62	Let the words unfurl like the ..
15	330	0	Infinity beckons from the far ..
16	31	60	In the depths of the universe,..
17	137	0	In this vast expanse of time a..
18	32	81	in this universe vast and unfa..
💥 Exceptions raised
import traceback

for error in errors:
    try:
        raise error
    except Exception as e:
        traceback.print_exc()

Traceback (most recent call last):
  File "<ipython-input-6-d8471672a411>", line 5, in <cell line: 3>
    raise error
  File "<ipython-input-5-1a425bb0404a>", line 8, in <cell line: 5>
    sequence = json(model, dumps(output_format))(prompt, rng=rng)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/outlines/text/generate/sequence.py", line 240, in __call__
    result = self.postprocess_completions(result)
  File "/usr/local/lib/python3.10/dist-packages/outlines/text/generate/regex.py", line 226, in postprocess_completions
    return [self.format_fn(result) for result in results]
  File "/usr/local/lib/python3.10/dist-packages/outlines/text/generate/regex.py", line 226, in <listcomp>
    return [self.format_fn(result) for result in results]
  File "/usr/local/lib/python3.10/dist-packages/outlines/text/generate/regex.py", line 397, in <lambda>
    format_fn = lambda x: pyjson.loads(x)
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid \escape: line 2 column 570 (char 571)

Results

  • 14 inferences succeeded fast
  • 5 inferences succeeded but were extremely slow (indices: 2, 3, 15, 17, 19)
  • 💥 1 inference failed fast (index: 1)

Outlines/Python version information:

Outlines 0.0.13
Python 3.10.12
gaspard-dv added the bug label Dec 19, 2023
rlouf (Member) commented Dec 19, 2023

Thank you so much for the detailed report! Will come back to you shortly.

rlouf added the enhancement, optimization, and structured generation labels and removed the bug label Dec 19, 2023
brandonwillard (Contributor) commented Dec 20, 2023

These timing results contain significant non-inference setup steps (e.g. json(model, dumps(output_format))).

gaspard-dv (Author) commented
Yes indeed!
json(model, dumps(output_format)) takes a few seconds to complete and shouldn't be in the for-loop.
But this is not the step that gets "stuck".
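
For context, a minimal sketch of that change, reusing the names from the repro snippet above (0.0.13-era API; adjust to your installed version), so that only the generation call itself is timed:

generator = json(model, dumps(output_format))  # built once, outside the loop; takes a few seconds

for i in range(20):
    start_time = perf_counter()
    sequence = generator(prompt, rng=rng)  # only inference is inside the timer now
    print(f"{i}\t{round(perf_counter() - start_time)}")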

rlouf (Member) commented Jan 7, 2024

It would still be nice to have results without it in the loop, and to use cProfile to understand which step "gets stuck". To get comparable experimental conditions I would also use the maxLength field constraint.
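
A minimal profiling sketch along those lines, reusing the hoisted generator and the prompt/rng names from the repro snippet above:

import cProfile
import pstats

# Profile a single generation call; setup stays outside the profiled block.
with cProfile.Profile() as profiler:
    generator(prompt, rng=rng)

# Show the 20 most expensive calls by cumulative time to see where a slow run spends it.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)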

lapp0 (Collaborator) commented May 9, 2024

Please try

from pydantic import BaseModel

class OutputModel(BaseModel):
    poem: str

And pass OutputModel instead of output_format. This ensures the schema includes 'required': ['poem'], so no generation can omit the poem key.

Additionally, you will need to set whitespace_pattern as explained here #690 (comment)

json(model, dumps(output_format), whitespace_pattern=r"[ ]?")...

With these changes, your script works for me and doesn't produce any slow or failed inferences.
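
For reference, a sketch of both suggestions combined. This assumes a newer outlines release where outlines.generate.json accepts a Pydantic model and the whitespace_pattern keyword (per #690), so the import paths differ from the 0.0.13 snippet above:

from pydantic import BaseModel
from outlines import models, generate

class OutputModel(BaseModel):
    poem: str  # required by default, so a generation cannot omit the key

model = models.transformers("TheBloke/zephyr-7B-beta-GPTQ", device="cuda")

# A single optional space between JSON tokens avoids runaway whitespace loops.
generator = generate.json(model, OutputModel, whitespace_pattern=r"[ ]?")
result = generator("Write a very long poem")  # returns an OutputModel instance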

rlouf pushed a commit that referenced this issue May 24, 2024
Fixes #839 #908 #690 #450

## Problem

A major problem, especially with smaller language models, is repetition.

For example, suppose a model generating JSON must emit 12 space tokens of indentation. The model will often assign a high probability to a 13th space token, then to a 14th, and eventually enter an infinite space-generation loop.

This problem has been known in NLG for half a decade, but it only has mitigations (mirostat, repetition penalty, using hundreds of billions of weights, etc.) and no absolute solution, except for **structured generation**.

## Solution

For structured JSON generation, we set a sane default whitespace pattern of `r"[ ]?"`. This removes all newlines and indentation; it disallows any syntactic whitespace beyond a single space separator.

Users can still set the `whitespace_pattern=` argument if they want different behavior.
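
For illustration, a hedged sketch of overriding that default, reusing the names from the sketch a few comments above (note that a broader pattern reintroduces some risk of whitespace loops):

# Permit newlines and a little indentation instead of the single-space default.
generator = generate.json(model, OutputModel, whitespace_pattern=r"[\n ]{0,4}")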