High latency for simple CFG generation #617
-
Describe the issue as clearly as possible:
I was trying to use outlines with the CFG listed here, along with the TinyLlama model, on a GPU machine. The latency is very high even for a simple grammar like this: it takes 20+ seconds to generate a response for a simple prompt like "generate a random policy".

Steps/code to reproduce the bug:
```python
# Install required libraries. Skipping that here.
# Also, run this on a GPU. On a CPU, this will take several minutes.
import outlines
import time
model = outlines.models.transformers("TinyLlama/TinyLlama-1.1B-Chat-v0.6", device="cuda")
policy_grammar = r'''
start: policy
policy : START_OBJECT _NL version_block _NL id_block _NL statement_block _NL END_OBJECT
version_block: ("Version:\t") ("2008-10-17" | "2012-10-17")
id_block: ("Id:\t") WORD
EOL: /(\n)/
_NL: /(\t\n)/
statement_block: ("Statement:\t") START_ARRAY _NL "\t" statement~2..4 _NL "\t" END_ARRAY
statement: START_OBJECT _NL "\t" sid_block _NL "\t" principal_block _NL "\t" effect_block _NL "\t" action_block _NL "\t"
...
INTEGER : /[0-9]{2,5}/
//WS: " "
WORD: /"[\w\d]{2,5}"/ // Alternative: /[^-:#()\[\]{}\n\s]{3,10}/
LONG_WORD: /"[\w\d]{6,10}"/
START_OBJECT: "{"
END_OBJECT: "}"
START_ARRAY: "["
END_ARRAY: "]"
COMMA: ","
COLON: ":"
//STRING: "@:/[^"]*/"
//INT: [0.9]+;
%import common.NUMBER
%import common.STRING
%import common.WS
%import common.WS_INLINE
%import common.NEWLINE
%ignore NEWLINE
//%ignore WS_INLINE
'''
generator = outlines.generate.cfg(model, policy_grammar)
start_time = time.time()
sequence = generator("Generate some random policy.")
print("Total time for generation: ", time.time() - start_time) Expected result:Total time for generation < 3 seconds Error message:No error but the time taken is 20+ seconds in each run which is unacceptable for practical usage. Outlines/Python version information:Version information
```
0.0.18
Python 3.11.6
aiohttp==3.9.1
aiosignal==1.3.1
annotated-types==0.6.0
anyio==4.1.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==2.4.1
async-lru==2.0.4
attrs==23.1.0
aws-lambda-powertools==2.32.0
awscurl==0.32
Babel==2.13.1
backoff==2.2.1
beartype==0.16.4
beautifulsoup4==4.12.2
bleach==6.1.0
boto3==1.33.7
botocore==1.33.7
certifi==2023.11.17
cffi==1.16.0
chardet==5.2.0
charset-normalizer==3.3.2
clarifai==9.11.1
clarifai-grpc==9.11.5
click==8.1.7
cloudpickle==2.2.1
cohere==4.40
comm==0.2.0
ConfigArgParse==1.7
configparser==6.0.0
contextlib2==21.6.0
dataclasses-json==0.6.3
datasets==2.15.0
debugpy==1.8.0
decorator==5.1.1
defusedxml==0.7.1
dill==0.3.7
distro==1.9.0
docker==7.0.0
executing==2.0.1
fastapi==0.95.2
fastavro==1.9.2
fastjsonschema==2.19.0
filelock==3.13.1
fqdn==1.5.1
frozenlist==1.4.1
fsspec==2023.10.0
glibc==0.6.1
google-pasta==0.2.0
googleapis-common-protos==1.62.0
greenlet==3.0.3
grpcio==1.60.1
grpcio-tools==1.60.1
h11==0.14.0
httpcore==1.0.2
httpx==0.26.0
huggingface-hub==0.19.4
icontract==2.6.6
idna==3.6
importlib-metadata==6.11.0
inquirerpy==0.3.4
InstructorEmbedding==1.0.1
interegular==0.3.2
ipykernel==6.27.1
ipython==8.18.1
ipywidgets==8.1.1
isoduration==20.11.0
jedi==0.19.1
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.3.2
json5==0.9.14
jsonpatch==1.33
jsonpointer==2.4
jsonschema==4.20.0
jsonschema-specifications==2023.11.2
jupyter==1.0.0
jupyter-console==6.6.3
jupyter-events==0.9.0
jupyter-lsp==2.2.1
jupyter_client==8.6.0
jupyter_core==5.5.0
jupyter_server==2.11.2
jupyter_server_terminals==0.4.4
jupyterlab==4.0.9
jupyterlab-widgets==3.0.9
jupyterlab_pygments==0.3.0
jupyterlab_server==2.25.2
langchain==0.1.0
langchain-community==0.0.10
langchain-core==0.1.8
langsmith==0.0.78
lark==1.1.8
llvmlite==0.41.1
manifest-ml==0.0.1
markdown-it-py==3.0.0
MarkupSafe==2.1.3
marshmallow==3.20.1
matplotlib-inline==0.1.6
mdurl==0.1.2
mistune==3.0.2
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.15
mypy-extensions==1.0.0
nbclient==0.9.0
nbconvert==7.12.0
nbformat==5.9.2
nest-asyncio==1.5.8
networkx==3.2.1
nlpcloud==1.1.45
nltk==3.8.1
notebook==7.0.6
notebook_shim==0.2.3
numba==0.58.1
numpy==1.26.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
openai==1.6.1
opencv-python==4.9.0.80
openlm==0.0.5
outlines==0.0.18
overrides==7.4.0
packaging==23.2
pandas==2.1.3
pandocfilters==1.5.0
parso==0.8.3
pathos==0.3.1
perscache==0.6.1
pexpect==4.9.0
pfzy==0.3.4
Pillow==10.1.0
platformdirs==4.1.0
pox==0.3.3
ppft==1.7.6.7
prometheus-client==0.19.0
prompt-toolkit==3.0.41
protobuf==4.25.1
protobuf-to-pydantic==0.2.3
psutil==5.9.6
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==14.0.2
pyarrow-hotfix==0.6
pycparser==2.21
pydantic==1.10.13
pydantic_core==2.14.5
Pygments==2.17.2
python-dateutil==2.8.2
python-json-logger==2.0.7
python-rapidjson==1.14
pytz==2023.3.post1
PyYAML==6.0.1
pyzmq==25.1.1
qtconsole==5.5.1
QtPy==2.4.1
redis==5.0.1
referencing==0.31.1
regex==2023.10.3
requests==2.31.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.7.0
rpds-py==0.13.2
s3transfer==0.8.2
safetensors==0.4.1
sagemaker==2.203.1
schema==0.7.5
scikit-learn==1.3.2
scipy==1.11.4
Send2Trash==1.8.2
sentence-transformers==2.2.2
sentencepiece==0.1.99
six==1.16.0
smdebug-rulesconfig==1.0.1
sniffio==1.3.0
soupsieve==2.5
SQLAlchemy==2.0.25
sqlitedict==2.1.0
stack-data==0.6.3
starlette==0.27.0
sympy==1.12
tblib==2.0.0
tenacity==8.2.3
terminado==0.18.0
threadpoolctl==3.2.0
tinycss2==1.2.1
tokenizers==0.15.0
torch==2.1.1
torchvision==0.16.1
tornado==6.4
tqdm==4.66.1
traitlets==5.14.0
transformers==4.35.2
triton==2.1.0
tritonclient==2.41.1
types-python-dateutil==2.8.19.14
typing-inspect==0.9.0
typing_extensions==4.8.0
tzdata==2023.3
uri-template==1.3.0
urllib3==1.26.18
uvicorn==0.22.0
wcwidth==0.2.12
webcolors==1.13
webencodings==0.5.1
websocket-client==1.7.0
widgetsnbextension==4.0.9
xxhash==3.4.1
yarl==1.9.4
zipp==3.17.0
```
Context for the issue:
No response
Replies: 6 comments 4 replies
-
I see you're on `0.0.18`.
-
Ran it with version
-
In newer versions of outlines, the automata are constructed at runtime and cached for future runs. The initial run is slow, but later runs should be faster. There is plenty of room for improvement, though: while regex generation is extremely fast, CFG generation still has some problems to be solved.
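To illustrate, here is a minimal, unverified sketch that reuses the `model` and `policy_grammar` from the report above; timing a cold call against a warm call separates the one-time automaton construction from the per-request generation cost:

```python
import time

import outlines

# Assumes the `policy_grammar` string from the report above is already defined.
model = outlines.models.transformers(
    "TinyLlama/TinyLlama-1.1B-Chat-v0.6", device="cuda"
)
generator = outlines.generate.cfg(model, policy_grammar)

# First call pays the one-time cost of constructing (and caching) the automata.
start = time.time()
generator("Generate some random policy.")
print("cold run:", time.time() - start)

# Later calls with the same grammar should hit the cache and run faster.
start = time.time()
generator("Generate another random policy.")
print("warm run:", time.time() - start)
```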
-
CFG-structured generation is very much a WIP implementation. As @lapp0 said, regex-structured generation (and JSON by extension) should be fast.
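For comparison, a minimal sketch of the regex-structured path; the pattern below is only an illustrative stand-in for one small piece of the grammar above, not a replacement for it:

```python
import outlines

model = outlines.models.transformers(
    "TinyLlama/TinyLlama-1.1B-Chat-v0.6", device="cuda"
)

# Illustrative pattern only: constrain the output to one of the two
# policy version strings from the grammar.
generator = outlines.generate.regex(model, r"(2008|2012)-10-17")
print(generator("Which policy version should I use?"))
```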
-
Quoting from the blog:

Are there any pointers for generating a regex from a JSON schema? I have a JSON schema from a Pydantic model and would like to convert it to a regex and try that as well. @rlouf @lapp0

Edit: I think I found it in the source code.
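As a sketch of the public route (the exact internal schema-to-regex helper differs between outlines versions, so this sticks to the documented `outlines.generate.json` entry point, which performs that conversion internally); the `Policy` model here is a hypothetical placeholder:

```python
from pydantic import BaseModel

import outlines


class Policy(BaseModel):
    # Hypothetical stand-in for the actual Pydantic model.
    Version: str
    Id: str


model = outlines.models.transformers(
    "TinyLlama/TinyLlama-1.1B-Chat-v0.6", device="cuda"
)

# outlines converts the model's JSON schema into a regex internally and
# uses the regex-structured path for generation.
generator = outlines.generate.json(model, Policy)
print(generator("Generate some random policy."))
```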
-
Only the regex-guided generation in `outlines` uses the efficient/optimal approach we described in our paper, to which that statement is referring. The community-provided CFG-guided generation takes a different approach and does not offer similar performance guarantees.