LMQL fails to use llama-cpp-python to load models #297

Open
klutzydrummer opened this issue Dec 13, 2023 · 3 comments
Labels
question Questions about using LMQL.

Comments

@klutzydrummer

I run into multiple issues when trying to use LMQL in Colab.

When running a query with the verbose flag unset or set to False, I get this error:

Code:

import requests
from pathlib import Path
import lmql

model_file = "/content/zephyr-7b-beta.Q4_K_M.gguf"
if Path(model_file).exists():
    print("Model file exists")
else:
    print("Model file does not exist")
    # download model weights
    model_url = "https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/resolve/main/zephyr-7b-beta.Q4_K_M.gguf"
    r = requests.get(model_url)
    with open(model_file, "wb") as f:
        f.write(r.content)

query_string = """
"Q: What is the sentiment of the following review: ```The food was very good.```?\\n"
"A: [SENTIMENT]"
"""

lmql.run_sync(
    query_string, 
    model = lmql.model(f"local:llama.cpp:{model_file}", 
        tokenizer = 'HuggingFaceH4/zephyr-7b-beta', verbose=False))

Output:

[Loading llama.cpp model from  llama.cpp:/content/zephyr-7b-beta.Q4_K_M.gguf  with {'verbose': False} ]
Exception in thread scheduler-worker:
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lmql/models/lmtp/lmtp_scheduler.py", line 269, in worker
    model = LMTPModel.load(self.model_identifier, **self.model_args)
  File "/usr/local/lib/python3.10/dist-packages/lmql/models/lmtp/backends/lmtp_model.py", line 41, in load
    return LMTPModel.registry[backend_name](model_name, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lmql/models/lmtp/backends/llama_cpp_model.py", line 22, in __init__
    self.llm = Llama(model_path=model_identifier[len("llama.cpp:"):], logits_all=True, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/llama_cpp/llama.py", line 841, in __init__
    with suppress_stdout_stderr(disable=self.verbose):
  File "/usr/local/lib/python3.10/dist-packages/llama_cpp/_utils.py", line 23, in __enter__
    self.old_stdout_fileno_undup = self.sys.stdout.fileno()
io.UnsupportedOperation: fileno
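
For what it's worth, the failure reproduces without llama.cpp at all: in a Colab/Jupyter kernel, sys.stdout is usually not backed by a real file descriptor, which is exactly what llama_cpp's suppress_stdout_stderr (used when verbose=False) trips over. A minimal check along these lines shows it, assuming a notebook kernel:

import io
import sys

# In a notebook kernel, sys.stdout is typically an ipykernel stream without
# a real OS-level file descriptor, so fileno() raises instead of returning one.
try:
    sys.stdout.fileno()
    print("stdout has a real file descriptor; suppress_stdout_stderr would work")
except io.UnsupportedOperation:
    print("stdout has no file descriptor; this is the same failure as in the traceback above")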

Restarting the runtime (or deleting the instance and spinning up a new one) and setting verbose to True gets past that part of the code, but execution never progresses beyond the following:
Code:

import requests
from pathlib import Path
import lmql

model_file = "/content/zephyr-7b-beta.Q4_K_M.gguf"
if Path(model_file).exists():
    print("Model file exists")
else:
    print("Model file does not exist")
    # download model weights
    model_url = "https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/resolve/main/zephyr-7b-beta.Q4_K_M.gguf"
    r = requests.get(model_url)
    with open(model_file, "wb") as f:
        f.write(r.content)

query_string = """
"Q: What is the sentiment of the following review: ```The food was very good.```?\\n"
"A: [SENTIMENT]"
"""

lmql.run_sync(
    query_string, 
    model = lmql.model(f"local:llama.cpp:{model_file}", 
        tokenizer = 'HuggingFaceH4/zephyr-7b-beta', verbose=True))

Output:

[Loading llama.cpp model from llama.cpp:/content/zephyr-7b-beta.Q4_K_M.gguf  with  {'verbose': True} ]
lmtp generate: [1, 28824, 28747, 1824, 349, 272, 21790, 302, 272, 2296, 4058, 28747, 8789, 1014, 2887, 403, 1215, 1179, 28723, 13940, 28832, 28804, 13, 28741, 28747, 28705] / '<s>Q: What is the sentiment of the following review: ```The food was very good.```?\nA: ' (26 tokens, temperature=0.0, max_tokens=128)

GPU RAM never increases; system RAM usage sometimes increases, but not consistently. It appears the model never actually loads, so no inference is ever performed.

I've tried following examples from both the documentation and from other sites with no luck. A minimal example of the error can be found in the following Colab notebook:
https://colab.research.google.com/drive/1aND43pi3v11fW_2kTYDHLaoxXV69HPWq?usp=sharing

@klutzydrummer klutzydrummer changed the title LMQL fails to use llama to load models LMQL fails to use llama-cpp-python to load models Dec 13, 2023
@klutzydrummer
Author

I realize the verbosity error is a llama-cpp-python bug that already has an open issue.

I've updated my example to show the LMQL-specific issue I am encountering.
I can load the LLM with llama-cpp-python directly and get completions; however, when I try to use LMQL I still get no output, just hanging.
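
For reference, the direct check looks roughly like this (a minimal sketch; model_file is the same GGUF path as in the snippets above, and n_ctx/max_tokens are arbitrary placeholders):

from llama_cpp import Llama

# Load the same GGUF file with llama-cpp-python directly, bypassing LMQL.
llm = Llama(model_path=model_file, n_ctx=2048, verbose=True)

# A single plain completion to confirm the model loads and can generate.
out = llm("Q: What is the sentiment of: The food was very good.\nA:", max_tokens=16)
print(out["choices"][0]["text"])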

@lbeurerkellner
Collaborator

lbeurerkellner commented Dec 17, 2023

Hi there, I just tried to reproduce this on my workstation, and it all seems to work. Can you make sure to re-install llama.cpp with the correct build flags to enable GPU support?
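
For example, from a Colab cell, something along these lines should rebuild llama-cpp-python with cuBLAS enabled (the CMAKE_ARGS value is a sketch based on the llama-cpp-python build instructions at the time; please check the project's README for your version):

import os
import subprocess

# Rebuild llama-cpp-python from source with GPU (cuBLAS) support enabled.
env = {**os.environ, "CMAKE_ARGS": "-DLLAMA_CUBLAS=on", "FORCE_CMAKE": "1"}
subprocess.run(
    ["pip", "install", "--force-reinstall", "--no-cache-dir", "llama-cpp-python"],
    env=env, check=True,
)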

My test code:

import requests
from pathlib import Path
import lmql

model_file = "/home/luca/repos/lmql/zephyr-7b-beta.Q5_K_M.gguf"

query_string = """
"Q: What is the sentiment of the following review: ```The food was very good.```?\\n"
"A: [SENTIMENT]" where len(TOKENS(SENTIMENT)) < 32
"""

result = lmql.run_sync(
    query_string, 
    model = lmql.model(f"local:llama.cpp:{model_file}", 
        tokenizer = 'HuggingFaceH4/zephyr-7b-beta', 
        verbose=False
    ))

print([result])

Note that I put a token constraint on SENTIMENT, just to make sure we don't run into termination issues here. For Colab, please note that the root cause of termination problems is sometimes asyncio-related, so maybe try running in a standalone script first.
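
If the event loop is indeed the problem, one thing to try in a notebook is awaiting the async entry point directly instead of calling run_sync inside the already-running loop; a sketch, assuming the same query_string and model_file as above:

import lmql

# In Colab/Jupyter an asyncio loop is already running, so top-level await of
# the async API avoids nesting run_sync inside that loop.
result = await lmql.run(
    query_string,
    model=lmql.model(f"local:llama.cpp:{model_file}",
                     tokenizer='HuggingFaceH4/zephyr-7b-beta'))
print(result)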

After re-installing llama.cpp, do you still see the same issue?

@lbeurerkellner lbeurerkellner added the question Questions about using LMQL. label Dec 17, 2023
@lawyinking

[Screenshot 2024-01-21 20:33:55]

Not sure why I got this error: lmql has no attribute 'run_sync'.
