NeMo-Guardrails does not work with many other LLM providers #50

Open · serhatgktp opened this issue Jun 21, 2023 · 13 comments

serhatgktp (Contributor) commented Jun 21, 2023

This issue somewhat overlaps with #27. However, I've chosen to create this issue because it was mentioned in the thread of #27 that support for other LLM models would be added by the end of May. Support was seemingly added with commit e849ee9, but I am unable to use several of the engines due to various bugs.

NeMo-Guardrails doesn't seem to work with the following engines:

  • huggingface_pipeline
  • huggingface_textgen_inference
  • gpt4all

I am not claiming anything about other engines; I have only tested these three, and I was unable to interact with the model using any of them.

I have included my configuration and output for gpt4all only. Attempting to use the other two engines above also causes similar issues. If you are able to construct a chatbot with guardrails using any of these engines, please let me know and I will re-evaluate.

Here are my configurations and code:

./config/colang_config.co:

define user express greeting
  "hi"

define bot remove last message
  "(remove last message)"

define flow
  user ...
  bot respond
  $updated_msg = execute check_if_constitutional
  if $updated_msg != $last_bot_message
    bot remove last message
    bot $updated_msg

# Basic guardrail against insults.
define flow
  user express insult
  bot express calmly willingness to help

./config/yaml_config.yaml:

models:
  - type: main
    engine: gpt4all
    model: gpt4all-j-v1.3-groovy

./demo_guardrails.py:

from nemoguardrails.rails import LLMRails, RailsConfig


def demo():
    # In practice, a folder will be used with the config split across multiple files.
    config = RailsConfig.from_path("./config")
    rails = LLMRails(config)

    # For chat
    new_message = rails.generate(messages=[{
        "role": "user",
        "content": "Hello! What can you do for me?"
    }])

    print("RESPONSE:", new_message)


if __name__ == "__main__":
    demo()

After running demo_guardrails.py with the above configurations, I receive the following output:

Traceback (most recent call last):
  File "/Users/efkan/anaconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/efkan/anaconda3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/efkan/.vscode/extensions/ms-python.python-2023.10.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/Users/efkan/.vscode/extensions/ms-python.python-2023.10.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/Users/efkan/.vscode/extensions/ms-python.python-2023.10.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/Users/efkan/.vscode/extensions/ms-python.python-2023.10.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/Users/efkan/.vscode/extensions/ms-python.python-2023.10.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/Users/efkan/.vscode/extensions/ms-python.python-2023.10.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/Users/efkan/Desktop/repos/ibm-repos-private/demo_guardrails.py", line 17, in <module>
    demo()
  File "/Users/efkan/Desktop/repos/ibm-repos-private/demo_guardrails.py", line 6, in demo
    rails = LLMRails(config)
  File "/Users/efkan/anaconda3/lib/python3.10/site-packages/nemoguardrails/rails/llm/llmrails.py", line 79, in __init__
    self._init_llm()
  File "/Users/efkan/anaconda3/lib/python3.10/site-packages/nemoguardrails/rails/llm/llmrails.py", line 143, in _init_llm
    self.llm = provider_cls(**kwargs)
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for GPT4All
__root__
  Model.__init__() got an unexpected keyword argument 'n_parts' (type=type_error)

It seems that the code fails when initializing the GPT4All model. Please let me know if I've missed anything. Thanks!

drazvan self-assigned this Jun 22, 2023

drazvan (Collaborator) commented Jun 22, 2023

Hi @serhatgktp! Yes, at the end of May we made it technically possible to connect multiple types of LLMs. However, to get them to work properly, the prompts also need to be tweaked. We've been working on a set of mechanisms for that; they will be pushed to the repo next week and released to PyPI at the end of next week as 0.3.0. We did manage to use huggingface_pipeline successfully. I will look into your particular config and get back to you (there seems to be a different type of issue). Thanks!

drazvan (Collaborator) commented Jul 1, 2023

@serhatgktp: you can check out this example: https://github.com/NVIDIA/NeMo-Guardrails/tree/main/examples/llm/hf_pipeline_dolly for how to use HuggingFacePipeline to run a local model. I did not get a chance to look into your specific configuration just yet. Let me know if this helps.

serhatgktp (Contributor, Author) commented Jul 3, 2023

Thanks @drazvan, this example seems to work! However, I had to modify the configuration because I was getting the following error regarding the device parameter:

/Users/efkan/anaconda3/lib/python3.10/site-packages/langchain/llms/huggingface_pipeline.py:106 in from_model_id

    103
    104             cuda_device_count = torch.cuda.device_count()
    105             if device < -1 or (device >= cuda_device_count):
  ❱ 106                 raise ValueError(
    107                     f"Got device=={device}, "
    108                     f"device is required to be within [-1, {cuda_device_count})"
    109                 )

ValueError: Got device==0, device is required to be within [-1, 0)

I modified config.py to exclude device from the initialization of the LLM so that it uses the default value instead, as shown below:

from functools import lru_cache

from langchain.llms import HuggingFacePipeline


@lru_cache
def get_dolly_v2_3b_llm():
    repo_id = "databricks/dolly-v2-3b"
    params = {"temperature": 0, "max_length": 1024}
    # `device` is intentionally not passed, so the default (CPU) is used.
    llm = HuggingFacePipeline.from_model_id(
        model_id=repo_id, task="text-generation", model_kwargs=params
    )
    return llm

It seems that my computer does not have any CUDA-enabled GPUs, which causes an issue when we try to select the "first" CUDA device.
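
For reference, a small guard along these lines (just a sketch on my side) would let the same code work on machines with and without GPUs, since from_model_id also accepts an explicit device argument:

import torch
from langchain.llms import HuggingFacePipeline

# Sketch: use the first CUDA device when one is available, otherwise fall back
# to CPU (-1), which is what langchain expects on machines without CUDA GPUs.
device = 0 if torch.cuda.is_available() else -1

llm = HuggingFacePipeline.from_model_id(
    model_id="databricks/dolly-v2-3b",
    task="text-generation",
    device=device,
    model_kwargs={"temperature": 0, "max_length": 1024},
)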

Thanks again!

QUANGLEA commented Jul 5, 2023

@serhatgktp I was wondering if you got this warning when you successfully ran the example.
UserWarning: Using "max_length"'s default (1024) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using "max_new_tokens" to control the maximum length of the generation. warnings.warn(

serhatgktp (Contributor, Author) commented:

@serhatgktp I was wondering if you got this warning when you successfully ran the example. UserWarning: Using "max_length"'s default (1024) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using "max_new_tokens" to control the maximum length of the generation. warnings.warn(

@QUANGLEA Yes, I'm getting that warning as well. However, it seems that using max_new_tokens causes an unexpected keyword error:

TypeError: GPTNeoXForCausalLM.__init__() got an unexpected keyword argument 'max_new_tokens'

I haven't had much time to look into it so I've chosen to ignore it for now.
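
My guess (untested) is that everything in model_kwargs gets forwarded to the model constructor (GPTNeoXForCausalLM.__init__), whereas max_new_tokens is a generation-time parameter. One possible way around it, roughly sketched, is to build the transformers pipeline yourself so that generation arguments never reach the model constructor, and then wrap it:

from functools import lru_cache

from langchain.llms import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline


@lru_cache
def get_dolly_v2_3b_llm():
    # Sketch (untested): build the pipeline directly so max_new_tokens is treated
    # as a generation argument instead of being passed to the model constructor.
    repo_id = "databricks/dolly-v2-3b"
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=256,
        do_sample=False,
    )
    return HuggingFacePipeline(pipeline=pipe)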

QUANGLEA commented Jul 5, 2023

@serhatgktp I'm getting the same response when using max_new_tokens too. Have you tried models other than Dolly? And did you find any success?

serhatgktp changed the title from "NeMo-Guardrails does not work with several LLM providers that store models locally" to "NeMo-Guardrails does not work with many other LLM providers" on Jul 5, 2023
serhatgktp (Contributor, Author) commented:

@serhatgktp I'm getting the same response when using max_new_tokens too. Have you tried models other than Dolly? And did you find any success?

@QUANGLEA I've tried several other models, mainly the promising models from Hugging Face Hub such as Falcon. Unfortunately, I haven't been able to get any of them to work yet.

drazvan (Collaborator) commented Jul 11, 2023

@serhatgktp What LLMs are you interested in? Were there any specific errors? I'm asking because we will be testing a few more LLM providers on our end in the next couple of weeks. So, maybe we can align our efforts.

Thanks!

serhatgktp (Contributor, Author) commented:

Thank you for the follow-up @drazvan. I've been trying to choose LLMs that are seemingly powerful, accurate, and not too large.

The following two are the ones I'm most interested in at the moment:

  • tiiuae/falcon-7b-instruct (available on Hugging Face Hub)
  • gpt4all-j-v1.3-groovy (available through GPT4All)

1) Issue with Hugging Face Hub Models

a) Using the Built-in Feature for Hugging Face Hub

The built-in support for Hugging Face Hub currently appears to be buggy; it gives me the following error:

Configuration:

models:
  - type: main
    engine: huggingface_hub
    model: tiiuae/falcon-7b-instruct    

Error:

Error argument of type 'NoneType' is not iterable while execution generate_user_intent
Traceback (most recent call last):
  File "/Users/efkan/.asdf/installs/python/3.10.5/lib/python3.10/site-packages/nemoguardrails/actions/action_dispatcher.py", line 125, in execute_action
    result = await fn(**params)
  File "/Users/efkan/.asdf/installs/python/3.10.5/lib/python3.10/site-packages/nemoguardrails/actions/llm/generation.py", line 257, in generate_user_intent
    with llm_params(self.llm, temperature=self.config.lowest_temperature):
  File "/Users/efkan/.asdf/installs/python/3.10.5/lib/python3.10/site-packages/nemoguardrails/llm/params.py", line 44, in __enter__
    elif hasattr(self.llm, "model_kwargs") and param in getattr(
TypeError: argument of type 'NoneType' is not iterable

It seems that the code below is failing:

            elif hasattr(self.llm, "model_kwargs") and param in getattr(
                self.llm, "model_kwargs", {}
            ):
                self.original_params[param] = self.llm.model_kwargs[param]
                self.llm.model_kwargs[param] = value

because the following line is returning None:

getattr(self.llm, "model_kwargs", {})
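
In other words, the LLM object does have a model_kwargs attribute, but it is set to None rather than a dict, so the membership test blows up. As a possible stop-gap on the caller side (a rough, untested sketch that assumes LLMRails exposes the wrapped LLM as rails.llm):

from nemoguardrails.rails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Untested sketch: if the wrapped LLM carries model_kwargs=None, replace it with
# an empty dict so the membership check in llm_params() no longer crashes.
if getattr(rails.llm, "model_kwargs", None) is None:
    rails.llm.model_kwargs = {}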

b) Importing Hugging Face Hub Models Externally

We can also use custom wrappers to fetch models externally, similar to how it's done here.

However, when we do so with Hugging Face Hub models, we get the following error:

Traceback (most recent call last):
  File "/Users/efkan/anaconda3/lib/python3.10/site-packages/nemoguardrails/actions/action_dispatcher.py", line 125, in execute_action
    result = await fn(**params)
  File "/Users/efkan/anaconda3/lib/python3.10/site-packages/nemoguardrails/actions/llm/generation.py", line 258, in generate_user_intent
    result = await llm_call(self.llm, prompt)
  File "/Users/efkan/anaconda3/lib/python3.10/site-packages/nemoguardrails/actions/llm/utils.py", line 31, in llm_call
    result = await llm.agenerate_prompt(
  File "/Users/efkan/anaconda3/lib/python3.10/site-packages/langchain/llms/base.py", line 136, in agenerate_prompt
    return await self.agenerate(prompt_strings, stop=stop, callbacks=callbacks)
  File "/Users/efkan/anaconda3/lib/python3.10/site-packages/langchain/llms/base.py", line 250, in agenerate
    raise e
  File "/Users/efkan/anaconda3/lib/python3.10/site-packages/langchain/llms/base.py", line 244, in agenerate
    await self._agenerate(prompts, stop=stop, run_manager=run_manager)
  File "/Users/efkan/anaconda3/lib/python3.10/site-packages/langchain/llms/base.py", line 400, in _agenerate
    else await self._acall(prompt, stop=stop)
  File "/Users/efkan/anaconda3/lib/python3.10/site-packages/nemoguardrails/llm/providers.py", line 44, in _acall
    return self._call(*args, **kwargs)
  File "/Users/efkan/anaconda3/lib/python3.10/site-packages/langchain/llms/huggingface_hub.py", line 111, in _call
    raise ValueError(f"Error raised by inference API: {response['error']}")
ValueError: Error raised by inference API: Input validation error: `temperature` must be strictly positive

I believe the code attempts to use the LLM with temperature=0 to keep the output as close to the expected format as possible. However, if I'm not mistaken, this cannot be done with Hugging Face Hub models, as the inference API requires a strictly positive temperature.
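
One idea (untested): since the failure comes from the near-zero temperature that guardrails uses for its deterministic steps, nudging lowest_temperature to a tiny positive value before constructing LLMRails might satisfy the inference API. A sketch, assuming lowest_temperature is an ordinary, mutable attribute on RailsConfig:

from nemoguardrails.rails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")

# Sketch (untested): the HF Inference API rejects temperature=0, so use a tiny
# positive value for the "deterministic" generation steps instead.
config.lowest_temperature = 0.001

rails = LLMRails(config)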

2) GPT4All

a) Built-in

The issue here seems to be related to how the model path is passed to langchain: the values dictionary doesn't contain a model key.

Configuration:

models:
  - type: main
    engine: gpt4all
    model: gpt4all-j-v1.3-groovy
    
    # (I also tried by using the path to the model)
    # model: ./models/ggml-gpt4all-l13b-snoozy.bin

Error:

Traceback (most recent call last):
  File "/Users/efkan/Desktop/repos/guardrails-demo/demo.py", line 13, in <module>
    demo()
  File "/Users/efkan/Desktop/repos/guardrails-demo/demo.py", line 5, in demo
    rails = LLMRails(config)
  File "/Users/efkan/.asdf/installs/python/3.10.5/lib/python3.10/site-packages/nemoguardrails/rails/llm/llmrails.py", line 79, in __init__
    self._init_llm()
  File "/Users/efkan/.asdf/installs/python/3.10.5/lib/python3.10/site-packages/nemoguardrails/rails/llm/llmrails.py", line 132, in _init_llm
    self.llm = provider_cls(**kwargs)
  File "pydantic/main.py", line 339, in pydantic.main.BaseModel.__init__
  File "pydantic/main.py", line 1102, in pydantic.main.validate_model
  File "/Users/efkan/.asdf/installs/python/3.10.5/lib/python3.10/site-packages/langchain/llms/gpt4all.py", line 170, in validate_environment
    model_path=values["model"],
KeyError: 'model'

b) External

The error is below. I'm not too sure what this one is about; possibly a version mismatch between langchain's GPT4All wrapper and the installed gpt4all Python bindings, since the wrapper appears to pass an n_parts argument that the model constructor no longer accepts.

Error:

Traceback (most recent call last):
  File "/Users/efkan/anaconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/efkan/anaconda3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/efkan/.vscode/extensions/ms-python.python-2023.10.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/Users/efkan/.vscode/extensions/ms-python.python-2023.10.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/Users/efkan/.vscode/extensions/ms-python.python-2023.10.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/Users/efkan/.vscode/extensions/ms-python.python-2023.10.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/Users/efkan/.vscode/extensions/ms-python.python-2023.10.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/Users/efkan/.vscode/extensions/ms-python.python-2023.10.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/Users/efkan/Desktop/repos/ibm-repos-private/demo_guardrails.py", line 17, in <module>
    demo()
  File "/Users/efkan/Desktop/repos/ibm-repos-private/demo_guardrails.py", line 6, in demo
    rails = LLMRails(config)
  File "/Users/efkan/anaconda3/lib/python3.10/site-packages/nemoguardrails/rails/llm/llmrails.py", line 79, in __init__
    self._init_llm()
  File "/Users/efkan/anaconda3/lib/python3.10/site-packages/nemoguardrails/rails/llm/llmrails.py", line 143, in _init_llm
    self.llm = provider_cls(**kwargs)
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for GPT4All
__root__
  Model.__init__() got an unexpected keyword argument 'n_parts' (type=type_error)

To conclude, I think each of the problems mentioned here is worth its own issue, but that might not be necessary if you are already working on them and a fix is close to completion.

Thanks!

AIAnytime commented:

Any progress?

shikhardadhich commented:

Any progress?

I also need the same thing: running Guardrails with a local LLM.

dineshpiyasamara commented:

ERROR:nemoguardrails.actions.action_dispatcher:Error argument of type 'NoneType' is not iterable while execution generate_user_intent
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/actions/action_dispatcher.py", line 125, in execute_action
    result = await fn(**params)
  File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/actions/llm/generation.py", line 269, in generate_user_intent
    with llm_params(llm, temperature=self.config.lowest_temperature):
  File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/llm/params.py", line 44, in __enter__
    elif hasattr(self.llm, "model_kwargs") and param in getattr(
TypeError: argument of type 'NoneType' is not iterable
I'm sorry, an internal error has occurred.

Same here... Any progress?

Sudhu2004 commented Nov 1, 2023

ERROR:nemoguardrails.actions.action_dispatcher:Error argument of type 'NoneType' is not iterable while execution generate_user_intent
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/actions/action_dispatcher.py", line 125, in execute_action
    result = await fn(**params)
  File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/actions/llm/generation.py", line 269, in generate_user_intent
    with llm_params(llm, temperature=self.config.lowest_temperature):
  File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/llm/params.py", line 44, in __enter__
    elif hasattr(self.llm, "model_kwargs") and param in getattr(
TypeError: argument of type 'NoneType' is not iterable
I'm sorry, an internal error has occurred.

Same here... Any progress?

A similar issue has been solved over here: /issues/155. You can go through that; it helped me.
