Query Response Returning NONE (Custom LLM) #1216

Closed
onnyyonn opened this issue Apr 17, 2023 · 3 comments

Comments

@onnyyonn

I am trying to reproduce the Paul Graham Essay example with a custom LLM. Here is my code:

from langchain.llms.base import LLM
from llama_index import SimpleDirectoryReader, GPTTreeIndex, PromptHelper
from llama_index import LLMPredictor, ServiceContext
from transformers import pipeline
from typing import Optional, List, Mapping, Any

# define prompt helper
# set maximum input size
max_input_size = 2048
# set number of output tokens
num_output = 256
# set maximum chunk overlap
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

class CustomLLM(LLM):
    model_name = "google/flan-t5-base"
    pipeline = pipeline("text2text-generation", model=model_name, device="cuda:3")

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        prompt_length = len(prompt)
        response = self.pipeline(prompt, max_new_tokens=num_output)[0]["generated_text"]

        # only return newly generated tokens
        return response[prompt_length:]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"

# define our LLM
llm_predictor = LLMPredictor(llm=CustomLLM())

documents = SimpleDirectoryReader('data').load_data()

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)    
index = GPTTreeIndex.from_documents(documents, service_context=service_context)

response = index.query("What did the author do growing up?")
print(response)

I am getting None as the response. Any idea what the issue may be?

I am running version 0.5.16 of llama-index.

@logan-markewich
Collaborator

@onnyyonn FLAN's max input size is 512. Try lowering max_input_size to that, and lower num_output to maybe 128?
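
Something like this, reusing the PromptHelper setup from your snippet above (these exact values are just a starting point, not hard requirements):

# shrink the prompt budget to fit FLAN-T5's 512-token encoder limit
max_input_size = 512   # FLAN-T5 context window
num_output = 128       # leave room within the 512 tokens for the answer
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)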

@onnyyonn
Author

@logan-markewich I just realized that I get an error while querying.

INFO:llama_index.indices.tree.leaf_query:> Starting query: What did the author do growing up?
--- Logging error ---
Traceback (most recent call last):
  File "../lib/python3.11/logging/__init__.py", line 1110, in emit
    msg = self.format(record)
          ^^^^^^^^^^^^^^^^^^^
  File "../lib/python3.11/logging/__init__.py", line 953, in format
    return fmt.format(record)
           ^^^^^^^^^^^^^^^^^^
  File "../lib/python3.11/logging/__init__.py", line 687, in format
    record.message = record.getMessage()
                     ^^^^^^^^^^^^^^^^^^^
  File "../lib/python3.11/logging/__init__.py", line 377, in getMessage
    msg = msg % self.args
          ~~~~^~~~~~~~~~~
TypeError: not all arguments converted during string formatting
Call stack:
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "../lib/python3.11/site-packages/ipykernel_launcher.py", line 17, in <module>
    app.launch_new_instance()
  File "../lib/python3.11/site-packages/traitlets/config/application.py", line 1043, in launch_instance
    app.start()
  File "../lib/python3.11/site-packages/ipykernel/kernelapp.py", line 725, in start
    self.io_loop.start()
  File "../lib/python3.11/site-packages/tornado/platform/asyncio.py", line 215, in start
    self.asyncio_loop.run_forever()
  File "../lib/python3.11/asyncio/base_events.py", line 607, in run_forever
    self._run_once()
  File "../lib/python3.11/asyncio/base_events.py", line 1922, in _run_once
    handle._run()
  File "../lib/python3.11/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "../lib/python3.11/site-packages/ipykernel/kernelbase.py", line 513, in dispatch_queue
    await self.process_one()
  File "../lib/python3.11/site-packages/ipykernel/kernelbase.py", line 502, in process_one
    await dispatch(*args)
  File "../lib/python3.11/site-packages/ipykernel/kernelbase.py", line 409, in dispatch_shell
    await result
  File "../lib/python3.11/site-packages/ipykernel/kernelbase.py", line 729, in execute_request
    reply_content = await reply_content
  File "../lib/python3.11/site-packages/ipykernel/ipkernel.py", line 422, in do_execute
    res = shell.run_cell(
  File "../lib/python3.11/site-packages/ipykernel/zmqshell.py", line 540, in run_cell
    return super().run_cell(*args, **kwargs)
  File "../lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3006, in run_cell
    result = self._run_cell(
  File "../lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3061, in _run_cell
    result = runner(coro)
  File "../lib/python3.11/site-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner
    coro.send(None)
  File "../lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3266, in run_cell_async
    has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  File "../lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3445, in run_ast_nodes
    if await self.run_code(code, result, async_=asy):
  File "../lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3505, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/tmp/ipykernel_14047/3987531067.py", line 1, in <module>
    response = index.query("What did the author do growing up?")
  File "../lib/python3.11/site-packages/llama_index/indices/base.py", line 255, in query
    return query_runner.query(query_str)
  File "../lib/python3.11/site-packages/llama_index/indices/query/query_runner.py", line 349, in query
    return query_combiner.run(query_bundle, level)
  File "../lib/python3.11/site-packages/llama_index/indices/query/query_combiner/base.py", line 66, in run
    return self._query_runner.query_transformed(
  File "../lib/python3.11/site-packages/llama_index/indices/query/query_runner.py", line 209, in query_transformed
    return query_obj.query(query_bundle)
  File "../lib/python3.11/site-packages/llama_index/token_counter/token_counter.py", line 78, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
  File "../lib/python3.11/site-packages/llama_index/indices/query/base.py", line 396, in query
    return self._query(query_bundle)
  File "../lib/python3.11/site-packages/llama_index/indices/tree/leaf_query.py", line 238, in _query
    response_str = self._query_level(
  File "../lib/python3.11/site-packages/llama_index/indices/tree/leaf_query.py", line 149, in _query_level
    ) = self._service_context.llm_predictor.predict(
  File "../lib/python3.11/site-packages/llama_index/llm_predictor/base.py", line 223, in predict
    llm_prediction = self._predict(prompt, **prompt_args)
  File "../lib/python3.11/site-packages/llama_index/llm_predictor/base.py", line 197, in _predict
    llm_prediction = retry_on_exceptions_with_backoff(
  File "../lib/python3.11/site-packages/llama_index/utils.py", line 177, in retry_on_exceptions_with_backoff
    return lambda_fn()
  File "../lib/python3.11/site-packages/llama_index/llm_predictor/base.py", line 198, in <lambda>
    lambda: llm_chain.predict(**full_prompt_args),
  File "../lib/python3.11/site-packages/langchain/chains/llm.py", line 151, in predict
    return self(kwargs)[self.output_key]
  File "../lib/python3.11/site-packages/langchain/chains/base.py", line 113, in __call__
    outputs = self._call(inputs)
  File "../lib/python3.11/site-packages/langchain/chains/llm.py", line 57, in _call
    return self.apply([inputs])[0]
  File "../lib/python3.11/site-packages/langchain/chains/llm.py", line 118, in apply
    response = self.generate(input_list)
  File "../lib/python3.11/site-packages/langchain/chains/llm.py", line 62, in generate
    return self.llm.generate_prompt(prompts, stop)
  File "../lib/python3.11/site-packages/langchain/llms/base.py", line 107, in generate_prompt
    return self.generate(prompt_strings, stop=stop)
  File "../lib/python3.11/site-packages/langchain/llms/base.py", line 137, in generate
    output = self._generate(prompts, stop=stop)
  File "../lib/python3.11/site-packages/langchain/llms/base.py", line 324, in _generate
    text = self._call(prompt, stop=stop)
  File "/tmp/ipykernel_14047/3507775511.py", line 7, in _call
    response = self.pipeline(prompt, max_new_tokens=num_output)[0]["generated_text"]
  File "../lib/python3.11/site-packages/transformers/pipelines/text2text_generation.py", line 165, in __call__
    result = super().__call__(*args, **kwargs)
  File "../lib/python3.11/site-packages/transformers/pipelines/base.py", line 1109, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
  File "../lib/python3.11/site-packages/transformers/pipelines/base.py", line 1116, in run_single
    model_outputs = self.forward(model_inputs, **forward_params)
  File "../lib/python3.11/site-packages/transformers/pipelines/base.py", line 1015, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "../lib/python3.11/site-packages/transformers/pipelines/text2text_generation.py", line 187, in _forward
    output_ids = self.model.generate(**model_inputs, **generate_kwargs)
  File "../lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "../lib/python3.11/site-packages/transformers/generation/utils.py", line 1322, in generate
    logger.warn(
Message: 'Both `max_new_tokens` (=128) and `max_length`(=129) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)'
Arguments: (<class 'UserWarning'>,)
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 350 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 0 tokens

Unfortunately, changing max_input_size and num_output didn't resolve the issue. If this issue is FLAN-specific, I can test with another model. Do you have any suggestions?

@logan-markewich
Collaborator

Yeah, tbh 512 is way too small to work with. I'd suggest something with at least a 2048-token context. Feel free to reach out on Discord if you have any other questions!
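
As a rough sketch, the two class attributes in CustomLLM could change to something like this (databricks/dolly-v2-3b is only an example of a causal LM with a 2048-token context; any comparable model works):

# example only: a causal LM with a ~2048-token context window
model_name = "databricks/dolly-v2-3b"
pipeline = pipeline("text-generation", model=model_name, device="cuda:3")

A text-generation pipeline echoes the prompt in generated_text, so the response[prompt_length:] slice in _call still applies. You'd also want to bump max_input_size in the PromptHelper back up to 2048 to match.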
