Hi,
I'm trying to summarize text. I have a collection of documents (about 5GB); each text is relatively short, just a few sentences, and I'm getting this error:
Traceback (most recent call last):
File "/Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/gradio/queueing.py", line 407, in call_prediction
output = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/gradio/route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/gradio/blocks.py", line 1550, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/gradio/blocks.py", line 1199, in call_function
prediction = await utils.async_iteration(iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/gradio/utils.py", line 519, in async_iteration
return await iterator.__anext__()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/gradio/utils.py", line 512, in __anext__
return await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/gradio/utils.py", line 495, in run_sync_iterator_async
return next(iterator)
^^^^^^^^^^^^^^
File "/Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/gradio/utils.py", line 649, in gen_wrapper
yield from f(*args, **kwargs)
File "/Users/slava/Documents/Development/private/AI/h2ogpt/src/gradio_runner.py", line 4401, in bot
for res in get_response(fun1, history, chatbot_role1, speaker1, tts_language1, roles_state1,
File "/Users/slava/Documents/Development/private/AI/h2ogpt/src/gradio_runner.py", line 4296, in get_response
for output_fun in fun1():
File "/Users/slava/Documents/Development/private/AI/h2ogpt/src/gen.py", line 3702, in evaluate
for r in run_qa_db(
File "/Users/slava/Documents/Development/private/AI/h2ogpt/src/gpt_langchain.py", line 5111, in _run_qa_db
get_chain(**sim_kwargs)
File "/Users/slava/Documents/Development/private/AI/h2ogpt/src/gpt_langchain.py", line 6562, in get_chain
docs_with_score, max_doc_tokens = split_merge_docs(docs_with_score,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/slava/Documents/Development/private/AI/h2ogpt/src/gpt_langchain.py", line 5459, in split_merge_docs
text_splitter = H2OCharacterTextSplitter.from_huggingface_tokenizer(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/slava/Documents/Development/private/AI/h2ogpt/src/gpt_langchain.py", line 5433, in from_huggingface_tokenizer
return cls(length_function=_huggingface_tokenizer_length, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/langchain/text_splitter.py", line 857, in __init__
super().__init__(keep_separator=keep_separator, **kwargs)
File "/Users/slava/.pyenv/versions/3.11.3/lib/python3.11/site-packages/langchain/text_splitter.py", line 122, in __init__
raise ValueError(
ValueError: Got a larger chunk overlap (0) than chunk size (-160), should be smaller.
Please advise.
Thanks
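For context, the ValueError is raised by langchain's text-splitter constructor, which rejects an overlap that is not smaller than the chunk size. A minimal sketch of that check (a paraphrase for illustration, not the exact library source) shows how the values from the traceback trigger it, since the chunk size passed in was negative:

```python
def check_splitter_args(chunk_size: int, chunk_overlap: int) -> None:
    # Mirrors the validation in langchain's TextSplitter __init__:
    # an overlap as large as the chunk size can never make progress,
    # so it is rejected up front.
    if chunk_overlap > chunk_size:
        raise ValueError(
            f"Got a larger chunk overlap ({chunk_overlap}) than chunk size "
            f"({chunk_size}), should be smaller."
        )

# The values from the traceback: overlap 0 vs. a negative chunk size of -160.
try:
    check_splitter_args(-160, 0)
except ValueError as exc:
    print(exc)
```

So the real problem is upstream: something computed a chunk size of -160 before the splitter was ever constructed.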
Thanks for reporting! I think it should be fixed now. I assume you had many chunks in this case, maybe around 1000, which is probably unusual.
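With that many retrieved chunks, the per-document token overhead can exceed the model's context budget, leaving a negative effective chunk size. One defensive fix is to clamp the computed size to a positive minimum before building the splitter. This is a sketch under that assumption, not the actual h2ogpt patch; `safe_chunk_size` and its parameters are hypothetical names:

```python
def safe_chunk_size(available_tokens: int, overhead_tokens: int,
                    minimum: int = 1) -> int:
    # Clamp so the splitter never sees a non-positive chunk size, which
    # would trip langchain's "chunk overlap > chunk size" ValueError.
    return max(minimum, available_tokens - overhead_tokens)

# Overhead exceeding the budget clamps to the minimum instead of -160:
print(safe_chunk_size(2048, 2208))  # -> 1
```

Clamping keeps the pipeline running on pathological inputs; a warning when the clamp kicks in would make the truncation visible rather than silent.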