
LLaMa.cpp broken right now #24
Closed · dany-on-demand opened this issue Apr 21, 2023 · 2 comments
Labels: closed | done (Fixed or otherwise implemented.)


@dany-on-demand (Contributor):

Using vicuna-13B, generation fails with:
Requested tokens exceed context window of 2000

(textgen) C:\Projects\generative\llm\Agent-LLM>python app.py
 * Serving Flask app 'app'
 * Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 242-200-888
127.0.0.1 - - [22/Apr/2023 01:31:06] "GET /api/docs/ HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 01:31:07] "GET /api/docs/ HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 01:31:07] "GET /api/get_agents HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 01:31:07] "GET /api/task/status/CatJokeFinder HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 01:31:08] "GET /api/task/status/CatJokeFinder HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 01:31:08] "GET /api/get_commands/CatJokeFinder HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 01:31:09] "GET /api/get_commands/CatJokeFinder HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 01:31:10] "OPTIONS /api/task/start/CatJokeFinder HTTP/1.1" 200 -
Using embedded DuckDB with persistence: data will be stored in: agents/default/memories
llama.cpp: loading model from C:/Projects/generative/llm/llama.cpp/models/vicuna/1.1TheBloke/ggml-vicuna-13b-1.1-q4_1.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2000
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 4 (mostly Q4_1, some F16)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =  73.73 KB
llama_model_load_internal: mem required  = 11749.65 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size  = 1562.50 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
127.0.0.1 - - [22/Apr/2023 01:31:13] "POST /api/task/start/CatJokeFinder HTTP/1.1" 200 -

*****COMMANDS*****

[{'friendly_name': 'Read Audio from File', 'name': 'read_audio_from_file', 'args': {'audio_path': None}, 'enabled': True}, {'friendly_name': 'Read Audio', 'name': 'read_audio', 'args': {'audio': None}, 'enabled': True}, {'friendly_name': 'Evaluate Code', 'name': 'evaluate_code', 'args': {'code': None}, 'enabled': True}, {'friendly_name': 'Analyze Pull Request', 'name': 'analyze_pull_request', 'args': {'pr_url': None}, 'enabled': True}, {'friendly_name': 'Perform Automated Testing', 'name': 'perform_automated_testing', 'args': {'test_url': None}, 'enabled': True}, {'friendly_name': 'Run CI-CD Pipeline', 'name': 'run_ci_cd_pipeline', 'args': {'repo_url': None}, 'enabled': True}, {'friendly_name': 'Improve Code', 'name': 'improve_code', 'args': {'suggestions': None, 'code': None}, 'enabled': True}, {'friendly_name': 'Write Tests', 'name': 'write_tests', 'args': {'code': None, 'focus': None}, 'enabled': True}, {'friendly_name': 'Create a new command', 'name': 'create_command', 'args': {'function_description': None}, 'enabled': True}, {'friendly_name': 'Execute Python File', 'name': 'execute_python_file', 'args': {'file': None}, 'enabled': True}, {'friendly_name': 'Execute Shell', 'name': 'execute_shell', 'args': {'command_line': None}, 'enabled': True}, {'friendly_name': 'Check Duplicate Operation', 'name': 'check_duplicate_operation', 'args': {'operation': None, 'filename': None}, 'enabled': True}, {'friendly_name': 'Log Operation', 'name': 'log_operation', 'args': {'operation': None, 'filename': None}, 'enabled': True}, {'friendly_name': 'Read File', 'name': 'read_file', 'args': {'filename': None}, 'enabled': True}, {'friendly_name': 'Ingest File', 'name': 'ingest_file', 'args': {'filename': None, 'memory': None, 'max_length': 4000, 'overlap': 200}, 'enabled': True}, {'friendly_name': 'Write to File', 'name': 'write_to_file', 'args': {'filename': None, 'text': None}, 'enabled': True}, {'friendly_name': 'Append to File', 'name': 'append_to_file', 'args': {'filename': None, 'text': None}, 'enabled': True}, {'friendly_name': 'Delete File', 'name': 'delete_file', 'args': {'filename': None}, 'enabled': True}, {'friendly_name': 'Search Files', 'name': 'search_files', 'args': {'directory': None}, 'enabled': True}, {'friendly_name': 'Google Search', 'name': 'google_search', 'args': {'query': None, 'num_results': 8}, 'enabled': True}, {'friendly_name': 'Google Official Search', 'name': 'google_official_search', 'args': {'query': None, 'num_results': 8}, 'enabled': True}, {'friendly_name': 'Generate Image', 'name': 'generate_image', 'args': {'prompt': None}, 'enabled': True}, {'friendly_name': 'Get Datetime', 'name': 'get_datetime', 'args': {}, 'enabled': True}, {'friendly_name': 'Send Tweet', 'name': 'send_tweet', 'args': {}, 'enabled': True}, {'friendly_name': 'Speak with TTS', 'name': 'speak', 'args': {'text': None, 'engine': 'gtts', 'voice_index': 0}, 'enabled': True}, {'friendly_name': 'Scrape Text with Playwright', 'name': 'scrape_text', 'args': {'url': None}, 'enabled': True}, {'friendly_name': 'Scrape Links with Playwright', 'name': 'scrape_links', 'args': {'url': None}, 'enabled': True}, {'friendly_name': 'Is Valid URL', 'name': 'is_valid_url', 'args': {'url': None}, 'enabled': True}, {'friendly_name': 'Sanitize URL', 'name': 'sanitize_url', 'args': {'url': None}, 'enabled': True}, {'friendly_name': 'Check Local File Access', 'name': 'check_local_file_access', 'args': {'url': None}, 'enabled': True}, {'friendly_name': 'Get Response', 'name': 'get_response', 'args': {'url': None, 'timeout': 
10}, 'enabled': True}, {'friendly_name': 'Scrape Text', 'name': 'scrape_text', 'args': {'url': None}, 'enabled': True}, {'friendly_name': 'Scrape Links', 'name': 'scrape_links', 'args': {'url': None}, 'enabled': True}, {'friendly_name': 'Create Message', 'name': 'create_message', 'args': {'chunk': None, 'question': None}, 'enabled': True}, {'friendly_name': 'Browse Website', 'name': 'browse_website', 'args': {'url': None, 'question': None}, 'enabled': True}]

*****PROMPT*****

You are an AI who performs one task based on the following objective: Collect cat jokes from the internet and save them to a csv file called catjokes.csv.
Your role is to do anything asked of you with precision. You have the following constraints:
1. ~4000 word limit for short term memory. Your short term memory is short, so immediately save important information to files.
2. If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember.
3. No user assistance.
4. Exclusively use the commands listed in double quotes e.g. "command name".

Take into account these previously completed tasks: None.
Your task: Develop an initial task list.

You have the following commands available to complete this task.
Read Audio from File - read_audio_from_file({'audio_path': None})
Read Audio - read_audio({'audio': None})
Evaluate Code - evaluate_code({'code': None})
Analyze Pull Request - analyze_pull_request({'pr_url': None})
Perform Automated Testing - perform_automated_testing({'test_url': None})
Run CI-CD Pipeline - run_ci_cd_pipeline({'repo_url': None})
Improve Code - improve_code({'suggestions': None, 'code': None})
Write Tests - write_tests({'code': None, 'focus': None})
Create a new command - create_command({'function_description': None})
Execute Python File - execute_python_file({'file': None})
Execute Shell - execute_shell({'command_line': None})
Check Duplicate Operation - check_duplicate_operation({'operation': None, 'filename': None})
Log Operation - log_operation({'operation': None, 'filename': None})
Read File - read_file({'filename': None})
Ingest File - ingest_file({'filename': None, 'memory': None, 'max_length': 4000, 'overlap': 200})
Write to File - write_to_file({'filename': None, 'text': None})
Append to File - append_to_file({'filename': None, 'text': None})
Delete File - delete_file({'filename': None})
Search Files - search_files({'directory': None})
Google Search - google_search({'query': None, 'num_results': 8})
Google Official Search - google_official_search({'query': None, 'num_results': 8})
Generate Image - generate_image({'prompt': None})
Get Datetime - get_datetime({})
Send Tweet - send_tweet({})
Speak with TTS - speak({'text': None, 'engine': 'gtts', 'voice_index': 0})
Scrape Text with Playwright - scrape_text({'url': None})
Scrape Links with Playwright - scrape_links({'url': None})
Is Valid URL - is_valid_url({'url': None})
Sanitize URL - sanitize_url({'url': None})
Check Local File Access - check_local_file_access({'url': None})
Get Response - get_response({'url': None, 'timeout': 10})
Scrape Text - scrape_text({'url': None})
Scrape Links - scrape_links({'url': None})
Create Message - create_message({'chunk': None, 'question': None})
Browse Website - browse_website({'url': None, 'question': None})

FORMAT RESPONSES IN THE FOLLOWING FORMAT:

THOUGHTS: Your thoughts on completing the task.

REASONING: The reasoning behind your responses.

PLAN: Your plan for achieving the task.

CRITICISM: Your critism of the thoughts, reasoning, and plan.

COMMANDS: If you choose to use any commands, list them and their inputs where necessary.  List the commands in the order that they need to be executed with the format being command_name(args). Do not explain, just list the command_name(args).

Response:
Exception in thread Thread-17 (run_task):
Traceback (most recent call last):
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Projects\generative\llm\Agent-LLM\AgentLLM.py", line 256, in run_task
    task = self.execute_next_task()
  File "C:\Projects\generative\llm\Agent-LLM\AgentLLM.py", line 229, in execute_next_task
    self.response = self.execution_agent(self.primary_objective, this_task_name, this_task_id)
  File "C:\Projects\generative\llm\Agent-LLM\AgentLLM.py", line 196, in execution_agent
    self.response = self.run(prompt)
  File "C:\Projects\generative\llm\Agent-LLM\AgentLLM.py", line 73, in run
    self.response = self.instruct(prompt)
  File "C:\Projects\generative\llm\Agent-LLM\provider\llamacpp.py", line 16, in instruct
    output = self.llamacpp(f"Q: {prompt}", max_tokens=self.max_tokens, stop=["Q:", "\n"], echo=True)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\llama_cpp\llama.py", line 681, in __call__
    return self.create_completion(
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\llama_cpp\llama.py", line 642, in create_completion
    completion: Completion = next(completion_or_chunks)  # type: ignore
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\llama_cpp\llama.py", line 406, in _create_completion
    raise ValueError(
ValueError: Requested tokens exceed context window of 2000
127.0.0.1 - - [22/Apr/2023 01:31:13] "GET /api/task/status/CatJokeFinder HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 01:31:13] "GET /api/get_agents HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 01:31:16] "GET /api/task/output/CatJokeFinder HTTP/1.1" 200 -
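For reference, the ValueError comes from llama-cpp-python: it refuses to generate when the prompt's tokens plus the requested max_tokens exceed the n_ctx the model was loaded with (2000 above), and the assembled prompt here is already very large. A minimal sketch of a guard, assuming the standard llama-cpp-python `Llama` API; `MODEL_PATH` and `safe_completion` are hypothetical names, and the actual Agent-LLM fix may differ:

```python
from llama_cpp import Llama

MODEL_PATH = "ggml-vicuna-13b-1.1-q4_1.bin"  # hypothetical path
N_CTX = 2048                                 # context window to load the model with

llm = Llama(model_path=MODEL_PATH, n_ctx=N_CTX)

def safe_completion(prompt: str, max_tokens: int = 400) -> str:
    # Llama.__call__ raises ValueError when prompt tokens + max_tokens > n_ctx,
    # so clamp the completion budget to whatever context is actually left.
    prompt_tokens = len(llm.tokenize(prompt.encode("utf-8")))
    budget = N_CTX - prompt_tokens
    if budget <= 0:
        raise ValueError(f"Prompt alone is {prompt_tokens} tokens; trim it below n_ctx={N_CTX}")
    output = llm(prompt, max_tokens=min(max_tokens, budget), stop=["Q:", "\n"])
    return output["choices"][0]["text"]
```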
@Josh-XT (Owner) commented Apr 22, 2023:

Can you try with the latest version?

@Josh-XT added the labels type | report | bug (Confirmed bug in source code.), reply needed | waiting for response (Waiting for more info from the creator of the issue. If not responded to in a week, may be closed.), and reply needed | please retest (Waiting for a retest from the creator of the issue. If not responded to in a week, may be closed.) on Apr 22, 2023.
@dany-on-demand (Contributor, Author) commented:

Had to run `pip install -r requirements.txt`, then `npm i` and `npm run build` in `/frontend`.
New error:

 * Serving Flask app 'app'
 * Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
 * Restarting with watchdog (windowsapi)
 * Debugger is active!
 * Debugger PIN: 242-200-888
127.0.0.1 - - [22/Apr/2023 09:21:14] "GET /api/agent HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 09:21:14] "GET /api/agent/CatJokeFinder HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 09:21:14] "OPTIONS /api/agent/CatJokeFinder/instruct HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 09:21:15] "GET /api/agent/CatJokeFinder/command HTTP/1.1" 200 -
Using embedded DuckDB with persistence: data will be stored in: agents/default/memories
llama_model_load: loading model from 'C:/Projects/generative/llm/llama.cpp/models/vicuna/1.1TheBloke/ggml-vicuna-13b-1.1-q4_1.bin' - please wait ...
llama_model_load: GPTQ model detected - are you sure n_parts should be 2? we normally expect it to be 1
llama_model_load: use '--n_parts 1' if necessary
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 2000
llama_model_load: n_embd  = 5120
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot   = 128
llama_model_load: f16     = 4
llama_model_load: n_ff    = 13824
llama_model_load: n_parts = 2
llama_model_load: type    = 2
llama_model_load: ggml map size = 9702.02 MB
llama_model_load: ggml ctx size = 101.25 KB
llama_model_load: mem required  = 11750.12 MB (+ 3216.00 MB per state)
llama_model_load: loading tensors from 'C:/Projects/generative/llm/llama.cpp/models/vicuna/1.1TheBloke/ggml-vicuna-13b-1.1-q4_1.bin'
llama_model_load: model size =  9701.58 MB / num tensors = 363
llama_init_from_file: kv self size  = 3125.00 MB
llama_generate: seed = 1682148077

system_info: n_threads = 8 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 2000, n_batch = 8, n_predict = 55, n_keep = 0


127.0.0.1 - - [22/Apr/2023 09:21:17] "POST /api/agent/CatJokeFinder/instruct HTTP/1.1" 500 -
[2023-04-22 09:21:17,807] {_internal.py:224} INFO - 127.0.0.1 - - [22/Apr/2023 09:21:17] "POST /api/agent/CatJokeFinder/instruct HTTP/1.1" 500 -
Traceback (most recent call last):
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask\app.py", line 2551, in __call__
    return self.wsgi_app(environ, start_response)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask\app.py", line 2531, in wsgi_app
    response = self.handle_exception(e)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask_restful\__init__.py", line 271, in error_router
    return original_handler(e)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask_cors\extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask\app.py", line 2528, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask\app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask_restful\__init__.py", line 271, in error_router
    return original_handler(e)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask_cors\extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask\app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask\app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask_restful\__init__.py", line 467, in wrapper
    resp = resource(*args, **kwargs)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask\views.py", line 107, in view
    return current_app.ensure_sync(self.dispatch_request)(**kwargs)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask_restful\__init__.py", line 582, in dispatch_request
    resp = meth(*args, **kwargs)
  File "C:\Projects\generative\llm\Agent-LLM\app.py", line 73, in post
    response = agent.run(objective, max_context_tokens=500, long_term_access=False)
  File "C:\Projects\generative\llm\Agent-LLM\AgentLLM.py", line 73, in run
    self.response = self.instruct(prompt)
  File "C:\Projects\generative\llm\Agent-LLM\provider\llamacpp.py", line 20, in instruct
    output = self.model.generate(f"Q: {prompt}", n_predict=55, new_text_callback=self.new_text_callback, n_threads=8)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\pyllamacpp\model.py", line 112, in generate
    pp.llama_generate(self._ctx, self.gpt_params, self._call_new_text_callback, verbose)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\pyllamacpp\model.py", line 83, in _call_new_text_callback
    Model._new_text_callback(text)
TypeError: AIProvider.new_text_callback() takes 1 positional argument but 2 were given
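
The TypeError is a callback-signature mismatch: pyllamacpp invokes `new_text_callback` with each newly generated text chunk as an argument, so passing a bound method that only accepts `self` means the call receives two positional arguments. A sketch of the expected shape, assuming pyllamacpp's `Model` API as shown in the traceback; this is a hypothetical reconstruction of `provider/llamacpp.py`, not the actual file:

```python
from pyllamacpp.model import Model

class AIProvider:
    def __init__(self, model_path: str):
        # pyllamacpp loads the ggml weights here
        self.model = Model(ggml_model=model_path)
        self.output = ""

    def new_text_callback(self, text: str) -> None:
        # pyllamacpp calls this once per generated chunk, so the method must
        # accept the chunk; a (self)-only signature raises the TypeError above.
        self.output += text

    def instruct(self, prompt: str) -> str:
        self.output = ""
        self.model.generate(
            f"Q: {prompt}",
            n_predict=55,
            new_text_callback=self.new_text_callback,
            n_threads=8,
        )
        return self.output
```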

eraviart added a commit to eraviart/Agent-LLM that referenced this issue on Apr 23, 2023.
@Josh-XT added the label closed | done (Fixed or otherwise implemented.) and removed the labels type | report | bug, reply needed | waiting for response, and reply needed | please retest on Apr 23, 2023.