
LLaMa.cpp broken right now #24
Closed · dany-on-demand opened this issue Apr 21, 2023 · 2 comments
Labels: closed | done (Fixed or otherwise implemented.)


@dany-on-demand (Contributor):

Using vicuna-13B, generation fails with:
Requested tokens exceed context window of 2000

(textgen) C:\Projects\generative\llm\Agent-LLM>python app.py
 * Serving Flask app 'app'
 * Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 242-200-888
127.0.0.1 - - [22/Apr/2023 01:31:06] "GET /api/docs/ HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 01:31:07] "GET /api/docs/ HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 01:31:07] "GET /api/get_agents HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 01:31:07] "GET /api/task/status/CatJokeFinder HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 01:31:08] "GET /api/task/status/CatJokeFinder HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 01:31:08] "GET /api/get_commands/CatJokeFinder HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 01:31:09] "GET /api/get_commands/CatJokeFinder HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 01:31:10] "OPTIONS /api/task/start/CatJokeFinder HTTP/1.1" 200 -
Using embedded DuckDB with persistence: data will be stored in: agents/default/memories
llama.cpp: loading model from C:/Projects/generative/llm/llama.cpp/models/vicuna/1.1TheBloke/ggml-vicuna-13b-1.1-q4_1.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2000
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 4 (mostly Q4_1, some F16)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =  73.73 KB
llama_model_load_internal: mem required  = 11749.65 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size  = 1562.50 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
127.0.0.1 - - [22/Apr/2023 01:31:13] "POST /api/task/start/CatJokeFinder HTTP/1.1" 200 -

*****COMMANDS*****

[{'friendly_name': 'Read Audio from File', 'name': 'read_audio_from_file', 'args': {'audio_path': None}, 'enabled': True}, {'friendly_name': 'Read Audio', 'name': 'read_audio', 'args': {'audio': None}, 'enabled': True}, {'friendly_name': 'Evaluate Code', 'name': 'evaluate_code', 'args': {'code': None}, 'enabled': True}, {'friendly_name': 'Analyze Pull Request', 'name': 'analyze_pull_request', 'args': {'pr_url': None}, 'enabled': True}, {'friendly_name': 'Perform Automated Testing', 'name': 'perform_automated_testing', 'args': {'test_url': None}, 'enabled': True}, {'friendly_name': 'Run CI-CD Pipeline', 'name': 'run_ci_cd_pipeline', 'args': {'repo_url': None}, 'enabled': True}, {'friendly_name': 'Improve Code', 'name': 'improve_code', 'args': {'suggestions': None, 'code': None}, 'enabled': True}, {'friendly_name': 'Write Tests', 'name': 'write_tests', 'args': {'code': None, 'focus': None}, 'enabled': True}, {'friendly_name': 'Create a new command', 'name': 'create_command', 'args': {'function_description': None}, 'enabled': True}, {'friendly_name': 'Execute Python File', 'name': 'execute_python_file', 'args': {'file': None}, 'enabled': True}, {'friendly_name': 'Execute Shell', 'name': 'execute_shell', 'args': {'command_line': None}, 'enabled': True}, {'friendly_name': 'Check Duplicate Operation', 'name': 'check_duplicate_operation', 'args': {'operation': None, 'filename': None}, 'enabled': True}, {'friendly_name': 'Log Operation', 'name': 'log_operation', 'args': {'operation': None, 'filename': None}, 'enabled': True}, {'friendly_name': 'Read File', 'name': 'read_file', 'args': {'filename': None}, 'enabled': True}, {'friendly_name': 'Ingest File', 'name': 'ingest_file', 'args': {'filename': None, 'memory': None, 'max_length': 4000, 'overlap': 200}, 'enabled': True}, {'friendly_name': 'Write to File', 'name': 'write_to_file', 'args': {'filename': None, 'text': None}, 'enabled': True}, {'friendly_name': 'Append to File', 'name': 'append_to_file', 'args': {'filename': None, 'text': None}, 'enabled': True}, {'friendly_name': 'Delete File', 'name': 'delete_file', 'args': {'filename': None}, 'enabled': True}, {'friendly_name': 'Search Files', 'name': 'search_files', 'args': {'directory': None}, 'enabled': True}, {'friendly_name': 'Google Search', 'name': 'google_search', 'args': {'query': None, 'num_results': 8}, 'enabled': True}, {'friendly_name': 'Google Official Search', 'name': 'google_official_search', 'args': {'query': None, 'num_results': 8}, 'enabled': True}, {'friendly_name': 'Generate Image', 'name': 'generate_image', 'args': {'prompt': None}, 'enabled': True}, {'friendly_name': 'Get Datetime', 'name': 'get_datetime', 'args': {}, 'enabled': True}, {'friendly_name': 'Send Tweet', 'name': 'send_tweet', 'args': {}, 'enabled': True}, {'friendly_name': 'Speak with TTS', 'name': 'speak', 'args': {'text': None, 'engine': 'gtts', 'voice_index': 0}, 'enabled': True}, {'friendly_name': 'Scrape Text with Playwright', 'name': 'scrape_text', 'args': {'url': None}, 'enabled': True}, {'friendly_name': 'Scrape Links with Playwright', 'name': 'scrape_links', 'args': {'url': None}, 'enabled': True}, {'friendly_name': 'Is Valid URL', 'name': 'is_valid_url', 'args': {'url': None}, 'enabled': True}, {'friendly_name': 'Sanitize URL', 'name': 'sanitize_url', 'args': {'url': None}, 'enabled': True}, {'friendly_name': 'Check Local File Access', 'name': 'check_local_file_access', 'args': {'url': None}, 'enabled': True}, {'friendly_name': 'Get Response', 'name': 'get_response', 'args': {'url': None, 'timeout': 
10}, 'enabled': True}, {'friendly_name': 'Scrape Text', 'name': 'scrape_text', 'args': {'url': None}, 'enabled': True}, {'friendly_name': 'Scrape Links', 'name': 'scrape_links', 'args': {'url': None}, 'enabled': True}, {'friendly_name': 'Create Message', 'name': 'create_message', 'args': {'chunk': None, 'question': None}, 'enabled': True}, {'friendly_name': 'Browse Website', 'name': 'browse_website', 'args': {'url': None, 'question': None}, 'enabled': True}]

*****PROMPT*****

You are an AI who performs one task based on the following objective: Collect cat jokes from the internet and save them to a csv file called catjokes.csv.
Your role is to do anything asked of you with precision. You have the following constraints:
1. ~4000 word limit for short term memory. Your short term memory is short, so immediately save important information to files.
2. If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember.
3. No user assistance.
4. Exclusively use the commands listed in double quotes e.g. "command name".

Take into account these previously completed tasks: None.
Your task: Develop an initial task list.

You have the following commands available to complete this task.
Read Audio from File - read_audio_from_file({'audio_path': None})
Read Audio - read_audio({'audio': None})
Evaluate Code - evaluate_code({'code': None})
Analyze Pull Request - analyze_pull_request({'pr_url': None})
Perform Automated Testing - perform_automated_testing({'test_url': None})
Run CI-CD Pipeline - run_ci_cd_pipeline({'repo_url': None})
Improve Code - improve_code({'suggestions': None, 'code': None})
Write Tests - write_tests({'code': None, 'focus': None})
Create a new command - create_command({'function_description': None})
Execute Python File - execute_python_file({'file': None})
Execute Shell - execute_shell({'command_line': None})
Check Duplicate Operation - check_duplicate_operation({'operation': None, 'filename': None})
Log Operation - log_operation({'operation': None, 'filename': None})
Read File - read_file({'filename': None})
Ingest File - ingest_file({'filename': None, 'memory': None, 'max_length': 4000, 'overlap': 200})
Write to File - write_to_file({'filename': None, 'text': None})
Append to File - append_to_file({'filename': None, 'text': None})
Delete File - delete_file({'filename': None})
Search Files - search_files({'directory': None})
Google Search - google_search({'query': None, 'num_results': 8})
Google Official Search - google_official_search({'query': None, 'num_results': 8})
Generate Image - generate_image({'prompt': None})
Get Datetime - get_datetime({})
Send Tweet - send_tweet({})
Speak with TTS - speak({'text': None, 'engine': 'gtts', 'voice_index': 0})
Scrape Text with Playwright - scrape_text({'url': None})
Scrape Links with Playwright - scrape_links({'url': None})
Is Valid URL - is_valid_url({'url': None})
Sanitize URL - sanitize_url({'url': None})
Check Local File Access - check_local_file_access({'url': None})
Get Response - get_response({'url': None, 'timeout': 10})
Scrape Text - scrape_text({'url': None})
Scrape Links - scrape_links({'url': None})
Create Message - create_message({'chunk': None, 'question': None})
Browse Website - browse_website({'url': None, 'question': None})

FORMAT RESPONSES IN THE FOLLOWING FORMAT:

THOUGHTS: Your thoughts on completing the task.

REASONING: The reasoning behind your responses.

PLAN: Your plan for achieving the task.

CRITICISM: Your critism of the thoughts, reasoning, and plan.

COMMANDS: If you choose to use any commands, list them and their inputs where necessary.  List the commands in the order that they need to be executed with the format being command_name(args). Do not explain, just list the command_name(args).

Response:
Exception in thread Thread-17 (run_task):
Traceback (most recent call last):
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Projects\generative\llm\Agent-LLM\AgentLLM.py", line 256, in run_task
    task = self.execute_next_task()
  File "C:\Projects\generative\llm\Agent-LLM\AgentLLM.py", line 229, in execute_next_task
    self.response = self.execution_agent(self.primary_objective, this_task_name, this_task_id)
  File "C:\Projects\generative\llm\Agent-LLM\AgentLLM.py", line 196, in execution_agent
    self.response = self.run(prompt)
  File "C:\Projects\generative\llm\Agent-LLM\AgentLLM.py", line 73, in run
    self.response = self.instruct(prompt)
  File "C:\Projects\generative\llm\Agent-LLM\provider\llamacpp.py", line 16, in instruct
    output = self.llamacpp(f"Q: {prompt}", max_tokens=self.max_tokens, stop=["Q:", "\n"], echo=True)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\llama_cpp\llama.py", line 681, in __call__
    return self.create_completion(
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\llama_cpp\llama.py", line 642, in create_completion
    completion: Completion = next(completion_or_chunks)  # type: ignore
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\llama_cpp\llama.py", line 406, in _create_completion
    raise ValueError(
ValueError: Requested tokens exceed context window of 2000
127.0.0.1 - - [22/Apr/2023 01:31:13] "GET /api/task/status/CatJokeFinder HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 01:31:13] "GET /api/get_agents HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 01:31:16] "GET /api/task/output/CatJokeFinder HTTP/1.1" 200 -
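For reference, the ValueError comes from llama-cpp-python: it refuses to generate when the prompt's tokens plus the requested max_tokens exceed the n_ctx the model was loaded with (2000 above), and the assembled prompt here is already very large. A minimal sketch of a guard, assuming the standard llama-cpp-python `Llama` API; `MODEL_PATH` and `safe_completion` are hypothetical names, and the actual Agent-LLM fix may differ:

```python
from llama_cpp import Llama

MODEL_PATH = "ggml-vicuna-13b-1.1-q4_1.bin"  # hypothetical path
N_CTX = 2048                                 # context window to load the model with

llm = Llama(model_path=MODEL_PATH, n_ctx=N_CTX)

def safe_completion(prompt: str, max_tokens: int = 400) -> str:
    # Llama.__call__ raises ValueError when prompt tokens + max_tokens > n_ctx,
    # so clamp the completion budget to whatever context is actually left.
    prompt_tokens = len(llm.tokenize(prompt.encode("utf-8")))
    budget = N_CTX - prompt_tokens
    if budget <= 0:
        raise ValueError(f"Prompt alone is {prompt_tokens} tokens; trim it below n_ctx={N_CTX}")
    output = llm(prompt, max_tokens=min(max_tokens, budget), stop=["Q:", "\n"])
    return output["choices"][0]["text"]
```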
@Josh-XT (Owner) commented Apr 22, 2023:

Can you try with the latest version?

@Josh-XT added the labels type | report | bug (Confirmed bug in source code.), reply needed | waiting for response (Waiting for more info from the creator of the issue. If not responded to in a week, may be closed.), and reply needed | please retest (Waiting for a retest from the creator of the issue. If not responded to in a week, may be closed.) on Apr 22, 2023.
@dany-on-demand (Contributor, Author) commented:

Had to run `pip install -r requirements.txt`, then `npm i` and `npm run build` in `/frontend`.
New error:

 * Serving Flask app 'app'
 * Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
 * Restarting with watchdog (windowsapi)
 * Debugger is active!
 * Debugger PIN: 242-200-888
127.0.0.1 - - [22/Apr/2023 09:21:14] "GET /api/agent HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 09:21:14] "GET /api/agent/CatJokeFinder HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 09:21:14] "OPTIONS /api/agent/CatJokeFinder/instruct HTTP/1.1" 200 -
127.0.0.1 - - [22/Apr/2023 09:21:15] "GET /api/agent/CatJokeFinder/command HTTP/1.1" 200 -
Using embedded DuckDB with persistence: data will be stored in: agents/default/memories
llama_model_load: loading model from 'C:/Projects/generative/llm/llama.cpp/models/vicuna/1.1TheBloke/ggml-vicuna-13b-1.1-q4_1.bin' - please wait ...
llama_model_load: GPTQ model detected - are you sure n_parts should be 2? we normally expect it to be 1
llama_model_load: use '--n_parts 1' if necessary
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 2000
llama_model_load: n_embd  = 5120
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot   = 128
llama_model_load: f16     = 4
llama_model_load: n_ff    = 13824
llama_model_load: n_parts = 2
llama_model_load: type    = 2
llama_model_load: ggml map size = 9702.02 MB
llama_model_load: ggml ctx size = 101.25 KB
llama_model_load: mem required  = 11750.12 MB (+ 3216.00 MB per state)
llama_model_load: loading tensors from 'C:/Projects/generative/llm/llama.cpp/models/vicuna/1.1TheBloke/ggml-vicuna-13b-1.1-q4_1.bin'
llama_model_load: model size =  9701.58 MB / num tensors = 363
llama_init_from_file: kv self size  = 3125.00 MB
llama_generate: seed = 1682148077

system_info: n_threads = 8 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 2000, n_batch = 8, n_predict = 55, n_keep = 0


127.0.0.1 - - [22/Apr/2023 09:21:17] "POST /api/agent/CatJokeFinder/instruct HTTP/1.1" 500 -
[2023-04-22 09:21:17,807] {_internal.py:224} INFO - 127.0.0.1 - - [22/Apr/2023 09:21:17] "POST /api/agent/CatJokeFinder/instruct HTTP/1.1" 500 -
Traceback (most recent call last):
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask\app.py", line 2551, in __call__
    return self.wsgi_app(environ, start_response)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask\app.py", line 2531, in wsgi_app
    response = self.handle_exception(e)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask_restful\__init__.py", line 271, in error_router
    return original_handler(e)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask_cors\extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask\app.py", line 2528, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask\app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask_restful\__init__.py", line 271, in error_router
    return original_handler(e)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask_cors\extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask\app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask\app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask_restful\__init__.py", line 467, in wrapper
    resp = resource(*args, **kwargs)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask\views.py", line 107, in view
    return current_app.ensure_sync(self.dispatch_request)(**kwargs)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\flask_restful\__init__.py", line 582, in dispatch_request
    resp = meth(*args, **kwargs)
  File "C:\Projects\generative\llm\Agent-LLM\app.py", line 73, in post
    response = agent.run(objective, max_context_tokens=500, long_term_access=False)
  File "C:\Projects\generative\llm\Agent-LLM\AgentLLM.py", line 73, in run
    self.response = self.instruct(prompt)
  File "C:\Projects\generative\llm\Agent-LLM\provider\llamacpp.py", line 20, in instruct
    output = self.model.generate(f"Q: {prompt}", n_predict=55, new_text_callback=self.new_text_callback, n_threads=8)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\pyllamacpp\model.py", line 112, in generate
    pp.llama_generate(self._ctx, self.gpt_params, self._call_new_text_callback, verbose)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\pyllamacpp\model.py", line 83, in _call_new_text_callback
    Model._new_text_callback(text)
TypeError: AIProvider.new_text_callback() takes 1 positional argument but 2 were given
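
The TypeError is a callback-signature mismatch: pyllamacpp invokes `new_text_callback` with each newly generated text chunk as an argument, so passing a bound method that only accepts `self` means the call receives two positional arguments. A sketch of the expected shape, assuming pyllamacpp's `Model` API as shown in the traceback; this is a hypothetical reconstruction of `provider/llamacpp.py`, not the actual file:

```python
from pyllamacpp.model import Model

class AIProvider:
    def __init__(self, model_path: str):
        # pyllamacpp loads the ggml weights here
        self.model = Model(ggml_model=model_path)
        self.output = ""

    def new_text_callback(self, text: str) -> None:
        # pyllamacpp calls this once per generated chunk, so the method must
        # accept the chunk; a (self)-only signature raises the TypeError above.
        self.output += text

    def instruct(self, prompt: str) -> str:
        self.output = ""
        self.model.generate(
            f"Q: {prompt}",
            n_predict=55,
            new_text_callback=self.new_text_callback,
            n_threads=8,
        )
        return self.output
```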

eraviart added a commit to eraviart/Agent-LLM that referenced this issue on Apr 23, 2023.
@Josh-XT added the label closed | done (Fixed or otherwise implemented.) and removed the labels type | report | bug, reply needed | waiting for response, and reply needed | please retest on Apr 23, 2023.