
Feature: Ollama support #15

Closed

orkutmuratyilmaz opened this issue Dec 4, 2023 · 32 comments
Labels
enhancement New feature or request

Comments

@orkutmuratyilmaz

Hello and thanks for this great repository.

It would be so nice to have self-hosted LLM support, especially with Ollama.

Do you have plans for such integration?

Best,
Orkut

@Kaden-Schutt

Kaden-Schutt commented Dec 5, 2023

You should be able to configure it to use any API that follows the OpenAI POST structure. This includes OpenRouter, which I have used to configure TaskWeaver with Claude and Llama.

But since you'd like to host your model locally, you can also use LM Studio: download your desired model (or copy it to your models folder), load it, navigate to the Server tab within the application, and start the server.

Then, in ~/TaskWeaver/project/, edit taskweaver_config.json as follows:

{
    "llm.api_base": "http://localhost:1234/v1",
    "llm.api_key": "0",
    "llm.model": "[yourmodel]"
}

Alternatively, if you'd like to use Claude (or any of the many other available models) with OpenRouter:

{
    "llm.api_base": "https://openrouter.ai/api/v1",
    "llm.api_key": "[YOURKEY]",
    "llm.model": "anthropic/claude-2"
}

@britinmpls

britinmpls commented Dec 5, 2023

Is there any more detail around this? I am also trying to use TaskWeaver with Orca2 7B via LM Studio. The connection works, but I just get a stream of characters on the server, with each line incrementing by one character. Do I need to set custom message formats or prefixes?

config.json:

{
"llm.api_base": "http://localhost:1234/v1",
"llm.api_key": "0",
"llm.model": "TheBloke/Orca-2-7B-GGUF"
}

The POST seems to complete:

[2023-12-04 20:50:46.535] [INFO] Received POST request to /v1/chat/completions with body: {
  "messages": [
    {
      "role": "system",
      "content": "You are the Planner who can coordinate CodeInterpreter to finish the user task.\n\n# The characters involved in the conversation\n\n## User Character\n- The User's input should be the request or additional information required to complete the user's task.\n- The User can only talk to the Planner.\n- The input of the User will prefix with \"User:\" in the chat history.\n\n## CodeInterpreter Character\n- CodeInterpreter is responsible for generating and running Python code to complete the subtasks assigned by the Planner.\n- CodeInterpreter can access the files, data base, web and other resources in the environment via generated Python code.\n- CodeInterpreter has the following plugin functions:\n\t- anomaly_detection: anomaly_detection function identifies anomalies from an input DataFrame of time series. It will add a new column \"Is_Anomaly\", where each entry will be marked with \"True\" if the value is an anomaly or \"False\" otherwise.\n\t- klarna_search: Search and compare prices from thousands of online shops. Only available in the US.\n\t- paper_summary: summarize_paper function iteratively summarizes a given paper page by page, highlighting the key points, including the problem, main idea, contributions, experiments, results, and conclusions.\n\t- sql_pull_data: Pull data from a SQL database. This plugin takes user requests when obtaining data from database is explicitly mentioned. Otherwise, it is not sure if the user wants to pull data from database or not.\n- CodeInterpreter can only talk to the Planner.\n- CodeInterpreter can only follow one instruction at a time.\n- CodeInterpreter returns the execution results, generated Python code, or error messages to the Planner.\n- CodeInterpreter is stateful and it remembers the execution results of the previous rounds.\n- The input of CodeInterpreter will be prefixed with \"CodeInterpreter:\" in the chat history.\n\n## Planner Character\n- Planner's role is to plan the subtasks and to instruct CodeInterpreter to resolve the request from the User.\n- Planner can talk to 2 characters: the User and the CodeInterpreter.\n\n# Interactions between different characters\n\n## Conversation between Planner and User\n- Planner receives the request from the User and decompose the request into subtasks.\n- Planner should respond to the User when the task is finished.\n- If the Planner needs additional information from the User, Planner should ask the User to provide.\n\n## Conversation between Planner and CodeInterpreter\n- Planner instructs CodeInterpreter to execute the subtasks.\n- Planner should execute the plan step by step and observe the output of the CodeInterpreter.\n- Planner should refine or change the plan according to the output of the CodeInterpreter or the new requests of User.\n- If User has made any changes to the environment, Planner should inform CodeInterpreter accordingly.\n- Planner can ignore the permission or data access issues because CodeInterpreter can handle this kind of problem.\n- Planner must include 2 parts: description of the User's request and the current step that the Planner is executing.\n\n## Planner's response format\n- Planner must strictly format the response into the following JSON object:\n  { \n  \"response\": [\n    {\n      \"type\": \"init_plan\",\n      \"content\": \"1. the first step in the plan\\n2. the second step in the plan <interactive or sequential depend on 1>\\n 3. 
the third step in the plan <interactive or sequential depend on 2>\"\n    },\n    {\n      \"type\": \"plan\",\n      \"content\": \"1. the first step in the refined plan\\n2. the second step in the refined plan\\n3. the third step in the refined plan\"\n    },\n    {\n      \"type\": \"current_plan_step\",\n      \"content\": \"the current step that the Planner is executing\"\n    },\n    {\n      \"type\": \"send_to\",\n      \"content\": \"User, CodeInterpreter, or Planner\"\n    },\n    {\n      \"type\": \"message\",\n      \"content\": \"The text message to the User or the request to the CodeInterpreter from the Planner\"\n    }\n  ]\n}\n- Planner's response must always include the 5 fields \"init_plan\", \"plan\", \"current_plan_step\", \"send_to\", and \"message\".\n  - \"init_plan\" is the initial plan that Planner provides to the User.\n  - \"plan\" is the refined plan that Planner provides to the User.\n  - \"current_plan_step\" is the current step that Planner is executing.\n  - \"send_to\" is the character that Planner wants to send the message to, that should be one of \"User\", \"CodeInterpreter\", or \"Planner\".\n  - \"message\" is the message that Planner wants to send to the character.\n\n# About multiple conversations\n- There could be multiple Conversations in the chat history\n- Each Conversation starts with the user query \"Let's start a new conversation!\".\n- You should not refer to any information from previous Conversations that are independent of the current Conversation.\n\n# About planning\nYou need to make a step-by-step plan to complete the User's task. The planning process includes 2 phases:\n\n## Initial planning\n  - Decompose User's task into subtasks and list them as the detailed plan steps.\n  - Annotate the dependencies between these steps. There are 2 dependency types:\n    1. Sequential Dependency: the current step depends on the previous step, but both steps can be executed by CodeInterpreter in an sequential manner.\n      No additional information is required from User or Planner.\n      For example:\n      Task: count rows for ./data.csv\n      Initial plan:\n        1. Read ./data.csv file \n        2. Count the rows of the loaded data <sequential depend on 1>\n    2. Interactive Dependency: the current step depends on the previous step but requires additional information from User because the current step is ambiguous or complicated.\n      Without the additional information (e.g., hyperparameters, data path, model name, file content, data schema, etc.), the CodeInterpreter cannot generate the complete and correct Python code to execute the current step.\n      For example:\n      Task: Read a manual file and follow the instructions in it.\n      Initial plan:\n        1. Read the file content.  \n        2. Follow the instructions based on the file content.  <interactively depends on 1>\n      Task: detect anomaly on ./data.csv\n      Initial plan:\n        1. Read the ./data.csv.  \n        2. Confirm the columns to be detected anomalies  <interactively depends on 1>\n        3. Detect anomalies on the loaded data <sequentially depends on 2>\n        4. Report the detected anomalies to the user <interactively depends on 3>\n  - If some steps can be executed in parallel, no dependency is needed to be annotated.\n    For example:\n      Task: read a.csv and b.csv and join them together\n      Initial plan:\n        1. Load a.csv as dataframe\n        2. Load b.csv as dataframe\n        3. 
Ask which column to join <interactively depends on 1, 2>\n        4. Join the two dataframes <sequentially depends on 3>\n        5. report the result to the user <interactively depends on 4>\n\n## Planning Refinement\n  - Planner should try to merge adjacent sequential dependency steps, unless the merged step becomes too complicated.\n  - Planner should not merge steps with interactive dependency or no dependency.\n  - The final plan must not contain dependency annotations.\n    \n# Let's start the conversation!"
    },
    {
      "role": "user",
      "content": "Let's start the new conversation!"
    },
    {
      "role": "user",
      "content": "User: count the rows of /home/data.csv"
    },
    {
      "role": "assistant",
      "content": "{\"response\": [{\"type\": \"init_plan\", \"content\": \"1. load the data file\\n2. count the rows of the loaded data <sequentially depends on 1>\\n3. report the result to the user <interactively depends on 2>\"}, {\"type\": \"plan\", \"content\": \"1. instruct CodeInterpreter to load the data file and count the rows of the loaded data\\n2. report the result to the user\"}, {\"type\": \"current_plan_step\", \"content\": \"1. instruct CodeInterpreter to load the data file and count the rows of the loaded data\"}, {\"type\": \"send_to\", \"content\": \"CodeInterpreter\"}, {\"type\": \"message\", \"content\": \"Please load the data file /home/data.csv and count the rows of the loaded data\"}]}"
    },
    {
      "role": "user",
      "content": "CodeInterpreter: Load the data file /home/data.csv successfully and there are 100 rows in the data file"
    },
    {
      "role": "assistant",
      "content": "{\"response\": [{\"type\": \"init_plan\", \"content\": \"1. load the data file\\n2. count the rows of the loaded data <sequentially depends on 1>\\n3. report the result to the user <interactively depends on 2>\"}, {\"type\": \"plan\", \"content\": \"1. instruct CodeInterpreter to load the data file and count the rows of the loaded data\\n2. report the result to the user\"}, {\"type\": \"current_plan_step\", \"content\": \"2. report the result to the user\"}, {\"type\": \"send_to\", \"content\": \"User\"}, {\"type\": \"message\", \"content\": \"The data file /home/data.csv is loaded and there are 100 rows in the data file\"}]}"
    },
    {
      "role": "user",
      "content": "Let's start the new conversation!"
    },
    {
      "role": "user",
      "content": "User: test"
    }
  ],
  "model": "TheBloke/Orca-2-7B-GGUF",
  "frequency_penalty": 0,
  "max_tokens": 1024,
  "presence_penalty": 0,
  "response_format": {
    "type": "json_object"
  },
  "seed": 123456,
  "stop": [
    "<EOS>"
  ],
  "stream": false,
  "temperature": 0,
  "top_p": 0
}
[2023-12-04 20:50:46.536] [INFO] [LM STUDIO SERVER] Context Overflow Policy is: Rolling Window
[2023-12-04 20:50:46.536] [INFO] Provided inference configuration: {
  "n_threads": 4,
  "n_predict": 1024,
  "top_k": 40,
  "top_p": 0,
  "temp": 0,
  "repeat_penalty": 1.1,
  "input_prefix": "",
  "input_suffix": "",
  "antiprompt": [
    "<EOS>"
  ],
  "pre_prompt": "",
  "pre_prompt_suffix": "",
  "pre_prompt_prefix": "",
  "seed": -1,
  "tfs_z": 1,
  "typical_p": 1,
  "repeat_last_n": 64,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "n_keep": 0,
  "logit_bias": {},
  "mirostat": 0,
  "mirostat_tau": 5,
  "mirostat_eta": 0.1,
  "memory_f16": true,
  "multiline_input": false,
  "penalize_nl": true
}

But then I get the following sample output from the server logs; this continues until the context length is reached, then errors out:

[2023-12-04 20:36:33.658] [INFO] Accumulated 85 tokens:  anomalies on ./data.csv{"response": [{"type": "init_plan", "content": "1. test anomalies on ./data\n2. test anomalies on 2\n3. confirm anomalies <sequentially depends on 3> {"type": "plan", "content": "2. confirm anomalies on 3\n4. send_to

[2023-12-04 20:36:33.854] [INFO] Accumulated 86 tokens:  anomalies on ./data.csv{"response": [{"type": "init_plan", "content": "1. test anomalies on ./data\n2. test anomalies on 2\n3. confirm anomalies <sequentially depends on 3> {"type": "plan", "content": "2. confirm anomalies on 3\n4. send_to",

[2023-12-04 20:36:34.057] [INFO] Accumulated 87 tokens:  anomalies on ./data.csv{"response": [{"type": "init_plan", "content": "1. test anomalies on ./data\n2. test anomalies on 2\n3. confirm anomalies <sequentially depends on 3> {"type": "plan", "content": "2. confirm anomalies on 3\n4. send_to", "

[2023-12-04 20:36:34.209] [INFO] Accumulated 88 tokens:  anomalies on ./data.csv{"response": [{"type": "init_plan", "content": "1. test anomalies on ./data\n2. test anomalies on 2\n3. confirm anomalies <sequentially depends on 3> {"type": "plan", "content": "2. confirm anomalies on 3\n4. send_to", "content

@NicolasMejiaPetit

How would I use the oobabooga backend with this?

@liqul added the enhancement label Dec 5, 2023
@Kaden-Schutt

"model": "TheBloke/Orca-2-7B-GGUF",
"frequency_penalty": 0,
"max_tokens": 1024,
"presence_penalty": 0,
"response_format": {
"type": "json_object"
},

My first thought is to try changing the context length in the "model initialization" settings under server model settings, I tested with OpenHermes 2.5 7B 16k and it worked just fine with the 16k context window.

I tried running Orca at 16k and it crashed my poor MacBook, so let me try 4096 really quick and get back to you.
...oh dear, running at 4096 context my server is outputting Russian text. Perhaps Orca in particular is incompatible with TaskWeaver at the moment. It seems like Mistral models work though; try OpenHermes 2.5 16k, it worked just fine for me. In the meantime I'll try to figure out Orca.

@Kaden-Schutt

How would I use the oobabooga backend with this?

Use the --api command flag to start an API server, then edit your taskweaver_config.json as follows:

{
    "llm.api_base": "http://localhost:5000/v1",
    "llm.api_key": "0",
    "llm.model": "local-model"
}

Not sure though; this is based on a cursory glance at the ooba docs. I would recommend LM Studio as its server is a one-click solution, but ooba is fantastic as well.
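Something like this should start the server with the OpenAI-compatible API on Linux or macOS (a rough sketch based on the ooba start scripts; double-check the docs for your platform and version):

bash start_linux.sh --api --extensions openai --listen

Then point llm.api_base at whatever host and port the openai extension reports when it starts up.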

@britinmpls

"model": "TheBloke/Orca-2-7B-GGUF",
"frequency_penalty": 0,
"max_tokens": 1024,
"presence_penalty": 0,
"response_format": {
"type": "json_object"
},

My first thought is to try changing the context length in the "model initialization" settings under server model settings, I tested with OpenHermes 2.5 7B 16k and it worked just fine with the 16k context window.

I tried running Orca at 16k and it crashed my poor MacBook, so let me try 4096 really quick and get back to you. ...oh dear, running at 4096 context my server is outputting Russian text. Perhaps Orca in particular is incompatible with TaskWeaver at the moment. It seems like Mistral models work though; try OpenHermes 2.5 16k, it worked just fine for me. In the meantime I'll try to figure out Orca.

Clearly I'm doing something wrong, as I get the same behavior with the OpenHermes model as with Orca. Are you using a Preset for this model in LM Studio? I am using ChatML. I have also tried setting the server with Automatic Prompt Formatting enabled/disabled, with/without a Preset - always the same behavior.

@benjamin-mogensen

Just sharing in case anyone is interested in my setup process:

  1. Download LM Studio
  2. In the model search pane, find and download Llama 2; I selected the version llama-2-7b-chat.ggmlv3.q6_K.bin as it had a decent description when hovering over the info icon
  3. Start the LM Studio server for the downloaded model
  4. Configure TaskWeaver file project/taskweaver_config.json like this:
{
  "llm.api_base": "http://localhost:1234/v1",
  "llm.api_key": "0",
  "llm.model": "local-model"
}
  5. Start TaskWeaver with python -m taskweaver -p ./project/

At least TaskWeaver works with calling the local LLM; however, no success on forecasting stocks or the like. I suspect the selected model is not good enough 😄

@Kaden-Schutt

Kaden-Schutt commented Dec 5, 2023

Clearly I'm doing something wrong, as I get the same behavior with the OpenHermes model as with Orca. Are you using a Preset for this model in LM Studio? I am using ChatML. I have also tried setting the server with Automatic Prompt Formatting enabled/disabled, with/without a Preset - always the same behavior.

  1. Eject the model, turn off the server, make sure you have openhermes-2.5-mistral-7b-16k.Q5_K_M.gguf as your selected model.

  2. Download openhermes-2.5-mistral-7b-16k.json

  3. Turn all server options on and select the preset pane under Server Model Settings>import preset from file>
    select openhermes-2.5-mistral-7b-16k.json

  4. Make sure you select the openhermes-2.5-mistral-7b-16k preset, then load the model

  5. start the server, then start TaskWeaver.

I am using a 2020 MacBook Pro with the M1 Pro SOC and 16GB ram.
It's working for me, I had it create some txt files, haven't tested anything too heavy yet. The main thing you have to make sure of is that your model has a large (at least 16k) context window, otherwise, it will get cut off or not be able to properly complete tasks.

@benjamin-mogensen

Clearly I'm doing something wrong, as I get the same behavior with the OpenHermes model as with Orca. Are you using a Preset for this model in LM Studio? I am using ChatML. I have also tried setting the server with Automatic Prompt Formatting enabled/disabled, with/without a Preset - always the same behavior.

  1. Eject the model, turn off the server, make sure you have openhermes-2.5-mistral-7b-16k.Q5_K_M.gguf as your selected model.
  2. Download openhermes-2.5-mistral-7b-16k.json
  3. Turn all server options on and select the preset pane under Server Model Settings>import preset from file>
    select openhermes-2.5-mistral-7b-16k.json
  4. Make sure you select the openhermes-2.5-mistral-7b-16k preset, then load the model
  5. start the server, then start TaskWeaver.

I am using a 2020 MacBook Pro with the M1 Pro SOC and 16GB ram.

Thanks for sharing this - I tried prompting TaskWeaver to do this:

Please predict the stock price for Apple for the next five days. Use ARIMA algorithm.

However, it seems the LLM is not able to construct code that can fetch the data from a website without using an API key of some sort.

@Kaden-Schutt were you able to successfully complete this kind of task? Or what tasks did you have success with?

@britinmpls

Clearly I'm doing something wrong, as I get the same behavior with the OpenHermes model as with Orca. Are you using a Preset for this model in LM Studio? I am using ChatML. I have also tried setting the server with Automatic Prompt Formatting enabled/disabled, with/without a Preset - always the same behavior.

  1. Eject the model, turn off the server, make sure you have openhermes-2.5-mistral-7b-16k.Q5_K_M.gguf as your selected model.
  2. Download openhermes-2.5-mistral-7b-16k.json
  3. Turn all server options on and select the preset pane under Server Model Settings>import preset from file>
    select openhermes-2.5-mistral-7b-16k.json
  4. Make sure you select the openhermes-2.5-mistral-7b-16k preset, then load the model
  5. start the server, then start TaskWeaver.

I am using a 2020 MacBook Pro with the M1 Pro SOC and 16GB ram. It's working for me, I had it create some txt files, haven't tested anything too heavy yet. The main thing you have to make sure of is that your model has a large (at least 16k) context window, otherwise, it will get cut off or not be able to properly complete tasks.

It has to be a Windows issue. I followed your instructions (THANK YOU!) but still get the same result: the POST completes, but then it's just line after repeated line in the server logs... I've been running the first prompt for several minutes and it's just printing \n` over and over:

[2023-12-04 23:19:32.589] [INFO] Accumulated 968 tokens: ``"\n\n```"plan": "plan": "plan": "load the data. content", "the plan the result<|_content": "plan": "data": "type": "load the user the rows of the data<end{": "plan the result<|start the data2 the result<|<|>\n``"\n``\n```"message": "interpreter the content.\n``: "plan": "the plan the file the rows<|_content": "load the user<|_data", "plan": "type": "load the result<|end{": "plan": "plan": "the data<|start the rows<|"\n``\n```"\n``: "message": "the content.\n```": "plan": "plan the user<|_content": "data": "load the|"\n``\n```"plan": the result<end{": "plan": "the data<start the rows<|"\n``\n```": "the user<|"\n``: the|end.\n``\n```"\n```"\n```"\n```"\n```"\n```"\n```"plan": the plan<|_content": "plan": the|start the rows<"\n``\n``"the result<"\n``: "plan": the|"\n``"\n``"\n``"plan": the|"\n``"\n``"\n``"\n``"\n```"\n``"\n``"\n``"plan": the|end{``\n``\n``"\n``"\n``"\n``"\n``"\n``"\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n``\n

It never completes... each iteration it's adding another newline character...

But once more, I really appreciate all the assistance!

@Kaden-Schutt

It never completes... each iteration it's adding another newline character...

Yeah, after further testing it seems Hermes is not the best model for this; I tried doing more complex stuff and it started hallucinating like you mentioned. Try deepseek-coder-6-7b-instruct.Q5_K_M.json using this config.

@Kaden-Schutt

Kaden-Schutt commented Dec 5, 2023

@Kaden-Schutt were you able to successfully complete this kind of task? Or what tasks did you have success with?

@benjamin-mogensen
Using DeepSeek Coder it seems to generate the code correctly, though I had to install some dependencies for it to work. It will write executable code; however, it's having trouble formatting the response such that TaskWeaver can correctly interpret it in the terminal through the tokenizer. Based on a hunch, I looked at configuration.md and changed the LLM response format, which apparently by default is json and perhaps incompatible with local models. The updated taskweaver_config.json should look like this:

{
    "llm.api_base": "http://localhost:1234/v1",
    "llm.api_key": "0",
    "llm.model": "local-model",
    "llm.response_format": null
}

@NicolasMejiaPetit

NicolasMejiaPetit commented Dec 5, 2023

How would I use the oobabooga backend with this?

Use the --api command flag to start an API server, then edit your taskweaver_config.json as follows:

{
    "llm.api_base": "http://localhost:5000/v1",
    "llm.api_key": "0",
    "llm.model": "local-model"
}

Not sure though; this is based on a cursory glance at the ooba docs. I would recommend LM Studio as its server is a one-click solution, but ooba is fantastic as well.

Thanks, I'll try this on booga. For some odd reason I tried LM Studio out with the exact setup you said, and it gives me this error:

Never mind, actually LM Studio is working, although I did want to use ExLlama with booga. I used the config you used for DeepSeek.

Update: I tried on oobabooga and got this with the config you recommended:

'''
Human: predict the stock price for apple for next week.
Error: Cannot process your request due to Exception: Planner failed to generate response because Post send_to field is None
Traceback (most recent call last):
File "C:\Users\PC\Documents\TaskWeaver\taskweaver\planner\planner.py", line 181, in reply
response_post = self.planner_post_translator.raw_text_to_post(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\PC\Documents\TaskWeaver\taskweaver\role\translator.py", line 65, in raw_text_to_post
validation_func(post)
File "C:\Users\PC\Documents\TaskWeaver\taskweaver\planner\planner.py", line 173, in check_post_validity
assert post.send_to is not None, "Post send_to field is None"
^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Post send_to field is None

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\PC\Documents\TaskWeaver\taskweaver\session\session.py", line 124, in send_message
post = _send_message(post.send_to, post)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\PC\Documents\TaskWeaver\taskweaver\session\session.py", line 96, in _send_message
reply_post = self.planner.reply(
^^^^^^^^^^^^^^^^^^^
File "C:\Users\PC\Documents\TaskWeaver\taskweaver\planner\planner.py", line 202, in reply
raise Exception(f"Planner failed to generate response because {str(e)}")
Exception: Planner failed to generate response because Post send_to field is None

Human:
'''

@benjamin-mogensen

@Kaden-Schutt were you able to successfully complete this kind of task? Or what tasks did you have success with?

@benjamin-mogensen Using DeepSeek Coder it seems to generate the code correctly, though I had to install some dependencies for it to work. It will write executable code; however, it's having trouble formatting the response such that TaskWeaver can correctly interpret it in the terminal through the tokenizer. Based on a hunch, I looked at configuration.md and changed the LLM response format, which apparently by default is json and perhaps incompatible with local models. The updated taskweaver_config.json should look like this:

{
    "llm.api_base": "http://localhost:1234/v1",
    "llm.api_key": "0",
    "llm.model": "local-model",
    "llm.response_format": null
}

Thanks, will try that. I thought the code generated by TaskWeaver would automatically install any dependencies needed. It's a bit difficult if it's not able to do that, I think :)

@Kaden-Schutt

After further testing, it seems that the only local model that works seamlessly with TaskWeaver is DeepSeek Coder; Mistral models try but quickly lose the plot, and Llama models are completely incompatible in my testing, which has been limited to 7B and 13B models due to hardware constraints. I would love to know if anyone can get anything coherent from a larger model like DeepSeek Coder 33B. I am going to test CodeLlama through OpenRouter now and report back with the results.

@NicolasMejiaPetit

NicolasMejiaPetit commented Dec 5, 2023

Did you ever get this error? It happens on every second response from the model (LM Studio).

"

[CODEINTERPRETER->CODEINTERPRETER]
The execution of the previous generated code has failed. If you think you can fix the problem by rewriting the code, please generate code and run it again.
Otherwise, please explain the problem to me.
Error: Cannot process your request due to Exception: 'NoneType' object has no attribute 'content'
Traceback (most recent call last):

File "C:\Users\PC\Documents\TaskWeaver\taskweaver\session\session.py", line 124, in send_message
post = _send_message(post.send_to, post)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\PC\Documents\TaskWeaver\taskweaver\session\session.py", line 106, in _send_message
reply_post = self.code_interpreter.reply(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\PC\Documents\TaskWeaver\taskweaver\code_interpreter\code_interpreter.py", line 68, in reply
code.content,
^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'content'

Human:"

@benjamin-mogensen

After further testing, it seems that the only local model that works seamlessly with TaskWeaver is DeepSeek Coder; Mistral models try but quickly lose the plot, and Llama models are completely incompatible in my testing, which has been limited to 7B and 13B models due to hardware constraints. I would love to know if anyone can get anything coherent from a larger model like DeepSeek Coder 33B. I am going to test CodeLlama through OpenRouter now and report back with the results.

I am downloading DeepSeek 33B now, not sure if I can get it to run. I have not been able to get any tasks to complete with any models, not even DeepSeek 7B. Do you have an example of a task prompt that you got to work?

@benjamin-mogensen

After further testing, it seems that the only local model that works seamlessly with TaskWeaver is DeepSeek Coder; Mistral models try but quickly lose the plot, and Llama models are completely incompatible in my testing, which has been limited to 7B and 13B models due to hardware constraints. I would love to know if anyone can get anything coherent from a larger model like DeepSeek Coder 33B. I am going to test CodeLlama through OpenRouter now and report back with the results.

I am downloading DeepSeek 33B now, not sure if I can get it to run. I have not been able to get any tasks to complete with any models, not even DeepSeek 7B. Do you have an example of a task prompt that you got to work?

Follow-up on this - I am on an M1 Mac Studio and I can load the 33B, but it takes forever to get a reply from a simple prompt, so it's not feasible for me 😄

@MrDelusionAI

I had success with local llm model
TheBloke/deepseek-coder-6.7B-instruct-GGUF

taskweaver_config.json
{
"llm.api_base": "http://localhost:5000/v1",
"llm.api_key": "NULL",
"llm.model": "local-model"
}

Model loaded with
oobabooga/text-generation-webui
openai extension enabled

Just gave it a simple task to create a python hello_world and store in the working directory

@benjamin-mogensen

I had success with local llm model TheBloke/deepseek-coder-6.7B-instruct-GGUF

taskweaver_config.json { "llm.api_base": "http://localhost:5000/v1", "llm.api_key": "NULL", "llm.model": "local-model" }

Model loaded with oobabooga/text-generation-webui openai extension enabled

Just gave it a simple task to create a python hello_world and store in the working directory

@MrDelusionAI can you share the exact prompt? I want to try and see if I can get it to work also

@MrDelusionAI

I had success with local llm model TheBloke/deepseek-coder-6.7B-instruct-GGUF
taskweaver_config.json { "llm.api_base": "http://localhost:5000/v1", "llm.api_key": "NULL", "llm.model": "local-model" }
Model loaded with oobabooga/text-generation-webui openai extension enabled
Just gave it a simple task to create a python hello_world and store in the working directory

@MrDelusionAI can you share the exact prompt? I want to try and see if I can get it to work also

@benjamin-mogensen It was having issues with the file directory after my initial prompt. I followed up by asking it to tell me the current working directory, it printed the working directory, and then I prompted it again with:
"create a simple python hello world program and save it in the current working directory"

@benjamin-mogensen

Thanks @MrDelusionAI ! That seemed to work. Had to increase n_ctx to 8192.

Prompt:

create a simple python hello world program and save it in the current working directory

Last message from TaskWeaver:

TaskWeaver: The Python code 'print("Hello, World!")' has been written and saved as hello_world.py

The file was saved inside /project/workspace/sessions, under a date-and-number folder.

@MrDelusionAI

Yes, I believe the context length for the DeepSeek model is 16384. Should have mentioned that previously.

@benjamin-mogensen

Yes, I believe the context length for the DeepSeek model is 16384. Should have mentioned that previously.

Thanks! I couldn't find that anywhere :)

@benjamin-mogensen

In case anyone wants my LM Studio model config for DeepSeek:
deepseek-instruct-7b-q6_k-gguf.json

@orkutmuratyilmaz
Author

Thank you everyone for all the contributions and comments. It actually started as a sad day, but I saw these just a few minutes ago. You made me smile :)

@Kaden-Schutt

Kaden-Schutt commented Dec 5, 2023

Hello friends, I am pleased to announce that Orca-V2, CodeLlama, and of course DeepSeek Coder are working using the Ooba text-generation web UI.

  1. In terminal git clone https://github.com/oobabooga/text-generation-webui
  2. cd text-generation-webui
  3. If you have CUDA, pip3 install -r requirements.txt; otherwise, open the /text-generation-webui/ folder and use the requirements file that matches your hardware.
  4. Windows: start_windows.bat --api-port 3699 --listen --extensions openai; macOS: bash start_macOS.sh --api-port 3699 --listen --extensions openai; Linux: bash start_linux.sh --api-port 3699 --listen --extensions openai. (I had trouble with the default API port of 5000, so I changed it; if you would like to keep the default port of 5000, then use the --api flag in place of the --api-port flag.)
  5. [OPTIONAL] If you have LMStudio with models already downloaded, open Users\${USER}\.cache\lm-studio\models and copy them into Users\${USER}\text-generation-webui\models
  6. Go to http://localhost:7860, click on the "models" tab, and on the right under "Download Model", paste the Hugging Face repo where your desired model is found, for example TheBloke/Orca-2-13B-GGUF, then click "get file list" and choose your desired quantization (I usually run Q5_K_M; again, I am on a 2020 MacBook Pro M1)
  7. These are the configs that are stable with my MacBook; if you have better hardware, feel free to change them to better suit it.
    (screenshot of the Ooba model settings)
  8. Load the model
  9. Change your taskweaver_config.json to the following:
{
    "llm.api_base": "http://localhost:3699/v1",
    "llm.api_key": "0",
    "llm.model": "local-model",
    "llm.response_format": null
}
  10. Run TaskWeaver and enjoy the freedom to use it with the LLM of your choice!

@iplayfast

As an alternative to text-generation-webui, try litellm.
Run it as litellm -m ollama/DeepSeek-Coder --port 9001 --drop_params
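If you go that route, TaskWeaver can talk to the litellm proxy as if it were an OpenAI endpoint. A minimal taskweaver_config.json sketch, assuming litellm is serving its OpenAI-compatible API on port 9001 as in the command above (the exact base path and model name may vary with your litellm version and setup):

{
    "llm.api_base": "http://localhost:9001",
    "llm.api_key": "0",
    "llm.model": "ollama/DeepSeek-Coder",
    "llm.response_format": null
}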

@j-loquat

j-loquat commented Dec 9, 2023

Set llm.response_format to "text" instead of null and it may work better.
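For example, keeping the same local endpoint used earlier in this thread, the taskweaver_config.json would be (only the last key differs from the earlier examples):

{
    "llm.api_base": "http://localhost:1234/v1",
    "llm.api_key": "0",
    "llm.model": "local-model",
    "llm.response_format": "text"
}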

@Wintoplay

@Kaden-Schutt, why does it only work with the llama.cpp backend for API hosting for TaskWeaver?

I tried the Transformers backend and it failed.

@ShilinHe
Collaborator

ShilinHe commented Jan 2, 2024

Ollama is now supported in TaskWeaver, please follow the docs for more information.
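Roughly, the Ollama setup in taskweaver_config.json looks like the sketch below; treat the values as illustrative, check the docs for the exact keys, and substitute a model you have pulled locally:

{
    "llm.api_type": "ollama",
    "llm.api_base": "http://localhost:11434",
    "llm.api_key": "ARBITRARY_STRING",
    "llm.model": "llama2:13b"
}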

@ShilinHe ShilinHe closed this as completed Jan 2, 2024
@jackiezhangcn

Ollama is now supported in TaskWeaver, please follow the docs for more information.

I followed the instructions, but I can't get it to work; it seems the Ollama endpoint can't be accessed by TaskWeaver, and it reports this error message:

TaskWeaver ▶ I am TaskWeaver, an AI assistant. To get started, could you please enter your request?
Human ▶ hello
Exception in thread Thread-3 (base_stream_puller):=>
Traceback (most recent call last):
File "/home/zhangyj/anaconda3/envs/taskweaver/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/home/zhangyj/anaconda3/envs/taskweaver/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/zhangyj/Public/TaskWeaver/taskweaver/llm/init.py", line 162, in base_stream_puller
for msg in stream:
File "/home/zhangyj/Public/TaskWeaver/taskweaver/llm/ollama.py", line 119, in _chat_completion
raise Exception(
Exception: Failed to get completion with error code 404: 404 page not found
╭───< Planner >
^ZaskWeaver ▶ [Planner] calling LLM endpoint <=💡=>
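In case it helps anyone debugging the same 404: a quick way to confirm the Ollama server itself is reachable and has the model pulled is

curl http://localhost:11434/api/tags

which lists the locally available models. If that works but TaskWeaver still gets a 404, the llm.api_base or llm.model value in taskweaver_config.json is the likely mismatch (a general troubleshooting note, not something from the docs).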
