
gpt4all_api update and in working state - added gguf support and refactored endpoints. All tests pass except embedding #1659

Merged
4 commits merged on Nov 21, 2023

Conversation

dpsalvatierra
Contributor

Describe your changes

  • Edited docker-compose.yaml with the following changes:
      • New variable: line 15 replaced the bin model with the variable ${MODEL_ID}
      • New volume: line 19 added a models folder to place gguf LLMs
  • Added a .env file with instructions on using the MODEL_BIN variable
  • Created a models folder under gpt4all_api with a README placeholder
  • Cleaned up some code in the make test folder, specifically the batch completion function test
  • Added python-dotenv and openai==0.28.0 to requirements
  • Cleaned up some orphan __init__.py files that don't need to be added to the branch
  • Added a *.gguf exception to .gitignore to avoid uploading models during push
  • Using "venv" instead of "env" for the virtual environment
  • Refactored the chat completion and engines endpoints.

Issue ticket number and link

Checklist before requesting a review

  • [x] I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • I have added thorough documentation for my code.
  • I have tagged the PR with relevant project labels. I acknowledge that a PR without labels may be dismissed.
  • If this PR addresses a bug, I have provided both a screenshot/video of the original bug and the working solution.

Before:
https://app.screenclip.com/GMfa

After change:
https://app.screenclip.com/Mi3r

Steps to Reproduce
1. docker compose up --build : tests gpt4all_api
2. docker compose -f docker-compose.yaml -f docker-compose.gpu.yaml up --build -d : tests gpt4all_gpu
3. make test : all tests pass except embedding. The embedding test is outdated; the new OpenAI client uses "chat completion" instead of "completion" during inference. The workaround is to pin openai==0.28.0 (edited in requirements.txt), as shown in the sketch below.
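
For reference, a minimal sketch of the legacy-style completion call that the pinned openai==0.28.0 client still supports; the port and model name are taken from the request logs later in this thread, and the prompt is illustrative:

```python
import openai

# Point the openai 0.28.x client at the local gpt4all_api container
# (port 4891 per the request logs below; any key works locally).
openai.api_base = "http://localhost:4891/v1"
openai.api_key = "not-needed-locally"

# Legacy Completion API, removed in openai>=1.0 -- hence the 0.28.0 pin.
response = openai.Completion.create(
    model="mistral-7b-openorca.Q4_0.gguf",
    prompt="List three uses of the gguf model format.",
    max_tokens=64,
)
print(response["choices"][0]["text"])
```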

Notes

Problems addressed:
A) The Docker API build was downloading old bin files, which are now deprecated. The new file format is gguf.
B) The API was using outdated completion functions in the Makefile during testing. The workaround is to use openai==0.28.0.
C) Not able to test embedding successfully.
D) Added variables.
E) All endpoints working:

Engines.py

  • Added "requests.get" function to fetch engines from github list
  • Better error handling, moving loggers to bottom next to exception requests.
  • Removed unused libraries
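
A minimal sketch of what such an endpoint could look like; the route, the list URL, and the status codes here are illustrative assumptions, not the exact code from the PR:

```python
import logging

import requests
from fastapi import APIRouter, HTTPException

logger = logging.getLogger(__name__)
router = APIRouter(prefix="/engines", tags=["Engines"])

# Illustrative URL; the PR fetches the engines list from a GitHub-hosted file.
ENGINES_LIST_URL = "https://raw.githubusercontent.com/nomic-ai/gpt4all/main/gpt4all-chat/metadata/models.json"


@router.get("/")
async def list_engines():
    try:
        resp = requests.get(ENGINES_LIST_URL, timeout=10)
        resp.raise_for_status()
        return {"object": "list", "data": resp.json()}
    except requests.RequestException as exc:
        # Logger call sits next to the exception handling, per the refactor.
        logger.error("Failed to fetch engines list: %s", exc)
        raise HTTPException(status_code=502, detail="Could not fetch engines list")
```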

chat.py

  • Error handling
  • Added API settings to fetch the model used to run tests
  • Added a streaming response (see the sketch below)
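
A rough sketch of how a streaming response can be wired up with FastAPI's StreamingResponse and the gpt4all Python bindings; the request shape and hard-coded model name are assumptions for illustration (the PR reads the model from API settings):

```python
from fastapi import APIRouter
from fastapi.responses import StreamingResponse
from gpt4all import GPT4All
from pydantic import BaseModel

router = APIRouter(prefix="/chat", tags=["Chat"])

# Illustrative; the PR resolves the model name from API settings.
MODEL_NAME = "mistral-7b-openorca.Q4_0.gguf"


class ChatRequest(BaseModel):
    prompt: str
    stream: bool = False
    max_tokens: int = 64


@router.post("/completions")
async def chat_completions(body: ChatRequest):
    model = GPT4All(MODEL_NAME)

    if body.stream:
        def token_stream():
            # gpt4all's generate(..., streaming=True) yields tokens as produced.
            for token in model.generate(body.prompt, max_tokens=body.max_tokens, streaming=True):
                yield token

        return StreamingResponse(token_stream(), media_type="text/event-stream")

    text = model.generate(body.prompt, max_tokens=body.max_tokens)
    return {"choices": [{"text": text, "finish_reason": "stop"}]}
```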

@dpsalvatierra dpsalvatierra changed the title gpt4all_api update to get it working with gguf and refactoring endpoints gpt4all_api update and in working state - added gguf support and refactored endpoints. All tests pass except embedding Nov 18, 2023
@AndriyMulyar
Contributor

Approved.

@nadar

nadar commented Nov 20, 2023

Since docker-compose was broken until this PR, I have tested the PR and the API works again, thanks! I also ran into two problems that may not be related to this change, but I thought they could be useful to report here; if not, I'm sorry and I will create a separate issue.

1.) The /v1/completions endpoint always redirects to the trailing-slash version /v1/completions/ (one possible workaround is sketched after the headers below)

["request_headers"]=>
  array(6) {
    [0]=>
    string(29) "POST /v1/completions HTTP/1.1"
    [1]=>
    string(20) "Host: localhost:4891"
    [2]=>
    string(59) "User-Agent: PHP Curl/2.5 (+https://github.com/php-mod/curl)"
    [3]=>
    string(11) "Accept: */*"
    [4]=>
    string(30) "Content-Type: application/json"
    [5]=>
    string(18) "Content-Length: 86"
  }
  ["response_headers"]=>
  array(5) {
    [0]=>
    string(31) "HTTP/1.1 307 Temporary Redirect"
    [1]=>
    string(35) "date: Mon, 20 Nov 2023 08:18:32 GMT"
    [2]=>
    string(15) "server: uvicorn"
    [3]=>
    string(17) "content-length: 0"
    [4]=>
    string(47) "location: http://localhost:4891/v1/completions/"
  }
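
For context, this 307 is the framework's automatic slash redirect: FastAPI/Starlette redirects when a route is registered only with a trailing slash ("/v1/completions/") and the client requests "/v1/completions". A minimal sketch of one way to avoid the redirect, assuming the route lives on the main FastAPI app (illustrative, not the PR's actual code):

```python
from fastapi import FastAPI

app = FastAPI()


# Registering both path variants avoids the 307 that Starlette issues when
# only the trailing-slash route exists.
@app.post("/v1/completions")
@app.post("/v1/completions/", include_in_schema=False)
async def completions(body: dict):
    # Actual inference would happen here; stubbed for illustration.
    return {"object": "text_completion", "choices": []}
```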

2.) No matter the input, /v1/completions/ always returns with finish_reason "stop"

  ["request_headers"]=>
  array(6) {
    [0]=>
    string(30) "POST /v1/completions/ HTTP/1.1"
    [1]=>
    string(20) "Host: localhost:4891"
    [2]=>
    string(59) "User-Agent: PHP Curl/2.5 (+https://github.com/php-mod/curl)"
    [3]=>
    string(11) "Accept: */*"
    [4]=>
    string(30) "Content-Type: application/json"
    [5]=>
    string(18) "Content-Length: 86"
  }
  ["response_headers"]=>
  array(5) {
    [0]=>
    string(15) "HTTP/1.1 200 OK"
    [1]=>
    string(35) "date: Mon, 20 Nov 2023 08:19:43 GMT"
    [2]=>
    string(15) "server: uvicorn"
    [3]=>
    string(19) "content-length: 275"
    [4]=>
    string(30) "content-type: application/json"
  }
  ["response"]=>
  string(275) "{"id":"eb87d52c-e64d-4bb3-842c-fa1341f601ad","object":"text_completion","created":1700468392,"model":"mistral-7b-openorca.Q4_0.gguf","choices":[{"text":"\n","index":0,"logprobs":-1.0,"finish_reason":"stop"}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}"
  ["response_header_continue":protected]=>
  bool(false)
}

Decoded response:

{
  ["id"]=>
  string(36) "eb87d52c-e64d-4bb3-842c-fa1341f601ad"
  ["object"]=>
  string(15) "text_completion"
  ["created"]=>
  int(1700468392)
  ["model"]=>
  string(29) "mistral-7b-openorca.Q4_0.gguf"
  ["choices"]=>
  array(1) {
    [0]=>
    array(4) {
      ["text"]=>
      string(1) "
"
      ["index"]=>
      int(0)
      ["logprobs"]=>
      float(-1)
      ["finish_reason"]=>
      string(4) "stop"
    }
  }
  ["usage"]=>
  array(3) {
    ["prompt_tokens"]=>
    int(0)
    ["completion_tokens"]=>
    int(0)
    ["total_tokens"]=>
    int(0)
  }
}

@nadar

nadar commented Nov 20, 2023

@dpsalvatierra in the README the variable is called MODEL_ID, but in the docker-compose file it's MODEL_BIN.

@dpsalvatierra
Contributor Author

dpsalvatierra commented Nov 20, 2023

@dpsalvatierra in the README the variable is called MODEL_ID, but in the docker-compose file it's MODEL_BIN.

Thanks for bringing this up. The MODEL_ID variable is used in the GPU docker compose file; however, I also noticed that the Makefile needs some housekeeping.

Expect new additions and some proper documentation soon.

If you are only running the default docker compose file, just adding the model to the models folder and setting the "MODEL_BIN" variable in .env should get you up and running (a minimal sketch follows).
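
As an illustration of how the API side can pick up that variable via the python-dotenv dependency added in this PR (the default value and path here are assumptions, using the model name from this thread):

```python
import os

from dotenv import load_dotenv

# Load the .env file added in this PR; MODEL_BIN names the gguf file placed
# in the models folder.
load_dotenv()

model_bin = os.getenv("MODEL_BIN", "mistral-7b-openorca.Q4_0.gguf")
model_path = os.path.join("models", model_bin)
print(f"Serving model from {model_path}")
```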
