
Add GPT4All local provider #209

Merged · 8 commits merged into jupyterlab:main on Aug 10, 2023
Conversation

@krassowski (Member) commented on Jun 4, 2023

This is a proof of concept for #190. I tried a number of models, and GPT4All appears to be the most straightforward to install.

[Screenshots: Example 1, Example 2 (snoozy), Example 3 (groovy)]

It would be good if the language of the document selection were included in the prompt (here the model assumed it is C# for some reason). Results, as seen above, are not great for coding tasks, but these models are supposedly good at reasoning and conversations.
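For illustration, a hypothetical prompt template that interpolates the selection's language (a sketch, not the prompt jupyter-ai actually uses; the variable names are illustrative only):

# Hypothetical sketch: pass the selection's language into the prompt so the
# model does not have to guess it. Names here are illustrative, not jupyter-ai's.
from langchain.prompts import PromptTemplate

CODE_PROMPT = PromptTemplate(
    input_variables=["language", "selection", "instruction"],
    template=(
        "The following code is written in {language}.\n"
        "{selection}\n\n"
        "{instruction}"
    ),
)

prompt = CODE_PROMPT.format(
    language="Python",
    selection="def add(a, b):\n    return a + b",
    instruction="Explain what this function does.",
)
print(prompt)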

LangChain updates

A newer LangChain version is required because:

Model download

GPT4All bindings have native support for downloading model weights (disabled by default in LangChain). If we decide to toggle it on by default, the user would not have to do anything and the model would just work. The experience will depend on network speed, as downloading the model can take from minutes to hours, but it is then cached in ~/.cache/gpt4all/. The progress bar displays only in the terminal, but download failures show up in the UI as exception tracebacks. Ideally we would have a way to show in the UI that a download is in progress.
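As a rough sketch, toggling this on in LangChain might look as follows (this assumes the GPT4All wrapper exposes an allow_download flag; treat the exact parameter names as an assumption, they may differ between versions):

# Minimal sketch (assumes langchain's GPT4All wrapper exposes allow_download;
# parameter names may differ between versions).
from langchain.llms import GPT4All

llm = GPT4All(
    model="ggml-gpt4all-l13b-snoozy.bin",  # cached/downloaded under ~/.cache/gpt4all/
    allow_download=True,                   # let the bindings fetch the weights if missing
)
print(llm("Hello, how are you?"))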

Alternatively, users can download the model directly, e.g.

mkdir -p ~/.cache/gpt4all/   # create the cache directory if it does not exist yet
cd ~/.cache/gpt4all/
wget http://gpt4all.io/models/ggml-gpt4all-l13b-snoozy.bin

The download sizes are:

  • l13b-snoozy: 7.6 GB
  • j-v1.3-groovy: 3.79 GB
  • j-v1.2-jazzy: 3.79 GB

Performance

GPT4All runs on CPU (there is also a GPU version, GPT4AllGPU, but there are no bindings in LangChain, although we could contribute them). The performance of the CPU version depends somewhat on the number of threads (though using too many threads can slow it down). This PR makes the number of threads user-configurable.

Additionally, a number of fields could be added to enhance user configurability, e.g. temp, n_predict (max output tokens), etc.
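For illustration, a sketch of how those knobs map onto the LangChain GPT4All wrapper (argument names such as n_threads, temp, and n_predict follow the gpt4all bindings and are an assumption here; they may differ between versions):

# Sketch of exposing generation settings; argument names follow the gpt4all
# bindings (n_threads, temp, n_predict) and may vary across versions.
from os.path import expanduser
from langchain.llms import GPT4All

llm = GPT4All(
    model=expanduser("~/.cache/gpt4all/ggml-gpt4all-l13b-snoozy.bin"),
    n_threads=8,     # number of CPU threads; too many can slow generation down
    temp=0.7,        # sampling temperature
    n_predict=256,   # maximum number of tokens to generate
)
print(llm("Summarize what a Jupyter notebook is."))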

welcome bot commented on Jun 4, 2023

Thanks for submitting your first pull request! You are awesome! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@krassowski added the enhancement (New feature or request) label on Jun 4, 2023
@3coins (Collaborator) commented on Jun 6, 2023

@krassowski
Thanks for doing all the research on this and providing a POC.

This seems like a reasonable option, but I believe we will need some UX changes (messaging, confirmation for downloading, progress bar, etc.) to provide this model option. In some cases, users might also have this model already installed, so we will need to handle that. To start, I think we can go with the alternate option of letting users download the model to a specific location and configure it in the UI.

I want to try this out locally. In case I download the model, does this code require it to be located at ~/.cache/gpt4all/?

@krassowski (Member Author) commented:

I want to try this out locally. In case I download the model, does this code require it to be located at ~/.cache/gpt4all/?

Currently yes, but we could change that by letting users provide the model path. I could add a field the same way there is a field for the number of threads; does that sound good?

For reference, gpt4all documents the default model path here and defines it here, while LangChain mentions it here.

@krassowski (Member Author) commented:

What do you think about disabling auto-download and just displaying an error with download instructions if the model is not available?
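In other words, something along these lines (a generic sketch, not the actual jupyter-ai code):

# Sketch only: verify the weights exist before instantiating the model,
# and surface download instructions instead of attempting an auto-download.
from pathlib import Path

MODEL_PATH = Path.home() / ".cache" / "gpt4all" / "ggml-gpt4all-l13b-snoozy.bin"

if not MODEL_PATH.exists():
    raise FileNotFoundError(
        f"Model weights not found at {MODEL_PATH}. Download them first, e.g.:\n"
        "  mkdir -p ~/.cache/gpt4all && cd ~/.cache/gpt4all\n"
        "  wget http://gpt4all.io/models/ggml-gpt4all-l13b-snoozy.bin"
    )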

@3coins (Collaborator) commented on Jun 6, 2023

What do you think about disabling auto-download and just displaying an error if the model is not available with instructions for download?

Yes, that sounds good. Thanks for looking into this.

@ellisonbg (Contributor) commented:

@krassowski thanks for working on this, I think supporting local models is really important!

@3coins (Collaborator) left a review comment:

@krassowski
Thanks for working on this. I was able to download and connect with the ggml-gpt4all-j-v1.3-groovy model using these changes, and it worked. There were some issues with the LangChain version and with other models in the list. I also rebased from main and can submit the fixes; would you be able to give me permissions to your fork to merge those?

Review threads (all resolved):
  • packages/jupyter-ai-magics/pyproject.toml (outdated)
  • packages/jupyter-ai-magics/jupyter_ai_magics/providers.py (outdated, two threads)
  • packages/jupyter-ai/src/handler.ts
@krassowski (Member Author) commented:

I also rebased from main, and can submit the fixes; would you be able to give me permissions to your fork to merge those?

Thank you! You should be able to push to the branch already:
Screenshot from 2023-06-09 10-21-00

but I also just sent you an invite to collaborate on my fork if that makes it easier.

@krassowski (Member Author) commented:

@3coins just checking if you wanted to push to my branch, or should I start working on addressing the review suggestions?

@3coins (Collaborator) commented on Jun 14, 2023

@krassowski
I have some of the suggestions available locally, and had some other observations. Will update the PR later today.

@3coins (Collaborator) commented on Jun 15, 2023

@krassowski
Rebased from main, updated the LangChain version, and disabled auto-download. I don't think this is ready to be merged yet. I observed a few issues while using this with the learn and ask commands, where a larger context was passed to the LLM. I worked with both the ggml-gpt4all-j-v1.3-groovy and the ggml-gpt4all-j-v1.3-groovy models, and they seem to have very high latency in responding to any of the useful prompts. See the screenshots attached, where it took 5+ minutes to respond and the response was not complete.
Screenshot 2023-06-10 at 11 54 16 PM

I also ran into a prompt size issue with just 2 consecutive prompts.
Screenshot 2023-06-11 at 12 01 16 AM

Is a latency of 5+ minutes expected for these models? I am running these on a Mac M1 Pro (16 GB).
For the prompt size issue, I think we have to look at truncating the chat_history object after a certain number of conversations.
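For illustration, a minimal sketch of truncating the history to the last few exchanges (assuming chat_history is a list of (question, answer) tuples, as LangChain's conversational chains use):

# Sketch: keep only the last N exchanges to bound the prompt length.
MAX_TURNS = 2

def truncate_history(chat_history, max_turns=MAX_TURNS):
    """Return only the most recent (question, answer) pairs."""
    return chat_history[-max_turns:]

history = [("q1", "a1"), ("q2", "a2"), ("q3", "a3")]
print(truncate_history(history))  # [("q2", "a2"), ("q3", "a3")]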

@3coins (Collaborator) commented on Jun 15, 2023

Ok, it seems like the latency is directly related to the length of the prompt passed. I truncated the chat_history to the last 2 conversations, which helped with the response time, but I ran into it again after one of the previous responses became large.
Screenshot 2023-06-14 at 10 13 15 PM

These models also seem to ignore the guardrails in the prompt ("If you don't know the answer, just say that you don't know, don't try to make up an answer") and add information on their own. This needs some prompt tweaking to make it work with these models.

It seems like the retrieval chain needs some more changes to make sure the prompt length never exceeds a certain limit. I see that there is a max_tokens_limit on ConversationalRetrievalChain, but for this to work, additional methods need to be implemented on the LLM classes.
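Conceptually, max_tokens_limit bounds the prompt by dropping the oldest context until it fits a token budget. A generic sketch of that idea (not LangChain's actual implementation; a whitespace split stands in for the model's real tokenizer):

# Generic sketch of bounding chat history by a token budget.
def fit_history_to_budget(chat_history, max_tokens=512):
    kept, used = [], 0
    for question, answer in reversed(chat_history):
        cost = len(question.split()) + len(answer.split())
        if used + cost > max_tokens:
            break  # drop this exchange and everything older
        kept.append((question, answer))
        used += cost
    return list(reversed(kept))

history = [("old question", "old answer " * 80), ("new question", "new answer")]
print(fit_history_to_budget(history, max_tokens=50))  # keeps only the newest exchange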

@3coins (Collaborator) commented on Jun 16, 2023

@krassowski
Thanks for starting the work on this feature; I am really excited to get this working for Jupyter AI users. I have created #224, #225, and #226 to track work on fixing some of the issues observed here. I believe this is an important feature for users, so we should continue work on this. We have planned a biweekly release cycle, so we will include these in the next milestone.

There is also some encouraging progress on LLM compression, which should help bring better models in the future that behave more like those of external providers.
https://arxiv.org/abs/2306.03078

@krassowski (Member Author) commented:

Thank you! On the performance side, when a model generates only a few tokens per second, streaming the response (token by token) gives a much better UX (the fact that the process takes minutes for long responses is less of a problem when tokens are streamed to the user this way). I think this was not discussed previously, so I opened #228 to track it (for your consideration).
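For illustration, a minimal streaming sketch with the LangChain GPT4All wrapper (assuming the weights are already downloaded; callback behaviour may vary between versions):

# Sketch: stream tokens to stdout as they are generated.
from os.path import expanduser
from langchain.llms import GPT4All
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

llm = GPT4All(
    model=expanduser("~/.cache/gpt4all/ggml-gpt4all-l13b-snoozy.bin"),
    callbacks=[StreamingStdOutCallbackHandler()],  # print each token as it arrives
    verbose=True,
)
llm("Write a haiku about notebooks.")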

@krassowski (Member Author) commented:

I resolved the conflicts to push this along. What are the next steps here? Would you like to revisit the model choice or resolve any of the referenced issues? I can help; I'm just not sure what is blocking here.

@psychemedia commented:

One approach to model selection might be to track some of the models supported by GPT4All, which ships a desktop app for playing with local chat models. The GPT4All repo often picks up issues requesting new and popular models, and the ones they support may be indicative of the local LLM user community's preferences.

Their model list is here.

@dlqqq (Collaborator) commented on Aug 8, 2023

@krassowski Hey Michal! I'm very sorry that your PR was left in the queue for so long; it was submitted while I was out on vacation, and I had missed it in recent weeks.

I submitted a PR to your branch to fix a few bugs I had encountered and add some documentation for prospective GPT4All users. Please review and merge this when you have time: krassowski#1

After that, the next step is to rebase this branch onto the latest commit on main. Finally, once we verify that CI still passes, I will approve and merge your changes. 🤗

We really appreciate your effort and patience on this PR! We are aiming to include your PR in the next release of Jupyter AI v1 and v2.

@dlqqq (Collaborator) left a review comment:

@krassowski This PR looks great! The only remaining task is to rebase this branch onto the latest commit of main to make sure CI passes. I would also remove the merge commits to preserve a linear history for this branch, as we will backport this PR to 1.x. 👍

@dlqqq (Collaborator) left a review comment:

@krassowski Awesome work! 🎉

@dlqqq merged commit ca45817 into jupyterlab:main on Aug 10, 2023 (5 checks passed)
welcome bot commented on Aug 10, 2023

Congrats on your first merged pull request in this project! 🎉
Thank you for contributing, we are very proud of you! ❤️

@Sajalj98 commented on Sep 26, 2023

Hi team,
I want to use local models via GPT4All, but I am unable to do so because of issue #348.
I am using Python 3.8, JupyterLab 3, jupyter_ai 1.0, gpt4all (I already tried 1.0.0 and 1.0.8), and langchain 0.0.277.

I downloaded the models into the cache folder as suggested: [screenshot]

Please suggest steps and working versions to run a local model via the chat interface or in a Jupyter cell via magic commands.

Please find attached screenshots of the errors.

I am still getting the same error ("wasn't able to index that path") in the chat interface (#348): [screenshot]

When using the AI magic command, the response is the following: [screenshot]

For another model, the output is empty: [screenshot]

dbelgrod pushed a commit to dbelgrod/jupyter-ai that referenced this pull request Jun 10, 2024
* Add GPT4All

* Allow to tune number of threads

* Disable auto-download

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix build

* bump langchain to v0.0.223

see: langchain-ai/langchain@265c285

* implement async for GPT4All

* update user docs with GPT4All installation instructions

---------

Co-authored-by: 3coins <pyjain@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: David L. Qiu <david@qiu.dev>
Labels: enhancement (New feature or request) · 6 participants