
Self hosted LLMs Support #263

Closed · Tracked by #290
gsaivinay opened this issue May 9, 2023 · 28 comments

Comments

@gsaivinay
Contributor

Hello,

Thanks for this awesome work.

Is there any support for custom self-hosted LLMs? For example, I host multiple models on AWS EC2 instances using https://github.com/huggingface/text-generation-inference. If so, could you please point me to an example?

Happy to help or contribute if this doesn't already exist.

@TMRolle

TMRolle commented May 12, 2023

Seconded. Even just support for HuggingFacePipeline LLMs would be really useful.

@ogabrielluiz
Contributor

Hey all!
I completely agree. I had a hard time testing them and thought the problem was due to streaming, which is supported now, so maybe we should all give it another go.

Feel free to try it out too. We might add the pipeline to dev just to set it up for dev testing.

@ogabrielluiz
Contributor

I'll open an issue where we'll track the missing modules for each type starting with LLMs.

@pounde

pounde commented May 16, 2023

Thanks for the consideration. Big +1 here!

@ogabrielluiz
Contributor

Should it be SelfHostedHuggingFaceLLM or HuggingFaceTextGenInference?

ogabrielluiz added a commit that referenced this issue May 16, 2023
…FaceLLM to llms list in config.yaml

fix(langflow): fix import of SUFFIX_WITH_DF in custom.py
refactor(langflow): refactor LLMCreator to import llms and chat_models modules and create type_to_loader_dict from them
feat(langflow): add inference_server_url, max_new_tokens, top_k, top_p, typical_p, temperature, and repetition_penalty fields to LLMFrontendNode and show them in the advanced section

Issue #263
@ogabrielluiz
Contributor

I've added both of them in this branch, but I don't think implementing SelfHostedHuggingFaceLLM will be trivial, as it seems to need some Runhouse objects that are not part of LangChain.

What do y'all think?

Could you take it for a spin to see what breaks?

@pounde

pounde commented May 16, 2023

You'll have to forgive my ignorance, but I'm just getting up to speed on hosting. I created a simple API that serves Dolly using the HuggingFacePipeline in LangChain. From a one-minute look, I think that's more akin to SelfHostedHuggingFaceLLM, since it's hosted entirely on our own system. Perhaps someone with more experience in this domain can shed some light on that. I'll try to carve out some time to try that branch.
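(For reference, a minimal sketch of the LangChain HuggingFacePipeline wrapper being discussed; the model ID and generation settings below are only illustrative:)

```python
from langchain.llms import HuggingFacePipeline

# Load a local Hugging Face model (Dolly here, purely as an example) and expose
# it as a LangChain LLM. The model runs on the machine executing this code.
llm = HuggingFacePipeline.from_model_id(
    model_id="databricks/dolly-v2-3b",
    task="text-generation",
    model_kwargs={"max_length": 64},
)

print(llm("Explain what a vector database is in one sentence."))
```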

@ogabrielluiz
Contributor

SelfHostedHuggingFaceLLM seems to require Runhouse to be set up and to pass a hardware object of some kind.

We could (and probably should) implement that but then we'd have to define a maintainable way of doing so.
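(For context, a rough sketch of what SelfHostedHuggingFaceLLM expects, based on the LangChain/Runhouse docs of the time; the cluster name, instance type, and model are assumptions:)

```python
import runhouse as rh
from langchain.llms import SelfHostedHuggingFaceLLM

# The "hardware" argument is a Runhouse cluster object describing where the
# model runs. Here we assume an on-demand single-GPU cloud instance; an
# existing machine can also be wrapped via rh.cluster(ips=[...], ssh_creds={...}).
gpu = rh.cluster(name="rh-a10x", instance_type="A10G:1", use_spot=False)

llm = SelfHostedHuggingFaceLLM(
    model_id="gpt2",                              # illustrative model
    hardware=gpu,
    model_reqs=["pip:./", "transformers", "torch"],
)

print(llm("What is the capital of France?"))
```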

@gsaivinay
Contributor Author

gsaivinay commented May 16, 2023

I've actually contributed to HuggingFaceTextGenInference. We usually run this server on local machines or on a cloud service like AWS EC2, and connect to it via its API.

If the local machine is able to run a model with SelfHostedHuggingFaceLLM, then it can most likely also run the same model with HuggingFaceTextGenInference. Since the latter exposes an API to interact with, it's easy to use from multiple applications.
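(For anyone following along, a minimal sketch of that LangChain wrapper, using the fields listed in the commit above; the server URL and parameter values are placeholders:)

```python
from langchain.llms import HuggingFaceTextGenInference

# Point the wrapper at a running text-generation-inference server
# (local or on EC2); all generation parameters below are illustrative.
llm = HuggingFaceTextGenInference(
    inference_server_url="http://localhost:8080/",
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.7,
    repetition_penalty=1.03,
)

print(llm("What is deep learning?"))
```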

ogabrielluiz added a commit that referenced this issue May 16, 2023
feat(langflow): add support for HuggingFacePipeline in loading.py
feat(langflow): add model_id field to LLMFrontendNode's SHOW_FIELDS list

Issue #263
@pounde

pounde commented May 16, 2023

@ogabrielluiz -- I built the Dockerfile in the branch and I'm not seeing the self-hosted models listed. Has the frontend not caught up to the backend, perhaps?

@ogabrielluiz
Contributor

ogabrielluiz commented May 16, 2023

@pounde in the config.yml there's this section:

llms:
  - OpenAI
  # - AzureOpenAI
  - ChatOpenAI
  - HuggingFaceHub
  - LlamaCpp
  - HuggingFaceTextGenInference
  - SelfHostedHuggingFaceLLM
  - HuggingFacePipeline

Theoretically all of these should show up, but there could be a bug preventing some of them from showing in the frontend.
Since SelfHostedHuggingFaceLLM requires Runhouse, maybe we should focus on @gsaivinay's HuggingFaceTextGenInference and HuggingFacePipeline.

@pounde

pounde commented May 16, 2023

@ogabrielluiz -- sure enough, it's in my config.yaml. No luck on the frontend though. I have:

  • OpenAI
  • ChatOpenAI
  • LlamaCpp
  • HuggingFaceHub

No luck on the others.

@gustavoschaedler
Contributor

I've added the HuggingFaceTextGenInference LLM; locally the behavior was as expected. Could you check if it's okay on your side?
[screenshot]

@dongreenberg

dongreenberg commented Jun 7, 2023

Hey folks, just stumbled upon this. I work on Runhouse. You're correct that the HFTextGen LLM offers the same functionality for an optimized set of models, with a relatively simple setup for accessing the server, whereas the SelfHosted models via Runhouse can support any model and a more flexible set of compute (e.g. launching automatically on any cloud), but without automatically handling distribution and model-specific optimizations. The increased flexibility is particularly important in enterprise, but I'm not sure if that's your target user base? I'm happy to help if you're interested in supporting that use case. If you're mainly focused on local compute with a specific set of models, HFTextGen should be totally fine.

@ogabrielluiz
Contributor

Hey, @dongreenberg.
Thanks for reaching out. Runhouse's solution fits very well into our plans.
We'd have to build a new way of setting up models to work with Runhouse and help is definitely appreciated.

Please let me know if I can assist you with anything.

@toby-lm

toby-lm commented Jun 30, 2023

I've added the HuggingFaceTextGenInference LLM; locally the behavior was as expected. Could you check if it's okay on your side?

I've tried building from the 263-self-hosted-llms-support branch and I can't seem to get any response in the chat window that pops up. I can connect it to my local text-generation-inference API, and there are responses in the browser developer console, but no text appears as a reply. Can you show how this should work, please?

EDIT: I get a response correctly if it's just the LLM node. Once I connect it to a ConversationChain, it doesn't display the LLM reply (but still receives it).
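(For context, the flow described above corresponds roughly to the following LangChain setup; the server URL is a placeholder:)

```python
from langchain.chains import ConversationChain
from langchain.llms import HuggingFaceTextGenInference

# The LLM node on its own returns text; wiring it into a ConversationChain is
# where the missing reply was observed in the chat window.
llm = HuggingFaceTextGenInference(inference_server_url="http://localhost:8080/")
chain = ConversationChain(llm=llm)

print(chain.run("Hello there!"))
```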

@2good4hisowngood

2good4hisowngood commented Jul 8, 2023

Just to throw another option out there: LangChain supports Oobabooga's text-generation-webui (TextGen) API, but it's not in LangFlow yet. In my experience testing different tools, it's one of the most consistently functional and actively improving locally hosted options for running models on Nvidia GPUs. Many tools default to the CPU and require advanced setup; TextGen has a one-click installer that configures it for your system. They also quickly adopt new features like ExLlama to increase token rates. It exposes many advanced options through launch flags or through the UI, so users can tweak the configuration and re-test whether their changes improve performance on their specific machine, rather than trying to shoehorn an incomplete feature list into LangChain arguments.

This covers how to use LangChain to interact with LLMs via the text-generation-webui API integration (see the sketch after the links below). Please ensure that you have text-generation-webui configured and an LLM installed, ideally via the one-click installer appropriate for your OS. Once text-generation-webui is installed and confirmed working via the web interface, enable the API either through the web model configuration tab or by adding the runtime arg --api to your start command.

LangChain Page for TextGen: https://python.langchain.com/docs/modules/model_io/models/llms/integrations/textgen
GitHub Page: https://github.com/oobabooga/text-generation-webui/tree/main
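(A minimal sketch of that integration, assuming the webui is running locally with --api enabled on its default port:)

```python
from langchain.llms import TextGen

# Point LangChain's TextGen wrapper at a local text-generation-webui instance
# started with the --api flag (default API port 5000; adjust as needed).
llm = TextGen(model_url="http://localhost:5000")

print(llm("Write a haiku about GPUs."))
```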

@RandomInternetPreson

I've added the HuggingFaceTextGenInference LLM; locally the behavior was as expected. Could you check if it's okay on your side?

I don't know if you are still working on this but I would really like to try it out! Being able to use langflow with oobabooga would be amazing!!

I found the repo you made here: https://github.com/logspace-ai/langflow/tree/263-self-hosted-llms-support

But I don't know how to install it. I can install the current version of langflow with pip install langflow, but I'm not sure how to install your version.

I would give you feedback on the branch if you could tell me how to install it. Seriously langflow with oobabooga would be amazing!!!

@pounde

pounde commented Jul 9, 2023

Still no luck on my end. I do have additional options now, but no HuggingFaceTextGenInference.
[screenshot of available LLM options]

@thomclae33

Any news on this? Would love to know how to use the Textgen API with Langflow.

@Jirito0

Jirito0 commented Aug 27, 2023

Also looking for an update on this, please! I've been scouring the entire internet for a solution but can't find anything.

@vvlEURO

vvlEURO commented Sep 2, 2023

It works in the 0.5.0a0 version. [screenshots]

@tonypius

Hey, I would really like an update on this feature for the HuggingFaceTextGenInference LLM.

Also, is the custom component a good workaround?

@ogabrielluiz
Contributor

Hey!
The CustomComponent is a good workaround, and we added the Hugging Face Inference API component last week.

Can it be used in place of the TextGen?
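(For anyone who wants to try the workaround, here is a rough sketch of a CustomComponent that wraps HuggingFaceTextGenInference; it follows the CustomComponent build/build_config pattern, but the class name, field names, and defaults are assumptions:)

```python
from langchain.llms import HuggingFaceTextGenInference
from langchain.llms.base import LLM

from langflow import CustomComponent


class TextGenInferenceComponent(CustomComponent):
    display_name = "Text Gen Inference"
    description = "LLM served by a self-hosted text-generation-inference server."

    def build_config(self):
        # Fields exposed in the node's settings panel (values are illustrative).
        return {
            "inference_server_url": {"display_name": "Inference Server URL"},
            "max_new_tokens": {"display_name": "Max New Tokens", "value": 512},
            "temperature": {"display_name": "Temperature", "value": 0.7},
        }

    def build(self, inference_server_url: str, max_new_tokens: int = 512,
              temperature: float = 0.7) -> LLM:
        # Return the LangChain LLM so downstream nodes (chains, agents) can use it.
        return HuggingFaceTextGenInference(
            inference_server_url=inference_server_url,
            max_new_tokens=max_new_tokens,
            temperature=temperature,
        )
```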

@tonypius

HuggingFaceTextGenInference is different from the Hugging Face Inference API, right?

@m1ll10n

m1ll10n commented Sep 14, 2023

It works in the 0.5.0a0 version. [screenshots]

I wonder if the whole reason this works is due to different langchain versions, as I'm facing a "streaming option currently unsupported" issue in both 0.4.17 and 0.4.18.

@vvlEURO

vvlEURO commented Oct 6, 2023

It works in the 0.5.0a0 version. [screenshots]

I wonder if the whole reason this works is due to different langchain versions, as I'm facing a "streaming option currently unsupported" issue in both 0.4.17 and 0.4.18.

Yes, it's related. Support for the langchain version that added streaming starts with langflow 0.5.0a0.
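(For completeness, streaming with this wrapper looks roughly like the following on the newer langchain versions; the streaming flag and stdout callback are standard LangChain pieces, and the server URL is a placeholder:)

```python
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import HuggingFaceTextGenInference

# With streaming enabled, tokens are forwarded to the callback handler as they
# are generated instead of only returning one final string at the end.
llm = HuggingFaceTextGenInference(
    inference_server_url="http://localhost:8080/",
    max_new_tokens=256,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)

llm("Tell me a short story about a robot.")
```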


stale bot commented Nov 20, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
