
Integrate opensource LLMs into autogen #46

Closed
3 tasks done
LeoLjl opened this issue Jul 3, 2023 · 43 comments

Comments

@LeoLjl
Collaborator

LeoLjl commented Jul 3, 2023

Currently, autogen.oai only supports OpenAI models. I am planning to integrate open-source LLMs into autogen. This would require the Hugging Face transformers library.

Some of the open-source LLMs I have in mind include LLaMA, Alpaca, Falcon, Vicuna, WizardLM, StarCoder, and Guanaco. Most of their inference code is integrated into the Hugging Face transformers library, so it can be unified into something like this.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "<name-of-model>"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
inputs = tokenizer("The world is", return_tensors="pt")
sample = model.generate(**inputs, max_length=128)
print(tokenizer.decode(sample[0]))

@LeoLjl
Collaborator Author

LeoLjl commented Jul 3, 2023

The current goal is to support open-source LLMs in LLM-based agents with as little modification as possible.

The core code that needs modification is L41 in autogen/agent/assistant_agent.py. It only supports making completions using OpenAI models.

        responses = oai.ChatCompletion.create(messages=self._conversations[sender.name], **self._config)

There are two ways to do so. One is to create a utility function that checks the model name and calls the corresponding inference APIs.

openai_models = ["gpt-4", "gpt-3"]
open_source_models = ["llama", "alpaca"]
google_models = ["bard"]

def create(model, **kwargs):
    if model in openai_models:
        # generate response using  "oai.ChatCompletion.create"
        pass
    elif model in open_source_models:
        # generate response using related Hugging Face APIs
        pass
    elif model in google_models:
        # generate response using related Google APIs
        pass

This is quick to implement and easy to use. However, it may introduce unforeseen problems when developing other applications.

Another is to inherit from the autogen.oai.Completion class and override its methods.

class OpensourceCompletion(autogen.oai.Completion):
    @classmethod
    def create(cls, **kwargs):
        # Make the completion using the related APIs
        pass

This approach, however, is not consistent with the folder name oai.

I am researching how other libraries handle this situation to find potential solutions, e.g. LangChain, JARVIS.
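
To make the second option concrete, here is a minimal, hypothetical sketch (not a final implementation) of what the overridden create method could look like if it wrapped the Hugging Face generation code from the issue body and returned an OpenAI-style response dict; the prompt-flattening helper and the response shape are assumptions for illustration only:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

import autogen


class OpensourceCompletion(autogen.oai.Completion):
    @classmethod
    def create(cls, model="<name-of-model>", messages=None, **kwargs):
        # Flatten the chat history into a single prompt string (naive placeholder).
        prompt = "\n".join(f"{m['role']}: {m['content']}" for m in (messages or []))
        tokenizer = AutoTokenizer.from_pretrained(model, trust_remote_code=True)
        hf_model = AutoModelForCausalLM.from_pretrained(model, torch_dtype=torch.bfloat16)
        inputs = tokenizer(prompt, return_tensors="pt")
        sample = hf_model.generate(**inputs, max_new_tokens=256)
        text = tokenizer.decode(sample[0], skip_special_tokens=True)
        # Return an OpenAI-style response so downstream agent code stays unchanged.
        return {"choices": [{"message": {"role": "assistant", "content": text}}]}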

@sonichi
Contributor

sonichi commented Jul 3, 2023

Naming consistency is not an issue because we can interpret it this way: oai offers an inference API compatible with OpenAI's. That doesn't prevent supporting non-OpenAI models through the same inference API.

@sonichi
Contributor

sonichi commented Jul 3, 2023

Related issues: microsoft/FLAML#1037, #15, microsoft/FLAML#1017

@sonichi
Contributor

sonichi commented Jul 11, 2023

@LeoLjl
Collaborator Author

LeoLjl commented Jul 12, 2023

FastChat is a useful tool that serves as a local drop-in replacement for the OpenAI APIs, which greatly simplifies integration.
ToDo:

  • Update the docs on how to spin up endpoints using FastChat.
  • Write an automated script to start an OpenAI-compatible API server.
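
For reference, a minimal sketch of what pointing autogen at FastChat could look like, assuming an OpenAI-compatible FastChat server is already running on localhost:8000; the port and model name are placeholders, not part of the plan above:

from autogen import oai

response = oai.ChatCompletion.create(
    config_list=[
        {
            "model": "vicuna-7b-v1.5",               # placeholder local model name
            "api_base": "http://localhost:8000/v1",  # FastChat's OpenAI-compatible endpoint
            "api_type": "open_ai",
            "api_key": "NULL",                       # FastChat does not validate the key
        }
    ],
    messages=[{"role": "user", "content": "Hi"}],
)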

@sonichi
Contributor

sonichi commented Jul 12, 2023

Thanks for creating a task list. Could you edit the "tasks" in the main body of the issue? There you can track sub-issues. For example, the two items you listed here correspond to two existing issues, microsoft/FLAML#1017 and microsoft/FLAML#1037. You can comment on those existing issues about the approach you'll take to solve them. Also add #15 to the task list.

@ishaan-jaff

Hi @sonichi @LeoLjl, I'm the maintainer of LiteLLM (an abstraction to call 100+ LLMs). We allow you to create a proxy server to call 100+ LLMs, and I think it can solve your problem (I'd love your feedback if it does not).

Try it here: https://docs.litellm.ai/docs/proxy_server
https://github.com/BerriAI/litellm

Using LiteLLM Proxy Server

import openai
openai.api_base = "http://0.0.0.0:8000/" # proxy url
print(openai.ChatCompletion.create(model="test", messages=[{"role":"user", "content":"Hey!"}]))

Creating a proxy server

Ollama models

$ litellm --model ollama/llama2 --api_base http://localhost:11434

Hugging Face Models

$ export HUGGINGFACE_API_KEY=my-api-key #[OPTIONAL]
$ litellm --model huggingface/<org>/<model-name>

Anthropic

$ export ANTHROPIC_API_KEY=my-api-key
$ litellm --model claude-instant-1

Palm

$ export PALM_API_KEY=my-palm-key
$ litellm --model palm/chat-bison
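
A hedged sketch of pointing an autogen config at such a proxy, assuming the proxy started above is listening on http://0.0.0.0:8000; the model name and key are placeholders:

config_list = [
    {
        "model": "test",                    # whatever model the proxy routes to
        "api_base": "http://0.0.0.0:8000",  # LiteLLM proxy URL from above
        "api_key": "NULL",                  # placeholder; the proxy does not check it
    }
]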

@sonichi sonichi transferred this issue from microsoft/FLAML Sep 29, 2023
@sonichi sonichi mentioned this issue Sep 29, 2023
@yhyu13

yhyu13 commented Oct 3, 2023

@LeoLjl There are projects like textgen webui that provide an OpenAI API adapter for local LLMs.
And there are projects like one-api that serve as middleware for a lot of different remote LLM backends.

All of them simply redirect the OpenAI API without introducing new code or modifying existing code.

@LeoLjl
Collaborator Author

LeoLjl commented Oct 3, 2023

Hi @yhyu13, thanks for the valuable information. This is really helpful. I will include it in the next docs update!

@LeoLjl
Collaborator Author

LeoLjl commented Oct 3, 2023

Hi @ishaan-jaff, liteLLM looks like just the solution we need. I will try it out and include it in the new docs update!

@YaswanthDasamandam

Hi,
can someone answer my question in #89?
I am overriding the create method to generate the answers.

@ishaan-jaff

@TheCompAce I can help you on our discord: https://discord.com/invite/wuPM9dRgDw

@TheCompAce
Collaborator

@TheCompAce I can help you on our discord: https://discord.com/invite/wuPM9dRgDw

I was an idiot. To fix it, I had to set the API route to "/chat/completions" to match OpenAI's, so I used:

from flask import Flask, request, jsonify
from flask_swagger_ui import get_swaggerui_blueprint
from litellm import completion

app = Flask(__name__)

# Swagger UI setup
SWAGGER_URL = '/swagger'
API_URL = '/static/swagger.json'
swaggerui_blueprint = get_swaggerui_blueprint(
    SWAGGER_URL,
    API_URL,
    config={
        'app_name': "Litellm Server"
    }
)
app.register_blueprint(swaggerui_blueprint, url_prefix=SWAGGER_URL)

@app.route('/chat/completions', methods=['POST'])
def complete():
    data = request.json
    model = "huggingface/mistralai/Mistral-7B-Instruct-v0.1"
    messages = data.get('messages', [])

    response = completion(model=model, messages=messages, max_tokens=8192)
    return jsonify(response)

if __name__ == '__main__':
    app.run(debug=True)

And now I can use this config for AutoGen:

[
    {
        "model": "custom_liteLLM",
        "api_base": "http://127.0.0.1:5000"
    }
]

@ishaan-jaff

made a PR to autogen: #95 @LeoLjl

@henrywithu

This looks quite promising. Do you have any examples of it? TBH, GPT-4 tokens are pricey, and Code Llama could be a nice choice.

@TheCompAce
Collaborator

TheCompAce commented Oct 4, 2023

I have code above to get it working; it seemed to work at first, but then I started getting token-limit errors. Maybe I can work on it some more another time.

@fxtoofaan

Can you please see if this API integration will work with TheBloke models, if possible, like this one: TheBloke/Mistral-7B-OpenOrca-GPTQ

https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GPTQ

Maybe a FastChat + vLLM server in the backend with an OpenAI API,
serving the TheBloke/Mistral-7B-OpenOrca-GPTQ model,
with any prompt adjustments if needed (ChatML etc.),

and an example of Python code showing how to connect to this using autogen in a Python script.

@LeoLjl
Collaborator Author

LeoLjl commented Oct 5, 2023

Hi @fxtoofaan, I think these will answer your question: https://microsoft.github.io/autogen/blog/2023/07/14/Local-LLMs and https://github.com/microsoft/autogen/blob/osllm/notebook/open_source_language_model_example.ipynb

@fxtoofaan

fxtoofaan commented Oct 5, 2023

@LeoLjl thank you. Can you look into these various prompt templates?
https://github.com/oobabooga/text-generation-webui/tree/main/instruction-templates

Because of the OpenOrca models, I am interested in this prompt template: https://github.com/oobabooga/text-generation-webui/blob/main/instruction-templates/MPT-Chat.yaml

Can this type of prompt template be used in autogen? An example of this in a Python script would be useful. Thank you.

By the way, the model I am playing around with now is this: https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-AWQ

@bigsk1
Collaborator

bigsk1 commented Oct 8, 2023

@LeoLjl thank you. Can you look into these various prompt templates? https://github.com/oobabooga/text-generation-webui/tree/main/instruction-templates

Because of the OpenOrca models, I am interested in this prompt template: https://github.com/oobabooga/text-generation-webui/blob/main/instruction-templates/MPT-Chat.yaml

Can this type of prompt template be used in autogen? An example of this in a Python script would be useful. Thank you.

By the way, the model I am playing around with now is this: https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-AWQ

I was able to get the oobabooga OpenAI API working last night but was getting 2048 token-limit errors and still need to troubleshoot. I was running text-generation-webui locally and autogen locally on WSL2, running oobabooga with the OpenAI API flag, and in a notebook using:

import autogen

config_list = [
    {
        "model": "gpt-4",
        "api_key": "sk-111111111111111111111111111111111111111111111111",
        "api_base": "http://192.168.XX.XX:5001/v1/chat/completions",
    },
]

using model TheBloke_Llama-2-13B-chat-GPTQ.

It was working just fine, just like an OpenAI API call using GPT-4.
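
As a rough sketch (assuming the endpoint above is reachable and the config_list from the snippet is in scope), the agents can then be created the usual way; the task message here is just an illustrative placeholder:

import autogen

assistant = autogen.AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = autogen.UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "coding"},
)
# Kick off a two-agent chat against the local endpoint.
user_proxy.initiate_chat(assistant, message="Write a Python script that prints the first 10 Fibonacci numbers.")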

@apoorv28goel

@LeoLjl thank you. Can you look into these various prompt templates? https://github.com/oobabooga/text-generation-webui/tree/main/instruction-templates
Because of the OpenOrca models, I am interested in this prompt template: https://github.com/oobabooga/text-generation-webui/blob/main/instruction-templates/MPT-Chat.yaml
Can this type of prompt template be used in autogen? An example of this in a Python script would be useful. Thank you.
By the way, the model I am playing around with now is this: https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-AWQ

I was able to get the oobabooga OpenAI API working last night but was getting 2048 token-limit errors and still need to troubleshoot. I was running text-generation-webui locally and autogen locally on WSL2, running oobabooga with the OpenAI API flag, and in a notebook using:

import autogen

config_list = [
    {
        "model": "gpt-4",
        "api_key": "sk-111111111111111111111111111111111111111111111111",
        "api_base": "http://192.168.XX.XX:5001/v1/chat/completions",
    },
]

using model TheBloke_Llama-2-13B-chat-GPTQ.

It was working just fine, just like an OpenAI API call using GPT-4.

How is the performance? Can you share some examples?

@bigsk1
Collaborator

bigsk1 commented Oct 8, 2023

How is the performance? Can you share some examples?

Good, I'm on a 4090. Below is an example to check if it's working.

import requests
import json

# Define the API endpoint and headers
api_base = "http://192.168.1.68:5001/v1"
api_endpoint = f"{api_base}/chat/completions"
headers = {
    "Authorization": "Bearer sk-111111111111111111111111111111111111111111111111", # Dummy key, replace if your oobabooga instance requires a specific key
    "Content-Type": "application/json"
}

# Define the payload (data to send)
payload = {
    "model": "TheBloke_EverythingLM-13B-16K-GPTQ",  # Replace with your specific model name if different
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2021?"},
        {"role": "assistant", "content": "The Atlanta Braves won the World Series in 2021."},
        {"role": "user", "content": "Where was it played?"}
    ]
}

try:
    # Make the API call
    response = requests.post(api_endpoint, headers=headers, json=payload, timeout=20)

    # Check the response
    if response.status_code == 200:
        print("API call successful.")
        print("Response:", json.dumps(response.json(), indent=4))
    else:
        print(f"API call failed. Status code: {response.status_code}")
        print("Error message:", response.text)

except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
	
API call successful.
Response: {
    "id": "chatcmpl-1696808596275027712",
    "object": "chat.completions",
    "created": 1696808596,
    "model": "TheBloke_EverythingLM-13B-16K-GPTQ",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "It was played at various locations throughout the United States, including Houston, Texas.\n"
            }
        }
    ],
    "usage": {
        "prompt_tokens": 91,
        "completion_tokens": 18,
        "total_tokens": 109
    }
}

In the CLI you see: Output generated in 4.02 seconds (4.73 tokens/s, 19 tokens, context 91, seed 1022663743)
192.168.1.68 - - [08/Oct/2023 16:43:20] "POST /v1/chat/completions HTTP/1.1" 200 -

Start oobabooga with the flags --listen --extensions openai

Also, here is a sample notebook where a local LLM was used: https://github.com/bigsk1/autogen/blob/main/notebook/agentchat_stream.ipynb

In the root directory I have OAI_CONFIG_LIST, and inside:

[
    {
        "model": "gpt-4",
        "api_key": "sk-111111111111111111111111111111111111111111111111",
        "api_base": "http://192.168.1.68:5001/v1/chat/completions"
    }
]
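
For completeness, a minimal sketch of loading that file the way the AutoGen notebooks do, assuming OAI_CONFIG_LIST sits in the working directory:

import autogen

# Reads the JSON list of model configs from the OAI_CONFIG_LIST file (or env var).
config_list = autogen.config_list_from_json("OAI_CONFIG_LIST")
llm_config = {"config_list": config_list}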

@TheAnachronism

When using OpenAI's models, one can easily provide a web link to a paper, for example, and the models can access it. I guess that is handled on OpenAI's side, since trying this with local LLMs completely fails.
Is there a version of this for local LLMs, or does this have yet to be built?

@sonichi
Contributor

sonichi commented Oct 14, 2023

For a new web link that isn't in the training data of the OpenAI model, the agent needs to write code to visit the link. Check this example: https://github.com/microsoft/autogen/blob/main/notebook/agentchat_web_info.ipynb

@babycommando
Collaborator

Wrote a Medium article on doing this with oobabooga.

https://babycmd.medium.com/local-llms-and-autogen-an-uprising-of-local-powered-agents-d472f2c3d0e3

@bomsn
Collaborator

bomsn commented Nov 4, 2023

I was able to make it work with Google's Vertex AI (Bard, chat-bison, etc.) by creating a FastAPI endpoint that matches OpenAI's /chat/completions/ endpoint and takes the same data:

from fastapi import APIRouter, Depends

# RequestData, ResponseData, and get_api_key are defined elsewhere in my code.
chat = APIRouter(prefix="/vertexai/chat", tags=['Vertex AI Chat'])

@chat.post("/completions/", response_model=ResponseData, summary="Send a chat message to Vertex AI LLM and get a response.")
async def chat_completions_endpoint(request: RequestData, api_key: str = Depends(get_api_key)):
    """
    This function `chat_completions_endpoint` is an API endpoint that accepts POST requests at "/chat/completions/".
    It takes in a JSON request body and an API key from the Authorization header. The JSON request body should contain the following fields:
    - `model`: The name of the Vertex AI model to use for generating chat completions ( only "codechat-bison" and "chat-bison" are supported ).
    - `credentials`: The service account credentials for accessing Vertex AI.
    - `location`: The location of the Vertex AI service (default is 'us-central1').
    - `messages`: A list of chat messages. Each message is a dictionary with 'role' and 'content' fields.
    - `temperature`: The randomness of the generated responses (default is 0).
    - `max_tokens`: The maximum number of tokens in the generated response (default is 2048).

    The function sends a chat message to Vertex AI using the specified model and returns a JSON response containing the generated chat completion.
    The response includes the following fields:
    - `id`: A random ID for the chat completion.
    - `object`: The type of the object, which is "chat.completion".
    - `created`: The timestamp when the chat completion was created.
    - `model`: The name of the Vertex AI model used.
    - `choices`: A list containing the generated chat completion. Each choice is a dictionary with 'index', 'message', and 'finish_reason' fields.
    - `usage`: A dictionary containing the number of tokens used in the prompt, completion, and total.

    If the API key is invalid, the function raises an HTTP 403 error. If any other error occurs during request processing, it raises an HTTP 500 error.
    """
    return 'Match Open AI /chat/completions/ response'
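
Based on the docstring, a hypothetical sketch of the OpenAI-style response body that the placeholder return stands in for (all values illustrative):

response_body = {
    "id": "chatcmpl-<random-id>",
    "object": "chat.completion",
    "created": 1699999999,  # Unix timestamp
    "model": "chat-bison",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "<generated text>"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
}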

In my config file, I just added something like this:

    {
        "model": "chat-bison",
        "api_key": "my_api_key",
        "api_base": "http://website-or-localhost/vertexai",
        "location": "us-central1",
        "credentials": "Google cloud service account credentials here"
    }

Worked like a charm.

@sonichi
Contributor

sonichi commented Dec 1, 2023

Can folks interested in this issue review #831 ? Many thanks.

@sonichi sonichi added the roadmap Issues related to roadmap of AutoGen label Dec 1, 2023
@MilutinTokic

MilutinTokic commented Jan 30, 2024

You guys here are crazy. I thought I was the only one in the world trying to defeat OpenAI ;) I am currently trying to connect autogen to HugChat. I am trying to connect hugchat.cli to autogen so autogen can use it like the OpenAI API. Has anyone tried to achieve that, and if not, why not? That hugchat package doesn't use local LLMs; instead it connects to a really good Space on Hugging Face, and you can choose multiple models, turn internet access on/off, and much more. I can't find anything about using autogen with hugchat.

@bomsn
Collaborator

bomsn commented Jan 30, 2024

@MilutinTokic, you'll just have to be creative with it. Create an API in the middle that replicates the OpenAI response format, and use that to connect to any other LLM.

@ekzhu
Collaborator

ekzhu commented Jan 30, 2024

We are working on local model support: #1345

@MilutinTokic

@bomsn I think it can be achieved with LangChain. It can be used with autogen and connected to Hugging Face.

@bomsn
Collaborator

bomsn commented Jan 31, 2024

@MilutinTokic yes, that's also an option, but I haven't tried it since I only wanted it to work for a single service without a lot of dependencies. But yeah, that could work.

@MilutinTokic

MilutinTokic commented Jan 31, 2024

@bomsn but that is the only way to use AutoGen for free. That is my goal. I mean, I am working on big projects but I don't want to pay OpenAI. What are you working on? How do you use autogen? I have been figuring out how to use autogen for free for a few weeks, and this is all I have come up with.

@bomsn
Collaborator

bomsn commented Jan 31, 2024

@MilutinTokic, to be honest, you'll have to spend money to work with LLMs one way or another. You're either going to pay OpenAI, Microsoft, Google, etc., rent a server to run an open-source LLM, or run it locally on your computer, which will require a decent GPU for not-so-good results. If you're trying to get decent results, you'll have to either pay for the existing APIs or, as said, use a very large open-source model, which will cost you much more than you'd pay OpenAI, Microsoft, or Google. However, if it's just for learning and nothing serious, you can run quantized open-source models from Hugging Face below 7B parameters on your computer if you have at least 12GB of VRAM, and even then it would be slow.

@MilutinTokic

@bomsn haha, ok, I know all that, but I've achieved a lot of things for free, so I will achieve this too, believe me. I am just starting; I know it won't be too fast or too reliable, but I don't need it to be, I have my methods. I have never paid for any digital/online product that I wanted, and I got it eventually. ;)

@PyroGenesis
Collaborator

PyroGenesis commented Jan 31, 2024

Personally, I use FastChat (as detailed in the AutoGen blog), and there are a few more alternatives mentioned in this thread already, like LiteLLM, LangChain, and Oobabooga's text-generation-webui. You can even roll your own endpoint with FastAPI, as @bomsn mentioned.

This is until the local model support PR #1345 is merged, which would be pretty helpful.

@Josephrp

Josephrp commented Feb 4, 2024

I have a stack available to test various models for various roles; most require custom code and parameters, so I'll be using the custom model client to make tests and notebooks, I guess :-)

@sonichi
Contributor

sonichi commented Feb 4, 2024

@PyroGenesis Thanks for the comment. #1345 has been merged. Looking forward to your further feedback.
Thanks @Josephrp for your feedback too.
cc @olgavrou

@hghalebi

@AmirmohammadShakeri

@ekzhu ekzhu removed the roadmap Issues related to roadmap of AutoGen label Mar 13, 2024
@daoxuliu

daoxuliu commented Apr 2, 2024

I use Ollama, and adding the corresponding 'model' and 'base_url' configurations in OAI_CONFIG_LIST enables local use of autogen, as follows:

{
    "model": "mistral",
    "api_key": "ollama",
    "base_url": "http://localhost:11434/v1"
}

But there is an issue: I can't invoke tool calls no matter what I try. Does anyone know what the problem is?

@ekzhu
Collaborator

ekzhu commented Apr 2, 2024

Some models don't support tool calls. But have you tried user defined functions? https://microsoft.github.io/autogen/docs/topics/code-execution/user-defined-functions

@daoxuliu

daoxuliu commented Apr 3, 2024

Some models don't support tool calls. But have you tried user defined functions? https://microsoft.github.io/autogen/docs/topics/code-execution/user-defined-functions

Thanks for the comments. It seems like you are right.

@ekzhu
Collaborator

ekzhu commented Apr 4, 2024

We now support open source and open weight models: https://microsoft.github.io/autogen/docs/topics/non-openai-models/about-using-nonopenai-models

@ekzhu ekzhu closed this as completed Apr 4, 2024
jackgerrits pushed a commit that referenced this issue Oct 2, 2024
randombet pushed a commit to randombet/autogen that referenced this issue Oct 31, 2024