Integrate opensource LLMs into autogen #46
Comments
The current goal is to support open-source LLMs in LLM-based agents with as little modification as possible. The core code that needs modification is line 41:

responses = oai.ChatCompletion.create(messages=self._conversations[sender.name], **self._config)

There are two ways to do so. One is to create a utility function that checks the model name and calls the corresponding inference API:

openai_models = ["gpt-4", "gpt-3"]
open_source_models = ["llama", "alpaca"]
google_models = ["bard"]

def create(model, **kwargs):
    if model in openai_models:
        # generate response using "oai.ChatCompletion.create"
        pass
    elif model in open_source_models:
        # generate response using related Hugging Face APIs
        pass
    elif model in google_models:
        # generate response using related Google APIs
        pass

This is quick to implement and easy to use. However, it may introduce unforeseen problems when developing other applications. The other way is to inherit from autogen.oai.Completion:

class OpensourceCompletion(autogen.oai.Completion):
    @classmethod
    def create(cls, **kwargs):
        # Make completion using related APIs
        pass

This method is not consistent with the folder name. I am researching how other libraries handle this situation to find potential solutions, e.g. LangChain, JARVIS. |
The naming consistency is not an issue because we can interpret it. For example, |
Related issues: microsoft/FLAML#1037, #15, microsoft/FLAML#1017 |
FastChat is a useful tool as a local drop-in replacement for OpenAI APIs. This greatly simplifies integration.
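For illustration, once a FastChat OpenAI-compatible server is running locally, a config along these lines points autogen at it; the model name, port, and dummy key are placeholders for whatever your server exposes:

```python
import autogen

config_list = [
    {
        "model": "vicuna-7b-v1.5",               # the model FastChat is serving
        "api_base": "http://localhost:8000/v1",  # FastChat's OpenAI-compatible endpoint
        "api_type": "open_ai",
        "api_key": "NULL",                       # dummy value; the local server ignores it
    }
]

assistant = autogen.AssistantAgent("assistant", llm_config={"config_list": config_list})
```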
|
Thanks for creating a task list. Could you edit the "tasks" in the main body of the issue? There you can track sub issues. For example, the two items you listed here correspond to two existing issues microsoft/FLAML#1017 and microsoft/FLAML#1037. You can comment on those existing issues about the approach you'll take to solve them. Also add #15 to the task list. |
Hi @sonichi @LeoLjl, I'm the maintainer of LiteLLM (an abstraction to call 100+ LLMs). We allow you to create a proxy server to call 100+ LLMs, and I think it can solve your problem (I'd love your feedback if it does not). Try it here: https://docs.litellm.ai/docs/proxy_server

Using the LiteLLM proxy server:

import openai
openai.api_base = "http://0.0.0.0:8000/"  # proxy url
print(openai.ChatCompletion.create(model="test", messages=[{"role": "user", "content": "Hey!"}]))

Creating a proxy server:

Ollama models

$ litellm --model ollama/llama2 --api_base http://localhost:11434

Hugging Face models

$ export HUGGINGFACE_API_KEY=my-api-key  # [OPTIONAL]
$ litellm --model huggingface/bigcode/starcoder

Anthropic

$ export ANTHROPIC_API_KEY=my-api-key
$ litellm --model claude-instant-1

PaLM

$ export PALM_API_KEY=my-palm-key
$ litellm --model palm/chat-bison
|
@LeoLjl There are projects like text-generation-webui that provide an OpenAI API adapter for local LLMs. All of them simply redirect the OpenAI API without introducing new code or modifying existing code. |
Hi @yhyu13, thanks for the valuable information. This is really helpful. I will include it in the next docs update! |
Hi @ishaan-jaff, liteLLM looks like just the solution we need. I will try it out and include it in the new docs update! |
Hi , |
@TheCompAce I can help you on our discord: https://discord.com/invite/wuPM9dRgDw |
I was an idiot. To fix it, I had to set the API route to "/chat/completions" to match OpenAI's, so I used something like:

from flask import Flask, request, jsonify

app = Flask(__name__)

# Swagger UI setup
SWAGGER_URL = '/swagger'

@app.route('/chat/completions', methods=['POST'])
def chat_completions():
    # handle the request: forward request.json to the local model and return an OpenAI-style JSON response
    ...

if __name__ == '__main__':
    app.run()
|
This looks quite promising. Do you have any example of it? TBH, GPT-4 tokens are pricey, and Code Llama can be a nice choice. |
I have code to get it working above; it seemed to work at first, then started getting token limit errors. So maybe I can work on it some more another time. |
Can you please see if this API integration will work with TheBloke models, like TheBloke/Mistral-7B-OpenOrca-GPTQ (https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GPTQ)? Maybe with a FastChat + vLLM server in the backend exposing the OpenAI API, and an example of Python code showing how to connect to it using autogen in a Python script. |
Hi @fxtoofaan, I think this will answer your question: https://microsoft.github.io/autogen/blog/2023/07/14/Local-LLMs and https://github.com/microsoft/autogen/blob/osllm/notebook/open_source_language_model_example.ipynb |
@LeoLjl thank you. Can you look into these various prompt templates? Because of the OpenOrca models, I am interested in this prompt template: https://github.com/oobabooga/text-generation-webui/blob/main/instruction-templates/MPT-Chat.yaml Can this type of prompt template be used in autogen? An example of this in a Python script would be useful. Thank you. By the way, the model I am playing around with now is this: https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-AWQ |
I was able to get the oobabooga OpenAI API working last night but was getting token limit errors (2048) and still need to troubleshoot. I was running text-generation-webui locally and running autogen locally on WSL2, running oobabooga with the OpenAI API flag, and in a notebook using:

import autogen

config_list = [
    {
        "model": "gpt-4",
        "api_key": "sk-111111111111111111111111111111111111111111111111",
        "api_base": "http://192.168.XX.XX:5001/v1/chat/completions",
    },
]

using the model TheBloke_Llama-2-13B-chat-GPTQ. It was working just fine, just like an OpenAI API call using gpt-4. |
How is the performance? Can you share some examples? |
Good. I'm on a 4090. Below is an example to check whether it is working, followed by its output:

import requests
import json
# Define the API endpoint and headers
api_base = "http://192.168.1.68:5001/v1"
api_endpoint = f"{api_base}/chat/completions"
headers = {
"Authorization": "Bearer sk-111111111111111111111111111111111111111111111111", # Dummy key, replace if your oobabooga instance requires a specific key
"Content-Type": "application/json"
}
# Define the payload (data to send)
payload = {
"model": "TheBloke_EverythingLM-13B-16K-GPTQ", # Replace with your specific model name if different
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who won the world series in 2021?"},
{"role": "assistant", "content": "The Atlanta Braves won the World Series in 2021."},
{"role": "user", "content": "Where was it played?"}
]
}
try:
# Make the API call
response = requests.post(api_endpoint, headers=headers, json=payload, timeout=20)
# Check the response
if response.status_code == 200:
print("API call successful.")
print("Response:", json.dumps(response.json(), indent=4))
else:
print(f"API call failed. Status code: {response.status_code}")
print("Error message:", response.text)
except requests.exceptions.RequestException as e:
print(f"An error occurred: {e}")
API call successful.
Response: {
"id": "chatcmpl-1696808596275027712",
"object": "chat.completions",
"created": 1696808596,
"model": "TheBloke_EverythingLM-13B-16K-GPTQ",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "It was played at various locations throughout the United States, including Houston, Texas.\n"
}
}
],
"usage": {
"prompt_tokens": 91,
"completion_tokens": 18,
"total_tokens": 109
}
}

In the CLI you see:

Output generated in 4.02 seconds (4.73 tokens/s, 19 tokens, context 91, seed 1022663743)

Start oobabooga with the flags --listen --extensions openai. Also, here is a sample notebook where a local LLM was used: https://github.com/bigsk1/autogen/blob/main/notebook/agentchat_stream.ipynb. In the root directory I have OAI_CONFIG_LIST, and inside it:

{
    "model": "gpt-4",
    "api_key": "sk-111111111111111111111111111111111111111111111111",
    "api_base": "http://192.168.1.68:5001/v1/chat/completions"
},
|
When using OpenAI's models, one can easily provide a web link to a paper, for example, and the models can access it. I guess that is handled on OpenAI's side, because when trying this out with local LLMs it completely fails. |
For a new web link that's not in the training data of the OpenAI model, the agent needs to write code to visit the link. Check this example: https://github.com/microsoft/autogen/blob/main/notebook/agentchat_web_info.ipynb |
I wrote a Medium article on doing this with oobabooga: https://babycmd.medium.com/local-llms-and-autogen-an-uprising-of-local-powered-agents-d472f2c3d0e3 |
I was able to make it work with Google's Vertex AI (Bard, chat-bison, etc.) by creating a FastAPI endpoint that matches OpenAI's /chat/completions/ endpoint and takes the same data:

chat = APIRouter(prefix="/vertexai/chat", tags=['Vertex AI Chat'])
@chat.post("/completions/", response_model=ResponseData, summary="Send a chat message to Vertex AI LLM and get a response.")
async def chat_completions_endpoint(request: RequestData, api_key: str = Depends(get_api_key)):
"""
This function `chat_completions_endpoint` is an API endpoint that accepts POST requests at "/chat/completions/".
It takes in a JSON request body and an API key from the Authorization header. The JSON request body should contain the following fields:
- `model`: The name of the Vertex AI model to use for generating chat completions ( only "codechat-bison" and "chat-bison" are supported ).
- `credentials`: The service account credentials for accessing Vertex AI.
- `location`: The location of the Vertex AI service (default is 'us-central1').
- `messages`: A list of chat messages. Each message is a dictionary with 'role' and 'content' fields.
- `temperature`: The randomness of the generated responses (default is 0).
- `max_tokens`: The maximum number of tokens in the generated response (default is 2048).
The function sends a chat message to Vertex AI using the specified model and returns a JSON response containing the generated chat completion.
The response includes the following fields:
- `id`: A random ID for the chat completion.
- `object`: The type of the object, which is "chat.completion".
- `created`: The timestamp when the chat completion was created.
- `model`: The name of the Vertex AI model used.
- `choices`: A list containing the generated chat completion. Each choice is a dictionary with 'index', 'message', and 'finish_reason' fields.
- `usage`: A dictionary containing the number of tokens used in the prompt, completion, and total.
If the API key is invalid, the function raises an HTTP 403 error. If any other error occurs during request processing, it raises an HTTP 500 error.
"""
    return 'Match Open AI /chat/completions/ response'

In my config file, I just added something like this:
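For illustration, a config entry of roughly this shape would point autogen at such an endpoint; the host, port, model name, and dummy key below are placeholders rather than the commenter's actual values:

```json
{
    "model": "chat-bison",
    "api_key": "sk-111111111111111111111111111111111111111111111111",
    "api_base": "http://localhost:8000/vertexai/chat/completions/"
}
```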
Worked like a charm. |
Can folks interested in this issue review #831 ? Many thanks. |
You guys here are crazy. I thought I was the only one in the world trying to defeat OpenAI ;) I am currently trying to connect autogen to HugChat. I am trying to connect hugchat.cli to autogen so autogen can use it as the OpenAI API. Has anyone tried to achieve that, and if not, why not? The hugchat package doesn't use local LLMs; instead, it connects to a really good Space on Hugging Face, and you can choose multiple models, turn internet access on/off, and more. I cannot find anything about using autogen with hugchat. |
@MilutinTokic, you'll just have to be creative with it. Create an API in the middle that replicates the OpenAI response format, and use that to connect to any other LLM. |
We are working on local model support: #1345 |
@bomsn I think that it can be achieved with LangChain. It can be used with autogen, and it can be connected to Hugging Face. |
@MilutinTokic yes, that's also an option, but I haven't tried it as I only wanted it to work for a single service without a lot of dependencies. But yeah, that could work. |
@bomsn but that is the only way to use AutoGen for free. That is my goal. I mean, I am working on big projects, but I don't want to pay OpenAI. What are you working on? How do you use autogen? I have been figuring out how to use autogen for free for a few weeks, and this is the only thing I have come up with. |
@MilutinTokic, to be honest, you'll have to spend money to work with LLMs one way or another. You're either going to pay OpenAI, Microsoft, Google, etc., rent a server to run some open-source LLM, or run it locally on your computer, which will require a decent GPU to get not-so-good results. If you're trying to get decent results, you'll have to either pay for the existing APIs or, as said, use a very large open-source model, which will cost you much more money than you'd pay OpenAI, Microsoft, or Google. However, if it's just for learning and nothing serious, you can run quantized open-source models from Hugging Face below 7B params on your computer if you have at least 12GB of VRAM, and even then it would be quite slow. |
@bomsn haha, ok, I know all that, but I have achieved a lot of things for free, so I will achieve this too, believe me. I am just starting; I know it won't be too fast or too reliable, but I don't need it to be, and I have my methods. I have never paid for any digital/online product that I wanted, and I got it eventually. ;) |
Personally, I use FastChat (as detailed in the AutoGen blog) and there are a few more alternatives mentioned in this thread already like LiteLLM, Langchain, Oobabooga's text-generation-webui. You can even roll your own endpoint with FastAPI as @bomsn mentioned. This is until the local model support PR #1345 is merged which would be pretty helpful. |
I have a stack available to test various models for various roles; most require custom code and parameters, so I'll be using the custom model client to make tests and notebooks, I guess :-) |
@PyroGenesis Thanks for the comment. #1345 has been merged. Looking forward to your further feedback. |
I use Ollama, and adding the corresponding 'model' and 'base_url' configurations in OAI_CONFIG_LIST enables local use of autogen, as follows:
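A config in roughly this shape works; the model name is a placeholder, the api_key is a dummy value, and the base_url assumes Ollama's OpenAI-compatible endpoint on its default port:

```json
[
    {
        "model": "llama2",
        "base_url": "http://localhost:11434/v1",
        "api_key": "ollama"
    }
]
```
|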
Some models don't support tool calls. But have you tried user defined functions? https://microsoft.github.io/autogen/docs/topics/code-execution/user-defined-functions |
Thanks for the comments. It seems like you are right. |
We now support open source and open weight models: https://microsoft.github.io/autogen/docs/topics/non-openai-models/about-using-nonopenai-models |
Currently, autogen.oai only supports OpenAI models. I am planning to integrate open-source LLMs into autogen. This would require the Hugging Face transformers library.
Some of the open-source LLMs I have in mind include LLaMA, Alpaca, Falcon, Vicuna, WizardLM, StarCoder, and Guanaco. Most of their inference code is integrated into the Hugging Face transformers library, so it can be unified into something like the sketch below.
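For illustration, a minimal sketch of such a unified wrapper built on transformers; the helper name, prompt format, and generation parameters here are placeholders rather than a committed design:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def hf_create(model_name, messages, max_new_tokens=256, **kwargs):
    """Generate an OpenAI-style chat completion with a Hugging Face causal LM."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

    # Flatten the OpenAI-style message list into a single prompt string.
    prompt = "\n".join(f"{m['role']}: {m['content']}" for m in messages) + "\nassistant:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, **kwargs)
    completion = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

    # Mirror the shape of an OpenAI ChatCompletion response.
    return {
        "choices": [
            {"index": 0, "message": {"role": "assistant", "content": completion}, "finish_reason": "stop"}
        ]
    }
```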
Tasks