Add support for LiteLLM, to get Azure, Anthropic, and many others out of the box #10

Merged: 2 commits, Aug 26, 2023
1 change: 1 addition & 0 deletions .gitignore
@@ -10,6 +10,7 @@ chainlit.md
build/
langstream.egg-info/
.chroma
litellm_uuid.txt

# Generated markdown files from jupyter notebooks
docs/docs/examples/*.md
51 changes: 51 additions & 0 deletions docs/docs/llms/lite_llm.md
@@ -0,0 +1,51 @@
---
sidebar_position: 5
---

# Lite LLM (Azure, Anthropic, etc)

[Lite LLM](https://github.com/BerriAI/litellm) is a library that wraps many other LLM APIs, such as Azure, Anthropic, Cohere, HuggingFace, Replicate, and so on, standardizing all of them to the same API surface as the OpenAI Chat Completion API. LangStream provides a wrapper for LiteLLM so you can build your streams on most of the LLMs available on the market. To check all the possible models, take a look at the [Lite LLM Docs](https://docs.litellm.ai/docs/completion/supported).

To use the multiple LLMs provided by Lite LLM, you will need to have the API keys for the models you are going to use available in your environment, for example:

```bash
export OPENAI_API_KEY=<your key here>
export AZURE_API_BASE=<your azure api base here>
export ANTHROPIC_API_KEY=<your key here>
export HUGGINGFACE_API_KEY=<your key here>
```

Then, you should be able to use the [`LiteLLMChatStream`](pathname:///reference/langstream/contrib/index.html#langstream.contrib.LiteLLMChatStream), which has basically the same interface as the [`OpenAIChatStream`](pathname:///reference/langstream/contrib/index.html#langstream.contrib.OpenAIChatStream). Check it out:

## Chat Completion

```python
from langstream import Stream, join_final_output
from langstream.contrib import LiteLLMChatStream, LiteLLMChatMessage, LiteLLMChatDelta

recipe_stream: Stream[str, str] = LiteLLMChatStream[str, LiteLLMChatDelta](
"RecipeStream",
lambda recipe_name: [
LiteLLMChatMessage(
role="system",
content="You are ChefLiteLLM, an assistant bot trained on all culinary knowledge of the world's most prominent Michelin Chefs",
),
LiteLLMChatMessage(
role="user",
content=f"Hello, could you write me a recipe for {recipe_name}?",
),
],
model="gpt-3.5-turbo",
).map(lambda delta: delta.content)

await join_final_output(recipe_stream("instant noodles"))
#=> "Of course! Here's a simple and delicious recipe for instant noodles:\n\nIngredients:\n- 1 packet of instant noodles (your choice of flavor)\n- 2 cups of water\n- 1 tablespoon of vegetable oil\n- 1 small onion, thinly sliced\n- 1 clove of garlic, minced\n- 1 small carrot, julienned\n- 1/2 cup of sliced mushrooms\n- 1/2 cup of shredded cabbage\n- 2 tablespoons of soy sauce\n- 1 teaspoon of sesame oil\n- Optional toppings: sliced green onions, boiled egg, cooked chicken or shrimp, chili flakes\n\nInstructions:\n1. In a medium-sized pot, bring the water to a boil. Add the instant noodles and cook according to the package instructions until they are al dente. Drain and set aside.\n\n2. In the same pot, heat the vegetable oil over medium heat. Add the sliced onion and minced garlic, and sauté until they become fragrant and slightly caramelized.\n\n3. Add the julienned carrot, sliced mushrooms, and shredded cabbage to the pot. Stir-fry for a few minutes until the vegetables are slightly softened.\n\n4. Add the cooked instant noodles to the pot and toss them with the vegetables.\n\n5. In a small bowl, mix together the soy sauce and sesame oil. Pour this mixture over the noodles and vegetables, and toss everything together until well combined.\n\n6. Cook for an additional 2-3 minutes, stirring occasionally, to allow the flavors to meld together.\n\n7. Remove the pot from heat and divide the noodles into serving bowls. Top with your desired toppings such as sliced green onions, boiled egg, cooked chicken or shrimp, and chili flakes.\n\n8. Serve the instant noodles hot and enjoy!\n\nFeel free to customize this recipe by adding your favorite vegetables or protein. Enjoy your homemade instant noodles!"
```

Analogous to OpenAI, it takes [`LiteLLMChatMessage`](pathname:///reference/langstream/contrib/index.html#langstream.contrib.LiteLLMChatMessage)s and produces [`LiteLLMChatDelta`](pathname:///reference/langstream/contrib/index.html#langstream.contrib.LiteLLMChatDelta)s. It can also take `functions` as an argument for function calling, but keep in mind that not all models support it and the argument might simply be ignored, so be sure to check the [Lite LLM Docs](https://docs.litellm.ai/docs/completion/supported) for the model you are using; a minimal sketch is shown below.
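
As a quick illustration, here is a minimal function-calling sketch, adapted from the example in the `LiteLLMChatStream` docstring; the `get_current_weather` function and its schema are illustrative assumptions, and the model you pick must actually support function calling:

```python
import json
from typing import Dict, Union

from langstream import Stream, collect_final_output
from langstream.contrib import LiteLLMChatStream, LiteLLMChatMessage, LiteLLMChatDelta


# Illustrative local function that the model can "call"
def get_current_weather(location: str) -> Dict[str, str]:
    return {"location": location, "forecast": "sunny", "temperature": "25 C"}


weather_stream: Stream[str, Union[LiteLLMChatDelta, Dict[str, str]]] = LiteLLMChatStream[
    str, Union[LiteLLMChatDelta, Dict[str, str]]
](
    "WeatherStream",
    lambda user_input: [LiteLLMChatMessage(role="user", content=user_input)],
    model="gpt-3.5-turbo",
    functions=[
        {
            "name": "get_current_weather",
            "description": "Gets the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "description": "The city to get the weather for, e.g. San Francisco",
                        "type": "string",
                    },
                },
                "required": ["location"],
            },
        }
    ],
    temperature=0,
).map(
    # When the model decides to call the function, the delta has role="function"
    # and its content is a JSON string with the arguments
    lambda delta: get_current_weather(**json.loads(delta.content))
    if delta.role == "function" and delta.name == "get_current_weather"
    else delta
)

await collect_final_output(weather_stream("how is the weather today in Rio de Janeiro?"))
#=> [{'location': 'Rio de Janeiro', 'forecast': 'sunny', 'temperature': '25 C'}]
```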

Another caveat is that, by default, LangStream tries to stream all outputs, but not all models and APIs support streaming, so you might need to disable it with `stream=False`, otherwise they might throw exceptions. Again, be sure to check which models support streaming; a small example of disabling it follows.
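
Here is a minimal sketch of disabling streaming, assuming a model that does not support it (the model name below is just an illustration, substitute whichever provider you are using):

```python
from langstream import Stream, join_final_output
from langstream.contrib import LiteLLMChatStream, LiteLLMChatMessage, LiteLLMChatDelta

# With stream=False the whole completion arrives at once instead of
# one delta per token
haiku_stream: Stream[str, str] = LiteLLMChatStream[str, LiteLLMChatDelta](
    "HaikuStream",
    lambda topic: [
        LiteLLMChatMessage(role="user", content=f"Write a haiku about {topic}"),
    ],
    model="command-nightly",  # illustrative model name, substitute your provider's model
    stream=False,
).map(lambda delta: delta.content)

await join_final_output(haiku_stream("instant noodles"))
```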

We hope that with this you will be able to use the best LLM available to you, or even mix and match LLMs in the middle of your streams depending on the need, or fall back to another one if an LLM is not generating the right answer, and so on.

Keep on reading the next part of the docs!
2 changes: 1 addition & 1 deletion docs/docs/llms/memory.md
@@ -1,5 +1,5 @@
---
sidebar_position: 5
sidebar_position: 6
---

# Adding Memory
2 changes: 1 addition & 1 deletion docs/docs/llms/zero_temperature.md
@@ -1,5 +1,5 @@
---
sidebar_position: 6
sidebar_position: 7
---

# Zero Temperature
8 changes: 8 additions & 0 deletions langstream/contrib/__init__.py
@@ -16,11 +16,19 @@
OpenAIChatDelta,
)
from langstream.contrib.llms.gpt4all_stream import GPT4AllStream
from langstream.contrib.llms.lite_llm import (
LiteLLMChatStream,
LiteLLMChatMessage,
LiteLLMChatDelta,
)

__all__ = (
"OpenAICompletionStream",
"OpenAIChatStream",
"OpenAIChatMessage",
"OpenAIChatDelta",
"GPT4AllStream",
"LiteLLMChatStream",
"LiteLLMChatMessage",
"LiteLLMChatDelta",
)
304 changes: 304 additions & 0 deletions langstream/contrib/llms/lite_llm.py
@@ -0,0 +1,304 @@
import asyncio
from dataclasses import dataclass
import itertools
from types import GeneratorType
from typing import (
Any,
AsyncGenerator,
Callable,
Dict,
List,
Literal,
Optional,
TypeVar,
Union,
cast,
)

import litellm
from colorama import Fore
from retry import retry

from langstream.core.stream import Stream, StreamOutput

T = TypeVar("T")
U = TypeVar("U")
V = TypeVar("V")


@dataclass
class LiteLLMChatMessage:
"""
LiteLLMChatMessage is a data class that represents a chat message for building a `LiteLLMChatStream` prompt.

Attributes
----------
role : Literal["system", "user", "assistant", "function"]
The role of who sent this message in the chat; it can be one of `"system"`, `"user"`, `"assistant"` or `"function"`

name: Optional[str]
The name is used when `role` is `"function"`; it represents the name of the function that was called

content : str
A string with the full content of what the given role said

"""

role: Literal["system", "user", "assistant", "function"]
content: str
name: Optional[str] = None

def to_dict(self):
return {k: v for k, v in self.__dict__.items() if v is not None}


@dataclass
class LiteLLMChatDelta:
"""
LiteLLMChatDelta is a data class that represents the output of a `LiteLLMChatStream`.

Attributes
----------
role : Optional[Literal["assistant", "function"]]
The role of the output message; the first delta will have the role, while
the subsequent partial content ones will have the role as `None`.
The possible values are `None`, `"assistant"` or `"function"`

name: Optional[str]
The name is used when `role` is `"function"`; it represents the name of the function that was called

content : str
A string with the partial content being outputted by the LLM; this generally
translates to each token the LLM is producing

"""

role: Optional[Literal["assistant", "function"]]
content: str
name: Optional[str] = None

def __stream_debug__(self):
name = ""
if self.name:
name = f" {self.name}"
if self.role is not None:
print(f"{Fore.YELLOW}{self.role.capitalize()}{name}:{Fore.RESET} ", end="")
print(
self.content,
end="",
flush=True,
)


class LiteLLMChatStream(Stream[T, U]):
"""
`LiteLLMChatStream` is a wrapper for [LiteLLM](https://github.com/BerriAI/litellm), which gives you access to OpenAI, Azure OpenAI, Anthropic, Google VertexAI,
HuggingFace, Replicate, AI21, Cohere and a bunch of other LLMs, all at the same time, all while keeping the standard OpenAI chat interface. Check out the completion API
and the available models [on their docs](https://docs.litellm.ai/docs/).

Be aware that not all models support streaming, and LangStream by default tries to stream everything. So if the model you choose is not working, you might need to set `stream=False`
when calling the `LiteLLMChatStream`.

The `LiteLLMChatStream` takes a lambda function that should return a list of `LiteLLMChatMessage` for the assistant to reply to. It is stateless, so it doesn't keep
memory of past chat messages; you will have to handle the memory yourself, you can [follow this guide to get started on memory](https://rogeriochaves.github.io/langstream/docs/llms/memory).

The `LiteLLMChatStream` also produces `LiteLLMChatDelta` as output, one per token. Each delta contains the `role` that started the output, followed by subsequent `content` updates.
If you want the final content as a string, you will need to use the `.content` property from the deltas and accumulate them for the final result.

To use this stream you will need to have the proper environment keys available depending on the model you are using, like `OPENAI_API_KEY`, `COHERE_API_KEY`, `HUGGINGFACE_API_KEY`, etc.
Check out more details on the [LiteLLM docs](https://docs.litellm.ai/docs/completion/supported)

Example
-------

>>> from langstream import Stream, join_final_output
>>> from langstream.contrib import LiteLLMChatStream, LiteLLMChatMessage, LiteLLMChatDelta
>>> import asyncio
...
>>> async def example():
... recipe_stream: Stream[str, str] = LiteLLMChatStream[str, LiteLLMChatDelta](
... "RecipeStream",
... lambda recipe_name: [
... LiteLLMChatMessage(
... role="system",
... content="You are Chef Claude, an assistant bot trained on all culinary knowledge of the world's most prominent Michelin Chefs",
... ),
... LiteLLMChatMessage(
... role="user",
... content=f"Hello, could you write me a recipe for {recipe_name}?",
... ),
... ],
... model="claude-2",
... max_tokens=10,
... ).map(lambda delta: delta.content)
...
... return await join_final_output(recipe_stream("instant noodles"))
...
>>> asyncio.run(example()) # doctest:+SKIP
"Of course! Here's a simple and delicious recipe"

You can also pass LiteLLM function schemas in the `functions` argument with all the parameter definitions, just like for OpenAI models, but be aware that not all models support it.
Once you pass a `functions` param, the model may then produce a `function` role `LiteLLMChatDelta` as output,
using your function, with the `content` field as a JSON string which you can parse to call an actual function.

Take a look [at our OpenAI guide](https://rogeriochaves.github.io/langstream/docs/llms/open_ai_functions) to learn more about LLM function calls in LangStream, it works the same with LiteLLM.

Function Call Example
---------------------

>>> from langstream import Stream, collect_final_output
>>> from langstream.contrib import LiteLLMChatStream, LiteLLMChatMessage, LiteLLMChatDelta
>>> from typing import Literal, Union, Dict
>>> import json
>>> import asyncio
...
>>> async def example():
... def get_current_weather(
... location: str, format: Literal["celsius", "fahrenheit"] = "celsius"
... ) -> Dict[str, str]:
... return {
... "location": location,
... "forecast": "sunny",
... "temperature": "25 C" if format == "celsius" else "77 F",
... }
...
... stream : Stream[str, Union[LiteLLMChatDelta, Dict[str, str]]] = LiteLLMChatStream[str, Union[LiteLLMChatDelta, Dict[str, str]]](
... "WeatherStream",
... lambda user_input: [
... LiteLLMChatMessage(role="user", content=user_input),
... ],
... model="gpt-3.5-turbo",
... functions=[
... {
... "name": "get_current_weather",
... "description": "Gets the current weather in a given location, use this function for any questions related to the weather",
... "parameters": {
... "type": "object",
... "properties": {
... "location": {
... "description": "The city to get the weather, e.g. San Francisco. Guess the location from user messages",
... "type": "string",
... },
... "format": {
... "description": "The temperature unit to use, either celsius or fahrenheit",
... "type": "string",
... "enum": ("celsius", "fahrenheit"),
... },
... },
... "required": ["location"],
... },
... }
... ],
... temperature=0,
... ).map(
... lambda delta: get_current_weather(**json.loads(delta.content))
... if delta.role == "function" and delta.name == "get_current_weather"
... else delta
... )
...
... return await collect_final_output(stream("how is the weather today in Rio de Janeiro?"))
...
>>> asyncio.run(example()) # doctest:+SKIP
[{'location': 'Rio de Janeiro', 'forecast': 'sunny', 'temperature': '25 C'}]

"""

def __init__(
self: "LiteLLMChatStream[T, LiteLLMChatDelta]",
name: str,
call: Callable[
[T],
List[LiteLLMChatMessage],
],
model: str,
custom_llm_provider: Optional[str] = None,
stream: bool = True,
functions: Optional[List[Dict[str, Any]]] = None,
function_call: Optional[Union[Literal["none", "auto"], Dict[str, Any]]] = None,
temperature: Optional[float] = 0,
max_tokens: Optional[int] = None,
timeout: int = 5,
retries: int = 3,
) -> None:
async def chat_completion(
messages: List[LiteLLMChatMessage],
) -> AsyncGenerator[StreamOutput[LiteLLMChatDelta], None]:
loop = asyncio.get_event_loop()

@retry(tries=retries)
def get_completions():
function_kwargs = {}
if functions is not None:
function_kwargs["functions"] = functions
if function_call is not None:
function_kwargs["function_call"] = function_call

return litellm.completion(
request_timeout=timeout,
model=model,
custom_llm_provider=custom_llm_provider,
messages=[m.to_dict() for m in messages],
temperature=temperature, # type: ignore (why is their type int?)
stream=stream,
max_tokens=max_tokens, # type: ignore (why is their type float?)
**function_kwargs,
)

completions = await loop.run_in_executor(None, get_completions)

pending_function_call: Optional[LiteLLMChatDelta] = None

# litellm returns a generator when streaming and a single response otherwise;
# normalize to an iterable so the loop below handles both cases
completions = (
completions if isinstance(completions, GeneratorType) else [completions]
)
for output in completions:
output = cast(dict, output)
if "choices" not in output:
continue

if len(output["choices"]) == 0:
continue

delta = (
output["choices"][0]["message"]
if "delta" not in output["choices"][0]
else output["choices"][0]["delta"]
)

# Function-call arguments arrive in pieces when streaming; accumulate them
# into a single delta and only emit it once the arguments are complete
if "function_call" in delta:
role = delta["role"] if "role" in delta else None
function_name: Optional[str] = delta["function_call"].get("name")
function_arguments: Optional[str] = delta["function_call"].get(
"arguments"
)

if function_name is not None:
pending_function_call = LiteLLMChatDelta(
role="function",
name=function_name,
content=function_arguments or "",
)
elif (
pending_function_call is not None
and function_arguments is not None
):
pending_function_call.content += function_arguments
elif "content" in delta:
role = delta["role"] if "role" in delta else None
yield self._output_wrap(
LiteLLMChatDelta(
role=role,
content=delta["content"],
)
)
else:
if pending_function_call:
yield self._output_wrap(pending_function_call)
pending_function_call = None
if pending_function_call:
yield self._output_wrap(pending_function_call)
pending_function_call = None

super().__init__(
name,
lambda input: cast(AsyncGenerator[U, None], chat_completion(call(input))),
)