# ChatUnify

This notebook covers how to get started with Unify chat models.

[Unify](https://unify.ai/hub) dynamically routes each query to the best LLM, with support for providers such as OpenAI, MistralAI, Perplexity AI, and Together AI. You can also access all providers individually using a single API key.

Check out the [live benchmarks](https://unify.ai/benchmarks) to see the data the router relies on.


## Installation

The first step is to install the `unifyai` package.

In [1]:
!pip install -U unifyai

Collecting unifyai
  Downloading unifyai-0.9.5-py3-none-any.whl.metadata (7.4 kB)
Collecting jsonlines<5.0.0,>=4.0.0 (from unifyai)
  Downloading jsonlines-4.0.0-py3-none-any.whl.metadata (1.6 kB)
Downloading unifyai-0.9.5-py3-none-any.whl (53 kB)
Downloading jsonlines-4.0.0-py3-none-any.whl (8.7 kB)
Installing collected packages: jsonlines, unifyai
Successfully installed jsonlines-4.0.0 unifyai-0.9.5

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.1[0m[39;49m -> [0m[32;49m24.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


## Environment Setup

Make sure to set the `UNIFY_KEY` environment variable. You can get a key in the [Unify Console](https://console.unify.ai/login).

In [2]:
import os
os.environ["UNIFY_KEY"] = "API_KEY"  # replace with your Unify API key

## Usage 

Let's take a look at how to use the package now.

First, initialize a model by passing an endpoint string to the `model` argument of `ChatUnify`. Endpoint strings take the form `model@provider`. You can read more about this in [Unify's docs](https://docs.unify.ai/universal_api/chatbot).
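In the examples in this notebook, the part after `@` is either a concrete provider (`openai`, `together-ai`, `mistral-ai`) or a routing criterion (`input-cost`, `output-cost`, `ttft`) that lets Unify's router choose the provider. The helper below is a hypothetical illustration of that convention, not part of the `unifyai` or LangChain APIs, and its set of routing criteria only covers the suffixes used here.

```python
# Hypothetical helper (not a Unify or LangChain API): splits an endpoint string.
# ROUTING_CRITERIA lists only the suffixes used in this notebook, an assumption
# rather than an exhaustive list from Unify's docs.
ROUTING_CRITERIA = {"input-cost", "output-cost", "ttft"}


def parse_endpoint(endpoint: str) -> dict:
    """Split 'model@suffix' and report whether the suffix is a routing criterion."""
    model, sep, suffix = endpoint.partition("@")
    if not sep or not model or not suffix:
        raise ValueError(f"expected 'model@provider', got {endpoint!r}")
    return {"model": model, "suffix": suffix, "routed": suffix in ROUTING_CRITERIA}


print(parse_endpoint("gpt-4o@openai"))
print(parse_endpoint("llama-3.1-8b-chat@input-cost"))
```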

In [3]:
from langchain_community.chat_models import ChatUnify

chat = ChatUnify(model="gpt-4o@openai")

Once the model is initialized, we can query it with `invoke`:

In [4]:
chat.invoke("Hello! How are you?")

AIMessage(content="Hello! I'm just a computer program, so I don't have feelings, but I'm here and ready to help you with whatever you need. How can I assist you today?", additional_kwargs={}, response_metadata={'usage': {'completion_tokens': 34, 'prompt_tokens': 13, 'total_tokens': 47, 'completion_tokens_details': {'reasoning_tokens': 0}, 'cost': 0.000575}, 'model': 'gpt-4o@openai', 'finish_reason': 'stop'}, id='run-10640642-200a-41c6-acc2-f651c0ded4ad-0')
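The `response_metadata` of the returned `AIMessage` carries token usage and, as the output above shows, a `cost` field. Below is a minimal sketch of reading those values. It uses a plain dict copied from the output above so it runs offline; with a live call you would pass `response.response_metadata` instead. The keys are taken from this single output and may vary between providers (for example, `completion_tokens_details` is `None` for some).

```python
# Metadata copied from the invoke() output above; with a live call, use
# chat.invoke(...).response_metadata instead.
metadata = {
    "usage": {
        "completion_tokens": 34,
        "prompt_tokens": 13,
        "total_tokens": 47,
        "completion_tokens_details": {"reasoning_tokens": 0},
        "cost": 0.000575,
    },
    "model": "gpt-4o@openai",
    "finish_reason": "stop",
}


def summarize_usage(meta: dict) -> str:
    """Format token counts and cost from a Unify response_metadata dict."""
    usage = meta.get("usage") or {}
    return (
        f"{meta.get('model')}: {usage.get('prompt_tokens')} in / "
        f"{usage.get('completion_tokens')} out, cost ${usage.get('cost', 0):.6f}"
    )


print(summarize_usage(metadata))
```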

### Single Sign-On

If you don't want the router to select the provider, you can pick one explicitly. Unify's single sign-on (SSO) lets you query endpoints from different providers without creating an account with each of them. For example, all of these are valid endpoints:

In [5]:
chat = ChatUnify(model="llama-3.1-8b-chat@together-ai")
chat = ChatUnify(model="gpt-4o@openai")
chat = ChatUnify(model="mistral-nemo@mistral-ai")

This makes it easy to switch between models and providers. For example, if your application uses GPT-4 under the hood, you can query a much cheaper LLM during development and testing to reduce costs.
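Because the endpoint is just a string, one way to act on this is to choose it from configuration. A minimal sketch: the `UNIFY_ENDPOINT` environment variable and the dev/prod mapping are hypothetical conventions for illustration, not settings that Unify or LangChain read.

```python
import os

# Hypothetical mapping from deployment stage to endpoint string.
ENDPOINTS = {
    "dev": "llama-3.1-8b-chat@together-ai",  # cheaper model for development/testing
    "prod": "gpt-4o@openai",
}


def pick_endpoint(stage: str = "dev", env=None) -> str:
    """Return the hypothetical UNIFY_ENDPOINT override if set, else the stage default."""
    env = os.environ if env is None else env
    return env.get("UNIFY_ENDPOINT") or ENDPOINTS[stage]


# Usage (requires langchain_community and a Unify key):
# chat = ChatUnify(model=pick_endpoint("dev"))
```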

See the full list of available endpoints on the [benchmarks page](https://unify.ai/benchmarks)!

### Chaining Inputs

Now let's build a simple chain that uses a prompt template.

First, define the prompt template:

In [6]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant that translates English to French."),
        ("human", "Translate this sentence from English to French. {english_text}."),
    ]
)

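Then build the chain and invoke it. Instead of naming a provider, the `@input-cost` suffix below lets Unify route the request to the provider with the lowest input cost: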

In [7]:
chat = ChatUnify(model="llama-3.1-8b-chat@input-cost")
chain = prompt | chat
chain.invoke({"english_text": "Hello! How are you?"})

AIMessage(content='The translation of the sentence "Hello! How are you?" from English to French is:\n\n"Bonjour ! Comment allez-vous ?"', additional_kwargs={}, response_metadata={'usage': {'completion_tokens': 29, 'prompt_tokens': 60, 'total_tokens': 89, 'completion_tokens_details': None, 'queue_time': 0.002340239999999997, 'prompt_time': 0.01620952, 'completion_time': 0.038666667, 'total_time': 0.054876187, 'cost': 5.32e-06}, 'model': 'llama-3.1-8b-chat@input-cost', 'finish_reason': 'stop'}, id='run-d91ac89d-92e2-41a4-a3a6-6415ebb860a5-0')

### Streaming and Optimizing for Latency

If responsiveness is key for your application, you most likely want a streaming response. Ideally, you would also use the provider with the lowest Time to First Token (TTFT), to minimize how long your users wait for a response to start. With Unify, the `@ttft` suffix routes to that provider:

In [8]:
chat_ttft = ChatUnify(model="mistral-large@ttft")
for chunk in chat_ttft.stream("What is a large language model?"):
    print(chunk.content, end="")

A large language model is a type of artificial intelligence model designed to understand and generate human-like text based on patterns it has learned from extensive datasets. Here are some key aspects of large language models:

1. **Size**: These models are typically very large, with billions of parameters. The size allows them to capture complex linguistic patterns and generate coherent text.

2. **Training Data**: They are trained on vast amounts of text data from the internet, up until a certain point in time. This data can include books, articles, websites, and more.

3. **Versatility**: Large language models can perform a wide range of tasks, such as translating languages, summarizing text, answering questions, generating code, and more, often without being specifically trained for each task.

4. **Context Understanding**: They can understand and generate text based on the context provided. However, they don't have personal experiences, feelings, or consciousness.

5. **Limitatio
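
To check the latency benefit yourself, you can time how long the first chunk takes to arrive. A minimal sketch that works with any iterable of chunks exposing a `.content` attribute, such as the chunks yielded by `stream()` above. The `fake_stream` generator is a hypothetical stand-in so the snippet runs without an API key, and its delay is arbitrary.

```python
import time
from types import SimpleNamespace


def measure_ttft(chunks):
    """Consume a stream; return (seconds until the first chunk, full text)."""
    start = time.perf_counter()
    ttft = None
    parts = []
    for chunk in chunks:
        if ttft is None:
            ttft = time.perf_counter() - start  # first chunk has arrived
        parts.append(chunk.content)
    return ttft, "".join(parts)


def fake_stream():
    """Hypothetical stand-in for chat_ttft.stream(...): yields objects with .content."""
    time.sleep(0.05)  # simulated wait before the first token
    for piece in ["A large ", "language ", "model..."]:
        yield SimpleNamespace(content=piece)


ttft, text = measure_ttft(fake_stream())
print(f"TTFT: {ttft:.3f}s, text: {text!r}")
# With a real model: measure_ttft(chat_ttft.stream("What is a large language model?"))
```

Because a generator's body only starts running on the first `next()` call, the timer captures the wait for the first chunk rather than just the cost of creating the stream.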

### Batching and Lowest Output Cost

Alternatively, you might be building an AI service that processes inputs in batches to generate content. Since the outputs are long in this case, you may want the provider with the lowest output cost. Here is how to do this with `batch` and dynamic routing:

In [9]:
messages = [
    "Write a blog post about Rome",
    "Write a blog post about Paris"
]

chat_cheapest = ChatUnify(model="llama-3.1-8b-chat@output-cost")
chat_cheapest.batch(messages)

[AIMessage(content="**Discover the Eternal City: A Guide to Rome**\n\nRome, the capital of Italy, is a city that embodies the very essence of history, culture, and beauty. With its rich past, stunning architecture, and vibrant atmosphere, Rome is a destination that has captivated the hearts of travelers for centuries. In this blog post, we'll delve into the must-see sights, hidden gems, and insider tips to help you make the most of your Roman adventure.\n\n**Must-see Sights**\n\nRome is a treasure trove of iconic landmarks, each one more breathtaking than the last. Here are a few of the top attractions to add to your itinerary:\n\n*   **The Colosseum**: This ancient amphitheater is one of Rome's most recognizable symbols. Take a guided tour to learn about the gladiators who once fought here and the engineering feats that made this massive structure possible.\n*   **The Vatican City**: The Vatican is home to numerous iconic landmarks, including St. Peter's Basilica, the Sistine Chapel, 

### Async calls and Lowest Input Cost

Finally, you can also run requests asynchronously. For tasks like summarizing long documents, where prompts are much longer than outputs, optimizing for input cost is crucial. Unify's dynamic router can handle this too.
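Why input cost dominates here comes down to arithmetic: a request costs roughly prompt tokens times the input price plus completion tokens times the output price, and summarization pairs a long prompt with a short completion. The prices and token counts below are hypothetical placeholders, not Unify's or any provider's actual rates.

```python
# Hypothetical prices in USD per 1M tokens (placeholders, not real rates).
PRICE_IN_PER_M = 2.0
PRICE_OUT_PER_M = 6.0


def request_cost(prompt_tokens: int, completion_tokens: int) -> tuple:
    """Return (input cost, output cost) in USD for one request."""
    return (
        prompt_tokens * PRICE_IN_PER_M / 1_000_000,
        completion_tokens * PRICE_OUT_PER_M / 1_000_000,
    )


# A long document (10,000 tokens) summarized in about 20 tokens.
input_cost, output_cost = request_cost(10_000, 20)
share = input_cost / (input_cost + output_cost)
print(f"input ${input_cost:.5f}, output ${output_cost:.5f}, input share {share:.1%}")
```

Even with an output price three times the input price, the input term accounts for nearly all of the cost, which is why routing on `input-cost` suits this workload.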

In [10]:
messages = [
    "Summarize this in 10 words or less. OpenAI is a U.S. based artificial intelligence "
    "(AI) research organization founded in December 2015, researching artificial intelligence "
    "with the goal of developing 'safe and beneficial' artificial general intelligence, "
    "which it defines as 'highly autonomous systems that outperform humans at most economically "
    "valuable work'. As one of the leading organizations of the AI spring, it has developed "
    "several large language models, advanced image generation models, and previously, released "
    "open-source models. Its release of ChatGPT has been credited with starting the AI spring", 

    "Summarize this in 10 words or less. Mistral AI is a French company selling"
    " artificial intelligence (AI) products. "
    "It was founded in April 2023 by previous employees of Meta Platforms and Google DeepMind. "
    "The company raised €385 million in October 2023 and in December 2023 it was valued at "
    "more than $2 billion. It produces open source large language models, citing the "
    "foundational importance of open-source software, and as a response to proprietary models. "
    "As of March 2024, two models have been published and are available as weights. "
    "Three more models, Small, Medium and Large, are available via API only.", 

    "Summarize this in 10 words or less. LLaMA (Large Language Model Meta AI) is a family of"
    " autoregressive large language models (LLMs), "
    "released by Meta AI starting in February 2023. For the first version of LLaMA, four model sizes "
    "were trained: 7, 13, 33, and 65 billion parameters. LLaMA's developers reported that the 13B "
    "parameter model's performance on most NLP benchmarks exceeded that of the much larger GPT-3 "
    "(with 175B parameters) and that the largest model was competitive with state of the art models "
    "such as PaLM and Chinchilla. Whereas the most powerful LLMs have generally been accessible only "
    "through limited APIs (if at all), Meta released LLaMA's model weights to the research community "
    "under noncommercial license. Within a week of LLaMA's release, its weights were leaked to the "
    "public on 4chan via BitTorrent."
]

chat_model = ChatUnify(model="mistral-large@input-cost")


await chat_model.abatch(messages)

[AIMessage(content='OpenAI develops safe, beneficial AI; released ChatGPT.', additional_kwargs={}, response_metadata={'usage': {'completion_tokens': 14, 'prompt_tokens': 127, 'total_tokens': 141, 'completion_tokens_details': None, 'cost': 0.000338}, 'model': 'mistral-large@input-cost', 'finish_reason': 'stop'}, id='run-b36074ff-9550-440e-be93-17a029673fcd-0'),
 AIMessage(content='French AI startup Mistral, valued $2 billion, offers open-source language models.', additional_kwargs={}, response_metadata={'usage': {'completion_tokens': 18, 'prompt_tokens': 152, 'total_tokens': 170, 'completion_tokens_details': None, 'cost': 0.00041200000000000004}, 'model': 'mistral-large@input-cost', 'finish_reason': 'stop'}, id='run-fc7b411b-a444-4e3a-9bb5-99479331c625-0'),
 AIMessage(content="LLaMA, Meta's language models, outperform GPT-3; weights leaked.", additional_kwargs={}, response_metadata={'usage': {'completion_tokens': 21, 'prompt_tokens': 221, 'total_tokens': 242, 'completion_tokens_details'
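
Top-level `await` works in the cell above because Jupyter already runs an event loop. In a plain Python script there is no running loop, so you would wrap the call in a coroutine and start it with `asyncio.run`. A minimal sketch: `FakeModel` is a hypothetical stand-in with an async `abatch` method so the snippet runs offline; swap in the `chat_model` defined above for real requests.

```python
import asyncio


async def summarize_all(model, texts):
    """Await the model's async batch call (one output per input)."""
    return await model.abatch(texts)


class FakeModel:
    """Hypothetical stand-in for ChatUnify; implements only abatch."""

    async def abatch(self, inputs):
        await asyncio.sleep(0)  # yield to the event loop, like a real async call
        return [f"summary of: {text[:20]}" for text in inputs]


# In a script: results = asyncio.run(summarize_all(chat_model, messages))
results = asyncio.run(
    summarize_all(FakeModel(), ["OpenAI is a U.S. based...", "Mistral AI is a French..."])
)
for result in results:
    print(result)
```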