# ChatUnify

This notebook covers how to get started with Unify chat models.

[Unify](https://unify.ai/hub) dynamically routes each query to the best LLM, with support for providers such as OpenAI, MistralAI, Perplexity AI, and Together AI. You can also access all providers individually using a single API key.

You can check out our [live benchmarks](https://unify.ai/hub/mixtral-8x7b-instruct-v0) to see where the data is coming from!


## Installation

First, install the `langchain-unify` and `langchain-core` packages.

In [None]:
!pip install -U langchain-unify langchain-core

## Environment Setup

Make sure to set the `UNIFY_API_KEY` environment variable. You can get a key in the [Unify Console](https://console.unify.ai/login).

In [2]:
import os
os.environ["UNIFY_API_KEY"] = "YOUR_API_KEY"
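If you would rather not hardcode the key in the notebook, a minimal sketch like the one below reads it from the environment and only prompts when it is missing. The helper name `ensure_api_key` and its `prompt` parameter are assumptions made for illustration; they are not part of `langchain-unify`.

```python
import os
from getpass import getpass


def ensure_api_key(name="UNIFY_API_KEY", prompt=getpass):
    """Return the key stored in the environment, prompting once if it is missing.

    `ensure_api_key` is a hypothetical helper, not a Unify or LangChain API.
    """
    key = os.environ.get(name)
    if not key:
        # getpass hides the typed value, so the key never shows up in cell output.
        key = prompt(f"Enter {name}: ")
        os.environ[name] = key
    return key
```

Calling `ensure_api_key()` once at the top of the notebook leaves `UNIFY_API_KEY` set for the cells that follow.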

## Usage 

Let's take a look at how to use the package now.

The first thing we can do is initialize a model. To configure Unify's router, pass an endpoint string to `ChatUnify`. You can read more about this in [Unify's docs](https://unify.ai/docs/hub/concepts/runtime_routing.html).

In this case, we will use the cheapest endpoint for `llama-2-70b-chat` in terms of input cost.

In [3]:
from langchain_unify.chat_models import ChatUnify

chat = ChatUnify(model="llama-2-70b-chat@input-cost")

Once we have initialized the model, we can query it with `invoke`:

In [5]:
chat.invoke("Hello! How are you?")

AIMessage(content="  Hello! I'm doing well, thanks for asking. I'm a large language model, so I don't have feelings like humans do, but I'm always happy to chat with you. How about you? How's your day going?", response_metadata={'usage': {'completion_tokens': 54, 'prompt_tokens': 14, 'total_tokens': 68, 'cost': 0.000111}, 'model': 'llama-2-70b-chat@input-cost', 'finish_reason': 'stop'})
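The `response_metadata` in the output above carries token counts and a cost figure. The sketch below shows one way to read them; it uses a plain dict copied from that output so it runs without an API call, and the per-1,000-token figure is a derived illustration, not a value reported by Unify. With a live response you would read the same fields from `response.response_metadata`.

```python
# Metadata copied from the AIMessage shown above (no network call needed).
response_metadata = {
    "usage": {
        "completion_tokens": 54,
        "prompt_tokens": 14,
        "total_tokens": 68,
        "cost": 0.000111,
    },
    "model": "llama-2-70b-chat@input-cost",
    "finish_reason": "stop",
}

usage = response_metadata["usage"]

# Derived figure for illustration: blended cost per 1,000 tokens,
# in whatever unit the "cost" field is reported in.
cost_per_1k = usage["cost"] / usage["total_tokens"] * 1000

print(f"endpoint: {response_metadata['model']}")
print(f"tokens:   {usage['prompt_tokens']} in, {usage['completion_tokens']} out")
print(f"cost:     {usage['cost']:.6f} ({cost_per_1k:.6f} per 1K tokens)")
```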

### Single Sign-On

If you don't want the router to select the provider, you can also use our SSO to query endpoints from different providers without creating an account with each of them. For example, all of these are valid endpoints:

In [6]:
chat = ChatUnify(model="llama-2-70b-chat@together-ai")
chat = ChatUnify(model="gpt-3.5-turbo@openai")
chat = ChatUnify(model="mixtral-8x7b-instruct-v0.1@mistral-ai")

This allows you to quickly switch and test different models and providers. For example, if you are working on an application that uses gpt-4 under the hood, you can use this to query a much cheaper LLM during development and/or testing to reduce costs.

Take a look at the available endpoints [here](https://unify.ai/hub)!
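Every endpoint used in this notebook follows the `model@target` pattern, where the target is either a provider (`together-ai`) or a routing metric (`input-cost`). As a rough sketch of how an application might switch endpoints between environments, the helpers below build and split such strings. `make_endpoint`, `split_endpoint`, and the `ENDPOINTS` mapping are hypothetical names for illustration and are not part of `langchain-unify`.

```python
def make_endpoint(model: str, target: str) -> str:
    """Join a model name and a provider or routing metric as 'model@target'."""
    if not model or not target or "@" in model or "@" in target:
        raise ValueError("model and target must be non-empty and must not contain '@'")
    return f"{model}@{target}"


def split_endpoint(endpoint: str) -> tuple:
    """Split 'model@target' back into its two parts."""
    model, sep, target = endpoint.partition("@")
    if not sep or not model or not target:
        raise ValueError(f"expected 'model@target', got {endpoint!r}")
    return model, target


# Hypothetical per-environment choice, using endpoints from the cell above.
ENDPOINTS = {
    "development": make_endpoint("llama-2-70b-chat", "together-ai"),
    "production": make_endpoint("gpt-3.5-turbo", "openai"),
}

for env, endpoint in ENDPOINTS.items():
    model, target = split_endpoint(endpoint)
    print(f"{env}: model={model}, target={target}")
```

The selected string can then be passed as the `model` argument, as in `ChatUnify(model=ENDPOINTS["development"])`.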

### Chaining Inputs

Let's build a simple chain that leverages prompt templates now.

We will need to define a prompt template:

In [7]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant that translates English to French."),
        ("human", "Translate this sentence from English to French. {english_text}."),
    ]
)

And then simply build and invoke the resulting chain:

In [9]:
from langchain_unify.chat_models import ChatUnify
chat = ChatUnify(model="llama-2-70b-chat@input-cost")
chain = prompt | chat
chain.invoke({"english_text": "Hello! How are you?"})

AIMessage(content='  Sure, I\'d be happy to help! Here\'s the translation of "Hello! How are you?" from English to French:\n\nBonjour! Comment allez-vous?\n\nHere\'s a breakdown of the translation:\n\n* "Hello" becomes "Bonjour" in French.\n* "How are you?" becomes "Comment allez-vous?" in French. The word "comment" means "how," and "allez-vous" is a phrase that means "are you."\n\nSo, the full translation is "Bonjour! Comment allez-vous?" which means "Hello! How are you?" in French.', response_metadata={'usage': {'completion_tokens': 145, 'prompt_tokens': 48, 'total_tokens': 193, 'cost': 0.00030429999999999997}, 'model': 'llama-2-70b-chat@input-cost', 'finish_reason': 'stop'})

### Streaming and optimizing for latency

If you are building an application where responsiveness is key, you most likely want a streaming response. On top of that, you would ideally use the provider with the lowest time to first token (TTFT), to reduce the time your users spend waiting for a response. With Unify, this looks something like:

In [12]:
chat_ttft = ChatUnify(model="mistral-7b-instruct-v0.2@ttft")
for chunk in chat_ttft.stream("What is a large language model?"):
    print(chunk.content, end="")

A large language model is a type of artificial intelligence (AI) model that is designed to understand and generate human-like text. It is called "large" because it requires a substantial amount of data and computational resources to train, which in turn enables it to capture a broad range of linguistic patterns and relationships.

Language models learn to predict the next word in a sequence based on the context of the previous words. They are trained on vast amounts of text data, such as books, websites, and other written materials. During training, the model learns the statistical patterns and relationships between words and sentences, enabling it to generate coherent and contextually relevant text.

Large language models have achieved impressive results in various natural language processing tasks, such as text completion, translation, summarization, and chatbots. They have been shown to generate creative and sometimes surprising text, and they have also been used to develop more adv
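To check the latency you actually observe, you can time the stream yourself. The sketch below measures time to first chunk and total time for any iterable of text chunks. `measure_stream` and `fake_stream` are hypothetical helpers, and the fake stream stands in for a real model so the example runs offline.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_stream(chunks: Iterable[str]) -> Tuple[float, float, str]:
    """Consume text chunks and return (seconds to first chunk, total seconds, text)."""
    start = time.perf_counter()
    first_chunk_at = None
    parts = []
    for piece in chunks:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter() - start
        parts.append(piece)
    total = time.perf_counter() - start
    if first_chunk_at is None:
        # Empty stream: no chunk ever arrived, so report the total time.
        first_chunk_at = total
    return first_chunk_at, total, "".join(parts)


def fake_stream() -> Iterator[str]:
    """Stand-in for a model stream; yields a few chunks with a small delay."""
    for piece in ["A large ", "language ", "model"]:
        time.sleep(0.01)
        yield piece


ttft, total, text = measure_stream(fake_stream())
print(f"first chunk after {ttft:.3f}s, finished after {total:.3f}s")
print(text)
```

With a real model, the same helper could be fed a generator such as `(chunk.content for chunk in chat_ttft.stream("What is a large language model?"))`.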

### Batching and Lowest Output Cost

On the other hand, maybe you are building an AI service that processes inputs in batches to generate content. In this case, you may want the provider with the lowest output cost, since long outputs account for most of the tokens. Let's see how you can do this using `batch` and dynamic routing!

In [13]:
messages = [
    "Write a blog post about Rome",
    "Write a blog post about Paris"
]

# Route to the provider with the lowest output cost.
chat_cheapest = ChatUnify(model="llama-2-70b-chat@output-cost")
chat_cheapest.batch(messages)

[AIMessage(content="  Rome, the Eternal City, is a must-visit destination for any traveler. With a rich history spanning over 2,000 years, Rome is home to an array of iconic landmarks, world-class museums, and a vibrant culinary scene. Whether you're interested in history, art, architecture, or food, Rome has something for everyone.\n\nOne of the most iconic landmarks in Rome is the Colosseum, also known as the Flavian Amphitheatre. This ancient structure was built in 80 AD and could hold up to 50,000 spectators. The Colosseum was used for gladiatorial contests, animal hunts, and theatrical performances. Today, it is a UNESCO World Heritage Site and a symbol of Rome's rich history.\n\nAnother famous landmark in Rome is the Pantheon, a temple built in 126 AD. The Pantheon is considered one of the greatest architectural achievements of all time and is still standing after nearly 2,000 years. Its impressive dome, which was the largest in the world for over 1,000 years, is a testament to t

### Async calls and Lowest Input Cost

Last but not least, you can also run requests asynchronously. For tasks like long document summarization, optimizing for input costs is crucial. Unify's dynamic router can do this too!

In [14]:
messages = [
    "Summarize this in 10 words or less. OpenAI is a U.S. based artificial intelligence "
    "(AI) research organization founded in December 2015, researching artificial intelligence "
    "with the goal of developing 'safe and beneficial' artificial general intelligence, "
    "which it defines as 'highly autonomous systems that outperform humans at most economically "
    "valuable work'. As one of the leading organizations of the AI spring, it has developed "
    "several large language models, advanced image generation models, and previously, released "
    "open-source models. Its release of ChatGPT has been credited with starting the AI spring", 

    "Summarize this in 10 words or less. Mistral AI is a French company selling"
    " artificial intelligence (AI) products. "
    "It was founded in April 2023 by previous employees of Meta Platforms and Google DeepMind. "
    "The company raised €385 million in October 2023 and in December 2023 it was valued at "
    "more than $2 billion. It produces open source large language models, citing the "
    "foundational importance of open-source software, and as a response to proprietary models. "
    "As of March 2024, two models have been published and are available as weights. "
    "Three more models, Small, Medium and Large, are available via API only.", 

    "Summarize this in 10 words or less. LLaMA (Large Language Model Meta AI) is a family of"
    " autoregressive large language models (LLMs), "
    "released by Meta AI starting in February 2023. For the first version of LLaMA, four model sizes "
    "were trained: 7, 13, 33, and 65 billion parameters. LLaMA's developers reported that the 13B "
    "parameter model's performance on most NLP benchmarks exceeded that of the much larger GPT-3 "
    "(with 175B parameters) and that the largest model was competitive with state of the art models "
    "such as PaLM and Chinchilla. Whereas the most powerful LLMs have generally been accessible only "
    "through limited APIs (if at all), Meta released LLaMA's model weights to the research community "
    "under a noncommercial license. Within a week of LLaMA's release, its weights were leaked to the "
    "public on 4chan via BitTorrent."
]

chat_model = ChatUnify(model="mixtral-8x7b-instruct-v0.1@input-cost")

# Top-level `await` works in a notebook; in a script, wrap the call in asyncio.run().
await chat_model.abatch(messages)

[AIMessage(content=" OpenAI: Pioneering 'safe' artificial general intelligence.", response_metadata={'usage': {'completion_tokens': 14, 'prompt_tokens': 133, 'total_tokens': 147, 'cost': 3.969000000000001e-05}, 'model': 'mixtral-8x7b-instruct-v0.1@input-cost', 'finish_reason': 'stop'}),
 AIMessage(content=' Mistral AI: French startup producing open-source AI language models, raised $2bn+.', response_metadata={'usage': {'completion_tokens': 21, 'prompt_tokens': 158, 'total_tokens': 179, 'cost': 4.833e-05}, 'model': 'mixtral-8x7b-instruct-v0.1@input-cost', 'finish_reason': 'stop'}),
 AIMessage(content=' Meta releases large language model LLaMA; weights leaked to public within a week.', response_metadata={'usage': {'completion_tokens': 19, 'prompt_tokens': 227, 'total_tokens': 246, 'cost': 6.642e-05}, 'model': 'mixtral-8x7b-instruct-v0.1@input-cost', 'finish_reason': 'stop'})]
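Since every response reports its own usage, the cost of a whole batch can be totalled from the metadata. The sketch below does this for the three results above, using the `usage` dicts copied from that output so it runs without an API call. `total_usage` is a hypothetical helper, not part of `langchain-unify`.

```python
import math

# "usage" entries copied from the three AIMessage results shown above.
usages = [
    {"completion_tokens": 14, "prompt_tokens": 133, "total_tokens": 147, "cost": 3.969000000000001e-05},
    {"completion_tokens": 21, "prompt_tokens": 158, "total_tokens": 179, "cost": 4.833e-05},
    {"completion_tokens": 19, "prompt_tokens": 227, "total_tokens": 246, "cost": 6.642e-05},
]


def total_usage(usages):
    """Sum token counts and cost across a list of per-response usage dicts."""
    return {
        "completion_tokens": sum(u["completion_tokens"] for u in usages),
        "prompt_tokens": sum(u["prompt_tokens"] for u in usages),
        "total_tokens": sum(u["total_tokens"] for u in usages),
        # fsum limits floating-point drift when adding many small costs.
        "cost": math.fsum(u["cost"] for u in usages),
    }


totals = total_usage(usages)
input_share = totals["prompt_tokens"] / totals["total_tokens"]
print(totals)
print(f"share of tokens that are input: {input_share:.0%}")
```

With live results, the same list could be built as `[r.response_metadata["usage"] for r in results]`, where `results` is the list returned by `abatch`. The high share of input tokens in this batch is why routing on input cost suits summarization workloads.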