# How to add fallbacks to a runnable

When working with language models, you may often encounter issues from the underlying APIs, such as rate limiting or downtime. As you move your LLM applications into production, it becomes increasingly important to safeguard against these failures. That's why we've introduced the concept of fallbacks.

A **fallback** is an alternative plan that may be used in an emergency.

Crucially, fallbacks can be applied not only at the LLM level but at the level of a whole runnable. This matters because different models often require different prompts. If your call to OpenAI fails, you don't just want to send the same prompt to Anthropic; you probably want to use a different prompt template and send a different version there.
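To make the idea concrete, here is a minimal, library-free sketch of a fallback applied to a whole pipeline rather than to a single model. It is not how LangChain implements `with_fallbacks`; the helper `run_with_fallbacks` and both pipeline functions are hypothetical stand-ins, and no real API is called. Note that each pipeline builds its own prompt.

```python
from typing import Callable, Optional, Sequence


def primary_pipeline(question: str) -> str:
    # Hypothetical "prompt + model" pipeline whose provider is down.
    prompt = f"[chat-style prompt] {question}"
    raise ConnectionError(f"primary provider unavailable for: {prompt}")


def backup_pipeline(question: str) -> str:
    # Hypothetical backup pipeline that formats its own, different prompt.
    prompt = f"Instructions: answer briefly.\n\nQuestion: {question}"
    return f"backup answer to -> {prompt!r}"


def run_with_fallbacks(
    primary: Callable[[str], str],
    fallbacks: Sequence[Callable[[str], str]],
) -> Callable[[str], str]:
    """Return a callable that tries `primary`, then each fallback in order."""

    def run(value: str) -> str:
        first_error: Optional[Exception] = None
        for candidate in (primary, *fallbacks):
            try:
                return candidate(value)
            except Exception as exc:
                # Remember the first failure so it can be surfaced if all fail.
                if first_error is None:
                    first_error = exc
        raise first_error

    return run


pipeline = run_with_fallbacks(primary_pipeline, [backup_pipeline])
answer = pipeline("Why did the chicken cross the road?")
print(answer)
```

Because the fallback wraps the entire pipeline, the backup path is free to use a completely different prompt format from the primary one.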

## Fallback for LLM API Errors

This is perhaps the most common use case for fallbacks. A request to an LLM API can fail for a variety of reasons: the API could be down, you could have hit rate limits, or any number of other things. Fallbacks help protect against these failures.

IMPORTANT: By default, many of the LLM wrappers catch errors and retry. You will most likely want to turn retries off when working with fallbacks. Otherwise, the first wrapper will keep retrying instead of failing over to the fallback.
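The interaction between retries and fallbacks can be illustrated with a small, library-free sketch. Everything here is hypothetical (the `RateLimited` error, the `count_primary_attempts` helper, and the retry loop); it only shows that the fallback is not reached until the primary has used up all of its retries.

```python
from typing import Tuple


class RateLimited(Exception):
    """Hypothetical stand-in for a provider's rate-limit error."""


def count_primary_attempts(max_retries: int) -> Tuple[int, str]:
    """Run a failing primary with retries, then fall back; report attempts."""
    attempts = 0

    def primary(prompt: str) -> str:
        nonlocal attempts
        # One initial attempt plus `max_retries` retries, all rate limited.
        for attempt in range(max_retries + 1):
            attempts += 1
            try:
                raise RateLimited("rate limit")
            except RateLimited:
                if attempt == max_retries:
                    raise  # retries exhausted: let the error escape
        return "unreachable"

    def backup(prompt: str) -> str:
        return "backup answer"

    try:
        result = primary("hello")
    except RateLimited:
        result = backup("hello")
    return attempts, result


print(count_primary_attempts(max_retries=2))  # (3, 'backup answer')
print(count_primary_attempts(max_retries=0))  # (1, 'backup answer')
```

With retries disabled, the primary fails after a single attempt and the fallback takes over immediately. A real client would also typically wait between retries, which makes the delay larger.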

In [1]:
%pip install --upgrade --quiet langchain langchain-openai langchain-anthropic

Note: you may need to restart the kernel to use updated packages.


In [2]:
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI

First, let's mock out what happens if we hit a RateLimitError from OpenAI.

In [3]:
from unittest.mock import patch

import httpx
from openai import RateLimitError

request = httpx.Request("GET", "/")
response = httpx.Response(200, request=request)
error = RateLimitError("rate limit", response=response, body="")

In [4]:
# Note that we set max_retries=0 to avoid retrying on rate limit errors, etc.
openai_llm = ChatOpenAI(model="gpt-4o-mini", max_retries=0)
anthropic_llm = ChatAnthropic(model="claude-3-haiku-20240307")
llm = openai_llm.with_fallbacks([anthropic_llm])

In [5]:
# Let's use just the OpenAI LLM first, to show that we run into an error
with patch("openai.resources.chat.completions.Completions.create", side_effect=error):
    try:
        print(openai_llm.invoke("Why did the chicken cross the road?"))
    except RateLimitError:
        print("Hit error")

Hit error


In [6]:
# Now let's try with fallbacks to Anthropic
with patch("openai.resources.chat.completions.Completions.create", side_effect=error):
    try:
        print(llm.invoke("Why did the chicken cross the road?"))
    except RateLimitError:
        print("Hit error")

content="There are many joking responses to this classic riddle, but the basic premise is that the chicken crossed the road for some funny or unexpected reason. Some common punchlines include:\n\n- To get to the other side.\n- I don't know, why did the chicken cross the road?\n- To prove to the possum it could be done.\n- To show the armadillo it could be done.\n- Because it was the chicken's day off.\n- To fetch a pail of water.\n\nThe simplicity and ambiguity of the question is what makes this riddle so enduring and open to humorous interpretations. The actual reason why the chicken crossed the road is left up to the imagination of the person hearing the joke." additional_kwargs={} response_metadata={'id': 'msg_01Eh8JuxtpMkeYMPQn4Pz52b', 'model': 'claude-3-haiku-20240307', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 15, 'output_tokens': 161}} id='run-e2bb674c-6c23-41f5-b572-7f4ae8d03d9c-0' usage_metadata={'input_tokens': 15, 'output_tokens': 161, 'tota

We can use our "LLM with Fallbacks" as we would a normal LLM.

In [7]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You're a nice assistant who always includes a compliment in your response",
        ),
        ("human", "Why did the {animal} cross the road"),
    ]
)
chain = prompt | llm
with patch("openai.resources.chat.completions.Completions.create", side_effect=error):
    try:
        print(chain.invoke({"animal": "kangaroo"}))
    except RateLimitError:
        print("Hit error")

content="I don't actually know why the kangaroo crossed the road. Jokes and riddles about animals crossing roads are often just silly little puzzles without a real punchline. But I'm happy to engage with your sense of humor! You have a fun and creative way of thinking up these kinds of lighthearted quips. I enjoy our playful exchanges and think you're very clever." additional_kwargs={} response_metadata={'id': 'msg_01KEc7euBvTkMkQmZanXnXdw', 'model': 'claude-3-haiku-20240307', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 30, 'output_tokens': 85}} id='run-6160f8a3-080d-4166-ab7a-4128f1de2da4-0' usage_metadata={'input_tokens': 30, 'output_tokens': 85, 'total_tokens': 115}


## Fallback for Sequences

We can also create fallbacks for sequences, where the fallbacks are sequences themselves. Here we do that with two different models: ChatOpenAI and then the plain OpenAI model (which is not a chat model). Because OpenAI is NOT a chat model, you likely want a different prompt.

In [8]:
# First let's create a chain with a ChatModel
# We add in a string output parser here so the outputs between the two are the same type
from langchain_core.output_parsers import StrOutputParser

chat_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You're a nice assistant who always includes a compliment in your response",
        ),
        ("human", "Why did the {animal} cross the road"),
    ]
)
# Here we're going to use a bad model name to easily create a chain that will error
chat_model = ChatOpenAI(model="gpt-fake")
bad_chain = chat_prompt | chat_model | StrOutputParser()

In [9]:
# Now let's create a chain with the normal OpenAI model
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI

prompt_template = """Instructions: You should always include a compliment in your response.

Question: Why did the {animal} cross the road?"""
prompt = PromptTemplate.from_template(prompt_template)
llm = OpenAI()
good_chain = prompt | llm

In [10]:
# We can now create a final chain which combines the two
chain = bad_chain.with_fallbacks([good_chain])
chain.invoke({"animal": "turtle"})

'\n\nResponse: The turtle must have been very determined to cross the road!'

## Fallback for Long Inputs

One of the big limiting factors of LLMs is their context window. Usually, you can count and track the length of prompts before sending them to an LLM, but in situations where that is hard or complicated, you can fall back to a model with a longer context length.

In [11]:
short_llm = ChatOpenAI()
long_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
llm = short_llm.with_fallbacks([long_llm])

In [12]:
inputs = "What is the next number: " + ", ".join(["one", "two"] * 3000)

In [13]:
try:
    print(short_llm.invoke(inputs))
except Exception as e:
    print(e)

Error code: 400 - {'error': {'message': "Sorry! We've encountered an issue with repetitive patterns in your prompt. Please try again with a different prompt.", 'type': 'invalid_request_error', 'param': 'prompt', 'code': 'invalid_prompt'}}


In [14]:
try:
    print(llm.invoke(inputs))
except Exception as e:
    print(e)

Error code: 400 - {'error': {'message': "Sorry! We've encountered an issue with repetitive patterns in your prompt. Please try again with a different prompt.", 'type': 'invalid_request_error', 'param': 'prompt', 'code': 'invalid_prompt'}}
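For comparison, the "count the length before sending" approach mentioned at the start of this section might look like the sketch below. It is only an illustration under loud assumptions: `estimate_tokens` uses a crude one-token-per-word estimate rather than a real tokenizer, and `SHORT_CONTEXT_LIMIT`, `choose_model`, and the returned labels are hypothetical.

```python
# Hypothetical context limit for the short-context model (not a real figure).
SHORT_CONTEXT_LIMIT = 4_000


def estimate_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    # A real application would use the model's own tokenizer.
    return len(text.split())


def choose_model(text: str, short_limit: int = SHORT_CONTEXT_LIMIT) -> str:
    """Pick a model label based on the estimated prompt length."""
    return "short" if estimate_tokens(text) <= short_limit else "long"


inputs = "What is the next number: " + ", ".join(["one", "two"] * 3000)

print(estimate_tokens(inputs))
print(choose_model(inputs))
print(choose_model("Why did the chicken cross the road?"))
```

Routing up front avoids a wasted request, but it depends on an accurate length estimate. A fallback needs no estimate, at the cost of one failed call whenever the input is too long.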


## Fallback to Better Model

We often ask models to produce output in a specific format (like JSON). Models like GPT-3.5 can do this reasonably well, but sometimes struggle. This naturally points to fallbacks: we can try GPT-3.5 first (faster, cheaper), and if parsing fails, use GPT-4.

In [15]:
from langchain.output_parsers import DatetimeOutputParser

In [16]:
prompt = ChatPromptTemplate.from_template(
    "what time was {event} (in %Y-%m-%dT%H:%M:%S.%fZ format - only return this value)"
)

In [17]:
# In this case we are going to do the fallbacks on the LLM + output parser level
# Because the error will get raised in the OutputParser
openai_35 = ChatOpenAI() | DatetimeOutputParser()
openai_4 = ChatOpenAI(model="gpt-4") | DatetimeOutputParser()

In [18]:
only_35 = prompt | openai_35
fallback_4 = prompt | openai_35.with_fallbacks([openai_4])

In [19]:
try:
    print(only_35.invoke({"event": "the superbowl in 1994"}))
except Exception as e:
    print(f"Error: {e}")

1994-01-30 18:30:00


In [20]:
try:
    print(fallback_4.invoke({"event": "the superbowl in 1994"}))
except Exception as e:
    print(f"Error: {e}")

1994-01-30 18:30:00
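In the run above, both chains happened to return a parseable value, so the fallback path was not visibly exercised. The sketch below shows, without any API calls, what happens when the first model's output fails to parse. Both "models" are hypothetical functions returning canned strings, and `ask` is an illustrative helper, not part of LangChain.

```python
from datetime import datetime

# Same timestamp format that the prompt above asks the model to return.
FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def cheap_model(event: str) -> str:
    # Hypothetical canned reply that ignores the requested format.
    return "The game was played on January 30, 1994."


def strong_model(event: str) -> str:
    # Hypothetical canned reply that follows the requested format.
    return "1994-01-30T18:30:00.000000Z"


def parse(text: str) -> datetime:
    # Raises ValueError when the text does not match FORMAT.
    return datetime.strptime(text.strip(), FORMAT)


def ask(event: str) -> datetime:
    """Try each model + parser pair in order; move on when parsing fails."""
    for model in (cheap_model, strong_model):
        try:
            return parse(model(event))
        except ValueError:
            continue
    raise ValueError("no model produced a parseable timestamp")


result = ask("the superbowl in 1994")
print(result)  # 1994-01-30 18:30:00
```

The important detail is that the parser sits inside the `try` block. This mirrors why the fallback above is attached to the LLM + output parser pair rather than to the LLM alone.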
