# Portkey | Building Resilient Llamaindex Apps

Portkey integration with Llamaindex adds the following capabilities to your apps out of the box:

1. AI Gateway
    - Automated Fallbacks & Retries
    - Load Balancing
    - Semantic Caching
2. Observability
    - Logging of all requests
    - Request tracing
    - Adding custom tags to each request

It does all this with a single import:

In [None]:
from llama_index.llms import Portkey
from llama_index.llms import ChatMessage #We'll use this later

You do not need to install or import any other SDKs in your Llamaindex app.

### Here's how it works:

Step 1: Get your Portkey API key by logging into [Portkey here](https://app.portkey.ai/), clicking the profile icon at the top right, and selecting "Copy API Key".

In [None]:
import os

# Fill in your keys before running the rest of the notebook
os.environ["PORTKEY_API_KEY"]=""
os.environ["OPENAI_API_KEY"]=""
os.environ["ANTHROPIC_API_KEY"]=""

Step 2: Add all the Portkey features you want (as illustrated above) by calling the Portkey class

This is a quick guide to all Portkey features and what they expect:

| Feature             | Config Key              | Value(Type)                                      | Required    |
|---------------------|-------------------------|--------------------------------------------------|-------------|
| API Key             | `api_key`               | `string`                                         | ✅ Required |
| Mode                | `mode`                  | `fallback`, `loadbalance`, `single`              | ✅ Required |
| Cache Type          | `cache_status`          | `simple`, `semantic`                             | ❔ Optional |
| Force Cache Refresh | `cache_force_refresh`   | `True`, `False`                                  | ❔ Optional |
| Cache Age           | `cache_age`             | `integer` (in seconds)                           | ❔ Optional |
| Trace ID            | `trace_id`              | `string`                                         | ❔ Optional |
| Retries             | `retry_count`           | `integer` [0,5]                                  | ❔ Optional |
| Metadata            | `metadata`              | `json object` [More info](https://docs.portkey.ai/key-features/custom-metadata)          | ❔ Optional |
| Base URL            | `base_url`              |                                                  |             |

In [None]:
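# Booleans and integers are passed as native Python types, matching the table above.
# Note: the table above names the retry key `retry_count`, while this cell uses `retry`;
# confirm the key name against the version you have installed.
pk_llm = Portkey(mode="single", cache_status="semantic", cache_force_refresh=True, cache_age=1000, trace_id="portkey_llamaindex", retry=5)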

# Since we have defined the Portkey API Key with os.environ, we do not need to set it again here
# Let us also add some metadata!

metadata={
    "_environment":"production",
    "_prompt":"test",
    "_user":"user",
    "_organisation":"acme"
}

pk_llm.metadata=metadata
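
Since the table above lists what each config key expects, it can help to check a config before constructing the Portkey LLM. The sketch below is only an illustration: `validate_portkey_config` is a hypothetical helper (it is not part of Portkey or Llamaindex), it uses the key names from the table, and it assumes the API key is passed explicitly rather than through `os.environ`.

```python
from typing import Any, Dict, List

# Hypothetical helper for illustration; not part of Portkey or Llamaindex.
ALLOWED_MODES = {"fallback", "loadbalance", "single"}
ALLOWED_CACHE_TYPES = {"simple", "semantic"}


def _is_int(value: Any) -> bool:
    """True for real integers only (bool is a subclass of int, so exclude it)."""
    return isinstance(value, int) and not isinstance(value, bool)


def validate_portkey_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the config matches the table."""
    problems: List[str] = []

    api_key = config.get("api_key")
    if not isinstance(api_key, str) or not api_key:
        problems.append("api_key: required, must be a non-empty string")

    mode = config.get("mode")
    if not isinstance(mode, str) or mode not in ALLOWED_MODES:
        problems.append("mode: required, must be fallback, loadbalance or single")

    if "cache_status" in config and config["cache_status"] not in ALLOWED_CACHE_TYPES:
        problems.append("cache_status: must be simple or semantic")

    if "cache_force_refresh" in config and not isinstance(config["cache_force_refresh"], bool):
        problems.append("cache_force_refresh: must be True or False")

    if "cache_age" in config and not _is_int(config["cache_age"]):
        problems.append("cache_age: must be an integer number of seconds")

    if "trace_id" in config and not isinstance(config["trace_id"], str):
        problems.append("trace_id: must be a string")

    if "retry_count" in config:
        retries = config["retry_count"]
        if not _is_int(retries) or not 0 <= retries <= 5:
            problems.append("retry_count: must be an integer in [0, 5]")

    if "metadata" in config and not isinstance(config["metadata"], dict):
        problems.append("metadata: must be a JSON-style object (dict)")

    return problems


# A config with two mistakes: cache_age is a string and retry_count is out of range.
issues = validate_portkey_config(
    {"api_key": "my-portkey-key", "mode": "single", "cache_age": "1000", "retry_count": 9}
)
for issue in issues:
    print(issue)
```

Returning a list of problems, rather than raising on the first one, lets you fix every mistake in a single pass.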

Step 3: Now let's pick which LLMs we want. 

With the Portkey integration, we have simplified how you construct an LLM by using a single constructor for all the different providers: PortkeyBase (the code cells in this notebook call it `LLMBase`). It accepts the same keys you already use with your OpenAI or Anthropic constructors, plus one new key, `weight`, which is used for the load balancing feature. To see how load balancing is implemented, jump to the "Implementing Load Balancing with Portkey" section below.

In [None]:
openai_llm = LLMBase(provider="openai", model="gpt-4")

Step 4: Now let's activate our Portkey LLM!

In [None]:
pk_llm.add_llms(openai_llm)

And that's it! In these 4 steps, you have infused your Llamaindex app with the most sophisticated production capabilities. Let's test out our integration now:

In [None]:
messages = [
    ChatMessage(role="system", content="You are a helpful assistant"),
    ChatMessage(role="user", content="What can you do?")
]
print("Testing Portkey Llamaindex integration:")
response = pk_llm.chat(messages)
print(response)

Your request and responses, along with the trace id, cache status, and all metadata are now logged to Portkey, and you can [see them here](https://app.portkey.ai/).

To recap,

- Step 1 - Import Portkey from `llama_index.llms`.
- Step 2 - Grab your Portkey API Key from [here](https://app.portkey.ai/).
- Step 3 - Construct your Portkey LLM with `pk_llm=Portkey(mode="fallback")` and any other Portkey features you want.
- Step 4 - Construct your provider LLM with `openai_llm=PortkeyBase(provider="openai",model="gpt-4")`.
- Step 5 - Add the provider LLM to the Portkey LLM with `pk_llm.add_llms(openai_llm)`.
- Step 6 - Call the Portkey LLM like you would any other LLM, with `pk_llm.chat(messages)`.

Here's the guide to all the functions and their params:
- [Portkey LLM Constructor]()
- [PortkeyBase Constructor]() 
- [List of Portkey + Llamaindex Features]()
- [List of Portkey + Llamaindex limitations]()

# Implementing Fallbacks with Portkey

In [None]:
pk_llm.mode="fallback"

llm1 = LLMBase(provider="openai", model="gpt-4")
llm2 = LLMBase(provider="openai", model="gpt-3.5-turbo")

pk_llm.add_llms(llm_params=[llm1,llm2])

print("Testing Fallback functionality:")
response = pk_llm.chat(messages)
print(response)
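
To make the idea behind the fallback mode concrete, here is a minimal, self-contained sketch of the general technique: try each provider in order and return the first successful response. This is not Portkey's implementation; `call_with_fallback`, `flaky_primary` and `stable_secondary` are hypothetical names, and the providers are plain Python functions standing in for real LLM calls.

```python
from typing import Callable, List, Sequence


def call_with_fallback(providers: Sequence[Callable[[str], str]], prompt: str) -> str:
    """Try each provider in order and return the first successful response."""
    errors: List[Exception] = []
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:  # a real gateway would likely be more selective here
            errors.append(exc)
    raise RuntimeError(f"All {len(providers)} providers failed: {errors}")


def flaky_primary(prompt: str) -> str:
    """Stand-in for a primary model that is currently failing."""
    raise TimeoutError("primary model timed out")


def stable_secondary(prompt: str) -> str:
    """Stand-in for a backup model that answers normally."""
    return f"secondary answered: {prompt}"


# The primary raises, so the call falls through to the secondary.
print(call_with_fallback([flaky_primary, stable_secondary], "What can you do?"))
```

The order of the list is the order of preference, which mirrors how `llm1` and `llm2` are passed in the cell above.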

# Implementing Load Balancing with Portkey

For Load Balancing, we have to add one more param to PortkeyBase: `weight`.

Each incoming request is routed to one of the LLMs according to the weights you define. It's that simple!

The weights of all LLMs passed to Portkey should sum to 1. Here's an example:

In [None]:
pk_llm.mode="loadbalance"

llm1 = LLMBase(provider="openai", model="gpt-4",weight=0.2)
llm2 = LLMBase(provider="openai", model="gpt-3.5-turbo",weight=0.8)

pk_llm.add_llms(llm_params=[llm1,llm2])

print("Testing Loadbalance functionality:")
response = pk_llm.chat(messages)
print(response)
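
The routing idea can be sketched without any network calls. The example below is an assumption about how weighted routing can work in general, not a description of Portkey's internals: `pick_llm` is a hypothetical function that checks the weights sum to 1 and then draws one model name at random in proportion to its weight.

```python
import math
import random
from collections import Counter
from typing import Dict, Optional


def pick_llm(weights: Dict[str, float], rng: Optional[random.Random] = None) -> str:
    """Pick one model name at random, in proportion to its weight."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    chooser = rng if rng is not None else random
    names = list(weights)
    return chooser.choices(names, weights=[weights[name] for name in names], k=1)[0]


# Same weights as the cell above; a seeded generator keeps the run repeatable.
rng = random.Random(0)
weights = {"gpt-4": 0.2, "gpt-3.5-turbo": 0.8}
counts = Counter(pick_llm(weights, rng) for _ in range(10_000))
print(counts)
```

Over many requests, the share of traffic each model receives should approach its weight, so roughly one request in five would go to `gpt-4` here.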

# Implementing Semantic Caching with Portkey

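The two requests below ask the same question in different words and time each call. With semantic caching enabled, the second request is expected to be served from the cache. You can see the cache status on your Portkey dashboard.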

In [None]:
import time

pk_llm.cache_status="semantic"

current_messages = [
    ChatMessage(role="system", content="You are a helpful assistant"),
    ChatMessage(role="user", content="What are the ingredients of a pizza?")
]

print("Testing Portkey Semantic Cache:")

start = time.time()
response = pk_llm.chat(current_messages)
end = time.time() - start

print(response)
print("\n--------------------------------------\n")
print(f'Served in {end} seconds.')

new_messages = [
    ChatMessage(role="system", content="You are a helpful assistant"),
    ChatMessage(role="user", content="Ingredients of pizza")
]

print("Testing Portkey Semantic Cache:")

start = time.time()
response = pk_llm.chat(new_messages)
end = time.time() - start

print(response)
print("\n--------------------------------------\n")
print(f'Served in {end} seconds.')
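
How Portkey decides that two prompts are "the same" is not described in this notebook, but the general idea of a semantic cache can be sketched with a toy example. Everything below is an assumption for illustration: `ToySemanticCache` is a hypothetical class, word counts stand in for real embeddings, and the similarity threshold of 0.6 is arbitrary.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. Real systems use learned vectors."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToySemanticCache:
    """Return a stored response when a new prompt is similar enough to an old one."""

    def __init__(self, threshold: float = 0.6) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[Counter, str]] = []

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))

    def get(self, prompt: str) -> Optional[str]:
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


semantic_cache = ToySemanticCache()
semantic_cache.put("What are the ingredients of a pizza?", "Dough, sauce, cheese, toppings.")

print(semantic_cache.get("Ingredients of pizza"))   # similar wording: cache hit
print(semantic_cache.get("How do I bake bread?"))   # unrelated prompt: None
```

A simple (exact-match) cache would miss the reworded prompt, which is the difference the two timed requests above are meant to show.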

Portkey's cache supports two more cache-critical functions - Force Refresh and Age.

- `cache_force_refresh`: Force-send a request to your provider instead of serving it from the cache.
- `cache_age`: Set the interval (in seconds) after which the cached entry for this particular string is automatically refreshed.

Here's how you can use it:

In [None]:
pk_llm.cache_age=1000
pk_llm.cache_force_refresh=True
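
The interaction between these two settings can be illustrated with a small in-memory cache. This is a sketch of the general technique under stated assumptions, not Portkey's implementation: `ToyTTLCache` and `fake_provider` are hypothetical, and a fake clock replaces real time so the example runs instantly.

```python
import time
from typing import Callable, Dict, List, Tuple


class ToyTTLCache:
    """Cache entries expire after max_age_seconds; force_refresh bypasses the cache."""

    def __init__(self, max_age_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.max_age_seconds = max_age_seconds
        self.clock = clock
        self.store: Dict[str, Tuple[float, str]] = {}

    def fetch(self, key: str, compute: Callable[[], str], force_refresh: bool = False) -> str:
        now = self.clock()
        entry = self.store.get(key)
        is_fresh = entry is not None and now - entry[0] < self.max_age_seconds
        if is_fresh and not force_refresh:
            return entry[1]
        # Miss, expired entry, or forced refresh: call the provider and store the result.
        value = compute()
        self.store[key] = (now, value)
        return value


fake_now = [0.0]          # mutable fake clock, in seconds
calls: List[int] = []     # records each time the provider is actually called


def fake_provider() -> str:
    calls.append(1)
    return f"response #{len(calls)}"


ttl_cache = ToyTTLCache(max_age_seconds=1000, clock=lambda: fake_now[0])

first = ttl_cache.fetch("pizza", fake_provider)                       # miss: provider called
second = ttl_cache.fetch("pizza", fake_provider)                      # hit: served from cache
third = ttl_cache.fetch("pizza", fake_provider, force_refresh=True)   # forced: provider called
fake_now[0] = 2000.0                                                  # move past the cache age
fourth = ttl_cache.fetch("pizza", fake_provider)                      # expired: provider called
print(first, second, third, fourth, sep="\n")
```

Note that with force refresh left on, every request goes to the provider, so the cache age only matters once force refresh is turned off again.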

# Observability with Portkey

All of your requests are automatically logged to Portkey where you can see the whole payload, 