# Warning: this notebook is a WIP and will eventually replace 2_llm_gateway. Target: July 14th, 2025.

# Agentic Platform: LLM Gateway

This lab introduces the concept of an LLM Gateway. LLM Gateways let you track and throttle requests in a multi-tenant environment. Your tenancy could be by department, workload, customer organization, or even individual users on your platform.
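To make "track and throttle" concrete, here is a minimal sketch of per-tenant rate limiting with a token bucket. It is an illustration only and not how LiteLLM implements its limits; `TenantThrottle`, `Bucket`, and the `capacity` / `refill_per_sec` parameters are hypothetical names made up for this example.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Bucket:
    """State for one tenant: remaining tokens and the last refill timestamp."""
    tokens: float
    last_refill: float


class TenantThrottle:
    """Hypothetical helper: counts allowed requests and rate-limits each tenant."""

    def __init__(self, capacity: int, refill_per_sec: float) -> None:
        self.capacity = capacity              # maximum burst size per tenant
        self.refill_per_sec = refill_per_sec  # sustained requests per second
        self.buckets: Dict[str, Bucket] = {}
        self.allowed_counts: Dict[str, int] = {}

    def allow(self, tenant_id: str, now: Optional[float] = None) -> bool:
        """Return True if the tenant may make a request at time `now`."""
        now = time.monotonic() if now is None else now
        # New tenants start with a full bucket.
        bucket = self.buckets.setdefault(tenant_id, Bucket(float(self.capacity), now))

        # Refill in proportion to elapsed time, capped at the bucket capacity.
        elapsed = now - bucket.last_refill
        bucket.tokens = min(self.capacity, bucket.tokens + elapsed * self.refill_per_sec)
        bucket.last_refill = now

        if bucket.tokens < 1:
            return False  # throttled: the caller would typically return HTTP 429

        bucket.tokens -= 1
        self.allowed_counts[tenant_id] = self.allowed_counts.get(tenant_id, 0) + 1
        return True
```

The tenant key can be whatever your tenancy model uses (department, workload, customer organization, or user). A gateway centralizes this bookkeeping so individual applications don't have to implement it.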

There are many ways to run a gateway: open source projects, commercial offerings, or building your own. Options include LiteLLM, Portkey, Kong, and Envoy. In this lab we'll use LiteLLM as our proxy because of its existing integrations in the ecosystem.

Let's get started.

## Make a call to Bedrock
There are two ways to use LiteLLM:
1. As an SDK
2. As a proxy server

In both cases, LiteLLM normalizes LLM provider APIs to OpenAI's ChatCompletion format. This is useful not only for integrating with other LLM APIs; it also works with models hosted in SageMaker, running locally (through Ollama), and so on. As in the previous labs, where we called the provider API directly, we still convert requests and responses into our own types. This makes it easy to swap LiteLLM for another gateway down the road if needed.

We'll start with the SDK. In the example below we call Bedrock, but we use the ChatCompletion format.

In [None]:
import litellm
from litellm import ModelResponse
import os

# Simple LiteLLM call to Bedrock
response: ModelResponse = litellm.completion(
    model="bedrock/anthropic.claude-3-haiku-20240307-v1:0",
    messages=[
        {"role": "user", "content": "Hello! Can you tell me a fun fact about AI?"}
    ],
    max_tokens=100
)

print("Response:")
print(response.choices[0].message.content)

print('-----------------')

print(type(response))

## Abstract into our own types
Great! We now have a standard API output format. However, we want to normalize it into our own types to better future-proof our system. To do that, we follow a similar pattern: we write converters to and from the types we've been using throughout the labs.

In [6]:
from agentic_platform.core.converter.litellm_converters import LiteLLMRequestConverter, LiteLLMResponseConverter
from agentic_platform.core.models.llm_models import LLMRequest, LLMResponse

In [7]:
def call_litellm(request: LLMRequest) -> LLMResponse:
    """
    Call LiteLLM directly using the SDK approach.
    Converts our internal LLMRequest format to LiteLLM and back to LLMResponse.
    """
    # Convert internal request to LiteLLM format
    litellm_payload = LiteLLMRequestConverter.convert_llm_request(request)
    
    # Make the call using LiteLLM SDK
    litellm_response = litellm.completion(**litellm_payload)
    
    # Convert LiteLLM response back to our internal format
    return LiteLLMResponseConverter.to_llm_response(litellm_response.model_dump())
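To show roughly what the response conversion step involves, here is a simplified sketch that maps a ChatCompletion-style dictionary onto a small internal type. It is not the platform's `LiteLLMResponseConverter`; `SimpleLLMResponse` and `to_simple_response` are hypothetical stand-ins, and only a few common ChatCompletion fields are handled.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SimpleLLMResponse:
    """Hypothetical, trimmed-down stand-in for the platform's LLMResponse."""
    text: str
    stop_reason: Optional[str]
    input_tokens: int
    output_tokens: int


def to_simple_response(payload: Dict[str, Any]) -> SimpleLLMResponse:
    """Map a ChatCompletion-style dict onto our own response type."""
    choice = payload["choices"][0]
    usage = payload.get("usage") or {}
    return SimpleLLMResponse(
        # content can be None (for example on tool calls), so fall back to ""
        text=choice["message"].get("content") or "",
        stop_reason=choice.get("finish_reason"),
        input_tokens=usage.get("prompt_tokens", 0),
        output_tokens=usage.get("completion_tokens", 0),
    )
```

Because the rest of the code only sees the internal type, swapping the gateway later means rewriting the converter and nothing else.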

## Test the function
Now let's test our `call_litellm` function with our internal types:

In [None]:
from agentic_platform.core.models.memory_models import Message, TextContent

# Create a test LLMRequest using our internal types
test_request: LLMRequest = LLMRequest(
    system_prompt="You are a helpful AI assistant.",
    messages=[
        Message(
            role="user",
            content=[TextContent(type="text", text="What is the capital of France?")]
        )
    ],
    model_id="bedrock/anthropic.claude-3-haiku-20240307-v1:0",
    hyperparams={"max_tokens": 100}
)

# Call our function
response: LLMResponse = call_litellm(test_request)

# Display the results
print("Response text:", response.text)
print("Response type:", type(response))
print("Usage:", response.usage)
print("Stop reason:", response.stop_reason)

# Integrate LiteLLM With Frameworks
Now that we have the basics down, let's see how to integrate LiteLLM with different frameworks. We'll start with Strands, which has a native LiteLLM integration.

In [None]:
from strands import Agent as StrandsAgent
from strands.models.litellm import LiteLLMModel as StrandsLiteLLMModel
from strands_tools import calculator

model = StrandsLiteLLMModel(
    model_id="bedrock/anthropic.claude-3-haiku-20240307-v1:0",
    params={
        "max_tokens": 1000,
        "temperature": 0.7,
    }
)

agent = StrandsAgent(model=model, tools=[calculator])
response = agent("What is 2+2")
print(response)

## Integrate with LangGraph
Next, let's see how to integrate LiteLLM into the other two frameworks. We'll do LangChain / LangGraph next.

In [None]:
from langchain_litellm import ChatLiteLLM
from langgraph.prebuilt import create_react_agent
from langchain_core.tools import tool as lc_tool

llm = ChatLiteLLM(model="bedrock/anthropic.claude-3-haiku-20240307-v1:0", temperature=0.1)

@lc_tool
def get_weather(location: str) -> str:
    """Get weather for a location."""
    return f"The weather in {location} is sunny and 75°F"

@lc_tool
def calculate(expression: str) -> str:
    """Calculate a mathematical expression."""
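    try:
        # NOTE: eval() executes arbitrary Python. Acceptable for this demo only;
        # do not use it on untrusted input in production.
        result = eval(expression)
        return str(result)
    except Exception:
        return "Error in calculation"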

# Create the React agent - works exactly like with any other LangChain LLM
agent = create_react_agent(llm, [get_weather, calculate])

result = agent.invoke({
    "messages": [("user", "What's the weather in NYC and what's 25 * 4?")]
})
print(result["messages"][-1].content)
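The `calculate` tool above relies on `eval`, which will run any Python the model hands it. As a hedged alternative, the sketch below evaluates only basic arithmetic by walking the parsed syntax tree. `safe_calculate` is a hypothetical helper, not part of LangChain, and it deliberately supports only `+`, `-`, `*`, `/`, and `%`.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Whitelisted operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str) -> Number:
    """Evaluate a basic arithmetic expression without calling eval()."""

    def _eval(node: ast.AST) -> Number:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, etc. all end up here.
        raise ValueError(f"Unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval"))
```

Inside the tool you could wrap the call in `try` / `except (ValueError, SyntaxError, ZeroDivisionError)` and return the same error string as before.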

<div style="background-color: #FEEFB3; color: #9F6000; padding: 15px; border-radius: 5px; border-left: 6px solid #9F6000; margin-bottom: 15px;">
<strong>⚠️ WARNING:</strong> You will need to have the platform stack deployed for the next steps, and you will need to run this notebook on the Code Server in the bastion host. If you do not wish to deploy the stack, you can skip ahead to the 3_agent_evaluation notebook and onwards, which do not require access to AWS resources.
</div>

# Proxy Server
So far we've been using LiteLLM locally as an SDK. While that's useful, we need to call Bedrock through the proxy server to get all the benefits of an LLM Gateway. For this next section we'll need to deploy LiteLLM into our EKS cluster.

# Deploy LiteLLM

Now that we understand what our LLM Gateway needs to look like, we can start calling it from our code. Before we do that, we need to deploy the gateway and the AWS Load Balancer Controller into Kubernetes. Open the terminal window if it's not already open and run the following commands in it.

First, test that you can reach Kubernetes. You should see a couple of nodes listed. If not, please reach out to your facilitator.
```bash
kubectl get nodes
```

**Note:** If your Helm installs and Docker builds are failing in the Code Server environment, you may want to connect to the host directly through SSM and execute the commands as the ubuntu user.

Next, let's install the cluster essentials (like our LB controller). Make sure you're in /home/ubuntu/sample-agentic-platform in your terminal, then run the following command:

```bash
. ./bootstrap/eks-bootstrap.sh
```
Once it completes, run this command to check that the controller is deployed:
```bash
kubectl get pods -n kube-system
```

You should see a bunch of pods. Two of them should be prefixed with "lb-controller-aws-load-balancer-controller". Great! We have our load balancer controller!

Next we need to deploy our LLM gateway. Run the following command in the same terminal window:
```bash
. ./deploy/deploy-litellm.sh --build
```

After it completes, you should see the LLM gateway pod when you run the command below:
```bash
kubectl get pods
```

### Check Your Load Balancer
The load balancer controller detects the ingress rule we set on our LLM gateway. When the gateway is deployed, Kubernetes automatically provisions the load balancer, but this can take a couple of minutes.

During the deployment of the platform, we configured a load balancer that points to our LLM gateway. The free OSS version of LiteLLM only supports API key auth, so we'll need the load balancer's DNS name and the API key from Secrets Manager.

The first step is to grab the load balancer's DNS name. You can find it in the console or use boto3 below.

In [None]:
import boto3
from typing import List, Dict

# Initialize the client
elbv2 = boto3.client('elbv2')

# List all load balancers
load_balancers: List[Dict] = elbv2.describe_load_balancers()['LoadBalancers']

# Get the load balancer's DNS name. The load balancer's name should contain k8s-platform
dns_name: str = [lb['DNSName'] for lb in load_balancers if 'k8s-platform' in lb['LoadBalancerName']][0]
dns_name
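The list comprehension above raises a bare `IndexError` if the load balancer has not been created yet. The sketch below shows one way to get a clearer message and to check that provisioning has finished. `find_platform_dns` is a hypothetical helper; it assumes the entries have the shape returned by `describe_load_balancers` (`LoadBalancerName`, `DNSName`, and `State.Code`).

```python
from typing import Any, Dict, List


def find_platform_dns(load_balancers: List[Dict[str, Any]], marker: str = "k8s-platform") -> str:
    """Return the DNS name of the first active load balancer whose name contains `marker`."""
    matches = [lb for lb in load_balancers if marker in lb["LoadBalancerName"]]
    if not matches:
        raise LookupError(
            f"No load balancer with {marker!r} in its name was found. Is the gateway deployed?"
        )

    lb = matches[0]
    state = lb.get("State", {}).get("Code")
    if state != "active":
        # Provisioning can take a couple of minutes after the ingress is created.
        raise LookupError(
            f"Load balancer {lb['LoadBalancerName']!r} is in state {state!r}. Wait and retry."
        )
    return lb["DNSName"]
```

With the variables from the cell above, the call would be `dns_name = find_platform_dns(load_balancers)`.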

# Get our Secret for Auth
Now we need to get the secret containing our machine-to-machine client auth token. In the deployment script we set up two client applications in Cognito. The first one is for users and the second one is for machine-to-machine OAuth.

In [None]:
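# Get our Secret for Auth
import json
import boto3

# Create the SSM client used to read the platform config parameter
ssm_client = boto3.client('ssm')

# Get the parameter
response = ssm_client.get_parameter(
    Name='/agentic-platform/config/dev',
    WithDecryption=True
)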

# Parse the JSON
json_value = response['Parameter']['Value']
config = json.loads(json_value)

secret_arn: str = config['LITELLM_CONFIG_SECRET_ARN']


secret = boto3.client('secretsmanager').get_secret_value(SecretId=secret_arn)
secret_value: str = secret['SecretString']

# Parse the secret value
secret_value_dict: Dict = json.loads(secret_value)


## Call LiteLLM from the proxy
Now that we have our API key and the proxy is deployed, we can call it from the notebook.
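As a rough sketch of what such a call can look like, the helper below builds an OpenAI-style chat completion request aimed at the proxy, using only the standard library. Several things here are assumptions rather than facts about this deployment: that the proxy is reachable over plain HTTP at the load balancer's DNS name, that it serves the OpenAI-compatible `/chat/completions` route, that the API key is sent as a bearer token, and that the model name matches one configured on your proxy. `build_chat_request` is a hypothetical helper, and the key's field name inside the secret is not shown in this notebook, so the key is passed in as a plain argument.

```python
import json
import urllib.request
from typing import Dict, List


def build_chat_request(
    base_url: str,
    api_key: str,
    model: str,
    messages: List[Dict[str, str]],
    max_tokens: int = 100,
) -> urllib.request.Request:
    """Build (but do not send) a ChatCompletion-style POST request for the proxy."""
    body = json.dumps(
        {"model": model, "messages": messages, "max_tokens": max_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Example usage (commented out because it needs the deployed stack):
# request = build_chat_request(
#     base_url=f"http://{dns_name}",   # dns_name from the load balancer cell
#     api_key=api_key,                 # read from secret_value_dict; key name not shown here
#     model="<model name configured on your proxy>",
#     messages=[{"role": "user", "content": "Hello!"}],
# )
# with urllib.request.urlopen(request, timeout=30) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the proxy speaks the same ChatCompletion format as the SDK, the response can go through the same converters used earlier in the notebook.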