# Endpoint Access
The goal of this notebook is to examine the ability of a kernel agent to generate kernels that are memory-bandwidth bound.


## Setup and Sanity check

In [1]:
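import os
import sys

# The notebook runs from a subdirectory; its parent is the repository root,
# which is added to sys.path so the local `endpoints` package can be imported
root_dir = os.path.dirname(os.path.abspath(''))
sys.path.append(root_dir)
from endpoints import MODEL_NAME_TO_ID, ask_frontier_llm, ask_nim_llm

env_path = os.path.join(root_dir, 'endpoints', '.env')
env_path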



'/home/gkoren/code/local/kgen_problems/endpoints/.env'

In [2]:
import os
import re
import math
import json

from openai import OpenAI, AzureOpenAI
from dotenv import load_dotenv
from IPython.display import display_markdown
# Load the environment variables; the .env file should contain the Azure OpenAI credentials read below
load_dotenv(env_path)
api_key=os.getenv("AZURE_OPENAI_API_KEY")
azure_endpt=os.getenv("AZURE_OPENAI_ENDPOINT")
api_version = os.getenv("OPENAI_API_VERSION")
perflab_api_key = os.getenv("PERFLAB_API_KEY")

print(api_version)
print(MODEL_NAME_TO_ID)

2024-12-01-preview
{'clds35': 'claude-3-5-sonnet-20241022', 'clds37': 'claude-3-7-sonnet-20250219', 'clds4': 'claude-sonnet-4-20250514', 'cldo4': 'claude-opus-4-20250514', 'gpt-4o': 'gpt-4o-20241120', 'gpt-4o-mini': 'gpt-4o-mini-20240718', 'gpt-4-turbo': 'gpt-4-turbo-20240409', 'o1-preview': 'o1-preview-20240912', 'o1-mini': 'o1-mini-20240912', 'o1': 'o1-20241217', 'o3mini': 'o3-mini-20250131', 'llama3.3': 'nvdev/meta/llama-3.3-70b-instruct', 'dsr1': 'nvdev/deepseek-ai/deepseek-r1'}
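A missing key usually only surfaces later as an authentication error, so a quick check right after `load_dotenv` can save some debugging. The sketch below is illustrative: `missing_env_vars` is a hypothetical helper, and the variable list simply mirrors the names read in the cell above.

```python
import os

# Names read in the cell above; adjust if your .env uses different keys
REQUIRED_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "OPENAI_API_VERSION",
    "PERFLAB_API_KEY",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```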


### Quick sanity check and usage example

#### Frontier model

In [3]:
# for frontier models 
model_id=MODEL_NAME_TO_ID['clds35']
client = AzureOpenAI(azure_endpoint=azure_endpt,
                     api_version=api_version,
                     api_key=api_key)

generation_chat_history = [
    {
        "role": "system",
        "content": "You are a Python programmer tasked with generating high-quality Python code. "
        "Your task is to generate the best content possible for the user's request. If the user provides critique, "
        "respond with a revised version of your previous attempt."
    }
]

generation_chat_history.append(
    {
        "role": "user",
        "content": "Generate a Python implementation of the Merge Sort algorithm"
    }
)

mergesort_code = client.chat.completions.create(
    messages=generation_chat_history,
    model=model_id
).choices[0].message.content

generation_chat_history.append(
    {
        "role": "assistant",
        "content": mergesort_code
    }
)
display_markdown(mergesort_code, raw=True)

Here's a clean and well-documented implementation of the Merge Sort algorithm in Python:

```python
def merge_sort(arr):
    """
    Sorts an array using the merge sort algorithm.
    
    Args:
        arr (list): The input array to be sorted
        
    Returns:
        list: The sorted array
    """
    # Base case: if array has 1 or 0 elements, it's already sorted
    if len(arr) <= 1:
        return arr
    
    # Divide the array into two halves
    mid = len(arr) // 2
    left = arr[:mid]
    right = arr[mid:]
    
    # Recursively sort both halves
    left = merge_sort(left)
    right = merge_sort(right)
    
    # Merge the sorted halves
    return merge(left, right)

def merge(left, right):
    """
    Merges two sorted arrays into a single sorted array.
    
    Args:
        left (list): First sorted array
        right (list): Second sorted array
        
    Returns:
        list: Merged sorted array
    """
    result = []
    i = j = 0
    
    # Compare elements from both arrays and merge them in sorted order
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    
    # Add remaining elements from left array, if any
    result.extend(left[i:])
    
    # Add remaining elements from right array, if any
    result.extend(right[j:])
    
    return result

# Example usage
if __name__ == "__main__":
    # Test the merge sort implementation
    test_array = [64, 34, 25, 12, 22, 11, 90]
    print("Original array:", test_array)
    
    sorted_array = merge_sort(test_array)
    print("Sorted array:", sorted_array)
```

This implementation includes:

1. A main `merge_sort` function that implements the divide-and-conquer strategy:
   - Divides the array into two halves
   - Recursively sorts each half
   - Merges the sorted halves

2. A helper `merge` function that combines two sorted arrays into a single sorted array:
   - Uses two pointers to track progress through each array
   - Compares elements and builds the result array in sorted order
   - Handles remaining elements from either array

3. Clear documentation with docstrings explaining the purpose and parameters of each function

4. Example usage that demonstrates how to use the sorting algorithm

The algorithm has the following characteristics:
- Time complexity: O(n log n)
- Space complexity: O(n)
- Stable sort: Yes (maintains relative order of equal elements)

To use this implementation, you can simply call `merge_sort(your_array)` with any list of comparable elements. The function will return a new sorted list, leaving the original array unchanged.

The example usage will output:
```python
Original array: [64, 34, 25, 12, 22, 11, 90]
Sorted array: [11, 12, 22, 25, 34, 64, 90]
```
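The system prompt asks the model to revise its previous attempt when the user provides critique, and the cell above already stores the assistant turn in `generation_chat_history`. The next step of that loop might look like the sketch below. `add_critique` and `revise` are hypothetical helpers, and `client` is assumed to be the `AzureOpenAI` client created above (any object exposing `chat.completions.create` would do).

```python
def add_critique(history, critique):
    """Return a new history with a user critique appended (the input is not mutated)."""
    return history + [{"role": "user", "content": critique}]


def revise(client, model_id, history, critique):
    """Ask the model for a revised answer and return the extended history."""
    messages = add_critique(history, critique)
    reply = client.chat.completions.create(
        messages=messages,
        model=model_id,
    ).choices[0].message.content
    # Keep the assistant turn so later critiques see the whole conversation
    return messages + [{"role": "assistant", "content": reply}]


# Example (requires the live client from the cell above):
# generation_chat_history = revise(client, model_id, generation_chat_history,
#                                  "Add type hints and handle non-list iterables.")
```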

In [None]:
model_id=MODEL_NAME_TO_ID['clds35']
user_prompt = "Generate a Python implementation of the Merge Sort algorithm"
system_prompt = "You are a Python programmer tasked with generating high quality Python code."
mergesort_code = ask_frontier_llm(system_prompt, user_prompt, model_id)
display_markdown(mergesort_code, raw=True)
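The implementation of the `endpoints` helpers is not shown in this notebook. Judging only from the explicit cell above, `ask_frontier_llm` plausibly wraps the same system/user chat-completions call. The sketch below is a hypothetical equivalent (`ask_llm` is not part of `endpoints`) that takes any OpenAI-compatible client, so, assuming an OpenAI-compatible NIM endpoint, the same shape could serve `ask_nim_llm` too. It is a guess at the structure, not the module's actual code.

```python
def ask_llm(client, system_prompt, user_prompt, model_id, **kwargs):
    """Hypothetical stand-in for ask_frontier_llm / ask_nim_llm.

    Builds a two-message chat (system + user) and returns the reply text.
    Extra keyword arguments (e.g. temperature) are forwarded to
    chat.completions.create.
    """
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
    response = client.chat.completions.create(
        model=model_id, messages=messages, **kwargs
    )
    return response.choices[0].message.content


# Example with the AzureOpenAI `client` from the earlier cell:
# mergesort_code = ask_llm(client, system_prompt, user_prompt, model_id)
```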

#### Accessing through LangChain

Make sure the required libraries are installed (inside a notebook cell, use `%pip install` instead):
```bash
pip install langchain langgraph langchain-openai
```


In [4]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    azure_endpoint=azure_endpt,
    api_version=api_version,
    api_key=api_key,
    model=MODEL_NAME_TO_ID['clds35'],
    temperature=0.0
)

# Test the setup
response = llm.invoke("Hello! Are you working?")
print(response.content)

Yes, I'm working and ready to help! What can I assist you with today?


In [6]:
model_id=MODEL_NAME_TO_ID['clds35']
user_prompt_template = """Generate a Python implementation of the {algo_name} algorithm"""
system_prompt_template = "You are a Python programmer tasked with generating high quality Python code."

template = ChatPromptTemplate.from_messages([
    ("system", system_prompt_template),
    ("human", user_prompt_template)
])

llm = AzureChatOpenAI(
    azure_endpoint=azure_endpt,
    api_version=api_version,
    api_key=api_key,
    model=model_id,
    temperature=0.0
)

chain = template | llm

response = chain.invoke({"algo_name": "Merge Sort"})

# Strip the markdown code-fence markers (the model's surrounding prose is kept, as the output shows)
clean_code = response.content.strip().replace("```python", "").replace("```", "").strip()

print(clean_code)


Here's a clean and well-documented implementation of the Merge Sort algorithm in Python:


def merge_sort(arr):
    """
    Sorts an array using the merge sort algorithm.
    
    Args:
        arr: List of comparable elements
        
    Returns:
        Sorted list in ascending order
    """
    # Base case: if array has 1 or fewer elements, it's already sorted
    if len(arr) <= 1:
        return arr
    
    # Divide the array into two halves
    mid = len(arr) // 2
    left = arr[:mid]
    right = arr[mid:]
    
    # Recursively sort both halves
    left = merge_sort(left)
    right = merge_sort(right)
    
    # Merge the sorted halves
    return merge(left, right)

def merge(left, right):
    """
    Merges two sorted arrays into a single sorted array.
    
    Args:
        left: First sorted array
        right: Second sorted array
        
    Returns:
        Merged sorted array
    """
    result = []
    left_idx = right_idx = 0
    
    # Compare elements from both arra
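
The cleanup above only strips the fence markers, so `clean_code` still starts with the model's introductory sentence. If the goal is a string that can be executed or saved as a `.py` file, one option is to keep only the body of the first fenced block. A minimal sketch, assuming the model wraps its code in triple-backtick fences; `extract_code_block` is a hypothetical helper, and the fence string is built programmatically only to avoid literal fences inside this markdown.

```python
import re

FENCE = "`" * 3  # a markdown code fence
# Opening fence with an optional python/py tag, then a lazily captured body
FENCE_RE = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_block(text):
    """Return the first fenced code block in `text`, or the stripped text if none."""
    match = FENCE_RE.search(text)
    return match.group(1).strip() if match else text.strip()


# Example with the LangChain response above:
# clean_code = extract_code_block(response.content)
```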

#### NIM models


In [None]:
model_id=MODEL_NAME_TO_ID['llama3.3']
user_prompt = "Generate a Python implementation of the Merge Sort algorithm"
system_prompt = "You are a Python programmer tasked with generating high quality Python code."
mergesort_code = ask_nim_llm(system_prompt, user_prompt, model_id)
display_markdown(mergesort_code, raw=True)

# Accessing a local Hugging Face (HF) model

In [None]:
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B",
    torch_dtype=torch.bfloat16,  # load the weights in bfloat16
    device_map="auto",           # let Accelerate place layers across the available devices
)
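With `device_map="auto"`, Accelerate spreads the weights over the available GPUs (and CPU, if needed). A back-of-the-envelope estimate of the weight footprint helps predict whether that will happen. The sketch below takes the "32B" in the model name at face value as the parameter count (an approximation) and ignores activations and the KV cache, which add memory on top; `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(n_params, bytes_per_param):
    """Approximate memory for the model weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30


# bfloat16 stores each parameter in 2 bytes
bf16_gib = weight_memory_gib(32e9, 2)
print(f"~{bf16_gib:.0f} GiB of weights in bf16")  # prints: ~60 GiB of weights in bf16
```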

In [None]:
generation_chat_history = [
    {
        "role": "system",
        "content": "You are a Python programmer tasked with generating high-quality Python code. "
        "Your task is to generate the best content possible for the user's request. If the user provides critique, "
        "respond with a revised version of your previous attempt."
    }
]
generation_chat_history.append(
    {
        "role": "user",
        "content": "Generate a Python implementation of the Merge Sort algorithm"
    }
)
generation_chat_history

In [None]:
# One way to build the prompt: plain role-prefixed text (this ignores the model's own chat template)
prompt = ""
for message in generation_chat_history:
    if message["role"] == "system":
        prompt += f"System: {message['content']}\n"
    elif message["role"] == "user":
        prompt += f"User: {message['content']}\n"
    elif message["role"] == "assistant":
        prompt += f"Assistant: {message['content']}\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
prompt_length = inputs["input_ids"].shape[1]
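The same role-prefixed loop is written out again in `get_response` in the Ollama section, so it may be worth factoring out. A minimal sketch: `messages_to_prompt` is a hypothetical name, and the optional trailing `Assistant:` cue is an addition (not in the original cells) that signals the model should answer next.

```python
ROLE_PREFIXES = {"system": "System", "user": "User", "assistant": "Assistant"}


def messages_to_prompt(messages, add_generation_cue=True):
    """Flatten chat messages into 'Role: content' lines.

    Unknown roles are skipped, matching the if/elif chain above.
    """
    prompt = "".join(
        f"{ROLE_PREFIXES[m['role']]}: {m['content']}\n"
        for m in messages
        if m["role"] in ROLE_PREFIXES
    )
    if add_generation_cue:
        prompt += "Assistant:"  # cue the model to produce the next assistant turn
    return prompt


# Example: prompt = messages_to_prompt(generation_chat_history)
```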

In [22]:
# A more direct way: let the tokenizer apply the model's own chat template
inputs2 = tokenizer.apply_chat_template(
    generation_chat_history,
    tokenize=True,
    add_generation_prompt=True,  # append the assistant-turn header so the model answers next
    return_tensors="pt"
).to(model.device)
prompt_length = inputs2.shape[1]  # Number of tokens in the prompt

In [25]:
with torch.no_grad():
    output = model.generate(
        inputs2,
        max_new_tokens=4096,
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id
    )


In [None]:
new_tokens = output[0][prompt_length:]

# Decode only the newly generated tokens (the prompt was sliced off above)
assistant_response = tokenizer.decode(new_tokens, skip_special_tokens=True)

print("Assistant response:")
print(assistant_response)
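Qwen3 models run in a "thinking" mode by default and emit their reasoning between `<think>` and `</think>` tags before the answer; those tags may survive `skip_special_tokens=True`. A minimal regex-based sketch for separating the two, assuming at most one think block per response. `split_thinking` is a hypothetical helper; Qwen's own model card instead splits the output ids on the `</think>` token.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text):
    """Return (thinking, answer) from a Qwen3-style response.

    If no think block is present, thinking is "" and the whole
    stripped text is treated as the answer.
    """
    match = THINK_RE.search(text)
    if not match:
        return "", text.strip()
    thinking = match.group(1).strip()
    # Everything outside the think block is the answer
    answer = (text[: match.start()] + text[match.end():]).strip()
    return thinking, answer


# Example: thinking, answer = split_thinking(assistant_response)
```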

## Testing the generated code

In [27]:
def merge_sort(arr):
    """
    Sorts a list using the Merge Sort algorithm.
    
    Args:
        arr (list): The list to be sorted.
        
    Returns:
        list: A new sorted list.
    """
    # Base case: if the array has one element or is empty, it's already sorted
    if len(arr) <= 1:
        return arr
    
    # Divide the array into two halves
    mid = len(arr) // 2
    left_half = merge_sort(arr[:mid])  # Recursively sort the left half
    right_half = merge_sort(arr[mid:])  # Recursively sort the right half
    
    # Combine the sorted halves
    return merge(left_half, right_half)


def merge(left, right):
    """
    Merges two sorted lists into a single sorted list.
    
    Args:
        left (list): The first sorted list.
        right (list): The second sorted list.
        
    Returns:
        list: A merged sorted list.
    """
    merged = []  # Result list
    i = j = 0    # Pointers for left and right lists
    
    # Merge the two lists by comparing elements
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # Ensure stability by using <=
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    
    # Add any remaining elements from left and right
    merged.extend(left[i:])
    merged.extend(right[j:])
    
    return merged

In [None]:
# unsorted = [34, 7, 23, 32, 5, 62]
unsorted = [64, 34, 25, 12, 22, 11, 90]
sorted_list = merge_sort(unsorted)
print(sorted_list)  # Output: [11, 12, 22, 25, 34, 64, 90]
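A single example exercises only one input. Comparing the generated `merge_sort` against Python's built-in `sorted` on many random lists, including empty and duplicate-heavy ones, is a cheap way to gain more confidence. The harness below is a sketch: `check_against_sorted` is a hypothetical name, and it takes the sort function as an argument so it can be pointed at the `merge_sort` defined above.

```python
import random


def check_against_sorted(sort_fn, trials=500, max_len=50, seed=0):
    """Compare sort_fn with the built-in sorted() on random integer lists.

    Returns the first failing input, or None if every trial matched.
    A small value range is used on purpose so duplicates are common.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(0, max_len)  # includes the empty list
        data = [rng.randint(-10, 10) for _ in range(n)]
        original = list(data)
        result = sort_fn(data)
        if result != sorted(original) or data != original:
            # Either the output is wrong or the input list was mutated
            return original
    return None


# failure = check_against_sorted(merge_sort)
# print("all trials passed" if failure is None else f"failed on {failure}")
```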

# Accessing an inference service
Using Ollama:
- Set up a Docker network:
```bash
docker network create llmnet
```
- Launch an Ollama container:
```bash
docker run -d --gpus all --name ollama --network llmnet -p 11434:11434 ollama/ollama
```

Note: make sure that this notebook's container is also launched with `--network llmnet`, so that the hostname `ollama` resolves from the notebook.

- Attach to the Ollama container:
```bash
docker exec -it ollama bash
```
- From within the container, pull the model:
```bash
ollama pull qwen3:8b
```
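Before calling the model from Python, it can help to confirm that the Ollama server is reachable from this container and that the pull succeeded. Ollama's `GET /api/tags` endpoint lists the locally available models. The sketch below uses only the standard library, assumes the `http://ollama:11434` address used in this notebook, and its helper names are hypothetical.

```python
import json
import urllib.request


def list_local_models(base_url="http://ollama:11434", timeout=5):
    """Return the names of models the Ollama server has pulled (GET /api/tags)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


def has_model(model_names, wanted):
    """True if `wanted` was pulled; a bare name such as 'qwen3' is treated as 'qwen3:latest'."""
    if ":" not in wanted:
        wanted += ":latest"
    return wanted in model_names


# models = list_local_models()
# print(models, has_model(models, "qwen3:8b"))
```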



In [1]:
import requests
import json
import openai 

In [2]:
def get_response(messages):
    prompt = ""
    for msg in messages:
        if msg["role"] == "system":
            prompt += f"System: {msg['content']}\n"
        elif msg["role"] == "user":
            prompt += f"User: {msg['content']}\n"
        elif msg["role"] == "assistant":
            prompt += f"Assistant: {msg['content']}\n"
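    # /api/generate streams newline-delimited JSON objects by default
    response = requests.post(
        "http://ollama:11434/api/generate",
        json={"model": "qwen3:8b", "prompt": prompt},
        stream=True,  # read the chunks as they arrive
    )
    response.raise_for_status()  # raise on HTTP errors instead of just printing the status code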
    full_text = ""
    for line in response.iter_lines():
        if line:
            data = json.loads(line.decode('utf-8'))
            # The generated text is usually in the 'response' field
            full_text += data.get("response", "")
    return full_text

def get_response_openai(messages, model="qwen3:8b", base_url="http://ollama:11434/v1"):
    # Create a client that points to the Ollama OpenAI-compatible endpoint
    client = openai.OpenAI(
        api_key="ollama",  # Any string, Ollama doesn't check it
        base_url=base_url
    )
    # Call the chat completion endpoint
    response = client.chat.completions.create(
        model=model,
        messages=messages
    )
    # Extract the assistant's reply
    return response.choices[0].message.content
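`get_response` flattens the conversation into a single prompt for `/api/generate`, so the model's chat template is bypassed, while `get_response_openai` goes through the OpenAI-compatible `/v1` endpoint. Ollama's native `/api/chat` endpoint accepts the `messages` list directly. A minimal standard-library sketch, assuming the same `http://ollama:11434` address and `qwen3:8b` model: with `"stream": false` the server replies with a single JSON object whose `message.content` field holds the answer. `build_chat_request` and `get_response_chat` are hypothetical names.

```python
import json
import urllib.request


def build_chat_request(messages, model="qwen3:8b", base_url="http://ollama:11434"):
    """Prepare a non-streaming POST to Ollama's /api/chat endpoint."""
    body = json.dumps({"model": model, "messages": messages, "stream": False})
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_response_chat(messages, model="qwen3:8b", base_url="http://ollama:11434", timeout=600):
    """Send the whole conversation and return the assistant's reply text."""
    request = build_chat_request(messages, model, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        data = json.load(resp)
    return data["message"]["content"]


# response = get_response_chat(generation_chat_history)
```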

In [None]:
generation_chat_history = [
    {
        "role": "system",
        "content": "You are a Python programmer tasked with generating high-quality Python code. "
        "Your task is to generate the best content possible for the user's request. If the user provides critique, "
        "respond with a revised version of your previous attempt."
    }
]
generation_chat_history.append(
    {
        "role": "user",
        "content": "Generate a Python implementation of the Merge Sort algorithm"
    }
)
generation_chat_history

In [None]:
# using native ollama api
response = get_response(generation_chat_history)
print(response)

In [None]:
# using openai api
response = get_response_openai(generation_chat_history)
print(response)

In [27]:
def merge_sort(arr):
    """
    Sorts an array using the Merge Sort algorithm.
    
    Parameters:
    arr (list): The list of elements to be sorted.
    
    Returns:
    list: A new list containing all elements from the original list, sorted in ascending order.
    """
    if len(arr) <= 1:
        return arr  # Base case: single-element list is already sorted
    
    # Split the array into left and right halves
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])  # Recursively sort the left half
    right = merge_sort(arr[mid:])  # Recursively sort the right half
    
    # Merge the sorted halves
    return merge(left, right)

def merge(left, right):
    """
    Merges two sorted lists into a single sorted list.
    
    Parameters:
    left (list): The first sorted list.
    right (list): The second sorted list.
    
    Returns:
    list: A new list containing all elements from both input lists, sorted in ascending order.
    """
    merged = []
    i = j = 0
    
    # Merge elements from both lists
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= takes from the left on ties, keeping the sort stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    
    # Add any remaining elements from the left or right list
    merged.extend(left[i:])
    merged.extend(right[j:])
    
    return merged

In [None]:
# unsorted = [34, 7, 23, 32, 5, 62]
unsorted = [64, 34, 25, 12, 22, 11, 90]
sorted_list = merge_sort(unsorted)
print(sorted_list)  # Output: [11, 12, 22, 25, 34, 64, 90]
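Whether `merge` compares with `<=` or `<` decides stability. With `<=`, ties are taken from the left half first, so equal elements keep their original order; with `<`, a tie takes the element from the right half, which can move it ahead of an equal element that came earlier. With plain integers the two variants are indistinguishable, so a stability test needs records that compare equal on a key but carry different labels. The self-contained sketch below runs both variants on the same input; `Record` and `merge_sort_by_key` are hypothetical names, and the `key` parameter is an addition the generated code does not have.

```python
from dataclasses import dataclass


@dataclass
class Record:
    key: int
    label: str


def merge_sort_by_key(items, key, strict=False):
    """Merge sort on key(item); strict=True mimics a '<' comparison in merge."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort_by_key(items[:mid], key, strict)
    right = merge_sort_by_key(items[mid:], key, strict)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        a, b = key(left[i]), key(right[j])
        take_left = a < b if strict else a <= b  # '<=' prefers the left item on ties
        if take_left:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def by_key(record):
    return record.key


records = [Record(1, "a"), Record(0, "b"), Record(1, "c"), Record(0, "d")]
print([r.label for r in merge_sort_by_key(records, by_key)])               # ['b', 'd', 'a', 'c']
print([r.label for r in merge_sort_by_key(records, by_key, strict=True)])  # ['d', 'b', 'c', 'a']
```

The first result matches Python's stable `sorted(records, key=by_key)`; the strict variant swaps `b`/`d` and `a`/`c` even though their keys are equal.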