# Benchmark a Model Bundle

Welcome to this tutorial on benchmarking a model bundle in SambaNova dedicated offerings!

**What is a model bundle?**  
A model bundle (a.k.a. a `composite model`) is a group of models that share a hardware node and are accessed through a common endpoint. Sharing a single node lets the models use its memory more efficiently and makes switching between them quick and easy.

Before you get started, please follow the setup instructions in the [README](./README.md).

## 1. Imports

In [None]:


import json
import os
import sys

from dotenv import load_dotenv
import pprint

from IPython.display import display, HTML
display(HTML("<style>:root { --jp-notebook-max-width: 100% !important; }</style>"))

load_dotenv();

In [None]:
# Import the SambaStudio SDK
from snsdk import SnSdk

## 2. Set up environment connector

Connect to the remote dedicated environment using the variables defined in `.env`.

In [None]:
sn_env = SnSdk(
    host_url=os.getenv("SAMBASTUDIO_HOST_NAME"), 
    access_key=os.getenv("SAMBASTUDIO_ACCESS_KEY"), 
    tenant_id=os.getenv("SAMBASTUDIO_TENANT_NAME"),
)
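Because `os.getenv` returns `None` for an unset variable, a quick check before building the connector can surface a missing `.env` entry early. The following is a minimal sketch: the helper name `missing_env_vars` is hypothetical, and the variable names are the three used in the cell above.

```python
import os

# The three variables read by the connector cell above
REQUIRED_VARS = (
    "SAMBASTUDIO_HOST_NAME",
    "SAMBASTUDIO_ACCESS_KEY",
    "SAMBASTUDIO_TENANT_NAME",
)


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing from .env:", ", ".join(missing))
```

Running this right after `load_dotenv()` gives a clearer message than a connection failure later on.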

## 3. Select models to bundle

The models on SambaStudio fall into the following groups:
- available for use
- still being uploaded
- held in remote storage, from which they can be made available
- not in a usable state

In [None]:
# Get the complete list of models.
models = sn_env.list_models()["models"]
print('All models: ', len(models))

# Filter down to the models that are actually available on the environment
available_models = [m for m in models if m['status'] == 'Available']

# Print names of the available models for Llama and DeepSeek
print('Available models: ', len(available_models))
for model in sorted([m["model_checkpoint_name"] for m in available_models]):
    if 'llama' in model.lower() or 'deepseek' in model.lower():
        print(model)
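To see how the full model list splits across the groups described above, the records can be counted by their `status` field. This is a sketch under assumptions: the helper name `count_by_status` and the sample records are hypothetical, and `'Available'` is the only status value confirmed by the cell above.

```python
from collections import Counter


def count_by_status(models):
    """Count model records by their 'status' field."""
    return Counter(m.get("status", "unknown") for m in models)


# Hypothetical records; only 'Available' is a status value taken from the cell above
sample_models = [
    {"model_checkpoint_name": "model-a", "status": "Available"},
    {"model_checkpoint_name": "model-b", "status": "Available"},
    {"model_checkpoint_name": "model-c", "status": "SomeOtherStatus"},
]

for status, count in count_by_status(sample_models).most_common():
    print(f"{status}: {count}")
```

In the notebook, `count_by_status(models)` would show the actual status values used by your environment.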

#### Select models to include in the bundle

In [None]:
# Name of the model bundle
model_name = 'llama_deepseek_coe'

# List of models to include in the bundle
selected_models = [
    'Meta-Llama-3.1-8B-Instruct',
    'Meta-Llama-3.3-70B-Instruct',
    'DeepSeek-R1-Distill-Llama-70B'
]

## 4. Create a bundle

Set `rdu_required=16` for maximum performance. [RDUs](https://sambanova.ai/technology/sn40l-rdu-ai-chip) are SambaNova's cutting-edge replacements for GPUs.

Further checks on bundle compatibility can be achieved using the `sn_env.validate_model_bundle(dependencies, rdu_required)` function. If that or the following command fails, please try changing the constituents of `selected_models`. 

In [None]:
composite_model = sn_env.add_composite_model(
    name=model_name,
    description="Bundle containing Meta-Llama-3.1-8B-Instruct, Meta-Llama-3.3-70B-Instruct, and DeepSeek-R1-Distill-Llama-70B.",
    dependencies=[{'name': model} for model in selected_models],
    rdu_required=16,
    config_params={},
    app=""
)

print(composite_model['status'])
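The `dependencies` argument used above is a list of `{'name': ...}` dictionaries. When the model list is edited by hand, a small helper can guard against blank or repeated entries before the list is passed on. This is only a sketch; the helper name `build_dependencies` is hypothetical and is not part of the SDK.

```python
def build_dependencies(model_names):
    """Build the [{'name': ...}] list, dropping blanks and duplicates while keeping order."""
    seen = set()
    dependencies = []
    for name in model_names:
        cleaned = name.strip()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            dependencies.append({"name": cleaned})
    return dependencies


deps = build_dependencies([
    "Meta-Llama-3.1-8B-Instruct",
    "Meta-Llama-3.3-70B-Instruct",
    "Meta-Llama-3.1-8B-Instruct",  # accidental repeat, dropped by the helper
])
print(deps)
```

The result has the same shape as the list comprehension in the cell above, so it can be used in its place.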

Once created, a bundle can be deleted using `sn_env.delete_model(bundle_name)`.

## 5. Create or select a project

Projects are a way to organize endpoints and training/inference jobs.

#### List available projects
You can list the existing projects in which an endpoint can be created for model deployment.

In [None]:
projects = sn_env.list_projects()["projects"]
sorted([project["name"] for project in projects])

#### Create a new project
If you do not wish to use an existing project, you may create a new one.

In [None]:
project_name = "coe_benchmarking_francesca"
new_project = sn_env.create_project(
                    project_name=project_name,
                    description="A test project for CoE benchmarking."
                )

#### Deleting a project

If required, a project can be deleted using the `sn_env.delete_project(project_name)` function. Please be sure to stop and delete all endpoints and jobs before deleting a project.

## 5. Create an endpoint

In [None]:
endpoint_name = model_name.lower().replace('_','-')
endpoint = sn_env.create_endpoint(
    project=project_name,
    endpoint_name=endpoint_name,
    description="Endpoint for " + model_name,
    model_checkpoint=model_name,
    model_version=1,
    instances=1,
    hyperparams='{"model_parallel_rdus": "16", "num_tokens_at_a_time": "10"}',
    rdu_arch="SN40L-16",
    inference_api_openai_compatible=True
)

endpoint = sn_env.endpoint_info(project_name, endpoint_name)
print(endpoint['status'])
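The `hyperparams` argument above is a JSON document passed as a string. Rather than typing the string by hand, it can be built from a dictionary with `json.dumps`, which avoids quoting mistakes. A minimal sketch using the same two keys and values as the cell above:

```python
import json

# Same keys and string values as in the create_endpoint call above
hyperparams = json.dumps({
    "model_parallel_rdus": "16",
    "num_tokens_at_a_time": "10",
})
print(hyperparams)
```

The resulting string can be passed as `hyperparams=hyperparams`.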

## 6. Get Endpoint Details
To test the endpoint, we will need to obtain some of its information. Note that this information can be obtained even while the model is setting up.

#### Get the endpoint URL

In [None]:
endpoint_url = os.getenv("SAMBASTUDIO_HOST_NAME") + "/v1/" + endpoint["id"]
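The string concatenation above produces a double slash if the host name in `.env` ends with `/`, and it raises a `TypeError` if the variable is unset. A slightly more defensive version could look like the sketch below; the helper name `build_endpoint_url` and the example host are hypothetical.

```python
def build_endpoint_url(host, endpoint_id):
    """Join host and endpoint id as '<host>/v1/<id>', tolerating a trailing slash on host."""
    if not host:
        raise ValueError("host is empty; is SAMBASTUDIO_HOST_NAME set?")
    return f"{host.rstrip('/')}/v1/{endpoint_id}"


# Placeholder host and id, for illustration only
print(build_endpoint_url("https://example.invalid/", "abc123"))
```

In the notebook this would be called as `build_endpoint_url(os.getenv("SAMBASTUDIO_HOST_NAME"), endpoint["id"])`.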

#### Get the default endpoint API key
Note that:
  - New keys can be added using the `sn_env.add_endpoint_api_key` API.    
  - All keys can be revoked using the `sn_env.edit_endpoint_api_key` API.

In [None]:
endpoint_key = endpoint["api_keys"][0]["api_key"]

#### Get model names in the endpoint

In [None]:
endpoint_model_id = endpoint['targets'][0]["model"]
model_info = sn_env.model_info(endpoint_model_id, job_type="deploy")

#### Check if the model is standalone or composite (bundle)

In [None]:
model_info["type"]

#### If the model is a composite/bundle, list its constituents

In [None]:
model_constituents = [m["name"] for m in model_info["dependencies"]]
sorted(model_constituents)

## 7. Test Endpoint
Once the endpoint is live, you can test it through its OpenAI-compatible API.

#### Make sure endpoint is live

In [None]:
endpoint = sn_env.endpoint_info(project_name, endpoint_name)
endpoint['status']
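Instead of re-running the status cell by hand, the check can be wrapped in a polling loop. The sketch below takes any zero-argument function that returns the current status, so it runs here against a stand-in. The helper name `wait_for_status` is hypothetical, and the status strings `"Setting Up"` and `"Live"` are assumptions; check the values your environment actually reports.

```python
import time


def wait_for_status(get_status, target, timeout_s=1800.0, interval_s=15.0, sleep=time.sleep):
    """Poll get_status() until it returns target; raise TimeoutError after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == target:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"last status was {status!r}, wanted {target!r}")
        sleep(interval_s)


# Stand-in for the real status call so the sketch runs without a live environment.
# The status strings are assumed, not taken from the SDK.
fake_statuses = iter(["Setting Up", "Setting Up", "Live"])
print(wait_for_status(lambda: next(fake_statuses), "Live", interval_s=0))
```

Against the real environment, the status function would be `lambda: sn_env.endpoint_info(project_name, endpoint_name)['status']`.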

#### Create test messages to send to the endpoint

In [None]:
test_messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant"
    },
    {
        "role": "user",
        "content": "How large is the Earth?"
    }
]

#### Send test messages to the endpoint
In this example, we test all the constituents of the model bundle. An endpoint may only have one model deployed, in which case this test can be done against that model alone.

**Note: If a model uses speculative decoding, its name will not match the name expected by the endpoint. Instead, we need to get and use the name of the target model.**

In [None]:
import openai

client = openai.OpenAI(
    api_key=endpoint_key,
    base_url=endpoint_url,
)

for constituent_name in model_constituents:
    # Use a separate variable so the bundle name stored in `model_name` is not overwritten
    request_model_name = constituent_name

    # With speculative decoding, the endpoint expects the name of the target model
    constituent_info = sn_env.model_info(constituent_name, job_type="deploy")
    target_name = constituent_info.get('config', {}).get('target_model')
    if target_name:
        request_model_name = target_name

    # Send messages to endpoint
    response = client.chat.completions.create(
        model=request_model_name,
        messages=test_messages,
        temperature=0.01,
        top_p=0.1,
    )
    print(f"-------- {request_model_name} --------")
    print(response.choices[0].message.content)
    print()
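The speculative-decoding check in the loop above can be factored into a small function, which makes the rule easy to test without calling the endpoint. This is a sketch: the helper name `resolve_request_model_name` and the sample records are hypothetical, and the `config` / `target_model` keys are the ones read in the cell above.

```python
def resolve_request_model_name(constituent_name, constituent_info):
    """Return the configured target model's name if present, else the constituent's own name."""
    target_name = (constituent_info.get("config") or {}).get("target_model")
    return target_name or constituent_name


# Hypothetical model_info records, shaped like the fields the loop above reads
plain = {"config": {}}
speculative = {"config": {"target_model": "target-model-name"}}

print(resolve_request_model_name("model-a", plain))
print(resolve_request_model_name("model-b", speculative))
```

An empty `target_model` string falls back to the constituent name, matching the length check in the loop.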

## 8. Stopping/deleting an Endpoint
An endpoint can be:
  - stopped: `sn_env.stop_endpoint(project_name, endpoint_name)`
  - deleted: `sn_env.delete_endpoint(project_name, endpoint_name)`

## 9. Benchmarking

Modify the following files:
- _PATH TO AISK REPO HERE/benchmarking/benchmarking_scripts/config.yaml_

    Set the desired input and output paths.

    ```yaml
    model_configs_path: '<PATH TO AISK REPO HERE>/benchmarking/benchmarking_scripts/model_configs_example.csv'
    llm_api: 'sncloud'
    output_files_dir: '<PATH TO AISK REPO HERE>/benchmarking/data/benchmarking_tracking_tests/logs/output_files'
    consolidated_results_dir: '<PATH TO AISK REPO HERE>/benchmarking/data/benchmarking_tracking_tests/consolidated_results'
    timeout: 3600
    time_delay: 0
    ```

- _PATH TO AISK REPO HERE/benchmarking/benchmarking_scripts/model_configs_example.csv_

    List the model configurations to test, one per row under this header.

    ```csv
    model_name,input_tokens,output_tokens,num_requests,concurrent_requests,qps,qps_distribution,multimodal_img_size
    ```
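The rows of this CSV can also be generated in code, for example one row per model in the bundle. The sketch below uses the column names from the header above. Every value in `PLACEHOLDER_ROW` is an illustrative placeholder rather than a value taken from the repository, so compare it against the example file before use. The helper name `build_model_configs_csv` is hypothetical.

```python
import csv
import io

# Column names copied from the header shown above
FIELDNAMES = [
    "model_name", "input_tokens", "output_tokens", "num_requests",
    "concurrent_requests", "qps", "qps_distribution", "multimodal_img_size",
]

# Placeholder values only; replace them with the settings you want to benchmark
PLACEHOLDER_ROW = {
    "input_tokens": 1000,
    "output_tokens": 1000,
    "num_requests": 10,
    "concurrent_requests": 1,
    "qps": 1.0,
    "qps_distribution": "constant",
    "multimodal_img_size": "na",
}


def build_model_configs_csv(model_names, **overrides):
    """Return CSV text with the header and one row per model name."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for name in model_names:
        writer.writerow({"model_name": name, **PLACEHOLDER_ROW, **overrides})
    return buffer.getvalue()


csv_text = build_model_configs_csv(
    ["Meta-Llama-3.1-8B-Instruct", "Meta-Llama-3.3-70B-Instruct"],
    concurrent_requests=4,
)
print(csv_text)
```

Writing `csv_text` to the file named in `model_configs_path` keeps the model list in the benchmark in step with the bundle.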

Run the following commands.

```bash
cd ai-starter-kit
python benchmarking/benchmarking_scripts/synthetic_performance_eval_script.py
```

Benchmarking results will be saved to the `output_files_dir` and `consolidated_results_dir` paths set in `config.yaml`.