# Deploy a Model or Bundle to an Endpoint

Welcome to this tutorial on deploying a model or a model bundle to a SambaNova dedicated node!

Before you get started, follow the setup instructions in the [README](./README.md).

## 1. Imports

In [1]:
import sys
sys.version

'3.11.11 (main, Dec 11 2024, 10:28:39) [Clang 14.0.6 ]'

In [2]:
from IPython.display import display, HTML
display(HTML("<style>:root { --jp-notebook-max-width: 100% !important; }</style>"))
import json
import os
from dotenv import load_dotenv
import pprint
load_dotenv()

True

In [4]:
from snsdk import SnSdk

## 2. Set up environment connector

Connect to the remote dedicated environment using the variables defined in `.env`.

In [5]:
sn_env = SnSdk(
    host_url=os.getenv("SAMBASTUDIO_HOST_NAME"),
    access_key=os.getenv("SAMBASTUDIO_ACCESS_KEY"),
    tenant_id=os.getenv("SAMBASTUDIO_TENANT_NAME"),
)
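
If any of the three variables is missing from `.env`, `os.getenv` silently returns `None`, and the problem only surfaces later. A minimal sketch of a pre-flight check follows. The helper name `missing_env_vars` is hypothetical and not part of `snsdk`; the variable names are the ones used in the cell above.

```python
import os

# Variable names taken from the connector cell above.
REQUIRED_VARS = (
    "SAMBASTUDIO_HOST_NAME",
    "SAMBASTUDIO_ACCESS_KEY",
    "SAMBASTUDIO_TENANT_NAME",
)


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in `environ`.

    Hypothetical helper for illustration; defaults to the real environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Demo with an explicit mapping instead of the real environment.
example_env = {
    "SAMBASTUDIO_HOST_NAME": "https://example.invalid",
    "SAMBASTUDIO_ACCESS_KEY": "",  # present but empty counts as missing
}
print(missing_env_vars(environ=example_env))
# ['SAMBASTUDIO_ACCESS_KEY', 'SAMBASTUDIO_TENANT_NAME']
```

In the notebook, you could call `missing_env_vars()` right after `load_dotenv()` and stop if the returned list is not empty.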

## 3. Create or select a project

Projects organize endpoints and training/inference jobs.

#### List available projects
List the existing projects in which an endpoint can be created for model deployment.

In [6]:
projects = sn_env.list_projects()["projects"]
sorted([project["name"] for project in projects])

['benchmarking', 'test_project_4']

#### Create a new project
If you do not wish to use an existing project, you may create a new one.

In [7]:
project_name = "test_project"
new_project = sn_env.create_project(
    project_name=project_name,
    description="A test project with a test endpoint",
)
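
Before creating a project, it can help to check whether the name is already in use. The sketch below works on a list shaped like `sn_env.list_projects()["projects"]`, where each entry has a `name` key as in the listing cell above. The helper `project_exists` is hypothetical, and the sample data stands in for a real SDK response.

```python
def project_exists(projects, name):
    """Return True if any project dict in `projects` has the given name."""
    return any(project.get("name") == name for project in projects)


# Sample data mirroring the names listed earlier in this section;
# real entries likely carry more fields than "name".
sample_projects = [{"name": "benchmarking"}, {"name": "test_project_4"}]

print(project_exists(sample_projects, "test_project"))   # False
print(project_exists(sample_projects, "benchmarking"))   # True
```

With the real connector, one option is to call `sn_env.create_project(...)` only when `project_exists(projects, project_name)` is false.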

#### Deleting a project

If required, a project can be deleted using the `sn_env.delete_project(project_name)` function. Please be sure to stop and delete all endpoints and jobs before deleting a project.

## 4. Select model or bundle to deploy

#### List models

Get the complete list of models. This includes models that are:
  - available on the environment
  - still being uploaded
  - stored remotely and can be made available
  - not in a usable state

In [8]:
models = sn_env.list_models()["models"]
len(models)

140

Filter down to the models that are actually available on the environment.

In [9]:
available_models = [m for m in models if m['status'] == 'Available']
len(available_models)

29
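
To see how the full list splits across states, you can tally the `status` field. This is a sketch on sample data: only the `'Available'` value is confirmed by the filter above, and the other status string below is a placeholder, not a real SDK value.

```python
from collections import Counter


def count_by_status(models):
    """Tally model dicts by their "status" field."""
    return Counter(m.get("status", "<missing>") for m in models)


# Sample data; "SomeOtherStatus" is a placeholder, not a real status value.
sample_models = [
    {"model_checkpoint_name": "model-a", "status": "Available"},
    {"model_checkpoint_name": "model-b", "status": "Available"},
    {"model_checkpoint_name": "model-c", "status": "SomeOtherStatus"},
]

counts = count_by_status(sample_models)
print(counts["Available"])  # 2
```

Running `count_by_status(models)` on the real list would show which status strings your environment actually reports.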

Print the names of the available models, sorted alphabetically.

In [10]:
sorted([m["model_checkpoint_name"] for m in available_models])

['DeepSeek-R1',
 'DeepSeek-R1-Distill-Llama-70B',
 'DeepSeek-V3',
 'Meta-Llama-3-70B-Instruct',
 'Meta-Llama-3-8B-Instruct',
 'Meta-Llama-3.1-405B-Instruct',
 'Meta-Llama-3.1-70B-Instruct',
 'Meta-Llama-3.1-70B-SD-Llama-3.2-1B',
 'Meta-Llama-3.1-8B-Instruct',
 'Meta-Llama-3.2-1B-Instruct',
 'Meta-Llama-3.3-70B-Instruct',
 'Meta-Llama-3.3-70B-SD-Llama-3.2-1B-TP16',
 'Mistral-7B-Instruct-V0.2',
 'QwQ-32B-Preview',
 'QwQ-32B-Preview-SD-Qwen-2.5-QWQ-0.5B',
 'Qwen 2.5 72B TP16',
 'Qwen-2.5-72B-SD-Qwen-2.5-0.5B',
 'Qwen2-72B-Instruct',
 'Qwen2-7B-Instruct',
 'Qwen2.5-0.5B-Instruct',
 'Qwen2.5-0.5B-SFT-Instruct',
 'Qwen2.5-72B-Instruct',
 'Qwen2.5-7B-Instruct',
 'Salesforce--Llama-xLAM-2-70b-fc-r',
 'Salesforce--Llama-xLAM-2-8b-fc-r',
 'Samba-1 Turbo',
 'e5-mistral-7B-instruct',
 'meta-llama-3.1-70b',
 'qwen_llama_salesforce']

#### Select model to deploy

In [11]:
selected_model = "qwen_llama_salesforce"

## 5. Create endpoint

In [None]:
endpoint_name = selected_model.lower().replace("_", "-")
endpoint = sn_env.create_endpoint(
    project=project_name,
    endpoint_name=endpoint_name,
    description="Endpoint for " + selected_model,
    model_checkpoint=selected_model,
    model_version=1,
    instances=1,
    hyperparams='{"model_parallel_rdus": "16", "num_tokens_at_a_time": "10"}',
    rdu_arch="SN40L-16",
    inference_api_openai_compatible=True
)
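
The `hyperparams` argument above is a JSON string written by hand. One way to avoid quoting mistakes is to build it with `json.dumps`. The helper `build_hyperparams` is hypothetical. It keeps both values as strings because that is how the cell above passes them; whether the API also accepts numbers is not confirmed here.

```python
import json


def build_hyperparams(model_parallel_rdus, num_tokens_at_a_time):
    """Serialize endpoint hyperparameters to a JSON string.

    Values are converted to strings to match the hand-written
    JSON used in the create_endpoint cell above.
    """
    return json.dumps(
        {
            "model_parallel_rdus": str(model_parallel_rdus),
            "num_tokens_at_a_time": str(num_tokens_at_a_time),
        }
    )


hyperparams = build_hyperparams(16, 10)
print(hyperparams)
# {"model_parallel_rdus": "16", "num_tokens_at_a_time": "10"}
```

The result can then be passed as `hyperparams=hyperparams` in the `create_endpoint` call.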

#### Check the status of the endpoint

In [17]:
endpoint = sn_env.endpoint_info(project_name, endpoint_name)
endpoint['status']

'SettingUp'
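
Instead of re-running the status cell by hand, you can poll until the endpoint reports `'Live'`. The sketch below takes the status lookup as a callable, so it does not depend on any SDK behavior beyond what the cell above shows. The function name, timeout, and polling interval are illustrative assumptions, and the function does not handle any failure states the platform may report.

```python
import time


def wait_until_live(get_status, timeout_s=1800.0, poll_s=30.0, sleep=time.sleep):
    """Poll get_status() until it returns "Live".

    Raises TimeoutError if the status is still not "Live" once
    timeout_s seconds have elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "Live":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"endpoint still {status!r} after {timeout_s} s")
        sleep(poll_s)


# Self-contained demo with a fake status sequence and no real sleeping.
fake_statuses = iter(["SettingUp", "SettingUp", "Live"])
result = wait_until_live(
    lambda: next(fake_statuses), timeout_s=5, poll_s=0, sleep=lambda _: None
)
print(result)  # Live
```

With the real connector, the callable would be something like `lambda: sn_env.endpoint_info(project_name, endpoint_name)["status"]`.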

## 6. Get Endpoint Details
To test the endpoint, we will need to obtain some of its information. Note that this information can be obtained even while the model is setting up.

#### Get the endpoint URL

In [None]:
endpoint_url = os.getenv("SAMBASTUDIO_HOST_NAME") + "/v1/" + endpoint["id"]
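
The concatenation above raises a `TypeError` if `SAMBASTUDIO_HOST_NAME` is unset and produces a double slash if the host ends with `/`. A slightly more defensive sketch follows. The helper `build_endpoint_url` is hypothetical; the `/v1/<endpoint id>` layout is taken from the cell above.

```python
def build_endpoint_url(host, endpoint_id):
    """Join the host and endpoint id into the endpoint base URL.

    Follows the "<host>/v1/<endpoint id>" layout used in the cell above.
    """
    if not host:
        raise ValueError("host is empty; is SAMBASTUDIO_HOST_NAME set?")
    return host.rstrip("/") + "/v1/" + endpoint_id


# Demo with placeholder values (not a real host or endpoint id).
print(build_endpoint_url("https://example.invalid/", "abc123"))
# https://example.invalid/v1/abc123
```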

#### Get the default endpoint API key
Note that:
  - New keys can be added using the `sn_env.add_endpoint_api_key` API.    
  - All keys can be revoked using the `sn_env.edit_endpoint_api_key` API.

In [24]:
endpoint_key = endpoint["api_keys"][0]["api_key"]

#### Get model names in the endpoint

In [25]:
endpoint_model_id = endpoint['targets'][0]["model"]
model_info = sn_env.model_info(endpoint_model_id, job_type="deploy")

#### Check if the model is standalone or composite (bundle)

In [28]:
model_info["type"]

'Composite'

#### If the model is a composite/bundle, list its constituents

In [30]:
model_constituents = [m["name"] for m in model_info["dependencies"]]
sorted(model_constituents)

['Meta-Llama-3.3-70B-Instruct',
 'Qwen-2.5-72B-SD-Qwen-2.5-0.5B',
 'Salesforce--Llama-xLAM-2-70b-fc-r',
 'Salesforce--Llama-xLAM-2-8b-fc-r']
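
The two cells above can be combined so that later steps work for both bundles and standalone models. This sketch relies only on the `type` and `dependencies` keys shown above. The helper name is hypothetical, and the type string used for a non-composite model in the demo is a placeholder.

```python
def constituent_names(model_info, standalone_name):
    """Return the model names to test for an endpoint.

    For a "Composite" model, return the sorted names of its dependencies;
    otherwise fall back to the single model name supplied by the caller.
    """
    if model_info.get("type") == "Composite":
        return sorted(dep["name"] for dep in model_info.get("dependencies", []))
    return [standalone_name]


# Sample data shaped like the model_info fields used above.
bundle_info = {
    "type": "Composite",
    "dependencies": [
        {"name": "Salesforce--Llama-xLAM-2-8b-fc-r"},
        {"name": "Meta-Llama-3.3-70B-Instruct"},
    ],
}
single_info = {"type": "PlaceholderStandaloneType"}  # placeholder type value

print(constituent_names(bundle_info, "qwen_llama_salesforce"))
print(constituent_names(single_info, "Meta-Llama-3.1-8B-Instruct"))
```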

## 7. Test Endpoint
Once the endpoint is live, you can test it through its OpenAI-compatible API using the `openai` Python client.

#### Make sure endpoint is live

In [31]:
endpoint = sn_env.endpoint_info(project_name, endpoint_name)
endpoint['status']

'Live'

#### Create test messages to send to the endpoint

In [36]:
test_messages = [
    {
        "role": "system",
         "content": "You are a helpful assistant"
    },
    {
        "role": "user",
        "content": "How large is the Earth?"
    }
]

#### Send test messages to the endpoint
In this example, we test all the constituents of the model bundle. An endpoint may have only one model deployed, in which case the test can be run against that model alone.

**Note: If a model uses speculative decoding, its name will not match the name expected by the endpoint. Instead, we need to get and use the name of the target model.**

In [46]:
import os
import openai

client = openai.OpenAI(
    api_key=endpoint_key,
    base_url=endpoint_url,
)

for constituent_name in model_constituents:    
    model_name = constituent_name

    # Check for speculative decoding
    constituent_info = sn_env.model_info(constituent_name, job_type="deploy")
    if 'target_model' in constituent_info['config']:
        target_name = constituent_info['config']['target_model']        
        if len(target_name) > 0:
            model_name = target_name

    # Send messages to endpoint
    response = client.chat.completions.create(
        model=model_name,
        messages=test_messages,
        temperature=0.01,
        top_p=0.1,
    )
    print(f"-------- {model_name} --------")
    print(response.choices[0].message.content)
    print()

-------- Qwen2.5-72B-Instruct --------
The Earth is a nearly spherical object with an average radius of about 6,371 kilometers (3,959 miles). Its diameter, which is the distance from one side of the Earth to the other through its center, is approximately 12,742 kilometers (7,918 miles).

The Earth's circumference, which is the distance around the planet at the equator, is about 40,075 kilometers (24,901 miles). However, because the Earth is not a perfect sphere but rather an oblate spheroid (flattened at the poles and bulging at the equator), the distance from the center of the Earth to the surface varies slightly. The equatorial radius is about 6,378 kilometers (3,963 miles), while the polar radius is about 6,357 kilometers (3,950 miles).

These measurements give you a sense of the size of the Earth, which is the third planet from the Sun and the largest of the terrestrial planets in our solar system.

-------- Meta-Llama-3.3-70B-Instruct --------
The Earth is a massive planet, and it
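
The speculative-decoding check inside the loop above can be factored into a small pure function, which makes it easy to test without a live endpoint. This is a sketch: the helper name `resolve_model_name` is hypothetical, and it assumes only the `config` and `target_model` keys already used in the loop.

```python
def resolve_model_name(constituent_name, constituent_info):
    """Return the name the endpoint expects for a constituent model.

    If the model info carries a non-empty config["target_model"]
    (speculative decoding), use that; otherwise keep the original name.
    """
    config = constituent_info.get("config") or {}
    target = config.get("target_model")
    return target if target else constituent_name


# Sample dicts shaped like the model_info fields used in the loop above.
sd_info = {"config": {"target_model": "Qwen2.5-72B-Instruct"}}
plain_info = {"config": {}}

print(resolve_model_name("Qwen-2.5-72B-SD-Qwen-2.5-0.5B", sd_info))
# Qwen2.5-72B-Instruct
print(resolve_model_name("Meta-Llama-3.3-70B-Instruct", plain_info))
# Meta-Llama-3.3-70B-Instruct
```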

## 8. Stopping/deleting an Endpoint
An endpoint can be:
  - stopped: `sn_env.stop_endpoint(project_name, endpoint_name)`
  - deleted: `sn_env.delete_endpoint(project_name, endpoint_name)`