# Create a Speculative Decoding Pair

Welcome to this tutorial on creating a speculative decoding pair in SambaNova dedicated offerings!

**What is a speculative decoding pair?**  
A speculative decoding pair speeds up inference for a larger target model by having a smaller, faster draft model propose tokens that the target model then verifies. A good draft model proposes tokens that the target model frequently accepts. A poor draft model yields little or no speedup, because its proposed tokens are rarely accepted.
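The accept/reject idea can be illustrated with a toy, greedy variant of speculative decoding. This is only a conceptual sketch, not how SambaNova implements it: the "models" are plain Python functions over integer tokens, and `speculative_step`, `toy_target`, `good_draft`, and `poor_draft` are hypothetical names invented for the illustration.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def speculative_step(
    target: Model, draft: Model, prefix: List[int], k: int = 4
) -> Tuple[List[int], int]:
    """Run one greedy speculative step.

    Returns (new_tokens, accepted_count), where accepted_count is the number
    of draft proposals that the target agreed with.
    """
    # 1. The draft model proposes k tokens autoregressively.
    proposed: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. The target checks each proposal in order. A real system would do this
    #    in a single batched forward pass; the loop keeps the toy readable.
    out: List[int] = []
    ctx = list(prefix)
    accepted = 0
    for tok in proposed:
        expected = target(ctx)
        if tok != expected:
            # First mismatch: keep the target's token and stop.
            out.append(expected)
            return out, accepted
        out.append(tok)
        ctx.append(tok)
        accepted += 1

    # 3. Every proposal was accepted, so the target adds one more token.
    out.append(target(ctx))
    return out, accepted


def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def good_draft(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10  # always agrees with the target


def poor_draft(ctx: List[int]) -> int:
    return (ctx[-1] + 2) % 10  # never agrees with the target


print(speculative_step(toy_target, good_draft, [0]))  # ([1, 2, 3, 4, 5], 4)
print(speculative_step(toy_target, poor_draft, [0]))  # ([1], 0)
```

With the good draft, one step yields five tokens; with the poor draft, it yields a single token, which is no better than running the target alone.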

Before you get started, follow the setup instructions in the [README](./README.md).

## 1. Imports

In [1]:
import sys
sys.version

'3.11.11 (main, Dec 11 2024, 10:28:39) [Clang 14.0.6 ]'

In [2]:
from IPython.display import display, HTML
display(HTML("<style>:root { --jp-notebook-max-width: 100% !important; }</style>"))
import json
import os
from dotenv import load_dotenv
import pprint
load_dotenv()

True

In [3]:
from snsdk import SnSdk

## 2. Set up environment connector

Connect to the remote dedicated environment using the variables defined in `.env`.

In [4]:
sn_env = SnSdk(
    host_url=os.getenv("SAMBASTUDIO_HOST_NAME"),
    access_key=os.getenv("SAMBASTUDIO_ACCESS_KEY"),
    tenant_id=os.getenv("SAMBASTUDIO_TENANT_NAME"),
)
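Because `os.getenv` returns `None` for unset variables, a missing entry in `.env` would be passed to the connector as `None`. A small check like the following, run before the cell above, can surface that earlier. The helper name `missing_env_vars` is hypothetical and is not part of `snsdk`.

```python
import os

# The three variable names read by the connector cell in this notebook.
REQUIRED_VARS = (
    "SAMBASTUDIO_HOST_NAME",
    "SAMBASTUDIO_ACCESS_KEY",
    "SAMBASTUDIO_TENANT_NAME",
)


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```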

## 3. Select models to pair

#### List models

Get the complete list of models. This includes models that are:
  - available for use
  - still being uploaded
  - held in remote storage, from which they can be made available
  - not in a usable state

In [5]:
models = sn_env.list_models()["models"]
len(models)

140

Filter down to the models that are actually available in the environment.

In [6]:
available_models = [m for m in models if m['status'] == 'Available']
len(available_models)

53
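To see how the remaining models are distributed across the other states, the entries can be tallied by their `status` field. This is a minimal sketch: `count_by_status` is a hypothetical helper, and apart from `'Available'` the status strings in the sample data are made up for illustration, since the actual values depend on the environment.

```python
from collections import Counter


def count_by_status(models):
    """Tally model entries by their 'status' field."""
    return Counter(m.get("status", "<missing>") for m in models)


# Sample data; only 'Available' is a status string confirmed by this notebook.
sample_models = [
    {"model_checkpoint_name": "model-a", "status": "Available"},
    {"model_checkpoint_name": "model-b", "status": "Available"},
    {"model_checkpoint_name": "model-c", "status": "HypotheticalOtherStatus"},
]

for status, count in count_by_status(sample_models).most_common():
    print(f"{status}: {count}")
```

In the notebook, `count_by_status(models)` would be called on the list returned by `sn_env.list_models()["models"]`.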

Print the names of the available models whose names contain "Meta".

In [7]:
sorted([m["model_checkpoint_name"] for m in available_models if "Meta" in m["model_checkpoint_name"]])

['Meta-Llama-3-70B-Instruct',
 'Meta-Llama-3-8B-Instruct',
 'Meta-Llama-3.1-405B-Instruct',
 'Meta-Llama-3.1-405B-Instruct-FP8',
 'Meta-Llama-3.1-405B-SD-Llama-3.1-8B',
 'Meta-Llama-3.1-70B-Instruct',
 'Meta-Llama-3.1-70B-SD-Llama-3.1-8B',
 'Meta-Llama-3.1-70B-SD-Llama-3.2-1B',
 'Meta-Llama-3.1-8B-Instruct',
 'Meta-Llama-3.2-11B-Vision-Instruct',
 'Meta-Llama-3.2-1B-Instruct',
 'Meta-Llama-3.2-3B-Instruct',
 'Meta-Llama-3.2-3B-Instruct-TP16',
 'Meta-Llama-3.2-90B-Vision-Instruct',
 'Meta-Llama-3.3-70B-Instruct',
 'Meta-Llama-3.3-70B-SD-Llama-3.2-1B-TP16',
 'Meta-Llama-Guard-3-8B']

#### Select draft and target models

Note that models whose names end in "TP16" have been optimized to run efficiently on SambaNova nodes that contain 16 RDUs ([RDUs](https://sambanova.ai/technology/sn40l-rdu-ai-chip) are SambaNova's cutting-edge replacements for GPUs).

In [9]:
target_model = 'Meta-Llama-3.1-405B-Instruct'
draft_model = 'Meta-Llama-3.2-3B-Instruct-TP16'
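Before validating, it can help to confirm that both chosen names actually appear among the available models, since a typo in either name would otherwise only show up later. The helper `missing_models` below is a hypothetical convenience, and the sample names are copied from the listing above.

```python
def missing_models(available_names, *wanted):
    """Return the wanted model names that are not in available_names."""
    available = set(available_names)
    return [name for name in wanted if name not in available]


# A few names taken from the listing printed earlier in this notebook.
sample_available = [
    "Meta-Llama-3.1-405B-Instruct",
    "Meta-Llama-3.2-1B-Instruct",
    "Meta-Llama-3.2-3B-Instruct-TP16",
]

print(missing_models(
    sample_available,
    "Meta-Llama-3.1-405B-Instruct",
    "Meta-Llama-3.2-3B-Instruct-TP16",
))
```

In the notebook, the first argument would be `[m["model_checkpoint_name"] for m in available_models]`, followed by `target_model` and `draft_model`. An empty list means both models were found.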

## 4. Validate SD Pair

Set `rdu_required=16` so that this speculative decoding (SD) pair uses all 16 RDUs for maximum performance.

In [10]:
sd_pair_name = 'Meta-Llama-3.1-405B-SD-Llama-3.2-3B'
dependencies = [{'name': target_model}, {'name': draft_model}]
rdu_required = 16

If the following command fails or reports that the pair is not valid, try a different SD pair.

In [11]:
validation_status = sn_env.validate_spec_decoding(
    target=target_model,
    draft=draft_model,
    rdu_required=rdu_required,
    dependencies=dependencies
)
print(validation_status['valid'])

True


In [12]:
validation_status['validations']

[{'level': 'INFO',
  'message': 'Target and draft models are compatible',
  'reason': 'SpeculativeDecodingPairCompatible'}]
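When a pair is not valid, the individual entries under `validations` are the place to look for the reason. The sketch below formats them into readable lines. It assumes only the response shape shown in the output above (`valid`, plus `validations` entries with `level`, `message`, and `reason`); `summarize_validation` is a hypothetical helper, not part of `snsdk`.

```python
def summarize_validation(status):
    """Return (is_valid, lines) for a validation response shaped like the one above."""
    lines = [
        f"[{v.get('level', '?')}] {v.get('reason', '?')}: {v.get('message', '')}"
        for v in status.get("validations", [])
    ]
    return bool(status.get("valid")), lines


# Sample response copied from the output shown in this notebook.
sample_status = {
    "valid": True,
    "validations": [
        {
            "level": "INFO",
            "message": "Target and draft models are compatible",
            "reason": "SpeculativeDecodingPairCompatible",
        }
    ],
}

is_valid, lines = summarize_validation(sample_status)
print(is_valid)
for line in lines:
    print(line)
```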

## 5. Create SD Pair

In [13]:
sd_pair = sn_env.add_composite_model(
    name=sd_pair_name,
    description=f"SD pair with target model: {target_model} and draft model: {draft_model}.",
    dependencies=dependencies,
    rdu_required=rdu_required,
    config_params={'target_model': target_model, 'draft_model': draft_model},
    app="Spec Decoding"
)

In [14]:
sd_pair['status']

'Available'

Once created, an SD pair can be deleted using `sn_env.delete_model(sd_pair_name)`.
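For cleanup scripts, a guarded delete that first checks whether the name is listed may be convenient. This sketch uses only the two calls that appear in this notebook, `list_models()` and `delete_model(name)`. The helper `delete_if_present` is hypothetical, and it assumes that SD pairs are listed under `model_checkpoint_name` like the other models, as the earlier listing suggests.

```python
def delete_if_present(client, name):
    """Delete the named model if the client lists it; return True if deleted.

    `client` is expected to expose list_models() and delete_model(name),
    as `sn_env` does in this notebook.
    """
    listed = {m.get("model_checkpoint_name") for m in client.list_models()["models"]}
    if name not in listed:
        return False
    client.delete_model(name)
    return True
```

In the notebook, this would be called as `delete_if_present(sn_env, sd_pair_name)`.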

## Next steps

Now that you have created your SD pair, you may want to [deploy it to an endpoint](<./Deploy a Model or Bundle to an Endpoint.ipynb>) or [create a bundle with it](<./Create a Model Bundle.ipynb>).