# SageMaker JumpStart Deployment & LangGraph Chat

This notebook demonstrates how to:
1. Deploy the `deepseek-llm-r1-distill-qwen-1-5b` model using SageMaker JumpStart.
2. Authenticate using a specific AWS Profile.
3. Integrate the deployed endpoint with LangGraph for stateful chat.

In [1]:
# Install dependencies
# Installing sagemaker with --no-deps to avoid conflicts on some environments (like Python 3.13 + numpy)
%pip install -r requirements.txt -q
%pip install 'sagemaker==2.251.1' --no-deps

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


## 1. Setup & Authentication
We configure the AWS session using the profile in `PROFILE_NAME`, falling back to the default credential chain if that profile cannot be loaded.

In [2]:
import sagemaker
import boto3
from sagemaker.jumpstart.model import JumpStartModel

# Configuration
PROFILE_NAME = 'default'
MODEL_ID = "deepseek-llm-r1-distill-qwen-1-5b"
MODEL_VERSION = "*"

# Establish Session
try:
    boto_session = boto3.Session(profile_name=PROFILE_NAME)
    sagemaker_session = sagemaker.Session(boto_session=boto_session)
    region = boto_session.region_name
    print(f"Authenticated with profile: {PROFILE_NAME} in region: {region}")
except Exception as e:
    print(f"Failed to use profile {PROFILE_NAME} ({e}). Falling back to default credentials.")
    boto_session = boto3.Session()
    sagemaker_session = sagemaker.Session(boto_session=boto_session)
    region = boto_session.region_name
    print(f"Authenticated with default credentials in region: {region}")

sagemaker.config INFO - Not applying SDK defaults from location: /Library/Application Support/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /Users/kartik/Library/Application Support/sagemaker/config.yaml
Authenticated with profile: STX-APPLICATION-PLATFORM-ADMIN in region: ap-south-1


## 2. Deploy JumpStart Model
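We define the JumpStart model and deploy it.
**Note**: You must accept the EULA if this is a gated model (`accept_eula=True`). When running outside SageMaker (for example, locally with SSO credentials), you may also need to pass `role=` to `JumpStartModel` with an IAM execution role that SageMaker can assume. Without it, the SDK resolves the role from your current credentials, which is why the deployment below fails with a `ValidationException`.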
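Deploying also requires an IAM execution role whose trust policy lets the SageMaker service assume it. The helper below is a hypothetical, simplified check (`trusts_sagemaker` is not part of any SDK) of a role's trust-policy document: it only looks for an `Allow` statement granting `sts:AssumeRole` to `sagemaker.amazonaws.com`, and deliberately ignores conditions and wildcard principals.

```python
from typing import Any, Dict


def trusts_sagemaker(trust_policy: Dict[str, Any]) -> bool:
    """Return True if a trust-policy document lets SageMaker assume the role.

    Simplified sketch: conditions, NotPrincipal and "*" principals are ignored.
    """
    statements = trust_policy.get("Statement", [])
    # A policy may contain a single statement object instead of a list.
    if isinstance(statements, dict):
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        # Skip wildcard ("*") principals; only explicit Service principals count here.
        if not isinstance(principal, dict):
            continue
        services = principal.get("Service", [])
        if isinstance(services, str):
            services = [services]
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" in actions and "sagemaker.amazonaws.com" in services:
            return True
    return False


# Example: a typical SageMaker execution-role trust policy.
good = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}
print(trusts_sagemaker(good))  # True
```

With boto3, the document could come from `boto_session.client("iam").get_role(RoleName=...)["Role"]["AssumeRolePolicyDocument"]`, which boto3 returns as a parsed dict; a role that passes could then be supplied as `JumpStartModel(..., role=<role ARN>)`.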

In [3]:
# Define the model
# We pass the sagemaker_session to ensure it uses the correct profile
model = JumpStartModel(
    model_id=MODEL_ID,
    model_version=MODEL_VERSION,
    sagemaker_session=sagemaker_session
)

# Deploy the endpoint
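# accept_eula=True is required for gated JumpStart models
try:
    predictor = model.deploy(accept_eula=True)
    ENDPOINT_NAME = predictor.endpoint_name
    print(f"Model deployed successfully! Endpoint Name: {ENDPOINT_NAME}")
except Exception as e:
    print(f"Deployment failed or endpoint already exists: {e}")
    # If an endpoint from an earlier run already exists, attach to it by name instead, e.g.:
    # from sagemaker.predictor import Predictor
    # from sagemaker.serializers import JSONSerializer
    # from sagemaker.deserializers import JSONDeserializer
    # predictor = Predictor(endpoint_name="<existing-endpoint-name>",
    #                       sagemaker_session=sagemaker_session,
    #                       serializer=JSONSerializer(), deserializer=JSONDeserializer())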

Using model 'deepseek-llm-r1-distill-qwen-1-5b' with wildcard version identifier '*'. You can pin to version '2.20.0' for more stable results. Note that models may have different input/output signatures after a major version upgrade.


Deployment failed or endpoint already exists: An error occurred (ValidationException) when calling the CreateModel operation: Could not access model data at s3://jumpstart-cache-prod-ap-south-1/deepseek-llm/deepseek-llm-r1-distill-qwen-1-5b/artifacts/inference-prepack/v2.0.0/. Please ensure that the role "arn:aws:iam::842675988009:role/aws-reserved/sso.amazonaws.com/ap-south-1/AWSReservedSSO_stx-devops-super-admin-kt4t_75caa9a1ad342c88" exists and that its trust relationship policy allows the action "sts:AssumeRole" for the service principal "sagemaker.amazonaws.com". Also ensure that the role has "s3:GetObject" permissions and that the object is located in ap-south-1. If your Model uses multiple models or uncompressed models, please ensure that the role has "s3:ListBucket" permission.
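The deploy cell leaves a placeholder for reusing an endpoint from an earlier run. One way to find it, sketched under the assumption that the endpoint name contains the model ID, is to filter the endpoint summaries returned by SageMaker's `ListEndpoints` API. `newest_in_service_endpoint` is a hypothetical helper, kept free of AWS calls so it can be tested offline; the sample data below is made up.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional


def newest_in_service_endpoint(
    endpoints: List[Dict[str, Any]], name_fragment: str
) -> Optional[str]:
    """Return the most recently created InService endpoint whose name contains name_fragment.

    `endpoints` is assumed to have the shape of the "Endpoints" list from
    ListEndpoints (keys EndpointName, EndpointStatus, CreationTime).
    """
    candidates = [
        ep for ep in endpoints
        if name_fragment in ep["EndpointName"] and ep["EndpointStatus"] == "InService"
    ]
    if not candidates:
        return None
    # Prefer the newest endpoint when several runs left one behind.
    newest = max(candidates, key=lambda ep: ep["CreationTime"])
    return newest["EndpointName"]


# Hand-written sample in the ListEndpoints shape (names are illustrative only).
sample = [
    {"EndpointName": "deepseek-llm-r1-distill-qwen-1-5b-old", "EndpointStatus": "InService",
     "CreationTime": datetime(2025, 1, 1)},
    {"EndpointName": "deepseek-llm-r1-distill-qwen-1-5b-new", "EndpointStatus": "InService",
     "CreationTime": datetime(2025, 2, 1)},
    {"EndpointName": "deepseek-llm-r1-distill-qwen-1-5b-failed", "EndpointStatus": "Failed",
     "CreationTime": datetime(2025, 3, 1)},
]
print(newest_in_service_endpoint(sample, "deepseek-llm-r1-distill-qwen-1-5b"))
# deepseek-llm-r1-distill-qwen-1-5b-new
```

In the notebook the list could come from `boto_session.client("sagemaker").list_endpoints(NameContains=MODEL_ID)["Endpoints"]` (first page only, since the API paginates), and the returned name could be passed to `sagemaker.predictor.Predictor`.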


## 3. Basic Invoke Test
Test the endpoint with a single raw-text prompt (`inputs`) plus generation `parameters`. The request is only sent if a `predictor` exists.

In [4]:
payload = {
    "inputs": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nWhat is Amazon SageMaker?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "parameters": {
        "max_new_tokens": 256,
        "temperature": 0.7,
        "top_p": 0.9
    }
}

if 'predictor' in globals():
    response = predictor.predict(payload)
    print(response)
else:
    print("No predictor available: deploy the model or attach to an existing endpoint first.")
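The payload above uses Llama-3 header tokens (`<|start_header_id|>`, `<|eot_id|>`). DeepSeek-R1-Distill-Qwen models publish a different chat template with their tokenizer, so a prompt built in that format may produce better-formed replies. The sketch below assumes the tokens from that published template (begin/end-of-sentence, User and Assistant markers); `build_deepseek_prompt` is a hypothetical helper, and the tokens should be verified against the `tokenizer_config.json` of the exact model version you deploy. Some template versions also append `<think>\n` after the final Assistant marker.

```python
from typing import Dict, List

# Assumed special tokens from the DeepSeek-R1-Distill-Qwen chat template.
# They use full-width bars (U+FF5C) and U+2581, hence the escapes.
BOS = "<\uff5cbegin\u2581of\u2581sentence\uff5c>"   # <｜begin▁of▁sentence｜>
EOS = "<\uff5cend\u2581of\u2581sentence\uff5c>"     # <｜end▁of▁sentence｜>
USER = "<\uff5cUser\uff5c>"                         # <｜User｜>
ASSISTANT = "<\uff5cAssistant\uff5c>"               # <｜Assistant｜>


def build_deepseek_prompt(messages: List[Dict[str, str]]) -> str:
    """Render role/content messages into a raw prompt ending with an open assistant turn."""
    # System text goes directly after BOS, without a role marker.
    system = "".join(m["content"] for m in messages if m["role"] == "system")
    prompt = BOS + system
    for msg in messages:
        if msg["role"] == "user":
            prompt += USER + msg["content"]
        elif msg["role"] == "assistant":
            # Completed assistant turns are closed with the end-of-sentence token.
            prompt += ASSISTANT + msg["content"] + EOS
    # Leave the assistant turn open so the model writes the reply.
    return prompt + ASSISTANT


prompt = build_deepseek_prompt([{"role": "user", "content": "What is Amazon SageMaker?"}])
print(prompt)
# <｜begin▁of▁sentence｜><｜User｜>What is Amazon SageMaker?<｜Assistant｜>
```

Such a prompt could replace the hard-coded string in `payload["inputs"]`, with `EOS` used as the stop sequence instead of `<|eot_id|>`.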

## 4. LangGraph Integration
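Next, we wrap the endpoint call in a `call_model` function and register it as the single node of a LangGraph `StateGraph`. The graph state holds the full message list, and each invocation returns that list with the assistant's reply appended.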

In [5]:
from typing import TypedDict, List, Dict
from langgraph.graph import StateGraph, START, END
import json

# Define Graph State
class State(TypedDict):
    messages: List[Dict[str, str]]

def call_model(state: State):
    messages = state["messages"]
    
    # Build a raw-text prompt for the endpoint's "inputs" field.
    # These are Llama-3-style header tokens. DeepSeek-R1-Distill-Qwen models publish a
    # different chat template (<｜User｜> / <｜Assistant｜>), so check the model card and
    # adjust the tokens and the stop sequence below if the output looks malformed.
    prompt = "<|begin_of_text|>"
    for msg in messages:
        role = msg["role"]
        content = msg["content"]
        prompt += f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"

    payload = {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": 1024,
            "temperature": 0.7,
            "top_p": 0.9,
            "stop": ["<|eot_id|>"]
        }
    }
    
    try:
        response = predictor.predict(payload)
        
        # Parse Logic
        if isinstance(response, bytes):
            response_data = json.loads(response.decode('utf-8'))
        else:
            response_data = response
            
        content = None
        
        # Handle [{'generated_text': '...'}]
        if isinstance(response_data, list) and len(response_data) > 0:
            item = response_data[0]
            full_text = item.get('generated_text')
            if full_text:
                # Strip prompt if echoed
                if full_text.startswith(prompt):
                    content = full_text[len(prompt):].strip()
                else:
                    content = full_text.strip()
        elif isinstance(response_data, dict) and 'generated_text' in response_data:
            content = response_data['generated_text']

        if not content:
            content = "Error: No content generated."
            
        return {"messages": messages + [{"role": "assistant", "content": content}]}
        
    except Exception as e:
        return {"messages": messages + [{"role": "assistant", "content": f"Error: {str(e)}"}]}

# Build Graph
workflow = StateGraph(State)
workflow.add_node("agent", call_model)
workflow.add_edge(START, "agent")
workflow.add_edge("agent", END)
app = workflow.compile()
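The parsing logic inside `call_model` could be factored into a standalone helper so it can be tested without a live endpoint. The sketch below (`extract_reply` is a hypothetical name) handles the same three response shapes and additionally drops the model's reasoning, assuming the distilled R1 model emits it before a closing `</think>` tag; splitting on the closing tag also covers outputs where the opening `<think>` was part of the prompt. Pass `strip_reasoning=False` to keep the reasoning.

```python
import json
from typing import Any, Optional


def extract_reply(response: Any, prompt: str = "", strip_reasoning: bool = True) -> Optional[str]:
    """Pull generated text out of a text-generation response, or return None."""
    # Raw bytes from the endpoint: decode the JSON body first.
    if isinstance(response, bytes):
        response = json.loads(response.decode("utf-8"))
    # List form: [{"generated_text": ...}]
    if isinstance(response, list) and response:
        response = response[0]
    if not isinstance(response, dict):
        return None
    text = response.get("generated_text")
    if not text:
        return None
    # Some containers echo the prompt before the completion.
    if prompt and text.startswith(prompt):
        text = text[len(prompt):]
    # Keep only what follows the last </think> tag (assumed reasoning delimiter).
    if strip_reasoning and "</think>" in text:
        text = text.rsplit("</think>", 1)[1]
    text = text.strip()
    return text or None


raw = json.dumps(
    [{"generated_text": "<think>User asks about SageMaker.</think>\n\nSageMaker is an ML service."}]
).encode("utf-8")
print(extract_reply(raw))  # SageMaker is an ML service.
```

Returning `None` rather than an error string leaves the decision about what to show the user to `call_model`.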

In [6]:
# Interactive Chat
def chat():
    print("Starting Chat (type 'quit' to exit)...")
    history = []
    while True:
        user_input = input("User: ")
        if user_input.lower() in ['quit', 'exit']:
            break
        
        history.append({"role": "user", "content": user_input})
        output = app.invoke({"messages": history})
        history = output["messages"]
        print(f"Assistant: {history[-1]['content']}")

# chat() # Uncomment to run
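`chat()` resends the whole history on every turn, so a long conversation can eventually exceed the model's context window. One option is to trim the history before each `app.invoke` call. The hypothetical `trim_history` below keeps any system messages (moved to the front) plus the last `max_turns` user turns; the default of 4 is arbitrary.

```python
from typing import Dict, List


def trim_history(messages: List[Dict[str, str]], max_turns: int = 4) -> List[Dict[str, str]]:
    """Keep system messages plus the dialogue starting at the max_turns-th most recent user message."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    # Indices of user messages; each one starts a turn.
    user_positions = [i for i, m in enumerate(dialogue) if m["role"] == "user"]
    if len(user_positions) > max_turns:
        dialogue = dialogue[user_positions[-max_turns]:]
    return system + dialogue


sample_history = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Q1"},
    {"role": "assistant", "content": "A1"},
    {"role": "user", "content": "Q2"},
    {"role": "assistant", "content": "A2"},
    {"role": "user", "content": "Q3"},
]
print([m["content"] for m in trim_history(sample_history, max_turns=2)])
# ['Be brief.', 'Q2', 'A2', 'Q3']
```

Inside `chat()`, this would mean calling `app.invoke({"messages": trim_history(history)})`; because `call_model` returns the list it received plus the reply, `history` then stays trimmed from that point on.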

In [7]:
# Clean up (uncomment when done; a running endpoint keeps incurring charges)
# predictor.delete_model()
# predictor.delete_endpoint()