<a href="https://colab.research.google.com/github/frank-morales2020/MLxDL/blob/main/DeepSeek_R1_Distill_Qwen_1_5B_AWS_APRIL2025.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

DeepSeek-R1 Distill Qwen 1.5B

1. Introduction

DeepSeek's first-generation reasoning models are DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning tasks. Through RL, DeepSeek-R1-Zero naturally developed numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

NOTE: Before running DeepSeek-R1 series models locally or creating an endpoint, we kindly recommend reviewing the Usage Recommendation section and adhering to JumpStart default settings.
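
As a rough illustration of what following those recommendations could look like in code, the sketch below builds a request body in the `{"inputs", "parameters"}` shape this notebook sends to the endpoint. The temperature of 0.6 and the math directive are assumptions based on the usage recommendations in the DeepSeek-R1 model card; `build_payload` is a hypothetical helper, not part of any SDK, and the values should be checked against the current model card and the JumpStart defaults.

```python
import json

# Assumed values from the DeepSeek-R1 model card's usage recommendations;
# verify them against the current model card before relying on them.
RECOMMENDED_TEMPERATURE = 0.6  # the card suggests staying within 0.5-0.7
MATH_DIRECTIVE = "Please reason step by step, and put your final answer within \\boxed{}."


def build_payload(question, max_new_tokens=2048, math=False):
    """Build a request body; all instructions go in the user prompt, with no system prompt."""
    prompt = f"{question}\n{MATH_DIRECTIVE}" if math else question
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": RECOMMENDED_TEMPERATURE,
        },
    }


payload = build_payload("What is 6 * 1.25?", math=True)
print(json.dumps(payload, indent=2))
```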

2. Model Summary
Post-Training: Large-Scale Reinforcement Learning on the Base Model

We directly apply reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area.

The pipeline used to develop DeepSeek-R1 incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities. We believe the pipeline will benefit the industry by creating better models.

Distillation: Smaller Models Can Be Powerful Too

We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance than the reasoning patterns discovered through RL on small models. The open-source DeepSeek-R1, as well as its API, will help the research community distill better smaller models in the future.
Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series to the community.

In [None]:
!pip install colab-env --quiet

!pip install sagemaker boto3 --quiet

%pip install langchain --quiet

In [None]:
import colab_env
import os
aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID")
aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY")
region=os.getenv("AWS_DEFAULT_REGION")
output=os.getenv("AWS_DEFAULT_OUTPUT")
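
If any of these variables is unset, `os.getenv` silently returns `None` and the failure only surfaces later, inside a boto3 call. A minimal sketch of an early check follows; `missing_env_vars` is a hypothetical helper, and the list of required names is an assumption drawn from the variables this notebook reads (including `ROLENAME`, used in the deployment cell).

```python
import os

# Names this notebook reads from the environment (assumed to be the full set).
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "ROLENAME",
)


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```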

In [None]:
import os

import boto3
import colab_env  # importing colab_env loads the stored environment variables
from sagemaker.jumpstart.model import JumpStartModel

# Look up the ARN of the IAM execution role named by the ROLENAME environment variable
iam_client = boto3.client("iam")
role = iam_client.get_role(RoleName=os.getenv("ROLENAME"))
ROLE_ARN = role["Role"]["Arn"]

# ml.g6.2xlarge for endpoint usage

# https://studio-d-yesr9g64bv2p.studio.us-east-1.sagemaker.aws/jumpstart/SageMakerPublicHub/Model/deepseek-llm-r1-distill-qwen-1-5b

llm_model_id, llm_model_version = "deepseek-llm-r1-distill-qwen-1-5b", "*"
llm_model = JumpStartModel(model_id=llm_model_id, model_version=llm_model_version, role=ROLE_ARN, region="us-east-1")

# deploy() creates a billable real-time endpoint; accept_eula=True accepts the model license
llm_predictor = llm_model.deploy(accept_eula=True)


In [None]:
#this is the model endpoint NAME, not the ARN
llm_model_endpoint_name = llm_predictor.endpoint_name
llm_model_endpoint_name

'deepseek-llm-r1-distill-qwen-1-5b-2025-04-07-20-23-38-676'
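
When reconnecting to an endpoint from a new session, where only its name is known, it can help to confirm the endpoint is ready before invoking it. The following is a minimal sketch that polls `describe_endpoint`; `wait_until_in_service` is a hypothetical helper and the polling interval and retry count are arbitrary assumptions. boto3 also provides a built-in `endpoint_in_service` waiter that serves the same purpose.

```python
import time


def wait_until_in_service(client, endpoint_name, poll_seconds=30, max_polls=40):
    """Poll describe_endpoint until the endpoint is InService; raise on failure or timeout."""
    for _ in range(max_polls):
        status = client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name} not InService after {max_polls} polls")


# Example (requires AWS credentials and an existing endpoint):
# import boto3
# sagemaker_client = boto3.client("sagemaker", region_name="us-east-1")
# wait_until_in_service(sagemaker_client, llm_model_endpoint_name)
```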

In [None]:
#### CASE#1
import json

#query = "who is the best French Poet?"
query = "Write a program to compute factorial in python:"

# Create a boto3 client for the SageMaker runtime
sm_client = boto3.client('sagemaker-runtime')

### WITH PARAMETERS
n = 5
MNT = 512 * n  # maximum number of new tokens to generate (2560)
model_kwargs = {"max_new_tokens": MNT, "temperature": 0.9}
input_data = {"inputs": query, "parameters": model_kwargs}

response = sm_client.invoke_endpoint(EndpointName=llm_model_endpoint_name, Body=json.dumps(input_data), ContentType="application/json")

# Decode the response from the model
response_body = json.loads(response['Body'].read().decode('utf-8'))
#print(response_body)

print('Query:', query)
print('\n')

# The endpoint may return a dictionary or a list of dictionaries, so check both shapes
if 'generated_text' in response_body:
    print('Response #1:', response_body['generated_text'])  # Access directly if it's a dictionary
    print('\n')
elif isinstance(response_body, list) and len(response_body) > 0 and 'generated_text' in response_body[0]:
    print('Response #2:', response_body[0]['generated_text'])  # Access using index if it's a list of dictionaries
    print('\n')
else:
    print("Unexpected response format:", response_body)

Query: Write a program to compute factorial in python:


Response #1:  using either / or cos...
No, the ... I'm not sure if that's the right approach.

But the factorial function could be implemented using either adding or multiplying recursively.

Alternatively, using the property that factorial(n) = (n-1) * factorial(n-1), which could make it more efficient.

Wait, but the question is to write a program to compute the factorial, either using addition or multiplication. Hmm, but the question says the program should compute factorial using either dividing or something... Hmm, no, maybe the instruction is different.

Alternatively, perhaps the idea is to compute the factorial using recursive addition or multiplication. So, for example, computing factorial(n) as adding n repeatedly n times, which is n * factorial(n-1). But since the function is already multiplicative, it's more efficient to compute it with multiplication. But perhaps the way is to compute recursively with addition.

Alte
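
The response handling above can be factored into small helpers so both test cases share it. The sketch below covers the two response shapes checked in the cell and additionally separates the reasoning trace from the final answer. Both function names are hypothetical, and the split assumes the model closes its reasoning with a `</think>` tag; the raw-prompt outputs shown in this notebook contain no such tag, in which case the whole text is returned as the answer.

```python
def extract_generated_text(response_body):
    """Return the generated text from a dict or list-of-dicts response, else None."""
    if isinstance(response_body, dict) and "generated_text" in response_body:
        return response_body["generated_text"]
    if (
        isinstance(response_body, list)
        and response_body
        and isinstance(response_body[0], dict)
        and "generated_text" in response_body[0]
    ):
        return response_body[0]["generated_text"]
    return None


def split_reasoning(text, close_tag="</think>"):
    """Split text into (reasoning, answer); reasoning is None when the tag is absent."""
    reasoning, sep, answer = text.partition(close_tag)
    if not sep:
        return None, text.strip()
    return reasoning.replace("<think>", "").strip(), answer.strip()


sample = {"generated_text": "<think>6 * 1.25 = 7.50</think> You get $2.50 back."}
reasoning, answer = split_reasoning(extract_generated_text(sample))
print("Reasoning:", reasoning)
print("Answer:", answer)
```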

In [None]:
#### CASE#2

#query = "who is the best French Poet?"
query = "I bought an ice cream for 6 kids. Each cone was $1.25 and I paid with a $10 bill. How many dollars did I get back? Explain first before answering."

# Create a boto3 client for the SageMaker runtime
sm_client = boto3.client('sagemaker-runtime')

# Prepare the input for the model
#input_data = {"inputs": query}

### WITH PARAMETERS
n = 5
MNT = 512 * n  # maximum number of new tokens to generate (2560)
model_kwargs = {"max_new_tokens": MNT, "temperature": 0.9}
input_data = {"inputs": query, "parameters": model_kwargs}

response = sm_client.invoke_endpoint(EndpointName=llm_model_endpoint_name, Body=json.dumps(input_data), ContentType="application/json")

# Decode the response from the model
response_body = json.loads(response['Body'].read().decode('utf-8'))

print('Query:', query)
print('\n')

# The endpoint may return a dictionary or a list of dictionaries, so check both shapes
if 'generated_text' in response_body:
    print('Response #1:', response_body['generated_text'])
elif isinstance(response_body, list) and len(response_body) > 0 and 'generated_text' in response_body[0]:
    print('Response #2:', response_body[0]['generated_text'])
else:
    print("Unexpected response format or empty response:", response_body)

Query: I bought an ice cream for 6 kids. Each cone was $1.25 and I paid with a $10 bill. How many dollars did I get back? Explain first before answering.


Response #1:  To start, let me think... If I have 6 kids and each cone is $1.25, then the total cost is 6 multiplied by $1.25. So, 6 times 1.25 equals... Hmm, that's 7.50. Okay, so the total is $7.50. I paid with a $10 bill. So, the change should be $10 minus $7.50. Let me compute that: $10 minus $7.50 is $2.50. So, I'd get back $2.50. Is that correct?

Wait a second, I've heard before that sometimes there are cents that can affect the exact change. Did I handle the cents right? Let's break it down. Each cone is $1.25, which is 1 dollar and 25 cents. So, for 6 cones, if I have to add that six times, let's see:

Start with 6 times 1 dollar, that's $6.00. Then, 6 times 25 cents is 150 cents. So, the total is $6.00 plus $1.50, which is $7.50. So, yes, that checks out. Then, I pay with $10, so I subtract $7.50 from $10.00.

But another 
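
The model's arithmetic in the visible part of this output can be checked independently. The short sketch below recomputes the total and the change with `Decimal`, which avoids binary floating-point rounding on currency amounts; the variable names are illustrative only.

```python
from decimal import Decimal

cones = 6
price_per_cone = Decimal("1.25")
paid = Decimal("10.00")

total = cones * price_per_cone  # 6 x 1.25
change = paid - total           # 10.00 minus the total

print(f"Total cost: ${total}")
print(f"Change: ${change}")
```

This agrees with the model's intermediate result of $7.50 for the cones and $2.50 in change.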

# CLEAN UP

In [None]:
# Frank Morales created this cell on December 14, 2023. It automatically deletes ALL endpoints,
# models, and endpoint configurations returned by the list calls in the configured region.

import colab_env
import os

import boto3

aws_access_key_id = os.getenv("AWS_ACCESS_KEY_ID")
aws_secret_access_key = os.getenv("AWS_SECRET_ACCESS_KEY")
aws_region = os.getenv("AWS_DEFAULT_REGION")
aws_output = os.getenv("AWS_DEFAULT_OUTPUT")

sagemaker_client = boto3.client('sagemaker', region_name=aws_region)


def cleanup_sagemaker_resources(resource_name, resourceid):
    """Delete every listed resource of one kind: 0 = endpoints, 1 = models, 2 = endpoint configs."""
    # Note: each list call returns only the first page of results
    if resourceid == 0:
        response = sagemaker_client.list_endpoints()
    elif resourceid == 1:
        response = sagemaker_client.list_models()
    elif resourceid == 2:
        response = sagemaker_client.list_endpoint_configs()
    else:
        raise ValueError(f"Unknown resourceid: {resourceid}")

    print(resource_name)

    # 'Endpoints' -> 'EndpointName', 'Models' -> 'ModelName', 'EndpointConfigs' -> 'EndpointConfigName'
    name_key = f"{resource_name[:-1]}Name"
    for item in response[resource_name]:
        name = item[name_key]
        print(name_key)
        print(name)

        if resourceid == 0:
            sagemaker_client.delete_endpoint(EndpointName=name)
        elif resourceid == 1:
            sagemaker_client.delete_model(ModelName=name)
        elif resourceid == 2:
            sagemaker_client.delete_endpoint_config(EndpointConfigName=name)

    print("\n==================================\n")


cleanup_sagemaker_resources('Endpoints', 0)
cleanup_sagemaker_resources('Models', 1)
cleanup_sagemaker_resources('EndpointConfigs', 2)
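
The cleanup cell removes every resource the list calls return, which may include resources unrelated to this notebook, and a single list call returns only one page of results. A more conservative variant could collect names across all pages and restrict them to a prefix before deleting. The sketch below shows only the listing step; `list_resource_names` is a hypothetical helper, the prefix value is an assumption based on the endpoint name shown earlier, and the client is passed in so the function can be exercised without AWS access.

```python
def list_resource_names(client, operation, list_key, name_key, prefix=""):
    """Collect resource names across all result pages, keeping those that start with prefix."""
    names = []
    for page in client.get_paginator(operation).paginate():
        for item in page[list_key]:
            if item[name_key].startswith(prefix):
                names.append(item[name_key])
    return names


# Example (requires AWS credentials; sagemaker_client as created in the cleanup cell):
# prefix = "deepseek-llm-r1-distill-qwen-1-5b"  # assumed naming prefix
# for name in list_resource_names(sagemaker_client, "list_endpoints", "Endpoints", "EndpointName", prefix):
#     sagemaker_client.delete_endpoint(EndpointName=name)
```

Reviewing the returned names before calling any delete operation keeps the cleanup scoped to what this notebook created.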