# Import DeepSeek-R1-Distill-Llama Models to Amazon Bedrock

This notebook demonstrates how to import DeepSeek's distilled Llama models into Amazon Bedrock using the Custom Model Import (CMI) feature. We'll use the 8B parameter model as an example, <u>but the same process applies to the 70B variant</u>.

## Introduction

DeepSeek has released several distilled versions of their models based on the Llama architecture. These models maintain strong performance while being more efficient than their larger counterparts. The 8B model we'll use here is derived from Llama 3.1 and has been **optimized for reasoning tasks**.

## Prerequisites

- An AWS account with access to Amazon Bedrock
- Appropriate IAM roles and permissions for Bedrock and Amazon S3, following [the instructions here](https://docs.aws.amazon.com/bedrock/latest/userguide/model-import-iam-role.html)
- An S3 bucket prepared to store the custom model
- Sufficient local storage space (at least 17 GB for the 8B model and 135 GB for the 70B model)
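
Before downloading, it can help to confirm that the local disk has room for the weights. The sketch below uses Python's standard `shutil.disk_usage`; the helper name `has_enough_space` is hypothetical, and the 17 GB threshold is simply the figure from the list above.

```python
import shutil

def has_enough_space(path: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1000**3  # decimal gigabytes

# Threshold taken from the prerequisites: 17 GB for the 8B model (use 135 for 70B)
if not has_enough_space(".", 17):
    print("Warning: less than 17 GB free; the 8B download may not fit.")
```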


### Step 1: Install Required Packages

First, let's install the necessary Python packages:

In [1]:
!pip install transformers
!pip install boto3 --upgrade
!pip install -U "huggingface_hub[hf_transfer]"


### Step 2: Configure Parameters

Update these parameters according to your AWS environment:

In [10]:
# Define your parameters (update these values for your own setup)
bucket_name = "sagemaker-us-west-2-710299592439"
s3_prefix = "DeepSeek-R1-Distill-Llama-8B" # e.g. DeepSeek-R1-Distill-Llama-8B
local_directory = "DeepSeek-R1-Distill-Llama-8B" # e.g. DeepSeek-R1-Distill-Llama-8B

job_name = 'Deepseek-8B-job-2' # e.g. Deepseek-8B-job
imported_model_name = 'DeepSeek-R1-Distill-Llama-8B' # e.g. Deepseek-8B-model
role_arn = 'arn:aws:iam::710299592439:role/service-role/AmazonSageMaker-ExecutionRole-20250124T194097' # Make sure this role has the permissions listed in the Prerequisites section

# Region (currently only 'us-west-2' and 'us-east-1' support CMI with Deepseek-Distilled-Llama models)
region_info = 'us-west-2' # Change to 'us-east-1' if needed
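
A typo in these values usually only surfaces later, when the upload or the import job fails. As a rough sketch, the checks below catch a few common mistakes early. The function name `check_parameters`, the regular expression, and the region set are assumptions made for illustration; the region set just mirrors the comment in the cell above and may change over time.

```python
import re

# Regions named in the configuration comment; treat this set as an assumption
SUPPORTED_REGIONS = {"us-west-2", "us-east-1"}
# Loose shape check for an IAM role ARN (12-digit account ID, then role/<name>)
ROLE_ARN_PATTERN = re.compile(r"^arn:aws[\w-]*:iam::\d{12}:role/.+$")

def check_parameters(region, role_arn, bucket_name):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    if region not in SUPPORTED_REGIONS:
        problems.append(f"Region {region!r} is not in {sorted(SUPPORTED_REGIONS)}")
    if not ROLE_ARN_PATTERN.match(role_arn):
        problems.append("role_arn does not look like an IAM role ARN")
    if not bucket_name or "/" in bucket_name:
        problems.append("bucket_name should be a bare bucket name without a prefix")
    return problems

# Example with placeholder values (replace with the variables defined above)
print(check_parameters("us-west-2", "arn:aws:iam::123456789012:role/ExampleRole", "example-bucket"))
```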

### Step 3: Download Model from Hugging Face

Download the model files from Hugging Face.

- Note: to use the 70B model instead, change `hf_model_id` to `"deepseek-ai/DeepSeek-R1-Distill-Llama-70B"`.

<div class="alert alert-warning">
<b>Note:</b> Downloading the 8B model files may take 2-10 minutes depending on your internet connection speed.
</div>

In [3]:
import os
from huggingface_hub import snapshot_download

hf_model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"

# Enable hf_transfer for faster downloads
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

# Download using snapshot_download with hf_transfer enabled
snapshot_download(repo_id=hf_model_id, local_dir=f"./{local_directory}")

Fetching 11 files:   0%|          | 0/11 [00:00<?, ?it/s]

model-00001-of-000002.safetensors:   0%|          | 0.00/8.67G [00:00<?, ?B/s]

model-00002-of-000002.safetensors:   0%|          | 0.00/7.39G [00:00<?, ?B/s]

figures/benchmark.jpg:   0%|          | 0.00/777k [00:00<?, ?B/s]

LICENSE:   0%|          | 0.00/1.06k [00:00<?, ?B/s]

README.md:   0%|          | 0.00/19.0k [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/181 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/826 [00:00<?, ?B/s]

.gitattributes:   0%|          | 0.00/1.52k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.08M [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/24.2k [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/3.06k [00:00<?, ?B/s]

'/home/sagemaker-user/DeepSeek-R1-Distill-Llama-8B'
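
An interrupted download can leave the directory incomplete, so a quick check before uploading may save time. The sketch below looks for a few of the file names that appear in the download output above; the helper name `missing_model_files` and the choice of which files to require are assumptions, not a list mandated by Bedrock.

```python
from pathlib import Path

def missing_model_files(model_dir):
    """List expected files that are absent from a downloaded model directory."""
    root = Path(model_dir)
    # Names taken from the download output; which ones to require is an assumption
    expected = ["config.json", "tokenizer.json", "tokenizer_config.json"]
    missing = [name for name in expected if not (root / name).is_file()]
    # At least one weight shard should be present
    if not any(root.glob("*.safetensors")):
        missing.append("*.safetensors")
    return missing
```

Calling `missing_model_files(local_directory)` should return an empty list when the download finished cleanly.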

### Step 4: Upload Model to S3

Upload the downloaded model files to your S3 bucket.

<div class="alert alert-warning">
<b>Note:</b> Uploading the 8B model files normally takes 10-20 minutes.
</div>

In [4]:
import os
import time
import json
import boto3
from pathlib import Path
from tqdm import tqdm

def upload_directory_to_s3(local_directory, bucket_name, s3_prefix):
    s3_client = boto3.client('s3')
    local_directory = Path(local_directory)
    
    # Get list of all files first
    all_files = []
    for root, dirs, files in os.walk(local_directory):
        for filename in files:
            local_path = Path(root) / filename
            relative_path = local_path.relative_to(local_directory)
            s3_key = f"{s3_prefix}/{relative_path.as_posix()}"  # forward slashes on every OS
            all_files.append((local_path, s3_key))
    
    # Upload with progress bar
    for local_path, s3_key in tqdm(all_files, desc="Uploading files"):
        try:
            s3_client.upload_file(
                str(local_path),
                bucket_name,
                s3_key
            )
        except Exception as e:
            print(f"Error uploading {local_path}: {str(e)}")


# Upload all files
upload_directory_to_s3(local_directory, bucket_name, s3_prefix)


Uploading files: 100%|██████████| 34/34 [02:06<00:00,  3.72s/it]
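
Because the upload function prints errors and keeps going, a failed file would be silently missing from S3. One way to double-check, sketched here, is to list the objects under the prefix and compare them with the local files. The helper name `find_missing_uploads` is hypothetical; `get_paginator("list_objects_v2")` is the standard boto3 S3 paginator.

```python
import os
from pathlib import Path

def find_missing_uploads(s3_client, local_directory, bucket_name, s3_prefix):
    """Return expected S3 keys for `local_directory` that are absent under `s3_prefix`."""
    root = Path(local_directory)
    expected = {
        f"{s3_prefix}/{(Path(dirpath) / name).relative_to(root).as_posix()}"
        for dirpath, _, names in os.walk(root)
        for name in names
    }
    # list_objects_v2 returns at most 1,000 keys per call, so use the paginator
    paginator = s3_client.get_paginator("list_objects_v2")
    uploaded = set()
    for page in paginator.paginate(Bucket=bucket_name, Prefix=f"{s3_prefix}/"):
        uploaded.update(obj["Key"] for obj in page.get("Contents", []))
    return sorted(expected - uploaded)
```

With a real client this would be called as `find_missing_uploads(boto3.client('s3'), local_directory, bucket_name, s3_prefix)`; an empty list means every local file has a matching key.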


### Step 5: Create Custom Model Import Job

Initialize the import job in Amazon Bedrock.

<div class="alert alert-warning">
<b>Note:</b> The CMI job for the 8B model can take 5-20 minutes to complete.
</div>

In [11]:
# Initialize the Bedrock client
bedrock = boto3.client('bedrock', region_name=region_info)

s3_uri = f's3://{bucket_name}/{s3_prefix}/'

# Create the model import job
response = bedrock.create_model_import_job(
    jobName=job_name,
    importedModelName=imported_model_name,
    roleArn=role_arn,
    modelDataSource={
        's3DataSource': {
            's3Uri': s3_uri
        }
    }
)

job_Arn = response['jobArn']

# Output the job ARN
print(f"Model import job created with ARN: {response['jobArn']}")


Model import job created with ARN: arn:aws:bedrock:us-west-2:710299592439:model-import-job/8p0mnpfc4yyx


### Step 6: Monitor Import Job Status

Check the status of your import job:

In [12]:
# Check CMI job status
while True:
    response = bedrock.get_model_import_job(jobIdentifier=job_Arn)
    status = response['status'].upper()
    print(f"Status: {status}")
    
    if status in ['COMPLETED', 'FAILED']:
        break
        
    time.sleep(60)  # Check every 60 seconds

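# Stop if the job failed; otherwise keep the imported model ARN for inference
if status == 'FAILED':
    raise RuntimeError(f"Model import job failed: {response.get('failureMessage', 'no failure message returned')}")

# Get the model ID
model_id = response['importedModelArn']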

Status: INPROGRESS
Status: INPROGRESS
Status: INPROGRESS
Status: INPROGRESS
Status: INPROGRESS
Status: INPROGRESS
Status: INPROGRESS
Status: COMPLETED
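
The loop above runs until the job reaches a terminal state, however long that takes. If you would rather give up after a fixed time, a timeout can be added. The sketch below is one possible shape; the name `wait_for_job` and its default values are assumptions, and the status source is passed in as a callable so the function does not depend on a live Bedrock client.

```python
import time

def wait_for_job(get_status, poll_seconds=60, timeout_seconds=3600, sleep=time.sleep):
    """Poll `get_status()` until it returns COMPLETED or FAILED, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status().upper()
        if status in ("COMPLETED", "FAILED"):
            return status
        # Give up if the next poll would land after the deadline
        if time.monotonic() + poll_seconds > deadline:
            raise TimeoutError(f"Job still {status} after {timeout_seconds} seconds")
        sleep(poll_seconds)
```

In this notebook it could be called as `wait_for_job(lambda: bedrock.get_model_import_job(jobIdentifier=job_Arn)['status'])`.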


### Step 7: Wait for Model Initialization

Allow time for the model to initialize:

In [13]:
# Wait 5 minutes to allow for the model's cold start
time.sleep(300)
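
A fixed sleep may be longer or shorter than the model actually needs. An alternative, sketched below, is to retry the first request until the model responds. Amazon Bedrock documents a `ModelNotReadyException` for imported models that are still loading; how that error is detected (the `is_not_ready` predicate) and the helper name `call_until_ready` are assumptions of this sketch.

```python
import time

def call_until_ready(call, is_not_ready, max_wait_seconds=600, delay_seconds=30, sleep=time.sleep):
    """Retry `call()` while it raises an error that `is_not_ready(error)` recognizes."""
    waited = 0
    while True:
        try:
            return call()
        except Exception as error:
            # Re-raise unrelated errors, or give up once the wait budget is spent
            if not is_not_ready(error) or waited >= max_wait_seconds:
                raise
            sleep(delay_seconds)
            waited += delay_seconds
```

With botocore, one plausible predicate is to check `error.response["Error"]["Code"]` on a `ClientError` for the string `"ModelNotReadyException"`.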

### Step 8: Model Inference with Proper Tokenization

#### Understanding the Tokenization Process
When working with DeepSeek models, proper tokenization is crucial for optimal performance. The model expects inputs to follow a specific format defined in its `tokenizer_config.json`. This format ensures the model receives prompts in the same structure it was trained on.

#### Key Components
1. **Tokenizer**: Uses Hugging Face's `AutoTokenizer` to properly format inputs
2. **Generation Function**: Handles the core interaction with the model
3. **Auto-Generate Function**: Manages longer responses that might exceed token limits

#### 8.1 Setting Up the Tokenizer
First, we'll initialize the tokenizer and Bedrock runtime client:

In [14]:
from transformers import AutoTokenizer
import json
import boto3
from botocore.config import Config
from IPython.display import Markdown, display

# Initialize the tokenizer
tokenizer = AutoTokenizer.from_pretrained(hf_model_id)

# Initialize Bedrock Runtime client
session = boto3.Session()
client = session.client(
    service_name='bedrock-runtime',
    region_name=region_info,
    config=Config(
        connect_timeout=300,  # 5 minutes
        read_timeout=300,     # 5 minutes
        retries={'max_attempts': 3}
    )
)


#### 8.2 Core Generation Function

This function handles the basic model interaction with proper tokenization:

In [15]:
def generate(messages, temperature=0.3, max_tokens=4096, top_p=0.9, continuation=False, max_retries=10):
    """
    Generate response using the model with proper tokenization and retry mechanism
    
    Parameters:
        messages (list): List of message dictionaries with 'role' and 'content'
        temperature (float): Controls randomness in generation (0.0-1.0)
        max_tokens (int): Maximum number of tokens to generate
        top_p (float): Nucleus sampling parameter (0.0-1.0)
        continuation (bool): Whether this is a continuation of previous generation
        max_retries (int): Maximum number of retry attempts
    
    Returns:
        dict: Model response containing generated text and metadata
    """
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, 
                                         add_generation_prompt=not continuation)
    
    attempt = 0
    while attempt < max_retries:
        try:
            response = client.invoke_model(
                modelId=model_id,
                body=json.dumps({
                    'prompt': prompt,
                    'temperature': temperature,
                    'max_gen_len': max_tokens,
                    'top_p': top_p
                }),
                accept='application/json',
                contentType='application/json'
            )
            
            result = json.loads(response['body'].read().decode('utf-8'))
            return result
            
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {str(e)}")
            attempt += 1
            if attempt < max_retries:
                time.sleep(30)
    
    raise Exception("Failed to get response after maximum retries")

#### 8.3 Extended Generation Function

The model's thinking process can become quite long, especially for complex reasoning problems that require step-by-step analysis, and it often exceeds the output token limit (`max_tokens`) set for a single call. To address this limitation:

1. We first attempt to generate a complete response
2. If the response is truncated (indicated by `stop_reason == "length"`), we:
   - Concatenate the partial response to the original prompt
   - Make another API call with `continuation=True`
   - This sets `add_generation_prompt=False` in the tokenizer call
3. This process continues until we get a complete response

This approach ensures we capture the model's complete reasoning process while maintaining coherence throughout the response.
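
The continuation idea can be illustrated in isolation with a simplified sketch. Unlike the notebook's function, this version works on a plain prompt string rather than chat messages, and it adds a `max_rounds` cap so the loop cannot run forever; both choices, and the name `generate_until_complete`, are assumptions made for illustration.

```python
def generate_until_complete(generate_fn, prompt, max_rounds=5):
    """Call `generate_fn(prompt)` repeatedly, feeding truncated output back as context.

    `generate_fn` must return a dict with "generation" and "stop_reason" keys.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    collected = ""
    for _ in range(max_rounds):
        result = generate_fn(prompt + collected)
        collected += result["generation"]
        # Anything other than "length" means the model stopped on its own
        if result["stop_reason"] != "length":
            break
    return {**result, "generation": collected}
```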


In [16]:
def auto_generate(messages, **kwargs):
    """
    Handle longer responses that exceed token limit
    
    Parameters:
        messages (list): List of message dictionaries
        **kwargs: Additional parameters for generate function
    
    Returns:
        dict: Enhanced response including thinking process and final answer
    """
    res = generate(messages, **kwargs)
    # Keep requesting continuations while the output is cut off by the token limit
    while res["stop_reason"] == "length":
        for v in messages:
            if v.get("role") == "user":
                v["content"] += res["generation"]
        res = generate(messages, **kwargs, continuation=True)

    # Split the accumulated text into the reasoning trace and the final answer
    for v in messages:
        if v.get("role") == "user":
            gen = v["content"] + res["generation"]
            answer = gen.split("</think>")[-1]
            think = gen.split("</think>")[0].split("<think>")[-1]
            res = {**res, "generation": gen, "answer": answer, "think": think}
            return res
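
The splitting step can also be pulled out into a small standalone helper, which makes it easier to test. This is a sketch rather than the notebook's own code: the name `split_reasoning` is hypothetical, and it makes one different choice, returning an empty answer when the closing `</think>` tag is missing (that is, when the reasoning was cut off).

```python
def split_reasoning(text):
    """Split generated text into (thinking, answer) around the <think>...</think> tags."""
    head, sep, tail = text.partition("</think>")
    thinking = head.split("<think>")[-1].strip()
    if not sep:
        # No closing tag: the reasoning was likely truncated, so there is no answer yet
        return thinking, ""
    return thinking, tail.strip()

thinking, answer = split_reasoning("<think>7 * 1.2 = 8.4</think>\n\nThe cost is 8.4.")
print(thinking)
print(answer)
```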

### Usage Examples
#### Basic Usage

In [18]:
test_prompt = """Given the following financial data:
- Company A's revenue grew from $10M to $15M in 2023
- Operating costs increased by 20%
- Initial operating costs were $7M

Calculate the company's operating margin for 2023. Please reason step by step.
"""

messages = [{"role": "user", "content": test_prompt}]
response = generate(messages)
print("Model Response:")
print(response["generation"])

Model Response:
<think>
First, I need to understand the given information. The company's revenue increased from $10 million to $15 million in 2023. The initial operating costs were $7 million, and these costs increased by 20% in 2023.

Next, I'll calculate the new operating costs by applying the 20% increase to the initial cost. So, 20% of $7 million is $1.4 million, and adding that to the original $7 million gives a total operating cost of $8.4 million for 2023.

Then, to find the operating margin, I'll subtract the new operating costs from the 2023 revenue. That means $15 million minus $8.4 million equals a $6.6 million operating margin.

Finally, to express the operating margin as a percentage, I'll divide the operating margin by the revenue and multiply by 100. So, $6.6 million divided by $15 million equals a 44% operating margin.
</think>

To calculate the **operating margin** for Company A in 2023, follow these steps:

### Given:
- **Revenue in 2023:** \$15 million
- **Initial op
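
The arithmetic in the model's reasoning can be checked independently. The short calculation below repeats the same steps using only the figures from the prompt; the variable names are chosen here for illustration.

```python
revenue_2023 = 15.0   # $M, from the prompt
initial_costs = 7.0   # $M, from the prompt
cost_growth = 0.20    # 20% increase, from the prompt

costs_2023 = initial_costs * (1 + cost_growth)      # about 8.4
operating_income = revenue_2023 - costs_2023        # about 6.6
operating_margin = operating_income / revenue_2023  # about 0.44

print(f"Operating margin: {operating_margin:.0%}")
```

This agrees with the 44% the model arrived at.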

#### Advanced Usage with Complex Prompt

In [19]:
complex_prompt = """Solve the following optimization problem:

A manufacturing company produces two types of products: A and B. 
They need to determine the optimal production quantities to maximize profit.

Given constraints:
1. Manufacturing capacity: 60 hours per week
2. Product A takes 4 hours to produce
3. Product B takes 3 hours to produce
4. Storage space can hold maximum 20 units total
5. Profit per unit:
   - Product A: $200
   - Product B: $150
6. Minimum required production:
   - At least 3 units of Product A
   - At least 2 units of Product B

Please:
1. Set up the linear programming equations
2. Solve step by step
3. Verify all constraints are met
4. Calculate maximum profit
5. Analyze sensitivity to changes in constraints
6. Recommend optimal production plan

Show all your work and reasoning at each step."""

# System prompt to encourage detailed mathematical reasoning
system_prompt = """You are a mathematical optimization expert. 
Please provide detailed step-by-step solutions showing:
- All equations and their development
- Each calculation step
- Verification of constraints
- Clear reasoning for each decision
- Visual representations where helpful"""

# Run the analysis with auto_generate
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": complex_prompt}
]

response = auto_generate(messages, temperature=0.7, max_tokens=4096, top_p=0.9)

# Display the response
print("\n=== Thinking Process ===")
display(Markdown(response["think"]))
print("\n=== Solution ===")
display(Markdown(response["answer"]))


=== Thinking Process ===



Okay, so I need to help this manufacturing company figure out how many units of Product A and Product B to produce to maximize their profit. Hmm, I think I should start by understanding the problem.

First, the company has a manufacturing capacity of 60 hours per week. Product A takes 4 hours each, and Product B takes 3 hours each. So, if I let x be the number of Product A and y be the number of Product B, the total hours used would be 4x + 3y. This total can't exceed 60 hours, so the first constraint is 4x + 3y ≤ 60.

Next, there's a storage space constraint. The maximum they can store is 20 units total. So, x + y ≤ 20. That's the second constraint.

Profit is another important factor. Product A gives $200 profit each, and B gives $150. So, the profit function would be 200x + 150y, which we want to maximize.

Also, there are minimum production requirements: at least 3 units of A and at least 2 units of B. So, x ≥ 3 and y ≥ 2.

Alright, so to set up the linear programming model:

Maximize Z = 200x + 150y

Subject to:
4x + 3y ≤ 60
x + y ≤ 20
x ≥ 3
y ≥ 2
x, y ≥ 0 (though since x ≥3 and y ≥2, these are redundant)

Now, I need to solve this step by step. I think the next step is to graph the feasible region. But since I can't draw here, I'll find the corner points by solving the equations.

First, let's find where the lines intersect.

1. Intersection of 4x + 3y = 60 and x + y = 20.

Let me solve these two equations:

From the second equation: y = 20 - x

Substitute into the first equation: 4x + 3(20 - x) = 60
4x + 60 - 3x = 60
x + 60 = 60
x = 0

So, y = 20 - 0 = 20. But wait, x can't be 0 because of the minimum production requirement. So, this point (0,20) is not feasible.

2. Next, find where 4x + 3y = 60 intersects y = 2.

If y=2, then 4x + 6 = 60 → 4x=54 → x=13.5. So, point (13.5, 2).

But check if x + y ≤20: 13.5 +2=15.5 ≤20, so it's within the constraint.

3. Find where x + y =20 intersects y=2. Then x=18. So, point (18,2).

4. Now, where does 4x +3y=60 intersect x=3? Let's see, x=3, so 12 +3y=60 → 3y=48 → y=16. So, point (3,16). Check x+y=19, which is ≤20, so okay.

5. Also, check where 4x +3y=60 intersects y=2 and x=3. That's point (13.5,2) as above.

6. Now, the feasible region is bounded by x=3, y=2, 4x +3y=60, and x+y=20. The corner points are (3,2), (13.5,2), (18,2), and the intersection point of 4x +3y=60 and x + y=20, which was (0,20), but that's not feasible because x must be at least 3. Wait, so maybe I missed a point.

Wait, actually, the feasible region is a polygon with vertices at (3,2), (13.5,2), (0,20), and (3,16). But (0,20) is not feasible because x must be at least 3, so the feasible region is actually a polygon with vertices at (3,2), (13.5,2), (3,16). Hmm, no, wait, because 4x +3y=60 and x+y=20 intersect at (0,20), but since x can't be 0, the feasible region is bounded by (3,2), (13.5,2), (3,16), and (18,2). Wait, I'm getting confused.

Let me clarify.

The constraints are:

1. 4x +3y ≤60

2. x + y ≤20

3. x ≥3

4. y ≥2

So, the feasible region is the area where all these inequalities overlap.

The corner points would be:

- (3,2): intersection of x=3 and y=2.

- (13.5,2): intersection of x=3 and 4x +3y=60.

- (3,16): intersection of x=3 and 4x +3y=60.

- (18,2): intersection of y=2 and x + y=20.

- (0,20): intersection of x=0 and x + y=20, but x=0 is not allowed because x ≥3.

Wait, so the feasible region is a polygon with vertices at (3,2), (13.5,2), (18,2), and (3,16). Because 4x +3y=60 and x + y=20 intersect at (0,20), which is not feasible, so the actual corner points are where the constraints cross within the feasible region.

So, the corner points are:

1. (3,2)

2. (13.5,2)

3. (18,2)

4. (3,16)

Now, I need to evaluate the objective function Z=200x +150y at each of these points.

Let me compute Z for each:

1. At (3,2): Z=200*3 +150*2=600 +300=900

2. At (13.5,2): Z=200*13.5 +150*2=2700 +300=3000

3. At (18,2): Z=200*18 +150*2=3600 +300=3900

4. At (3,16): Let's check if this point is within the constraints. x + y=19 ≤20, okay. 4x +3y=12 +48=60, which is within the capacity. So Z=200*3 +150*16=600 +2400=3000

So, comparing these Z values: 900, 3000, 3900, 3000. The maximum is 3900 at (18,2). So, producing 18 units of A and 2 units of B gives the maximum profit of $3900.

But wait, let me double-check the point (18,2). Does it satisfy all constraints?

x=18, y=2

4x +3y=72 +6=78, which is more than 60. Wait, that's a problem. It exceeds the manufacturing capacity.

Oh no, I made a mistake earlier. The point (18,2) comes from x + y=20, but 4x +3y=4*18 +6=72 +6=78, which is way over 60. So, that point is not feasible because it violates the manufacturing constraint.

So, I need to correct that. Therefore, the point (18,2) is not in the feasible region. So, the feasible region is actually only up to where 4x +3y=60 intersects x + y=20, but that was at (0,20), which is not allowed. So, the feasible region is bounded by (3,2), (13.5,2), and (3,16). But wait, (3,16) is where 4x +3y=60 and x=3, which is (3,16). Let's check x + y=19, which is okay.

So, the feasible region is a polygon with vertices at (3,2), (13.5,2), and (3,16). Now, evaluate Z at these points.

At (3,2): 900

At (13.5,2): 3000

At (3,16): 3000

So, the maximum is 3000 at both (13.5,2) and (3,16). Wait, but (3,16) is also a vertex. So, the maximum profit is $3000.

But wait, earlier I thought (18,2) was a vertex, but it's not feasible because it violates the manufacturing capacity. So, the maximum profit is $3000.

But let me make sure. Maybe I should also check if there's another point where x + y=20 intersects 4x +3y=60, but that was at (0,20), which isn't allowed. So, the feasible region is a triangle with vertices at (3,2), (13.5,2), and (3,16). Therefore, the maximum profit is at (13.5,2) and (3,16), both giving $3000.

But wait, let's double-check (3,16):

4*3 +3*16=12 +48=60, which is okay.

x + y=19, which is within 20.

So, yes, both points are feasible.

But why does (3,16) give the same profit as (13.5,2)? Because 200*3 +150*16=600 +2400=3000, and 200*13.5 +150*2=2700 +300=3000.

So, both points yield the same profit.

Therefore, the optimal production is either 13.5 units of A and 2 of B, or 3 of A and 16 of B, both giving $3000 profit.

But wait, the problem asks for the optimal production quantities. So, maybe both are optimal. However, since 13.5 is not an integer, they might need to round. But the problem didn't specify integer constraints, so maybe 13.5 is okay. But sometimes in production, you can't produce half units, but the question didn't specify, so I think fractional solutions are acceptable.

Now, for sensitivity analysis. Let's see which constraints are binding.

The objective function is Z=200x +150y.

The constraints are:

1. 4x +3y ≤60 (binding at (3,16) and (13.5,2))

2. x + y ≤20 (not binding since it's not hit at any vertex)

3. x ≥3 (binding at (3,2), (3,16))

4. y ≥2 (binding at (3,2), (13.5,2))

So, the binding constraints are 4x +3y=60, x=3, and y=2.

So, the optimal solution is determined by these three constraints.

If we increase the profit of A, say, beyond 200, then x would increase. Similarly, if profit of B increases, y might increase.

If manufacturing capacity increases, more units can be produced.

If storage space increases beyond 20, more can be produced.

If the minimums change, that would affect the feasible region.

So, the optimal solution is either (13.5,2) or (3,16), both giving same profit.

But to confirm, I think (13.5,2) is more efficient in terms of using more manufacturing hours, but both are valid.



=== Solution ===




To solve the optimization problem, we first set up the linear programming model and then determine the optimal production quantities that maximize profit while satisfying all constraints.

### Step 1: Define the Linear Programming Model

**Objective Function:**
Maximize profit, which is given by:
\[ Z = 200x + 150y \]
where:
- \( Z \) = Total profit
- \( x \) = Number of units of Product A produced
- \( y \) = Number of units of Product B produced

**Constraints:**
1. Manufacturing capacity:
\[ 4x + 3y \leq 60 \]
2. Storage space:
\[ x + y \leq 20 \]
3. Minimum production:
\[ x \geq 3 \]
\[ y \geq 2 \]

### Step 2: Determine the Feasible Region

To find the feasible region, we solve the system of inequalities.

1. **Intersection of \( 4x + 3y = 60 \) and \( x + y = 20 \):**
   - Solving these equations:
     - From \( x + y = 20 \), we get \( y = 20 - x \).
     - Substitute into \( 4x + 3(20 - x) = 60 \):
       \[ 4x + 60 - 3x = 60 \]
       \[ x = 0 \]
     - Then, \( y = 20 \).
   - Point: \( (0, 20) \) (Not feasible since \( x \geq 3 \)).

2. **Intersection of \( 4x + 3y = 60 \) and \( x = 3 \):**
   - Substitute \( x = 3 \):
     \[ 12 + 3y = 60 \]
     \[ y = 16 \]
   - Point: \( (3, 16) \)

3. **Intersection of \( 4x + 3y = 60 \) and \( y = 2 \):**
   - Substitute \( y = 2 \):
     \[ 4x + 6 = 60 \]
     \[ x = 13.5 \]
   - Point: \( (13.5, 2) \)

4. **Intersection of \( x = 3 \) and \( y = 2 \):**
   - Point: \( (3, 2) \)

### Step 3: Identify the Feasible Region

The feasible region is a polygon bounded by:
- \( x \geq 3 \)
- \( y \geq 2 \)
- \( 4x + 3y \leq 60 \)
- \( x + y \leq 20 \)

The feasible region's vertices are:
1. \( (3, 2) \)
2. \( (13.5, 2) \)
3. \( (3, 16) \)

### Step 4: Evaluate the Objective Function at Each Vertex

1. **At \( (3, 2) \):**
   \[ Z = 200(3) + 150(2) = 600 + 300 = 900 \]

2. **At \( (13.5, 2) \):**
   \[ Z = 200(13.5) + 150(2) = 2700 + 300 = 3000 \]

3. **At \( (3, 16) \):**
   \[ Z = 200(3) + 150(16) = 600 + 2400 = 3000 \]

The maximum profit of \$3000 occurs at both \( (13.5, 2) \) and \( (3, 16) \).

### Step 5: Verify Constraints

Both points satisfy all constraints:
- \( 4(13.5) + 3(2) = 60 \) and \( 3 + 16 = 19 \leq 20 \)
- \( 4(3) + 3(16) = 60 \) and \( 3 + 2 = 5 \geq 3 \)

### Step 6: Analyze Sensitivity to Changes in Constraints

- **Manufacturing Capacity:** If this increases, more units can be produced.
- **Storage Space:** If increased, more units can be stored.
- **Minimum Production:** Adjusting these affects the lower bounds, potentially changing the optimal solution.

### Conclusion

The optimal production quantities are either **13.5 units of Product A and 2 units of Product B** or **3 units of Product A and 16 units of Product B**, both yielding a maximum profit of **\$3000**.
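
The model's answer can be verified without a solver. For a two-variable linear program, the optimum lies at a vertex of the feasible region, so it is enough to intersect every pair of constraint boundaries, keep the feasible points, and evaluate the profit at each. The sketch below does this in plain Python; it is an independent check written for this walkthrough, not part of the model's output.

```python
from itertools import combinations

# Each constraint is (a, b, c), meaning a*x + b*y <= c
constraints = [
    (4, 3, 60),    # manufacturing hours
    (1, 1, 20),    # storage space
    (-1, 0, -3),   # x >= 3
    (0, -1, -2),   # y >= 2
]

def feasible(x, y, tol=1e-9):
    return all(a * x + b * y <= c + tol for a, b, c in constraints)

vertices = []
for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
    det = a1 * b2 - a2 * b1
    if det == 0:
        continue  # parallel boundary lines never intersect
    # Cramer's rule for the intersection of the two boundary lines
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    if feasible(x, y):
        vertices.append((x, y))

best_profit = max(200 * x + 150 * y for x, y in vertices)
best_points = sorted(v for v in vertices if abs(200 * v[0] + 150 * v[1] - best_profit) < 1e-9)
print(best_profit, best_points)
```

The check confirms a maximum profit of \$3000 at both (3, 16) and (13.5, 2). The tie is expected: the objective 200x + 150y equals 50(4x + 3y), so it is parallel to the manufacturing constraint and every point on the segment between those two vertices is optimal.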

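#### Usage with a Chinese-Language Prompt

In [20]:
# Chinese prompt; in English: "What are the success factors and challenges for Chinese automakers
# going overseas? Give some suggestions for addressing the challenges. Please give a step-by-step thinking process."
test_prompt = """中国汽车企业出海的成功要素和挑战有哪些？给出一些建议解决挑战. 请给出一步一步的思考过程.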
"""

messages = [{"role": "user", "content": test_prompt}]
response = generate(messages)
print("Model Response:")
print(response["generation"])

Model Response:
<think>
嗯，我现在要分析中国汽车企业出海的成功要素和挑战，还要给出解决建议。首先，我得先了解出海的基本概念。出海指的是中国企业将产品或服务出口到国际市场，尤其是发达国家和新兴市场。对于汽车企业来说，这可能意味着要进入全球的汽车市场，面对不同的法规、文化和竞争。

首先，品牌影响力是一个关键因素。中国的汽车品牌在国内市场占据很大份额，比如大众、丰田、本田等，但在国际市场上可能不太有知名度。所以，品牌推广和国际化是必须的。比如，特斯拉在中国的成功是因为品牌价值和技术的结合。中国企业需要投资品牌建设，可能需要与国际知名品牌合作或者进行广告宣传，提升全球知名度。

接下来是研发能力。汽车行业技术更新换代很快，发达国家的汽车企业在技术上有很强的研发能力。中国企业如果想在国际市场上竞争，必须有自己的技术突破或者与国际合作，否则可能会被挤出局面。比如，中国的比亚迪在电动汽车领域有很强的研发能力，通过自主创新和技术突破，提升了国际竞争力。

然后是质量和安全标准。国际市场对产品质量和安全性有严格的要求，中国企业需要符合欧洲、美国等市场的标准。这可能需要投资质量控制体系，建立国际化的生产和检测流程，确保产品符合各国法规。

法律和法规也是一个大问题。每个国家有不同的法规，中国企业需要了解并遵守这些法规，可能需要雇佣国际化的法律团队，或者与当地企业合作，确保合规性。

文化差异方面，消费者习惯、语言、商业习惯等都不同。中国企业需要进行市场调研，了解目标市场的需求，调整产品和营销策略，才能成功进入。

供应链管理也是关键。国际供应链复杂，涉及多个国家和地区，中国企业需要建立稳定的供应链，确保原材料供应和生产流程的顺畅。同时，可能需要与国际供应商合作，优化供应链管理。

竞争优势方面，价格优势可能不适用，因为发达国家消费者更看重品牌和质量。中国企业需要通过技术创新和差异化竞争来保持优势，比如特斯拉在电动汽车领域的技术突破。

政府支持也很重要。政府可以提供税收优惠、融资支持和贸易保护，这些都有助于企业国际化。同时，企业需要利用这些政策，制定长远战略。

最后，进入新兴市场可能更容易，比如东南亚和东欧，这些市场增长快，竞争压力小，适合中国企业作为起点，积累经验和资源。

总结一下，成功要素包括品牌建设、技术研发、质量管理、法律合规、文化适应、供应链优化

## Conclusion

This notebook demonstrates the end-to-end process of importing DeepSeek's distilled Llama models into Amazon Bedrock using Custom Model Import (CMI). Starting from downloading the model from Hugging Face, through uploading the files to S3, to creating a CMI job and performing inference, we've covered the essential steps to get your DeepSeek distilled Llama models running on Amazon Bedrock.


While we've used the DeepSeek-R1-Distill-Llama-8B model in this example, the same process applies to other variants including the 70B model. For more information about Custom Model Import and its features, refer to the [Amazon Bedrock documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-import-model.html).