# Get Started with Generative AI

Generative AI is a type of artificial intelligence that can create new content and ideas, including conversations, stories, images, videos, and music. Like all artificial intelligence, generative AI is powered by machine learning models—very large models that are pre-trained on vast amounts of data and commonly referred to as foundation models (FMs). Apart from content creation, generative AI is also used to improve the quality of digital images, edit video, build prototypes quickly for manufacturing, augment data with synthetic datasets, and more.

## Using DeepSeek models 

---
This demo notebook shows how to interact with a deployed DeepSeek model endpoint on Amazon SageMaker AI by using the SageMaker Python SDK for text generation. DeepSeek models are known for their strong performance, particularly in coding and reasoning tasks. We show several example prompt engineering use cases, including code generation, question answering, and controlled model output.

Note: This notebook assumes you've already deployed a DeepSeek model to a SageMaker endpoint. You connect to this existing endpoint.

---

### Model details

---
DeepSeek LLM is a family of models developed by DeepSeek AI. The models come in various sizes and are trained on large datasets, with a significant portion dedicated to code, making them adept at programming-related tasks. 

This notebook focuses on interacting with a predeployed endpoint. For details on the specific DeepSeek model version, training data, and potential limitations (such as language support or inherent biases), refer to the model's documentation or the SageMaker JumpStart page used for its deployment.

DeepSeek models often include the following characteristics:
- Provide strong coding and mathematical reasoning capabilities
- Available in base and instruction-tuned/chat variants
- Trained on diverse datasets, including web text and code

DeepSeek models include the following limitations:
- Like most large language models (LLMs), DeepSeek models can inherit biases from their training data. Use guardrails and appropriate precautions for production use.
- Performance can vary across different languages or highly specialized domains not well-represented in the training data.

---

## Connect to the deployed DeepSeek endpoint
Instead of deploying a model in this practice lab, you connect to an existing SageMaker endpoint that hosts the DeepSeek model. You need to provide the name of your specific endpoint.

In [1]:
import sagemaker
from sagemaker.predictor import retrieve_default
import json 
import boto3

# --- IMPORTANT --- 
# Replace this with the actual name of the deployed DeepSeek endpoint
endpoint_name = "jumpstart-dft-deepseek-llm-r1-disti-20250617-125247" 
# ----------------- 

predictor = None # Initialize predictor
try:
    print(f"Connecting to endpoint: {endpoint_name}...")
    # Use retrieve_default which automatically handles serializers/deserializers for known JumpStart containers
    predictor = retrieve_default(endpoint_name)
    print(f"Successfully connected to endpoint: {endpoint_name}")
except Exception as e:
    print(f"[Error] connecting to endpoint {endpoint_name}: {e}")
    print("Please ensure the endpoint name is correct and the endpoint is in 'InService' status.")
    # Optionally raise the error or handle it as needed
    # raise e 


sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml
Connecting to endpoint: jumpstart-dft-deepseek-llm-r1-disti-20250617-125247...
Successfully connected to endpoint: jumpstart-dft-deepseek-llm-r1-disti-20250617-125247
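
The error message in the cell above asks you to confirm that the endpoint is in the `InService` status. A minimal sketch of that check follows. It assumes that you pass in a boto3 SageMaker client; the helper name `endpoint_is_in_service` is hypothetical and is not part of the SageMaker Python SDK.

```python
def endpoint_is_in_service(sagemaker_client, endpoint_name):
    """Return True if the endpoint reports the 'InService' status.

    `sagemaker_client` is assumed to behave like boto3.client("sagemaker").
    The helper name is illustrative only.
    """
    description = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    return description.get("EndpointStatus") == "InService"
```

For example, `endpoint_is_in_service(boto3.client("sagemaker"), endpoint_name)` could be called before `retrieve_default` to fail early with a clearer message.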


### Supported parameters (example for DeepSeek chat)

DeepSeek models deployed through SageMaker AI typically accept generation parameters within a JSON payload. The exact parameters depend on the deployment container (for example, TGI, vLLM, or DJL). The code in this notebook uses a common structure for chat/instruct models: a `messages` list plus generation controls. Common parameters include:

* **messages:** A list of message objects, each with `role` ('user', 'assistant', or sometimes 'system') and `content` (the text of the message). This allows for conversational context.
* **max_tokens** (or `max_new_tokens`): Maximum number of tokens to generate in the response.
* **temperature:** Controls randomness. Lower values (0.1-0.3) make the output more focused and deterministic; higher values (0.7-1.0) make it more creative and diverse.
* **top_p:** Nucleus sampling parameter. Sampling is restricted to the smallest set of most probable tokens whose cumulative probability reaches `top_p`.
* **stop:** A list of strings. Generation stops if the model produces any of these strings.

Refer to the documentation of your specific SageMaker AI deployment (for example, TGI container images or the SageMaker JumpStart model card) for the definitive list of supported parameters and payload structure.
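
The parameters listed above can be combined into a single request body. The following is a minimal sketch, assuming a container that accepts the chat-style payload described in this section. The helper name `build_chat_payload` and its default values are hypothetical, and your container might expect `max_new_tokens` instead of `max_tokens`.

```python
import json


def build_chat_payload(prompt, max_tokens=512, temperature=0.7,
                       top_p=None, stop=None, system=None):
    """Build a chat-style JSON payload (illustrative helper, not an SDK function)."""
    messages = []
    if system:
        # Some deployments accept a 'system' role; verify this for your container.
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})

    payload = {
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    # Optional controls are added only when the caller sets them.
    if top_p is not None:
        payload["top_p"] = top_p
    if stop:
        payload["stop"] = list(stop)
    return payload


example = build_chat_payload("What is AWS Lambda?", temperature=0.2,
                             top_p=0.9, stop=["\n\n"])
print(json.dumps(example, indent=2))
```

Leaving optional keys out of the payload, rather than sending `null` values, avoids relying on how a particular container treats missing versus empty parameters.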

## Create a query endpoint function

In [3]:
import json
# Assume 'predictor' object and 'endpoint_name' variable exist from previous cells

def query_endpoint(prompt, temperature=0.7, max_tokens=10240):
    """
    This function handles sending the request, parsing the response to find 
    both the model's reasoning (thought process) and the final answer, 
    and prints them separately for clarity.

    Args:
        prompt (str): The text prompt to send to the model.
        temperature (float): Controls the randomness of the output (0.0 to ~1.0).
                             Lower values are more focused, higher are more creative.
        max_tokens (int): The maximum number of tokens (words/subwords) the model
                          should generate in its response.

    Returns:
        dict or None: The full response dictionary from the endpoint, or None if an error occurred.
    """

    print(f"\n--> Sending prompt to endpoint: {endpoint_name}")
    print("----------------------------------------------------")
    # Show the input prompt clearly
    print(f"[Input Prompt]:\n{prompt}")
    print("----------------------------------------------------")

    # 1. Prepare the data payload (the 'package' you send to the model)
    #    DeepSeek chat models expect data in a specific format:
    #    - A 'messages' list containing one or more message dictionaries.
    #    - Each message dictionary has a 'role' ('user' for us, 'assistant' for the model)
    #      and 'content' (the actual text of the message).
    #    - You also include parameters such as 'max_tokens' and 'temperature'.
    payload = {
        "messages": [
            {"role": "user", "content": prompt} # Our prompt goes here
        ],
        "max_tokens": max_tokens,         # Max length of the model's reply
        "temperature": temperature      # Controls creativity compared to focus
        # Other parameters, such as 'top_p', could be added here if needed
    }

    # Initialize variables to store parts of the response later
    raw_response = None     # To store the entire response dictionary
    reasoning = None        # To store the model's thought process
    final_answer = None     # To store the model's final answer
    warning_message = None  # To store any warnings (like truncation)

    try:
        # 2. Send the payload to the SageMaker endpoint by using the 'predictor'
        #    The 'predictor.predict()' function handles the communication.
        #    We need to make sure the 'predictor' object was successfully created earlier.
        if 'predictor' not in globals() or predictor is None:
             print("[Error]: 'predictor' object not found. \nDid the connection to the endpoint in the previous cell succeed?")
             return None # Stop the function here if predictor doesn't exist
             
        print("Waiting for response from the model...")
        raw_response = predictor.predict(payload)

        # 3. Process the response received from the endpoint
        #    The response is usually a dictionary that contains details about the generation.
        #    You need to carefully look inside it to find the generated text.

        # Use '.get(key)' which is safer than '[key]' as it returns None if the key doesn't exist,
        # preventing errors if the response structure is slightly different.
        if isinstance(raw_response, dict):
            choices = raw_response.get('choices')
            # Check if 'choices' exists and is a list with at least one item
            if isinstance(choices, list) and len(choices) > 0:
                first_choice = choices[0] # Get the first (usually only) choice
                if isinstance(first_choice, dict):
                    message = first_choice.get('message') # Get the 'message' dictionary
                    if isinstance(message, dict):
                        # Try to get the model's reasoning (thought process)
                        reasoning = message.get('reasoning_content')
                        # Try to get the final answer/content
                        final_answer = message.get('content')

                    # Check if the response was cut short (truncated)
                    finish_reason = first_choice.get('finish_reason')
                    if finish_reason == 'length':
                        warning_message = f"[Warning]: The model's output might have been cut short because it reached the maximum token limit ({max_tokens}). You might need to increase 'max_tokens' for a longer response."
        
        # --- If you couldn't find the expected text parts --- 
        if final_answer is None and reasoning is None:
             # Append to existing warning or create new one
             if warning_message:
                 warning_message += "\n[Warning]: Could not extract text using keys 'reasoning_content' or 'content'."
             else:
                 warning_message = "[Warning]: Could not extract text using keys 'reasoning_content' or 'content'."

    except Exception as e:
        # Handle errors that might happen during the prediction 
        # (for example, network problems, errors from the endpoint itself)
        print(f"\n[Error] Occurred while querying the endpoint: {e}")
        try:
             # Show the data you tried to send (helps find problems)
             print(f"   Payload attempted: {json.dumps(payload)}")
        except TypeError:
             print("   Payload attempted: (Contains non-serializable types)")
        if raw_response is not None:
             # Show the raw response if you received anything before the error
             print(f"   Raw response received before error: {raw_response}")
        return None # Indicate function failed by returning None

    # 4. Print the results in a structured and easy-to-read way

    print("\n<-- Received response:")
    print("====================================================")

    # Print any warnings first
    if warning_message:
        print(f"{warning_message}\n")

    # Print the reasoning/thought process if it was found
    # Check if reasoning is not None and is not empty string
    if reasoning and reasoning.strip():
        print("[Model's Reasoning (Thought Process)]:")
        print(reasoning.strip()) # .strip() removes leading/trailing whitespace
        print("----------------------------------------------------")
    else:
        # Inform the user if reasoning wasn't found or was empty
        print("(No 'reasoning_content' found or it was empty in the response)")
        print("----------------------------------------------------")

    # Print the final answer if it was found
    # Check if final_answer is not None and is not empty string
    if final_answer and final_answer.strip():
        print("[Final Answer]:")
        print(final_answer.strip())
        print("====================================================")
    else:
        # Inform the user if the final answer wasn't found or was empty
        print("[Error] (No final 'content' found or it was empty in the response)")
        print("====================================================")
        # If *both* parts were effectively missing, show the raw response to help debug
        if not (reasoning and reasoning.strip()):
             print("\nRaw Response Received (for debugging, as key content seems missing):")
             # Pretty print the JSON for better readability
             try:
                 print(json.dumps(raw_response, indent=2))
             except Exception:
                 print(raw_response) # Fallback if JSON formatting fails

    # 5. Return the complete raw response dictionary
    #    This allows the code calling this function to inspect other details
    #    such as token usage ('usage' dictionary) if needed.
    return raw_response
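
Because `query_endpoint` returns the full response dictionary, the caller can inspect the `usage` dictionary mentioned in the final comment. The following is a minimal sketch. The key names `prompt_tokens`, `completion_tokens`, and `total_tokens` are an assumption based on the common chat-completion response shape, and `summarize_usage` is a hypothetical helper; check the actual response from your endpoint.

```python
def summarize_usage(raw_response):
    """Return token counts from a chat-style response, or None if they are absent.

    The 'usage' key names below are assumptions; your container may differ.
    """
    if not isinstance(raw_response, dict):
        return None
    usage = raw_response.get("usage")
    if not isinstance(usage, dict):
        return None
    return {
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }
```

Comparing `completion_tokens` with the `max_tokens` value that you sent is one way to confirm whether a response was truncated.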

# Prompt engineering techniques

Now test several prompt engineering techniques by using the connected DeepSeek model.

## Zero-shot prompting
In zero-shot prompting, you ask the model to perform a task without any examples.


In [4]:
zero_shot_prompt = """
Write a program to compute factorial in Python.
"""
# Use the updated query function
raw_response = query_endpoint(zero_shot_prompt)


--> Sending prompt to endpoint: jumpstart-dft-deepseek-llm-r1-disti-20250617-125247
----------------------------------------------------
[Input Prompt]:

Write a program to compute factorial in Python.

----------------------------------------------------
Waiting for response from the model...

<-- Received response:
[Model's Reasoning (Thought Process)]:
To compute the factorial of a number in Python, I can create a function that takes an integer as input and returns the product of all integers from 1 up to that number. 

I should handle the case where the input is 0, as 0! is defined to be 1. 

For positive integers, I'll loop from 1 to the input number, multiplying each value to accumulate the result. 

For negative integers, since factorials are only defined for non-negative integers, I'll add a check to return 0 if the input is negative.

I'll test the function with a few examples to ensure it works correctly and efficiently.
----------------------------------------------------
[

## One-shot prompting
In one-shot prompting, you provide one example to guide the model. 

Note: For chat models such as DeepSeek, structuring this example within the `messages` list as a user/assistant pair might be more effective than including it directly in the user prompt, but the original structure is kept in this practice lab for comparison.
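
As a minimal sketch of the alternative that the note describes, the following helper places each example in the `messages` list as a user/assistant pair, followed by the actual instruction. One example gives a one-shot prompt; several examples give a few-shot prompt. The helper name `build_messages_with_examples` and the `user`/`assistant` dictionary keys are hypothetical. The `query_endpoint` function in this notebook accepts a single prompt string, so it would need a small change to send this list.

```python
def build_messages_with_examples(instruction, examples=None, system=None):
    """Build a chat 'messages' list with worked examples as prior turns.

    Each example is a dict with 'user' and 'assistant' keys (illustrative shape).
    """
    messages = []
    if system:
        # Some deployments accept a 'system' role; verify this for your container.
        messages.append({"role": "system", "content": system})
    for example in examples or []:
        messages.append({"role": "user", "content": example["user"]})
        messages.append({"role": "assistant", "content": example["assistant"]})
    # The real request always comes last, as a user turn.
    messages.append({"role": "user", "content": instruction})
    return messages


one_shot_messages = build_messages_with_examples(
    "Write a similar Lambda function that generates random colors.",
    examples=[{
        "user": "Write a Lambda function that generates weather forecasts.",
        "assistant": "def lambda_handler(event, context):\n    ...",
    }],
)
```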

In [22]:
one_shot_prompt = """
Here's an example of an AWS Lambda function that generates weather forecasts:

```python
import json
import random

def lambda_handler(event, context):
    try:
        weather_types = ["sunny", "rainy", "cloudy"]
        forecast = random.choice(weather_types)
        return {
            'statusCode': 200,
            'body': json.dumps({'forecast': forecast})
        }
    except Exception as e:
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }
```

Now, write a similar Lambda function that generates random colors (e.g., red, green, blue).
"""
# Use the updated query function
raw_response = query_endpoint(one_shot_prompt)


--> Sending prompt to endpoint: jumpstart-dft-llama-3-2-1b-instruct-20250617-140458
----------------------------------------------------
[Input Prompt]:

Here's an example of an AWS Lambda function that generates weather forecasts:

```python
import json
import random

def lambda_handler(event, context):
    try:
        weather_types = ["sunny", "rainy", "cloudy"]
        forecast = random.choice(weather_types)
        return {
            'statusCode': 200,
            'body': json.dumps({'forecast': forecast})
        }
    except Exception as e:
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }
```

Now, write a similar Lambda function that generates random colors (e.g., red, green, blue).

----------------------------------------------------
Waiting for response from the model...

<-- Received response:
(No 'reasoning_content' found or it was empty in the response)
----------------------------------------------------
[Fina

## Few-shot prompting
In few-shot prompting, you provide multiple examples to establish a pattern.

Note: Similar to one-shot prompting, providing these examples as distinct user/assistant turns in the `messages` list is often better for chat models.

In [23]:
few_shot_prompt = """
Example 1 - Simple Lambda:
```python
def lambda_handler(event, context):
    return {'message': 'Hello World'}
```

Example 2 - Lambda with error handling:
```python
import json

def lambda_handler(event, context):
    try:
        # Your logic here
        result = 'Success'
        return {'statusCode': 200, 'body': json.dumps({'message': result})}
    except Exception as e:
        return {'statusCode': 500, 'body': json.dumps({'error': str(e)})}
```

Example 3 - Lambda with logging:
```python
import logging
import json

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    logger.info(f'Processing event: {event}')
    try:
        # Your logic here
        result = 'Logged and Processed'
        return {'statusCode': 200, 'body': json.dumps({'message': result})}
    except Exception as e:
        logger.error(f'Error processing event: {e}')
        return {'statusCode': 500, 'body': json.dumps({'error': str(e)})}
```

Now, create a Lambda function that:
1. Imports necessary libraries (random, logging, json)
2. Includes proper error handling (try/except block)
3. Implements logging for requests and errors
4. Defines a list of quotes from Greek philosophers (e.g., Socrates, Plato, Aristotle)
5. Randomly selects and returns one quote in the JSON body
6. Includes basic docstrings and comments
7. Returns a 200 status code on success and 500 on error.
"""

# Use the query function with max_tokens=1600, which is lower than the function default (10240)
# Increase max_tokens if the generated code is cut short
raw_response = query_endpoint(few_shot_prompt, max_tokens=1600)


--> Sending prompt to endpoint: jumpstart-dft-llama-3-2-1b-instruct-20250617-140458
----------------------------------------------------
[Input Prompt]:

Example 1 - Simple Lambda:
```python
def lambda_handler(event, context):
    return {'message': 'Hello World'}
```

Example 2 - Lambda with error handling:
```python
import json

def lambda_handler(event, context):
    try:
        # Your logic here
        result = 'Success'
        return {'statusCode': 200, 'body': json.dumps({'message': result})}
    except Exception as e:
        return {'statusCode': 500, 'body': json.dumps({'error': str(e)})}
```

Example 3 - Lambda with logging:
```python
import logging
import json

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    logger.info(f'Processing event: {event}')
    try:
        # Your logic here
        result = 'Logged and Processed'
        return {'statusCode': 200, 'body': json.dumps({'message': result})}
    except Exception a

## Use clear and specific instructions

Prompt engineering techniques—such as zero-shot, one-shot, and few-shot approaches—offer different ways to guide AI models to produce desired outputs. Zero-shot prompting is the most basic approach, where you directly ask the model to perform a task without examples. Zero-shot prompting is useful for straightforward requests, but it might lack precision for complex tasks. One-shot prompting provides a single example along with the request, which helps the model understand the expected format and style of the response. Few-shot prompting takes this further by providing multiple examples, which is particularly effective for complex tasks that require pattern recognition or specific output structures.

Clear instructions are crucial across all these techniques because they reduce ambiguity and help the model understand exactly what's expected. When instructions are specific, well-structured, and include details about the desired format, context, and constraints, the model is more likely to generate accurate and relevant responses. For example, instead of asking the model to "generate some code," a clear instruction might specify that the model should "write a Python function that handles errors, includes logging, and returns JSON responses with specific fields."
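
One way to keep instructions specific and well structured is to assemble the prompt from an explicit list of requirements. The following is a minimal sketch; the helper name `build_requirements_prompt` and its arguments are hypothetical and are only meant to illustrate the idea.

```python
def build_requirements_prompt(task, requirements, output_format=None):
    """Combine a task statement and numbered requirements into one prompt string."""
    lines = [task.strip(), ""]
    for number, requirement in enumerate(requirements, start=1):
        lines.append(f"{number}. {requirement.strip()}")
    if output_format:
        # State the expected output shape explicitly to reduce ambiguity.
        lines.extend(["", f"Output format: {output_format.strip()}"])
    return "\n".join(lines)


structured_prompt = build_requirements_prompt(
    "Write a Python function with the following requirements:",
    [
        "Handle errors with a try/except block",
        "Log the start of execution and any errors",
        "Return a JSON response with 'statusCode' and 'body' fields",
    ],
    output_format="A single Python code block",
)
print(structured_prompt)
```

Keeping the requirements in a list makes it straightforward to add, remove, or reorder constraints while comparing model outputs.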


In [7]:
clear_prompt = """
Create an AWS Lambda function in Python with the following specific requirements:
1.  Function name should ideally be `greek_philosopher_quote_lambda` (inside the handler code).
2.  It must include a list containing quotes from at least these three philosophers: Socrates, Plato, Aristotle.
3.  Each quote in the list should ideally be stored as a dictionary or tuple, pairing the quote text with the philosopher's name (e.g., {'philosopher': 'Socrates', 'quote': 'An unexamined life is not worth living.'}).
4.  The function should randomly select one quote object (philosopher and quote) from the list.
5.  It must include error handling using a try/except block to catch potential issues during execution.
6.  It must implement logging using the `logging` library to log the start of the function execution and any errors encountered.
7.  The function must return a dictionary suitable for API Gateway proxy integration, specifically:
    - On success: `{'statusCode': 200, 'body': json.dumps({'philosopher': selected_philosopher, 'quote': selected_quote})}`
    - On error: `{'statusCode': 500, 'body': json.dumps({'error': 'An error occurred'})}` or a more specific error message.
8. Include basic docstrings for the function and comments where necessary.
"""

# Use the query function with max_tokens=1600; increase it if the output is cut short
raw_response = query_endpoint(clear_prompt, max_tokens=1600)


--> Sending prompt to endpoint: jumpstart-dft-deepseek-llm-r1-disti-20250617-125247
----------------------------------------------------
[Input Prompt]:

Create an AWS Lambda function in Python with the following specific requirements:
1.  Function name should ideally be `greek_philosopher_quote_lambda` (inside the handler code).
2.  It must include a list containing quotes from at least these three philosophers: Socrates, Plato, Aristotle.
3.  Each quote in the list should ideally be stored as a dictionary or tuple, pairing the quote text with the philosopher's name (e.g., {'philosopher': 'Socrates', 'quote': 'An unexamined life is not worth living.'}).
4.  The function should randomly select one quote object (philosopher and quote) from the list.
5.  It must include error handling using a try/except block to catch potential issues during execution.
6.  It must implement logging using the `logging` library to log the start of the function execution and any errors encountered.
7.  The

# Temperature and creativity control

Temperature in prompt engineering controls the randomness or predictability of a model's responses. A lower temperature (closer to 0) makes responses more deterministic and focused, which is better for factual or technical tasks where accuracy is crucial. A higher temperature (closer to 1 or above) introduces more randomness, leading to more creative, diverse, and surprising outputs, which works well for creative writing or brainstorming. The optimal temperature setting depends on your specific use case; use lower values when you need consistency and precision, and use higher values when you want variety and creativity.
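
To make the effect concrete, the following sketch shows the way temperature is commonly applied during sampling: token scores (logits) are divided by the temperature before they are converted to probabilities. The logits are made-up numbers, and `softmax_with_temperature` is a hypothetical helper. The exact sampling implementation inside your endpoint's container might differ.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling them by 1/temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # Subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.5]  # Made-up scores for three candidate tokens
focused = softmax_with_temperature(logits, 0.2)
diverse = softmax_with_temperature(logits, 0.9)
print("temperature=0.2:", [round(p, 3) for p in focused])
print("temperature=0.9:", [round(p, 3) for p in diverse])
```

At the lower temperature, most of the probability mass concentrates on the highest-scoring token, which is why responses become more repeatable. At the higher temperature, the distribution is flatter, so less likely tokens are sampled more often.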

### Controlling model creativity

In [8]:
prompt = "What is AWS Lambda?"

# Low temperature (more focused, deterministic)
print("\n--- Low Temperature (Focused, temp=0.2) Response --- ")
raw_response = query_endpoint(prompt, temperature=0.2, max_tokens=32000)


--- Low Temperature (Focused, temp=0.2) Response --- 

--> Sending prompt to endpoint: jumpstart-dft-deepseek-llm-r1-disti-20250617-125247
----------------------------------------------------
[Input Prompt]:
What is AWS Lambda?
----------------------------------------------------
Waiting for response from the model...

<-- Received response:
[Model's Reasoning (Thought Process)]:
Okay, so I'm trying to understand what AWS Lambda is. I've heard the term before, maybe in the context of cloud computing or something related to serverless functions. Let me start by breaking down the name. AWS is a big company, right? They make a lot of cloud services. Lambda must be one of them.

I remember hearing about serverless functions before. They're something like functions that run on a serverless platform, which I think is a type of cloud computing service. So maybe Lambda is a specific service within that category. But I'm not entirely sure how it works.

Let me think about what I know. Lambda f

In [9]:
prompt = "Write a haiku about the challenges of software development and code maintenance."

# Higher temperature (more creative, diverse)
print("\n--- High Temperature (Creative, temp=0.9) Response --- ")
raw_response = query_endpoint(prompt, temperature=0.9, max_tokens=32000)


--- High Temperature (Creative, temp=0.9) Response --- 

--> Sending prompt to endpoint: jumpstart-dft-deepseek-llm-r1-disti-20250617-125247
----------------------------------------------------
[Input Prompt]:
Write a haiku about the challenges of software development and code maintenance.
----------------------------------------------------
Waiting for response from the model...

<-- Received response:
[Model's Reasoning (Thought Process)]:
Alright, so I need to write a haiku about the challenges of software development and code maintenance. I'm not very familiar with haikus, but I know they're three-line poems with a specific rhythm—uses of i, o, u, and ia, o, u, repeating. It’s quite structured and has a nice cadence to it.

First, I should understand what kind of challenges software developers and maintainers face. From what I know, there are various issues like time and resource constraints, the complexity of programming languages, ensuring code quality, and collaboration challen

# Clean up

Important: Before proceeding to the DIY section of this solution, delete the deployed model and endpoint.

In [10]:
from botocore.exceptions import ClientError

sagemaker_client = boto3.client('sagemaker')

try:
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    print(f"Successfully deleted endpoint: {endpoint_name}")
except ClientError as e:
    print(f"Failed to delete endpoint {endpoint_name}: {e}")
    raise

Successfully deleted endpoint: jumpstart-dft-deepseek-llm-r1-disti-20250617-125247
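
The cell above deletes only the endpoint. The endpoint configuration and the model resource are separate SageMaker resources. The following is a minimal sketch of removing all three, assuming a boto3 SageMaker client is passed in; the helper name `delete_endpoint_resources` is hypothetical. It must run instead of the cell above, not after it, because it looks up the configuration from the endpoint before deleting anything.

```python
def delete_endpoint_resources(sagemaker_client, endpoint_name):
    """Delete an endpoint, its endpoint configuration, and the models it references.

    `sagemaker_client` is assumed to behave like boto3.client("sagemaker").
    """
    # Look up the related resource names while the endpoint still exists.
    endpoint = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sagemaker_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [
        variant["ModelName"]
        for variant in config.get("ProductionVariants", [])
        if variant.get("ModelName")
    ]

    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    sagemaker_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sagemaker_client.delete_model(ModelName=model_name)

    return {"endpoint": endpoint_name, "endpoint_config": config_name, "models": model_names}
```

Error handling is omitted to keep the sketch short; in practice you would wrap the calls in the same `ClientError` handling that the cell above uses.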


# Do it yourself 

Stop here and deploy a new model as part of the DIY section.

In [24]:
import boto3
import json
import sagemaker
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

def create_predictor(endpoint_name, session=None):
    """
    Create a predictor for an existing SageMaker JumpStart LLaMA 3 endpoint
    
    Parameters:
    -----------
    endpoint_name : str
        The name of the existing SageMaker endpoint
    session : sagemaker.session.Session, optional
        SageMaker session to use. If not provided, a new one will be created
        
    Returns:
    --------
    predictor : sagemaker.predictor.Predictor
        A predictor object that can be used to make text generation requests
    """
    if session is None:
        boto_session = boto3.Session()
        session = sagemaker.Session(boto_session=boto_session)
    
    # Create a predictor with proper configuration
    predictor = Predictor(
        endpoint_name=endpoint_name,
        sagemaker_session=session,
        serializer=JSONSerializer(),
        deserializer=JSONDeserializer()
    )
    
    
    print(f"Successfully connected to LLaMA 3 endpoint: {endpoint_name}")
    return predictor

def format_llama3_prompt(instruction, examples=None):
    """
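    Format a prompt for Llama 3 by using simple role markers
    (<|system|>, <|user|>, <|assistant|>, <|end|>). Note that these markers
    are not Meta's documented Llama 3 chat template, which uses
    <|start_header_id|>, <|end_header_id|>, and <|eot_id|>.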
    
    Parameters:
    -----------
    instruction : str
        The main instruction or question
    examples : list of dict, optional
        List of examples for one-shot or few-shot learning
        Each example should be a dict with 'user' and 'assistant' keys
    
    Returns:
    --------
    formatted_prompt : str
        Properly formatted prompt for LLaMA 3
    """
    # Base system prompt that works well with LLaMA 3
    system_prompt = "You are a helpful, harmless, and honest AI assistant."
    
    # Start with the system prompt
    formatted_prompt = f"<|system|>\n{system_prompt}\n<|end|>\n"
    
    # Add examples for one-shot or few-shot learning
    if examples:
        for example in examples:
            formatted_prompt += f"<|user|>\n{example['user']}\n<|end|>\n"
            formatted_prompt += f"<|assistant|>\n{example['assistant']}\n<|end|>\n"
    
    # Add the current instruction
    formatted_prompt += f"<|user|>\n{instruction}\n<|end|>\n"
    formatted_prompt += "<|assistant|>\n"
    
    return formatted_prompt

def generate_text(predictor, instruction, examples=None, max_new_tokens=4096, temperature=0.3, top_p=0.9):
    """
    Generate text by using the LLaMA 3 model with proper prompt formatting
    
    Parameters:
    -----------
    predictor : sagemaker.predictor.Predictor
        The predictor object for the endpoint
    instruction : str
        The main instruction or question
    examples : list of dict, optional
        List of examples for one-shot or few-shot learning
        Each example should be a dict with 'user' and 'assistant' keys
    max_new_tokens : int
        Maximum number of tokens to generate
    temperature : float
        Temperature for sampling (higher = more creative)
    top_p : float
        Top-p sampling parameter
        
    Returns:
    --------
    response : dict
        The complete response from the model
    """
    # Format the prompt according to LLaMA 3 requirements
    formatted_prompt = format_llama3_prompt(instruction, examples)
    
    payload = {
        "inputs": formatted_prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "top_p": top_p,
            "return_full_text": False,
            "stop": ["<|end|>", "</s>"]  # Stop tokens for LLaMA 3
        }
    }
    
    try:
        response = predictor.predict(payload)
        print(f"\033[1m Input Instruction:\033[0m {instruction}")
        
        # Extract and clean the generated text
        if isinstance(response, dict) and "generated_text" in response:
            generated_text = response["generated_text"]
            # Clean any trailing stop tokens that might have been included
            for stop_token in ["<|end|>", "</s>"]:
                if generated_text.endswith(stop_token):
                    generated_text = generated_text[:-len(stop_token)].strip()
            
            print(f"\033[1m Output:\033[0m {generated_text}")
            return {"generated_text": generated_text}
        else:
            # Handle other response formats (for example, a list) by returning the response unchanged
            print(f"\033[1m Output:\033[0m {response}")
            return response
    except Exception as e:
        print(f"Error during prediction: {e}")
        return {"error": str(e)}
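
The `<|system|>`, `<|user|>`, `<|assistant|>`, and `<|end|>` markers in `format_llama3_prompt` differ from the chat template that Meta documents for Llama 3 instruct models, which is built from the `<|begin_of_text|>`, `<|start_header_id|>`, `<|end_header_id|>`, and `<|eot_id|>` special tokens. The following is a minimal sketch of that template. The function name `format_llama3_chat_prompt` is hypothetical. Whether your serving container already prepends `<|begin_of_text|>` is an assumption to verify for your deployment, and with this template the stop list would use `<|eot_id|>`.

```python
def format_llama3_chat_prompt(
    instruction,
    examples=None,
    system_prompt="You are a helpful, harmless, and honest AI assistant.",
):
    """Format a prompt with the Llama 3 instruct special tokens (illustrative sketch)."""

    def turn(role, content):
        # Each turn is a role header, a blank line, the content, and an end-of-turn token.
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content.strip()}<|eot_id|>"

    parts = ["<|begin_of_text|>", turn("system", system_prompt)]
    for example in examples or []:
        parts.append(turn("user", example["user"]))
        parts.append(turn("assistant", example["assistant"]))
    parts.append(turn("user", instruction))
    # End with an open assistant header so that the model writes the next turn.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```

If the endpoint accepts a chat-style `messages` payload, sending messages and letting the container apply the template is usually simpler than formatting the string by hand.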

### Your turn, using the Meta Llama 3 model!

Create a prompt that generates an AWS Lambda function by using the **Llama 3** model with the following features:

1. Generates quotes from a specific philosopher of your choice
2. Includes sentiment analysis of the quote
3. Returns both the quote and its sentiment score
4. Implements proper error handling and logging

In the next cell, enter your prompt and experiment with different prompting techniques for Llama 3.

In [18]:
# Example solution prompt (update the following prompt)

diy_prompt = """

     Write a program to compute factorial in Python.

"""

In [19]:
# Your specific endpoint name
endpoint_name = "jumpstart-dft-llama-3-2-1b-instruct-20250617-140458"

# Create the predictor
predictor = create_predictor(endpoint_name)


Successfully connected to LLaMA 3 endpoint: jumpstart-dft-llama-3-2-1b-instruct-20250617-140458


In [20]:
raw_response = generate_text(predictor, diy_prompt)

Input Instruction:

     Write a program to compute factorial in Python.


Output: ## Step 1: Define the factorial function
The factorial of a number n, denoted by n!, is the product of all positive integers less than or equal to n. We can define a function to compute the factorial of a given number.

## Step 2: Initialize the result variable
We will initialize a variable `result` to 1, which will store the factorial of the input number.

## Step 3: Iterate from 1 to the input number
We will use a for loop to iterate from 1 to the input number. In each iteration, we will multiply the current number with the `result` variable.

## Step 4: Return the result
After the loop finishes, we will return the `result` variable, which will hold the factorial of the input number.

## Step 5: Test the function
We will test the function with different input numbers to verify its correctness.

## Step 6: Write the Python code
Here is the Python code that implements the factorial fun