# Chapter 6: Prompt Engineering - Easy Tasks

This notebook covers basic prompt engineering concepts: temperature effects, prompt components, in-context learning, and chain-of-thought reasoning.

## Setup

Run all cells in this section to set up the environment and load the model.

Before running these cells, review the concepts from the main Chapter 6 notebook (00_Start_Here.ipynb).

### [Optional] - Installing Packages on Google Colab

If you are viewing this notebook on Google Colab, run the following cell to install dependencies.

**Note**: Use a GPU for this notebook. In Google Colab, go to Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4.

In [1]:
%%capture
!pip install "transformers>=4.40.0" torch accelerate

### Model Loading

In [2]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

In [3]:
model_path = "microsoft/Phi-3-mini-4k-instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=False,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


`torch_dtype` is deprecated! Use `dtype` instead!

### Helper Functions

In [4]:
def generate_text(prompt, temperature=0.7, max_tokens=200):
    """Generate a reply to `prompt`; temperature=0 switches to greedy decoding."""
    sample = temperature > 0  # sampling needs a positive temperature
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        return_full_text=False,
        max_new_tokens=max_tokens,
        do_sample=sample,
        temperature=temperature if sample else None,
    )

    # Wrap the prompt as a single user turn so the chat template is applied.
    messages = [{"role": "user", "content": prompt}]
    output = pipe(messages)
    return output[0]["generated_text"]

## Challenges

Complete the following tasks by implementing the starter code.

### Level: Easy

**About This Task:**
Temperature controls randomness in generation. Lower values give consistent outputs, higher values give varied outputs.
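To make the idea concrete, here is a minimal sketch of how temperature reshapes a next-token distribution. It assumes the usual formulation, in which the raw scores (logits) are divided by the temperature before the softmax. The function name `softmax_with_temperature` and the three toy logits are illustrative and do not come from the model used in this notebook.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        # temperature=0 is normally handled as greedy decoding, not as a softmax.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
toy_logits = [4.0, 2.0, 1.0]

for t in [0.3, 1.0, 4.0]:
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

At a low temperature almost all of the probability sits on the highest-scoring token, while at a high temperature the three candidates become much closer to equally likely.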

#### Easy Task 1: Finding the Right Temperature

### Instructions

1. Run the cells below to compare temperature effects on three use cases
2. Fill in the missing temperature values based on your observations
3. Run the determinism test to verify that temperature=0 gives consistent output
4. Test with your own prompts
5. Analyze which temperatures work best for different tasks

In [43]:
temperatures = [0.0, 0.3, 0.7, 1.0, 1.5, 4.0]

Notice how different temperatures affect each use case.

In [52]:
# Factual: What is the capital city of France?
prompt1 = "What is the capital city of France?"
print(f"Prompt 1: {prompt1}")
for temp in temperatures:
  output = generate_text(prompt1, temperature=temp, max_tokens=50)
  print(f"Temp={temp}: {output}")

Device set to use cuda
Prompt 1: What is the capital city of France?
Temp=0.0:  The capital city of France is Paris.
Temp=0.3:  The capital city of France is Paris.
Temp=0.7:  The capital city of France is Paris.
Temp=1.0:  The capital city of France is Paris.
Temp=1.5:  The capital city of France is Paris.
Temp=4.0:  Marseille 28


In [53]:
# Creative: Write the first sentence of a romance novel.
prompt2 = "Write the first sentence of a romance novel."
print(f"Prompt 2: {prompt2}")
for temp in temperatures:
  output = generate_text(prompt2, temperature=temp, max_tokens=50)
  print(f"Temp={temp}: {output}")

Device set to use cuda
Prompt 2: Write the first sentence of a romance novel.
Temp=0.0:  As the sun dipped below the horizon, their eyes met across the crowded room, and in that fleeting moment, they knew their lives would never be the same again.
Temp=0.3:  Under the soft glow of the moonlight, their eyes met for the first time, and an unspoken promise of love hung in the air between them.
Temp=0.7:  Underneath the vibrant hues of the setting sun, Ella found herself entranced by the mysterious stranger who had just entered the quaint little café.
Temp=1.0:  Amelia gazed through the foggy windowpane, finding the first glimmer of sunrise in Ethan's eyes, her heart fluttering like a captive bird.
Temp=1.5:  Amelia nervously clenched her partner's calloused hand as the fading sunlight basked their garden date in a warm, ethereal light.
Temp=4.0:  The dextone cliff behind, Cass designed a stunning plan of escape but forgot just twice before reaching Olivet Hill: not before meeting in-betreft Alex beneath the centuries-piled autumntal leaves during Worldly Games


In [54]:
# Math: "What is the square root of 10000?"
prompt3 = "What is the square root of 10000?"
print(f"Prompt 3: {prompt3}")
for temp in temperatures:
  output = generate_text(prompt3, temperature=temp, max_tokens=50)
  print(f"Temp={temp}: {output}")

Device set to use cuda
Prompt 3: What is the square root of 10000?
Temp=0.0:  The square root of 10000 is 100. This is because 100 multiplied by itself (100 * 100) equals 10000.
Temp=0.3:  The square root of 10000 is 100.
Temp=0.7:  The square root of a number is a value that, when multiplied by itself, gives the original number. For the number 10000:

√10000 = 100

This is because
Temp=1.0:  The square root of a number is a value that, when multiplied by itself, gives the original number. In this case, the square root of 10000 is 100, because when you multiply 100
Temp=1.5:  The square root of 10000 is exactly
Temp=4.0:  **Calculations on Solution Exhibitor.model(sqrt).calc() for problem number $Mathemat-2* Mathemai...** 36 | ApproXe at..:42 + or~ $-0^.$
Sol


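The effect of temperature is clear in the outputs above: the factual and math answers stay correct at low temperatures, the creative openings become more varied as the temperature rises, and at temperature=4.0 all three outputs degrade into wrong answers or gibberish.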
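For testing your own prompts, the three loops above can be folded into one helper. This is a sketch only: `compare_temperatures` is a hypothetical name, and `fake_generate` is a stand-in so the example runs without a model. In the notebook you would pass `generate_text` instead.

```python
def compare_temperatures(generate_fn, prompt, temperatures, max_tokens=50):
    """Run one prompt at several temperatures and return {temperature: output}."""
    results = {}
    for temp in temperatures:
        results[temp] = generate_fn(prompt, temperature=temp, max_tokens=max_tokens)
    return results


# Stand-in generator (hypothetical) with the same signature as generate_text.
def fake_generate(prompt, temperature=0.7, max_tokens=200):
    return f"[temp={temperature}] reply to: {prompt}"


outputs = compare_temperatures(fake_generate, "Name a primary color.", [0.0, 0.7, 1.5])
for temp, output in outputs.items():
    print(f"Temp={temp}: {output}")
```

Returning a dictionary rather than printing inside the loop keeps the outputs available for later comparison, for example to count how many distinct answers a prompt produced.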

### Task 1a: Select Best Temperature

Based on the outputs above, fill in the best temperature for each use case.

In [None]:
# SOLUTION: Fill in based on experimentation
best_temp_factual = 0.0  # For factual questions - deterministic
best_temp_creative = 1.0  # For creative writing - variety
best_temp_code = 0.0  # For math/code - correctness

Test your selections here.

In [9]:
print("Testing your temperature selections:")

if best_temp_factual is not None:
    output = generate_text("What is the capital of France?", temperature=best_temp_factual, max_tokens=30)
    print(f"\nFactual (temp={best_temp_factual}): {output}")

if best_temp_creative is not None:
    output = generate_text("Write the first sentence of a mystery novel.", temperature=best_temp_creative, max_tokens=50)
    print(f"\nCreative (temp={best_temp_creative}): {output}")

if best_temp_code is not None:
    output = generate_text("Write a Python function to calculate factorial.", temperature=best_temp_code, max_tokens=100)
    print(f"\nCode (temp={best_temp_code}): {output}")

Testing your temperature selections:


### Task 1b: Determinism Test

Run this cell multiple times to verify temperature=0 gives identical outputs.

In [10]:
output = generate_text("What is 2+2?", temperature=0, max_tokens=20)
print(f"Output: {output}")
print("\nRun this cell again - you should get the EXACT same output.")

Device set to use cuda


Output:  The sum of 2 and 2 is 4.

Run this cell again - you should get the EXACT same output.
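Instead of re-running the cell by hand, the same check can be automated. The sketch below assumes a generator with the same call signature as `generate_text`. The name `is_deterministic` and the two stand-in generators are hypothetical and exist only so the example runs without a model.

```python
import itertools


def is_deterministic(generate_fn, prompt, n_runs=3, **kwargs):
    """Call generate_fn n_runs times; return (all_identical, list_of_outputs)."""
    outputs = [generate_fn(prompt, **kwargs) for _ in range(n_runs)]
    return len(set(outputs)) == 1, outputs


# Stand-in that always gives the same reply, like greedy decoding should.
def constant_generate(prompt, temperature=0, max_tokens=20):
    return " The sum of 2 and 2 is 4."


# Stand-in whose reply changes on every call, imitating sampled output.
_counter = itertools.count()


def varying_generate(prompt, temperature=1.0, max_tokens=20):
    return f" Answer variant {next(_counter)}"


print(is_deterministic(constant_generate, "What is 2+2?", temperature=0)[0])
print(is_deterministic(varying_generate, "What is 2+2?", temperature=1.0)[0])
```

In the notebook, the equivalent call would be `is_deterministic(generate_text, "What is 2+2?", temperature=0, max_tokens=20)`.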


### Questions - ANSWERS

**1. At temperature=1.5, did the factual question give wrong answers? Why is determinism critical for factual tasks?**

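No. At temperature=1.5 the model still answered "Paris"; the wrong answer ("Marseille 28") appeared only at temperature=4.0. Determinism (temperature=0) is still critical because a factual question has one correct answer, and sampling can return a different, possibly wrong, answer on any run.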

**2. For creative writing, compare outputs at temperature=0.3 vs 1.0. Which produced more interesting variations?**

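Temperature 1.0 produced the more interesting opening: it introduced named characters (Amelia, Ethan) and a simile ("her heart fluttering like a captive bird"). Temperature 0.3 was more predictable and safe, reusing the "their eyes met" phrasing that also appeared at temperature=0.0.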

**3. Did code generation at temperature=1.5 produce valid Python? What's the risk of high temperature for code?**

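The code prompt was not run at temperature=1.5 in this notebook, so validity was not measured directly. The math prompt shows the trend, though: at temperature=1.5 the answer stopped at "is exactly", and at temperature=4.0 it turned into gibberish. High temperature risks syntax errors and logical mistakes. Code needs to be correct, not creative.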
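One way to answer the "valid Python?" part of question 3 is to parse the generated code with the standard-library `ast` module. This is a minimal sketch: `is_valid_python` is a hypothetical helper, the two sample strings are hand-written rather than model output, and the check assumes the code has already been separated from any surrounding prose or markdown fences in the model's reply.

```python
import ast


def is_valid_python(source):
    """Return True if source parses as Python; this checks syntax only, not logic."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        return False
    return True


# Hand-written samples standing in for model output.
good_code = "def factorial(n):\n    return 1 if n <= 1 else n * factorial(n - 1)\n"
bad_code = "def factorial(n)\n    return n *\n"

print(is_valid_python(good_code))
print(is_valid_python(bad_code))
```

A passing parse only rules out syntax errors. Logical mistakes, the other risk named above, still need tests that actually run the generated function.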