# ✍️ Prompt Engineering Techniques with LLMs: From Zero to Hero 🤖💡

Welcome to this hands-on tutorial on **Prompt Engineering** — the art and science of crafting inputs to get the best out of Large Language Models (LLMs)! In this notebook, we’ll explore multiple prompt strategies like **zero-shot**, **one-shot**, and **few-shot** prompting to guide model behavior in different tasks.

👉 While the examples here use Meta's **Llama 3.2 3B Instruct** model (served through Together AI), the concepts apply to **any modern LLM**, such as GPT, Claude, or Mistral.

---

## 🚀 What You'll Learn:
- 🧠 What is prompt engineering and why it matters
- 🔁 How to format prompts for chat-based models (System, User, Assistant)
- 🧪 Different prompt styles: zero-shot, one-shot, and few-shot learning
- 🧩 Prompting for summarization, translation, reasoning, and more

This tutorial is perfect for **developers, data scientists, and AI enthusiasts** who want to get practical with LLMs — no fine-tuning required! 💬


# Prompt Engineering Techniques 🚀

Let’s explore the exciting world of **Prompt Engineering** using **`meta-llama/Llama-3.2-3B-Instruct-Turbo`**, a Meta model with **3 billion parameters**, fine-tuned for conversational tasks and instruction following.


### 🌟 Import Helper Functions

To get started, let’s import all the necessary Python libraries and suppress any unwanted warnings. This sets the stage for using the Together API smoothly!


In [None]:
# 👉 Install the Together Python client
!pip install together

In [None]:
# 👉 Import necessary Python libraries
import requests                     # for making API requests
import os                           # for accessing environment variables
import json                         # for working with JSON data
import warnings                     # for suppressing warnings
from google.colab import userdata   # Colab utility to access user secrets
import time                         # for adding delays if needed

# 👉 Ignore warnings to keep output clean
warnings.filterwarnings('ignore')


## 🤝 About Together AI — A Platform Powering Open-Source AI at Scale

**Together AI** (together.ai) is a cutting-edge AI cloud platform designed to help developers, researchers, and businesses **train, tune, and serve generative AI models**—especially those that are open-source.  

### ⚙️ What Can You Do with Together AI?

- **Inference at Scale**: Deploy models like LLaMA, Qwen, or your own fine‑tuned models using high-performance **serverless or dedicated API endpoints**. Optimized for speed and cost efficiency.
- **Fine-Tuning & Adaptation**: Customize open-source models using your own data with APIs that support both lightweight adapters (LoRA) and full fine‑tuning pipelines.   
- **GPU Clusters & Training Infrastructure**: Access NVIDIA H100, H200, GB200 GPUs across scalable clusters for training or inference, with enterprise-grade orchestration and scheduling.

### 🔒 Enterprise & Deployment Options

- Offers both **fully-managed cloud** deployment and **VPC/private deployment** for organizations with strong security and privacy needs.
- Built to comply with enterprise standards like **SOC‑2 and HIPAA**, ensuring your data and models remain under your control.

### 📈 Why It Matters for Prompt Engineering

- **Easy APIs** allow seamless integration into notebooks or apps without deep infrastructure setup.
- **Context injection & knowledge updates** become simple: you can fine-tune or supply fresh data to your model on-demand.
- **Cost‑effective performance** — Together claims **up to 4× faster inference** and **significantly lower costs** compared to many other platforms.  

### 🚀 About the Company

Founded in 2022 and based in San Francisco, Together AI is backed by major investors including Salesforce Ventures, NVIDIA, and General Catalyst.

### Dive into your prompts with confidence—Powered by Together!


In [None]:
# 👉 Import the Together API client
from together import Together

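# 👉 Initialize the client with the API key stored in Colab secrets
# (Together() with no arguments reads the TOGETHER_API_KEY environment variable instead)
client = Together(api_key=userdata.get('TOGETHER_API_KEY'))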

# 👉 Define a simple user message for sentiment analysis
response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct-Turbo",  # specify the model
    messages=[
      {
        "role": "user",
        "content": """
        Analyze the sentiment of the following message:
        Hey Lina, I truly appreciated your support during the event!
        """
      }
    ]
)

# 👉 Print the model's response
print(response.choices[0].message.content)


### 🚀 Setting Up the API Endpoint & Authentication

Now that we’ve imported all the necessary libraries, it’s time to prepare the connection to the **Together API**.

Here’s what we’ll do:
- Define the API **endpoint URL** 🌐
- Add an **Authorization header** using your secret API key 🔐 (from Google Colab's `userdata` storage)
- Set the content type to **JSON** 🧾

This setup ensures that we can send prompts securely to the hosted model and receive structured responses.


In [None]:
# 👉 Define the Together Inference API endpoint
url = "https://api.together.xyz/inference"

# 👉 Setup the headers for authorization using your API key
headers = {
    "Authorization": f"Bearer {userdata.get('TOGETHER_API_KEY')}",  # Automatically grabs the key from Google Colab secrets
    "Content-Type": "application/json"
}


### 🧰 Let’s Build a Prompt Sender Function!

To make our workflow smooth and reusable, we’ll define a **helper function** that sends prompts to the Together API.

Here’s what this function will handle:
- 🧠 Specify which model to use (default is Llama 3.2 3B Instruct Turbo)
- ✍️ Accept both a `system prompt` (model behavior) and a `user prompt` (instruction)
- 📬 Send the request and fetch the model’s reply

This will save us from repeating boilerplate code and help us focus on experimenting with different prompt strategies!


In [None]:
# 👉 Define a helper function to interact with the LLaMA model
def generate_llama_response(system_prompt, user_prompt, model="meta-llama/Llama-3.2-3B-Instruct-Turbo"):
    # Create the prompt structure with roles
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},  # System prompt defines context
            {"role": "user", "content": user_prompt}       # User prompt is the actual instruction
        ]
    }

    # Send a POST request to the Together inference API
    response = requests.post(url, headers=headers, data=json.dumps(payload))

    # Extract the generated content from the response
    result = response.json()
    return result['output']['choices'][0]['text']

This function simplifies the process of interacting with the LLaMA model. You just need to provide a system prompt (context for the assistant) and a user prompt (actual task), and it will return the generated response. Neat, right? 🙌
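
As a rough sketch of what the helper sends, the snippet below builds the same two-message list without calling the API, so the role structure can be inspected offline. `build_messages` is a hypothetical name introduced here for illustration; it is not part of the Together client.

```python
def build_messages(system_prompt, user_prompt):
    """Return a chat message list in the system/user role format used above."""
    return [
        {"role": "system", "content": system_prompt},  # sets the overall behavior
        {"role": "user", "content": user_prompt},      # carries the actual task
    ]

messages = build_messages(
    "You are a helpful assistant.",
    "Translate 'Good morning' to Spanish.",
)
for message in messages:
    print(f"{message['role']}: {message['content']}")
```

Printing the list before sending it is a cheap way to confirm that the system prompt and the user prompt ended up in the roles you intended.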

In [None]:
def generate_llama_param(prompt,
          model= "meta-llama/Llama-3.2-3B-Instruct-Turbo",
          #model="meta-llama/Llama-3.3-70B-Instruct-Turbo-Free",
          #model= "meta-llama/Llama-3-8b-chat-hf",
          #model="togethercomputer/llama-2-7b-chat",
          temperature=0.0,
          max_tokens=256,
          verbose=False,
          url=url,
          headers=headers):

    data = {
            "model": model,
            "prompt": prompt,
            "temperature": temperature,
            "max_tokens": max_tokens
        }
    response = requests.post(url, headers=headers, json=data)
    return response.json()['output']['choices'][0]['text']


### 🧠 In-Context Learning (ICL) — Let the Model Learn from Examples!

🧩 Instead of just telling the model what to do, you **show it a few examples** in your prompt. It then figures out the pattern and continues based on what it learned from those examples!

💬 It’s like showing a friend how to solve 2 math problems and asking them to solve the third one.


### ✏️ Try It: Standard Instruction Prompt (No Examples Yet)

Let’s start with a simple example of **standard instruction prompting**.

In this case, we won’t show the model any examples. Instead, we’ll give it a clear, direct instruction and ask it to perform a task.

Think of it like saying:
🗣️ “Hey assistant, translate this sentence into Spanish.”

Let’s try that using the `generate_llama_response` function we just defined!


In [None]:
# 👉 Give the model an explicit instruction without any examples
instruction = "Translate the following sentence to Spanish:"
sentence = "Where is the nearest train station?"

# 👉 Generate and print the model's response
response = generate_llama_response("You are a helpful assistant.", f"{instruction}\n{sentence}")
print(response)

In [None]:
prompt = """
Analyze the sentiment of the following message:
Hey Lina, I truly appreciated your support during the event!
"""
response = generate_llama_param(prompt)
print(response)

### 🎯 Zero-Shot Prompting — Make a Guess with No Hints!

Now let’s test **Zero-Shot Prompting**. This means:
❌ No examples  
❌ No explicit task definition  
✅ Just structure — and let the model infer what to do!

It’s like handing someone a form that says:
📄 `"English: Car French: "` and expecting them to fill in the translation, with no worked examples to copy from.

🧠 This is useful when:
• You want to test the model’s general intelligence  
• The task is obvious from context  
• You want fewer tokens (cheaper & faster!)

Let’s give it a shot!


In [None]:
# 👉 Zero-shot translation: Let the model infer the task from the pattern
zero_shot_prompt = """
English: Car
French:"""

# 👉 Call the model with no instruction, just the pattern
response = generate_llama_response("You are a translation assistant.", zero_shot_prompt)
print(response)


You should see the French word for "Car" — which is “Voiture”! 🚗🇫🇷

If you add worked examples to the prompt, as in the following, it becomes a few-shot prompt.

English: Apple<br>
French: Pomme<br>
English: House<br>
French: Maison<br>
English: Car<br>
French:
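
The pattern above can be generated from a list of example pairs. This is a minimal sketch, assuming the `English:` / `French:` labels shown above; `build_few_shot_prompt` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
def build_few_shot_prompt(examples, query, source="English", target="French"):
    """Format (source, target) pairs as demonstrations, ending with an open slot."""
    lines = []
    for src, tgt in examples:
        lines.append(f"{source}: {src}")
        lines.append(f"{target}: {tgt}")
    lines.append(f"{source}: {query}")
    lines.append(f"{target}:")  # left open for the model to complete
    return "\n".join(lines)

few_shot_translation = build_few_shot_prompt(
    [("Apple", "Pomme"), ("House", "Maison")],
    "Car",
)
print(few_shot_translation)
```

Passing an empty list of examples produces the zero-shot version of the same prompt, which makes it easy to compare the two styles on one query.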

In [None]:
prompt = """
Message: Hey Lina, I truly appreciated your support during the event!
Sentiment:
"""
response = generate_llama_param(prompt)
print(response)

### 🧪 Few-Shot Prompting — Teaching with Examples

Few-shot prompting means we give the model a **few examples** of the task in the prompt before asking it to complete a similar one.

👉 It helps the model “learn” what we expect by showing real use cases.  
👉 Great for more complex or custom tasks where zero-shot might not be enough.

🧠 Think of it like saying:
> "Here are 3 problems and their solutions — now solve the next one."

Let’s show it how to convert measurements, and then ask it to convert another!


In [None]:
# 👉 Construct a few-shot prompt with several examples
few_shot_prompt = """
Convert the following measurements from inches to centimeters:

5 inches = 12.7 cm
12 inches = 30.48 cm
8 inches = 20.32 cm
15 inches =
"""

# 👉 Generate the model's prediction
response = generate_llama_response("You are a measurement converter.", few_shot_prompt)
print(response)


🎯 What to Expect:
If the model understood the examples, it should reply:
“38.1 cm” because 15 inches × 2.54 = 38.1 cm!
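
To check the model's reply against a known-good value, the conversion can be computed directly. The sketch below only verifies the arithmetic in the prompt; the function name is illustrative.

```python
CM_PER_INCH = 2.54  # one inch is defined as exactly 2.54 cm

def inches_to_cm(inches):
    """Convert inches to centimeters, rounded to two decimals."""
    return round(inches * CM_PER_INCH, 2)

# The three demonstrations used in the few-shot prompt
prompt_examples = {5: 12.7, 12: 30.48, 8: 20.32}
for inches, shown in prompt_examples.items():
    assert inches_to_cm(inches) == shown

print(inches_to_cm(15))  # 38.1
```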

### Specifying the Output Format
- You can also specify the format in which you want the model to respond.
- In the example below, you are asking to "give a one word response".

In [None]:
prompt = """
Message: You forgot our dinner reservation again!
Sentiment: Negative

Message: Looking forward to the beach trip this weekend!
Sentiment: Positive

Message: Hey Lina, I truly appreciated your support during the event!
Sentiment:

Respond in one word only.
"""
response = generate_llama_param(prompt)
print(response)

Let's try a bigger model, for example `Llama-3.3-70B-Instruct-Turbo-Free`:

In [None]:
prompt = """
Message: You forgot our dinner reservation again!
Sentiment: Negative

Message: Looking forward to the beach trip this weekend!
Sentiment: Positive

Message: Hey Lina, I truly appreciated your support during the event!
Sentiment:

Respond with only: positive, negative, or neutral.
"""
response = generate_llama_param(prompt, model="meta-llama/Llama-3.3-70B-Instruct-Turbo-Free")
print(response)

- Now we go back to the smaller default model while restricting the output to one of `positive`, `negative`, or `neutral`.

In [None]:
prompt = """
Message: Hi Dad, you're 20 minutes late to my piano recital!
Sentiment: Negative

Message: Can't wait to order pizza for dinner tonight
Sentiment: Positive

Message: Hi Amit, thanks for the thoughtful birthday card!
Sentiment:

Respond with either positive, negative, or neutral.
"""
# Don't add any other text.
response = generate_llama_param(prompt)
print(response)
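
Even with a format instruction, a model may reply with extra words or punctuation. The snippet below is a minimal sketch of post-processing, assuming the three labels used above; `normalize_sentiment` is a hypothetical helper, and the sample replies are made up for illustration rather than real model output.

```python
ALLOWED_LABELS = ("positive", "negative", "neutral")

def normalize_sentiment(reply):
    """Map a free-form reply to one allowed label, or None if it is ambiguous."""
    text = reply.strip().lower()
    found = [label for label in ALLOWED_LABELS if label in text]
    # Accept the reply only when exactly one label appears in it
    return found[0] if len(found) == 1 else None

print(normalize_sentiment(" Positive. "))                 # positive
print(normalize_sentiment("Sentiment: Negative"))         # negative
print(normalize_sentiment("Could be positive or neutral"))  # None
```

Returning `None` for ambiguous replies lets the calling code decide whether to retry the prompt or flag the message for review.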

### 🎭 Role Prompting — Set the Stage for the Model

Here’s a fun and creative prompting trick — **Role Prompting**!

By assigning the model a role or persona, you can guide its behavior in specific ways:
🧑‍🏫 "You are a history professor..."  
👩‍💼 "You are a career advisor..."  
🧙 "You are a fantasy storyteller..."

✨ Why it works:
Giving the model a role changes its **`tone`**, **`vocabulary`**, and **`response structure`**. It's like an actor putting on a costume 🎭 — and acting accordingly!

Let’s try asking the model to act like a poetic Shakespearean assistant!


In [None]:
# 👉 Ask the model to respond in a poetic, Shakespearean style
role_prompt = """
You are a Shakespearean assistant.
User: Tell me about Artificial Intelligence.
"""

# 👉 Let’s see how creatively the model can respond!
response = generate_llama_response("You are a wise assistant from the 1600s.", role_prompt)
print(response)


🎭 Expected Output:
Something like: “O wondrous machine of mindless thought, that mimiceth man’s wit and toil...”

🧠 Notice how the model adjusts its style and tone based on the role you gave it!

In [None]:
role = """
You are a motivational speaker who shares wisdom in a poetic and inspiring style.
You respond with clarity, warmth, and hope.
"""

prompt = f"""
{role}
How can I respond to this question from a colleague:
What does it mean to live a fulfilling life?
"""
response = generate_llama_param(prompt)
print(response)

### 🔗 Chain-of-Thought Prompting (CoT) — Teach the Model to “Think Out Loud” 🧠💬

Sometimes we want more than the answer: we also want the **reasoning behind it**.

That’s where Chain-of-Thought (CoT) prompting shines. Instead of simply asking for an answer, we **guide the model to break down its thinking process** step by step.

💡 Think of it like showing your work in math class:
> “First, we calculate X, then we multiply by Y… therefore, the final answer is Z.”

### ✅ Why is this helpful?
- Encourages **logical reasoning**
- Improves **accuracy** on complex problems
- Produces **transparent and explainable** outputs
- Boosts performance on tasks involving **math, logic, and multi-step decisions**

Let’s put this into action with a real example.


In [None]:
prompt = """
Q: If a train travels 60 miles in 1 hour, how far will it travel in 3 hours?
A: Let's think step by step.
"""

print(generate_llama_response("You are a smart math tutor.", prompt))


🎯 What to Expect:

"The train travels 60 miles in 1 hour. In 3 hours, it will travel 60 × 3 = 180 miles. So the answer is 180 miles."

📌 Takeaway: By explicitly telling the model to “think step by step,” we encourage a sequential and correct line of reasoning.

### 🔄 Let’s Refine the Prompt — Add More Instruction

CoT works best when the model is gently nudged to explain its thought process *before* answering.

So let’s rephrase the instruction more clearly to ask the model to "explain before answering."

This is perfect when:
- You want to **see how the model thinks**
- You need **intermediate steps**
- You want the **user to trust the answer**

Let’s try this improved version with another example.


In [None]:
prompt = """
Q: If there are 12 apples and 4 people, how many apples does each person get?
A: Please think step by step before answering.
"""

print(generate_llama_response("You are a patient problem solver.", prompt))


✅ What you should see:

"There are 12 apples and 4 people. We divide 12 by 4. Each person gets 3 apples."

🧠 Note: This level of clarity is essential in educational tools, tutoring platforms, and even business logic applications.

### 🧾 Advanced Tip: Separate the Reasoning from the Final Answer

In real-world use cases (like legal reasoning, tutoring, or decision-making), we often want to:
- ✅ Read the logic first
- ✅ Then see a clean, final answer at the end

This format mimics **how humans explain decisions**. It’s also great when you're chaining model outputs together and need a clear delimiter for the next step.

Let’s apply this to a simple money calculation.


In [None]:
prompt = """
Q: A toy costs $15. If you buy 4 toys, how much will you spend?

Explain your reasoning first, then state the final answer at the end.
"""

print(generate_llama_response("You are an analytical assistant.", prompt))


📘 Expected:

“Each toy costs $15. 4 × 15 = $60. Final Answer: $60”

🚀 Pro move: You can even ask the model to return the final answer in a specific format (e.g., Answer: $60) for parsing in apps.
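
Here is one way such parsing might look. It is a sketch under the assumption that the reply ends with a line like `Final Answer: $60` or `Answer: $60`; the regular expression and the name `extract_final_answer` are illustrative choices, and the sample reply is the expected text quoted above, not captured model output.

```python
import re

# Matches "Final Answer: $60", "Answer: 60", "final answer: $1,250.50", ...
FINAL_ANSWER = re.compile(
    r"(?:Final\s+)?Answer:\s*\$?\s*([0-9][0-9,]*(?:\.[0-9]+)?)",
    re.IGNORECASE,
)

def extract_final_answer(reply):
    """Return the last stated numeric answer as a float, or None if absent."""
    matches = FINAL_ANSWER.findall(reply)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))

sample_reply = "Each toy costs $15. 4 × 15 = $60. Final Answer: $60"
print(extract_final_answer(sample_reply))  # 60.0
```

Taking the last match means that any intermediate "Answer:" mentioned during the reasoning does not override the final one.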

### ⚠️ The Order of Instructions Matters! (Seriously!)

LLMs generate text **token by token, left to right** — so the **way you phrase and order your instructions** affects the outcome.

This means:
- “Think step by step, then answer” 🟢 works well
- “Give me the answer first, then explain” 🔴 might skip reasoning entirely

Always put **reasoning first**, answer second — unless you want the reverse.

Let’s reinforce this idea with one last example.


In [None]:
prompt = """
Q: If a car uses 2 gallons of gas per hour and drives for 5 hours, how much gas does it use?

First, think through the problem carefully. Then, and only then, give the final number of gallons used.
"""

print(generate_llama_response("You are a thoughtful and careful AI assistant.", prompt))


🧮 Expected Output:

“The car uses 2 gallons/hour. In 5 hours, 2 × 5 = 10 gallons. Final Answer: 10 gallons.”

🧠 Why this works: the prompt steers the model to write out its reasoning before committing to a result, so the final number is generated with those steps already in context.

### 🧠 Final Thought on Chain-of-Thought Prompting

Here’s a mind-blowing fact:  
Large Language Models like LLaMA **don’t actually “think.”** They predict the **next word** based on your input — that’s it.

But when you tell them to **“think step by step”**, you’re giving them a pattern that **simulates reasoning**.

And that’s the secret of Chain-of-Thought prompting:
> You trick the model into reasoning — by modeling what reasoning looks like.

This technique is used in:
- 🧠 Complex reasoning chains
- 🧮 Math problem-solving
- 🤖 Autonomous agents (like AutoGPT)
- 💡 Multi-turn reasoning chatbots

So the next time you want smarter, more thoughtful answers — just say:
👉 “Let’s think step by step.”  
...and watch the magic happen. ✨


In [None]:
prompt = """
We have 30 people going to a conference.
Three vans are available.
Each van fits 6 people.
One person has a motorbike for 2.

Can we transport everyone using only these vehicles?

First, think step by step. Then provide a one word answer: Yes or No.
"""

response = generate_llama_response("You are a group leader",prompt)
print(response)
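
For reference, the arithmetic the model is expected to walk through can be worked out directly, which gives a ground truth to compare the one-word answer against. The variable names are illustrative.

```python
people = 30
van_count, seats_per_van = 3, 6
motorbike_seats = 2

# Total seats across all available vehicles
capacity = van_count * seats_per_van + motorbike_seats
expected_answer = "Yes" if capacity >= people else "No"

print(f"Total capacity: {capacity} seats for {people} people -> {expected_answer}")
# Total capacity: 20 seats for 30 people -> No
```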

### 🤖 Combining Techniques — Let's Chain Prompts Together!

Now that we've learned **zero-shot**, **few-shot**, **chain-of-thought**, and **role prompting**, let’s combine techniques for more control.

In this next example, we’ll:<BR>
• Assign a role  
• Use reasoning steps  
• End with a clear instruction

This helps the model deliver better answers with more context and structure — a real prompt engineering win! 🏆


In [None]:
# 👉 Combine role prompting with reasoning for better control
combined_prompt = """
You are a helpful teaching assistant. Answer the question and explain why.

Q: Why do birds migrate in winter?
A:
"""

# 👉 The model should now answer and justify its reasoning
response = generate_llama_response("You are helpful, kind, and intelligent.", combined_prompt)
print(response)


🌍 What to Look For:
A thoughtful answer like:

“Birds migrate in winter to find food and warmer climates. As temperatures drop, food becomes scarce...”

### 💥 Bonus: Prompt for Creative Writing

Prompt engineering isn't just for math and Q&A — it can spark creativity too! 🎨

Let’s use a prompt to generate a story intro and test how well the model handles open-ended narrative writing. This shows how LLMs can be used in:<br>
• Storytelling apps  
• Script writing  
• Game design and NPC behavior  

Ready to get creative? 🌈


In [None]:
# 👉 Prompt the model to write a story introduction
story_prompt = """
You are a fantasy storyteller. Begin a tale with the following line:

"In the heart of the forgotten forest, a single lantern flickered..."
"""

response = generate_llama_response("You are a master of fantasy storytelling.", story_prompt)
print(response)


📚 Expected Result:
An engaging story beginning with vivid imagery, characters, and magical setting!

### 🧪 Testing the Model with Factual Recall

Let’s see how the model performs when we prompt it with factual questions — like a quiz! 🧠

This time, we’ll just ask a direct question to see if the model can recall the correct fact from its training data.

This is great for:<br>
• Trivia-style chatbots  
• Educational assistants  
• Verifying general knowledge

Let’s try asking about a Nobel Prize winner!


In [None]:
# 👉 Ask the model a factual question about Nobel Prizes
#factual_prompt = "Who won the Nobel Prize in Literature in 2021?"
factual_prompt = "Who won the Nobel Prize in Physics in 2024?"

response = generate_llama_response("You are an expert in global awards and honors.", factual_prompt)
print(response)


📌 Tip:
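The 2024 Nobel Prize in Physics was awarded to John J. Hopfield and Geoffrey Hinton (the answer to the commented-out 2021 Literature question is Abdulrazak Gurnah). Depending on the model's training cutoff, it might guess incorrectly.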
⚠️ Always double-check factual outputs when using LLMs in knowledge-based apps!

### 🧠 Providing New Information in the Prompt — When the Model Doesn't Know

LLMs like LLaMA are trained on data up to a certain cutoff date. So, if something happened after that — like a recent sports event or election — the model **won’t know** unless you **tell it in the prompt**.

Let’s see what happens if we ask about a recent event the model likely wasn’t trained on...


In [None]:
prompt = """
Who won the Wimbledon Championship in 2024?
"""

print(generate_llama_response("You are a sports expert.", prompt))


### 🤔 What If the Model Doesn’t Know? (Dealing with Outdated Knowledge)

Here’s something important to remember:  
Large Language Models like LLaMA are trained on data from the **past** — they don’t know what happened **after their training cutoff date**. 📆

So if you ask:
> “Who won the Wimbledon Championship in 2024?”

...and the model was trained before that event, it might **guess**, say **it doesn’t know**, or confidently name the wrong player! 😅

Let’s test it out and see how it responds when we give it **no extra information**.


In [None]:
prompt = """
Who won the 2024 Men's UEFA European Championship?
"""
response = generate_llama_param(prompt)
print(response)

💡 Context injection is great for:
- Recent events 🗞️
- Company-specific knowledge 🏢
- Custom workflows or datasets 🧠

Now, let’s define our context!


- If the model was trained before the tournament took place, it may answer that the event has not been played yet, or it may simply guess.
- Another thing to **note**: a model only has information up to its training cutoff, which is always earlier than its public release date.

- You can provide the model with information about recent events directly in the prompt, in this case a short snippet about the 2024 Wimbledon results.

context = """
The 2024 Wimbledon Championship was won by Carlos Alcaraz in the men’s singles category,
and Barbora Krejčíková won the women’s singles title, both displaying incredible performances in the final matches.
"""


📌 This is the factual snippet we want the model to read before answering.

In [None]:
context = """
The 2024 Wimbledon Championship was won by Carlos Alcaraz in the men’s singles category,
and Barbora Krejčíková won the women’s singles title, both displaying incredible performances in the final matches.
"""
prompt = f"""
{context}

Q: Who won the Wimbledon Championship in 2024?
A:
"""

print(generate_llama_response("You are a sports expert who keeps up with the latest news.", prompt))


🔢 Expected Output and Explanation
📣 Expected Result:

“Carlos Alcaraz and Barbora Krejčíková won the 2024 Wimbledon Championship in men’s and women’s singles respectively.”

🎯 Why this matters:
By injecting the latest facts into the prompt, you can overcome the model’s training limitations and get timely, accurate answers — without retraining the model!

This is prompt engineering magic. ✨
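
The same pattern can be wrapped in a small reusable function. This is a sketch, not the notebook's own method: `build_context_prompt` is a hypothetical helper, and the extra instruction to answer only from the context is an added assumption intended to discourage guessing.

```python
def build_context_prompt(context, question):
    """Place the supplied facts ahead of the question in a single prompt."""
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context.strip()}\n\n"
        f"Q: {question.strip()}\nA:"
    )

snippet = "Carlos Alcaraz won the men's singles title at the 2024 Wimbledon Championships."
context_prompt = build_context_prompt(
    snippet,
    "Who won the men's singles title at Wimbledon in 2024?",
)
print(context_prompt)
```

The resulting string can be passed as the user prompt to either helper defined earlier in this notebook.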

In [None]:
context = """
The 2023 Nobel Peace Prize was awarded to Narges Mohammadi for her courageous fight against the oppression of women in Iran
and her efforts to promote human rights and freedom for all. Despite being imprisoned multiple times for her activism,
she continued to be a voice for change, advocating for the abolition of the death penalty and highlighting the conditions of political prisoners.
The Nobel Committee recognized her as a symbol of the broader movement for women’s rights and democratic reform in the region. Her award follows
the 2022 prize, which was shared by human rights defenders in Belarus, Russia, and Ukraine.
"""


In [None]:
prompt = f"""
Based on the context provided, who won the 2023 Nobel Peace Prize?
context: {context}
"""
response = generate_llama_param(prompt)
print(response)

### 📧 Prompting for Emails — Business Use Case

Prompt engineering is super helpful in professional settings too!  
You can ask the model to:

• Write emails  
• Create reports  
• Summarize long messages  
• Generate customer responses

Let’s test a prompt where the model has to write a kind but firm email reply.


In [None]:
# 👉 Ask the model to draft a polite and professional email
email_prompt = """
Write a professional email declining a job offer due to a better opportunity, while expressing appreciation.
"""

response = generate_llama_response("You are a professional HR assistant.", email_prompt)
print(response)


📬 What to Expect:
A thoughtful email like:

“Thank you very much for the opportunity. After careful consideration, I have decided to pursue another position that aligns more closely with my goals…”

### 📝 Text Summarization — Let the Model Do the Reading for You!

Let’s wrap up with one of the most **practical and popular use cases** for LLMs — **summarization**! 🎯

Instead of reading long articles or reports, we can ask the model to:<BR>
• 🔍 Pull out the key ideas  
• 📌 Create concise bullet points  
• 🧠 Generate TL;DR summaries

Summarization is powerful in:<BR>
• Productivity tools  
• News aggregators  
• Customer support platforms  
• Academic or research environments

Let’s try summarizing a short paragraph now!


In [None]:
# 👉 Define a paragraph to summarize
text_to_summarize = """
Large language models like LLaMA are revolutionizing the way humans interact with machines.
They can understand natural language, generate meaningful responses, and even simulate reasoning.
These capabilities make them valuable for a wide range of applications including education, research, business, and entertainment.
"""

# 👉 Prompt the model to create a summary
summary_prompt = f"Summarize the following text:\n\n{text_to_summarize}"

response = generate_llama_response("You are a professional summarizer.", summary_prompt)
print(response)


🧾 Expected Output Example:

"LLaMA models enable human-like interactions with machines and are useful in education, research, business, and entertainment."

🎯 Why This Matters:
Summarization helps compress information and improve decision-making by getting to the point fast. You can customize the tone, format (e.g., bullets), or length — making it flexible and efficient for real-world tasks.
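
As an illustration of that flexibility, the sketch below builds summary prompts with a chosen format, length, and tone. The function name, parameters, and wording of the instruction are assumptions made for this example.

```python
def build_summary_prompt(text, fmt="paragraph", max_items=2, tone="neutral"):
    """Build a summarization prompt with an explicit format, length, and tone."""
    if fmt == "bullets":
        shape = f"at most {max_items} bullet points"
    elif fmt == "paragraph":
        shape = f"a paragraph of at most {max_items} sentences"
    else:
        raise ValueError(f"Unsupported format: {fmt!r}")
    return f"Summarize the following text as {shape}, in a {tone} tone:\n\n{text.strip()}"

sample_text = "Large language models can understand natural language and generate responses."
bullet_prompt = build_summary_prompt(sample_text, fmt="bullets", max_items=3)
print(bullet_prompt)
```

Keeping the format options in one function makes it easier to compare how the same text is summarized under different constraints.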

In [None]:
email = """
Hi Alex,

I hope you're doing well. I wanted to bring you up to speed on our latest discussions from the AI strategy sync held on Monday.

We’ve decided to begin transitioning our customer support chatbot to use a Retrieval-Augmented Generation (RAG) architecture. This approach will allow the system to retrieve up-to-date information from our internal knowledge base and inject it into the prompt context, significantly improving response accuracy. We're currently experimenting with LLaMA 3 (8B and 70B) through Together.ai’s hosted endpoints due to their latency optimizations and cost flexibility. Hugging Face inference endpoints remain our fallback.

On the legal tech side, we’ve concluded that few-shot prompting has reached its limit for document summarization. We’ll move toward fine-tuning a smaller model (likely Mistral-7B) using a labeled corpus of 4,000+ legal documents to improve reliability and reduce token consumption. This aligns with our broader cost-reduction goals while maintaining domain-specific quality.

Another key point: we're evaluating model hosting options on AWS Bedrock and Azure ML. Initial tests on Bedrock with Anthropic's Claude and Amazon’s Titan models have shown promising stability, but GPU allocation on Azure appears to be more scalable in bursts. We'll continue benchmarking with production-like load simulations.

We're also allocating responsibilities for implementation:
- Priya and Omar will lead the RAG pipeline integration.
- Lina will coordinate dataset preparation and fine-tuning workflows.
- I'll work with DevOps to assess deployment bottlenecks and CI/CD automation for model rollouts.

Next steps:
1. Finalize model selection for chatbot and summarization by Friday.
2. Set up retrieval infrastructure and embedding index next week.
3. Prepare compliance review for data used in fine-tuning.

Let’s schedule a deep-dive call next Wednesday to align timelines and allocate resources.

Best regards,
Jordan
"""


In [None]:
prompt = f"""
Summarize this internal strategy email and extract the team's key AI decisions.
What architectural choices were made for the chatbot and legal summarization use cases?

email: {email}
"""

response = generate_llama_param(prompt)
print(response)

# What We Learned About Prompt Engineering

Congrats on making it through this hands-on tutorial! You’ve just explored a powerful skill set that will supercharge how you work with LLMs like LLaMA. 🚀

### 📚 Here's a recap of the prompt engineering strategies we covered:

🔹 **Zero-Shot Prompting** — No examples, just a task  
🔹 **Few-Shot Prompting** — Add examples to teach the model  
🔹 **In-Context Learning** — Let the model generalize from patterns  
🔹 **Chain-of-Thought (CoT)** — Ask it to explain its thinking  
🔹 **Role Prompting** — Give it a persona for more natural responses  
🔹 **Creative & Factual Prompts** — From storytelling to real-world questions  
🔹 **Business Use Cases** — Writing emails, summaries, and more

---


## ✅ Final Wrap-Up: Mastering Prompt Engineering with LLaMA 🧠✨

Congratulations on reaching the end of this hands-on journey! 🚀  
You’ve just explored the **art and science of prompt engineering**, and gained a powerful new skill to unlock the potential of Large Language Models like LLaMA. 💡

Let’s take a moment to reflect on everything we’ve learned:

---

### 🧰 Prompt Engineering Techniques You Now Master

#### 🔹 1. **Zero-Shot Prompting**
Ask a question without examples.  
✅ Great for simple, well-known tasks.

> 🧪 “Translate this sentence into French.”  
> 🎯 Direct and fast.

---

#### 🔹 2. **Few-Shot Prompting**
Provide a few labeled examples.  
✅ Ideal when the task is uncommon or has nuance.

> Example:  
> “2+2 = 4, 3+5 = 8… Now solve 7+6.”

---

#### 🔹 3. **In-Context Learning**
Let the model **learn from patterns** embedded in your prompt.  
✅ Excellent for custom logic or formats.

> Teach with examples, then ask it to continue the pattern.

---

#### 🔹 4. **Output Formatting**
Ask for structured results like:
- Bullet points
- Tables
- JSON
- Step-by-step reasoning

✅ Perfect for APIs, dashboards, and pipelines.

---

#### 🔹 5. **Role Prompting**
Assign the model a persona or job title.  
✅ Controls tone, style, and vocabulary.

> “You are a Shakespearean assistant…” 🎭  
> “You are a professional executive coach…” 👔

---

#### 🔹 6. **Context Injection**
When the model doesn’t know something (e.g., Wimbledon 2024 results), give it the answer yourself.

✅ This technique helps:
- Overcome training cut-off dates
- Add real-time or domain-specific info
- Improve factuality

---

#### 🔹 7. **Summarization**
Ask the model to condense long content into short insights.  
✅ Essential for productivity, education, and knowledge management.

> TL;DR the CEO’s email into 1 sentence ✅  
> Summarize a Wikipedia article in 3 bullet points ✅

---

#### 🔹 8. **Chain-of-Thought Prompting**
The *crown jewel* of prompt engineering.  
Ask the model to “think step by step” and watch its reasoning unfold.

✅ Great for:
- Math problems
- Logic puzzles
- Transparent decision-making
- Agent workflows

---

## 🧠 Key Takeaways

✅ Prompt engineering isn’t just about *what* you ask; it’s about *how* you ask.  
✅ Small changes in phrasing or structure can significantly improve output.  
✅ You’re not “programming” the model; you’re **collaborating** with it. 🤝  
✅ This is a creative and iterative process. Test, tweak, and try again!  
✅ Context, clarity, and creativity make a difference  
✅ Prompt engineering is part art, part science, so experiment boldly!

---
### ⚠️ Common Mistakes to Avoid

- ❌ Asking multiple tasks in one prompt without structure
- ❌ Forgetting to guide output format
- ❌ Assuming the model knows facts from after its training cutoff
- ❌ Skipping “think step by step” in reasoning prompts

Fix them by:
✅ Breaking instructions into steps  
✅ Injecting necessary context  
✅ Using role and tone consistently

### 🎯 What's Next?

💡 Try deploying these prompts in your own apps  
💡 Explore prompt tuning or fine-tuning if you want even more control  
💡 Check out more models like GPT-NeoX, Claude, or Mistral for comparison  
💡 Build smarter chatbots  
💡 Summarize long documents  
💡 Create structured outputs for apps  
💡 Design multi-step reasoning chains  
💡 Inject knowledge into the model on-the-fly  
💡 Build products powered by LLaMA or other LLMs


---

### 🙌 Thanks for Learning With Me!

If this tutorial helped you:
- Share it with a friend 💌
- Try out your own prompts and build something cool 🛠️
- Explore more advanced topics like prompt tuning or fine-tuning 🔧

Keep experimenting. Keep learning. Keep prompting.  
And remember — the smartest AI… is the one you *prompt* well. 😉  

**💬 Let’s think step by step — and build the future together.**
