![email inbox](email_inbox.jpg)

Every day, professionals wade through hundreds of emails, from urgent client requests to promotional offers. It's like trying to find important messages in a digital ocean. But AI can help you stay afloat by automatically sorting emails to highlight what matters most.

You've been asked to build an intelligent email assistant with Llama that automatically classifies users' incoming emails. Your system will identify which emails need immediate attention, which are regular updates, and which are promotions that can wait or be archived.

### The Data
You'll work with a dataset of various email examples, ranging from urgent business communications to promotional offers. Here's a peek at what you'll be working with:

### email_categories_data.csv

| Column | Description |
|--------|-------------|
| email_id | A unique identifier for each email in the dataset. |
| email_content | The full email text including subject line and body. Each email follows a format of "Subject" followed by the message content on a new line. |
| expected_category | The correct classification of the email: `Priority`, `Updates`, or `Promotions`. This will be used to validate your model's performance. |
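
Because `email_content` stores the subject and the message body in a single string, it can help to separate the two before inspecting the data. The sketch below shows one way to do that. `split_email` is a hypothetical helper, not part of the dataset or of any library, and it assumes the line break may be stored either as a real newline or as the literal two-character sequence `\n`, which is how it appears when this dataset is printed.

```python
from typing import Tuple


def split_email(email_content: str) -> Tuple[str, str]:
    """Split an email into (subject, body) at the first line break."""
    # Assumption: a literal backslash-n sequence also marks a line break.
    normalized = email_content.replace("\\n", "\n")
    # partition() returns an empty body when no line break is present.
    subject, _, body = normalized.partition("\n")
    return subject.strip(), body.strip()


sample = (
    "Urgent: Server Maintenance Required\\n"
    "Our main server needs immediate maintenance due to critical errors. "
    "Please address ASAP."
)
subject, body = split_email(sample)
print(subject)  # Urgent: Server Maintenance Required
print(body)
```

Keeping the subject separate makes it easier to check later whether the model relies mostly on subject-line keywords such as "Urgent" or "Sale".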



In [15]:
# Run the following cells first
# Install necessary packages
!pip install llama-cpp-python==0.2.82 -q -q -q

In [16]:
# Download the model
!wget -q "https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF/resolve/main/tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf?download=true" -O model.gguf

In [17]:
# Import required libraries
import pandas as pd
from llama_cpp import Llama

In [64]:
# Load the email dataset
emails_df = pd.read_csv('data/email_categories_data.csv')
# Display the first few rows of our dataset
print("Preview of our email dataset:")
emails_df.head(10)

Preview of our email dataset:


|   | email_id | email_content | expected_category |
|---|----------|---------------|-------------------|
| 0 | 1 | Urgent: Server Maintenance Required\nOur main ... | Priority |
| 1 | 2 | 50% Off Spring Collection!\nDon't miss our big... | Promotions |
| 2 | 3 | Weekly Newsletter Update\nHere's your weekly r... | Updates |
| 3 | 4 | Team Meeting - Q2 Planning\nPlease join us tom... | Priority |
| 4 | 6 | Monthly Department Updates\nReview this month'... | Updates |
| 5 | 8 | New Product Launch Invitation\nJoin us for the... | Updates |
| 6 | 9 | Flash Sale - 24 Hours Only!\nEverything must g... | Promotions |
| 7 | 10 | Critical: Client Presentation Due\nThe client ... | Priority |


In [65]:
# Initialize the Llama model
model_path = "model.gguf"
llm = Llama(model_path=model_path)


llama_model_loader: loaded meta data with 20 key-value pairs and 201 tensors from model.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = py007_tinyllama-1.1b-chat-v0.3
llama_model_loader: - kv   2:                       llama.context_length u32              = 2048
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv   4:                          llama.block_count u32              = 22
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5632
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 64
llama_model_loader: - kv   7:                 llama.attention.head_count u32             

In [66]:
print(emails_df.iloc[7]['email_content'])
print(emails_df.iloc[7]['expected_category'])

Critical: Client Presentation Due\nThe client presentation for Project X is due in 2 hours. Requires immediate review.
Priority


In [91]:
# Create the system prompt with examples
# Use the last three emails in the dataframe as examples, one for each category.

# Note: a plain triple-quoted string (""" ... """) does not evaluate expressions inside {}.
# An f-string (f""" ... """) is needed to interpolate the values.

prompt = f""" You classify emails into Priority, Updates, or Promotions.

Example 1:
{emails_df.iloc[7]['email_content']}
Response: {emails_df.iloc[7]['expected_category']}

Example 2:
{emails_df.iloc[6]['email_content']}
Response: {emails_df.iloc[6]['expected_category']}

Example 3:
{emails_df.iloc[5]['email_content']}
Response: {emails_df.iloc[5]['expected_category']}

Example 4:
"""

In [92]:
print(prompt)

 You classify emails into Priority, Updates, or Promotions.

Example 1:
Critical: Client Presentation Due\nThe client presentation for Project X is due in 2 hours. Requires immediate review.
Response: Priority

Example 2:
Flash Sale - 24 Hours Only!\nEverything must go! Massive discounts on all items. Shop now before it's too late!
Response: Promotions

Example 3:
New Product Launch Invitation\nJoin us for the exclusive preview of our latest product line. RSVP required.
Response: Updates

Example 4:
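
The prompt above hard-codes three examples. If you want to vary the number of examples, the same layout can be generated from a list. This is a minimal sketch, not the notebook's method: `build_few_shot_prompt` is a hypothetical helper, and the two example pairs are copied from the printed prompt.

```python
from typing import Iterable, Tuple


def build_few_shot_prompt(instruction: str, examples: Iterable[Tuple[str, str]]) -> str:
    """Build a few-shot prompt that ends with an open 'Example N:' slot."""
    parts = [instruction, ""]
    number = 0  # stays 0 when no examples are given
    for number, (content, category) in enumerate(examples, start=1):
        parts += [f"Example {number}:", content, f"Response: {category}", ""]
    # Leave the next example open so the email to classify can be appended.
    parts.append(f"Example {number + 1}:")
    return "\n".join(parts) + "\n"


few_shot_examples = [
    (
        "Critical: Client Presentation Due\\nThe client presentation for "
        "Project X is due in 2 hours. Requires immediate review.",
        "Priority",
    ),
    (
        "Flash Sale - 24 Hours Only!\\nEverything must go! Massive discounts "
        "on all items. Shop now before it's too late!",
        "Promotions",
    ),
]
print(build_few_shot_prompt(
    "You classify emails into Priority, Updates, or Promotions.",
    few_shot_examples,
))
```

With a 2048-token context window, as reported in the model metadata, the number of examples that fit is limited, so a helper like this makes it easier to experiment with shorter or longer prompts.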



In [102]:
# Function to process messages and return classifications
def process_message(llm, message, prompt):
    """Process a message and return the response"""
    input_prompt = f"{prompt} {message}\nResponse:"
    print(f"input_prompt:{input_prompt}")
    response = llm(
        input_prompt,
        max_tokens=10,
        temperature=0,
        # Stop sequences end generation right after the classification:
        #   "\n"        prevents the model from generating further lines.
        #   "Example"   keeps it from continuing to the next example.
        #   "Response:" stops it if it starts another response label.
        stop=["\n", "Example", "Response:"],
    )
    
    print(f"response:{response}\n")
    
    return response['choices'][0]['text'].strip()
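
`process_message` returns whatever text the model produced, stripped of whitespace. A small model may not always answer with exactly one of the three labels, so it can be useful to map the raw text onto a known category before comparing it with `expected_category`. The sketch below is one possible approach; `normalize_category`, `VALID_CATEGORIES`, and the `"Unknown"` fallback are assumptions introduced for illustration.

```python
VALID_CATEGORIES = ("Priority", "Updates", "Promotions")


def normalize_category(raw_output: str, default: str = "Unknown") -> str:
    """Map raw model text onto one of the valid categories, ignoring case."""
    cleaned = raw_output.strip().lower()
    for category in VALID_CATEGORIES:
        # Accept answers such as "promotions." or "Priority email".
        if cleaned.startswith(category.lower()):
            return category
    # Anything else is flagged instead of being silently accepted.
    return default


print(normalize_category(" Promotions"))  # Promotions
print(normalize_category("priority."))    # Priority
print(normalize_category("Spam"))         # Unknown
```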

In [96]:
# Test the classifier on the first two emails in the dataset,
# which belong to different categories (Priority and Promotions)
test_emails = emails_df.head(2)

# Process each test email and store results
results = []
for idx, row in test_emails.iterrows():
    email_content = row['email_content']
    expected_category = row['expected_category']
    
    # Get model's classification
    result = process_message(llm, email_content, prompt)
    
    print(f"result:{result}\n")
    
    # Store results
    results.append({
        'email_content': email_content,
        'expected_category': expected_category,
        'model_output': result
    })

input_prompt: You classify emails into Priority, Updates, or Promotions.

Example 1:
Critical: Client Presentation Due\nThe client presentation for Project X is due in 2 hours. Requires immediate review.
Response: Priority

Example 2:
Flash Sale - 24 Hours Only!\nEverything must go! Massive discounts on all items. Shop now before it's too late!
Response: Promotions

Example 3:
New Product Launch Invitation\nJoin us for the exclusive preview of our latest product line. RSVP required.
Response: Updates

Example 4:
 Urgent: Server Maintenance Required\nOur main server needs immediate maintenance due to critical errors. Please address ASAP.
Response:


Llama.generate: prefix-match hit

llama_print_timings:        load time =   10498.21 ms
llama_print_timings:      sample time =       0.64 ms /     4 runs   (    0.16 ms per token,  6269.59 tokens per second)
llama_print_timings: prompt eval time =    1556.45 ms /    31 tokens (   50.21 ms per token,    19.92 tokens per second)
llama_print_timings:        eval time =     309.23 ms /     3 runs   (  103.08 ms per token,     9.70 tokens per second)
llama_print_timings:       total time =    1868.16 ms /    34 tokens
Llama.generate: prefix-match hit


response:{'id': 'cmpl-9f17ba36-17db-4805-bca1-eaa09d6e12a7', 'object': 'text_completion', 'created': 1741299376, 'model': 'model.gguf', 'choices': [{'text': ' Promotions', 'index': 0, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 184, 'completion_tokens': 4, 'total_tokens': 188}}

result:Promotions

input_prompt: You classify emails into Priority, Updates, or Promotions.

Example 1:
Critical: Client Presentation Due\nThe client presentation for Project X is due in 2 hours. Requires immediate review.
Response: Priority

Example 2:
Flash Sale - 24 Hours Only!\nEverything must go! Massive discounts on all items. Shop now before it's too late!
Response: Promotions

Example 3:
New Product Launch Invitation\nJoin us for the exclusive preview of our latest product line. RSVP required.
Response: Updates

Example 4:
 50% Off Spring Collection!\nDon't miss our biggest sale of the season! All spring items half off. Limited time offer.
Response:



llama_print_timings:        load time =   10498.21 ms
llama_print_timings:      sample time =      88.89 ms /     4 runs   (   22.22 ms per token,    45.00 tokens per second)
llama_print_timings: prompt eval time =    2708.56 ms /    33 tokens (   82.08 ms per token,    12.18 tokens per second)
llama_print_timings:        eval time =     305.25 ms /     3 runs   (  101.75 ms per token,     9.83 tokens per second)
llama_print_timings:       total time =    3104.44 ms /    36 tokens


response:{'id': 'cmpl-79f45d08-d78e-4a20-a6ff-cfa039c6685c', 'object': 'text_completion', 'created': 1741299378, 'model': 'model.gguf', 'choices': [{'text': ' Promotions', 'index': 0, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 186, 'completion_tokens': 4, 'total_tokens': 190}}

result:Promotions



In [103]:
print(results)

result1 = results[0]['model_output']
result2 = results[1]['model_output']

# Display results
print(f"\nClassification Results: \n email 1: {result1} \n email 2: {result2}")

[{'email_content': 'Urgent: Server Maintenance Required\\nOur main server needs immediate maintenance due to critical errors. Please address ASAP.', 'expected_category': 'Priority', 'model_output': 'Promotions'}, {'email_content': "50% Off Spring Collection!\\nDon't miss our biggest sale of the season! All spring items half off. Limited time offer.", 'expected_category': 'Promotions', 'model_output': 'Promotions'}]

Classification Results: 
 email 1: Promotions 
 email 2: Promotions
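
Since `expected_category` exists to validate the model, the stored results can be scored directly. The following is a minimal sketch: `accuracy` is a hypothetical helper, and `sample_results` repeats only the two labelled outcomes printed above so that the snippet runs on its own.

```python
from typing import Dict, List


def accuracy(results: List[Dict[str, str]]) -> float:
    """Return the fraction of results whose model output matches the expected label."""
    if not results:
        return 0.0
    correct = sum(r["expected_category"] == r["model_output"] for r in results)
    return correct / len(results)


# The two outcomes from the test run above (email text omitted for brevity).
sample_results = [
    {"expected_category": "Priority", "model_output": "Promotions"},
    {"expected_category": "Promotions", "model_output": "Promotions"},
]
print(f"Accuracy: {accuracy(sample_results):.0%}")  # Accuracy: 50%
```

Two emails are far too few to judge the classifier, but the same function works unchanged on results collected from the whole dataset.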
