# Advanced GenAI Workshop
author: King Matthew Ochoa

---
## Setup Preparation

Before we begin, we need the required Python libraries. The cell below installs them with the pip package manager.

In [1]:
!pip install tiktoken

Collecting tiktoken
  Downloading tiktoken-0.9.0-cp311-cp311-win_amd64.whl.metadata (6.8 kB)
Downloading tiktoken-0.9.0-cp311-cp311-win_amd64.whl (893 kB)
Installing collected packages: tiktoken
Successfully installed tiktoken-0.9.0


In [1]:
import tiktoken
def get_tokens(txt, model):
    encoding = tiktoken.encoding_for_model(model)
    tokens = encoding.encode(txt)
    return tokens

tokens = get_tokens("The irreplaceable vase has been broken and now irreperable.", "gpt-4")
print(tokens)


[791, 6348, 8319, 481, 93484, 706, 1027, 11102, 323, 1457, 25912, 716, 481, 13]
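
To see how the ids above map back to text, each id can be decoded on its own. This is a minimal sketch, assuming the encoding object exposes `encode` and `decode` the way tiktoken's `Encoding` does. `token_pieces` is a hypothetical helper name, not part of tiktoken.

```python
def token_pieces(text, encoding):
    """Pair every token id with the text fragment it decodes to.

    `encoding` is assumed to behave like a tiktoken Encoding:
    encode(str) -> list[int] and decode(list[int]) -> str.
    """
    return [(token_id, encoding.decode([token_id]))
            for token_id in encoding.encode(text)]


# Usage in this notebook (needs the tiktoken install from the first cell):
#   encoding = tiktoken.encoding_for_model("gpt-4")
#   for token_id, piece in token_pieces("irreplaceable", encoding):
#       print(token_id, repr(piece))
```

Printing the pieces with `repr` keeps leading spaces visible, which makes it easier to see where a word is split into several tokens.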


# Labwork
### We will need to install langchain-groq, the LangChain integration package for Groq.

In [5]:
!pip install langchain-groq
#!pip install pandas



### Saving the Groq API key you created:

In [2]:
import os
# Replace the placeholder with your own key; avoid sharing notebooks that contain a real key
your_api_key = "YOUR_GROQ_API_KEY"
os.environ['GROQ_API_KEY'] = your_api_key
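
Typing a key directly into a notebook cell makes it easy to leak when the file is shared. A minimal sketch of an alternative, assuming the key is either already in the environment or entered at run time. `ensure_api_key` and its `ask` parameter are hypothetical names used only for illustration.

```python
import os
from getpass import getpass


def ensure_api_key(name="GROQ_API_KEY", ask=getpass):
    """Return the key stored in os.environ[name], asking for it only if it is missing."""
    if not os.environ.get(name):
        # getpass hides the typed value, so the key never appears in the notebook
        os.environ[name] = ask(f"Enter {name}: ")
    return os.environ[name]


# In the notebook, call it once before creating the LLM:
#   ensure_api_key()
```

Because the function checks the environment first, re-running the cell does not prompt again within the same session.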


### Importing the modules

In [3]:
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser
import json

In [10]:
# Initialize Groq LLM
llm = ChatGroq(
    model_name="llama-3.3-70b-versatile",
    temperature=0.7
)

#setup the prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a translator from english to german. :D"""),
    ("user", "{input}")
])

prompt_num2 = ChatPromptTemplate.from_messages([
    ("system", """You are a translator from english to tagalog. :D"""),
    ("user", "{input}")
])

#define the chain
chain = prompt | llm 
chain2 = prompt_num2 | llm

#call the LLM
sentence = "Hello welcome to YBLL Workshop!"
result = chain.invoke({"input": sentence})
result.content

'Hallo, willkommen im YBLL-Workshop!'

In [8]:
sentence_num2 = 'Hello, My Name is Christian Jay Baguio'
result_num2 = chain.invoke({"input": sentence_num2})
result_num2.content

'Hallo, ich heiße Christian Jay Baguio.\n\n(I translated "Hello, My Name is" to "Hallo, ich heiße" which is a more common way to introduce oneself in German. "Ich heiße" means "My name is".)'

In [11]:
sentence_num3 = 'Hello, My Name is Christian Jay Baguio'
result_num3 = chain2.invoke({"input": sentence_num3})
result_num3.content

'Kumusta, Ang pangalan ko ay Christian Jay Baguio. \n\n(Hello, My Name is Christian Jay Baguio - translated to Tagalog)'

In [9]:
# Initialize Groq LLM
llm = ChatGroq(
    model_name="llama-3.3-70b-versatile",
    temperature=0.7
)

# Define the expected JSON structure as a Pydantic model,
# which is what the pydantic_object argument expects
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    features: list[str]

parser = JsonOutputParser(pydantic_object=Product)

# Create a simple prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", """Extract product details into JSON with this structure:
        {{
            "name": "product name here",
            "price": number_here_without_currency_symbol,
            "features": ["feature1", "feature2", "feature3"]
        }}"""),
    ("user", "{input}")
])

# Create the chain; the parser turns the model's JSON reply into a Python dict
chain = prompt | llm | parser

def parse_product(description: str) -> dict:
    """Run the chain on a product description, print the parsed JSON, and return it."""
    result = chain.invoke({"input": description})
    print(json.dumps(result, indent=2))
    return result


# Example usage
description = """The Kees Van Der Westen Speedster is a high-end, single-group espresso machine known for its precision, performance,
and industrial design. Handcrafted in the Netherlands, it features dual boilers for brewing and steaming, PID temperature control for
consistency, and a unique pre-infusion system to enhance flavor extraction. Designed for enthusiasts and professionals, it offers
customizable aesthetics, exceptional thermal stability, and intuitive operation via a lever system. The pricing is approximately $14,499
depending on the retailer and customization options."""

product = parse_product(description)

{
  "name": "Kees Van Der Westen Speedster",
  "price": 14499,
  "features": [
    "Dual boilers for brewing and steaming",
    "PID temperature control",
    "Pre-infusion system",
    "Customizable aesthetics",
    "Exceptional thermal stability",
    "Intuitive operation via a lever system"
  ]
}
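
Parsing the reply as JSON does not by itself confirm that the fields have the expected types. A minimal sketch of a follow-up check, written against the three fields requested in the prompt. `check_product` is a hypothetical helper, and the rules it enforces are assumptions drawn from the structure above.

```python
def check_product(data):
    """Return a list of problems found in a parsed product dict (empty list means OK)."""
    problems = []
    if not isinstance(data.get("name"), str):
        problems.append("name must be a string")
    price = data.get("price")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(price, bool) or not isinstance(price, (int, float)):
        problems.append("price must be a number")
    features = data.get("features")
    if not (isinstance(features, list) and all(isinstance(f, str) for f in features)):
        problems.append("features must be a list of strings")
    return problems


sample = {"name": "Kees Van Der Westen Speedster", "price": 14499,
          "features": ["Dual boilers for brewing and steaming"]}
print(check_product(sample))  # []
print(check_product({"name": "x", "price": "$5"}))
# ['price must be a number', 'features must be a list of strings']
```

Returning a list of problems instead of raising on the first one makes it easier to log everything that went wrong with a single reply.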


---

## It's your turn! 
# Activity 1
🧠  Prompt Engineering Exercise: Sentiment Analyzer
### 📦 Scenario 
#### Given the dataframe below, fill in the sentiment of each expression.

### 🎯 Objective  
#### Craft a high-quality prompt that fills out the empty `sentiment` column of the pandas dataframe.

*Hint: create a function that classifies one expression, then loop over the rows.

In [21]:
import pandas as pd

sentiment_df = pd.DataFrame([
    "This is amazing!",
    "I can't stop smiling!",
    "I'm so thrilled!",
    "I feel really down.",
    "This is tough for me.",
    "I'm just not myself today.",
    "It's just another day.",
    "I'm okay.",
    "It is what it is.",
    "This is frustrating!",
    "I'm really upset about this.",
    "I can't believe this happened!",
    "Couldn't be happier today!",
    "This just feels right.",
    "I love this!",
    "I'm heartbroken.",
    "Everything feels heavy.",
    "Wish things were different.",
    "Just going through the motions.",
    "I'm feeling indifferent.",
    "What did I do to deserve this?",
    "I'm at my wit's end!",
    "Overjoyed with this news!",
    "I'm on cloud nine!",
    "Feeling blessed and grateful.",
    "Life's full of ups and downs.",
    "Trying to stay positive.",
    "Everything's a bit overwhelming.",
    "I'm content with where I am.",
    "Finally, some peace and quiet.",
    "That was the last straw!",
    "They always let me down."
], columns= ['expression'])
sentiment_df['sentiment'] = ''

sentiment_df

|  | expression | sentiment |
|---|---|---|
| 0 | This is amazing! | |
| 1 | I can't stop smiling! | |
| 2 | I'm so thrilled! | |
| 3 | I feel really down. | |
| 4 | This is tough for me. | |
| 5 | I'm just not myself today. | |
| 6 | It's just another day. | |
| 7 | I'm okay. | |
| 8 | It is what it is. | |
| 9 | This is frustrating! | |
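
The hint in the objective (write a function and loop over the rows) can be sketched as follows. This assumes the model call is wrapped in a function that takes one expression and returns one label. `label_expressions` and the keyword-based `fake_classify` stand-in are hypothetical, and the stand-in exists only so the loop can run without an API key. The three-label set is also an assumption.

```python
def label_expressions(expressions, classify):
    """Call classify(text) for each expression and collect normalized labels."""
    allowed = {"positive", "negative", "neutral"}  # assumed label set
    labels = []
    for text in expressions:
        label = classify(text).strip().lower()
        # Anything outside the assumed label set is flagged instead of kept silently
        labels.append(label if label in allowed else "unknown")
    return labels


def fake_classify(text):
    """Stand-in for the LLM call; in the notebook this would invoke your chain."""
    return "Positive" if "amazing" in text.lower() else "Neutral "


print(label_expressions(["This is amazing!", "It is what it is."], fake_classify))
# ['positive', 'neutral']

# With the dataframe above and a chain built as `prompt | llm`, one possible wiring is:
#   classify = lambda text: chain.invoke({"input": text}).content
#   sentiment_df["sentiment"] = label_expressions(sentiment_df["expression"], classify)
```

Normalizing the label in one place means the prompt can focus on the classification itself, while stray capitalization or whitespace in the reply is handled by the loop.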


# Activity 2
🧠  Prompt Engineering Exercise: Chain-of-Thought Reasoning
### 📦 Scenario 
#### A teacher gives a riddle to her class:

“If you take the number of letters in the English word for the smallest prime number, multiply that by the position of that prime in the list of primes, and then subtract the number of letters in the word for the next prime, what do you get?”

What is the final result?

### 🎯 Objective  
#### Use the chain-of-thought (CoT) prompting technique to solve the riddle.
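
To judge whether the model's step-by-step reasoning lands on the right number, it helps to compute the expected value independently. A minimal sketch that follows the riddle's steps literally. `first_primes` and `NAMES` are hypothetical names introduced here.

```python
def first_primes(count):
    """Return the first `count` primes by trial division (fine for tiny counts)."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


# English names for the only two primes the riddle touches
NAMES = {2: "two", 3: "three"}

smallest, next_prime = first_primes(2)
position = 1  # the smallest prime is first in the list of primes
expected = len(NAMES[smallest]) * position - len(NAMES[next_prime])
print(expected)  # -2
```

Each line of the arithmetic corresponds to one reasoning step the model should show: 3 letters in "two", times position 1, minus 5 letters in "three".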

In [29]:
# Initialize Groq LLM
llm = ChatGroq(
    model_name="llama-3.3-70b-versatile",
    temperature=0.7
)

#setup the prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", """
                System prompt here.
                """),
    ("user", "{input}")
])

#define the chain
chain = prompt | llm 

#call the LLM
user_prompt = "Hello welcome to YBLL Workshop!"
result = chain.invoke({"input": user_prompt})
result

AIMessage(content="Hello. It's nice to be here at the YBLL Workshop. What's the focus of this workshop, and how can I assist or participate?", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 32, 'prompt_tokens': 47, 'total_tokens': 79, 'completion_time': 0.130159755, 'prompt_time': 0.003094667, 'queue_time': 0.24896656700000003, 'total_time': 0.133254422}, 'model_name': 'llama-3.3-70b-versatile', 'system_fingerprint': 'fp_9a8b91ba77', 'finish_reason': 'stop', 'logprobs': None}, id='run-dce49814-1225-4eed-8039-6ceb1afbe7af-0', usage_metadata={'input_tokens': 47, 'output_tokens': 32, 'total_tokens': 79})

# Activity 3
🧠 Prompt Engineering Exercise: Structured Insight Extraction


### 📦 Scenario
You are given a small dataset of customer feedback in free-form text. Your task is to design a prompt that turns these messy text entries into structured information suitable for a spreadsheet or database.

### 🎯 Objective
Craft a high-quality prompt that extracts structured insights from unstructured text data, focusing on clarity, structure, and consistency of output.

Design a single prompt that takes the list below and returns a structured table with the following columns:

| Feedback | Sentiment | Issues Mentioned | Actionable Suggestion |

1. "Shipping was delayed but the product quality was excellent."
2. "The support team was unhelpful and I never got a response to my email."
3. "Loved the fast delivery. Great material. Will definitely buy again!"
4. "Not happy with the purchase. Item didn’t match the photo."
5. "Amazing experience! Great fit, quick delivery, and friendly customer service."
6. "The shirt fabric feels cheap. It shrank after one wash."
7. "Packaging was impressive. Arrived a day early!"
8. "Customer service ignored my refund request for a week."
9. "Website was confusing. Took me a while to find the size guide."
10. "Very satisfied. My third order and still great quality."
11. "The pants were too tight even though I followed the sizing chart."
12. "Got the wrong color. Asked for black, received navy blue."
13. "Product is good, but I wish the price was more affordable."
14. "Quick checkout and smooth delivery. Thanks!"
15. "Item had a weird smell out of the box. Needed to wash it first."
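
Once the model returns the table, it still has to be turned into rows for a spreadsheet or database. A minimal sketch, assuming the reply is a pipe-delimited markdown table with the four columns listed above and that no cell contains a `|` character. `parse_table` is a hypothetical helper, and `sample_reply` is hand-written for illustration, not actual model output.

```python
def parse_table(reply):
    """Convert a pipe-delimited markdown table into a list of row dicts."""
    rows = []
    header = None
    for line in reply.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip any prose the model adds around the table
        inner = line[1:-1] if line.endswith("|") else line[1:]
        cells = [cell.strip() for cell in inner.split("|")]
        if all(cell and set(cell) <= set("-: ") for cell in cells):
            continue  # skip the |---|---| separator row
        if header is None:
            header = cells  # first table row names the columns
        elif len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


sample_reply = """Here is the table:
| Feedback | Sentiment | Issues Mentioned | Actionable Suggestion |
|---|---|---|---|
| Got the wrong color. | Negative | Wrong item color | Add a color check before shipping |
"""
rows = parse_table(sample_reply)
print(rows[0]["Sentiment"])  # Negative
```

Rows whose cell count does not match the header are dropped, so a malformed line in the reply does not shift values into the wrong column.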

In [30]:
# Initialize Groq LLM
llm = ChatGroq(
    model_name="llama-3.3-70b-versatile",
    temperature=0.7
)

#setup the prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", """
                System prompt here.
                """),
    ("user", "{input}")
])

#define the chain
chain = prompt | llm 

#call the LLM
user_prompt = "Hello welcome to YBLL Workshop!"
result = chain.invoke({"input": user_prompt})
result

AIMessage(content="Hello. It's nice to be here at the YBLL Workshop. I'm excited to learn and participate. What's the focus of the workshop, and what can I expect to learn or accomplish today?", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 43, 'prompt_tokens': 47, 'total_tokens': 90, 'completion_time': 0.156363636, 'prompt_time': 0.00267654, 'queue_time': 0.24169566, 'total_time': 0.159040176}, 'model_name': 'llama-3.3-70b-versatile', 'system_fingerprint': 'fp_3f3b593e33', 'finish_reason': 'stop', 'logprobs': None}, id='run-7d062ba4-7da4-4633-8178-2ec266c7402a-0', usage_metadata={'input_tokens': 47, 'output_tokens': 43, 'total_tokens': 90})