# Programming Assignment: Developing a RAG-based Chatbot
---
Welcome to your fourth assignment in the RAG course! Having acquired foundational techniques, you are now ready to embark on building a more sophisticated Retrieval-Augmented Generation (RAG) system. In this assignment, you will undertake the following tasks:

- **LLM routing**: Develop functions that categorize each query, creating a router that applies a different treatment depending on the nature of the query.
- **Conditional parameter setting**: Create methods to determine if a user's query is creative or technical, so that the LLM's sampling parameters can be adjusted to give the best answer.
- **Producing JSON Responses**: Program the LLMs to generate valid JSON responses with product information, making sure the output is structured for further processing if needed.
- **Adding Contextual Information**: Include relevant data in queries before they are handled by the LLM.
- **Chatbot Development**: Create a chatbot that can interact with users in a natural and efficient way, answering their questions clearly.

In this assignment, you will apply the skills you've learned to build a RAG system that powers a chatbot.


# Table of Contents
- [ 1 - Introduction: Your Mission at Fashion Forward Hub](#1)
  - [ 1.1 Importing the libraries](#1-1)
  - [ 1.2 Loading the Weaviate client](#1-2)
- [ 2 - Understanding Fashion Forward Hub data schema](#2)
  - [ 2.1 Products Database](#2-1)
  - [ 2.2 FAQ Database](#2-2)
- [ 3 - Task routing](#3)
  - [ 3.1 Deciding if a query is FAQ or Product related](#3-1)
    - [ Exercise 1](#ex01)
  - [ 3.2 Answering a FAQ question](#3-2)
    - [ Exercise 2](#ex02)
  - [ 3.3 Decide the Nature of a Product-Related Question](#3-3)
    - [ Exercise 3](#ex03)
  - [ 3.4 Retrieving the Parameters for a Given Task](#3-4)
    - [ Exercise 4](#ex04)
- [ 4 - Retrieving Items Based on Metadata Inferred from a Query](#4)
  - [ 4.1 Generate metadata](#4-1)
    - [ Exercise 5](#ex05)
  - [ 4.2 Loading the Weaviate Product Collection](#4-2)
  - [ 4.3 Filtering by metadata (NOT GRADED)](#4-3)
  - [ 4.4 Generating the retrieve items as a context (NOT GRADED)](#4-4)
  - [ 4.5 Query on Products (NOT GRADED)](#4-5)
- [ 5 - The Final Function!](#5)
  - [ 5.1 The function to rule them all](#5-1)
  - [ 5.2 The ChatBot](#5-2)


---
<h4 style="color:black; font-weight:bold;">USING THE TABLE OF CONTENTS</h4>

JupyterLab provides an easy way for you to navigate through your assignment. It's located under the Table of Contents tab, found in the left panel, as shown in the picture below.

![TOC Location](images/toc.png)

---

<h4 style="color:green; font-weight:bold;">TIPS FOR SUCCESSFUL GRADING OF YOUR ASSIGNMENT:</h4>

- All cells are frozen except for the ones where you need to submit your solutions or where it is explicitly mentioned that you can interact with them.

- You can add new cells to experiment, but these will be omitted by the grader. Don't rely on newly created cells to host your solution code; use the provided places for this.

- Avoid using global variables unless you absolutely have to. The grader tests your code in an isolated environment without running all cells from the top. As a result, global variables may be unavailable when scoring your submission. Global variables that are meant to be used will be defined in UPPERCASE.

- To submit your notebook for grading, first save it by clicking the 💾 icon on the top left of the page and then click on the <span style="background-color: blue; color: white; padding: 3px 5px; font-size: 16px; border-radius: 5px;">Submit assignment</span> button on the top right of the page.
---

<a id='1'></a>
## 1 - Introduction: Your Mission at Fashion Forward Hub
---

Great news! You've been brought on board by Fashion Forward Hub, an online clothing store always looking for the latest technology. They need your help to create a smart chatbot for their website. This chatbot will answer common questions, provide details about products, and help customers pick out outfits.

Using what you've learned so far in this course, you'll apply your skills in Retrieval-Augmented Generation (RAG) to make this chatbot a reality. The tools and techniques you've been exploring will come together as you build a system that makes shopping easier and more fun. Get ready to show how technology can transform the customer experience at Fashion Forward Hub!

<a id='1-1'></a>
### 1.1 Importing the libraries


In [2]:
import json
from weaviate.classes.query import Filter
import weaviate
import joblib

In [3]:
import unittests
import flask_app
import weaviate_server
from utils import (
    ChatWidget,
    generate_with_single_input,
    generate_params_dict
)

 * Serving Flask app 'flask_app'
 * Debug mode: off


<a id='1-2'></a>
### 1.2 Loading the Weaviate client

In this assignment you will again use the Weaviate API to access the vector database. Don't worry, you won't need to build the database yourself. It is already provided!

In [46]:
client = weaviate.connect_to_local(port=8079, grpc_port=50050)

### 1.3 The `generate_params_dict` function

Let's recap the function you worked on in the ungraded lab to generate a dictionary of parameters:

```Python
def generate_params_dict(
    prompt: str, 
    temperature: float = None, 
    role = 'user',
    top_p: float = None,
    max_tokens: int = 500,
    model: str = "meta-llama/Llama-3.2-3B-Instruct-Turbo"
)
```

In [5]:
# An output example is
kwargs = generate_params_dict("Solve x^2 - 1 = 0", temperature = 1.2, top_p = 0.2)
print(kwargs)

{'prompt': 'Solve x^2 - 1 = 0', 'role': 'user', 'temperature': 1.2, 'top_p': 0.2, 'max_tokens': 500, 'model': 'meta-llama/Llama-3.2-3B-Instruct-Turbo'}
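To make the recap concrete, here is a minimal sketch of how such a helper could be written. It is not the actual implementation from `utils`: the name `sketch_generate_params_dict` is hypothetical, and dropping parameters that are left as `None` is an assumption, since the example above only shows a call where both `temperature` and `top_p` are set.

```python
from typing import Any, Dict, Optional


def sketch_generate_params_dict(
    prompt: str,
    temperature: Optional[float] = None,
    role: str = "user",
    top_p: Optional[float] = None,
    max_tokens: int = 500,
    model: str = "meta-llama/Llama-3.2-3B-Instruct-Turbo",
) -> Dict[str, Any]:
    """Hypothetical stand-in for utils.generate_params_dict."""
    params = {
        "prompt": prompt,
        "role": role,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "model": model,
    }
    # Assumption: unset (None) sampling parameters are left out of the dictionary.
    return {key: value for key, value in params.items() if value is not None}


example = sketch_generate_params_dict("Solve x^2 - 1 = 0", temperature=1.2, top_p=0.2)
print(example)
```

The resulting dictionary can be unpacked with `**` into any function that accepts these keyword arguments, which is the pattern used in the next cell.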


In [6]:
# Generating 
response = generate_with_single_input(**kwargs)
print(response['content'])

To solve the equation x^2 - 1 = 0, we can start by adding 1 to both sides:

x^2 - 1 + 1 = 0 + 1

This simplifies to:

x^2 = 1

Next, we can take the square root of both sides:

x = √1

x = ±1

So, the solutions to the equation x^2 - 1 = 0 are x = 1 and x = -1.


<a id='2'></a>
## 2 - Understanding Fashion Forward Hub data schema
---
In this section, you will learn how the data is stored in the Fashion Forward Hub databases.

There are two databases:

- Product database: Contains the products and their information.
- FAQ database: Contains the FAQ data.

<a id='2-1'></a>
### 2.1 Products Database

Let's explore the products database that Fashion Forward Hub has available. To make it easier to understand, let's first load it as a list of dictionaries (JSON-like records).

In [7]:
# Loading products data
PRODUCTS_DATA = joblib.load('dataset/clothes_json.joblib')

In [8]:
# Let's get one example
PRODUCTS_DATA[0]

{'gender': 'Men',
 'masterCategory': 'Apparel',
 'subCategory': 'Topwear',
 'articleType': 'Shirts',
 'baseColour': 'Navy Blue',
 'season': 'Fall',
 'year': 2011.0,
 'usage': 'Casual',
 'productDisplayName': 'Turtle Check Men Navy Blue Shirt',
 'price': 67,
 'product_id': 15970}

The features each product has are:

- **Gender:** Target audience for the product, such as "Men," "Women," or "Unisex."
- **Master Category:** Broad classification like "Apparel" or "Footwear."
- **Sub Category:** Specific category within a master category, such as "Topwear."
- **Article Type:** Exact type of product, e.g., "Shirts" or "Jackets."
- **Base Colour:** Main color of the product, important for customer choice.
- **Season:** Intended season for the product, e.g., "Summer" or "Winter."
- **Year:** Year of release or collection.
- **Usage:** Intended use or occasion, like "Casual" or "Formal."
- **Product Display Name:** Descriptive name used in marketing.
- **Price:** Cost of the product.
- **Product ID:** Unique identifier for managing and tracking inventory.
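Since each product is a plain dictionary with the fields listed above, simple filters can be expressed in a few lines of Python. The sketch below is only illustrative: `filter_products` is a hypothetical helper, and the second record in `SAMPLE_PRODUCTS` is made up for the example (the first one is the record shown above).

```python
from typing import Any, Dict, List, Optional

SAMPLE_PRODUCTS: List[Dict[str, Any]] = [
    # Record taken from PRODUCTS_DATA[0] above
    {"gender": "Men", "masterCategory": "Apparel", "subCategory": "Topwear",
     "articleType": "Shirts", "baseColour": "Navy Blue", "season": "Fall",
     "year": 2011.0, "usage": "Casual",
     "productDisplayName": "Turtle Check Men Navy Blue Shirt",
     "price": 67, "product_id": 15970},
    # Hypothetical record, invented only for this illustration
    {"gender": "Women", "masterCategory": "Apparel", "subCategory": "Dress",
     "articleType": "Dresses", "baseColour": "Blue", "season": "Summer",
     "year": 2012.0, "usage": "Casual",
     "productDisplayName": "Example Women Blue Dress",
     "price": 120, "product_id": 1},
]


def filter_products(
    products: List[Dict[str, Any]],
    max_price: Optional[float] = None,
    **field_values: Any,
) -> List[Dict[str, Any]]:
    """Keep products whose fields equal the given values and whose price is within max_price."""
    matches = []
    for product in products:
        if max_price is not None and product["price"] > max_price:
            continue
        if all(product.get(field) == value for field, value in field_values.items()):
            matches.append(product)
    return matches


cheap_menswear = filter_products(SAMPLE_PRODUCTS, max_price=100, gender="Men")
print([product["productDisplayName"] for product in cheap_menswear])
```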

<a id='2-2'></a>
### 2.2 FAQ Database

Now let's load the FAQ database and explore it.

In [9]:
FAQ = joblib.load("dataset/faq.joblib")

In [10]:
# Get an example
FAQ[:2]

[{'question': 'What are your store hours?',
  'answer': 'Our online store is open 24/7. Customer service is available from 9:00 AM to 6:00 PM, Monday through Friday.',
  'type': 'general information'},
 {'question': 'Where is Fashion Forward Hub located?',
  'answer': 'Fashion Forward Hub is primarily an online store. Our corporate office is located at 123 Fashion Lane, Trend City, Style State.',
  'type': 'general information'}]

So the FAQ is a list of dictionaries, each containing `question`, `answer` and `type`. In this assignment you will hardcode the FAQ as a string inside a prompt, so you won't need a collection to query it.
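A quick way to get a feel for a list like this is to count how many entries fall under each `type`. The snippet below is a small sketch that uses three entries quoted in this notebook (`SAMPLE_FAQ` is a hypothetical name); with the real data you would pass `FAQ` instead.

```python
from collections import Counter

# Three entries quoted in this notebook; the real FAQ list is longer.
SAMPLE_FAQ = [
    {"question": "What are your store hours?",
     "answer": "Our online store is open 24/7. Customer service is available "
               "from 9:00 AM to 6:00 PM, Monday through Friday.",
     "type": "general information"},
    {"question": "Where is Fashion Forward Hub located?",
     "answer": "Fashion Forward Hub is primarily an online store. Our corporate "
               "office is located at 123 Fashion Lane, Trend City, Style State.",
     "type": "general information"},
    {"question": "What is your return policy timeframe?",
     "answer": "We accept returns within 30 days of delivery. Conditions apply "
               "for specific categories like accessories.",
     "type": "returns and exchanges"},
]

# Count how many FAQ entries belong to each type
type_counts = Counter(entry["type"] for entry in SAMPLE_FAQ)
for faq_type, count in type_counts.items():
    print(f"{faq_type}: {count}")
```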

<a id='3'></a>
## 3 - Task routing
---
<a id='3-1'></a>
### 3.1 Deciding if a query is FAQ or Product related


In this section, you will start building the framework for your RAG system. 

The idea is to create a router that decides whether an instruction is about the FAQ or about products. Product queries cover both specific product information and creative requests, such as creating a look for a specific occasion.

This function takes a user's query as input and outputs whether it is FAQ or Product related.
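The routing idea can be sketched independently of any LLM: a classifier produces a label, and the label selects which handler processes the query. Everything below is hypothetical and for illustration only. In particular, `toy_classifier` is a keyword-based stand-in for the LLM call you will write, and the handlers just return strings.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[str], str]


def route_query(
    query: str,
    classify: Callable[[str], Optional[str]],
    handlers: Dict[str, Handler],
    fallback: Handler,
) -> str:
    """Classify the query and dispatch it to the matching handler."""
    label = classify(query)
    # Unknown labels (including None) are sent to the fallback handler
    handler = handlers.get(label, fallback)
    return handler(query)


def toy_classifier(query: str) -> Optional[str]:
    """Keyword-based stand-in for the LLM classifier (illustration only)."""
    lowered = query.lower()
    if "return" in lowered or "refund" in lowered:
        return "FAQ"
    if "shirt" in lowered or "dress" in lowered:
        return "Product"
    return None


HANDLERS: Dict[str, Handler] = {
    "FAQ": lambda q: f"[FAQ flow] {q}",
    "Product": lambda q: f"[Product flow] {q}",
}


def fallback_handler(query: str) -> str:
    return "[Fallback] Could not classify the query."


for q in ["What is your return policy?", "Do you have blue Dresses?", "Hello!"]:
    print(route_query(q, toy_classifier, HANDLERS, fallback_handler))
```

Keeping the classifier as an argument makes it easy to swap the toy version for an LLM-backed one later without touching the dispatch logic.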

<a id='ex01'></a>
### Exercise 1

In the function `check_if_faq_or_product`, you need to create a hardcoded `prompt` variable containing the context that instructs the model to behave properly, i.e., to correctly decide which category the query belongs to.

**Hints**: 
- Be specific, instruct the LLM to provide only two classes - FAQ or Product - **They must have these exact names**. 
- Add examples with the correct label (try adding questions that the LLM might struggle with).
- Restrict the max_tokens to 1.
- Do not forget to include the query in the prompt! 
- Set low temperatures to avoid too much randomness.
- If the LLM fails to return FAQ or Product, return `None`, so you can handle it later.
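The last hint, returning `None` for unexpected outputs, can be handled with a small normalization step applied to whatever text the model returns. This is a sketch under the assumption that the raw output is a plain string. `normalize_label` and `VALID_LABELS` are hypothetical names, and tolerating case differences or trailing punctuation is a design choice rather than a grader requirement.

```python
from typing import Optional

VALID_LABELS = ("FAQ", "Product")


def normalize_label(raw_output: str) -> Optional[str]:
    """Map raw model output to 'FAQ', 'Product', or None if it is neither."""
    # Remove surrounding whitespace, quotes and full stops, then compare case-insensitively
    cleaned = raw_output.strip().strip("\"'.").lower()
    for label in VALID_LABELS:
        if cleaned == label.lower():
            return label
    return None


print(normalize_label(" FAQ\n"))
print(normalize_label("product."))
print(normalize_label("I think it is a FAQ"))
```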

<details>
    <summary style="color: green;"><strong>Hint 1</strong></summary>
    Start by focusing on how to construct the prompt string. You'll need to include clear instructions and incorporate the provided query into it to ensure the LLM knows what to categorize.
</details>

<details>
  <summary style="color: green;"><strong>Hint 2</strong></summary>
  <p>When creating the <code>prompt</code>, include several examples of questions with their expected labels.</p>
  <p>This will help guide the LLM on how to categorize different kinds of queries as either FAQ or Product-related.</p>
  <p>You can start with something like <code>Label the following instruction as an FAQ related answer or a product related answer.</code> You might want to add one or two sentences explaining what an FAQ and product queries are. Then, include a few examples with their respective desired label.</p>
  <p>E.g., <code>Is there a refund for incorrectly bought clothes? Label: FAQ</code>.</p>
  <p>Try writing 4–5 examples that cover a variety of possibilities. At the end, write the query you want the model to decide. <b>Important: Don't forget to explicitly write that the model should output only one word.</b> Something like: <code>Return only one of the two labels: FAQ or Product.</code></p>
  <p>Add the query at the end.</p>
</details>


<details>
    <summary style="color: green;"><strong>Hint 3</strong></summary>
    After setting up the <code>prompt</code>, generate the parameters required for the LLM call. Use `generate_params_dict` with a low temperature (e.g., 0). Then, call `generate_with_single_input` using these parameters. Finally, check if the label is <code>FAQ</code> or <code>Product</code>.
</details>

<details>
    <summary style="color: green;"><strong>Prompt Example</strong></summary>
    <p>Here is a simple prompt example that you can use as a starting point. You may add more examples!</p>
    <pre><code>
Label the following instruction as an FAQ-related query or a product-related query.
Product-related answers are specific to product information or require using product details to answer. Products are clothes from a store. 
An FAQ question addresses common inquiries and provides answers to help users find the information they need.
Examples:
        Is there a refund for incorrectly bought clothes? Label: FAQ
        Tell me about the cheapest T-shirts that you have. Label: Product
        Do you have blue T-shirts under 100 dollars? Label: Product
        I bought a T-shirt and I didn't like it. How can I get a refund? Label: FAQ

Return only one of the two labels: FAQ or Product.
Instruction: {query}
    </code></pre>
</details>

In [10]:
# GRADED CELL 

def check_if_faq_or_product(query: str) -> str:
    """
    Determines whether a given instruction prompt is related to a frequently asked question (FAQ) or a product inquiry.

    Parameters:
    - query (str): The instruction or query to be labeled as either FAQ or product-related.

    Returns:
    - str: The label 'FAQ' if the prompt is classified as a frequently asked question, 'Product' if it relates to product information, or
      None if the label is inconclusive.
    """
    ### START CODE HERE ###

    # Set the hardcoded prompt. Remember to include the query, clear instructions (explicitly tell the LLM to return FAQ or Product)
    # Include examples of question / desired label pairs.
    
    prompt = f"""
    Determine the category of the following query as either "FAQ" or "Product" inquiry related for a retail company called Fashion Forward Hub.
  - FAQ queries: These are related to frequently asked questions such as store opening time, closing time,  store locations.
  - Product queries: These are related to product information such as colour, price and gender. 
  
  Examples:

1. Query: “What are your store opening hours?” Expected answer: FAQ
2. Query: “Is there a Fashion Forward hub located in London?” Expected answer: FAQ
3. Query: “What is the price of this shirt?” Expected answer: Product
4. Query: “What colour are these shirts available in?” Expected answer: Product
5. Query: “Is there a toilet in Fashion Forward Hub store” Expected answer: FAQ
6. Query: “What season can I wear this shirt” Expected answer: Product 

Query: {query}

Instructions: Respond with “FAQ” if the query pertains to FAQ queries or “Product” if it pertains to Product queries.
Answer only one single word.
"""

    # Get the kwargs dictionary to call the LLM, passing the prompt and a low temperature (0.3 - 0.5)
    # to reduce randomness in the classification
    kwargs = generate_params_dict(prompt, temperature=0.4, top_p=0.4)

    # Call generate_with_single_input with **kwargs
    response = generate_with_single_input(**kwargs)

    # Get the label by accessing the 'content' key of the response dictionary,
    # removing any surrounding whitespace the model may have added
    label = response['content'].strip()

    # If the model returned anything other than the two allowed labels, return None
    if label not in ('FAQ', 'Product'):
        label = None

    ### END CODE HERE ###
    
    return label


In [15]:
queries = ['What is your return policy?', 
           'Give me three examples of blue T-shirts you have available.', 
           'How can I contact the user support?', 
           'Do you have blue Dresses?',
           'Create a look suitable for a wedding party happening during dawn.']

for query in queries:
    response = check_if_faq_or_product(query)
    label = response
    print(f"Query: {query} Label: {label}")

Query: What is your return policy? Label: FAQ
Query: Give me three examples of blue T-shirts you have available. Label: Product
Query: How can I contact the user support? Label: FAQ


Exception: Error while calling LLM: 504 Gateway Timeout ERROR. The request could not be satisfied.

**Expected output:**
```
Query: What is your return policy? Label: FAQ
Query: Give me three examples of blue T-shirts you have available. Label: Product
Query: How can I contact the user support? Label: FAQ
Query: Do you have blue Dresses? Label: Product
Query: Create a look suitable for a wedding party happening during dawn. Label: Product
```

In [17]:
unittests.test_check_if_faq_or_product(check_if_faq_or_product)

 All tests passed!


<a id='3-2'></a>
### 3.2 Answering a FAQ question

Now that you have a method to decide whether a query is for FAQ or Product, you will create another function to answer a FAQ question.

This function also needs a hardcoded prompt and the FAQ question and answer pairs. For that, you will create a FAQ layout with these pairs. 

First, let's recall the structure of an FAQ entry.

In [18]:
# print the structure of the first element
FAQ[0]

{'question': 'What are your store hours?',
 'answer': 'Our online store is open 24/7. Customer service is available from 9:00 AM to 6:00 PM, Monday through Friday.',
 'type': 'general information'}

<a id='3-2-1'></a>

#### 3.2.1 Creating the FAQ Layout

Now you will generate the FAQ layout as discussed above.

The FAQ Layout will be the following:

```
Question: FAQ Question 1 Answer: FAQ Answer 1 Type: FAQ Type 1
...
Question: FAQ Question 25 Answer: FAQ Answer 25 Type: FAQ Type 25
```

This function is given to you. Feel free to change the FAQ layout if you want to - but do so after finishing the assignment!

In [19]:
def generate_faq_layout(faq_dict: list) -> str:
    """
    Generates a formatted string layout for a list of FAQs.

    This function iterates through a list of frequently asked questions (FAQs) and constructs
    a string where each question is followed by its corresponding answer and type.

    Parameters:
    - faq_dict (list): A list of dictionaries, each containing keys 'question', 'answer', and 'type' 
      representing an FAQ entry.

    Returns:
    - str: A string representing the formatted layout of FAQs, with each entry on a separate line.
    """
    # Initialize an empty string
    t = ""

    # Iterate over every FAQ question in the FAQ list
    for f in faq_dict:
        # Append the question with formatted string (remember to use f-string and access the values as f['question'], f['answer'] and so on)
        # Also, do not forget to add a new line character (\n) at the end of each line.
        t += f"Question: {f['question']} Answer: {f['answer']} Type: {f['type']}\n" 
  

    return t

In [20]:
FAQ_LAYOUT = generate_faq_layout(FAQ)
print(FAQ_LAYOUT[:1000])

Question: What are your store hours? Answer: Our online store is open 24/7. Customer service is available from 9:00 AM to 6:00 PM, Monday through Friday. Type: general information
Question: Where is Fashion Forward Hub located? Answer: Fashion Forward Hub is primarily an online store. Our corporate office is located at 123 Fashion Lane, Trend City, Style State. Type: general information
Question: Do you have a physical store location? Answer: At this time, we operate exclusively online. This allows us to offer a broader selection and lower prices directly to you. Type: general information
Question: How can I create an account with Fashion Forward Hub? Answer: Click on 'Sign Up' in the top right corner of our website and follow the instructions to set up your account. Type: general information
Question: How do I subscribe to your newsletter? Answer: To receive the latest updates and promotions, sign up for our newsletter at the bottom of our homepage. Type: general information
Question:
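If you later experiment with the layout, one common alternative to repeated string concatenation is to build the lines in a list and join them once. The sketch below is intended to produce the same layout as the function above. `build_faq_layout` is a hypothetical name, and the single sample entry is the first FAQ shown earlier.

```python
from typing import Dict, List


def build_faq_layout(faq_entries: List[Dict[str, str]]) -> str:
    """Build one 'Question: ... Answer: ... Type: ...' line per FAQ entry."""
    lines = [
        f"Question: {entry['question']} Answer: {entry['answer']} Type: {entry['type']}"
        for entry in faq_entries
    ]
    # Each entry ends with a newline, matching the layout used in this notebook
    return ("\n".join(lines) + "\n") if lines else ""


sample_entries = [
    {"question": "What are your store hours?",
     "answer": "Our online store is open 24/7. Customer service is available "
               "from 9:00 AM to 6:00 PM, Monday through Friday.",
     "type": "general information"},
]

layout = build_faq_layout(sample_entries)
print(layout)
```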

<a id='ex02'></a>
### Exercise 2
---
Great! Now that you have the FAQ layout ready, your next task is to create a function that answers questions based on the FAQ content. You’ll inject the FAQ layout into a new, hardcoded prompt. How you do this is up to you—one common approach is to wrap the FAQ layout using the `<FAQ> </FAQ>` tags.

Be sure to write a prompt with clear, explicit instructions. Also, **don’t forget to include the FAQ layout** you created earlier! You can insert it into your f-string using `{FAQ_LAYOUT}` like this:

```python
f"FAQ LAYOUT: {FAQ_LAYOUT}"
```
---
**How will you be graded?**  
The grader will check if your prompt includes the original question and whether your function correctly matches it to the relevant FAQ answers.
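As an illustration of the tag-wrapping approach, the sketch below assembles a prompt from a query and an FAQ layout string. It is one possible wording, not the required one. `build_faq_prompt` is a hypothetical helper that takes the layout as an argument so the example runs on its own, whereas in the graded function you would use the `FAQ_LAYOUT` variable directly.

```python
def build_faq_prompt(query: str, faq_layout: str) -> str:
    """Assemble an FAQ-answering prompt with the FAQ wrapped in <FAQ> tags."""
    return (
        "You will be provided with an FAQ for a clothing store.\n"
        "Answer the question based solely on it. You may combine more than one "
        "FAQ entry. Only answer the question asked and do not mention that you "
        "have access to an FAQ.\n"
        "<FAQ>\n"
        f"{faq_layout}\n"
        "</FAQ>\n"
        f"Question: {query}"
    )


demo_layout = (
    "Question: What is your return policy timeframe? "
    "Answer: We accept returns within 30 days of delivery. "
    "Conditions apply for specific categories like accessories. "
    "Type: returns and exchanges"
)
demo_prompt = build_faq_prompt("How can I return an item?", demo_layout)
print(demo_prompt)
```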

<details>
    <summary style="color: green;"><strong>Hint 1</strong></summary>
    Think about how to effectively integrate the FAQ content into your prompt. The variable <code>FAQ_LAYOUT</code> already holds the FAQ data, so you can include it directly in your prompt using an f-string. For example, start with something like <code>You will be provided with an FAQ for a clothing store.</code> Then, give the model clear instructions, such as:
    <ol>
        <li>Use multiple FAQ answers if necessary.</li>
        <li>Only answer the question asked.</li>
        <li>Base the answer solely on the provided FAQ.</li>
    </ol>
</details>

<details>
    <summary style="color: green;"><strong>Hint 2</strong></summary>
    Use a placeholder like <code>{FAQ_LAYOUT}</code> in your prompt to include the FAQ data. This ensures the LLM has the full context to generate accurate answers. A helpful format might be:  
    <code>&lt;FAQ&gt;<br>PROVIDED FAQ: {FAQ_LAYOUT}<br>&lt;/FAQ&gt;</code>
</details>
<details>
    <summary style="color: green;"><strong>Hint 3</strong></summary>
    After constructing your <code>prompt</code>, use <code>generate_params_dict</code> to create the parameters, passing in your prompt and any required keyword arguments. Then, call <code>generate_with_single_input</code> using your prompt to get the model’s response.
</details>

<details>
    <summary style="color: green;"><strong>Prompt Example</strong></summary>
    <p>Here is a simple prompt example that you can use as a starting point.</p>
    <pre><code>
You will be provided with an FAQ for a clothing store. 
    Answer the instruction based on it. You might use more than one question and answer to make your answer. Only answer the question and do not mention that you have access to a FAQ. 
    &lt;FAQ&gt;
    PROVIDED FAQ: {FAQ_LAYOUT}
    &lt;/FAQ&gt;
    Question: {query}
    </code></pre>
</details>


In [29]:
# GRADED CELL

def query_on_faq(query: str, **kwargs) -> dict:
    """
    Constructs a prompt to query an FAQ system and generates a response.

    Parameters:
    - query (str): The query about which the function seeks to provide an answer from the FAQ.
    - **kwargs: Optional keyword arguments for extra configuration of prompt parameters.

    Returns:
    - dict: The parameters dictionary (including the prompt built from the query and the FAQ layout) to be passed to the LLM call.

    """
    ### START CODE HERE ###

    # Make the prompt. Don't forget to add the FAQ_LAYOUT and the query in it!
    prompt = f"""
    Answer the user query using this list of FAQs {FAQ_LAYOUT}.
    
    Query: {query}
    """

    # Generate the parameters dict with the prompt and **kwargs,
    # so that any settings passed by the caller are forwarded instead of being discarded
    kwargs = generate_params_dict(prompt, **kwargs)

    ### END CODE HERE ###
    
    return kwargs

In [30]:
kwargs = query_on_faq("I got my cloth but I didn't like it. How can I return it?")

In [31]:
content = generate_with_single_input(**kwargs)

In [32]:
print(content['content'])

You can return your item through our Returns Center. To initiate the return process, select the item you wish to return and the desired replacement from the options provided. Our team will guide you through the next steps. Please note that returns are accepted within 30 days of delivery, and conditions apply for specific categories like accessories.


The related FAQ questions about returns are here:

```
Question: What is your return policy timeframe? Answer: We accept returns within 30 days of delivery. Conditions apply for specific categories like accessories. Type: returns and exchanges

Question: Are return shipping costs covered? Answer: We provide a prepaid return label for domestic returns. For international returns, shipping is at the customer's cost. Type: returns and exchanges
```

Analyze the model's answer and check if it makes sense given these two questions.

In [33]:
unittests.test_query_on_faq(query_on_faq)

 All tests passed!


<a id='3-3'></a>
### 3.3 Decide the Nature of a Product-Related Question

Now, let's start working with product-related queries.

<a id='ex03'></a>
### Exercise 3
---
Your employer only wants the chatbot to answer the following types of queries:

- **Technical queries** – asking for descriptions of specific products, such as whether a blue dress is available or requesting three examples of red T-shirts suitable for sunny days.
- **Creative queries** – asking for help creating a stylish look for visiting a museum.

You will proceed as before. Create a prompt with clear instructions (and examples!) alongside the query. Remember to ensure the model only outputs **"creative"** or **"technical"** by:

- setting a low temperature,
- and explicitly stating this in the prompt.
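One way to keep such a prompt easy to maintain is to store the labeled examples in a list and generate the prompt text from it. The sketch below follows that idea. `FEW_SHOT_EXAMPLES` and `build_task_nature_prompt` are hypothetical names, and the example queries are the ones used in the hints for this exercise.

```python
from typing import List, Tuple

FEW_SHOT_EXAMPLES: List[Tuple[str, str]] = [
    ("Give me suggestions on a nice look for a nightclub.", "creative"),
    ("What are the blue dresses you have available?", "technical"),
    ("Give me three T-shirts for summer.", "technical"),
    ("Give me a look for attending a wedding party.", "creative"),
]


def build_task_nature_prompt(query: str, examples: List[Tuple[str, str]]) -> str:
    """Build a few-shot prompt asking for a 'creative' or 'technical' label."""
    example_lines = "\n".join(f"{text} Label: {label}" for text, label in examples)
    return (
        "Decide if the following query requires creativity (creating, composing, "
        "making new things) or is technical (information about products, prices, "
        "etc.). Label it as creative or technical.\n"
        "Examples:\n"
        f"{example_lines}\n"
        f"Query to be analyzed: {query}\n"
        "Only output one word: creative or technical."
    )


task_prompt = build_task_nature_prompt(
    "Give me two sneakers with vibrant colors.", FEW_SHOT_EXAMPLES
)
print(task_prompt)
```

Adding a difficult case then only requires appending one more `(query, label)` pair, and every label stays visible in one place, which makes inconsistent labels easier to spot.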

<details>
    <summary style="color: green;"><strong>Hint 1</strong></summary>
    Consider how to frame a prompt that clearly instructs the LLM to classify the nature of a query. This will involve using examples within the <code>prompt</code>. Include strong examples—especially cases where the LLM might struggle to decide—and label them appropriately. This exercise is very similar to Exercise 1. You might start with something like:  
    <code>Decide if the following query is a query that requires creativity (creating, composing, making new things) or technical (information about products, prices, etc.). Label it as creative or technical.</code>
</details>

<details>
    <summary style="color: green;"><strong>Hint 2</strong></summary>
    Use example queries to show the model how different types of queries are labeled. Include queries that are clearly creative or technical within the <code>prompt</code> for clarity. For instance:<br>
    <pre><code>
Examples:
Give me suggestions on a nice look for a nightclub. Label: creative
What are the blue dresses you have available? Label: technical
</code></pre>
    Finish by adding the query to be analyzed. Something like:  
    <code>Query to be analyzed: {query}. Only output one token with the label.</code>  
    should be enough.  
</details>

<details>
    <summary style="color: green;"><strong>Hint 3</strong></summary>
    After creating the <code>PROMPT</code>, use <code>generate_params_dict</code> to prepare the parameters. Set <code>temperature</code> to 0 for predictability and <code>max_tokens</code> to 1 to ensure a concise label. Then, call <code>generate_with_single_input</code> to get the label and extract it using <code>response['content']</code>.
</details>


<details>
    <summary style="color: green;"><strong>Prompt Example</strong></summary>
    <p>Here is a simple prompt example that you can use as a starting point. You may add more examples!</p>
    <pre><code>
Decide if the following query is a query that requires creativity (creating, composing, making new things) or technical (information about products, prices, etc.). Label it as creative or technical.
              Examples:
              Give me suggestions on a nice look for a nightclub. Label: creative
              What are the blue dresses you have available? Label: technical
              Give me three T-shirts for summer. Label: technical
              Give me a look for attending a wedding party. Label: creative
              Query to be analyzed: {query}. Only output one token: the label.
    </code></pre>
</details>



In [17]:
# GRADED CELL

def decide_task_nature(query: str) -> str:
    """
    Determines whether a query is creative or technical.

    This function constructs a prompt for an LLM to decide if a given query requires a creative response,
    such as making suggestions or composing ideas, or a technical response, such as providing product details or prices.

    Parameters:
    - query (str): The query to be evaluated for its nature.

    Returns:
    - str: The label 'creative' if the query requires creative input, or 'technical' if it requires technical information.
    """

    ### START CODE HERE ###

    # Create the prompt. Remember to include the query, examples, and clear instructions (not necessarily in this order!)
    prompt = f"""
    
    Determine the category of the following query as either "creative" or "technical" inquiry related for a retail company called Fashion Forward Hub.
  - creative queries: These are related to creative things such as asking what clothes should be worn in summer.
  - technical queries: These are related to product specific information such as colour, availability, price and gender. 
  
  Examples:

1. Query: “Is the blue shirt available in store?” Expected answer: technical
2. Query: “Do you have winter jackets available online?” Expected answer: technical
3. Query: “Give me two sneakers in vibrant colours” Expected answer: technical
4. Query: “What colour Jeans should I wear in summer?” Expected answer: creative
5. Query: “Give me ideas for summer dresses for a young girl” Expected answer: creative
6. Query: “What colour top can I wear with my dark blue jeans?” Expected answer: creative

Query: {query}

Instructions: Respond with “creative” if the query pertains to creative queries or “technical” if it pertains to Product specific questions.
Answer only one single word.
"""

    # Generate the kwargs dictionary by passing the prompt, setting temperature to 0 and max_tokens to 1
    kwargs = generate_params_dict(prompt, temperature=0, max_tokens=1)

    # Call generate_with_single_input with **kwargs
    response = generate_with_single_input(**kwargs)

    # Get the label
    label = response['content']

    ### END CODE HERE ###
    
    return label

In [18]:
queries = ["Give me two sneakers with vibrant colors.",
           "What are the most expensive clothes you have in your catalogue?",
           "I have a green dress and I like a suggestion on an accessory to match with it.",
           "Give me three trousers with vibrant colors you have in your catalogue.",
           "Create a look for a woman walking in a park on a sunny day. It must be fresh due to hot weather."
           ]

In [20]:
for query in queries:
    label = decide_task_nature(query)
    print(f"Query: {query} Label: {label}")

Query: Give me two sneakers with vibrant colors. Label: creative
Query: What are the most expensive clothes you have in your catalogue? Label: technical
Query: I have a green dress and I like a suggestion on an accessory to match with it. Label: creative
Query: Give me three trousers with vibrant colors you have in your catalogue. Label: technical
Query: Create a look for a woman walking in a park on a sunny day. It must be fresh due to hot weather. Label: creative


**Expected Output:**
```
Query: Give me two sneakers with vibrant colors. Label: technical
Query: What are the most expensive clothes you have in your catalogue? Label: technical
Query: I have a green dress and I like a suggestion on an accessory to match with it. Label: creative
Query: Give me three trousers with vibrant colors you have in your catalogue. Label: technical
Query: Create a look for a woman walking in a park on a sunny day. It must be fresh due to hot weather. Label: creative
```

In [21]:
unittests.test_decide_task_nature(decide_task_nature)

 All tests passed!


<a id='3-4'></a>
### 3.4 Retrieving the Parameters for a Given Task

<a id='ex04'></a>
### Exercise 4
---
In this exercise, you will create a function that, given a task, returns the appropriate values for `top_p` and `temperature`.

For **technical** queries, **low randomness is preferred**, whereas for **creative** tasks, **higher randomness might be more suitable**. 

**Important:** If the task is neither `technical` nor `creative` (for example, if the LLM fails to output a valid label), fall back to a default set of parameters. You can either choose a middle ground between low and high randomness or stick to low randomness as a conservative approach.

**Note**: A temperature that is too high leads the model to produce nonsense, so keep it below **1.3**, and if it is close to **1.3**, be sure to lower **top_p**. Also, remember that *top_p* cannot be greater than 1!
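
The constraints in the note can be checked programmatically. The sketch below uses a hypothetical helper, `check_sampling_params`, which is not part of the assignment utilities. The 1.3 ceiling comes from the note; the 0.2 margin for "close to the ceiling" and the 0.8 threshold for `top_p` are assumptions made only for illustration.

```python
def check_sampling_params(params: dict, max_temperature: float = 1.3, margin: float = 0.2) -> list:
    """Return a list of problems found; an empty list means the set respects the note."""
    problems = []
    top_p = params.get("top_p")
    temperature = params.get("temperature")

    if top_p is None or not 0 <= top_p <= 1:
        problems.append("top_p must be between 0 and 1")
    if temperature is None or temperature < 0:
        problems.append("temperature must be non-negative")
    elif temperature >= max_temperature:
        problems.append(f"temperature must stay below {max_temperature}")
    elif temperature >= max_temperature - margin and top_p is not None and top_p > 0.8:
        # Assumed rule of thumb: a high temperature should be paired with a lower top_p
        problems.append("temperature is close to the ceiling; consider lowering top_p")
    return problems


print(check_sampling_params({"top_p": 0.4, "temperature": 0.4}))
print(check_sampling_params({"top_p": 0.8, "temperature": 1.5}))
```

Running a check like this on each entry of your parameter dictionary is a cheap way to catch a value that slipped outside the recommended range.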

<details>
    <summary style="color: green;"><strong>Hint 1</strong></summary>
    Begin by considering how to map each task type to its corresponding parameters. Think about how creative and technical tasks might require different model settings. Remember that if the temperature is too high, you might get nonsense results unless you control the top_p parameter! Suggested temperatures are 1 for creative and 0.3 for technical. Adjust top_p accordingly.
</details>

<details>
    <summary style="color: green;"><strong>Hint 2</strong></summary>
    Use a dictionary (<code>PARAMETERS_DICT</code>) to store the parameter configurations for the 'creative' and 'technical' tasks. This will make retrieving parameters straightforward based on the task label.
</details>

<details>
    <summary style="color: green;"><strong>Hint 3</strong></summary>
    Implement logic to retrieve parameters: Check if <code>task</code> matches 'technical' or 'creative', and assign the corresponding parameter set from <code>PARAMETERS_DICT</code>. Use a default parameter set if <code>task</code> doesn't match expected values to ensure graceful handling of unexpected input.
</details>


In [22]:
# GRADED CELL 

def get_params_for_task(task: str) -> dict:
    """
    Retrieves specific LLM parameters based on the nature of the task.

    This function returns parameter sets optimized for either creative or technical tasks.
    Creative tasks benefit from higher randomness, while technical tasks require more focus and precision.
    A default parameter set is returned for unrecognized task types.

    Parameters:
    - task (str): The nature of the task ('creative' or 'technical').

    Returns:
    - dict: A dictionary containing 'top_p' and 'temperature' settings appropriate for the task.
    """
    ### START CODE HERE ###
    # Define the parameter sets for technical and creative tasks
    # (the creative temperature stays below the 1.3 ceiling mentioned in the note)
    PARAMETERS_DICT = {
        "creative": {"top_p": 0.8, "temperature": 1.0},
        "technical": {"top_p": 0.4, "temperature": 0.4}
    }

    # Fallback to a default parameter set for unrecognized task types
    DEFAULT_PARAMETERS = {"top_p": 0.5, "temperature": 1.0}

    # Return the corresponding parameter set based on task type
    param_dict = PARAMETERS_DICT.get(task, DEFAULT_PARAMETERS)
    ### END CODE HERE ###
    
    return param_dict


In [23]:
get_params_for_task("technical")

{'top_p': 0.4, 'temperature': 0.4}

**Expected Output (results may vary depending on the values you chose for top_p and temperature)**
```
{'top_p': 0.7, 'temperature': 0.3}
```

In [24]:
unittests.test_get_params_for_task(get_params_for_task)
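
An LLM label can arrive with extra whitespace, capitalization, or trailing punctuation, in which case an exact comparison with `'technical'` or `'creative'` fails and the fallback is used. The sketch below shows one way to normalize the label before looking up the parameters. `normalize_task_label` is a hypothetical helper, not part of the assignment utilities.

```python
import string

VALID_LABELS = ("technical", "creative")


def normalize_task_label(raw_label) -> str:
    """Map raw LLM output to 'technical', 'creative', or 'unknown'."""
    if not isinstance(raw_label, str):
        return "unknown"
    # Remove surrounding whitespace and punctuation, then compare case-insensitively
    cleaned = raw_label.strip().strip(string.punctuation).lower()
    return cleaned if cleaned in VALID_LABELS else "unknown"


print(normalize_task_label(" Technical.\n"))
print(normalize_task_label("I think it is creative"))
```

The normalized label can then be passed to `get_params_for_task`, where `'unknown'` falls through to the default parameter set.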

<a id='4'></a>
## 4 - Retrieving Items Based on Metadata Inferred from a Query
---
In this section, you'll create a function that asks the LLM to extract useful metadata from a query; this metadata is then used to filter the items retrieved from the database. You'll get a JSON with the different features and all their possible values found in the dataset. Your job is to pass these values to the LLM in the prompt, so it can pick the ones that make the most sense for the query. You'll also need to handle situations where the LLM does not return a valid value.

The values you’ll focus on are:
- gender  
- masterCategory  
- articleType  
- baseColour  
- season  
- usage

These were chosen because they strike a good balance: they're specific enough to be useful, but general enough to avoid empty results. Some other features in the dataset are too detailed and could lead to no matches. Also, including every single value would make the prompt too large, which could slow things down and raise costs. Keep this in mind when building real-world solutions!


In [11]:
# Let's remember the data structure of a product
PRODUCTS_DATA[0]

{'gender': 'Men',
 'masterCategory': 'Apparel',
 'subCategory': 'Topwear',
 'articleType': 'Shirts',
 'baseColour': 'Navy Blue',
 'season': 'Fall',
 'year': 2011.0,
 'usage': 'Casual',
 'productDisplayName': 'Turtle Check Men Navy Blue Shirt',
 'price': 67,
 'product_id': 15970}

In [12]:
# Run this cell to generate the dictionary with the possible values for each key
values = {}
for d in PRODUCTS_DATA:
    for key, val in d.items():
        if key in ('product_id', 'price', 'productDisplayName', 'subCategory', 'year'):
            continue
        if key not in values.keys():
            values[key] = set()
        values[key].add(val)

In [30]:
# Example of possible values for the feature 'season'
values['season']

{'All seasons', 'Fall', 'Spring', 'Summer', 'Winter'}
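
Because `values` stores Python sets, interpolating it directly into an f-string produces Python's set representation rather than JSON. The sketch below shows one possible way to serialize it as JSON and to estimate how many characters it adds to the prompt. The helper name `values_to_json` and the toy data are assumptions for illustration, and the sorting step assumes every value is a string.

```python
import json

# Toy stand-in for the `values` dictionary built above (illustrative data only)
toy_values = {
    "season": {"Summer", "Winter", "Fall"},
    "gender": {"Men", "Women"},
}


def values_to_json(values: dict) -> str:
    """Serialize a dict of sets as JSON, sorting each set for a stable prompt."""
    return json.dumps({key: sorted(vals) for key, vals in values.items()})


values_json = values_to_json(toy_values)
print(values_json)
print(f"{len(values_json)} characters will be added to the prompt")
```

Sorting keeps the prompt identical between runs, which makes the LLM's behavior easier to compare across experiments.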

<a id='4-1'></a>
### 4.1 Generate metadata

<a id='ex05'></a>
### Exercise 5
---
The next function’s purpose is to extract potential metadata from a given query. The approach is to construct a prompt that incorporates the `values` dictionary, which lists the possible feature values. The LLM is then asked to generate a JSON response suggesting metadata relevant to the query. You have the flexibility to add more information as needed.

In addition to the metadata features, the LLM must also handle price constraints. If the query specifies a price range, the JSON should include a key like this:

```json
"price": {"min": min_value, "max": max_value}
```

If no price constraint is provided, the LLM should default to:

```json
"price": {"min": 0, "max": "inf"}
```
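
Downstream code has to cope with both numeric bounds and the `"inf"` string. The following sketch shows one possible way to turn the `price` entry into a pair of numbers. `normalize_price` is a hypothetical helper, not part of the assignment utilities.

```python
import math


def normalize_price(price) -> tuple:
    """Return (min, max) as floats, using 0 and infinity for missing or invalid bounds."""
    if not isinstance(price, dict):
        return 0.0, math.inf

    def to_number(value, default):
        # float() accepts numbers and strings such as "inf" or "300"
        try:
            return float(value)
        except (TypeError, ValueError):
            return default

    low = to_number(price.get("min"), 0.0)
    high = to_number(price.get("max"), math.inf)
    return low, high


print(normalize_price({"min": 0, "max": "inf"}))
print(normalize_price({"min": 50, "max": 200}))
```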

Here is an example of the expected JSON format you should explicitly include in your prompt to help guide the LLM:

```json
{
    "gender": ["Women"],
    "masterCategory": ["Apparel"],
    "articleType": ["Dresses"],
    "baseColour": ["Blue"],
    "price": {"min": 0, "max": "inf"},
    "usage": ["Formal"],
    "season": ["All seasons"]
}
```

**Important Note**: Inside a Python f-string, single curly braces mark a placeholder, so literal braces (such as those in the JSON example) must be doubled. For example:

```python
f"""Any text here {{
    "gender": ["Women"],
    "masterCategory": ["Apparel"],
    "articleType": ["Dresses"],
    "baseColour": ["Blue"],
    "price": {{"min": 0, "max": "inf"}},
    "usage": ["Formal"],
    "season": ["All seasons"]
}}"""
```
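
A quick way to convince yourself that the doubled braces collapse into single ones is to parse the resulting string. This is a minimal sketch; the shortened JSON example and the placeholder query are assumptions made for illustration.

```python
import json

query = "blue dress"  # placeholder query for illustration

# Doubled braces are literal braces; {query} below is a real placeholder
example = f"""{{
    "gender": ["Women"],
    "price": {{"min": 0, "max": "inf"}}
}}"""

prompt = f"Example of expected JSON:\n{example}\nQuery: {query}"

# The example parses as JSON, so the braces were emitted as single braces
parsed = json.loads(example)
print(parsed["price"])
print(prompt.endswith("Query: blue dress"))
```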

**Note**: To avoid truncating the JSON, set `max_tokens` to around `1500`!

<details>
    <summary style="color: green;"><strong>Hint 1</strong></summary>
    When constructing the prompt, focus on explicitly instructing the LLM to output a JSON format that includes specific metadata keys. Ensure that the prompt clearly includes the user's query and describes what the model should focus on extracting from it. You can start with something like:
    <pre>
    <code>
    One query will be provided. For the given query, there will be a call to a vector database to retrieve relevant clothing items. 
    Generate a JSON with useful metadata to filter the products in the query. The possible values for each feature are in the following JSON: {values}
    </code>
    </pre>
</details>

<details>
    <summary style="color: green;"><strong>Hint 2</strong></summary>
    Incorporate the <code>values</code> within your prompt to inform the LLM of the potential values for each feature. Explain to the LLM the purpose of generating metadata and be explicit about including keys like gender, masterCategory, articleType, baseColour, price, usage, and season. Your prompt should contain a text like this:
    <pre>
    <code>
    Provide a JSON with the features that best fit in the query (can be more than one, write in a list). Also, if present, add a price key, saying if there is a price range (between values, greater than or smaller than some value).
    Only return the JSON, nothing more. price key must be a JSON with "min" and "max" values (0 if no lower bound and inf if no upper bound). 
    Always include gender, masterCategory, articleType, baseColour, price, usage and season as keys. All values must be within lists.
    If there is no price set, add min = 0 and max = inf.
    Only include values that are given in the JSON above. 
    </code>
    </pre>
</details>

<details>
    <summary style="color: green;"><strong>Hint 3</strong></summary>
    In your prompt, make it clear that the <code>price</code> key in the output JSON must be structured as a dictionary with "min" and "max" values. Include instructions for handling cases with unspecified price constraints, defaulting to <code>{"min": 0, "max": "inf"}</code>. Reinforce that each feature's value should be in list form, and provide an example JSON to demonstrate the desired format clearly. An expected JSON structure might be something like:
    <pre>
    <code>
    Example of expected JSON:
    {{
    "gender": ["Women"],
    "masterCategory": ["Apparel"],
    "articleType": ["Dresses"],
    "baseColour": ["Blue"],
    "price": {{"min": 0, "max": "inf"}},
    "usage": ["Formal"],
    "season": ["All seasons"]
    }}
    </code>
    </pre>

Don't forget to add the user query at the end!
</details>


<details>
    <summary style="color: green;"><strong>Prompt Example</strong></summary>
    <p>Here is a simple prompt example you can start from. You may add more examples!</p>
    <pre>
    <code>
    A query will be provided. Based on this query, a vector database will be searched to find relevant clothing items.
    Generate a JSON object containing useful metadata to filter products for this query.
    The possible values for each feature are given in the following JSON: {values}

    Provide a JSON containing the features that best match the query (values should be in lists, multiple values possible).
    If a price range is mentioned, include a price key specifying the range (between values, greater than, or less than).
    Return only the JSON, nothing else. The price key must be a JSON object with "min" and "max" values (use 0 if no lower bound, and "inf" if no upper bound).
    Always include the following keys: gender, masterCategory, articleType, baseColour, price, usage, and season.
    If no price is specified, set min = 0 and max = inf.
    Include only values present in the JSON above.

    Example of expected JSON:

    {{
      "gender": ["Women"],
      "masterCategory": ["Apparel"],
      "articleType": ["Dresses"],
      "baseColour": ["Blue"],
      "price": {{"min": 0, "max": "inf"}},
      "usage": ["Formal"],
      "season": ["All seasons"]
    }}

    Query: {query}
    </code>
    </pre>
</details>

In [16]:
# GRADED CELL

def generate_metadata_from_query(query: str) -> str:
    """
    Generates metadata in JSON format based on a given query to filter clothing items.

    This function constructs a prompt for an LLM to produce a JSON object
    that will guide filtering in a vector database query for clothing items.
    It uses possible values from a predefined set and ensures that only relevant metadata
    is included in the output JSON.

    Parameters:
    - query (str): A description of specific clothing-related needs.

    Returns:
    - str: A JSON string representing metadata with keys such as gender, masterCategory,
      articleType, baseColour, price, usage, and season. Each value in the JSON is a list.
      The price is specified as a dictionary with "min" and "max" keys.
      For unrestricted categories, use ["Any"], and if no price is specified,
      default to {"min": 0, "max": "inf"}.
    """
    ### START CODE HERE ### 

    # Construct the prompt.
    # Include the query, the desired JSON format, and the possible values (pass {values} where needed).
    # Clearly instruct the LLM to include gender, masterCategory, articleType, baseColour, price, usage, and season as keys.
    # Specify that the price key must be a JSON object with "min" and "max" values (0 if no lower bound, "inf" if no upper bound).
    # If no price is set, default to min = 0 and max = "inf"
    
    prompt = f"""
    A query will be provided. Based on this query, a vector database will be searched to find relevant clothing items.
    Generate a JSON object containing useful metadata to filter products for this query.
    The possible values for each feature are given in the following JSON: {values}

    Provide a JSON containing the features that best match the query (values should be in lists, multiple values possible).
    Return only the JSON, nothing else. The price key must be a JSON object with "min" and "max" values (use 0 if no lower bound, and "inf" if no upper bound).
    Always include the following keys: gender, masterCategory, articleType, baseColour, price, usage, and season.
    If no price is specified, set min = 0 and max = inf.
    Include only values present in the JSON above.

    Example of expected JSON:

    {{
    "gender": ["Women"],
    "masterCategory": ["Apparel"],
    "articleType": ["Dresses"],
    "baseColour": ["Blue"],
    "price": {{"min": 0, "max": "inf"}},
    "usage": ["Formal"],
    "season": ["All seasons"]
    }}
    
    Query: {query}
    """

    # Generate the response with generate_with_single_input using PROMPT, temperature=0 (low randomness), and max_tokens=1500
    kwargs =  generate_params_dict(prompt, temperature=0.0, top_p=0.4, max_tokens=1500)
    response = generate_with_single_input(**kwargs)
    

    # Extract the content from the response
    content = response['content']

    ### END CODE HERE ###
    
    return content

In [18]:
print(generate_metadata_from_query("Create a look for a man that suits a sunny day in the park. I don't want to spend more than 300 dollars on each piece."))

{
    "gender": "Men",
    "masterCategory": "Apparel",
    "articleType": ["Shirts", "Pants"],
    "baseColour": ["Light", "Pastel"],
    "price": {"min": 0, "max": 300},
    "usage": ["Casual"],
    "season": ["Summer"]
}


**Expected Output (result may vary)**
```
{
    "gender": ["Men"],
    "masterCategory": ["Apparel"],
    "articleType": ["Tshirts", "Shorts", "Sweatshirts"],
    "baseColour": ["Yellow", "Orange", "White"],
    "price": {"min": 0, "max": 300},
    "usage": ["Casual"],
    "season": ["Summer"]
}
```

In [19]:
unittests.test_generate_metadata_from_query(generate_metadata_from_query)

All tests passed!


The next function is a helper that extracts the JSON from the LLM output. You also need to handle the case where the LLM doesn't provide a valid and recoverable JSON. In this case, the function returns `None`, and no metadata filters are applied.

In [20]:
def parse_json_output(llm_output: str) -> dict | None:
    """
    Parses a string output from an LLM into a JSON object.

    This function attempts to clean and parse a JSON-formatted string produced by an LLM.
    The input string might contain minor formatting issues, such as unnecessary newlines or single quotes
    instead of double quotes. The function attempts to correct such issues before parsing.

    Parameters:
    - llm_output (str): The string output from the LLM that is expected to be in JSON format.

    Returns:
    - dict or None: A dictionary if parsing is successful, or None if the input string cannot be parsed into valid JSON.

    Exception Handling:
    - In case of a JSONDecodeError during parsing, an error message is printed, and the function returns None.
    """
    try:
        # Try the raw output first: a properly structured JSON string needs no cleaning
        return json.loads(llm_output)
    except json.JSONDecodeError:
        pass

    try:
        # Otherwise, remove common artifacts (newlines, single quotes, doubled braces) and retry
        cleaned = llm_output.replace("\n", '').replace("'", '').replace("}}", "}").replace("{{", "{")
        return json.loads(cleaned)
    except json.JSONDecodeError as e:
        print(f"JSON parsing failed: {e}")
        return None
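
LLMs sometimes wrap the JSON in a Markdown code fence or add a sentence before it, while `parse_json_output` expects the string to contain only the JSON object. The sketch below shows one possible pre-processing step that keeps only the text between the first `{` and the last `}`. `extract_json_object` is a hypothetical helper, not part of the assignment utilities.

```python
import json


def extract_json_object(text: str):
    """Return the dict encoded between the first '{' and the last '}', or None."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


fence = "`" * 3  # a Markdown code fence, built here to keep this example readable
noisy = f'Here is the JSON:\n{fence}json\n{{"gender": ["Men"], "price": {{"min": 0, "max": 300}}}}\n{fence}'
print(extract_json_object(noisy))
print(extract_json_object("no json here"))
```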

In [22]:
json_string = generate_metadata_from_query("Give me three blue dresses suitable for a wedding party, less than 200 dollars and at least 50 dollars")
json_output = parse_json_output(json_string)

In [23]:
json_output

{'gender': ['Women'],
 'masterCategory': ['Apparel'],
 'articleType': ['Dresses'],
 'baseColour': ['Blue'],
 'price': {'min': 50, 'max': 200},
 'usage': ['Formal'],
 'season': ['All seasons']}

**TIP**: Try different queries and check whether the JSON is properly parsed. If not, investigate why and consider improving the prompt to avoid the issue.
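
Even a well-formed JSON can contain values that do not exist in the dataset, or a bare string where a list is expected. One possible safeguard is to check the parsed metadata against the allowed values before building filters. In this sketch, `sanitize_metadata` is a hypothetical helper and the allowed values are toy data chosen for illustration.

```python
def sanitize_metadata(metadata: dict, allowed: dict) -> dict:
    """Keep only feature values that exist in `allowed`; drop features left empty."""
    cleaned = {}
    for key, value in metadata.items():
        if key == "price":
            cleaned[key] = value  # price is a range, not a categorical value
            continue
        if key not in allowed:
            continue
        candidates = value if isinstance(value, list) else [value]  # wrap scalars in a list
        kept = [v for v in candidates if isinstance(v, str) and v in allowed[key]]
        if kept:
            cleaned[key] = kept
    return cleaned


# Toy stand-in for the `values` dictionary (illustrative data only)
allowed_values = {"gender": {"Men", "Women"}, "articleType": {"Tshirts", "Shirts"}, "usage": {"Casual"}}
raw = {
    "gender": "Men",
    "articleType": ["T-shirts"],
    "usage": ["Casual", "Sunny days"],
    "price": {"min": 0, "max": "inf"},
}
print(sanitize_metadata(raw, allowed_values))
```

Dropping an invalid value is usually safer than filtering on it, since a filter on a non-existent value matches no product at all.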

<a id='4-2'></a>
### 4.2 Loading the Weaviate Product Collection

Now it is time to work with the Weaviate collection. It is already provided and contains the same `PRODUCTS_DATA` you saw before, loaded as a Weaviate collection so that you can query it with semantic search and metadata filtering.

In [25]:
products_collection = client.collections.get('products')

In [26]:
len(products_collection)

44423

<a id='4-3'></a>
### 4.3 Filtering by metadata (NOT GRADED)

This next function will create the filters given the metadata. It will create a `Filter` object for each key in the dictionary of metadata. 

In [56]:
def get_filter_by_metadata(json_output: dict | None = None):
    """
    Generate a list of Weaviate filters based on a provided metadata dictionary.

    Parameters:
    - json_output (dict) or None: Dictionary containing metadata keys and their values.

    Returns:
    - list[Filter] or None: A list of Weaviate filters, or None if input is None.
    """
    # If the input dictionary is None, return None immediately
    if json_output is None:
        return None

    # Define a tuple of valid keys that are allowed for filtering
    valid_keys = (
        'gender',
        'masterCategory',
        'articleType',
        'baseColour',
        'price',
        'usage',
        'season',
    )

    # Initialize an empty list to store the filters
    filters = []

    # Iterate over each key-value pair in the input dictionary
    for key, value in json_output.items():
        # Skip the key if it is not in the list of valid keys
        if key not in valid_keys:
            continue

        # Special handling for the 'price' key
        if key == 'price':
            # Ensure the value associated with 'price' is a dictionary
            if not isinstance(value, dict):
                continue

            # Extract the minimum and maximum prices from the dictionary
            min_price = value.get('min')
            max_price = value.get('max')

            # Skip if either min_price or max_price is not provided
            if min_price is None or max_price is None:
                continue

            # Add a lower bound only when it actually restricts the search
            if isinstance(min_price, (int, float)) and min_price > 0:
                filters.append(Filter.by_property(key).greater_than(min_price))

            # Add an upper bound unless the LLM signalled "no upper bound" with 'inf'
            if isinstance(max_price, (int, float)) and max_price != float('inf'):
                filters.append(Filter.by_property(key).less_than(max_price))
        else:
            # For other valid keys, add a filter that checks for any of the provided values
            filters.append(Filter.by_property(key).contains_any(value))

    return filters
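
To see what these filters express without a Weaviate connection, the sketch below applies the same idea to an in-memory list of products. It only illustrates the semantics (any-of matching for categorical features, exclusive bounds for price) and says nothing about how Weaviate evaluates filters internally. `matches_metadata` and the toy catalog are assumptions made for illustration.

```python
def matches_metadata(product: dict, metadata: dict) -> bool:
    """Return True if `product` satisfies every constraint in `metadata`."""
    for key, value in metadata.items():
        if key == "price":
            low, high = value.get("min", 0), value.get("max", "inf")
            if low and product["price"] <= low:
                return False
            if high != "inf" and product["price"] >= high:
                return False
        elif product.get(key) not in value:  # same idea as a contains-any filter
            return False
    return True


# Toy catalog (illustrative data only)
catalog = [
    {"product_id": 1, "gender": "Women", "articleType": "Dresses", "price": 120},
    {"product_id": 2, "gender": "Women", "articleType": "Dresses", "price": 250},
    {"product_id": 3, "gender": "Men", "articleType": "Shirts", "price": 60},
]
metadata = {"gender": ["Women"], "articleType": ["Dresses"], "price": {"min": 50, "max": 200}}
print([p["product_id"] for p in catalog if matches_metadata(p, metadata)])
```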

This is a wrapper function that, given a query, returns the desired filters.

In [57]:
def generate_filters_from_query(query: str) -> list:
    json_string = generate_metadata_from_query(query)
    json_output = parse_json_output(json_string)
    filters = get_filter_by_metadata(json_output)
    return filters

In [58]:
filters = generate_filters_from_query("Give me three T-shirts to use in sunny days")

In [59]:
filters

[_FilterValue(value=['Men', 'Women'], operator=<_Operator.CONTAINS_ANY: 'ContainsAny'>, target='gender'),
 _FilterValue(value=['Apparel'], operator=<_Operator.CONTAINS_ANY: 'ContainsAny'>, target='masterCategory'),
 _FilterValue(value=['T-shirts'], operator=<_Operator.CONTAINS_ANY: 'ContainsAny'>, target='articleType'),
 _FilterValue(value=['White', 'Light Blue', 'Yellow'], operator=<_Operator.CONTAINS_ANY: 'ContainsAny'>, target='baseColour'),
 _FilterValue(value=['Casual', 'Sunny days'], operator=<_Operator.CONTAINS_ANY: 'ContainsAny'>, target='usage'),
 _FilterValue(value=['Summer'], operator=<_Operator.CONTAINS_ANY: 'ContainsAny'>, target='season')]

Note that the filters carry the metadata inferred from the query.

The next function gets the relevant products for a query: it generates the filters, then runs a semantic search on the query restricted by those metadata filters, which narrows down the possibilities and increases accuracy.

It handles the case where the filters return too few results (fewer than 10) by re-running the search with reduced sets of filters until it gets at least 5 results.

In [60]:
def get_relevant_products_from_query(query: str):
    """
    Retrieve products that are most relevant to a given query by applying filters.

    This function generates filters based on the provided query and uses them to find 
    products that closely match the query criteria. If no filters are applicable or if 
    the initial search returns a small number of products, the function dynamically reduces 
    the filtering constraints based on a predefined order of filter importance.

    Parameters:
    query (str): The query string used to search for relevant products.

    Returns:
    list: A list of product objects that are most relevant to the query. If filters are not effective,
          it adjusts them to ensure a minimum return of products.
    """
    filters = generate_filters_from_query(query)  # Generate filters based on query

    # Check if there are no applicable filters
    if filters is None or len(filters) == 0:
        # Query the collection without filters, using the query text for relevance
        res = products_collection.query.near_text(query, limit=20).objects
        return res

    # Query with filters and limit to top 20 relevant objects
    res = products_collection.query.near_text(query, filters=Filter.all_of(filters), limit=20).objects

    # If the result set is fewer than 10 products, try reducing filters to broaden the search
    importance_order = ['baseColour', 'masterCategory', 'usage', 'masterCategory', 'season', 'gender']

    if len(res) < 10:
        # Iterate through the importance order of filters
        for i in range(len(importance_order)):
            # Create a list of filters that excludes less important ones
            filtered_filters = [x for x in filters if x.target not in importance_order[i+1:]]
            
            # Re-query with the reduced set of filters
            res = products_collection.query.near_text(query, filters=Filter.all_of(filtered_filters), limit=20).objects
            
            # If sufficient products have been found, return early
            if len(res) >= 5:
                return res

    return res  # Return the final set of relevant products
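
The idea of relaxing filters can be tried out without a database. The sketch below is a simplified variant, not the function above: it drops one filter at a time, in a given order, until enough results come back. `search_with_relaxation`, the toy catalog, and the toy search function are all assumptions made for illustration.

```python
def search_with_relaxation(search, filters: dict, drop_order: list, min_results: int = 5) -> list:
    """Drop filters one at a time, in `drop_order`, until enough results come back."""
    active = dict(filters)
    results = search(active)
    for key in drop_order:
        if len(results) >= min_results:
            break
        if key in active:
            del active[key]  # relax the next constraint in the drop order
            results = search(active)
    return results


# Toy in-memory search standing in for the vector database (illustrative only)
catalog = [{"id": i, "baseColour": "Blue" if i % 4 == 0 else "Red", "season": "Summer"} for i in range(12)]


def toy_search(active: dict) -> list:
    return [p for p in catalog if all(p[k] in allowed for k, allowed in active.items())]


found = search_with_relaxation(
    toy_search,
    {"baseColour": ["Blue"], "season": ["Summer"]},
    drop_order=["baseColour", "season"],
)
print(len(found))
```

Here only three products are blue, so the colour filter is dropped and the search falls back to the season filter alone.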

In [61]:
query = "Give me three T-shirts to use in sunny days"

In [65]:
t = get_relevant_products_from_query("Give me three T-shirts to use in sunny days")

In [64]:
t = get_relevant_products_from_query("Give me three blue colour tshirts for men between £100 and £300")

TypeError: object of type 'NoneType' has no len()

In [66]:
print(t)

[]


In [52]:
t[0].properties

IndexError: list index out of range

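When the search returns results, `t[0].properties` shows the first retrieved product, which for this query should be a T-shirt. An empty list, as in the output above, means that no product matched the generated filters, for example because the LLM proposed a value such as `'T-shirts'` while the dataset uses `'Tshirts'`.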

<a id='4-4'></a>
### 4.4 Generating the retrieved items as a context (NOT GRADED)

Now, for the given retrieved items, let's generate a simple context in the format 

```
Product name: Inkfruit Mens Little Bit More T-shirt. Product Category: Apparel. Product usage: Casual. Product gender: Men. Product Type: Tshirts. Product Category: Topwear Product Color: Yellow. Product Season: Summer. Product Year: 2011.
```

In [None]:
def generate_items_context(results: list) -> str:
    """
    Compile detailed product information from a list of result objects into a formatted string.

    This function takes a list of results, each containing various product attributes, and constructs 
    a human-readable summary for each product. Each product's details, including ID, name, category, 
    usage, gender, type, and other characteristics, are concatenated into a string that describes 
    all products in the list.

    Parameters:
    results (list): A list of result objects, each having a `properties` attribute that is a dictionary 
                    containing product attributes such as 'product_id', 'productDisplayName', 
                    'masterCategory', 'usage', 'gender', 'articleType', 'subCategory', 
                    'baseColour', 'season', and 'year'.

    Returns:
    str: A multi-line string where each line contains the formatted details of a single product.
         Each product detail includes the product ID, name, category, usage, gender, type, color, 
         season, and year.
    """
    t = ""  # Initialize an empty string to accumulate product information

    for item in results:  # Iterate through each item in the results list
        item = item.properties  # Access the properties dictionary of the current item

        # Append formatted product details to the output string
        t += (
            f"Product ID: {item['product_id']}. "
            f"Product name: {item['productDisplayName']}. "
            f"Product Category: {item['masterCategory']}. "
            f"Product usage: {item['usage']}. "
            f"Product gender: {item['gender']}. "
            f"Product Type: {item['articleType']}. "
            f"Product Subcategory: {item['subCategory']}. "
            f"Product Color: {item['baseColour']}. "
            f"Product Season: {item['season']}. "
            f"Product Year: {item['year']}.\n"
        )

    return t  # Return the complete formatted string with product details

In [None]:
print(generate_items_context(t)[:1000])
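
Slicing the context at a fixed character count can cut a product description in half. If you need a size limit, one option is to keep only complete lines. `truncate_context` below is a hypothetical helper, shown with made-up sample text.

```python
def truncate_context(context: str, max_chars: int = 1000) -> str:
    """Keep as many complete lines as fit within `max_chars`."""
    kept, used = [], 0
    for line in context.splitlines(keepends=True):
        if used + len(line) > max_chars:
            break
        kept.append(line)
        used += len(line)
    return "".join(kept)


# Made-up sample context with one product per line
sample = (
    "Product ID: 1. Product name: A.\n"
    "Product ID: 2. Product name: B.\n"
    "Product ID: 3. Product name: C.\n"
)
print(truncate_context(sample, max_chars=70))
```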

<a id='4-5'></a>
### 4.5 Query on Products (NOT GRADED)

You’re almost there! This section explains the function that queries products based on a given task. The process follows these steps:

1. **Query**: Start with a product query.
2. **Determine Query Nature**: Identify if the query is technical or creative.
3. **Retrieve Relevant Products**: Find products that best match the query criteria.
4. **Generate Context**: Build a descriptive context string based on the products.
5. **Create Prompt**: Formulate the prompt using the context and the query nature.
6. **Generate Parameters**: Prepare parameters suited to the query nature for the LLM.
7. **Run Inference**: Perform the inference using the prepared parameters.

Let's check the code!


In [None]:
def query_on_products(query: str) -> dict:
    """
    Build the LLM call parameters for answering a product query.

    This function determines whether the query is technical or creative, retrieves the
    relevant products, and constructs a prompt that includes the product details and the
    original query. It does not call the LLM itself: it returns the keyword arguments
    needed to do so.

    Parameters:
    query (str): The input query string that needs to be analyzed and answered using product data.

    Returns:
    dict: A dictionary of keyword arguments (`kwargs`) containing the prompt and the sampling
          settings, ready to be passed to `generate_with_single_input`.
    """


    # Determine if the query is technical or creative in nature
    query_label = decide_task_nature(query) 
    
    # Obtain necessary parameters based on the query type
    parameters_dict = get_params_for_task(query_label) 
    
    # Retrieve products that are relevant to the query
    relevant_products = get_relevant_products_from_query(query) 
     
    # Create a context string from the relevant products
    context = generate_items_context(relevant_products) 

    # Construct a prompt with the context and the query, asking the LLM to include the product ID in its answer
    prompt = (
        "Given the available set of clothing products, answer the question that follows, providing the item ID in your answers. "
        "Other information might be provided but not necessarily all of it; pick only what is relevant for the given query and avoid being too long when describing the items' features. "
        "If no number of products is mentioned in the query, select at most five to show. "
        f"CLOTHING PRODUCTS AVAILABLE: {context} "
        f"QUERY: {query}"
    )

    # Generate kwargs (parameters dict) for the LLM call with the prompt, role='assistant', and **parameters_dict
    kwargs = generate_params_dict(prompt, role='assistant', **parameters_dict)
    
    
    return kwargs

In [None]:
kwargs = query_on_products('Make a wonderful look for a man attending a wedding party happening during night.')

In [None]:
result = generate_with_single_input(**kwargs)
print(result['content'])

In [None]:
kwargs = query_on_products('Give me three T-shirts for sunny days')

In [None]:
result = generate_with_single_input(**kwargs)
print(result['content'])

<a id='5'></a>
## 5 - The Final Function!

---

<a id='5-1'></a>
### 5.1 The function to rule them all

Now it’s time to bring everything together into a single function!

This function will:

1. Check if the query is related to an FAQ or a Product.
2. If it’s an FAQ, run the FAQ-related workflow.
3. If it’s a Product, run the Product-related workflow.

It returns the kwargs dictionary containing the appropriate arguments.

In [None]:
def answer_query(query: str) -> dict:
    """
    Determines the type of a given query (FAQ or Product) and executes the appropriate workflow.

    Parameters:
    - query (str): The user's query string.

    Returns:
    - dict: A dictionary of keyword arguments to be used for further processing.
      If the query is neither FAQ nor Product-related, returns a default response dictionary
      instructing the assistant to answer based on existing context.
    """
    label = check_if_faq_or_product(query)
    if label not in ['FAQ', 'Product']:
        return {
            "role": "assistant",
            "prompt": f"User provided a question that does not fit FAQ or Product related questions. "
                      f"Answer it based on the context you already have so far. Query provided by the user: {query}"
        }
    if label == 'FAQ':
        kwargs = query_on_faq(query)
    elif label == 'Product':
        try:
            kwargs = query_on_products(query)
        except Exception:
            # Retrieval or parsing failed: ask the user to rephrase instead of crashing
            return {
                "role": "assistant",
                "prompt": "User provided a question that broke the querying system. "
                          f"Instruct them to rephrase it. Query provided by the user: {query}"
            }
            
    return kwargs

In [None]:
kwargs = answer_query("What are your working hours?")

In [None]:
result = generate_with_single_input(**kwargs)
print(result['content'])
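
The same routing idea can also be written with a dispatch table, which makes it easy to add new query types later. This is a minimal sketch with stub components; `route_query`, `toy_classify`, and the handlers are assumptions for illustration and do not call the real workflows.

```python
def route_query(query: str, classify, handlers: dict, fallback):
    """Classify `query` and call the matching handler, or `fallback` on any failure."""
    label = classify(query)
    handler = handlers.get(label)
    if handler is None:
        return fallback(query)
    try:
        return handler(query)
    except Exception:
        # A failing workflow should not crash the chatbot
        return fallback(query)


# Stub components standing in for the real classifier and workflows (illustrative only)
def toy_classify(query: str) -> str:
    return "FAQ" if "refund" in query.lower() else "Product"


handlers = {
    "FAQ": lambda q: {"prompt": f"FAQ workflow: {q}"},
    "Product": lambda q: {"prompt": f"Product workflow: {q}"},
}
fallback = lambda q: {"prompt": f"Ask the user to rephrase: {q}"}

print(route_query("How can I get a refund?", toy_classify, handlers, fallback)["prompt"])
```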

<a id='5-2'></a>
### 5.2 The ChatBot

Now you can run the ChatBot! It is already provided, as it is not the focus of this course or assignment, but feel free to inspect the `utils.py` file to understand how it works (and improve it as you wish!).

Suggested queries:

- Do you have blue t-shirts in your catalogue?
- I bought a dress and I didn't like it. How can I get a refund?
- I am going to a party at the beach. Can you suggest a nice look for me? It will be a warm night, and I’m a man.

In [None]:
chat_widget = ChatWidget(generator_function = answer_query)

Congratulations! You made your ChatBot using RAG techniques!