# **Retrieval-Augmented Generation (RAG)**

# 🛍️ Fashion Forward Hub – Intelligent Query System

This notebook demonstrates a complete pipeline for building a **fashion retail assistant** powered by **LLMs + Weaviate vector database**.  
It integrates multiple components to understand and answer customer queries in natural language:

- **FAQ Handling** – Answers store-related questions (refunds, returns, shipping, etc.) from a curated FAQ database.  
- **Product Query Handling** – Retrieves relevant fashion products from the product database using semantic search + metadata filtering.  
- **Task Routing** – Classifies queries as **FAQ** or **Product**, and further identifies whether product queries are **technical** (facts, availability, prices) or **creative** (style/look suggestions).  
- **Contextual Prompt Generation** – Builds tailored prompts with product context and FAQ content to guide the LLM.  
- **Dynamic Parameters** – Adjusts `temperature` and `top_p` based on query type (creative vs. technical).  

🔑 **End Goal:** Provide a natural, conversational, and accurate shopping assistant experience by seamlessly combining FAQs, product catalogs, and LLM-driven reasoning.


In [1]:
import json
from weaviate.classes.query import Filter
import weaviate
import joblib

In [2]:
from utils import (
    generate_with_single_input,
    generate_params_dict
)

In [3]:
import flask_app

 * Serving Flask app 'flask_app'
 * Debug mode: off


In [4]:
import weaviate_server

## `generate_params_dict` function

This helper builds the dictionary of keyword arguments (prompt, role, and sampling parameters) that is later unpacked into `generate_with_single_input`:

In [5]:
generate_params_dict?

Signature:
generate_params_dict(
    prompt: str,
    temperature: float = None,
    role='user',
    top_p: float = None,
    max_tokens: int = 500,
    model: str = 'meta-llama/Llama-3.2-3B-Instruct-Turbo',
)
Docstring:
Call an LLM with different sampling parameters to observe their effects.

Args:
    prompt: The text prompt to send to the model
    temperature: Controls randomness (lower = more deterministic)
    top_p: Controls diversity via nucleus sampling
    max_tokens: Maximum number of tokens to generate
    model: The model to use

Returns:
    The LLM response
File:      ~/work/utils.py
Type:      function

In [6]:
kwargs = generate_params_dict(prompt="Solve x^2 - 1 = 0", temperature=1.2, top_p=0.2)
print(kwargs)

{'prompt': 'Solve x^2 - 1 = 0', 'role': 'user', 'temperature': 1.2, 'top_p': 0.2, 'max_tokens': 500, 'model': 'meta-llama/Llama-3.2-3B-Instruct-Turbo'}


In [7]:
response = generate_with_single_input(**kwargs)
print(response['content'])

To solve the equation x^2 - 1 = 0, we can start by adding 1 to both sides:

x^2 - 1 + 1 = 0 + 1

This simplifies to:

x^2 = 1

Next, we can take the square root of both sides:

x = √1

x = ±1

So, the solutions to the equation x^2 - 1 = 0 are x = 1 and x = -1.


## Understanding Fashion Forward Hub data schema

There are two databases:

- Product database: Contains the products and their information.
- FAQ database: Contains the FAQ data.

### Products Database

In [8]:
# Loading products data
PRODUCTS_DATA = joblib.load('dataset/clothes_json.joblib')

In [9]:
PRODUCTS_DATA[0]

{'gender': 'Men',
 'masterCategory': 'Apparel',
 'subCategory': 'Topwear',
 'articleType': 'Shirts',
 'baseColour': 'Navy Blue',
 'season': 'Fall',
 'year': 2011.0,
 'usage': 'Casual',
 'productDisplayName': 'Turtle Check Men Navy Blue Shirt',
 'price': 67,
 'product_id': 15970}

The features each product has are:

- **Gender:** Target audience for the product, such as "Men," "Women," or "Unisex."
- **Master Category:** Broad classification like "Apparel" or "Footwear."
- **Sub Category:** Specific category within a master category, such as "Topwear."
- **Article Type:** Exact type of product, e.g., "Shirts" or "Jackets."
- **Base Colour:** Main color of the product, important for customer choice.
- **Season:** Intended season for the product, e.g., "Summer" or "Winter."
- **Year:** Year of release or collection.
- **Usage:** Intended use or occasion, like "Casual" or "Formal."
- **Product Display Name:** Descriptive name used in marketing.
- **Price:** Cost of the product.
- **Product ID:** Unique identifier for managing and tracking inventory.
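For type checking or quick validation, the record layout can be written down as a `TypedDict`. The sketch below infers field types from the sample record above (note that `year` is a float such as `2011.0` in the raw data). The `Product` name and the `missing_fields` helper are hypothetical and not part of the notebook:

```python
from typing import TypedDict


class Product(TypedDict):
    # Field types inferred from the sample record PRODUCTS_DATA[0]
    gender: str
    masterCategory: str
    subCategory: str
    articleType: str
    baseColour: str
    season: str
    year: float  # stored as a float in the raw data, e.g. 2011.0
    usage: str
    productDisplayName: str
    price: int
    product_id: int


def missing_fields(record: dict) -> list:
    """Return the Product fields absent from a record, in declaration order."""
    return [field for field in Product.__annotations__ if field not in record]
```

For example, `[i for i, r in enumerate(PRODUCTS_DATA) if missing_fields(r)]` would list the records that lack any of the expected features.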

### FAQ Database

In [10]:
FAQ = joblib.load("dataset/faq.joblib")

In [11]:
FAQ[:2]

[{'question': 'What are your store hours?',
  'answer': 'Our online store is open 24/7. Customer service is available from 9:00 AM to 6:00 PM, Monday through Friday.',
  'type': 'general information'},
 {'question': 'Where is Fashion Forward Hub located?',
  'answer': 'Fashion Forward Hub is primarily an online store. Our corporate office is located at 123 Fashion Lane, Trend City, Style State.',
  'type': 'general information'}]

The FAQs are stored as a list of dictionaries with the keys `question`, `answer` and `type`. Since the list is small, we include the whole FAQ in the prompt as a hardcoded string, so we don't need a vector collection to query it.

In [12]:
len(FAQ)

25

In [13]:
len(PRODUCTS_DATA)

44424

## Task routing

### Deciding if a query is FAQ or Product related

In [14]:
def check_if_faq_or_product(query: str) -> str:
    """
    Determines whether a given instruction prompt is related to a frequently asked question (FAQ) or a product inquiry.

    Parameters:
    - query (str): The instruction or query to be labeled as either FAQ or product-related.

    Returns:
    - str: The label 'FAQ' if the prompt is classified as a frequently asked question, 'Product' if it relates to product information, or
      None if the label is inconclusive.
    """

    prompt = f"""
You are a text classification assistant. 
Your task is to label the following instruction as either 'FAQ' or 'Product'. 

Definitions:
- Product: Queries that ask about specific clothes, their features, prices, colors, availability, or details related to purchasing products.
- FAQ: Queries that ask about store policies, refunds, returns, shipping, sizing help, or other general information not tied to a specific product.

Examples:
1. "Is there a refund for incorrectly bought clothes?": FAQ
2. "Tell me about the cheapest T-shirts that you have.": Product
3. "Do you have blue T-shirts under 100 dollars?": Product
4. "I bought a T-shirt and I didn't like it. How can I get a refund?": FAQ
5. "What sizes are available in jackets?": Product
6. "How long does shipping usually take?": FAQ

Instructions:
- Return only ONE word: either 'FAQ' or 'Product'.
- Do not include explanations, punctuation, or extra words.

Instruction: {query}
Answer:
"""
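    # temperature=0 keeps the classification as deterministic as possible
    kwargs = generate_params_dict(prompt, temperature=0)

    response = generate_with_single_input(**kwargs)

    # Normalize the raw reply and map it onto one of the two allowed labels
    label = response['content'].strip().strip('"\'.').lower()
    if label == 'faq':
        return 'FAQ'
    if label == 'product':
        return 'Product'

    # Inconclusive answer, as documented in the docstring
    return None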

In [15]:
queries = ['What is your return policy?', 
           'Give me three examples of blue T-shirts you have available.', 
           'How can I contact the user support?', 
           'Do you have blue Dresses?',
           'Create a look suitable for a wedding party happening during dawn.']

for query in queries:
    label = check_if_faq_or_product(query)
    print(f"Query: {query} Label: {label}")

Query: What is your return policy? Label: FAQ
Query: Give me three examples of blue T-shirts you have available. Label: Product
Query: How can I contact the user support? Label: FAQ
Query: Do you have blue Dresses? Label: Product
Query: Create a look suitable for a wedding party happening during dawn. Label: Product
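These labels can drive a small dispatcher. The following is a minimal sketch, assuming the classifier returns `'FAQ'`, `'Product'`, or anything else when the answer is inconclusive. The handler callables, the `route_query` name, and the clarification message are hypothetical stand-ins, not functions from the notebook:

```python
from typing import Callable, Optional

# Hypothetical message returned when the classifier cannot decide
CLARIFICATION = "Could you tell me whether your question is about our store policies or about a product?"


def route_query(
    query: str,
    classify: Callable[[str], Optional[str]],
    answer_faq: Callable[[str], str],
    answer_product: Callable[[str], str],
) -> str:
    """Send a query to the FAQ or product handler according to the classifier's label."""
    label = classify(query)
    if label == "FAQ":
        return answer_faq(query)
    if label == "Product":
        return answer_product(query)
    # Inconclusive label: ask the customer to rephrase instead of guessing a route
    return CLARIFICATION
```

In the notebook, `classify` could be `check_if_faq_or_product`, and `answer_faq` could be `lambda q: generate_with_single_input(**query_on_faq(q))['content']`, since `query_on_faq` returns call parameters rather than text.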


### Answering a FAQ question

Now that we can decide whether a query is FAQ- or product-related, we create another function to answer FAQ questions.

This function also needs a hardcoded prompt plus the FAQ question-and-answer pairs. For that, we build an FAQ layout: a single string that lists every pair.

In [16]:
FAQ[0]

{'question': 'What are your store hours?',
 'answer': 'Our online store is open 24/7. Customer service is available from 9:00 AM to 6:00 PM, Monday through Friday.',
 'type': 'general information'}

#### Creating the FAQ Layout

In [17]:
def generate_faq_layout(faq_list: list) -> str:
    """Format every FAQ entry as one 'Question: ... Answer: ... Type: ...' line."""
    t = ""

    # Iterate over every FAQ entry in the FAQ list
    for f in faq_list:
        t += f"Question: {f['question']} Answer: {f['answer']} Type: {f['type']}\n"

    return t

In [18]:
FAQ_LAYOUT = generate_faq_layout(FAQ)
print(FAQ_LAYOUT[:1000])

Question: What are your store hours? Answer: Our online store is open 24/7. Customer service is available from 9:00 AM to 6:00 PM, Monday through Friday. Type: general information
Question: Where is Fashion Forward Hub located? Answer: Fashion Forward Hub is primarily an online store. Our corporate office is located at 123 Fashion Lane, Trend City, Style State. Type: general information
Question: Do you have a physical store location? Answer: At this time, we operate exclusively online. This allows us to offer a broader selection and lower prices directly to you. Type: general information
Question: How can I create an account with Fashion Forward Hub? Answer: Click on 'Sign Up' in the top right corner of our website and follow the instructions to set up your account. Type: general information
Question: How do I subscribe to your newsletter? Answer: To receive the latest updates and promotions, sign up for our newsletter at the bottom of our homepage. Type: general information
Question:

In [19]:
def query_on_faq(query: str, **kwargs) -> dict:
    """
    Constructs a prompt to query an FAQ system.

    Parameters:
    - query (str): The query about which the function seeks to provide an answer from the FAQ.
    - **kwargs: Optional keyword arguments for extra configuration of prompt parameters.
    """
    
    prompt = f"""
You are an assistant that answers customer questions using only the provided FAQ content. 

Instructions:
- Use only the information from the FAQ to answer the question.
- If multiple FAQ entries are relevant, combine them into a single helpful answer.
- Do not mention the FAQ or that the answer is coming from an FAQ.
- Be clear and concise, but cover all relevant details.

<FAQ>
{FAQ_LAYOUT}
</FAQ>

Question: {query}
Answer:
""" 

    kwargs = generate_params_dict(prompt, **kwargs)
    
    return kwargs

In [20]:
kwargs = query_on_faq("I got my cloth but I didn't like it. How can I return it?")
content = generate_with_single_input(**kwargs)
print(content['content'])

To initiate a return, go to our Returns Center and select the item you wish to exchange or return. You will need to provide a reason for the return and print a return shipping label. Once we receive the item, we will process your return within 5-7 business days. Please note that sale items are final sale and cannot be returned or exchanged, unless stated otherwise.


### Decide the Nature of a Product-Related Question

Now, let's start working with product-related queries. They fall into two groups:

- **Technical queries** – asking for factual information about specific products, such as whether a blue dress is available or requesting three examples of red T-shirts suitable for sunny days.
- **Creative queries** – asking for styling help, such as creating a stylish look for visiting a museum.

In [21]:
def decide_task_nature(query: str) -> str:
    """
    Determines whether a query is creative or technical.

    This function constructs a prompt for an LLM to decide if a given query requires a creative response,
    such as making suggestions or composing ideas, or a technical response, such as providing product details or prices.

    Parameters:
    - query (str): The query to be evaluated for its nature.

    Returns:
    - str: The label 'creative' if the query requires creative input, or 'technical' if it requires technical information.
    """

    prompt = f"""
Decide if the following query is a query that requires creativity (creating, composing, making new things) 
or technical (information about products, availability, descriptions, or prices). 

Definitions:
- Creative: requires imagination or styling advice (e.g., suggesting looks, composing outfits, matching accessories).
- Technical: requests factual product information (availability, price, catalog items, counts, colors, sizes).

Label it strictly as either "creative" or "technical".

Examples:
- "Give me suggestions on a nice look for a nightclub." → creative
- "What are the blue dresses you have available?" → technical
- "Give me three T-shirts for summer." → technical
- "Give me a look for attending a wedding party." → creative
- "Suggest a stylish outfit for visiting a museum." → creative
- "Do you have red jackets under 50 dollars?" → technical

Query to be analyzed: {query}

Only output one word: "creative" or "technical".
"""


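    # Build kwargs for strict behavior; a few tokens of headroom tolerate a leading quote or space
    kwargs = generate_params_dict(prompt, temperature=0, max_tokens=5)

    # Call the model
    response = generate_with_single_input(**kwargs)

    # Keep only the first word, without quotes or punctuation, in lower case
    words = response['content'].strip().split()
    label = words[0].strip('"\'.,').lower() if words else ''

    return label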

In [22]:
queries = ["Give me two sneakers with vibrant colors.",
           "What are the most expensive clothes you have in your catalogue?",
           "I have a green dress and I like a suggestion on an accessory to match with it.",
           "Give me three trousers with vibrant colors you have in your catalogue.",
           "Create a look for a woman walking in a park on a sunny day. It must be fresh due to hot weather."
           ]


for query in queries:
    label = decide_task_nature(query)
    print(f"Query: {query} Label: {label}")
    print('-'*80)

Query: Give me two sneakers with vibrant colors. Label: technical
--------------------------------------------------------------------------------
Query: What are the most expensive clothes you have in your catalogue? Label: technical
--------------------------------------------------------------------------------
Query: I have a green dress and I like a suggestion on an accessory to match with it. Label: creative
--------------------------------------------------------------------------------
Query: Give me three trousers with vibrant colors you have in your catalogue. Label: technical
--------------------------------------------------------------------------------
Query: Create a look for a woman walking in a park on a sunny day. It must be fresh due to hot weather. Label: creative
--------------------------------------------------------------------------------
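Both classifiers rely on the model answering with exactly one allowed word, and a raw reply can still carry stray whitespace, quotes, or punctuation. The following is a hypothetical helper, not part of the notebook. It maps a reply onto an allowed label and returns `None` otherwise, in which case `get_params_for_task` would fall back to its default parameters:

```python
import string
from typing import Optional, Sequence


def normalize_label(raw: str, allowed: Sequence[str]) -> Optional[str]:
    """Map a raw LLM reply onto one of the allowed labels, or None if it matches none of them."""
    # Drop surrounding whitespace, quotes and punctuation, then compare case-insensitively
    cleaned = raw.strip(string.whitespace + string.punctuation).lower()
    for label in allowed:
        if cleaned == label.lower():
            return label  # return the canonical spelling, e.g. "FAQ" rather than "faq"
    return None
```

For example, `normalize_label(response['content'], ["creative", "technical"])` would turn a reply like `"Technical".` into `technical`.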


### Retrieving the Parameters for a Given Task

The following function maps a task label to the corresponding values for `top_p` and `temperature`.

For **technical** queries, **low randomness is preferred**, whereas for **creative** tasks, **higher randomness might be more suitable**. 
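To see why lower `temperature` and `top_p` reduce randomness, here is a minimal sketch of temperature scaling followed by nucleus (top-p) filtering on a toy next-token distribution. It uses the same values that `get_params_for_task` assigns (technical: 0.3 / 0.7, creative: 1.0 / 0.9). This is an illustration of the technique, not how the hosted model is implemented, and the toy logits are made up:

```python
import math


def sampling_distribution(logits: dict, temperature: float, top_p: float) -> dict:
    """Return the token probabilities left after temperature scaling and nucleus (top-p) filtering."""
    # temperature=0 is conventionally treated as greedy decoding: always pick the most likely token
    if temperature <= 0:
        return {max(logits, key=logits.get): 1.0}

    # Temperature divides the logits: below 1 sharpens the distribution, above 1 flattens it
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for a numerically stable softmax
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Nucleus filtering: keep the smallest set of most likely tokens whose cumulative probability reaches top_p
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1
    norm = sum(kept.values())
    return {tok: p / norm for tok, p in kept.items()}


toy_logits = {"dress": 2.0, "skirt": 1.0, "jacket": 0.5, "hat": -1.0}
technical = sampling_distribution(toy_logits, temperature=0.3, top_p=0.7)
creative = sampling_distribution(toy_logits, temperature=1.0, top_p=0.9)
```

With the technical settings only `"dress"` survives, so sampling is effectively deterministic. With the creative settings `"dress"`, `"skirt"` and `"jacket"` all remain candidates.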

In [23]:
def get_params_for_task(task: str) -> dict:
    """
    Retrieves specific LLM parameters based on the nature of the task.

    Creative tasks benefit from higher randomness, while technical tasks 
    require more focus and precision. A default parameter set is returned 
    for unrecognized task types.
    """

    # Define the parameter sets for technical and creative tasks
    PARAMETERS_DICT = {
        "creative": {"top_p": 0.9, "temperature": 1.0},
        "technical": {"top_p": 0.7, "temperature": 0.3}
    }

    # Return the corresponding parameter set based on task type
    if task == "technical":
        param_dict = PARAMETERS_DICT["technical"]
    elif task == "creative":
        param_dict = PARAMETERS_DICT["creative"]
    else:
        # Fallback: safe middle ground
        param_dict = {"top_p": 0.8, "temperature": 0.5}

    return param_dict

In [24]:
get_params_for_task("technical")

{'top_p': 0.7, 'temperature': 0.3}

## Retrieving Items Based on Metadata Inferred from a Query

We will create a function that extracts useful metadata from a query, which we then use to filter the products returned by the vector search. To do this, we build a dictionary listing several product features and all the values each one takes in the dataset. We then pass it to the LLM in the prompt, so the LLM can pick the values that best fit the query. We also need to handle situations where the LLM does not return a valid value.

The values we’ll focus on are:
- gender  
- masterCategory  
- articleType  
- baseColour  
- season  
- usage

These features were chosen because they strike a good balance: they are specific enough to be useful, but general enough to avoid empty results. Other features in the dataset are too detailed and could lead to no matches. Also, including every single value would make the prompt too large, which could slow things down and raise costs.

In [25]:
PRODUCTS_DATA[0]

{'gender': 'Men',
 'masterCategory': 'Apparel',
 'subCategory': 'Topwear',
 'articleType': 'Shirts',
 'baseColour': 'Navy Blue',
 'season': 'Fall',
 'year': 2011.0,
 'usage': 'Casual',
 'productDisplayName': 'Turtle Check Men Navy Blue Shirt',
 'price': 67,
 'product_id': 15970}

In [26]:
# generate the dictionary with the possible values for each key
values = {}
for d in PRODUCTS_DATA:
    for key, val in d.items():
        if key in ('product_id', 'price', 'productDisplayName', 'subCategory', 'year'):
            continue
        if key not in values.keys():
            values[key] = set()
        values[key].add(val)

In [27]:
values['season']

{'All seasons', 'Fall', 'Spring', 'Summer', 'Winter'}

In [28]:
values.keys()

dict_keys(['gender', 'masterCategory', 'articleType', 'baseColour', 'season', 'usage'])
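The `values` dictionary holds Python sets. Interpolating it straight into an f-string prompt therefore produces Python set syntax (for example `{'Fall', 'Spring', ...}`) rather than JSON, and string hashing makes the set order vary between runs. The following minimal sketch uses a hypothetical helper `values_to_json` to serialize the dictionary as real JSON with stable ordering:

```python
import json


def values_to_json(values: dict) -> str:
    """Serialize {feature: set of values} as JSON with sorted keys and sorted value lists."""
    # key=str keeps sorting safe even if a feature mixes strings with NaN-like missing values
    return json.dumps({key: sorted(vals, key=str) for key, vals in sorted(values.items())})


example = values_to_json({"season": {"Summer", "Fall"}, "gender": {"Women", "Men"}})
# example == '{"gender": ["Men", "Women"], "season": ["Fall", "Summer"]}'
```

The result could be interpolated into a prompt in place of the raw dictionary. Whether this changes the model's answers has not been tested here.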

### Generate metadata

The next function extracts potential metadata from a given query. It constructs a prompt that incorporates the `values` dictionary, which lists the possible feature values, and asks the LLM to generate a JSON response with the metadata relevant to the query.

The LLM must also handle price constraints. If the query specifies a price range, the JSON should include a key like this:

```json
"price": {"min": min_value, "max": max_value}
```

If no price constraint is provided, the LLM should default to:

```json
"price": {"min": 0, "max": "inf"}
```
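Because the prompt asks for `"max": "inf"` as a string, downstream code has to accept both numbers and that string. The following minimal sketch uses a hypothetical helper `price_bounds` to normalize whatever the LLM returns into two numeric bounds, treating missing or malformed values as "no bound":

```python
import math


def price_bounds(price) -> tuple:
    """Convert the LLM's {"min": ..., "max": ...} price object into numeric (low, high) bounds."""

    def to_number(value, default: float) -> float:
        try:
            number = float(value)  # float("inf") also parses the "inf" string used in the prompt
        except (TypeError, ValueError):
            return default  # None or unparsable text: fall back to "no bound"
        return default if math.isnan(number) else number

    if not isinstance(price, dict):
        return 0.0, math.inf
    low = max(0.0, to_number(price.get("min"), 0.0))  # negative minimums make no sense for prices
    high = to_number(price.get("max"), math.inf)
    return low, high
```

For example, `{"min": 0, "max": "inf"}` becomes `(0.0, inf)`, which callers can read as "no price constraint".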

In [29]:
def generate_metadata_from_query(query: str) -> str:
    """
    Generates metadata in JSON format based on a given query to filter clothing items.

    Uses possible values from PRODUCTS_DATA (stored in values dictionary).
    Ensures only valid metadata is extracted for filtering.
    """

    prompt = f"""
    A query will be provided. Based on this query, a vector database will be searched to find relevant clothing items.
    Generate a JSON object containing useful metadata to filter products for this query.

    The possible values for each feature are given in the following JSON:
    {values}

    Provide a JSON with the features that best fit the query. 
    Rules:
    - Always include these keys: gender, masterCategory, articleType, baseColour, price, usage, season
    - Each value must be inside a list (even if there is only one)
    - Only use values from the JSON provided above
    - If a price range is mentioned, add it under "price": {{"min": x, "max": y}}
    - If no price range is given, set price to: {{"min": 0, "max": "inf"}}
    - Return only the JSON, nothing else

    Example of expected JSON:
    {{
      "gender": ["Women"],
      "masterCategory": ["Apparel"],
      "articleType": ["Dresses"],
      "baseColour": ["Blue"],
      "price": {{"min": 0, "max": "inf"}},
      "usage": ["Formal"],
      "season": ["All seasons"]
    }}

    Query: {query}
    """

    # Generate the response with low randomness (temperature=0)
    response = generate_with_single_input(
        prompt,
        temperature=0,
        max_tokens=1500
    )

    # Extract the generated JSON content
    content = response["content"]
    
    return content

In [30]:
print(generate_metadata_from_query("Create a look for a man that suits a sunny day in the park. I don't want to spend more than 300 dollars on each piece."))

{
  "gender": ["Men"],
  "masterCategory": ["Apparel"],
  "articleType": ["Shirts", "Shorts", "Sunglasses"],
  "baseColour": ["Yellow", "Orange", "Green", "Blue", "Red"],
  "price": {"min": 0, "max": 300},
  "usage": ["Casual", "Sports"],
  "season": ["Summer"]
}


The next function is a helper that extracts the JSON from the LLM output. We also need to handle the case where the LLM doesn't return valid, recoverable JSON. In that case, the parser returns `None`, no metadata filters are created, and the search falls back to plain semantic search.

In [31]:
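def parse_json_output(llm_output: str) -> dict:
    """Parse the LLM output into a dict, or return None if no valid JSON can be recovered."""
    # Keep only the outermost {...} span, dropping surrounding prose or markdown code fences
    start, end = llm_output.find("{"), llm_output.rfind("}")
    candidate = llm_output[start:end + 1] if 0 <= start < end else llm_output
    error = None
    # Try the text as-is first, then with doubled braces (copied from the f-string prompt) collapsed
    for text in (candidate, candidate.replace("{{", "{").replace("}}", "}")):
        try:
            return json.loads(text)
        except json.JSONDecodeError as e:
            error = e
    print(f"JSON parsing failed: {error}")
    return None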

In [32]:
json_string = generate_metadata_from_query("Give me three blue dresses suitable for a wedding party, less than 200 dollars and at least 50 dollars")
json_output = parse_json_output(json_string)
json_output

{'gender': ['Women'],
 'masterCategory': ['Apparel'],
 'articleType': ['Dresses'],
 'baseColour': ['Blue'],
 'price': {'min': 50, 'max': 200},
 'usage': ['Formal'],
 'season': ['All seasons']}

In [33]:
type(json_output)

dict

### Loading the Weaviate Product Collection

Now it is time to work with the Weaviate collection. It has already been built from the same product data and stored as a Weaviate collection, so we can query it with semantic search and metadata filtering.

### Loading the Weaviate client

We use the Weaviate Python client to connect to the local Weaviate instance and load the product collection.

In [34]:
client = weaviate.connect_to_local(port=8079, grpc_port=50050)

In [35]:
products_collection = client.collections.get('products')

In [36]:
len(products_collection)

44423

### Filtering by metadata

This function creates the Weaviate filters from the metadata: one `Filter` per valid metadata key, and up to two for `price` (a lower and an upper bound).

In [37]:
def get_filter_by_metadata(json_output: dict = None):
    """
    Generate a list of Weaviate filters based on a provided metadata dictionary.

    Parameters:
    - json_output (dict) or None: Dictionary containing metadata keys and their values.

    Returns:
    - list[Filter] or None: A list of Weaviate filters, or None if input is None.
    """
    # If the input dictionary is None, return None immediately
    if json_output is None:
        return None

    # Define a tuple of valid keys that are allowed for filtering
    valid_keys = (
        'gender',
        'masterCategory',
        'articleType',
        'baseColour',
        'price',
        'usage',
        'season',
    )

    # Initialize an empty list to store the filters
    filters = []

    # Iterate over each key-value pair in the input dictionary
    for key, value in json_output.items():
        # Skip the key if it is not in the list of valid keys
        if key not in valid_keys:
            continue

        # Special handling for the 'price' key
        if key == 'price':
            # Ensure the value associated with 'price' is a dictionary
            if not isinstance(value, dict):
                continue

            # Extract the minimum and maximum prices from the dictionary
            min_price = value.get('min')
            max_price = value.get('max')

            # Add each bound on its own, so "under 300 dollars" (min = 0) still filters by the maximum.
            # Non-numeric bounds such as None or the string 'inf' mean "no bound".
            if isinstance(min_price, (int, float)) and min_price > 0:
                filters.append(Filter.by_property(key).greater_or_equal(min_price))
            if isinstance(max_price, (int, float)) and max_price != float('inf'):
                filters.append(Filter.by_property(key).less_or_equal(max_price))
        else:
            # For other valid keys, accept a single string as well as a list of strings
            if isinstance(value, str):
                value = [value]
            # Skip empty lists, which carry no constraint
            if not value:
                continue
            # Add a filter that checks for any of the provided values
            filters.append(Filter.by_property(key).contains_any(value))

    return filters
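When debugging the filters, it can help to check LLM-generated metadata directly against `PRODUCTS_DATA` without querying the database. Below is a simplified sketch with a hypothetical `matches_metadata` helper. It uses exact string equality, whereas Weaviate's `contains_any` on text properties depends on how the property is tokenized, so the two may disagree on values such as "Navy Blue" versus "Blue":

```python
def matches_metadata(product: dict, metadata: dict) -> bool:
    """Check an in-memory product dict against LLM-generated metadata (simplified, exact-match sketch)."""
    for key, wanted in metadata.items():
        if key == "price":
            if not isinstance(wanted, dict):
                continue  # malformed price constraint: ignore it
            price = product.get("price")
            if price is None:
                continue
            low, high = wanted.get("min"), wanted.get("max")
            # Only numeric bounds apply; a string such as "inf" means "no bound"
            if isinstance(low, (int, float)) and price < low:
                return False
            if isinstance(high, (int, float)) and price > high:
                return False
        elif key in product:
            allowed = [wanted] if isinstance(wanted, str) else list(wanted)
            # Exact equality; an empty list imposes no constraint
            if allowed and product[key] not in allowed:
                return False
    return True
```

For example, `sum(matches_metadata(p, json_output) for p in PRODUCTS_DATA)` counts how many products satisfy a given metadata dictionary, which helps spot overly strict filters.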

This is a wrapper function that, given a query, returns the corresponding filters.

In [38]:
def generate_filters_from_query(query: str) -> list:
    json_string = generate_metadata_from_query(query)
    json_output = parse_json_output(json_string)
    filters = get_filter_by_metadata(json_output)
    return filters

In [39]:
filters = generate_filters_from_query("Give me three T-shirts to use in sunny days")
filters

[_FilterValue(value=['Men', 'Women'], operator=<_Operator.CONTAINS_ANY: 'ContainsAny'>, target='gender'),
 _FilterValue(value=['Apparel'], operator=<_Operator.CONTAINS_ANY: 'ContainsAny'>, target='masterCategory'),
 _FilterValue(value=['Tshirts'], operator=<_Operator.CONTAINS_ANY: 'ContainsAny'>, target='articleType'),
 _FilterValue(value=['Yellow', 'Orange', 'Lime Green'], operator=<_Operator.CONTAINS_ANY: 'ContainsAny'>, target='baseColour'),
 _FilterValue(value=['Casual', 'Smart Casual'], operator=<_Operator.CONTAINS_ANY: 'ContainsAny'>, target='usage'),
 _FilterValue(value=['Summer'], operator=<_Operator.CONTAINS_ANY: 'ContainsAny'>, target='season')]

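The filters reflect the metadata inferred from the query: T-shirts, the Summer season, and bright colors. No price filter is created because the query does not mention a price range.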

The next function retrieves the products relevant to a query. It generates the metadata filters, then runs a semantic search on the query restricted by those filters, which narrows down the candidates and increases accuracy.

If the filtered search returns too few results (fewer than 10), the function incrementally removes filters until it finds at least 5 products. If that still fails, it falls back to a plain semantic search.
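The relaxation strategy can be sketched independently of Weaviate. The search runs with every filter, then drops filters one property at a time in a fixed order until enough results come back, and finally falls back to an unfiltered search. In this sketch, a filter is a hypothetical `(property, allowed_values)` tuple and `search` is any callable, so the logic can be exercised with an in-memory stub. For simplicity, it uses a single threshold instead of two separate ones:

```python
from typing import Callable, List, Sequence, Tuple

# A filter is modeled here as (target_property, allowed_values); this is not Weaviate's Filter class
FilterSpec = Tuple[str, Sequence[str]]


def search_with_relaxation(
    filters: List[FilterSpec],
    search: Callable[[List[FilterSpec]], list],
    drop_order: Sequence[str],
    min_results: int = 5,
) -> list:
    """Search with all filters, then drop filters by target (in drop_order) until enough results come back."""
    results = search(filters)
    dropped = set()
    for target in drop_order:
        if len(results) >= min_results:
            return results
        # Drop every filter on this property and retry with the rest
        dropped.add(target)
        remaining = [f for f in filters if f[0] not in dropped]
        results = search(remaining)
    # Last resort: an unfiltered search, like the plain near_text fallback
    return results if len(results) >= min_results else search([])
```

Properties missing from `drop_order` are never dropped. This mirrors how the notebook keeps the `articleType` and `price` filters while relaxing the others.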

In [40]:
def get_relevant_products_from_query(query: str):
    """
    Retrieve products that are most relevant to a given query by applying filters.

    This function generates filters based on the provided query and uses them to find 
    products that closely match the query criteria. If no filters are applicable or if 
    the initial search returns a small number of products, the function dynamically reduces 
    the filtering constraints based on a predefined order of filter importance.

    Parameters:
    query (str): The query string used to search for relevant products.

    Returns:
    list: A list of product objects that are most relevant to the query. If filters are not effective,
          it adjusts them to ensure a minimum return of products.
    """
    filters = generate_filters_from_query(query)  # Generate filters based on query

    # Check if there are no applicable filters
    if filters is None or len(filters) == 0:
        # Query the collection without filters, using the query text for relevance
        res = products_collection.query.near_text(query, limit=20).objects
        return res

    # Query with all filters and limit to the top 20 relevant objects
    res = products_collection.query.near_text(query, filters=Filter.all_of(filters), limit=20).objects

    # Filters are dropped in this order when too few products match (articleType and price are always kept)
    relaxation_order = ['baseColour', 'masterCategory', 'usage', 'season', 'gender']

    # If the result set has fewer than 10 products, progressively drop filters to broaden the search
    if len(res) < 10:
        for i in range(len(relaxation_order)):
            # Keep only the filters whose target has not been dropped yet
            remaining_filters = [x for x in filters if x.target not in relaxation_order[:i + 1]]

            # Re-query with the reduced set of filters (no filter at all if none remain)
            res = products_collection.query.near_text(
                query,
                filters=Filter.all_of(remaining_filters) if remaining_filters else None,
                limit=20,
            ).objects

            # If enough products have been found, return early
            if len(res) >= 5:
                return res

        # If there are still not enough results, fall back to a plain semantic search
        res = products_collection.query.near_text(query, limit=20).objects

    return res  # Return the final set of relevant products

In [41]:
query = "Give me three T-shirts to use in sunny days"

In [42]:
t = get_relevant_products_from_query(query)

In [43]:
if len(t) > 0:
    print(t[0].properties)

{'year': 2011, 'masterCategory': 'Apparel', 'product_id': 1853, 'baseColour': 'Yellow', 'productDisplayName': 'Inkfruit Mens Little Bit More T-shirt', 'season': 'Summer', 'subCategory': 'Topwear', 'price': 50, 'gender': 'Men', 'usage': 'Casual', 'articleType': 'Tshirts'}


The top result is indeed a T-shirt, and a yellow summer one at that.

### Generating a context from the retrieved items

Now, for the given retrieved items, let's generate a simple context.

In [44]:
def generate_items_context(results: list) -> str:
    """
    Compile detailed product information from a list of result objects into a formatted string.

    This function takes a list of results, each containing various product attributes, and constructs 
    a human-readable summary for each product. Each product's details, including ID, name, category, 
    usage, gender, type, and other characteristics, are concatenated into a string that describes 
    all products in the list.

    Parameters:
    results (list): A list of result objects, each having a `properties` attribute that is a dictionary 
                    containing product attributes such as 'product_id', 'productDisplayName', 
                    'masterCategory', 'usage', 'gender', 'articleType', 'subCategory', 
                    'baseColour', 'season', and 'year'.

    Returns:
    str: A multi-line string where each line contains the formatted details of a single product.
         Each product detail includes the product ID, name, category, usage, gender, type, color, 
         season, and year.
    """
    t = ""  # Initialize an empty string to accumulate product information

    for item in results:  # Iterate through each item in the results list
        item = item.properties  # Access the properties dictionary of the current item

        # Append formatted product details to the output string
        t += (
            f"Product ID: {item['product_id']}. "
            f"Product name: {item['productDisplayName']}. "
            f"Product Category: {item['masterCategory']}. "
            f"Product usage: {item['usage']}. "
            f"Product gender: {item['gender']}. "
            f"Product Type: {item['articleType']}. "
            f"Product Category: {item['subCategory']} "
            f"Product Color: {item['baseColour']}. "
            f"Product Season: {item['season']}. "
            f"Product Year: {item['year']}.\n"
        )

    return t  # Return the complete formatted string with product details

In [45]:
print(generate_items_context(t)[:1000])

Product ID: 1853. Product name: Inkfruit Mens Little Bit More T-shirt. Product Category: Apparel. Product usage: Casual. Product gender: Men. Product Type: Tshirts. Product Category: Topwear Product Color: Yellow. Product Season: Summer. Product Year: 2011.
Product ID: 33565. Product name: Wrangler Women Sunset Beach Yellow T-shirt. Product Category: Apparel. Product usage: Casual. Product gender: Women. Product Type: Tshirts. Product Category: Topwear Product Color: Yellow. Product Season: Summer. Product Year: 2012.
Product ID: 47281. Product name: Myntra Men Pack of 3 T-shirts. Product Category: Apparel. Product usage: Casual. Product gender: Men. Product Type: Tshirts. Product Category: Topwear Product Color: Yellow. Product Season: Summer. Product Year: 2012.
Product ID: 54935. Product name: Do u speak green Women T-shirt. Product Category: Apparel. Product usage: Casual. Product gender: Women. Product Type: Tshirts. Product Category: Topwear Product Color: Yellow. Product Season:
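The context above reuses the label "Product Category" for both `masterCategory` and `subCategory`. It also leaves out `price`, although technical queries often ask about prices. Below is a minimal sketch of a table-driven alternative. The `FIELD_LABELS` mapping and `format_products` are hypothetical. The function gives each property its own label, skips missing fields, and takes plain property dicts such as the `.properties` of Weaviate result objects:

```python
# Hypothetical mapping: one distinct label per product property, in output order
FIELD_LABELS = {
    "product_id": "Product ID",
    "productDisplayName": "Product name",
    "masterCategory": "Product category",
    "subCategory": "Product subcategory",
    "articleType": "Product type",
    "usage": "Product usage",
    "gender": "Product gender",
    "baseColour": "Product color",
    "season": "Product season",
    "year": "Product year",
    "price": "Product price",
}


def format_products(products: list) -> str:
    """Format each product property dict as one line, skipping fields the product lacks."""
    lines = []
    for props in products:
        parts = [f"{label}: {props[key]}." for key, label in FIELD_LABELS.items() if key in props]
        lines.append(" ".join(parts))
    return "\n".join(lines)
```

With the retrieved results above, this would be called as `format_products([obj.properties for obj in t])`.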

### Query on Products

The next function answers a product query. The process follows these steps:

1. **Query**: Start with a product query.
2. **Determine Query Nature**: Identify if the query is technical or creative.
3. **Retrieve Relevant Products**: Find products that best match the query criteria.
4. **Generate Context**: Build a descriptive context string based on the products.
5. **Create Prompt**: Formulate the prompt using the context and the query nature.
6. **Generate Parameters**: Prepare parameters suited to the query nature for the LLM.
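7. **Return Parameters**: Return the keyword arguments for the LLM call. The function does not run inference itself; the caller passes the result to `generate_with_single_input`.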
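
Step 6 depends on the label produced in step 2. The notebook delegates this to `get_params_for_task`, which is defined elsewhere and not shown here. The sketch below only illustrates the idea and assumes the classifier returns the strings `technical` or `creative`. The name `params_for_nature` and the numeric `temperature`/`top_p` values are hypothetical. Lower values keep factual answers focused, and higher values give style suggestions more variety.

```python
# Hypothetical sampling settings per query nature; the notebook's actual
# values live in get_params_for_task and may differ.
PARAMS_BY_NATURE = {
    "technical": {"temperature": 0.3, "top_p": 0.8},
    "creative": {"temperature": 1.0, "top_p": 0.95},
}


def params_for_nature(label: str) -> dict:
    """Map a classifier label to sampling parameters for the LLM call."""
    # LLM classifiers often return labels with stray whitespace or capitals.
    key = label.strip().lower()
    # Unknown labels fall back to the conservative technical settings.
    # dict() returns a copy so callers cannot mutate the shared table.
    return dict(PARAMS_BY_NATURE.get(key, PARAMS_BY_NATURE["technical"]))
```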

In [46]:
def query_on_products(query: str) -> dict:
    """
    Build the LLM call parameters for a product query.

    The function classifies the query as technical or creative, selects sampling
    parameters for that type, retrieves relevant products, and builds a prompt that
    contains the product details and the original query. It does not call the LLM.
    Instead, it returns the keyword arguments for that call.

    Parameters:
    query (str): The input query string that needs to be analyzed and answered using product data.

    Returns:
    dict: A dictionary of keyword arguments (`kwargs`) containing the prompt, the role, and the
          sampling settings, suitable for passing to `generate_with_single_input`.
    """

    # Determine if the query is technical or creative in nature
    query_label = decide_task_nature(query)

    # Obtain sampling parameters (temperature, top_p) based on the query type
    parameters_dict = get_params_for_task(query_label)

    # Retrieve products that are relevant to the query
    relevant_products = get_relevant_products_from_query(query)

    # Create a context string from the relevant products
    context = generate_items_context(relevant_products)

    # Construct a prompt that contains the product context and the query,
    # and ask the LLM to include the product ID in its answer
    prompt = (
        "You are a helpful shopping assistant. You are given a list of clothing products. "
        "Answer the user's query directly, in a natural conversational style, by selecting the most relevant items. "
        "Always include the item ID in your response. "
        "Only describe features that are directly relevant to the query; keep descriptions concise. "
        "If the query does not specify a number of products, return at most five. "
        f"\n\nAVAILABLE PRODUCTS:\n{context}\n\nQUERY:\n{query}"
    )

    # Generate kwargs for the LLM call from the prompt, role='assistant', and the sampling parameters
    kwargs = generate_params_dict(prompt, role='assistant', **parameters_dict)

    return kwargs
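
If the prompt text lives in a pure function, its wording can be checked without calling Weaviate or the LLM. This is a minimal sketch. `build_product_prompt` and its `max_items` parameter are hypothetical, and the instruction text condenses the prompt used in `query_on_products`.

```python
def build_product_prompt(context: str, query: str, max_items: int = 5) -> str:
    """Assemble the product-answering prompt from a context string and a user query."""
    instructions = (
        "You are a helpful shopping assistant. You are given a list of clothing products. "
        "Answer the user's query directly, in a natural conversational style, "
        "by selecting the most relevant items. "
        "Always include the item ID in your response. "
        "Only describe features that are directly relevant to the query; keep descriptions concise. "
        f"If the query does not specify a number of products, return at most {max_items}. "
    )
    # Strip trailing newlines from the context so the section separators stay uniform.
    return f"{instructions}\n\nAVAILABLE PRODUCTS:\n{context.strip()}\n\nQUERY:\n{query.strip()}"
```

A test can then assert on the section layout directly, for example that the query always appears after the `QUERY:` marker.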

In [47]:
kwargs = query_on_products('Make a wonderful look for a man attending a wedding party happening during night.')

In [48]:
result = generate_with_single_input(**kwargs)
print(result['content'])

To create a wonderful look for a man attending a wedding party at night, I recommend the following formal shoes:

1. Product ID: 21059. Product name: Clarks Men Extra Look Black Formal Shoe. 
2. Product ID: 2834. Product name: Lee Cooper Men's Slip On Darknight Black Shoe.
3. Product ID: 33349. Product name: Homme Men Black Semi Formal Shoes.
4. Product ID: 10281. Product name: Clarks Men Hang Spring Leather Black Formal Shoes.
5. Product ID: 33355. Product name: Homme Men Black Formal Ankle Boots.

These formal shoes are suitable for a night wedding party and will complement a man's attire for the occasion.


In [49]:
kwargs = query_on_products('Give me three T-shirts for sunny days')

In [50]:
result = generate_with_single_input(**kwargs)
print(result['content'])

Here are three T-shirt options for sunny days:

1. Product ID: 1853 - Inkfruit Mens Little Bit More T-shirt (Yellow) - Casual, Men's T-shirt for sunny days.
2. Product ID: 47281 - Myntra Men Pack of 3 T-shirts (Yellow) - Casual, Men's T-shirt pack for sunny days.
3. Product ID: 4260 - Inkfruit Men's Music Fever Yellow T-shirt (Yellow) - Casual, Men's T-shirt for sunny days.
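
The prompt requires the model to include item IDs, so each answer can be checked against the products that were actually retrieved. This catches IDs the model may have invented. The following is a minimal sketch. `check_cited_ids` is a hypothetical helper. It assumes the model writes IDs in the `Product ID: <number>` form seen in the outputs above, and that `retrieved_ids` is collected from the `product_id` field of the retrieved items.

```python
import re
from typing import Iterable

# Matches both "Product ID: 1853 -" and "Product ID: 21059." styles of output.
PRODUCT_ID_PATTERN = re.compile(r"Product ID:\s*(\d+)")


def check_cited_ids(answer: str, retrieved_ids: Iterable[int]) -> dict:
    """Return the IDs cited in an LLM answer and those not found among retrieved products."""
    cited = [int(match) for match in PRODUCT_ID_PATTERN.findall(answer)]
    known = {int(i) for i in retrieved_ids}
    return {
        "cited": cited,
        "unknown": [i for i in cited if i not in known],
    }
```

A non-empty `unknown` list is a signal to regenerate the answer or drop those items before showing it to the user.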


## The Final Function!
### The function to rule them all

Now it’s time to bring everything together into a single function!

This function will:

1. Check if the query is related to an FAQ or a Product.
2. If it’s an FAQ, run the FAQ-related workflow.
3. If it’s a Product, run the Product-related workflow.

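It returns the kwargs dictionary for the LLM call. If the query is neither an FAQ nor a Product query, or if the product workflow raises an error, it returns a fallback prompt instead.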
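
The same three-way routing can be written as a dispatch table. This keeps the fallback logic in one place and makes the router testable with stub classifiers and handlers. The following is a minimal sketch. `route_query` and the stubs in the usage are hypothetical. Unlike the notebook's version, it applies the error fallback to every workflow, not only the product one.

```python
from typing import Callable, Dict

Handler = Callable[[str], dict]


def route_query(query: str, classify: Callable[[str], str], handlers: Dict[str, Handler]) -> dict:
    """Dispatch a query to the handler registered for its label, with fallbacks."""
    label = classify(query).strip()
    handler = handlers.get(label)
    if handler is None:
        # Unknown label: let the model answer from the conversation so far.
        return {
            "role": "assistant",
            "prompt": (
                "User provided a question that does not fit FAQ or Product related questions. "
                f"Answer it based on the context you already have so far. Query provided by the user: {query}"
            ),
        }
    try:
        return handler(query)
    except Exception:
        # A failing workflow (e.g. a retrieval error) should not crash the chat loop.
        return {
            "role": "assistant",
            "prompt": f"The query could not be processed. Ask the user to rephrase it. Query provided by the user: {query}",
        }
```

In the notebook, the equivalent call would pass `check_if_faq_or_product` as `classify` and `{"FAQ": query_on_faq, "Product": query_on_products}` as `handlers`.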

In [51]:
def answer_query(query: str) -> dict:
    """
    Determines the type of a given query (FAQ or Product) and executes the appropriate workflow.

    Parameters:
    - query (str): The user's query string.

    Returns:
    - dict: A dictionary of keyword arguments to be used for further processing.
      If the query is neither FAQ nor Product-related, returns a default response dictionary
      instructing the assistant to answer based on existing context. If the product workflow
      raises an exception, returns a dictionary instructing the assistant to ask the user
      to rephrase the query.
    """
    label = check_if_faq_or_product(query)
    if label not in ['FAQ', 'Product']:
        return {
            "role": "assistant",
            "prompt": "User provided a question that does not fit FAQ or Product related questions. "
                      f"Answer it based on the context you already have so far. Query provided by the user: {query}"
        }
    if label == 'FAQ':
        kwargs = query_on_faq(query)
    else:  # label == 'Product'
        try:
            kwargs = query_on_products(query)
        except Exception:
            # The product workflow failed; ask the user to rephrase instead of crashing
            return {
                "role": "assistant",
                "prompt": "User provided a question that broke the querying system. Instruct them to rephrase it. "
                          f"Answer it based on the context you already have so far. Query provided by the user: {query}"
            }

    return kwargs

In [52]:
kwargs = answer_query("What are your working hours?")

In [53]:
result = generate_with_single_input(**kwargs)
print(result['content'])

Our customer service is available from 9:00 AM to 6:00 PM, Monday through Friday. Our online store is open 24/7.


# ChatBot

In [54]:
# !pip install gradio

In [55]:
from utils import *

In [58]:
%run utils.py

* Running on local URL:  http://127.0.0.1:8081
* Running on public URL: https://9832da9d7d8fe0583a.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
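
The contents of `utils.py` are not shown, so the following is only a sketch of how a chat UI could be wired to this pipeline, not the notebook's implementation. It assumes Gradio's `gr.ChatInterface`, whose callback receives the new message and the chat history and returns the reply text. `make_chat_fn` and `launch_chat` are hypothetical names. For simplicity, the callback ignores the history.

```python
from typing import Callable


def make_chat_fn(
    answer_query: Callable[[str], dict],
    generate: Callable[..., dict],
) -> Callable[[str, list], str]:
    """Build a chat callback with the (message, history) -> reply signature Gradio expects."""
    def chat_fn(message: str, history: list) -> str:
        # history is accepted for compatibility but not used in this sketch.
        kwargs = answer_query(message)
        result = generate(**kwargs)
        return result["content"]
    return chat_fn


def launch_chat(chat_fn: Callable[[str, list], str], port: int = 8081, share: bool = False) -> None:
    """Serve the callback in a Gradio chat UI (assumes gradio is installed)."""
    import gradio as gr  # imported lazily so the rest of the sketch runs without gradio
    demo = gr.ChatInterface(fn=chat_fn)
    demo.launch(server_port=port, share=share)
```

In the notebook, `launch_chat(make_chat_fn(answer_query, generate_with_single_input))` would serve the UI on port 8081, the port shown in the output above.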
