# Improving RAG System

1. **Cost Measurement**: Figure out the potential costs of running a RAG application.
2. **Prompt Improvement**: Enhance our prompts to speed up response times, while finding the right balance between time, performance, and cost.
3. **Tracing System**: Set up a system to keep track of the inputs and outputs during interactions with the RAG system.

In [1]:
import json
from weaviate.classes.query import Filter
import weaviate
import joblib

from utils import (
    ChatWidget, 
    generate_with_single_input,
    parse_json_output,
    get_filter_by_metadata,
    generate_filters_from_query,
    process_and_print_query,
    print_properties,
    make_url
)

In [2]:
import flask_app

 * Serving Flask app 'flask_app'
 * Debug mode: off


In [3]:
import weaviate_server

## Loading the Weaviate client

In [4]:
client = weaviate.connect_to_local(port=8079, grpc_port=50050)

## Preparing the Tracing with Phoenix

In [5]:
import phoenix as px
from phoenix.otel import register
from opentelemetry.trace import Status, StatusCode

In [6]:
make_url()
session = px.launch_app()

[1mFOLLOW THIS URL TO OPEN THE UI: http://rpyqcvlvppro.labs.coursera.org[0m
🌍 To view the Phoenix app in your browser, visit http://localhost:6006/
📖 For more information on how to use Phoenix, check out https://arize.com/docs/phoenix


## Setting model cost per token

we will setup in Phoenix the two models that will be used:

- meta-llama/Llama-3.2-3B-Instruct-Turbo
- meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo

Alongside with their cost per million of tokens. This will allow us to see the cost of each operation. 

**NOTE**: For illustration purposes, let's assume a cost of **1000 USD** per million tokens for `meta-llama/Llama-3.2-3B-Instruct-Turbo` and **2000 USD** for `meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo`. The real cost per token for these models is **MUCH LOWER** than this (together.ai offers 0.08 USD per million tokens for `meta-llama/Llama-3.2-3B-Instruct-Turbo`, for instance).

In [7]:
make_url("/settings/models")

[1mFOLLOW THIS URL TO OPEN THE UI: http://rpyqcvlvppro.labs.coursera.org/settings/models[0m


In [8]:
# Setting up the telemetry
phoenix_project_name = "chatbot"

# With phoenix, we just need to register to get the tracer provider with the appropriate endpoint. 
tracer_provider_phoenix = register(project_name=phoenix_project_name, endpoint="http://127.0.0.1:6006/v1/traces")

# Retrieve a tracer for manual instrumentation
tracer = tracer_provider_phoenix.get_tracer(__name__)

🔭 OpenTelemetry Tracing Details 🔭
|  Phoenix Project: chatbot
|  Span Processor: SimpleSpanProcessor
|  Collector Endpoint: http://127.0.0.1:6006/v1/traces
|  Transport: HTTP + protobuf
|  Transport Headers: {}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.
|  
|  
|  `register` has set this TracerProvider as the global OpenTelemetry default.
|  To disable this behavior, call `register` with `set_global_tracer_provider=False`.



## A Quick Recap on the Database Structure

- Product database: Contains the products and their information.
- FAQ database: Contains the FAQ data.

### Products Database

In [9]:
# Loading products data
products_data = joblib.load('dataset/clothes_json.joblib')

In [10]:
# Let's get one example
products_data[0]

{'gender': 'Men',
 'masterCategory': 'Apparel',
 'subCategory': 'Topwear',
 'articleType': 'Shirts',
 'baseColour': 'Navy Blue',
 'season': 'Fall',
 'year': 2011,
 'usage': 'Casual',
 'productDisplayName': 'Turtle Check Men Navy Blue Shirt',
 'price': 67.0,
 'product_id': 15970}

### FAQ Database

Now, let's load the FAQ database and explore it.

In [11]:
faq = joblib.load("dataset/faq.joblib")

In [12]:
faq[:2]

[{'question': 'What are your store hours?',
  'answer': 'Our online store is open 24/7. Customer service is available from 9:00 AM to 6:00 PM, Monday through Friday.',
  'type': 'general information'},
 {'question': 'Where is Fashion Forward Hub located?',
  'answer': 'Fashion Forward Hub is primarily an online store. Our corporate office is located at 123 Fashion Lane, Trend City, Style State.',
  'type': 'general information'}]

The FAQs are organized in a list, where each entry is a dictionary containing the following keys: `question`, `answer`, and `type`.

## Recap on LLM calls and new output

In [13]:
generate_with_single_input?

[31mSignature:[39m
generate_with_single_input(
    prompt: str,
    role: str = [33m'user'[39m,
    top_p: float = [38;5;28;01mNone[39;00m,
    temperature: float = [38;5;28;01mNone[39;00m,
    max_tokens: int = [32m500[39m,
    model: str = [33m'meta-llama/Llama-3.2-3B-Instruct-Turbo'[39m,
    together_api_key=[38;5;28;01mNone[39;00m,
    **kwargs,
)
[31mDocstring:[39m <no docstring>
[31mFile:[39m      ~/work/utils.py
[31mType:[39m      function

In [14]:
# The output is a dictionary containing the role and content from the LLM call, as well as the token usage.:
result = generate_with_single_input("What are the primary colors?")
print(json.dumps(result, indent = 2))

{
  "id": "oE5e3Jc-4msxKE-98a453bcafd77ac1",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "The primary colors are:\n\n1. Red\n2. Blue\n3. Yellow\n\nThese colors cannot be created by mixing other colors together, and they are the base colors used to create all other colors.",
        "refusal": null,
        "role": "assistant",
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": []
      },
      "seed": 16019830532306369000
    }
  ],
  "created": 1759744987,
  "model": "meta-llama/Llama-3.2-3B-Instruct-Turbo",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 42,
    "prompt_tokens": 41,
    "total_tokens": 83,
    "completion_tokens_details": null,
    "prompt_tokens_details": null,
    "cached_tokens": 0
  },
  "prompt": []
}


In [15]:
print(result['choices'][0]['message']['content'])

The primary colors are:

1. Red
2. Blue
3. Yellow

These colors cannot be created by mixing other colors together, and they are the base colors used to create all other colors.


In [16]:
print(result['usage']['total_tokens'])

83


## Function to generate the parameters dictionary

In [17]:
def generate_params_dict(
    prompt: str,
    temperature: float = 1.0,
    role: str = 'user',
    top_p: float = 1.0,
    max_tokens: int = 500,
    model: str = "meta-llama/Llama-3.2-3B-Instruct-Turbo"
) -> dict:
    """
    Generates a dictionary of parameters for calling a Language Learning Model (LLM),
    allowing for the customization of several key options that can affect the output from the model. 

    Args:
        prompt (str): The input text that will be provided to the model to guide text generation.
        temperature (float): A value between 0 and 1 that controls the randomness of the model's output; 
            lower values result in more repetitive and deterministic results, while higher values enhance randomness.
        role (str): The role designation to be used in context, typically identifying the initiator of the interaction.
        top_p (float): A value between 0 and 1 that manages diversity through the technique of nucleus sampling; 
            this parameter limits the set of considered words to the smallest possible while maintaining 'top_p' cumulative probability.
        max_tokens (int): The maximum number of tokens that the model is allowed to generate in response, where a token can 
            be as short as one character or as long as one word.
        model (str): The specific model identifier to be utilized for processing the request. This typically specifies both 
            the version and configuration of the LLM to be employed.

    Returns:
        dict: A dictionary containing all specified parameters which can then be used to configure and execute a call to the LLM.
    """
    # Create the dictionary with the necessary parameters
    kwargs = {
        "prompt": prompt,
        "role": role,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "model": model
    }
    return kwargs

In [18]:
kwargs = generate_params_dict("Solve 3x^2 + 5 = 0")
print(kwargs)

{'prompt': 'Solve 3x^2 + 5 = 0', 'role': 'user', 'temperature': 1.0, 'top_p': 1.0, 'max_tokens': 500, 'model': 'meta-llama/Llama-3.2-3B-Instruct-Turbo'}


In [19]:
result = generate_with_single_input(**kwargs)
content = result['choices'][0]['message']['content']
total_tokens = result['usage']['total_tokens']
print(f"Content: {content}\n\nTotal Tokens: {total_tokens}")

Content: To solve the quadratic equation 3x^2 + 5 = 0, we can use the quadratic formula:

x = (-b ± √(b^2 - 4ac)) / 2a

In this case, a = 3, b = 0, and c = 5. 

Plugging these values into the formula, we get:

x = (-(0) ± √((0)^2 - 4(3)(5))) / 2(3)
x = (0 ± √(0 - 60)) / 6
x = (0 ± √(-60)) / 6

Unfortunately, there is no real solution to this equation, as the square root of a negative number is not a real number. However, we can express the solution using imaginary numbers:

x = (0 ± √(-60)) / 6
x = (0 ± i√60) / 6

Simplifying further, we get:

x = (0 ± i√(4*15)) / 6
x = (0 ± i2√15) / 6
x = (0 ± i√15) / 3

So, the two solutions to the equation are:

x = (0 + i√15) / 3
x = (0 - i√15) / 3

Total Tokens: 328


## Improving Task handling

### Refactoring the function to decide whether it is an FAQ or product-related question

In [20]:
def check_if_faq_or_product(query, simplified = False):
    """
    Determines whether a given instruction prompt is related to a frequently asked question (FAQ) or a product inquiry.

    Parameters:
    - query (str): The instruction or query that needs to be labeled as either FAQ or Product related.
    - simplified (bool): If True, uses a simplified prompt.

    Returns:
    - str: The label 'FAQ' if the prompt is deemed a frequently asked question, 'Product' if it is related to product information, or
      None if the label is inconclusive.
    """
 
    # If not simplified, uses a more complex prompt
    if not simplified:
        PROMPT = f"""
You are a text classification assistant. 
Your task is to label the following instruction as either 'FAQ' or 'Product'. 

Definitions:
- Product: Queries that ask about specific clothes, their features, prices, colors, availability, or details related to purchasing products.
- FAQ: Queries that ask about store policies, refunds, returns, shipping, sizing help, or other general information not tied to a specific product.

Examples:
1. "Is there a refund for incorrectly bought clothes?": FAQ
2. "Tell me about the cheapest T-shirts that you have.": Product
3. "Do you have blue T-shirts under 100 dollars?": Product
4. "I bought a T-shirt and I didn't like it. How can I get a refund?": FAQ
5. "What sizes are available in jackets?": Product
6. "How long does shipping usually take?": FAQ

Instructions:
- Return only ONE word: either 'FAQ' or 'Product'.
- Do not include explanations, punctuation, or extra words.

Instruction: {query}
Answer:
"""

    # If simlpified, uses a simplified prompt.
    else:
        PROMPT = f"""
Classify the query as FAQ or Product for a clothing store.
Product = asks about product details, availability, or outfit suggestions.
FAQ = asks about store policies, refunds, or general info.
Examples:
- "What colors do your hoodies come in?": Product
- "How can I exchange a shirt?": FAQ
- "Suggest an outfit for a beach picnic": Product
Query: {query}
Return only: FAQ or Product.
"""
        
    with tracer.start_as_current_span("routing_faq_or_product", openinference_span_kind = 'tool') as span:
        span.set_input(str({"query":query, "simplified": simplified}))
        
        # Get the kwargs dictinary to call the llm, with PROMPT as prompt, low temperature (0 or near 0) and max_tokens = 10
        kwargs = generate_params_dict(PROMPT, temperature = 0, max_tokens = 10)

        # Call generate_with_single_input with **kwargs
        with tracer.start_as_current_span("router_call", openinference_span_kind = 'llm') as router_span:
            router_span.set_input(kwargs)
            try:
                response = generate_with_single_input(**kwargs) 
            except Exception as error:
                router_span.record_exception(error)
                router_span.set_status(Status(StatusCode.ERROR))
            else:
                # OpenInference Semantic Conventions for computing Costs
                router_span.set_attribute("llm.token_count.prompt", response['usage']['prompt_tokens'])
                router_span.set_attribute("llm.token_count.completion", response['usage']['completion_tokens'])
                router_span.set_attribute("llm.token_count.total", response['usage']['total_tokens'])
                router_span.set_attribute("llm.model_name", response['model'])
                router_span.set_attribute("llm.provider", 'together.ai')
                router_span.set_output(response)
                router_span.set_status(Status(StatusCode.OK))
        
    
        # Get the Label by accessing the content key of the response dictionary
        label = response['choices'][0]['message']['content']
        total_tokens = response['usage']['total_tokens']
        span.set_output(str({"label": label, 'total_tokens':total_tokens}))
        span.set_status(Status(StatusCode.OK))

        # Improvement to prevent cases where LLM outputs more than one word
        if 'faq' in label.lower():
            label = 'FAQ'
        elif 'product' in label.lower():
            label = 'Product'
        else:
            label = 'undefined'
    
        return label, total_tokens

Let's test both versions:

In [21]:
queries = [
    'What is your return policy?', 
    'Give me three examples of blue T-shirts you have available.', 
    'How can I contact the user support?', 
    'Do you have blue Dresses?',
    'Create a look suitable for a wedding party happening during dawn.'
]

labels = ['FAQ', 'Product', 'FAQ', 'Product', 'Product']

for query, correct_label in zip(queries, labels):
    # Call check_if_faq_or_product and store the results
    response_std, tokens_std = check_if_faq_or_product(query, simplified=False)
    response_simp, tokens_simp = check_if_faq_or_product(query, simplified=True)
    
    # Print results
    process_and_print_query(query, correct_label, response_std, tokens_std, response_simp, tokens_simp)

Query: What is your return policy?
  Standard    → Label: [32mFAQ[0m | Tokens: [31m261[0m
  Simplified  → Label: [32mFAQ[0m | Tokens: [32m132[0m

Query: Give me three examples of blue T-shirts you have available.
  Standard    → Label: [32mProduct[0m | Tokens: [31m267[0m
  Simplified  → Label: [32mProduct[0m | Tokens: [32m138[0m

Query: How can I contact the user support?
  Standard    → Label: [32mFAQ[0m | Tokens: [31m263[0m
  Simplified  → Label: [32mFAQ[0m | Tokens: [32m134[0m

Query: Do you have blue Dresses?
  Standard    → Label: [32mProduct[0m | Tokens: [31m261[0m
  Simplified  → Label: [32mProduct[0m | Tokens: [32m132[0m

Query: Create a look suitable for a wedding party happening during dawn.
  Standard    → Label: [32mProduct[0m | Tokens: [31m267[0m
  Simplified  → Label: [32mProduct[0m | Tokens: [32m138[0m



### Answering a FAQ question

In [22]:
@tracer.tool
def generate_faq_layout(faq_dict):
    """
    Generates a formatted string layout for a list of FAQs.

    Returns:
    - str: A string representing the formatted layout of FAQs, with each entry on a separate line.
    """

    t = ""
    
    # Iterate over every FAQ question in the FAQ list
    for f in faq_dict:
        t += f"Question: {f['question']} Answer: {f['answer']} Type: {f['type']}\n" 

    return t

In [23]:
faq_layout = generate_faq_layout(faq)
print(faq_layout[:1000])

Question: What are your store hours? Answer: Our online store is open 24/7. Customer service is available from 9:00 AM to 6:00 PM, Monday through Friday. Type: general information
Question: Where is Fashion Forward Hub located? Answer: Fashion Forward Hub is primarily an online store. Our corporate office is located at 123 Fashion Lane, Trend City, Style State. Type: general information
Question: Do you have a physical store location? Answer: At this time, we operate exclusively online. This allows us to offer a broader selection and lower prices directly to you. Type: general information
Question: How can I create an account with Fashion Forward Hub? Answer: Click on 'Sign Up' in the top right corner of our website and follow the instructions to set up your account. Type: general information
Question: How do I subscribe to your newsletter? Answer: To receive the latest updates and promotions, sign up for our newsletter at the bottom of our homepage. Type: general information
Question:

In [24]:
print(generate_faq_layout(faq[1:2]))

Question: Where is Fashion Forward Hub located? Answer: Fashion Forward Hub is primarily an online store. Our corporate office is located at 123 Fashion Lane, Trend City, Style State. Type: general information



### Querying on FAQ

Previously, the entire FAQ was added in the query. This approach is useful to provide the entire information to the LLM, but it significantly increases token usage and execution time. Now that we are refining our chatbot, it's time to use a more efficient collection to handle it! Let's load the collection.

In [25]:
faq_collection = client.collections.get("Faq")

This code:

    Retrieves the FAQ collection from Weaviate.
    Loops over the FAQ dataset.
    Generates a deterministic UUID for each FAQ entry.
    Adds entries in batches of 20 and 5 concurrent requests.
    Uses a progress bar (tqdm) to track insertion progress.

Effect: Efficiently uploads our FAQ data into Weaviate so it can be queried later by vector similarity in our RAG pipeline.

In [26]:
from tqdm import tqdm
from weaviate.util import generate_uuid5

# Set up a batch process with specified fixed size and concurrency
with faq_collection.batch.fixed_size(batch_size=20, concurrent_requests=5) as batch:
    # Iterate over a subset of the dataset
    for document in tqdm(faq):
        # Generate a UUID based on the chunk text for unique identification
        uuid = generate_uuid5(document['question'])

        # Add the chunk object to the batch with properties and UUID
        batch.add_object(
            properties=document,
            uuid=uuid,
        )

100%|██████████| 25/25 [00:00<00:00, 26797.24it/s]


In [27]:
res = faq_collection.query.near_text("What is the return policy?", limit = 5)

In [28]:
for obj in res.objects:
    print_properties(obj)
    print('-'*50)

{
  "answer": "We accept returns within 30 days of delivery. Conditions apply for specific categories like accessories.",
  "question": "What is your return policy timeframe?",
  "type": "returns and exchanges"
}
--------------------------------------------------
{
  "answer": "Sale items are final sale and cannot be returned or exchanged, unless stated otherwise.",
  "question": "Can I return a sale item?",
  "type": "returns and exchanges"
}
--------------------------------------------------
{
  "answer": "Return processing typically takes 5-7 business days from when the item is received at our warehouse.",
  "question": "How long does it take to process a return?",
  "type": "returns and exchanges"
}
--------------------------------------------------
{
  "answer": "We provide a prepaid return label for domestic returns. For international returns, shipping is at the customer's cost.",
  "question": "Are return shipping costs covered?",
  "type": "returns and exchanges"
}
------------

Below is the function used to answer a FAQ question.
**It will only run if the question is already labeled as a FAQ.**
we’ve seen this question in a previous version, but now there’s one change:

1. A new parameter called `simplified` was added.
   This controls whether the function uses the full `faq` list or a smaller selection from it.
   If `simplified` is `True`, we should run a semantic search on the FAQ collection and use only the top 5 results.

In [29]:
def query_on_faq(query, simplified = False, **kwargs):
    """
    Constructs a prompt to query an FAQ system and generates a response.

    This function integrates an FAQ layout into the prompt to help generate a suitable answer to the given query
    using a language model. It supports additional keyword arguments to customize the prompt generation process.

    Parameters:
    - query (str): The query about which the function seeks to provide an answer from the FAQ.
    - simplified (bool): If True, uses semantic search to extract a relevant subset of FAQ questions
    - **kwargs: Optional keyword arguments for extra configuration of prompt parameters.

    Returns:
    - str: The response generated from the language model based on the input query and FAQ layout.

    """

    
    # If not simplified, generate the faq layout with the entire FAQ questions
    if not simplified:
        # Set the tracer as a chain type, since in non-simplified version, the full FAQ is used
        with tracer.start_as_current_span("query_on_faq", openinference_span_kind="tool") as span:
            
            span.set_input({"query": query, "simplified": simplified})
            faq_layout = generate_faq_layout(faq)
            
            # Generate the prompt
            PROMPT = f"""
You are an assistant that answers customer questions using only the provided FAQ content. 

Instructions:
- Use only the information from the FAQ to answer the question.
- If multiple FAQ entries are relevant, combine them into a single helpful answer.
- Do not mention the FAQ or that the answer is coming from an FAQ.
- Be clear and concise, but cover all relevant details.

<FAQ>
{faq_layout}
</FAQ>

Question: {query}
Answer:
""" 
            span.set_attribute("prompt", PROMPT)

            # Generate the parameters dict with PROMPT and **kwargs 
            kwargs = generate_params_dict(PROMPT, **kwargs) 

            span.set_attribute("output", str(kwargs))
            span.set_status(Status(StatusCode.OK))
    
            return kwargs
    
    else:
        with tracer.start_as_current_span("query_on_faq", openinference_span_kind="tool") as span:
            span.set_input({"query": query, "simplified": simplified})
            with tracer.start_as_current_span("retrieve_faq_questions", openinference_span_kind="retriever") as retrieve_span:
                
                # Get the 5 most relevant FAQ objects, in this case limit = 5
                results = faq_collection.query.near_text(query, limit=5)

                # Set the retrieved documents as attributes on the span
                for i, document in enumerate(results.objects): 
                    retrieve_span.set_attribute(f"retrieval.documents.{i}.document.id", str(document.uuid)) 
                    retrieve_span.set_attribute(f"retrieval.documents.{i}.document.metadata", str(document.metadata)) 
                    retrieve_span.set_attribute( 
                        f"retrieval.documents.{i}.document.content", str(document.properties) 
                    )  
                # Transform the results in a list of dictionary
                results = [x.properties for x in results.objects] 
                # Generate the faq layout with the new list of FAQ questions `results`
                faq_layout = generate_faq_layout(results) 

            # Different prompt to deal with this new scenario. 
            PROMPT = (
    f"Answer the following query for a clothing store using the relevant FAQ provided. "
    f"The FAQs are ordered by relevance, most relevant first. Use one or more FAQ items as needed. "
    f"Answer only the question; do not mention the FAQ.\n"
    f"<FAQ>\n"
    f"{faq_layout}\n"
    f"</FAQ>\n"
    f"Query: {query}"
)

            span.set_attribute("prompt", PROMPT)
        
            # Generate the parameters dict with PROMPT and **kwargs 
            kwargs = generate_params_dict(PROMPT, **kwargs) 
        
            span.set_attribute("output", str(kwargs))
            span.set_status(Status(StatusCode.OK))
    
            return kwargs

In [30]:
# Get the dictionary of arguments
kwargs = query_on_faq("I received the dress I ordered but I don't like it. How can I return it?")

In [31]:
# The number of split tokens in this prompt is:
print(len(kwargs['prompt'].split()))

808


In [32]:
# Run the inference
result = generate_with_single_input(**kwargs)

Let's check the content without the simplified version:

In [33]:
print(result['choices'][0]['message']['content'])

To initiate a return, please visit our Returns Center and select the dress you wish to exchange. Once in the Returns Center, follow the prompted steps and print a prepaid return label for your domestic return. For international returns, please arrange for shipping at your own cost.

Please note that the return window is within 30 days of delivery. Refunds are issued within 5-7 business days after receiving the returned item at our warehouse. Sale items are final and cannot be returned or exchanged, unless stated otherwise.


In [34]:
# Get the total tokens
print(result['usage']['total_tokens'])

1173


Now let's check the simplified version.

In [35]:
# Get the dictionary of arguments
kwargs = query_on_faq("I received the dress I ordered but I don't like it. How can I return it?", simplified = True)

In [36]:
# The number of split tokens in this prompt is:
print(len(kwargs['prompt'].split()))

201


In [37]:
# Run the inference
result = generate_with_single_input(**kwargs)

In [38]:
print(result['choices'][0]['message']['content'])

You can initiate a return through our Returns Center, selecting the dress you wish to return and the desired replacement.


In [39]:
# Get the total tokens
print(result['usage']['total_tokens'])

314


    Note that the answer is still correct and the final token count is way smaller!

## Improving the Decision Between Creative or Technical Product Queries

The goal is the same: reduce token usage while keeping good accuracy.

1. It now returns the total number of tokens used during processing.
2. It includes a new argument called `simplified`.

Must meet both of the following conditions:

* Accuracy of at least **80%** on the test set (we can get **at most one** question wrong).
* Use **fewer than 170 tokens** for **every** query.

In [40]:
def decide_task_nature(query, simplified = True):
    """
    Determines the nature of a query, labeling it as either creative or technical.

    This function constructs a prompt for a language model to decide if a given query requires a creative response,
    such as making suggestions or composing ideas, or a technical response, like providing product details or prices.

    Parameters:
    - query (str): The query to be evaluated for its nature.
    - simplified (bool): If True, uses a simplified prompt.

    Returns:
    - str: The label 'creative' if the query requires creative input, or 'technical' if it requires technical information.
    """
    
    if not simplified:
        PROMPT = f"""
Decide if the following query is a query that requires creativity (creating, composing, making new things) 
or technical (information about products, availability, descriptions, or prices). 

Definitions:
- Creative: requires imagination or styling advice (e.g., suggesting looks, composing outfits, matching accessories).
- Technical: requests factual product information (availability, price, catalog items, counts, colors, sizes).

Label it strictly as either "creative" or "technical".

Examples:
- "Give me suggestions on a nice look for a nightclub." → creative
- "What are the blue dresses you have available?" → technical
- "Give me three T-shirts for summer." → technical
- "Give me a look for attending a wedding party." → creative
- "Suggest a stylish outfit for visiting a museum." → creative
- "Do you have red jackets under 50 dollars?" → technical

Query to be analyzed: {query}

Only output one word: "creative" or "technical".
"""

    # If simplified, uses a simplified query
    else:
        PROMPT = f"""
Decide if the query requires creative input (ideas, styling, outfit suggestions) 
or technical info (product details, prices, availability). 
Examples:
- "Suggest an outfit for a beach party": creative
- "Which blue dresses are in stock?": technical
- "Give a summer T-shirt combination": creative
- "List the sizes available for red jackets": technical
Query: {query}
Use only one word: creative or technical.
"""
    
    with tracer.start_as_current_span("decide_task_nature", openinference_span_kind="tool") as span:
    # Generate the kwards dictionary by passing the PROMPT, low temperature and max_tokens = 1
        span.set_input({"query":query, "simplified": simplified})
        kwargs = generate_params_dict(PROMPT, temperature = 0, max_tokens = 1)

        with tracer.start_as_current_span("router_call", openinference_span_kind = 'llm') as router_span:
            router_span.set_input(kwargs)
            try:
                response = generate_with_single_input(**kwargs) 
            except Exception as error:
                router_span.record_exception(error)
                router_span.set_status(Status(StatusCode.ERROR))
            else:
                # OpenInference Semantic Conventions for computing Costs
                router_span.set_attribute("llm.token_count.prompt", response['usage']['prompt_tokens'])
                router_span.set_attribute("llm.token_count.completion", response['usage']['completion_tokens'])
                router_span.set_attribute("llm.token_count.total", response['usage']['total_tokens'])
                router_span.set_attribute("llm.model_name", response['model'])
                router_span.set_attribute("llm.provider", 'together.ai')
                router_span.set_output(response)
                router_span.set_status(Status(StatusCode.OK))

        # Get the Label by accessing the content key of the response dictionary
        label = response['choices'][0]['message']['content']
        total_tokens = response['usage']['total_tokens']
        span.set_output(str({"label": label, 'total_tokens':total_tokens}))
        span.set_status(Status(StatusCode.OK))    
    
        return label, total_tokens

In [41]:
queries = ["Give me two sneakers with vibrant colors.",
           "What are the most expensive clothes you have in your catalogue?",
           "I have a green Dress and I like a suggestion on an accessory to match with it.",
           "Give me three trousers with vibrant colors you have in your catalogue.",
           "Create a look for a woman walking in a park on a sunny day. It must be fresh due to hot weather."
           ]

labels = ['technical', 'technical', 'creative', 'technical', 'creative']

In [42]:
for query, correct_label in zip(queries, labels):
    response, total_tokens = decide_task_nature(query, simplified = True)
    label = response
    if label == correct_label:
        label = "\033[32m" + label + "\033[0m" 
    else:
        label = "\033[31m" + label + "\033[0m"
    if total_tokens > 170:
        total_tokens = "\033[31m"  + str(total_tokens) + "\033[0m"
    else:
        total_tokens = "\033[32m"  + str(total_tokens) + "\033[0m"
    print(f"Query: {query} Label Predicted: {label}. Correct Label: {correct_label} Total Tokens: {total_tokens}")
    print('='*70)

Query: Give me two sneakers with vibrant colors. Label Predicted: [31mcreative[0m. Correct Label: technical Total Tokens: [32m134[0m
Query: What are the most expensive clothes you have in your catalogue? Label Predicted: [32mtechnical[0m. Correct Label: technical Total Tokens: [32m138[0m
Query: I have a green Dress and I like a suggestion on an accessory to match with it. Label Predicted: [32mcreative[0m. Correct Label: creative Total Tokens: [32m144[0m
Query: Give me three trousers with vibrant colors you have in your catalogue. Label Predicted: [32mtechnical[0m. Correct Label: technical Total Tokens: [32m139[0m
Query: Create a look for a woman walking in a park on a sunny day. It must be fresh due to hot weather. Label Predicted: [32mcreative[0m. Correct Label: creative Total Tokens: [32m150[0m


In [43]:
queries = ["Give me two sneakers with vibrant colors.",
           "What are the most expensive clothes you have in your catalogue?",
           "I have a green Dress and I like a suggestion on an accessory to match with it.",
           "Give me three trousers with vibrant colors you have in your catalogue.",
           "Create a look for a woman walking in a park on a sunny day. It must be fresh due to hot weather."
           ]

labels = ['technical', 'technical', 'creative', 'technical', 'creative']

for query, correct_label in zip(queries, labels):
    response, total_tokens = decide_task_nature(query, simplified = False)
    label = response
    if label == correct_label:
        label = "\033[32m" + label + "\033[0m" 
    else:
        label = "\033[31m" + label + "\033[0m"
    if total_tokens > 170:
        total_tokens = "\033[31m"  + str(total_tokens) + "\033[0m"
    else:
        total_tokens = "\033[32m"  + str(total_tokens) + "\033[0m"
    print(f"Query: {query} Label Predicted: {label}. Correct Label: {correct_label} Total Tokens: {total_tokens}")
    print('='*70)

Query: Give me two sneakers with vibrant colors. Label Predicted: [32mtechnical[0m. Correct Label: technical Total Tokens: [31m244[0m
Query: What are the most expensive clothes you have in your catalogue? Label Predicted: [32mtechnical[0m. Correct Label: technical Total Tokens: [31m248[0m
Query: I have a green Dress and I like a suggestion on an accessory to match with it. Label Predicted: [32mcreative[0m. Correct Label: creative Total Tokens: [31m254[0m
Query: Give me three trousers with vibrant colors you have in your catalogue. Label Predicted: [32mtechnical[0m. Correct Label: technical Total Tokens: [31m249[0m
Query: Create a look for a woman walking in a park on a sunny day. It must be fresh due to hot weather. Label Predicted: [32mcreative[0m. Correct Label: creative Total Tokens: [31m260[0m


## Retrieving the parameters for a given task

In [44]:
@tracer.tool
def get_params_for_task(task):
    """
    Retrieves specific language model parameters based on the task nature.

    This function provides parameter sets tailored for creative or technical tasks to optimize
    language model behavior. For creative tasks, higher randomness is encouraged, while technical
    tasks are handled with more focus and precision. A default parameter set is provided for unexpected cases.

    Parameters:
    - task (str): The nature of the task ('creative' or 'technical').

    Returns:
    - dict: A dictionary containing 'top_p' and 'temperature' settings for the specified task.
    """
    # Create the parameters dict for technical and creative tasks
    PARAMETERS_DICT = {"creative": {'top_p': 0.9, 'temperature': 1},
                       "technical": {'top_p': 0.7, 'temperature': 0.3}} 
    
    # If task is technical, return the value for the key technical in PARAMETERS_DICT
    if task == 'technical':
        param_dict = PARAMETERS_DICT['technical'] 

    # If task is creative, return the value for the key creative in PARAMETERS_DICT
    if task == 'creative':
        param_dict = PARAMETERS_DICT['creative'] 

    # If task is a different value, fallback to another set of parameters
    else: # Fallback to a standard value
        param_dict = {'top_p': 0.5, 'temperature': 1} 

    return param_dict

## Retrieving Items Based on Metadata from a Query

When a query is identified as a product query, we need to find and return relevant products from the vector database. This process works in three main steps:

1. **Generate a metadata JSON** — Use the LLM to guess likely values for some product categories based on the query.
2. **Run a semantic search** — Use those values as filters when querying the database.
3. **Return the results** — Provide the most relevant products found.

The metadata should include values for the following features:

* Gender
* Master Category
* Article Type
* Base Color
* Season
* Usage

These categories offer a good trade-off between being specific enough to improve relevance and general enough to avoid missing results.

In [45]:
# Let's remember the data structure of a product
products_data[0]

{'gender': 'Men',
 'masterCategory': 'Apparel',
 'subCategory': 'Topwear',
 'articleType': 'Shirts',
 'baseColour': 'Navy Blue',
 'season': 'Fall',
 'year': 2011,
 'usage': 'Casual',
 'productDisplayName': 'Turtle Check Men Navy Blue Shirt',
 'price': 67.0,
 'product_id': 15970}

This is a dictionary with every possible value for the categories the LLM can pick from to generate a JSON.

In [46]:
values = {}
for d in products_data:
    for key, val in d.items():
        if key in ('product_id', 'price', 'productDisplayName', 'subCategory', 'year'):
            continue
        if key not in values.keys():
            values[key] = set()
        values[key].add(val)

In [47]:
values['season']

{'All seasons', 'Fall', 'Spring', 'Summer', 'Winter'}

### Generate metadata

This function generates a metadata JSON with possible values for each clothing category. The possible values are passed through the dictionary "values". 

    Note that the prompt is huge. Let's investigate the total tokens for a query.

In [48]:
def generate_metadata_from_query(query):
    """
    Generates metadata in JSON format based on a given query to filter clothing items.

    This function constructs a prompt for a language model to create a JSON object that will
    guide the filtering of a vector database query for clothing items. It takes possible values from
    a predefined set and ensures only relevant metadata is included in the output JSON.

    Parameters:
    - query (str): The query describing specific clothing-related needs.

    Returns:
    - str: A JSON string representing metadata with keys like gender, masterCategory, articleType,
      baseColour, price, usage, and season. Each value in the JSON is within a list, with prices specified
      as a dict containing "min" and "max" values. Unrestricted keys should use ["Any"] and unspecified
      prices should default to {"min": 0, "max": "inf"}.
    """
    
    PROMPT = f"""
    A query will be provided. Based on this query, a vector database will be searched to find relevant clothing items.
    Generate a JSON object containing useful metadata to filter products for this query.

    The possible values for each feature are given in the following JSON:
    {values}

    Provide a JSON with the features that best fit the query. 
    Rules:
    - Always include these keys: gender, masterCategory, articleType, baseColour, price, usage, season
    - Each value must be inside a list (even if there is only one)
    - Only use values from the JSON provided above
    - If a price range is mentioned, add it under "price": {{"min": x, "max": y}}
    - If no price range is given, set price to: {{"min": 0, "max": "inf"}}
    - Return only the JSON, nothing else

    Example of expected JSON:
    {{
      "gender": ["Women"],
      "masterCategory": ["Apparel"],
      "articleType": ["Dresses"],
      "baseColour": ["Blue"],
      "price": {{"min": 0, "max": "inf"}},
      "usage": ["Formal"],
      "season": ["All seasons"]
    }}

    Query: {query}
    """
    
    with tracer.start_as_current_span("generate_metadata_from_query", openinference_span_kind="tool") as span:
        span.set_input(query)
        with tracer.start_as_current_span("llm_call", openinference_span_kind="llm") as metadata_span:
            # Generate the response with the generate_with_single_input, PROMPT, temperature = 0 (low randomness) and max_tokens = 1500.
            kwargs = {"prompt": PROMPT, 'temperature': 0, "max_tokens": 1500} 
            metadata_span.set_input(kwargs)
            try:
                response = generate_with_single_input(**kwargs) 
            except Exception as error:
                metadata_span.record_exception(error)
                metadata_span.set_status(Status(StatusCode.ERROR))
            else:
                # OpenInference Semantic Conventions for computing Costs
                metadata_span.set_attribute("llm.token_count.prompt", response['usage']['prompt_tokens'])
                metadata_span.set_attribute("llm.token_count.completion", response['usage']['completion_tokens'])
                metadata_span.set_attribute("llm.token_count.total", response['usage']['total_tokens'])
                metadata_span.set_attribute("llm.model_name", response['model'])
                metadata_span.set_attribute("llm.provider", 'together.ai')
                metadata_span.set_output(response)
                metadata_span.set_status(Status(StatusCode.OK))

        # Get the Label by accessing the content key of the response dictionary
        content = response['choices'][0]['message']['content']
        total_tokens = response['usage']['total_tokens']
        span.set_output({"content": content, 'total_tokens':total_tokens})
        span.set_status(Status(StatusCode.OK))   

    
    return content, total_tokens

In [49]:
content, total_tokens = generate_metadata_from_query("Create a look for a man that suits a sunny day in the park. I don't want to spend more than 300 dollars on each piece.")

In [50]:
total_tokens

1445

In [51]:
print(content)

{
  "gender": ["Men"],
  "masterCategory": ["Apparel"],
  "articleType": ["Tshirts", "Shorts", "Sunglasses", "Shoes"],
  "baseColour": ["Yellow", "Navy Blue", "Green", "Red"],
  "price": {"min": 0, "max": 300},
  "usage": ["Casual", "Sports"],
  "season": ["Summer"]
}


So far, each product query has involved processing around **1,500 tokens**—mainly because we generate a set of filters across multiple categories before searching.

we will now **simplify** this process.

Instead of creating detailed filters for each category (like gender, color, etc.), the system will just use **semantic search directly on the user query**. This means:

* No more generating metadata.
* Just take the user’s question and run a semantic search on the product collection.

This approach is faster, uses fewer tokens, and is still effective for most queries.

### Loading the Weaviate Product Collection

Now it is time to work with the Weaviate collection.

In [52]:
products_collection = client.collections.get('products')

In [53]:
len(products_collection)

44423

### Filtering by Metadata

In [54]:
@tracer.tool
def parse_json_output(llm_output):
    try:
        # Since the input might be improperly formatted, ensure any single quotes are removed
        llm_output = llm_output.replace("\n", '').replace("'",'').replace("}}", "}").replace("{{", "{")  # Remove any erroneous structures
        
        # Attempt to parse JSON directly provided it is a properly-structured JSON string
        parsed_json = json.loads(llm_output)
        return parsed_json
    except json.JSONDecodeError as e:
        print(f"JSON parsing failed: {e}")
        return None

In [55]:
@tracer.tool
def get_filter_by_metadata(json_output: dict):
    """
    Generate a list of Weaviate filters based on a provided metadata dictionary.

    Parameters:
    - json_output (dict) or None: Dictionary containing metadata keys and their values.

    Returns:
    - list[Filter] or None: A list of Weaviate filters, or None if input is None.
    """
    # If the input dictionary is None, return None immediately
    if json_output is None:
        return None

    # Define a tuple of valid keys that are allowed for filtering
    valid_keys = (
        'gender',
        'masterCategory',
        'articleType',
        'baseColour',
        'price',
        'usage',
        'season',
    )

    # Initialize an empty list to store the filters
    filters = []

    # Iterate over each key-value pair in the input dictionary
    for key, value in json_output.items():
        # Skip the key if it is not in the list of valid keys
        if key not in valid_keys:
            continue

        # Special handling for the 'price' key
        if key == 'price':
            # Ensure the value associated with 'price' is a dictionary
            if not isinstance(value, dict):
                continue

            # Extract the minimum and maximum prices from the dictionary
            min_price = value.get('min')
            max_price = value.get('max')

            # Skip if either min_price or max_price is not provided
            if min_price is None or max_price is None:
                continue

            # Skip if min_price is non-positive or max_price is infinity
            if min_price <= 0 or max_price == 'inf':
                continue

            # Add filters for price greater than min_price and less than max_price
            filters.append(Filter.by_property(key).greater_than(min_price))
            filters.append(Filter.by_property(key).less_than(max_price))
        else:
            # For other valid keys, add a filter that checks for any of the provided values
            filters.append(Filter.by_property(key).contains_any(value))

    return filters

In [56]:
@tracer.tool
def generate_filters_from_query(query):
    json_string, total_tokens = generate_metadata_from_query(query)
    json_output = parse_json_output(json_string)
    filters = get_filter_by_metadata(json_output)
    return filters, total_tokens

It’s a modified version of the one we used previously, with one key change:

* It now includes a boolean parameter called `simplified`.
* If `simplified` is `True`, the function **must skip metadata filtering** and perform a **simple semantic search** using the query.
* Choose an appropriate limit—5 may be too low. In the previous scenario, 20 items were returned, so you might want to stick with that.

Therefore, when `simplified = True`, we should only run a semantic search—**no metadata filters should be applied**.

In [57]:
def get_relevant_products_from_query(query, simplified = False):
    """
    Retrieve the most relevant products for a given query by applying semantic search and optional filters.

    This function generates metadata filters from the query and uses them to search for products 
    that best match the intended criteria. If `simplified` is True, it performs only a basic semantic 
    search with no filters. If the filtered search returns too few results, it progressively reduces 
    filtering constraints based on the predefined importance of each filter.

    Parameters:
    query (str): The query string used to search for relevant products.
    simplified (bool): If True, only a simple semantic search is performed without any metadata filters.

    Returns:
    list: A list of product objects that are most relevant to the query.
    total_tokens: The number of tokens used in the LLM call. Returns 0 if simplified search is used.
    """
    
    # If simplified, just do a semantic search with 20 objects and return it
    if simplified:
        with tracer.start_as_current_span("get_relevant_products_from_query", openinference_span_kind="retriever") as span:  
            span.set_input({'query':query, 'simplified':simplified})
            
            results = products_collection.query.near_text(query, limit=20)

            # Set the retrieved documents as attributes on the span
            for i, document in enumerate(results.objects): 
                span.set_attribute(f"retrieval.documents.{i}.document.id", str(document.uuid)) 
                span.set_attribute(f"retrieval.documents.{i}.document.metadata", str(document.metadata)) 
                span.set_attribute( 
                    f"retrieval.documents.{i}.document.content", str(document.properties) #@ KEEP
                )  
            
            span.set_output({"results": results.objects, "total_tokens": 0})
            span.set_status(Status(StatusCode.OK))  
    
            return results.objects, 0  # Total tokens in this case is 0 because there was no LLM call!
            
    # If not simplified, perform the previous workflow by generating the filters and then doing a semantic search with them
    with tracer.start_as_current_span("get_relevant_products_from_query", openinference_span_kind="retriever") as span:  
        span.set_input({'query':query, 'simplified':simplified})
        filters, total_tokens = generate_filters_from_query(query)  # Generate filters based on the query

    # Check if there are no applicable filters
        if filters is None or len(filters) == 0:
            span.set_attribute("retrieval.filters", '')
            results = products_collection.query.near_text(query, limit=20) 
            # Set the retrieved documents as attributes on the span
            for i, document in enumerate(results.objects): 
                span.set_attribute(f"retrieval.documents.{i}.document.id", str(document.uuid))
                span.set_attribute(f"retrieval.documents.{i}.document.metadata", str(document.metadata)) 
                span.set_attribute( 
                    f"retrieval.documents.{i}.document.content", str(document.properties) 
                )  
            span.set_output({"results": results.objects, "total_tokens": total_tokens})
            span.set_status(Status(StatusCode.OK))  
            return results.objects, total_tokens
            
    # Query with filters and limit to the top 20 relevant objects
        span.set_attribute("retrieval.filters", str(filters))
        results = products_collection.query.near_text(query, filters=Filter.all_of(filters), limit=20)
        span.set_attribute("retrieval.len", len(results.objects))
        # Set the retrieved documents as attributes on the span
        for i, document in enumerate(results.objects): 
            span.set_attribute(f"retrieval.documents.{i}.document.id", str(document.uuid))
            span.set_attribute(f"retrieval.documents.{i}.document.metadata", str(document.metadata)) 
            span.set_attribute( 
                f"retrieval.documents.{i}.document.content", str(document.properties) 
            )
    
        # If the result set contains fewer than 10 products, try reducing filters to broaden the search
        importance_order = [ 'baseColour', 'masterCategory', 'usage', 'masterCategory', 'season', 'articleType', 'gender']
        if len(results.objects) < 10:
            # Iterate through the importance order of filters
            for i in range(len(importance_order)):
                with tracer.start_as_current_span(f"refilter_{i}", openinference_span_kind="chain") as refilter_span: 
                    # Create a list of filters that excludes less important ones
                    filtered_filters = [x for x in filters if x.target in importance_order[i+1:]]
                    refilter_span.set_input(str(filtered_filters))
                    
                    results = products_collection.query.near_text(query, filters=Filter.all_of(filtered_filters), limit=20)
                    # Set the retrieved documents as attributes on the span
                    for j, document in enumerate(results.objects): 
                        refilter_span.set_attribute(f"retrieval.documents.{j}.document.id", str(document.uuid))
                        refilter_span.set_attribute(f"retrieval.documents.{j}.document.metadata", str(document.metadata)) 
                        refilter_span.set_attribute( 
                            f"retrieval.documents.{j}.document.content", str(document.properties) 
                        )
                    # If sufficient products have been found, return early
                    if len(results.objects) >= 5:
                        refilter_span.set_output(results.objects)
                        refilter_span.set_status(Status(StatusCode.OK))  
                        span.set_output(results.objects)
                        span.set_status(Status(StatusCode.OK)) 
                        return results.objects, total_tokens
        span.set_output(results.objects)
        span.set_status(Status(StatusCode.OK)) 
        return results.objects, total_tokens  # Return the final set of relevant products

In [58]:
query = "Give me three T-shirts to use in sunny days"

In [59]:
t, total_tokens = get_relevant_products_from_query(query)

In [60]:
total_tokens

1411

Around 1500 tokens for this query! Let's try with the simplified version

In [61]:
t, total_tokens = get_relevant_products_from_query(query, simplified = True)

In [62]:
total_tokens

0

Note that this query took 0 tokens, as it didn't use the LLM. It directly used the query to retrieve the objects that are in the vector database.

## Generating the retrieved items as context

Now, for the given retrieved items, let's generate a simple context.

In [63]:
@tracer.tool
def generate_items_context(results):
    """
    Compile detailed product information from a list of result objects into a formatted string.

    Parameters:
    results (list): A list of result objects, each having a `properties` attribute that is a dictionary 
                    containing product attributes such as 'product_id', 'productDisplayName', 
                    'masterCategory', 'usage', 'gender', 'articleType', 'subCategory', 
                    'baseColour', 'season', and 'year'.

    Returns:
    str: A multi-line string where each line contains the formatted details of a single product.
         Each product detail includes the product ID, name, category, usage, gender, type, color, 
         season, and year.
    """
    t = ""  # Initialize an empty string to accumulate product information

    for item in results:  # Iterate through each item in the results list
        item = item.properties  # Access the properties dictionary of the current item

        # Append formatted product details to the output string
        t += (
            f"Product ID: {item['product_id']}. "
            f"Product name: {item['productDisplayName']}. "
            f"Product Category: {item['masterCategory']}. "
            f"Product usage: {item['usage']}. "
            f"Product gender: {item['gender']}. "
            f"Product Type: {item['articleType']}. "
            f"Product Category: {item['subCategory']} "
            f"Product Color: {item['baseColour']}. "
            f"Product Season: {item['season']}. "
            f"Product Year: {item['year']}.\n"
        )

    return t  # Return the complete formatted string with product details

### Query on Products

The next function will answer a product query. 

In [64]:
@tracer.tool
def query_on_products(query, simplified = False):
    """
    Execute a product query process to generate a response based on the nature of the query.

    Parameters:
    query (str): The input query string that needs to be analyzed and answered using product data.
    task_nature_prompt_function (func): The prompt function to be used to decide the task nature (if creative of technical)
    simplified (bool): If True, does not use LLM to generate metadata for filtering

    Returns:
    dict: A dictionary of keyword arguments (`kwargs`) containing the prompt and additional settings 
          for creating a response, suitable for input to an LLM or other processing system.
    int: Number of tokens used in the process to create the kwargs dictionary

    Outputs:
    str: The content of the generated response from the LLM based on the provided query and product 
         information.
    """
    total_tokens = 0
    # Determine if the query is technical or creative in nature
    
    query_label, tokens = decide_task_nature(query, simplified = simplified)
    
    # Sum the tokens used to decide the task nature (creative or technical)
    total_tokens += tokens

    # Obtain necessary parameters based on the query type
    parameters_dict = get_params_for_task(query_label)
    
    # Retrieve products that are relevant to the query
    relevant_products, tokens = get_relevant_products_from_query(query, simplified = simplified)
    
    # Sum the tokens used to get relevant products 
    total_tokens += tokens
     
    # Create a context string from the relevant products
    context = generate_items_context(relevant_products)

    # Construct a prompt including product details and the query. Remember to add the context and the query in the prompt, also, ask the LLM to provide the product ID in the answer
    PROMPT = (
    f"You are a helpful shopping assistant. Answer the user’s query in a natural conversational style. "
    f"You are a very helpful assistant. You are given a list of clothing products. Answer the query below by selecting the most relevant items. "
    f"Always include the item ID in your response. "
    f"Only describe features that are directly relevant to the query—keep descriptions concise. "
    f"If the query does not specify a number of products, return at most five. "
    f"\n\nAVAILABLE PRODUCTS:\n{context}\n\nQUERY:\n{query}"
)
    
    # Generate kwargs (parameters dict) for parameterized input to the LLM with , Prompt, role = 'assistant' and **parameters_dict
    kwargs = generate_params_dict(PROMPT, role='assistant', **parameters_dict) 
    
    return kwargs, total_tokens

Let's check with both the previous setup and the enhanced setup


#### Previous setup with simplified = False

In [65]:
kwargs, total_tokens = query_on_products('Make a wonderful look for a man attending a wedding party happening during night.', simplified = False)

In [66]:
result = generate_with_single_input(**kwargs)
print(result['choices'][0]['message']['content'])

Based on your query, I would recommend the following products to create a wonderful look for a man attending a wedding party at night:

1. Product ID: 12347 - Fastrack Men Red Manhattan Regular Fit Solid Formal Shirt (to add a pop of color and create a stylish look)
2. Product ID: 19860 - U.S. Polo Assn. Men Solid Olive Jacket (to add a touch of sophistication and blend with the wedding party atmosphere)
3. Product ID: 19855 - U.S. Polo Assn. Men Solid Navy Blue Jackets (to add a touch of elegance and complement the red shirt)
4. Product ID: 17150 - U.S. Polo Assn. Men Solid Red Jacket (not recommended as it might be too casual and overpowering)
5. Product ID: 59106 - Just Natural Men Black Jacket (not recommended as it might not add enough style and elegance to the outfit)

These products will create a well-coordinated and stylish outfit for a man attending a wedding party at night.


Now let's sum the total tokens to generate the kwargs dictionary and the total tokens used in the final execution.

In [67]:
print(f"Total tokens used in the query is: {total_tokens + result['usage']['total_tokens']}")

Total tokens used in the query is: 2556


**New setup with <code>simplified = True</code>**

In [68]:
kwargs, total_tokens = query_on_products('Make a wonderful look for a man attending a wedding party happening during night.', simplified = True)

In [69]:
total_tokens

141

In [70]:
result = generate_with_single_input(**kwargs)
print(result['choices'][0]['message']['content'])

Based on your query, I recommend the following products to create a wonderful look for a man attending a wedding party during night:

1. Product ID: 8960 - Provogue Men Night Black Shoe
This shoe is perfect for a formal night event like a wedding party. Its black color and formal design will complement any outfit.

2. Product ID: 17375, Product ID: 17378, Product ID: 17370, Product ID: 17361, Product ID: 17374, Product ID: 17359, Product ID: 17373, Product ID: 17369, Product ID: 17377 - Arrow Men Formal Purple Tie+Cufflink+Pocket square - Combo Pack (any one)
Purple is a classic and elegant color for a wedding party, and the combo pack includes a tie, cufflink, and pocket square. This will add a touch of sophistication to the outfit.

3. Product ID: 40245, Product ID: 40216 - Provogue Men Pink Tie (any one)
If the wedding party has a more casual or unique theme, a pink tie could be a bold and eye-catching choice. Pair it with the black shoe and purple combo pack for a stylish and memor

In [71]:
print(f"Total tokens used in the query is: {total_tokens + result['usage']['total_tokens']}")

Total tokens used in the query is: 1906


And the total tokens used in one query was way lower than before!

## The final function! 

### The function to rule them all

Now let's consolidate the functions

The function will:

1. Check if the query is FAQ or Product
2. If FAQ, runs the FAQ related workflow
3. If Product, runs the Product related workflow
4. Add the information into a dataframe

It returns the kwargs dict with the appropriate arguments and the total tokens used to get to the kwargs dict.

In [72]:
@tracer.tool
def answer_query(query, model = "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo", simplified=False):
    """
    Processes a user's query to determine its type (FAQ or Product) and executes the appropriate workflow.
    
    Parameters:
    - query (str): The query string provided by the user.
    - model (str): The model that will answer the question. Defaults to meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo'
    - simplified (bool): If True, uses a simplified version of the method. Defaults to False.
    
    Returns:
    - dict: A dictionary containing keyword arguments for further processing.
      If the query is neither FAQ nor Product-related, returns a default response dictionary instructing
      the assistant to answer based on existing context.
    """
    # Initialize the total tokens used to zero
    total_tokens = 0
    
    # Determine if the query is FAQ or Product and get the token count for this step
    label, tokens = check_if_faq_or_product(query, simplified=simplified)
    
    # Sum the tokens
    total_tokens += tokens
    
    # If the query is neither FAQ nor Product, return a default response
    if label not in ['FAQ', 'Product']:
        return {
            "role": "assistant",
            "prompt": (f"User provided a question that does not fit FAQ or Product-related categories. "
                       f"Answer it based on the context you already have. Query provided by the user: {query}")
        }
    
    # Process the query based on its label
    if label == 'FAQ':
        # Handle FAQ-related queries
        kwargs = query_on_faq(query, simplified=simplified)
    elif label == 'Product':
        try:
            # Handle Product-related queries, with error handling in place
            kwargs, tokens = query_on_products(query, simplified=simplified)
            # Add the tokens to the total tokens
            total_tokens += tokens
        except Exception:
            # Return an error response if an exception occurs during querying
            return {
                "role": "assistant",
                "prompt": (f"User provided a question that broke the querying system. "
                           f"Instruct them to rephrase it. Answer it based on the context you already have. "
                           f"Query provided by the user: {query}")
            }, total_tokens
    # Set the model to answer the final query - usually a better one         
    kwargs['model'] = model
    # Return the kwargs and total_tokens for further processing
    return kwargs, total_tokens

In [73]:
kwargs, total_tokens = answer_query("Give me three examples of blue t-shirts available on your catalogue.", simplified = False)

In [74]:
result = generate_with_single_input(**kwargs)
print(result['choices'][0]['message']['content'])

Here are three blue t-shirts available on our catalogue:

1. Product ID: 1847. Product name: Inkfruit Mens Messy T-shirt. Product Color: Blue.
2. Product ID: 3103. Product name: Probase Men's Wtf Blue T-Shirt. Product Color: Navy Blue.
3. Product ID: 3754. Product name: Status Quo Men's Music Revolution Blue T-shirt. Product Color: Blue.

These t-shirts are all blue in color and suitable for casual wear.


In [75]:
# To get the total tokens for the call, we must sum the total_tokens to get the kwargs dictionary + total tokens from the LLM call
total_tokens +  result['usage']['total_tokens']

3462

In [76]:
kwargs, total_tokens = answer_query("Give me three examples of blue t-shirts available on your catalogue.", simplified = True)

In [77]:
result = generate_with_single_input(**kwargs)
print(result['choices'][0]['message']['content'])

Here are three blue t-shirts available in our catalogue:

1. Product ID: 1847. Product name: Inkfruit Mens Messy T-shirt. Product Color: Blue.
2. Product ID: 3995. Product name: Mr.Men Men's Thats Funny Blue T-shirt. Product Color: Blue.
3. Product ID: 3103. Product name: Probase Men's Wtf Blue T-Shirt. Product Color: Navy Blue.

These are just a few examples of the many blue t-shirts available in our catalogue.


In [78]:
total_tokens +  result['usage']['total_tokens']

1808

## The ChatBot

In [81]:
chat_widget_standard = ChatWidget(generator_function = lambda x: answer_query(x, simplified = False), tracer = tracer)

VBox(children=(HTML(value=''), HBox(), HBox(children=(Text(value='', layout=Layout(width='90%'), placeholder='…

In [80]:
make_url()

[1mFOLLOW THIS URL TO OPEN THE UI: http://rpyqcvlvppro.labs.coursera.org[0m
