Cell 1: Set Up the Environment

Load environment variables and choose the LLM model by date: the pinned snapshot gpt-3.5-turbo-0301 up to and including June 12, 2024, and gpt-3.5-turbo afterwards, so the notebook does not request a deprecated model.



In [None]:
# Import necessary modules
import os
from dotenv import load_dotenv, find_dotenv

# Load environment variables from a .env file
_ = load_dotenv(find_dotenv())  # Load .env file

# Suppress all Python warnings (including deprecation warnings) to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')

# Handle model deprecation based on current date
import datetime

# Get the current date
current_date = datetime.datetime.now().date()

# Define a target date after which a new model should be used
target_date = datetime.date(2024, 6, 12)

# Conditionally set the LLM model based on the current date
if current_date > target_date:
    llm_model = "gpt-3.5-turbo"
else:
    llm_model = "gpt-3.5-turbo-0301"
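
The date check above can also be wrapped in a small function so that the cutoff behaviour is easy to verify. This is a minimal sketch: the name select_model and the CUTOFF constant are hypothetical, while the date and model names are copied from the cell above.

```python
import datetime

# Cutoff date copied from the cell above
CUTOFF = datetime.date(2024, 6, 12)


def select_model(today: datetime.date, cutoff: datetime.date = CUTOFF) -> str:
    """Return the model name to use on the given date (hypothetical helper)."""
    # Strictly after the cutoff: use the unpinned alias
    if today > cutoff:
        return "gpt-3.5-turbo"
    # On or before the cutoff: use the pinned snapshot
    return "gpt-3.5-turbo-0301"


llm_model = select_model(datetime.date.today())
```

Passing the date as an argument, rather than reading the clock inside the function, lets both branches be checked without waiting for the cutoff to pass.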


Cell 2: Create the Q&A Application


Create a basic Q&A application using LangChain’s RetrievalQA that will allow querying a product catalog.

In [None]:
# Import necessary LangChain modules
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import DocArrayInMemorySearch

# Define the path to the CSV file
file = 'OutdoorClothingCatalog_1000.csv'

# Load the product data from CSV
loader = CSVLoader(file_path=file)
data = loader.load()

# Create a vector store index from the CSV loader
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

# Initialize the language model (LLM) for Q&A
llm = ChatOpenAI(temperature=0.0, model=llm_model)

# Create the RetrievalQA chain for querying
qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=index.vectorstore.as_retriever(), 
    verbose=True,
    chain_type_kwargs={"document_separator": "<<<<>>>>>"}
)
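
To make the "stuff" chain type and the document_separator argument more concrete, the sketch below shows the general idea in plain Python: the retrieved documents are joined with the separator and placed into a single prompt. It is an illustration only, not LangChain's implementation; the function names, the prompt wording, and the sample texts are all hypothetical.

```python
def stuff_documents(docs, separator="\n\n"):
    """Join the retrieved document texts into one context string."""
    return separator.join(docs)


def build_prompt(question, docs, separator="\n\n"):
    """Place the joined context and the question into one prompt (hypothetical template)."""
    context = stuff_documents(docs, separator)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Placeholder texts, not rows from the product catalog
retrieved = ["name: Example Jacket", "name: Example Pullover"]
print(build_prompt("Which item is a jacket?", retrieved, separator="<<<<>>>>>"))
```

Because every retrieved document goes into one prompt, this approach costs a single LLM call but is limited by the model's context window.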


Cell 3: Coming Up with Test Data Points

Inspect a few data points to understand the structure of the dataset and develop hard-coded examples for evaluation.

In [None]:
# Inspect specific data points from the dataset
# (print both; a notebook cell only echoes its last bare expression)
print(data[10])
print(data[11])

# Hard-coded examples for testing
examples = [
    {
        "query": "Does the Cozy Comfort Pullover Set have side pockets?",
        "answer": "Yes"
    },
    {
        "query": "What collection is the Ultra-Lofty 850 Stretch Down Hooded Jacket from?",
        "answer": "The DownTek collection"
    }
]


Cell 4: Generate LLM-Generated Examples

Use LangChain’s QAGenerateChain to automatically generate test examples from the dataset using the language model.

In [None]:
# Import the QAGenerateChain for generating examples
from langchain.evaluation.qa import QAGenerateChain

# Initialize the example generation chain using the LLM
example_gen_chain = QAGenerateChain.from_llm(ChatOpenAI(model=llm_model))

# Generate new examples from the dataset (using the first 5 entries as an example)
# Any warning emitted by this call can be safely ignored
new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
)

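# Inspect the first generated example next to its source document
# (print both; a notebook cell only echoes its last bare expression)
print(new_examples[0])
print(data[0])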

# Combine hard-coded and LLM-generated examples
examples += new_examples
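
The later cells read the "query" and "answer" keys of every example, so it is worth checking the shape of the generated items before combining them. The sketch below assumes that, in some LangChain releases, each parsed item may be nested under a key such as "qa_pairs"; that key name is an assumption to confirm by inspecting new_examples[0], and flatten_examples is a hypothetical helper.

```python
def flatten_examples(raw_examples, nested_key="qa_pairs"):
    """Return a list of flat {"query", "answer"} dicts.

    Items already in flat form are passed through; items nested under
    `nested_key` (an assumed key name) are unwrapped.
    """
    flat = []
    for item in raw_examples:
        pair = item.get(nested_key, item)
        flat.append({"query": pair["query"], "answer": pair["answer"]})
    return flat


# Both shapes produce the same flat structure
nested = [{"qa_pairs": {"query": "Q1?", "answer": "A1"}}]
already_flat = [{"query": "Q2?", "answer": "A2"}]
print(flatten_examples(nested) + flatten_examples(already_flat))
```

If the nested form is what the installed version returns, applying the helper to new_examples before the examples += new_examples line keeps the combined list uniform.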


Cell 5: Running Q&A on an Example

Test the Q&A application by running a query from the examples and retrieving the result.



In [None]:
# Run the Q&A model on the first example's query
qa.run(examples[0]["query"])


Cell 6: Manual Evaluation with Debug Mode

Manually evaluate and debug the system by rerunning a query with debug mode enabled.

In [None]:
# Enable debug mode for manual evaluation
import langchain
langchain.debug = True

# Run the Q&A model on the first example's query with debugging enabled
qa.run(examples[0]["query"])

# Turn off the debug mode
langchain.debug = False
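
If the query raises an exception, the flag above stays switched on for the rest of the session. One way to guard against that is a small context manager that always restores the previous value. This is a sketch, not part of LangChain: debug_enabled is a hypothetical helper, and a SimpleNamespace stands in for the langchain module so the block runs on its own.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def debug_enabled(module):
    """Set module.debug to True inside the block and restore it afterwards."""
    previous = getattr(module, "debug", False)
    module.debug = True
    try:
        yield module
    finally:
        # Runs even if the body raises, so the flag is never left on by accident
        module.debug = previous


# Stand-in for the langchain module (assumption: it exposes a `debug` attribute, as used above)
fake_langchain = SimpleNamespace(debug=False)
with debug_enabled(fake_langchain):
    inside = fake_langchain.debug
after = fake_langchain.debug
```

In the notebook the equivalent usage would be `with debug_enabled(langchain): qa.run(examples[0]["query"])`.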


Cell 7: LLM-Assisted Evaluation

Use LangChain’s QAEvalChain to evaluate the predictions generated by the Q&A system and compare them to the correct answers.

In [None]:
# Run predictions on all examples
predictions = qa.apply(examples)

# Import QAEvalChain for LLM-assisted evaluation
from langchain.evaluation.qa import QAEvalChain

# Initialize the evaluation chain using the LLM
llm = ChatOpenAI(temperature=0, model=llm_model)
eval_chain = QAEvalChain.from_llm(llm)

# Evaluate the predictions and grade the outputs
graded_outputs = eval_chain.evaluate(examples, predictions)

# Display the evaluation results
for i, (prediction, grade) in enumerate(zip(predictions, graded_outputs)):
    print(f"Example {i}:")
    print("Question: " + prediction['query'])
    print("Real Answer: " + prediction['answer'])
    print("Predicted Answer: " + prediction['result'])
    print("Predicted Grade: " + grade['text'])
    print()

# Display the graded output for the first example
graded_outputs[0]
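
The per-example grades can be condensed into a single accuracy figure. The sketch below assumes each graded output carries its verdict under the "text" key, as in the loop above, and that the verdict ends with the word CORRECT or INCORRECT; the exact grader wording is an assumption, and both function names are hypothetical.

```python
def is_correct(grade_text):
    """Return True when the verdict's last word is CORRECT."""
    # Compare the last word as a whole so "INCORRECT" does not count as a match;
    # this also handles forms such as "GRADE: CORRECT"
    words = grade_text.strip().upper().split()
    return bool(words) and words[-1] == "CORRECT"


def accuracy(graded_outputs, key="text"):
    """Fraction of graded outputs marked correct (0.0 for an empty list)."""
    if not graded_outputs:
        return 0.0
    hits = sum(is_correct(g[key]) for g in graded_outputs)
    return hits / len(graded_outputs)


# Placeholder grades for illustration, not real evaluation results
sample = [{"text": "CORRECT"}, {"text": "INCORRECT"}, {"text": "GRADE: CORRECT"}]
print(f"Accuracy: {accuracy(sample):.2f}")
```

A plain substring check would be wrong here because "INCORRECT" contains "CORRECT", which is why the helper compares the final word.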


Explanation:
Cells 1-2: Set up the environment, load the product catalog data, and create a basic Q&A system using LangChain’s RetrievalQA.

Cells 3-4: Build the test set from hard-coded and LLM-generated examples to cover a wider range of queries.

Cell 5: Test the Q&A system on a sample query.

Cell 6: Enable debug mode to inspect the chain's intermediate steps (inputs, prompts, and model responses) while evaluating a query manually.

Cell 7: Use LLM-assisted evaluation to automatically assess the accuracy of the Q&A system’s predictions.



Suggestions for further improvements:
a. Extend the test set with multi-step queries that involve several product attributes.
b. Implement confidence scoring for each prediction to understand how certain the model is about its answers.