<a href="https://colab.research.google.com/github/richlin/gpt-researcher/blob/main/Langchain_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
from dotenv import load_dotenv
import os

load_dotenv()

OPENAI_API_KEY=os.environ["OPENAI_API_KEY"]
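
`os.environ["OPENAI_API_KEY"]` raises a bare `KeyError` when the variable is missing from both `.env` and the shell. A minimal sketch of a friendlier lookup follows; the helper `require_env` and the `DEMO_API_KEY` placeholder are assumptions for illustration, not part of the notebook.

```python
import os

def require_env(name):
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or the environment")
    return value

# Made-up placeholder so the sketch runs without a real key.
os.environ.setdefault("DEMO_API_KEY", "sk-demo")
demo_key = require_env("DEMO_API_KEY")
print(len(demo_key) > 0)
```

In the notebook this would be called as `require_env("OPENAI_API_KEY")` after `load_dotenv()`.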

In [None]:
%pip install langchain
%pip install openai
%pip install tiktoken
%pip install faiss-cpu
%pip install python-dotenv

In [None]:
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain

# Vectorize data

In [23]:
loader = CSVLoader(file_path='paper.csv')
documents = loader.load()

In [26]:
documents[1]

Document(page_content='\ufeffstore: Walmart\nbrand: Pen+Gear\nname: Pen+Gear Copy Paper, 8.5" x 11", 92 Bright, White, 20 lb., 1 Ream (500 Sheets)\ndescription: Let this Pen+Gear Copy Paper lead you on the right path to success for your next project. Whether you\'re printing presentation handouts, announcements, signs, or just starting up a new craft project, this traditional white copy paper is a versatile canvas that lends itself to achieving just about any printed results you\'re seeking. Stock your office with Pen+Gear Copy Paper today!\n\nJam-resistant printer paper at your ready, you can easily bring your designs to life with black and color ink, which makes this copy paper the perfect choice for use at home, office, or school\nThis pack comes complete with 500 sheets and delivers results in all laser, inkjet printers and copiers\nPaper is certified by the Sustainable Forestry Initiative (SFI)\nurl: https://www.walmart.com/ip/Pen-Gear-Copy-Paper-8-5-x-11-92-Bright-White-20-lb-1-R
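
The `\ufeff` at the start of `page_content` above is a UTF-8 byte order mark carried over from the CSV file, so the first column name is read as `\ufeffstore` rather than `store`. Below is a small standard-library sketch on a throwaway file (the file name and rows are made up) showing that reading with `utf-8-sig` drops the mark. `CSVLoader` accepts an `encoding` argument, so `encoding='utf-8-sig'` is probably the matching fix, though that has not been run against `paper.csv` here.

```python
import csv
import os
import tempfile

def first_header(path, encoding):
    """Return the first column name of a CSV file read with the given encoding."""
    with open(path, newline="", encoding=encoding) as f:
        return next(csv.reader(f))[0]

# Hypothetical sample file that starts with a UTF-8 byte order mark.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.csv")
with open(sample_path, "w", newline="", encoding="utf-8-sig") as f:
    csv.writer(f).writerows([["store", "brand"], ["Walmart", "Pen+Gear"]])

with_bom = first_header(sample_path, "utf-8")         # mark stays in the name
without_bom = first_header(sample_path, "utf-8-sig")  # mark is stripped
print(repr(with_bom), repr(without_bom))
```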

In [34]:
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

In [36]:
db = FAISS.from_documents(documents, embeddings)

# Similarity search

In [37]:
def retrieve_info(query):
    # Return the text of the 3 stored products closest to the query
    similar_response = db.similarity_search(query, k=3)
    page_contents_array = [doc.page_content for doc in similar_response]
    return page_contents_array
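
Conceptually, `similarity_search(query, k=3)` embeds the query and returns the `k` stored vectors nearest to it. The sketch below does the same thing by brute force on made-up 2-D vectors, using Euclidean distance (which, as far as I know, is what the default FAISS index created by LangChain uses). The real embeddings have far more dimensions, and `top_k` is a hypothetical helper, not a LangChain or FAISS function.

```python
import math

def top_k(query_vec, doc_vecs, k=3):
    """Return indices of the k vectors closest to query_vec by Euclidean distance."""
    distances = [(math.dist(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    distances.sort()  # smallest distance first
    return [idx for _, idx in distances[:k]]

# Toy "embeddings" standing in for the product vectors.
toy_docs = [(0.0, 1.0), (1.0, 0.0), (0.9, 0.1), (0.5, 0.5)]
toy_query = (1.0, 0.0)

nearest = top_k(toy_query, toy_docs, k=3)
print(nearest)
```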

In [112]:
llm = ChatOpenAI(temperature=0, model='gpt-3.5-turbo-0125', openai_api_key=OPENAI_API_KEY)

In [118]:
template = """
You are a world class product matching specialist.
I will share a list of products, and you will find the best match of the products.

Below is the description of a product that needs to be matched:
{product_description}

Here is the list of products for which we have information:
{existing_products}

Find the products in existing_products that best match product_description; choose only from existing_products.
First compare the products on the core specs relevant to this product type, focusing on numeric features.
Compare both similarities and differences in detail and output them as bullet points.
Then return a numeric similarity score between 0 and 1, based on the comparison, that indicates how similar they are.
Please find at least 3 such products.

Below is an example of the output format:
1. Product name
Brand name
Store name
url
Key differences between 2 products:
- difference A
- difference B
- difference C
- ...
Similarity score: 0.9
"""

In [119]:
prompt = PromptTemplate(
    input_variables=["product_description", "existing_products"],
    template=template
)

chain = LLMChain(llm=llm, prompt=prompt)
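
The names in `input_variables` have to match the `{...}` placeholders in the template text. A quick way to check that, sketched here with the standard library on a shortened stand-in template (`demo_template` and `placeholders` are illustrative names, and this assumes the default format-string style of templating):

```python
from string import Formatter

def placeholders(template_text):
    """Return the set of {name} fields found in a format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template_text) if field}

# Shortened stand-in for the full prompt template.
demo_template = (
    "Product to match:\n{product_description}\n\n"
    "Candidates:\n{existing_products}\n"
)

found = placeholders(demo_template)
filled = demo_template.format(
    product_description="A4 paper", existing_products="B5 paper"
)
print(sorted(found))
print(filled)
```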

In [120]:
odp_prd = """
Hammermill® Copy Plus® Copier Paper, Letter Size (8 1/2" x 11"), 5000 Total Sheets, 92 (U.S.) Brightness, 20 Lb, FSC® Certified, White, 500 Sheets Per Ream, Case Of 10 Reams

Hammermill Copy Plus paper is an economical copy paper designed for everyday use at offices large and small. Offering dependable performance on all office machines, you'll want to have plenty of this dependable paper on hand for everyday, general office use. ColorLok for bolder blacks, brighter colors and faster drying. Backed by the 99.99% Jam-Free Guarantee. Acid-free material prevents yellowing over time to ensure a long-lasting appearance.

Perfect for black and white printing, drafts and forms. - Hammermill is more than just paper.
99.99% JAM-FREE GUARANTEE - You can trust Hammermill paper quality, guaranteed.
COLORLOK TECHNOLOGY & ACID-FREE - Colors on Hammermill copy paper are 30% brighter; blacks are up to 60% bolder, and inks dry 3 times faster for less smearing. Acid-free Hammermill paper also prevents printing and copier sheets from yellowing over time to ensure long-lasting archival quality.
RENEWABLE RESOURCE - Hammermill copy paper is Forest Stewardship Council (FSC) certified, contributing to “MR1 Performance” for paper and wood products under LEED.
Letter-size paper measures 8 1/2" x 11" to suit your printing needs.
Forest Stewardship Council® (FSC®) certified — made from wood/paper that comes from forests managed to rigorous environmental and social standards, supported by the world's leading conservation organizations.
Leadership forestry — from forests or sourcing programs that meet specific environmental standards, helping you support practices that better protect forests and the environment.

"""

In [121]:
existing_products = retrieve_info(odp_prd)
response = chain.run(product_description=odp_prd, existing_products=existing_products)
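
`retrieve_info` returns a Python list, so, as far as I can tell, `{existing_products}` is filled with the list's string form (brackets, quotes, and escaped `\n`). One possible alternative, not what the notebook does, is to join the records into numbered blocks first. `format_products` is a hypothetical helper, and the sample records are abbreviated.

```python
def format_products(products):
    """Join product records into numbered blocks separated by blank lines."""
    return "\n\n".join(
        f"Product {i}:\n{text.strip()}" for i, text in enumerate(products, start=1)
    )

# Abbreviated stand-ins for the retrieved page_content strings.
sample = ["store: Walmart\nbrand: Pen+Gear", "store: Staples\nbrand: Staples\n"]
formatted = format_products(sample)
print(formatted)
```

The result could then be passed as `existing_products=format_products(existing_products)`.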

In [122]:
print(response)

1. Product: Pen+Gear Copy Paper, 8.5" x 11", 92 Bright, White, 20 lb., 1 Ream (500 Sheets)
   Brand: Pen+Gear
   Store: Walmart
   URL: https://www.walmart.com/ip/Pen-Gear-Copy-Paper-8-5-x-11-92-Bright-White-20-lb-1-Ream-500-Sheets/487634010?athbdg=L1200
   Key differences:
   - Pen+Gear Copy Paper comes in 1 ream with 500 sheets, while Hammermill Copy Plus comes in a case of 10 reams with 5000 total sheets.
   - Pen+Gear Copy Paper is certified by the Sustainable Forestry Initiative (SFI), while Hammermill Copy Plus is FSC certified.
   - Pen+Gear Copy Paper does not mention ColorLok technology or a Jam-Free Guarantee, which are highlighted features of Hammermill Copy Plus.
   Similarity score: 0.6

2. Product: HP Printer Paper - Copy and Print, 20 lb., 8.5" x 11", 2,400 Sheets, 6 Pack
   Brand: HP
   Store: Walmart
   URL: https://www.walmart.com/ip/HP-Printer-Paper-Copy-and-Print-20-lb-8-5-x-11-2-400-Sheets-6-Pack/972411531?adsRedirect=true
   Key differences:
   - HP Printer Paper 

In [104]:
1. Pen+Gear Copy Paper, 8.5" x 11", 92 Bright, White, 20 lb., 1 Ream (500 Sheets)
Brand: Pen+Gear
Store: Walmart
Key differences between Hammermill Copy Plus paper:
- Pen+Gear has 1 ream (500 sheets) while Hammermill has 10 reams (5000 sheets)
- Pen+Gear is Sustainable Forestry Initiative (SFI) certified, while Hammermill is Forest Stewardship Council (FSC) certified
- Pen+Gear is jam-resistant, while Hammermill offers a 99.99% Jam-Free Guarantee
Similarity score: 0.7

2. HP Printer Paper - Copy and Print, 20 lb., 8.5" x 11", 2,400 Sheets, 6 Pack
Brand: HP
Store: Walmart
Key differences between Hammermill Copy Plus paper:
- HP comes in a 6 pack with 2400 sheets, while Hammermill comes in a case of 10 reams with 5000 sheets
- HP is ultra white shade (92 bright, 155 whiteness) with Color Lok technology, while Hammermill offers 92 (U.S.) Brightness
- HP is Forest Stewardship Council (FSC) certified, similar to Hammermill
Similarity score: 0.8

3. Staples Pastel Colored Copy Paper 8 1/2" x 11" Lilac 500/Ream (14782) 678826
Brand: Staples
Store: Staples
Key differences between Hammermill Copy Plus paper:
- Staples is pastel lilac colored paper, while Hammermill is white
- Staples contains 30% recycled post-consumer content, while Hammermill is FSC certified
- Staples is sold as 500 sheets per ream, while Hammermill is sold as 5000 sheets per case
Similarity score: 0.6
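
Because the response is free text, ranking the matches programmatically requires pulling the scores back out. A rough sketch with a regular expression follows, assuming the model keeps the `Similarity score: x` line format from the prompt's example (it may not always do so). `extract_scores` is a hypothetical helper, and the sample text is abbreviated from the output above.

```python
import re

# Matches "Similarity score: 0.7", "Similarity score: 1", etc.
SCORE_PATTERN = re.compile(r"Similarity score:\s*([01](?:\.\d+)?)")

def extract_scores(text):
    """Return every 'Similarity score: x' value found in text as floats."""
    return [float(m) for m in SCORE_PATTERN.findall(text)]

sample_response = """1. Pen+Gear Copy Paper
Similarity score: 0.7

2. HP Printer Paper
Similarity score: 0.8

3. Staples Pastel Colored Copy Paper
Similarity score: 0.6
"""

scores = extract_scores(sample_response)
# Position (0-based) of the highest-scoring product in the response.
best_index = max(range(len(scores)), key=scores.__getitem__)
print(scores, best_index)
```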
