# RAG-Enhanced Review Insights with Konko, LangChain & Pinecone

## Introduction

In today's digital marketplace, customer reviews are invaluable assets. They encapsulate genuine feedback, and if harnessed correctly, can be a source of actionable business intelligence.

In this tutorial, we venture into the world of retrieval augmentation using Konko's hosted LLM, LangChain, and Pinecone to derive insights from Amazon's vast collection of reviews.

<img src="https://github.com/konko-ai/examples/blob/main/img/analytics.png" width="500" height="300">

## 🧠 LLMs: Powerhouses with Potential... and Limitations

For all their capabilities, LLMs share one basic limitation: their knowledge is frozen at the point where their training data ends, so they know nothing of later events or fresh data.

Two consequences matter here:

1. **Business Context Blindness:** An LLM, out of the box, lacks the nuances of your specific business. It's like a fresh recruit on their first day: they don't yet know the intricacies of your operations or the preferences of your users.
2. **Static Knowledge Base:** An LLM's extensive knowledge is its strength, but that knowledge does not update itself. Evolving trends, recent events, and new data, which can be vital for many applications, are simply missing.



## The RAG (Retrieval Augmented Generation) Solution: Keeping LLMs Current and Contextual


Enter the RAG framework. The essence of Retrieval Augmentation is to supplement LLMs with external, up-to-date information. This ensures that the insights and analyses are both deep and current.

**Advantages of RAG:**

1. **Dynamic Knowledge:** RAG ensures that the information an LLM works with is both vast (from its internal knowledge) and fresh (from external sources).
2. **Updates Without Retraining:** With RAG, the knowledge available to the model is updated by changing the external data source, with no need to retrain or fine-tune the model. This flexibility makes it well suited to fast-changing information.
3. **Contextual Business Relevance:** With the right sources, RAG can be tailored to provide business-specific context, making LLM outputs more pertinent to specific user needs and business scenarios.

<img src="https://raw.githubusercontent.com/konko-ai/examples/main/img/Rag.png" width="500" height="300">

In this notebook, you will experience firsthand the synergy of **Konko's hosted LLM**, **LangChain**, and **Pinecone**. The RAG framework elegantly addresses the data freshness challenge faced by LLMs. By fetching current, relevant data, and feeding it to LLMs, we ensure that our analyses remain both in-depth and contemporary.
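To make the retrieval step concrete, here is a minimal, self-contained sketch. It is not the pipeline used in this notebook: word overlap stands in for embedding similarity, a plain list stands in for Pinecone, and the names `STOP_WORDS`, `tokenize`, and `retrieve` are illustrative.

```python
import re

# Small illustrative stop-word list so that filler words do not drive the ranking.
STOP_WORDS = {"a", "an", "and", "any", "for", "in", "is", "the", "to"}

def tokenize(text):
    """Lowercase a string and return its set of word tokens, minus stop words."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOP_WORDS

def retrieve(query, documents, k=2):
    """Return the k documents that share the most words with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]

reviews = [
    "The arch support is too high for me.",
    "Fast shipping and the wallet looks great.",
    "Cards fit snugly in the wallet and never fall out.",
]

# The retrieved reviews are what would be handed to the LLM as context.
top_reviews = retrieve("Is the wallet any good for cards?", reviews, k=2)
print(top_reviews)
```

In the real pipeline, the ranking comes from vector similarity over embeddings rather than shared words, but the shape is the same: score, keep the top few, pass them to the model.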

Equipped with these tools and techniques, businesses can gain a competitive edge, always staying in tune with their customers' latest feedback. Ready to leverage this for your business? Dive in and explore the code snippets provided. Happy coding!

### Getting Started

1. Install Necessary Libraries: First up, we'll set up our environment.
2. Set Up Environment Variables: As a best practice, keep API keys and configuration in environment variables. Make sure you have set variables for the Konko URL and the Konko API key, plus `PINECONE_API_KEY` and `PINECONE_ENVIRONMENT`, which the code below reads.
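
A quick check up front can save a confusing failure later. The sketch below reports which variables are unset. `PINECONE_API_KEY` and `PINECONE_ENVIRONMENT` are the names read later in this notebook; `KONKO_API_KEY` is an assumed name, so substitute whatever your Konko setup expects.

```python
import os

# PINECONE_* names are read later in this notebook.
# KONKO_API_KEY is an assumption; adjust it to your Konko configuration.
REQUIRED_VARS = ["PINECONE_API_KEY", "PINECONE_ENVIRONMENT", "KONKO_API_KEY"]

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```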

The dataset we'll be working with is available [here](https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/).



In [1]:
import os
import json
import gzip
import pandas as pd

# Load API keys and other settings from a local .env file into the environment
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())

True

### Initialization & Data Loading

1. Extract and load Amazon reviews and associated metadata directly from compressed files to pandas dataframes.
2. Conduct a preliminary data cleanup, focusing on truncating lengthy reviews for more efficient processing.

In [2]:
# Extract data from files (one JSON object per line in each gzip archive)
data = []
with gzip.open('data/AMAZON_FASHION.json.gz') as f:
    for line in f:
        data.append(json.loads(line.strip()))

metadata = []
with gzip.open('data/meta_AMAZON_FASHION.json.gz') as f:
    for line in f:
        metadata.append(json.loads(line.strip()))
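
Once you know which product IDs you care about, a streaming reader can keep memory use down by discarding other reviews as it reads. This is a sketch of one possible variant, not the loading code used in this notebook; `iter_reviews` is a hypothetical helper.

```python
import gzip
import json

def iter_reviews(path, asins=None):
    """Yield one review dict per non-empty line, optionally keeping only the given ASINs."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            review = json.loads(line)
            if asins is None or review.get("asin") in asins:
                yield review

# Hypothetical usage with the review file loaded above:
# data = list(iter_reviews("data/AMAZON_FASHION.json.gz",
#                          asins={"B00GXE331K", "B000KPIHQ4"}))
```

Note that the review-count step later in this notebook needs the full dataset, so filtering this early only makes sense after the products have been chosen.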

In [3]:
# Load the data to dataframes

df = pd.DataFrame.from_dict(data)
df = df[df['reviewText'].notna()]

df_meta=pd.DataFrame.from_dict(metadata)

In [4]:
# Truncate the reviewText so that each review stays short

max_text_length = 400
def truncate_review(text):
    return text[:max_text_length]

df['truncated'] = df['reviewText'].apply(truncate_review)

**This is what the data looks like after cleaning.**

In [5]:
df

Unnamed: 0,overall,verified,reviewTime,reviewerID,asin,reviewerName,reviewText,summary,unixReviewTime,vote,style,image,truncated
0,5.0,True,"10 20, 2014",A1D4G1SNUZWQOT,7106116521,Tracy,Exactly what I needed.,perfect replacements!!,1413763200,,,,Exactly what I needed.
1,2.0,True,"09 28, 2014",A3DDWDH9PX2YX2,7106116521,Sonja Lau,"I agree with the other review, the opening is ...","I agree with the other review, the opening is ...",1411862400,3,,,"I agree with the other review, the opening is ..."
2,4.0,False,"08 25, 2014",A2MWC41EW7XL15,7106116521,Kathleen,Love these... I am going to order another pack...,My New 'Friends' !!,1408924800,,,,Love these... I am going to order another pack...
3,2.0,True,"08 24, 2014",A2UH2QQ275NV45,7106116521,Jodi Stoner,too tiny an opening,Two Stars,1408838400,,,,too tiny an opening
4,3.0,False,"07 27, 2014",A89F3LQADZBS5,7106116521,Alexander D.,Okay,Three Stars,1406419200,,,,Okay
...,...,...,...,...,...,...,...,...,...,...,...,...,...
883631,5.0,True,"02 21, 2017",A1ZSB2Q144UTEY,B01HJHTH5U,Amazon Customer,I absolutely love this dress!! It's sexy and ...,I absolutely love this dress,1487635200,,,,I absolutely love this dress!! It's sexy and ...
883632,5.0,True,"11 25, 2016",A2CCDV0J5VB6F2,B01HJHTH5U,Amazon Customer,I'm 5'6 175lbs. I'm on the tall side. I wear a...,I wear a large and ordered a large and it stil...,1480032000,2,,,I'm 5'6 175lbs. I'm on the tall side. I wear a...
883633,3.0,True,"11 10, 2016",A3O90PACS7B61K,B01HJHTH5U,Fabfifty,Too big in the chest area!,Three Stars,1478736000,,,,Too big in the chest area!
883634,3.0,True,"11 10, 2016",A2HO94I89U3LNH,B01HJHF97K,Mgomez,"Too clear in the back, needs lining",Three Stars,1478736000,,,,"Too clear in the back, needs lining"


### Filtering Products with Adequate Reviews
 

In this section, we identify the products in our dataset that have a large number of reviews, so that there is enough data to draw meaningful insights.

In [6]:
# Look for productIds with enough reviews

df.groupby('asin').count().sort_values('overall').tail(20)

asin,overall,verified,reviewTime,reviewerID,reviewerName,reviewText,summary,unixReviewTime,vote,style,image,truncated
B00XTM0ZPG,1405,1405,1405,1405,1405,1405,1405,1405,33,1405,66,1405
B000GHMRLW,1415,1415,1415,1415,1414,1415,1415,1415,53,1391,3,1415
B000GHRZN2,1415,1415,1415,1415,1414,1415,1415,1415,0,0,3,1415
B00ZW3SCF0,1522,1522,1522,1522,1522,1522,1518,1522,142,1520,276,1522
B000JOOR7O,1584,1584,1584,1584,1584,1584,1583,1584,74,1538,28,1584
B009RUKQ2G,1590,1590,1590,1590,1590,1590,1590,1590,92,1590,27,1590
B000YFSR4W,1648,1648,1648,1648,1648,1648,1646,1648,44,1612,10,1648
B004HX6P1E,1671,1671,1671,1671,1671,1671,1670,1671,147,1670,81,1671
B005N7YWX6,1688,1688,1688,1688,1688,1688,1688,1688,101,1649,11,1688
B0017U1KBK,1826,1826,1826,1826,1826,1826,1824,1826,178,0,49,1826


To narrow down our focus, we'll work on just a subset of the entire dataset that corresponds to two specific products:

1. RFID Blocking Card Holder
2. PowerStep Pinnacle Orthotic Shoe Insoles


In [7]:
# Work on only a slice of the dataframe

df = df.loc[(df['asin'] == 'B00GXE331K') | (df['asin'] == 'B000KPIHQ4')].copy()

**Below is a snapshot of the reviews dataset, specifically focusing on the two products we've chosen for this analysis: 'RFID Blocking Card Holder' and 'PowerStep Pinnacle Orthotic Shoe Insoles'.**

In [8]:
df

Unnamed: 0,overall,verified,reviewTime,reviewerID,asin,reviewerName,reviewText,summary,unixReviewTime,vote,style,image,truncated
11218,3.0,True,"09 26, 2007",A1CIM0XZ3UA926,B000KPIHQ4,M. Cane,"Good price, good product. Howver, it is generi...",Orthotics off the rack,1190764800,2,"{'Size Name:': ' Men's 5-5.5, Women's 7-7.5', ...",,"Good price, good product. Howver, it is generi..."
11219,5.0,True,"01 18, 2007",A1EVVPCWRW5YYZ,B000KPIHQ4,Deborah Morris,My husband rates these insoles a 5 for comfort...,Very comfortable,1169078400,3,"{'Size Name:': ' Men's 10-10.5, Women's 12', '...",,My husband rates these insoles a 5 for comfort...
11220,5.0,True,"05 18, 2018",A2P3NZ9H4PANK0,B000KPIHQ4,Stephanie,I have worn the Powerstep Pinnacle shoe insole...,... Pinnacle shoe insoles for the past 5 years...,1526601600,,"{'Size Name:': ' Men's 6-6.5, Women's 8-8.5', ...",,I have worn the Powerstep Pinnacle shoe insole...
11221,1.0,True,"05 18, 2018",A2975GY186VV7A,B000KPIHQ4,jessica etim,Very uncomfortable feel like I wasted my money!,Uncomfortable,1526601600,,"{'Size Name:': ' Men's 7-7.5, Women's 9-9.5', ...",,Very uncomfortable feel like I wasted my money!
11222,5.0,True,"05 17, 2018",A3U8E58RIKWDAW,B000KPIHQ4,Nancy Mazzuca,work perfect,Five Stars,1526515200,,"{'Size Name:': ' Men's 9-9.5, Women's 11-11.5'...",,work perfect
...,...,...,...,...,...,...,...,...,...,...,...,...,...
486369,2.0,True,"07 4, 2018",AQCHECTIUVKTV,B00GXE331K,Amazon Customer,I started switching my cards from my old walle...,I started switching my cards from my old walle...,1530662400,,{'Color:': ' Stainless Steel'},,I started switching my cards from my old walle...
486370,5.0,True,"07 4, 2018",A1LXAF4YMKSDEB,B00GXE331K,Amazon Customer,I really love the card holder case that I'm us...,I really love the card holder case that I'm us...,1530662400,,{'Color:': ' Black Stainless Steel'},,I really love the card holder case that I'm us...
486371,4.0,True,"07 3, 2018",A3USRXIGMZW02O,B00GXE331K,Dave Dettelbach,Fast shipping and product looks great.,Four Stars,1530576000,,{'Color:': ' Black Stainless Steel'},,Fast shipping and product looks great.
486372,5.0,True,"07 3, 2018",A1M00GF04C1TZK,B00GXE331K,xiiztec,"Love it, held it and didn't want to put it down.",Absolutely amazing,1530576000,,{'Color:': ' Black Stainless Steel'},,"Love it, held it and didn't want to put it down."


### Diving into Embeddings with HuggingFace! (But Stay Tuned for Our Very Own 🚀 Konko Embeddings!)

In this step, we use LangChain's `HuggingFaceEmbeddings` wrapper to convert our review text into numerical vectors. These vectors capture the meaning of each review in a form that can be compared and searched. We're also about to roll out our own Konko embedding endpoint, for even more tailored and cost-effective insights!

Remember, while embeddings are powerful, always be cautious about costs when venturing into paid embedding solutions, especially with large datasets.
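
To see why vectors are useful, here is a small sketch of how two embeddings can be compared. Cosine similarity is a common choice, but the metric actually used depends on how your Pinecone index is configured, so treat it as an assumption here. The three-dimensional vectors are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up vectors; real embeddings have hundreds of dimensions.
too_small = [0.9, 0.1, 0.0]
tiny_opening = [0.8, 0.2, 0.1]
fast_shipping = [0.0, 0.1, 0.9]

print(cosine_similarity(too_small, tiny_opening))   # similar complaints score high
print(cosine_similarity(too_small, fast_shipping))  # unrelated topics score low
```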




In [10]:
# Import and apply embeddings from HuggingFace
from langchain.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings()

# Embed each truncated review (one vector per row)
df['embeddings'] = df['truncated'].apply(embeddings.embed_query)

### 🌲 Powering Up with Pinecone: Efficiently Storing Our Review Embeddings! 

To put our review embeddings to work, we store them in Pinecone, a vector database. This not only persists the vectors but also sets the stage for **Llama 2 13B** to derive meaningful insights from the retrieved reviews.

In [11]:
# Import Pinecone client

import pinecone
from langchain.vectorstores import Pinecone

# Initialize Pinecone
pinecone.init(
    api_key=os.getenv('PINECONE_API_KEY'),  
    environment=os.getenv('PINECONE_ENVIRONMENT')  
)

**Transform & Upload:** Convert the truncated reviews into a list, then embed them with HuggingFace and store them in Pinecone using the `from_texts` method of LangChain's `Pinecone` vector store.



In [12]:
# Create list with truncated review texts

texts=df['truncated'].tolist()

In [13]:
# Send embedding vectors to Pinecone with Langchain

vstore = Pinecone.from_texts(texts, embeddings, index_name='cxanalytics')
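
Both products' reviews land in the same index, so one way to keep a query scoped to a single product is to attach metadata at upload time and filter on it at query time. The sketch below builds such metadata; `build_metadatas` is a hypothetical helper, and the commented LangChain calls show assumed usage that should be checked against your installed version.

```python
def build_metadatas(records):
    """Build one metadata dict per review, keeping the product ID and star rating."""
    return [
        {"asin": rec["asin"], "overall": float(rec["overall"])}
        for rec in records
    ]

# Two hand-written rows standing in for df[['asin', 'overall']].to_dict('records').
sample_records = [
    {"asin": "B00GXE331K", "overall": 5.0},
    {"asin": "B000KPIHQ4", "overall": 1.0},
]
metadatas = build_metadatas(sample_records)
print(metadatas)

# Assumed usage with the objects defined in this notebook (not executed here):
# vstore = Pinecone.from_texts(texts, embeddings, metadatas=metadatas,
#                              index_name='cxanalytics')
# retriever = vstore.as_retriever(search_kwargs={'filter': {'asin': 'B00GXE331K'}})
```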

**Confirmation:** A quick glance at Pinecone's dashboard verifies the successful upload of our review vectors.

## 🚀 Unleashing Llama 2 with Konko API: Dive into Review Insights!

This section shows how RetrievalQA and the Konko API work together to let Llama 2 retrieve, analyze, and summarize reviews.

**RetrievalQA in Action:** Retrieves the reviews most relevant to a question from our vector store.

**Incorporating Konko API:** Connects us to the hosted Llama 2 model, which performs the review analysis.

###  🛠 Setting Up the Review Chain: Diving Deeper with RetrievalQA!

This section sets up `review_chain` with RetrievalQA, connecting our dataset to the Llama 2 model so that answers are grounded in retrieved reviews.

1. Building the Review Chain: RetrievalQA connects the language model to the vector store.
2. Understanding Chain Type 'stuff': The simplest chain type. It packs ("stuffs") all retrieved documents into a single prompt, so the language model sees the whole retrieved context in one call.
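
The idea behind 'stuff' can be sketched in a few lines. This is an illustration only: LangChain's actual prompt wording differs, and the `max_chars` budget is an addition of this sketch (not part of LangChain's chain) to show why short, truncated reviews help everything fit into one prompt.

```python
def stuff_prompt(question, documents, max_chars=2000):
    """Concatenate documents into one context block, stopping before max_chars is exceeded."""
    context_parts = []
    used = 0
    for doc in documents:
        if used + len(doc) > max_chars:
            break
        context_parts.append(doc)
        used += len(doc)
    context = "\n\n".join(context_parts)
    return (
        "Use the following reviews to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = stuff_prompt(
    "What do customers complain about?",
    ["too tiny an opening", "Too big in the chest area!"],
)
print(prompt)
```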

In [14]:
# Import RetrievalQA and Konko API and define review_chain in order to have Llama 2 access the review data

from langchain.chains import RetrievalQA
from langchain.llms import Konko

chat = Konko(model_id='meta-llama--Llama-2-13b-chat-hf')
review_chain = RetrievalQA.from_chain_type(llm=chat, chain_type="stuff", retriever=vstore.as_retriever())

## 🚀 Unlocking Insights with Llama 2!

We now use Llama 2 to extract meaningful feedback and actionable recommendations from the product reviews.

1. Crafting the Query: We ask Llama 2 for an overall impression, detailed examples, and potential improvements.
2. Running the Chain: A single call runs our review_chain.
3. Fine-Tuning Feedback: With system messages, we can calibrate Llama 2's responses for better insights.
4. From POC to Action: This could be turned into weekly digests for teams, keeping feedback actionable.

**Harness the reviews, guide the strategy!**
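
Because the same question is asked for each product, with only the product name changing, a small template helper can avoid copy-paste drift. `QUESTION_TEMPLATE` and `build_review_question` are illustrative names, and the commented loop shows assumed usage of the `review_chain` defined earlier.

```python
QUESTION_TEMPLATE = """
The reviews you see are for a product called '{product_name}'.
What is the overall impression of these reviews? Give most prevalent examples in bullets.
What do you suggest we focus on improving?
"""

def build_review_question(product_name):
    """Fill the shared question template with a product name."""
    return QUESTION_TEMPLATE.format(product_name=product_name)

questions = {
    name: build_review_question(name)
    for name in [
        "Best RFID Blocking Card Holder Case for Men and Women Slim Stainless Steel Metal Wallet",
        "Powerstep Pinnacle Orthotic Shoe Insoles",
    ]
}

# Assumed usage with the chain defined earlier (not executed here):
# for name, q in questions.items():
#     print(name, review_chain.run(q))
```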

In [15]:
# Define the task for Llama 2 and run the chain

q="""
The reviews you see are for a product called 'Best RFID Blocking Card Holder Case for Men and Women Slim Stainless Steel Metal Wallet'.
What is the overall impression of these reviews? Give most prevalent examples in bullets. 
What do you suggest we focus on improving?
"""

result=review_chain.run(q)
print(result)

 Based on the reviews provided, here is the overall impression:

* The wallet is effective in protecting credit cards from theft and has a beautiful aluminum finish.
* The wallet is well-made and durable, with cards fitting snugly into the holders.
* The wallet has a professional look and has been tested to be reliable, with no cards falling out even when dropped upside down.

Here are the most prevalent examples in bullets:

* The wallet is slim and stylish, making it a great option for those who want a minimalist look.
* The RFID blocking feature provides an added layer of security against identity theft.
* The wallet is made of high-quality materials, such as stainless steel, which ensures its durability.
* The wallet has multiple card slots, allowing users to carry all their necessary cards.

Based on these reviews, it seems that the product is well-liked by customers and meets their needs effectively. However, there is one potential area for improvement:

* Some reviewers mentione

In [16]:
# Define the task for Llama 2 and run the chain

q="""
The reviews you see are for a product called 'Powerstep Pinnacle Orthotic Shoe Insoles'.
What is the overall impression of these reviews? Give most prevalent examples in bullets. 
What do you suggest we focus on improving?
"""

result=review_chain.run(q)
print(result)

 Based on the reviews provided, here is the overall impression:

* Most reviewers purchased the Powerstep Pinnacle Orthotic Shoe Insoles based on good reviews, but they did not provide any significant relief from foot pain.
* Some reviewers found the arch support to be too high, which caused discomfort.
* Despite trying the insoles for a week, some reviewers still experienced severe foot pain due to plantar fasciitis.

Here are the most prevalent examples in bullets:

* "I still have severe foot pain due to plantar fasciitis and am going to try something else."
* "The arches are too high for me."
* "I was ready to send the SuperFeet back, but my father informed me he liked them better."

Based on these reviews, it seems that the Powerstep Pinnacle Orthotic Shoe Insoles may not be effective in providing relief from foot pain, especially for those with plantar fasciitis. The high arch support may also be a source of discomfort for some users. To improve, the manufacturer could consider a