#### Project: Feedback summarizer

In this project, we scrape customer reviews of an Amazon product and ask a language model to summarize what should be improved.

#### Imports

In [7]:
import os
from urllib.parse import urlparse

from bs4 import BeautifulSoup
from dotenv import load_dotenv
import pandas as pd

from langchain_core.prompts import PromptTemplate
from langchain_community.llms import Ollama
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document

from common_functions import datasets_dir, ensure_llama_running, clean_prompt, display_md
ensure_llama_running()

load_dotenv()
llm_model = os.getenv('LLM_MODEL')

#### Data collection

In [8]:
def scrape_amazon_reviews(link, dropna=True):
	base_url = '{uri.scheme}://{uri.netloc}/'.format(uri=urlparse(link))
	
	# get product ID from link - last part of the link
	product_id = link.split('/')[-1].split('?')[0]
	
	# ensure no spaces in product ID
	product_id = product_id.replace(' ', '')
	
	filename = f'reviews_{product_id}.csv'
	filepath = os.path.join(datasets_dir, filename)
	# if a cached file exists, load the reviews from it instead of scraping
	if os.path.exists(filepath):
		reviews_df = pd.read_csv(filepath)
		if dropna:
			reviews_df.dropna(inplace=True)
		return reviews_df
	
	from selenium import webdriver
	options = webdriver.ChromeOptions()
	options.add_argument('headless')  # to run the browser in background

	driver = webdriver.Chrome(options=options)
	driver.get(link)
	soup = BeautifulSoup(driver.page_source, "html.parser")
	
	# find the "see all reviews" link in the page footer and open it
	button = soup.find('a', {'class': 'a-link-emphasis a-text-bold', 'data-hook': 'see-all-reviews-link-foot'})
	button_link = button['href']
	# base_url already ends with '/', so strip any leading '/' from the href
	driver.get(base_url + button_link.lstrip('/'))
	soup = BeautifulSoup(driver.page_source, "html.parser")
	
	# html_path = os.path.join(datasets_dir, f'amazon_{product_id}.html')
	# with open(html_path, 'w') as f:
	# 	f.write(soup.prettify())
	driver.quit()
	reviews = []

	for review in soup.find_all('div', {'data-hook': 'review'}):
		title = review.find('a', {'data-hook': 'review-title'}).span.text
		# rating = review.find('i', {'data-hook': 'review-star-rating'}).span.text
		feedback = review.find('span', {'data-hook': 'review-body'}).span.text
		# remove new lines
		feedback = feedback.replace('\n', ' ')

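		# the title text appears to start with the star rating
		# (e.g. "5.0 out of 5 stars"), so its first character is used as the rating
		reviews.append({
			'rating': title[0],
			'feedback': feedback,
		})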

	reviews_df = pd.DataFrame(reviews)
	reviews_df.to_csv(filepath, index=False)
	
	if dropna:
		reviews_df.dropna(inplace=True)
	return reviews_df

amazon_in_link = 'https://www.amazon.in/dp/B00KXULGJQ?th=1'
reviews_df = scrape_amazon_reviews(amazon_in_link)
reviews_df.head()

| | rating | feedback |
|---|---|---|
| 0 | 5 | I recently purchased the TP-Link AC750 WiFi Ra... |
| 1 | 4 | I have a range issue in the second floor as th... |
| 2 | 5 | Excellent product, does the job very well and ... |
| 3 | 5 | I bought this product , because someone has su... |
| 4 | 5 | The RE450 is an 802.11ac range extender with a... |
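
The scraper above takes the last path segment of the link as the product ID and the first character of the review title as the rating. A slightly more defensive version might look like the sketch below. It assumes the product ID (ASIN) is a 10-character alphanumeric code following `/dp/` or `/gp/product/`, and that the title text begins with something like "5.0 out of 5 stars". Both are assumptions about Amazon's URLs and markup, and `extract_asin` and `parse_rating` are hypothetical helper names, not part of the notebook.

```python
import re
from typing import Optional

# Assumption: ASINs are 10 uppercase alphanumeric characters after /dp/ or /gp/product/
ASIN_PATTERN = re.compile(r'/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?]|$)')
# Assumption: the title text starts with a rating such as "5.0 out of 5 stars"
RATING_PATTERN = re.compile(r'^\s*(\d(?:\.\d)?)\s+out of 5')


def extract_asin(link: str) -> Optional[str]:
    """Return the product ID found in the link, or None if there is none."""
    match = ASIN_PATTERN.search(link)
    return match.group(1) if match else None


def parse_rating(title: str) -> Optional[float]:
    """Return the leading star rating as a float, or None if the title has none."""
    match = RATING_PATTERN.match(title)
    return float(match.group(1)) if match else None
```

Returning `None` instead of guessing lets the caller decide whether to skip the review or stop, rather than writing a wrong ID into the cache filename.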


In [9]:
# join all reviews into a single string, separated by blank lines
reviews = '\n\n'.join(reviews_df['feedback'].tolist())
print(reviews[:200] + '...')

I recently purchased the TP-Link AC750 WiFi Range Extender, and I must say, it has made a significant difference in my home network. Here's my review:Performance (4/5):The AC750 range extender effecti...
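
Joining everything into one string works for a handful of reviews, but the combined text still has to fit in the model's context window. As a rough sketch, the reviews could instead be packed into chunks under a size budget. Here `pack_reviews` and the `max_chars` default are hypothetical, and character count is only a crude proxy for token count.

```python
from typing import List


def pack_reviews(feedback: List[str], max_chars: int = 4000, sep: str = '\n\n') -> List[str]:
    """Greedily group reviews into chunks of at most max_chars characters.

    A single review longer than max_chars still becomes its own (oversized) chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for text in feedback:
        text = text.strip()
        if not text:
            continue  # skip empty reviews
        extra = len(text) + (len(sep) if current else 0)
        if current and current_len + extra > max_chars:
            # budget exceeded: close the current chunk and start a new one
            chunks.append(sep.join(current))
            current, current_len = [], 0
            extra = len(text)
        current.append(text)
        current_len += extra
    if current:
        chunks.append(sep.join(current))
    return chunks
```

Each chunk could then be summarized separately and the partial summaries combined in a final pass.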


#### Creating the model

In [10]:
llm = Ollama(model=llm_model)
llm.get_name()

'Ollama'

Define the prompt

In [11]:
prompt = """
Reviews of my product are given below. I created this product and want to improve using user feedback.
Give me insights into what should be improved in the product based on the reviews.
Elaborate more in 3 paragraphs.

{text}
"""

prompt = clean_prompt(prompt, llm)


Number of initial tokens: 54
Number of tokens after cleanup: 49


In [12]:
# wrap the combined review text in a single Document for the chain
docs = [Document(page_content=reviews)]

prompt_template = PromptTemplate(
	input_variables=['text'],
	template=prompt,
)
chain = load_summarize_chain(
	llm,
	prompt=prompt_template,
	verbose=False,
)
print('Chain created')

Chain created
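
As far as I know, `load_summarize_chain` defaults to the "stuff" strategy: the text of all documents is joined and substituted into the `{text}` placeholder of the prompt, and the result is sent to the model in a single call. The plain-Python sketch below illustrates that idea only. `stuff_prompt` is a hypothetical name and this is not LangChain's actual implementation.

```python
from typing import List


def stuff_prompt(template: str, page_contents: List[str], separator: str = '\n\n') -> str:
    """Join all document texts and place them into the template's {text} slot."""
    combined = separator.join(page_contents)
    return template.format(text=combined)


# Illustrative template and reviews, not the ones used in the notebook
example = stuff_prompt(
    'Give me insights based on these reviews.\n\n{text}',
    ['Works well.', 'Range is poor upstairs.'],
)
print(example)
```

Because everything goes into one prompt, this strategy is simple and needs only one model call, but it is limited by the model's context window.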


In [13]:
summary = chain.run(docs)
display_md(f"**Insights based on the reviews:**\n\n{summary}")


**Insights based on the reviews:**

Overall, I would recommend the TP-Link AC1900 Wi-Fi range extender, also known as the RE Prism. It offers dual-band compatibility, WPS, an access point mode, a Gigabit ethernet port, and helpful indicator lights with external antennas. The installation process is straightforward using the Tether mobile app.

However, if you are looking for a more durable option, it may not be the best choice due to its bulkiness and weight. Additionally, some users have reported issues with the Ethernet port's stability upon restart.

If you experience problems during use or after initial setup, I recommend reaching out to TP Link's customer support for assistance.

Note: I used a small language model (SLM) for faster output. It can be swapped for a larger LLM by changing the `LLM_MODEL` environment variable.

#### Hosting with Gradio

In [15]:
import gradio as gr

def summarize_reviews(link):
    # scrape (or load cached) reviews, then run the summarize chain on them
    reviews_df = scrape_amazon_reviews(link)
    reviews = '\n\n'.join(reviews_df['feedback'].tolist())
    docs = [Document(page_content=reviews)]
    summary = chain.run(docs)
    return summary

interface = gr.Interface(
    fn=summarize_reviews,
    inputs=[gr.Textbox(lines=4, placeholder="Paste Amazon link")],
    outputs="text"
)
interface.launch()

Running on local URL:  http://127.0.0.1:7861

To create a public link, set `share=True` in `launch()`.
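
Since the text box accepts arbitrary input, it may be worth checking the link before starting a browser session. The sketch below is one possible check. `is_amazon_product_link` is a hypothetical helper, and the host allowlist and the `/dp/` path test are illustrative assumptions rather than a complete description of Amazon's URL formats.

```python
from urllib.parse import urlparse

# Assumption: illustrative allowlist; extend it with the storefronts you need
ALLOWED_HOSTS = {'amazon.in', 'amazon.com', 'amazon.co.uk'}


def is_amazon_product_link(link: str) -> bool:
    """Return True if the link looks like an http(s) Amazon product page."""
    parsed = urlparse(link.strip())
    host = (parsed.hostname or '').lower()
    if host.startswith('www.'):
        host = host[4:]
    return (
        parsed.scheme in ('http', 'https')
        and host in ALLOWED_HOSTS
        and '/dp/' in parsed.path
    )
```

`summarize_reviews` could call this first and return a short message for invalid input instead of attempting to scrape.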


