# Amazon Recommendation System 

Welcome to the Amazon Recommendation System project! This project builds a recommendation system using cosine similarity, applied to tags generated from the review and product columns of the Amazon Sales dataset from Kaggle. The tags were preprocessed with stop-word removal, stemming, and count vectorization. Amazon is an American multinational technology company known worldwide for its e-commerce platform.

Link to Dataset: https://www.kaggle.com/datasets/karkavelrajaj/amazon-sales-dataset

# Summary

### Chapter 1: Exploratory and Statistical Analysis
- 1.1: Importing Data and First Look
- 1.2: Checking Data Types
- 1.3: Checking Null Values
- 1.4: Checking Undesired Values

### Chapter 2: Data Processing
- 2.1: Separating the Important Columns
- 2.2: Handling Null Values
- 2.3: Handling Undesired Values
- 2.4: Handling Columns Types
- 2.5: Tags Generation
- 2.6: Handling 'http' Tags

### Chapter 3: Building Recommendation Model
- 3.1: Stop Words
- 3.2: Stemming
- 3.3: Count Vectorizer
- 3.4: Cosine Similarity
- 3.5: Creating a Product Recommendation System

### Chapter 4: Tests and Conclusions
- Final Tests and Conclusions of The Project

# Chapter 1: Exploratory and Statistical Analysis

In this section, we take a close look at our data to understand what it can tell us. 

This step is crucial for getting to know our data better before we dive into more advanced techniques.

### 1.1: Importing Data and First Look

In [1]:
import pandas as pd
import numpy as np

# Eliminate the warnings. 
import warnings
warnings.filterwarnings("ignore")

In [2]:
# Load the CSV file (here read from the working directory)
amazon_df = pd.read_csv("amazon.csv")
amazon_df

Unnamed: 0,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers&Accessories|Accessories&Peripherals|...,₹399,"₹1,099",64%,4.2,24269,High Compatibility : Compatible With iPhone 12...,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...","Manav,Adarsh gupta,Sundeep,S.Sayeed Ahmed,jasp...","R3HXWT0LRP0NMF,R2AJM3LFTLZHFO,R6AQJGUP6P86,R1K...","Satisfied,Charging is really fast,Value for mo...",Looks durable Charging is fine tooNo complains...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Wayona-Braided-WN3LG1-Sy...
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,Computers&Accessories|Accessories&Peripherals|...,₹199,₹349,43%,4.0,43994,"Compatible with all Type C enabled devices, be...","AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...","ArdKn,Nirbhay kumar,Sagar Viswanathan,Asp,Plac...","RGIQEG07R9HS2,R1SMWZQ86XIN8U,R2J3Y1WL29GWDE,RY...","A Good Braided Cable for Your Type C Device,Go...",I ordered this cable to connect my phone to An...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Ambrane-Unbreakable-Char...
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories|Accessories&Peripherals|...,₹199,"₹1,899",90%,3.9,7928,【 Fast Charger& Data Sync】-With built-in safet...,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...","Kunal,Himanshu,viswanath,sai niharka,saqib mal...","R3J3EQQ9TZI5ZJ,R3E7WBGK7ID0KV,RWU79XKQ6I1QF,R2...","Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Sounce-iPhone-Charging-C...
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,Computers&Accessories|Accessories&Peripherals|...,₹329,₹699,53%,4.2,94363,The boAt Deuce USB 300 2 in 1 cable is compati...,"AEWAZDZZJLQUYVOVGBEUKSLXHQ5A,AG5HTSFRRE6NL3M5S...","Omkar dhale,JD,HEMALATHA,Ajwadh a.,amar singh ...","R3EEUZKKK9J36I,R3HJVYCLYOY554,REDECAZ7AMPQC,R1...","Good product,Good one,Nice,Really nice product...","Good product,long wire,Charges good,Nice,I bou...",https://m.media-amazon.com/images/I/41V5FtEWPk...,https://www.amazon.in/Deuce-300-Resistant-Tang...
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,Computers&Accessories|Accessories&Peripherals|...,₹154,₹399,61%,4.2,16905,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"AE3Q6KSUK5P75D5HFYHCRAOLODSA,AFUGIFH5ZAFXRDSZH...","rahuls6099,Swasat Borah,Ajay Wadke,Pranali,RVK...","R1BP4L2HH9TFUP,R16PVJEXKV6QZS,R2UPDB81N66T4P,R...","As good as original,Decent,Good one for second...","Bought this instead of original apple, does th...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Portronics-Konnect-POR-1...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1460,B08L7J3T31,Noir Aqua - 5pcs PP Spun Filter + 1 Spanner | ...,Home&Kitchen|Kitchen&HomeAppliances|WaterPurif...,₹379,₹919,59%,4,1090,SUPREME QUALITY 90 GRAM 3 LAYER THIK PP SPUN F...,"AHITFY6AHALOFOHOZEOC6XBP4FEA,AFRABBODZJZQB6Z4U...","Prabha ds,Raghuram bk,Real Deal,Amazon Custome...","R3G3XFHPBFF0E8,R3C0BZCD32EIGW,R2EBVBCN9QPD9R,R...","Received the product without spanner,Excellent...","I received product without spanner,Excellent p...",https://m.media-amazon.com/images/I/41fDdRtjfx...,https://www.amazon.in/Noir-Aqua-Spanner-Purifi...
1461,B01M6453MB,Prestige Delight PRWO Electric Rice Cooker (1 ...,Home&Kitchen|Kitchen&HomeAppliances|SmallKitch...,"₹2,280","₹3,045",25%,4.1,4118,"230 Volts, 400 watts, 1 Year","AFG5FM3NEMOL6BNFRV2NK5FNJCHQ,AGEINTRN6Z563RMLH...","Manu Bhai,Naveenpittu,Evatira Sangma,JAGANNADH...","R3DDL2UPKQ2CK9,R2SYYU1OATVIU5,R1VM993161IYRW,R...","ok,everything was good couldn't return bcoz I ...","ok,got everything as mentioned but the measuri...",https://m.media-amazon.com/images/I/41gzDxk4+k...,https://www.amazon.in/Prestige-Delight-PRWO-1-...
1462,B009P2LIL4,Bajaj Majesty RX10 2000 Watts Heat Convector R...,"Home&Kitchen|Heating,Cooling&AirQuality|RoomHe...","₹2,219","₹3,080",28%,3.6,468,International design and styling|Two heat sett...,"AGVPWCMAHYQWJOQKMUJN4DW3KM5Q,AF4Q3E66MY4SR7YQZ...","Nehal Desai,Danish Parwez,Amazon Customer,Amaz...","R1TLRJVW4STY5I,R2O455KRN493R1,R3Q5MVGBRIAS2G,R...","very good,Work but front melt after 2 month,Go...","plastic but cool body ,u have to find sturdy s...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Bajaj-RX-10-2000-Watt-Co...
1463,B00J5DYCCA,Havells Ventil Air DSP 230mm Exhaust Fan (Pist...,"Home&Kitchen|Heating,Cooling&AirQuality|Fans|E...","₹1,399","₹1,890",26%,4,8031,Fan sweep area: 230 MM ; Noise level: (40 - 45...,"AF2JQCLSCY3QJATWUNNHUSVUPNQQ,AFDMLUXC5LS5RXDJS...","Shubham Dubey,E.GURUBARAN,Mayank S.,eusuf khan...","R39Q2Y79MM9SWK,R3079BG1NIH6MB,R29A31ZELTZNJM,R...","Fan Speed is slow,Good quality,Good product,go...",I have installed this in my kitchen working fi...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Havells-Ventilair-230mm-...


As we can see, the dataset contains Amazon products and the reviews related to them. It includes both well-known and lesser-known products, along with various data about each one.

The dataset has 1465 rows and 16 columns.

### 1.2: Checking Data Types

'dtypes' shows us the type of each column of our dataframe.

In [3]:
amazon_df.dtypes

product_id             object
product_name           object
category               object
discounted_price       object
actual_price           object
discount_percentage    object
rating                 object
rating_count           object
about_product          object
user_id                object
user_name              object
review_id              object
review_title           object
review_content         object
img_link               object
product_link           object
dtype: object

All columns are 'object' type.

In [4]:
amazon_df.columns

Index(['product_id', 'product_name', 'category', 'discounted_price',
       'actual_price', 'discount_percentage', 'rating', 'rating_count',
       'about_product', 'user_id', 'user_name', 'review_id', 'review_title',
       'review_content', 'img_link', 'product_link'],
      dtype='object')

### 1.3: Checking Null Values

We'll use 'isnull()' and 'sum()' to see how many null values each column has.

In [5]:
amazon_df.isnull().sum()

product_id             0
product_name           0
category               0
discounted_price       0
actual_price           0
discount_percentage    0
rating                 0
rating_count           2
about_product          0
user_id                0
user_name              0
review_id              0
review_title           0
review_content         0
img_link               0
product_link           0
dtype: int64

There are two null values in "rating_count".

### 1.4: Checking Undesired Values

While reviewing the table in Excel, I noticed that some entries in the 'review_content' column contain hyperlinks to web pages. This column is important for our recommendation system, but the embedded links are irrelevant.

Let's see how many rows they appear in.

In [6]:
# Finding all rows where the 'review_content' column contains the string 'https:'
tags_contendo_imagem = amazon_df[amazon_df['review_content'].str.contains('https:')]

# Display the found rows
tags_contendo_imagem.head()

Unnamed: 0,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers&Accessories|Accessories&Peripherals|...,₹399,"₹1,099",64%,4.2,24269,High Compatibility : Compatible With iPhone 12...,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...","Manav,Adarsh gupta,Sundeep,S.Sayeed Ahmed,jasp...","R3HXWT0LRP0NMF,R2AJM3LFTLZHFO,R6AQJGUP6P86,R1K...","Satisfied,Charging is really fast,Value for mo...",Looks durable Charging is fine tooNo complains...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Wayona-Braided-WN3LG1-Sy...
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories|Accessories&Peripherals|...,₹199,"₹1,899",90%,3.9,7928,【 Fast Charger& Data Sync】-With built-in safet...,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...","Kunal,Himanshu,viswanath,sai niharka,saqib mal...","R3J3EQQ9TZI5ZJ,R3E7WBGK7ID0KV,RWU79XKQ6I1QF,R2...","Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Sounce-iPhone-Charging-C...
5,B08Y1TFSP6,pTron Solero TB301 3A Type-C Data and Fast Cha...,Computers&Accessories|Accessories&Peripherals|...,₹149,"₹1,000",85%,3.9,24871,Fast Charging & Data Sync: Solero TB301 Type-C...,"AEQ2YMXSZWEOHK2EHTNLOS56YTZQ,AGRVINWECNY7323CW...","Jayesh,Rajesh k.,Soopy,amazon customer,Aman,Sh...","R7S8ANNSDPR40,R3CLZFLHVJU26P,RFF7U7MPQFUGR,R1M...","It's pretty good,Average quality,very good and...","It's a good product.,Like,Very good item stron...",https://m.media-amazon.com/images/I/31wOPjcSxl...,https://www.amazon.in/Solero-TB301-Charging-48...
15,B083342NKJ,MI Braided USB Type-C Cable for Charging Adapt...,Computers&Accessories|Accessories&Peripherals|...,₹349,₹399,13%,4.4,18757,1M Long Cable. Usb 2.0 (Type A)|Toughened Join...,"AGSGSRTEZBQY64WO2HKQTV7TWFSA,AEYD5HVYAJ23CR6PT...","Birendra ku Dash,Aditya Gupta,Abdulla A N,Deep...","R2JPQNKCOE10UK,RQI80JG2WZXNF,R2LYZ4CUWPMUJN,R1...","Good product,using this product 8months It is ...","I like it 👍👍,Best charging power . I used this...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Mi-Braided-USB-Type-C-Ca...
16,B0B6F7LX4C,MI 80 cm (32 inches) 5A Series HD Ready Smart ...,"Electronics|HomeTheater,TV&Video|Televisions|S...","₹13,999","₹24,999",44%,4.2,32840,"Note : The brands, Mi and Xiaomi, are part of ...","AHEVOQADJSSRX7DS325HSFLMP7VQ,AG7XYZRCSKX6G2OLO...","Manoj maddheshiya,Manoj Kumar Sahoo,Saumil s.,...","R13UTIA6KOF6QV,R2UGDZSGFF01K7,RHHIZ45VYU5X6,R1...",It is the best tv if you are getting it in 10-...,Pros- xiomi 5a is best in budget-Nice picture ...,https://m.media-amazon.com/images/I/51fmHk3km+...,https://www.amazon.in/MI-inches-Ready-Android-...


In [7]:
tags_contendo_imagem.shape

(233, 16)

233 rows in our dataframe contain links to web addresses that are not relevant for creating our tags. We will handle them in the next phase of our project.

There's a row in the 'rating' column containing the value '|' instead of a number. Let's check how often this occurs.

In [8]:
unique_ratings = amazon_df['rating'].unique()
unique_ratings

array(['4.2', '4.0', '3.9', '4.1', '4.3', '4.4', '4.5', '3.7', '3.3',
       '3.6', '3.4', '3.8', '3.5', '4.6', '3.2', '5.0', '4.7', '3.0',
       '2.8', '4', '3.1', '4.8', '2.3', '|', '2', '3', '2.6', '2.9'],
      dtype=object)

In [9]:
# Counting the number of occurrences of '|'
count_pipe = (amazon_df['rating'] == '|').sum()

print(f'The character "|" appears {count_pipe} times in the column "rating".')

The character "|" appears 1 times in the column "rating".
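
A more general way to spot non-numeric entries, beyond checking for '|' specifically, is to coerce the column to numbers and see which values fail to parse. The sketch below uses a small hand-made Series rather than the real dataset, and the variable names are illustrative only.

```python
import pandas as pd

# Toy stand-in for amazon_df['rating'] (hypothetical values)
ratings = pd.Series(['4.2', '4.0', '|', '3', '2.9'])

# Values that cannot be parsed as numbers become NaN
parsed = pd.to_numeric(ratings, errors='coerce')

# Keep the original strings that failed to parse
non_numeric = ratings[parsed.isna()]
print(non_numeric.tolist())  # ['|']
```

Applied to the real column, this kind of check would surface any unexpected value, not only the pipe character.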


Let's check if the Indian rupee symbol is present in all values of the 'actual_price' column. These values will need to be addressed later.

- Indian rupee is the currency of India.

In [10]:
# Using .loc to avoid SettingWithCopyWarning
# Creating the column 'first_letter'
amazon_df.loc[:, 'first_letter'] = amazon_df['actual_price'].str[0]

# Display the unique first letters present in the column
unique_first_letters = amazon_df['first_letter'].unique()

# Check if there is more than one first letter
if len(unique_first_letters) == 1:
    print(f'The rupee sign appears in all values: {unique_first_letters[0]}')
else:
    print('The rupee sign does not appear in all values.')

The rupee sign appears in all values: ₹


In [11]:
# Dropping the first_letter column used to check the rupee sign
amazon_df = amazon_df.drop(columns=['first_letter'])
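
The same check can be written without a helper column. This is a minimal sketch on toy data, assuming the prices are stored as strings as they are in the dataset; the values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for amazon_df['actual_price'] (hypothetical values)
prices = pd.Series(['₹1,099', '₹349', '₹1,899'])

# True only if every value starts with the rupee sign
all_rupee = prices.str.startswith('₹').all()
print(all_rupee)  # True
```

Because nothing is added to the DataFrame, there is no temporary column to drop afterwards.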

# Chapter 2: Data Processing

This section involves formatting and consolidating relevant data columns to create a unified representation of each product's attributes. 

The processed data will serve as the foundation for building the recommendation system.

### 2.1: Separating the Important Columns

Here we keep only the columns most relevant to our project.

In [12]:
# Keeping important columns for recommendation
amazon_df = amazon_df[['product_id', 'product_name', 'category', 'actual_price', 'rating', 'rating_count', 'about_product', 'review_title', 'review_content']]
amazon_df.head()

Unnamed: 0,product_id,product_name,category,actual_price,rating,rating_count,about_product,review_title,review_content
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers&Accessories|Accessories&Peripherals|...,"₹1,099",4.2,24269,High Compatibility : Compatible With iPhone 12...,"Satisfied,Charging is really fast,Value for mo...",Looks durable Charging is fine tooNo complains...
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,Computers&Accessories|Accessories&Peripherals|...,₹349,4.0,43994,"Compatible with all Type C enabled devices, be...","A Good Braided Cable for Your Type C Device,Go...",I ordered this cable to connect my phone to An...
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories|Accessories&Peripherals|...,"₹1,899",3.9,7928,【 Fast Charger& Data Sync】-With built-in safet...,"Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a..."
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,Computers&Accessories|Accessories&Peripherals|...,₹699,4.2,94363,The boAt Deuce USB 300 2 in 1 cable is compati...,"Good product,Good one,Nice,Really nice product...","Good product,long wire,Charges good,Nice,I bou..."
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,Computers&Accessories|Accessories&Peripherals|...,₹399,4.2,16905,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"As good as original,Decent,Good one for second...","Bought this instead of original apple, does th..."


### 2.2: Handling Null Values

We've decided to drop the rows with null values, since there are only two of them.

In [13]:
amazon_df = amazon_df.dropna(subset=['rating_count'])
# Counting null values in the 'rating_count' column.
valores_nulos = amazon_df['rating_count'].isnull().sum()

print(f'The number of null values in the column "rating_count" is: {valores_nulos}')

The number of null values in the column "rating_count" is: 0


### 2.3: Handling Undesired Values

We'll drop the row with '|' from the 'rating' column.

In [14]:
amazon_df = amazon_df[amazon_df['rating'] != '|']

Let's replace '|' with commas in the 'category' column.

In [15]:
# regex=False treats '|' as a literal character rather than a regex operator
amazon_df['category'] = amazon_df['category'].str.replace('|', ',', regex=False)

amazon_df.head()

Unnamed: 0,product_id,product_name,category,actual_price,rating,rating_count,about_product,review_title,review_content
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,"Computers&Accessories,Accessories&Peripherals,...","₹1,099",4.2,24269,High Compatibility : Compatible With iPhone 12...,"Satisfied,Charging is really fast,Value for mo...",Looks durable Charging is fine tooNo complains...
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,"Computers&Accessories,Accessories&Peripherals,...",₹349,4.0,43994,"Compatible with all Type C enabled devices, be...","A Good Braided Cable for Your Type C Device,Go...",I ordered this cable to connect my phone to An...
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,"Computers&Accessories,Accessories&Peripherals,...","₹1,899",3.9,7928,【 Fast Charger& Data Sync】-With built-in safet...,"Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a..."
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,"Computers&Accessories,Accessories&Peripherals,...",₹699,4.2,94363,The boAt Deuce USB 300 2 in 1 cable is compati...,"Good product,Good one,Nice,Really nice product...","Good product,long wire,Charges good,Nice,I bou..."
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,"Computers&Accessories,Accessories&Peripherals,...",₹399,4.2,16905,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"As good as original,Decent,Good one for second...","Bought this instead of original apple, does th..."


### 2.4: Handling Columns Types

Both 'rating' and 'rating_count' are kept as strings, because they will be concatenated into the tags in section 2.5. The only change needed here is removing the thousands-separator commas from the 'rating_count' column.

In [16]:
# 'rating' is left unchanged (it stays a string for tag generation)

# Removing commas from 'rating_count' (it also stays a string)
amazon_df['rating_count'] = amazon_df['rating_count'].replace(',', '', regex=True)
amazon_df.head()

Unnamed: 0,product_id,product_name,category,actual_price,rating,rating_count,about_product,review_title,review_content
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,"Computers&Accessories,Accessories&Peripherals,...","₹1,099",4.2,24269,High Compatibility : Compatible With iPhone 12...,"Satisfied,Charging is really fast,Value for mo...",Looks durable Charging is fine tooNo complains...
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,"Computers&Accessories,Accessories&Peripherals,...",₹349,4.0,43994,"Compatible with all Type C enabled devices, be...","A Good Braided Cable for Your Type C Device,Go...",I ordered this cable to connect my phone to An...
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,"Computers&Accessories,Accessories&Peripherals,...","₹1,899",3.9,7928,【 Fast Charger& Data Sync】-With built-in safet...,"Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a..."
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,"Computers&Accessories,Accessories&Peripherals,...",₹699,4.2,94363,The boAt Deuce USB 300 2 in 1 cable is compati...,"Good product,Good one,Nice,Really nice product...","Good product,long wire,Charges good,Nice,I bou..."
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,"Computers&Accessories,Accessories&Peripherals,...",₹399,4.2,16905,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"As good as original,Decent,Good one for second...","Bought this instead of original apple, does th..."


In [17]:
amazon_df.dtypes

product_id        object
product_name      object
category          object
actual_price      object
rating            object
rating_count      object
about_product     object
review_title      object
review_content    object
dtype: object
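
As the output above shows, both columns are still of type object (strings), which is what the tag concatenation in the next step needs. If numeric versions were needed as well (for example, to sort products by rating), one option would be to store them in separate columns. The sketch below uses toy data, and the `_num` column names are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for the two columns (hypothetical values)
toy_df = pd.DataFrame({
    'rating': ['4.2', '4', '3.9'],
    'rating_count': ['24,269', '43,994', '7,928'],
})

# Numeric copies; the original string columns are left untouched
toy_df['rating_num'] = pd.to_numeric(toy_df['rating'])
toy_df['rating_count_num'] = pd.to_numeric(
    toy_df['rating_count'].str.replace(',', '', regex=False)
)
print(toy_df.dtypes)
```

Keeping the string columns alongside the numeric ones means the string concatenation used for the tags would still work unchanged.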

### 2.5: Tags Generation

In this section, we generate tags for each product by combining relevant information such as product_name, category, actual_price, rating, rating_count, about_product, review_title, and review_content. 

This process creates a comprehensive representation of each product's attributes, which will be used in the recommendation system.

In [18]:
# Concatenate all
amazon_df['tags'] = amazon_df['product_name'] + " " + amazon_df['category'] + " "  + amazon_df['actual_price'] + " "  + amazon_df['rating'] + " " + amazon_df['rating_count'] + " " + amazon_df['about_product']+ " " + amazon_df['review_title']+ " " + amazon_df['review_content']

amazon_df.head()

Unnamed: 0,product_id,product_name,category,actual_price,rating,rating_count,about_product,review_title,review_content,tags
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,"Computers&Accessories,Accessories&Peripherals,...","₹1,099",4.2,24269,High Compatibility : Compatible With iPhone 12...,"Satisfied,Charging is really fast,Value for mo...",Looks durable Charging is fine tooNo complains...,Wayona Nylon Braided USB to Lightning Fast Cha...
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,"Computers&Accessories,Accessories&Peripherals,...",₹349,4.0,43994,"Compatible with all Type C enabled devices, be...","A Good Braided Cable for Your Type C Device,Go...",I ordered this cable to connect my phone to An...,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,"Computers&Accessories,Accessories&Peripherals,...","₹1,899",3.9,7928,【 Fast Charger& Data Sync】-With built-in safet...,"Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a...",Sounce Fast Phone Charging Cable & Data Sync U...
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,"Computers&Accessories,Accessories&Peripherals,...",₹699,4.2,94363,The boAt Deuce USB 300 2 in 1 cable is compati...,"Good product,Good one,Nice,Really nice product...","Good product,long wire,Charges good,Nice,I bou...",boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,"Computers&Accessories,Accessories&Peripherals,...",₹399,4.2,16905,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"As good as original,Decent,Good one for second...","Bought this instead of original apple, does th...",Portronics Konnect L 1.2M Fast Charging 3A 8 P...


In [19]:
# Check and handle null values
amazon_df['tags'] = amazon_df['tags'].fillna('')  # Replace NaN with an empty string

# Now, replace commas with spaces
amazon_df['tags'] = amazon_df['tags'].str.replace(',', ' ')

amazon_df['tags'][0]

"Wayona Nylon Braided USB to Lightning Fast Charging and Data Sync Cable Compatible for iPhone 13  12 11  X  8  7  6  5  iPad Air  Pro  Mini (3 FT Pack of 1  Grey) Computers&Accessories Accessories&Peripherals Cables&Accessories Cables USBCables ₹1 099 4.2 24269 High Compatibility : Compatible With iPhone 12  11  X/XsMax/Xr  iPhone 8/8 Plus iPhone 7/7 Plus iPhone 6s/6s Plus iPhone 6/6 Plus iPhone 5/5s/5c/se iPad Pro iPad Air 1/2 iPad mini 1/2/3 iPod nano7 iPod touch and more apple devices.|Fast Charge&Data Sync : It can charge and sync simultaneously at a rapid speed  Compatible with any charging adaptor  multi-port charging station or power bank.|Durability : Durable nylon braided design with premium aluminum housing and toughened nylon fiber wound tightly around the cord lending it superior durability and adding a bit to its flexibility.|High Security Level : It is designed to fully protect your device from damaging excessive current.Copper core thick+Multilayer shielding  Anti-inter

### 2.6: Handling 'http' Tags

In this part of the project, we count how many tags contain 'http' and then clean the tags by removing every word that starts with 'http'.

These hyperlinks come from the review text and are irrelevant to the recommendation.

In [20]:
# Initialize a variable to count the number of tags containing 'http'
http_tags_count = 0

# Iterate over all rows in the 'tags' column
for tag in amazon_df['tags']:
    # Check if 'http' is present in the tag
    if 'http' in tag:
        # Increment the count
        http_tags_count += 1

# Show the number of tags containing 'http'
print("Number of tags containing 'http':", http_tags_count)

Number of tags containing 'http': 233


In [21]:
# Split each string in the 'tags' column into words
tags_split = amazon_df['tags'].str.split()

# Remove every word that starts with 'http' from each list of words
tags_split = tags_split.apply(lambda x: [word for word in x if not word.startswith('http')])

# Join the words back into a single string
amazon_df['tags'] = tags_split.str.join(' ')

# Check the change in the first row
print("Tags in the first row after removing words starting with 'http':")
print(amazon_df['tags'][0])

Tags in the first row after removing words starting with 'http':
Wayona Nylon Braided USB to Lightning Fast Charging and Data Sync Cable Compatible for iPhone 13 12 11 X 8 7 6 5 iPad Air Pro Mini (3 FT Pack of 1 Grey) Computers&Accessories Accessories&Peripherals Cables&Accessories Cables USBCables ₹1 099 4.2 24269 High Compatibility : Compatible With iPhone 12 11 X/XsMax/Xr iPhone 8/8 Plus iPhone 7/7 Plus iPhone 6s/6s Plus iPhone 6/6 Plus iPhone 5/5s/5c/se iPad Pro iPad Air 1/2 iPad mini 1/2/3 iPod nano7 iPod touch and more apple devices.|Fast Charge&Data Sync : It can charge and sync simultaneously at a rapid speed Compatible with any charging adaptor multi-port charging station or power bank.|Durability : Durable nylon braided design with premium aluminum housing and toughened nylon fiber wound tightly around the cord lending it superior durability and adding a bit to its flexibility.|High Security Level : It is designed to fully protect your device from damaging excessive current.Copper 
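
A vectorized alternative to the split, filter, and join approach is a regular expression. The sketch below, on toy strings with a placeholder URL, removes any run of non-whitespace characters that begins with 'http'. Unlike the `startswith` filter, it would also strip a link glued to the end of another word, so the two approaches are not strictly equivalent.

```python
import pandas as pd

# Toy stand-in for amazon_df['tags'] (hypothetical values, placeholder URL)
tags = pd.Series([
    'Not quite durable https://example.com/img.jpg and sturdy',
    'Good product long wire',
])

cleaned = (
    tags.str.replace(r'http\S+', '', regex=True)  # drop the link text
        .str.split()                              # collapse leftover whitespace
        .str.join(' ')
)
print(cleaned.tolist())
# ['Not quite durable and sturdy', 'Good product long wire']
```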

In [22]:
# Dropping the extra columns
amazon_df = amazon_df[['product_id','product_name', 'tags']]
amazon_df.head()

Unnamed: 0,product_id,product_name,tags
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,Wayona Nylon Braided USB to Lightning Fast Cha...
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Sounce Fast Phone Charging Cable & Data Sync U...
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,Portronics Konnect L 1.2M Fast Charging 3A 8 P...


# Chapter 3: Building Recommendation Model

### 3.1: Stop Words

We'll download and load the English stop words list using the Natural Language Toolkit (NLTK). The stop words are then used to create a function called 'remover_stop_words' that removes stop words from the 'tags' column in the DataFrame.

 This step helps in preprocessing the text data by removing common words that do not contribute significantly to the meaning of the text.

In [23]:
amazon_df['tags'][0]

"Wayona Nylon Braided USB to Lightning Fast Charging and Data Sync Cable Compatible for iPhone 13 12 11 X 8 7 6 5 iPad Air Pro Mini (3 FT Pack of 1 Grey) Computers&Accessories Accessories&Peripherals Cables&Accessories Cables USBCables ₹1 099 4.2 24269 High Compatibility : Compatible With iPhone 12 11 X/XsMax/Xr iPhone 8/8 Plus iPhone 7/7 Plus iPhone 6s/6s Plus iPhone 6/6 Plus iPhone 5/5s/5c/se iPad Pro iPad Air 1/2 iPad mini 1/2/3 iPod nano7 iPod touch and more apple devices.|Fast Charge&Data Sync : It can charge and sync simultaneously at a rapid speed Compatible with any charging adaptor multi-port charging station or power bank.|Durability : Durable nylon braided design with premium aluminum housing and toughened nylon fiber wound tightly around the cord lending it superior durability and adding a bit to its flexibility.|High Security Level : It is designed to fully protect your device from damaging excessive current.Copper core thick+Multilayer shielding Anti-interference Protecti

In [24]:
import nltk
from nltk.corpus import stopwords

# Downloading the English stop words list
nltk.download('stopwords')

# Loading the English stop words
stop_words = set(stopwords.words('english'))

# 'amazon_df' is the DataFrame that holds the 'tags' column

# Function to remove stop words
def remover_stop_words(tags):
    # Check whether 'tags' is a string
    if isinstance(tags, str):
        # Remove stop words (case-insensitive comparison)
        return ' '.join([word for word in tags.split() if word.lower() not in stop_words])
    else:
        return tags  # Return the original value if it is not a string

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\yamas\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [25]:
# Applying the function to the 'tags' column using a lambda expression
amazon_df['tags'] = amazon_df['tags'].apply(lambda x: remover_stop_words(x))
amazon_df['tags'][0]

'Wayona Nylon Braided USB Lightning Fast Charging Data Sync Cable Compatible iPhone 13 12 11 X 8 7 6 5 iPad Air Pro Mini (3 FT Pack 1 Grey) Computers&Accessories Accessories&Peripherals Cables&Accessories Cables USBCables ₹1 099 4.2 24269 High Compatibility : Compatible iPhone 12 11 X/XsMax/Xr iPhone 8/8 Plus iPhone 7/7 Plus iPhone 6s/6s Plus iPhone 6/6 Plus iPhone 5/5s/5c/se iPad Pro iPad Air 1/2 iPad mini 1/2/3 iPod nano7 iPod touch apple devices.|Fast Charge&Data Sync : charge sync simultaneously rapid speed Compatible charging adaptor multi-port charging station power bank.|Durability : Durable nylon braided design premium aluminum housing toughened nylon fiber wound tightly around cord lending superior durability adding bit flexibility.|High Security Level : designed fully protect device damaging excessive current.Copper core thick+Multilayer shielding Anti-interference Protective circuit equipment.|WARRANTY: 12 months warranty friendly customer services ensures long-time enjoymen

### 3.2: Stemming

We are going to perform the stemming process on our tags.

- Stemming is the process of reducing words to their root or base form, even if the result is not a valid word. This helps to group together words with similar meanings. For example, the Porter stemmer used below reduces both "running" and "runs" to "run", and "charging" to "charg".
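
To make the idea concrete, here is a small sketch that runs NLTK's `PorterStemmer` on a handful of words. The word list is chosen for illustration; note that the stems are not always dictionary words.

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()

# A few illustrative words (not taken from a specific row of the dataset)
words = ['running', 'runs', 'charging', 'cables']
stemmed = [ps.stem(w) for w in words]
print(stemmed)  # ['run', 'run', 'charg', 'cabl']
```

Different surface forms collapse to the same token, which is what lets the vectorizer later count them as one feature.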

In [26]:
import nltk
from nltk.stem import PorterStemmer

ps = PorterStemmer()

def stems(text):
    T = []
    
    for i in text.split():
        T.append(ps.stem(i))
    
    return " ".join(T)

In [27]:
# Apply the function to non-null values
amazon_df['tags'] = amazon_df['tags'].apply(lambda x: stems(x) if isinstance(x, str) else x)
amazon_df['tags'][0]


'wayona nylon braid usb lightn fast charg data sync cabl compat iphon 13 12 11 x 8 7 6 5 ipad air pro mini (3 ft pack 1 grey) computers&accessori accessories&peripher cables&accessori cabl usbcabl ₹1 099 4.2 24269 high compat : compat iphon 12 11 x/xsmax/xr iphon 8/8 plu iphon 7/7 plu iphon 6s/6 plu iphon 6/6 plu iphon 5/5s/5c/se ipad pro ipad air 1/2 ipad mini 1/2/3 ipod nano7 ipod touch appl devices.|fast charge&data sync : charg sync simultan rapid speed compat charg adaptor multi-port charg station power bank.|dur : durabl nylon braid design premium aluminum hous toughen nylon fiber wound tightli around cord lend superior durabl ad bit flexibility.|high secur level : design fulli protect devic damag excess current.copp core thick+multilay shield anti-interfer protect circuit equipment.|warranty: 12 month warranti friendli custom servic ensur long-tim enjoy purchase. meet question problem pleas hesit contact us. satisfi charg realli fast valu money product review good qualiti good p

In [28]:
# Count the number of words in the tags of the first row (index 0)
numero_de_palavras = len(amazon_df['tags'][0].split())

# Display the result
numero_de_palavras

203

### 3.3: Count Vectorizer

We're about to apply CountVectorizer to our tags, converting them into numerical representations for analysis.

- CountVectorizer is a method used for converting a collection of text documents into a matrix of token counts. It essentially converts text data into numerical data that can be used by machine learning algorithms.

In [29]:
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer(max_features=5000,stop_words='english')

vector = cv.fit_transform(amazon_df['tags']).toarray()
vector[0]

array([0, 0, 0, ..., 0, 0, 0], dtype=int64)

In [30]:
vector.shape

(1462, 5000)

In [31]:
len(cv.get_feature_names_out())

5000

### 3.4: Cosine Similarity

We're going to apply Cosine Similarity to our tags now, assessing the similarity between them for better analysis.

- Cosine similarity is a measure that quantifies the similarity between two numerical vectors, often used in natural language processing and recommendation tasks. It is the cosine of the angle between the two vectors and ranges from -1 (pointing in opposite directions) to 1 (pointing in the same direction), with 0 indicating that the vectors are orthogonal (unrelated). In text processing contexts, vectors typically represent word frequencies in documents; because word counts are never negative, the similarity between two count vectors always falls between 0 and 1. Cosine similarity is useful for finding similar documents or calculating the proximity between words based on their occurrence context.
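
The definition above can be written as the dot product of two vectors divided by the product of their lengths. The sketch below computes it by hand with NumPy; the helper name `cosine` and the three count vectors are hypothetical and exist only for this illustration.

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between two 1-D vectors (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical word-count vectors over a 4-word vocabulary.
doc_a = [2, 1, 0, 0]
doc_b = [4, 2, 0, 0]  # same word proportions as doc_a, only longer
doc_c = [0, 0, 3, 1]  # shares no words with doc_a

print(cosine(doc_a, doc_b))  # close to 1: same direction
print(cosine(doc_a, doc_c))  # 0: no words in common
```

Because the measure depends on direction rather than length, a long description and a short one with the same word proportions still score as near-identical.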

In [32]:
from sklearn.metrics.pairwise import cosine_similarity
similarity = cosine_similarity(vector)

In [33]:
similarity.shape

(1462, 1462)

### 3.5: Creating a Product Recommendation System

We'll define a function called 'recommend' that takes a product ID as input. It finds the row of that product in the DataFrame, looks up its precomputed cosine similarities to all products, sorts them in descending order, and prints the 10 most similar products, skipping the first entry, which is the searched product itself.

In [34]:
def recommend(product_id):
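    # Use the row position (not the index label) of the first match,
    # so it lines up with the rows of the similarity matrix
    matches = np.flatnonzero(amazon_df['product_id'].to_numpy() == product_id)
    if len(matches) == 0:
        print(f"Product ID {product_id!r} not found.")
        return
    index = matches[0]
    print("Searched Item:")
    print(amazon_df.iloc[index].product_name)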
    
    distances = sorted(list(enumerate(similarity[index])), reverse=True, key=lambda x: x[1])
    print("\nRecommendations:")
    for i in distances[1:11]:
        print(amazon_df.iloc[i[0]].product_name)
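
The ranking step can also be sketched on its own, separate from the printing. The example below is one possible variant, not the function used in this notebook: the helper name `top_n_similar` and the 4x4 matrix are hypothetical, and it returns row positions instead of printing product names.

```python
import numpy as np

def top_n_similar(similarity, index, n=10):
    """Return positions of the n rows most similar to `index`, excluding itself."""
    scores = similarity[index]
    # Sort positions by descending score; a stable sort keeps ties in row order.
    order = np.argsort(-scores, kind="stable")
    # Drop the query row explicitly instead of assuming it sorts first.
    order = order[order != index]
    return order[:n].tolist()

# Hypothetical 4x4 similarity matrix (symmetric, ones on the diagonal).
toy_similarity = np.array([
    [1.0, 0.8, 0.1, 0.5],
    [0.8, 1.0, 0.2, 0.4],
    [0.1, 0.2, 1.0, 0.3],
    [0.5, 0.4, 0.3, 1.0],
])

print(top_n_similar(toy_similarity, index=0, n=2))
```

Excluding the query row by position is slightly more defensive than slicing off the first sorted entry, since another row with a similarity of 1.0 (for example, a duplicate listing) could otherwise take the first slot.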


# Chapter 4: Tests and Conclusions

In this final stage, we will test our recommendation system with some product IDs to see what it suggests to us.

In [35]:
recommend('B0BBLHTRM9')

Searched Item:
KNOWZA Electric Handheld Milk Wand Mixer Frother for Latte Coffee Hot Milk, Milk Frother for Coffee, Egg Beater, Hand Blender, Coffee Beater (BLACK COFFEE BEATER)

Recommendations:
WIDEWINGS Electric Handheld Milk Wand Mixer Frother for Latte Coffee Hot Milk, Milk Frother for Coffee, Egg Beater, Hand Blender, Coffee Beater with Stand
Zuvexa USB Rechargeable Electric Foam Maker - Handheld Milk Wand Mixer Frother for Hot Milk, Hand Blender Coffee, Egg Beater (Black)
InstaCuppa Milk Frother for Coffee - Handheld Battery-Operated Electric Milk and Coffee Frother, Stainless Steel Whisk and Stand, Portable Foam Maker for Coffee, Cappuccino, Lattes, and Egg Beaters
Oratech Coffee Frother electric, milk frother electric, coffee beater, cappuccino maker, Coffee Foamer, Mocktail Mixer, Coffee Foam Maker, coffee whisker electric, Froth Maker, coffee stirrers electric, coffee frothers, Coffee Blender, (6 Month Warranty) (Multicolour)
PRO365 Indo Mocktails/Coffee Foamer/Cappuccino/Le

In [36]:
recommend('B0B3XXSB1K')

Searched Item:
Tata Sky Digital TV HD Setup Box Remote

Recommendations:
Crypo™ Universal Remote Compatible with Tata Sky Universal HD & SD Set top Box (Also Works with All TV)
7SEVEN® Compatible Tata Sky Remote Control Replacement of Original dth SD HD tata Play Set top Box Remote - IR Learning Universal Remote for Any Brand TV - Pairing Must
Tata Sky Universal Remote Compatible for SD/HD
LOHAYA Television Remote Compatible for VU LED LCD HD Tv Remote Control Model No :- EN2B27V
TATA SKY HD Connection with 1 month basic package and free installation
Airtel DigitalTV DTH Remote SD/HD/HD Recording Compatible for Television (Shining Black )
SVM Products Unbreakable Set Top Box Stand with Dual Remote Holder (Black)
Firestick Remote
VW 80 cm (32 inches) Playwall Frameless Series HD Ready Android Smart LED TV VW3251 (Black)
Technotech High Speed HDMI Cable 5 Meter V1.4 - Supports Full HD 1080p (Color May Vary)


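In both tests, the top recommendations are closely related to the searched item: other milk frothers for the frother, and Tata Sky remotes and TV accessories for the set-top box remote. Based on these checks, our recommendation system appears to be fulfilling its purpose.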