# Part 2: Product matching

Problem statement:
Using ML/DL techniques, match similar products from the Flipkart dataset with those in the Amazon dataset. Once
similar products are matched, display the retail prices from Flipkart (FK) and Amazon (AMZ) side by side. Explore as
many techniques as possible before choosing the final one.
The final result may be displayed either as a single table or through a simple form that takes a
product name as input and displays that product's prices from both websites.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [84]:
amazon = pd.read_csv('/content/amz_com-ecommerce_sample.csv',encoding='unicode_escape')
flipkart = pd.read_csv('/content/flipkart_com-ecommerce_sample.csv',encoding='unicode_escape')

In [3]:
amazon.head()

Unnamed: 0,uniq_id,crawl_timestamp,product_url,product_name,product_category_tree,pid,retail_price,discounted_price,image,is_FK_Advantage_product,description,product_rating,overall_rating,brand,product_specifications
0,c2d766ca982eca8304150849735ffef9,2016-03-25 22:59:23 +0000,http://www.flipkart.com/alisha-solid-women-s-c...,Alisha Solid Women's Cycling Shorts,"[""Clothing >> Women's Clothing >> Lingerie, Sl...",SRTEH2FF9KEDEFGF,982,438,"[""http://img5a.flixcart.com/image/short/u/4/a/...",False,Key Features of Alisha Solid Women's Cycling S...,No rating available,No rating available,Alisha,"{""product_specification""=>[{""key""=>""Number of ..."
1,7f7036a6d550aaa89d34c77bd39a5e48,2016-03-25 22:59:23 +0000,http://www.flipkart.com/fabhomedecor-fabric-do...,FabHomeDecor Fabric Double Sofa Bed,"[""Furniture >> Living Room Furniture >> Sofa B...",SBEEH3QGU7MFYJFY,32143,29121,"[""http://img6a.flixcart.com/image/sofa-bed/j/f...",False,FabHomeDecor Fabric Double Sofa Bed (Finish Co...,No rating available,No rating available,FabHomeDecor,"{""product_specification""=>[{""key""=>""Installati..."
2,f449ec65dcbc041b6ae5e6a32717d01b,2016-03-25 22:59:23 +0000,http://www.flipkart.com/aw-bellies/p/itmeh4grg...,AW Bellies,"[""Footwear >> Women's Footwear >> Ballerinas >...",SHOEH4GRSUBJGZXE,991,551,"[""http://img5a.flixcart.com/image/shoe/7/z/z/r...",False,Key Features of AW Bellies Sandals Wedges Heel...,No rating available,No rating available,AW,"{""product_specification""=>[{""key""=>""Ideal For""..."
3,0973b37acd0c664e3de26e97e5571454,2016-03-25 22:59:23 +0000,http://www.flipkart.com/alisha-solid-women-s-c...,Alisha Solid Women's Cycling Shorts,"[""Clothing >> Women's Clothing >> Lingerie, Sl...",SRTEH2F6HUZMQ6SJ,694,325,"[""http://img5a.flixcart.com/image/short/6/2/h/...",False,Key Features of Alisha Solid Women's Cycling S...,No rating available,No rating available,Alisha,"{""product_specification""=>[{""key""=>""Number of ..."
4,bc940ea42ee6bef5ac7cea3fb5cfbee7,2016-03-25 22:59:23 +0000,http://www.flipkart.com/sicons-all-purpose-arn...,Sicons All Purpose Arnica Dog Shampoo,"[""Pet Supplies >> Grooming >> Skin & Coat Care...",PSOEH3ZYDMSYARJ5,208,258,"[""http://img5a.flixcart.com/image/pet-shampoo/...",False,Specifications of Sicons All Purpose Arnica Do...,No rating available,No rating available,Sicons,"{""product_specification""=>[{""key""=>""Pet Type"",..."


## Dropping Columns which are not required.

In [85]:
# Keep only product_name, retail_price and discounted_price.
cols_to_drop = ['uniq_id', 'pid', 'crawl_timestamp', 'product_url', 'product_category_tree', 'image',
                'is_FK_Advantage_product', 'description', 'product_rating', 'overall_rating', 'brand',
                'product_specifications']
amazon.drop(columns=cols_to_drop, inplace=True)
flipkart.drop(columns=cols_to_drop, inplace=True)

In [70]:
flipkart.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20000 entries, 0 to 19999
Data columns (total 3 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   product_name      20000 non-null  object 
 1   retail_price      19922 non-null  float64
 2   discounted_price  19922 non-null  float64
dtypes: float64(2), object(1)
memory usage: 468.9+ KB


In [71]:
amazon.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20000 entries, 0 to 19999
Data columns (total 3 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   product_name      20000 non-null  object
 1   retail_price      20000 non-null  int64 
 2   discounted_price  20000 non-null  int64 
dtypes: int64(2), object(1)
memory usage: 468.9+ KB


## Removing null values from Flipkart Dataset.

In [86]:
print('The number of null values in the Flipkart Dataset is:',flipkart.isnull().sum().sum())

The number of null values in the Flipkart Dataset is: 156
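
Note that `isnull().sum().sum()` counts missing cells, not rows: the `info()` output above shows 78 missing values in each of the two price columns, which adds up to 156. A minimal sketch on a toy frame (all values are made up for illustration) of counting missing cells, missing values per column, and the rows that `dropna(how='any')` would remove:

```python
import numpy as np
import pandas as pd

# Toy frame; the values are made up purely for illustration.
toy = pd.DataFrame({
    'product_name': ['a', 'b', 'c'],
    'retail_price': [100.0, np.nan, 300.0],
    'discounted_price': [80.0, np.nan, np.nan],
})

missing_cells = int(toy.isnull().sum().sum())            # every NaN cell in the frame
missing_per_column = toy.isnull().sum()                  # NaN count for each column
rows_with_missing = int(toy.isnull().any(axis=1).sum())  # rows with at least one NaN

print(missing_per_column)
print(missing_cells, rows_with_missing)
```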


In [87]:
flipkart = flipkart.dropna(axis = 0, how ='any')

## Removing negative and zero prices from Amazon Dataset.

In [88]:
amazon[amazon['retail_price']<0]

Unnamed: 0,product_name,retail_price,discounted_price
12,Sicons All Purpose Tea Tree Dog Shampoo,-11,0
21,ALISHA SOLID WOMEN'S CYCLING ShorTS,-2,0
76,Eurospa Cotton Terry Face Towel Set,-15,0
812,Fundoo T Printed Men's Track Suit,-11,0
1318,Techware Microwavable Tea Cups WF13115 - Purpl...,-2,0
...,...,...,...
16762,MUCHMORE ALLOY COPPER CHARM BRACELET,-1,0
17634,FRICTION MEN'S VEST,-11,0
19543,KARISHMA WOMEN'S A-LINE DRESS,-17,0
19599,L'APPEL DU VIDE WOMEN'S SHIFT DRESS,-10,0


In [89]:
# Drop rows with a negative retail price or a zero discounted price.
amazon = amazon[(amazon['retail_price'] >= 0) & (amazon['discounted_price'] != 0)].copy()

In [91]:
# Lower-case the names so lookups are case-insensitive, and store Flipkart prices as integers.
amazon['product_name'] = amazon['product_name'].str.lower()
flipkart['product_name'] = flipkart['product_name'].str.lower()
flipkart['retail_price'] = flipkart['retail_price'].astype(int)
flipkart['discounted_price'] = flipkart['discounted_price'].astype(int)

In [92]:
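# The Series are aligned on the row index, so row i of Flipkart is paired with row i of Amazon;
# a row dropped from one dataset shows NaN in that dataset's columns.
new_df = pd.DataFrame({
    'Product name in Flipkart': flipkart['product_name'],
    'Retail Price in Flipkart': flipkart['retail_price'],
    'Discounted Price in Flipkart': flipkart['discounted_price'],
    'Product name in Amazon': amazon['product_name'],
    'Retail Price in Amazon': amazon['retail_price'],
    'Discounted Price in Amazon': amazon['discounted_price'],
})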
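
Building a DataFrame from several Series pairs them by row index, not by product name. A minimal sketch on toy data (names and prices are made up) showing that a row dropped from one side comes back as NaN and that the affected integer column becomes float, which would explain prices displayed as `698.0`:

```python
import pandas as pd

# Toy series with different indices, as happens after dropping different rows
# from each dataset. All values are made up for illustration.
fk_name = pd.Series(['shirt', 'towel', 'shampoo'], index=[0, 1, 2])
amz_price = pd.Series([10, 30], index=[0, 2])  # row 1 was dropped on this side

paired = pd.DataFrame({'fk_name': fk_name, 'amz_price': amz_price})

print(paired)
print(paired['amz_price'].dtype)
```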

## Product Comparison

In [101]:
p_name = input("Enter Product Name:")
p_lower = p_name.lower()
# Exact-match lookup. `in` on a Series tests the index, so filter on the values instead.
matches = new_df[new_df['Product name in Flipkart'] == p_lower]
if matches.empty:
  print('Product does not exist')

matches


Enter Product Name:FDT Women's Leggings


Unnamed: 0,Product name in Flipkart,Retail Price in Flipkart,Discounted Price in Flipkart,Product name in Amazon,Retail Price in Amazon,Discounted Price in Amazon
28,fdt women's leggings,699,309,fdt women's leggings pants,698.0,362.0
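
The lookup above requires an exact name, yet the sample output pairs "fdt women's leggings" with "fdt women's leggings pants". A minimal sketch of one possible similarity-based lookup, using `difflib.SequenceMatcher` from the standard library. The helper `best_match`, the `cutoff` value, and the toy name list are assumptions for illustration, not the notebook's method:

```python
from difflib import SequenceMatcher

def best_match(query, candidates, cutoff=0.6):
    """Return (name, score) of the most similar candidate, or None if below cutoff.

    `cutoff` is an arbitrary illustrative threshold, not a tuned value.
    """
    query = query.lower()
    best_name, best_score = None, 0.0
    for name in candidates:
        # ratio() is 2*M/T: M matched characters, T total characters in both strings.
        score = SequenceMatcher(None, query, name.lower()).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < cutoff:
        return None
    return best_name, best_score

# Toy stand-in for the cleaned Amazon product names (hypothetical list).
amazon_names = ["fdt women's leggings pants", "sicons all purpose arnica dog shampoo"]

result = best_match("FDT Women's Leggings", amazon_names)
print(result)
```

In the notebook, the candidates could be the values of `amazon['product_name']`, and the returned name could then be used to look up the Amazon prices. Character-level similarity is only one option; token-based measures such as TF-IDF with cosine similarity may be worth comparing.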
