# Script for extracting product page details

The script takes as input a .jsonl file containing a list of product details (e.g. the output of ScrapeProductListBySeller) and iterates through all the products in that list.

For each product in the list, it reads the ProductPageUrl and uses the Requests module to download the product page at that URL and scrape its details. The details are stored in a .jsonl file.

In [None]:
import urllib.request # For URL encoding
import requests # For making requests to download a webpage's content
from selectorlib import Extractor # For extracting specific fields from the downloaded webpage
import json 
import random
import re
from time import sleep
import os
import jsonlines
import pandas as pd
import datetime

#### Step 1: Read ProductList jsonl file and extract all product page urls

**NOTE:** Before running this, change the path variable 'products_file' to point to the ProductList .jsonl file. 

The following code loads a ProductList file, and reads the ProductPageUrl field for each product into a list. 

In [None]:
products_file = open('./../DATASET/ProductLists/SCRAPED_PRODUCT_LIST_CLOUDTAIL_TOP_BRANDS.jsonl', 'r')
#brands = open('./../DATASETS/ProductLists/SCRAPED_PRODUCT_LIST_CLOUDTAIL.jsonl', 'r')
#brands = open('./../DATASETS/ProductLists/SCRAPED_PRODUCT_LIST.jsonl', 'r')

Product_List = []

reader = jsonlines.Reader(products_file)
for item in reader.iter():
    Product_List.append(item)
    
df = pd.DataFrame(Product_List)
df.head()

Unnamed: 0,Title,Label,Rating,RatingCount,AmazonPrice,ProductPageUrl,SearchUrl,Brand,Timestamp,Seller
0,Fire TV Stick streaming media player with Alex...,,4.2,24637.0,3999.0,https://www.amazon.in/Amazon-FireTVStick-Alexa...,https://www.amazon.in/s?i=merchant-items&me=AT...,Amazon,Fri Jul 17 10:12:01 2020,Cloudtail India
1,Echo Dot (3rd Gen) – New and improved smart sp...,Best seller,4.3,26578.0,3499.0,https://www.amazon.in/All-new-Echo-Dot-3rd-Gen...,https://www.amazon.in/s?i=merchant-items&me=AT...,Amazon,Fri Jul 17 10:12:01 2020,Cloudtail India
2,Echo Dot (3rd Gen) – New and improved smart sp...,,4.3,26578.0,3499.0,https://www.amazon.in/All-new-Echo-Dot-3rd-Gen...,https://www.amazon.in/s?i=merchant-items&me=AT...,Amazon,Fri Jul 17 10:12:01 2020,Cloudtail India
3,All-New Alexa Voice Remote with Power and Volu...,,4.1,1188.0,1999.0,https://www.amazon.in/Amazon-FireTV-Stick-Alex...,https://www.amazon.in/s?i=merchant-items&me=AT...,Amazon,Fri Jul 17 10:12:01 2020,Cloudtail India
4,Echo Dot (3rd Gen) – New and improved smart sp...,,4.3,26578.0,3499.0,https://www.amazon.in/C78MP8/dp/B07PGL2ZSL/ref...,https://www.amazon.in/s?i=merchant-items&me=AT...,Amazon,Fri Jul 17 10:12:01 2020,Cloudtail India


In [None]:
urls = []
for p in Product_List:
    urls.append(p['ProductPageUrl'])
print('Example Url in the list:')
print('\n'.join(urls[:2]))
print('URLS Count: ', len(urls))

Example Url in the list:
https://www.amazon.in/Amazon-FireTVStick-Alexa-Voice-Remote-Streaming-Player/dp/B0791YHVMK/ref=sr_1_1?dchild=1&fst=as%3Aoff&m=AT95IG9ONZD7S&qid=1595168134&refinements=p_4%3AAmazon&s=merchant-items&sr=1-1
https://www.amazon.in/All-new-Echo-Dot-3rd-Gen/dp/B07PFFMP9P/ref=sr_1_2?dchild=1&fst=as%3Aoff&m=AT95IG9ONZD7S&qid=1595168134&refinements=p_4%3AAmazon&s=merchant-items&sr=1-2
URLS Count:  6198


#### Step 2: Define Headers

Each header set carries a different user agent and is sent with the request to the website being scraped. We keep several user agents so that, if a request is rejected, we can retry with a different one.

To create more headers, copy any one of the existing headers and replace its 'user-agent' string with a new 'user-agent' string, which can be found online (e.g. https://developer.chrome.com/multidevice/user-agent).

In [None]:
headers = [
           {
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-user': '?1',
        'sec-fetch-dest': 'document',
        'referer': 'https://www.amazon.com/',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
           },
           {
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.64 Safari/537.36',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-user': '?1',
        'sec-fetch-dest': 'document',
        'referer': 'https://www.amazon.com/',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
           },
           {
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-user': '?1',
        'sec-fetch-dest': 'document',
        'referer': 'https://www.amazon.com/',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
           },
           {
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-user': '?1',
        'sec-fetch-dest': 'document',
        'referer': 'https://www.amazon.com/',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
           },
           {
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-user': '?1',
        'sec-fetch-dest': 'document',
        'referer': 'https://www.amazon.com/',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
           },
           {
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-user': '?1',
        'sec-fetch-dest': 'document',
        'referer': 'https://www.amazon.com/',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
           },
           {
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 Edge/16.16299',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-user': '?1',
        'sec-fetch-dest': 'document',
        'referer': 'https://www.amazon.com/',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
           },
           {
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-user': '?1',
        'sec-fetch-dest': 'document',
        'referer': 'https://www.amazon.com/',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
           },
           {
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-user': '?1',
        'sec-fetch-dest': 'document',
        'referer': 'https://www.amazon.com/',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
           },
           {
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:70.0) Gecko/20100101 Firefox/70.0',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-user': '?1',
        'sec-fetch-dest': 'document',
        'referer': 'https://www.amazon.com/',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
           },
           {
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-user': '?1',
        'sec-fetch-dest': 'document',
        'referer': 'https://www.amazon.com/',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
           },
           {
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-user': '?1',
        'sec-fetch-dest': 'document',
        'referer': 'https://www.amazon.com/',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
           },
           {
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-user': '?1',
        'sec-fetch-dest': 'document',
        'referer': 'https://www.amazon.com/',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
           },
           {
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36 OPR/68.0.3618.165',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-user': '?1',
        'sec-fetch-dest': 'document',
        'referer': 'https://www.amazon.com/',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
           },
           {
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Trident/7.0; rv:11.0) like Gecko',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-user': '?1',
        'sec-fetch-dest': 'document',
        'referer': 'https://www.amazon.com/',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
           },
           {
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36 Edg/83.0.478.37',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-user': '?1',
        'sec-fetch-dest': 'document',
        'referer': 'https://www.amazon.com/',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
           }
]
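
The dictionaries above differ only in their 'user-agent' value. As a minimal sketch, the same kind of list could be generated from one shared template, so that adding a header means adding a single string. The names `BASE_HEADERS`, `USER_AGENTS`, `build_headers` and `example_headers` are hypothetical and not part of the original script; only two of the user agents are repeated here for brevity.

```python
# Fields shared by every header dictionary in the list above.
BASE_HEADERS = {
    'dnt': '1',
    'upgrade-insecure-requests': '1',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'referer': 'https://www.amazon.com/',
    'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
}

# Two user-agent strings taken from the list above; append more as needed.
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15',
]

def build_headers(user_agents, base=BASE_HEADERS):
    """Return one independent header dict per user-agent string (hypothetical helper)."""
    return [{**base, 'user-agent': ua} for ua in user_agents]

example_headers = build_headers(USER_AGENTS)
print(len(example_headers), 'header sets built')
```

Each dictionary is a fresh copy of the template, so editing one header set later does not affect the others.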

#### Step 3: Read Extractor Files

The extractor (.yml) files contain the *CSS selector* information for the fields which we intend to extract from the scraped webpage. Here, the extractor file is:

##### product_page.yml
From the scraped webpage, this extractor file extracts all the fields that are relevant to the product on the given page.

In [None]:
e = Extractor.from_yaml_file('./Extractor/product_page.yml')

#### Step 4: Define scrape function
**NOTE:** Set the variables MAX_TRIALS & ERROR_THRESHHOLD according to your preferences. 

A high MAX_TRIALS slows down the scraping, because pages that genuinely contain no data are also retried many times, but it reduces the chance of missing data. 
A low ERROR_THRESHHOLD also slows down the scraping, as the VPN will need to be changed more often. However, it reduces the chance of missing data due to errors. 

The function scrape(url) downloads the webpage at the given URL (here: a product page) using the requests module, and looks for the specific fields defined in the extractor file product_page.yml. If a Title for the product on the page is not found, it randomly selects a new header and retries until the limit MAX_TRIALS is reached, at which point it reports that the page does not contain any data.

These multiple trials are required because Amazon often blocks a user for repeatedly making requests with the same user agent. 

In [None]:
MAX_TRIALS = 50 # Set the max number of trials to perform here.
ERROR_COUNT = 1 # Running count of pages that returned no data; each time it reaches a multiple
                # of ERROR_THRESHHOLD, the user is asked to change the vpn
ERROR_THRESHHOLD = 5 # Number of pages with missed information allowed after which vpn change is required

def scrape(url):
    '''
    This function downloads the webpage at the given url using requests module
    and extracts the product fields defined in product_page.yml.
    
    Parameters:
    url (string): URL of webpage to scrape
    Returns: 
    dict: The extracted product fields if a product Title was found, else the string 'False'.
    '''
    global ERROR_COUNT  # Only ERROR_COUNT is reassigned here; the other settings are just read
    
    # Download the page using requests
    print("Downloading %s" % url)
    trial = 0
    while True:
        try:
            if ERROR_COUNT % ERROR_THRESHHOLD == 0:
                _ = input('Please Change VPN and press Enter to continue')
                ERROR_COUNT += 1
            if trial == MAX_TRIALS:
                print("Max trials exceeded yet no Data found on this page!")
                ERROR_COUNT += 1
                return 'False'
            trial = trial + 1
            print("Trial no:", trial)

            r = requests.get(url, headers=random.choice(headers), timeout=15)

            # We use product_page.yml extractor to extract the product details from the html data text
            data = e.extract(r.text)

            # If the product title is empty, it means that the returned page did not contain a product,
            # so we retry with a new user agent
            if data['Title'] is not None:
                return data
            print("Retrying with new user agent!")
        except requests.exceptions.RequestException as err:
            # RequestException is the base class of HTTPError, ConnectionError and Timeout,
            # so this single handler covers all of them
            print('Error Detected: ', err)
            print('Retrying after 30 seconds')
            sleep(30)
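
The retry idea described in this step can be tried out without touching the network. The following is a minimal sketch, assuming a hypothetical helper `fetch_with_retries` and a stand-in `fake_fetch` function that plays the role of the download-and-extract step; neither name exists in the original script. Unlike scrape(url), this sketch returns None instead of the string 'False' when no data is found.

```python
import random

def fetch_with_retries(fetch, header_pool, max_trials=5, rng=random):
    """Call fetch(headers) with a randomly chosen header set until the result
    has a non-empty 'Title', or until max_trials attempts have been made.

    `fetch` is a hypothetical callable standing in for download + extraction.
    Returns the result dict, or None if every attempt came back without a Title.
    """
    for _ in range(max_trials):
        result = fetch(rng.choice(header_pool))
        if result is not None and result.get('Title') is not None:
            return result
    return None

# Stand-in fetcher: the first two calls look like blocked pages (no Title),
# the third call returns a product.
calls = []

def fake_fetch(headers):
    calls.append(headers['user-agent'])
    if len(calls) < 3:
        return {'Title': None}
    return {'Title': 'Example product'}

pool = [{'user-agent': 'ua-1'}, {'user-agent': 'ua-2'}]
result = fetch_with_retries(fake_fetch, pool, max_trials=5)
print(result, 'after', len(calls), 'attempts')
```

Passing the fetch step in as a parameter keeps the retry policy separate from the HTTP call, which makes the policy easy to check with a fake fetcher like the one above.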

#### Step 5: Initialise path of output file

**NOTE:** Set the file name according to what is being scraped here

Eg: SCRAPED_PRODUCT_PAGES_APPARIO or SCRAPED_PRODUCT_PAGES_CLOUDTAIL

In [None]:
FileName = input('Enter a Filename for output file!\n')

outfile_path = str('./ScriptOutput/DATASET/' + str(FileName) + '.jsonl') 

Enter a Filename for output file!
test
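
Opening the output file in a later step will fail if the output directory does not exist yet. A minimal sketch of one way to guard against that is shown here; the helper name `build_output_path` is hypothetical, and the demonstration writes into a temporary directory rather than the real ./ScriptOutput/DATASET/ folder.

```python
import os
import tempfile

def build_output_path(file_name, out_dir='./ScriptOutput/DATASET'):
    """Return '<out_dir>/<file_name>.jsonl', creating out_dir first if it is missing."""
    os.makedirs(out_dir, exist_ok=True)
    return os.path.join(out_dir, file_name + '.jsonl')

# Demonstration inside a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = os.path.join(tmp, 'ScriptOutput', 'DATASET')
    demo_path = build_output_path('test', out_dir=demo_dir)
    demo_dir_exists = os.path.isdir(demo_dir)

print(os.path.basename(demo_path), demo_dir_exists)
```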


#### Step 6: Define cleaning functions

In [None]:
def CleanRating(s):
    '''
    Here, the input is rating in a string format, e.g. "3.3 out of 5 stars".
    The function converts it to a float, e.g. 3.3. Returns None if it cannot be parsed.
    '''
    if s is not None:
        try:
            return float(s.split(' ')[0])
        except (ValueError, AttributeError):
            return None
    else:
        return None

def CleanRatingCount(s):
    '''
    Here, the input is RatingCount in a string format, e.g. "336 ratings".
    The function converts it to a float, e.g. 336.0. A missing count becomes 0.0.
    '''
    if s is not None:
        return float(s.split(' ')[0].replace(',', ''))
    else:
        return float(0)

def CleanAnsweredQuestionsCount(s):
    '''
    Here, the input is AnsweredQuestionsCount in a string format, e.g. "336 answered questions".
    The function converts it to a float, e.g. 336.0. A missing or unparsable count becomes 0.0.
    '''
    if s is not None:
        try:
            return float(s.split(' ')[0].replace(',', '').replace('+', ''))
        except (ValueError, AttributeError):
            return float(0)
    else:
        return float(0)
    
    
def CleanAmazonPrice(s):
    '''
    Here, the input is AmazonPrice in a string format, e.g. "₹ 336.00".
    The function converts it to a float, e.g. 336.0. Returns None if the price is missing.
    '''
    if s is not None:
        print(s)
        # Remove the currency symbol and thousands separators before parsing
        s = s.replace('₹', '').replace(',', '').replace(r'\x', '').replace('a', '')
        return float(s.strip().split(' ')[0])
    else:
        return None
    
def CleanMRP(s):
    '''
    Here, the input is MRP in a string format, e.g. "₹ 336.00".
    The function converts it to a float, e.g. 336.0. Returns None if the MRP is missing.
    '''
    if s is not None:
        print(s)
        # Remove the currency symbol and thousands separators before parsing
        s = s.replace('₹', '').replace(',', '').replace(r'\x', '').replace('a', '')
        return float(s.strip().split(' ')[0])
    else:
        return None

def CleanDiscount(s):
    '''
    Here, the input is Savings in a string format, e.g. "₹ 336.00 (50% Off)".
    The function extracts the discount percentage as an int, e.g. 50.
    Returns None if no percentage is found.
    '''
    if s is not None:
        # Capture only the digits before the '%' sign inside the parentheses,
        # so trailing text such as "Off" does not break the int conversion
        match = re.search(r'\(\s*(\d+)\s*%', s)
        if match:
            return int(match.group(1))
    return None

def CleanSavings(s):
    '''
    Here, the input is Savings in a string format, e.g. "₹ 336.00 (50% Off)".
    The function converts the amount to a float, e.g. 336.0. Returns None if Savings is missing.
    '''
    if s is not None:
        s = s.replace('₹', '').replace(',', '').replace(r'\x', '').replace('a', '')
        # strip() removes the space left behind by the currency symbol before splitting
        return float(s.strip().split(' ')[0])
    else:
        return None
    
    
def CleanKeywords(s):
    '''
    Here, the input is Breadcrumbs in a string format, 
    e.g. 'Electronics  > Home Audio  > Speakers  > 10.or Crafted for Amazon Rave Portable Wireless Bluetooth Speaker'
    The function converts it to a list, separating it on the '›' or '>' symbol.
    Returns None if the input is missing.
    '''
    # Missing values arrive as None or as a float NaN; no float is a valid breadcrumb string
    if s is None or isinstance(s, float):
        return None
    if '›' in s:
        return s.split('›')
    return s.split('> ')
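
The cleaning functions above each strip symbols by hand. As an alternative sketch, a regular expression can pull the first number out of any of these strings in one step. The helpers `parse_amount` and `parse_percentage` are hypothetical and not part of the original script; the sample strings follow the formats given in the docstrings above, plus one with a non-breaking space, which is an assumption about how such text may appear on the page.

```python
import re

# First run of digits, optionally with thousands separators and a decimal part.
_AMOUNT_RE = re.compile(r'\d[\d,]*(?:\.\d+)?')
# Digits immediately followed (optionally after spaces) by a percent sign.
_PERCENT_RE = re.compile(r'(\d+)\s*%')

def parse_amount(text):
    """Return the first number found in `text` as a float, or None if there is none."""
    if not isinstance(text, str):
        return None
    match = _AMOUNT_RE.search(text)
    if match is None:
        return None
    return float(match.group(0).replace(',', ''))

def parse_percentage(text):
    """Return the first 'NN%' value found in `text` as an int, or None."""
    if not isinstance(text, str):
        return None
    match = _PERCENT_RE.search(text)
    return int(match.group(1)) if match else None

samples = ['₹ 3,999.00', '₹\xa01,000.00 (22%)', '336 ratings', '3.3 out of 5 stars', None]
for sample in samples:
    print(repr(sample), parse_amount(sample), parse_percentage(sample))
```

Because the pattern searches for the number instead of removing everything around it, the same helper covers prices, rating counts and ratings, and non-string inputs such as None or NaN simply yield None.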

#### Step 7: Begin main scraping

In [None]:
with open(outfile_path,'a') as outfile:
    for url in urls:
        if 'amazon.in' not in url:
            url = 'https://www.amazon.in' + url
        product = scrape(url)
        if product == 'False':
            print('No Data on this page!')
        else:
            if product['AmazonPrice'] is None: # Amazon price is mentioned only when the product is in stock
                product['Availability'] = 'Currently Unavailable'
            else:
                product['Availability'] = 'Available'
            product['ProductPageUrl'] = url
            # Prefer an ASIN of the form B0xxxxxxxx; otherwise fall back to a numeric id after /dp/
            asin_match = re.search(r'B0.{8}', url)
            numeric_match = re.search(r'/dp/\d*/', url)
            if asin_match:
                product['ASIN'] = asin_match.group(0)
            elif numeric_match:
                product['ASIN'] = numeric_match.group(0).replace('/dp/', '').replace('/', '')
            else:
                product['ASIN'] = None
            product['Rating'] = CleanRating(product['Rating'])
            product['RatingCount'] = CleanRatingCount(product['RatingCount'])
            product['AmazonPrice'] = CleanAmazonPrice(product['AmazonPrice'])
            product['MRP'] = CleanMRP(product['MRP'])
            product['AnsweredQuestionsCount'] = CleanAnsweredQuestionsCount(product['AnsweredQuestionsCount'])
            product['DiscountPercentage'] = CleanDiscount(product['Savings'])
            product['Savings'] = CleanSavings(product['Savings'])
            product['Keywords'] = CleanKeywords(product['Breadcrumbs'])
            print(product)
            json.dump(product,outfile)
            outfile.write("\n")

Downloading https://www.amazon.in/Amazon-FireTVStick-Alexa-Voice-Remote-Streaming-Player/dp/B0791YHVMK/ref=sr_1_1?dchild=1&fst=as%3Aoff&m=AT95IG9ONZD7S&qid=1595168134&refinements=p_4%3AAmazon&s=merchant-items&sr=1-1
Trial no: 1
₹ 3,999.00
{'Title': 'Fire TV Stick streaming media player with Alexa built in, includes all-new Alexa Voice Remote, HD, easy set-up, released 2019', 'Brand': 'Brand: Amazon', 'Rating': 4.2, 'RatingCount': 26423.0, 'AnsweredQuestionsCount': 1000.0, 'MRP': None, 'AmazonPrice': 3999.0, 'Savings': None, 'ShortDescription': '#1 best-selling streaming media player, with all-new Alexa Voice Remote (2nd Gen, released 2019). Fire TV Stick is easy to setup and comes pre-registered to your Amazon account so you can just plug it in to your HDTV and enjoy favourite titles. Use the dedicated power, volume and mute buttons to control compatible TVs. Watch favourites from Prime Video, Hotstar, Netflix, Zee5, Sony LIV, Apple TV and others. Subscription fees may apply. The offic

₹ 3,698.00
{'Title': 'Echo Dot (Black) bundle with Wipro 9W smart color bulb', 'Brand': 'Brand: Amazon', 'Rating': 4.5, 'RatingCount': 1892.0, 'AnsweredQuestionsCount': 277.0, 'MRP': None, 'AmazonPrice': 3698.0, 'Savings': None, 'ShortDescription': 'This bundle contains Echo Dot and Wipro 9W smart color bulb (pin type). Use this bundle to experience the magic of controlling your lights, using just your voice. Control your lights using voice, or control them remotely away from home. Or simply create routine to dim them automatically at night. Only Wi-Fi needed - no additonal hub or setup required! Echo Dot is our most popular voice-controlled speaker, with new fabric design, and improved speaker for richer and louder sound. Voice control your music: Stream music from Amazon Prime Music, Saavn, and Gaana – just ask for a song, artist, or genre. Bigger, Better Sound: Pair with a second Echo Dot for rich, stereo sound. Fill your home with music with compatible Echo devices across different

KeyboardInterrupt: 
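
The pattern `B0.{8}` used in the loop above matches "B0" followed by any eight characters anywhere in the URL, including inside the product name. A stricter sketch, assuming the ASIN is the 10-character alphanumeric segment that follows `/dp/` (as it is in the example URLs printed in Step 1), could look like this. The helper name `extract_asin` is hypothetical.

```python
import re

# Ten uppercase letters or digits directly after '/dp/', ending at '/', '?' or the end of the URL.
_ASIN_RE = re.compile(r'/dp/([A-Z0-9]{10})(?:[/?]|$)')

def extract_asin(url):
    """Return the 10-character id that follows '/dp/' in `url`, or None if absent."""
    match = _ASIN_RE.search(url)
    return match.group(1) if match else None

example_urls = [
    'https://www.amazon.in/Amazon-FireTVStick-Alexa-Voice-Remote-Streaming-Player/dp/B0791YHVMK/ref=sr_1_1?dchild=1',
    'https://www.amazon.in/All-new-Echo-Dot-3rd-Gen/dp/B07PFFMP9P/ref=sr_1_2?dchild=1',
    'https://www.amazon.in/',
]
for example_url in example_urls:
    print(extract_asin(example_url))
```

Anchoring on `/dp/` covers both the "B0..." ids and all-numeric ids with a single pattern, at the cost of returning None for URLs that do not follow this layout.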

#### Step 8: Read Jsonl file

In [None]:
Product_Page_file = open(outfile_path)

Product_Page = []
reader = jsonlines.Reader(Product_Page_file)
for item in reader.iter():
    Product_Page.append(item)
    
df = pd.DataFrame(Product_Page)
print(df.count())
df.head()

Title                     7
Brand                     7
Rating                    7
RatingCount               7
AnsweredQuestionsCount    7
MRP                       4
AmazonPrice               7
Savings                   4
ShortDescription          7
ProductDescription        0
BestSellerRank            0
DateFirstAvailable        0
Breadcrumbs               0
Seller                    7
FullfilledBy              7
Availability              7
ProductPageUrl            7
ASIN                      7
DiscountPercentage        4
Keywords                  0
dtype: int64


Unnamed: 0,Title,Brand,Rating,RatingCount,AnsweredQuestionsCount,MRP,AmazonPrice,Savings,ShortDescription,ProductDescription,BestSellerRank,DateFirstAvailable,Breadcrumbs,Seller,FullfilledBy,Availability,ProductPageUrl,ASIN,DiscountPercentage,Keywords
0,Fire TV Stick streaming media player with Alex...,Brand: Amazon,4.2,26423.0,1000.0,,3999.0,,"#1 best-selling streaming media player, with a...",,,,,Cloudtail India,Fulfilled by Amazon,Available,https://www.amazon.in/Amazon-FireTVStick-Alexa...,B0791YHVMK,,
1,Echo Dot (3rd Gen) – New and improved smart sp...,Brand: Amazon,4.3,26950.0,1000.0,4499.0,3499.0,1000.0,Our most popular smart speaker with 360 degree...,,,,,Cloudtail India,Fulfilled by Amazon,Available,https://www.amazon.in/All-new-Echo-Dot-3rd-Gen...,B07PFFMP9P,22.0,
2,Echo Dot (3rd Gen) – New and improved smart sp...,Amazon,4.3,26950.0,1000.0,4499.0,3499.0,1000.0,Our most popular smart speaker with 360 degree...,,,,,Cloudtail India,Fulfilled by Amazon,Available,https://www.amazon.in/All-new-Echo-Dot-3rd-Gen...,B07PKXJN7J,22.0,
3,All-New Alexa Voice Remote with Power and Volu...,Brand: Amazon,4.1,1286.0,675.0,,1999.0,,"Compatible with Fire TV Stick(2nd Generation),...",,,,,Cloudtail India,Fulfilled by Amazon,Available,https://www.amazon.in/Amazon-FireTV-Stick-Alex...,B07B6NCTWB,,
4,Echo Dot (3rd Gen) – New and improved smart sp...,Amazon,4.3,26950.0,1000.0,4499.0,3499.0,1000.0,Our most popular smart speaker with 360 degree...,,,,,Cloudtail India,Fulfilled by Amazon,Available,https://www.amazon.in/C78MP8/dp/B07PGL2ZSL/ref...,B07PGL2ZSL,22.0,
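
The cell in this step opens the output file without closing it. A minimal sketch of the same read using only the standard library and a context manager is given here; the helper name `read_jsonl` and the two demo records are hypothetical, and the demo file lives in a temporary directory.

```python
import json
import os
import tempfile

def read_jsonl(path):
    """Read a .jsonl file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, 'r', encoding='utf-8') as handle:  # closed automatically on exit
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records

# Demonstration with a small throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = os.path.join(tmp, 'demo.jsonl')
    with open(demo_file, 'w', encoding='utf-8') as handle:
        handle.write(json.dumps({'Title': 'A', 'AmazonPrice': 3999.0}) + '\n')
        handle.write('\n')  # blank lines are ignored by read_jsonl
        handle.write(json.dumps({'Title': 'B', 'AmazonPrice': None}) + '\n')
    demo_records = read_jsonl(demo_file)

print(len(demo_records), 'records read')
```

The resulting list can be passed to pd.DataFrame in the same way as Product_Page in the cell above.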
