# Scraping product feedback from Nigeria's biggest e-commerce website (Jumia)

## Method and Steps

The scraper is written in Python and runs in a Google Colab notebook.

* Send a request to the Jumia website to fetch the review page of your choice.
* Parse the page with Beautiful Soup, a Python library for parsing HTML.
* Store the scraped data in a pandas DataFrame.
* Save the DataFrame as a CSV file in Google Drive.
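The first step fetches a single page, and the scraping loop below stops after page 1. To cover more reviews you need one URL per page. The following is a minimal sketch using only the standard library. It assumes Jumia's review pages keep the `?page=N` query parameter used in this notebook, and `review_page_urls` is a hypothetical helper name. The `#catalog-listing` fragment is omitted because URL fragments are handled by the browser and are not sent to the server.

```python
from urllib.parse import urlencode

# Review-listing URL for the product SKU scraped in this notebook
BASE_URL = "https://www.jumia.com.ng/catalog/productratingsreviews/sku/OI955EA4LZQQ4NAFAMZ/"


def review_page_urls(base_url: str, first_page: int, last_page: int) -> list[str]:
    """Hypothetical helper: one URL per review page, assuming a ?page=N query parameter."""
    return [f"{base_url}?{urlencode({'page': page})}" for page in range(first_page, last_page + 1)]


for url in review_page_urls(BASE_URL, 1, 3):
    print(url)
```

Each URL can then replace the hard-coded one inside the scraping loop.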

```python
# Import the necessary libraries
from google.colab import drive  # Only available when running inside Google Colab
drive.mount("/content/drive", force_remount=True)  # Mount Google Drive so the CSV can be saved there later

from bs4 import BeautifulSoup
import requests
import pandas as pd
```

```text
Mounted at /content/drive
```
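The `from google.colab import drive` line only works inside Colab; anywhere else it raises `ModuleNotFoundError`. If you also want to run the notebook locally, one option is to mount Drive only when Colab is detected and fall back to a local folder otherwise. This is a sketch, not part of the original notebook: `running_in_colab`, `choose_output_dir`, and the local `output` folder name are hypothetical.

```python
from pathlib import Path


def running_in_colab() -> bool:
    """Return True when the google.colab package is importable (i.e. inside Colab)."""
    try:
        import google.colab  # noqa: F401
    except ImportError:
        return False
    return True


def choose_output_dir(local_dir: str = "output") -> Path:
    """Hypothetical helper: the Drive root in Colab, otherwise a local folder."""
    if running_in_colab():
        from google.colab import drive
        drive.mount("/content/drive", force_remount=True)
        return Path("/content/drive/My Drive")
    return Path(local_dir)


output_dir = choose_output_dir()
print(output_dir)
```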


```python
# Create empty lists to store the scraped data
summaries = []  # Review titles (the short summary each customer writes)
reviews = []    # Review bodies
ratings = []    # Star ratings, e.g. "5 out of 5"
```
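Three parallel lists work as long as every iteration appends to all of them; if one append is skipped, every later row is misaligned. A sketch of an alternative that keeps each review's fields together in one record, so they cannot drift apart. The `Review` dataclass is a hypothetical name, and the sample values are the two reviews this notebook scraped.

```python
from dataclasses import dataclass, asdict


@dataclass
class Review:
    """One scraped customer review (hypothetical record type)."""
    summary: str
    review: str
    rating: str


records: list[Review] = []
records.append(Review(summary="I love", review="Perfect", rating="5 out of 5"))
records.append(Review(summary="Highly quality", review="The sound is very okay", rating="5 out of 5"))

# asdict() turns each record into a plain dict; pd.DataFrame(rows) accepts this list directly
rows = [asdict(r) for r in records]
print(rows[0])
```

With this design, the scraping loop appends one `Review` per article instead of appending to three lists.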

```python
# Iterate through the review pages
for page in range(1, 2):  # Scrape only the first page, for demonstration purposes
    url = f"https://www.jumia.com.ng/catalog/productratingsreviews/sku/OI955EA4LZQQ4NAFAMZ/?page={page}#catalog-listing"
    response = requests.get(url, timeout=30)  # Send a GET request to the Jumia website
    response.raise_for_status()  # Stop on an HTTP error instead of parsing an error page
    jsoup = BeautifulSoup(response.content, 'html.parser')  # Parse the HTML content
    products = jsoup.find_all('article', class_='-pvs -hr _bet')  # Each <article> holds one customer review

    # Iterate through each review
    for product in products:
        summary = product.find('h3', class_="-m -fs16 -pvs").text.replace('\n', '')  # Extract the review title
        review = product.find('p', class_="-pvs").text.replace('\n', '')  # Extract the review body
        try:
            rating = product.find('div', class_='stars _m _al -mvs').text.replace('\n', '')  # Extract the star rating
        except AttributeError:  # find() returned None: this review has no rating element
            rating = 'None'  # Record the missing rating as the string 'None'

        # Append the data to the lists
        summaries.append(summary)
        reviews.append(review)
        ratings.append(rating)

        # Print the data
        print([summary, review, rating])
```

```text
['I love', 'Perfect', '5 out of 5']
['Highly quality', 'The sound is very okay', '5 out of 5']
```
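The selectors above depend on Jumia's current class names, and only the rating lookup is protected against a missing element. To check the extraction logic without sending any requests, here is a sketch that runs the same `find` calls against a small hand-written HTML snippet. The snippet only imitates the structure the selectors expect (it is not copied from Jumia), and the second review deliberately lacks a rating element. `text_or_none` and `parse_reviews` are hypothetical helpers.

```python
from bs4 import BeautifulSoup

# Hand-written HTML that imitates the structure the selectors expect (not real Jumia markup)
SAMPLE_HTML = """
<section>
  <article class="-pvs -hr _bet">
    <div class="stars _m _al -mvs">5 out of 5</div>
    <h3 class="-m -fs16 -pvs">I love</h3>
    <p class="-pvs">Perfect</p>
  </article>
  <article class="-pvs -hr _bet">
    <h3 class="-m -fs16 -pvs">No stars here</h3>
    <p class="-pvs">Review without a rating element</p>
  </article>
</section>
"""


def text_or_none(tag):
    """Return the tag's text without newlines, or None when find() found nothing."""
    return tag.text.replace("\n", "") if tag is not None else None


def parse_reviews(html: str) -> list[list]:
    """Extract [title, body, rating] from every review <article> in the page."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for article in soup.find_all("article", class_="-pvs -hr _bet"):
        rows.append([
            text_or_none(article.find("h3", class_="-m -fs16 -pvs")),
            text_or_none(article.find("p", class_="-pvs")),
            text_or_none(article.find("div", class_="stars _m _al -mvs")),
        ])
    return rows


for row in parse_reviews(SAMPLE_HTML):
    print(row)
```

Unlike the notebook loop, this version also tolerates a missing title or body, and it stores a real `None` instead of the string `'None'`, so pandas will treat the value as missing.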


```python
# Create a DataFrame from the lists
df = pd.DataFrame({'Product summary': summaries, 'Review': reviews, 'Rating': ratings})
```

```python
# Display the shape of the DataFrame as (rows, columns)
print(df.shape)
```

```text
(2, 3)
```


```python
# Display the first few rows of the DataFrame
print(df.head())
```

```text
  Product summary                  Review      Rating
0          I love                 Perfect  5 out of 5
1  Highly quality  The sound is very okay  5 out of 5
```
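The `Rating` column holds text such as `"5 out of 5"`, which cannot be averaged or used directly as a label. A sketch of converting it to a number with pandas `Series.str.extract`, assuming every rating follows the "N out of M" pattern seen in the output; anything else, including the `'None'` placeholder, becomes `NaN`. The third row is made up to show that case, and the frame is named `ratings_df` so it does not overwrite the notebook's `df`.

```python
import pandas as pd

# The two rows scraped above, plus a made-up third row with a missing rating
ratings_df = pd.DataFrame({
    "Product summary": ["I love", "Highly quality", "Example"],
    "Review": ["Perfect", "The sound is very okay", "Example review"],
    "Rating": ["5 out of 5", "5 out of 5", "None"],
})

# Capture the leading number in "N out of M"; strings that do not match become NaN
ratings_df["Stars"] = pd.to_numeric(
    ratings_df["Rating"].str.extract(r"(\d+) out of \d+", expand=False)
)
print(ratings_df[["Rating", "Stars"]])
```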


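```python
import os

# Save the DataFrame to a CSV file in Google Drive
output_path = './drive/My Drive/MLstart/financial-sentiment-analysis/giftv1/newkogi/k/productsR1.csv'
os.makedirs(os.path.dirname(output_path), exist_ok=True)  # to_csv does not create missing folders
df.to_csv(output_path, index=False, encoding='utf-8')
```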
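Before relying on the saved file, it can help to read it back and confirm that nothing was lost. A sketch of that round-trip check, writing to a temporary directory so it runs anywhere; in the notebook you would use the Drive path instead. The `scraped` frame is a stand-in built from the two rows shown above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for the scraped DataFrame: the two rows shown above
scraped = pd.DataFrame({
    "Product summary": ["I love", "Highly quality"],
    "Review": ["Perfect", "The sound is very okay"],
    "Rating": ["5 out of 5", "5 out of 5"],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "productsR1.csv"
    scraped.to_csv(csv_path, index=False, encoding="utf-8")
    # Read the file back and check that values, columns and order survived
    reloaded = pd.read_csv(csv_path, encoding="utf-8")
    pd.testing.assert_frame_equal(scraped, reloaded)
    print(f"Saved and verified {len(reloaded)} rows")
```

Note that `read_csv` treats some strings, such as `NA` and, in recent pandas versions, `None`, as missing values by default. Placeholders like the notebook's `'None'` rating may therefore come back as `NaN`.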