# Web Scraping the Jumia Site

**Objective** <br>
The goal of this script is to scrape product data from the Jumia website. The dataset will be transformed and stored to form a data catalog for our price prediction app.

## Step 1: Import the Necessary Libraries

In [4]:
import requests  # make a request to a url
from bs4 import BeautifulSoup  # parse the requests as html
import pandas as pd  # data manipulation
import re
from time import sleep

## Step 2: Loop through the pages

In [6]:
product_data = {
    "Product Name": [],
    "Current Price": [],
    "Old Price": [],
    "Discount": [],
    "Rating": [],
    "URL": [],
    "Photo": [],
    "Vendor": []
}

# extract product info from 50 pages
for page_num in range(1, 51):
    BASE_URL = "https://www.jumia.com.ng/"
    PAGE_URL = f"{BASE_URL}laptops/?page={page_num}#catalog-listing"

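    try:
        response = requests.get(url=PAGE_URL)
    except requests.RequestException as error:
        # skip this page instead of silently reusing content from a previous page
        print(f"Request to {PAGE_URL} failed: {error}")
        continue

    if response.status_code != 200:
        print(f"Resource Not Found! (status code {response.status_code})")
        continue

    content = response.content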

    # soup
    soup = BeautifulSoup(content, "html.parser")

    print(f"Collecting Data from {PAGE_URL} ...")
    sleep(20)
    
    # find articles
    articles = soup.find_all('article', class_="prd _fb col c-prd")

    # looping the articles
    for article in articles:
        
        # e-commerce site (vendor): append one entry per product, like the other columns
        product_data["Vendor"].append("Jumia Site")

        # product name
        name = article.find('h3', class_='name')
        if name != None:
            product_data['Product Name'].append(name.text)
        else:
            product_data['Product Name'].append("")

        # product URL
        url = article.find('a', class_='core')
        if url != None:
            product_data['URL'].append(f"{BASE_URL}{url.get('href')}") # product url
        else:
            product_data['URL'].append("")

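        # product photo (guard each lookup so a missing tag does not raise AttributeError)
        image_container = article.find('div', class_="img-c")
        image = image_container.find("img") if image_container is not None else None
        photo = image.get("data-src") if image is not None else None
        if photo is not None:
            product_data['Photo'].append(photo) # product photo
        else:
            product_data['Photo'].append("")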

        # current price
        current_price = article.find('div', class_='prc')
        if current_price != None:
            product_data["Current Price"].append(current_price.text)
        else:
            product_data["Current Price"].append("")

        # old price
        old_price = article.find('div', class_='old')
        if old_price != None:
            product_data["Old Price"].append(old_price.text)
        else:
            product_data["Old Price"].append("")

        # discount
        discount = article.find('div', class_='bdg _dsct _sm')
        if discount != None:
            product_data["Discount"].append(discount.text)
        else:
            product_data["Discount"].append("")

        # rating
        rating = article.find('div', class_='stars _s')
        if rating != None:
            product_data["Rating"].append(rating.text)
        else:
            product_data["Rating"].append("")
            

    print(f"Done Collecting Data from {PAGE_URL}")

Collecting Data from https://www.jumia.com.ng/laptops/?page=1#catalog-listing ...
Done Collecting Data from https://www.jumia.com.ng/laptops/?page=1#catalog-listing
Collecting Data from https://www.jumia.com.ng/laptops/?page=2#catalog-listing ...
Done Collecting Data from https://www.jumia.com.ng/laptops/?page=2#catalog-listing
Collecting Data from https://www.jumia.com.ng/laptops/?page=3#catalog-listing ...
Done Collecting Data from https://www.jumia.com.ng/laptops/?page=3#catalog-listing
Collecting Data from https://www.jumia.com.ng/laptops/?page=4#catalog-listing ...
Done Collecting Data from https://www.jumia.com.ng/laptops/?page=4#catalog-listing
Collecting Data from https://www.jumia.com.ng/laptops/?page=5#catalog-listing ...
Done Collecting Data from https://www.jumia.com.ng/laptops/?page=5#catalog-listing
Collecting Data from https://www.jumia.com.ng/laptops/?page=6#catalog-listing ...
Done Collecting Data from https://www.jumia.com.ng/laptops/?page=6#catalog-listing
...
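
The product URLs stored by this notebook contain a double slash after the domain (`https://www.jumia.com.ng//...`), which suggests that the scraped `href` values already begin with `/`. A minimal sketch of one way to avoid this, assuming site-relative hrefs, is to join the two parts with the standard library's `urljoin`. The helper name `build_product_url` and the sample path are hypothetical and not part of the original script.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.jumia.com.ng/"


def build_product_url(href, base_url=BASE_URL):
    """Join a scraped href onto the base URL without doubling the slash."""
    # hypothetical helper; assumes href is site-relative, e.g. "/some-laptop.html"
    return urljoin(base_url, href)


print(build_product_url("/some-laptop.html"))
# https://www.jumia.com.ng/some-laptop.html
```

Inside the loop, this could stand in for the f-string that concatenates `BASE_URL` and the `href`.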

## Step 4: Store in a DataFrame

In [92]:
jumia_laptop_df = pd.DataFrame.from_dict(product_data)
jumia_laptop_df

Unnamed: 0,Product Name,Current Price,Old Price,Discount,Rating,URL,Photo,Vendor
0,Hp Refurbished EliteBook 840 G6 Intel Core I5-...,"₦ 400,660","₦ 500,000",20%,,https://www.jumia.com.ng//hp-refurbished-elite...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1,"WOZIFAN 14.1""Intel Celeron N4020 6GB+256GB,SSD...","₦ 260,300","₦ 1,606,500",84%,3.8 out of 5,https://www.jumia.com.ng//wozifan-14.1intel-ce...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
2,Hp Stream 11 Intel Celeron 2GB RAM- 32GB HDD W...,"₦ 135,000",,,5 out of 5,https://www.jumia.com.ng//hp-stream-11-intel-c...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
3,Lenovo ThinkPad X1 Carbon G9 Touchscreen 11th ...,"₦ 1,658,000","₦ 1,780,000",7%,4 out of 5,https://www.jumia.com.ng//lenovo-thinkpad-x1-c...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
4,XIAOMI Mi Pad 2 16 Or 64GB Windows 10 OS Tablet,"₦ 75,999","₦ 99,999",24%,5 out of 5,https://www.jumia.com.ng//xiaomi-mi-pad-2-16-o...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
...,...,...,...,...,...,...,...,...
1995,Hp Elitebook X2 1012 G2 Intel Core I5 256GB SS...,"₦ 700,000","₦ 900,000",22%,,https://www.jumia.com.ng//hp-elitebook-x2-1012...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1996,Hp 250 G9 12TH GEN INTEL CORE I3 16GB RAM 512G...,"₦ 625,800",,,,https://www.jumia.com.ng//hp-250-g9-12th-gen-i...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1997,Hp ENVY 14 X360 13TH GEN INTEL CORE I7 16GB RA...,"₦ 1,319,000","₦ 1,590,000",17%,,https://www.jumia.com.ng//hp-envy-14-x360-13th...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1998,Hp EliteBook 840 G8 11th Gen 16GB RAM/256GB SS...,"₦ 595,000","₦ 696,000",15%,,https://www.jumia.com.ng//hp-elitebook-840-g8-...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site


## Step 5: Store data into CSV

In [94]:
jumia_laptop_df.to_csv("jumia_laptop.csv", index=False)
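
The scraper stores missing fields as empty strings, but pandas reads empty CSV fields back as `NaN` by default. That is why the reloaded data reports null values for Old Price, Discount, and Rating even though the lists never contained `None`. The sketch below illustrates the round trip with a hypothetical two-row frame and an in-memory buffer instead of a file.

```python
import io

import pandas as pd

# hypothetical sample; "" mirrors what the scraper appends when a field is missing
sample = pd.DataFrame({
    "Product Name": ["Laptop A", "Laptop B"],
    "Old Price": ["₦ 500,000", ""],
})

buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)

restored = pd.read_csv(buffer)
print(restored["Old Price"].isna().tolist())
# [False, True]
```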

## Data Cleaning on the Jumia Data

In [97]:
jumia_df = pd.read_csv("jumia_laptop.csv")
jumia_df

Unnamed: 0,Product Name,Current Price,Old Price,Discount,Rating,URL,Photo,Vendor
0,Hp Refurbished EliteBook 840 G6 Intel Core I5-...,"₦ 400,660","₦ 500,000",20%,,https://www.jumia.com.ng//hp-refurbished-elite...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1,"WOZIFAN 14.1""Intel Celeron N4020 6GB+256GB,SSD...","₦ 260,300","₦ 1,606,500",84%,3.8 out of 5,https://www.jumia.com.ng//wozifan-14.1intel-ce...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
2,Hp Stream 11 Intel Celeron 2GB RAM- 32GB HDD W...,"₦ 135,000",,,5 out of 5,https://www.jumia.com.ng//hp-stream-11-intel-c...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
3,Lenovo ThinkPad X1 Carbon G9 Touchscreen 11th ...,"₦ 1,658,000","₦ 1,780,000",7%,4 out of 5,https://www.jumia.com.ng//lenovo-thinkpad-x1-c...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
4,XIAOMI Mi Pad 2 16 Or 64GB Windows 10 OS Tablet,"₦ 75,999","₦ 99,999",24%,5 out of 5,https://www.jumia.com.ng//xiaomi-mi-pad-2-16-o...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
...,...,...,...,...,...,...,...,...
1995,Hp Elitebook X2 1012 G2 Intel Core I5 256GB SS...,"₦ 700,000","₦ 900,000",22%,,https://www.jumia.com.ng//hp-elitebook-x2-1012...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1996,Hp 250 G9 12TH GEN INTEL CORE I3 16GB RAM 512G...,"₦ 625,800",,,,https://www.jumia.com.ng//hp-250-g9-12th-gen-i...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1997,Hp ENVY 14 X360 13TH GEN INTEL CORE I7 16GB RA...,"₦ 1,319,000","₦ 1,590,000",17%,,https://www.jumia.com.ng//hp-envy-14-x360-13th...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1998,Hp EliteBook 840 G8 11th Gen 16GB RAM/256GB SS...,"₦ 595,000","₦ 696,000",15%,,https://www.jumia.com.ng//hp-elitebook-840-g8-...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site


In [99]:
jumia_df.shape

(2000, 8)

In [101]:
jumia_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 8 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Product Name   2000 non-null   object
 1   Current Price  2000 non-null   object
 2   Old Price      1348 non-null   object
 3   Discount       1348 non-null   object
 4   Rating         595 non-null    object
 5   URL            2000 non-null   object
 6   Photo          2000 non-null   object
 7   Vendor         2000 non-null   object
dtypes: object(8)
memory usage: 125.1+ KB


In [103]:
# current price
jumia_df['Current Price'] = jumia_df['Current Price'].str.replace("₦ ","")

In [105]:
jumia_df['Current Price'] = jumia_df['Current Price'].str.replace(",","")

In [107]:
jumia_df[jumia_df['Current Price'].str.contains("-")]

Unnamed: 0,Product Name,Current Price,Old Price,Discount,Rating,URL,Photo,Vendor
779,Hp EliteBook 830 G6 Intel Core I5-16GB RAM/256...,413250 - 650000,"₦ 700,000",41%,,https://www.jumia.com.ng//hp-elitebook-830-g6-...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
895,"IKIA 11.6"" Laptop, Intel Celeron N4020C, RAM 8...",238800 - 322800,"₦ 309,600 - ₦ 430,800",25%,,https://www.jumia.com.ng//ikia-11.6-laptop-int...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
998,"DELL LATITUDE 3190 2in1 X360 INTEL PENTIUM, 8G...",265000 - 270000,"₦ 400,000",34%,,https://www.jumia.com.ng//dell-latitude-3190-2...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1004,Lenovo IdeaPad Slim 7-14ITL05,1200000 - 1500000,,,,https://www.jumia.com.ng//lenovo-ideapad-slim-...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1499,Hp EliteBook 840 G5 Intel Core I5-8GB RAM/256G...,420000 - 650000,"₦ 700,000",40%,,https://www.jumia.com.ng//hp-elitebook-840-g5-...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site


In [109]:
# drop every record whose current price is a range (e.g. '413250 - 650000'),
# since a range cannot be converted to a single number
jumia_df = jumia_df[~jumia_df['Current Price'].str.contains("-", regex=False)].copy()
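
The price cleaning in this section amounts to three steps: strip the currency symbol, strip the thousands separators, and reject values that are ranges. A minimal sketch of the same idea as a single function is shown below. The name `parse_price` is hypothetical, and it uses the `re` module that the notebook imports but does not otherwise use.

```python
import re


def parse_price(text):
    """Return the price as a float, or None for ranges and unparseable text."""
    if text is None:
        return None
    cleaned = text.replace("₦", "").replace(",", "").strip()
    # a full match on a single number rejects ranges such as "413250 - 650000"
    if re.fullmatch(r"\d+(\.\d+)?", cleaned):
        return float(cleaned)
    return None


print(parse_price("₦ 400,660"))               # 400660.0
print(parse_price("₦ 413,250 - ₦ 650,000"))   # None
```

Applied with `Series.map`, a function like this would mark ranges as missing in one pass, leaving the decision to drop them as a separate step.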

In [111]:
jumia_df['Current Price'] = jumia_df['Current Price'].astype("float64")

In [113]:
jumia_df.head()

Unnamed: 0,Product Name,Current Price,Old Price,Discount,Rating,URL,Photo,Vendor
0,Hp Refurbished EliteBook 840 G6 Intel Core I5-...,400660.0,"₦ 500,000",20%,,https://www.jumia.com.ng//hp-refurbished-elite...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1,"WOZIFAN 14.1""Intel Celeron N4020 6GB+256GB,SSD...",260300.0,"₦ 1,606,500",84%,3.8 out of 5,https://www.jumia.com.ng//wozifan-14.1intel-ce...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
2,Hp Stream 11 Intel Celeron 2GB RAM- 32GB HDD W...,135000.0,,,5 out of 5,https://www.jumia.com.ng//hp-stream-11-intel-c...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
3,Lenovo ThinkPad X1 Carbon G9 Touchscreen 11th ...,1658000.0,"₦ 1,780,000",7%,4 out of 5,https://www.jumia.com.ng//lenovo-thinkpad-x1-c...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
4,XIAOMI Mi Pad 2 16 Or 64GB Windows 10 OS Tablet,75999.0,"₦ 99,999",24%,5 out of 5,https://www.jumia.com.ng//xiaomi-mi-pad-2-16-o...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site


In [115]:
# Old price
jumia_df['Old Price'] = jumia_df['Old Price'].str.replace("₦ ","")
jumia_df['Old Price'] = jumia_df['Old Price'].str.replace(",","")
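# assign the result back; newer pandas versions warn about in-place fillna on a column selection
jumia_df['Old Price'] = jumia_df['Old Price'].fillna(value=0)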

In [117]:
jumia_df.head()

Unnamed: 0,Product Name,Current Price,Old Price,Discount,Rating,URL,Photo,Vendor
0,Hp Refurbished EliteBook 840 G6 Intel Core I5-...,400660.0,500000,20%,,https://www.jumia.com.ng//hp-refurbished-elite...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1,"WOZIFAN 14.1""Intel Celeron N4020 6GB+256GB,SSD...",260300.0,1606500,84%,3.8 out of 5,https://www.jumia.com.ng//wozifan-14.1intel-ce...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
2,Hp Stream 11 Intel Celeron 2GB RAM- 32GB HDD W...,135000.0,0,,5 out of 5,https://www.jumia.com.ng//hp-stream-11-intel-c...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
3,Lenovo ThinkPad X1 Carbon G9 Touchscreen 11th ...,1658000.0,1780000,7%,4 out of 5,https://www.jumia.com.ng//lenovo-thinkpad-x1-c...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
4,XIAOMI Mi Pad 2 16 Or 64GB Windows 10 OS Tablet,75999.0,99999,24%,5 out of 5,https://www.jumia.com.ng//xiaomi-mi-pad-2-16-o...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site


In [119]:
jumia_df.tail()

Unnamed: 0,Product Name,Current Price,Old Price,Discount,Rating,URL,Photo,Vendor
1995,Hp Elitebook X2 1012 G2 Intel Core I5 256GB SS...,700000.0,900000,22%,,https://www.jumia.com.ng//hp-elitebook-x2-1012...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1996,Hp 250 G9 12TH GEN INTEL CORE I3 16GB RAM 512G...,625800.0,0,,,https://www.jumia.com.ng//hp-250-g9-12th-gen-i...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1997,Hp ENVY 14 X360 13TH GEN INTEL CORE I7 16GB RA...,1319000.0,1590000,17%,,https://www.jumia.com.ng//hp-envy-14-x360-13th...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1998,Hp EliteBook 840 G8 11th Gen 16GB RAM/256GB SS...,595000.0,696000,15%,,https://www.jumia.com.ng//hp-elitebook-840-g8-...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1999,"DELL Latitude 3190 2in1 X360 INTEL PENTIUM, T...",215000.0,300000,28%,4 out of 5,https://www.jumia.com.ng//latitude-3190-intel-...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site


In [121]:
jumia_df['Old Price'].isnull().sum()

0

In [123]:
# drop the row whose old price is a range rather than a single value
jumia_df = jumia_df[jumia_df['Old Price'] != '685000 - 900000']

In [125]:
jumia_df['Old Price'] = jumia_df['Old Price'].astype("float64")
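
`astype("float64")` raises an error if any non-numeric string remains, which is why the range value had to be removed by its exact text first. As a hedged alternative, `pd.to_numeric` with `errors="coerce"` turns anything unparseable into `NaN`, so such rows can be found and handled afterwards. The sample values below are illustrative only.

```python
import pandas as pd

# illustrative values: a clean number, a price range, and a missing entry
old_price = pd.Series(["500000", "685000 - 900000", None])

numeric = pd.to_numeric(old_price, errors="coerce")
print(numeric.isna().tolist())
# [False, True, True]
```

Note that coercion does not distinguish a range from a genuinely missing price, so it fits best when both cases are treated the same way.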

In [127]:
jumia_df.head()

Unnamed: 0,Product Name,Current Price,Old Price,Discount,Rating,URL,Photo,Vendor
0,Hp Refurbished EliteBook 840 G6 Intel Core I5-...,400660.0,500000.0,20%,,https://www.jumia.com.ng//hp-refurbished-elite...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1,"WOZIFAN 14.1""Intel Celeron N4020 6GB+256GB,SSD...",260300.0,1606500.0,84%,3.8 out of 5,https://www.jumia.com.ng//wozifan-14.1intel-ce...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
2,Hp Stream 11 Intel Celeron 2GB RAM- 32GB HDD W...,135000.0,0.0,,5 out of 5,https://www.jumia.com.ng//hp-stream-11-intel-c...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
3,Lenovo ThinkPad X1 Carbon G9 Touchscreen 11th ...,1658000.0,1780000.0,7%,4 out of 5,https://www.jumia.com.ng//lenovo-thinkpad-x1-c...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
4,XIAOMI Mi Pad 2 16 Or 64GB Windows 10 OS Tablet,75999.0,99999.0,24%,5 out of 5,https://www.jumia.com.ng//xiaomi-mi-pad-2-16-o...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site


In [129]:
# Discount
jumia_df['Discount'] = jumia_df['Discount'].str.replace("%","")

In [131]:
# fill missing values in Discount (assign back instead of using in-place fillna on a column selection)
jumia_df["Discount"] = jumia_df["Discount"].fillna(value=0)

In [133]:
jumia_df["Discount"].isnull().sum()

0

In [135]:
# now change the data type
jumia_df['Discount'] = jumia_df['Discount'].astype("int")

In [137]:
jumia_df['Rating'] = jumia_df['Rating'].str.replace("out of 5","")

In [139]:
jumia_df['Rating'] = jumia_df['Rating'].str.strip()

In [141]:
# assign back instead of using in-place fillna on a column selection
jumia_df["Rating"] = jumia_df["Rating"].fillna(value=0)

In [143]:
jumia_df['Rating'] = jumia_df['Rating'].astype("float64")
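
The rating text has the form `3.8 out of 5`, so the numeric score can also be pulled out with a single regular expression. The sketch below is one possible way to do that. `parse_rating` is a hypothetical helper, and it returns `None` for unrated products rather than 0, which is a different choice from the one made in this notebook.

```python
import re


def parse_rating(text):
    """Extract the score from text like "3.8 out of 5"; None when absent."""
    if not text:
        return None
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+out of\s+5", text)
    return float(match.group(1)) if match else None


print(parse_rating("3.8 out of 5"))  # 3.8
print(parse_rating("5 out of 5"))    # 5.0
print(parse_rating(""))              # None
```

Keeping unrated products as missing would stop a rating of 0 from being read as a very poor score, at the cost of having to handle missing values downstream.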

In [145]:
jumia_df

Unnamed: 0,Product Name,Current Price,Old Price,Discount,Rating,URL,Photo,Vendor
0,Hp Refurbished EliteBook 840 G6 Intel Core I5-...,400660.0,500000.0,20,0.0,https://www.jumia.com.ng//hp-refurbished-elite...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1,"WOZIFAN 14.1""Intel Celeron N4020 6GB+256GB,SSD...",260300.0,1606500.0,84,3.8,https://www.jumia.com.ng//wozifan-14.1intel-ce...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
2,Hp Stream 11 Intel Celeron 2GB RAM- 32GB HDD W...,135000.0,0.0,0,5.0,https://www.jumia.com.ng//hp-stream-11-intel-c...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
3,Lenovo ThinkPad X1 Carbon G9 Touchscreen 11th ...,1658000.0,1780000.0,7,4.0,https://www.jumia.com.ng//lenovo-thinkpad-x1-c...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
4,XIAOMI Mi Pad 2 16 Or 64GB Windows 10 OS Tablet,75999.0,99999.0,24,5.0,https://www.jumia.com.ng//xiaomi-mi-pad-2-16-o...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
...,...,...,...,...,...,...,...,...
1995,Hp Elitebook X2 1012 G2 Intel Core I5 256GB SS...,700000.0,900000.0,22,0.0,https://www.jumia.com.ng//hp-elitebook-x2-1012...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1996,Hp 250 G9 12TH GEN INTEL CORE I3 16GB RAM 512G...,625800.0,0.0,0,0.0,https://www.jumia.com.ng//hp-250-g9-12th-gen-i...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1997,Hp ENVY 14 X360 13TH GEN INTEL CORE I7 16GB RA...,1319000.0,1590000.0,17,0.0,https://www.jumia.com.ng//hp-envy-14-x360-13th...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1998,Hp EliteBook 840 G8 11th Gen 16GB RAM/256GB SS...,595000.0,696000.0,15,0.0,https://www.jumia.com.ng//hp-elitebook-840-g8-...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site


In [147]:
jumia_df['Product Name'] = jumia_df['Product Name'].str.strip()

In [148]:
jumia_df

Unnamed: 0,Product Name,Current Price,Old Price,Discount,Rating,URL,Photo,Vendor
0,Hp Refurbished EliteBook 840 G6 Intel Core I5-...,400660.0,500000.0,20,0.0,https://www.jumia.com.ng//hp-refurbished-elite...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1,"WOZIFAN 14.1""Intel Celeron N4020 6GB+256GB,SSD...",260300.0,1606500.0,84,3.8,https://www.jumia.com.ng//wozifan-14.1intel-ce...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
2,Hp Stream 11 Intel Celeron 2GB RAM- 32GB HDD W...,135000.0,0.0,0,5.0,https://www.jumia.com.ng//hp-stream-11-intel-c...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
3,Lenovo ThinkPad X1 Carbon G9 Touchscreen 11th ...,1658000.0,1780000.0,7,4.0,https://www.jumia.com.ng//lenovo-thinkpad-x1-c...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
4,XIAOMI Mi Pad 2 16 Or 64GB Windows 10 OS Tablet,75999.0,99999.0,24,5.0,https://www.jumia.com.ng//xiaomi-mi-pad-2-16-o...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
...,...,...,...,...,...,...,...,...
1995,Hp Elitebook X2 1012 G2 Intel Core I5 256GB SS...,700000.0,900000.0,22,0.0,https://www.jumia.com.ng//hp-elitebook-x2-1012...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1996,Hp 250 G9 12TH GEN INTEL CORE I3 16GB RAM 512G...,625800.0,0.0,0,0.0,https://www.jumia.com.ng//hp-250-g9-12th-gen-i...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1997,Hp ENVY 14 X360 13TH GEN INTEL CORE I7 16GB RA...,1319000.0,1590000.0,17,0.0,https://www.jumia.com.ng//hp-envy-14-x360-13th...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1998,Hp EliteBook 840 G8 11th Gen 16GB RAM/256GB SS...,595000.0,696000.0,15,0.0,https://www.jumia.com.ng//hp-elitebook-840-g8-...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site


In [151]:
jumia_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1994 entries, 0 to 1999
Data columns (total 8 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Product Name   1994 non-null   object 
 1   Current Price  1994 non-null   float64
 2   Old Price      1994 non-null   float64
 3   Discount       1994 non-null   int64  
 4   Rating         1994 non-null   float64
 5   URL            1994 non-null   object 
 6   Photo          1994 non-null   object 
 7   Vendor         1994 non-null   object 
dtypes: float64(3), int64(1), object(4)
memory usage: 140.2+ KB


In [153]:
jumia_df[jumia_df["Discount"] == 0]

Unnamed: 0,Product Name,Current Price,Old Price,Discount,Rating,URL,Photo,Vendor
2,Hp Stream 11 Intel Celeron 2GB RAM- 32GB HDD W...,135000.0,0.0,0,5.0,https://www.jumia.com.ng//hp-stream-11-intel-c...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
11,Lenovo THINKPAD X1 CARBON GEN 8 CORE I7-1035G7...,993900.0,0.0,0,0.0,https://www.jumia.com.ng//lenovo-thinkpad-x1-c...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
12,Acer TRAVELMATE B3 TMB311 CELERON N4020 4GB RA...,226850.0,0.0,0,0.0,https://www.jumia.com.ng//acer-travelmate-b3-t...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
29,Hp Stream 11 Pro- Intel Celeron - 4GB RAM - 64...,150000.0,0.0,0,4.6,https://www.jumia.com.ng//hp-stream-11-pro-int...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
33,Hp Stream11intel Celeron D/C 64GB HDD+4GB RAM+...,137000.0,0.0,0,3.3,https://www.jumia.com.ng//stream11intel-celero...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
...,...,...,...,...,...,...,...,...
1986,"DELL ALIENWARE M16 R1 INTEL CORE I9-13900HX,4T...",4850000.0,0.0,0,0.0,https://www.jumia.com.ng//alienware-m16-r1-int...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1990,One Flexible USB LED Lights For Laptop And Des...,4000.0,0.0,0,0.0,https://www.jumia.com.ng//usb-one-flexible-usb...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1991,"DELL XPS 13 9320 Plus,Intel Core I7-13TH GEN,1...",2400000.0,0.0,0,0.0,https://www.jumia.com.ng//dell-xps-13-9320-plu...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1993,"Hp CHROMEBOOK 11, INTEL CELERON, 4GB RAM,32GB ...",117000.0,0.0,0,4.4,https://www.jumia.com.ng//chromebook-intel-cel...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site


In [155]:
# count fully duplicated rows
jumia_df.duplicated().sum()

0

In [157]:
jumia_df[jumia_df.duplicated()]

Unnamed: 0,Product Name,Current Price,Old Price,Discount,Rating,URL,Photo,Vendor


In [159]:
jumia_df.drop_duplicates(inplace=True)
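
Called without arguments, `duplicated()` and `drop_duplicates()` compare entire rows, so a product listed twice with a slightly different price or rating would not be flagged. A sketch of a stricter check, assuming the URL identifies a product, passes `subset=["URL"]`. The sample rows and example.com URLs are hypothetical.

```python
import pandas as pd

# hypothetical listings: the first two rows share a URL but differ in price
listings = pd.DataFrame({
    "Product Name": ["Laptop A", "Laptop A", "Laptop B"],
    "Current Price": [400660.0, 399000.0, 135000.0],
    "URL": [
        "https://example.com/a",
        "https://example.com/a",
        "https://example.com/b",
    ],
})

print(listings.duplicated().sum())                # 0
print(listings.duplicated(subset=["URL"]).sum())  # 1

deduplicated = listings.drop_duplicates(subset=["URL"], keep="first")
print(len(deduplicated))                          # 2
```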

In [161]:
jumia_df

Unnamed: 0,Product Name,Current Price,Old Price,Discount,Rating,URL,Photo,Vendor
0,Hp Refurbished EliteBook 840 G6 Intel Core I5-...,400660.0,500000.0,20,0.0,https://www.jumia.com.ng//hp-refurbished-elite...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1,"WOZIFAN 14.1""Intel Celeron N4020 6GB+256GB,SSD...",260300.0,1606500.0,84,3.8,https://www.jumia.com.ng//wozifan-14.1intel-ce...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
2,Hp Stream 11 Intel Celeron 2GB RAM- 32GB HDD W...,135000.0,0.0,0,5.0,https://www.jumia.com.ng//hp-stream-11-intel-c...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
3,Lenovo ThinkPad X1 Carbon G9 Touchscreen 11th ...,1658000.0,1780000.0,7,4.0,https://www.jumia.com.ng//lenovo-thinkpad-x1-c...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
4,XIAOMI Mi Pad 2 16 Or 64GB Windows 10 OS Tablet,75999.0,99999.0,24,5.0,https://www.jumia.com.ng//xiaomi-mi-pad-2-16-o...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
...,...,...,...,...,...,...,...,...
1995,Hp Elitebook X2 1012 G2 Intel Core I5 256GB SS...,700000.0,900000.0,22,0.0,https://www.jumia.com.ng//hp-elitebook-x2-1012...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1996,Hp 250 G9 12TH GEN INTEL CORE I3 16GB RAM 512G...,625800.0,0.0,0,0.0,https://www.jumia.com.ng//hp-250-g9-12th-gen-i...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1997,Hp ENVY 14 X360 13TH GEN INTEL CORE I7 16GB RA...,1319000.0,1590000.0,17,0.0,https://www.jumia.com.ng//hp-envy-14-x360-13th...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1998,Hp EliteBook 840 G8 11th Gen 16GB RAM/256GB SS...,595000.0,696000.0,15,0.0,https://www.jumia.com.ng//hp-elitebook-840-g8-...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site


In [163]:
# convert the percentage to a fraction (e.g. 20 -> 0.20)
jumia_df["Discount"] = jumia_df["Discount"] / 100
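
As a sanity check, the discount fraction can be compared with the value implied by the two prices, `1 - current / old`. The sketch below works through two rows from the data shown in this notebook. `implied_discount` is a hypothetical helper, and it returns `None` when the old price is 0, since no discount can be computed in that case.

```python
def implied_discount(current_price, old_price):
    """Fraction saved relative to the old price; None when there is no old price."""
    if not old_price:
        return None
    return 1 - current_price / old_price


print(round(implied_discount(400660.0, 500000.0), 2))    # 0.2
print(round(implied_discount(1658000.0, 1780000.0), 2))  # 0.07
print(implied_discount(135000.0, 0.0))                   # None
```

Both results agree with the scraped discounts of 20% and 7% for those rows.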

In [165]:
# remove white space in the URL
jumia_df["URL"] = jumia_df["URL"].str.strip()

# replace the space with - in the URL
jumia_df["URL"] = jumia_df["URL"].str.replace(" ", "-")

In [167]:
jumia_df

Unnamed: 0,Product Name,Current Price,Old Price,Discount,Rating,URL,Photo,Vendor
0,Hp Refurbished EliteBook 840 G6 Intel Core I5-...,400660.0,500000.0,0.20,0.0,https://www.jumia.com.ng//hp-refurbished-elite...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1,"WOZIFAN 14.1""Intel Celeron N4020 6GB+256GB,SSD...",260300.0,1606500.0,0.84,3.8,https://www.jumia.com.ng//wozifan-14.1intel-ce...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
2,Hp Stream 11 Intel Celeron 2GB RAM- 32GB HDD W...,135000.0,0.0,0.00,5.0,https://www.jumia.com.ng//hp-stream-11-intel-c...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
3,Lenovo ThinkPad X1 Carbon G9 Touchscreen 11th ...,1658000.0,1780000.0,0.07,4.0,https://www.jumia.com.ng//lenovo-thinkpad-x1-c...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
4,XIAOMI Mi Pad 2 16 Or 64GB Windows 10 OS Tablet,75999.0,99999.0,0.24,5.0,https://www.jumia.com.ng//xiaomi-mi-pad-2-16-o...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
...,...,...,...,...,...,...,...,...
1995,Hp Elitebook X2 1012 G2 Intel Core I5 256GB SS...,700000.0,900000.0,0.22,0.0,https://www.jumia.com.ng//hp-elitebook-x2-1012...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1996,Hp 250 G9 12TH GEN INTEL CORE I3 16GB RAM 512G...,625800.0,0.0,0.00,0.0,https://www.jumia.com.ng//hp-250-g9-12th-gen-i...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1997,Hp ENVY 14 X360 13TH GEN INTEL CORE I7 16GB RA...,1319000.0,1590000.0,0.17,0.0,https://www.jumia.com.ng//hp-envy-14-x360-13th...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site
1998,Hp EliteBook 840 G8 11th Gen 16GB RAM/256GB SS...,595000.0,696000.0,0.15,0.0,https://www.jumia.com.ng//hp-elitebook-840-g8-...,https://ng.jumia.is/unsafe/fit-in/300x300/filt...,Jumia Site


## Save Clean Data to CSV

In [170]:
jumia_df.to_csv("jumia_clean_laptop.csv", index=False)