# Web Scraping the Jumia Site

**Objective** <br>
The goal of this script is to scrape product data from the Jumia website. The dataset will be transformed and stored to form a data catalog for our price prediction app.

## Step 1: Import the Necessary Libraries

In [154]:
import requests  # make a request to a url
from bs4 import BeautifulSoup  # parse the requests as html
import pandas as pd  # data manipulation

In [10]:
# `response` was created earlier by a requests.get(...) call on the Jumia laptops page
response.status_code

200

In [12]:
response.content

b'<!DOCTYPE html><html lang="en" dir="ltr"><head><meta charset="utf-8"/><title>Buy Laptops Online | Jumia.com.ng</title><meta property="og:type" content="product"/><meta property="og:site_name" content="Jumia Nigeria"/><meta property="og:title" content="Buy Laptops Online | Jumia.com.ng"/><meta property="og:description" content="Find amazing deals when you buy laptops online at Jumia Nigeria - notebooks, ultrabooks &amp; more - Top brands: HP, Samsung, Acer, Asus, Dell &amp; more\xe2\x9c\x94 Enjoy pay on delivery."/><meta property="og:url" content="/laptops/"/><meta property="og:image" content="https://ng.jumia.is/cms/jumialogonew.png"/><meta property="og:locale" content="en_NG"/><meta name="title" content="Buy Laptops Online | Jumia.com.ng"/><meta name="robots" content="index, follow"/><meta name="description" content="Find amazing deals when you buy laptops online at Jumia Nigeria - notebooks, ultrabooks &amp; more - Top brands: HP, Samsung, Acer, Asus, Dell &amp; more\xe2\x9c\x94 En

## Step 2: Create a Dictionary

In [156]:
product_data = {
    "Product Name": [],
    "Current Price": [],
    "Old Price": [],
    "Discount": [],
    "Rating": [],
    "Vendor": []
}
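
pandas expects every list in a dictionary like this to end up the same length, otherwise building the DataFrame raises a `ValueError`. The following is a minimal sketch of a sanity check that could be run before Step 4. The helper names `column_lengths` and `assert_equal_lengths` and the `sample` data are hypothetical and not part of the original notebook.

```python
def column_lengths(data):
    """Return the number of collected values for each column."""
    return {column: len(values) for column, values in data.items()}


def assert_equal_lengths(data):
    """Raise ValueError if the columns do not all have the same length."""
    lengths = column_lengths(data)
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    return True


# Made-up sample in the same shape as product_data (illustration only)
sample = {
    "Product Name": ["Laptop A", "Laptop B"],
    "Current Price": ["₦ 100,000", "₦ 200,000"],
    "Old Price": ["₦ 150,000", ""],
}

print(column_lengths(sample))
assert_equal_lengths(sample)
```

Appending an empty string for a missing field, as the scraping loop does, is what keeps the columns aligned row by row.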

## Step 3: Loop through the pages

In [162]:



for page_num in range(1, 2):
    URL = f"https://www.jumia.com.ng/laptops/?page={page_num}#catalog-listing"

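    try:
        response = requests.get(url=URL, timeout=30)
    except requests.RequestException as error:
        # Skip this page if the request itself fails (timeout, DNS, connection)
        print(f"Request failed for {URL}: {error}")
        continue

    if response.status_code != 200:
        # Skip pages that do not return HTTP 200 so stale content is never parsed
        print(f"Request to {URL} returned status {response.status_code}")
        continue
    content = response.content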

    # Parse the page HTML
    soup = BeautifulSoup(content, "html.parser")
    # Each product card is an <article> element with these classes
    articles = soup.find_all('article', class_="prd _fb col c-prd")

    # Extract the fields from each product card
    for article in articles:

        # Record the source site for every row
        product_data["Vendor"].append("Jumia")
        
        # Product title; store an empty string when the tag is missing
        name = article.find('h3', class_='name')
        if name is not None:
            product_data['Product Name'].append(name.text)
        else:
            product_data['Product Name'].append("")

        # Current selling price
        current_price = article.find('div', class_='prc')
        if current_price is not None:
            product_data["Current Price"].append(current_price.text)
        else:
            product_data["Current Price"].append("")

        # Price before the discount, if any
        old_price = article.find('div', class_='old')
        if old_price is not None:
            product_data["Old Price"].append(old_price.text)
        else:
            product_data["Old Price"].append("")

        # Discount badge, e.g. "46%"
        discount = article.find('div', class_='bdg _dsct _sm')
        if discount is not None:
            product_data["Discount"].append(discount.text)
        else:
            product_data["Discount"].append("")

        # Star rating, e.g. "4.5 out of 5"
        rating = article.find('div', class_='stars _s')
        if rating is not None:
            product_data["Rating"].append(rating.text)
        else:
            product_data["Rating"].append("")
            

    print(f"Done Collecting Data from {URL}")

Done Collecting Data from https://www.jumia.com.ng/laptops/?page=1#catalog-listing
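
The loop above uses `range(1, 2)`, which yields only page 1. The `#catalog-listing` part of the URL is a fragment, which is handled by the browser and is not sent to the server with the request. Below is a small sketch of how the page URLs could be generated for a wider range. The helpers `build_page_url` and `page_urls` are hypothetical, and the number of pages that actually exist on the site is an assumption that would need checking.

```python
from urllib.parse import urlencode, urlsplit

BASE_URL = "https://www.jumia.com.ng/laptops/"


def build_page_url(page_num):
    """Build the listing URL for one page, without the fragment."""
    return f"{BASE_URL}?{urlencode({'page': page_num})}"


def page_urls(first_page, last_page):
    """Return the URLs for first_page..last_page, both inclusive."""
    return [build_page_url(n) for n in range(first_page, last_page + 1)]


# Split the URL used in the loop to show which part is the fragment
parts = urlsplit("https://www.jumia.com.ng/laptops/?page=1#catalog-listing")
print(parts.query)      # page=1
print(parts.fragment)   # catalog-listing

# Assumed example range: pages 1 to 3
for url in page_urls(1, 3):
    print(url)
```

When requesting several pages in a row, a short pause between requests (for example with `time.sleep`) is a common courtesy to the site.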


## Step 4: Store in a DataFrame

In [164]:
jumia_laptop_df = pd.DataFrame.from_dict(product_data)
jumia_laptop_df

| | Product Name | Current Price | Old Price | Discount | Rating | Vendor |
|---|---|---|---|---|---|---|
| 0 | AOCWEI 14.1" Intel Celeron N4020 6GB+256GB, SS... | ₦ 294,325 | ₦ 547,000 | 46% | 4.5 out of 5 | Jumia |
| 1 | WOZIFAN 14.1"Intel Celeron N4020 6GB+256GB,SSD... | ₦ 239,258 | ₦ 1,606,500 | 85% | 3 out of 5 | Jumia |
| 2 | AOCWEI Laptop Windows 11 Intel Celeron 6GB+256... | ₦ 267,384 | ₦ 1,666,000 | 84% | 5 out of 5 | Jumia |
| 3 | Hp ProBook 11 X360- TOUCH- 512GB SSD/4GB RAM-I... | ₦ 270,000 | ₦ 900,000 | 70% | 4 out of 5 | Jumia |
| 4 | DELL Latitude 3190 Intel Celeron 128GB SSD, 4G... | ₦ 148,000 | ₦ 300,000 | 51% | 5 out of 5 | Jumia |
| ... | ... | ... | ... | ... | ... | ... |
| 75 | DELL Latitude 5400 Intel Core I7-1TB SSD/16GB ... | ₦ 750,000 | ₦ 980,000 | 23% | 5 out of 5 | Jumia |
| 76 | Hp Notebook 15-AMD RYZEN 3 -16GB RAM/1TBGB HDD... | ₦ 345,000 | | | 2 out of 5 | Jumia |
| 77 | Hp Stream 11 2GB, 32GB SSD +32gb Flash,USB LED... | ₦ 120,000 | | | | Jumia |
| 78 | DELL Latitude 3190 INTEL CELERON 128GB SSD 4GB... | ₦ 145,500 | | | 2 out of 5 | Jumia |
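
The scraped values are still strings such as `₦ 294,325`, `46%` and `4.5 out of 5`, so they will likely need converting to numbers before they can feed a price prediction model. The sketch below shows one possible way to do that with the standard library. The function names `parse_price`, `parse_discount` and `parse_rating` are hypothetical, and the patterns assume the formats seen in the output above (whole-number prices with comma separators, ratings written as "N out of 5").

```python
import re


def parse_price(text):
    """Convert '₦ 294,325' to 294325; return None for an empty string."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


def parse_discount(text):
    """Convert '46%' to 46; return None for an empty string."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


def parse_rating(text):
    """Convert '4.5 out of 5' to 4.5; return None for an empty string."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+out of", text)
    return float(match.group(1)) if match else None


print(parse_price("₦ 294,325"))
print(parse_discount("46%"))
print(parse_rating("4.5 out of 5"))
```

With pandas, helpers like these could be applied per column, for example `jumia_laptop_df["Current Price"].apply(parse_price)`. Missing values then become `None` rather than empty strings.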


## Step 5: Store data into CSV

In [168]:
jumia_laptop_df.to_csv("jumia_laptop.csv", index=False)