#### Project Statement 
_____
- In this project, we extract data from Jumia (www.jumia.co.ke), an e-commerce website.

- We scrape the website to find products that are currently discounted.

- The data is then loaded into a Postgres database hosted on Aiven (https://aiven.io/).

#### Key libraries for this project:
___

1. Beautiful Soup - `pip install beautifulsoup4`

2. Pandas - `pip install pandas`

3. Requests - `pip install requests`

4. lxml (the parser passed to Beautiful Soup) - `pip install lxml`

### Step 1: Setting up the project

- Importing the libraries,

- Setting project variables

In [49]:
# Importing the necessary libraries

from bs4 import BeautifulSoup 
import pandas as pd 
import lxml
import requests 
import time

In [110]:
BASE_URL = "https://www.jumia.co.ke/{}/?page={}#catalog-listing" # URL template: the first {} is the category slug, the second {} is the page number

# This list will hold the product categories we shall scrape
PRODUCT_CATEGORIES = [
    "electronics",
    "phones-tablets",
    # "category-fashion-by-jumia",
    # "home-office",
    # "health-beauty",
    # "home-office-appliances",
    # "computing",
    # "baby-products",
    # "sporting-goods"
]

MAX_PAGE_COUNT = 3 # Sets the number of pages to scrape for every product category. Max = 50

# Send a browser-like User-Agent with every HTTP request, instead of the
# default "python-requests/<version>" User-Agent set by the requests library.
PAGE_HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'}
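
To see how these variables combine, the sketch below expands the URL template for every category and page. It repeats the relevant settings so that it runs on its own; `page_urls` is a name introduced here for illustration only and is not used elsewhere in the project.

```python
# Same settings as in the cell above, repeated so the example is self-contained
BASE_URL = "https://www.jumia.co.ke/{}/?page={}#catalog-listing"
PRODUCT_CATEGORIES = ["electronics", "phones-tablets"]
MAX_PAGE_COUNT = 3

# page_urls is a hypothetical helper list: one URL per (category, page) pair
page_urls = [
    BASE_URL.format(category, page)
    for category in PRODUCT_CATEGORIES
    for page in range(1, MAX_PAGE_COUNT + 1)  # pages are numbered from 1
]

for url in page_urls:
    print(url)
```

With two categories and three pages each, this yields six URLs, which is also the number of HTTP requests the scraper will make.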

### Step 2: Scrape the Website

In [111]:
def scrapper() -> list:
    """ 
    Scrape the Jumia category pages for products, their current prices, and their old (pre-discount) prices.

    Returns:
        all_products (list): A list of dictionaries, one per scraped product.
    """ 

    all_products = [] # The scraped products will be added here as a list of dictionaries

    current_page_num = 1 # Holds the number of the page currently being scraped

    # Looping through the product categories of interest
    for product_category in PRODUCT_CATEGORIES:

        # Stop after MAX_PAGE_COUNT pages so we don't request pages that don't exist
        while current_page_num <= MAX_PAGE_COUNT: 

            response = requests.get(BASE_URL.format(product_category, current_page_num), headers=PAGE_HEADERS) 
            print("Scraping", product_category, "page number", current_page_num)

            soup = BeautifulSoup(response.text, 'lxml') # Create a soup 

            products_wrapper = soup.find_all("article", {"class": "prd _fb col c-prd"})  # Find all the HTML tags wrapping each product
            
            # Loop over the wrappers and extract the details of each product
            for product in products_wrapper:
                product_name = product.find("h3", {"class": "name"}).text # Access the product name 

                current_price = product.find("div", {"class": "prc"}).text # Access the current price 
 
                try: # Products without a discount have no old price element
                    old_price = product.find("div", {"class": "old"}).text
                except AttributeError: # find() returned None, so .text failed
                    old_price = "0" 

                # Create a dictionary for this product and append to the list all_products
                current_product_details = {
                    "product_name": product_name,
                    "category": product_category,
                    "current_price": current_price,
                    "old_price": old_price
                } 

                all_products.append(current_product_details)
            
            current_page_num = current_page_num + 1 # Increment this to move to the next page

            # Pause for 4 seconds before making the next request
            print("Scraper going to sleep...")
            time.sleep(4)
            
        # Reset the page counter when done with each category
        current_page_num = 1

    return all_products 
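
The extraction logic can be tried offline before running it against the live site. The sketch below is a minimal illustration under stated assumptions: `SAMPLE_HTML` is made-up markup that only mimics the class names used in `scrapper()`, and `text_or_default` is a hypothetical helper, not part of Beautiful Soup. It uses the built-in `html.parser` so that it runs without lxml.

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics the class names used by scrapper(); not copied from Jumia
SAMPLE_HTML = """
<article class="prd _fb col c-prd">
  <h3 class="name">Phone A</h3>
  <div class="prc">KSh 12,999</div>
  <div class="old">KSh 15,999</div>
</article>
<article class="prd _fb col c-prd">
  <h3 class="name">TV B</h3>
  <div class="prc">KSh 24,500</div>
</article>
"""


def text_or_default(parent, tag, css_class, default="0"):
    """Return the stripped text of the first matching child, or default if it is missing."""
    element = parent.find(tag, class_=css_class)
    return element.get_text(strip=True) if element is not None else default


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

products = [
    {
        "product_name": text_or_default(article, "h3", "name", default=""),
        "current_price": text_or_default(article, "div", "prc"),
        "old_price": text_or_default(article, "div", "old"),  # "0" when there is no discount
    }
    for article in soup.select("article.prd")  # matches on a single class only
]

print(products)
```

Checking for `None` explicitly avoids a `try`/`except` around every lookup, and selecting on the single class `prd` keeps matching if the site adds or reorders the other classes on the `article` tag.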

### Step 3: Data Storage 
___ 
This step stores the scraped data in the database.

The implementation details are as follows:

- Move the data to a Pandas DataFrame. 

- Perform some data cleaning tasks, e.g., transformations 

- Set up the database

- Use Pandas to move the cleaned data to our database.
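
The steps above can be sketched end to end as follows. This is a rough outline under several assumptions, not the final implementation: the sample rows are invented, prices are assumed to be single amounts such as `KSh 12,999` (price ranges are not handled), the table name `discounted_products` and the helper `price_to_number` are hypothetical, and an in-memory SQLite database stands in for the Aiven Postgres instance so the example runs without credentials.

```python
import sqlite3

import pandas as pd

# Invented rows in the same shape as the dictionaries returned by scrapper()
sample_products = [
    {"product_name": "Phone A", "category": "phones-tablets",
     "current_price": "KSh 12,999", "old_price": "KSh 15,999"},
    {"product_name": "TV B", "category": "electronics",
     "current_price": "KSh 24,500", "old_price": "0"},
]


def price_to_number(series: pd.Series) -> pd.Series:
    """Drop everything except digits and dots, then convert to a numeric column.

    Assumes one amount per cell; a range such as "KSh 1 - KSh 2" needs extra handling.
    """
    digits_only = series.str.replace(r"[^\d.]", "", regex=True)
    return pd.to_numeric(digits_only, errors="coerce")


# 1. Move the data to a DataFrame
df = pd.DataFrame(sample_products)

# 2. Clean: turn the price strings into numbers
df["current_price"] = price_to_number(df["current_price"])
df["old_price"] = price_to_number(df["old_price"])

# Keep only products whose old price is higher than the current price
discounted = df[df["old_price"] > df["current_price"]].copy()
discounted["discount_pct"] = (
    (1 - discounted["current_price"] / discounted["old_price"]) * 100
).round(1)

# 3. and 4. Set up the database and write the cleaned data (SQLite as a stand-in)
connection = sqlite3.connect(":memory:")
discounted.to_sql("discounted_products", connection, if_exists="replace", index=False)
stored = pd.read_sql_query("SELECT * FROM discounted_products", connection)
connection.close()

print(stored)
```

For the Postgres database, `to_sql` would instead be given a SQLAlchemy engine created from the connection URI of the Aiven service; the rest of the pipeline stays the same.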