# Web Scraping MobileMasr.com

In this notebook, I demonstrate **two different approaches** to writing Python web scraping code:

1. **Monolithic Approach (All-in-One Code)**  
   - The entire scraping process is written as one continuous block of code.  
   - Fetching pages, parsing HTML, extracting product details, and writing to CSV all happen inside the same loop.  
   - This is simple for small scripts but becomes hard to read and maintain as a project grows.

2. **Modular Approach (Using Functions)**  
   - The code is divided into **functions**, one per task:
     - Creating the CSV file
     - Fetching a webpage
     - Parsing a product card
     - Writing data to CSV
     - Main loop controlling the scraping
   - This improves **readability, reusability, and maintainability**.
   - Each function has a single, clear responsibility, which makes the code easier to debug and extend.

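Both approaches produce the same end result: mobile phone listings scraped from the website and saved to a CSV file.  
The main difference lies in **code organization and readability**; the modular version also adds a request timeout and basic error handling when fetching pages.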


# -----------------------------------------------------------------------------------------------------------

# 1- Monolithic Approach (All-in-One Code)

In [1]:
# Libraries
import requests
from bs4 import BeautifulSoup
import csv
import re
import pandas as pd

In [4]:
# Create CSV file and write header
with open('mobile_misr_phones_1.csv','w',encoding='utf-8',newline='') as f:
    writer = csv.writer(f)  # create CSV writer
    writer.writerow(['product_name','price','seller','status','battery_condition','warranty_period',
                     'memory_size','RAM_size','color','page_number','product_url'])  # write column headers

page_number = 1  # start with first page
count = 0  # counter for products scraped on the current page
status_dict = {'جديد':'new','مستعمل':'used'}  # map Arabic status to English

while True:  # loop through pages until no more
    url = f'https://mobilemasr.com/category/%D9%85%D9%88%D8%A8%D8%A7%D9%8A%D9%84/products?page={page_number}'  # construct page URL
    response = requests.get(url)  # fetch page content
    soup = BeautifulSoup(response.text,'html.parser')  # parse HTML content
    print(f'( Starting to scrape page : {page_number} )'.center(80, " "))  # print status
    print("-------------------------------".center(80, " "))

    # open CSV in append mode to write product data
    with open('mobile_misr_phones_1.csv','a',encoding='utf-8',newline='') as f:
        writer = csv.DictWriter(f, fieldnames=['product_name','price','seller','status','battery_condition',
                                               'warranty_period','memory_size','RAM_size','color','page_number','product_url'])
        
        dash_line = "."  # for loading indicator

        # iterate over all product links on the page
        for item in soup.find_all('a', attrs={'class':'h-[55px] p-0 m-0'}):
            print(f"\rLoading {dash_line}", end="")  # show loading dots
            dash_line += "."
            seller, warranty_period, battery_condition = [], [], []  # initialize lists

            # extract listing-page info from the product card: battery condition and warranty
            product_card = item.find_parent('div', class_='product-card')  # find parent card
            if product_card:
                for span in product_card.select('span[class*="inline-flex"]'):
                    if re.search(r'[A-Za-z%]', span.get_text()):
                        battery_condition.append(span.get_text())  # battery info
                    else:
                        warranty_period.append(span.get_text())  # warranty info

            # go to product detail page for more info
            link_tag = item.get('href')  # get product link
            full_url = "https://mobilemasr.com" + link_tag  # complete URL
            second_response = requests.get(full_url)  # fetch product page
            second_soup = BeautifulSoup(second_response.text, "html.parser")  # parse HTML
            
            # Extract memory size, RAM size, color
            phone_details = [span.get_text(strip=True) for span in second_soup.find_all('span', attrs={'class': 'mx-1'})[:3]]  # memory, RAM, color

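            # extract seller name: try the <h3> first, fall back to the <span>
            seller_tag = second_soup.find('h3', attrs={'class': 'mx-2 mt-1'})
            if seller_tag is None:
                seller_span = second_soup.find('span', attrs={'class': 'mx-1 text-sm'})
                if seller_span is not None:  # avoid AttributeError when neither element exists
                    seller.append(seller_span.get_text())
            else:
                seller.append(seller_tag.contents[0].strip())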

            # write product data to CSV
            writer.writerow({
                'product_name': (re.split(r'\s+(?:رامات|جيجابايت)\s+', second_soup.find('h1').get_text())[0].strip()
                                 if second_soup.find('h1') else 'N/A'), 
                'price': (second_soup.find('h4').get_text().split()[0] if second_soup.find('h4') else 'N/A'),  
                'seller': (seller[0] if seller else 'N/A'),  # seller (same 'N/A' fallback as the other fields)
                'status': (status_dict.get(second_soup.select('span[class*="text-xs"]')[0].get_text(strip=True))
                           if second_soup.select('span[class*="text-xs"]') else 'N/A'), 
                'memory_size': phone_details[0].split()[0] if len(phone_details) > 0 else 'N/A',  
                'RAM_size': phone_details[1].split()[0] if len(phone_details) > 1 else 'N/A',  
                'color': phone_details[2].split()[0] if len(phone_details) > 2 else 'N/A',  
                'page_number': page_number,  
                'product_url': full_url, 
                'warranty_period': (warranty_period[0] if warranty_period else 'N/A'),  
                'battery_condition': (battery_condition[0] if battery_condition else 'N/A')  
            })
            count += 1  # increase product counter (the lists are re-initialized at the top of the loop)

    # print summary for the page
    print(f"\nSuccessfully scraped page ( {page_number} ) with ( {count} ) products.".center(80, " "))
    count = 0  # reset the per-page counter
    print('='*80)

    # check if there are more pages
    if len(soup.find_all('span', attrs={'class':'hidden sm:flex'})) != 0:
        page_number += 1  # next page
    else:
        print('Finished All Pages')  # no more pages
        break


                        ( Starting to scrape page : 1 )                         
                        -------------------------------                         
Loading ........................             
Successfully scraped page ( 1 ) with ( 24 ) products.             
                        ( Starting to scrape page : 2 )                         
                        -------------------------------                         
Loading ........................             
Successfully scraped page ( 2 ) with ( 24 ) products.             
                        ( Starting to scrape page : 3 )                         
                        -------------------------------                         
Loading ........................             
Successfully scraped page ( 3 ) with ( 24 ) products.             
                        ( Starting to scrape page : 4 )                         
                        -------------------------------                         
Loading .....

Loading ........................            
Successfully scraped page ( 24 ) with ( 24 ) products.             
                        ( Starting to scrape page : 25 )                        
                        -------------------------------                         
Loading ........................            
Successfully scraped page ( 25 ) with ( 24 ) products.             
                        ( Starting to scrape page : 26 )                        
                        -------------------------------                         
Loading ........................            
Successfully scraped page ( 26 ) with ( 24 ) products.             
                        ( Starting to scrape page : 27 )                        
                        -------------------------------                         
             
Successfully scraped page ( 27 ) with ( 0 ) products.             
Finished All Pages
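
The listing URL in the loop above hard-codes a percent-encoded path segment, and product links are completed by string concatenation. As a minimal sketch using only the standard library's `urllib.parse`, the same URLs could be built as shown here. The helper names `listing_url` and `absolute_url` are hypothetical and do not appear in the notebook.

```python
from urllib.parse import quote, urljoin

BASE_URL = "https://mobilemasr.com"
CATEGORY = "موبايل"  # decoded form of %D9%85%D9%88%D8%A8%D8%A7%D9%8A%D9%84


def listing_url(page_number):
    """Build the category listing URL for one page (hypothetical helper)."""
    return f"{BASE_URL}/category/{quote(CATEGORY)}/products?page={page_number}"


def absolute_url(href):
    """Resolve a product link against the site root (hypothetical helper)."""
    return urljoin(BASE_URL, href)


print(listing_url(1))
print(absolute_url("/products/show/example"))  # placeholder path, not a real product
```

`urljoin` also leaves an already absolute `href` unchanged, which plain concatenation would not.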


In [5]:
df = pd.read_csv('mobile_misr_phones_1.csv')
df

Unnamed: 0,product_name,price,seller,status,battery_condition,warranty_period,memory_size,RAM_size,color,page_number,product_url
0,أبل أيفون 16 Pro Max,77000,Harmony1,used,100 %,ضمان ٣٠ يوم,512,8,تيتانيوم,1,https://mobilemasr.com/products/show/%D8%A3%D8...
1,ايفون 15 Pro Max,55680,Harmony1,used,87 %,ضمان ٣٠ يوم,256,8,تيتانيوم,1,https://mobilemasr.com/products/show/%D8%A7%D9...
2,هونر Magic 6 Pro,37250,Harmony1,used,very good,ضمان ٣٠ يوم,512,12,أسود,1,https://mobilemasr.com/products/show/%D9%87%D9...
3,أبل أيفون 16 Pro Max,68600,Harmony+,used,100 %,ضمان ٣٠ يوم,256,8,تايتنيوم,1,https://mobilemasr.com/products/show/%D8%A3%D8...
4,فيفو V50 Lite 5G,15100,Harmony1,new,,ضمان محلي 12 شهر,256,12,ذهبي,1,https://mobilemasr.com/products/show/%D9%81%D9...
...,...,...,...,...,...,...,...,...,...,...,...
619,شاومي Redmi Note 14 Pro Plus 5G,21380,Harmony+,new,,ضمان محلي 12 شهر,256,12,بنفسجي,26,https://mobilemasr.com/products/show/%D8%B4%D8...
620,ريلمي Note 50,6035,Harmony1,new,,ضمان محلي 12 شهر,128,4,أسود,26,https://mobilemasr.com/products/show/%D8%B1%D9...
621,ايفون 16 Pro Max,73000,FM store,new,,ضمان دولي 12 شهر,256,8,تيتانيوم,26,https://mobilemasr.com/products/show/%D8%A7%D9...
622,ايفون 14 Pro Max,47320,Harmony+,used,85 %,ضمان ٣٠ يوم,256,6,أسود,26,https://mobilemasr.com/products/show/%D8%A7%D9...
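
The `battery_condition` column in the table above mixes percentages (`100 %`, `87 %`), text grades (`very good`), and missing values for new phones. The following is a rough sketch of how the percentages could be turned into numbers before analysis. The helper name `battery_percent` is an assumption and is not part of the notebook.

```python
import re


def battery_percent(value):
    """Return the battery health as an int, or None when no percentage is given.

    Hypothetical helper: handles strings such as "87 %" and maps text
    grades like "very good" (and missing values) to None.
    """
    if not isinstance(value, str):
        return None  # e.g. NaN read back by pandas for new phones
    match = re.fullmatch(r"\s*(\d{1,3})\s*%\s*", value)
    return int(match.group(1)) if match else None


for sample in ["100 %", "87 %", "very good", None]:
    print(repr(sample), "->", battery_percent(sample))
```

On the DataFrame, something like `df['battery_condition'].map(battery_percent)` should apply the helper to every row.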


# ---------------------------------------------------------------------------------------------------------------

# 2- Modular Approach (Using Functions)

In [14]:
# Variables shared by the functions below

csv_file = 'mobile_misr_phones_1.csv'
fieldnames = ['product_name','price','seller','status','battery_condition','warranty_period',
              'memory_size','ram_size','color','page_number','product_url']
status_dict = {'جديد':'new','مستعمل':'used'}

In [6]:
def create_csv():
    """
    Create a CSV file and write the header row.
    
    This function initializes the CSV file where all scraped
    mobile phone data will be saved.
    """
    with open(csv_file,'w',encoding='utf-8',newline='') as f:
        writer = csv.writer(f)
        writer.writerow(fieldnames)

In [7]:
def fetch_page(url):
    """
    Fetch a webpage and return a BeautifulSoup object.
    
    Parameters:
        url (str): The URL of the webpage to fetch.
    
    Returns:
        BeautifulSoup object if the page is fetched successfully, otherwise None in case of a request error.
    """
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return BeautifulSoup(response.text,'html.parser')
    except requests.RequestException as e:
        print(f"[Error] Failed to fetch {url}: {e}")
        return None
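
`fetch_page` gives up after a single failed attempt, so one read timeout is enough to lose a product or a whole listing page. One possible extension, sketched here under the assumption that a short wait and a second try are acceptable, wraps any fetcher that returns `None` on failure. The name `fetch_with_retries` and its parameters are illustrative and not part of the notebook.

```python
import time


def fetch_with_retries(fetch, url, attempts=3, delay=2.0, sleep=time.sleep):
    """Call `fetch(url)` until it returns a non-None result or attempts run out.

    `fetch` is any callable that returns None on failure, which is the
    contract of `fetch_page`. All parameter names are assumptions.
    """
    for attempt in range(1, attempts + 1):
        result = fetch(url)
        if result is not None:
            return result
        if attempt < attempts:
            sleep(delay * attempt)  # wait a little longer after each failure
    return None


# Demonstration with a stand-in fetcher that fails twice, then succeeds.
calls = []


def flaky_fetch(url):
    calls.append(url)
    return "parsed page" if len(calls) >= 3 else None


# The sleep function is replaced so the demonstration does not actually wait.
print(fetch_with_retries(flaky_fetch, "https://example.com", sleep=lambda s: None))
```

In the scraper this would be called as `fetch_with_retries(fetch_page, full_url)`.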

In [8]:
def parse_product_card(item, page_number):
    """
    Extract product details from the main page and the product detail page.
    
    Parameters:
        item : The HTML element containing the product card.
        page_number (int): The current page number being scraped.
    
    Returns:
        dict: A dictionary containing all extracted product information.
        None: If the product detail page could not be fetched.
    """
    seller_list, warranty_list, battery_list = [], [], []

    # Extract basic info from the main page
    product_card = item.find_parent('div', class_='product-card')
    if product_card:
        for span in product_card.select('span[class*="inline-flex"]'):
            if re.search(r'[A-Za-z%]', span.get_text()):
                battery_list.append(span.get_text())
            else:
                warranty_list.append(span.get_text())

    # Extract detailed info from the product's own page
    link_tag = item.get('href')
    full_url = "https://mobilemasr.com" + link_tag
    second_soup = fetch_page(full_url)
    if second_soup is None:
        return None

    # Extract memory size, RAM size, color
    phone_details = [span.get_text(strip=True) for span in second_soup.find_all('span', attrs={'class': 'mx-1'})[:3]]

    # Extract seller information: try the <h3> first, fall back to the <span>
    seller_tag = second_soup.find('h3', attrs={'class': 'mx-2 mt-1'})
    if seller_tag is None:
        seller_span = second_soup.find('span', attrs={'class': 'mx-1 text-sm'})
        if seller_span is not None:  # avoid AttributeError when neither element exists
            seller_list.append(seller_span.get_text())
    else:
        seller_list.append(seller_tag.contents[0].strip())

    # Create a dictionary of all product information
    product_data = {
        'product_name': (re.split(r'\s+(?:رامات|جيجابايت)\s+', second_soup.find('h1').get_text())[0].strip()
                         if second_soup.find('h1') else 'N/A'),
        'price': (second_soup.find('h4').get_text().split()[0] if second_soup.find('h4') else 'N/A'),
        'seller': seller_list[0] if seller_list else 'N/A',
        'status': status_dict.get(second_soup.select('span[class*="text-xs"]')[0].get_text(strip=True))
                  if second_soup.select('span[class*="text-xs"]') else 'N/A',
        'memory_size': phone_details[0].split()[0] if len(phone_details) > 0 else 'N/A',
        'ram_size': phone_details[1].split()[0] if len(phone_details) > 1 else 'N/A',
        'color': phone_details[2].split()[0] if len(phone_details) > 2 else 'N/A',
        'page_number': page_number,
        'product_url': full_url,
        'warranty_period': warranty_list[0] if warranty_list else 'N/A',
        'battery_condition': battery_list[0] if battery_list else 'N/A'
    }

    return product_data
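
Two regex rules in `parse_product_card` do most of the text handling: one separates battery badges from warranty badges, the other trims the page title down to the product name. The sketch below isolates both so they can be tried on plain strings without any network access. The function names are hypothetical, and the sample title is an assumption modelled on the product URL slugs rather than a string copied from the site.

```python
import re


def classify_badges(texts):
    """Split card badge texts into (battery, warranty) lists.

    Same rule as in parse_product_card: a badge containing a Latin letter
    or '%' counts as battery info, anything else as warranty info.
    """
    battery, warranty = [], []
    for text in texts:
        target = battery if re.search(r"[A-Za-z%]", text) else warranty
        target.append(text)
    return battery, warranty


def short_name(title):
    """Keep the part of the title before the first RAM/storage keyword."""
    return re.split(r"\s+(?:رامات|جيجابايت)\s+", title)[0].strip()


# Badge texts as they appear in the scraped output.
print(classify_badges(["100 %", "ضمان ٣٠ يوم"]))
print(classify_badges(["very good", "ضمان ٣٠ يوم"]))

# Hypothetical page title, modelled on the product URL slugs.
print(short_name("أبل أيفون 16 Pro Max رامات 8 جيجابايت 256 جيجابايت تايتنيوم"))
```

One limitation of the badge rule is that a warranty badge containing Latin letters would end up in the battery list, so the split depends on the site's current wording.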

In [9]:
def write_to_csv(data):
    """
    Write a single product's data to the CSV file.
    
    Parameters:
        data (dict): A dictionary containing product information.
    """
    with open(csv_file,'a',encoding='utf-8',newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writerow(data)
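
`csv.DictWriter` matches dictionary keys against `fieldnames`, which is why the key spelling matters (the monolithic version uses `RAM_size`, the modular one `ram_size`). The following small sketch writes to an in-memory buffer instead of a file to show that behaviour; the shortened field list is only for illustration.

```python
import csv
import io

fieldnames = ["product_name", "price", "ram_size"]  # shortened for the example

buffer = io.StringIO()  # in-memory stand-in for the CSV file
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()  # writes the header row from fieldnames

# Key order in the dict does not matter; missing keys become empty cells.
writer.writerow({"price": "23450", "product_name": "ايفون 13"})

# A key that is not listed in fieldnames raises ValueError by default.
try:
    writer.writerow({"product_name": "ايفون 13", "RAM_size": "4"})
except ValueError as error:
    print("rejected:", error)

print(buffer.getvalue().splitlines())
```

Because the mismatched row is rejected before anything is written, the buffer holds only the header and the first row.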

In [11]:
def main():
    """
    Main function to control the scraping process.
    
    - Creates the CSV file
    - Iterates through all product pages
    - Fetches each page and parses all product cards
    - Writes the extracted data to the CSV
    - Handles request errors gracefully
    """
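    create_csv()
    page_number = 1
    count = 0
    failed_pages = 0  # consecutive listing pages that failed to load

    while True:
        url = f'https://mobilemasr.com/category/%D9%85%D9%88%D8%A8%D8%A7%D9%8A%D9%84/products?page={page_number}'
        soup = fetch_page(url)
        if soup is None:
            failed_pages += 1
            if failed_pages >= 3:  # give up instead of looping forever when the site is unreachable
                print('Stopping: too many consecutive fetch errors.')
                break
            print(f"Skipping page {page_number} due to fetch error.")
            page_number += 1
            continue
        failed_pages = 0  # reset after a successful fetch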

        print(f'( Starting to scrape page : {page_number} )'.center(80, " "))
        print("-------------------------------".center(80, " "))

        dash_line = "."
        for item in soup.find_all('a', attrs={'class':'h-[55px] p-0 m-0'}):
            print(f"\rLoading {dash_line}", end="")
            dash_line += "."
            product_data = parse_product_card(item, page_number)
            if product_data is not None:
                write_to_csv(product_data)
                count += 1

        print(f"\nSuccessfully scraped page ( {page_number} ) with ( {count} ) products.".center(80, " "))
        print('='*80)
        count = 0  # reset the per-page counter

        # Check if there are more pages
        if len(soup.find_all('span', attrs={'class':'hidden sm:flex'})) != 0:
            page_number += 1
        else:
            print('Finished All Pages')
            break

In [15]:
# Run Script 
if __name__ == "__main__":
    main()

                        ( Starting to scrape page : 1 )                         
                        -------------------------------                         
Loading ....[Error] Failed to fetch https://mobilemasr.com/products/show/%D8%A3%D8%A8%D9%84-%D8%A3%D9%8A%D9%81%D9%88%D9%86-16-pro-max-%D8%B1%D8%A7%D9%85%D8%A7%D8%AA-8-%D8%AC%D9%8A%D8%AC%D8%A7%D8%A8%D8%A7%D9%8A%D8%AA-256-%D8%AC%D9%8A%D8%AC%D8%A7%D8%A8%D8%A7%D9%8A%D8%AA-%D8%AA%D8%A7%D9%8A%D8%AA%D9%86%D9%8A%D9%88%D9%85-%D8%A3%D8%B3%D9%88%D8%AF-(titanium-black)-%D9%85%D9%85%D8%AA%D8%A7%D8%B2-vu016223: HTTPSConnectionPool(host='mobilemasr.com', port=443): Read timed out. (read timeout=10)
Loading ........................             
Successfully scraped page ( 1 ) with ( 23 ) products.             
                        ( Starting to scrape page : 2 )                         
                        -------------------------------                         
Loading ........................             
Successfully scraped page ( 

Loading ........................            
Successfully scraped page ( 21 ) with ( 24 ) products.             
                        ( Starting to scrape page : 22 )                        
                        -------------------------------                         
Loading ........................            
Successfully scraped page ( 22 ) with ( 24 ) products.             
                        ( Starting to scrape page : 23 )                        
                        -------------------------------                         
Loading ........................            
Successfully scraped page ( 23 ) with ( 24 ) products.             
                        ( Starting to scrape page : 24 )                        
                        -------------------------------                         
Loading ........................            
Successfully scraped page ( 24 ) with ( 24 ) products.             
                        ( Starting to scrape page : 25 )      

In [256]:
import pandas as pd
df = pd.read_csv('mobile_misr_phones_1.csv')
df

Unnamed: 0,product_name,price,seller,status,battery_condition,warranty_period,memory_size,ram_size,color,page_number,product_url
0,ايفون 13,23450,Nashwan Store,used,79 %,ضمان ٣٠ يوم,128,4,وردى,1,https://mobilemasr.com/products/show/%D8%A7%D9...
1,هونر 400,23400,Harmony1,new,,ضمان محلي 12 شهر,512,12,ذهبي,1,https://mobilemasr.com/products/show/%D9%87%D9...
2,فيفو Y19s Pro,6865,El akhwa,new,,ضمان محلي 12 شهر,128,6,فضى,1,https://mobilemasr.com/products/show/%D9%81%D9...
3,انفنيكس Hot 50i 4G,5720,El akhwa,new,,ضمان محلي 12 شهر,256,8,اخضر,1,https://mobilemasr.com/products/show/%D8%A7%D9...
4,ايفون 16 Pro Max,68850,Harmony+,used,100 %,ضمان ٣٠ يوم,256,8,تيتانيوم,1,https://mobilemasr.com/products/show/%D8%A7%D9...
5,سامسونج جلاكسي S24 Ultra,46350,El Hawakmy,used,very good,ضمان ٣٠ يوم,512,12,تايتنيوم,1,https://mobilemasr.com/products/show/%D8%B3%D8...
6,سامسونج جلاكسي A36 5G,17180,Harmony+,new,,ضمان محلي 12 شهر,256,8,أسود,1,https://mobilemasr.com/products/show/%D8%B3%D8...
7,شاومي Poco F5 Pro,17375,Harmony+,used,very good,ضمان ٣٠ يوم,256,12,أسود,1,https://mobilemasr.com/products/show/%D8%B4%D8...
8,أبل أيفون 16 Pro,53200,Harmony+,used,92 %,ضمان ٣٠ يوم,128,8,أبيض,1,https://mobilemasr.com/products/show/%D8%A3%D8...
9,هونر X9c,15250,Harmony+,used,very good,ضمان ٣٠ يوم,256,12,تايتنيوم,1,https://mobilemasr.com/products/show/%D9%87%D9...
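
The `warranty_period` values above are free text, some written with Arabic-Indic digits (`ضمان ٣٠ يوم`) and some with Western digits (`ضمان محلي 12 شهر`). The following is a sketch of one way to extract the duration. The helper `parse_warranty` and the `UNITS` mapping are assumptions for illustration and cover only the two unit words visible in the output.

```python
import re

# Arabic unit words seen in the warranty_period column.
UNITS = {"يوم": "day", "شهر": "month"}


def parse_warranty(text):
    """Return (amount, unit) from a warranty string, or None if not recognised.

    Hypothetical helper. Both the regex digit class and int() accept
    Arabic-Indic digits, so no manual digit translation is needed.
    """
    if not isinstance(text, str):
        return None
    number = re.search(r"\d+", text)
    unit = next((english for arabic, english in UNITS.items() if arabic in text), None)
    if number is None or unit is None:
        return None
    return int(number.group()), unit


print(parse_warranty("ضمان ٣٠ يوم"))
print(parse_warranty("ضمان محلي 12 شهر"))
```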
