<a id="1"></a>
# <div style="text-align: center; background-color: #569db3; color: white; padding: 14px; line-height: 1;border-radius:20px">Document — Web Scraping [MobileMasr.com]</div>

## 🧭 Table of Contents
1. [⚙️ Code Approaches](#️code)
2. [🌐 Which Website Was Scraped](#-which-website-was-scraped)
3. [💾 Data Collected](#-data-collected)
4. [🚧 Challenges and Solutions](#-challenges-and-solutions)
5. [📘 How to Run the Script](#-how-to-run-the-script)

<h1 style="color:red; font-weight:bold; background-color:#f5f5f5; border:2px solid red; padding:10px; border-radius:10px; font-family:monospace;">
<code>1- Code approaches:-</code>
</h1>

In this notebook, I demonstrate **two different approaches** to writing Python web scraping code:

**<span style="color:blue;">1. Monolithic Approach (All-in-One Code)</span>**

   - In this approach, the entire scraping process is written in one continuous block of code.  
   - It includes fetching pages, parsing HTML, extracting product details, and writing to CSV all together.  
   - This method is simple for small scripts but can become hard to maintain and read for larger projects.

**<span style="color:blue;">2. Modular Approach (Using Functions)</span>**
   - In this approach, the code is divided into **functions** for each specific task, such as:
     - Creating the CSV file
     - Fetching a webpage
     - Parsing a product card
     - Writing data to CSV
     - Main loop controlling the scraping
   - This method improves **readability, reusability, and maintainability**.
   - Each function has a clear responsibility, making the code easier to debug and extend.

Both methods achieve the same end result: scraping mobile phone data from the website and saving it to a CSV file.  
The difference lies in **code organization and readability**.
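
As a rough illustration of the modular layout, the sketch below gives each task from the list above its own function. It is not the notebook's actual code: `fetch_page` is a hypothetical stand-in that returns hard-coded data instead of making an HTTP request, and the column list is shortened so the example runs offline.

```python
import csv
import io

# Shortened column list, for illustration only.
FIELDS = ["product_name", "price", "page_number"]


def create_csv(file_obj):
    """Write the header row and return a DictWriter bound to file_obj."""
    writer = csv.DictWriter(file_obj, fieldnames=FIELDS)
    writer.writeheader()
    return writer


def fetch_page(page_number):
    """Hypothetical stand-in for the real request: returns raw cards for a page."""
    fake_site = {
        1: [("Phone A", "1000"), ("Phone B", "2000")],
        2: [("Phone C", "3000")],
    }
    return fake_site.get(page_number, [])


def parse_card(card, page_number):
    """Turn one raw card into a row dictionary."""
    name, price = card
    return {"product_name": name, "price": price, "page_number": page_number}


def write_row(writer, row):
    """Append one row to the CSV."""
    writer.writerow(row)


def main(file_obj):
    """Loop over pages until one comes back empty; return the number of rows written."""
    writer = create_csv(file_obj)
    page, total = 1, 0
    while True:
        cards = fetch_page(page)
        if not cards:  # no more products, so stop paging
            break
        for card in cards:
            write_row(writer, parse_card(card, page))
            total += 1
        page += 1
    return total


buffer = io.StringIO()  # an in-memory file keeps the sketch self-contained
rows_written = main(buffer)
```

Because each function does one thing, the stand-in `fetch_page` could be swapped for a real request without touching the parsing or writing code.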


<h1 style="color:red; font-weight:bold; background-color:#f5f5f5; border:2px solid red; padding:10px; border-radius:10px; font-family:monospace;">
<code>2- Which Website Was Scraped :- </code>
</h1>

- **Website:** [MobileMasr.com](https://mobilemasr.com)  
- **Category:** Mobile Phones  
- The script navigates through **all pages** of the mobile phone category.

<h1 style="color:red; font-weight:bold; background-color:#f5f5f5; border:2px solid red; padding:10px; border-radius:10px; font-family:monospace;">
<code>3- Data Collected </code>
</h1>

Each row in the CSV file contains the following columns:

- **`product_name:`** Name of the mobile phone  
- **`price:`** Price of the mobile phone  
- **`seller:`** Seller name  
- **`status:`** Condition of the product (`new` / `used`)  
- **`battery_condition:`** Battery condition (if listed)  
- **`warranty_period:`** Warranty period (if listed)  
- **`memory_size:`** Storage size (e.g., 128 GB)  
- **`RAM_size:`** RAM capacity of the phone (e.g., 4 GB)  
- **`color:`** Phone color  
- **`page_number:`** Page number where the product was scraped  
- **`product_url:`** URL of the product page
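
A minimal sketch of how these columns could be written with the standard `csv` module. The sample row and its URL are made-up placeholders, and the use of `restval="N/A"` to fill absent fields is an assumption about one convenient way to do it, not necessarily what the script does.

```python
import csv
import io

COLUMNS = [
    "product_name", "price", "seller", "status", "battery_condition",
    "warranty_period", "memory_size", "RAM_size", "color",
    "page_number", "product_url",
]


def write_rows(file_obj, rows):
    """Write a header plus one line per row; missing keys become 'N/A'."""
    writer = csv.DictWriter(file_obj, fieldnames=COLUMNS, restval="N/A")
    writer.writeheader()
    writer.writerows(rows)


# Placeholder row: values are invented for the example.
sample_rows = [
    {
        "product_name": "Phone A",
        "price": "1000",
        "status": "used",
        "page_number": 1,
        "product_url": "https://example.com/phone-a",
    }
]

output = io.StringIO()
write_rows(output, sample_rows)
csv_text = output.getvalue()
```

When writing to a real file, opening it with `newline=""` and `encoding="utf-8"` avoids blank lines on Windows and keeps Arabic text intact.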


<h1 style="color:red; font-weight:bold; background-color:#f5f5f5; border:2px solid red; padding:10px; border-radius:10px; font-family:monospace;">
<code>4- Some Challenges and Solutions :-</code>
</h1>



### <span style="color:blue;"><code>Challenge 1:</code> Dynamic Class Names in HTML</span>



**Problem:**  
Most of the website's elements (such as `<div>` and `<a>` tags) used **Bootstrap or dynamically generated class names**.  
These class names often changed when the website was updated, so the scraper could no longer find the elements and returned no data.

**Solution:**  
To make the scraper more stable and update-proof:
- I made the code depend on **static or consistent class names** whenever possible.  
- When all class names were dynamic, I used **CSS selectors with partial matches** (e.g., selecting a part of the class name that was unlikely to change).  
- This allowed the scraper to locate elements reliably even if the site structure changed slightly.
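
The partial-match idea can be sketched with BeautifulSoup's CSS selector support. The HTML and class names below (`product-card_a1b2c`, `title_x9y8`, and so on) are invented for the example and are not taken from the real site.

```python
from bs4 import BeautifulSoup

# Invented markup: the suffix after the underscore stands for the part that changes.
html = """
<div class="col-md-4 product-card_a1b2c">
  <a class="title_x9y8" href="/p/1">Phone A</a>
</div>
<div class="col-md-4 product-card_d3e4f">
  <a class="title_k2j1" href="/p/2">Phone B</a>
</div>
<div class="col-md-4 banner_zz11">Ad</div>
"""

soup = BeautifulSoup(html, "html.parser")

# [class*="..."] matches any element whose class attribute contains the substring.
cards = soup.select('div[class*="product-card"]')

# [class^="..."] matches when the class attribute starts with the given prefix.
names = [
    card.select_one('a[class^="title"]').get_text(strip=True)
    for card in cards
]
```

Only the stable part of each class name appears in the selector, so a change to the generated suffix does not break the match.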



### <span style="color:blue;"><code>Challenge 2:</code> Handling Request Failures</span>

**Problem:** Some requests failed due to network issues or missing pages.  
**Solution:** Wrapped each `requests.get()` call inside a `try-except` block to catch exceptions and continue scraping without stopping the program.
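
A minimal sketch of such a wrapper is shown below. The function name `fetch_page`, the 10-second timeout, and the choice to return `None` on failure are assumptions made for the example.

```python
import requests


def fetch_page(url, timeout=10):
    """Return the page HTML, or None if the request fails for any reason."""
    try:
        response = requests.get(url, timeout=timeout)
        # Treat 4xx/5xx responses (for example a missing page) as failures too.
        response.raise_for_status()
        return response.text
    except requests.RequestException as error:
        # RequestException is the base class for connection errors,
        # timeouts, invalid URLs, and HTTP errors raised above.
        print(f"Skipping {url}: {error}")
        return None
```

The caller can then check for `None` and move on to the next URL instead of stopping the whole run.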

### <span style="color:blue;"><code>Challenge 3:</code> Missing Data in Some Products</span>

**Problem:** Some product pages had missing fields like seller or warranty.  
**Solution:** Added conditional checks (`if ... else 'N/A'`) before writing data to the CSV to prevent errors and keep the dataset consistent.
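
One way to express that check is a small helper that returns `'N/A'` whenever an element is absent or empty. The helper name `text_or_na` and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup


def text_or_na(parent, selector):
    """Return the stripped text of the first match, or 'N/A' if nothing usable is found."""
    tag = parent.select_one(selector)
    text = tag.get_text(strip=True) if tag is not None else ""
    return text if text else "N/A"


# Invented card that has a seller but no warranty element.
html = '<div class="card"><span class="seller">Shop A</span></div>'
card = BeautifulSoup(html, "html.parser")

seller = text_or_na(card, ".seller")
warranty = text_or_na(card, ".warranty")
```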


### <span style="color:blue;"><code>Challenge 4:</code> Extracting Battery and Warranty Information</span>


**Problem:**  
On the main product page, both **battery condition** and **warranty period** were displayed inside similar `<span>` tags within the same parent container (`div.product-card`).  
There was no unique class or identifier to tell which one was which.  
Additionally, in some products, **one of the values was missing**, which made it even harder to identify which existing value belonged to the battery or the warranty.

**Solution:**  
To handle this issue:
- I analyzed the text inside each `<span>` to detect patterns.  
- If the text contained **letters or symbols like `%`**, it was identified as **battery condition** (e.g., “Battery 95%” or “very good”).  
- When a value was missing, I safely assigned **"N/A"** to that field so the script wouldn’t crash and the data structure stayed consistent.

This logic ensured that even if one of the fields was absent, the scraper still recorded the correct information in the right column.
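
The rule can be sketched as a small classifier over the span texts. The text above only states the battery rule, so the sketch assumes that any remaining value (for example a bare number) is the warranty period. That assumption and the function name are illustrative and not taken from the script.

```python
def split_battery_warranty(span_texts):
    """Assign each span text to battery or warranty; default both to 'N/A'."""
    battery, warranty = "N/A", "N/A"
    for raw in span_texts:
        text = raw.strip()
        if not text:
            continue
        # Battery rule from the text: letters or a percent sign.
        if "%" in text or any(ch.isalpha() for ch in text):
            battery = text
        else:
            # Assumption: whatever is left is the warranty value.
            warranty = text
    return battery, warranty


both = split_battery_warranty(["Battery 95%", "30"])
only_warranty = split_battery_warranty(["30"])
nothing = split_battery_warranty([])
```

`str.isalpha()` also recognizes Arabic letters, so the same check applies to Arabic labels.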


### <span style="color:blue;"><code>Challenge 5:</code> Extracting and Cleaning Product Names</span>


**Problem:**  
The product name (`<h1>` tag) on the details page often included **extra technical details** such as memory size or RAM (e.g., "iPhone 11 128 جيجابايت رامات 4").  
This made the product name inconsistent and hard to analyze later because those details were already extracted separately in other fields.  
Additionally, in some cases, the `<h1>` tag itself was missing, which could cause the scraper to throw an error.

**Solution:**  
To solve this:
- I used a **regular expression with `re.split()`** to split the text at patterns like `"رامات"` or `"جيجابايت"`, keeping only the clean product name before these words. 

```python
import re
product_name = re.split(r'\s+(?:رامات|جيجابايت)\s+', text)[0].strip()  # keep the text before the first marker word
```

<h1 style="color:red; font-weight:bold; background-color:#f5f5f5; border:2px solid red; padding:10px; border-radius:10px; font-family:monospace;">
<code>📘 5- How to Run the Script and What Sites Were Scraped </code>
</h1>

## 1. Overview
This script scrapes **mobile phone data** from the website [MobileMasr.com](https://mobilemasr.com), specifically from the **Mobile Phones** category.  
It extracts detailed information about each phone (price, seller, condition, specifications, etc.) and saves the data into a CSV file.

---

## 2. How to Run the Script

1. **Install required libraries:**
   ```bash
   pip install requests beautifulsoup4
   ```

## 🧠 How the Script Works

Run the script in your Python environment or notebook cell.  
The script will:

- Start from **page 1** of the “Mobile Phones” category.  
- Visit each product page to extract detailed information.  
- Continue automatically to the next page until no more products are available.  

### 🗂️ Output
- A CSV file named **`mobile_misr_phones_1.csv`** will be created in your working directory.  
- Each row represents one mobile phone with its full details.  
