# Web Scraping Wuzzuf Job Listings
This notebook scrapes job listings from Wuzzuf for the keyword **"data"**.
## Steps
- Send HTTP requests to Wuzzuf
- Parse HTML pages using BeautifulSoup
- Extract job information
- Store results inside a DataFrame
- Export to CSV


<img src="wuzzuf_logo.jpg" width="500">


## Importing Required Libraries
- `requests` is used to send HTTP requests.
- `BeautifulSoup` is used to parse HTML content.
- `pandas` is used to store and export data.

In [26]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

## Testing the Base URL
Send a request to the search URL to confirm that the site is reachable. A status code of 200 means the request succeeded and the response body is available for parsing.


In [27]:
# Search URL for the keyword "data"; the page number is appended after "start=".
base_url = "https://wuzzuf.net/search/jobs?a=hpb%7Cspbg&q=data&start="
response = requests.get(base_url)
response.status_code

200
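The reachability check above can be wrapped in a small helper that fails loudly instead of requiring a manual look at the status code. This is a minimal sketch, not the notebook's method: `fetch_html`, `DEFAULT_HEADERS`, the User-Agent string, and the 10-second timeout are all assumptions made for illustration.

```python
import requests

# Hypothetical header value; replace it with one that identifies your script.
DEFAULT_HEADERS = {"User-Agent": "wuzzuf-notebook-example/0.1"}


def fetch_html(url, session=None, timeout=10):
    """Return the page body as text, raising on HTTP or network errors."""
    session = session or requests.Session()
    response = session.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx codes
    return response.text
```

Calling `fetch_html(base_url + "1")` would then either return the HTML or raise an exception that names the failing status code.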

## Parsing the Page Structure
Parse the HTML content using BeautifulSoup to identify the structure of job cards.


In [28]:
soup = BeautifulSoup(response.text, "html.parser")
soup

<!DOCTYPE html>

<html data-id="SSR" dir="ltr" lang="en" translate="no">
<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width, initial-scale=1.0, shrink-to-fit=no" name="viewport"/>
<meta content="Thu Dec 08 2022 18:30:44 GMT+0200" http-equiv="expires">
<meta content="no-cache" http-equiv="Pragma"/>
<meta content="no-cache, no-store, must-revalidate" http-equiv="cache-control"/>
<meta content="notranslate" name="googlebot"/>
<title data-react-helmet="true">3,683 data jobs in Egypt – discover job details now on Wuzzuf!</title>
<meta charset="utf-8" data-react-helmet="true"><meta content="Explore 3,683 data jobs in Egypt.  Apply for great job opportunities in leading companies with Wuzzuf today!" data-react-helmet="true" name="description"><meta content="jobs in Egypt, job in Egypt, careers egypt, jobs in Cairo, jobs in alexandria, employment in egypt, Egypt jobs, jobs vacancies, job vacancies in egypt, job search egypt, 

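Printing the whole `soup` object produces a very long dump. A more targeted way to inspect the page is to read individual elements, for example the `<title>` tag, which in the output above contains the total number of matching jobs. The sketch below works on a trimmed sample string modelled on that title; `total_jobs_from_title` is a hypothetical helper, and it assumes the title keeps starting with the job count.

```python
import re

from bs4 import BeautifulSoup

# Trimmed sample modelled on the <title> tag shown in the output above.
sample_html = (
    "<html><head><title>3,683 data jobs in Egypt</title></head>"
    "<body></body></html>"
)


def total_jobs_from_title(html):
    """Return the first number in the page title as an int, or None."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None:
        return None
    match = re.search(r"\d[\d,]*", soup.title.get_text())
    return int(match.group().replace(",", "")) if match else None


total_jobs = total_jobs_from_title(sample_html)
print(total_jobs)  # 3683
```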
## Locating Job Cards
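Identify the main job card container using its HTML class (`css-pkv5jc`). Class names of this kind appear to be generated by the site's styling library (note the `data-emotion` attributes in the output below), so they may change over time and should be re-checked before each run.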


In [29]:
data_ana = soup.find_all("div", class_="css-pkv5jc")

In [30]:
data_ana[0]

<div class="css-pkv5jc"><a href="https://wuzzuf.net/jobs/careers/HiRemoters-Egypt-125803" rel="noreferrer" target="_blank"><style data-emotion="css 1in28d3">.css-1in28d3{position:absolute;inset-inline-end:0;inset-block-start:0;width:60px;height:60px;object-fit:contain;object-position:center center;}</style><img alt="Jobs and Careers at HiRemoters Egypt" class="css-1in28d3" height="90" loading="lazy" src="https://images.wuzzuf-data.net/files/company_logo/193512423467ff9ff51e42d.png?height=90&amp;width=90" style="opacity:0;transition:opacity 0.2s ease-in-out" width="90"/></a><style data-emotion="css lptxge">.css-lptxge{-webkit-padding-end:60px;padding-inline-end:60px;}</style><div class="css-lptxge"><style data-emotion="css 193uk2c">.css-193uk2c{font-size:16px;font-weight:600;font-style:normal;letter-spacing:-0.4px;line-height:24px;color:#0055D9;margin-block:0;}.css-193uk2c:dir(rtl){letter-spacing:0;line-height:28px;}</style><h2 class="css-193uk2c"><style data-emotion="css o171kl">.css-o

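Because a card may lack some elements, chaining `.find(...).text` can raise `AttributeError` when `find` returns `None`. A small helper that returns `None` for missing elements avoids this. The sketch below uses a hypothetical, heavily trimmed card; only the class names `css-pkv5jc`, `css-193uk2c`, and `css-16x61xq` come from the notebook, and `text_or_none` is an illustrative name.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily trimmed card used only for illustration.
sample_card_html = """
<div class="css-pkv5jc">
  <h2 class="css-193uk2c"><a href="#">Data Analyst</a></h2>
</div>
"""


def text_or_none(parent, name, class_):
    """Return the stripped text of the first matching tag, or None if absent."""
    tag = parent.find(name, class_=class_)
    return tag.get_text(strip=True) if tag else None


soup = BeautifulSoup(sample_card_html, "html.parser")
cards = soup.find_all("div", class_="css-pkv5jc")
title = text_or_none(cards[0], "h2", "css-193uk2c")
location = text_or_none(cards[0], "span", "css-16x61xq")  # absent in this sample
print(title, location)  # Data Analyst None
```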
## Scraping All Job Listings
Loop over the result pages, using `start` values from 1 up to 1999 (`range(1, 2000)` excludes 2000), and extract the following from each job card:
- job title
- company name
- location
- job type
- work mode
- experience level
- experience years
- job category
- department
- skills list

All extracted records are appended to a list.
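One way to structure the pagination is a generator that stops as soon as a page yields no job cards, so the loop does not have to run to the upper bound. This is a sketch under assumptions, not the notebook's code: `iter_card_pages` is a hypothetical name, `fetch_cards` stands for any callable that returns the list of cards for a page number, and the optional `delay` is an assumed courtesy pause between requests.

```python
import time


def iter_card_pages(fetch_cards, first_page=1, stop_page=2000, delay=0.0):
    """Yield (page, cards) until a page has no cards or stop_page is reached."""
    for page in range(first_page, stop_page):
        cards = fetch_cards(page)
        if not cards:
            break  # assumption: an empty page means the results are exhausted
        yield page, cards
        if delay:
            time.sleep(delay)  # pause between consecutive requests


# Stand-in for real requests: two pages with cards, then nothing.
fake_pages = {1: ["card a", "card b"], 2: ["card c"]}

collected = []
for page, cards in iter_card_pages(lambda p: fake_pages.get(p, [])):
    collected.extend(cards)
print(collected)  # ['card a', 'card b', 'card c']
```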

In [32]:
data_analysis = []
for page in range(1, 2000):
    url = base_url + str(page)
    response = requests.get(url)
    if response.status_code != 200:
        print(f"Failed to fetch page {page}")
        continue

    soup = BeautifulSoup(response.text, "html.parser")
    cards = soup.find_all("div", class_="css-pkv5jc")
    if not cards:
        # Assumption: a page without job cards means the results are exhausted.
        break

    for card in cards:
        title = card.find("h2", class_="css-193uk2c").text.strip()
        company = card.find("a", class_="css-ipsyv7").text.strip("  -")
        location = card.find("span", class_="css-16x61xq").text.strip()
        job_type = card.find("span", class_="css-uc9rga eoyjyou0").text.strip()

        # Work mode is not present on every card.
        work_mode = card.find("span", class_="css-uofntu eoyjyou0")
        work_mode = work_mode.text.strip() if work_mode else None

        # Links in the card: index 1 is the experience level, index 2 the category.
        links = card.find_all("a", class_="css-o171kl")
        experience_level = links[1].text
        categories = links[2].text.strip(" · ")

        # Details line split on " · ": [1] years, [3] department, [4:] skills.
        details = card.find_all("div", class_="css-1rhj4yg")[0].text.split(" · ")
        experience_year = details[1]
        department = details[3]
        skills = details[4:]

        data_analysis.append(
            {
                "Title": title,
                "company": company,
                "location": location,
                "job_type": job_type,
                "work_mode": work_mode,
                "Experience_level": experience_level,
                "Experience_year": experience_year,
                "categories": categories,
                "department": department,
                "skills": skills,
            }
        )
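The loop relies on fixed positions in the " · "-separated details text, so a card with fewer fields would raise `IndexError`. The sketch below shows a more defensive way to split that text. `parse_details` is a hypothetical helper, and the sample string is invented to match the positions the loop assumes (index 1 for years of experience, index 3 for the department, index 4 onward for skills); real card text may differ.

```python
SEPARATOR = " · "


def parse_details(text):
    """Split a details string and pick fields by position, tolerating short input."""
    parts = [part.strip() for part in text.split(SEPARATOR)]

    def part_at(index):
        return parts[index] if len(parts) > index else None

    return {
        "Experience_year": part_at(1),
        "department": part_at(3),
        "skills": parts[4:],  # empty list when no skills are listed
    }


# Invented sample; the field values are placeholders.
sample = (
    "Entry Level · 1 - 3 Yrs of Exp · Analyst/Research · "
    "IT/Software Development · SQL · Python"
)
parsed = parse_details(sample)
print(parsed["Experience_year"])  # 1 - 3 Yrs of Exp
print(parsed["skills"])           # ['SQL', 'Python']
```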

## Converting Scraped Data to DataFrame
Store all collected records in a pandas DataFrame.


In [33]:
df = pd.DataFrame(data_analysis)
print(df.head(20))

                                                Title  \
0              Data Scientist with Database Expertise   
1                                        Data Analyst   
2                               Data Entry Specialist   
3                                 Junior Data Analyst   
4                           Data Analytics Specialist   
5                                    Data Entry Clerk   
6   Data Entry & Sourcing Specialist (Hotel / Supp...   
7                                          Data Entry   
8                         Data Scientist & Instructor   
9        Data Analyst & Instructor (Excel & Power BI)   
10                             Master Data Specialist   
11                         Receptionist - Real Estate   
12                                Recruitment Officer   
13  IT Infrastructure Specialist – Data Center Ope...   
14                     Scanning Produducts Data Entry   
15                         Data Scientist Team Leader   
16                             

## Inspecting the DataFrame Structure
Check data types and number of rows collected.

In [34]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3677 entries, 0 to 3676
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Title             3677 non-null   object
 1   company           3677 non-null   object
 2   location          3677 non-null   object
 3   job_type          3677 non-null   object
 4   work_mode         2392 non-null   object
 5   Experience_level  3677 non-null   object
 6   Experience_year   3677 non-null   object
 7   categories        3677 non-null   object
 8   department        3677 non-null   object
 9   skills            3677 non-null   object
dtypes: object(10)
memory usage: 287.4+ KB
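The summary above shows that `work_mode` is the only column with missing values (2392 non-null out of 3677 rows, so 1285 are missing). A quick way to count and label such gaps is sketched below on a small hypothetical frame; the row values and the "Not specified" placeholder are assumptions, not data from the scrape.

```python
import pandas as pd

# Hypothetical rows standing in for the scraped DataFrame.
sample_df = pd.DataFrame(
    {
        "Title": ["Data Analyst", "Data Engineer", "Data Entry Clerk"],
        "work_mode": ["Remote", None, "On-site"],
    }
)

# Number of missing values per column.
missing_counts = sample_df.isna().sum()
print(missing_counts["work_mode"])  # 1

# Return a copy with an explicit label instead of a missing value.
filled = sample_df.assign(work_mode=sample_df["work_mode"].fillna("Not specified"))
print(filled["work_mode"].tolist())  # ['Remote', 'Not specified', 'On-site']
```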


## Exporting the Dataset
Save the scraped job data into a CSV file for further analysis.


In [35]:
df.to_csv("data_jobs.csv", index=False)
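Since the `skills` column holds Python lists, `to_csv` writes each list as its string representation (for example `['SQL', 'Python']`), which is awkward to parse when the file is read back. One option is to join the skills into a delimited string before exporting. The sketch below uses a hypothetical one-row frame and an in-memory buffer in place of a file; the `"; "` delimiter is an assumption and only works if no skill name contains it.

```python
import io

import pandas as pd

# Hypothetical row standing in for the scraped DataFrame.
sample_df = pd.DataFrame({"Title": ["Data Analyst"], "skills": [["SQL", "Python"]]})

# Join each skills list into one delimited string before writing.
export_df = sample_df.assign(skills=sample_df["skills"].apply("; ".join))

buffer = io.StringIO()  # stands in for a file on disk
export_df.to_csv(buffer, index=False)

# Read the CSV back and split the string into a list again.
buffer.seek(0)
restored = pd.read_csv(buffer)
restored["skills"] = restored["skills"].str.split("; ")
print(restored["skills"].tolist())  # [['SQL', 'Python']]
```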