# Web Scraping Wuzzuf Job Listings
This notebook scrapes job listings from Wuzzuf for the search keyword **"Developer"**.
## Steps
- Send HTTP requests to Wuzzuf
- Parse HTML pages using BeautifulSoup
- Extract job information
- Store results inside a DataFrame
- Export to CSV


<img src="wuzzuf_logo.jpg" width="500">

## Importing Required Libraries
- `requests` is used to send HTTP requests.
- `BeautifulSoup` is used to parse HTML content.
- `pandas` is used to store and export data.

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

## Testing the Base URL
Send a request to the search URL and check the status code to confirm the website is reachable before parsing any HTML.


In [2]:
base_url = "https://wuzzuf.net/search/jobs/?q=Developer%20&start="
response = requests.get(base_url)
response.status_code 

200
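A status code of 200 confirms that a single request worked, but the scraping loop later in the notebook issues many requests in a row. A minimal sketch of a more defensive setup follows, assuming a shared `requests.Session` with automatic retries and an explicit timeout. The names `make_session` and `fetch_html`, the retry settings, and the User-Agent string are illustrative assumptions, not part of the original notebook.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(user_agent="job-scraper-notebook/0.1"):
    """Build a session that retries transient failures (hypothetical helper)."""
    retry = Retry(
        total=3,                 # assumption: three retries is enough
        backoff_factor=1.0,      # wait longer between successive retries
        status_forcelist=(429, 500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_html(session, url, timeout=10):
    """Return the page HTML, raising an exception on HTTP error codes."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


session = make_session()
```

In the notebook, `fetch_html(session, base_url + "0")` would then stand in for the bare `requests.get(base_url)` call.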

## Parsing the Page Structure
Parse the HTML content using BeautifulSoup to identify the structure of job cards.


In [3]:
soup = BeautifulSoup(response.text, "html.parser")
soup

<!DOCTYPE html>

<html data-id="SSR" dir="ltr" lang="en" translate="no">
<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width, initial-scale=1.0, shrink-to-fit=no" name="viewport"/>
<meta content="Thu Dec 08 2022 18:30:44 GMT+0200" http-equiv="expires">
<meta content="no-cache" http-equiv="Pragma"/>
<meta content="no-cache, no-store, must-revalidate" http-equiv="cache-control"/>
<meta content="notranslate" name="googlebot"/>
<title data-react-helmet="true">7,177 Developer  jobs in Egypt – discover job details now on Wuzzuf!</title>
<meta charset="utf-8" data-react-helmet="true"><meta content="Explore 7,177 Developer  jobs in Egypt.  Apply for great job opportunities in leading companies with Wuzzuf today!" data-react-helmet="true" name="description"><meta content="jobs in Egypt, job in Egypt, careers egypt, jobs in Cairo, jobs in alexandria, employment in egypt, Egypt jobs, jobs vacancies, job vacancies in egypt, job se
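The page title in the output above already carries the total number of matching jobs, which could be used to bound the page loop instead of a fixed upper limit. The sketch below pulls that count out of the title text. The helper name `total_jobs_from_title` and the page size of 15 results are assumptions made for illustration, not values confirmed by the site.

```python
import math
import re


def total_jobs_from_title(title):
    """Return the leading job count in a search-page title, or None if absent."""
    match = re.match(r"\s*(\d[\d,]*)", title)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


# Title text copied from the parsed page output
title = "7,177 Developer  jobs in Egypt – discover job details now on Wuzzuf!"
total = total_jobs_from_title(title)

JOBS_PER_PAGE = 15  # assumption: results per page is not stated in the notebook
pages = math.ceil(total / JOBS_PER_PAGE)
print(total, pages)
```

In the notebook, the title string would come from `soup.title.get_text()` rather than a literal.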

## Locating Job Cards
Identify the main job card container using its HTML class (`css-pkv5jc`).


In [4]:
dev = soup.find_all("div", class_="css-pkv5jc")

In [5]:
dev[0]

<div class="css-pkv5jc"><a href="https://wuzzuf.net/jobs/careers/Covertina Egypt-Egypt-99677" rel="noreferrer" target="_blank"><style data-emotion="css 1in28d3">.css-1in28d3{position:absolute;inset-inline-end:0;inset-block-start:0;width:60px;height:60px;object-fit:contain;object-position:center center;}</style><img alt="Jobs and Careers at Covertina Egypt  Egypt" class="css-1in28d3" height="90" loading="lazy" src="https://images.wuzzuf-data.net/files/company_logo/209651758065b635a0765a8.png?height=90&amp;width=90" style="opacity:0;transition:opacity 0.2s ease-in-out" width="90"/></a><style data-emotion="css lptxge">.css-lptxge{-webkit-padding-end:60px;padding-inline-end:60px;}</style><div class="css-lptxge"><style data-emotion="css 193uk2c">.css-193uk2c{font-size:16px;font-weight:600;font-style:normal;letter-spacing:-0.4px;line-height:24px;color:#0055D9;margin-block:0;}.css-193uk2c:dir(rtl){letter-spacing:0;line-height:28px;}</style><h2 class="css-193uk2c"><style data-emotion="css o171
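To make the lookup easier to follow, here is a small sketch of `find_all` on a static snippet. The markup is hypothetical and heavily simplified; only the class names are taken from the notebook. The `data-emotion` attributes in the output above suggest these class names are generated by a styling library, so they may change whenever the site is redeployed.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; real job cards contain far more elements.
html = """
<div class="css-pkv5jc"><h2 class="css-193uk2c">Odoo Developer</h2></div>
<div class="css-pkv5jc"><h2 class="css-193uk2c">IOS Developer</h2></div>
<div class="other">Not a job card</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Match by tag name and class, as in the notebook
cards = soup.find_all("div", class_="css-pkv5jc")

# Equivalent lookup written as a CSS selector
cards_via_selector = soup.select("div.css-pkv5jc")

titles = [card.find("h2").get_text(strip=True) for card in cards]
print(len(cards), titles)
```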

## Extracting Job Details
For each job card, we extract:
- Title
- Company
- Location
- Job Type
- Work Mode
- Experience Level
- Experience Years
- Category
- Department
- Skills

All extracted data is stored inside a dictionary and appended to a list.
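Not every card carries every field in the list above, so calling `.text` directly on a missing element raises `AttributeError`. A minimal sketch of a guard follows; `text_or_none` is a hypothetical helper and the sample card is invented for illustration.

```python
from bs4 import BeautifulSoup


def text_or_none(card, tag, class_name):
    """Return the stripped text of the first matching element, or None."""
    elem = card.find(tag, class_=class_name)
    return elem.get_text(strip=True) if elem is not None else None


# Hypothetical card with a location but no work-mode element
card = BeautifulSoup(
    '<div class="css-pkv5jc">'
    '<span class="css-16x61xq"> Cairo, Egypt </span>'
    "</div>",
    "html.parser",
)

location = text_or_none(card, "span", "css-16x61xq")
work_mode = text_or_none(card, "span", "css-uofntu")
print(location, work_mode)
```

Applying the same guard to every field keeps one incomplete card from stopping the whole run.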

In [6]:
developers = []
for page in range(1, 1500):
    # The page index is appended to the `start` query parameter
    url = base_url + str(page)
    response = requests.get(url)
    if response.status_code != 200:
        print(f"Failed to fetch page {page}")
        continue
    soup = BeautifulSoup(response.text, "html.parser")
    dev = soup.find_all("div", class_="css-pkv5jc")
    if not dev:
        # No job cards found: assume we are past the last results page and stop
        break
    for data in dev:
        title = data.find("h2", class_="css-193uk2c").text.strip()
        company = data.find("a", class_="css-ipsyv7").text.strip("  -")
        location = data.find("span", class_="css-16x61xq").text.strip()
        job_type = data.find("span", class_="css-uc9rga eoyjyou0").text.strip()
        # Work mode is not present on every card
        work_mode_elem = data.find("span", class_="css-uofntu eoyjyou0")
        work_mode = work_mode_elem.text.strip() if work_mode_elem else None
        # Links in the card: index 1 is the experience level, index 2 the category
        links = data.find_all("a", class_="css-o171kl")
        Experience_level = links[1].text
        categories = links[2].text.strip(" · ")
        # The remaining details are a single " · "-separated string
        details = data.find_all("div", class_="css-1rhj4yg")[0].text.split(" · ")
        Experience_year = details[1]
        department = details[3]
        skills = details[4:]
        developers.append(
            {
                "Title": title,
                "company": company,
                "location": location,
                "job_type": job_type,
                "work_mode": work_mode,
                "Experience_level": Experience_level,
                "Experience_year": Experience_year,
                "categories": categories,
                "department": department,
                "skills": skills,
            }
        )
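The loop reads experience years, department, and skills by fixed position from a single " · "-separated string, which raises `IndexError` when a card has fewer segments. A hedged sketch of a tolerant parser follows; `parse_details` is a hypothetical helper and the sample strings are invented, though the positions (1, 3, and 4 onward) mirror the loop.

```python
def parse_details(details):
    """Split a ' · '-separated details string into named fields."""
    parts = [part.strip() for part in details.split(" · ")]

    def at(index):
        # Return the segment at `index`, or None when the string is too short
        return parts[index] if index < len(parts) else None

    return {
        "Experience_year": at(1),
        "department": at(3),
        "skills": parts[4:],
    }


# Hypothetical examples of a full and a shortened details string
full = parse_details(
    "Experienced · 3 - 5 Yrs of Exp · IT/Software Development · Engineering · Python · SQL"
)
short = parse_details("Entry Level · 0 - 1 Yrs of Exp")
print(full)
print(short)
```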



## Creating DataFrame
After scraping all pages, we convert the collected data into a pandas DataFrame for inspection and further processing.

In [7]:
df = pd.DataFrame(developers)
print(df.head(20))

                                                Title  \
0   Senior Laravel Backend Developer - ERP Systems...   
1                       Business Development Engineer   
2   B2B Account Executive - Business Development E...   
3                         Junior Full Stack Developer   
4                                  Business Developer   
5                                       IOS Developer   
6                        Business Development Manager   
7                      UI Developer (GSAP Specialist)   
8                                      Odoo Developer   
9                            Senior Flutter Developer   
10       HR OD -Organizational Development Specialist   
11                            Senior Blazor Developer   
12  Dental Business Development Specialist ( Upper...   
13         Developer & Technical Specialist [ Remote]   
14                               Articulate developer   
15                        Odoo Developer – Full Stack   
16               Senior Front-E

## Inspecting DataFrame Structure
We display DataFrame information (columns, dtypes, non-null counts, memory usage) to check the column types and spot missing values.


In [8]:
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7162 entries, 0 to 7161
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Title             7162 non-null   object
 1   company           7162 non-null   object
 2   location          7162 non-null   object
 3   job_type          7162 non-null   object
 4   work_mode         4292 non-null   object
 5   Experience_level  7162 non-null   object
 6   Experience_year   7162 non-null   object
 7   categories        7162 non-null   object
 8   department        7162 non-null   object
 9   skills            7162 non-null   object
dtypes: object(10)
memory usage: 559.7+ KB
None
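The summary above shows `work_mode` with 4292 non-null values out of 7162 rows, so 2870 listings have no work mode. The sketch below shows one way to count and fill such gaps on a small stand-in DataFrame. The sample rows and the `"Not specified"` placeholder are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rows standing in for the scraped DataFrame
sample = pd.DataFrame(
    {
        "Title": ["Odoo Developer", "IOS Developer", "Business Developer"],
        "work_mode": ["Remote", None, "Hybrid"],
    }
)

# Count missing values per column
missing_per_column = sample.isna().sum()

# Fill only the work_mode column; the original frame is left unchanged
filled = sample.fillna({"work_mode": "Not specified"})

print(missing_per_column)
print(filled["work_mode"].tolist())
```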


## Saving Dataset to CSV
Finally, we export the developer job dataset to a CSV file named `developers.csv`.

In [9]:
df.to_csv("developers.csv", index=False)
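One detail worth noting about the export: the `skills` column holds Python lists, which `to_csv` writes as their string representation (for example `['Python', 'Odoo']`), so they do not come back as lists when the file is read again. A minimal sketch of one workaround follows, assuming a `"; "` separator that does not occur inside any skill name; the sample row is invented and an in-memory buffer stands in for the file.

```python
import io

import pandas as pd

# Hypothetical row standing in for the scraped DataFrame
sample = pd.DataFrame(
    {"Title": ["Odoo Developer"], "skills": [["Python", "Odoo"]]}
)

# Join each skills list into one delimited string before export
flat = sample.assign(skills=sample["skills"].apply("; ".join))

buffer = io.StringIO()
flat.to_csv(buffer, index=False)

# Read the data back and split the string into a list again
buffer.seek(0)
restored = pd.read_csv(buffer)
restored["skills"] = restored["skills"].str.split("; ")
print(restored["skills"].iloc[0])
```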