## Web Scraping Demo
This notebook demonstrates basic web scraping in Python. It uses `requests` to download pages and `BeautifulSoup` to extract job data from two job sites, Wuzzuf and Bayt.


### Step 1: Install Required Libraries
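Install `lxml`, the parser backend used in Step 3.7. The other steps use Python's built-in `html.parser`; `requests` and `beautifulsoup4` are assumed to be installed already.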

In [None]:
!pip install lxml

### Step 2: Import Required Libraries
Import the necessary libraries for making HTTP requests and parsing HTML.

In [1]:
import requests
from bs4 import BeautifulSoup

### Step 3: Scrape Wuzzuf for Job Data
Define the target URL and fetch the page content.

In [16]:
# Define the URL for Wuzzuf search page
u = "https://wuzzuf.net/search/jobs?a=spbg&q=machine%20learning"

# Send a GET request to the URL
page = requests.get(u)

# Parse the page content using BeautifulSoup
soup = BeautifulSoup(page.content, "html.parser")
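
The fetch-and-parse step above can be wrapped in a slightly more defensive helper. This is a minimal sketch, not part of the original notebook: `fetch_soup` is a hypothetical name, and the 10-second timeout is an arbitrary choice. It relies on `raise_for_status()`, which raises an exception for 4xx and 5xx responses so that an error page is not parsed by mistake.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url: str, timeout: float = 10.0) -> BeautifulSoup:
    """Fetch a page and parse it, failing loudly on HTTP errors."""
    # Without a timeout, requests can wait indefinitely on a stalled server.
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx status codes.
    response.raise_for_status()
    return BeautifulSoup(response.content, "html.parser")
```

Calling the helper with the search URL defined above returns a parsed page that can be used in place of `soup`.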

#### Step 3.1: Extract Job Titles
Use BeautifulSoup to find and print all job titles on the page.

In [8]:
# Find all job titles on the page
j_t = soup.find_all('h2', class_="css-m604qf")
for i in j_t:
    print(i.text)

Machine Learning Manager
Robotics and Programming Engineer
Sales Specialist
Senior Backend Developer in Node/Express
Machinist / Mechanical Engineer
AI Technical Team Lead (Computer Vision Focus & NLP)
Online Coding Instructor
Senior ML JD
Senior AI Engineer
Senior Full Stack/ Embedded Engineering
Senior Full Stack Developer (MERN or Laravel Stack)
Process Electronics Engineer
AWS Cloud Administrator
Technical sales and marketing manager
E-Commerce Assistant - Remote


#### Step 3.2: Extract Job Locations
Extract and print the locations of the jobs.

In [9]:
# Find all job locations
loc = soup.find_all("span", class_="css-5wys0k")
for i in loc:
    print(i.text)

Cairo, Egypt 
Alexandria, Egypt 
Downtown, Cairo, Egypt 
Manchester, United Kingdom 
Larnaca, Cyprus 
Sheikh Zayed, Giza, Egypt 
Riyadh, Saudi Arabia 
Damietta, Egypt 
Cairo, Egypt 
Riyadh, Saudi Arabia 
Cairo, Egypt 
10th of Ramadan City, Sharqia, Egypt 
New Cairo, Cairo, Egypt 
Hadayek Alahram, Giza, Egypt 
Cairo, Egypt 
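
The location strings above carry trailing whitespace and pack several parts into one comma-separated value. The sketch below splits them into labeled fields. It assumes the last part is always the country and the one before it a city or governorate; `split_location` and the key names are hypothetical.

```python
def split_location(raw: str) -> dict:
    """Split 'Area, Region, Country' text into labeled parts."""
    parts = [part.strip() for part in raw.split(",") if part.strip()]
    return {
        "country": parts[-1] if parts else "",
        "region": parts[-2] if len(parts) >= 2 else "",
        # Anything before the region is kept together as the area.
        "area": ", ".join(parts[:-2]),
    }


for raw in ["Cairo, Egypt ", "Downtown, Cairo, Egypt ", "Sheikh Zayed, Giza, Egypt "]:
    print(split_location(raw))
```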


#### Step 3.3: Extract Company Names
Extract and print the company names associated with the jobs.

In [10]:
# Find all company names
company = soup.find_all("a", class_="css-17s97q8")
for i in company:
    print(i.text)

kcsc -
Smart Technology -
Gila Electric -
Give Brite  -
kpec international -
Lumin -
Confidential -
ysolution -
RMG -
Qudra Tech -
 Si-Vision -
VIVO -
Citylogix ME -
Etkaan -
Confidential -
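
Each company name above ends with a ` -` separator, and some have extra spaces. A small cleanup function (a sketch; `clean_company` is a hypothetical name) removes both while keeping hyphens inside a name such as `Si-Vision`.

```python
def clean_company(raw: str) -> str:
    """Drop surrounding whitespace and the trailing '-' separator."""
    # rstrip("-") only removes hyphens at the very end of the string.
    return raw.strip().rstrip("-").rstrip()


for raw in ["kcsc -", "Give Brite  -", " Si-Vision -"]:
    print(clean_company(raw))
```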


#### Step 3.4: Extract Job Post Dates
Extract and print when the jobs were posted.

In [17]:
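# Find all job posting dates.
# The date appears under two different classes, for example:
# <div class="css-do6t5g">1 month ago</div>
# <div class="css-4c4ojb">5 hours ago</div>
# Passing a list to class_ matches elements that have either class.
dates = soup.find_all("div", class_=["css-do6t5g", "css-4c4ojb"])
for i in dates:
    print(i.text)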

1 month ago
12 days ago
4 hours ago
2 months ago
1 month ago
15 days ago
1 month ago
5 days ago
28 days ago
1 day ago
6 days ago
13 days ago
29 days ago
6 days ago
13 days ago
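
Relative dates such as `12 days ago` are hard to sort or filter. The sketch below converts them to approximate timestamps. It is not part of the original notebook: `posted_at` is a hypothetical helper, a month is treated as 30 days, and only the hour, day, and month units seen in the output above are handled.

```python
import re
from datetime import datetime, timedelta

# Approximate length of each unit; treating a month as 30 days is an assumption.
UNIT_LENGTH = {
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "month": timedelta(days=30),
}


def posted_at(text: str, now: datetime) -> datetime:
    """Convert text such as '12 days ago' into an approximate timestamp."""
    match = re.fullmatch(r"(\d+) (hour|day|month)s? ago", text.strip())
    if match is None:
        raise ValueError(f"Unrecognized relative date: {text!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return now - amount * UNIT_LENGTH[unit]


reference = datetime(2024, 1, 31, 12, 0)  # fixed reference time for the example
for text in ["12 days ago", "4 hours ago", "2 months ago"]:
    print(text, "->", posted_at(text, reference))
```

Passing the reference time explicitly keeps the function deterministic; in a real run it would be the time the page was fetched.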


#### Step 3.5: Extract Job Types
Extract and print the types of jobs (e.g., Full-time, Part-time).

In [18]:
# Find all job types
# <span class="css-1ve4b75 eoyjyou0">Full Time</span>
# A class_ value containing a space must match the whole class attribute exactly.
job_type = soup.find_all("span", class_="css-1ve4b75 eoyjyou0")
for i in job_type:
    print(i.text)

Full Time
Full Time
Full Time
Full Time
Full Time
Full Time
Full Time
Part Time
Full Time
Full Time
Full Time
Full Time
Full Time
Full Time
Full Time
Full Time
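
The output above lists 16 job types, while each earlier step returned 15 values, so the separate lists cannot be paired up by position. One posting presumably carries more than one type tag. A common way around this is to loop over each job's container and read every field inside it. The sketch below runs on made-up markup: the `job-card` wrapper class and the `Job A`/`Job B` titles are invented for illustration, and the real container would have to be found by inspecting the page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the "job-card" wrapper class is invented for this sketch.
sample_html = """
<div class="job-card">
  <h2 class="css-m604qf"><a href="#">Job A</a></h2>
  <span class="css-1ve4b75 eoyjyou0">Full Time</span>
  <span class="css-1ve4b75 eoyjyou0">Part Time</span>
</div>
<div class="job-card">
  <h2 class="css-m604qf"><a href="#">Job B</a></h2>
  <span class="css-1ve4b75 eoyjyou0">Full Time</span>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
jobs = []
for card in sample_soup.find_all("div", class_="job-card"):
    title = card.find("h2").get_text(strip=True)
    # A single class name matches any element whose class list contains it.
    types = [s.get_text(strip=True) for s in card.find_all("span", class_="css-1ve4b75")]
    jobs.append({"title": title, "types": types})

print(jobs)
```

Because every field is read from inside one card, a job with two type tags yields one record with two types instead of shifting the rest of the list.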


#### Step 3.6: Extract Job URLs
Extract and print the URLs for individual job postings.

In [23]:
# Extract job URLs
# <h2 class="css-m604qf">
# <a href="https://wuzzuf.net/jobs/p/ow9y8zcxQbne-Machine-Learning-Manager-kcsc-Cairo-Egypt" target="_blank" rel="noreferrer" class="css-o171kl">Machine Learning Manager</a></h2>
jobs_url = []
urls = soup.find_all("h2", class_="css-m604qf")
for i in urls:
    jobs_url.append(i.find('a').attrs["href"])
jobs_url

['https://wuzzuf.net/jobs/p/ow9y8zcxQbne-Machine-Learning-Manager-kcsc-Cairo-Egypt',
 'https://wuzzuf.net/jobs/p/jjEycNgftu1U-Robotics-and-Programming-Engineer-Smart-Technology-Alexandria-Egypt',
 'https://wuzzuf.net/jobs/p/PfXPkSfnCCNs-Sales-Specialist-Gila-Electric-Cairo-Egypt',
 'https://wuzzuf.net/jobs/p/3k4DYwP9VAza-Senior-Backend-Developer-in-NodeExpress-Give-Brite-Manchester-United-Kingdom',
 'https://wuzzuf.net/jobs/p/eRVXqix8zgJQ-Machinist-Mechanical-Engineer-kpec-international-Larnaca-Cyprus',
 'https://wuzzuf.net/jobs/p/1kRUuVPxsf8W-AI-Technical-Team-Lead-Computer-Vision-Focus-NLP-Lumin-Giza-Egypt',
 'https://wuzzuf.net/jobs/p/Ej8oiMp2sfqj-Online-Coding-Instructor-Riyadh-Saudi-Arabia',
 'https://wuzzuf.net/jobs/p/iitM79Aq3BJp-Senior-ML-JD-ysolution-Damietta-Egypt',
 'https://wuzzuf.net/jobs/p/YqALuLpVSgcA-Senior-AI-Engineer-RMG-Cairo-Egypt',
 'https://wuzzuf.net/jobs/p/lewDJYtw0FhS-Senior-Full-Stack-Embedded-Engineering-Qudra-Tech-Riyadh-Saudi-Arabia',
 'https://wuzzuf.net/j
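
The loop above calls `i.find('a').attrs["href"]`, which raises an error if a heading has no link, because `find` returns `None` in that case. A more tolerant variant, shown here as a sketch on a small made-up fragment (the second heading is invented to show the guard), skips such headings.

```python
from bs4 import BeautifulSoup

# Small fragment modeled on the markup shown above; the second <h2> is invented.
sample_html = """
<h2 class="css-m604qf"><a href="https://wuzzuf.net/jobs/p/ow9y8zcxQbne-Machine-Learning-Manager-kcsc-Cairo-Egypt">Machine Learning Manager</a></h2>
<h2 class="css-m604qf">Heading without a link</h2>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
job_links = []
for heading in sample_soup.find_all("h2", class_="css-m604qf"):
    link = heading.find("a")
    # Skip headings with no <a> tag or no href instead of raising an error.
    if link is not None and link.get("href"):
        job_links.append(link["href"])

print(job_links)
```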

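#### Step 3.7: Fetch Individual Job Details
Request the first three job URLs collected above and print each page's title.

In [26]:
# Fetch the first three job pages and print their titles
for url in jobs_url[0:3]:
    url = url.replace(" ", "")  # remove any stray spaces from the URL
    print(url)
    # Separate names keep the search-page `page` and `soup` from being overwritten
    job_page = requests.get(url)
    job_soup = BeautifulSoup(job_page.content, "html.parser")
    print(job_soup.title.text)

https://wuzzuf.net/jobs/p/ow9y8zcxQbne-Machine-Learning-Manager-kcsc-Cairo-Egypt
Machine Learning Manager Job at kcsc in Cairo, Egypt – Apply Now!
https://wuzzuf.net/jobs/p/jjEycNgftu1U-Robotics-and-Programming-Engineer-Smart-Technology-Alexandria-Egypt
Robotics and Programming Engineer Job at Smart Technology in Alexandria, Egypt – Apply Now!
https://wuzzuf.net/jobs/p/PfXPkSfnCCNs-Sales-Specialist-Gila-Electric-Cairo-Egypt
Sales Specialist Job at Gila Electric in Downtown, Cairo – Apply Now!


The same steps work for a single URL. In the run below, the printed title is a general listing title rather than a job title, which suggests the request did not reach the posting itself.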

In [27]:
# Example URL for testing
url = "https://wuzzuf.net/jobs/p/5NFxSMKMH5K0-SeniorMid-Senior-Deep-Learning-Engineer-Cairo-Egypt?o=1&l=sp&t=sj&a=machine%20learning|search-v3|spbg"

# Send a GET request and parse the content
page = requests.get(url)
soup = BeautifulSoup(page.content, "lxml")
print(soup.title.text)

8,753 Job Opportunities in Egypt - Apply Today!
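
When a title looks generic, as in the output above, a rough check is to compare the requested URL with `page.url`, which `requests` sets to the final URL after any redirects. The sketch below does the comparison on plain strings. `landed_elsewhere` is a hypothetical helper, the second URL in the example is made up, and whether a redirect actually happened in the run above is not confirmed by the source.

```python
from urllib.parse import urlsplit


def landed_elsewhere(requested_url: str, final_url: str) -> bool:
    """Return True when the host or path differs between request and response."""
    requested, final = urlsplit(requested_url), urlsplit(final_url)
    # Query strings are ignored, since tracking parameters may be dropped or added.
    return (requested.netloc, requested.path.rstrip("/")) != (
        final.netloc,
        final.path.rstrip("/"),
    )


requested = "https://wuzzuf.net/jobs/p/5NFxSMKMH5K0-SeniorMid-Senior-Deep-Learning-Engineer-Cairo-Egypt?o=1&l=sp"
print(landed_elsewhere(requested, requested))
# Made-up destination URL, used only to show the positive case.
print(landed_elsewhere(requested, "https://wuzzuf.net/some-other-page"))
```

In the notebook, the check would be `landed_elsewhere(url, page.url)` right after the request.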


In [25]:
soup

<!DOCTYPE html>
<html lang="en" translate="no">
<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width, initial-scale=1.0, shrink-to-fit=no" name="viewport"/>
<meta content="Thu Dec 08 2022 18:30:44 GMT+0200" http-equiv="expires"/>
<meta content="no-cache" http-equiv="Pragma"/>
<meta content="no-cache, no-store, must-revalidate" http-equiv="cache-control"/>
<meta content="notranslate" name="googlebot"/>
<title data-react-helmet="true">8,753 Job Opportunities in Egypt - Apply Today!</title>
<meta content="Explore 8,753 job vacancies in Egypt. Find your next opportunity with a top recruitment company. Apply now and jumpstart your future!" data-react-helmet="true" name="description"/><meta content="8,753 Job Opportunities in Egypt - Apply Today!" data-react-helmet="true" property="og:title"/><meta content="Explore 8,753 job vacancies in Egypt. Find your next opportunity with a top recruitment company. Apply now and jumpstart

### Step 4: Scrape Bayt for Job Data
Switch to a different platform and extract job data from Bayt.

In [28]:
# Define the Bayt URL for data science jobs
url = "https://www.bayt.com/en/egypt/jobs/data-science-jobs/"

# Send a GET request with a User-Agent header to mimic a browser
headers = {"User-Agent": "Mozilla/5.0"}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "html.parser")
print(soup.title.text)

Data Science Jobs in Egypt (2024) - Bayt.com


#### Step 4.1: Extract Job Titles on Bayt
Extract and print job titles from Bayt.

In [29]:
# Find all job titles on the page
# <h2 class="col u-stretch t-large m0 t-nowrap-d t-trim">
# <a data-automation-is_aggregated="0" data-automation-is_external="1" data-js-aid="jobID" data-js-link="" href="/en/egypt/jobs/immdiate-hiring-for-product-managerfor-a-factory-in-egypt-5214512/">
# immdiate hiring for product managerfor a factory in Egypt </a>
# </h2>
title = soup.find_all("h2", class_="m0")
for j in title:
    print(j.text.strip())

immdiate hiring for product managerfor a factory in Egypt
Senior Presales Solution Architect " Data "
Data Analytics - Data Science Team Lead - Cairo
Data Analytics - Data Science Team Lead - Cairo
Power BI Developer
Customer Success Manager – Focused on Data Science - 218456
Microsoft CRM Administrator
Network & Security Head
Business Analytics & Insights Lead ELI & North Africa
Pharmacist
Safety Coordinator, Workplace Health and Safety
HCM Techno-Functional Consultant
Business Analyst (Retail Loan Origination System)
Data Science Manager
Senior Data Science Engineer
Data Architect - 218417
Senior Business Intelligence Consultant - 218408
Center Operations Manager - Cairo
Business Consultant- (Account Manager)
Internship Program - Quality Assurance Specialist


#### Step 4.2: Extract Job URLs on Bayt
Extract and print the URLs for job postings on Bayt.

In [None]:
# Extract job URLs on Bayt
jobs_url = []
urls = soup.find_all("h2", class_="m0")
for i in urls:
    jobs_url.append("https://www.bayt.com" + i.find('a').attrs['href'])
jobs_url

['https://www.bayt.com/en/egypt/jobs/immdiate-hiring-for-product-managerfor-a-factory-in-egypt-5214512/',
 'https://www.bayt.com/en/egypt/jobs/senior-presales-solution-architect-quot-data-quot-5206515/',
 'https://www.bayt.com/en/egypt/jobs/data-analytics-data-science-team-lead-cairo-72038421/',
 'https://www.bayt.com/en/egypt/jobs/data-analytics-data-science-team-lead-cairo-72038074/',
 'https://www.bayt.com/en/egypt/jobs/power-bi-developer-5201473/',
 'https://www.bayt.com/en/egypt/jobs/customer-success-manager-focused-on-data-science-218456-72042911/',
 'https://www.bayt.com/en/egypt/jobs/microsoft-crm-administrator-5199788/',
 'https://www.bayt.com/en/egypt/jobs/network-security-head-5209500/',
 'https://www.bayt.com/en/egypt/jobs/business-analytics-amp-insights-lead-eli-amp-north-africa-5198838/',
 'https://www.bayt.com/en/egypt/jobs/pharmacist-5205410/',
 'https://www.bayt.com/en/egypt/jobs/safety-coordinator-workplace-health-and-safety-5205289/',
 'https://www.bayt.com/en/egypt/
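
Bayt's `href` values are relative paths, which is why the loop above prepends the host. Plain concatenation would produce a broken URL if an `href` were ever already absolute. `urllib.parse.urljoin` from the standard library handles both cases, as this sketch shows (the absolute entry is included only to illustrate the second case).

```python
from urllib.parse import urljoin

BASE_URL = "https://www.bayt.com"

hrefs = [
    "/en/egypt/jobs/power-bi-developer-5201473/",  # relative, as in the Bayt markup
    "https://www.bayt.com/en/egypt/jobs/pharmacist-5205410/",  # already absolute
]

# urljoin resolves relative paths against the base and leaves absolute URLs unchanged.
absolute_urls = [urljoin(BASE_URL, href) for href in hrefs]
print(absolute_urls)
```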

### Quizzes
Extract additional fields from the scraped pages by replacing the `...` placeholders in each cell with the correct tag name and class.

#### Quiz 1: Find the Locations of Jobs

In [None]:
# Find the location
location = soup.find_all("...", class_="...")
for l in location:
    print(l.text)

#### Quiz 2: Find the Company Names of Jobs

In [None]:
# Find the company
company = soup.find_all("...", class_="...")
for c in company:
    print(c.text)

#### Quiz 3: Extract Job Descriptions

In [None]:
# Find the job description
desc = soup.find_all("...", class_="...")
for d in desc:
    print(d.text)