<h1 style="text-align:center;">Web Scraping project</h1>

### Project Description
"This project involved web scraping a job listings website to collect valuable data related to the field of data analysis in Egypt.
We utilized Python, Beautiful Soup, and Selenium to extract job titles, locations, experience requirements, education levels, and other job-related information. 
The data was then organized into a structured format, making it easier for users to search and find relevant job opportunities."

#### Importing Libraries

In [1]:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
import pandas as pd

#### Initialize empty lists to store data

In [2]:
# Initialize empty lists to store data extracted from job listings
Links=[] 
Job_Title = []
Job_Status=[]
Location=[]
Experience=[]
Career_Level=[]
Education_Level=[]
Num_applicant=[]
Salary=[]
Skills_And_Tools=[]
page_num=0

#### Initialize a loop to move between multiple pages

In [None]:
while True:
    url=requests.get(f'https://wuzzuf.net/search/jobs/?a=navbl&filters%5Bcountry%5D%5B0%5D=Egypt&q=data%20analyst&start={page_num}')
    src=url.content
    soup=BeautifulSoup(src,'lxml')
    
    # Find and extract the total page count for job listings
    page_limit=int(soup.find("strong").text)
    
    # Check if the current page number exceeds the total page count
    if(page_num>page_limit//15):
        print("page ended")
        break
        
     # Find all job titles on the current page   
    job_titles=soup.find_all("h2",{"class":"css-m604qf"})
    
    # Loop through the list of job titles and extract their links
    for i in range(len(job_titles)):
        Job_Title.append(job_titles[i].text)
        Links.append(job_titles[i].find("a").attrs['href'])
        
# Move to the next page by incrementing the page number
    page_num+=1
    print("page switched")

#### Loop through the job listing page links

In [4]:
for link in Links:
    driver = webdriver.Chrome()
    driver.get(link)
    page_source = driver.page_source
    driver.quit()
    soup=BeautifulSoup(page_source,'lxml')
    
# 1- Extract and append experience data  
    EXP=soup.find_all("span",{"class":"css-4xky9y"})[0]
    for i in EXP:
        Experience.append(i.text)
        
# 2- Extract and append career level data        
    Career=soup.find_all("span",{"class":"css-4xky9y"})[1]
    for x in Career:
        Career_Level.append(x.text)
        
# 3- Extract and append location data       
    locations=soup.find_all("strong",{"class":"css-9geu3q"})
    for d in locations:
        Location.append(d.text.split("-")[1].strip())
        
# 4- Extract and append job status data        
    status=soup.find_all("span",{"class":"css-ja0r8m eoyjyou0"})
    for a in Career:
        Job_Status.append(a.text)
        
# 5- Extract and append education level data       
    Edu=soup.find_all("span",{"class":"css-4xky9y"})[2]
    for y in Edu:
        Education_Level.append(y.text)
        
# 6- Extract and append salary data       
    salaries=soup.find_all("span",{"class":"css-4xky9y"})[3]
    for z in salaries:
        Salary.append(z.text)
        
# 7- Extract and append number of applicants data        
    Num_app=soup.find_all("strong",{"class":"css-u1gwks"})
    for t in Num_app:
        Num_applicant.append(t.text)
        
# 8- Extract and append skills and tools data        
    skill=soup.find_all("span",{"class":"css-158icaa"})
    for v in skill:
        Skills_And_Tools.append(v.text)    


#### Store data at DataFrame

In [31]:
# Create a dictionary with data for creating a Pandas DataFrame
data_frame = {'Title': Job_Title[:173] ,
              'Location': Location[:173] ,
              'Job_Status': Job_Status[:173] ,
              'Experience': Experience[:173] ,
              'Education_Level': Education_Level[:173] ,
              'Career_Level': Career_Level[:173] ,
              'Num_applicant': Num_applicant[:173] ,
              'Skills_And_Tools': Skills_And_Tools[:173] ,
              'Link': Links[:173]}
# Create a Pandas DataFrame from the dictionary
df = pd.DataFrame(data_frame)

In [33]:
df

Unnamed: 0,Title,Location,Job_Status,Experience,Education_Level,Career_Level,Num_applicant,Skills_And_Tools,Link
0,Data Analyst,"New Cairo, Cairo",Experienced (Non-Manager),3 to 5 years,Not Specified,Experienced (Non-Manager),68,Analyst/Research,https://wuzzuf.net/jobs/p/OXXJeNe42zTd-Data-An...
1,Digital Marketing Data Analyst,"New Cairo, Cairo",Manager,3 to 10 years,Bachelor's Degree,Manager,76,BI,https://wuzzuf.net/jobs/p/0c8j9mDVbnHu-Digital...
2,Master Data Analyst,"Cairo, Egypt",Not specified,Not Specified,Not Specified,Not specified,89,Analysis,https://wuzzuf.net/jobs/p/j0BpuCRPfxtL-Master-...
3,Programme Assistant (Data Analyst),"Cairo, Egypt",Not specified,Not Specified,Not Specified,Not specified,59,Data Analysis,https://wuzzuf.net/jobs/p/VcomUHv4Pk8n-Program...
4,Data Analyst,"Cairo, Egypt",Experienced (Non-Manager),2 to 4 years,Not Specified,Experienced (Non-Manager),43,Marketing/PR/Advertising,https://wuzzuf.net/jobs/p/GU5qkS59l3S7-Data-An...
...,...,...,...,...,...,...,...,...,...
168,"Full Stack Developer (.NET core , Angular 8+) ...","Abu Rawash, Giza",Experienced (Non-Manager),More than 5 years,Bachelor's Degree,Experienced (Non-Manager),43,Commerce,https://wuzzuf.net/jobs/p/tTCAVRc833oK-Full-St...
169,Senior Full Stack Developer (.net core -angular),"Abu Rawash, Giza",Experienced (Non-Manager),4 to 6 years,Not Specified,Experienced (Non-Manager),27,Project Management,https://wuzzuf.net/jobs/p/SUuiSGXK76ON-Senior-...
170,Monitoring And Evaluation Officer,"Sheikh Zayed, Giza",Experienced (Non-Manager),2 to 5 years,Not Specified,Experienced (Non-Manager),8,Software,https://wuzzuf.net/jobs/p/IiS2nxViL5Ua-Monitor...
171,Technical Lead Full Stack,"Sheikh Zayed, Giza",Experienced (Non-Manager),5 to 7 years,Not Specified,Experienced (Non-Manager),309,Business,https://wuzzuf.net/jobs/p/j9r5RobbaBnl-Technic...


#### Save data as Excel File

In [37]:
file_path = r'C:\Users\Khallaf\datasc.xlsx'

# Save the DataFrame to an Excel file
df.to_excel(file_path, index=False)

print(f"DataFrame saved to {file_path}")

DataFrame saved to C:\Users\Khallaf\datasc.xlsx
