# Remote Job Market Intelligence using Ethical Web Scraping
Project Type: Data Science Internship Project


# 1. Project Introduction & Business Context


Why This Project Matters
In today's job market, data is everything. Companies, recruiters, and business analysts constantly ask questions like: What skills are most in demand right now? Which job titles pay the most? How fast is the remote work market growing? These questions cannot be answered through guesswork; they require real data collected systematically from job boards.

This project teaches you how to collect that data ethically and legally. Remote OK is one of the largest job boards for remote work positions, with thousands of job postings updated daily. By scraping this data responsibly, you will understand how real companies and data professionals extract market intelligence without breaking laws or harming the platforms they use.

# Library Installation

In [1]:
pip install requests


Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.




Installs the requests library, which is used to send HTTP requests in Python.
It allows easy interaction with APIs and web services using methods like GET and POST.

In [7]:
!pip install beautifulsoup4


Defaulting to user installation because normal site-packages is not writeable




Installs BeautifulSoup4, a Python library used for parsing and extracting data from HTML/XML documents.
Commonly used with requests for web scraping and DOM navigation.

Job Title

In [49]:
import requests
import time

url = "https://remoteok.com/api"
response = requests.get(url)

data = response.json()
jobs = data[1:]

count = 0
for job in jobs:
    title = job.get("position")
    if title:
        count += 1
        print(f"{count}. {title}")
        time.sleep(1)   

print("\nTotal jobs found:", count)


1. SQUIRE
2. Coding Bootcamp﹣Job Guaranteed
3. Executive Assistant Operations Virtual Assistant
4. Marketing &amp; Sponsorship Coordinator
5. Staff Software Engineer Money
6. Senior Cloud Engineer
7. Regional Director Public Sector Sales DOW
8. Senior Staff AI Software Engineer
9. Security Architect
10. Member of Technical Staff Fullstack Engineer
11. Staff Software Engineer Simulation
12. Software Engineer Networking Software and Services
13. Client Success Executive Wayforge
14. WNS Global Services
15. Virtual Operations Manager
16. Client Success Manager Coupa
17. Social Media Manager Platform
18. IT Operations Specialist
19. Senior Software Engineer
20. Senior Specialist Senior Accountant Shared Financial Services
21. Senior Product Manager
22. Senior Engineering Manager Infrastructure
23. Work From Home Benefits Services Representative
24. Senior Infrastructure Engineer
25. Senior Software Developer II Cloud
26. HR Administrator
27. Product Designer
28. Client Success Manager
29

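This code retrieves job listings from the RemoteOK API in a single request and prints each job title with a running count. The first element of the response is skipped (data[1:]) because the code treats it as metadata rather than a job listing.
The 1-second time.sleep(1) pause runs after each printed title, not between HTTP requests, so it slows the output rather than limiting the request rate. The final line prints the total number of titles found.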
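Title 4 in the output above contains the HTML entity `&amp;`, which suggests the API returns some text HTML-escaped. A minimal sketch of cleaning a title with the standard library follows; the helper name `clean_title` is hypothetical and not part of the original notebook.

```python
import html

def clean_title(raw_title):
    """Decode HTML entities and trim surrounding whitespace in a job title."""
    return html.unescape(raw_title).strip()

# Title copied from the output above.
print(clean_title("Marketing &amp; Sponsorship Coordinator"))
# prints: Marketing & Sponsorship Coordinator
```

The same helper could be applied to `job.get("position")` inside the loop before printing.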

Company Name

In [50]:
data = response.json()

jobs = data[1:]

count = 0
for job in jobs:
    company = job.get("company")
    if company:
        count += 1
        print(f"{count}. {company}")
        time.sleep(1)   

print("\nTotal companies found:", count)


1. SQUIRE
2. Metana
3. Assist World
4. Assist World
5. Galileo Financial Technologies
6. ClickHouse
7. Chainguard
8. Aegis Ventures
9. Dexterity
10. Inflection AI
11. Muon Space
12. xAI
13. SBI Growth
14. WNS Global Services
15. Women Builders Council
16. CrossCountry Consulting
17. Digital Media Management
18. MyFunded Futures
19. Chainguard
20. Make-A-Wish America
21. Agile Six
22. MZLA Technologies Corporation
23. Global Elite Texas
24. Paradigm Health
25. Life360
26. Offshore Launch
27. Ahrefs
28. Offshore Launch
29. MicroVentures
30. Dune
31. Databricks
32. Xpansiv
33. INFUSE
34. Chainguard
35. Unlimit
36. Alpaca
37. AllTrails
38. Teleport
39. Redwood Materials
40. NASA Federal Credit Union
41. Koala Health
42. MyFunded Futures
43. CloudWalk
44. Mento
45. Thimble
46. Terra
47. Private Health Management
48. Trend Health Partners
49. Railroad19, Inc
50. HONK
51. Outreach
52. GiveDirectly
53. Calendly
54. FalconX
55. Arize AI
56. Wormhole Foundation
57. Startale
58. InfStones
59. Nan

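This code reuses the API response fetched in the previous cell and prints the company name of each listing.
The final count is the number of listings that have a company name, not the number of distinct companies: Assist World and Chainguard, for example, appear more than once. The time.sleep(1) call pauses for 1 second after each printed line.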
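Because the same company can post several jobs, the number of listings and the number of distinct companies differ. A small sketch of telling the two apart, using a few names copied from the output above (a real run would build the list from `job.get("company")`):

```python
# Sample values copied from the output above.
companies = ["SQUIRE", "Metana", "Assist World", "Assist World",
             "Chainguard", "Chainguard"]

# Drop empty values, then de-duplicate with a set.
named = [name.strip() for name in companies if name]
distinct = sorted(set(named))

print("Listings with a company name:", len(named))   # 6
print("Distinct companies:", len(distinct))          # 4
```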

Skills

In [51]:
data = response.json()

jobs = data[1:]

count = 0
for job in jobs:
    tags = job.get("tags")
    if tags:
        count += 1
        print(f"{count}. {', '.join(tags)}")
        time.sleep(1)   

print("\nTotal jobs with skills:", count)


1. No Tech Background Needed, Job or 100% Money Back
2. assistant, system, consulting, founder, support, bookkeeping, financial, admin, management, operations, executive
3. coordinator, support, content, marketing, sales, non tech
4. software, finance, financial, banking, excel, analytics, engineer, engineering
5. security, architect, c, cloud, senior, engineer
6. director, security, software, cloud, management, marketing, sales
7. software, python, technical, growth, senior, operational, health, healthcare, engineer, backend
8. security, architect, python, full-stack, docker, software, devops, c++, cloud, robotics, git, operations
9. technical, support, engineer, engineering, backend, fullstack
10. software, design, system, web, cloud, api, operations, engineer, engineering, digital nomad
11. software, support, travel, reliability, engineer, engineering
12. marketing, consulting, growth, management, executive
13. full time, other
14. consulting, manager, saas, support, software, growt


Total jobs with skills: 95


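This code reuses the same API response and prints the tags of each job as a comma-separated list. The tags mix skills (python, docker) with other labels such as seniority and role (senior, engineer), so they are only a rough proxy for required skills.
A 1-second delay (time.sleep(1)) is applied after each printed line while counting the jobs that contain tag information.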
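To see which tags are most common, the per-job lists can be flattened and counted. The sketch below assumes two tag lists copied from the output above; a real run would iterate over `jobs` instead.

```python
from collections import Counter

# Two tag lists copied from the output above.
tag_lists = [
    ["security", "architect", "c", "cloud", "senior", "engineer"],
    ["director", "security", "software", "cloud", "management",
     "marketing", "sales"],
]

# Flatten the lists and count each lowercased tag.
tag_counts = Counter(
    tag.strip().lower() for tags in tag_lists for tag in tags
)

for tag, count in tag_counts.most_common(3):
    print(f"{tag}: {count}")
```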

Locations

In [52]:
data = response.json()

jobs = data[1:]

count = 0
for job in jobs:
    location = job.get("location")
    if location:
        count += 1
        print(f"{count}. {location}")
        time.sleep(1)   

print("\nTotal jobs with location:", count)


1. Remote
2. New York City
3. United States
4. Washington DC
5. Remote, U.S.
6. Redwood City
7. Palo Alto
8. San Jose
9. Remote
10. Remote
11. India
12. United States
13. United States
14. Remote
15. Remote
16. Remote
17. United States
18. Remote Canada
19. Portland, Maine
20. Remote - United States
21. Remote, Canada
22. Remote
23. Remote
24. Remote
25. Texas
26. London
27. Remote
28. Remote
29. Remote
30. United States
31. Remote - North America
32. Remote
33. United States
34. San Francisco
35. Remote
36. Remote
37. Remote
38. United States
39. Remote
40. United States
41. U.S.
42. Remote
43. Remote
44. U.S. Remote
45. Remote
46. Hyderabad
47. London
48. Remote - US
49. New York City
50. Remote
51. Remote
52. Remote
53. Texas
54. Remote EMEA, Remote Asia
55. Nairobi
56. United States
57. Remote - Anywhere in the U.S.
58. Yerevan
59. Malaysia
60. Remote - North America
61. Remote - Europe
62. New Delhi
63. San Francisco
64. US - Remote
65. Remote
66. Remote
67. Remote - United States

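This code reuses the same API response and prints each job's location. The location field is free text, so the same place appears in several spellings (for example "Remote - US", "US - Remote", and "U.S. Remote").
A 1-second delay (time.sleep(1)) is applied after each printed line while counting the jobs with location data.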
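Since the location values are free text, some normalization is usually needed before they can be counted. The following is only a rough sketch: the helper names are hypothetical, and the rule (a case-insensitive search for "remote") is an assumption that will misclassify remote jobs whose location says only "United States".

```python
def is_remote(location):
    """Return True when the free-text location mentions remote work."""
    return "remote" in location.lower()

def summarize(locations):
    """Count how many location strings mention remote work."""
    remote = sum(1 for loc in locations if is_remote(loc))
    return {"remote": remote, "other": len(locations) - remote}

# Values copied from the output above.
sample = ["Remote", "New York City", "Remote, U.S.", "US - Remote", "Hyderabad"]
print(summarize(sample))  # {'remote': 3, 'other': 2}
```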

Job Type

In [53]:
count = 0

job_type_keywords = [
    "full-time", "contract", "part-time", "freelance", "internship"
]

for job in jobs:
    # "or []" also covers a "tags" key that is present but set to None
    tags = job.get("tags") or []
    job_types = [tag for tag in tags if tag.lower() in job_type_keywords]

    if job_types:
        count += 1
        print(f"{count}. {', '.join(job_types)}")
        time.sleep(1)   

print("\nTotal jobs with job type:", count)


1. part-time
2. full-time
3. part-time
4. part-time
5. full-time
6. full-time
7. full-time
8. full-time
9. full-time

Total jobs with job type: 9


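This code identifies job types by matching each job's tags against a predefined list of employment keywords and prints the matches one by one. Matching is exact after lowercasing, so a variant such as "full time" (visible in the Skills output) is not counted.
A 1-second pause using time.sleep(1) follows each printed match. Because no HTTP request is made inside the loop, the pause slows the output rather than rate limiting the API.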
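The Skills output above includes the tag "full time", which an exact comparison against "full-time" would miss. One possible way to catch such variants is to normalize each tag before matching; the helper names below are hypothetical and the normalization rule is an assumption.

```python
JOB_TYPES = {"full-time", "contract", "part-time", "freelance", "internship"}

def normalize_tag(tag):
    """Lowercase a tag and join its words with hyphens ("Full Time" -> "full-time")."""
    return "-".join(tag.lower().split())

def extract_job_types(tags):
    """Return the job-type tags in a tag list; None is treated as empty."""
    return [t for t in map(normalize_tag, tags or []) if t in JOB_TYPES]

print(extract_job_types(["full time", "other"]))  # ['full-time']
```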

Date Posted

In [1]:
import requests
from datetime import datetime

url = "https://remoteok.com/api"
response = requests.get(url)
data = response.json()

jobs = data[1:] 

count = 0
for job in jobs:
    epoch_time = job.get("epoch")  
    if epoch_time:
        count += 1
        date_posted = datetime.fromtimestamp(epoch_time)
        print(f"{count}. Date Posted: {date_posted}")

print("\nTotal jobs with date:", count)


1. Date Posted: 2026-01-02 21:30:21
2. Date Posted: 2026-01-02 14:36:50
3. Date Posted: 2026-01-02 10:30:08
4. Date Posted: 2026-01-02 10:30:01
5. Date Posted: 2026-01-02 03:30:55
6. Date Posted: 2026-01-02 03:30:37
7. Date Posted: 2026-01-02 03:30:17
8. Date Posted: 2026-01-02 02:30:15
9. Date Posted: 2026-01-01 03:30:02
10. Date Posted: 2025-12-31 13:30:42
11. Date Posted: 2025-12-31 13:30:18
12. Date Posted: 2025-12-30 20:30:10
13. Date Posted: 2025-12-30 17:30:16
14. Date Posted: 2025-12-30 13:30:29
15. Date Posted: 2025-12-30 10:03:06
16. Date Posted: 2025-12-30 05:31:40
17. Date Posted: 2025-12-30 05:31:33
18. Date Posted: 2025-12-30 05:31:27
19. Date Posted: 2025-12-30 05:31:11
20. Date Posted: 2025-12-30 05:30:49
21. Date Posted: 2025-12-30 05:30:27
22. Date Posted: 2025-12-30 01:30:42
23. Date Posted: 2025-12-29 23:30:55
24. Date Posted: 2025-12-29 23:30:31
25. Date Posted: 2025-12-29 23:30:14
26. Date Posted: 2025-12-29 21:31:31
27. Date Posted: 2025-12-29 21:31:02
28. Date P

This code extracts the job posting date by reading the UNIX timestamp (epoch) associated with each job listing from the RemoteOK API. The timestamp is converted into a human-readable date that can be used to analyze hiring velocity and posting trends over time. Note that datetime.fromtimestamp returns local time, so the printed values depend on the time zone of the machine running the notebook. Each job's posting date is printed sequentially, enabling easy review of recent and older listings.
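As a sketch of how the timestamps could feed a posting-trend analysis, the example below converts epochs to UTC calendar days and counts postings per day. The epoch values are hypothetical, chosen for illustration, and `posting_day` is not a function from the original notebook.

```python
from collections import Counter
from datetime import datetime, timezone

def posting_day(epoch_seconds):
    """Convert a UNIX timestamp to an ISO date string in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).date().isoformat()

# Hypothetical epoch values: two on 2026-01-02 and one on 2026-01-03 (UTC).
epochs = [1767312000, 1767315600, 1767398400]

per_day = Counter(posting_day(e) for e in epochs)
for day, count in sorted(per_day.items()):
    print(day, count)
```

Passing `tz=timezone.utc` makes the result independent of the local time zone, which keeps daily counts comparable across machines.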

Job URL

In [3]:
import requests

url = "https://remoteok.com/api"
response = requests.get(url)
data = response.json()

jobs = data[1:]  

count = 0
for job in jobs:
    job_url = job.get("url")
    if job_url:
        count += 1
        print(f"{count}. Job URL:{job_url}")

print("\nTotal job URLs found:", count)


1. Job URL:https://remoteOK.com/remote-jobs/remote-squire-squire-1129396
2. Job URL:https://remoteOK.com/remote-jobs/remote-coding-bootcamp-job-guaranteed-metana-1129395
3. Job URL:https://remoteOK.com/remote-jobs/remote-executive-assistant-operations-virtual-assistant-assist-world-1129394
4. Job URL:https://remoteOK.com/remote-jobs/remote-marketing-amp-sponsorship-coordinator-assist-world-1129393
5. Job URL:https://remoteOK.com/remote-jobs/remote-staff-software-engineer-money-galileo-financial-technologies-1129390
6. Job URL:https://remoteOK.com/remote-jobs/remote-senior-cloud-engineer-clickhouse-1129389
7. Job URL:https://remoteOK.com/remote-jobs/remote-regional-director-public-sector-sales-dow-chainguard-1129388
8. Job URL:https://remoteOK.com/remote-jobs/remote-senior-staff-ai-software-engineer-aegis-ventures-1129387
9. Job URL:https://remoteOK.com/remote-jobs/remote-security-architect-dexterity-1129385
10. Job URL:https://remoteOK.com/remote-jobs/remote-member-of-technical-staff-f

This code retrieves the unique job URL associated with each job listing to ensure data traceability and direct access to the original posting. In this run the API returned absolute URLs (as the output shows), so the code prints them unchanged; no base domain is added here. Each job URL is printed individually, allowing further inspection or downstream scraping of detailed job information.
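If a URL ever did arrive in relative form, `urllib.parse.urljoin` from the standard library handles both cases: it resolves a relative path against a base and leaves an absolute URL as it is. A minimal sketch, where `BASE_URL`, `absolute_job_url`, and the relative path are illustrative assumptions:

```python
from urllib.parse import urljoin

BASE_URL = "https://remoteok.com"

def absolute_job_url(raw_url):
    """Resolve a job URL against the base domain; absolute URLs pass through."""
    return urljoin(BASE_URL, raw_url)

# Hypothetical relative path.
print(absolute_job_url("/remote-jobs/example-123"))
# Absolute URL copied from the output above.
print(absolute_job_url("https://remoteOK.com/remote-jobs/remote-squire-squire-1129396"))
```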

Pagination and Multi-Page Scraping

In [21]:
import requests
import time
import pandas as pd

url = "https://remoteok.com/api"

headers = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "application/json"
}

all_jobs = []

response = requests.get(url, headers=headers)
response.raise_for_status()

data = response.json()
jobs = data[1:]  

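# Employment-type keywords, matching the earlier Job Type cell.
job_type_keywords = {"full-time", "contract", "part-time", "freelance", "internship"}

for job in jobs:
    job_url = job.get("url")
    # Prefix the domain only if the URL is relative
    if job_url and job_url.startswith("/"):
        job_url = "https://remoteok.com" + job_url

    tags = job.get("tags") or []
    # Assumption: the API has no dedicated "type" field, so job.get("type")
    # would return None; derive the job type from the tags instead.
    job_types = [tag for tag in tags if tag.lower() in job_type_keywords]

    job_data = {
        "Job Title": job.get("position"),
        "Company Name": job.get("company"),
        "Job Tags / Skills": tags,
        "Location": job.get("location"),
        "Job Type": ", ".join(job_types) or None,
        "Date Posted": pd.to_datetime(job.get("epoch"), unit="s"),
        "Job URL": job_url
    }

    all_jobs.append(job_data)
    time.sleep(1)  # pause between records; no HTTP request is made here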

df = pd.DataFrame(all_jobs)
print("Total jobs scraped:", len(df))

Total jobs scraped: 100


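This code fetches job listings from the RemoteOK API in a single request and stores the title, company, skills, location, job type, posting date, and URL of each job in a Pandas DataFrame. No page loop is involved: the one response contained all 100 listings reported above. The time.sleep(1) call runs between records rather than between HTTP requests, so it does not rate limit the API; in a true multi-page scrape the delay would belong between requests.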
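For a source that does require several requests, the pause belongs between the requests themselves. The sketch below shows that pattern in isolation. The function `fetch_all`, its parameters, and the page names are all hypothetical, and the `fetch` argument is a stand-in for a real HTTP call such as `requests.get`.

```python
import time

def fetch_all(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause before every request except the first
        results.append(fetch(url))
    return results

# Stand-in fetch function and zero delay, so the example runs instantly.
pages = fetch_all(["page-1", "page-2", "page-3"], fetch=str.upper, delay_seconds=0)
print(pages)  # ['PAGE-1', 'PAGE-2', 'PAGE-3']
```

Injecting `fetch` and `sleep` as parameters keeps the loop testable without network access or real waiting.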