# Data Science Jobs - API Scraping Notebook

## Project: Data Science Job Market Analysis
**Author:** Mayenmein Terence Sama Aloah Jr
**Date:** 09/23/2025  
**Description:** This notebook handles the API-based scraping of data science job postings from the Found.dev API.

In [4]:
import sys
import os
from pathlib import Path
import pandas as pd
import numpy as np
import requests
import time
from datetime import datetime
import json
# Add the project root (the parent of notebooks/) to the import path so the local scr package can be found
sys.path.insert(0, '..')
print("Libraries imported successfully!")

Libraries imported successfully!
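`sys.path.insert(0, '..')` only works when the notebook's working directory is `notebooks/`. A minimal sketch of a more robust alternative walks up from the current directory until it finds the package folder. It assumes the package directory is named `scr` (as in the import below); `find_project_root` and `add_project_root_to_path` are hypothetical helpers, not part of the project.

```python
import sys
from pathlib import Path


def find_project_root(start: Path, marker: str = "scr") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found.

    Hypothetical helper: `marker` defaults to "scr", the package name this
    notebook imports from; change it if the package lives elsewhere.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No '{marker}' directory found above {start}")


def add_project_root_to_path(start: Path, marker: str = "scr") -> Path:
    """Prepend the discovered project root to sys.path (once) and return it."""
    root = find_project_root(start, marker)
    # Prepending lets the local package shadow any installed package of the same name
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root
```

In the notebook this would be called as `add_project_root_to_path(Path.cwd())`, which keeps working if the notebook is launched from the project root instead of `notebooks/`.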


## 1. Project Setup and Configuration
Configure paths and import the scraping module.

In [5]:
# Configuration
DATA_RAW_PATH = Path('../data/raw')

# Create directories if they don't exist
DATA_RAW_PATH.mkdir(parents=True, exist_ok=True)

print(f"📁 Data will be saved to: {DATA_RAW_PATH.absolute()}")

📁 Data will be saved to: c:\Users\MARIE\Desktop\scrape job details\notebooks\..\data\raw
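Scraped batches will land in `DATA_RAW_PATH`. One way to keep raw files from overwriting each other across runs is a timestamped, per-batch filename. The sketch below is an assumption, not the project's actual naming scheme; `raw_output_path` is a hypothetical helper and the `skill`/`batch` values mirror the configuration used later in this notebook.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def raw_output_path(raw_dir: Path, skill: str, batch: int,
                    now: Optional[datetime] = None) -> Path:
    """Build a timestamped CSV path for one scraped batch (hypothetical naming scheme)."""
    now = now or datetime.now()
    slug = skill.lower().replace(" ", "_")   # "Data Science" -> "data_science"
    stamp = now.strftime("%Y%m%d_%H%M%S")     # sortable timestamp
    return raw_dir / f"{slug}_batch{batch:02d}_{stamp}.csv"


# Example with a fixed timestamp so the result is reproducible
path = raw_output_path(Path("../data/raw"), "Data Science", 3,
                       now=datetime(2025, 9, 23, 14, 5, 0))
print(path.name)  # data_science_batch03_20250923_140500.csv
```

A batch DataFrame could then be written with `df.to_csv(path, index=False)`.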


In [6]:
# Import the scraping functions
try:
    from scr.scraping.scrape_jobs import scrape_in_batches, fetch_jobs
    print("✅ Custom scraping modules imported successfully!")
except ImportError as e:
    print(f"❌ Error importing custom modules: {e}")
    raise  # later cells depend on these functions, so stop here instead of failing with NameError

✅ Custom scraping modules imported successfully!


## 2. API Connection Test
Test the API connection by requesting a single page of results.

In [None]:
# Test API connection
print("🧪 Testing API connection...")

try:
    test_data = fetch_jobs(page=1, skill="Data Science", ai=True)
    jobs_count = len(test_data.get("jobs", []))
    print(f"✅ API connection successful! Found {jobs_count} jobs on page 1")
    
    # Display sample job structure
    if jobs_count > 0:
        sample_job = test_data["jobs"][0]
        print("\n📋 Sample job structure:")
        print(json.dumps(sample_job, indent=2)[:500] + "...")
        
except Exception as e:
    print(f"❌ API test failed: {e}")
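Printing the first 500 characters of one job shows a sample, but optional fields may be missing from that particular posting. A minimal sketch for checking the response shape instead, assuming (as the test cell does with `test_data.get("jobs", [])`) that each page returns a top-level `"jobs"` list of dictionaries; `summarize_response` is a hypothetical helper.

```python
from typing import Any


def summarize_response(data: dict[str, Any]) -> dict[str, Any]:
    """Summarize one page response, assuming jobs live under a top-level "jobs" list."""
    jobs = data.get("jobs", [])
    if not isinstance(jobs, list):
        raise TypeError(f'Expected "jobs" to be a list, got {type(jobs).__name__}')
    # Union of keys across all jobs, since postings may omit optional fields
    job_keys = sorted({key for job in jobs for key in job})
    other_keys = sorted(k for k in data if k != "jobs")
    return {"job_count": len(jobs), "job_keys": job_keys, "other_top_level_keys": other_keys}
```

After a successful test request, `summarize_response(test_data)` lists every field seen on page 1, which helps decide which columns to keep before the full scrape.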

## 3. Scraping Parameters Configuration
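Configure the parameters for the batched scrape. `start_batch` and `start_page` set where the run begins; with 20 pages per batch, batch 3 covers pages 41 to 60, so the two values agree.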

In [None]:
# Scraping configuration
SCRAPING_CONFIG = {
    "skill": "Data Science",
    "pages_per_batch": 20,
    "ai": True,
    "delay": 1,
    "start_page": 41,
    "start_batch": 3
}

print("⚙️  Scraping Configuration:")
for key, value in SCRAPING_CONFIG.items():
    print(f"   {key}: {value}")
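The relationship between `start_batch`, `start_page`, `pages_per_batch`, and `delay` can be made explicit. The sketch below is a hypothetical illustration, not the implementation of `scrape_in_batches`: it assumes batches are 1-indexed starting at page 1, that `delay` is the pause in seconds between page requests, and that an empty page means the results are exhausted. `fetch` stands in for a single-page function such as `fetch_jobs`.

```python
import time
from typing import Any, Callable


def batch_page_range(batch: int, pages_per_batch: int) -> range:
    """Pages covered by a 1-indexed batch, assuming batch 1 starts at page 1."""
    if batch < 1 or pages_per_batch < 1:
        raise ValueError("batch and pages_per_batch must be positive")
    first = (batch - 1) * pages_per_batch + 1
    return range(first, first + pages_per_batch)


def scrape_batch(fetch: Callable[[int], dict[str, Any]], pages: range,
                 delay: float) -> list[dict[str, Any]]:
    """Fetch each page in `pages`, pausing `delay` seconds between requests (hypothetical)."""
    jobs: list[dict[str, Any]] = []
    for i, page in enumerate(pages):
        if i > 0:
            time.sleep(delay)  # be polite to the API between requests
        page_jobs = fetch(page).get("jobs", [])
        if not page_jobs:
            break  # assumption: an empty page means no more results
        jobs.extend(page_jobs)
    return jobs


# Consistency check on the values used in SCRAPING_CONFIG
pages = batch_page_range(3, 20)
print(pages.start, pages[-1])  # 41 60
```

With the real function, the page argument would presumably be wired as `lambda p: fetch_jobs(page=p, skill="Data Science", ai=True)`, matching the call in the connection test.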