# Inspector Lookup Scraper Project

## Objective
The goal of this project is to create a Python-based web scraping tool to automate the extraction of inspector data from the **Inspector Lookup** page of the Wisconsin Department of Safety and Professional Services website.

---

## Features
- Automates the selection of the "Boiler and Unfired Pressure Vessels" program.
- Extracts inspector details:
  - Inspector Name
  - Program Area
  - Email
  - Phone Number
  - Counties (one row per county)
- Checks the website's `robots.txt` file before scraping, in line with web scraping best practices.
- Saves the extracted data into a clean, structured CSV file.

---

## Implementation

### 1. Import Libraries and Set Up Logging
This step imports the required libraries and configures logging so that each stage of the run can be traced.



In [3]:
import time
import csv
import logging
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from urllib.robotparser import RobotFileParser

# Initialize logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")


### 2. Check `robots.txt` Compliance
Ensure that the scraping activity is allowed by checking the website's `robots.txt` file.


In [None]:
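def check_robots_txt(base_url, user_agent, path="/"):
    """Return True if robots.txt allows `user_agent` to fetch `path` on `base_url`."""
    robots_url = f"{base_url}/robots.txt"
    logging.info(f"Checking robots.txt at {robots_url}...")

    rp = RobotFileParser()
    rp.set_url(robots_url)
    rp.read()
    # Check the page that will actually be scraped, not just the site root
    return rp.can_fetch(user_agent, f"{base_url}{path}")

# Base URL and User-Agent
base_url = "https://esla.wi.gov"
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"

# Check robots.txt for the Inspector Lookup page
if not check_robots_txt(base_url, user_agent, "/inspectorlookup"):
    logging.error("Scraping is not allowed by the robots.txt file. Exiting...")
    raise SystemExit(1)  # Works in scripts and notebooks without relying on the interactive exit() helper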
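
To see how `RobotFileParser` evaluates rules without making a network request, the rules can be passed in directly with `parse()`. The following is a minimal sketch: the sample rules, the `example.com` URLs, and the `is_allowed` helper are hypothetical and do not reflect the real `robots.txt` of any site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real robots.txt of any site.
SAMPLE_RULES = """\
User-agent: *
Disallow: /private/
""".splitlines()


def is_allowed(rules, user_agent, url):
    """Return True if `rules` (robots.txt lines) permit `user_agent` to fetch `url`."""
    rp = RobotFileParser()
    rp.parse(rules)  # Parse in-memory lines instead of downloading the file
    return rp.can_fetch(user_agent, url)


allowed_lookup = is_allowed(SAMPLE_RULES, "Mozilla/5.0", "https://example.com/inspectorlookup")
allowed_private = is_allowed(SAMPLE_RULES, "Mozilla/5.0", "https://example.com/private/page")
print(allowed_lookup, allowed_private)
```

Because the decision depends on the URL path, it is the page being scraped that should be passed to `can_fetch`, rather than only the site root.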

### 3. Configure Selenium WebDriver
Set up Selenium WebDriver with a `User-Agent` to mimic a real browser and run it in headless mode for efficient background execution.


In [None]:
# Add User-Agent to mimic a browser
chrome_options = Options()
chrome_options.add_argument("--headless")  # Run in headless mode
chrome_options.add_argument(f"user-agent={user_agent}")

# Initialize WebDriver
logging.info("Initializing WebDriver...")
driver = webdriver.Chrome(options=chrome_options)


### 4. Scraping the Inspector Lookup Page
The script opens the page, selects the program in the dropdown menu, and clicks the search button. The results table is read in the next step.


In [None]:
# Target URL
url = "https://esla.wi.gov/inspectorlookup"

# Open the page
logging.info("Opening the target URL...")
driver.get(url)
time.sleep(2)

# Select the Program Area dropdown
logging.info("Selecting 'Boiler and Unfired Pressure Vessels' from the dropdown...")
program_dropdown = Select(driver.find_element(By.ID, "j_id0:j_id70:programArea"))
program_dropdown.select_by_visible_text("Boiler and Unfired Pressure Vessels")
time.sleep(1)

# Click Search
logging.info("Clicking the 'Search' button...")
driver.find_element(By.CSS_SELECTOR, ".btn.btn-primary.searchButton").click()
time.sleep(3)
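
The fixed `time.sleep` calls above either wait longer than needed or fail when the page loads slowly. Selenium provides `WebDriverWait` for this purpose; the sketch below shows the underlying idea (poll a condition until it holds or a timeout expires) in plain Python so it can run without a browser. The `wait_until` helper and the `table_has_rows` stand-in are hypothetical names, not part of the original script.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page check such as "the results table has at least one row".
attempts = []


def table_has_rows():
    attempts.append(1)
    return len(attempts) >= 3  # Becomes true on the third poll


wait_until(table_has_rows, timeout=2.0, poll_interval=0.01)
print(len(attempts))
```

In the scraper, the condition could be something like `lambda: driver.find_elements(By.CSS_SELECTOR, "tbody tr")`, since `find_elements` returns an empty (falsy) list until matching rows exist.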


### 5. Save Data to a CSV File
Read the results table and save the extracted data to a CSV file. Each county listed for an inspector gets its own row, which keeps the output clean and structured.


In [None]:
# Prepare CSV file
output_file = "inspectors_expanded.csv"
logging.info(f"Saving scraped data to {output_file}...")

with open(output_file, mode="w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["Inspector Name", "Program Area", "Email", "Phone Number", "County"])

    # Scrape the table
    rows = driver.find_elements(By.CSS_SELECTOR, ".table.table-striped.no-footer.dataTable tbody tr")
    for row in rows:
        cols = row.find_elements(By.TAG_NAME, "td")
        # Skip rows without the four expected cells (such as a placeholder row when there are no results)
        if len(cols) < 4:
            continue
        inspector_name = cols[0].text.strip()
        program_area = cols[1].text.strip()

        # Extract Contact Information (both spans)
        contact_spans = cols[2].find_elements(By.TAG_NAME, "span")
        email = contact_spans[0].text.strip() if len(contact_spans) > 0 else ""
        phone = contact_spans[1].text.strip() if len(contact_spans) > 1 else ""

        # Split the County column into individual counties, ignoring stray whitespace and empty entries
        counties = [c.strip() for c in cols[3].text.split(",") if c.strip()]

        # Create a row for each county
        for county in counties:
            writer.writerow([inspector_name, program_area, email, phone, county])
            logging.info(f"Added record for {inspector_name} in {county}.")
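
The one-row-per-county expansion can be tried out separately from the browser. The sketch below is a minimal illustration that uses the sample values from this write-up; `expand_counties` is a hypothetical helper, and the output is written to an in-memory buffer instead of a file.

```python
import csv
import io

HEADER = ["Inspector Name", "Program Area", "Email", "Phone Number", "County"]


def expand_counties(name, program, email, phone, county_text):
    """Return one CSV row per county found in the comma-separated `county_text`."""
    counties = [c.strip() for c in county_text.split(",") if c.strip()]
    return [[name, program, email, phone, county] for county in counties]


rows = expand_counties(
    "Dean Y",
    "Boiler and Unfired Pressure Vessels",
    "dean.y@example.com",
    "(866) 123-4567",
    "Buffalo, Jackson",
)

# Write to an in-memory buffer so the sketch leaves no file behind.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(HEADER)
writer.writerows(rows)
print(buffer.getvalue())
```

Keeping the expansion in a small function makes it easy to check edge cases, such as an empty county cell, without launching Selenium.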


### 6. Clean Up
Close the Selenium WebDriver after completing the scraping process.

In [None]:
# Close WebDriver
driver.quit()
logging.info("Web scraping completed successfully.")

print(f"Data saved to {output_file}")
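
If an earlier cell raises an exception, `driver.quit()` is never reached and the browser process keeps running. One possible way to guard against that is a small context manager, sketched below with a stub in place of a real driver so it runs without a browser. Both `managed_driver` and `StubDriver` are hypothetical names introduced for illustration.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with `factory` and always call quit() on exit, even after an error."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()


class StubDriver:
    """Stand-in for a Selenium driver so the sketch runs without a browser."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


stub = StubDriver()
try:
    with managed_driver(lambda: stub) as driver:
        raise RuntimeError("simulated scraping failure")
except RuntimeError:
    pass

print(stub.closed)
```

With Selenium, the factory could be `lambda: webdriver.Chrome(options=chrome_options)`, and the scraping steps would go inside the `with` block.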

---

## Output

The resulting CSV file (`inspectors_expanded.csv`) contains the following columns:
- Inspector Name
- Program Area
- Email
- Phone Number
- County (one row per county)

### Sample Data:

| Inspector Name | Program Area                        | Email              | Phone Number   | County  |
|----------------|-------------------------------------|--------------------|----------------|---------|
| Dean Y         | Boiler and Unfired Pressure Vessels | dean.y@example.com | (866) 123-4567 | Buffalo |
| Dean Y         | Boiler and Unfired Pressure Vessels | dean.y@example.com | (866) 123-4567 | Jackson |
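
Because each county has its own row, the file can be grouped back by inspector with the standard library alone. The sketch below assumes the column names listed above and uses the sample rows as in-memory text; `counties_by_inspector` is a hypothetical helper, not part of the original notebook.

```python
import csv
import io
from collections import defaultdict

# Sample CSV text matching the columns above. In practice the data would come from
# open("inspectors_expanded.csv", newline="", encoding="utf-8").
SAMPLE_CSV = """\
Inspector Name,Program Area,Email,Phone Number,County
Dean Y,Boiler and Unfired Pressure Vessels,dean.y@example.com,(866) 123-4567,Buffalo
Dean Y,Boiler and Unfired Pressure Vessels,dean.y@example.com,(866) 123-4567,Jackson
"""


def counties_by_inspector(csv_file):
    """Map each inspector name to the list of counties found in `csv_file`."""
    grouped = defaultdict(list)
    for record in csv.DictReader(csv_file):
        grouped[record["Inspector Name"]].append(record["County"])
    return dict(grouped)


coverage = counties_by_inspector(io.StringIO(SAMPLE_CSV))
print(coverage)
```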

---

## Conclusion
This project demonstrates how to:
1. Scrape dynamic content using Selenium.
2. Respect `robots.txt` and follow web scraping best practices.
3. Save structured data to a CSV file for easy analysis and usage.

This notebook showcases practical skills in web scraping, data organization, and Python programming. Let me know if you'd like to see more projects like this!
