## **Web Scraping of Used Car Data from Carpages.ca**

- This project scrapes used car listings from `carpages.ca`, collecting the **title, ...., price, ...., mileage, and city** listed for each car across 5 pages. The data is extracted using `requests` and `BeautifulSoup`, stored in a list, converted to a **pandas DataFrame**, and saved as a **CSV file** for future use.

### **Importing Required Libraries**
- `requests` for making HTTP requests.
- `BeautifulSoup` from `bs4` for parsing HTML.
- `time` for delays between requests.
- `pandas` for creating and saving data in tabular format.

In [None]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time

In [None]:
# Reference search URL (50 results per page, filtered by two fuel type IDs):
# https://www.carpages.ca/used-cars/search/?num_results=50&fueltype_id%5B0%5D=3&fueltype_id%5B1%5D=7

## **Defining URLs and Collecting Data**

In [None]:
# Base URL and Headers
base_url = "https://www.carpages.ca/used-cars/search/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
}
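
To illustrate how a base URL and a set of query parameters combine into a full search URL, here is a minimal sketch using the standard library's `urllib.parse.urlencode`. It is an illustration only: `requests` does its own encoding when it is given a `params` dictionary, and `build_search_url` is a hypothetical helper that is not part of the notebook. The `page` parameter name is taken from the scraping loop in this notebook.

```python
from urllib.parse import urlencode

base_url = "https://www.carpages.ca/used-cars/search/"

def build_search_url(page, num_results=50):
    """Return the search URL for one results page (hypothetical helper)."""
    params = {
        "num_results": num_results,
        "fueltype_id[0]": 3,
        "fueltype_id[1]": 7,
        "page": page,
    }
    # urlencode percent-encodes the square brackets as %5B and %5D
    return base_url + "?" + urlencode(params)

print(build_search_url(1))
```

Printing the result is a quick way to confirm that the encoded query string has the same shape as a search URL copied from the browser.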

### **Initialize Empty List for Car Data**
- Creates an empty list, `car_data`, that stores each scraped car's information as a dictionary.

In [None]:
# List to hold all car data
car_data = []

### **Scrape Car Listings from Multiple Pages**
Loops through pages **1 to 5** of the used car listings.

For each page:

- Builds the URL.
- Sends an HTTP GET request.
- Parses the HTML with BeautifulSoup.
- Selects each car listing using a CSS selector.
- Extracts the title, subtitle, color, price, dealer, mileage, and city.
- Appends the information as a dictionary to the `car_data` list.

Prints success message per page and delays 2 seconds between requests to avoid being blocked.
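
The request-and-delay step described above could also be wrapped in a small retry helper. The sketch below is one possible approach, not the notebook's own method: `fetch_html` is a hypothetical function, and the retry count, timeout, and back-off values are assumptions. It accepts any object with a `get` method that behaves like `requests.Session.get`.

```python
import time

def fetch_html(session, url, params, headers=None, retries=3, delay=2.0):
    """Return the page HTML, or None if every attempt fails (hypothetical helper)."""
    for attempt in range(1, retries + 1):
        response = session.get(url, params=params, headers=headers, timeout=10)
        if response.status_code == 200:
            return response.text
        # Wait a little longer after each failed attempt before retrying
        if attempt < retries:
            time.sleep(delay * attempt)
    return None
```

With `requests` installed, a call might look like `fetch_html(requests.Session(), base_url, params, headers=headers)`. A `None` result signals that the page should be skipped rather than parsed.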

In [None]:
# Scrape the first 5 pages (adjust the range as needed)
for page in range(1, 6):
    print(f"Scraping page {page}...")
    params = {
        'num_results': 50,
        'fueltype_id[0]': 3,
        'fueltype_id[1]': 7,
        'page': page
    }

    # Request the page and parse the HTML
    response = requests.get(base_url, headers=headers, params=params)
    soup = BeautifulSoup(response.text, 'html.parser')

    # All car cards on this page (parsed inside the loop so every page is processed)
    car_cards = soup.find_all("div", class_="tw:flex tw:gap-6 tw:items-start tw:p-6")

    for card in car_cards:
        # Each field falls back to None when its tag is missing from the card
        try:
            title = card.find('h4', class_='hN').text.strip()
        except (AttributeError, IndexError):
            title = None

        try:
            subtitle = card.find_all('h5', class_='hN tw:text-gray-500')[0].text.strip()
        except (AttributeError, IndexError):
            subtitle = None

        try:
            color = card.find('span', class_='tw:text-sm tw:font-bold').text.strip()
        except (AttributeError, IndexError):
            color = None

        try:
            price = card.find('span', class_='tw:font-bold tw:text-xl').text.strip()
        except (AttributeError, IndexError):
            price = None

        try:
            dealer = card.find_all('h5', class_='hN')[1].text.strip()
        except (AttributeError, IndexError):
            dealer = None

        # Mileage: first gray div whose text contains "KM", with whitespace removed
        mileage = None
        for div in card.find_all('div', class_='tw:text-gray-500'):
            if "KM" in div.text:
                mileage = div.text.strip().replace("\n", "").replace(" ", "")
                break

        try:
            city = card.find('p', class_='hN').text.strip()
        except (AttributeError, IndexError):
            city = None

        # Save this car's info
        car_data.append({
            'Title': title,
            'Subtitle': subtitle,
            'Color': color,
            'Price': price,
            'Dealer': dealer,
            'Mileage': mileage,
            'City': city
        })

    time.sleep(2)  # Be polite to the server

Scraping page 1...
Scraping page 2...
Scraping page 3...
Scraping page 4...
Scraping page 5...
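
The repeated `try`/`except` pattern used for each field could be condensed with two small helpers. This is a sketch under assumptions, not the notebook's code: `text_or_none` and `nth_text` are hypothetical names, and `get_text(strip=True)` may differ slightly from `.text.strip()` when a tag contains nested elements, because it strips each text fragment separately.

```python
def text_or_none(node):
    """Return the stripped text of a BeautifulSoup tag, or None if the tag is missing."""
    return node.get_text(strip=True) if node is not None else None

def nth_text(nodes, index):
    """Return the text of nodes[index], or None when the list is too short."""
    return text_or_none(nodes[index]) if len(nodes) > index else None

# Possible usage inside the card loop (selectors copied from the notebook):
# title = text_or_none(card.find('h4', class_='hN'))
# dealer = nth_text(card.find_all('h5', class_='hN'), 1)
```

Handling the missing-tag case explicitly avoids catching unrelated errors, which a broad `except` would silently hide.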

### **Convert to DataFrame and Save to CSV**
- Converts the `car_data` list into a pandas DataFrame, then exports it to a CSV file named `carpages_data.csv` without the index column.

In [None]:
# Export to CSV
df = pd.DataFrame(car_data)
df.to_csv("carpages_data.csv", index=False)
print("Data saved to carpages_data.csv")
df.head()

Data saved to carpages_data.csv


| | Title | Subtitle | Price | City | Dealer | Mileage | Color |
|---|---|---|---|---|---|---|---|
| 0 | 2025 Honda CR-V | Hybrid Touring | $52,875 | Cranbrook, BC | Spring Honda | 40KM | Grey |
| 1 | 2025 Honda CR-V | Hybrid Touring | $52,875 | Cranbrook, BC | Spring Honda | 40KM | Blue |
| 2 | 2025 Mazda CX-70 | MHEV GS-L | $45,995 | Cayuga, ON | Haldimand Motors Ltd. | 35,962KM | Black |
| 3 | 2019 Chevrolet Bolt | EV LT | $10,469 | Kemptville, ON | Myers Automotive Group | 94,650KM | |
| 4 | 2025 Kia NIRO | HEV SX | $41,994 | Niagara Falls, ON | Cardinal Kia | 10KM | Aurora Black Pearl |
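
The price and mileage values are saved as text such as `$52,875` and `35,962KM`. If numeric values are needed later, a small conversion helper is one option. The sketch below is an assumption-laden illustration: `to_int` is a hypothetical function, and it assumes whole-number values like those in the sample rows, so a price with cents would be converted incorrectly.

```python
import re

def to_int(value):
    """Convert strings like '$52,875' or '35,962KM' to int; return None if no digits."""
    if not isinstance(value, str):
        return None
    digits = re.sub(r"[^\d]", "", value)  # drop currency signs, commas, and units
    return int(digits) if digits else None

print(to_int("$52,875"))   # 52875
print(to_int("35,962KM"))  # 35962
print(to_int(""))          # None
```

Applied to the DataFrame built in this notebook, this might look like `df["Price"].map(to_int)`, keeping the original text columns untouched until the conversion has been checked.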


In [None]:
print("Alhumdulillah (^_^)")

Alhumdulillah (^_^)
