# 01 - Data Collection

This notebook collects and prepares all raw data for the project:

1. Load official BFS Bauperiode categories (construction period categories)
2. Save BFS Bauperiode categories in a SQLite database
3. Scrape rental listings from Homegate for three cantons
4. Save scraped data (CSV + SQLite)


## Load BFS Bauperiode categories

We uploaded a cleaned CSV file containing only the official BFS construction
period categories. This file contains a single column with 12 Bauperiode labels.

In [2]:
import pandas as pd

df_bauperiode = pd.read_csv("../Data/bfs_bauperiode_categories.csv")
df_bauperiode


Unnamed: 0,Bauperiode
0,Vor 1919 erbaut
1,Zwischen 1919 und 1945 erbaut
2,Zwischen 1946 und 1960 erbaut
3,Zwischen 1961 und 1970 erbaut
4,Zwischen 1971 und 1980 erbaut
5,Zwischen 1981 und 1990 erbaut
6,Zwischen 1991 und 2000 erbaut
7,Zwischen 2001 und 2005 erbaut
8,Zwischen 2006 und 2010 erbaut
9,Zwischen 2011 und 2015 erbaut


## Save Bauperiode Categories to SQLite

We store the Bauperiode categories in a SQLite database so that they can be used
later when we classify scraped listings into construction-period groups.


In [3]:
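import sqlite3

# Write the categories to the project database; the table is replaced on re-runs.
conn = sqlite3.connect("../Data/apartment_database.db")
try:
    df_bauperiode.to_sql("bfs_bauperiode_categories", conn, if_exists="replace", index=False)
finally:
    conn.close()  # close the connection even if the write fails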
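
To check that the table was written as expected, it can be read back with `pd.read_sql_query`. The sketch below is a minimal round trip under stated assumptions: it uses an in-memory database and a two-row stand-in DataFrame (`df_demo`, a hypothetical name) so that it does not touch `apartment_database.db`.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# Stand-in for df_bauperiode; only the first two labels are reproduced here.
df_demo = pd.DataFrame(
    {"Bauperiode": ["Vor 1919 erbaut", "Zwischen 1919 und 1945 erbaut"]}
)

# ":memory:" creates a temporary database that disappears when the connection closes.
with closing(sqlite3.connect(":memory:")) as conn:
    df_demo.to_sql("bfs_bauperiode_categories", conn, if_exists="replace", index=False)
    df_check = pd.read_sql_query("SELECT * FROM bfs_bauperiode_categories", conn)

# The labels read back should match the labels written.
labels_match = df_check["Bauperiode"].tolist() == df_demo["Bauperiode"].tolist()
print(labels_match)
```

Pointing the same query at `../Data/apartment_database.db` would run the identical check against the real table.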


# Web Scraping (Homegate)

We scrape rental listings from Homegate for three selected cantons.

For each listing, we collect:
- Rent (CHF)
- Area (m²)
- Rooms
- Address
- Canton
- Year built (if available)

This data will later be cleaned and assigned to the BFS Bauperiode categories.
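
As an illustration of that later assignment step, here is a minimal sketch that maps a year built to one of the Bauperiode labels displayed above. The function name `assign_bauperiode` and the lookup list are assumptions made for this example. Years after 2015 return `None` because the corresponding labels do not appear in the displayed output.

```python
from typing import Optional

# (last year of the period, label) pairs, copied from the categories shown above.
BAUPERIODE_BOUNDS = [
    (1918, "Vor 1919 erbaut"),
    (1945, "Zwischen 1919 und 1945 erbaut"),
    (1960, "Zwischen 1946 und 1960 erbaut"),
    (1970, "Zwischen 1961 und 1970 erbaut"),
    (1980, "Zwischen 1971 und 1980 erbaut"),
    (1990, "Zwischen 1981 und 1990 erbaut"),
    (2000, "Zwischen 1991 und 2000 erbaut"),
    (2005, "Zwischen 2001 und 2005 erbaut"),
    (2010, "Zwischen 2006 und 2010 erbaut"),
    (2015, "Zwischen 2011 und 2015 erbaut"),
]


def assign_bauperiode(year_built: Optional[int]) -> Optional[str]:
    """Return the Bauperiode label for a year, or None if it cannot be assigned."""
    if year_built is None:
        return None  # listing has no year built
    for last_year, label in BAUPERIODE_BOUNDS:
        if year_built <= last_year:
            return label
    return None  # later than 2015: label not among those displayed above


print(assign_bauperiode(1975))
```

Keeping the boundaries in one list means the mapping can be extended with the remaining categories without changing the function.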


## Define Homegate Scraper Function

This function scrapes Homegate listing cards from search result pages.
We extract price, area, rooms, and address where available.
Missing fields are handled with try/except to avoid crashes.
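
The try/except pattern for missing fields can be sketched in isolation. In this minimal example, `safe_field` is a hypothetical helper and a plain dictionary stands in for a parsed listing card; the real Homegate selectors are not assumed here.

```python
from typing import Any, Callable, Optional


def safe_field(extract: Callable[[], Any]) -> Optional[Any]:
    """Run one extraction step; return None if the field is missing or malformed."""
    try:
        return extract()
    except (KeyError, IndexError, AttributeError, ValueError):
        return None


# Hypothetical card with no area given.
card = {"rent": "1850", "rooms": "3.5"}

listing = {
    "rent_chf": safe_field(lambda: int(card["rent"])),
    "area_m2": safe_field(lambda: float(card["area"])),  # key is absent
    "rooms": safe_field(lambda: float(card["rooms"])),
}
print(listing)
```

Catching only the listed exception types keeps unrelated bugs visible instead of silently turning them into missing values.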
