
# Data Collection – Apartment Prices in Kraków

## Goal
This notebook collects raw apartment listing data
for use in later stages of the data science project.

At this stage:
- no data cleaning is performed
- no analysis is done
- data is saved in raw form


## Data source
Listings are collected from:
- Otodom.pl (public real estate listings)

Important notes:
- Data is collected for educational purposes only
- No personal data is collected
- Scraping logic lives in a separate module


## Data collection logic

Scraping logic is implemented in:
- src/scraping.py

This notebook only calls that logic and saves the results to:
- data/raw/
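
Since raw data is meant to be kept exactly as collected, a fixed file name in data/raw/ gets overwritten by every new run. One possible way around that, shown here only as a sketch, is to embed a UTC timestamp in the file name. The helper `snapshot_path` and the stem `apartments_krakow` are hypothetical names chosen for illustration; they are not part of src/scraping.py.

```python
from datetime import datetime, timezone
from pathlib import Path


def snapshot_path(out_dir: Path, stem: str, now: datetime) -> Path:
    """Return a CSV path whose name embeds a UTC timestamp.

    Hypothetical helper for illustration; not part of src/scraping.py.
    """
    # Convert to UTC so file names sort chronologically regardless of time zone.
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return out_dir / f"{stem}_{stamp}.csv"


example = snapshot_path(
    Path("data/raw"),
    "apartments_krakow",
    datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
print(example.name)  # apartments_krakow_20250102T030405Z.csv
```

In a notebook, the current time could be passed as `datetime.now(timezone.utc)` and the resulting path handed to `DataFrame.to_csv`.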


In [1]:
import pandas as pd

In [2]:
# Clone the GitHub repository into the Colab runtime (skipped if it already exists)

from pathlib import Path

%cd /content

REPO_URL = "https://github.com/piotrevolta/krakow-apartment-price-prediction.git"
REPO_DIR = Path("/content/krakow-apartment-price-prediction")

if not REPO_DIR.exists():
    !git clone {REPO_URL}

%cd /content/krakow-apartment-price-prediction
!pwd
!ls


/content
Cloning into 'krakow-apartment-price-prediction'...
remote: Enumerating objects: 180, done.
remote: Counting objects: 100% (180/180), done.
remote: Compressing objects: 100% (154/154), done.
remote: Total 180 (delta 76), reused 75 (delta 22), pack-reused 0 (from 0)
Receiving objects: 100% (180/180), 69.87 KiB | 3.68 MiB/s, done.
Resolving deltas: 100% (76/76), done.
/content/krakow-apartment-price-prediction
/content/krakow-apartment-price-prediction
 data			   notebooks   reports		  src
'Fields description.txt'   README.md   requirements.txt


In [7]:
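# Optional: pull the latest commits from GitHub into the Colab copy
# !git pull --rebase

import importlib
import src.scraping as sc

# Reload first so edits to src/scraping.py take effect, then re-import the names
importlib.reload(sc)
from src.scraping import collect_raw_listings, enrich_with_details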

print("Imported from:", sc.__file__)


Imported from: /content/krakow-apartment-price-prediction/src/scraping.py
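
The order in the cell above matters: `importlib.reload` re-executes the module, but a name imported with `from ... import ...` before the reload keeps pointing at the old function object. The sketch below demonstrates this with a throwaway module; `demo_scraping` and `version` are hypothetical names created only for this example and have nothing to do with the real src/scraping.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical throwaway module, written to a temp dir just for this demo.
tmp_dir = Path(tempfile.mkdtemp())
module_path = tmp_dir / "demo_scraping.py"
module_path.write_text("def version():\n    return 1\n")

sys.path.insert(0, str(tmp_dir))
import demo_scraping
from demo_scraping import version as old_version

# Simulate an edit to the module on disk (for example after a git pull).
module_path.write_text("def version():\n    return 'edited'\n")
importlib.invalidate_caches()
importlib.reload(demo_scraping)

# Re-importing after the reload binds the name to the new function object.
from demo_scraping import version as new_version

print(old_version())  # 1
print(new_version())  # edited
```

This is why the `from src.scraping import ...` line is placed after the reload rather than before it.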


In [8]:
# Scrape one page of search results, then add detail-page fields to those listings
df_raw = collect_raw_listings(max_pages=1)
df_enriched = enrich_with_details(df_raw, max_details=30, sleep_s=1.0)

print("df_raw shape:", df_raw.shape)
print("df_enriched shape:", df_enriched.shape)

out_dir = Path("data/raw")
out_dir.mkdir(parents=True, exist_ok=True)

out_path = out_dir / "apartments_krakow_enriched.csv"
df_enriched.to_csv(out_path, index=False)

print("Saved:", out_path.resolve())
print("Files in data/raw:", [p.name for p in out_dir.glob("*")])



df_raw shape: (17, 9)
df_enriched shape: (17, 19)
Saved: /content/krakow-apartment-price-prediction/data/raw/apartments_krakow_enriched.csv
Files in data/raw: ['.gitkeep', 'apartments_krakow_enriched.csv']
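
The internals of `enrich_with_details` are not shown in this notebook. Judging only by the parameter names, `max_details` presumably caps how many detail pages are fetched and `sleep_s` is presumably a pause between requests; both readings are assumptions. A minimal sketch of that general pattern follows, with the hypothetical names `fetch_with_pause` and `fetch_one`, and a fake fetcher so that nothing touches the network.

```python
import time
from typing import Callable, Iterable, List


def fetch_with_pause(
    urls: Iterable[str],
    fetch_one: Callable[[str], dict],
    max_details: int = 30,
    sleep_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Fetch at most `max_details` URLs, pausing `sleep_s` seconds between calls.

    Hypothetical helper; an assumption about the pattern, not the actual
    implementation in src/scraping.py.
    """
    results: List[dict] = []
    for i, url in enumerate(urls):
        if i >= max_details:
            break  # cap reached: ignore the remaining URLs
        if i > 0:
            sleep(sleep_s)  # pause between consecutive requests
        results.append(fetch_one(url))
    return results


# Demo with a fake fetcher and a recording "sleep" instead of real waiting.
pauses: List[float] = []
rows = fetch_with_pause(
    [f"https://example.com/listing/{n}" for n in range(5)],
    fetch_one=lambda url: {"listing_url": url},
    max_details=3,
    sleep_s=1.0,
    sleep=pauses.append,
)
print(len(rows), pauses)  # 3 [1.0, 1.0]
```

Under this reading, the run above stays below the cap: 17 listings were found and `max_details=30`, so every listing could be enriched.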


In [9]:

df_enriched.head(50)

Unnamed: 0,listing_url,address_text,address_street,address_subdistrict,address_district,address_city,address_voivodeship,price_text,price_per_m2_text,area_text,floor_text,rooms_count_text,has_garden,has_balcony,has_parking,has_basement,year_built_text,has_elevator,has_storage
0,https://www.otodom.pl/pl/oferta/sprzedam-2-pok...,"Jana Kantego Federowicza, Skotniki, Dębniki, K...",Jana Kantego Federowicza,Skotniki,Dębniki,Kraków,małopolskie,539 000 zł,16 152 zł/m²,33.37 m²,1/4,2,0,0,0,0,2024.0,0,0
1,https://www.otodom.pl/pl/oferta/sprzedam-3-pok...,"Jana Kantego Federowicza, Skotniki, Dębniki, K...",Jana Kantego Federowicza,Skotniki,Dębniki,Kraków,małopolskie,786 000 zł,14 354 zł/m²,54.76 m²,parter/4,3,0,0,0,0,2024.0,0,0
2,https://www.otodom.pl/pl/oferta/sprzedam-miesz...,"Jana Kantego Federowicza, Skotniki, Dębniki, K...",Jana Kantego Federowicza,Skotniki,Dębniki,Kraków,małopolskie,810 000 zł,14 816 zł/m²,54.67 m²,1/4,3,0,0,0,0,2024.0,0,0
3,https://www.otodom.pl/pl/oferta/sprzedam-2-pok...,"Jana Kantego Federowicza, Skotniki, Dębniki, K...",Jana Kantego Federowicza,Skotniki,Dębniki,Kraków,małopolskie,548 000 zł,16 353 zł/m²,33.51 m²,parter/4,2,0,0,0,0,2024.0,0,0
4,https://www.otodom.pl/pl/oferta/urocze-mieszka...,"ul. Jonatana Warszauera, Kazimierz, Stare Mias...",ul. Jonatana Warszauera,Kazimierz,Stare Miasto,Kraków,małopolskie,789 000 zł,19 627 zł/m²,40.2 m²,3,1,0,0,0,1,,0,0
5,https://www.otodom.pl/pl/oferta/2-pokoje-balko...,"Dąbie, Grzegórzki, Kraków, małopolskie",,Dąbie,Grzegórzki,Kraków,małopolskie,735 000 zł,19 865 zł/m²,37 m²,4/6,2,0,1,1,0,2023.0,1,0
6,https://www.otodom.pl/pl/oferta/sprzedam-miesz...,"ul. Gnieźnieńska, Azory, Prądnik Biały, Kraków...",ul. Gnieźnieńska,Azory,Prądnik Biały,Kraków,małopolskie,425 000 zł,17 708 zł/m²,24 m²,3/10,1,0,0,0,1,1970.0,1,0
7,https://www.otodom.pl/pl/oferta/swietna-lokali...,"ul. Żwirki i Wigury, Rakowice, Prądnik Czerwon...",ul. Żwirki i Wigury,Rakowice,Prądnik Czerwony,Kraków,małopolskie,415 000 zł,13 108 zł/m²,31.66 m²,parter/2,2,0,0,0,0,1910.0,0,0
8,https://www.otodom.pl/pl/oferta/doskonale-skom...,"os. Na Lotnisku, Nowe Bieńczyce, Bieńczyce, Kr...",os. Na Lotnisku,Nowe Bieńczyce,Bieńczyce,Kraków,małopolskie,435 000 zł,11 785 zł/m²,36.91 m²,2/4,2,0,0,0,1,1970.0,0,0
9,https://www.otodom.pl/pl/oferta/2-pokojowe-mie...,"ul. Piasta Kołodzieja, Mistrzejowice Północ, M...",ul. Piasta Kołodzieja,Mistrzejowice Północ,Mistrzejowice,Kraków,małopolskie,Zapytaj o cenę,,37.25 m²,5/8,2,0,1,1,0,2025.0,1,0
