<a href="https://colab.research.google.com/github/piotrevolta/krakow-apartment-price-prediction/blob/main/notebooks/01_data_collection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Collection – Apartment Prices in Kraków

## Goal
This notebook collects raw apartment listing data
for use in later stages of the data science project.

At this stage:
- no data cleaning is performed
- no analysis is done
- data is saved in raw form


## Data source
Planned data source:
- Otodom.pl (public real estate listings)

Important notes:
- Data collection is for educational purposes only
- No personal data will be collected
- Scraping logic is implemented in a separate module
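
One way to enforce the "no personal data" note is an allow-list of fields applied to every scraped record before it is stored. The sketch below is illustrative only: the column names are the ones visible in the `df_raw` preview later in this notebook, while `keep_allowed_fields`, `agent_name` and `agent_phone` are hypothetical names invented for the example, not part of `src/scraping.py`.

```python
# Column names taken from the df_raw preview shown later in this notebook.
ALLOWED_FIELDS = (
    "listing_url",
    "address_text",
    "address_street",
    "address_subdistrict",
    "address_district",
    "address_city",
    "address_voivodeship",
    "price_text",
    "price_per_m2_text",
)


def keep_allowed_fields(record: dict) -> dict:
    """Return a new dict with only allow-listed keys; missing keys become None."""
    return {name: record.get(name) for name in ALLOWED_FIELDS}


# Hypothetical scraped record: "agent_name" and "agent_phone" are invented keys,
# used only to show that anything outside the allow-list is dropped.
scraped = {
    "listing_url": "https://example.invalid/listing/1",
    "price_text": "n/a",
    "agent_name": "placeholder",
    "agent_phone": "placeholder",
}
clean = keep_allowed_fields(scraped)
```

An allow-list fails safe: a new field appearing on the listing page is ignored until it is explicitly added, whereas a block-list would let it through.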


## Data collection logic

Scraping logic is implemented in:
- src/scraping.py

This notebook only calls that logic and saves the results to:
- data/raw/
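
The split between module and notebook can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual contents of `src/scraping.py`: `collect_rows`, `save_raw` and `fake_fetch_page` are hypothetical names, the rows are made-up placeholders, and the standard-library `csv` module stands in for the pandas `to_csv` call used later in this notebook.

```python
from __future__ import annotations

import csv
from pathlib import Path
from typing import Callable, Iterable


def collect_rows(fetch_page: Callable[[int], Iterable[dict]], max_pages: int = 1) -> list[dict]:
    """Module side (hypothetical): gather raw rows page by page, with no cleaning."""
    rows: list[dict] = []
    for page in range(1, max_pages + 1):
        rows.extend(fetch_page(page))
    return rows


def save_raw(rows: list[dict], out_dir: Path, filename: str) -> Path:
    """Notebook side: write the rows unchanged to out_dir and return the file path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    fieldnames = list(rows[0].keys()) if rows else []
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return out_path


def fake_fetch_page(page: int) -> list[dict]:
    """Stand-in for a real page fetch; returns made-up placeholder rows."""
    return [
        {"listing_url": f"https://example.invalid/page{page}/item{i}", "price_text": "n/a"}
        for i in range(2)
    ]
```

Passing the page fetcher in as an argument keeps the collection loop testable without network access; the notebook would then call something like `save_raw(collect_rows(fake_fetch_page, max_pages=1), Path("data/raw"), "sample.csv")`.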


In [None]:
import pandas as pd

In [1]:
# Clone the GitHub repository into the Colab workspace (skipped if it already exists)

from pathlib import Path

%cd /content

REPO_URL = "https://github.com/piotrevolta/krakow-apartment-price-prediction.git"
REPO_DIR = Path("/content/krakow-apartment-price-prediction")

if not REPO_DIR.exists():
    !git clone {REPO_URL}

%cd /content/krakow-apartment-price-prediction
!pwd
!ls


/content
Cloning into 'krakow-apartment-price-prediction'...
remote: Enumerating objects: 158, done.
remote: Counting objects: 100% (158/158), done.
remote: Compressing objects: 100% (136/136), done.
remote: Total 158 (delta 67), reused 64 (delta 18), pack-reused 0 (from 0)
Receiving objects: 100% (158/158), 55.58 KiB | 2.06 MiB/s, done.
Resolving deltas: 100% (67/67), done.
/content/krakow-apartment-price-prediction
/content/krakow-apartment-price-prediction
 data			   notebooks   reports		  src
'Fields description.txt'   README.md   requirements.txt


In [3]:
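# Sync the Colab copy with GitHub (uncomment to pull the latest changes)
#!git pull --rebase

import importlib
import src.scraping as sc

# Reload so that edits to src/scraping.py are picked up without restarting the runtime
importlib.reload(sc)
# enrich_with_details is used in the next cell; it is assumed to live in src/scraping.py
from src.scraping import collect_raw_listings, enrich_with_details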

print("Imported from:", sc.__file__)


Imported from: /content/krakow-apartment-price-prediction/src/scraping.py


In [4]:
df_raw = collect_raw_listings(max_pages=1)
df_enriched = enrich_with_details(df_raw, max_details=10)

print("df_raw shape:", df_raw.shape)
print("df_enriched shape:", df_enriched.shape)

out_dir = Path("data/raw")
out_dir.mkdir(parents=True, exist_ok=True)

out_path = out_dir / "apartments_krakow_enriched.csv"
df_enriched.to_csv(out_path, index=False)

print("Saved:", out_path.resolve())
print("Files in data/raw:", [p.name for p in out_dir.glob("*")])



df_raw shape: (19, 9)
Saved: /content/krakow-apartment-price-prediction/data/raw/apartments_krakow_raw.csv
Files in data/raw: ['apartments_krakow_raw.csv', '.gitkeep']


In [5]:

df_raw.head(50)

,listing_url,address_text,address_street,address_subdistrict,address_district,address_city,address_voivodeship,price_text,price_per_m2_text
0,https://www.otodom.pl/pl/oferta/bronowice-wyja...,"ul. Odlewnicza, Małe Błonia, Krowodrza, Kraków...",ul. Odlewnicza,Małe Błonia,Krowodrza,Kraków,małopolskie,1 224 960 zł,22 004 zł/m²
1,https://www.otodom.pl/pl/oferta/bronowice-apar...,"ul. Odlewnicza, Małe Błonia, Krowodrza, Kraków...",ul. Odlewnicza,Małe Błonia,Krowodrza,Kraków,małopolskie,2 310 630 zł,21 000 zł/m²
2,https://www.otodom.pl/pl/oferta/ruczaj-3-pokoj...,"ul. Lubostroń, Ruczaj, Dębniki, Kraków, małopo...",ul. Lubostroń,Ruczaj,Dębniki,Kraków,małopolskie,1 200 000 zł,17 956 zł/m²
3,https://www.otodom.pl/pl/oferta/ruczaj-2-pokoj...,"ul. Karola Bunscha, Płaszów, Podgórze, Kraków,...",ul. Karola Bunscha,Płaszów,Podgórze,Kraków,małopolskie,"698 006,71 zł",13 434 zł/m²
4,https://www.otodom.pl/pl/oferta/czyzyny-2-poko...,"ul. Śliwkowa, Czyżyny, Czyżyny, Kraków, małopo...",ul. Śliwkowa,Czyżyny,Czyżyny,Kraków,małopolskie,605 100 zł,18 198 zł/m²
5,https://www.otodom.pl/pl/oferta/4-pokoje-pradn...,"ul. Piaszczysta, Prądnik Biały Zachód, Prądnik...",ul. Piaszczysta,Prądnik Biały Zachód,Prądnik Biały,Kraków,małopolskie,1 756 350 zł,15 000 zł/m²
6,https://www.otodom.pl/pl/oferta/3-pokoje-pradn...,"ul. Piaszczysta, Prądnik Biały Zachód, Prądnik...",ul. Piaszczysta,Prądnik Biały Zachód,Prądnik Biały,Kraków,małopolskie,912 608 zł,15 200 zł/m²
7,https://www.otodom.pl/pl/oferta/inwestycja-3-p...,"ul. Ugorek, Rakowice, Prądnik Czerwony, Kraków...",ul. Ugorek,Rakowice,Prądnik Czerwony,Kraków,małopolskie,749 000 zł,13 187 zł/m²
8,https://www.otodom.pl/pl/oferta/dochodowa-inwe...,"ul. Wojciecha Weissa, Azory, Prądnik Biały, Kr...",ul. Wojciecha Weissa,Azory,Prądnik Biały,Kraków,małopolskie,1 090 000 zł,15 430 zł/m²
9,https://www.otodom.pl/pl/oferta/nowa-inwestycj...,"ul. Mogilska, Grzegórzki Północ, Grzegórzki, K...",ul. Mogilska,Grzegórzki Północ,Grzegórzki,Kraków,małopolskie,"760 469,05 zł",18 810 zł/m²
