<a href="https://colab.research.google.com/github/piotrevolta/krakow-apartment-price-prediction/blob/main/notebooks/01_data_collection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Collection – Apartment Prices in Kraków

## Goal
This notebook collects raw apartment listing data
for use in later stages of the data science project.

At this stage:
- no data cleaning is performed
- no analysis is done
- data is saved in raw form


## Data source
Planned data source:
- Otodom.pl (public real estate listings)

Important notes:
- Data collection is for educational purposes only
- No personal data will be collected
- Scraping logic will be implemented in a separate module


## Data collection logic

The scraping logic is implemented in:
- src/scraping.py

This notebook only calls that logic and saves the results to:
- data/raw/
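
The split between the module and the notebook can be sketched as follows. This is a minimal illustration, not the actual contents of `src/scraping.py`: the helper `records_to_frame` and the sample record are hypothetical, and only the column names are taken from the raw output shown later in this notebook.

```python
import pandas as pd

# Column names as they appear in the raw output of this notebook.
RAW_COLUMNS = ["title", "price", "area_m2", "rooms", "url", "source"]


def records_to_frame(records, source="otodom"):
    """Hypothetical helper: turn scraped records (dicts) into a raw DataFrame.

    Missing fields stay empty (NaN); nothing is cleaned or converted.
    """
    df = pd.DataFrame(records)
    df["source"] = source
    # reindex adds any missing column (filled with NaN) and fixes the column order
    return df.reindex(columns=RAW_COLUMNS)


# Made-up record, for illustration only.
sample = [{"title": "Example listing", "area_m2": 50.0, "url": "/example"}]
df_sample = records_to_frame(sample)
print(df_sample.columns.tolist())
```

Keeping the column list in one place means every run produces the same raw schema, even when a field is missing from all scraped records.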


In [None]:
import pandas as pd

In [149]:
# Clone the GitHub repository into the Colab runtime (skipped if it already exists)

from pathlib import Path

%cd /content

REPO_URL = "https://github.com/piotrevolta/krakow-apartment-price-prediction.git"
REPO_DIR = Path("/content/krakow-apartment-price-prediction")

if not REPO_DIR.exists():
    !git clone {REPO_URL}

%cd /content/krakow-apartment-price-prediction
!pwd
!ls


/content
/content/krakow-apartment-price-prediction
/content/krakow-apartment-price-prediction
data  notebooks  README.md  reports  requirements.txt  src


In [153]:
# Synchronize the Colab checkout with GitHub
!git pull --rebase

import importlib
import src.scraping as sc

# Reload the module so that freshly pulled changes take effect without restarting the runtime
importlib.reload(sc)
from src.scraping import collect_raw_listings

print("Imported from:", sc.__file__)


remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 4 (delta 2), reused 4 (delta 2), pack-reused 0 (from 0)
Unpacking objects: 100% (4/4), 564 bytes | 564.00 KiB/s, done.
From https://github.com/piotrevolta/krakow-apartment-price-prediction
   aaf0754..7964719  main       -> origin/main
Rebasing (7/7)
Successfully r

In [154]:
df_raw = collect_raw_listings()
print("df_raw shape:", df_raw.shape)

out_dir = Path("data/raw")
out_dir.mkdir(parents=True, exist_ok=True)

out_path = out_dir / "apartments_krakow_raw.csv"
df_raw.to_csv(out_path, index=False)

print("Saved:", out_path.resolve())
print("Files in data/raw:", [p.name for p in out_dir.glob("*")])


df_raw shape: (45, 6)
Saved: /content/krakow-apartment-price-prediction/data/raw/apartments_krakow_raw.csv
Files in data/raw: ['apartments_krakow_raw.csv', '.gitkeep', '.ipynb_checkpoints']
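
One quick way to confirm that a file was written as expected is to read it back and compare shapes. The sketch below is self-contained: it uses a temporary directory and a made-up two-row frame in place of `df_raw`, so the names `df_demo`, `demo_path`, and `df_back` are illustrative only.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for df_raw; the values are made up.
df_demo = pd.DataFrame(
    {"title": ["A", None], "area_m2": [39.41, None], "source": ["otodom", "otodom"]}
)

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "data" / "raw" / "demo_raw.csv"
    demo_path.parent.mkdir(parents=True, exist_ok=True)
    df_demo.to_csv(demo_path, index=False)

    # Read the file back while the temporary directory still exists.
    df_back = pd.read_csv(demo_path)

# The round trip should keep every row and column.
same_shape = df_back.shape == df_demo.shape
print("Round trip preserved shape:", same_shape)
```

The same comparison could be run against `out_path` and `df_raw` in this notebook.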




In [155]:

df_raw.head(50)


Unnamed: 0,title,price,area_m2,rooms,url,source
0,,,,,/,otodom
1,,,,,/pl/wyniki/sprzedaz/mieszkanie/cala-polska,otodom
2,,,,,/pl/wyniki/sprzedaz/mieszkanie/malopolskie,otodom
3,,,,,/pl/wyniki/sprzedaz/mieszkanie/malopolskie/kra...,otodom
4,Nowe| Wykończone pod klucz - standard 6000 pln/m2,"{'value': 829000, 'currency': 'PLN', '__typena...",39.41,,[lang]/ad/nowe-wykonczone-pod-klucz-standard-6...,otodom
5,Essa Kliny,,,,,otodom
6,4-pokojowe mieszkanie 88m2 + balkon,"{'value': 1145300, 'currency': 'PLN', '__typen...",88.1,,[lang]/ad/4-pokojowe-mieszkanie-88m2-balkon-ID...,otodom
8,3-pokojowe mieszkanie 56m2 + balkon Bezpośrednio,"{'value': 673183, 'currency': 'PLN', '__typena...",56.57,,[lang]/ad/3-pokojowe-mieszkanie-56m2-balkon-be...,otodom
10,4-pokojowe mieszkanie 74m2 + balkon Bezpośrednio,"{'value': 877383, 'currency': 'PLN', '__typena...",74.99,,[lang]/ad/4-pokojowe-mieszkanie-74m2-balkon-be...,otodom
12,3-pokojowe mieszkanie 56m2 + balkon Bez Prowizji,"{'value': 683287, 'currency': 'PLN', '__typena...",56.47,,[lang]/ad/3-pokojowe-mieszkanie-56m2-balkon-be...,otodom
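
The preview shows that `price` holds a dict-like value with a `value` key rather than a plain number, and that some rows have no price at all. No cleaning happens in this notebook, but a later stage might extract the number along the lines of the sketch below. The function `price_value` and the demo rows are hypothetical. The preview truncates the real dict, so only the `value` and `currency` keys are taken from it.

```python
import ast

import pandas as pd

# Made-up rows mimicking the preview: a dict-like price string and a missing price.
df_demo = pd.DataFrame(
    {
        "title": ["Example listing", None],
        "price": ["{'value': 500000, 'currency': 'PLN'}", None],
    }
)


def price_value(cell):
    """Return the 'value' entry from a dict or dict-like string, else None."""
    if isinstance(cell, str):
        try:
            # After a CSV round trip the dict is stored as its Python repr.
            cell = ast.literal_eval(cell)
        except (ValueError, SyntaxError):
            return None
    if isinstance(cell, dict):
        return cell.get("value")
    return None


# Work on a copy so the raw data stays unchanged.
df_view = df_demo.copy()
df_view["price_value"] = df_view["price"].apply(price_value)
print(df_view[["title", "price_value"]])
```

`ast.literal_eval` is used instead of `json.loads` because the stored text uses single quotes, which is not valid JSON.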
