# **Housing Price Analysis in Poznań**

## Notebook 1: Static Data Collection from Gratka.pl

Welcome to the first notebook of the *Housing Price Analysis in Poznań* project.  
In this notebook, we will collect static data about available apartments listed on the website **Gratka.pl**.

### Objectives of this Notebook:
1. Understand the structure of data available on Gratka.pl.
2. Extract relevant details such as price, location, size, and other key attributes of listed apartments.
3. Prepare the dataset for further analysis in subsequent notebooks.
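
As a first look at the structure mentioned in objective 1, a saved results page can be inspected offline. The sketch below uses only the standard library and assumes that each listing is wrapped in a `div` whose class list contains `card__outer`, the same selector the collection cell waits for. `CardCounter` and `count_listing_cards` are hypothetical helper names, and the sample markup is hand-written, not real Gratka.pl HTML.

```python
from html.parser import HTMLParser


class CardCounter(HTMLParser):
    """Counts <div> elements whose class list contains 'card__outer'."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if "card__outer" in classes:
            self.count += 1


def count_listing_cards(html: str) -> int:
    """Return the number of listing card containers found in an HTML string."""
    parser = CardCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# Tiny hand-written sample (assumption: not real Gratka.pl markup)
sample = (
    '<div class="card__outer">A</div>'
    '<div class="other">B</div>'
    '<div class="card__outer big">C</div>'
)
print(count_listing_cards(sample))  # prints 2
```

A page that yields zero cards probably did not load fully and may be worth downloading again.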

---

In [90]:
import time
import os
from datetime import datetime
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

# Firefox browser configuration
options = Options()
options.add_argument("-headless")  # Run without a visible window; remove this line to watch the browser

# Get the current date and set the path to the folder
current_date = datetime.now().strftime("%d-%m-%Y")
base_folder = os.path.abspath(os.path.join(os.getcwd(), "../data/raw"))  # ../data/raw
output_folder = os.path.join(base_folder, current_date)

# Create the folder if it does not exist
os.makedirs(output_folder, exist_ok=True)

try:
    # Initialize the WebDriver
    driver = webdriver.Firefox(options=options)
    driver.set_page_load_timeout(30)  # Maximum page load time (in seconds)
    base_url = "https://gratka.pl/nieruchomosci/mieszkania/poznan/wtorny?page={page}&location[map]=1&location[map_bounds]=52.509141,17.0716593:52.2919161,16.7316724&sort=relevance"
    max_page = 40  # Maximum page number to process

    for page in range(1, max_page + 1):
        # Generate the URL for the given page
        url = base_url.format(page=page)
        print(f"Navigating to page: {url}")

        # Open the page
        driver.get(url)

        # Handle the "accept and close" button only on the first page
        if page == 1:
            try:
                WebDriverWait(driver, 10).until(
                    EC.element_to_be_clickable((By.CSS_SELECTOR, 'button[aria-label="accept and close"]'))
                ).click()
                print("Clicked the 'accept and close' button.")
            except TimeoutException:
                print("The 'accept and close' button was not found or was not clickable.")

        # Wait until the key element on the page (e.g., listings container) is loaded
        try:
            WebDriverWait(driver, 15).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, 'div.card__outer'))
            )
            print("Page elements loaded.")
        except TimeoutException:
            print("The page did not fully load within the specified time.")

        # Additional wait for the first page to ensure everything is fully loaded
        if page == 1:
            print("Additional wait for the first page to fully load...")
            time.sleep(5)

        # Get the HTML content of the page
        page_source = driver.page_source

        # Save to a file in the folder named after the current date
        file_name = os.path.join(output_folder, f"output_page_{page}.html")
        with open(file_name, "w", encoding="utf-8") as file:
            file.write(page_source)
        print(f"HTML of page {page} saved to file '{file_name}'.")

        # Wait before proceeding to the next page
        time.sleep(5)

except Exception as e:
    print(f"An error occurred: {e}")

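finally:
    # Close the browser; `driver` is undefined if Firefox failed to start
    try:
        driver.quit()
        print("The browser has been closed.")
    except NameError:
        print("The browser was never started.")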


Navigating to page: https://gratka.pl/nieruchomosci/mieszkania/poznan/wtorny?page=1&location[map]=1&location[map_bounds]=52.509141,17.0716593:52.2919161,16.7316724&sort=relevance
Clicked the 'accept and close' button.
Page elements loaded.
Additional wait for the first page to fully load...
HTML of page 1 saved to file '/Users/less4spares/projects/Housing_Price_Analysis_in_Poznan/data/raw/19-12-2024/output_page_1.html'.
Navigating to page: https://gratka.pl/nieruchomosci/mieszkania/poznan/wtorny?page=2&location[map]=1&location[map_bounds]=52.509141,17.0716593:52.2919161,16.7316724&sort=relevance
Page elements loaded.
HTML of page 2 saved to file '/Users/less4spares/projects/Housing_Price_Analysis_in_Poznan/data/raw/19-12-2024/output_page_2.html'.
Navigating to page: https://gratka.pl/nieruchomosci/mieszkania/poznan/wtorny?page=3&location[map]=1&location[map_bounds]=52.509141,17.0716593:52.2919161,16.7316724&sort=relevance
Page elements loa

KeyboardInterrupt: 
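
The run above was interrupted while processing page 3, after pages 1 and 2 had already been saved. One way to avoid downloading them again is to check which files are missing before looping. The sketch below is a minimal illustration, not part of the original cell: `pages_to_fetch` is a hypothetical helper, and it assumes the `output_page_{page}.html` naming used when saving. Because the output folder is named after the current date, this only helps when the run is resumed on the same day.

```python
import os
import tempfile


def pages_to_fetch(output_folder: str, max_page: int) -> list[int]:
    """Return page numbers in 1..max_page that have no non-empty saved HTML file."""
    missing = []
    for page in range(1, max_page + 1):
        path = os.path.join(output_folder, f"output_page_{page}.html")
        # Treat empty files as missing, since an interrupted write may leave one behind
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(page)
    return missing


# Demonstration in a temporary folder that mimics a run stopped after page 2
with tempfile.TemporaryDirectory() as folder:
    for page in (1, 2):
        file_name = os.path.join(folder, f"output_page_{page}.html")
        with open(file_name, "w", encoding="utf-8") as file:
            file.write("<html></html>")
    print(pages_to_fetch(folder, 5))  # prints [3, 4, 5]
```

In the collection cell, the loop could then iterate over `pages_to_fetch(output_folder, max_page)` instead of `range(1, max_page + 1)`. The cookie banner handling would need to run on the first page actually visited and not only when `page == 1`.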