# Analysis of Family Houses and Apartments in Spain (2019)

## Table of Contents

- [Introduction](#introduction)
- [Objectives](#objectives)
- [Data Source and Structure](#data-source-and-structure)
  - [Dataset Variables](#dataset-variables)

## Introduction

The family housing market in Spain reached approximately €165.79 billion in 2023 and is expected to grow at a compound annual rate of 5.8%, reaching around €276.69 billion by 2032. These figures highlight the importance and expansion of this sector in the Spanish economy.

Understanding regional particularities is crucial for buyers, sellers, and investors because it reveals the factors that drive prices and how those prices vary between regions. This work analyzes apartments and family houses to help these stakeholders make informed decisions in an evolving market.

## Objectives

The primary goal of this project is to perform a statistical analysis of apartments and family houses listed for sale in various Spanish provinces during 2019. The analysis uses geolocation techniques and data visualization tools such as Tableau to describe the characteristics and spatial distribution of the Spanish housing market in that period. Because the target audience is U.S. investors and private buyers, terminology is adapted to their context. The study has the following objectives:

- Clean and explore the data to ensure accuracy and reliability.
- Use geolocation techniques to convert property addresses into geographic coordinates.
- Calculate and summarize key statistical measures for property prices and characteristics.
- Visualize the geographic distribution and identify regional price differences.
- Analyze correlations between property features and their prices.
- Apply clustering techniques to identify patterns and segment the real estate market.
- Perform hypothesis testing to compare prices across different regions and assess the impact of specific features.
- Develop and validate linear regression models to predict property prices.
- Design and build an interactive Tableau dashboard for dynamic data exploration.
- Present key findings from the data analysis and provide practical recommendations based on the insights gained.
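
The summary-statistics, correlation, and linear-regression objectives above can be illustrated with a minimal sketch. It uses only the Python standard library and made-up sample values (not taken from the dataset); the helper names `summarize`, `pearson`, and `fit_line` are hypothetical and are not part of the project's code.

```python
from statistics import mean, median, stdev


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Illustrative values only: a perfectly linear toy relationship
# between surface area (m2_real) and price, chosen for readability.
m2_real = [50, 70, 90, 110, 130]
price = [100_000, 140_000, 180_000, 220_000, 260_000]

stats = summarize(price)
r = pearson(m2_real, price)
slope, intercept = fit_line(m2_real, price)
```

On real listings the relationship will be noisier, so the correlation and fitted coefficients should be read alongside the hypothesis tests and model validation listed in the objectives.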


## Data Source and Structure

The dataset used in this project was sourced from Kaggle: [Spanish Housing Dataset](https://www.kaggle.com/datasets/thedevastator/spanish-housing-dataset-location-size-price-and/data).

It was originally collected by GitHub user [trueuoc](https://github.com/trueuoc) through web scraping of [Idealista S.A.U.](https://www.idealista.com/) and published under the [CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication](https://creativecommons.org/publicdomain/zero/1.0/legalcode.en) license.

The dataset contains key variables that describe real estate listings, including property details (e.g., price, size, condition), location (e.g., city, district), and amenities (e.g., air conditioning, garden, pool). Each listing has a unique identifier (`house_id`), and the variables are a mix of categorical and numerical types.

### Dataset Variables

- **ad_description**: Property listing description.
- **ad_last_update**: Date of the last listing update.
- **air_conditioner**: Whether the property has air conditioning (0: No, 1: Yes).
- **balcony**: Whether the property has a balcony (0: No, 1: Yes).
- **bath_num**: Number of bathrooms in the property.
- **built_in_wardrobe**: Whether the property has built-in wardrobes (0: No, 1: Yes).
- **chimney**: Whether the property has a fireplace (0: No, 1: Yes).
- **condition**: Condition of the property (e.g., second-hand/good condition).
- **construct_date**: Year of construction of the property.
- **energetic_certif**: Property’s energy certification.
- **floor**: Floor on which the property is located.
- **garage**: Whether the property has a garage space.
- **garden**: Whether the property has a garden (0: No, 1: Yes).
- **ground_size**: Size of the property's land.
- **heating**: Whether the property has heating (0: No, 1: Yes).
- **house_id**: A unique property ID number.
- **house_type**: Type of housing.
- **kitchen**: Whether the property has a kitchen.
- **lift**: Whether the property has an elevator (0: No, 1: Yes).
- **loc_city**: City where the property is located.
- **loc_district**: District.
- **loc_full**: Full address of the property.
- **loc_neigh**: Neighborhood.
- **loc_street**: Street.
- **loc_zone**: Zone.
- **m2_real**: Actual square meters of the property.
- **m2_useful**: Usable square meters.
- **obtention_date**: Date when the data was collected.
- **orientation**: Property's orientation.
- **price**: Property price.
- **reduce_mobility**: Whether the house is adapted for people with reduced mobility.
- **room_num**: Number of rooms in the property.
- **storage_room**: Whether the property has a storage room (0: No, 1: Yes).
- **swimming_pool**: Whether the property has a swimming pool (0: No, 1: Yes).
- **terrace**: Whether the property has a terrace (0: No, 1: Yes).
- **unfurnished**: Whether the property is unfurnished.
- **number_of_companies_prov**: Number of companies in the province.
- **population_prov**: Population of the province.
- **companies_prov_vs_national_%**: Percentage of companies in the province compared to the national total.
- **population_prov_vs_national_%**: Percentage of the population in the province compared to the national total.
- **renta_media_prov**: Average income in the province.
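
As a first cleaning step, the variables listed above can be converted from raw CSV strings into typed values. The following is a minimal sketch using only the Python standard library. The sample rows are made up, the comma delimiter is an assumption about the real file, and `parse_listing` and `FLAG_COLUMNS` are hypothetical names introduced for illustration.

```python
import csv
import io

# Subset of the amenity columns documented above as 0: No, 1: Yes
FLAG_COLUMNS = ["air_conditioner", "balcony", "swimming_pool", "terrace"]


def parse_listing(row):
    """Convert one raw CSV row (all strings) into typed values.

    Empty price or m2_real fields become None instead of raising.
    """
    parsed = dict(row)
    for col in ("price", "m2_real"):
        raw = (row.get(col) or "").strip()
        parsed[col] = float(raw) if raw else None
    for col in FLAG_COLUMNS:
        parsed[col] = (row.get(col) or "").strip() == "1"
    return parsed


# Made-up sample rows; the real file's delimiter and encoding are assumptions
sample = io.StringIO(
    "house_id,loc_city,price,m2_real,air_conditioner,balcony,swimming_pool,terrace\n"
    "1,Madrid,250000,80,1,0,0,1\n"
    "2,Valencia,,65,0,1,0,0\n"
)
listings = [parse_listing(row) for row in csv.DictReader(sample)]
```

Keeping missing prices as `None` rather than dropping the rows at parse time leaves the decision about how to handle them to the later cleaning and exploration stage.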