
This project analyzes house prices in Madrid, Spain using Python and several machine learning libraries. The project assumes a basic understanding of data analysis and machine learning concepts.


ericmg97/madrid_houses_analysis


House Price Analysis in Madrid


This project analyzes house prices in Madrid, Spain, using Python and several machine learning libraries. It assumes a basic understanding of data analysis and machine learning concepts. The sections below explain how to install and use it.

Installation

  1. Create a Python environment using your preferred method (e.g. conda, virtualenv, etc.).
  2. Activate the environment and navigate to the project directory.
  3. Install the required packages using pip and the requirements.txt file:
     pip install -r requirements.txt
  4. Install the utils module by running the following command from the project directory:
     pip install -e src/
  5. Start a JupyterLab server by running the following command:
     jupyter lab

Alternatively, with the ipykernel package installed in the environment, you can select that environment as the notebook kernel directly inside VS Code instead of starting JupyterLab.
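After the installation steps, a quick check can confirm that the environment sees the expected packages. The snippet below is a minimal sketch, not part of the project: the helper name is_installed is hypothetical, and the import name utils is an assumption based on the src/utils/ folder.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in the active environment."""
    return importlib.util.find_spec(module_name) is not None


# "utils" is assumed to be the import name of the package installed from src/.
for name in ("utils", "ipykernel", "jupyterlab"):
    print(name, "ok" if is_installed(name) else "missing")
```

If utils is reported as missing, rerunning pip install -e src/ from the project directory with the environment activated is the first thing to try.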

Usage

  1. Navigate to the notebooks directory and open the desired notebook.
  2. Execute the cells in the notebook to preprocess the data, perform exploratory data analysis, and build and evaluate machine learning models.
  3. The data is stored in the data directory, which contains five subfolders (see Directory Structure):
    • raw: contains the raw training and prediction data in CSV format.
    • processed: contains the processed data in CSV format.
    • models: contains the trained machine learning models as pickle files.
    • metrics: contains the performance metrics of each model as JSON files.
    • submission: contains the submission files in CSV format.
  4. The src directory contains a Python module (utils) with the sklearn transformers used for ETL, plus utility functions.
  5. The notebooks directory contains the notebooks that reproduce each step of the analysis: 01_EDA.ipynb and 02_Modeling.ipynb.
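To illustrate how a trained model and its metrics could end up as a pickle file and a JSON file, here is a minimal sketch using only the standard library. It follows the layout in the Directory Structure section (data/models and data/metrics). The function name save_model_and_metrics is hypothetical, and the notebooks may persist their results differently.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_model_and_metrics(model, metrics, data_dir, name):
    """Pickle `model` and write `metrics` as JSON under `data_dir`.

    Hypothetical helper: writes <data_dir>/models/<name>.pkl and
    <data_dir>/metrics/<name>.json, creating the folders if needed.
    """
    data_dir = Path(data_dir)
    model_path = data_dir / "models" / f"{name}.pkl"
    metrics_path = data_dir / "metrics" / f"{name}.json"
    for path in (model_path, metrics_path):
        path.parent.mkdir(parents=True, exist_ok=True)
    with model_path.open("wb") as fh:
        pickle.dump(model, fh)
    with metrics_path.open("w", encoding="utf-8") as fh:
        json.dump(metrics, fh, indent=2)
    return model_path, metrics_path


# Usage with placeholder objects (not real results) in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    placeholder_model = {"kind": "placeholder"}  # stand-in for a fitted estimator
    placeholder_metrics = {"rmse": 0.0}          # made-up value, illustration only
    model_path, metrics_path = save_model_and_metrics(
        placeholder_model, placeholder_metrics, tmp, "model_1"
    )
    print(model_path.name, metrics_path.name)
```

Keeping the model and its metrics under the same base name makes it easy to match a pickle file to the scores it produced.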

Directory Structure

house_price_analysis/
├── data/
│   ├── raw/
│   │   ├── train.csv
│   │   └── predict.csv
│   ├── processed/
│   │   ├── train.csv
│   │   └── test.csv
│   ├── models/
│   │   └── model_1.pkl
│   ├── metrics/
│   │   └── model_1.json
│   └── submission/
│       ├── submission_1.csv
│       └── submission_2.csv
├── src/
│   ├── utils/
│   │   ├── transformers.py
│   │   ├── paths.py
│   │   ├── functions.py
│   │   └── __init__.py
│   ├── pyproject.toml
│   ├── setup.cfg
│   └── setup.py
└── notebooks/
    ├── 01_EDA.ipynb
    └── 02_Modeling.ipynb

The tree above shows how the project is organized. The data directory holds the raw and processed data, the trained models and their metrics, and the submission files. The src directory holds the utils package with the sklearn transformers and utility functions. The notebooks directory holds the EDA and modeling notebooks, which walk through every step of the analysis.
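The layout can be captured in a small path helper so that notebooks do not hard-code file locations. The sketch below is an assumption about what such a helper might look like. The function data_paths and its keys are hypothetical, and the actual src/utils/paths.py may be organized differently.

```python
from pathlib import Path


def data_paths(project_root):
    """Return the data locations from the directory tree, keyed by role.

    Hypothetical helper; the keys are illustrative names, not project API.
    """
    data = Path(project_root) / "data"
    return {
        "raw_train": data / "raw" / "train.csv",
        "raw_predict": data / "raw" / "predict.csv",
        "processed_train": data / "processed" / "train.csv",
        "processed_test": data / "processed" / "test.csv",
        "models": data / "models",
        "metrics": data / "metrics",
        "submission": data / "submission",
    }


paths = data_paths("house_price_analysis")
print(paths["raw_train"].as_posix())  # house_price_analysis/data/raw/train.csv
```

Building paths with pathlib keeps them portable across operating systems.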

Data

The data comes from the Kaggle competition "Machine Learning Avanzado I - Hands-on". It is split into two files: train.csv and predict.csv. The train.csv file contains the training data, including the target variable buy_price_by_area. The predict.csv file contains the houses to score for submission and does not include the target variable. The goal of the project is to predict buy_price_by_area for the houses in predict.csv.
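The split between features and target described above can be sketched as follows. This is a dependency-free illustration using the standard-library csv module; the notebooks may well use pandas instead. Only buy_price_by_area comes from the project. The function name, the other column names, and the sample values are made up for the example.

```python
import csv
import io

TARGET = "buy_price_by_area"  # target column named in this README


def split_features_target(rows, target=TARGET):
    """Split dict rows into feature dicts and a list of float targets."""
    features, targets = [], []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not modified
        targets.append(float(row.pop(target)))
        features.append(row)
    return features, targets


# Made-up sample standing in for data/raw/train.csv; "id" and "n_rooms"
# are hypothetical columns, not taken from the real dataset.
sample_train = io.StringIO(
    "id,n_rooms,buy_price_by_area\n"
    "1,2,3000\n"
    "2,3,4500\n"
)
rows = list(csv.DictReader(sample_train))
X, y = split_features_target(rows)
print(len(X), y)
```

Rows read from predict.csv would skip this step, since that file has no target column to remove.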
