# A Look into Real Estate Trends in Ireland

## Goal of the Analysis and Business questions

The analysis uses the open data set published on the Irish Residential Property Price Register website (https://www.propertypriceregister.ie/). We take the perspective of an investor who wants to invest in the Irish real estate market. We want to provide insights that direct investment to where it makes the most sense, i.e. where the return on investment (ROI) is highest.

These are the questions we want to answer: 

1. Where should we invest first? In practice, a stack rank of the regions with the highest potential ROI would be useful.
2. Is this the right moment to invest? In practice, we use the historical data to build a simple forecast for the next 6-12 months for the region we consider worth investing in.
3. Which type of real estate should we focus on? In practice, we want to evaluate whether it makes sense to invest in condos, single house units, etc.
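
As a rough illustration of what a stack rank for question 1 could look like, the sketch below ranks counties by the relative change in median price between two years. Using price growth as a stand-in for ROI is an assumption made here for illustration only, the numbers are made up, and `rank_counties_by_growth` is a hypothetical helper, not something defined later in the notebook.

```python
import pandas as pd

def rank_counties_by_growth(frame, start_year, end_year):
    """Rank counties by relative change in median price between two years."""
    medians = (
        frame.groupby(['county', 'year'])['price']
        .median()
        .unstack('year')  # one row per county, one column per year
    )
    growth = medians[end_year] / medians[start_year] - 1
    return growth.sort_values(ascending=False)

# Made-up rows, for illustration only (not taken from the registry)
toy = pd.DataFrame({
    'county': ['Dublin', 'Dublin', 'Cork', 'Cork', 'Cavan', 'Cavan'],
    'year': [2010, 2015, 2010, 2015, 2010, 2015],
    'price': [300000.0, 360000.0, 200000.0, 210000.0, 100000.0, 150000.0],
})

ranking = rank_counties_by_growth(toy, 2010, 2015)
print(ranking)
```

A real ROI estimate would also need costs and rental income, which this data set does not contain, so a ranking like this is at best a first filter.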

## Structure of the Notebook
1. Import libraries and load the data
2. Clean up and manipulate the data
3. Exploration and visualization. First level of insights
4. Model training and testing. Second level of insights
5. Wrap-up



## 1. Import libraries and load the data

In [1]:
# data manipulation and cleaning
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd
import datetime
from functools import reduce



# visualisation
import matplotlib.pyplot as plt
import matplotlib.ticker as tick
import seaborn as sns

plt.style.use('ggplot')

In [2]:
# import csv
raw = pd.read_csv('historical_data_registry.csv')

# create a copy for manipulation
df = raw.copy()

In [3]:
# basic dataframe evaluation
df.info()  # prints its summary directly and returns None, so no print() is needed
print(df.shape)
print(df.describe())
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 410523 entries, 0 to 410522
Data columns (total 10 columns):
id                           410523 non-null int64
Date of Sale (dd/mm/yyyy)    410523 non-null object
Address                      410523 non-null object
Postal Code                  76876 non-null object
County                       410523 non-null object
Price                        410523 non-null object
Not Full Market Price        410523 non-null object
VAT Exclusive                410523 non-null object
Description of Property      410523 non-null object
Property Size Description    52567 non-null object
dtypes: int64(1), object(9)
memory usage: 31.3+ MB
(410523, 10)
                  id
count  410523.000000
mean   205329.797042
std    118744.625993
min         1.000000
25%    102631.500000
50%    205262.000000
75%    307892.500000
max    900559.000000


| | id | Date of Sale (dd/mm/yyyy) | Address | Postal Code | County | Price | Not Full Market Price | VAT Exclusive | Description of Property | Property Size Description |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 01/01/2010 | 5 Braemor Drive, Churchtown, Co.Dublin | NaN | Dublin | 343000.0 | No | No | Second-Hand Dwelling house /Apartment | NaN |
| 1 | 2 | 03/01/2010 | 134 Ashewood Walk, Summerhill Lane, Portlaoise | NaN | Laois | 185000.0 | No | Yes | New Dwelling house /Apartment | greater than or equal to 38 sq metres and less... |
| 2 | 11 | 04/01/2010 | 16 Aisling Geal, Fr. Russell Road | NaN | Limerick | 110000.0 | No | No | New Dwelling house /Apartment | greater than or equal to 38 sq metres and less... |
| 3 | 21 | 04/01/2010 | 48 KILLIANS COURT, MULLAGH | NaN | Cavan | 122000.0 | No | Yes | New Dwelling house /Apartment | greater than 125 sq metres |
| 4 | 35 | 04/01/2010 | Knock, Lanesboro | NaN | Longford | 125000.0 | No | No | Second-Hand Dwelling house /Apartment | NaN |
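
The `info()` output shows that `Postal Code` and `Property Size Description` are populated for only about 19% and 13% of the rows (76,876 and 52,567 of 410,523). A quick way to see this per column is the share of missing values. The sketch below uses a small made-up frame rather than the real file, and `missing_share` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def missing_share(frame):
    """Return the fraction of missing values per column, largest first."""
    return frame.isna().mean().sort_values(ascending=False)

# Toy data standing in for the registry file (values are illustrative only)
toy = pd.DataFrame({
    'County': ['Dublin', 'Laois', 'Limerick', 'Cavan'],
    'Postal Code': [np.nan, np.nan, np.nan, 'Dublin 14'],
    'Property Size Description': [np.nan, 'greater than 125 sq metres', np.nan, np.nan],
})

shares = missing_share(toy)
print(shares)
```

On the real data, the same call would be `missing_share(df)`.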


The first thing I want to do is tidy up the column names and the index:

- The date column is stored as text (`object`) in dd/mm/yyyy format, which can cause problems during analysis, so we convert it to a proper datetime type.
- We set the date column as the index to make it easier to create charts and analyze the data later.
- We rename the columns to snake_case, in line with PEP 8 naming conventions.
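
To make the first point concrete, here is a minimal sketch of how a dd/mm/yyyy string can be misread when no format is given. The sample strings are the sale dates from the first rows shown above; `parse_sale_dates` is a hypothetical helper name used only for this illustration.

```python
import pandas as pd

def parse_sale_dates(values):
    """Parse dd/mm/yyyy strings with an explicit format so day and month are never swapped."""
    return pd.to_datetime(pd.Series(values), format='%d/%m/%Y')

sample = ['01/01/2010', '03/01/2010', '04/01/2010']
parsed = parse_sale_dates(sample)

# Without a format, pandas reads '03/01/2010' month-first (1 March 2010)
ambiguous = pd.to_datetime('03/01/2010')
# With the explicit format, the same string is 3 January 2010
explicit = parsed.iloc[1]

print(ambiguous.date(), explicit.date())
```

Passing `dayfirst=True` instead of `format` would also work for this data; the explicit format has the advantage of raising an error on values that do not match.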



In [4]:
# map the original headers to snake_case names and rename the columns
columns = {
    'id': 'id',
    'Date of Sale (dd/mm/yyyy)': 'date_of_sale',
    'Address': 'address',
    'Postal Code': 'postal_code',
    'County': 'county',
    'Price': 'price',
    'Not Full Market Price': 'not_full_market_price',
    'VAT Exclusive': 'vat_exclusive',
    'Description of Property': 'property_description',
    'Property Size Description': 'property_size'
}

df = df.rename(columns=columns)

# change to datetime format and set date as index
# the source dates are dd/mm/yyyy, so the format is given explicitly;
# otherwise pandas reads 03/01/2010 month-first, as 1 March 2010
df['date_of_sale'] = pd.to_datetime(df['date_of_sale'], format='%d/%m/%Y')

df.set_index('date_of_sale', inplace=True)

In [6]:
df.head()

| date_of_sale | id | address | postal_code | county | price | not_full_market_price | vat_exclusive | property_description | property_size |
|---|---|---|---|---|---|---|---|---|---|
| 2010-01-01 | 1 | 5 Braemor Drive, Churchtown, Co.Dublin | NaN | Dublin | 343000.0 | No | No | Second-Hand Dwelling house /Apartment | NaN |
| 2010-01-03 | 2 | 134 Ashewood Walk, Summerhill Lane, Portlaoise | NaN | Laois | 185000.0 | No | Yes | New Dwelling house /Apartment | greater than or equal to 38 sq metres and less... |
| 2010-01-04 | 11 | 16 Aisling Geal, Fr. Russell Road | NaN | Limerick | 110000.0 | No | No | New Dwelling house /Apartment | greater than or equal to 38 sq metres and less... |
| 2010-01-04 | 21 | 48 KILLIANS COURT, MULLAGH | NaN | Cavan | 122000.0 | No | Yes | New Dwelling house /Apartment | greater than 125 sq metres |
| 2010-01-04 | 35 | Knock, Lanesboro | NaN | Longford | 125000.0 | No | No | Second-Hand Dwelling house /Apartment | NaN |
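
With the sale date as the index, time-based selection and aggregation become one-liners. The sketch below assumes `price` has already been converted to a numeric type (in the raw file it is loaded as `object`) and uses a few made-up rows; `monthly_median_price` is a hypothetical helper name.

```python
import pandas as pd

def monthly_median_price(frame):
    """Median sale price per calendar month, using the DatetimeIndex."""
    return frame['price'].resample('MS').median()

# Made-up rows with a DatetimeIndex, mirroring the layout of df
toy = pd.DataFrame(
    {
        'county': ['Dublin', 'Laois', 'Cavan', 'Dublin'],
        'price': [343000.0, 185000.0, 122000.0, 250000.0],
    },
    index=pd.to_datetime(['2010-01-01', '2010-01-03', '2010-02-04', '2010-02-10']),
)
toy.index.name = 'date_of_sale'

monthly = monthly_median_price(toy)
january = toy.loc['2010-01']  # partial-string indexing selects a whole month

print(monthly)
print(len(january))
```

The same pattern should carry over to the charts and forecasts in the later sections, once the price column is numeric.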
