# 1. Definition
## 1.1 Project Overview
From December 2008 to December 2018, mean property prices across Norway increased by 82%. In Oslo, however, growth over the same period was 121%[[1]](https://krogsveen.no/Boligprisstatistikk/(county)/Oslo 'Krogsveen Norge'). Such rapid price growth makes buying property in Norway's largest urban center an inherently risky financial decision.

The purpose of this project is to provide a comprehensive view of the property market in Oslo and Akershus. Such an analysis would help the user make a prudent real estate purchase in these areas. The main goals of the analysis are to:
* Estimate a reasonable price per square meter in various parts of Oslo and Akershus
* Understand the factors that have the highest influence on the price per square meter
* Detect areas with overvalued and undervalued real estate prices
* Find and recommend properties suitable for purchase based on the findings from this analysis

The datasets used in this analysis are:
* Real estate dataset
* Post codes dataset
* Google places dataset
* Dataset from the Norwegian Institute of Public Health (Folkehelseinstituttet, FHI)

## 1.2 Problem Statement
## 1.3 Metrics
# 2. Analysis
## 2.1 Data Exploration

In [14]:
import pandas as pd
import numpy as np
from sklearn import preprocessing as preprocess
from google.cloud import bigquery
import matplotlib.pyplot as plt
import seaborn as sns
import folium
from folium import plugins
import datashader as ds
from datashader import transfer_functions as tf
import utils as u

In [2]:
pd.set_option('display.max_colwidth', 20)
pd.set_option('display.max_columns', 40)
pd.set_option('display.max_rows', 40)
from IPython.display import display, HTML
%matplotlib inline
sns.set(style='dark')
plt.rcParams['figure.figsize'] = [14, 10]

In [3]:
queries = u.get_queries('queries')

In [4]:
bq_client = bigquery.Client()

In [5]:
data = u.get_real_estate_data(queries, bq_client)

In [6]:
data = data.join(u.get_energy_cols(data))
data.drop('energy_character', axis=1, inplace=True)

In [8]:
data['post_code'] = data.address.apply(u.get_postcode_from_address)

expected string or bytes-like object
expected string or bytes-like object
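The two messages above match the text of the `TypeError` that Python's `re` functions raise when given a non-string value, which suggests that two rows have a missing address. The implementation of `u.get_postcode_from_address` is not shown here. A minimal sketch of a tolerant version, assuming the helper is regex-based and that the post code is the last standalone four-digit group in the address (both are assumptions, and the name `extract_postcode` and the sample address are made up):

```python
import re
from typing import Optional

# Assumed pattern: a standalone four-digit Norwegian post code, e.g. "0150".
POSTCODE_RE = re.compile(r"\b(\d{4})\b")


def extract_postcode(address: object) -> Optional[str]:
    """Return the last four-digit group in the address, or None if there is none."""
    # Missing addresses arrive as NaN (a float), so skip anything that is not a string.
    if not isinstance(address, str):
        return None
    matches = POSTCODE_RE.findall(address)
    # Take the last match so a four-digit house number is not mistaken for the post code.
    return matches[-1] if matches else None


examples = ["Eksempelveien 1, 0150 Oslo", float("nan"), "No post code here"]
post_codes = [extract_postcode(a) for a in examples]
```

Returning `None` instead of printing the exception keeps the column aligned with the rows and makes the missing values easy to count afterwards.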


In [9]:
data['price_per_sq_m'] = data.price.div(data.primary_size).round(0)
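Dividing by `primary_size` yields `inf` wherever the size is zero, and such values would later distort any scaling or aggregation. Whether the real dataset contains zero sizes is not known from this notebook. A minimal sketch on made-up rows of how the division could be guarded:

```python
import numpy as np
import pandas as pd

# Made-up listings: one valid size, one zero size, one missing size.
listings = pd.DataFrame({
    "price": [3_000_000.0, 2_500_000.0, 4_000_000.0],
    "primary_size": [60.0, 0.0, np.nan],
})

# Treat non-positive sizes as missing so the division yields NaN rather than inf.
valid_size = listings["primary_size"].where(listings["primary_size"] > 0)
listings["price_per_sq_m"] = listings["price"].div(valid_size).round(0)
```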

In [10]:
data['lat_bin'] = pd.cut(data['lat'], bins=25, precision=5)
data['lng_bin'] = pd.cut(data['lng'], bins=25, precision=5)
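`pd.cut` with an integer `bins` splits the observed range into equal-width intervals, so the two columns together define a 25 x 25 grid over the listings. A small sketch, using made-up coordinates and prices and only 2 bins per axis, of how such a grid could be used to summarise price per cell:

```python
import pandas as pd

# Made-up points forming two clusters, not taken from the dataset.
toy = pd.DataFrame({
    "lat": [59.80, 59.81, 59.95, 59.96],
    "lng": [10.70, 10.71, 10.90, 10.91],
    "price_per_sq_m": [40000.0, 42000.0, 80000.0, 90000.0],
})
toy["lat_bin"] = pd.cut(toy["lat"], bins=2, precision=5)
toy["lng_bin"] = pd.cut(toy["lng"], bins=2, precision=5)

# Median price per grid cell; observed=True drops cells that contain no listings.
cell_median = (
    toy.groupby(["lat_bin", "lng_bin"], observed=True)["price_per_sq_m"]
       .median()
)
```

The median is used here because it is less sensitive to a single extreme listing than the mean.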

In [11]:
scaler = preprocess.MinMaxScaler()
data['price_per_sq_m_scaled'] = scaler.fit_transform(data['price_per_sq_m'].to_frame())
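`MinMaxScaler` with its default settings maps each value to `(x - min) / (max - min)`, so the cheapest listing becomes 0 and the most expensive becomes 1. A short check of that equivalence on made-up prices:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up prices per square meter.
prices = pd.Series([30000.0, 45000.0, 60000.0], name="price_per_sq_m")

# Scale with scikit-learn, then reproduce the result with the formula.
scaled_sklearn = MinMaxScaler().fit_transform(prices.to_frame()).ravel()
scaled_manual = ((prices - prices.min()) / (prices.max() - prices.min())).to_numpy()
```

Because the mapping depends only on the two extremes, a few very expensive listings compress all other values toward 0. The rows previewed by `data.head()` scale to roughly 0.07 to 0.09, which implies a min-max range of about 375,000 per square meter and suggests that this may be happening here.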

In [12]:
data.head()

| | ad_id | apt_id | new_building | num_bedrooms | floor | primary_size | total_size | price | property_type | ownership_type | construction_year | common_expenses | brokerage_expenses | common_wealth | common_debt | time_s | lat | lng | address | energy_letter | energy_color | post_code | price_per_sq_m | lat_bin | lng_bin | price_per_sq_m_scaled |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 69757913 | 34 | True | 2.0 | | 89.0 | 93.0 | 3211622.0 | Leilighet | Eier (Selveier) | 2019.0 | | | | 0.0 | 2771.0 | 59.807806 | 10.882621 | Siggerudbråten, ... | B | oransje | 2080 | 36086.0 | (59.79092, 59.84... | (10.86124, 10.91... | 0.066072 |
| 1 | 69757913 | 41 | True | 2.0 | 2.0 | 89.0 | 93.0 | 3261622.0 | Leilighet | Eier (Selveier) | 2019.0 | | | | 0.0 | 2771.0 | 59.807806 | 10.882621 | Siggerudbråten, ... | B | oransje | 2080 | 36647.0 | (59.79092, 59.84... | (10.86124, 10.91... | 0.067568 |
| 2 | 69757913 | 43 | True | 2.0 | 2.0 | 89.0 | 93.0 | 3308582.0 | Leilighet | Eier (Selveier) | 2019.0 | | | | 0.0 | 2771.0 | 59.807806 | 10.882621 | Siggerudbråten, ... | B | oransje | 2080 | 37175.0 | (59.79092, 59.84... | (10.86124, 10.91... | 0.068976 |
| 3 | 69757913 | 49 | True | 2.0 | 2.0 | 89.0 | 93.0 | 3409162.0 | Leilighet | Eier (Selveier) | 2019.0 | | | | 0.0 | 2771.0 | 59.807806 | 10.882621 | Siggerudbråten, ... | B | oransje | 2080 | 38305.0 | (59.79092, 59.84... | (10.86124, 10.91... | 0.071989 |
| 4 | 69757913 | 50 | True | 2.0 | 2.0 | 67.0 | 71.0 | 2908582.0 | Leilighet | Eier (Selveier) | 2019.0 | | | | 0.0 | 2771.0 | 59.807806 | 10.882621 | Siggerudbråten, ... | B | oransje | 2080 | 43412.0 | (59.79092, 59.84... | (10.86124, 10.91... | 0.085607 |


### Geographic Distribution

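In [None]:
sample_size = 1400
# Keep only rows with coordinates and a price, then sample to keep the map responsive
has_location_and_price = data.lat.notnull() & data.lng.notnull() & data.price_per_sq_m.notnull()
to_plot = data[has_location_and_price].sample(sample_size)

# Base map centered on central Oslo
m = folium.Map([59.9116, 10.7545], zoom_start=11)
# Mark each sampled property as a point
for index, row in to_plot.iterrows():
    folium.CircleMarker([row['lat'], row['lng']],
                        radius=1,
                        popup=f"{row['lat']}, {row['lng']}",  # popup text must be a string
                        fill_color="#3db7e4",
                       ).add_to(m)
# Convert to a list of [lat, lng, weight] rows for the heatmap
properties_array = to_plot[['lat', 'lng', 'price_per_sq_m_scaled']].values.tolist()

# Plot the heatmap, weighted by the scaled price per square meter
m.add_child(plugins.HeatMap(properties_array, radius=15))
m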