# WG-Gesucht Data Analysis
#### Step 3: Price Analysis

##### This notebook explores the WG-Gesucht dataset. It covers the following:
- Look at the average and median rent per suburb in Berlin
- View the extrema in each suburb
- Compare the median rent for shared flats with the general housing market in Berlin

In [69]:
import pandas as pd
from datetime import datetime
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import rc
import src.language as ln
import src.style as st
import matplotlib
import numpy as np

In [70]:
df = pd.read_parquet('Data/apartmentsBerlinDataCleaned.parquet')

# Settings
langauge = 'german'
font = {'fontname': 'Calibri'}
# Use Calibri as the default sans-serif font
matplotlib.rcParams['font.sans-serif'] = "Calibri"
# Always render plot text with the sans-serif family
matplotlib.rcParams['font.family'] = "sans-serif"
# Copy-on-Write: subsets such as df[[...]] behave as independent copies
pd.options.mode.copy_on_write = True


### Step 1: Average and Median Rent per Suburb

In this step, we look at the average and median rent per suburb in Berlin, expressed as price per square metre (m²).
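A minimal sketch of computing both statistics in one `groupby` pass, assuming price per m² is the room's total rent divided by its room size, as in the notebook. The suburbs and rents below are made-up toy values, not WG-Gesucht data:

```python
import pandas as pd

# Hypothetical toy ads (not real WG-Gesucht data)
ads = pd.DataFrame({
    'suburb': ['Mitte', 'Mitte', 'Mitte', 'Wedding', 'Wedding'],
    'total_rent': [600.0, 700.0, 1500.0, 450.0, 500.0],
    'room_size': [15.0, 20.0, 25.0, 15.0, 20.0],
})

# Rent per square metre of the room itself
ads['price_per_m2'] = ads['total_rent'] / ads['room_size']

# Mean and median per suburb in one pass, ranked by the median
summary = (
    ads.groupby('suburb')['price_per_m2']
    .agg(['mean', 'median'])
    .sort_values('median', ascending=False)
)
print(summary)
```

In this toy example the single 1500 € ad pulls the Mitte mean to 45 €/m² while the median stays at 40 €/m², which is why ranking by the median is less sensitive to individual expensive ads.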


In [71]:
df_price_analysis = df[['apartmentID', 'title', 'room_size', 'total_rent', 'suburb', 'apartment_size', 'max_roommate']]

# Strip the unit suffixes and convert the columns to float for calculations.
# Assigning the whole column (instead of .loc[:, col]) replaces it, so its dtype becomes float64.
df_price_analysis['room_size'] = df_price_analysis['room_size'].str.replace('m²', '').astype(float)
df_price_analysis['apartment_size'] = df_price_analysis['apartment_size'].str.replace('m²', '').astype(float)
df_price_analysis['total_rent'] = df_price_analysis['total_rent'].str.replace('€', '').astype(float)
df_price_analysis['max_roommate'] = df_price_analysis['max_roommate'].astype(float)

For the following analysis, you can optionally exclude the top and bottom percentiles of the data to limit the influence of outliers, for example the top and bottom 5% of each trimmed column.
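As a minimal sketch of what the trimming does, here is a single made-up rent column with one very cheap and one very expensive ad. The notebook's `trim_dataset_all_columns` applies the same quantile bounds to each column in `columns_to_trim` and keeps only the rows that fall inside the bounds in every column:

```python
import pandas as pd

# Hypothetical rents: one cheap outlier, 18 typical values (500..670), one expensive outlier
rents = pd.Series([150.0] + [500.0 + 10 * i for i in range(18)] + [3000.0])

# pandas interpolates linearly between neighbouring values by default
lower = rents.quantile(0.05)
upper = rents.quantile(0.95)

# between() is inclusive on both ends, so only values strictly outside the bounds are dropped
trimmed = rents[rents.between(lower, upper)]
print(len(rents), len(trimmed), trimmed.min(), trimmed.max())
```

Both outliers fall outside the interpolated bounds and are removed, while all 18 typical rents are kept.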

In [72]:
# Settings:
exclude_top_bottom_10percent = True  # Set to True to exclude, or False to include all data
columns_to_trim = ['room_size', 'total_rent']
bottom_quantile = 0.05  # drop values below the 5th percentile
top_quantile = 0.95     # drop values above the 95th percentile

In [73]:
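def trim_dataset_all_columns(df, columns, exclude_top_bottom_10percent, bottom_quantile, top_quantile):
    """Keep only rows whose values lie between the given quantiles in every listed column.

    Quantiles are computed per column on the untrimmed data. Rows with NaN in any of
    the columns are dropped too, because Series.between returns False for NaN.
    """
    if not exclude_top_bottom_10percent:
        return df

    # Start by keeping every row, then narrow the mask column by column
    mask = pd.Series(True, index=df.index)
    for col in columns:
        mask = mask & df[col].between(df[col].quantile(bottom_quantile), df[col].quantile(top_quantile))
    return df[mask]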

In [74]:
# Apply trimming across specified columns
df_price_analysis = trim_dataset_all_columns(df_price_analysis, columns_to_trim, exclude_top_bottom_10percent, bottom_quantile, top_quantile)

To make rooms easier to compare, adjust the price per square metre of each room to include the space that all flatmates share.

Assumptions:
- All residents pay the same share of the rent.
- Every resident has a room of the same size, namely the average room size across all ads.
- The shared space is therefore the apartment size minus the number of residents times the average room size, split equally among the residents. The number of residents is taken from `max_roommate`.
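The adjustment can be checked by hand on a single ad. The helper `adjusted_room_size` below is hypothetical (the notebook performs the same calculation on whole columns), and the ad's numbers are invented for illustration:

```python
def adjusted_room_size(room_size, apartment_size, residents, average_room_size):
    """Own room plus an equal share of the common space.

    The common space is the apartment size minus `residents` rooms of average size.
    """
    shared_space = apartment_size - residents * average_room_size
    return room_size + shared_space / residents


# Hypothetical ad: 20 m² room in a 90 m² flat for 3 people, average room size 15 m², rent 600 €
room, flat, people, avg_room, rent = 20.0, 90.0, 3, 15.0, 600.0

adjusted = adjusted_room_size(room, flat, people, avg_room)
print(adjusted)                   # 35.0 (m²)
print(rent / room)                # 30.0 (€/m², room only)
print(round(rent / adjusted, 2))  # 17.14 (€/m², including the shared space)
```

The 45 m² of common space is split three ways, so the room counts as 35 m² and its price per m² drops from 30 € to about 17 €.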

In [75]:
# Calculating mean room size 
average_room_size = df_price_analysis['room_size'].mean()

# Filter only the ads where the apartment size is available
df_price_analysis_adjusted = df_price_analysis[df_price_analysis['apartment_size'].notnull()]

In [76]:
# Own room plus an equal share of the common space (apartment minus n rooms of average size)
shared_per_person = (df_price_analysis_adjusted['apartment_size'] - average_room_size * df_price_analysis_adjusted['max_roommate']) / df_price_analysis_adjusted['max_roommate']
df_price_analysis_adjusted['adjusted_roomsize_in_m2'] = df_price_analysis_adjusted['room_size'] + shared_per_person

# Keep strictly positive sizes only; a size of exactly 0 would divide by zero below
df_price_analysis_adjusted = df_price_analysis_adjusted[df_price_analysis_adjusted['adjusted_roomsize_in_m2'] > 0]

df_price_analysis_adjusted['adjusted_price_per_m2'] = df_price_analysis_adjusted['total_rent'] / df_price_analysis_adjusted['adjusted_roomsize_in_m2']

In [77]:
df_price_analysis.loc[:, 'price_per_m2'] = df_price_analysis['total_rent'] / df_price_analysis['room_size']

In [78]:
df_price_per_suburb = df_price_analysis.groupby('suburb').agg({'price_per_m2': 'mean'}).reset_index()

df_price_per_suburb_adjusted = df_price_analysis_adjusted.groupby('suburb').agg({'adjusted_price_per_m2': 'mean'}).reset_index()

In [79]:
df_price_per_suburb = df_price_per_suburb.sort_values('price_per_m2', ascending=False)

print('The highest price per m2 top 5 suburbs are:')
st.display_as_table(df_price_per_suburb[['suburb', 'price_per_m2']].head()) 

df_price_per_suburb_adjusted = df_price_per_suburb_adjusted.sort_values('adjusted_price_per_m2', ascending=False)

print('The highest adjusted price per m2 top 5 suburbs are:')
st.display_as_table(df_price_per_suburb_adjusted[['suburb', 'adjusted_price_per_m2']].head())

The highest price per m2 top 5 suburbs are:


| index | suburb | price_per_m2 |
|---:|---|---:|
| 2 | Blankenfelde | 44.545455 |
| 53 | Rudow | 39.955065 |
| 8 | Friedrichsfelde | 39.789453 |
| 52 | Rosenthal | 39.617521 |
| 16 | Halensee | 38.071429 |


The highest adjusted price per m2 top 5 suburbs are:


| index | suburb | adjusted_price_per_m2 |
|---:|---|---:|
| 51 | Rosenthal | 404.240696 |
| 70 | Wittenau | 41.830188 |
| 52 | Rudow | 41.444615 |
| 2 | Blankenfelde | 32.511221 |
| 58 | Staaken | 31.878933 |
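The adjusted value for Rosenthal (about 404 €/m²) is roughly ten times that of the next suburb. A plausible explanation, not verified here, is that one or a few ads with a very small `adjusted_roomsize_in_m2` dominate the mean. A sketch of how such ads could be inspected, using made-up values for a hypothetical suburb `A`:

```python
import pandas as pd

# Hypothetical ads in one suburb; the last one has a tiny adjusted room size
ads = pd.DataFrame({
    'suburb': ['A', 'A', 'A'],
    'total_rent': [500.0, 550.0, 480.0],
    'adjusted_roomsize_in_m2': [25.0, 22.0, 0.5],
})
ads['adjusted_price_per_m2'] = ads['total_rent'] / ads['adjusted_roomsize_in_m2']

# The ads with the smallest adjusted sizes produce the largest prices per m²
print(ads.nsmallest(2, 'adjusted_roomsize_in_m2'))

# One extreme ad dominates the mean, while the median barely moves
stats = ads.groupby('suburb')['adjusted_price_per_m2'].agg(['mean', 'median'])
print(stats)
```

If the real data show the same pattern, the median per suburb or a minimum adjusted room size would make the ranking more robust; the threshold for such a filter would be a judgment call.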
