## Introduction
The data comes in two files, split by how often the variables are collected (monthly and yearly).
The dataset is primarily centered on the London housing market. However, it also contains a lot of additional relevant data:

Monthly average house prices, yearly number of houses, yearly number of houses sold, yearly percentage of households that recycle, yearly life satisfaction, yearly median salary of the residents of the area, yearly mean salary of the residents of the area, monthly number of crimes committed, yearly number of jobs, yearly number of people living in the area, and area size in hectares.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()
import warnings
import plotly.express as px 

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
data = pd.read_csv('/kaggle/input/housing-in-london/housing_in_london_monthly_variables.csv')
data.head()

## Data Processing for Housing in London Monthly Variables:

In [None]:
data.info()

In [None]:
data = data.set_index(pd.to_datetime(data['date']))
data.head()

In [None]:
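# Assign the filled columns back; calling fillna(..., inplace=True) on a selected column is unreliable in recent pandas
data['houses_sold'] = data['houses_sold'].fillna(data['houses_sold'].mean())
data['no_of_crimes'] = data['no_of_crimes'].fillna(data['no_of_crimes'].mean())
data['borough_flag'] = data['borough_flag'].fillna(0)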

In [None]:
data['houses_sold']=data['houses_sold'].astype('int64')
data['no_of_crimes']=data['no_of_crimes'].astype('int64')
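
Filling every gap with the global column mean mixes very different areas together. As a minimal sketch of an alternative, the cell below counts the missing values first and then fills each gap with the mean of its own area. The toy frame and its values are made up for illustration, and `houses_sold_filled` is a hypothetical column name; this is not the approach used in the rest of the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the monthly data (values are made up).
toy = pd.DataFrame({
    "area": ["a", "a", "a", "b", "b", "b"],
    "houses_sold": [10.0, np.nan, 14.0, 100.0, 120.0, np.nan],
})

# Count missing values per column before deciding how to fill them.
missing_counts = toy.isna().sum()

# Mean of each row's own area, aligned to the original rows.
area_means = toy.groupby("area")["houses_sold"].transform("mean")

# Fill each gap with its area mean instead of the global mean.
toy["houses_sold_filled"] = toy["houses_sold"].fillna(area_means)

print(missing_counts)
print(toy)
```

Whether a per-area fill is preferable depends on how many values are missing in each area, which `data.isna().sum()` would show for the real data.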

## Removing Unused Columns:

In [None]:
data.drop(['code'], axis=1, inplace=True)

In [None]:
data_borough_flag_1 = data[data['borough_flag']==1]
data_borough_flag_1_mean = data_borough_flag_1.groupby('area').mean(numeric_only=True).reset_index()  # skip the text 'date' column
data_borough_flag_1_mean.head()

In [None]:
data_borough_flag_0 = data[data['borough_flag']==0]
data_borough_flag_0_mean = data_borough_flag_0.groupby('area').mean(numeric_only=True).reset_index()  # skip the text 'date' column
data_borough_flag_0_mean.head()

## Data Exploration

In [None]:
data_borough_flag_1.describe()

In [None]:
data_borough_flag_0.describe()

In [None]:
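# distplot is deprecated in recent seaborn; histplot with a KDE overlay is the replacement
sns.histplot(data_borough_flag_1['average_price'], kde=True);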

In [None]:
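# distplot is deprecated in recent seaborn; histplot with a KDE overlay is the replacement
sns.histplot(data_borough_flag_0['average_price'], kde=True);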

The distribution is bimodal (double-peaked) and right-skewed for both groups of areas (borough flag 1 and borough flag 0).

In [None]:
#skewness and kurtosis for average price to the area of borough flag 1 
print("Skewness for average price to the area of borough flag 1: %f" % data_borough_flag_1['average_price'].skew())
print("Kurtosis for average price to the area of borough flag 1: %f" % data_borough_flag_1['average_price'].kurt())

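For areas with borough flag 1:
The distribution is positively and highly skewed, with high kurtosis (leptokurtic): the tails are longer and fatter and the peak is higher and sharper than in a mesokurtic (normal) distribution, which means the data are heavy-tailed, with a profusion of outliers. Note that pandas' `kurt()` returns excess kurtosis, for which a normal distribution scores 0, so leptokurtic corresponds to a printed value above 0 (kurtosis > 3 on the unadjusted scale).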

In [None]:
#skewness and kurtosis for average price to the area of borough flag 0
print("Skewness for average price to the area of borough flag 0: %f" % data_borough_flag_0['average_price'].skew())
print("Kurtosis for average price to the area of borough flag 0: %f" % data_borough_flag_0['average_price'].kurt())

For areas with borough flag 0:
The distribution is positively and highly skewed, with low kurtosis (platykurtic): the tails are shorter and thinner than those of the normal distribution and the peak is lower and broader than in a mesokurtic distribution, which means the data are light-tailed, with few outliers. Because pandas' `kurt()` returns excess kurtosis, platykurtic corresponds to a printed value below 0 (kurtosis < 3 on the unadjusted scale).
This is because extreme values occur less often than they would under a normal distribution.
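
To make the reading of skewness and kurtosis concrete, here is a minimal sketch on synthetic samples (not the housing data). It relies on the fact that `Series.kurt()` reports excess kurtosis, so a normal sample lands near 0, a heavy-tailed sample well above 0 and a flat sample below 0. The sample sizes, the seed and the variable names are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Three synthetic samples with known shapes.
normal_sample = pd.Series(rng.normal(size=100_000))        # mesokurtic reference
lognormal_sample = pd.Series(rng.lognormal(size=100_000))  # right-skewed, heavy-tailed
uniform_sample = pd.Series(rng.uniform(size=100_000))      # flat, light-tailed

# Series.kurt() returns excess kurtosis (a normal distribution scores about 0).
normal_kurt = normal_sample.kurt()
lognormal_kurt = lognormal_sample.kurt()
uniform_kurt = uniform_sample.kurt()

# Series.skew() is positive for a right-skewed sample.
lognormal_skew = lognormal_sample.skew()

print(f"normal:    kurt={normal_kurt:.3f}")
print(f"lognormal: kurt={lognormal_kurt:.3f}, skew={lognormal_skew:.3f}")
print(f"uniform:   kurt={uniform_kurt:.3f}")
```

The same comparison against 0 applies to the values printed for `average_price` above.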

In [None]:
sns.set(style="darkgrid")
# Recent seaborn versions require x and y to be passed as keyword arguments
g = sns.jointplot(x='average_price', y='houses_sold', data=data_borough_flag_1, kind="reg", truncate=False, color="r", height=7)
plt.ylabel('Number of houses sold to the area of borough flag 1', fontsize=13)
plt.xlabel('Average Price', fontsize=13)

There is a negative correlation between the number of houses sold and the average price for areas with borough flag 1.

In [None]:
sns.set(style="darkgrid")
# Recent seaborn versions require x and y to be passed as keyword arguments
g = sns.jointplot(x='average_price', y='houses_sold', data=data_borough_flag_0, kind='reg', truncate=False, color='b', height=7)
plt.ylabel('Number of houses sold to the area of borough flag 0', fontsize=13)
plt.xlabel('Average Price', fontsize=13)

Likewise, there is a negative correlation between the number of houses sold and the average price for areas with borough flag 0.

In [None]:
sns.set(style="darkgrid")
# Recent seaborn versions require x and y to be passed as keyword arguments
d = sns.jointplot(x='average_price', y='no_of_crimes', data=data_borough_flag_1, kind='reg', truncate=False, color='g', height=7)
plt.ylabel('Number of Crimes', fontsize=13)
plt.xlabel('Average Price', fontsize=13)

There is a moderate positive correlation between the number of crimes and the average price for areas with borough flag 1.

In [None]:
sns.set(style="darkgrid")
# Recent seaborn versions require x and y to be passed as keyword arguments
d = sns.jointplot(x='average_price', y='no_of_crimes', data=data_borough_flag_0, kind='reg', truncate=False, color='gold',
                  height=7)
plt.ylabel('Number of Crimes', fontsize=13)
plt.xlabel('Average Price', fontsize=13)

There is a moderate positive correlation between the number of crimes and the average price for areas with borough flag 0.
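
The direction and strength of these relationships can also be checked numerically rather than read off the regression plots. The following is a minimal sketch using `DataFrame.corr` on a toy frame; the values are made up and only the column names mirror the monthly data.

```python
import pandas as pd

# Toy frame with the same column names as the monthly data (values are made up).
toy = pd.DataFrame({
    "average_price": [100, 200, 300, 400, 500],
    "houses_sold": [50, 45, 30, 20, 10],
    "no_of_crimes": [5, 9, 8, 14, 15],
})

# Pearson correlation matrix between all numeric columns.
corr = toy.corr(method="pearson")

# Pick out the two pairs discussed in the text.
price_vs_sold = corr.loc["average_price", "houses_sold"]
price_vs_crimes = corr.loc["average_price", "no_of_crimes"]

print(corr.round(2))
```

On the real data, the same call on `data_borough_flag_1[['average_price', 'houses_sold', 'no_of_crimes']]` would give coefficients to set beside the plots.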

In [None]:
fig = px.box(data_borough_flag_1, x='area', y='average_price')
fig.update_layout(
    template='gridon',
    title='Average Monthly London House Price to the area of borough flag 1',
    xaxis_title='Area',
    yaxis_title='Average Price (£)',
    xaxis_showgrid=False,
    yaxis_showgrid=False
)
fig.show()

Kensington and Chelsea has the highest prices, while Barking and Dagenham has the lowest.

In [None]:
fig = px.box(data_borough_flag_0, x='area', y='average_price')
fig.update_layout(
    template='gridon',
    title='Average Monthly London House Price to the area of borough flag 0',
    xaxis_title='Area',
    yaxis_title='Average Price (£)',
    xaxis_showgrid=False,
    yaxis_showgrid=False
)
fig.show()

Inner London has the highest prices, while the North East has the lowest.
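
A ranking like this can be confirmed without reading it off the box plot. As a minimal sketch, the cell below sorts areas by their median price; the toy frame, the area labels and the prices are made up for illustration.

```python
import pandas as pd

# Toy frame standing in for one of the borough-flag subsets (values are made up).
toy = pd.DataFrame({
    "area": ["x", "x", "y", "y", "z", "z"],
    "average_price": [300, 500, 100, 120, 200, 260],
})

# Median price per area, sorted from most to least expensive.
median_by_area = (
    toy.groupby("area")["average_price"]
    .median()
    .sort_values(ascending=False)
)

most_expensive = median_by_area.index[0]
least_expensive = median_by_area.index[-1]

print(median_by_area)
```

The median is used here because it matches the centre line that a box plot draws.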

In [None]:
fig = px.line(data_borough_flag_1, x='date', y='average_price', color='area', line_shape='hv')
fig.update_layout(
    template='gridon',
    title='Average Monthly London House Price over Years to the area of borough flag 1',
    xaxis_title='Year',
    yaxis_title='Average Price (£)',
    xaxis_showgrid=False,
    yaxis_showgrid=False
)
# Show plot 
fig.show()

Kensington and Chelsea has the highest prices, while Barking and Dagenham has the lowest, over the years.

In [None]:
fig = px.line(data_borough_flag_0, x='date', y='average_price', color='area', line_shape='hv')
fig.update_layout(
    template='gridon',
    title='Average Monthly London House Price over Years to the area of borough flag 0',
    xaxis_title='Year',
    yaxis_title='Average Price (£)',
    xaxis_showgrid=False,
    yaxis_showgrid=False
)
# Show plot 
fig.show()

Inner London has the highest prices, while the North East has the lowest, over the years.
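
How fast prices change over the years can be put into numbers by averaging the monthly values per year and taking the year-on-year percentage change. This is a minimal sketch on a made-up monthly series; the dates and prices are assumptions for illustration only.

```python
import pandas as pd

# Made-up monthly price series over three years.
dates = pd.date_range("2000-01-01", periods=36, freq="MS")
prices = pd.Series(range(100, 136), index=dates, dtype="float64")

# Average the monthly prices within each calendar year.
yearly_mean = prices.groupby(prices.index.year).mean()

# Year-on-year change in percent (the first year has no predecessor, so it is NaN).
yearly_growth = yearly_mean.pct_change() * 100

print(yearly_mean)
print(yearly_growth.round(2))
```

Applied per area, for example after a `groupby('area')`, this would show which areas rose fastest rather than only which are most expensive.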

In [None]:
fig = px.scatter(data_borough_flag_1, x="date", y="average_price",size="no_of_crimes", color="area")
fig.update_layout(
    template='plotly_dark',
    title='Number of Crimes & Average price over Years to the area of borough flag 1',
    xaxis_title='Year',
    yaxis_title='Average Price (£)',
    xaxis_showgrid=False,
    yaxis_showgrid=False
)
fig.show()

Westminster has the highest number of crimes, while the City of London has the lowest, over the years.

In [None]:
fig = px.scatter(data_borough_flag_0, x="date", y="average_price",size="no_of_crimes", color="area")
fig.update_layout(
    template='plotly_dark',
    title='Number of Crimes & Average price over Years to the area of borough flag 0',
    xaxis_title='Year',
    yaxis_title='Average Price (£)',
    xaxis_showgrid=False,
    yaxis_showgrid=False
)
fig.show()

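East Midlands shows both the highest and the lowest number of crimes over the years. Because missing crime counts were filled with the column mean earlier, this may reflect imputed values rather than recorded counts.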

In [None]:
fig = px.line(data_borough_flag_1, x="date", y="houses_sold", color="area")
fig.update_layout(
    template='plotly_dark',
    title='Houses sold over Years to the area of borough flag 1',
    xaxis_title='Year',
    yaxis_title='houses sold',
    xaxis_showgrid=False,
    yaxis_showgrid=False
)
fig.show()

Many areas share the highest numbers of houses sold, while the City of London has the lowest, over the years.

In [None]:
fig = px.line(data_borough_flag_0, x="date", y="houses_sold", color="area")
fig.update_layout(
    template='plotly_dark',
    title='Houses sold over Years to the area of borough flag 0',
    xaxis_title='Year',
    yaxis_title='houses sold',
    xaxis_showgrid=False,
    yaxis_showgrid=False
)
fig.show()

England has the highest number of houses sold, while the North East has the lowest, over the years.
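
Monthly sales figures fluctuate a lot, which can hide the underlying level. One common way to see past that is a rolling mean. The sketch below applies a 3-month window to a made-up series; the window length and the values are assumptions chosen for illustration.

```python
import pandas as pd

# Made-up monthly sales counts.
dates = pd.date_range("2010-01-01", periods=6, freq="MS")
sold = pd.Series([10, 30, 20, 40, 30, 50], index=dates, dtype="float64")

# 3-month rolling mean; the first two months lack a full window and stay NaN.
smoothed = sold.rolling(window=3).mean()

print(pd.DataFrame({"sold": sold, "smoothed": smoothed}))
```

A longer window, such as 12 months, would also average out seasonal effects, at the cost of reacting more slowly to real changes.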

In [None]:
data2=pd.read_csv('/kaggle/input/housing-in-london/housing_in_london_yearly_variables.csv')
data2.head()

## Data Processing for Housing in London Yearly Variables:

In [None]:
data2.info()

In [None]:
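# Assign the filled columns back; calling fillna(..., inplace=True) on a selected column is unreliable in recent pandas
data2['median_salary'] = data2['median_salary'].fillna(data2['median_salary'].mean())
data2['population_size'] = data2['population_size'].fillna(data2['population_size'].mean())
data2['number_of_jobs'] = data2['number_of_jobs'].fillna(data2['number_of_jobs'].mean())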

In [None]:
data2['median_salary']=data2['median_salary'].astype('int64')
data2['population_size']=data2['population_size'].astype('int64')
data2['number_of_jobs']=data2['number_of_jobs'].astype('int64')

In [None]:
data2 = data2.set_index(pd.to_datetime(data2['date']))
data2.info()

## Removing Unused Columns:

In [None]:
data2.drop(['code','mean_salary','life_satisfaction', 'recycling_pct', 'area_size', 'no_of_houses'], axis=1,inplace=True)

In [None]:
data2.info()

In [None]:
data2_borough_flag_1 = data2[data2['borough_flag']==1]
data2_borough_flag_1_mean = data2_borough_flag_1.groupby('area').mean(numeric_only=True).reset_index()  # skip the text 'date' column
data2_borough_flag_1_mean.head()

In [None]:
data2_borough_flag_0 = data2[data2['borough_flag']==0]
data2_borough_flag_0_mean = data2_borough_flag_0.groupby('area').mean(numeric_only=True).reset_index()  # skip the text 'date' column
data2_borough_flag_0_mean.head()

In [None]:
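# distplot is deprecated in recent seaborn; histplot with a KDE overlay is the replacement
sns.histplot(data2_borough_flag_1['median_salary'], kde=True)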

The distribution of median salaries is symmetric, with some outliers, for areas with borough flag 1.

In [None]:
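# distplot is deprecated in recent seaborn; histplot with a KDE overlay is the replacement
sns.histplot(data2_borough_flag_0['median_salary'], kde=True)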

The distribution of median salaries is close to symmetric, with some outliers, for areas with borough flag 0.
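
"Some outliers" can be made precise with a simple rule. As a minimal sketch, the cell below flags values lying more than 1.5 interquartile ranges outside the quartiles, which is the convention box plots commonly use. The salary values are made up, and the 1.5 multiplier is an assumption rather than something the notebook specifies.

```python
import pandas as pd

# Made-up salary values with one obvious extreme.
salaries = pd.Series([20, 22, 23, 24, 25, 26, 27, 28, 60], dtype="float64")

# First and third quartiles and the interquartile range.
q1 = salaries.quantile(0.25)
q3 = salaries.quantile(0.75)
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside the fences count as outliers under this rule.
outliers = salaries[(salaries < lower) | (salaries > upper)]

print(f"fences: [{lower}, {upper}]")
print(outliers)
```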

In [None]:
fig = px.scatter(data2_borough_flag_1, x='date', y='median_salary', color='area', size='median_salary')
fig.update_layout(
    template='plotly_dark',
    title='Median salary over Years to the area of borough flag 1',
    xaxis_title='Year',
    yaxis_title='Median Salary (£)',
    xaxis_showgrid=False,
    yaxis_showgrid=False
)
# Show plot 
fig.show()

The City of London has the highest median salary and Bromley the lowest, over the years, among areas with borough flag 1.

In [None]:
fig = px.scatter(data2_borough_flag_0, x='date', y='median_salary', color='area', size='median_salary')
fig.update_layout(
    template='plotly_dark',
    title='Median salary over Years to the area of borough flag 0',
    xaxis_title='Year',
    yaxis_title='Median Salary (£)',
    xaxis_showgrid=False,
    yaxis_showgrid=False
)
# Show plot 
fig.show()

Inner London has the highest median salary and Northern Ireland the lowest, over the years, among areas with borough flag 0.

In [None]:
fig = px.pie(data2_borough_flag_1, names='area', values='population_size', color='area')
fig.update_layout(
    template='plotly_white',
    title='Population distribution to area of borough flag 1'
)
fig.show()

The population shares are close for many areas, and the City of London has the smallest share among areas with borough flag 1.

In [None]:
fig = px.pie(data2_borough_flag_0, names='area', values='population_size', color='area')
fig.update_layout(
    template='plotly_white',
    title='Population distribution to area of borough flag 0'
)
fig.show()

The United Kingdom has the largest share of the population and Northern Ireland the smallest among areas with borough flag 0.
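
Because the yearly data holds one row per area and year, a pie chart built on the full frame adds up each area's population across all years, assuming the chart sums values for repeated names. A share based on a single year is easier to interpret. The sketch below computes it for the latest year in a made-up frame; the `year` column and all values are assumptions for illustration.

```python
import pandas as pd

# Made-up yearly population figures for two areas.
toy = pd.DataFrame({
    "area": ["p", "q", "p", "q"],
    "year": [2018, 2018, 2019, 2019],
    "population_size": [100, 300, 120, 280],
})

# Keep only the rows of the most recent year.
latest = toy[toy["year"] == toy["year"].max()]

# Each area's share of that year's total, in percent.
share = (
    latest.set_index("area")["population_size"]
    / latest["population_size"].sum()
    * 100
)

print(share)
```

In the real frame the year would come from the datetime index, for example via `data2.index.year`.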

In [None]:
fig = px.pie(data2_borough_flag_1, names='area', values='number_of_jobs', color='area')
fig.update_layout(
    template='plotly_white',
    title='Number of Jobs distribution to area of borough flag 1'
)
fig.show()

Westminster ranks first in number of jobs and Barking and Dagenham ranks last among areas with borough flag 1.

In [None]:
fig = px.pie(data2_borough_flag_0, names='area', values='number_of_jobs', color='area')
fig.update_layout(
    template='plotly_white',
    title='Number of Jobs distribution to area of borough flag 0'
)
fig.show()

The United Kingdom ranks first in number of jobs and Northern Ireland ranks last among areas with borough flag 0.

## Conclusion
House prices, the number of crimes, salaries and population size in London are high and increasing, while the number of houses sold fluctuates constantly.