### Preparation
First, install the Python libraries listed in <b>requirements.txt</b> and place the emergency calls data file at <b>data/112_calls_data.csv</b>, relative to the current folder. The file is too large to add to git, so it has to be stored locally.

In [1]:
# Print the required packages; the context manager closes the file afterwards
with open('requirements.txt', 'r') as f:
    print(f.read())

geopandas, dtale, scikit-learn, pandas-profiling
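
Because the calls file is not in the repository, it can help to confirm it is in place before running the slow load step. The sketch below is a minimal check, assuming the path named above; the helper `data_file_status` is a hypothetical name introduced here for illustration and is not part of the notebook.

```python
from pathlib import Path

# Path taken from the text above, relative to the current working directory.
DATA_FILE = Path("data") / "112_calls_data.csv"


def data_file_status(path: Path) -> str:
    """Return a short, human-readable status for the expected data file."""
    if path.is_file():
        size_mb = path.stat().st_size / 1e6
        return f"found {path} ({size_mb:.1f} MB)"
    return f"missing {path}: place the emergency calls CSV there before continuing"


print(data_file_status(DATA_FILE))
```
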


Next, import the packages used in this notebook.

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
import palettable as pltt
import seaborn as sns
from sklearn import preprocessing
import random
import sys
import dtale
import pandas_profiling as pp # !conda install --yes --prefix {sys.prefix} pandas-profiling



# Load, clean and rename data

## 1. Emergency calls data
For an explanation of the features, <a href="https://cusp.tbm.tudelft.nl/courses/epa1316/project/project-02.pdf">click here</a>.

In [10]:
# Load 112_calls_data.csv
%time em_calls = pd.read_csv("data/112_calls_data.csv", sep = ",")

CPU times: total: 1min 26s
Wall time: 1min 39s
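
Loading all 31 columns takes well over a minute. If only a few columns are needed, `pd.read_csv` can skip the rest and parse missing-value markers directly. The sketch below uses a tiny in-memory stand-in with made-up rows; the column names are taken from the output further down, and the choice of which columns to keep is an assumption, not the notebook's actual selection.

```python
import io

import pandas as pd

# Tiny stand-in for data/112_calls_data.csv (made-up rows, real column names).
sample = io.StringIO(
    "pmeId,pmeTimeStamp,pmePrioLevel,wplNam,gemName\n"
    "1,2017-01-01 00:00:37,1,Den Haag,Den Haag\n"
    "2,2017-01-01 00:04:52,\\N,Den Haag,Den Haag\n"
    "3,2017-01-01 00:05:10,2,Delft,Delft\n"
)

calls = pd.read_csv(
    sample,
    usecols=["pmeTimeStamp", "pmePrioLevel", "wplNam", "gemName"],  # read only these columns
    na_values=["\\N"],                   # treat the literal \N marker as missing
    parse_dates=["pmeTimeStamp"],        # parse timestamps while loading
    dtype={"wplNam": "category", "gemName": "category"},  # compact storage for repeated names
)
print(calls.dtypes)
```

With the real file, the same keyword arguments would be passed alongside the path `"data/112_calls_data.csv"`; how much time this saves depends on the machine and on the columns kept.
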


In [11]:
print('{} different cities'.format(len(em_calls['wplNam'].unique())))
print('{} different municipalities'.format(len(em_calls['gemName'].unique())))
em_calls.shape

3326 different cities
404 different municipalities


(4775504, 31)

In [18]:
th = em_calls[em_calls['wplNam']=='Den Haag'] # calls located in The Hague
print('For The Hague:')
print('\n{}\n'.format(th.shape))
print('priority\tcount')
for prio in th['pmePrioLevel'].unique():
    count = len(th[th['pmePrioLevel']==prio])
    print('{}\t\t{}'.format(prio, count))

th.head(2)

For The Hague:

(251719, 31)

priority	count
1		120735
2		127659
\N		2967
4		312
3		46


Unnamed: 0,pmeId,pmeTimeStamp,pmeProtocol1,pmeProtocol2,pmeTarget,pmeMessage,pmePrio,pmePrioLevel,pmeDienst,pmeStrippedMessage,...,pme_strId,pme_wplId,pme_gemId,pme_proId,pme_vrgId,pmeCapCodes,pmeLifeLiner,pme_catId,wplNam,gemName
3,12284702,2017-01-01 00:00:37,FLEX-A,1600,AORG,A1 Goudsbloemlaan 71-79 DHG 2565CP : 15101 Rit...,A1,1,A,15101 Ritnummer: 1,...,199928,4896,391,12,5170,1520001,\N,\N,Den Haag,Den Haag
31,12284730,2017-01-01 00:04:52,FLEX-A,1600,AORG,P 1 Buitenbrand afval/rommel Drebbelstraat DHG...,P 1,1,B,Buitenbrand afval/rommel 7630,...,199712,4896,391,12,5170,1500148150063215039022029568,\N,\N,Den Haag,Den Haag
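
The per-priority counts printed above can also be obtained in one call with `Series.value_counts`, which avoids filtering the frame once per priority level. This is a sketch on made-up values, not the notebook's data; note that `value_counts` sorts by frequency, so the row order can differ from the loop over `unique()`.

```python
import pandas as pd

# Made-up priority values; "\\N" mimics the missing-value marker seen in the output above.
th_demo = pd.DataFrame({"pmePrioLevel": ["1", "2", "2", "\\N", "1", "2", "4"]})

# One pass over the column, most frequent priority first.
counts = th_demo["pmePrioLevel"].value_counts()
print(counts)
```
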


In [16]:
%time dtale.show(th)





CPU times: total: 8.88 s
Wall time: 11.6 s




## 2. The Hague demographic data
A dashboard with these variables is available <a href="https://denhaag.incijfers.nl/Jive?workspace_guid=83bb13aa-9f6f-49ab-ba68-f8fb2ffa2e34">here</a>. All variables refer to households rather than persons, except the two on residents aged 65 and older.

In [19]:
# load CBS data per neighborhood
data = pd.read_csv("data/th_neighborhoods_demographics.csv", sep = ";")

# drop 'Onbekend'
i = data.loc[data['Buurten'] == 'Onbekend'].index
data.drop(index=i, inplace=True)

# translate column labels
dictionary = {'Buurten': 'Neighborhood', 
        '65 jaar en ouder': '65_older_persons',
        "% 65 jaar en ouder":'65_older_%',
        'Verhouding b.v.o. medische voorzieningen/ totale gebiedsoppervlakte': 'medical_fac_rel_area',
        'Verhouding b.v.o. sportvoorzieningen/ totale gebiedsoppervlakte': 'sports_fac_rel_area',
        'Verhouding b.v.o. recreatieve voorzieningen/ totale gebiedsoppervlakte': 'leisure_fac_rel_area',
        'Gem. besteedbaar gestandaardiseerd part. huishoudens inkomen': 'disposable_income_std',
        'Aantal huishoudens': 'households_count',
        '% Met migratieachtergrond': 'migration_background_%',
        'Gemiddelde achterstandsscore': 'disadvantage_score',
        'Totaal doelgroephuishoudens met inkomen tot 130%':'130%_poverty_line_count'}
data.rename(columns=dictionary, inplace=True)
 
# set Neighborhood as index
data = data.set_index('Neighborhood', drop=True)

# make strings numerical
data.replace(',', '.', regex=True, inplace=True) # comma -> point
data.replace('x', 'NaN', regex=True, inplace=True) # 'x' -> NaN
data.replace('-', 'NaN', regex=False, inplace=True) # '-' -> NaN

# cast the first six columns to float
for i in range(6):
    data.iloc[:,i] = data.iloc[:,i].astype(float)

# check outcome
data.info()
data.head(5)


In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`



<class 'pandas.core.frame.DataFrame'>
Index: 114 entries, 01 Oud Scheveningen to 121 Rietbuurt
Data columns (total 10 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   65_older_persons         110 non-null    float64
 1   65_older_%               110 non-null    float64
 2   medical_fac_rel_area     113 non-null    float64
 3   sports_fac_rel_area      113 non-null    float64
 4   leisure_fac_rel_area     113 non-null    float64
 5   disposable_income_std    106 non-null    float64
 6   households_count         111 non-null    float64
 7   migration_background_%   111 non-null    object 
 8   disadvantage_score       102 non-null    object 
 9   130%_poverty_line_count  111 non-null    float64
dtypes: float64(8), object(2)
memory usage: 9.8+ KB


Neighborhood,65_older_persons,65_older_%,medical_fac_rel_area,sports_fac_rel_area,leisure_fac_rel_area,disposable_income_std,households_count,migration_background_%,disadvantage_score,130%_poverty_line_count
01 Oud Scheveningen,551.0,18.6,0.0,0.0,0.01,30700.0,1618.0,26.2,-5.9,300.0
02 Vissershaven,815.0,18.8,0.0,0.0,0.0,32500.0,2303.0,33.2,-2.9,300.0
03 Scheveningen Badplaats,972.0,17.3,0.0,0.01,0.01,36700.0,3132.0,37.5,-7.1,200.0
04 Visserijbuurt,862.0,21.5,0.01,0.0,0.01,30700.0,2072.0,37.2,-4.4,300.0
05 v Stolkpark/Schev Bosjes,209.0,25.7,0.01,0.0,0.0,75600.0,406.0,47.8,-23.3,0.0
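
In the `info()` output above, `migration_background_%` and `disadvantage_score` are still of dtype `object`. One alternative to replacing strings after loading is to let `pd.read_csv` handle the decimal comma and the placeholder values itself. The sketch below uses made-up rows and assumes that `x` and `-` only ever appear as whole-cell placeholders, which is not verified against the real file.

```python
import io

import pandas as pd

# Made-up rows in the same shape as the demographics export:
# ';' separator, ',' as decimal mark, 'x' and '-' for unavailable values.
sample = io.StringIO(
    "Buurten;% 65 jaar en ouder;Gemiddelde achterstandsscore\n"
    "Buurt A;18,6;-5,9\n"
    "Buurt B;x;-\n"
    "Buurt C;21,5;4,4\n"
)

demo = pd.read_csv(
    sample,
    sep=";",
    decimal=",",              # parse "18,6" as 18.6
    na_values=["x", "-"],     # whole-cell placeholders become NaN
    index_col="Buurten",
)
print(demo.dtypes)
```

Negative numbers such as `-5,9` are still parsed as numbers, because `na_values` only matches cells whose entire content equals one of the listed markers.
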


In [20]:
test = gpd.GeoDataFrame()
test.head()