## Questions to answer:
- What could define a better or worse neighbourhood?
- Is the quality (bad, normal, good, etc.) of a neighbourhood related to its death rate? And to its birth rate?
- Do neighbourhoods with more births or a larger population have more schools or hospitals? What about the other facilities (parks, sports centers, etc.)?

### Values to analyse:
- **Define Nbh quality**:
    - Are some facilities more valuable than others?
        - top 5 for each facility
        - worst 5
        - top 5 **Nbh** and **Dist** with the highest number of facilities
        - worst 5
<br><br>         
- **Quality and death rate/birth rate**
    - top 5 Nbh and Dist with highest birth/death rates
    - Nbh and Dist quality ranking (very bad, bad, neither, good, very good)
    - find correlation
<br><br>  
- **Do Nbh and Dist with higher birth rates have more facilities? And schools in particular?**
    - Top 5 Nbh and Dist with highest birth rates
    - Number of facilities
    - Number of schools
<br><br>  
- **Do Nbh and Dist with larger populations have more facilities? And schools in particular?**
    - Top 5 Nbh and Dist with highest population
    - Number of facilities
    - Number of schools
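
One possible way to turn the plan above into code is sketched here. It is only an illustration on made-up numbers: the `facilities` column, the five-label scale built with `pd.qcut`, and the choice of a rank-based (Spearman-style) correlation are assumptions, not the method used later in this notebook.

```python
import pandas as pd

# Hypothetical toy data; the real analysis would use the per-neighbourhood table.
toy = pd.DataFrame({
    "nbh": ["A", "B", "C", "D", "E"],
    "facilities": [5, 12, 20, 31, 44],               # illustrative total facility counts
    "mort_rate": [900.0, 850.0, 800.0, 760.0, 700.0],  # illustrative values
})

labels = ["very bad", "bad", "neither", "good", "very good"]

# qcut splits the values into five equally sized groups (quintiles)
toy["quality"] = pd.qcut(toy["facilities"], q=5, labels=labels)

# Pearson correlation computed on ranks is the Spearman rank correlation
rho = toy["facilities"].rank().corr(toy["mort_rate"].rank())

print(toy[["nbh", "quality"]])
print(rho)
```

A rank-based correlation is used in this sketch because the quality scale is ordinal; a plain Pearson correlation on the raw counts would be an equally reasonable first look.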

### Variable descriptions:
- **nbh**: Barcelona's neighbourhoods
- **bars**: number of bars in 2017
- **children_places**: number of places dedicated to children in 2017
- **cinemas_theatres**: number of cinemas or theatres in 2017
- **schools**: number of schools in 2017
- **pre-schools**: number of pre-schools in 2017
- **hospitals**: number of hospitals in 2017
- **libraries_museums**: number of libraries and museums in 2017
- **park_gardens**: number of parks and gardens in 2017
- **sport_centers**: number of sports centers in 2017
- **population**: number of people in 2017
- **net_density(hab/ha)**: density per habitable area (inhabitants per habitable hectare) in 2017
- **avg_occupation**: average number of people living in a house
- **dist**: Barcelona's district
- **num_immi**: number of immigrants in 2017
- **mort_rate**: number of people that died in 2017
- **rent_price**: price of rent per m2 in 2017
- **num_crimes**: number of crimes during 2017

In [133]:
#Import the libraries used in this notebook
import pandas as pd
import numpy as np

In [134]:
#import the cleaned dataset

Complete=pd.read_csv("../datasets/Data_filtered/complete_dataset.csv")

In [135]:
#Have a quick look

Complete.head(5)

Unnamed: 0.1,Unnamed: 0,nbh,bars,children_places,cinemas_theatres,schools,pre-schools,hospitals,libraries_theatres,park_gardens,...,num_crimes,children_places_pop,cinemas_theatres_pop,schools_pop,pre-schools_pop,hospitals_pop,libraries_theatres_pop,park_gardens_pop,sport_centers_pop,facilities_pop
0,0,Horta,4.0,9.0,3.0,11.0,7.0,3.0,5.0,3.0,...,7871.0,0.336889,0.112296,0.411754,0.262025,0.112296,0.187161,0.112296,0.074864,0.168445
1,1,Navas,2.0,3.0,,6.0,3.0,,,,...,10657.0,0.135569,,0.271137,0.135569,,,,,
2,2,Pedralbes,1.0,7.0,,8.0,41.0,,10.0,9.0,...,7444.0,0.579662,,0.662471,3.395164,,0.828089,0.74528,0.082809,
3,3,Sant Andreu,3.0,22.0,5.0,22.0,17.0,5.0,2.0,5.0,...,10657.0,0.38473,0.087439,0.38473,0.297291,0.087439,0.034975,0.087439,0.052463,0.139902
4,4,Sant Antoni,9.0,8.0,4.0,11.0,11.0,1.0,5.0,7.0,...,46210.0,0.208632,0.104316,0.286869,0.286869,0.026079,0.130395,0.182553,0.026079,0.120615


In [136]:
#Drop the leftover index column "Unnamed: 0"

Complete=Complete.drop(["Unnamed: 0"],axis=1)
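
The `Unnamed: 0` column usually appears when a dataframe was saved with its row index and then read back without declaring it. The sketch below reproduces that on a tiny made-up frame and shows two common ways to avoid it; the in-memory `io.StringIO` buffer stands in for a CSV file and is only there to keep the example self-contained.

```python
import io
import pandas as pd

# Tiny illustrative frame (not the real dataset)
df = pd.DataFrame({"nbh": ["Horta", "Navas"], "bars": [4.0, 2.0]})

# Default to_csv writes the row index as an unnamed first column
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
with_index = pd.read_csv(buf)
print(with_index.columns.tolist())

# Option 1: do not write the index at all
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
no_index = pd.read_csv(buf)

# Option 2: tell read_csv that the first column is the index
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
as_index = pd.read_csv(buf, index_col=0)
```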

In [137]:
#Save a copy of the dataset to work on, leaving the original untouched

Complete.to_csv("../datasets/Data_filtered/Analysis_M.csv",index=False)

***

In [138]:
#Import the copy

Analysis=pd.read_csv("../datasets/Data_filtered/Analysis_M.csv")
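
If the only goal is a working copy in memory, `DataFrame.copy()` is a possible alternative to writing and re-reading a CSV. The sketch below uses a one-row made-up frame; writing the file, as done above, is still useful when the copy has to be kept on disk.

```python
import pandas as pd

# Illustrative frame standing in for the full dataset
original = pd.DataFrame({"nbh": ["Horta"], "bars": [4.0]})

# copy() makes a deep copy by default, so edits do not reach the original
working = original.copy()
working.loc[0, "bars"] = 99.0

print(original.loc[0, "bars"])
print(working.loc[0, "bars"])
```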


In [139]:
Analysis.columns

Index(['nbh', 'bars', 'children_places', 'cinemas_theatres', 'schools',
       'pre-schools', 'hospitals', 'libraries_theatres', 'park_gardens',
       'sport_centers', 'population', 'net_density(hab/ha)', 'avg_occupation',
       'dist', 'num_immi', 'mort_rate', 'rent_price', 'num_crimes',
       'children_places_pop', 'cinemas_theatres_pop', 'schools_pop',
       'pre-schools_pop', 'hospitals_pop', 'libraries_theatres_pop',
       'park_gardens_pop', 'sport_centers_pop', 'facilities_pop'],
      dtype='object')

In [170]:
#Create a dataset with fewer features (drop the *_pop columns)

Analysis=Analysis.drop(['children_places_pop', 'cinemas_theatres_pop', 'schools_pop',
       'pre-schools_pop', 'hospitals_pop', 'libraries_theatres_pop',
       'park_gardens_pop', 'sport_centers_pop', 'facilities_pop'], axis=1)
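
Judging from the values shown earlier, the dropped `*_pop` columns appear to be facility counts per 1,000 inhabitants (for Horta, 11 schools / 26,715 inhabitants × 1000 ≈ 0.4118). Under that assumption they could be recomputed when needed, as in this sketch. It uses two rows copied from the table above; how `facilities_pop` was built is not clear from the output, so it is left out.

```python
import pandas as pd

# Two rows taken from the table shown earlier in the notebook
sample = pd.DataFrame({
    "nbh": ["Horta", "Sant Andreu"],
    "population": [26715.0, 57183.0],
    "schools": [11.0, 22.0],
    "children_places": [9.0, 22.0],
})

# Assumption: "<facility>_pop" = facilities per 1,000 inhabitants
for col in ["schools", "children_places"]:
    sample[col + "_pop"] = sample[col] / sample["population"] * 1000

print(sample[["nbh", "schools_pop", "children_places_pop"]])
```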

In [141]:
#Reorder the columns

Analysis=Analysis[['dist','nbh','population', 'net_density(hab/ha)','num_immi',
                  'bars', 'children_places','cinemas_theatres', 'schools','pre-schools',
                  'hospitals', 'libraries_theatres','park_gardens','sport_centers',
                  'avg_occupation','mort_rate', 'rent_price', 'num_crimes']]

In [142]:
#Rename the column "libraries_theatres" (wrong name) to "libraries_museums"

Analysis=Analysis.rename(columns={"libraries_theatres":"libraries_museums"})

In [143]:
Analysis.head(5)

Unnamed: 0,dist,nbh,population,net_density(hab/ha),num_immi,bars,children_places,cinemas_theatres,schools,pre-schools,hospitals,libraries_museums,park_gardens,sport_centers,avg_occupation,mort_rate,rent_price,num_crimes
0,Horta-Guinardó,Horta,26715.0,422.0,1127.0,4.0,9.0,3.0,11.0,7.0,3.0,5.0,3.0,2.0,2.5,805.5,2.97,7871.0
1,Sant Andreu,Navas,22129.0,984.0,988.0,2.0,3.0,,6.0,3.0,,,,,2.5,696.0,3.029,10657.0
2,Les Corts,Pedralbes,12076.0,147.0,764.0,1.0,7.0,,8.0,41.0,,10.0,9.0,1.0,2.9,661.6,6.34,7444.0
3,Sant Andreu,Sant Andreu,57183.0,746.0,1965.0,3.0,22.0,5.0,22.0,17.0,5.0,2.0,5.0,3.0,2.4,763.0,3.21,10657.0
4,Eixample,Sant Antoni,38345.0,928.0,2490.0,9.0,8.0,4.0,11.0,11.0,1.0,5.0,7.0,1.0,2.3,737.3,4.591,46210.0


In [144]:
#Step 1: Define Nbh quality
#Are there facilities more valuable than others?:
#top 5 for each facility
#worst 5
#top 5 Nbh and Dist with higher number of facilities
#worst 5
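
The Step 1 items could be computed with `nlargest` and `nsmallest`, as in the following sketch. The data is made up, `facilities_total` is a hypothetical helper column, and treating a missing count (NaN) as zero facilities is an assumption that should be checked against how the dataset was built.

```python
import pandas as pd

# Hypothetical toy data with some missing counts, as in the real table
toy = pd.DataFrame({
    "nbh": ["A", "B", "C", "D", "E", "F", "G"],
    "schools": [11, 6, 8, 22, 11, 3, 15],
    "hospitals": [3, None, None, 5, 1, 2, 4],
})
facility_cols = ["schools", "hospitals"]

# Top 5 and worst 5 neighbourhoods for a single facility
top5_schools = toy.nlargest(5, "schools")[["nbh", "schools"]]
worst5_schools = toy.nsmallest(5, "schools")[["nbh", "schools"]]

# sum(axis=1) skips NaN by default, so a missing count contributes 0
toy["facilities_total"] = toy[facility_cols].sum(axis=1)
top5_total = toy.nlargest(5, "facilities_total")[["nbh", "facilities_total"]]

print(top5_schools)
print(worst5_schools)
print(top5_total)
```

The same calls work on the per-district table once the values are aggregated by `dist`.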

In [145]:
Analysis["dist"].describe()
#There are 10 distinct districts (dist)

count             73
unique            10
top       Nou Barris
freq              13
Name: dist, dtype: object
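
`describe()` only reports the most frequent district. To see how many neighbourhoods each district has, `value_counts()` is a possible complement; the short series below is made up for illustration.

```python
import pandas as pd

# Illustrative district labels, one entry per neighbourhood
dist = pd.Series(
    ["Nou Barris", "Nou Barris", "Eixample", "Gràcia", "Eixample", "Nou Barris"],
    name="dist",
)

n_districts = dist.nunique()        # number of distinct districts
per_district = dist.value_counts()  # neighbourhoods per district, most frequent first

print(n_districts)
print(per_district)
```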

In [146]:
Analysis["num_crimes"]=Analysis["num_crimes"].astype("float")

In [147]:
Analysis["rent_price"]=Analysis["rent_price"].astype("float")

In [148]:
Analysis["mort_rate"]=Analysis["mort_rate"].astype("float")
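
The three conversions above can also be written as a single call by passing a dictionary to `astype`. This is a sketch on a small made-up frame whose columns start as strings or integers.

```python
import pandas as pd

# Illustrative frame with non-float columns
df = pd.DataFrame({
    "num_crimes": ["7871", "10657"],
    "rent_price": ["2.97", "3.029"],
    "mort_rate": [805, 696],
})

# One astype call with a column -> dtype mapping
df = df.astype({"num_crimes": "float", "rent_price": "float", "mort_rate": "float"})

print(df.dtypes)
```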

In [149]:
Analysis.dtypes

dist                    object
nbh                     object
population             float64
net_density(hab/ha)    float64
num_immi               float64
bars                   float64
children_places        float64
cinemas_theatres       float64
schools                float64
pre-schools            float64
hospitals              float64
libraries_museums      float64
park_gardens           float64
sport_centers          float64
avg_occupation         float64
mort_rate              float64
rent_price             float64
num_crimes             float64
dtype: object

In [162]:
#Mean of each numeric column per dist, rounded to 2 decimals

MeanDist=Analysis.groupby("dist").aggregate({"mean"}).round(decimals=2)

In [163]:
#Only the last expression of a cell is displayed, so inspect the columns directly
MeanDist.columns

MultiIndex([(         'population', 'mean'),
            ('net_density(hab/ha)', 'mean'),
            (           'num_immi', 'mean'),
            (               'bars', 'mean'),
            (    'children_places', 'mean'),
            (   'cinemas_theatres', 'mean'),
            (            'schools', 'mean'),
            (        'pre-schools', 'mean'),
            (          'hospitals', 'mean'),
            (  'libraries_museums', 'mean'),
            (       'park_gardens', 'mean'),
            (      'sport_centers', 'mean'),
            (     'avg_occupation', 'mean'),
            (          'mort_rate', 'mean'),
            (         'rent_price', 'mean'),
            (         'num_crimes', 'mean')],
           )

In [164]:
MeanDist.columns =['population', 'net_density(hab/ha)', 'num_immi', 'bars',
       'children_places', 'cinemas_theatres', 'schools', 'pre-schools',
       'hospitals', 'libraries_museums', 'park_gardens', 'sport_centers',
       'avg_occupation', 'mort_rate', 'rent_price', 'num_crimes']
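
The two-level column index comes from passing a collection of function names to `aggregate`. As a sketch of two alternatives on made-up data: calling `mean(numeric_only=True)` gives flat columns directly (and skips text columns such as `nbh`, which recent pandas versions no longer drop silently), while `get_level_values(0)` flattens an existing MultiIndex without retyping the names. The neighbourhood names `X` and `Y` are placeholders.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "dist": ["Eixample", "Eixample", "Gràcia"],
    "nbh": ["Sant Antoni", "X", "Y"],
    "population": [38345.0, 40000.0, 24000.0],
    "bars": [9.0, 11.0, 15.0],
})

# Alternative 1: flat columns from the start, text columns skipped
mean_flat = toy.groupby("dist").mean(numeric_only=True).round(2)

# Alternative 2: keep aggregate(), then keep only the first column level
mean_multi = toy.groupby("dist")[["population", "bars"]].aggregate(["mean"]).round(2)
mean_multi.columns = mean_multi.columns.get_level_values(0)

print(mean_flat)
print(mean_multi)
```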

In [165]:
MeanDist.head()

dist,population,net_density(hab/ha),num_immi,bars,children_places,cinemas_theatres,schools,pre-schools,hospitals,libraries_museums,park_gardens,sport_centers,avg_occupation,mort_rate,rent_price,num_crimes
Ciutat Vella,25346.75,809.25,3152.75,28.75,5.0,9.0,7.5,15.5,2.0,17.75,3.5,2.0,2.45,1002.95,4.81,44241.0
Eixample,44402.67,800.33,3174.5,15.0,9.6,3.2,12.8,15.0,2.4,6.6,9.0,1.25,2.38,794.48,4.8,46210.0
Gràcia,24269.4,622.2,1450.8,15.5,5.4,9.5,9.6,8.2,3.75,2.5,2.2,2.6,2.34,805.1,3.98,7731.0
Horta-Guinardó,15341.0,559.18,709.0,2.5,7.2,3.0,7.4,7.33,2.4,3.0,3.55,1.71,2.43,871.53,2.92,7871.0
Les Corts,27344.33,528.67,1458.33,4.33,8.67,3.0,12.33,37.33,3.5,8.33,10.67,2.0,2.6,677.3,5.17,7444.0


In [166]:
#Save the dataframe with the mean values per dist (instead of the values per nbh)

MeanDist.to_csv("../datasets/Data_filtered/Analysis_M_dist.csv")

In [167]:
#Re-import the per-dist dataset

Dist=pd.read_csv("../datasets/Data_filtered/Analysis_M_dist.csv")
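
Saving and re-reading the file has the side effect of turning the `dist` index into a regular column. If the file itself is not needed, `reset_index()` gives the same shape in memory. A minimal sketch with two values taken from the table above:

```python
import pandas as pd

# Small frame indexed by district, mimicking the grouped result
mean_dist = pd.DataFrame(
    {"population": [25346.75, 44402.67]},
    index=pd.Index(["Ciutat Vella", "Eixample"], name="dist"),
)

# Move the "dist" index into a regular column
dist_df = mean_dist.reset_index()

print(dist_df)
```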

In [168]:
Dist.head()

Unnamed: 0,dist,population,net_density(hab/ha),num_immi,bars,children_places,cinemas_theatres,schools,pre-schools,hospitals,libraries_museums,park_gardens,sport_centers,avg_occupation,mort_rate,rent_price,num_crimes
0,Ciutat Vella,25346.75,809.25,3152.75,28.75,5.0,9.0,7.5,15.5,2.0,17.75,3.5,2.0,2.45,1002.95,4.81,44241.0
1,Eixample,44402.67,800.33,3174.5,15.0,9.6,3.2,12.8,15.0,2.4,6.6,9.0,1.25,2.38,794.48,4.8,46210.0
2,Gràcia,24269.4,622.2,1450.8,15.5,5.4,9.5,9.6,8.2,3.75,2.5,2.2,2.6,2.34,805.1,3.98,7731.0
3,Horta-Guinardó,15341.0,559.18,709.0,2.5,7.2,3.0,7.4,7.33,2.4,3.0,3.55,1.71,2.43,871.53,2.92,7871.0
4,Les Corts,27344.33,528.67,1458.33,4.33,8.67,3.0,12.33,37.33,3.5,8.33,10.67,2.0,2.6,677.3,5.17,7444.0
