## Immigrants, Nationalities, Districts and Neighbourhoods
- What is the percentage of immigrants living in certain neighbourhoods? And percentage by nationality?
- Is there a link between immigrants' country of origin and the Barcelona neighbourhood they choose to live in?

### What is the percentage of immigrants living in certain neighbourhoods? And percentage by nationality?

- The largest group of immigrants has Spanish nationality (36.30%). The next largest groups are Italy (6.44%), China (3.40%), Colombia (3.32%) and Venezuela (3.10%).
- The neighbourhood with the highest share of immigrants is *el Barri Gòtic* (16.73% of the total neighbourhood population).
- The district with the highest share of immigrants is *Ciutat Vella* (12.99%, averaged over its neighbourhoods) and the one with the lowest is *Sant Andreu* (4.64%, averaged over its neighbourhoods).
- *Ciutat Vella* has the highest proportion of foreign immigrants relative to all immigrants (foreign plus Spanish nationals), at over 75% in all its neighbourhoods. *El Gòtic* is the top neighbourhood at 82.06%.
- Immigrants from one country are dispersed among different districts ????

In [1]:
import pandas as pd
import numpy as np

In [27]:
# Immigrant counts by neighbourhood and nationality, including Spain
immi = pd.read_csv("../datasets/Data_filtered/immi.csv", index_col=0)
immi.columns = ['dist', 'nbh', 'nationality', 'num_immi']
immi_original = immi.copy()

# Total immigrant counts by neighbourhood, including Spain
immi_nbh = pd.read_csv("../datasets/Data_filtered/immi_nbh.csv", index_col=0)
immi_nbh.columns = ['nbh', 'dist', 'num_immi']
immi_nbh_original = immi_nbh.copy()

# Immigrant counts by neighbourhood and nationality, not including Spain
immi_foreign = pd.read_csv("../datasets/Data_filtered/immi_foreign.csv", index_col=0)
immi_foreign.columns = ['dist', 'nbh', 'nationality', 'num_immi']
immi_foreign_original = immi_foreign.copy()

# Total immigrant counts by neighbourhood, not including Spain
immi_foreign_nbh = pd.read_csv("../datasets/Data_filtered/immi_foreign_nbh.csv", index_col=0)
immi_foreign_nbh.columns = ['nbh', 'dist', 'num_immi']
immi_foreign_nbh_original = immi_foreign_nbh.copy()

# Complete dataset, read without index_col so the 'Unnamed: 0' column is kept
df = pd.read_csv("../datasets/Data_filtered/complete_dataset.csv")
# Percentage of immigrants over each neighbourhood's population
df["perc_immi"] = round(df["num_immi"] / df["population"] * 100, 2)
df_original = df.copy()

### First analysis: include "Spain" in the dataset

#### Basic analysis *Immi*

In [3]:
immi.head()

Unnamed: 0,dist,nbh,nationality,num_immi
0,Ciutat Vella,el Raval,Spain,1109
1,Ciutat Vella,el Barri Gòtic,Spain,482
2,Ciutat Vella,la Barceloneta,Spain,414
3,Ciutat Vella,"Sant Pere, Santa Caterina i la Ribera",Spain,537
4,Eixample,el Fort Pienc,Spain,663


In [4]:
immi.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 11766 entries, 0 to 11765
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   dist         11766 non-null  object
 1   nbh          11766 non-null  object
 2   nationality  11766 non-null  object
 3   num_immi     11766 non-null  int64 
dtypes: int64(1), object(3)
memory usage: 459.6+ KB


In [5]:
immi.describe()

Unnamed: 0,num_immi
count,11766.0
mean,8.271885
std,50.821491
min,0.0
25%,0.0
50%,0.0
75%,2.0
max,1593.0


In [6]:
# list of nbh, ordered by highest % of immigrants
df.groupby("nbh").aggregate({"num_immi":"sum",
                             "population":"sum",
                            "perc_immi":"mean"}).sort_values("perc_immi", ascending=False)

Unnamed: 0_level_0,num_immi,population,perc_immi
nbh,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
el Barri Gòtic,2687.0,16062.0,16.73
"Sant Pere, Santa Caterina i la Ribera",2765.0,22721.0,12.17
la Barceloneta,1759.0,14996.0,11.73
el Raval,5400.0,47608.0,11.34
el Besòs i el Maresme,2016.0,23009.0,8.76
...,...,...,...
la Marina del Prat Vermell,32.0,1149.0,2.79
Can Peguera,57.0,2271.0,2.51
Canyelles,140.0,6856.0,2.04
el Poble-sec,0.0,0.0,


In [7]:
# list of dist, ordered by highest mean neighbourhood % of immigrants
df.groupby("dist").aggregate({"num_immi":"sum",
                             "population":"sum",
                            "perc_immi":"mean"}).sort_values("perc_immi", ascending=False)

Unnamed: 0_level_0,num_immi,population,perc_immi
dist,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Ciutat Vella,12611.0,101387.0,12.9925
Eixample,19047.0,266416.0,7.148333
Sants-Montjuïc,11683.0,181910.0,5.87625
Les Corts,4375.0,82033.0,5.616667
Gràcia,7254.0,121347.0,5.554
Sant Martí,12720.0,235513.0,5.418
Nou Barris,8274.0,166579.0,4.755385
Horta-Guinardó,7799.0,168751.0,4.700909
Sarrià-Sant Gervasi,7227.0,149279.0,4.645
Sant Andreu,6335.0,147594.0,4.635714
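
The `perc_immi` column in the district table is the unweighted mean of the neighbourhood percentages, so small and large neighbourhoods count equally. A minimal sketch of the difference, using the four Ciutat Vella rows from the neighbourhood table above; the variable names are illustrative and not part of the notebook:

```python
import pandas as pd

# Ciutat Vella rows copied from the neighbourhood table above
cv = pd.DataFrame({
    "nbh": ["el Barri Gòtic", "Sant Pere, Santa Caterina i la Ribera",
            "la Barceloneta", "el Raval"],
    "num_immi": [2687, 2765, 1759, 5400],
    "population": [16062, 22721, 14996, 47608],
})
cv["perc_immi"] = (cv["num_immi"] / cv["population"] * 100).round(2)

# Unweighted mean of the neighbourhood percentages (what the groupby reports)
mean_of_percentages = cv["perc_immi"].mean()

# Population-weighted share: total immigrants over total population
weighted_percentage = round(cv["num_immi"].sum() / cv["population"].sum() * 100, 2)

print(mean_of_percentages, weighted_percentage)
```

For Ciutat Vella the two definitions give roughly 12.99 and 12.44, so the district ranking could shift slightly depending on which one is used.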


__*Dropping rows where num_immi = 0*__

In [8]:
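# .copy() makes the filtered frame independent, so columns can be added later without a SettingWithCopyWarning
immi = immi[immi["num_immi"] != 0].copy()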
immi.head()

Unnamed: 0,dist,nbh,nationality,num_immi
0,Ciutat Vella,el Raval,Spain,1109
1,Ciutat Vella,el Barri Gòtic,Spain,482
2,Ciutat Vella,la Barceloneta,Spain,414
3,Ciutat Vella,"Sant Pere, Santa Caterina i la Ribera",Spain,537
4,Eixample,el Fort Pienc,Spain,663


In [9]:
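immi.info  # without parentheses this shows the bound method and the frame's repr; call immi.info() for the summary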

<bound method DataFrame.info of                  dist                                    nbh     nationality  \
0        Ciutat Vella                               el Raval           Spain   
1        Ciutat Vella                         el Barri Gòtic           Spain   
2        Ciutat Vella                         la Barceloneta           Spain   
3        Ciutat Vella  Sant Pere, Santa Caterina i la Ribera           Spain   
4            Eixample                          el Fort Pienc           Spain   
...               ...                                    ...             ...   
11724  Horta-Guinardó                       el Baix Guinardó  No information   
11726  Horta-Guinardó                            el Guinardó  No information   
11735      Nou Barris          Vilapicina i la Torre Llobeta  No information   
11747      Nou Barris                               Vallbona  No information   
11756      Sant Martí                                el Clot  No information   

       

In [10]:
immi.describe()

Unnamed: 0,num_immi
count,4716.0
mean,20.637617
std,78.673256
min,1.0
25%,1.0
50%,4.0
75%,13.0
max,1593.0


In [11]:
# Selecting the immigration columns and sorting by percentage of immigrants by nbh
perc_immi = df[["dist", "nbh", "num_immi", "population", "perc_immi"]]
perc_immi.sort_values("perc_immi", ascending=False)

Unnamed: 0,dist,nbh,num_immi,population,perc_immi
10,Ciutat Vella,el Barri Gòtic,2687.0,16062.0,16.73
7,Ciutat Vella,"Sant Pere, Santa Caterina i la Ribera",2765.0,22721.0,12.17
21,Ciutat Vella,la Barceloneta,1759.0,14996.0,11.73
19,Ciutat Vella,el Raval,5400.0,47608.0,11.34
50,Sant Martí,el Besòs i el Maresme,2016.0,23009.0,8.76
...,...,...,...,...,...
62,Sants-Montjuïc,la Marina del Prat Vermell,32.0,1149.0,2.79
32,Nou Barris,Can Peguera,57.0,2271.0,2.51
33,Nou Barris,Canyelles,140.0,6856.0,2.04
16,,el Poble-sec,,,
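
The last row shows *el Poble-sec* with no district, immigrant count or population, so its percentage is `NaN` and it sorts to the bottom. A small sketch of one way to surface such rows before ranking, using two rows from the table above; `missing` and `ranked` are illustrative names, not part of the notebook:

```python
import numpy as np
import pandas as pd

# Two rows copied from the table above; el Poble-sec has no data
rows = pd.DataFrame({
    "dist": ["Nou Barris", np.nan],
    "nbh": ["Canyelles", "el Poble-sec"],
    "num_immi": [140.0, np.nan],
    "population": [6856.0, np.nan],
})
rows["perc_immi"] = (rows["num_immi"] / rows["population"] * 100).round(2)

# Neighbourhoods whose percentage could not be computed
missing = rows.loc[rows["perc_immi"].isna(), "nbh"].tolist()

# Rank only the neighbourhoods with a valid percentage
ranked = rows.dropna(subset=["perc_immi"]).sort_values("perc_immi", ascending=False)

print(missing)
print(ranked[["nbh", "perc_immi"]])
```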


### Second analysis: foreign immigrants only (not including "Spain")

In [12]:
# Nbh by % of foreign immigrants vs total number of immigrants

df_foreign = pd.merge(df, immi_foreign_nbh, how="left", on=["nbh", "dist"])

# Both frames have a num_immi column, so the merge suffixes them with _x (df) and _y (immi_foreign_nbh)
df_foreign = df_foreign.rename(columns={"num_immi_x": "num_immi_total",
                                        "num_immi_y": "num_immi_foreign"})

df_foreign["perc_foreign"]=round(df_foreign["num_immi_foreign"]/df_foreign["num_immi_total"]*100,2)

df_foreign["perc_immi_foreign"]=df_foreign["num_immi_foreign"]/df_foreign["population"]*100

df_foreign[["dist", "nbh", "num_immi_total", "num_immi_foreign", "perc_foreign"]].sort_values("perc_foreign", ascending=False)

Unnamed: 0,dist,nbh,num_immi_total,num_immi_foreign,perc_foreign
10,Ciutat Vella,el Barri Gòtic,2687.0,2205.0,82.06
7,Ciutat Vella,"Sant Pere, Santa Caterina i la Ribera",2765.0,2228.0,80.58
19,Ciutat Vella,el Raval,5400.0,4291.0,79.46
21,Ciutat Vella,la Barceloneta,1759.0,1345.0,76.46
67,Sant Andreu,la Trinitat Vella,675.0,501.0,74.22
...,...,...,...,...,...
32,Nou Barris,Can Peguera,57.0,24.0,42.11
33,Nou Barris,Canyelles,140.0,40.0,28.57
62,Sants-Montjuïc,la Marina del Prat Vermell,32.0,9.0,28.12
16,,el Poble-sec,,,
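
When two frames share a non-key column such as `num_immi`, `pd.merge` can name the resulting columns directly through its `suffixes` argument. A minimal sketch, assuming two small frames built from rows of the table above; `total` and `foreign` are illustrative names:

```python
import pandas as pd

# Total and foreign immigrant counts for two neighbourhoods from the table above
total = pd.DataFrame({
    "dist": ["Ciutat Vella", "Nou Barris"],
    "nbh": ["el Barri Gòtic", "Canyelles"],
    "num_immi": [2687, 140],
})
foreign = pd.DataFrame({
    "dist": ["Ciutat Vella", "Nou Barris"],
    "nbh": ["el Barri Gòtic", "Canyelles"],
    "num_immi": [2205, 40],
})

# suffixes controls how the clashing num_immi columns are named
merged = pd.merge(total, foreign, how="left", on=["nbh", "dist"],
                  suffixes=("_total", "_foreign"))
merged["perc_foreign"] = (merged["num_immi_foreign"] / merged["num_immi_total"] * 100).round(2)

print(merged)
```

This avoids reassigning the full list of column names, which would silently mislabel columns if their order ever changed.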


In [13]:
# list of nbh, ordered by highest % of foreign immigrants
df_foreign.groupby("nbh").aggregate({"num_immi_foreign":"sum",
                             "population":"sum",
                            "perc_immi_foreign":"mean"}).sort_values("perc_immi_foreign", ascending=False)

Unnamed: 0_level_0,num_immi_foreign,population,perc_immi_foreign
nbh,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
el Barri Gòtic,2205.0,16062.0,13.728054
"Sant Pere, Santa Caterina i la Ribera",2228.0,22721.0,9.805906
el Raval,4291.0,47608.0,9.013191
la Barceloneta,1345.0,14996.0,8.969058
el Besòs i el Maresme,1478.0,23009.0,6.423573
...,...,...,...
Can Peguera,24.0,2271.0,1.056803
la Marina del Prat Vermell,9.0,1149.0,0.783290
Canyelles,40.0,6856.0,0.583431
el Poble-sec,0.0,0.0,


In [14]:
# list of dist, ordered by highest % of foreign immigrants
df_foreign.groupby("dist").aggregate({"num_immi_foreign":"sum",
                             "population":"sum",
                            "perc_immi_foreign":"mean"}).sort_values("perc_immi_foreign", ascending=False)

Unnamed: 0_level_0,num_immi_foreign,population,perc_immi_foreign
dist,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Ciutat Vella,10069.0,101387.0,10.379052
Eixample,12487.0,266416.0,4.715639
Sants-Montjuïc,7464.0,181910.0,3.558621
Sant Martí,7994.0,235513.0,3.424302
Gràcia,4310.0,121347.0,3.239564
Les Corts,2420.0,82033.0,3.172486
Nou Barris,5261.0,166579.0,2.94646
Sant Andreu,3622.0,147594.0,2.802886
Horta-Guinardó,4544.0,168751.0,2.683435
Sarrià-Sant Gervasi,3802.0,149279.0,2.422785


### Third analysis

In [15]:
# Top 3 most frequent immigrant nationalities by Nbh, not including Spain
immi_foreign.groupby('nbh', group_keys=False).apply(lambda x: x.nlargest(3, "num_immi"))

Unnamed: 0,dist,nbh,nationality,num_immi
649,Sant Andreu,Baró de Viver,Peru,7
279,Sant Andreu,Baró de Viver,Colombia,6
797,Sant Andreu,Baró de Viver,Argentina,6
107,Horta-Guinardó,Can Baró,Italy,27
255,Horta-Guinardó,Can Baró,Colombia,18
...,...,...,...,...
197,Nou Barris,les Roquetes,China,37
1159,Nou Barris,les Roquetes,Ecuador,33
911,Sarrià-Sant Gervasi,les Tres Torres,United States,59
2613,Sarrià-Sant Gervasi,les Tres Torres,Japan,34
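
An alternative way to take the top rows per group is to sort once and then call `groupby(...).head(n)`, which avoids `apply` with a lambda. A short sketch on a handful of rows copied from the output above (the subset and the name `sample` are only for illustration):

```python
import pandas as pd

# A few rows copied from the output above
sample = pd.DataFrame({
    "nbh": ["Baró de Viver", "Baró de Viver", "Baró de Viver",
            "les Tres Torres", "les Tres Torres"],
    "nationality": ["Peru", "Colombia", "Argentina", "United States", "Japan"],
    "num_immi": [7, 6, 6, 59, 34],
})

# Sort by count, keep the first row of each neighbourhood, then order by name
top1 = (sample.sort_values("num_immi", ascending=False)
              .groupby("nbh")
              .head(1)
              .sort_values("nbh"))

print(top1)
```

Tied counts may come out in a different order than with `nlargest`, so the two approaches are only guaranteed to agree when there are no ties at the cut-off.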


In [16]:
# Total number of immigrants per district and nationality, including Spain
immi_dist = immi.groupby(["dist","nationality"])["num_immi"].sum()
immi_dist = immi_dist.to_frame()
immi_dist

Unnamed: 0_level_0,Unnamed: 1_level_0,num_immi
dist,nationality,Unnamed: 2_level_1
Ciutat Vella,Afghanistan,10
Ciutat Vella,Albania,9
Ciutat Vella,Algeria,75
Ciutat Vella,Andorra,1
Ciutat Vella,Angola,1
...,...,...
Sarrià-Sant Gervasi,United States,263
Sarrià-Sant Gervasi,Uruguay,13
Sarrià-Sant Gervasi,Uzbekistan,1
Sarrià-Sant Gervasi,Venezuela,215


In [17]:
# Top 5 most frequent immigrant nationalities by District, including Spain
immi_dist.groupby('dist', group_keys=False).apply(lambda x: x.nlargest(5, "num_immi"))

Unnamed: 0_level_0,Unnamed: 1_level_0,num_immi
dist,nationality,Unnamed: 2_level_1
Ciutat Vella,Spain,2542
Ciutat Vella,Italy,1275
Ciutat Vella,Pakistan,998
Ciutat Vella,France,596
Ciutat Vella,Bangladesh,566
Eixample,Spain,6560
Eixample,Italy,1568
Eixample,China,918
Eixample,Colombia,781
Eixample,Venezuela,724


In [18]:
# Total number of immigrants per district and nationality, not including Spain
immi_dist_foreign = immi_foreign.groupby(["dist","nationality"])["num_immi"].sum()
immi_dist_foreign = immi_dist_foreign.to_frame()
immi_dist_foreign

Unnamed: 0_level_0,Unnamed: 1_level_0,num_immi
dist,nationality,Unnamed: 2_level_1
Ciutat Vella,Afghanistan,10
Ciutat Vella,Albania,9
Ciutat Vella,Algeria,75
Ciutat Vella,Andorra,1
Ciutat Vella,Angola,1
...,...,...
Sarrià-Sant Gervasi,Venezuela,215
Sarrià-Sant Gervasi,Vietnam,4
Sarrià-Sant Gervasi,Yemen,0
Sarrià-Sant Gervasi,Zambia,0


In [19]:
# Top 3 most frequent immigrant nationalities by District, not including Spain
immi_dist_foreign.groupby('dist', group_keys=False).apply(lambda x: x.nlargest(3, "num_immi"))

Unnamed: 0_level_0,Unnamed: 1_level_0,num_immi
dist,nationality,Unnamed: 2_level_1
Ciutat Vella,Italy,1275
Ciutat Vella,Pakistan,998
Ciutat Vella,France,596
Eixample,Italy,1568
Eixample,China,918
Eixample,Colombia,781
Gràcia,Italy,598
Gràcia,France,277
Gràcia,Colombia,200
Horta-Guinardó,Italy,413


In [20]:
# Computing percentage of immigrants from each nationality vs total number of immigrants
immi["perc_immi"]=round((immi["num_immi"]/immi["num_immi"].sum())*100,2)
immi

Unnamed: 0,dist,nbh,nationality,num_immi,perc_immi
0,Ciutat Vella,el Raval,Spain,1109,1.14
1,Ciutat Vella,el Barri Gòtic,Spain,482,0.50
2,Ciutat Vella,la Barceloneta,Spain,414,0.43
3,Ciutat Vella,"Sant Pere, Santa Caterina i la Ribera",Spain,537,0.55
4,Eixample,el Fort Pienc,Spain,663,0.68
...,...,...,...,...,...
11724,Horta-Guinardó,el Baix Guinardó,No information,1,0.00
11726,Horta-Guinardó,el Guinardó,No information,1,0.00
11735,Nou Barris,Vilapicina i la Torre Llobeta,No information,1,0.00
11747,Nou Barris,Vallbona,No information,1,0.00


In [21]:
# Listing countries by percentage of immigrants vs total number of immigrants, including Spain
# Only the numeric columns are summed; dist and nbh are text
nationality_immi = immi.groupby("nationality")[["num_immi", "perc_immi"]].sum().sort_values("perc_immi", ascending=False)
nationality_immi.head(20)

Unnamed: 0_level_0,num_immi,perc_immi
nationality,Unnamed: 1_level_1,Unnamed: 2_level_1
Spain,35354,36.3
Italy,6309,6.44
China,3299,3.4
Colombia,3255,3.32
Venezuela,3021,3.1
Pakistan,2967,3.01
Honduras,2767,2.85
France,2670,2.75
Peru,2473,2.54
Argentina,1885,1.99


In [30]:
immi_foreign["perc_immi"]=round((immi_foreign["num_immi"]/immi_foreign["num_immi"].sum())*100, 2)

In [31]:
# Listing countries by percentage of immigrants vs total number of immigrants, not including Spain
# Only the numeric columns are summed; dist and nbh are text
nationality_immi_foreign = immi_foreign.groupby("nationality")[["num_immi", "perc_immi"]].sum().sort_values("perc_immi", ascending=False)
nationality_immi_foreign.head(20)

Unnamed: 0_level_0,num_immi,perc_immi
nationality,Unnamed: 1_level_1,Unnamed: 2_level_1
Italy,6309,10.18
China,3299,5.31
Colombia,3255,5.26
Venezuela,3021,4.91
Pakistan,2967,4.73
Honduras,2767,4.45
France,2670,4.26
Peru,2473,3.97
Morocco,1931,3.12
Argentina,1885,3.03
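
The percentages in these two tables come from rounding each row's share to two decimals and then summing per nationality, so rounding error can accumulate. This may be why Argentina (1,885) is listed ahead of Morocco (1,931) in the table including Spain. A toy sketch of the effect, with hypothetical counts that are not taken from the dataset:

```python
import pandas as pd

# Hypothetical toy counts, chosen only to make the rounding effect visible
toy = pd.DataFrame({
    "nationality": ["A", "A", "A", "B"],
    "num_immi": [1, 1, 1, 297],
})
total = toy["num_immi"].sum()

# Approach used above: round each row's share, then sum per nationality
toy["perc_immi"] = (toy["num_immi"] / total * 100).round(2)
summed_rounded = toy.groupby("nationality")["perc_immi"].sum()

# Alternative: sum the counts first, then compute the share and round once
share = (toy.groupby("nationality")["num_immi"].sum() / total * 100).round(2)

print(summed_rounded)
print(share)
```

In this toy case nationality A comes out at 0.99 with the first approach and 1.00 with the second; sorting by `num_immi` instead of the summed percentage would also sidestep the issue.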


In [51]:
immi_dist = immi.groupby(["dist","nationality"])["num_immi"].sum()
immi_dist = immi_dist.to_frame()
immi_dist

Unnamed: 0_level_0,Unnamed: 1_level_0,num_immi
dist,nationality,Unnamed: 2_level_1
Ciutat Vella,Afghanistan,10
Ciutat Vella,Albania,9
Ciutat Vella,Algeria,75
Ciutat Vella,Andorra,1
Ciutat Vella,Angola,1
...,...,...
Sarrià-Sant Gervasi,Venezuela,215
Sarrià-Sant Gervasi,Vietnam,4
Sarrià-Sant Gervasi,Yemen,0
Sarrià-Sant Gervasi,Zambia,0


In [69]:
# Total number of immigrants per nationality and district
immi_countries = immi.groupby(["nationality", "dist"])["num_immi"].sum()
immi_countries.to_frame()

Unnamed: 0_level_0,Unnamed: 1_level_0,num_immi
nationality,dist,Unnamed: 2_level_1
Afghanistan,Ciutat Vella,10
Afghanistan,Eixample,2
Afghanistan,Gràcia,0
Afghanistan,Horta-Guinardó,2
Afghanistan,Les Corts,0
...,...,...
Zimbabwe,Nou Barris,1
Zimbabwe,Sant Andreu,0
Zimbabwe,Sant Martí,1
Zimbabwe,Sants-Montjuïc,0
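
To judge how dispersed a nationality is across districts, the long table above can be pivoted into a nationality-by-district matrix and each row normalised to percentages. A minimal sketch using only a few district totals quoted in the earlier top-nationality tables, so the shares below are relative to this excerpt and not to the whole city; `rows`, `table` and `row_share` are illustrative names:

```python
import pandas as pd

# A few (nationality, district) totals copied from the tables above
rows = pd.DataFrame({
    "nationality": ["Italy", "Italy", "Italy", "Pakistan", "China"],
    "dist": ["Ciutat Vella", "Eixample", "Gràcia", "Ciutat Vella", "Eixample"],
    "num_immi": [1275, 1568, 598, 998, 918],
})

# One row per nationality, one column per district; absent pairs become 0
table = rows.pivot_table(index="nationality", columns="dist",
                         values="num_immi", aggfunc="sum", fill_value=0)

# Share of each nationality's (excerpted) total living in each district
row_share = table.div(table.sum(axis=1), axis=0).mul(100).round(2)

print(row_share)
```

A nationality concentrated in one district shows a single large value in its row, while a dispersed one shows similar values across columns.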


In [75]:
df.columns

Index(['Unnamed: 0', 'nbh', 'bars', 'children_places', 'cinemas_theatres',
       'schools', 'pre-schools', 'hospitals', 'libraries_theatres',
       'park_gardens', 'sport_centers', 'population', 'net_density(hab/ha)',
       'avg_occupation', 'dist', 'num_immi', 'mort_rate', 'rent_price',
       'num_crimes', 'children_places_pop', 'cinemas_theatres_pop',
       'schools_pop', 'pre-schools_pop', 'hospitals_pop',
       'libraries_theatres_pop', 'park_gardens_pop', 'sport_centers_pop',
       'facilities_pop', 'perc_immi'],
      dtype='object')

In [85]:
df_facilities = df[["nbh","dist",'children_places_pop', 'cinemas_theatres_pop',
       'schools_pop', 'pre-schools_pop', 'hospitals_pop',
       'libraries_theatres_pop', 'park_gardens_pop', 'sport_centers_pop',
       'facilities_pop']]
# Variance of the numeric facility columns only (nbh and dist are text)
df_facilities.var(numeric_only=True)

children_places_pop       0.176853
cinemas_theatres_pop      0.012635
schools_pop               0.077465
pre-schools_pop           0.347296
hospitals_pop             0.019008
libraries_theatres_pop    0.115985
park_gardens_pop          0.207063
sport_centers_pop         0.008830
facilities_pop            0.008498
dtype: float64

In [91]:
# Summing the per-neighbourhood facility rates (per 10,000 people) within each district
df_facilities = df_facilities.groupby("dist")[['children_places_pop', 'cinemas_theatres_pop',
       'schools_pop', 'pre-schools_pop', 'hospitals_pop',
       'libraries_theatres_pop', 'park_gardens_pop', 'sport_centers_pop',
       'facilities_pop']].sum()
df_facilities

Unnamed: 0_level_0,children_places_pop,cinemas_theatres_pop,schools_pop,pre-schools_pop,hospitals_pop,libraries_theatres_pop,park_gardens_pop,sport_centers_pop,facilities_pop
dist,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Ciutat Vella,0.841614,1.261204,1.271879,2.85951,0.369339,3.220423,0.51395,0.410593,0.986125
Eixample,1.051116,0.389449,1.443506,1.646192,0.273516,0.76386,1.032618,0.111092,0.47351
Gràcia,1.434094,0.384426,1.874243,1.739069,0.870079,0.406422,0.603838,0.552885,0.246473
Horta-Guinardó,5.89402,0.112296,5.341223,6.201795,1.053489,1.847809,6.644612,1.096897,0.168445
Les Corts,1.132781,0.230521,1.432938,6.099632,0.212211,1.434424,1.465427,0.211506,0.402687
Nou Barris,6.890549,0.274415,4.831139,4.631401,0.840238,0.795183,0.771735,0.922278,0.0
Sant Andreu,3.906377,0.192663,3.679676,2.772317,0.335993,0.53416,1.165218,0.266634,0.243051
Sant Martí,4.204484,0.89717,3.780758,4.595351,0.507897,0.955476,1.714455,0.537414,0.839048
Sants-Montjuïc,3.8501,0.480488,4.042112,4.143657,0.3564,2.029622,1.610135,0.271946,0.345214
Sarrià-Sant Gervasi,2.512037,0.666398,3.199712,3.814591,1.397647,1.06595,1.959767,0.406711,1.119175
