# ANALYSIS

## Question to be answered

We want to answer the question: How are the immigrants in Barcelona in each District?

To answer that we will focus on:
- Immigrants in Barcelona by Age and District (2017)
- Immigrants in Barcelona by Gender and District (2017)
- Immigrants in Barcelona by Nationality and District (2017)

And after that, we will compare this data with the population of Barcelona by District (2017)

## First approach: Immigrants in Barcelona by Age and District

In [3]:
import pandas as pd
import numpy as np 

In [6]:
i_by_age = pd.read_csv('datasets/3.-Population/immigrants-by-age_cleaned.csv')

In [7]:
i_by_age

Unnamed: 0.1,Unnamed: 0,Year,District Name,Age,Immigrants
0,0,2017,Ciutat Vella,0-4,154
1,1,2017,Ciutat Vella,0-4,58
2,2,2017,Ciutat Vella,0-4,38
3,3,2017,Ciutat Vella,0-4,56
4,4,2017,Eixample,0-4,79
...,...,...,...,...,...
1549,1549,2017,Sant Martí,>=100,0
1550,1550,2017,Sant Martí,>=100,0
1551,1551,2017,Sant Martí,>=100,0
1552,1552,2017,Sant Martí,>=100,0


In [9]:
# as for the analysis we don't need the column year neither the index, we will drop them. 

i_by_age.drop(['Unnamed: 0', 'Year'], axis=1, inplace=True)

In [25]:
# We are changing the name of the columns to avoid white spaces
i_by_age.columns = ['District', 'Age', 'Num_Immigrants']

In [26]:
# We are checking the unique values for the column 'District' to be sure there are no surprises.
i_by_age.District.unique()

array(['Ciutat Vella', 'Eixample', 'Sants-Montjuïc', 'Les Corts',
       'Sarrià-Sant Gervasi', 'Gràcia', 'Horta-Guinardó', 'Nou Barris',
       'Sant Andreu', 'Sant Martí', 'No consta'], dtype=object)

In [28]:
# as we have a 'District' named 'No consta', and it is not a District at all, we check the values for it 'District' to determine if its relevant or not. 

i_by_age.groupby('District').sum()

Unnamed: 0_level_0,Num_Immigrants
District,Unnamed: 1_level_1
Ciutat Vella,12611
Eixample,19047
Gràcia,7254
Horta-Guinardó,7799
Les Corts,4375
No consta,2
Nou Barris,8274
Sant Andreu,6335
Sant Martí,12720
Sants-Montjuïc,11683


In [49]:
# as there are just 2 values in 'No consta' district, we can avoid them.

i_by_age.drop(i_by_age.loc[i_by_age['District'] == 'No consta'].index, inplace=True)

In [50]:
i_by_age

Unnamed: 0,District,Age,Num_Immigrants
0,Ciutat Vella,0-4,154
1,Ciutat Vella,0-4,58
2,Ciutat Vella,0-4,38
3,Ciutat Vella,0-4,56
4,Eixample,0-4,79
...,...,...,...
1548,Sant Martí,>=100,0
1549,Sant Martí,>=100,0
1550,Sant Martí,>=100,0
1551,Sant Martí,>=100,0


In [57]:
# As our dataset was initially considering neighborhoods, we have the data splitted in several rows for each district, so we need to group by District and Age.
i_by_age = i_by_age.groupby(['District', 'Age']).sum().reset_index()

In [58]:
i_by_age

Unnamed: 0,District,Age,Num_Immigrants
0,Ciutat Vella,0-4,306
1,Ciutat Vella,10-14,190
2,Ciutat Vella,15-19,601
3,Ciutat Vella,20-24,2157
4,Ciutat Vella,25-29,3433
...,...,...,...
205,Sarrià-Sant Gervasi,80-84,65
206,Sarrià-Sant Gervasi,85-89,49
207,Sarrià-Sant Gervasi,90-94,29
208,Sarrià-Sant Gervasi,95-99,6


In [None]:
i_by_age

In [59]:
# Now we will check the diferents ranges for 'Age'
i_by_age['Age'].unique()

array(['0-4', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39',
       '40-44', '45-49', '5-9', '50-54', '55-59', '60-64', '65-69',
       '70-74', '75-79', '80-84', '85-89', '90-94', '95-99', '>=100'],
      dtype=object)

In [81]:
# to get the data more structured, we will join rows by age considering the following groups: 0-19, 20-39, 40-64 and >= 64
group0_19 = i_by_age.loc[(i_by_age.Age.isin(['0-4','5-9','10-14','15-19'])),]
group20_39 = i_by_age.loc[(i_by_age.Age.isin(['20-24','25-29','30-34','35-39'])),]
group40_64 = i_by_age.loc[(i_by_age.Age.isin(['40-44','45-49','50-54','55-59', '60-64'])),]
group_65over = i_by_age.loc[(i_by_age.Age.isin(['65-69','70-79', '80-84', '85-89', '90-94', '95-99', '>=100'])),]

In [96]:
# we use apply to change all the values of this subdataframe to 0-19, as we want to group it. 
group0_19['Age'] = group0_19['Age'].apply(lambda x: '0-19')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


In [99]:
# After changing the values, we can group by district and age, summing up the number of immigrants. 
group0_19 = group0_19.groupby(['District', 'Age']).sum()

In [100]:
group0_19

Unnamed: 0_level_0,Unnamed: 1_level_0,Num_Immigrants
District,Age,Unnamed: 2_level_1
Ciutat Vella,0-19,1341
Eixample,0-19,2037
Gràcia,0-19,836
Horta-Guinardó,0-19,1168
Les Corts,0-19,767
Nou Barris,0-19,1589
Sant Andreu,0-19,989
Sant Martí,0-19,1918
Sants-Montjuïc,0-19,1592
Sarrià-Sant Gervasi,0-19,1490


In [None]:
# as we need to repeat this operation some more times, we define a function to get these to operations in just one command.

In [101]:
def change_age(df, range):
    """
    Input: DataFrame, String
    Output: DataFrame
    
    # Change the Age field of the given DataFrame and return a grouped by and summed dataframe.
    """
    df['Age'] = df.Age.apply(lambda x : range)
    df = df.groupby(['District', 'Age']).sum()

    return df

In [102]:
# and we apply this function to all other subdataframes.
group20_39 = change_age(group20_39, '20-39')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


In [103]:
group20_39

Unnamed: 0_level_0,Unnamed: 1_level_0,Num_Immigrants
District,Age,Unnamed: 2_level_1
Ciutat Vella,20-39,9113
Eixample,20-39,12829
Gràcia,20-39,4938
Horta-Guinardó,20-39,4619
Les Corts,20-39,2413
Nou Barris,20-39,4335
Sant Andreu,20-39,3488
Sant Martí,20-39,7496
Sants-Montjuïc,20-39,7398
Sarrià-Sant Gervasi,20-39,3467


In [104]:
group40_64 = change_age(group40_64, '40-64')
group_65over = change_age(group_65over, '>= 65')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


In [110]:
# we create a new dataframe concatenating all the groups of age
i_by_agegroups = pd.concat([group0_19, group20_39, group40_64, group_65over])

In [111]:
# and we order it by District and Age to get the format we want.
i_by_agegroups = i_by_agegroups.groupby(['District', 'Age']).sum()

In [112]:
i_by_agegroups

Unnamed: 0_level_0,Unnamed: 1_level_0,Num_Immigrants
District,Age,Unnamed: 2_level_1
Ciutat Vella,0-19,1341
Ciutat Vella,20-39,9113
Ciutat Vella,40-64,1955
Ciutat Vella,>= 65,133
Eixample,0-19,2037
Eixample,20-39,12829
Eixample,40-64,3343
Eixample,>= 65,508
Gràcia,0-19,836
Gràcia,20-39,4938


## Second approach: Population in Barcelona by District, Age and Gender






In [113]:
population = pd.read_csv('datasets/3.-Population/population_cleaned.csv')

In [114]:
population

Unnamed: 0.1,Unnamed: 0,Year,District.Name,Gender,Age,Number
0,0,2017,Ciutat Vella,Male,0-4,224
1,1,2017,Ciutat Vella,Male,0-4,50
2,2,2017,Ciutat Vella,Male,0-4,43
3,3,2017,Ciutat Vella,Male,0-4,95
4,4,2017,Eixample,Male,0-4,124
...,...,...,...,...,...,...
14011,14011,2017,Sant Martí,Female,>=95,11
14012,14012,2017,Sant Martí,Female,>=95,41
14013,14013,2017,Sant Martí,Female,>=95,28
14014,14014,2017,Sant Martí,Female,>=95,57


In [115]:
# as for the analysis we don't need the column year neither the index, we will drop them. 

population.drop(['Unnamed: 0', 'Year'], axis=1, inplace=True)

In [119]:
# We are changing the name of the columns to avoid white spaces
population = population.rename(columns = {'District.Name':'District', 'Number':'Num_Immigrants'})

In [120]:
population

Unnamed: 0,District,Gender,Age,Num_Immigrants
0,Ciutat Vella,Male,0-4,224
1,Ciutat Vella,Male,0-4,50
2,Ciutat Vella,Male,0-4,43
3,Ciutat Vella,Male,0-4,95
4,Eixample,Male,0-4,124
...,...,...,...,...
14011,Sant Martí,Female,>=95,11
14012,Sant Martí,Female,>=95,41
14013,Sant Martí,Female,>=95,28
14014,Sant Martí,Female,>=95,57
