# ANALYSIS

## Question to be answered

We want to answer the question: What does the immigrant population look like in each district of Barcelona?

To answer it, we will focus on:
- Immigrants in Barcelona by Age and District
- Immigrants in Barcelona by Gender and District
- Immigrants in Barcelona by Nationality and District

After that, we will compare this data with the population of Barcelona by district.

## Analysis 1: immigrants by nationality

In [1]:
# First thing, we import pandas and numpy:
import pandas as pd
import numpy as np

In [2]:
# Now we import our csv to start analysing it:
n = pd.read_csv('/Users/andressalomferrer/Desktop/ironhack/Projects/Project-Week-2-Barcelona/datasets/nationality_cleaned.csv')

In [3]:
# We check 'n.dtypes' to understand what kind of data we are working with
n.dtypes

Unnamed: 0        int64
Year              int64
District Name    object
Nationality      object
Number            int64
dtype: object

In [4]:
#Deleting first column and Year because we do not need them.
n.drop(['Unnamed: 0', 'Year'], axis=1, inplace=True)
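
The `Unnamed: 0` column usually appears when a DataFrame's index was written to the CSV by an earlier `to_csv()` call. As a minimal sketch, assuming that is what happened here, the column can be avoided at load time with `index_col=0`. The in-memory CSV, the names `sample_csv` and `sample`, and the `Year` value are hypothetical stand-ins for the real file.

```python
import io

import pandas as pd

# Hypothetical sample mimicking the assumed layout of nationality_cleaned.csv:
# the first, unnamed column is an index written out by an earlier to_csv().
# The Year value is illustrative only.
sample_csv = io.StringIO(
    ",Year,District Name,Nationality,Number\n"
    "0,2017,Ciutat Vella,Spain,1109\n"
    "1,2017,Ciutat Vella,Spain,482\n"
    "2,2017,Eixample,Spain,663\n"
)

# index_col=0 turns the unnamed column back into the index, so no
# 'Unnamed: 0' column is created. Year is dropped without inplace=True,
# which keeps the whole load step in a single expression.
sample = pd.read_csv(sample_csv, index_col=0).drop(columns="Year")

print(sample.columns.tolist())
```

Returning a new DataFrame instead of using `inplace=True` also makes the cell safe to re-run, since a second `drop` on already-removed columns would raise a `KeyError`.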

In [5]:
n.head()

Unnamed: 0,District Name,Nationality,Number
0,Ciutat Vella,Spain,1109
1,Ciutat Vella,Spain,482
2,Ciutat Vella,Spain,414
3,Ciutat Vella,Spain,537
4,Eixample,Spain,663


In [6]:
#Now we want to group by District Name and Nationality
n = n.groupby(['District Name', 'Nationality']).sum()

In [7]:
#The next step is to delete all the rows where the number of immigrants is zero
n = n.loc[n['Number']!=0]

In [8]:
#Sorting the data within each district, from the highest to the lowest number of immigrants
n = n.groupby(level=0, group_keys=False).apply(lambda x: x.sort_values(['Number'], ascending=False))
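
The per-group sort above can also be written as a single `sort_values` call, because `sort_values` accepts index level names alongside column labels. The sketch below is one possible alternative, not the notebook's method. It uses a small hypothetical frame named `demo`, built from four values taken from the top-9 output of this notebook.

```python
import pandas as pd

# Four rows taken from the top-9 output, deliberately shuffled.
rows = [
    ("Eixample", "Spain", 6560),
    ("Ciutat Vella", "Pakistan", 998),
    ("Ciutat Vella", "Spain", 2542),
    ("Ciutat Vella", "Italy", 1275),
]
demo = pd.DataFrame(rows, columns=["District Name", "Nationality", "Number"])
demo = demo.set_index(["District Name", "Nationality"])

# Sort by district (ascending, an index level) and then by Number
# (descending, a column) in one pass, without groupby/apply.
demo_sorted = demo.sort_values(
    ["District Name", "Number"], ascending=[True, False]
)

print(demo_sorted)
```

This avoids calling a Python lambda once per district, which tends to matter only on larger datasets.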

In [9]:
#Getting the total number of immigrants by district
n_total = n.groupby(['District Name']).sum()

In [10]:
n_total

Unnamed: 0_level_0,Number
District Name,Unnamed: 1_level_1
Ciutat Vella,12611
Eixample,19047
Gràcia,7254
Horta-Guinardó,7799
Les Corts,4375
No consta,2
Nou Barris,8274
Sant Andreu,6335
Sant Martí,12720
Sants-Montjuïc,11683


In [11]:
#Getting the top 9 nationalities by district
n_l = n.groupby(level=0, group_keys=False)['Number'].apply(lambda x: x.nlargest(9))
pd.DataFrame(n_l, index=None)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Number
District Name,District Name,Nationality,Unnamed: 3_level_1
Ciutat Vella,Ciutat Vella,Spain,2542
Ciutat Vella,Ciutat Vella,Italy,1275
Ciutat Vella,Ciutat Vella,Pakistan,998
Ciutat Vella,Ciutat Vella,France,596
Ciutat Vella,Ciutat Vella,Bangladesh,566
Ciutat Vella,Ciutat Vella,Morocco,434
Ciutat Vella,Ciutat Vella,United Kingdom,393
Ciutat Vella,Ciutat Vella,Philippines,368
Ciutat Vella,Ciutat Vella,India,364
Eixample,Eixample,Spain,6560
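
In the output above, `District Name` appears twice as an index level, which seems to be a side effect of `apply` combined with `nlargest` on a grouped Series. A minimal sketch of an alternative that keeps the original two-level index is to sort first and then take `groupby(...).head(k)`. The frame `demo` and the choice of `k = 2` are illustrative; the values come from the output above.

```python
import pandas as pd

# Four rows taken from the top-9 output above.
rows = [
    ("Eixample", "Spain", 6560),
    ("Ciutat Vella", "Pakistan", 998),
    ("Ciutat Vella", "Spain", 2542),
    ("Ciutat Vella", "Italy", 1275),
]
demo = pd.DataFrame(rows, columns=["District Name", "Nationality", "Number"])
demo = demo.set_index(["District Name", "Nationality"])

# Sort so each district's rows run from largest to smallest, then keep
# the first k rows of every district. head() preserves the row order and
# does not add an extra index level.
k = 2  # the notebook uses 9; 2 keeps this example small
top2 = (
    demo.sort_values(["District Name", "Number"], ascending=[True, False])
    .groupby(level="District Name")
    .head(k)
)

print(top2)
```

Districts with fewer than `k` nationalities simply return all of their rows.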


In [12]:
#Getting the total number of immigrants from the top 9 nationalities by district
sum_top9= n_l.groupby(['District Name']).sum()

In [13]:
top9=pd.DataFrame(sum_top9)

In [14]:
top9

Unnamed: 0_level_0,Number
District Name,Unnamed: 1_level_1
Ciutat Vella,7536
Eixample,12690
Gràcia,4819
Horta-Guinardó,5389
Les Corts,2992
No consta,2
Nou Barris,6018
Sant Andreu,4534
Sant Martí,8332
Sants-Montjuïc,7596


In [15]:
# Getting the total number of immigrants by district, excluding the 9 largest nationalities. This relies on n already being sorted in descending order within each district (In [8]), so iloc[9:] skips exactly the top 9.
n_others = n.groupby('District Name', group_keys=False).apply(lambda x: x.iloc[9:].sum())

In [16]:
n_others

Unnamed: 0_level_0,Number
District Name,Unnamed: 1_level_1
Ciutat Vella,5075.0
Eixample,6357.0
Gràcia,2435.0
Horta-Guinardó,2410.0
Les Corts,1383.0
No consta,0.0
Nou Barris,2256.0
Sant Andreu,1801.0
Sant Martí,4388.0
Sants-Montjuïc,4087.0
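
As a sanity check, the top-9 totals and the "others" totals should add up to the district totals. The sketch below re-enters the figures from the three outputs above for the districts shown and verifies that identity. The frame name `check` and the column `top9_share_pct` are illustrative names, not part of the original notebook.

```python
import pandas as pd

districts = [
    "Ciutat Vella", "Eixample", "Gràcia", "Horta-Guinardó", "Les Corts",
    "No consta", "Nou Barris", "Sant Andreu", "Sant Martí", "Sants-Montjuïc",
]

# Figures copied from the n_total, top9 and n_others outputs.
check = pd.DataFrame(
    {
        "total": [12611, 19047, 7254, 7799, 4375, 2, 8274, 6335, 12720, 11683],
        "top9": [7536, 12690, 4819, 5389, 2992, 2, 6018, 4534, 8332, 7596],
        "others": [5075, 6357, 2435, 2410, 1383, 0, 2256, 1801, 4388, 4087],
    },
    index=pd.Index(districts, name="District Name"),
)

# Every district should satisfy top9 + others == total.
assert (check["top9"] + check["others"] == check["total"]).all()

# Share of each district's immigrants covered by its top 9 nationalities.
check["top9_share_pct"] = (100 * check["top9"] / check["total"]).round(1)

print(check)
```

The identity holds for every district listed, which suggests the `iloc[9:]` slice and the `nlargest(9)` selection are splitting the data consistently.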


## Analysis 2: immigrants by gender

In [17]:
# Importing our csv to start analysing it:

In [18]:
g = pd.read_csv('/Users/andressalomferrer/Desktop/ironhack/Projects/Project-Week-2-Barcelona/datasets/gender_cleaned.csv')

In [19]:
g.head()

Unnamed: 0.1,Unnamed: 0,Year,District Name,Gender,Immigrants
0,0,2017,Ciutat Vella,Male,3063
1,1,2017,Ciutat Vella,Male,1499
2,2,2017,Ciutat Vella,Male,910
3,3,2017,Ciutat Vella,Male,1438
4,4,2017,Eixample,Male,1082


In [20]:
# We check 'g.dtypes' to understand what kind of data we are working with
g.dtypes

Unnamed: 0        int64
Year              int64
District Name    object
Gender           object
Immigrants        int64
dtype: object

In [21]:
#We do not need the 'Unnamed: 0' and 'Year' columns, so let's drop them!

g.drop(['Unnamed: 0','Year'], axis=1, inplace = True)

In [63]:
g.head()

Unnamed: 0,District Name,Gender,Immigrants
0,Ciutat Vella,Male,3063
1,Ciutat Vella,Male,1499
2,Ciutat Vella,Male,910
3,Ciutat Vella,Male,1438
4,Eixample,Male,1082


In [26]:
#Now we need to group our data by district and gender, to see how many females and how many males each district has.

bydis = g.groupby(['District Name','Gender']).sum()

In [88]:
bydis

Unnamed: 0_level_0,Unnamed: 1_level_0,Immigrants
District Name,Gender,Unnamed: 2_level_1
Ciutat Vella,Female,5701
Ciutat Vella,Male,6910
Eixample,Female,9856
Eixample,Male,9191
Gràcia,Female,3895
Gràcia,Male,3359
Horta-Guinardó,Female,4054
Horta-Guinardó,Male,3745
Les Corts,Female,2346
Les Corts,Male,2029
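
The grouped result is easier to compare across districts when each gender has its own column. A minimal sketch, assuming the five districts shown above: `unstack` pivots the `Gender` level into columns, after which totals and shares are simple column arithmetic. The names `bydis_demo`, `wide`, `Total` and `Female %` are illustrative.

```python
import pandas as pd

# Figures copied from the bydis output above (first five districts).
index = pd.MultiIndex.from_product(
    [
        ["Ciutat Vella", "Eixample", "Gràcia", "Horta-Guinardó", "Les Corts"],
        ["Female", "Male"],
    ],
    names=["District Name", "Gender"],
)
bydis_demo = pd.DataFrame(
    {"Immigrants": [5701, 6910, 9856, 9191, 3895, 3359, 4054, 3745, 2346, 2029]},
    index=index,
)

# Move the Gender level from the row index into columns.
wide = bydis_demo["Immigrants"].unstack("Gender")

# Derived columns: district total and the female share in percent.
wide["Total"] = wide["Female"] + wide["Male"]
wide["Female %"] = (100 * wide["Female"] / wide["Total"]).round(1)

print(wide)
```

Among the districts shown, Ciutat Vella is the only one where male immigrants outnumber female immigrants.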


In [131]:
#Total number of immigrants by district (both genders combined)
both = g.groupby(['District Name']).sum()

In [132]:
both

Unnamed: 0_level_0,Immigrants
District Name,Unnamed: 1_level_1
Ciutat Vella,12611
Eixample,19047
Gràcia,7254
Horta-Guinardó,7799
Les Corts,4375
Nou Barris,8274
Sant Andreu,6335
Sant Martí,12720
Sants-Montjuïc,11683
Sarrià-Sant Gervasi,7227
