# ANALYSIS

## Question to be answered

We want to answer the question: what does the immigrant population look like in each district of Barcelona?

To answer that we will focus on:
- Immigrants in Barcelona by Age and District
- Immigrants in Barcelona by Gender and District
- Immigrants in Barcelona by Nationality and District

After that, we will compare these figures with the population of Barcelona by district.

## Analysis 1: immigrants by nationality

In [1]:
# First thing, we import pandas and numpy:
import pandas as pd
import numpy as np

In [2]:
# Now we import our csv to start analysing it:
n = pd.read_csv('datasets/Cleaned Datasets/nationality_cleaned.csv')

In [3]:
# Check 'n.dtypes' to see what kind of data we are working with
n.dtypes

Unnamed: 0        int64
Year              int64
District Name    object
Nationality      object
Number            int64
dtype: object

In [4]:
#Deleting first column and Year because we do not need them.
n.drop(['Unnamed: 0', 'Year'], axis=1, inplace=True)

In [5]:
n.head()

Unnamed: 0,District Name,Nationality,Number
0,Ciutat Vella,Spain,1109
1,Ciutat Vella,Spain,482
2,Ciutat Vella,Spain,414
3,Ciutat Vella,Spain,537
4,Eixample,Spain,663


In [6]:
#Now we want to group by District Name and Nationality
n = n.groupby(['District Name', 'Nationality']).sum()

In [7]:
# Next, drop the rows where the number of immigrants is zero
n = n.loc[n['Number']!=0]

In [8]:
# Sort within each district so the nationalities with the most immigrants come first
n = n.groupby(level=0, group_keys=False).apply(lambda x: x.sort_values(['Number'], ascending=False))
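
The per-district sort can also be written without `apply`. The sketch below assumes a frame indexed by `District Name` and `Nationality`, like `n` after In [6]. The names `demo` and `demo_sorted` are illustrative only, and the counts are copied from this notebook's own outputs and shuffled on purpose.

```python
import pandas as pd

# Counts taken from the notebook's outputs, deliberately out of order.
demo = pd.DataFrame(
    {
        "District Name": ["Eixample", "Ciutat Vella", "Eixample",
                          "Ciutat Vella", "Ciutat Vella", "Eixample"],
        "Nationality": ["China", "Italy", "Spain", "Pakistan", "Spain", "Italy"],
        "Number": [918, 1275, 6560, 998, 2542, 1568],
    }
).set_index(["District Name", "Nationality"])

# sort_values accepts index level names alongside column labels:
# district ascending, then Number descending inside each district.
demo_sorted = demo.sort_values(["District Name", "Number"], ascending=[True, False])
print(demo_sorted)
```

This keeps the two-level index intact and should give the same ordering as the `groupby(...).apply(...)` version.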

In [9]:
# Get the total number of immigrants by district
n_total = n.groupby(['District Name']).sum()

In [10]:
n_total

Unnamed: 0_level_0,Number
District Name,Unnamed: 1_level_1
Ciutat Vella,12611
Eixample,19047
Gràcia,7254
Horta-Guinardó,7799
Les Corts,4375
No consta,2
Nou Barris,8274
Sant Andreu,6335
Sant Martí,12720
Sants-Montjuïc,11683


In [11]:
#Getting the top 9 nationalities by district
n_l = n.groupby(level=0, group_keys=False)['Number'].apply(lambda x: x.nlargest(9))
pd.DataFrame(n_l, index=None)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Number
District Name,District Name,Nationality,Unnamed: 3_level_1
Ciutat Vella,Ciutat Vella,Spain,2542
Ciutat Vella,Ciutat Vella,Italy,1275
Ciutat Vella,Ciutat Vella,Pakistan,998
Ciutat Vella,Ciutat Vella,France,596
Ciutat Vella,Ciutat Vella,Bangladesh,566
...,...,...,...
Sarrià-Sant Gervasi,Sarrià-Sant Gervasi,Venezuela,215
Sarrià-Sant Gervasi,Sarrià-Sant Gervasi,China,163
Sarrià-Sant Gervasi,Sarrià-Sant Gervasi,Colombia,128
Sarrià-Sant Gervasi,Sarrià-Sant Gervasi,United Kingdom,126
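
The output above carries the `District Name` level twice. One way to get the top rows per district without that duplication is to sort first and then take `groupby(...).head(k)`. This is a minimal sketch, not the notebook's method: `counts`, `ranked` and `top2` are illustrative names, the figures come from this notebook's outputs, and `k = 2` is used only to keep the example short.

```python
import pandas as pd

counts = pd.DataFrame(
    {
        "District Name": ["Ciutat Vella"] * 3 + ["Eixample"] * 3,
        "Nationality": ["Spain", "Italy", "Pakistan", "Spain", "Italy", "China"],
        "Number": [2542, 1275, 998, 6560, 1568, 918],
    }
).set_index(["District Name", "Nationality"])

# Order each district's rows from largest to smallest count.
ranked = counts.sort_values(["District Name", "Number"], ascending=[True, False])

# head(k) keeps the first k rows of every group and preserves the original index.
top2 = ranked.groupby(level="District Name").head(2)
print(top2)
```

Replacing `2` with `9` would mirror the top 9 selection used in this analysis.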


In [12]:
# Get the combined total of the top 9 nationalities in each district
sum_top9= n_l.groupby(['District Name']).sum()

In [13]:
top9=pd.DataFrame(sum_top9)

In [14]:
top9

Unnamed: 0_level_0,Number
District Name,Unnamed: 1_level_1
Ciutat Vella,7536
Eixample,12690
Gràcia,4819
Horta-Guinardó,5389
Les Corts,2992
No consta,2
Nou Barris,6018
Sant Andreu,4534
Sant Martí,8332
Sants-Montjuïc,7596


In [15]:
# Total number of immigrants per district excluding the 9 largest nationalities.
# This relies on the rows already being sorted by Number within each district (In [8]).
n_others = n.groupby('District Name', group_keys=False).apply(lambda x: x.iloc[9:].sum())

In [16]:
n_others

Unnamed: 0_level_0,Number
District Name,Unnamed: 1_level_1
Ciutat Vella,5075
Eixample,6357
Gràcia,2435
Horta-Guinardó,2410
Les Corts,1383
No consta,0
Nou Barris,2256
Sant Andreu,1801
Sant Martí,4388
Sants-Montjuïc,4087
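
As a cross-check, the "others" figure for each district should equal the district total minus the top 9 total. The sketch below redoes that subtraction with the rows displayed in the outputs above; `total`, `top9_sum` and `others` are illustrative names.

```python
import pandas as pd

districts = ["Ciutat Vella", "Eixample", "Gràcia", "Horta-Guinardó", "Les Corts",
             "No consta", "Nou Barris", "Sant Andreu", "Sant Martí", "Sants-Montjuïc"]

# Values copied from the n_total and top9 outputs shown above.
total = pd.Series([12611, 19047, 7254, 7799, 4375, 2, 8274, 6335, 12720, 11683],
                  index=districts, name="Number")
top9_sum = pd.Series([7536, 12690, 4819, 5389, 2992, 2, 6018, 4534, 8332, 7596],
                     index=districts, name="Number")

# Index-aligned subtraction: everything outside the top 9 nationalities.
others = total - top9_sum
print(others)
```

The result matches the `n_others` values for these districts, which suggests the `iloc[9:]` slice picked up the intended rows.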


## Analysis 2: immigrants by gender

In [17]:
# Importing our csv to start analysing it:

In [18]:
g = pd.read_csv('datasets/Cleaned Datasets/gender_cleaned.csv')

In [19]:
g.head()

Unnamed: 0.1,Unnamed: 0,Year,District Name,Gender,Immigrants
0,0,2017,Ciutat Vella,Male,3063
1,1,2017,Ciutat Vella,Male,1499
2,2,2017,Ciutat Vella,Male,910
3,3,2017,Ciutat Vella,Male,1438
4,4,2017,Eixample,Male,1082


In [20]:
# Check 'g.dtypes' to see what kind of data we are working with
g.dtypes

Unnamed: 0        int64
Year              int64
District Name    object
Gender           object
Immigrants        int64
dtype: object

In [21]:
# We do not need the 'Unnamed: 0' and 'Year' columns, so let's drop them!

g.drop(['Unnamed: 0','Year'], axis=1, inplace = True)

In [22]:
g.head()

Unnamed: 0,District Name,Gender,Immigrants
0,Ciutat Vella,Male,3063
1,Ciutat Vella,Male,1499
2,Ciutat Vella,Male,910
3,Ciutat Vella,Male,1438
4,Eixample,Male,1082
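
The `Unnamed: 0` column typically appears when a CSV was saved with its index and is then read back without saying so. A possible shortcut, assuming the cleaned files have that layout (an unnamed first column holding the old index), is to pass `index_col=0` when loading. The sketch uses an in-memory CSV built from the five rows shown above; `csv_text` and `g_demo` are illustrative names.

```python
import io

import pandas as pd

# In-memory stand-in for the cleaned file; rows copied from g.head() above.
csv_text = """,Year,District Name,Gender,Immigrants
0,2017,Ciutat Vella,Male,3063
1,2017,Ciutat Vella,Male,1499
2,2017,Ciutat Vella,Male,910
3,2017,Ciutat Vella,Male,1438
4,2017,Eixample,Male,1082
"""

# index_col=0 turns the unnamed first column into the index,
# so only 'Year' is left to drop.
g_demo = pd.read_csv(io.StringIO(csv_text), index_col=0).drop(columns="Year")
print(g_demo)
```

Returning a new frame from `drop(columns=...)` also avoids `inplace=True`, which makes the cell safe to re-run.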


In [23]:
# Now we group our data by district and gender to see how many female and male immigrants each district has.

bydis = g.groupby(['District Name','Gender']).sum()

In [24]:
bydis

Unnamed: 0_level_0,Unnamed: 1_level_0,Immigrants
District Name,Gender,Unnamed: 2_level_1
Ciutat Vella,Female,5701
Ciutat Vella,Male,6910
Eixample,Female,9856
Eixample,Male,9191
Gràcia,Female,3895
Gràcia,Male,3359
Horta-Guinardó,Female,4054
Horta-Guinardó,Male,3745
Les Corts,Female,2346
Les Corts,Male,2029


In [25]:
# Total number of immigrants by district
both= g.groupby(['District Name']).sum()

In [26]:
both

Unnamed: 0_level_0,Immigrants
District Name,Unnamed: 1_level_1
Ciutat Vella,12611
Eixample,19047
Gràcia,7254
Horta-Guinardó,7799
Les Corts,4375
Nou Barris,8274
Sant Andreu,6335
Sant Martí,12720
Sants-Montjuïc,11683
Sarrià-Sant Gervasi,7227
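
Since the nationality and gender files describe the same immigrants, their district totals should agree. The sketch below compares the two using only the rows displayed in the `n_total` and `both` outputs; `nat_totals`, `gen_totals`, `shared` and `agree` are illustrative names.

```python
import pandas as pd

# Rows as displayed in the n_total output (nationality dataset).
nat_totals = pd.Series({
    "Ciutat Vella": 12611, "Eixample": 19047, "Gràcia": 7254,
    "Horta-Guinardó": 7799, "Les Corts": 4375, "No consta": 2,
    "Nou Barris": 8274, "Sant Andreu": 6335, "Sant Martí": 12720,
    "Sants-Montjuïc": 11683,
})

# Rows as displayed in the `both` output (gender dataset).
gen_totals = pd.Series({
    "Ciutat Vella": 12611, "Eixample": 19047, "Gràcia": 7254,
    "Horta-Guinardó": 7799, "Les Corts": 4375, "Nou Barris": 8274,
    "Sant Andreu": 6335, "Sant Martí": 12720, "Sants-Montjuïc": 11683,
    "Sarrià-Sant Gervasi": 7227,
})

# Compare only the districts present in both listings.
shared = nat_totals.index.intersection(gen_totals.index)
agree = nat_totals[shared].eq(gen_totals[shared])
print(agree.all())
```

On the full frames the same check could be run directly on `n_total['Number']` and `both['Immigrants']`.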


In [54]:
def percentage(district):
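  # Select the scalar count explicitly: the row is (district, gender), the column is 'Immigrants'
  males = int(bydis.loc[(district, 'Male'), 'Immigrants'])
  females = int(bydis.loc[(district, 'Female'), 'Immigrants'])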
  m_perc = round(100 * (males / (males + females)),1)
  f_perc = round(100 * (females / (males + females)),1)
  return f'The immigrants in {district} are {m_perc}% males and {f_perc}% females'

In [55]:
# Print the gender split for every district in the gender dataset
for district in both.index:
  print(percentage(district))

The immigrants in Ciutat Vella are 54.8% males and 45.2% females
The immigrants in Eixample are 48.3% males and 51.7% females
The immigrants in Gràcia are 46.3% males and 53.7% females
The immigrants in Horta-Guinardó are 48.0% males and 52.0% females
The immigrants in Les Corts are 46.4% males and 53.6% females
The immigrants in Nou Barris are 46.8% males and 53.2% females
The immigrants in Sant Andreu are 49.0% males and 51.0% females
The immigrants in Sant Martí are 50.4% males and 49.6% females
The immigrants in Sants-Montjuïc are 50.1% males and 49.9% females
The immigrants in Sarrià-Sant Gervasi are 46.9% males and 53.1% females
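
The same percentages can be computed for all districts at once by pivoting the gender level into columns. This is a sketch under the assumption that the grouped frame looks like `bydis` above; `bydis_demo`, `wide` and `shares` are illustrative names, and the counts are the first three districts from the `bydis` output.

```python
import pandas as pd

bydis_demo = pd.DataFrame(
    {
        "District Name": ["Ciutat Vella", "Ciutat Vella", "Eixample",
                          "Eixample", "Gràcia", "Gràcia"],
        "Gender": ["Female", "Male"] * 3,
        "Immigrants": [5701, 6910, 9856, 9191, 3895, 3359],
    }
).set_index(["District Name", "Gender"])

# One row per district, one column per gender.
wide = bydis_demo["Immigrants"].unstack("Gender")

# Divide each row by its own total and express the result as a percentage.
shares = wide.div(wide.sum(axis=1), axis=0).mul(100).round(1)
print(shares)
```

The resulting table holds the same numbers as the printed sentences, in a form that is easier to sort or plot.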


In [67]:
# Top 10 nationalities across all districts (back to the nationality data in n)
top10 = n.groupby('Nationality').sum().apply(lambda x: x.nlargest(10))


In [68]:
top10

Unnamed: 0_level_0,Number
Nationality,Unnamed: 1_level_1
Spain,35354
Italy,6309
China,3299
Colombia,3255
Venezuela,3021
Pakistan,2967
Honduras,2767
France,2670
Peru,2473
Morocco,1931
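
A city-wide ranking like this can also be obtained by summing the `Number` column per nationality and calling `nlargest` on the resulting Series, with no `apply` needed. The sketch below uses only the fifteen rows visible in the top 3 per district listing in this notebook, so its sums are partial and do not match the full totals above. `counts` and `top3` are illustrative names.

```python
import pandas as pd

counts = pd.DataFrame(
    {
        "District Name": ["Ciutat Vella"] * 3 + ["Eixample"] * 3 + ["Gràcia"] * 3
                         + ["Horta-Guinardó"] * 3 + ["Les Corts"] * 3,
        "Nationality": ["Spain", "Italy", "Pakistan", "Spain", "Italy", "China",
                        "Spain", "Italy", "France", "Spain", "Italy", "Honduras",
                        "Spain", "Italy", "Venezuela"],
        "Number": [2542, 1275, 998, 6560, 1568, 918, 2944, 598, 277,
                   3255, 413, 365, 1955, 205, 178],
    }
).set_index(["District Name", "Nationality"])

# Sum across districts for each nationality, then keep the three largest.
top3 = counts.groupby(level="Nationality")["Number"].sum().nlargest(3)
print(top3)
```

On the full frame, `n.groupby('Nationality')['Number'].sum().nlargest(10)` would be the equivalent call, returning a Series rather than a one-column DataFrame.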


In [74]:
# Top 3 nationalities in each district
nTop3 = n.groupby(level=0, group_keys=False)['Number'].apply(lambda x: x.nlargest(3))

In [75]:
nTop3

District Name        District Name        Nationality
Ciutat Vella         Ciutat Vella         Spain          2542
                                          Italy          1275
                                          Pakistan        998
Eixample             Eixample             Spain          6560
                                          Italy          1568
                                          China           918
Gràcia               Gràcia               Spain          2944
                                          Italy           598
                                          France          277
Horta-Guinardó       Horta-Guinardó       Spain          3255
                                          Italy           413
                                          Honduras        365
Les Corts            Les Corts            Spain          1955
                                          Italy           205
                                          Venezuela       178
No consta       