#  Complete analysis of the Barcelona data sets

**Context**

Data sets from the Portal **`Open Data BCN`**, the Ajuntament de Barcelona's open data service.

Open Data BCN was born as a project in 2010 and launched its portal in 2011. It has since evolved and is now part of the Barcelona Ciutat Digital strategy, which fosters a pluralistic digital economy and develops a new model of urban innovation. That model is based on the digital transformation of the public sector and on the involvement of companies, administrations, the academic world, organizations, communities and people, with clear public and citizen leadership.

**Content**

* `births.csv:` Births by nationalities and by neighbourhoods of the city of Barcelona (2013-2017).

* `deaths.csv:` Deaths by quinquennial ages and by neighbourhoods of the city of Barcelona (2015-2017).

* `immigrants_by_nationality.csv:` Immigrants by nationality and by neighbourhoods of the city of Barcelona (2015-2017).

* `immigrants_emigrants_by_age.csv:` Immigrants and emigrants by quinquennial ages and by neighbourhood of the city of Barcelona (2015-2017).

* `immigrants_emigrants_by_destination.csv:` Immigrants and emigrants by place of origin and destination, respectively (2017).

* `immigrants_emigrants_by_destination2.csv:` Immigrants and emigrants by place of origin and destination, respectively, and by neighbourhoods of the city of Barcelona (2017).

* `immigrants_emigrants_by_sex.csv:` Immigrants and emigrants by sex by neighbourhoods of the city of Barcelona (2013-2017).

* `most_frequent_baby_names.csv:` 25 Most common baby names in Barcelona, disaggregated by sex. Years 1996-2016.

* `most_frequent_names.csv:` 50 Most common names of the inhabitants of Barcelona, disaggregated by decade of birth and sex.

* `population.csv:` Population by neighbourhood, by quinquennial ages and by gender of the city of Barcelona (2013-2017). Based on the register of inhabitants.

* `unemployment.csv:` Registered unemployment by neighbourhood and gender in the city of Barcelona (2013-2017).

# Import libraries

In [None]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns 
%matplotlib inline 

# 1) Analysis of Barcelona births (2013-2017)

In [None]:
df_births=pd.read_csv('../input/births.csv')

In [None]:
df_births.info()

In [None]:
df_births.head()

In [None]:
df_births['District Name'].value_counts()

# Total births per year

In [None]:
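# Sum only the numeric column; recent pandas versions no longer drop text columns such as 'Gender' automatically.
Number_by_year=df_births[['Year','Number']].groupby(['Year']).sum()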
Number_by_year.sort_values(by=['Number'],ascending=False)

# Births by year and by gender

In [None]:
df_births[['Year','Gender','Number']].groupby(['Year','Gender']).sum()
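The year-and-gender totals can also be laid out side by side, which makes the yearly comparison easier to read. The sketch below uses `pivot_table` on a small made-up frame; the column names follow the notebook, but the gender labels and numbers are invented for illustration and are not taken from `births.csv`.

```python
import pandas as pd

# Toy stand-in for births.csv: same column names, invented values.
births = pd.DataFrame({
    "Year":   [2016, 2016, 2016, 2017, 2017, 2017],
    "Gender": ["Boys", "Girls", "Boys", "Boys", "Girls", "Girls"],
    "Number": [10, 12, 5, 8, 9, 4],
})

# One row per year, one column per gender, summing the counts.
births_wide = births.pivot_table(
    index="Year", columns="Gender", values="Number", aggfunc="sum"
)

# Add a row total so each year can be compared at a glance.
births_wide["Total"] = births_wide.sum(axis=1)
print(births_wide)
```

On the real data the same call would be applied to `df_births` instead of the toy frame.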

#  Number of births by district

After grouping, the district name becomes the index. The index is reset so that the result is a regular dataframe whose columns can be accessed by name.

In [None]:
# Sum only the numeric column; recent pandas versions no longer drop text columns such as 'Gender' automatically.
Dist_birth=df_births[['District Name','Number']].groupby(['District Name']).sum()
New_Dist_birth=Dist_birth.reset_index()
New_Dist_birth.sort_values(by=['Number'],ascending=False)
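As a possible shortcut for the group-then-reset pattern used here, `groupby` accepts `as_index=False`, which keeps the grouping key as an ordinary column. This is a minimal sketch on invented numbers, not the notebook's actual data.

```python
import pandas as pd

# Toy data with the same column names as the notebook; values are invented.
births = pd.DataFrame({
    "District Name": ["Eixample", "Gràcia", "Eixample", "Gràcia"],
    "Number": [30, 10, 20, 5],
})

# as_index=False keeps 'District Name' as a column, so no reset_index is needed.
by_district = (
    births.groupby("District Name", as_index=False)["Number"]
    .sum()
    .sort_values("Number", ascending=False)
)
print(by_district)
```

The result can be passed directly to `sns.barplot` as the `data` argument, just like `New_Dist_birth`.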

# Births visualization by district

In [None]:
plt.figure(figsize=(14,8))
plt.title('Births by district')
sns.set_style(style='darkgrid')
sns.barplot(x="Number", y="District Name", data=New_Dist_birth.sort_values(by=['Number'],ascending=False))

**ANALYSIS:** As we can see, the districts with the most births are `"Sant Martí"` and `"Eixample"`.


In [None]:
df_births.describe()

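# Record with the highest number of registered births

In [None]:
# Look up the maximum instead of hard-coding the value read from describe()
df_births[df_births['Number']==df_births['Number'].max()]

# Records with no registered births

In [None]:
df_births[df_births['Number']==0]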

# Maximum number of births per district

**NOTE:** The index is reset to make the data easier to work with.

In [None]:
Max_neig=df_births[['District Name','Number']].groupby(['District Name']).max()
test=pd.DataFrame(Max_neig)
New_Max_neig=test.reset_index(inplace=False)
New_Max_neig

**We create an empty dataframe and fill it with a for loop**

The loop extracts, for each district, the neighborhood with the most registered births.

In [None]:
# DataFrame.append was removed in pandas 2.0, so the matching rows are collected and concatenated.
barrio=pd.concat([df_births[(df_births['Number']==row['Number'])
                & (df_births['District Name']==row['District Name'])]
                for _,row in New_Max_neig.iterrows()])
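The same lookup, the row holding the maximum within each district, can be sketched without a loop using `groupby(...).idxmax()`. The frame below is invented for illustration and only reuses the notebook's column names.

```python
import pandas as pd

# Toy data: two districts, two records each (values are invented).
births = pd.DataFrame({
    "Year": [2016, 2017, 2016, 2017],
    "District Name": ["Eixample", "Eixample", "Gràcia", "Gràcia"],
    "Neighborhood Name": ["Sant Antoni", "la Sagrada Família",
                          "Vallcarca", "la Vila de Gràcia"],
    "Number": [120, 150, 40, 90],
})

# idxmax returns, per district, the index label of the row with the largest 'Number'.
top_idx = births.groupby("District Name")["Number"].idxmax()

# Selecting those labels gives one full row per district.
top_rows = births.loc[top_idx]
print(top_rows)
```

One difference to keep in mind: `idxmax` returns only the first row when several rows tie for the maximum, whereas the equality-based loop keeps every tied row.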



# Maximum births in each district by neighborhood

The following dataframe shows, for each district, the neighborhood with the most registered births, together with the gender and the year of the record.

In [None]:
barrio[['Year','District Name','Neighborhood Name','Gender','Number']]

# 2) Analysis of deaths in Barcelona (2015-2017)

In [None]:
df_death=pd.read_csv('../input/deaths.csv')

In [None]:
df_death.head()

In [None]:
df_death.info()

In [None]:
death_byAge=df_death[['Age','Number']].groupby(['Age']).sum()
test=pd.DataFrame(death_byAge)
New_death_byAge=test.reset_index(inplace=False)
New_death_byAge


# Visualization of the number of deaths by age range

As the graph shows, most deaths fall in the following age ranges:

1. **85-89** with 10187 deaths
2. **90-94** with 9009 deaths
3. **80-84** with 7589 deaths

In [None]:
plt.figure(figsize=(14,8))
plt.title('Deaths by Age Range ')
sns.set_style(style='darkgrid')
sns.barplot(x="Number", y="Age", data=New_death_byAge.sort_values(by=['Number'],ascending=False))
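The chart above is ordered by number of deaths. To see the shape of the distribution across ages instead, the age ranges can be sorted by their lower bound. The sketch assumes labels that start with a number, such as `85-89`; the counts are invented.

```python
import pandas as pd

# Toy age-range totals (invented); labels are assumed to start with the lower bound.
deaths_by_age = pd.DataFrame({
    "Age": ["85-89", "5-9", "90-94", "0-4", "80-84"],
    "Number": [100, 1, 90, 2, 70],
})

# Pull the first run of digits out of each label and use it as a sort key.
lower = deaths_by_age["Age"].str.extract(r"(\d+)", expand=False).astype(int)

ordered = (
    deaths_by_age.assign(lower=lower)
    .sort_values("lower")
    .drop(columns="lower")
)
print(ordered)
```

A plain string sort would place `5-9` after `45-49`, which is why the numeric key is extracted first.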


# Visualization of the number of registered deaths by district

Both the table and the graph below show that the districts with the most registered deaths are:

1. Eixample `8128`
2. Sant Martí `6148`
3. Horta-Guinardó `5445`
4. Sants-Montjuïc `5053`

In [None]:
death_byDist=df_death[['District.Name','Number']].groupby(['District.Name']).sum()
test=pd.DataFrame(death_byDist)
New_death_byDist=test.reset_index(inplace=False)
New_death_byDist.sort_values(by=['Number'],ascending=False)

In [None]:
plt.figure(figsize=(14,8))
plt.title('Deaths by district')
sns.set_style(style='darkgrid')
sns.barplot(x="Number", y="District.Name", data=New_death_byDist.sort_values(by=['Number'],ascending=False))


## ANALYSIS:

**MAXIMUM VALUES BY DISTRICT**

Next, we find out which neighborhood in each district has the highest record of deaths.

In [None]:
Max_Death_neig=df_death[['District.Name','Number']].groupby(['District.Name']).max()
test=pd.DataFrame(Max_Death_neig)
New_Max_Death_neig=test.reset_index(inplace=False)
New_Max_Death_neig


**`A loop finds which neighborhood has the highest number of deaths in each district`**

In [None]:
# DataFrame.append was removed in pandas 2.0, so the matching rows are collected and concatenated.
death_barrio=pd.concat([df_death[(df_death['Number']==row['Number'])
                & (df_death['District.Name']==row['District.Name'])]
                for _,row in New_Max_Death_neig.iterrows()])


# Neighborhood with the most registered deaths by district

The following table shows, for each district, the neighborhood with the highest number of registered deaths and the year in which it was registered.

In [None]:
death_barrio[['Year','District.Name','Neighborhood.Name','Number']]

# 3) Analysis of immigrants_by_nationality (2015-2017)

In [None]:
df_imigrants=pd.read_csv('../input/immigrants_by_nationality.csv')

In [None]:
df_imigrants.info()

In [None]:
df_imigrants.head()

In [None]:
df_imigrants['Nationality'].nunique()

In [None]:
#  There are a total of 177 nationalities


# Top 15 nationalities of immigrants arriving in Barcelona

In [None]:
Number_immi=df_imigrants[['Nationality','Number']].groupby(['Nationality']).sum()
test=pd.DataFrame(Number_immi)
New_Number_immi=test.reset_index(inplace=False)
New_Number_immi.sort_values(by=['Number'],ascending=False).head(15)


# Visualization of the number of immigrants to Barcelona by nationality

In [None]:
plt.figure(figsize=(14,8))
plt.title('IMMIGRANTS BARCELONA')
sns.set_style(style='darkgrid')
sns.barplot(x="Number", y="Nationality", data=New_Number_immi.sort_values(by=['Number'],ascending=False).head(15))

In [None]:
immi_by_Dist=df_imigrants[['District Name','Number']].groupby(['District Name']).max()
test=pd.DataFrame(immi_by_Dist)
New_immi_by_neig=test.reset_index(inplace=False)
New_immi_by_neig

**`A loop finds which neighborhood has the highest number of immigrants in each district and the nationality they hold`**

In [None]:
# DataFrame.append was removed in pandas 2.0, so the matching rows are collected and concatenated.
immi_barrio=pd.concat([df_imigrants[(df_imigrants['Number']==row['Number'])
                & (df_imigrants['District Name']==row['District Name'])]
                for _,row in New_immi_by_neig.iterrows()])

In [None]:
immi_barrio[['Year','District Name','Neighborhood Name','Nationality','Number']]


**ANALYSIS:** The table shows that, in each district, the largest group of immigrants holds the nationality of the country itself (Spain).
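A claim about which nationality dominates is easier to check as a percentage than from raw counts. The following sketch computes each nationality's share of the total on invented numbers; the column names match the notebook, the values do not come from the data set.

```python
import pandas as pd

# Toy immigrant records (invented values).
immigrants = pd.DataFrame({
    "Nationality": ["Spain", "Italy", "Spain", "France", "Italy"],
    "Number": [50, 20, 30, 10, 10],
})

# Total per nationality, then each total as a percentage of the grand total.
totals = immigrants.groupby("Nationality")["Number"].sum()
share = (totals / totals.sum() * 100).sort_values(ascending=False).round(1)
print(share)
```

Applied to `df_imigrants`, the first entry of `share` would show whether the leading nationality actually exceeds 50% of all arrivals.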

# 4) Analysis of immigrants_emigrants_by_age

In [None]:
df_immi_By_Age=pd.read_csv('../input/immigrants_emigrants_by_age.csv')

In [None]:
df_immi_By_Age.info()

In [None]:
df_immi_By_Age.head()

In [None]:
immi_By_Age=df_immi_By_Age[['Age','Immigrants','Emigrants']].groupby(['Age']).sum()
test=pd.DataFrame(immi_By_Age)
New_immi_By_Age=test.reset_index(inplace=False)
New_immi_By_Age

In [None]:
# Initialize the matplotlib figure
f, ax = plt.subplots(figsize=(14, 8))

sns.set_color_codes("muted")
sns.set_style(style='darkgrid')
sns.barplot(x="Immigrants", y="Age", data=New_immi_By_Age,label="IMMIGRANTS",color='r')

sns.set_color_codes("pastel")
plt.title('EMIGRANTS Vs IMMIGRANTS')
sns.set_style(style='darkgrid')
sns.barplot(x="Emigrants", y="Age", data=New_immi_By_Age,label="EMIGRANTS",color='g')

# Add a legend and informative axis label
ax.legend(ncol=2, loc="center right", frameon=True)
sns.despine(left=True, bottom=True)

**ANALYSIS :**

The graph shows that immigrants are most numerous in the 20 to 34 age range, while emigrants are most numerous between 25 and 39. The distribution is roughly bell-shaped, with the highest concentration of both immigrants and emigrants between the ages of 20 and 49.
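The peak age ranges described above can also be read off programmatically rather than from the chart. This sketch uses invented counts and the notebook's column names to find the top age bands for each flow.

```python
import pandas as pd

# Toy totals per age band (invented values), indexed by age range.
by_age = pd.DataFrame({
    "Age": ["15-19", "20-24", "25-29", "30-34", "35-39"],
    "Immigrants": [40, 120, 150, 110, 60],
    "Emigrants": [20, 60, 100, 105, 70],
}).set_index("Age")

# Age band with the single highest count in each column.
peaks = by_age.idxmax()

# Three busiest age bands per column, largest first.
top3 = {col: by_age[col].nlargest(3).index.tolist() for col in by_age.columns}
print(peaks)
print(top3)
```

With the real data, `New_immi_By_Age.set_index('Age')` would take the place of the toy frame.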

# 5) and 6) immigrants_emigrants_by_destination 2017

In [None]:
df_iemmi_by_dest=pd.read_csv('../input/immigrants_emigrants_by_destination.csv')

In [None]:
df_iemmi_by_dest.head()


# Top 5 destinations of emigrants leaving Barcelona

As we can see, most of the emigrants who leave Barcelona go to `Catalonia` and `Abroad`.

In [None]:
df_iemmi_by_dest['from'].value_counts().head(3)

In [None]:
df_iemmi_by_dest[df_iemmi_by_dest['from']=='Barcelona'].sort_values(by='weight',ascending=False).head(5)


# Top 5 origins of immigrants arriving in Barcelona

As we can see, most of the immigrants who arrive in Barcelona come from `Abroad` and `Catalonia`.

In [None]:
df_iemmi_by_dest['to'].value_counts().head(3)

In [None]:
df_iemmi_by_dest[df_iemmi_by_dest['to']=='Barcelona'].sort_values(by='weight',ascending=False).head(5)
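The two top-5 queries differ only in which column is filtered, so they can be expressed with one small helper. `top_flows` is a hypothetical function name introduced here for illustration, and the flows are invented; only the `from`, `to` and `weight` column names come from the notebook.

```python
import pandas as pd

# Toy origin/destination flows (invented values).
flows = pd.DataFrame({
    "from": ["Barcelona", "Barcelona", "Abroad", "Catalonia", "Barcelona"],
    "to": ["Catalonia", "Abroad", "Barcelona", "Barcelona", "Madrid"],
    "weight": [300, 250, 500, 280, 40],
})

def top_flows(df, column, place, n=5):
    """Return the n heaviest flows whose `column` ('from' or 'to') equals `place`."""
    return df[df[column] == place].nlargest(n, "weight")

leaving = top_flows(flows, "from", "Barcelona", n=2)   # main destinations
arriving = top_flows(flows, "to", "Barcelona", n=2)    # main origins
print(leaving)
print(arriving)
```

`nlargest` is equivalent here to sorting by `weight` in descending order and taking the head.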

In [None]:
df_iemmi_by_dest2=pd.read_csv('../input/immigrants_emigrants_by_destination2.csv')

In [None]:
df_iemmi_by_dest2.head() # Barcelona neighborhoods, 2017


# Top 10 origins of migrants in 2017

As we can see, most migrants come from `Abroad`, followed by `Catalonia` and `Eixample`.

In [None]:
df_iemmi_by_dest2['from'].nunique()

In [None]:
# Sum only 'weight'; recent pandas versions no longer drop text columns automatically.
df_iemmi_by_dest2.groupby(['from'])[['weight']].sum().sort_values(by='weight',ascending=False).head(10)


# Top 10 destinations of migrants in 2017
Of the 30 destinations in the data, the most frequent is `Catalonia`, followed by `Eixample` and `Sant Martí`.

In [None]:
df_iemmi_by_dest2['to'].nunique()

In [None]:
# Sum only 'weight'; recent pandas versions no longer drop text columns automatically.
df_iemmi_by_dest2.groupby(['to'])[['weight']].sum().sort_values(by='weight',ascending=False).head(10)

# 7)  Immigrants_emigrants_by_sex 2013-2017

In [None]:
immi_By_sex=pd.read_csv('../input/immigrants_emigrants_by_sex.csv')

In [None]:
immi_By_sex.head()

In [None]:
IE_Bysex=immi_By_sex[['Gender','Immigrants','Emigrants']].groupby(['Gender']).sum()
test=pd.DataFrame(IE_Bysex)
New_IE_Bysex=test.reset_index(inplace=False)
New_IE_Bysex

In [None]:
New_IE_Bysex.plot(kind='bar',x='Gender',colormap='rainbow')

In [None]:
# Sum only the numeric columns; recent pandas versions no longer drop text columns such as 'Gender' automatically.
iemmi_bar=immi_By_sex[['District Name','Immigrants','Emigrants']].groupby(['District Name']).sum().sort_values(by='Immigrants',ascending=False)
New_iemmi_bar=iemmi_bar.reset_index()
New_iemmi_bar

In [None]:
# Initialize the matplotlib figure
f, ax = plt.subplots(figsize=(14, 8))
plt.title('IMMIGRANTS Vs EMIGRANTS BY DISTRICT')

sns.set_color_codes("muted")
sns.barplot(data=New_iemmi_bar,x='Immigrants',y='District Name',label='Immigrants',color='r')
sns.set_color_codes("pastel")
sns.barplot(data=New_iemmi_bar,x='Emigrants',y='District Name',label='Emigrants',color='g')

# Add a legend and informative axis label
ax.legend(ncol=2, loc="center right", frameon=True)
sns.despine(left=True, bottom=True)
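Because the two bar series are drawn on top of each other, the difference between arrivals and departures can be hard to judge by eye. One possible complement is to compute the net balance per district, as in this sketch on invented numbers; the `Net` column is an illustrative addition, not part of the data set.

```python
import pandas as pd

# Toy district totals (invented values).
by_district = pd.DataFrame({
    "District Name": ["Eixample", "Gràcia", "Sarrià"],
    "Immigrants": [900, 400, 300],
    "Emigrants": [700, 450, 300],
})

# Positive values mean more arrivals than departures.
by_district["Net"] = by_district["Immigrants"] - by_district["Emigrants"]

ranked = by_district.sort_values("Net", ascending=False)
print(ranked)
```

On the notebook's data the same two lines would be applied to `New_iemmi_bar`.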

# 8) most_frequent_baby_names 1996-2016

In [None]:
df_baby_names=pd.read_csv('../input/most_frequent_baby_names.csv')

In [None]:
df_baby_names.info()

In [None]:
df_baby_names['Name'].nunique()

In [None]:
# There are 97 distinct names across boys and girls

In [None]:
df_baby_names[df_baby_names['Gender']=='Female']['Name'].nunique()

In [None]:
# There are 50 distinct names for girls

In [None]:
df_baby_names[df_baby_names['Gender']=='Male']['Name'].nunique()

In [None]:
# There are 47 distinct names for boys

In [None]:
df_baby_names['Order'].nunique()


# Top 10 of the most common names among girls 1996-2016

In [None]:
Most_freq=df_baby_names[df_baby_names['Gender']=='Female'][['Name','Frequency']].groupby(['Name']).sum().sort_values(by='Frequency',ascending=False).head(10)
test=pd.DataFrame(Most_freq)
New_Most_freq=test.reset_index(inplace=False)
New_Most_freq

In [None]:
sns.barplot(data=New_Most_freq,x='Frequency',y='Name')


# Top 10 of the most common names among boys 1996-2016

In [None]:
Most_freqM=df_baby_names[df_baby_names['Gender']=='Male'][['Name','Frequency']].groupby(['Name']).sum().sort_values(by='Frequency',ascending=False).head(10)
test=pd.DataFrame(Most_freqM)
New_Most_freqM=test.reset_index(inplace=False)
New_Most_freqM

In [None]:
sns.barplot(data=New_Most_freqM,x='Frequency',y='Name')


# 9) Most frequent names among the inhabitants of Barcelona

In [None]:
df_mf_names=pd.read_csv('../input/most_frequent_names.csv')

In [None]:
df_mf_names.info()

In [None]:
df_mf_names['Decade'].unique()

In [None]:
df_mf_names['Name'].nunique()

In [None]:
# There are 276 different names among men and women in total

In [None]:
df_mf_names[df_mf_names['Gender']=='Female']['Name'].nunique()

In [None]:
# There are 153 different names among women

In [None]:
df_mf_names[df_mf_names['Gender']=='Male']['Name'].nunique()

In [None]:
# There are 123 different names among men

# Top 10 of the most frequent women's names in Barcelona

In [None]:
mf_namesW=df_mf_names[df_mf_names['Gender']=='Female'][['Name','Frequency']].groupby('Name').sum().sort_values(by='Frequency',ascending=False).head(10)
test=pd.DataFrame(mf_namesW)
New_mf_namesW=test.reset_index(inplace=False)
New_mf_namesW

In [None]:
plt.title('TOP 10 NAMES WOMEN ')
sns.barplot(data=New_mf_namesW,x='Frequency',y='Name')

# Top 10 of the most frequent men's names in Barcelona

In [None]:
mf_namesM=df_mf_names[df_mf_names['Gender']=='Male'][['Name','Frequency']].groupby('Name').sum().sort_values(by='Frequency',ascending=False).head(10)
test=pd.DataFrame(mf_namesM)
New_mf_namesM=test.reset_index(inplace=False)
New_mf_namesM

In [None]:
plt.title('TOP 10 NAMES MEN ')
sns.barplot(data=New_mf_namesM,x='Frequency',y='Name')

# Most common name by decade among women

In [None]:
mf_by_dec=df_mf_names[['Name','Gender','Decade','Frequency']]

In [None]:
Women_dec=mf_by_dec[mf_by_dec['Gender']=='Female'][['Decade','Frequency']].groupby(['Decade']).max()
test=pd.DataFrame(Women_dec)
New_women_dec=test.reset_index(inplace=False)
New_women_dec

In [None]:
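# Match on decade as well as frequency so a name from another decade with the same count is not picked up.
women_mf=pd.concat([df_mf_names[(df_mf_names['Frequency']==row['Frequency']) & (df_mf_names['Decade']==row['Decade'])
                & (df_mf_names['Gender']=='Female')] for _,row in New_women_dec.iterrows()])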

In [None]:
women_mf[['Decade','Name','Frequency']]

# Most common name by decade among men

In [None]:
Men_dec=mf_by_dec[mf_by_dec['Gender']=='Male'][['Decade','Frequency']].groupby(['Decade']).max()
test=pd.DataFrame(Men_dec)
New_Men_dec=test.reset_index(inplace=False)
New_Men_dec

In [None]:
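# Match on decade as well as frequency so a name from another decade with the same count is not picked up.
men_mf=pd.concat([df_mf_names[(df_mf_names['Frequency']==row['Frequency']) & (df_mf_names['Decade']==row['Decade'])
                & (df_mf_names['Gender']=='Male')] for _,row in New_Men_dec.iterrows()])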

In [None]:
men_mf[['Decade','Name','Frequency']]
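The per-decade lookup for women and men can be sketched in a single pass by sorting on frequency and keeping the first row of each decade and gender pair. The names and counts below are invented; the toy data deliberately contains the same frequency in two decades, the case where matching on frequency alone would return extra rows.

```python
import pandas as pd

# Toy name frequencies (invented values); 800 appears in two different decades.
names = pd.DataFrame({
    "Decade": ["1950", "1950", "1950", "1960", "1960", "1960"],
    "Gender": ["Female", "Female", "Male", "Female", "Male", "Male"],
    "Name": ["MARIA", "MONTSERRAT", "JOSE", "MARIA", "JORDI", "JOSE"],
    "Frequency": [900, 700, 800, 850, 600, 800],
})

# Highest frequencies first, then keep only the first row seen per (Decade, Gender).
top = (
    names.sort_values("Frequency", ascending=False)
    .drop_duplicates(subset=["Decade", "Gender"])
    .sort_values(["Gender", "Decade"])
    .reset_index(drop=True)
)
print(top)
```

Each decade and gender pair yields exactly one row; if two names tie within a pair, only one of them is kept.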

# 10) population.csv 2013-2017

In [None]:
df_popu=pd.read_csv('../input/population.csv')

In [None]:
df_popu.head()

In [None]:
popu=df_popu[['Age','Number']].groupby(['Age']).sum()
test=pd.DataFrame(popu)
New_popu=test.reset_index(inplace=False)

plt.figure(figsize=(12,8))
sns.barplot(data=New_popu.sort_values(by='Number',ascending=False),x='Number',y='Age')

# 11) unemployment.csv

In [None]:
Unemplo=pd.read_csv('../input/unemployment.csv')
Unemplo.head()

In [None]:
Unemplo.info()

# Number of records: registered unemployed vs. unemployment demands

In [None]:
Unemplo['Demand_occupation'].value_counts()

# Percentage of registered unemployed vs. unemployment demands

In [None]:
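# Sum only the numeric column; recent pandas versions no longer drop text columns such as 'Gender' automatically.
Do_unemplo=Unemplo[['Demand_occupation','Number']].groupby('Demand_occupation').sum()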
Do_unemplo

In [None]:
# Pass the one-dimensional 'Number' column, since plt.pie expects 1D data
plt.pie(Do_unemplo['Number']
        ,autopct='%1.1f%%',shadow=True,labels=['Reg_Unmp','Unmp_Dem'],startangle=90,radius=0.8)
plt.legend(loc=4)

#  Unemployment  for men

In [None]:
Unemplo[Unemplo['Gender']=='Male'][['Demand_occupation','Number']].groupby(['Demand_occupation']).sum()

#  Unemployment  for women

In [None]:
Unemplo[Unemplo['Gender']=='Female'][['Demand_occupation','Number']].groupby(['Demand_occupation']).sum()
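The separate tables for men and women can be combined into one table of percentages, which makes the two genders directly comparable. This is a sketch on invented numbers using the notebook's column names and category labels.

```python
import pandas as pd

# Toy unemployment records (invented values).
unemployment = pd.DataFrame({
    "Demand_occupation": ["Registered unemployed", "Unemployment demand",
                          "Registered unemployed", "Unemployment demand"],
    "Gender": ["Male", "Male", "Female", "Female"],
    "Number": [300, 100, 500, 100],
})

# Rows are genders, columns are the two categories, cells are summed counts.
table = unemployment.pivot_table(
    index="Gender", columns="Demand_occupation", values="Number", aggfunc="sum"
)

# Divide each row by its own total so every row adds up to 100%.
percent = table.div(table.sum(axis=1), axis=0).mul(100).round(1)
print(percent)
```

Row-wise percentages remove the effect of the two groups having different sizes.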


# Top 5 districts with the most registered unemployed

In [None]:
Unemplo[Unemplo['Demand_occupation']=='Registered unemployed'][['District Name','Number']].groupby(['District Name']).sum().sort_values(by='Number',ascending=False).head(5)


# Top 5 districts with the most unemployment demands

In [None]:
Unemplo[Unemplo['Demand_occupation']=='Unemployment demand'][['District Name','Number']].groupby(['District Name']).sum().sort_values(by='Number',ascending=False).head(5)

**END**