# **Introduction to Python for Data Analysis**
## Chapter 3: Data Manipulation - Exercise Solutions
---
About this notebook
* **Author:** Juan Martin Bellido
* **Description:** *this notebook contains the solutions to the chapter 3 exercises*
* **Feedback or comments?** Please share them with me on [LinkedIn](https://www.linkedin.com/in/jmartinbellido/)


# Chapter 3: Exercises
---

### Exercise #1

##### 1.A Compute the total revenue by sector
##### 1.B Repeat the previous exercise, keeping only the *Technology*, *Energy*, and *Retailing* sectors

> Dataset https://data-wizards.s3.amazonaws.com/datasets/fortune1000.csv


In [None]:
# import the library and load the DataFrame
import pandas as pd
df_fortune = pd.read_csv("https://data-wizards.s3.amazonaws.com/datasets/fortune1000.csv")
df_fortune.dtypes

Rank          int64
Company      object
Sector       object
Industry     object
Location     object
Revenue       int64
Profits       int64
Employees     int64
dtype: object

In [None]:
# EX 1.A
df_fortune.groupby("Sector").agg(
    {"Revenue":"sum"}
).sort_values("Revenue",ascending=False)

| Sector | Revenue |
|---|---:|
| Financials | 2217159 |
| Health Care | 1614707 |
| Energy | 1517809 |
| Retailing | 1465076 |
| Technology | 1377600 |
| Food, Beverages & Tobacco | 555967 |
| Industrials | 497581 |
| Food and Drug Stores | 483769 |
| Motor Vehicles & Parts | 482540 |
| Telecommunications | 461834 |


In [None]:
# EX 1.B
cond = df_fortune["Sector"].isin(["Technology","Energy","Retailing"])

df_fortune[cond].groupby("Sector").agg(
    {"Revenue":"sum"}
).sort_values("Revenue",ascending=False)

| Sector | Revenue |
|---|---:|
| Energy | 1517809 |
| Retailing | 1465076 |
| Technology | 1377600 |
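
The filter-then-aggregate pattern from 1.B can also be written with named aggregation, which names the output column inside `agg`. The sketch below is only an illustration: `df_toy`, its values, and the column name `total_revenue` are made up here and are not part of the Fortune 1000 dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the Fortune 1000 file
df_toy = pd.DataFrame({
    "Sector": ["Technology", "Energy", "Retailing", "Technology", "Financials"],
    "Revenue": [100, 250, 80, 50, 300],
})

# Keep only the sectors of interest, then sum revenue per sector
sectors = ["Technology", "Energy", "Retailing"]
revenue_by_sector = (
    df_toy[df_toy["Sector"].isin(sectors)]
    .groupby("Sector", as_index=False)
    .agg(total_revenue=("Revenue", "sum"))
    .sort_values("total_revenue", ascending=False)
    .reset_index(drop=True)
)
print(revenue_by_sector)
```

With `as_index=False`, `Sector` stays a regular column instead of becoming the index, which can be convenient when the result is exported or merged afterwards.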


### Exercise #2


##### Extract the top 5 planets (homeworlds) with the most characters in the dataset.

> Dataset https://data-wizards.s3.amazonaws.com/datasets/starwarsdb_people.csv


In [None]:
# import the library and load the DataFrame
import pandas as pd
df_starwars_people = pd.read_csv("https://data-wizards.s3.amazonaws.com/datasets/starwarsdb_people.csv")
df_starwars_people.dtypes

name           object
height        float64
mass          float64
hair_color     object
skin_color     object
eye_color      object
birth_year    float64
gender         object
homeworld      object
species        object
sex            object
dtype: object

In [None]:
# EX 2
# count unique character names per homeworld and keep the 5 largest counts
df_starwars_people.groupby('homeworld').agg(
    {"name":"nunique"}
).rename(
    columns={"name":"count_characteres"}
).sort_values('count_characteres',ascending=False).head(5)

| homeworld | count_characteres |
|---|---:|
| Naboo | 11 |
| Tatooine | 10 |
| Alderaan | 3 |
| Kamino | 3 |
| Coruscant | 3 |
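
Several planets in the output above are tied at 3 characters, so a hard cut at five rows may hide other planets with the same count. As a rough sketch on made-up data (`df_toy` and its values are hypothetical), `value_counts` gives the same kind of ranking in one call, and `nlargest(..., keep="all")` keeps every row tied with the last position. Note that `value_counts` counts rows rather than unique names, so it matches the `nunique` approach only when each character appears once.

```python
import pandas as pd

# Hypothetical toy data standing in for the Star Wars characters file
df_toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F", "G"],
    "homeworld": ["Naboo", "Naboo", "Naboo", "Tatooine", "Tatooine", "Alderaan", "Kamino"],
})

# Rows per homeworld, sorted from most to least frequent (missing homeworlds are dropped)
counts = df_toy["homeworld"].value_counts()

# Strict top 3: exactly three rows, ties at the cutoff are cut arbitrarily
top3_strict = counts.head(3)

# Top 3 keeping ties: every planet tied with the third position is included
top3_with_ties = counts.nlargest(3, keep="all")

print(top3_strict)
print(top3_with_ties)
```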


### Exercise #3

##### Aggregate the following metrics by continent:

*   *Total population*
*   *Mean GDP per capita*
*   *Mean % of population living below the poverty line*

> Dataset https://data-wizards.s3.amazonaws.com/datasets/dataset_na_who.csv


In [1]:
# import the library and load the DataFrame
import pandas as pd
df_who = pd.read_csv('https://data-wizards.s3.amazonaws.com/datasets/dataset_na_who.csv')
df_who.dtypes

Country                                                    object
CountryID                                                   int64
ContinentID                                                 int64
Adolescent fertility rate (%)                             float64
Adult literacy rate (%)                                   float64
Gross national income per capita (PPP international $)    float64
Net primary school enrolment ratio female (%)             float64
Net primary school enrolment ratio male (%)               float64
Population (in thousands) total                           float64
Population annual growth rate (%)                         float64
Population in urban areas (%)                             float64
Population living below the poverty line                  float64
Continent                                                  object
dtype: object

In [5]:
# EX 3
# note: the dataset has no GDP column, so gross national income per capita (PPP) is used as the income measure
output = df_who.groupby('Continent')\
  .agg({
      'Population (in thousands) total':'sum',
      'Gross national income per capita (PPP international $)':'mean',
      'Population living below the poverty line':'mean'
  })\
  .rename(columns={
      'Population (in thousands) total':'population',
      'Gross national income per capita (PPP international $)':'avg_GDP',
      'Population living below the poverty line':'avg_pop_below_pov_line'
  })\
  .round({'population':0, 'avg_GDP':2, 'avg_pop_below_pov_line':2})

output

| Continent | population | avg_GDP | avg_pop_below_pov_line |
|---|---:|---:|---:|
| Africa | 759147.0 | 3127.95 | 35.84 |
| Asia | 2859153.0 | 2865.56 | 28.31 |
| Europe | 880241.0 | 19777.08 | 2.59 |
| Middle East | 336867.0 | 14893.53 | 2.37 |
| North America | 441464.0 | 24524.0 | 3.0 |
| Oceania | 714480.0 | 11716.25 | 12.42 |
| South America | 453480.0 | 7397.14 | 14.04 |
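
When a dataset has gaps, it helps to remember that `sum` and `mean` skip missing values by default, so a continent's average may rest on only a few countries. The sketch below uses made-up data (`df_toy`, its values, and the output names `avg_gni` and `countries_with_gni` are hypothetical) to show named aggregation with an extra `count` column that reports how many non-missing values went into each mean.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data reusing two column names from the WHO file
df_toy = pd.DataFrame({
    "Continent": ["Europe", "Europe", "Asia", "Asia"],
    "Population (in thousands) total": [1000.0, 2000.0, 5000.0, np.nan],
    "Gross national income per capita (PPP international $)": [20000.0, np.nan, 3000.0, 5000.0],
})

gni_col = "Gross national income per capita (PPP international $)"

summary = (
    df_toy.groupby("Continent")
    .agg(
        population=("Population (in thousands) total", "sum"),  # NaN is skipped
        avg_gni=(gni_col, "mean"),                               # NaN is skipped
        countries_with_gni=(gni_col, "count"),                   # non-missing values only
    )
    .round({"population": 0, "avg_gni": 2})
)
print(summary)
```

Here the Europe average comes from a single country, which the `countries_with_gni` column makes visible.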


### Exercise #4

The table to import contains daily Bitcoin price data. Aggregate the DataFrame to compute the monthly mean of the opening price.

> Dataset https://data-wizards.s3.amazonaws.com/datasets/dataset_bitcoin.csv

In [None]:
# import the library and load the DataFrame
import pandas as pd
df_btc = pd.read_csv('https://data-wizards.s3.amazonaws.com/datasets/dataset_bitcoin.csv')
df_btc.dtypes

date           object
Open          float64
High          float64
Low           float64
Close         float64
volume_BTC    float64
volume_usd    float64
dtype: object

In [None]:
# cast the date variable to datetime
df_btc['date'] = pd.to_datetime(df_btc.date)
df_btc.dtypes

date          datetime64[ns]
Open                 float64
High                 float64
Low                  float64
Close                float64
volume_BTC           float64
volume_usd           float64
dtype: object

In [None]:
# derive the month and year fields from the date
df_btc['month'] = df_btc.date.dt.month
df_btc['year'] = df_btc.date.dt.year

In [None]:
# aggregate the data, grouping by the derived date fields
df = df_btc.groupby(['year','month']).agg({'Open':'mean'}).rename(columns={'Open':'avg_open'})
df

| year | month | avg_open |
|---:|---:|---:|
| 2017 | 1 | 910.993226 |
| 2017 | 2 | 1056.590714 |
| 2017 | 3 | 1130.477419 |
| 2017 | 4 | 1203.115333 |
| 2017 | 5 | 1844.232258 |
| 2017 | 6 | 2613.179 |
| 2017 | 7 | 2496.058065 |
| 2017 | 8 | 3803.843548 |
| 2017 | 9 | 4100.085 |
| 2017 | 10 | 5276.062903 |


In [None]:
# round to two decimals
df['avg_open'] = df['avg_open'].round(2)
df

| year | month | avg_open |
|---:|---:|---:|
| 2017 | 1 | 910.99 |
| 2017 | 2 | 1056.59 |
| 2017 | 3 | 1130.48 |
| 2017 | 4 | 1203.12 |
| 2017 | 5 | 1844.23 |
| 2017 | 6 | 2613.18 |
| 2017 | 7 | 2496.06 |
| 2017 | 8 | 3803.84 |
| 2017 | 9 | 4100.08 |
| 2017 | 10 | 5276.06 |
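
As a possible alternative to creating separate `year` and `month` columns, the date can be converted to a monthly period and used directly as the grouping key. This is only a sketch on made-up data (`df_toy` and its prices are hypothetical), not a replacement for the solution above.

```python
import pandas as pd

# Hypothetical toy data standing in for the daily Bitcoin file
df_toy = pd.DataFrame({
    "date": ["2017-01-01", "2017-01-02", "2017-02-01"],
    "Open": [900.0, 1000.0, 1100.0],
})
df_toy["date"] = pd.to_datetime(df_toy["date"])

# Group by calendar month (year and month combined in a single Period key)
monthly_open = (
    df_toy.groupby(df_toy["date"].dt.to_period("M"))["Open"]
    .mean()
    .round(2)
    .rename("avg_open")
)
print(monthly_open)
```

The result is a Series indexed by monthly periods such as `2017-01`, so year and month stay together in one key and sort chronologically.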
