# Saturdays.AI Guadalajara, 4th Ed.
### Pandas

#### Luis Román

### August 2022

Pandas, the Python Data Analysis Library, is one of the most popular Python libraries for data manipulation.

![imagen.png](attachment:imagen.png)

In [1]:
!pip install pandas



In [2]:
# Import pandas under its conventional alias
import pandas as pd

### Pandas Series

In [3]:
# Create a pandas Series from a Python list
frutas = ["Plátano", "Fresa", "Mango", "Durazno"]
frutas_series = pd.Series(frutas)
print(frutas_series)
print(type(frutas_series))

0    Plátano
1      Fresa
2      Mango
3    Durazno
dtype: object
<class 'pandas.core.series.Series'>


In [4]:
# Create a Series from a dictionary (the keys become the index labels)
transportes = {1:"Bicicleta", 2:"Motocicleta", 3:"Automóvil", 4:"Camión"}
transportes_series = pd.Series(transportes)
transportes_series

1      Bicicleta
2    Motocicleta
3      Automóvil
4         Camión
dtype: object

In [5]:
# Access a value in a pandas Series by its index label
transportes_series[1]

'Bicicleta'
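
Because the index of this Series is made of integers, `transportes_series[1]` is read as the label 1, not as the second position. The sketch below rebuilds the same Series and uses `.loc` and `.iloc` to make the intent explicit. The names `por_etiqueta` and `por_posicion` are only illustrative.

```python
import pandas as pd

# Same data as above: the dictionary keys (1..4) become the index labels
transportes_series = pd.Series(
    {1: "Bicicleta", 2: "Motocicleta", 3: "Automóvil", 4: "Camión"}
)

por_etiqueta = transportes_series.loc[1]   # label 1 -> first element
por_posicion = transportes_series.iloc[1]  # position 1 -> second element

print(por_etiqueta)
print(por_posicion)
```

Using `.loc` for labels and `.iloc` for positions avoids ambiguity when the index labels are themselves integers.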

### Dataframes

In [6]:
# Dictionary of dishes with their name and price
platillos = {
    "nombre":["Hot-Cakes", "Molletes", "Quesadillas"],
    "precio":[120,50,60]
}

df = pd.DataFrame(platillos)
df

Unnamed: 0,nombre,precio
0,Hot-Cakes,120
1,Molletes,50
2,Quesadillas,60


In [7]:
df['nombre']

0      Hot-Cakes
1       Molletes
2    Quesadillas
Name: nombre, dtype: object

In [8]:
# Inspect the column labels and the row index
print(df.columns)

print(df.index)

Index(['nombre', 'precio'], dtype='object')
RangeIndex(start=0, stop=3, step=1)


Unlike R, pandas does not ship with sample dataframes. For test data we can use the generator functions in pd.util.testing (see https://towardsdatascience.com/4-pandas-tricks-that-most-people-dont-know-86a70a007993), which do not appear in the official documentation. Note that pandas.util.testing is deprecated, so these helpers may be missing in newer pandas releases.

Alternatively, we can browse the large catalog of datasets on Kaggle: https://www.kaggle.com/datasets

In [9]:
# Test dataframes
# Generate a df with mixed data types
df_prueba = pd.util.testing.makeMixedDataFrame()
df_prueba



Unnamed: 0,A,B,C,D
0,0.0,0.0,foo1,2009-01-01
1,1.0,1.0,foo2,2009-01-02
2,2.0,0.0,foo3,2009-01-05
3,3.0,1.0,foo4,2009-01-06
4,4.0,0.0,foo5,2009-01-07
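
If the testing helpers are not available in the installed pandas version, a similar mixed-type frame can be built by hand. This is a minimal sketch that reproduces the values shown above. The name `df_manual` is an assumption, and `pd.bdate_range` is used here only because the dates in the output are consecutive business days.

```python
import numpy as np
import pandas as pd

# Hand-built frame with float, string and datetime columns
df_manual = pd.DataFrame({
    "A": np.arange(5, dtype=float),
    "B": [0.0, 1.0, 0.0, 1.0, 0.0],
    "C": [f"foo{i}" for i in range(1, 6)],
    "D": pd.bdate_range("2009-01-01", periods=5),  # business days only
})

print(df_manual)
print(df_manual.dtypes)
```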


### Dataframes from a file (csv, xml, json)

Reference: https://pandas.pydata.org/docs/reference/io.html

#### csv

In [10]:
"""
The "sep" argument sets the column separator,
"header" gives the row number that holds the column names,
and "names" overrides the column names.
"""
df_peliculas = pd.read_csv('TopAnimatedImDb.csv', header=0)
df_peliculas

Unnamed: 0,Title,Rating,Votes,Gross,Genre,Metascore,Certificate,Director,Year,Description,Runtime
0,Sen to Chihiro no kamikakushi,8.6,747148,$10.06M,"Adventure, Family",96.0,U,Hayao Miyazaki,2001,"[""\nDuring her family's move to the suburbs, a...",125 min
1,The Lion King,8.5,1041158,$422.78M,"Adventure, Drama",88.0,U,Roger Allers,1994,['\nLion prince Simba and his father are targe...,88 min
2,Hotaru no haka,8.5,272469,,"Drama, War",94.0,U,Isao Takahata,1988,['\nA young boy and his little sister struggle...,89 min
3,Kimi no na wa.,8.4,259975,$5.02M,"Drama, Fantasy",79.0,U,Makoto Shinkai,2016,['\nTwo strangers find themselves linked in a ...,106 min
4,Spider-Man: Into the Spider-Verse,8.4,510227,$190.24M,"Action, Adventure",87.0,U,Bob Persichetti,2018,['\nTeen Miles Morales becomes the Spider-Man ...,117 min
...,...,...,...,...,...,...,...,...,...,...,...
80,Kung Fu Panda,7.6,463897,$215.43M,"Action, Adventure",74.0,U,Mark Osborne,2008,"[""\nTo everyone's surprise, including his own,...",92 min
81,Mulan,7.6,284542,$120.62M,"Adventure, Comedy",71.0,U,Tony Bancroft,1998,"[""\nTo save her father from death in the army,...",88 min
82,The Little Mermaid,7.6,260026,$111.54M,"Adventure, Comedy",88.0,U,Ron Clements,1989,"[""\nA mermaid princess makes a Faustian bargai...",83 min
83,The Jungle Book,7.6,181528,$141.84M,"Adventure, Comedy",65.0,U,Wolfgang Reitherman,1967,['\nBagheera the Panther and Baloo the Bear ha...,78 min
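
The three arguments mentioned in the cell above can be tried without any file on disk. The sketch below reads a small made-up, semicolon-separated text through `io.StringIO`. The sample text and the names `texto`, `df_a` and `df_b` are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical in-memory "file": semicolon-separated, with a header row
texto = "Title;Year\nCoco;2017\nUp;2009\n"

# sep picks the delimiter; header=0 says the first line holds the column names
df_a = pd.read_csv(io.StringIO(texto), sep=";", header=0)

# names overrides the column names; header=0 is kept so that the original
# header line is discarded instead of being read as data
df_b = pd.read_csv(io.StringIO(texto), sep=";", header=0, names=["titulo", "anio"])

print(df_a)
print(df_b)
```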


#### xml

In [11]:
df_libros = pd.read_xml('books.xml')
df_libros

Unnamed: 0,id,author,title,genre,price,publish_date,description
0,bk101,"Gambardella, Matthew",XML Developer's Guide,Computer,44.95,2000-10-01,An in-depth look at creating applications \n ...
1,bk102,"Ralls, Kim",Midnight Rain,Fantasy,5.95,2000-12-16,"A former architect battles corporate zombies, ..."
2,bk103,"Corets, Eva",Maeve Ascendant,Fantasy,5.95,2000-11-17,After the collapse of a nanotechnology \n ...
3,bk104,"Corets, Eva",Oberon's Legacy,Fantasy,5.95,2001-03-10,"In post-apocalypse England, the mysterious \n ..."
4,bk105,"Corets, Eva",The Sundered Grail,Fantasy,5.95,2001-09-10,"The two daughters of Maeve, half-sisters, \n ..."
5,bk106,"Randall, Cynthia",Lover Birds,Romance,4.95,2000-09-02,When Carla meets Paul at an ornithology \n ...
6,bk107,"Thurman, Paula",Splish Splash,Romance,4.95,2000-11-02,A deep sea diver finds true love twenty \n ...
7,bk108,"Knorr, Stefan",Creepy Crawlies,Horror,4.95,2000-12-06,"An anthology of horror stories about roaches,\..."
8,bk109,"Kress, Peter",Paradox Lost,Science Fiction,6.95,2000-11-02,After an inadvertant trip through a Heisenberg...
9,bk110,"O'Brien, Tim",Microsoft .NET: The Programming Bible,Computer,36.95,2000-12-09,Microsoft's .NET initiative is explored in \n ...


#### json

In [12]:
df_posts = pd.read_json('posts.json')
df_posts

Unnamed: 0,userId,id,title,body
0,1,1,sunt aut facere repellat provident occaecati e...,quia et suscipit\nsuscipit recusandae consequu...
1,1,2,qui est esse,est rerum tempore vitae\nsequi sint nihil repr...
2,1,3,ea molestias quasi exercitationem repellat qui...,et iusto sed quo iure\nvoluptatem occaecati om...
3,1,4,eum et est occaecati,ullam et saepe reiciendis voluptatem adipisci\...
4,1,5,nesciunt quas odio,repudiandae veniam quaerat sunt sed\nalias aut...
...,...,...,...,...
95,10,96,quaerat velit veniam amet cupiditate aut numqu...,in non odio excepturi sint eum\nlabore volupta...
96,10,97,quas fugiat ut perspiciatis vero provident,eum non blanditiis soluta porro quibusdam volu...
97,10,98,laboriosam dolor voluptates,doloremque ex facilis sit sint culpa\nsoluta a...
98,10,99,temporibus sit alias delectus eligendi possimu...,quo deleniti praesentium dicta non quod\naut e...


### Describing the dataframe

In [13]:
# Show basic summary statistics for the numeric columns
df_peliculas.describe()

Unnamed: 0,Rating,Metascore,Year
count,85.0,80.0,85.0
mean,7.922353,80.85,2003.211765
std,0.257449,8.965616,15.25583
min,7.6,59.0,1937.0
25%,7.7,74.0,1995.0
50%,7.9,82.0,2007.0
75%,8.1,87.0,2014.0
max,8.6,96.0,2021.0


In [14]:
# Show the shape of a df as (rows, columns)
df_peliculas.shape

(85, 11)

In [15]:
# Summary of the df: column types and non-null counts
df_peliculas.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 85 entries, 0 to 84
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Title        85 non-null     object 
 1   Rating       85 non-null     float64
 2   Votes        85 non-null     object 
 3   Gross        50 non-null     object 
 4   Genre        85 non-null     object 
 5   Metascore    80 non-null     float64
 6   Certificate  84 non-null     object 
 7   Director     85 non-null     object 
 8   Year         85 non-null     int64  
 9   Description  85 non-null     object 
 10  Runtime      85 non-null     object 
dtypes: float64(2), int64(1), object(8)
memory usage: 7.4+ KB


In [16]:
# Count the non-null values in each column
df_peliculas.count()

Title          85
Rating         85
Votes          85
Gross          50
Genre          85
Metascore      80
Certificate    84
Director       85
Year           85
Description    85
Runtime        85
dtype: int64

In [17]:
# Get the first rows (five by default)
df_peliculas.head()

Unnamed: 0,Title,Rating,Votes,Gross,Genre,Metascore,Certificate,Director,Year,Description,Runtime
0,Sen to Chihiro no kamikakushi,8.6,747148,$10.06M,"Adventure, Family",96.0,U,Hayao Miyazaki,2001,"[""\nDuring her family's move to the suburbs, a...",125 min
1,The Lion King,8.5,1041158,$422.78M,"Adventure, Drama",88.0,U,Roger Allers,1994,['\nLion prince Simba and his father are targe...,88 min
2,Hotaru no haka,8.5,272469,,"Drama, War",94.0,U,Isao Takahata,1988,['\nA young boy and his little sister struggle...,89 min
3,Kimi no na wa.,8.4,259975,$5.02M,"Drama, Fantasy",79.0,U,Makoto Shinkai,2016,['\nTwo strangers find themselves linked in a ...,106 min
4,Spider-Man: Into the Spider-Verse,8.4,510227,$190.24M,"Action, Adventure",87.0,U,Bob Persichetti,2018,['\nTeen Miles Morales becomes the Spider-Man ...,117 min


In [18]:
# Sort the dataframe by year (ascending by default)
df_peliculas.sort_values('Year')

Unnamed: 0,Title,Rating,Votes,Gross,Genre,Metascore,Certificate,Director,Year,Description,Runtime
84,Snow White and the Seven Dwarfs,7.6,197860,$184.93M,"Adventure, Family",95.0,U,William Cottrell,1937,['\nExiled into the dangerous forest by her wi...,83 min
74,Fantasia,7.7,96094,$76.41M,"Family, Fantasy",96.0,U,James Algar,1940,['\nA collection of animated interpretations o...,125 min
83,The Jungle Book,7.6,181528,$141.84M,"Adventure, Comedy",65.0,U,Wolfgang Reitherman,1967,['\nBagheera the Panther and Baloo the Bear ha...,78 min
73,La planète sauvage,7.7,31226,$0.19M,Sci-Fi,73.0,U,René Laloux,1973,['\nOn a faraway planet where blue giants rule...,72 min
37,Kaze no tani no Naushika,8.0,168233,,"Adventure, Sci-Fi",86.0,U,Hayao Miyazaki,1984,['\nWarrior and pacifist Princess Nausicaä des...,117 min
...,...,...,...,...,...,...,...,...,...,...,...
60,Toy Story 4,7.7,245779,$434.04M,"Adventure, Comedy",84.0,U,Josh Cooley,2019,"['\nWhen a new toy called ""Forky"" joins Woody ...",100 min
24,WolfWalkers,8.0,31618,,"Adventure, Family",87.0,UA,Tomm Moore,2020,['\nA young apprentice hunter and her father j...,103 min
12,Kimetsu no Yaiba: Mugen Ressha-Hen,8.2,55255,,"Action, Adventure",72.0,UA,Haruo Sotozaki,2020,"[""\nAfter his family was brutally murdered and...",117 min
25,Soul,8.0,319513,,"Adventure, Comedy",83.0,U,Pete Docter,2020,"['\nAfter landing the gig of a lifetime, a New...",100 min


### Filtering data

![imagen.png](attachment:imagen.png)

Further reading:

https://www.educba.com/pandas-loc-vs-iloc/  
https://towardsdatascience.com/loc-vs-iloc-in-pandas-heres-the-difference-16cd4bcbecab

#### loc

Selection by label; both ends of a slice are included.

In [19]:
df_peliculas.loc[0:5, ['Title','Genre']]

Unnamed: 0,Title,Genre
0,Sen to Chihiro no kamikakushi,"Adventure, Family"
1,The Lion King,"Adventure, Drama"
2,Hotaru no haka,"Drama, War"
3,Kimi no na wa.,"Drama, Fantasy"
4,Spider-Man: Into the Spider-Verse,"Action, Adventure"
5,Coco,"Adventure, Comedy"


#### iloc

Selection by integer position; the end of a slice is excluded.

In [20]:
# Take the first five rows and the first five columns, selected by position
df_peliculas.iloc[0:5, 0:5]

Unnamed: 0,Title,Rating,Votes,Gross,Genre
0,Sen to Chihiro no kamikakushi,8.6,747148,$10.06M,"Adventure, Family"
1,The Lion King,8.5,1041158,$422.78M,"Adventure, Drama"
2,Hotaru no haka,8.5,272469,,"Drama, War"
3,Kimi no na wa.,8.4,259975,$5.02M,"Drama, Fantasy"
4,Spider-Man: Into the Spider-Verse,8.4,510227,$190.24M,"Action, Adventure"


In [21]:
df_peliculas.iloc[:3, 3:]

Unnamed: 0,Gross,Genre,Metascore,Certificate,Director,Year,Description,Runtime
0,$10.06M,"Adventure, Family",96.0,U,Hayao Miyazaki,2001,"[""\nDuring her family's move to the suburbs, a...",125 min
1,$422.78M,"Adventure, Drama",88.0,U,Roger Allers,1994,['\nLion prince Simba and his father are targe...,88 min
2,,"Drama, War",94.0,U,Isao Takahata,1988,['\nA young boy and his little sister struggle...,89 min
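
With the default `RangeIndex`, labels and positions coincide, so the difference between `loc` and `iloc` is easy to miss. The sketch below uses a small made-up frame with non-default labels to make it visible. `df_demo` and its values are assumptions, not part of the movie dataset.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"Title": ["a", "b", "c", "d"], "Year": [2001, 1994, 1988, 2016]},
    index=[10, 20, 30, 40],  # non-default labels to make the difference visible
)

# loc works with labels and includes the end of the slice: rows 10, 20 and 30
por_etiqueta = df_demo.loc[10:30, "Title"]

# iloc works with positions and excludes the end of the slice: rows 0 and 1
por_posicion = df_demo.iloc[0:2, 0]

print(por_etiqueta.tolist())
print(por_posicion.tolist())
```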


### Filtering by condition

In [22]:
# Build a boolean Series from a condition
condicion_peliculas_miyazaki = df_peliculas['Director'] == 'Hayao Miyazaki'

df_peliculas[condicion_peliculas_miyazaki]

Unnamed: 0,Title,Rating,Votes,Gross,Genre,Metascore,Certificate,Director,Year,Description,Runtime
0,Sen to Chihiro no kamikakushi,8.6,747148,$10.06M,"Adventure, Family",96.0,U,Hayao Miyazaki,2001,"[""\nDuring her family's move to the suburbs, a...",125 min
7,Mononoke-hime,8.4,388925,$2.38M,"Adventure, Fantasy",76.0,U,Hayao Miyazaki,1997,"[""\nOn a journey to find the cure for a Tatari...",134 min
13,Hauru no ugoku shiro,8.2,387977,$4.71M,"Adventure, Family",80.0,U,Hayao Miyazaki,2004,['\nWhen an unconfident young woman is cursed ...,119 min
23,Tonari no Totoro,8.1,333845,$1.11M,"Comedy, Family",86.0,U,Hayao Miyazaki,1988,['\nWhen two girls move to the country to be n...,86 min
36,Tenkû no shiro Rapyuta,8.0,165250,,"Adventure, Family",78.0,U,Hayao Miyazaki,1986,['\nA young boy and a girl with a magic crysta...,125 min
37,Kaze no tani no Naushika,8.0,168233,,"Adventure, Sci-Fi",86.0,U,Hayao Miyazaki,1984,['\nWarrior and pacifist Princess Nausicaä des...,117 min
54,Majo no takkyûbin,7.8,143013,,"Adventure, Family",83.0,U,Hayao Miyazaki,1989,"['\nA young witch, on her mandatory year of in...",103 min
59,Kaze tachinu,7.7,85123,$5.21M,"Biography, Drama",83.0,Not Rated,Hayao Miyazaki,2013,"['\nA look at the life of Jiro Horikoshi, the ...",126 min
71,Kurenai no buta,7.7,89026,,"Adventure, Comedy",83.0,U,Hayao Miyazaki,1992,"['\nIn 1930s Italy, a veteran World War I pilo...",94 min
79,Gake no ue no Ponyo,7.6,142408,$15.09M,"Adventure, Comedy",86.0,U,Hayao Miyazaki,2008,['\nA five-year-old boy develops a relationshi...,101 min


In [23]:
# Combine two conditions with & (each condition in parentheses)
df_peliculas[(condicion_peliculas_miyazaki) & (df_peliculas['Metascore'] > 85.0)]

Unnamed: 0,Title,Rating,Votes,Gross,Genre,Metascore,Certificate,Director,Year,Description,Runtime
0,Sen to Chihiro no kamikakushi,8.6,747148,$10.06M,"Adventure, Family",96.0,U,Hayao Miyazaki,2001,"[""\nDuring her family's move to the suburbs, a...",125 min
23,Tonari no Totoro,8.1,333845,$1.11M,"Comedy, Family",86.0,U,Hayao Miyazaki,1988,['\nWhen two girls move to the country to be n...,86 min
37,Kaze no tani no Naushika,8.0,168233,,"Adventure, Sci-Fi",86.0,U,Hayao Miyazaki,1984,['\nWarrior and pacifist Princess Nausicaä des...,117 min
79,Gake no ue no Ponyo,7.6,142408,$15.09M,"Adventure, Comedy",86.0,U,Hayao Miyazaki,2008,['\nA five-year-old boy develops a relationshi...,101 min
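
Boolean masks can also be combined with `|` (or) and negated with `~`, and `isin` and `query` are common alternatives. The following is a small sketch on a made-up frame. `df_demo`, its rows and the variable names are assumptions chosen for illustration.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Director": ["Hayao Miyazaki", "Pete Docter", "Isao Takahata", "Hayao Miyazaki"],
    "Metascore": [96.0, 83.0, 94.0, 76.0],
})

# isin tests membership in a list of values
directores_elegidos = df_demo["Director"].isin(["Hayao Miyazaki", "Isao Takahata"])
puntaje_alto = df_demo["Metascore"] > 85.0

ambos = df_demo[directores_elegidos & puntaje_alto]   # both conditions hold
alguno = df_demo[directores_elegidos | puntaje_alto]  # at least one holds
ninguno = df_demo[~directores_elegidos]               # negation of the first mask

# query expresses the same kind of filter as a string
con_query = df_demo.query("Metascore > 85.0 and Director == 'Hayao Miyazaki'")

print(ambos)
print(con_query)
```

Python's `and`, `or` and `not` do not work element by element on a Series, which is why the operators `&`, `|` and `~` are used instead.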


### Handling missing values

df.dropna()  
df.fillna(df.mean(numeric_only=True))  
df.replace("a", "b")  

In [24]:
# Flag the null values (True where a value is missing)
df_peliculas.isnull()

Unnamed: 0,Title,Rating,Votes,Gross,Genre,Metascore,Certificate,Director,Year,Description,Runtime
0,False,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,True,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...
80,False,False,False,False,False,False,False,False,False,False,False
81,False,False,False,False,False,False,False,False,False,False,False
82,False,False,False,False,False,False,False,False,False,False,False
83,False,False,False,False,False,False,False,False,False,False,False


In [25]:
# Fill missing values with the word 'missing'
# df_peliculas.fillna('missing')

# Fill missing values with each numeric column's mean
# df_peliculas.fillna(df_peliculas.mean(numeric_only=True))

# Fill missing values by interpolation
# df_peliculas.interpolate()
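
The options listed above can be compared side by side on a tiny frame. This is a minimal sketch with made-up values. `df_demo` and the result names are assumptions. `numeric_only=True` is passed so that text columns are skipped when the mean is computed.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    "Title": ["a", "b", "c"],
    "Metascore": [80.0, np.nan, 90.0],
})

# Mean of the numeric columns only; fillna matches the values by column name
medias = df_demo.mean(numeric_only=True)
relleno_media = df_demo.fillna(medias)

# Linear interpolation between the neighbouring values of one numeric column
relleno_interp = df_demo["Metascore"].interpolate()

# Drop rows only when a specific column is missing
sin_nulos = df_demo.dropna(subset=["Metascore"])

print(relleno_media)
print(relleno_interp)
print(sin_nulos)
```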

In [26]:
# Drop every row that has at least one missing value
df_peliculas = df_peliculas.dropna()

df_peliculas.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 50 entries, 0 to 84
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Title        50 non-null     object 
 1   Rating       50 non-null     float64
 2   Votes        50 non-null     object 
 3   Gross        50 non-null     object 
 4   Genre        50 non-null     object 
 5   Metascore    50 non-null     float64
 6   Certificate  50 non-null     object 
 7   Director     50 non-null     object 
 8   Year         50 non-null     int64  
 9   Description  50 non-null     object 
 10  Runtime      50 non-null     object 
dtypes: float64(2), int64(1), object(8)
memory usage: 4.7+ KB


### Duplicate data

df["column"].unique()  
df.duplicated()  
df.drop_duplicates(subset=None, keep="first")  
df.index.duplicated()  

Reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.duplicated.html

In [27]:
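# Note: this file appears to have no header row, so read_csv uses its first record
# as column names (see the output below); header=None with names=[...] would avoid that
df_juegos = pd.read_csv('steam-200k.csv')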
df_juegos.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 199999 entries, 0 to 199998
Data columns (total 5 columns):
 #   Column                      Non-Null Count   Dtype  
---  ------                      --------------   -----  
 0   151603712                   199999 non-null  int64  
 1   The Elder Scrolls V Skyrim  199999 non-null  object 
 2   purchase                    199999 non-null  object 
 3   1.0                         199999 non-null  float64
 4   0                           199999 non-null  int64  
dtypes: float64(1), int64(2), object(2)
memory usage: 7.6+ MB


In [28]:
duplicados = df_juegos[df_juegos.duplicated(keep=False)]
duplicados.head()

Unnamed: 0,151603712,The Elder Scrolls V Skyrim,purchase,1.0,0
1786,11373749,Sid Meier's Civilization IV,purchase,1.0,0
1967,11373749,Sid Meier's Civilization IV,purchase,1.0,0
1968,11373749,Sid Meier's Civilization IV Beyond the Sword,purchase,1.0,0
1969,11373749,Sid Meier's Civilization IV Beyond the Sword,purchase,1.0,0
1970,11373749,Sid Meier's Civilization IV Warlords,purchase,1.0,0


In [65]:
# Drop duplicated rows, keeping the first occurrence of each
df_juegos_limpio = df_juegos.drop_duplicates()

In [30]:
df_juegos_limpio.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 199292 entries, 0 to 199998
Data columns (total 5 columns):
 #   Column                      Non-Null Count   Dtype  
---  ------                      --------------   -----  
 0   151603712                   199292 non-null  int64  
 1   The Elder Scrolls V Skyrim  199292 non-null  object 
 2   purchase                    199292 non-null  object 
 3   1.0                         199292 non-null  float64
 4   0                           199292 non-null  int64  
dtypes: float64(1), int64(2), object(2)
memory usage: 9.1+ MB
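
The `keep` and `subset` parameters change which rows count as duplicates and which copy survives. The sketch below shows the variants on a made-up four-row frame. `df_demo` and its column names are assumptions.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "user": [1, 1, 1, 2],
    "game": ["Civ IV", "Civ IV", "Skyrim", "Civ IV"],
})

# keep="first" (default): only the later copies are flagged
marcas_first = df_demo.duplicated()

# keep=False: every member of a duplicated group is flagged
marcas_todas = df_demo.duplicated(keep=False)

# keep="last": the last copy of each duplicated row survives
sin_dup = df_demo.drop_duplicates(keep="last")

# subset: compare only the listed columns when looking for duplicates
por_juego = df_demo.drop_duplicates(subset=["game"])

print(marcas_first.tolist())
print(marcas_todas.tolist())
print(sin_dup)
print(por_juego)
```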


### Grouping data

In [31]:
# Count the films per director
directores = df_peliculas.groupby(['Director'])['Title'].count()
directores.head()

Director
Andrew Stanton     2
Bob Persichetti    1
Brad Bird          4
Claude Barras      1
Dean DeBlois       1
Name: Title, dtype: int64

In [32]:
# Total gross grouped by year and genre
# Note: Gross is stored as text (e.g. "$10.06M"), so sum() joins strings instead of adding amounts
peliculas_genero = df_peliculas[['Year','Genre','Gross']].groupby(['Year','Genre']).sum()
peliculas_genero.head(10)

Year,Genre,Gross
1937,"Adventure, Family",$184.93M
1940,"Family, Fantasy",$76.41M
1967,"Adventure, Comedy",$141.84M
1973,Sci-Fi,$0.19M
1988,"Adventure, Comedy",$156.45M
1988,"Comedy, Family",$1.11M
1989,"Adventure, Comedy",$111.54M
1992,"Adventure, Comedy",$217.35M
1993,"Action, Adventure",$5.62M
1994,"Adventure, Drama",$422.78M
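
Since `Gross` is an object (text) column, a numeric total needs a conversion step first. The sketch below strips the `$` and `M` characters and sums the result in millions. It assumes every value follows the `$<number>M` pattern seen above. `df_demo` and the column name `Gross_musd` are hypothetical.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Year": [1988, 1988, 1994],
    "Genre": ["Adventure, Comedy", "Adventure, Comedy", "Adventure, Drama"],
    "Gross": ["$156.45M", "$1.11M", "$422.78M"],
})

# Remove the currency symbol and the "M" suffix, then convert to float (millions)
df_demo["Gross_musd"] = (
    df_demo["Gross"]
    .str.replace("$", "", regex=False)
    .str.replace("M", "", regex=False)
    .astype(float)
)

# Now sum() adds amounts within each (Year, Genre) group
total = df_demo.groupby(["Year", "Genre"])["Gross_musd"].sum()
print(total)
```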


### Iterating over values

In [39]:
# Iterate over the df column by column
# (items() replaces iteritems(), which was removed in pandas 2.0)
for columna, contenido in df_peliculas.items():
    # Column name
    print(columna)
    # Column contents
    print(contenido)

Title
0          Sen to Chihiro no kamikakushi
1                          The Lion King
3                         Kimi no na wa.
4      Spider-Man: Into the Spider-Verse
5                                   Coco
6                                 WALL·E
7                          Mononoke-hime
8                                     Up
9                            Toy Story 3
10                             Toy Story
11                            Inside Out
13                  Hauru no ugoku shiro
14                          Finding Nemo
19              How to Train Your Dragon
20                           Ratatouille
21                        Monsters, Inc.
22                        The Iron Giant
23                      Tonari no Totoro
30                       The Incredibles
33                               Aladdin
50                           Sennen joyû
51       Cowboy Bebop: Tengoku no tobira
53          Batman: Mask of the Phantasm
55              Kubo and the Two Strings
56        

In [34]:
# Iterate over the df row by row
for index, row in df_peliculas.iterrows():
    print(row['Title'], row['Genre'])

Sen to Chihiro no kamikakushi  Adventure, Family
The Lion King  Adventure, Drama
Kimi no na wa.  Drama, Fantasy
Spider-Man: Into the Spider-Verse  Action, Adventure
Coco  Adventure, Comedy
WALL·E  Adventure, Family
Mononoke-hime  Adventure, Fantasy
Up  Adventure, Comedy
Toy Story 3  Adventure, Comedy
Toy Story  Adventure, Comedy
Inside Out  Adventure, Comedy
Hauru no ugoku shiro  Adventure, Family
Finding Nemo  Adventure, Comedy
How to Train Your Dragon  Action, Adventure
Ratatouille  Adventure, Comedy
Monsters, Inc.  Adventure, Comedy
The Iron Giant  Action, Adventure
Tonari no Totoro  Comedy, Family
The Incredibles  Action, Adventure
Aladdin  Adventure, Comedy
Sennen joyû  Drama, Fantasy
Cowboy Bebop: Tengoku no tobira  Action, Crime
Batman: Mask of the Phantasm  Action, Adventure
Kubo and the Two Strings  Action, Adventure
The Breadwinner  Drama, Family
Omoide no Mânî  Drama, Family
Ma vie de Courgette  Comedy, Drama
Kaze tachinu  Biography, Drama
Toy Story 4  Adventure, Comedy
Wrec
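
Row-by-row loops are convenient but tend to be slow on large frames. As a hedged alternative, the sketch below builds the same kind of "title plus genre" text with `itertuples` and with a vectorized string concatenation. `df_demo` and the variable names are assumptions.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Title": ["Coco", "Up"],
    "Genre": ["Adventure, Comedy", "Adventure, Comedy"],
})

# itertuples yields one namedtuple per row; fields are accessed as attributes
lineas = [f"{fila.Title}: {fila.Genre}" for fila in df_demo.itertuples(index=False)]

# Vectorized alternative: operate on whole columns, with no explicit loop
lineas_vec = (df_demo["Title"] + ": " + df_demo["Genre"]).tolist()

print(lineas)
print(lineas_vec)
```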

### Apply

apply lets us run a function over the values of a Series, or over the rows or columns of a DataFrame.

In [35]:
# Function that converts an amount to Mexican pesos (uses a fixed rate of 20 pesos per unit)
def convertir_pesos(valor):
    return valor * 20

In [36]:
# Apply the function to every value in the price column
df_libros['price'] = df_libros['price'].apply(convertir_pesos)

In [37]:
# The same conversion using a lambda
# df_libros['price'] = df_libros['price'].apply(lambda x: x * 20)

# Lambda with a conditional
# df['Result'] = df['Maths'].apply(lambda x: 'Pass' if x>=5 else 'Fail')

# Source: https://www.geeksforgeeks.org/using-apply-in-pandas-lambda-functions-with-multiple-if-statements/

In [38]:
df_libros

Unnamed: 0,id,author,title,genre,price,publish_date,description
0,bk101,"Gambardella, Matthew",XML Developer's Guide,Computer,899.0,2000-10-01,An in-depth look at creating applications \n ...
1,bk102,"Ralls, Kim",Midnight Rain,Fantasy,119.0,2000-12-16,"A former architect battles corporate zombies, ..."
2,bk103,"Corets, Eva",Maeve Ascendant,Fantasy,119.0,2000-11-17,After the collapse of a nanotechnology \n ...
3,bk104,"Corets, Eva",Oberon's Legacy,Fantasy,119.0,2001-03-10,"In post-apocalypse England, the mysterious \n ..."
4,bk105,"Corets, Eva",The Sundered Grail,Fantasy,119.0,2001-09-10,"The two daughters of Maeve, half-sisters, \n ..."
5,bk106,"Randall, Cynthia",Lover Birds,Romance,99.0,2000-09-02,When Carla meets Paul at an ornithology \n ...
6,bk107,"Thurman, Paula",Splish Splash,Romance,99.0,2000-11-02,A deep sea diver finds true love twenty \n ...
7,bk108,"Knorr, Stefan",Creepy Crawlies,Horror,99.0,2000-12-06,"An anthology of horror stories about roaches,\..."
8,bk109,"Kress, Peter",Paradox Lost,Science Fiction,139.0,2000-11-02,After an inadvertant trip through a Heisenberg...
9,bk110,"O'Brien, Tim",Microsoft .NET: The Programming Bible,Computer,739.0,2000-12-09,Microsoft's .NET initiative is explored in \n ...
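
For simple arithmetic, a vectorized operation gives the same result as `apply` without calling a Python function per value, while `apply` with `axis=1` is useful when a function needs several columns of the same row. A small sketch follows. `df_demo`, `TIPO_CAMBIO`, `etiqueta` and the new column names are assumptions, and the rate of 20 is simply the one used in the function above.

```python
import pandas as pd

df_demo = pd.DataFrame({"price": [44.95, 5.95], "genre": ["Computer", "Fantasy"]})

TIPO_CAMBIO = 20  # assumed fixed rate, same as in convertir_pesos

# Vectorized: multiply the whole column at once
df_demo["price_mxn"] = df_demo["price"] * TIPO_CAMBIO

# Row-wise apply: axis=1 passes each row to the function as a Series
def etiqueta(fila):
    return f"{fila['genre']} (${fila['price_mxn']:.2f} MXN)"

df_demo["label"] = df_demo.apply(etiqueta, axis=1)

print(df_demo)
```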


### Combining dataframes

![imagen.png](attachment:imagen.png)
![imagen-2.png](attachment:imagen-2.png)

In [53]:
# DataFrame credits: https://www.tutorialspoint.com/python_pandas/python_pandas_merging_joining.htm
left = pd.DataFrame({
   'id':[1,2,3,4,5],
   'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
   'subject_id':['sub1','sub2','sub4','sub6','sub5']})
right = pd.DataFrame(
   {'id':[1,2,3,4,5],
   'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
   'subject_id':['sub2','sub4','sub3','sub6','sub5']})

#### Concat

In [48]:
# Combine with concat, stacking the rows
pd.concat([left,right], ignore_index=True)

Unnamed: 0,id,Name,subject_id
0,1,Alex,sub1
1,2,Amy,sub2
2,3,Allen,sub4
3,4,Alice,sub6
4,5,Ayoung,sub5
5,1,Billy,sub2
6,2,Brian,sub4
7,3,Bran,sub3
8,4,Bryce,sub6
9,5,Betty,sub5


In [49]:
# Combine with concat, placing the columns side by side
pd.concat([left,right], axis=1)

Unnamed: 0,id,Name,subject_id,id.1,Name.1,subject_id.1
0,1,Alex,sub1,1,Billy,sub2
1,2,Amy,sub2,2,Brian,sub4
2,3,Allen,sub4,3,Bran,sub3
3,4,Alice,sub6,4,Bryce,sub6
4,5,Ayoung,sub5,5,Betty,sub5


#### Merge 

In [52]:
# Inner join (merge's default) on subject_id
left.merge(right, on='subject_id')

Unnamed: 0,id_x,Name_x,subject_id,id_y,Name_y
0,2,Amy,sub2,1,Billy
1,3,Allen,sub4,2,Brian
2,4,Alice,sub6,4,Bryce
3,5,Ayoung,sub5,5,Betty


In [55]:
# If the key columns have different names in each frame (for example, 'id' and 'id_new'),
# name them explicitly:
# left.merge(right, left_on='id', right_on="id_new")

# The how parameter selects the join type:
# how="inner" (default), "left", "right", "outer"
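
The effect of `how`, `left_on` and `right_on` is easiest to see on small frames. The sketch below uses reduced, made-up versions of `left` and `right` (three rows each, with the right key column named `id_new` as in the commented example). The result names are assumptions. `indicator=True` adds a `_merge` column that reports where each row came from.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "subject_id": ["sub1", "sub2", "sub4"]})
right = pd.DataFrame({"id_new": [1, 2, 3], "subject_id": ["sub2", "sub4", "sub3"]})

# inner: only the keys present in both frames
interno = left.merge(right, on="subject_id", how="inner")

# left: every row of left; unmatched rows get NaN in right's columns
izquierdo = left.merge(right, on="subject_id", how="left")

# outer: union of the keys, with the origin of each row in _merge
externo = left.merge(right, on="subject_id", how="outer", indicator=True)

# Differently named key columns; suffixes rename the overlapping subject_id
por_id = left.merge(
    right, left_on="id", right_on="id_new", suffixes=("_left", "_right")
)

print(interno)
print(izquierdo)
print(externo)
print(por_id)
```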

#### Join

In [62]:
left_n = pd.DataFrame({
   'a':[1,2,3,4,5],
   'b': [3,4,6,8,7],
   'c':[9,5,3,7,6]},
    index=['i1', 'i2', 'i3', 'i4', 'i6'])
right_n = pd.DataFrame(
   {'d':[1,2,3,4,5],
   'e': [6,5,7,4,3]},
    index=['i1', 'i2', 'i4', 'i5', 'i9'])

In [64]:
# The how parameter selects the join type; DataFrame.join aligns on the index and defaults to how='left'
left_n.join(right_n, how='right')

Unnamed: 0,a,b,c,d,e
i1,1.0,3.0,9.0,1,6
i2,2.0,4.0,5.0,2,5
i4,4.0,8.0,7.0,3,7
i5,,,,4,4
i9,,,,5,3
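
`join` can be thought of as a merge on the index. The sketch below uses reduced, made-up versions of `left_n` and `right_n` to check that `join` and `merge` with `left_index=True, right_index=True` give the same result for an inner join. The result names are assumptions.

```python
import pandas as pd

left_n = pd.DataFrame({"a": [1, 2, 3]}, index=["i1", "i2", "i3"])
right_n = pd.DataFrame({"d": [1, 2, 3]}, index=["i1", "i2", "i4"])

# Index-based inner join: only the labels present in both frames
con_join = left_n.join(right_n, how="inner")

# The same operation written as a merge on both indexes
con_merge = left_n.merge(right_n, left_index=True, right_index=True, how="inner")

# Outer join: union of the index labels, NaN where a side has no row
externo = left_n.join(right_n, how="outer")

print(con_join)
print(con_join.equals(con_merge))
print(externo)
```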
