# Joins in Pandas

The `merge` operation performs joins similar to those available in relational databases, combining DataFrames or named Series.
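As a minimal sketch of the "named Series" case, assuming a reasonably recent pandas (0.24 or later), a named Series can be passed to `pd.merge` directly. The `people` and `scores` objects below are hypothetical example data, not part of the original notebook.

```python
import pandas as pd

# Hypothetical example data: a DataFrame of people and a named Series of scores
people = pd.DataFrame({'subject_id': ['1', '2', '3'],
                       'person_name': ['Person A', 'Person B', 'Person C']})
# The Series name ('score') becomes the column name in the merged result
scores = pd.Series([10, 20], index=['1', '2'], name='score')

# Match the 'subject_id' column on the left against the Series index on the right
merged = pd.merge(people, scores, left_on='subject_id', right_index=True, how='left')
print(merged)
```

Because this is a left join, `Person C` has no score and gets `NaN` in the `score` column.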

In [34]:
# Magic command so that matplotlib plots are rendered inside the notebook.
%matplotlib notebook

import datetime
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

plt.style.use('default') # Use matplotlib's default plot style
plt.rcParams['figure.figsize'] = (15, 5)

## Defining the Example DataFrames

In [35]:
person_data = {
        'subject_id': ['1', '2', '3', '4', '5'],
        'person_name': ['Person A', 'Person B', 'Person C', 'Person D', 'Person E'],
}

df_a = pd.DataFrame(person_data, columns=['subject_id', 'person_name'])

In [36]:
df_a

| | subject_id | person_name |
|---|---|---|
| 0 | 1 | Person A |
| 1 | 2 | Person B |
| 2 | 3 | Person C |
| 3 | 4 | Person D |
| 4 | 5 | Person E |


In [37]:
subject_data = {
        'subject_id': ['1', '2','200'],
        'subject_name': ['Subject 1', 'Subject 2', 'Subject 200'],
}
df_b = pd.DataFrame(subject_data, columns=['subject_id', 'subject_name'])

In [38]:
df_b

| | subject_id | subject_name |
|---|---|---|
| 0 | 1 | Subject 1 |
| 1 | 2 | Subject 2 |
| 2 | 200 | Subject 200 |


## Join Types

### Inner Join

Only the rows whose join key appears in both DataFrames are kept.

In [39]:
pd.merge(df_a, df_b, on='subject_id', how='inner')

| | subject_id | person_name | subject_name |
|---|---|---|---|
| 0 | 1 | Person A | Subject 1 |
| 1 | 2 | Person B | Subject 2 |
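One way to see what an inner join keeps: its keys are exactly the intersection of the two key columns. This sketch rebuilds the example DataFrames so it runs on its own; `common_keys` and `inner` are illustrative names.

```python
import pandas as pd

# Same example data as above, rebuilt so this snippet is self-contained
df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
                     'person_name': ['Person A', 'Person B', 'Person C', 'Person D', 'Person E']})
df_b = pd.DataFrame({'subject_id': ['1', '2', '200'],
                     'subject_name': ['Subject 1', 'Subject 2', 'Subject 200']})

inner = pd.merge(df_a, df_b, on='subject_id', how='inner')

# The keys that survive an inner join are the intersection of both key columns
common_keys = set(df_a['subject_id']) & set(df_b['subject_id'])
print(sorted(common_keys))                      # ['1', '2']
print(set(inner['subject_id']) == common_keys)  # True
```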


### Left Join

Every row of the left DataFrame appears in the result. Rows with no match in the right DataFrame get `NaN` in the columns that come from the right.

In [40]:
pd.merge(df_a, df_b, on='subject_id', how='left')

| | subject_id | person_name | subject_name |
|---|---|---|---|
| 0 | 1 | Person A | Subject 1 |
| 1 | 2 | Person B | Subject 2 |
| 2 | 3 | Person C | NaN |
| 3 | 4 | Person D | NaN |
| 4 | 5 | Person E | NaN |
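A common use of a left join is finding the left rows that have no match. This sketch assumes `df_b` has no missing values of its own in `subject_name`, so a `NaN` there can only mean "no match"; `left` and `unmatched` are illustrative names.

```python
import pandas as pd

# Same example data as above, rebuilt so this snippet is self-contained
df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
                     'person_name': ['Person A', 'Person B', 'Person C', 'Person D', 'Person E']})
df_b = pd.DataFrame({'subject_id': ['1', '2', '200'],
                     'subject_name': ['Subject 1', 'Subject 2', 'Subject 200']})

left = pd.merge(df_a, df_b, on='subject_id', how='left')

# Rows of df_a with no match in df_b get NaN in the columns that come from df_b
unmatched = left[left['subject_name'].isna()]
print(unmatched['person_name'].tolist())  # ['Person C', 'Person D', 'Person E']
```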


### Right Join
Every row of the right DataFrame appears in the result. Rows with no match in the left DataFrame get `NaN` in the columns that come from the left.

In [41]:
# note: the DataFrames are passed in the same order as before; only `how` changes
pd.merge(df_a, df_b, on='subject_id', how='right')

| | subject_id | person_name | subject_name |
|---|---|---|---|
| 0 | 1 | Person A | Subject 1 |
| 1 | 2 | Person B | Subject 2 |
| 2 | 200 | NaN | Subject 200 |
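A right join is the mirror image of a left join: swapping the two DataFrames and using `how='left'` yields the same rows, with only the column order differing. A minimal sketch; `right` and `swapped` are illustrative names.

```python
import pandas as pd

# Same example data as above, rebuilt so this snippet is self-contained
df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
                     'person_name': ['Person A', 'Person B', 'Person C', 'Person D', 'Person E']})
df_b = pd.DataFrame({'subject_id': ['1', '2', '200'],
                     'subject_name': ['Subject 1', 'Subject 2', 'Subject 200']})

right = pd.merge(df_a, df_b, on='subject_id', how='right')

# Swap the arguments and use how='left', then reorder the columns to match
swapped = pd.merge(df_b, df_a, on='subject_id', how='left')[right.columns]
print(right.equals(swapped))  # True
```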


### Outer/Full Join

Every row from both DataFrames appears in the result; wherever one side has no match, its columns are filled with `NaN`.

In [42]:
pd.merge(df_a, df_b, on='subject_id', how='outer')

| | subject_id | person_name | subject_name |
|---|---|---|---|
| 0 | 1 | Person A | Subject 1 |
| 1 | 2 | Person B | Subject 2 |
| 2 | 3 | Person C | NaN |
| 3 | 4 | Person D | NaN |
| 4 | 5 | Person E | NaN |
| 5 | 200 | NaN | Subject 200 |
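To tell where each row of an outer join came from, `pd.merge` accepts `indicator=True`, which adds a categorical `_merge` column with the values `'left_only'`, `'right_only'` or `'both'`. A minimal sketch on the same data; `outer` is an illustrative name.

```python
import pandas as pd

# Same example data as above, rebuilt so this snippet is self-contained
df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
                     'person_name': ['Person A', 'Person B', 'Person C', 'Person D', 'Person E']})
df_b = pd.DataFrame({'subject_id': ['1', '2', '200'],
                     'subject_name': ['Subject 1', 'Subject 2', 'Subject 200']})

outer = pd.merge(df_a, df_b, on='subject_id', how='outer', indicator=True)

# Count rows per origin: left_only 3, both 2, right_only 1
print(outer['_merge'].value_counts())
```

Note that recent pandas versions document that `how='outer'` sorts the join keys lexicographically, so with string keys the `'200'` row may appear between `'2'` and `'3'` instead of last as in the output shown above.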


### Summary

The following figure shows each join type, using the names pandas gives them in the `how` parameter.

![Join types in pandas merge](img/join-types-merge-names.jpg)
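As a quick numeric summary, this sketch runs all four `how` values on the example data and records the number of rows each produces; `row_counts` is an illustrative name.

```python
import pandas as pd

# Same example data as above, rebuilt so this snippet is self-contained
df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
                     'person_name': ['Person A', 'Person B', 'Person C', 'Person D', 'Person E']})
df_b = pd.DataFrame({'subject_id': ['1', '2', '200'],
                     'subject_name': ['Subject 1', 'Subject 2', 'Subject 200']})

row_counts = {}
for how in ['inner', 'left', 'right', 'outer']:
    result = pd.merge(df_a, df_b, on='subject_id', how=how)
    row_counts[how] = len(result)

print(row_counts)  # {'inner': 2, 'left': 5, 'right': 3, 'outer': 6}
```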

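## Specifying the Join Columns

When the key column has a different name in each DataFrame, use `left_on` and `right_on` instead of `on`.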

In [43]:
df_a.columns = ['subject_id_a', 'person_name']

In [44]:
df_b.columns = ['subject_id_b', 'subject_name']

In [45]:
pd.merge(df_a, df_b, left_on='subject_id_a', right_on='subject_id_b', how='inner')

| | subject_id_a | person_name | subject_id_b | subject_name |
|---|---|---|---|---|
| 0 | 1 | Person A | 1 | Subject 1 |
| 1 | 2 | Person B | 2 | Subject 2 |
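With `left_on`/`right_on`, both key columns are kept in the result. After an inner join they hold identical values, so one of them can be dropped. A minimal sketch on the renamed example data; `merged` is an illustrative name.

```python
import pandas as pd

# Example data with differently named key columns, as in the cells above
df_a = pd.DataFrame({'subject_id_a': ['1', '2', '3', '4', '5'],
                     'person_name': ['Person A', 'Person B', 'Person C', 'Person D', 'Person E']})
df_b = pd.DataFrame({'subject_id_b': ['1', '2', '200'],
                     'subject_name': ['Subject 1', 'Subject 2', 'Subject 200']})

merged = pd.merge(df_a, df_b, left_on='subject_id_a', right_on='subject_id_b', how='inner')

# Both key columns are present; drop the redundant right-hand one
merged = merged.drop(columns='subject_id_b')
print(merged.columns.tolist())  # ['subject_id_a', 'person_name', 'subject_name']
```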


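## Joining on Indexes

First restore the original column names, then move `subject_id` into the index of each DataFrame with `set_index`.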

In [46]:
df_a.columns = ['subject_id', 'person_name']

In [47]:
df_b.columns = ['subject_id', 'subject_name']

In [48]:
df_a.set_index('subject_id', inplace=True)

In [49]:
df_b.set_index('subject_id', inplace=True)

In [51]:
df_a

| subject_id | person_name |
|---|---|
| 1 | Person A |
| 2 | Person B |
| 3 | Person C |
| 4 | Person D |
| 5 | Person E |


In [52]:
df_b

| subject_id | subject_name |
|---|---|
| 1 | Subject 1 |
| 2 | Subject 2 |
| 200 | Subject 200 |


In [55]:
# note that subject_id remains the index in the result
# other variants exist: left_index=True / right_index=True use each DataFrame's index as the join key
pd.merge(df_a, df_b, on='subject_id', how='inner')

| subject_id | person_name | subject_name |
|---|---|---|
| 1 | Person A | Subject 1 |
| 2 | Person B | Subject 2 |
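The `left_index`/`right_index` variants mentioned in the comment above can be sketched as follows: passing `left_index=True, right_index=True` joins on both indexes, and `DataFrame.join` does the same by default (its default `how` is `'left'`, so `how='inner'` is passed explicitly). `by_index` and `by_join` are illustrative names.

```python
import pandas as pd

# Same example data as above, with subject_id moved into the index
df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
                     'person_name': ['Person A', 'Person B', 'Person C', 'Person D', 'Person E']}
                    ).set_index('subject_id')
df_b = pd.DataFrame({'subject_id': ['1', '2', '200'],
                     'subject_name': ['Subject 1', 'Subject 2', 'Subject 200']}
                    ).set_index('subject_id')

# Use each DataFrame's index as its join key
by_index = pd.merge(df_a, df_b, left_index=True, right_index=True, how='inner')

# DataFrame.join joins on the index unless told otherwise
by_join = df_a.join(df_b, how='inner')

print(by_index.index.tolist())   # ['1', '2']
print(by_index.equals(by_join))  # True
```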
