This file summarizes the join types available in the pandas library through `pd.merge`, with a definition, the main disadvantage, and example code for each join type.

In [15]:
import pandas as pd


In [16]:
# Build two sample dataframes that share the keys B, C, and D
data1 = {
    'key': ['A', 'B', 'C', 'D'],
    'value': [10, 20, 30, 40]
}

df1 = pd.DataFrame(data1)
print("data1")
print(df1, "\n")

data2 = {
    'key': ['B', 'C', 'D', 'E'],
    'value': [50, 60, 70, 80]
}
df2 = pd.DataFrame(data2)
print("data2")
print(df2)

data1
  key  value
0   A     10
1   B     20
2   C     30
3   D     40 

data2
  key  value
0   B     50
1   C     60
2   D     70
3   E     80


**Inner join**

*   An inner join merges two dataframes based on keys that are common to both.
*   Feature: Only rows whose key appears in both dataframes are included in the result.
*   Disadvantage: Any data present in only one dataframe is excluded, which can silently drop important information.



In [3]:
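# suffixes renames the overlapping 'value' columns (the defaults are '_x' and '_y')
inner_join = pd.merge(df1, df2, on='key', how='inner', suffixes=('_df1', '_df2'))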
print("Inner Join Result:")
print(inner_join)

Inner Join Result:
  key  value_df1  value_df2
0   B         20         50
1   C         30         60
2   D         40         70


In [19]:
default_join = pd.merge(df1, df2, on='key')  # how defaults to 'inner'
print("Default Join Result:")
print(default_join)

Default Join Result:
  key  value_x  value_y
0   B       20       50
1   C       30       60
2   D       40       70
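
To see exactly what an inner join leaves out, the unmatched rows on each side can be listed directly. The following is a minimal sketch that rebuilds the same sample dataframes; the names `only_in_df1` and `only_in_df2` are illustrative and not part of the original notebook.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [10, 20, 30, 40]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D', 'E'], 'value': [50, 60, 70, 80]})

# Rows whose key has no partner in the other dataframe are dropped by an inner join
only_in_df1 = df1[~df1['key'].isin(df2['key'])]
only_in_df2 = df2[~df2['key'].isin(df1['key'])]

print("Dropped from df1:")
print(only_in_df1)
print("Dropped from df2:")
print(only_in_df2)
```

With the sample data, this lists key A from df1 and key E from df2, the two keys that do not appear in the inner join result.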


**Outer join**

*   An outer join combines two dataframes and includes all keys from both, even if there is no match.
*   Feature: It includes all rows from both dataframes, with NaN for missing values where there is no match.
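*   Disadvantage: It can produce many NaN values, and because it keeps every row from both dataframes, the result has at least as many rows as any other join on the same keys, which can cost more time and memory. Integer columns that receive NaN are also converted to float (10 becomes 10.0).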


In [13]:
outer_join = pd.merge(df1, df2, on='key', how='outer')
print("Outer Join Result:")
print(outer_join)

Outer Join Result:
  key  value_x  value_y
0   A     10.0      NaN
1   B     20.0     50.0
2   C     30.0     60.0
3   D     40.0     70.0
4   E      NaN     80.0
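
When an outer join produces NaN values, it helps to know which side each row came from. One way to check, sketched here on the same sample data, is the `indicator` parameter of `pd.merge`; the names `outer_flagged` and `unmatched` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [10, 20, 30, 40]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D', 'E'], 'value': [50, 60, 70, 80]})

# indicator=True adds a '_merge' column holding 'left_only', 'right_only', or 'both'
outer_flagged = pd.merge(df1, df2, on='key', how='outer', indicator=True)
print(outer_flagged)

# Keep only the rows that found no match on the other side
unmatched = outer_flagged[outer_flagged['_merge'] != 'both']
print(unmatched)
```

Here key A is flagged `left_only` and key E is flagged `right_only`, which matches the NaN positions in the outer join result.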


**Left join**

*   A left join combines two dataframes, including all keys from the left dataframe and matching keys from the right dataframe.
*   Feature: It includes all rows from the left dataframe, with NaN for columns from the right dataframe where there is no match.
*   Disadvantage: Unmatched left rows get NaN in the right-hand columns, which converts integer columns to float, and because those rows are kept, the result can be larger than an inner join and use more memory.


In [14]:
left_join = pd.merge(df1, df2, on='key', how='left')
print("Left Join Result:")
print(left_join)

Left Join Result:
  key  value_x  value_y
0   A       10      NaN
1   B       20     50.0
2   C       30     60.0
3   D       40     70.0
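
The NaN in `value_y` is why that column prints as float (50.0) while `value_x` stays integer. Two common ways to handle this are sketched below, assuming the same sample data; which one is appropriate depends on what a missing value should mean, and the fill value 0 is an arbitrary placeholder.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [10, 20, 30, 40]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D', 'E'], 'value': [50, 60, 70, 80]})

left_join = pd.merge(df1, df2, on='key', how='left')

# Option 1: the nullable integer dtype keeps the gap as <NA> and the values as integers
nullable = left_join['value_y'].astype('Int64')

# Option 2: replace the gap with a placeholder, then convert back to a plain integer dtype
filled = left_join['value_y'].fillna(0).astype(int)

print(nullable)
print(filled)
```

Option 1 preserves the fact that key A had no match, while option 2 hides it behind the placeholder.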


**Right join**

*   A right join combines two dataframes, including all keys from the right dataframe and matching keys from the left dataframe.
*   Feature: It includes all rows from the right dataframe, with NaN for columns from the left dataframe where there is no match.
*   Disadvantage: Unmatched right rows get NaN in the left-hand columns, which converts integer columns to float, and because those rows are kept, the result can be larger than an inner join and use more memory. Right joins can also be less intuitive to read.

In [17]:
right_join = pd.merge(df1, df2, on='key', how='right')
print("Right Join Result:")
print(right_join)

Right Join Result:
  key  value_x  value_y
0   B     20.0       50
1   C     30.0       60
2   D     40.0       70
3   E      NaN       80
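
A right join can usually be rewritten as a left join with the two dataframes swapped, which some readers find easier to follow. The sketch below, on the same sample data, swaps the inputs and the suffixes and then restores the column order; the name `swapped` is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [10, 20, 30, 40]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D', 'E'], 'value': [50, 60, 70, 80]})

right_join = pd.merge(df1, df2, on='key', how='right')

# Swap the inputs, use a left join, and swap the suffixes so the column names line up
swapped = pd.merge(df2, df1, on='key', how='left', suffixes=('_y', '_x'))
swapped = swapped[['key', 'value_x', 'value_y']]  # restore the original column order

print(swapped)
print("Same as right join:", swapped.equals(right_join))
```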
