
# Comparison with SQL - Part 2

This notebook is based on the pandas guide [Comparison with SQL](https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_sql.html#compare-with-sql).

**Contents of this notebook**:
* JOIN

In [2]:
import pandas as pd
import numpy as np

In [3]:
url = (
    "https://raw.githubusercontent.com/pandas-dev"
    "/pandas/main/pandas/tests/io/data/csv/tips.csv"
)

## JOIN

`JOIN`s can be performed with `join()` or `merge()`. By default, `join()` will join the DataFrames on their indices. Each method has parameters allowing you to specify the type of join to perform (`LEFT`, `RIGHT`, `INNER`, `FULL`) or the columns to join on (column names or indices).
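As a rough illustration of the paragraph above, the sketch below uses two small frames with fixed values (the names `left`, `right`, `lval`, and `rval` are made up for this example and are not the notebook's data). It shows how the `how` parameter of `merge()` selects the join type, and how `join()` aligns on the index by default. Note that pandas spells SQL's `FULL` join as `how="outer"`.

```python
import pandas as pd

# Illustrative frames with fixed values (hypothetical names, not the notebook's data)
left = pd.DataFrame({"key": ["A", "B", "C", "D"], "lval": [1, 2, 3, 4]})
right = pd.DataFrame({"key": ["B", "D", "D", "E"], "rval": [10, 20, 30, 40]})

# merge() joins on columns; `how` picks the join type.
# SQL's FULL join corresponds to how="outer" in pandas.
row_counts = {
    how: len(pd.merge(left, right, on="key", how=how))
    for how in ["inner", "left", "right", "outer"]
}
print(row_counts)

# join() aligns on the index by default and performs a left join
# unless `how` is given, so the key column is moved into the index first.
joined = left.set_index("key").join(right.set_index("key"), how="inner")
print(joined)
```

With these inputs the inner, left, right, and outer merges return 3, 5, 4, and 6 rows respectively, because key `D` appears twice in `right` and each match produces its own row.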

In [24]:
df1 = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": np.random.randn(4)})
df2 = pd.DataFrame({"key": ["B", "D", "D", "E"], "value": np.random.randn(4)})

In [25]:
df1

|   | key | value |
|---|-----|-------|
| 0 | A | 0.219892 |
| 1 | B | -0.297982 |
| 2 | C | 0.05027 |
| 3 | D | 0.969485 |


In [26]:
df2

|   | key | value |
|---|-----|-------|
| 0 | B | -0.265158 |
| 1 | D | -0.169257 |
| 2 | D | -1.487602 |
| 3 | E | -0.878397 |


Assume we have two database tables of the same name and structure as our DataFrames.

Now let's go over the various types of `JOIN`s.

### INNER JOIN

```sql
-- SQL code
SELECT *
FROM df1
INNER JOIN df2
  ON df1.key = df2.key;
```

In [27]:
# merge performs an INNER JOIN by default
pd.merge(df1, df2, on="key")

|   | key | value_x | value_y |
|---|-----|---------|---------|
| 0 | B | -0.297982 | -0.265158 |
| 1 | D | 0.969485 | -0.169257 |
| 2 | D | 0.969485 | -1.487602 |


`merge()` also offers parameters for cases when you'd like to join one DataFrame's column with another DataFrame's index.

In [28]:
indexed_df2 = df2.set_index("key")
indexed_df2

| key | value |
|-----|-------|
| B | -0.265158 |
| D | -0.169257 |
| D | -1.487602 |
| E | -0.878397 |


In [30]:
pd.merge(df1, indexed_df2, left_on="key", right_index=True)

|   | key | value_x | value_y |
|---|-----|---------|---------|
| 1 | B | -0.297982 | -0.265158 |
| 3 | D | 0.969485 | -0.169257 |
| 3 | D | 0.969485 | -1.487602 |
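For comparison, the same column-to-index join can probably also be written with `join()`. The sketch below rebuilds the frames so that it runs on its own (the random values will differ from those shown above), and the names `via_join` and `via_merge` are introduced only for this example. Because `join()` defaults to a left join and does not add suffixes automatically, `how="inner"` and the `_x`/`_y` suffixes are passed explicitly to mirror the `merge()` call.

```python
import numpy as np
import pandas as pd

# Rebuild the notebook's frames so this cell is self-contained
df1 = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": np.random.randn(4)})
df2 = pd.DataFrame({"key": ["B", "D", "D", "E"], "value": np.random.randn(4)})
indexed_df2 = df2.set_index("key")

# join() matches df1's "key" column against indexed_df2's index.
# how="inner" mirrors merge()'s default; the suffixes are required
# because both frames contain a "value" column.
via_join = df1.join(indexed_df2, on="key", how="inner", lsuffix="_x", rsuffix="_y")

# The merge() call from the cell above, for comparison
via_merge = pd.merge(df1, indexed_df2, left_on="key", right_index=True)

print(via_join)
```

Both calls keep only the keys `B` and `D`, with `D` appearing twice because it occurs twice in `indexed_df2`.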
