# Chapter 1

- Pandas inner join (the default `how`): `merged_df = left_df.merge(right_df, on=['col1','col2'], suffixes=('_left','_right'))`
- group by : 
    1. `df.groupby("col1").agg({'col2':'count'})`
    2. `df.groupby(["col1","col2"])["another_col"].agg(['min','max','sum'])`
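
A minimal sketch of the two patterns above on made-up data. The frames, the `val` column and the `counts`/`stats` names are hypothetical and only there for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration
left_df = pd.DataFrame({"col1": ["a", "a", "b"], "col2": [1, 2, 3], "val": [10, 20, 30]})
right_df = pd.DataFrame({"col1": ["a", "b", "c"], "col2": [1, 3, 5], "val": [100, 300, 500]})

# Inner join: only (col1, col2) pairs present in both frames survive;
# the overlapping non-key column "val" receives the suffixes
merged_df = left_df.merge(right_df, on=["col1", "col2"], suffixes=("_left", "_right"))

# Count the non-null col2 values in each col1 group
counts = left_df.groupby("col1").agg({"col2": "count"})

# Several aggregations of a single column, named by string
stats = left_df.groupby("col1")["val"].agg(["min", "max", "sum"])
```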


# Chapter 2

- Pandas left join: `merged_df = left_df.merge(right_df, on=['col1','col2'], validate=None, suffixes=('_left','_right'), how='left')`
- Substring search : `df[df['col'].str.contains("hello")]`
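- Pandas outer join: 
    ```
    # Join on columns. To join on the indexes instead, drop left_on/right_on and
    # pass left_index=True, right_index=True (pandas raises an error if both are given)
    merged_df = left_df.merge(right_df, left_on='left_col', right_on='right_col',
                          suffixes=('_left','_right'), how='outer')
    ```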
- When to use self join:
    1. Graph data
    2. Hierarchical relationship
    3. Sequential relationship
- Pandas rename column:
    1. `df.columns = ['A','B','C']`
    2. `df.rename(columns={'old_col': 'A', 'abc': 'B'}, inplace=True)`
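
A self join on a hierarchical relationship could look like the sketch below. The `employees` table, its columns and the convention that `manager_id` 0 means "no manager" are all assumptions made up for this example.

```python
import pandas as pd

# Hypothetical employee table: manager_id refers to another row's emp_id
# (0 is used here to mean "no manager")
employees = pd.DataFrame({
    "emp_id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cy"],
    "manager_id": [0, 1, 1],
})

# Self join: the table is merged with itself, matching each employee's
# manager_id against the emp_id of the manager's row
self_join_df = employees.merge(
    employees[["emp_id", "name"]],
    left_on="manager_id", right_on="emp_id",
    how="left", suffixes=("_emp", "_mgr"),
)
```

Because this is a left join, Ann stays in the result with a missing `name_mgr`.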

# Chapter 3

<center><img src="images/03.01.png"  style="width: 400px, height: 300px;"/></center>
<center><img src="images/03.02.png"  style="width: 400px, height: 300px;"/></center>

- Semi join:
    - Filters the left table to the rows that have a match in the right table; only the left table's columns are returned
    - No duplicates: a left row appears once even if it matches several right rows
- Anti join:
    - The inverse of a semi join: returns the left-table rows that have no match in the right table, again with only the left table's columns
    - No duplicates


### Semi Join

```
inner_join_df = left_df.merge(right_df, on='id')
semi_join_df = left_df[left_df['id'].isin(inner_join_df['id'])]
```

### Anti Join

```
inner_join_df = left_df.merge(right_df, on='id')
anti_join_df = left_df[~left_df['id'].isin(inner_join_df['id'])]
```
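
A small worked example of both filters on invented data. It checks membership against `right_df['id']` directly, which gives the same rows as going through the inner join first, because the inner join keeps exactly the ids present in both tables. The `in_right` name is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; id 2 appears twice on the right
left_df = pd.DataFrame({"id": [1, 2, 3, 4], "name": ["a", "b", "c", "d"]})
right_df = pd.DataFrame({"id": [2, 2, 4, 5], "score": [10, 20, 30, 40]})

# Boolean mask: True where the left id also exists in the right table
in_right = left_df["id"].isin(right_df["id"])

semi_join_df = left_df[in_right]    # left rows with a match
anti_join_df = left_df[~in_right]   # left rows without a match
```

Even though id 2 matches two right rows, it appears once in `semi_join_df`, which is the "no duplicates" property.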

### Concat Dataframes

```
# Horizontal concatenation = concat side by side, increase no of columns
horizontal_concat_df = pd.concat([df1,df2,df3], axis=1)
# Vertical concatenation = concat on top of another, increase no of rows
vertical_concat_df = pd.concat([df1,df2,df3], axis=0, verify_integrity=False, # True raises ValueError on duplicate index values
                        ignore_index=False, # Retain index of source dfs
                        keys=['df1','df2','df3']) # Provide key for each source df 
```

- Multi index group-by : `df.groupby(level=0).agg({'col':'mean'}) # Outermost = level 0`
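
As a sketch of how the two ideas fit together: concatenating with `keys` produces a MultiIndex whose outermost level is the key, and `groupby(level=0)` then aggregates per source frame. The data and the `stacked`/`means` names are made up for illustration.

```python
import pandas as pd

# Hypothetical source frames
df1 = pd.DataFrame({"col": [1, 3]})
df2 = pd.DataFrame({"col": [10, 30]})

# keys adds an outer index level identifying the source frame
stacked = pd.concat([df1, df2], axis=0, keys=["df1", "df2"])

# level=0 is the outermost index level, i.e. the keys given above
means = stacked.groupby(level=0).agg({"col": "mean"})
```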


### Verify Integrity

<center><img src="images/03.04.png"  style="width: 400px, height: 300px;"/></center>
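
A minimal sketch of the two integrity checks, on invented data: `verify_integrity=True` in `pd.concat` and `validate` in `merge`. The `concat_ok` and `merge_ok` flags are hypothetical names used only to record what happened.

```python
import pandas as pd

# Hypothetical frames whose indexes overlap on label 1
df1 = pd.DataFrame({"x": [1, 2]}, index=[0, 1])
df2 = pd.DataFrame({"x": [3, 4]}, index=[1, 2])

try:
    pd.concat([df1, df2], verify_integrity=True)
    concat_ok = True
except ValueError:
    # Raised because the combined index would contain duplicates
    concat_ok = False

# Hypothetical frames where id 1 repeats on the right side
left_df = pd.DataFrame({"id": [1, 2], "a": ["p", "q"]})
right_df = pd.DataFrame({"id": [1, 1], "b": ["r", "s"]})

try:
    left_df.merge(right_df, on="id", validate="one_to_one")
    merge_ok = True
except pd.errors.MergeError:
    # Raised because the relationship is not one-to-one
    merge_ok = False
```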


# Chapter 4

<center><img src="images/04.01.png"  style="width: 400px, height: 300px;"/></center>

- `pd.merge_ordered` performs an outer join by default
- The result dataframe is sorted by the "on" column
- Suited to ordered data such as time series
```
merged_df = pd.merge_ordered(left_df, right_df, on='date',
suffixes=('_left','_right'),
fill_method='ffill')
```
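
A small run of `merge_ordered` on made-up data, to show the outer join, the ordering and the forward fill. The `price` and `rate` columns are hypothetical.

```python
import pandas as pd

# Hypothetical time series with partly different dates
left_df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-03"]),
    "price": [100, 102],
})
right_df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-03"]),
    "rate": [1.5, 1.6],
})

# Outer join by default, sorted by date; gaps are filled from the previous row
merged_df = pd.merge_ordered(left_df, right_df, on="date", fill_method="ffill")
```

All three dates appear in the result. The missing price on 2024-01-02 is filled from the previous row, while `rate` on the first row stays missing because there is nothing earlier to fill from.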

<center><img src="images/04.02.png"  style="width: 400px, height: 300px;"/></center>


- `pd.merge_asof` behaves like an ordered left join
- Rows are matched on the nearest key value rather than on an exact match
- The key column (the column used to match on) must be sorted in both dataframes
- Useful for nearest-value sampling

```
merged_df = pd.merge_asof(left_df, right_df, on='date', # a single key column
    suffixes=('_left','_right'),
    direction='nearest') # default is 'backward'
```
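
A sketch comparing `direction='nearest'` with the default `'backward'` on invented timestamps. The `trade` and `quote` columns and the `nearest_df`/`backward_df` names are hypothetical.

```python
import pandas as pd

# Hypothetical data; both frames are already sorted by date
left_df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01 10:04", "2024-01-01 10:07"]),
    "trade": [1, 2],
})
right_df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01 09:58", "2024-01-01 10:05", "2024-01-01 10:10"]),
    "quote": [50, 51, 52],
})

# nearest: pick the right row whose date is closest in either direction
nearest_df = pd.merge_asof(left_df, right_df, on="date", direction="nearest")

# backward (default): pick the last right row at or before the left date
backward_df = pd.merge_asof(left_df, right_df, on="date")
```

For the 10:04 row, `nearest` picks the 10:05 quote while `backward` picks the 09:58 quote. Every left row is kept, as in a left join.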

- List comprehension over a dataframe column:
    - `['X' if val=='some_val' else 'Y' for val in df['col']]`
- Querying a dataframe: `df.query('col1=="A" or (col2=="B" and col3 < 90)')`
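
Both one-liners applied to a made-up frame, as a rough illustration. The data, the `flag` column and the `result` name are hypothetical.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "col1": ["A", "C", "C"],
    "col2": ["B", "B", "B"],
    "col3": [95, 80, 99],
})

# Build a new column from an existing one with a list comprehension
df["flag"] = ["X" if val == "A" else "Y" for val in df["col1"]]

# Keep rows where col1 is "A", or where col2 is "B" and col3 is below 90
result = df.query('col1=="A" or (col2=="B" and col3 < 90)')
```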


### Unpivoting from a wide table to a long table

<center><img src="images/04.04.png"  style="width: 400px, height: 300px;"/></center>


```
unpivot_df = df.melt(id_vars=['col1_to_keep','col2_to_keep'],
            value_vars=['col3_to_unpivot','col4_to_unpivot'],
            var_name='variable_col', value_name='value_col')
```
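
A small worked example of `melt` on an invented wide table. The `name`, `y2020` and `y2021` columns are hypothetical.

```python
import pandas as pd

# Hypothetical wide table: one column per year
df = pd.DataFrame({"name": ["a", "b"], "y2020": [1, 2], "y2021": [3, 4]})

# Long table: one row per (name, year) pair
unpivot_df = df.melt(id_vars=["name"],
                     value_vars=["y2020", "y2021"],
                     var_name="year", value_name="value")
```

The two year columns become four rows, with the old column names stored in `year` and the cell values in `value`.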