
# 4.4 Left Join Template

1. [Identify the *Left* Table](#1.-Identify-the-Left-Table)    
2. [Identify the *Right* Table](#2.-Identify-the-Right-Table)   
3. [Identify the *Merge* Field/Column](#3.-Identify-the-Merge-Field)   
4. [Merge the Two Dataframes](#4.-Merge-the-Two-Dataframes)  
5. [Verify that the Merge is Correct](#5.-Verify-that-the-Merge-is-Correct)  
6. [Save the Merged Dataframe as a Pickle File](#6.-Save-the-Merged-Dataframe-as-a-Pickle-File)

In [2]:
import pandas as pd

# 1. Identify the Left Table  

### Questions to answer:  
1. What is the Left Table? Products
2. Why is this the Left table/data set?  We are merging Products and Categories, and every Products row must be kept while the category details are added to it.
3. Number of Rows:  77  
4. Number of Columns:  6 

In [29]:
file = 'Data/w3schools_Data.xlsx'

# Open the workbook and list the sheets it contains
xls = pd.ExcelFile(file)

xls.sheet_names

['Customers',
 'Categories',
 'Employees',
 'OrderDetails',
 'Orders',
 'Products',
 'Shippers',
 'Suppliers']

In [43]:
# Read the Left Table data into a dataframe named: df_Left
df_Left = pd.read_excel(file, 'Products', skiprows=2)

In [44]:
# Display the first rows in the dataframe
df_Left.head()

|  | ProductID | ProductName | SupplierID | CategoryID | Unit | Price |
|---|---|---|---|---|---|---|
| 0 | 1 | Chais | 1 | 1 | 10 boxes x 20 bags | 18.0 |
| 1 | 2 | Chang | 1 | 1 | 24 - 12 oz bottles | 19.0 |
| 2 | 3 | Aniseed Syrup | 1 | 2 | 12 - 550 ml bottles | 10.0 |
| 3 | 4 | Chef Anton's Cajun Seasoning | 2 | 2 | 48 - 6 oz jars | 22.0 |
| 4 | 5 | Chef Anton's Gumbo Mix | 2 | 2 | 36 boxes | 21.35 |


In [45]:
print("Number of Rows in the Left dataframe:  ", df_Left.shape[0])
print("Number of Columns in Left dataframe:  ", df_Left.shape[1])

Number of Rows in the Left dataframe:   77
Number of Columns in Left dataframe:   6


# 2. Identify the Right Table  

### Questions to answer:  
1. What is the Right Table? Categories
2. Why is this the Right table/data set?  We are merging Products and Categories, and Categories supplies the extra columns that are looked up for each product.
3. What columns does it contain that we want to add to the columns in the Left table/dataframe?  CategoryName and Description
4. Number of Rows:  8   
5. Number of Columns:  3

In [46]:
# Read the Right Table data into a dataframe named: df_Right
df_Right = pd.read_excel(file, 'Categories')

In [47]:
# Display the first rows in the dataframe
df_Right.head()

|  | CategoryID | CategoryName | Description |
|---|---|---|---|
| 0 | 1 | Beverages | Soft drinks, coffees, teas, beers, and ales |
| 1 | 2 | Condiments | Sweet and savory sauces, relishes, spreads, an... |
| 2 | 3 | Confections | Desserts, candies, and sweet breads |
| 3 | 4 | Dairy Products | Cheeses |
| 4 | 5 | Grains/Cereals | Breads, crackers, pasta, and cereal |


In [48]:
print("Number of Rows in the Right dataframe:  ", df_Right.shape[0])
print("Number of Columns in Right dataframe:  ", df_Right.shape[1])

Number of Rows in the Right dataframe:   8
Number of Columns in Right dataframe:   3


# 3. Identify the Merge Field  

### Questions to answer:  
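1. What is the Merge Column?  **CategoryID**
2. Is this Merge Column in both dataframes? **Yes**  
3. Why is this the correct Merge Column? **It is the only column the two dataframes share, and each CategoryID value identifies exactly one row in Categories.**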
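
One way to answer the questions above with code rather than by eye is sketched below. It assumes `CategoryID` is the candidate key (the column used in the merge call in step 4) and uses small hypothetical stand-ins for the two tables, named `left` and `right` here, instead of the real workbook data.

In [ ]:
```python
import pandas as pd

# Hypothetical miniature versions of the two tables (not the real workbook data)
left = pd.DataFrame({
    "ProductID": [1, 2, 3],
    "ProductName": ["Chais", "Chang", "Aniseed Syrup"],
    "CategoryID": [1, 1, 2],
})
right = pd.DataFrame({
    "CategoryID": [1, 2],
    "CategoryName": ["Beverages", "Condiments"],
})

# Columns present in both dataframes are the merge-column candidates
common_cols = [col for col in left.columns if col in right.columns]
print("Shared columns:", common_cols)

# The key should identify exactly one row in the right (lookup) table
key_is_unique = right["CategoryID"].is_unique
print("Key is unique in the right table:", key_is_unique)

# Every key value used on the left should also exist on the right
all_keys_found = left["CategoryID"].isin(right["CategoryID"]).all()
print("All left keys found in the right table:", all_keys_found)
```

If more than one column is shared, or the key is not unique on the right, the choice of merge column deserves a closer look before merging.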

# 4. Merge the Two Dataframes 

In [49]:
# Create a new Dataframe from a Left Join of two existing dataframes
df_left_join = pd.merge(df_Left, df_Right, on='CategoryID', how='left')

In [50]:
# Display the first rows in the merged dataframe
df_left_join.head()

|  | ProductID | ProductName | SupplierID | CategoryID | Unit | Price | CategoryName | Description |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Chais | 1 | 1 | 10 boxes x 20 bags | 18.0 | Beverages | Soft drinks, coffees, teas, beers, and ales |
| 1 | 2 | Chang | 1 | 1 | 24 - 12 oz bottles | 19.0 | Beverages | Soft drinks, coffees, teas, beers, and ales |
| 2 | 3 | Aniseed Syrup | 1 | 2 | 12 - 550 ml bottles | 10.0 | Condiments | Sweet and savory sauces, relishes, spreads, an... |
| 3 | 4 | Chef Anton's Cajun Seasoning | 2 | 2 | 48 - 6 oz jars | 22.0 | Condiments | Sweet and savory sauces, relishes, spreads, an... |
| 4 | 5 | Chef Anton's Gumbo Mix | 2 | 2 | 36 boxes | 21.35 | Condiments | Sweet and savory sauces, relishes, spreads, an... |
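
The merge above happens to find a category for every product, so it does not show what makes a left join different from an inner join. The sketch below uses hypothetical toy data (the `Mystery Item` row and `CategoryID` 99 are invented for illustration) to show that a left row with no match is kept and its right-hand columns are filled with missing values.

In [ ]:
```python
import pandas as pd

# Hypothetical toy data: CategoryID 99 has no matching category
products = pd.DataFrame({
    "ProductName": ["Chais", "Aniseed Syrup", "Mystery Item"],
    "CategoryID": [1, 2, 99],
})
categories = pd.DataFrame({
    "CategoryID": [1, 2, 3],
    "CategoryName": ["Beverages", "Condiments", "Confections"],
})

# how='left' keeps every row of the left dataframe, matched or not
joined = pd.merge(products, categories, on="CategoryID", how="left")
print(joined)

# The unmatched left row survives with a missing CategoryName
missing_count = joined["CategoryName"].isna().sum()
print("Rows without a matching category:", missing_count)
```

Category 3 (Confections) does not appear in the result, because right-hand rows that no left row refers to are dropped by a left join.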


# 5. Verify that the Merge is Correct  

1. Display the first several rows of the merged dataframe.
2. Display the number of rows and columns in the merged dataframe  


### Questions to answer:  
1. Number of Rows:  77
   1. How many should there be? 77 (the same as the Left Table)  
2. Number of Columns:   8
   1. How many should there be?  Left columns + Right columns - 1 (the Merge Column is not duplicated): 6 + 3 - 1 = 8
3. Are all the original rows and columns from the Left Table still here? Yes 
4. Are the additional columns from the Right table now here? Yes
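
The rule that the merged row count equals the Left row count holds only when each key value appears at most once in the Right table. The sketch below, on hypothetical toy data, shows the row count growing when that assumption fails, and how the `validate` and `indicator` parameters of `pd.merge` can catch the problem and report unmatched rows.

In [ ]:
```python
import pandas as pd

# Hypothetical toy data: CategoryID 1 appears twice in the right table
left = pd.DataFrame({"ProductID": [1, 2, 3], "CategoryID": [1, 1, 2]})
right_dupes = pd.DataFrame({
    "CategoryID": [1, 1, 2],
    "CategoryName": ["Beverages", "Drinks", "Condiments"],
})

# Duplicate keys on the right multiply the matching left rows
grown = pd.merge(left, right_dupes, on="CategoryID", how="left")
print("Rows before:", len(left), "Rows after:", len(grown))

# validate='many_to_one' raises MergeError if right keys are not unique
try:
    pd.merge(left, right_dupes, on="CategoryID", how="left",
             validate="many_to_one")
    duplicate_keys_detected = False
except pd.errors.MergeError:
    duplicate_keys_detected = True
print("Duplicate keys detected:", duplicate_keys_detected)

# With unique right keys the merge passes validation; indicator=True adds
# a _merge column marking each row as 'both' or 'left_only'
right_clean = right_dupes.drop_duplicates("CategoryID")
checked = pd.merge(left, right_clean, on="CategoryID", how="left",
                   validate="many_to_one", indicator=True)
unmatched = (checked["_merge"] == "left_only").sum()
print("Left rows with no match:", unmatched)
```

Passing `validate="many_to_one"` in the real merge would make it fail loudly instead of silently producing extra rows.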

In [51]:
# Display Left dataframe info
print("Left dataframe: ")
print("Number of Rows in Left dataframe:  ", df_Left.shape[0])
print("Number of Cols in Left dataframe:  ", df_Left.shape[1])
print()

# Display Right dataframe info
print("Right dataframe: ")
print("Number of Rows in Right dataframe:  ", df_Right.shape[0])
print("Number of Cols in Right dataframe:  ", df_Right.shape[1])
print()

# Display Merged dataframe info
print("Merged dataframe: ")
print("Number of Rows in Merged dataframe:  ", df_left_join.shape[0], "(Note: This should be same as the Left dataframe)")
print("Number of Cols in Merged dataframe:  ", df_left_join.shape[1], "(Note: This should be Left Cols + Right Cols - 1)")
print()

# Check for correct number of Rows and Cols
expected_merge_cols = df_Left.shape[1] + df_Right.shape[1] - 1

assert df_left_join.shape[0] == df_Left.shape[0], "The Number of Rows in the Merge is NOT Correct!!"
assert df_left_join.shape[1] == expected_merge_cols, "The Number of Columns in the Merge is NOT Correct!!"

print()
print("YOUR MERGE WAS SUCCESSFUL!!!")

Left dataframe: 
Number of Rows in Left dataframe:   77
Number of Cols in Left dataframe:   6

Right dataframe: 
Number of Rows in Right dataframe:   8
Number of Cols in Right dataframe:   3

Merged dataframe: 
Number of Rows in Merged dataframe:   77 (Note: This should be same as the Left dataframe)
Number of Cols in Merged dataframe:   8 (Note: This should be Left Cols + Right Cols - 1)


YOUR MERGE WAS SUCCESSFUL!!!


# 6. Save the Merged Dataframe as a Pickle File

In [52]:
df_left_join.to_pickle('./Data/Ex_4.4_InClass_Merge')
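
A quick way to confirm that a pickle file was written correctly is to read it back and compare it with the original. The sketch below does this for a small hypothetical dataframe in a temporary directory, so it does not touch the `Data` folder; the file name `merged_example.pkl` is an assumption for illustration only.

In [ ]:
```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the merged dataframe
merged = pd.DataFrame({
    "ProductName": ["Chais", "Chang"],
    "CategoryID": [1, 1],
    "CategoryName": ["Beverages", "Beverages"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = Path(tmp_dir) / "merged_example.pkl"

    # Write the dataframe, then load it back from disk
    merged.to_pickle(pickle_path)
    restored = pd.read_pickle(pickle_path)

# equals() compares shape, values, and dtypes
round_trip_ok = restored.equals(merged)
print("Round trip preserved the dataframe:", round_trip_ok)
```

Only load pickle files from sources you trust, because unpickling can execute arbitrary code.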