### Merge operation in pandas

The pandas `DataFrame` class provides a `merge` method for combining two DataFrames:<br>
`DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)` <br>


Rather than going through every parameter in the signature above, <i>let's try out some examples</i>:

### Joins
![joins](./img/joins.png)

In [1]:
### Let's start by creating 2 dataframes and then proceed with the merge operation

In [2]:
# importing libs
import pandas as pd

In [50]:
# Create an employee dataframe
# List of Tuples
employees = [(111, 'EmpA', 24, 'Phoenix', 3),
             (112, 'EmpB', 21, 'Kolkata', 9),
             (113, 'EmpC', 26, 'San Fransisco', 11),
             (114, 'EmpD', 22, 'New York', 14),
             (115, 'EmpE', 23, 'London', 6),
             (116, 'EmpF', 25, 'Cairo', 8),
             (117, 'EmpG', 25, 'Denver', 10)
            ]
# Create a DataFrame object
empdf = pd.DataFrame(employees, columns=['EmpID', 'EmpName', 'EmpAge', 'EmpCity', 'Emp_Experience'], index=['a1', 'b1', 'c1', 'd1', 'e1', 'f1', 'g1'])

In [51]:
# Create a salary dataframe
# List of Tuples
salaries = [(111, 3, 70000, 1000),
            (112, 9, 72200, 1100),
            (113, 11, 84999, 1000),
            (114, 14, 90000, 2000),
            (115, 6, 61000, 1500),
            (116, 8, 71000, 1000),
            (121, 10, 81000, 2000)
           ]
# Create a DataFrame object
salarydf = pd.DataFrame(salaries, columns=['EmpID', 'Emp_Experience' , 'EmpSalary', 'EmpBonus'], index=['a1', 'b1', 'c1', 'd1', 'e1', 'f1', 'g1'])

In [52]:
# Create a parking dataframe
# List of Tuples
parking = [(111, "A", "Honda"),
           (112, "B", "Toyota"),
           (113, "C", "BMW"),
           (114, "C", "Lincoln"),
           (115, "A", "Mazda"),
           (116, "B", "Chevy"),
           (121, "A", "Ram Truck")
          ]
# Create a DataFrame object
parkingdf = pd.DataFrame(parking, columns=['ID', 'Parking_Col', 'Car_Make'])

In [53]:
empdf

Unnamed: 0,EmpID,EmpName,EmpAge,EmpCity,Emp_Experience
a1,111,EmpA,24,Phoenix,3
b1,112,EmpB,21,Kolkata,9
c1,113,EmpC,26,San Fransisco,11
d1,114,EmpD,22,New York,14
e1,115,EmpE,23,London,6
f1,116,EmpF,25,Cairo,8
g1,117,EmpG,25,Denver,10


In [54]:
salarydf

Unnamed: 0,EmpID,Emp_Experience,EmpSalary,EmpBonus
a1,111,3,70000,1000
b1,112,9,72200,1100
c1,113,11,84999,1000
d1,114,14,90000,2000
e1,115,6,61000,1500
f1,116,8,71000,1000
g1,121,10,81000,2000


In [55]:
parkingdf

Unnamed: 0,ID,Parking_Col,Car_Make
0,111,A,Honda
1,112,B,Toyota
2,113,C,BMW
3,114,C,Lincoln
4,115,A,Mazda
5,116,B,Chevy
6,121,A,Ram Truck


### Inner join

In [56]:
# Example 1: Inner join
# This is the default join type. Think of it as the intersection of two sets (see the image above).
# An inner join keeps only the rows from the left and right dataframes that have matching values in the key columns.
# Rows without a match in the other dataframe's key columns are discarded.

In [57]:
empdf.merge(salarydf)

Unnamed: 0,EmpID,EmpName,EmpAge,EmpCity,Emp_Experience,EmpSalary,EmpBonus
0,111,EmpA,24,Phoenix,3,70000,1000
1,112,EmpB,21,Kolkata,9,72200,1100
2,113,EmpC,26,San Fransisco,11,84999,1000
3,114,EmpD,22,New York,14,90000,2000
4,115,EmpE,23,London,6,61000,1500
5,116,EmpF,25,Cairo,8,71000,1000


- In the example above, no key was specified, so the inner join used the columns common to both dataframes: `EmpID` and `Emp_Experience`. Only rows whose `EmpID` and `Emp_Experience` values match in both dataframes appear in the merged dataframe.
- As expected, employees `117` and `121` are **left out** when merge is invoked without any parameters. The output is the same when the join type is explicitly specified as `inner`.

In [58]:
empdf.merge(salarydf,how="inner")

Unnamed: 0,EmpID,EmpName,EmpAge,EmpCity,Emp_Experience,EmpSalary,EmpBonus
0,111,EmpA,24,Phoenix,3,70000,1000
1,112,EmpB,21,Kolkata,9,72200,1100
2,113,EmpC,26,San Fransisco,11,84999,1000
3,114,EmpD,22,New York,14,90000,2000
4,115,EmpE,23,London,6,61000,1500
5,116,EmpF,25,Cairo,8,71000,1000
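
To check which columns merge picks as keys when `on` is omitted, here is a minimal sketch. It uses small stand-in frames (`emp_demo` and `sal_demo` are made up for illustration and are not the dataframes above) and compares the implicit merge with one that names the shared columns explicitly.

```python
import pandas as pd

# Hypothetical stand-in frames with two shared column names
emp_demo = pd.DataFrame({
    "EmpID": [111, 112, 117],
    "Emp_Experience": [3, 9, 10],
    "EmpName": ["EmpA", "EmpB", "EmpG"],
})
sal_demo = pd.DataFrame({
    "EmpID": [111, 112, 121],
    "Emp_Experience": [3, 9, 10],
    "EmpSalary": [70000, 72200, 81000],
})

# Column names present in both frames
common_cols = [col for col in emp_demo.columns if col in sal_demo.columns]
print(common_cols)  # ['EmpID', 'Emp_Experience']

implicit = emp_demo.merge(sal_demo)                               # no keys given
explicit = emp_demo.merge(sal_demo, how="inner", on=common_cols)  # keys spelled out

# Both calls should produce the same rows and columns
print(implicit.equals(explicit))
```

Naming the keys with `on` is usually the safer habit: if a new shared column is added to either frame later, the implicit version would silently start joining on it too.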


### Left join

In [59]:
# Keep all rows from the left dataframe; columns coming from the right dataframe are filled with NaN where no matching key exists.
# 117 doesn't have an entry in salarydf, so EmpSalary and EmpBonus are NaN for that row.
# 121 (present only in the right dataframe) is not included.
empdf.merge(salarydf,how="left")

Unnamed: 0,EmpID,EmpName,EmpAge,EmpCity,Emp_Experience,EmpSalary,EmpBonus
0,111,EmpA,24,Phoenix,3,70000.0,1000.0
1,112,EmpB,21,Kolkata,9,72200.0,1100.0
2,113,EmpC,26,San Fransisco,11,84999.0,1000.0
3,114,EmpD,22,New York,14,90000.0,2000.0
4,115,EmpE,23,London,6,61000.0,1500.0
5,116,EmpF,25,Cairo,8,71000.0,1000.0
6,117,EmpG,25,Denver,10,,
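
Notice that `EmpSalary` and `EmpBonus` now show up as `70000.0`, `1000.0` and so on. NaN is a floating-point value, so an integer column that receives NaN is converted to float. A small sketch of this, using made-up frames (`left_demo` and `right_demo` are assumptions for illustration):

```python
import pandas as pd

left_demo = pd.DataFrame({"EmpID": [111, 117]})
right_demo = pd.DataFrame({"EmpID": [111], "EmpSalary": [70000]})

merged = left_demo.merge(right_demo, how="left", on="EmpID")

# The salary column starts as an integer dtype ...
print(right_demo["EmpSalary"].dtype)
# ... and becomes a float dtype once a NaN has to be stored in it
print(merged["EmpSalary"].dtype)
print(merged)
```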


In [60]:
# The same applies with the salary dataframe on the left: 121 is kept and its employee columns are NaN
# Notice that the column order changes accordingly
salarydf.merge(empdf, how="left")

Unnamed: 0,EmpID,Emp_Experience,EmpSalary,EmpBonus,EmpName,EmpAge,EmpCity
0,111,3,70000,1000,EmpA,24.0,Phoenix
1,112,9,72200,1100,EmpB,21.0,Kolkata
2,113,11,84999,1000,EmpC,26.0,San Fransisco
3,114,14,90000,2000,EmpD,22.0,New York
4,115,6,61000,1500,EmpE,23.0,London
5,116,8,71000,1000,EmpF,25.0,Cairo
6,121,10,81000,2000,,,


### Right join

In [61]:
# Keep all rows from the right dataframe; columns coming from the left dataframe are filled with NaN where no matching key exists.
# 121 doesn't have an entry in empdf, so EmpName, EmpAge and EmpCity are NaN for that row.
# 117 (present only in the left dataframe) is not included.
empdf.merge(salarydf,how="right")

Unnamed: 0,EmpID,EmpName,EmpAge,EmpCity,Emp_Experience,EmpSalary,EmpBonus
0,111,EmpA,24.0,Phoenix,3,70000,1000
1,112,EmpB,21.0,Kolkata,9,72200,1100
2,113,EmpC,26.0,San Fransisco,11,84999,1000
3,114,EmpD,22.0,New York,14,90000,2000
4,115,EmpE,23.0,London,6,61000,1500
5,116,EmpF,25.0,Cairo,8,71000,1000
6,121,,,,10,81000,2000


**Note: The column order is still empdf followed by salarydf (as that's how merge was invoked). Only the set of rows changes: the extra row in salarydf (121) is now kept, while the extra row in empdf (117) is dropped.**
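
A right join is a left join with the two frames swapped; only the column order differs. The sketch below illustrates this on two small made-up frames (`left` and `right` are stand-ins, not the dataframes above).

```python
import pandas as pd

left = pd.DataFrame({"EmpID": [111, 112, 117], "EmpName": ["EmpA", "EmpB", "EmpG"]})
right = pd.DataFrame({"EmpID": [111, 112, 121], "EmpSalary": [70000, 72200, 81000]})

right_join = left.merge(right, how="right", on="EmpID")
swapped_left_join = right.merge(left, how="left", on="EmpID")

print(right_join.columns.tolist())         # left frame's columns come first
print(swapped_left_join.columns.tolist())  # right frame's columns come first

# After putting the columns in the same order, the two results hold the same rows
same_rows = right_join.equals(swapped_left_join[right_join.columns])
print(same_rows)
```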

### Outer join

In [62]:
# Think of this as a UNION across the dataframes (rows from either dataframe or both)
# Missing values are replaced with NaN
# Note the rows for 117 and 121, where NaN fills the columns that have no match

In [63]:
empdf.merge(salarydf,how="outer")

Unnamed: 0,EmpID,EmpName,EmpAge,EmpCity,Emp_Experience,EmpSalary,EmpBonus
0,111,EmpA,24.0,Phoenix,3,70000.0,1000.0
1,112,EmpB,21.0,Kolkata,9,72200.0,1100.0
2,113,EmpC,26.0,San Fransisco,11,84999.0,1000.0
3,114,EmpD,22.0,New York,14,90000.0,2000.0
4,115,EmpE,23.0,London,6,61000.0,1500.0
5,116,EmpF,25.0,Cairo,8,71000.0,1000.0
6,117,EmpG,25.0,Denver,10,,
7,121,,,,10,81000.0,2000.0


### When join column is different across dataframes

In [64]:
# Publish a list with employee details and associated parking
empdf.merge(parkingdf,how="inner",left_on="EmpID",right_on="ID")

Unnamed: 0,EmpID,EmpName,EmpAge,EmpCity,Emp_Experience,ID,Parking_Col,Car_Make
0,111,EmpA,24,Phoenix,3,111,A,Honda
1,112,EmpB,21,Kolkata,9,112,B,Toyota
2,113,EmpC,26,San Fransisco,11,113,C,BMW
3,114,EmpD,22,New York,14,114,C,Lincoln
4,115,EmpE,23,London,6,115,A,Mazda
5,116,EmpF,25,Cairo,8,116,B,Chevy


**Note**
- The `on` parameter won't work here because the key columns have different names (`EmpID` and `ID`)
- Both `EmpID` and `ID` columns have been pulled into the resulting dataframe

In [65]:
# Drop the extra ID column
empdf.merge(parkingdf,how = "inner", left_on = "EmpID", right_on = "ID").drop("ID",axis=1)

Unnamed: 0,EmpID,EmpName,EmpAge,EmpCity,Emp_Experience,Parking_Col,Car_Make
0,111,EmpA,24,Phoenix,3,A,Honda
1,112,EmpB,21,Kolkata,9,B,Toyota
2,113,EmpC,26,San Fransisco,11,C,BMW
3,114,EmpD,22,New York,14,C,Lincoln
4,115,EmpE,23,London,6,A,Mazda
5,116,EmpF,25,Cairo,8,B,Chevy
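
An alternative to dropping the extra column afterwards is to rename the key on one side before merging, so that `on` can be used and only one key column ends up in the result. A minimal sketch with made-up frames (`emp_demo` and `parking_demo` are stand-ins for illustration):

```python
import pandas as pd

emp_demo = pd.DataFrame({"EmpID": [111, 112, 117], "EmpName": ["EmpA", "EmpB", "EmpG"]})
parking_demo = pd.DataFrame({"ID": [111, 112, 121], "Car_Make": ["Honda", "Toyota", "Ram Truck"]})

# Rename the right-hand key so that both frames share the name "EmpID".
# rename returns a new frame; parking_demo itself is left unchanged.
renamed = parking_demo.rename(columns={"ID": "EmpID"})

merged = emp_demo.merge(renamed, how="inner", on="EmpID")
print(merged.columns.tolist())  # ['EmpID', 'EmpName', 'Car_Make']
```

Which variant reads better is a matter of taste; the rename keeps the merge call itself shorter, while `left_on`/`right_on` leaves both source frames untouched in spirit and in name.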


### Merging on indexes
- Let's create 2 dataframes containing details of students and the books issued to them from the library

In [79]:
studdf = pd.DataFrame({
    "Roll":[1,2,3,4,5,6],
    "Score":[75,85,95,96,89,78],
    "Section":["A","B","C","A","C","B"]
            })

In [80]:
studdf.set_index("Roll",inplace=True)

In [81]:
studdf

Roll,Score,Section
1,75,A
2,85,B
3,95,C
4,96,A
5,89,C
6,78,B


In [99]:
libdf = pd.DataFrame({
    "LibID":[11,12,13,14,15,16],
    "Roll":[1,2,3,4,5,6],
    "Num_Books":[2,3,1,2,0,1],
})

In [100]:
libdf.set_index("Roll",inplace=True)

In [101]:
libdf

Roll,LibID,Num_Books
1,11,2
2,12,3
3,13,1
4,14,2
5,15,0
6,16,1


In [102]:
# Now if we want to merge studdf and libdf, we need to try other parameters to merge
# Note that the common column is the index column across both dataframes
# studdf.merge(libdf): This will throw MergeError: No common columns to perform merge on.
# Therefore we have to use left_index and right_index parameters
studdf.merge(libdf, how = "inner",left_index=True,right_index=True)

Roll,Score,Section,LibID,Num_Books
1,75,A,11,2
2,85,B,12,3
3,95,C,13,1
4,96,A,14,2
5,89,C,15,0
6,78,B,16,1


**Note: There might be a situation where we might have to do a mix of left_index/right_index, and left_on/right_on.**
- Check the example below
- We shall start by changing the index of libdf dataframe

In [104]:
libdf.reset_index(inplace=True)

In [105]:
libdf

Unnamed: 0,Roll,LibID,Num_Books
0,1,11,2
1,2,12,3
2,3,13,1
3,4,14,2
4,5,15,0
5,6,16,1


In [106]:
libdf.set_index("LibID",inplace=True)

In [107]:
libdf

LibID,Roll,Num_Books
11,1,2
12,2,3
13,3,1
14,4,2
15,5,0
16,6,1


In [None]:
# Note that the two dataframes now share neither a common column nor a common index
# The key is the index (Roll) of studdf and the Roll column of libdf
# Therefore, to merge studdf and libdf, we need a mix of the _index and _on parameters

In [109]:
stud_merged = studdf.merge(libdf,how="inner",left_index=True,right_on=["Roll"])
stud_merged

LibID,Score,Section,Roll,Num_Books
11,75,A,1,2
12,85,B,2,3
13,95,C,3,1
14,96,A,4,2
15,89,C,5,0
16,78,B,6,1


In [110]:
stud_merged.index

Int64Index([11, 12, 13, 14, 15, 16], dtype='int64', name='LibID')
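
As the output above shows, the merged frame carries `LibID` as its index and `Roll` as a regular column. If you would rather have every key as an ordinary column, one option is to reset both indexes first and then merge on the shared column. A sketch with small made-up frames (`stud_demo` and `lib_demo` are stand-ins for illustration):

```python
import pandas as pd

stud_demo = pd.DataFrame({"Roll": [1, 2, 3], "Score": [75, 85, 95]}).set_index("Roll")
lib_demo = pd.DataFrame(
    {"LibID": [11, 12, 13], "Roll": [1, 2, 3], "Num_Books": [2, 3, 1]}
).set_index("LibID")

# Move both indexes back into ordinary columns, then merge on the shared "Roll" column
flat = stud_demo.reset_index().merge(lib_demo.reset_index(), how="inner", on="Roll")
print(flat)
```

The result has a plain 0-based index, and `Roll` and `LibID` are both available as columns.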

### Some more examples

In [19]:
week1 = pd.read_csv("./img/data_for_analysis/Restaurant - Week 1 Sales.csv")
week2 = pd.read_csv("./img/data_for_analysis/Restaurant - Week 2 Sales.csv")
customers = pd.read_csv("./img/data_for_analysis/Restaurant - Customers.csv")
food = pd.read_csv("./img/data_for_analysis/Restaurant - Foods.csv")

In [20]:
week1.head(n=2)

Unnamed: 0,Customer ID,Food ID
0,537,9
1,97,4


In [21]:
week2.head(n=2)

Unnamed: 0,Customer ID,Food ID
0,688,10
1,813,7


In [22]:
# Example: Find the customers who ordered the same food in both weeks
# The inner join should therefore happen on both Customer ID and Food ID
# Rows with the same Customer ID but a different Food ID, or the same Food ID but a different Customer ID,
# will not be selected

# Option 1:
week1.merge(week2)

Unnamed: 0,Customer ID,Food ID
0,304,3
1,540,3
2,937,10
3,233,3
4,21,4
5,21,4
6,922,1
7,578,5
8,578,5


In [23]:
# Option 2:
week1.merge(week2,how="inner",on=["Customer ID","Food ID"])

Unnamed: 0,Customer ID,Food ID
0,304,3
1,540,3
2,937,10
3,233,3
4,21,4
5,21,4
6,922,1
7,578,5
8,578,5


In [24]:
# Let's doublecheck
week1[week1["Customer ID"]==304]

Unnamed: 0,Customer ID,Food ID
55,304,3
113,304,2


In [25]:
week2[week2["Customer ID"]==304]

Unnamed: 0,Customer ID,Food ID
88,304,3


**As observed above, only the row with `Food ID` 3 is selected for customer 304, since that is the only (Customer ID, Food ID) pair present in both weeks.**

In [26]:
# Example: Find all customers who visited the restaurant in both weeks

In [27]:
week1.merge(week2, how = "inner", on = ["Customer ID"])

Unnamed: 0,Customer ID,Food ID_x,Food ID_y
0,537,9,5
1,155,9,3
2,155,1,3
3,503,5,8
4,503,5,9
...,...,...,...
57,945,5,4
58,343,3,5
59,343,3,2
60,343,3,7


In [28]:
week1[week1["Customer ID"]== 155] 

Unnamed: 0,Customer ID,Food ID
4,155,9
17,155,1


In [29]:
week2[week2["Customer ID"]== 155] 

Unnamed: 0,Customer ID,Food ID
208,155,3


**Note**: Check the new column names `Food ID_x` and `Food ID_y`.
- If a non-key column with the same name is present in both dataframes, pandas adds a suffix to each column name to tell them apart (`_x` for the left dataframe, `_y` for the right).
- Also note how the join happens for `Customer ID 155`. Since there are 2 entries in week1 and 1 entry in week2, pandas pairs every matching week1 row with every matching week2 row (a Cartesian product per key), giving 2 rows.
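
The row multiplication for repeated keys is easy to reproduce on a tiny example. The sketch below uses made-up frames (`week1_demo` and `week2_demo` are stand-ins, not the CSV data) and also tries the `validate` parameter from the signature at the top, which makes merge raise an error when the keys are not unique in the way you expected.

```python
import pandas as pd

week1_demo = pd.DataFrame({"Customer ID": [155, 155, 537], "Food ID": [9, 1, 9]})
week2_demo = pd.DataFrame({"Customer ID": [155, 537], "Food ID": [3, 5]})

# 155 appears twice on the left and once on the right, so it yields 2 x 1 = 2 rows
merged = week1_demo.merge(week2_demo, how="inner", on="Customer ID")
print(merged)

# validate="one_to_one" asks pandas to check that the keys are unique on both sides.
# Here the left keys are not unique, so a MergeError is expected.
try:
    week1_demo.merge(week2_demo, on="Customer ID", validate="one_to_one")
    validation_error = None
except pd.errors.MergeError as exc:
    validation_error = exc

print(type(validation_error).__name__)
```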

**What if we want to use other suffixes for the column names?**<br>
Use the `suffixes` parameter to add custom suffixes.

In [30]:
week1.merge(week2,how="inner",on=["Customer ID"],suffixes = ("_1st_week","_2nd_week") )

Unnamed: 0,Customer ID,Food ID_1st_week,Food ID_2nd_week
0,537,9,5
1,155,9,3
2,155,1,3
3,503,5,8
4,503,5,9
...,...,...,...
57,945,5,4
58,343,3,5
59,343,3,2
60,343,3,7


### Outer join

In [31]:
outer_joindf = week1.merge(week2, how = "outer", on =["Customer ID"])
outer_joindf

Unnamed: 0,Customer ID,Food ID_x,Food ID_y
0,537,9.0,5.0
1,97,4.0,
2,658,1.0,
3,202,2.0,
4,155,9.0,3.0
...,...,...,...
449,855,,4.0
450,559,,10.0
451,276,,4.0
452,556,,10.0


**Note**:
- The above merge acts like a UNION (FULL OUTER JOIN, including the common rows).
- Rows with a `Customer ID` common to both dataframes have been placed side by side (e.g. `Customer ID 537`).
- For a `Customer ID` present in only one dataframe, the column coming from the other dataframe is filled with NaN.

To understand how the rows have been joined together, let's explore one more option for merge:<br>
the **indicator** parameter

In [32]:
outer_explicitdf = week1.merge(week2, how = "outer", on =["Customer ID"],indicator=True)
outer_explicitdf

Unnamed: 0,Customer ID,Food ID_x,Food ID_y,_merge
0,537,9.0,5.0,both
1,97,4.0,,left_only
2,658,1.0,,left_only
3,202,2.0,,left_only
4,155,9.0,3.0,both
...,...,...,...,...
449,855,,4.0,right_only
450,559,,10.0,right_only
451,276,,4.0,right_only
452,556,,10.0,right_only


In [33]:
outer_explicitdf[outer_explicitdf["_merge"] == "both"]

Unnamed: 0,Customer ID,Food ID_x,Food ID_y,_merge
0,537,9.0,5.0,both
4,155,9.0,3.0,both
5,155,1.0,3.0,both
8,503,5.0,8.0,both
9,503,5.0,9.0,both
...,...,...,...,...
246,945,5.0,4.0,both
247,343,3.0,5.0,both
248,343,3.0,2.0,both
249,343,3.0,7.0,both


In [34]:
outer_explicitdf["_merge"].value_counts()

right_only    197
left_only     195
both           62
Name: _merge, dtype: int64

**As observed**:
The source of each row is now explicit.
- Customer ID 97 was found only in the left dataframe (week1), while Customer ID 855 was found only in the right dataframe (week2).

In [35]:
# Full outer join excluding common rows (only present in either dataframe)
outer_explicitdf[outer_explicitdf["_merge"] != "both"]

Unnamed: 0,Customer ID,Food ID_x,Food ID_y,_merge
1,97,4.0,,left_only
2,658,1.0,,left_only
3,202,2.0,,left_only
6,213,8.0,,left_only
7,600,1.0,,left_only
...,...,...,...,...
449,855,,4.0,right_only
450,559,,10.0,right_only
451,276,,4.0,right_only
452,556,,10.0,right_only


In [37]:
outer_explicitdf[outer_explicitdf["_merge"] == "both"]["Customer ID"].nunique()

46

**Note**:
- Please pay special attention to the number `46` here. It means there are _**46** Customer IDs_ which are present in both dataframes (week1 and week2).
- As there are multiple occurrences of some Customer IDs, pandas ends up doing a Cartesian product of the matching rows for each of them. Hence the count of **62** for **both** (for example, there are 3 entries for `Customer ID 343`).

In [38]:
week1[week1["Customer ID"]==343]

Unnamed: 0,Customer ID,Food ID
241,343,3


In [39]:
week2[week2["Customer ID"]==343]

Unnamed: 0,Customer ID,Food ID
43,343,5
53,343,2
229,343,7
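
The gap between the number of common Customer IDs and the number of `both` rows can be worked out by hand: for each common key, the merge produces (rows on the left) x (rows on the right) rows. The sketch below checks this arithmetic on made-up frames (`week1_demo` and `week2_demo` are stand-ins, not the CSV data).

```python
import pandas as pd

week1_demo = pd.DataFrame({"Customer ID": [343, 155, 155, 97]})
week2_demo = pd.DataFrame({"Customer ID": [343, 343, 343, 155, 855]})

left_counts = week1_demo["Customer ID"].value_counts()
right_counts = week2_demo["Customer ID"].value_counts()

# Multiplying aligns on Customer ID; keys missing on either side become NaN and are dropped
pairs_per_key = (left_counts * right_counts).dropna().astype(int)

common_ids = len(pairs_per_key)            # number of Customer IDs present in both frames
expected_rows = int(pairs_per_key.sum())   # 343: 1 x 3 = 3, 155: 2 x 1 = 2
actual_rows = len(week1_demo.merge(week2_demo, how="inner", on="Customer ID"))

print(common_ids, expected_rows, actual_rows)  # 2 5 5
```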


### What if we want to know the details of food (name and price) for all food items in week1

In [40]:
food.head()

Unnamed: 0,Food ID,Food Item,Price
0,1,Sushi,3.99
1,2,Burrito,9.99
2,3,Taco,2.99
3,4,Quesadilla,4.25
4,5,Pizza,2.49


In [41]:
week1.head()

Unnamed: 0,Customer ID,Food ID
0,537,9
1,97,4
2,658,1
3,202,2
4,155,9


In [42]:
week1.merge(food,how = "left", on = ["Food ID"])

Unnamed: 0,Customer ID,Food ID,Food Item,Price
0,537,9,Donut,0.99
1,97,4,Quesadilla,4.25
2,658,1,Sushi,3.99
3,202,2,Burrito,9.99
4,155,9,Donut,0.99
...,...,...,...,...
245,413,9,Donut,0.99
246,926,6,Pasta,13.99
247,134,3,Taco,2.99
248,396,6,Pasta,13.99


In [43]:
# sort=True sorts the result by the join key (Food ID)
week1.merge(food,how = "left", on = ["Food ID"],sort=True)

Unnamed: 0,Customer ID,Food ID,Food Item,Price
0,658,1,Sushi,3.99
1,600,1,Sushi,3.99
2,155,1,Sushi,3.99
3,341,1,Sushi,3.99
4,20,1,Sushi,3.99
...,...,...,...,...
245,809,10,Drink,1.75
246,584,10,Drink,1.75
247,274,10,Drink,1.75
248,151,10,Drink,1.75


### We want to know the details of customers who had bought food in week 1 

In [44]:
customers.head()

Unnamed: 0,ID,First Name,Last Name,Gender,Company,Occupation
0,1,Joseph,Perkins,Male,Dynazzy,Community Outreach Specialist
1,2,Jennifer,Alvarez,Female,DabZ,Senior Quality Engineer
2,3,Roger,Black,Male,Tagfeed,Account Executive
3,4,Steven,Evans,Male,Fatz,Registered Nurse
4,5,Judy,Morrison,Female,Demivee,Legal Assistant


In [45]:
week1.head()

Unnamed: 0,Customer ID,Food ID
0,537,9
1,97,4
2,658,1
3,202,2
4,155,9


In [46]:
# Note that the key columns have different names across the dataframes (Customer ID and ID)
# So "on" has to be replaced with "left_on" and "right_on"
cust_detailsdf = week1.merge(customers,how="left",left_on="Customer ID", right_on="ID")
cust_detailsdf.head()

Unnamed: 0,Customer ID,Food ID,ID,First Name,Last Name,Gender,Company,Occupation
0,537,9,537,Cheryl,Carroll,Female,Zoombeat,Registered Nurse
1,97,4,97,Amanda,Watkins,Female,Ozu,Account Coordinator
2,658,1,658,Patrick,Webb,Male,Browsebug,Community Outreach Specialist
3,202,2,202,Louis,Campbell,Male,Rhynoodle,Account Representative III
4,155,9,155,Carolyn,Diaz,Female,Gigazoom,Database Administrator III


In [47]:
cust_detailsdf.drop("ID",axis=1,inplace=True)

In [48]:
cust_detailsdf.head()

Unnamed: 0,Customer ID,Food ID,First Name,Last Name,Gender,Company,Occupation
0,537,9,Cheryl,Carroll,Female,Zoombeat,Registered Nurse
1,97,4,Amanda,Watkins,Female,Ozu,Account Coordinator
2,658,1,Patrick,Webb,Male,Browsebug,Community Outreach Specialist
3,202,2,Louis,Campbell,Male,Rhynoodle,Account Representative III
4,155,9,Carolyn,Diaz,Female,Gigazoom,Database Administrator III


### Now I want the details of food consumed by these customers

In [49]:
cust_detailsdf.merge(food, how="left",on=["Food ID"])

Unnamed: 0,Customer ID,Food ID,First Name,Last Name,Gender,Company,Occupation,Food Item,Price
0,537,9,Cheryl,Carroll,Female,Zoombeat,Registered Nurse,Donut,0.99
1,97,4,Amanda,Watkins,Female,Ozu,Account Coordinator,Quesadilla,4.25
2,658,1,Patrick,Webb,Male,Browsebug,Community Outreach Specialist,Sushi,3.99
3,202,2,Louis,Campbell,Male,Rhynoodle,Account Representative III,Burrito,9.99
4,155,9,Carolyn,Diaz,Female,Gigazoom,Database Administrator III,Donut,0.99
...,...,...,...,...,...,...,...,...,...
245,413,9,Diane,Bailey,Female,Wikibox,Technical Writer,Donut,0.99
246,926,6,Anne,Wagner,Female,Skyba,Legal Assistant,Pasta,13.99
247,134,3,Diana,Hall,Female,Quinu,Financial Advisor,Taco,2.99
248,396,6,Juan,Romero,Male,Zoonder,Analyst Programmer,Pasta,13.99
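
Since merge returns a new dataframe, the two lookups above (customer details, then food details) can also be written as one chain. A minimal sketch on made-up frames (`week1_demo`, `customers_demo` and `food_demo` are stand-ins holding a few values taken from the outputs above):

```python
import pandas as pd

week1_demo = pd.DataFrame({"Customer ID": [537, 97], "Food ID": [9, 4]})
customers_demo = pd.DataFrame({"ID": [97, 537], "First Name": ["Amanda", "Cheryl"]})
food_demo = pd.DataFrame({
    "Food ID": [4, 9],
    "Food Item": ["Quesadilla", "Donut"],
    "Price": [4.25, 0.99],
})

details = (
    week1_demo
    .merge(customers_demo, how="left", left_on="Customer ID", right_on="ID")  # add customer details
    .drop(columns="ID")                                                       # drop the duplicate key
    .merge(food_demo, how="left", on="Food ID")                               # add food details
)
print(details)
```

Using `how="left"` in both steps keeps every row of the sales data, even if a customer or food item were missing from the lookup tables.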


### Let's end the tutorial while performing a merge with a mix of index and column

In [111]:
customers

Unnamed: 0,ID,First Name,Last Name,Gender,Company,Occupation
0,1,Joseph,Perkins,Male,Dynazzy,Community Outreach Specialist
1,2,Jennifer,Alvarez,Female,DabZ,Senior Quality Engineer
2,3,Roger,Black,Male,Tagfeed,Account Executive
3,4,Steven,Evans,Male,Fatz,Registered Nurse
4,5,Judy,Morrison,Female,Demivee,Legal Assistant
...,...,...,...,...,...,...
995,996,Debra,Garcia,Female,Dazzlesphere,Structural Engineer
996,997,Douglas,Bishop,Male,Livepath,Developer I
997,998,Frank,Franklin,Male,Brainverse,Nurse Practicioner
998,999,Jessica,Burns,Female,Babbleblab,Financial Advisor


In [116]:
custmoddf = customers.set_index("ID")

In [117]:
custmoddf.head()

ID,First Name,Last Name,Gender,Company,Occupation
1,Joseph,Perkins,Male,Dynazzy,Community Outreach Specialist
2,Jennifer,Alvarez,Female,DabZ,Senior Quality Engineer
3,Roger,Black,Male,Tagfeed,Account Executive
4,Steven,Evans,Male,Fatz,Registered Nurse
5,Judy,Morrison,Female,Demivee,Legal Assistant


In [118]:
# We want to find the details of all customers who ordered food in week 2
week2.merge(custmoddf,how="left",left_on="Customer ID",right_index= True)

Unnamed: 0,Customer ID,Food ID,First Name,Last Name,Gender,Company,Occupation
0,688,10,Carl,Williamson,Male,Thoughtmix,Graphic Designer
1,813,7,Johnny,Walker,Male,Kayveo,Developer II
2,495,10,Deborah,Little,Female,Babbleblab,VP Accounting
3,189,5,Roger,Gordon,Male,Skilith,Operator
4,267,3,Matthew,Wood,Male,Agimba,Product Engineer
...,...,...,...,...,...,...,...
245,783,10,Phyllis,Meyer,Female,Voolia,Information Systems Manager
246,556,10,Samuel,Bailey,Male,Oyoloo,Nurse
247,547,9,Tina,Watkins,Female,Thoughtstorm,Accountant II
248,252,9,Douglas,Powell,Male,Jetwire,Geologist IV
