# Pandas - Merging Datasets
<a href="https://app.naas.ai/user-redirect/naas/downloader?url=https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/Python%20Snippets/Pandas/Merging%20Datasets.ipynb" target="_parent"><img src="https://img.shields.io/badge/-Open%20in%20Naas-success?labelColor=000000&logo="/></a>

## Input

In [None]:
import pandas as pd
import numpy as np

In [None]:
# Creating values to be used as datasets
dict1={"student_id":[1,2,3,4,5,6,7,8,9,10],
       "student_name":["Peter","Dolly","Maggie","David","Isabelle","Harry","Akin","Abbey","Victoria","Sam"],
       "student_course":np.random.choice(["Biology","Physics","Chemistry"],size=10)}

In [None]:
dict2={"student_grade":np.random.choice(["A","B","C","D","E","F"],size=100),
       "professors":np.random.choice(["Mark Levinson","Angela Marge","Bonnie James","Klaus Michealson"],size=100),
       "student_id":np.random.choice([1,2,3,4,5,6,7,8,9,10],size=100)}


In [None]:
Data1=pd.DataFrame(dict1)  # OR Data1=pd.read_csv(filepath)
Data2=pd.DataFrame(dict2)  # OR Data2=pd.read_csv(filepath)

In [None]:
print(Data1)
print(Data1.shape)

In [None]:
print(Data2)
print(Data2.shape)

## Model
To combine two or more DataFrames, pd.merge and pd.concat are the go-to functions.

#### Parameters:
* Data1: the first DataFrame
* Data2: the second DataFrame
* pd.merge: joins two DataFrames on shared columns or on their indexes; it performs an SQL-style inner join unless the how parameter says otherwise
* left: the DataFrame on the left side of the join (the first argument)
* right: the DataFrame on the right side of the join, whose matching rows are attached to left
* pd.concat: stacks two or more DataFrames row-wise or column-wise
* filepath: the path to where your files are stored
* Inner join: keeps a row only if its key value appears in both DataFrames
* Outer join: keeps all rows of both DataFrames and fills in NaN wherever a row has no match in the other DataFrame
* Left join: keeps every row of the first DataFrame and adds the columns of the second; rows of the second with no match in the first are dropped
* Right join: keeps every row of the second DataFrame and adds the columns of the first; rows of the first with no match in the second are dropped
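
The four join types listed above can be compared on a tiny pair of frames. This is a minimal sketch; the names `left`, `right` and `join_sizes` are made up for illustration and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical toy frames (not the notebook's Data1/Data2)
left = pd.DataFrame({"student_id": [1, 2, 3],
                     "student_name": ["Peter", "Dolly", "Maggie"]})
right = pd.DataFrame({"student_id": [2, 3, 4],
                      "student_grade": ["A", "B", "C"]})

# Number of rows each join type produces on the shared student_id column
join_sizes = {how: len(pd.merge(left, right, on="student_id", how=how))
              for how in ["inner", "outer", "left", "right"]}
print(join_sizes)

# The outer join keeps unmatched rows from both sides and fills the gaps with NaN
outer = pd.merge(left, right, on="student_id", how="outer")
print(outer)
```

Only ids 2 and 3 appear in both frames, so the inner join is the smallest result and the outer join the largest.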

## Output of merging

#### pd.merge(left, right) acts like an SQL inner join: it joins on the columns the two DataFrames have in common and keeps only the rows whose key appears in both

In [None]:
pd.merge(Data1,Data2) 
# student_id is common to both, so it appears once and Data2's other columns are added next to Data1's
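
To make the default behaviour explicit, the sketch below checks that a bare pd.merge call gives the same result as spelling out `on` and `how`. The frames `students` and `grades` are hypothetical stand-ins for Data1 and Data2.

```python
import pandas as pd

# Hypothetical toy frames with one shared column, student_id
students = pd.DataFrame({"student_id": [1, 2, 3],
                         "student_name": ["Peter", "Dolly", "Maggie"]})
grades = pd.DataFrame({"student_id": [1, 1, 3, 7],
                       "student_grade": ["A", "C", "B", "F"]})

# With no arguments, merge joins on every shared column name with how="inner"
implicit = pd.merge(students, grades)
explicit = pd.merge(students, grades, on="student_id", how="inner")

print(implicit.equals(explicit))
print(explicit)
```

Student 1 has two grades and so appears twice; student 2 has no grade and id 7 has no student, so both are dropped.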

#### Merging dataframes with same values but different column names

In [None]:
Data1=Data1.rename(columns={"student_id":"id"}) # Renamed student_id to id so as to give this example
Data1

In [None]:
pd.merge(Data1,Data2,left_on="id",right_on="student_id")
# left_on names the key column in the left DataFrame; right_on names the key column in the right DataFrame
# i.e. match Data1's id against Data2's student_id
# because the names differ, both key columns are kept in the merged table
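
Because both key columns survive a left_on/right_on merge, one of them is usually redundant. A minimal sketch of dropping it, using hypothetical frames `students` and `grades`:

```python
import pandas as pd

# Hypothetical toy frames whose key columns have different names
students = pd.DataFrame({"id": [1, 2], "student_name": ["Peter", "Dolly"]})
grades = pd.DataFrame({"student_id": [2, 1, 2],
                       "student_grade": ["B", "A", "D"]})

merged = pd.merge(students, grades, left_on="id", right_on="student_id")
print(list(merged.columns))

# id and student_id hold identical values after the join, so one can be dropped
tidy = merged.drop(columns="student_id")
print(tidy)
```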

### Merging with the index of the first dataframe

In [None]:
Data1_indexed=Data1.set_index("id") # set_index returns a new DataFrame, so keep the result; id is now the index
Data1_indexed

In [None]:
pd.merge(Data1_indexed,Data2,left_index=True,right_on="student_id") # the result takes its index from the rows of Data2 that matched
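
When the left index is joined against a right-hand column, the row labels of the result are inherited rather than renumbered, which can look odd. The sketch below, on hypothetical frames `students` and `grades`, shows one way to get a clean 0..n-1 index afterwards with reset_index.

```python
import pandas as pd

# Hypothetical toy frames: students is indexed by id, grades keeps its default index
students = pd.DataFrame({"student_name": ["Peter", "Dolly"]},
                        index=pd.Index([1, 2], name="id"))
grades = pd.DataFrame({"student_id": [2, 1, 2],
                       "student_grade": ["B", "A", "D"]})

merged = pd.merge(students, grades, left_index=True, right_on="student_id")
print(merged)

# Renumber the rows when the inherited labels are not meaningful
clean = merged.reset_index(drop=True)
print(clean)
```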

### Merging both tables on their indexes

In [None]:
Data2_indexed=Data2.set_index("student_id") # keep the result; student_id is now the index of this copy
Data2_indexed

In [None]:
pd.merge(Data1.set_index("id"),Data2_indexed,left_index=True,right_index=True)
# with both indexes used as keys, the result is indexed by the matching key values
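
For index-on-index joins, DataFrame.join is a common shorthand. It defaults to a left join, so `how="inner"` is passed here to mirror pd.merge. This is a sketch on hypothetical frames, not the notebook's data.

```python
import pandas as pd

# Hypothetical toy frames, both indexed by student_id
students = pd.DataFrame({"student_name": ["Peter", "Dolly", "Maggie"]},
                        index=pd.Index([1, 2, 3], name="student_id"))
grades = pd.DataFrame({"student_grade": ["B", "A", "D"]},
                      index=pd.Index([2, 1, 2], name="student_id"))

# Inner join on the two indexes
merged = pd.merge(students, grades, left_index=True, right_index=True)

# The same join written with DataFrame.join
joined = students.join(grades, how="inner")

print(merged)
print(joined)
```

Maggie (id 3) has no grade, so she is absent from both results.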


### Specifying the kind of join you want, since merge does an inner join by default

In [None]:
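# Data1's key is named id and Data2's is student_id, so both are named here; how picks the join type
Data1.merge(Data2, left_on="id", right_on="student_id", how='inner')
Data1.merge(Data2, left_on="id", right_on="student_id", how='outer')
Data1.merge(Data2, left_on="id", right_on="student_id", how='right')
Data1.merge(Data2, left_on="id", right_on="student_id", how='left')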
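
To see what each join type kept, merge accepts `indicator=True`, which adds a `_merge` column recording whether a row came from the left frame, the right frame, or both. A small sketch on hypothetical frames:

```python
import pandas as pd

# Hypothetical toy frames that only partly overlap on student_id
students = pd.DataFrame({"student_id": [1, 2, 3],
                         "student_name": ["Peter", "Dolly", "Maggie"]})
grades = pd.DataFrame({"student_id": [2, 3, 4],
                       "student_grade": ["A", "B", "C"]})

# indicator=True adds a _merge column: left_only, right_only or both
merged = students.merge(grades, on="student_id", how="outer", indicator=True)
print(merged)

counts = merged["_merge"].value_counts().to_dict()
print(counts)
```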


### Concatenating is useful for joining tables row-wise and column-wise

In [None]:
pd.concat([Data1,Data2]) # Joining row-wise; columns missing from one DataFrame are filled with NaN

In [None]:
pd.concat([Data1,Data2],axis=1) # Joining column-wise; rows are aligned on the index, not on student_id

In [None]:
pd.concat([Data1,Data2],keys=["Data1","Data2"]) # keys labels each block, so you can tell which rows came from which DataFrame
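
The three concat variants behave quite differently, which is easier to see on small inputs. The sketch below uses hypothetical frames `a` and `b`. It also shows `ignore_index=True`, which the cells above do not use, as one way to avoid repeated row labels after a row-wise concat.

```python
import pandas as pd

# Hypothetical toy frames with partly different columns
a = pd.DataFrame({"student_id": [1, 2], "student_name": ["Peter", "Dolly"]})
b = pd.DataFrame({"student_id": [3], "student_grade": ["A"]})

# Row-wise: columns are unioned, gaps become NaN, rows are renumbered 0..n-1
stacked = pd.concat([a, b], ignore_index=True)
print(stacked)

# keys adds an outer index level, so each source block can be selected by name
labelled = pd.concat([a, b], keys=["a", "b"])
print(labelled.loc["b"])

# Column-wise: rows are aligned on the index, and no key matching takes place
side = pd.concat([a, b], axis=1)
print(side)
```

Note that column-wise concat pairs rows by index label only; to match rows by student_id, pd.merge is the better fit.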