## **Chapter 4**
## **Merging, Joining, and Concatenation**

In [1]:
import numpy as np
import pandas as pd

**1) Merging two DataFrames**

In [2]:
# DataFrame 1: Employee information

employees = pd.DataFrame({
    'Employee_id': [1, 2, 3, 4, 5],
    'Name': ['Maaz', 'Abu Musa', 'Zubair', 'Saaed', 'Ammar'],
    'Department': ['HR', 'IT', 'Finance', 'IT', 'HR']
})

# DataFrame 2: Salary information

salaries = pd.DataFrame({
    'Employee_id': [1, 2, 3, 6, 7],
    'Salary in Dollars': [60000, 80000, 65000, 70000, 90000],
    'Additionally Bonus': [5000, 10000, 7000, 8000, 12000]
})

In [17]:
employees

Unnamed: 0,Employee_id,Name,Department
0,1,Maaz,HR
1,2,Abu Musa,IT
2,3,Zubair,Finance
3,4,Saaed,IT
4,5,Ammar,HR


In [15]:
salaries

Unnamed: 0,Employee_id,Salary in Dollars,Additionally Bonus
0,1,60000,5000
1,2,80000,10000
2,3,65000,7000
3,6,70000,8000
4,7,90000,12000


In [21]:
pd.merge(employees, salaries, on='Employee_id', how='inner')
# An inner join keeps only the rows whose Employee_id appears in both DataFrames.

Unnamed: 0,Employee_id,Name,Department,Salary in Dollars,Additionally Bonus
0,1,Maaz,HR,60000,5000
1,2,Abu Musa,IT,80000,10000
2,3,Zubair,Finance,65000,7000


In [29]:
pd.merge(employees, salaries, on='Employee_id', how='outer')

# An outer join keeps all rows from both DataFrames and fills NaN where there is no match.
# Because NaN is a float, the integer salary and bonus columns are shown as floats.


Unnamed: 0,Employee_id,Name,Department,Salary in Dollars,Additionally Bonus
0,1,Maaz,HR,60000.0,5000.0
1,2,Abu Musa,IT,80000.0,10000.0
2,3,Zubair,Finance,65000.0,7000.0
3,4,Saaed,IT,,
4,5,Ammar,HR,,
5,6,,,70000.0,8000.0
6,7,,,90000.0,12000.0


In [28]:
pd.merge(employees, salaries, on='Employee_id', how='left')

# A left join keeps all rows from the left DataFrame (employees) and fills NaN where salaries has no match.

Unnamed: 0,Employee_id,Name,Department,Salary in Dollars,Additionally Bonus
0,1,Maaz,HR,60000.0,5000.0
1,2,Abu Musa,IT,80000.0,10000.0
2,3,Zubair,Finance,65000.0,7000.0
3,4,Saaed,IT,,
4,5,Ammar,HR,,


In [27]:
pd.merge(employees, salaries, on='Employee_id', how='right')

# A right join keeps all rows from the right DataFrame (salaries) and fills NaN where employees has no match.

Unnamed: 0,Employee_id,Name,Department,Salary in Dollars,Additionally Bonus
0,1,Maaz,HR,60000,5000
1,2,Abu Musa,IT,80000,10000
2,3,Zubair,Finance,65000,7000
3,6,,,70000,8000
4,7,,,90000,12000


* `merge` combines two DataFrames on one or more common columns (here `Employee_id`); the `how` argument decides which rows are kept.
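To see which side each row of a merge came from, pandas offers the `indicator=True` option of `pd.merge`. The sketch below reuses the `employees` and `salaries` data from above; the variable names `merged` and `match_counts` are only illustrative.

```python
import pandas as pd

# Same data as the employees / salaries DataFrames defined earlier
employees = pd.DataFrame({
    'Employee_id': [1, 2, 3, 4, 5],
    'Name': ['Maaz', 'Abu Musa', 'Zubair', 'Saaed', 'Ammar'],
    'Department': ['HR', 'IT', 'Finance', 'IT', 'HR']
})
salaries = pd.DataFrame({
    'Employee_id': [1, 2, 3, 6, 7],
    'Salary in Dollars': [60000, 80000, 65000, 70000, 90000],
    'Additionally Bonus': [5000, 10000, 7000, 8000, 12000]
})

# indicator=True adds a '_merge' column with the values
# 'both', 'left_only', or 'right_only' for every row
merged = pd.merge(employees, salaries, on='Employee_id',
                  how='outer', indicator=True)

# Count how many rows matched on both sides versus only one side
match_counts = merged['_merge'].value_counts().to_dict()

print(merged[['Employee_id', '_merge']])
print(match_counts)
```

Rows marked `both` are exactly the ones an inner join would return, so the `_merge` column is a quick way to check a join before choosing `how`.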

**2) Concatenating two DataFrames**

In [3]:
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2'],
    'C': ['C0', 'C1', 'C2']
})

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5'],
    'C': ['C3', 'C4', 'C5']
})

In [5]:
df1

Unnamed: 0,A,B,C
0,A0,B0,C0
1,A1,B1,C1
2,A2,B2,C2


In [6]:
df2

Unnamed: 0,A,B,C
0,A3,B3,C3
1,A4,B4,C4
2,A5,B5,C5


In [10]:
pd.concat([df1, df2])

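# Row-wise concatenation (default, axis=0): df2's rows are stacked below df1's.
# Each row keeps its original index, so the labels 0-2 appear twice.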

Unnamed: 0,A,B,C
0,A0,B0,C0
1,A1,B1,C1
2,A2,B2,C2
0,A3,B3,C3
1,A4,B4,C4
2,A5,B5,C5


In [8]:
pd.concat([df2, df1])

Unnamed: 0,A,B,C
0,A3,B3,C3
1,A4,B4,C4
2,A5,B5,C5
0,A0,B0,C0
1,A1,B1,C1
2,A2,B2,C2


In [None]:
pd.concat([df1, df2], axis=1)
# Column-wise concatenation (axis=1): rows are aligned on the index and df2's columns are placed beside df1's.

Unnamed: 0,A,B,C,A,B,C
0,A0,B0,C0,A3,B3,C3
1,A1,B1,C1,A4,B4,C4
2,A2,B2,C2,A5,B5,C5


* **`concat` combines multiple pandas objects (DataFrames or Series) into one, either row-wise or column-wise.**

  * Row-wise (default, `axis=0`): stacks data vertically (like adding rows).

  * Column-wise (`axis=1`): stacks data horizontally (like adding columns).
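Row-wise concatenation keeps each input's original index, which is why the labels 0 to 2 repeat in the output above. As a small sketch, two standard `pd.concat` options deal with this: `ignore_index=True` renumbers the rows, and `keys=` labels each block. The names `stacked` and `labelled`, and the key labels `'first'` and `'second'`, are only illustrative.

```python
import pandas as pd

# Same data as df1 / df2 in the concatenation example above
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2'],
    'C': ['C0', 'C1', 'C2']
})
df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5'],
    'C': ['C3', 'C4', 'C5']
})

# ignore_index=True discards the old labels and builds a fresh 0..n-1 index
stacked = pd.concat([df1, df2], ignore_index=True)

# keys=... adds an outer index level that records which input each row came from
labelled = pd.concat([df1, df2], keys=['first', 'second'])

print(stacked)
print(labelled.loc['second'])  # only the rows that came from df2
```

Use `ignore_index=True` when the old row labels carry no meaning, and `keys=` when you need to trace rows back to their source.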

**3) Joining two DataFrames**

In [16]:
df1 = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie']
}, index=[1, 2, 3])

# Second DataFrame
df2 = pd.DataFrame({
    'score': [85, 90, 75]
}, index=[3, 4, 6])


In [14]:
df1

Unnamed: 0,name
1,Alice
2,Bob
3,Charlie


In [17]:
df2

Unnamed: 0,score
3,85
4,90
6,75


In [18]:
df1.join(df2)

Unnamed: 0,name,score
1,Alice,
2,Bob,
3,Charlie,85.0


In [19]:
df2.join(df1)

Unnamed: 0,score,name
3,85,Charlie
4,90,
6,75,


In [21]:
df1.join(df2, how='outer')

Unnamed: 0,name,score
1,Alice,
2,Bob,
3,Charlie,85.0
4,,90.0
6,,75.0


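* `join` is a DataFrame method that combines two or more DataFrames on their index. It defaults to a left join (`how='left'`), so every row of the calling DataFrame is kept and unmatched values become NaN.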
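Since `join` aligns on the index, it can be viewed as a shortcut for `pd.merge` with `left_index=True` and `right_index=True`. The sketch below checks this on the same `df1` and `df2` using an inner join; the names `joined` and `merged` are only illustrative.

```python
import pandas as pd

# Same data as df1 / df2 in the join example above
df1 = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie']
}, index=[1, 2, 3])
df2 = pd.DataFrame({
    'score': [85, 90, 75]
}, index=[3, 4, 6])

# Index-based inner join: keeps only the index labels present in both
joined = df1.join(df2, how='inner')

# The same operation written with merge, matching on both indexes
merged = pd.merge(df1, df2, left_index=True, right_index=True, how='inner')

print(joined)
print(joined.equals(merged))
```

Only index label 3 appears in both DataFrames, so the inner join returns the single row for Charlie.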