# Combining DataFrames in Pandas: `concat()` and `merge()`

In [1]:
import pandas as pd


Pandas provides two main functions for combining DataFrames: `concat()` and `merge()`.
`concat()` stacks DataFrames vertically or horizontally, while `merge()` joins them by matching
values in one or more common key columns, much like a SQL join.


## Using `pd.concat()`


The `concat()` function stacks DataFrames either vertically (row-wise, `axis=0`) or side by side
(column-wise, `axis=1`). The default is row-wise concatenation (`axis=0`).
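The DataFrames do not need identical columns. By default, `concat()` also uses `join='outer'`, which keeps the union of all columns and fills the gaps with `NaN`. Passing `join='inner'` keeps only the columns shared by every DataFrame. The small `left`/`right` DataFrames below are illustrative data, not part of the examples that follow.

```python
import pandas as pd

# Two DataFrames whose columns only partly overlap (illustrative data)
left = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
right = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})

# Default (axis=0, join='outer'): union of columns, missing cells become NaN
outer = pd.concat([left, right])
print(outer)

# join='inner': keep only the columns present in every DataFrame
inner = pd.concat([left, right], join='inner')
print(inner)
```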

### Row-wise Concatenation
Let's start by concatenating two DataFrames row-wise.


In [None]:
# Creating two DataFrames with the same columns
df1 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
df1

In [None]:
df2 = pd.DataFrame({
    'A': [7, 8, 9],
    'B': [10, 11, 12]
})
df2

In [None]:
# Concatenating them row-wise
result = pd.concat([df1, df2], axis=0)
result
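Row-wise concatenation keeps each DataFrame's original index labels, so the labels 0, 1, 2 appear twice in `result`. When those labels carry no meaning, `ignore_index=True` replaces them with a fresh 0 to n-1 index, as in this minimal sketch:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [7, 8, 9], 'B': [10, 11, 12]})

# Default: original index labels are kept, so 0, 1, 2 appear twice
stacked = pd.concat([df1, df2], axis=0)
print(stacked.index.tolist())  # [0, 1, 2, 0, 1, 2]

# ignore_index=True discards the old labels and builds a new 0..n-1 index
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5]
```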



### Column-wise Concatenation
We can also concatenate two DataFrames column-wise by specifying `axis=1`.


In [None]:

# Concatenating the same DataFrames column-wise
result = pd.concat([df1, df2], axis=1)

result
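Column-wise concatenation lines rows up by index label, and because `df1` and `df2` share column names, the result contains `A` and `B` twice. One way to keep the two blocks distinguishable is the `keys` argument, which adds an outer column level; the labels `'first'` and `'second'` below are arbitrary choices for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [7, 8, 9], 'B': [10, 11, 12]})

# Without keys, the column names are simply repeated
side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side.columns.tolist())  # ['A', 'B', 'A', 'B']

# keys= adds an outer column level so each block stays identifiable
labelled = pd.concat([df1, df2], axis=1, keys=['first', 'second'])
print(labelled['second']['A'].tolist())  # [7, 8, 9]
```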


## Using `pd.merge()`


The `merge()` function combines DataFrames by matching values in one or more common key columns (or on the index).

The `how` parameter selects the join type: `'inner'` (the default), `'outer'`, `'left'`, or `'right'`.
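Because `how` defaults to `'inner'`, leaving it out gives an inner join. `merge` is also available as a DataFrame method, so the three calls in this sketch should produce the same result (the data matches the examples below):

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Score': [85, 90, 95]})

explicit = pd.merge(df1, df2, on='ID', how='inner')
default = pd.merge(df1, df2, on='ID')   # how='inner' is the default
method = df1.merge(df2, on='ID')        # method form of pd.merge

print(explicit.equals(default) and explicit.equals(method))  # True
```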

<img src="https://miro.medium.com/v2/resize:fit:1200/1*9eH1_7VbTZPZd9jBiGIyNA.png" alt="drawing" style="width:300px;"/>

### Inner Join Example
An inner join keeps only the rows whose key appears in both DataFrames.

In [None]:
# Creating two DataFrames with a common column 'ID'
df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Alice', 'Bob', 'Charlie']
})
df1

In [None]:
df2 = pd.DataFrame({
    'ID': [1, 2, 4],
    'Score': [85, 90, 95]
})
df2

In [None]:
# Merging them on the 'ID' column using an inner join
result = pd.merge(df1, df2, on='ID', how='inner')

result
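The key column does not need the same name in both DataFrames. If, hypothetically, the second DataFrame called its key `StudentID` instead of `ID`, `left_on` and `right_on` name the key on each side. Both key columns are kept in the result, so the redundant one is dropped here:

```python
import pandas as pd

# Hypothetical variant of df2 whose key column is named 'StudentID'
people = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
scores = pd.DataFrame({'StudentID': [1, 2, 4], 'Score': [85, 90, 95]})

# left_on / right_on give the key column name in each DataFrame
joined = pd.merge(people, scores, left_on='ID', right_on='StudentID', how='inner')

# Both key columns are kept; drop the duplicate one
joined = joined.drop(columns='StudentID')
print(joined)
```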



### Outer Join Example
An outer join keeps all rows from both DataFrames. Where a key exists in only one of them, the columns coming from the other DataFrame are filled with `NaN`.


In [None]:
# Merging with an outer join
result = pd.merge(df1, df2, on='ID', how='outer')

result
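With an outer join it can be useful to see where each row came from. Passing `indicator=True` adds a `_merge` column whose values are `'both'`, `'left_only'` or `'right_only'`, which makes unmatched keys easy to filter:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Score': [85, 90, 95]})

# indicator=True records which DataFrame(s) each row came from
result = pd.merge(df1, df2, on='ID', how='outer', indicator=True)
print(result)

# Keep only the rows that had no match on the other side
unmatched = result[result['_merge'] != 'both']
print(unmatched)
```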


### Right Join Example
A right join returns all rows from `df2` and only the matching rows from `df1`. Where there is no match, the columns from `df1` contain `NaN`.

In [None]:
# Merging with a right join
result = pd.merge(df1, df2, on='ID', how='right')

result
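The left join, the fourth join type, is the mirror image of the right join: it keeps every row of `df1` and only the matching rows of `df2`. A minimal sketch with the same data:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Score': [85, 90, 95]})

# Every row of df1 is kept; ID 3 has no match in df2, so its Score is NaN
result = pd.merge(df1, df2, on='ID', how='left')
print(result)
```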


In summary:

- Use `concat()` when you need to stack or append DataFrames.
- Use `merge()` when you need to combine DataFrames based on a common key.
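The difference matters when rows should be matched by key. Concatenating the `ID` DataFrames column-wise pairs rows by index label, not by `ID` value, so Charlie (ID 3) lands next to the score for ID 4; `merge()` matches on the key values instead:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Score': [85, 90, 95]})

# concat(axis=1) aligns rows on index labels 0, 1, 2 and ignores the ID values
by_position = pd.concat([df1, df2], axis=1)
print(by_position)

# merge aligns rows on matching values in the ID column
by_key = pd.merge(df1, df2, on='ID')
print(by_key)
```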


### Your turn!
#### Exercises
You are given the following two DataFrames:

In [None]:
# Create first DataFrame: students
students_data = {
    'Student_ID': [101, 102, 103, 104, 105],
    'Name': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve'],
    'Major': ['Computer Sci', 'Physics', 'Mathematics', 'Chemistry', 'Biology']
}
students = pd.DataFrame(students_data)
students

In [None]:
# Create second DataFrame: grades
grades_data = {
    'Student_ID': [101, 102, 106, 107],
    'Grade': ['A', 'B', 'C', 'A'],
    'Semester': ['Spring 2023', 'Spring 2023', 'Spring 2023', 'Fall 2022']
}
grades = pd.DataFrame(grades_data)
grades

A) Create a DataFrame called "students_grades" with 4 columns (Name, Major, Grade and Semester) that keeps only the students whose names are known. If no grade is available, replace the `NaN` with "No Grade". If no semester is available, replace the `NaN` with "No Semester".

B) Create a DataFrame called "students_grades_bis" with only 2 columns (Name and Grade), keeping only the students whose name and grade are both known.

#### Solutions
A

In [None]:
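# Left join keeps every student (their names are known), then fill the missing values
students_grades = pd.merge(students, grades, on='Student_ID', how='left').fillna({
    'Grade': 'No Grade',
    'Semester': 'No Semester'
})
# Keep only the four requested columns
students_grades = students_grades[['Name', 'Major', 'Grade', 'Semester']]

print(students_grades)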

B

In [None]:
# Inner join keeps only the students that appear in both DataFrames
students_grades_bis = pd.merge(students, grades, on='Student_ID', how='inner')
# Keep only the two requested columns
students_grades_bis = students_grades_bis[['Name', 'Grade']]

print(students_grades_bis)