<a href="https://colab.research.google.com/github/krauseannelize/nb-py-ms-exercises/blob/sprint04/notebooks/s04_pandas_data_wrangling/41_exercises_data_integration.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 41 | Exercises - Data Integration

## Concatenating DataFrames

To combine multiple **DataFrames** into one, use the `pd.concat()` function to concatenate them:

- vertically (stacking rows) or
- horizontally (adding columns).

```python
# basic syntax
pd.concat(objs, axis=0, ignore_index=False)
```

- `objs`: List of **DataFrames** to concatenate
- `axis`:
  - `0` → Vertical concatenation (stack rows)
  - `1` → Horizontal concatenation (add columns)
- `ignore_index`:
  - `True` → Creates a new integer index (useful to avoid duplicate indices)
  - `False` → Retains original indices from source DataFrames

💡 _Note: `ignore_index=True` helps avoid messy or duplicate indices during concatenation. If you’ve already concatenated and want to clean up the index afterward, use `.reset_index(drop=True)` instead._

In [1]:
import pandas as pd

# Create January sales DataFrame
data_jan = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West'],
    'Sales': [25000, 30000, 20000, 15000],
    'Month': ['January'] * 4
})

# Create February sales DataFrame
data_feb = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West'],
    'Sales': [27000, 32000, 23000, 18000],
    'Month': ['February'] * 4
})

# Create March sales DataFrame
data_mar = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West'],
    'Sales': [28000, 31000, 24000, 20000],
    'Month': ['March'] * 4
})

# Concatenate 3 DataFrames without ignore_index
# Note duplicates in index
sales_df1 = pd.concat([data_jan, data_feb, data_mar])
sales_df1


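| | Region | Sales | Month |
|---|---|---|---|
| 0 | North | 25000 | January |
| 1 | South | 30000 | January |
| 2 | East | 20000 | January |
| 3 | West | 15000 | January |
| 0 | North | 27000 | February |
| 1 | South | 32000 | February |
| 2 | East | 23000 | February |
| 3 | West | 18000 | February |
| 0 | North | 28000 | March |
| 1 | South | 31000 | March |
| 2 | East | 24000 | March |
| 3 | West | 20000 | March |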


In [2]:
# Concatenate 3 DataFrames with ignore_index
# Note no duplicates in index
sales_df2 = pd.concat([data_jan, data_feb, data_mar], ignore_index=True)
sales_df2

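| | Region | Sales | Month |
|---|---|---|---|
| 0 | North | 25000 | January |
| 1 | South | 30000 | January |
| 2 | East | 20000 | January |
| 3 | West | 15000 | January |
| 4 | North | 27000 | February |
| 5 | South | 32000 | February |
| 6 | East | 23000 | February |
| 7 | West | 18000 | February |
| 8 | North | 28000 | March |
| 9 | South | 31000 | March |
| 10 | East | 24000 | March |
| 11 | West | 20000 | March |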


When you concatenate **DataFrames** _without_ using `ignore_index=True`, the resulting DataFrame retains the original row indices from each source. This can lead to duplicate or non-sequential indices, which may cause confusion during analysis or plotting.
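A quick way to see the problem: with duplicate labels, `.loc` returns every row that shares a label. The sketch below rebuilds the three monthly DataFrames from the cells above (so it runs on its own) and also uses `verify_integrity=True`, a `pd.concat()` option that raises a `ValueError` instead of silently producing duplicate labels.

```python
import pandas as pd

# Recreate the three monthly DataFrames (same data as the cells above)
regions = ['North', 'South', 'East', 'West']
data_jan = pd.DataFrame({'Region': regions, 'Sales': [25000, 30000, 20000, 15000], 'Month': ['January'] * 4})
data_feb = pd.DataFrame({'Region': regions, 'Sales': [27000, 32000, 23000, 18000], 'Month': ['February'] * 4})
data_mar = pd.DataFrame({'Region': regions, 'Sales': [28000, 31000, 24000, 20000], 'Month': ['March'] * 4})

# Without ignore_index, the labels 0-3 appear once per source DataFrame
stacked = pd.concat([data_jan, data_feb, data_mar])

# .loc[0] therefore matches three rows (one per month), not one
rows_labelled_0 = stacked.loc[0]
print(rows_labelled_0)

# verify_integrity=True raises a ValueError instead of creating duplicate labels
try:
    pd.concat([data_jan, data_feb, data_mar], verify_integrity=True)
    duplicates_detected = False
except ValueError as err:
    duplicates_detected = True
    print("ValueError:", err)
```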

Sometimes you may want to keep the original indices in a separate column for reference, while still giving the concatenated DataFrame a clean, sequential index without duplicates.

To clean up the index after concatenation, use the `.reset_index()` method:

```python
# basic syntax
df.reset_index(drop=True, inplace=False)
```

- `drop`:
  - `True` → Discards the old index entirely.
  - `False` (default) → Moves the old index into a new column in the DataFrame.
- `inplace`:
  - `True` → Modifies the original DataFrame directly and returns `None`, so do not assign the result to a variable.
  - `False` (default) → Returns a new DataFrame with the reset index, leaving the original unchanged.
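A minimal sketch of the difference between the two `inplace` settings, using a small hypothetical DataFrame `df` with a non-sequential index:

```python
import pandas as pd

# Hypothetical DataFrame with a non-sequential index
df = pd.DataFrame({'Sales': [100, 200]}, index=[5, 7])

# inplace=False (the default) returns a new DataFrame; df is untouched
new_df = df.reset_index(drop=True)
print(new_df.index.tolist())   # [0, 1]
print(df.index.tolist())       # [5, 7]

# inplace=True modifies df itself and returns None
result = df.reset_index(drop=True, inplace=True)
print(result)                  # None
print(df.index.tolist())       # [0, 1]
```

A common mistake is writing `df = df.reset_index(drop=True, inplace=True)`, which replaces `df` with `None`.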

In [3]:
# Return a new DataFrame (sales_df1 itself is unchanged) that
# moves the old, duplicated index into a new 'index' column and
# assigns a new sequential index
sales_df1.reset_index(drop=False, inplace=False)

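| | index | Region | Sales | Month |
|---|---|---|---|---|
| 0 | 0 | North | 25000 | January |
| 1 | 1 | South | 30000 | January |
| 2 | 2 | East | 20000 | January |
| 3 | 3 | West | 15000 | January |
| 4 | 0 | North | 27000 | February |
| 5 | 1 | South | 32000 | February |
| 6 | 2 | East | 23000 | February |
| 7 | 3 | West | 18000 | February |
| 8 | 0 | North | 28000 | March |
| 9 | 1 | South | 31000 | March |
| 10 | 2 | East | 24000 | March |
| 11 | 3 | West | 20000 | March |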


In [4]:
# .reset_index() can also be chained directly onto pd.concat()
# This drops the original indices and assigns a clean, sequential index immediately
df_concatenated = pd.concat([data_jan, data_feb, data_mar], axis=0).reset_index(drop=True)
df_concatenated

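| | Region | Sales | Month |
|---|---|---|---|
| 0 | North | 25000 | January |
| 1 | South | 30000 | January |
| 2 | East | 20000 | January |
| 3 | West | 15000 | January |
| 4 | North | 27000 | February |
| 5 | South | 32000 | February |
| 6 | East | 23000 | February |
| 7 | West | 18000 | February |
| 8 | North | 28000 | March |
| 9 | South | 31000 | March |
| 10 | East | 24000 | March |
| 11 | West | 20000 | March |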


## Merging DataFrames

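**Joins** combine data from multiple sources based on shared keys (columns). In pandas, joins are performed with `pd.merge()` or the equivalent `DataFrame.merge()` method: `on` names the key column, and `how` selects the join type, which controls which rows are included in the merged dataset.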
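To make "controls which rows are included" concrete, the sketch below runs all four join types on two small tables (the same product keys used in the next cell) and counts the resulting rows. It assumes the key column is unique on both sides; with duplicate keys, the counts would grow.

```python
import pandas as pd

# Keys 101-103 are shared, 104 exists only on the left, 105 only on the right
left = pd.DataFrame({'Product_ID': [101, 102, 103, 104], 'Price': [1200, 500, 800, 300]})
right = pd.DataFrame({'Product_ID': [101, 102, 103, 105], 'Stock': [50, 20, 100, 150]})

row_counts = {}
for how in ['inner', 'left', 'right', 'outer']:
    merged = pd.merge(left, right, on='Product_ID', how=how)
    row_counts[how] = len(merged)
    # Show which keys survive each join type
    print(f"{how:>5}: {sorted(merged['Product_ID'].tolist())}")
```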

In [19]:
import pandas as pd

df1 = pd.DataFrame({
    'Product_ID': [101, 102, 103, 104],
    'Product_Name': ['Laptop', 'Tablet', 'Smartphone', 'Monitor'],
    'Price': [1200, 500, 800, 300]
})

df2 = pd.DataFrame({
    'Product_ID': [101, 102, 103, 105],
    'Category': ['Electronics', 'Electronics', 'Electronics', 'Accessories'],
    'Stock': [50, 20, 100, 150]
})

print("DataFrame 1:\n")
print(df1)
print("\nDataFrame 2:\n")
print(df2)

DataFrame 1:

   Product_ID Product_Name  Price
0         101       Laptop   1200
1         102       Tablet    500
2         103   Smartphone    800
3         104      Monitor    300

DataFrame 2:

   Product_ID     Category  Stock
0         101  Electronics     50
1         102  Electronics     20
2         103  Electronics    100
3         105  Accessories    150


### Inner Join

Keeps only the rows whose key appears in both the left and the right DataFrame (`how='inner'`, the default for `pd.merge()`).

In [20]:
print("\nMerged DataFrame | INNER JOIN:\n")
df_inner = pd.merge(df1, df2, on='Product_ID', how='inner')
print(df_inner)


Merged DataFrame | INNER JOIN:

   Product_ID Product_Name  Price     Category  Stock
0         101       Laptop   1200  Electronics     50
1         102       Tablet    500  Electronics     20
2         103   Smartphone    800  Electronics    100


### Left Join

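Keeps all rows from the left DataFrame and adds the matching data from the right DataFrame; left rows without a match get `NaN`. Because `NaN` is a floating-point value, an integer column that receives missing values (here `Stock`) is upcast to `float64`, which is why it prints as `50.0`, `20.0`, and so on.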

In [21]:
print("\nMerged DataFrame | LEFT JOIN:\n")
df_left = pd.merge(df1, df2, on='Product_ID', how='left')
print(df_left)


Merged DataFrame | LEFT JOIN:

   Product_ID Product_Name  Price     Category  Stock
0         101       Laptop   1200  Electronics   50.0
1         102       Tablet    500  Electronics   20.0
2         103   Smartphone    800  Electronics  100.0
3         104      Monitor    300          NaN    NaN


### Right Join

Keeps all rows from the right DataFrame and adds the matching data from the left DataFrame; right rows without a match get `NaN` (which turns `Price` into a float column).

In [22]:
print("\nMerged DataFrame | RIGHT JOIN:\n")
df_right = pd.merge(df1, df2, on='Product_ID', how='right')
print(df_right)


Merged DataFrame | RIGHT JOIN:

   Product_ID Product_Name   Price     Category  Stock
0         101       Laptop  1200.0  Electronics     50
1         102       Tablet   500.0  Electronics     20
2         103   Smartphone   800.0  Electronics    100
3         105          NaN     NaN  Accessories    150


### Outer Join (Full Join)

Keeps all rows from both DataFrames, combining them wherever there are matching keys, and filling in `NaN` for any missing values from either DataFrame.

In [23]:
print("\nMerged DataFrame | OUTER JOIN:\n")
df_outer = pd.merge(df1, df2, on='Product_ID', how='outer')
print(df_outer)


Merged DataFrame | OUTER JOIN:

   Product_ID Product_Name   Price     Category  Stock
0         101       Laptop  1200.0  Electronics   50.0
1         102       Tablet   500.0  Electronics   20.0
2         103   Smartphone   800.0  Electronics  100.0
3         104      Monitor   300.0          NaN    NaN
4         105          NaN     NaN  Accessories  150.0


## Exercise 1

Concatenate the following DataFrames along the rows. Some of the code is already written; you only need to replace the `___` placeholders.

```python
import pandas as pd

# Create the first DataFrame
data1 = {
    'A': [1, 2, 3],
    'B': ['a', 'b', 'c']
}
df1 = pd.DataFrame(data1)

# Create the second DataFrame
data2 = {
    'A': [4, 5, 6],
    'B': ['d', 'e', 'f']
}
df2 = pd.DataFrame(data2)

# Concatenate along rows
# Replace the "___"
concatenated_rows = pd.concat([___, ___], axis=___, ignore_index=True)

print("Concatenated along rows:")
print(concatenated_rows)
```

In [5]:
import pandas as pd

# Create the first DataFrame
data1 = {
    'A': [1, 2, 3],
    'B': ['a', 'b', 'c']
}
df1 = pd.DataFrame(data1)

# Create the second DataFrame
data2 = {
    'A': [4, 5, 6],
    'B': ['d', 'e', 'f']
}
df2 = pd.DataFrame(data2)

# Concatenate along rows
concatenated_rows = pd.concat([df1, df2], axis=0, ignore_index=True)

print("Concatenated along rows:")
print(concatenated_rows)

Concatenated along rows:
   A  B
0  1  a
1  2  b
2  3  c
3  4  d
4  5  e
5  6  f


## Exercise 2

Concatenate two DataFrames vertically.

```python
import pandas as pd

# Sample data for January
data_jan = {'Date': ['2023-01-01', '2023-01-02'],
            'Stock Price': [100, 102],
            'Volume': [3000, 3200],
            'Region': ['North', 'South']}
df_jan = pd.DataFrame(data_jan)

# Sample data for February
data_feb = {'Date': ['2023-02-01', '2023-02-02'],
            'Stock Price': [110, 112],
            'Volume': [3300, 3400],
            'Region': ['East', 'West']}
df_feb = pd.DataFrame(data_feb)

# Print the heads of the dataframes
print("January DataFrame:")
print(df_jan.head())
print("\nFebruary DataFrame:")
print(df_feb.head())

# Step 1: Concatenate the two dataframes vertically
df_combined = pd.________([__________, __________], axis=____, ignore_index=True)  # Fill in the blanks

# Display the concatenated dataframe
print("\nConcatenated DataFrame:")
print(df_combined.head())
```

In [6]:
import pandas as pd

# Sample data for January
data_jan = {'Date': ['2023-01-01', '2023-01-02'],
            'Stock Price': [100, 102],
            'Volume': [3000, 3200],
            'Region': ['North', 'South']}
df_jan = pd.DataFrame(data_jan)

# Sample data for February
data_feb = {'Date': ['2023-02-01', '2023-02-02'],
            'Stock Price': [110, 112],
            'Volume': [3300, 3400],
            'Region': ['East', 'West']}
df_feb = pd.DataFrame(data_feb)

# Print the heads of the dataframes
print("January DataFrame:")
print(df_jan.head())
print("\nFebruary DataFrame:")
print(df_feb.head())

# Step 1: Concatenate the two dataframes vertically
df_combined = pd.concat([df_jan, df_feb], axis=0, ignore_index=True)

# Display the concatenated dataframe
print("\nConcatenated DataFrame:")
print(df_combined.head())

January DataFrame:
         Date  Stock Price  Volume Region
0  2023-01-01          100    3000  North
1  2023-01-02          102    3200  South

February DataFrame:
         Date  Stock Price  Volume Region
0  2023-02-01          110    3300   East
1  2023-02-02          112    3400   West

Concatenated DataFrame:
         Date  Stock Price  Volume Region
0  2023-01-01          100    3000  North
1  2023-01-02          102    3200  South
2  2023-02-01          110    3300   East
3  2023-02-02          112    3400   West


## Exercise 3

Concatenate two DataFrames horizontally.

```python
import pandas as pd

# Sample stock data for January with additional columns
data_jan = {'Date': ['2023-01-01', '2023-01-02'],
            'Stock Price': [100, 102],
            'Volume': [3000, 3200],
            'Region': ['North', 'South'],
            'Market Cap': [1.5e9, 1.7e9]}
df_jan = pd.DataFrame(data_jan)

# Sample stock data for February with additional columns
data_feb = {'Date': ['2023-02-01', '2023-02-02'],
            'Stock Price': [110, 112],
            'Volume': [3300, 3400],
            'Region': ['East', 'West'],
            'Market Cap': [1.8e9, 2.0e9]}
df_feb = pd.DataFrame(data_feb)

# Print the heads of the dataframes
print("January DataFrame:")
print(df_jan.head())
print("\nFebruary DataFrame:")
print(df_feb.head())

# Step 1: Concatenate the dataframes horizontally
# Fill in the blanks
df_combined = ______.______([__________, __________], ________=__________)  

# Display the concatenated dataframe
print("\nConcatenated DataFrame:")
print(df_combined.head())
```

In [7]:
import pandas as pd

# Sample stock data for January with additional columns
data_jan = {'Date': ['2023-01-01', '2023-01-02'],
            'Stock Price': [100, 102],
            'Volume': [3000, 3200],
            'Region': ['North', 'South'],
            'Market Cap': [1.5e9, 1.7e9]}
df_jan = pd.DataFrame(data_jan)

# Sample stock data for February with additional columns
data_feb = {'Date': ['2023-02-01', '2023-02-02'],
            'Stock Price': [110, 112],
            'Volume': [3300, 3400],
            'Region': ['East', 'West'],
            'Market Cap': [1.8e9, 2.0e9]}
df_feb = pd.DataFrame(data_feb)

# Print the heads of the dataframes
print("January DataFrame:")
print(df_jan.head())
print("\nFebruary DataFrame:")
print(df_feb.head())

# Step 1: Concatenate the dataframes horizontally
df_combined = pd.concat([df_jan, df_feb], axis=1)

# Display the concatenated dataframe
print("\nConcatenated DataFrame:")
print(df_combined.head())

January DataFrame:
         Date  Stock Price  Volume Region    Market Cap
0  2023-01-01          100    3000  North  1.500000e+09
1  2023-01-02          102    3200  South  1.700000e+09

February DataFrame:
         Date  Stock Price  Volume Region    Market Cap
0  2023-02-01          110    3300   East  1.800000e+09
1  2023-02-02          112    3400   West  2.000000e+09

Concatenated DataFrame:
         Date  Stock Price  Volume Region    Market Cap        Date  \
0  2023-01-01          100    3000  North  1.500000e+09  2023-02-01   
1  2023-01-02          102    3200  South  1.700000e+09  2023-02-02   

   Stock Price  Volume Region    Market Cap  
0          110    3300   East  1.800000e+09  
1          112    3400   West  2.000000e+09  
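The result above has two columns each named `Date`, `Stock Price`, and so on, because `axis=1` simply places the frames side by side. One way to keep them apart, sketched here with trimmed-down versions of the frames, is the `keys` argument of `pd.concat()`, which adds an outer column level naming the source of each column:

```python
import pandas as pd

# Trimmed-down versions of the January and February frames
df_jan = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02'], 'Stock Price': [100, 102]})
df_feb = pd.DataFrame({'Date': ['2023-02-01', '2023-02-02'], 'Stock Price': [110, 112]})

# Plain horizontal concat repeats the column labels
plain = pd.concat([df_jan, df_feb], axis=1)
print(plain.columns.tolist())   # ['Date', 'Stock Price', 'Date', 'Stock Price']

# keys= adds an outer column level that records which frame each column came from
labelled = pd.concat([df_jan, df_feb], axis=1, keys=['Jan', 'Feb'])
print(labelled['Feb']['Stock Price'].tolist())   # [110, 112]
```

Renaming the columns before concatenating (for example with `DataFrame.add_suffix('_feb')`) is an alternative if you prefer flat column names.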


## Exercise 4

Concatenate two DataFrames that have different columns, and reset the index.

```python
import pandas as pd

# Sample data for January
data_jan = {'Date': ['2023-01-01', '2023-01-02'],
            'Stock Price': [100, 102],
            'Volume': [3000, 3200],
            'Region': ['North', 'South'],
            'Market Cap': [1.5e9, 1.7e9]}
df_jan = pd.DataFrame(data_jan)

# Sample data for February (missing 'Region' and 'Market Cap' columns)
data_feb = {'Date': ['2023-02-01', '2023-02-02'],
            'Stock Price': [110, 112],
            'Volume': [3300, 3400]}
df_feb = pd.DataFrame(data_feb)

# Print the heads of the dataframes
print("January DataFrame:")
print(df_jan.head())
print("\nFebruary DataFrame:")
print(df_feb.head())

# Step 1: Concatenate the dataframes and reset the index
# Fill in the blanks
df_combined = pd.concat(______)  

# Display the concatenated dataframe
print("\nConcatenated DataFrame:")
print(df_combined.head())
```

In [10]:
import pandas as pd

# Sample data for January
data_jan = {'Date': ['2023-01-01', '2023-01-02'],
            'Stock Price': [100, 102],
            'Volume': [3000, 3200],
            'Region': ['North', 'South'],
            'Market Cap': [1.5e9, 1.7e9]}
df_jan = pd.DataFrame(data_jan)

# Sample data for February (missing 'Region' and 'Market Cap' columns)
data_feb = {'Date': ['2023-02-01', '2023-02-02'],
            'Stock Price': [110, 112],
            'Volume': [3300, 3400]}
df_feb = pd.DataFrame(data_feb)

# Print the heads of the dataframes
print("January DataFrame:")
print(df_jan.head())
print("\nFebruary DataFrame:")
print(df_feb.head())

# Step 1: Concatenate the dataframes and reset the index
df_combined = pd.concat([df_jan, df_feb], join='outer', axis=0, ignore_index=True)

# Display the concatenated dataframe
print("\nConcatenated DataFrame:")
print(df_combined.head())

January DataFrame:
         Date  Stock Price  Volume Region    Market Cap
0  2023-01-01          100    3000  North  1.500000e+09
1  2023-01-02          102    3200  South  1.700000e+09

February DataFrame:
         Date  Stock Price  Volume
0  2023-02-01          110    3300
1  2023-02-02          112    3400

Concatenated DataFrame:
         Date  Stock Price  Volume Region    Market Cap
0  2023-01-01          100    3000  North  1.500000e+09
1  2023-01-02          102    3200  South  1.700000e+09
2  2023-02-01          110    3300    NaN           NaN
3  2023-02-02          112    3400    NaN           NaN
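`join='outer'` is the default, which is why the missing `Region` and `Market Cap` values are filled with `NaN`. As a sketch with trimmed-down frames, `join='inner'` instead keeps only the columns that every input shares:

```python
import pandas as pd

# Trimmed-down frames: only January has a 'Region' column
df_jan = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02'], 'Volume': [3000, 3200], 'Region': ['North', 'South']})
df_feb = pd.DataFrame({'Date': ['2023-02-01', '2023-02-02'], 'Volume': [3300, 3400]})

# join='outer' (default) keeps the union of columns and fills gaps with NaN
outer = pd.concat([df_jan, df_feb], join='outer', ignore_index=True)
# join='inner' keeps only the columns present in every input
inner = pd.concat([df_jan, df_feb], join='inner', ignore_index=True)

print(outer.columns.tolist())  # ['Date', 'Volume', 'Region']
print(inner.columns.tolist())  # ['Date', 'Volume']
```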


## Exercise 5

Concatenate DataFrames that have different (custom) indexes, then reset the index of the final DataFrame.

```python
import pandas as pd

# Sample sales data for January with custom index
data_jan = {'Date': ['2023-01-01', '2023-01-02'],
            'Sales': [1000, 1200],
            'Region': ['North', 'South']}
df_jan = pd.DataFrame(data_jan, index=['A', 'B'])

# Sample sales data for February with custom index
data_feb = {'Date': ['2023-02-01', '2023-02-02'],
            'Sales': [1100, 1300],
            'Region': ['East', 'West']}
df_feb = pd.DataFrame(data_feb, index=['C', 'D'])

# Print the heads of the dataframes
print("January DataFrame:")
print(df_jan.head())
print("\nFebruary DataFrame:")
print(df_feb.head())

# Step 1: Concatenate the dataframes
# Fill in the blanks
df_combined = pd.concat([__________, __________], axis=0)

# Step 2: Reset the index of the concatenated dataframe
df_combined_reset = df_combined.reset_index(drop=True)  # Reset the index

# Display the concatenated dataframe with reset index
print("\nConcatenated DataFrame with Reset Index:")
print(df_combined_reset.head())
```

In [None]:
import pandas as pd

# Sample sales data for January with custom index
data_jan = {'Date': ['2023-01-01', '2023-01-02'],
            'Sales': [1000, 1200],
            'Region': ['North', 'South']}
df_jan = pd.DataFrame(data_jan, index=['A', 'B'])

# Sample sales data for February with custom index
data_feb = {'Date': ['2023-02-01', '2023-02-02'],
            'Sales': [1100, 1300],
            'Region': ['East', 'West']}
df_feb = pd.DataFrame(data_feb, index=['C', 'D'])

# Print the heads of the dataframes
print("January DataFrame:")
print(df_jan.head())
print("\nFebruary DataFrame:")
print(df_feb.head())

# Step 1: Concatenate the dataframes
df_combined = pd.concat([df_jan, df_feb], axis=0)

# Step 2: Reset the index of the concatenated dataframe
df_combined_reset = df_combined.reset_index(drop=True)

# Display the concatenated dataframe with reset index
print("\nConcatenated DataFrame with Reset Index:")
print(df_combined_reset.head())
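With custom labels, the choice of `drop` matters more, because the labels `A` to `D` may carry meaning. A sketch using trimmed-down versions of the frames above: `drop=True` throws the labels away, `drop=False` keeps them in a column named `index`, and `ignore_index=True` discards them during the concatenation itself.

```python
import pandas as pd

# Trimmed-down frames with custom string indexes
df_jan = pd.DataFrame({'Sales': [1000, 1200]}, index=['A', 'B'])
df_feb = pd.DataFrame({'Sales': [1100, 1300]}, index=['C', 'D'])

combined = pd.concat([df_jan, df_feb], axis=0)
print(combined.index.tolist())          # ['A', 'B', 'C', 'D']

# drop=False moves the custom labels into a column (named 'index' by default)
kept = combined.reset_index(drop=False)
print(kept.columns.tolist())            # ['index', 'Sales']

# ignore_index=True discards the labels during the concat itself
discarded = pd.concat([df_jan, df_feb], ignore_index=True)
print(discarded.index.tolist())         # [0, 1, 2, 3]
```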

## Exercise 6

Select a join that will keep all rows from the left DataFrame with matched rows from the right DataFrame. Some of the code is already written; you only need to replace the '___' part.

```python
import pandas as pd

# Create the first DataFrame (left)
data1 = {
    'ID': [1, 2, 3, 4, 5, 6, 7],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace']
}
df1 = pd.DataFrame(data1)

# Create the second DataFrame (right)
data2 = {
    'ID': [2, 3, 5, 6, 8, 9, 10],
    'Age': [25, 30, 22, 28, 35, 40, 45]
}
df2 = pd.DataFrame(data2)

# Perform a left join on the 'ID' column
merged_df = ___.merge(___, ___, on=___, how=___) # replace the "___"

print("Merged DataFrame with Left Join:")
print(merged_df)
```

In [24]:
import pandas as pd

# Create the first DataFrame (left)
data1 = {
    'ID': [1, 2, 3, 4, 5, 6, 7],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace']
}
df1 = pd.DataFrame(data1)

# Create the second DataFrame (right)
data2 = {
    'ID': [2, 3, 5, 6, 8, 9, 10],
    'Age': [25, 30, 22, 28, 35, 40, 45]
}
df2 = pd.DataFrame(data2)

# Perform a left join on the 'ID' column
merged_df = pd.merge(df1, df2, on='ID', how='left')

print("Merged DataFrame with Left Join:")
print(merged_df)

Merged DataFrame with Left Join:
   ID     Name   Age
0   1    Alice   NaN
1   2      Bob  25.0
2   3  Charlie  30.0
3   4    David   NaN
4   5      Eve  22.0
5   6    Frank  28.0
6   7    Grace   NaN
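To check which rows of the left DataFrame found no partner, `pd.merge()` accepts `indicator=True`, which adds a `_merge` column with the values `'both'`, `'left_only'` or `'right_only'`. A minimal sketch with the same data:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6, 7],
                    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace']})
df2 = pd.DataFrame({'ID': [2, 3, 5, 6, 8, 9, 10],
                    'Age': [25, 30, 22, 28, 35, 40, 45]})

# indicator=True adds a '_merge' column describing where each row came from
checked = pd.merge(df1, df2, on='ID', how='left', indicator=True)

# Rows from df1 that found no match in df2
unmatched = checked[checked['_merge'] == 'left_only']
print(unmatched[['ID', 'Name']])
```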


## Exercise 7

You have two DataFrames: `df1` contains information about employees and `df2` contains information about their departments. Perform an **inner join** on the `Employee_ID` column to combine them, keeping only the employees who have a matching department.

```python
import pandas as pd

# Create the first DataFrame (employees)
data1 = {
    'Employee_ID': [101, 102, 103, 104, 105],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve']
}
df1 = pd.DataFrame(data1)

# Create the second DataFrame (departments)
data2 = {
    'Employee_ID': [101, 102, 106, 107],
    'Department': ['HR', 'Finance', 'IT', 'Sales']
}
df2 = pd.DataFrame(data2)

# Perform an inner join on 'Employee_ID'
merged_df = df1.merge(___, on='____', how='___')  # Fill in the '___'
print(merged_df)
```

In [25]:
import pandas as pd

# Create the first DataFrame (employees)
data1 = {
    'Employee_ID': [101, 102, 103, 104, 105],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve']
}
df1 = pd.DataFrame(data1)

# Create the second DataFrame (departments)
data2 = {
    'Employee_ID': [101, 102, 106, 107],
    'Department': ['HR', 'Finance', 'IT', 'Sales']
}
df2 = pd.DataFrame(data2)

# Perform an inner join on 'Employee_ID'
merged_df = df1.merge(df2, on='Employee_ID', how='inner')
print(merged_df)

   Employee_ID   Name Department
0          101  Alice         HR
1          102    Bob    Finance


## Exercise 8

You have two DataFrames, `df1` containing product details and `df2` containing sales data. Perform an **inner join** to combine them, showing only the products that have matching sales data.

```python
import pandas as pd

# Create the first DataFrame (products)
data1 = {
    'Product_ID': [101, 102, 103, 104],
    'Product_Name': ['Laptop', 'Tablet', 'Smartphone', 'Monitor']
}
df1 = pd.DataFrame(data1)

# Create the second DataFrame (sales)
data2 = {
    'Product_ID': [101, 102, 105],
    'Sales': [500, 300, 100]
}
df2 = pd.DataFrame(data2)

# Perform an inner join on 'Product_ID'
merged_df = ___.merge(___, ___='Product_ID', ____='___')  # Fill in the '___'
print(merged_df)
```

In [26]:
import pandas as pd

# Create the first DataFrame (products)
data1 = {
    'Product_ID': [101, 102, 103, 104],
    'Product_Name': ['Laptop', 'Tablet', 'Smartphone', 'Monitor']
}
df1 = pd.DataFrame(data1)

# Create the second DataFrame (sales)
data2 = {
    'Product_ID': [101, 102, 105],
    'Sales': [500, 300, 100]
}
df2 = pd.DataFrame(data2)

# Perform an inner join on 'Product_ID'
merged_df = df1.merge(df2, on='Product_ID', how='inner')
print(merged_df)

   Product_ID Product_Name  Sales
0         101       Laptop    500
1         102       Tablet    300
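The exercises so far join on a column that has the same name in both DataFrames. When the key columns are named differently, `left_on` and `right_on` can be used instead of `on`. In this sketch, the `sales` table and its `ProdID` column are hypothetical:

```python
import pandas as pd

products = pd.DataFrame({'Product_ID': [101, 102, 103, 104],
                         'Product_Name': ['Laptop', 'Tablet', 'Smartphone', 'Monitor']})
# Hypothetical sales table whose key column is named 'ProdID' instead
sales = pd.DataFrame({'ProdID': [101, 102, 105], 'Sales': [500, 300, 100]})

merged = products.merge(sales, left_on='Product_ID', right_on='ProdID', how='inner')
# Both key columns are kept in the result; drop the redundant one
merged = merged.drop(columns='ProdID')
print(merged)
```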


## Exercise 9

You have two DataFrames, `df1` containing information about products, and `df2` containing data about product discounts. Perform a **left join** to combine both DataFrames, keeping all products from `df1`, even if they don’t have a matching discount in `df2`.

```python
import pandas as pd

# Create the first DataFrame (products)
data1 = {
    'Product_ID': [101, 102, 103, 104],
    'Product_Name': ['Laptop', 'Tablet', 'Smartphone', 'Monitor']
}
df1 = pd.DataFrame(data1)

# Create the second DataFrame (discounts)
data2 = {
    'Product_ID': [101, 102],
    'Discount': [10, 15]
}
df2 = pd.DataFrame(data2)

# Perform a left join on 'Product_ID'
merged_df = df1._____(_____)  # Fill in the '___'
print(merged_df)
```

In [27]:
import pandas as pd

# Create the first DataFrame (products)
data1 = {
    'Product_ID': [101, 102, 103, 104],
    'Product_Name': ['Laptop', 'Tablet', 'Smartphone', 'Monitor']
}
df1 = pd.DataFrame(data1)

# Create the second DataFrame (discounts)
data2 = {
    'Product_ID': [101, 102],
    'Discount': [10, 15]
}
df2 = pd.DataFrame(data2)

# Perform a left join on 'Product_ID'
merged_df = df1.merge(df2, on='Product_ID', how='left')
print(merged_df)

   Product_ID Product_Name  Discount
0         101       Laptop      10.0
1         102       Tablet      15.0
2         103   Smartphone       NaN
3         104      Monitor       NaN
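After the left join, the missing discounts appear as `NaN`, which also turns `Discount` into a float column. If (and this is an assumption about the data) a product without a discount row simply has no discount, you could fill the gaps and restore the integer type:

```python
import pandas as pd

df1 = pd.DataFrame({'Product_ID': [101, 102, 103, 104],
                    'Product_Name': ['Laptop', 'Tablet', 'Smartphone', 'Monitor']})
df2 = pd.DataFrame({'Product_ID': [101, 102], 'Discount': [10, 15]})

merged_df = df1.merge(df2, on='Product_ID', how='left')

# Assumption: a missing discount means a discount of 0
merged_df['Discount'] = merged_df['Discount'].fillna(0).astype(int)
print(merged_df)
```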


## Exercise 10

You have two DataFrames, `df1` containing employee details, and `df2` containing department information. Perform a **right join** to combine both DataFrames, keeping all departments, even if there are no matching employees.

```python
import pandas as pd

# Create the first DataFrame (employees)
data1 = {
    'Employee_ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David']
}
df1 = pd.DataFrame(data1)

# Create the second DataFrame (departments)
data2 = {
    'Department_ID': [1, 2, 3],
    'Department': ['HR', 'Finance', 'Sales'],
    'Employee_ID': [101, 102, 105]
}
df2 = pd.DataFrame(data2)

# Perform a right join on 'Employee_ID'
merged_df = ______  # Fill in the '___'
print(merged_df)
```

In [28]:
import pandas as pd

# Create the first DataFrame (employees)
data1 = {
    'Employee_ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David']
}
df1 = pd.DataFrame(data1)

# Create the second DataFrame (departments)
data2 = {
    'Department_ID': [1, 2, 3],
    'Department': ['HR', 'Finance', 'Sales'],
    'Employee_ID': [101, 102, 105]
}
df2 = pd.DataFrame(data2)

# Perform a right join on 'Employee_ID'
merged_df = df1.merge(df2, on='Employee_ID', how='right')
print(merged_df)

   Employee_ID   Name  Department_ID Department
0          101  Alice              1         HR
1          102    Bob              2    Finance
2          105    NaN              3      Sales
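A right join is the mirror image of a left join: swapping the two DataFrames and using `how='left'` should give the same rows, only with the columns in a different order. A sketch that checks this on the exercise data:

```python
import pandas as pd

df1 = pd.DataFrame({'Employee_ID': [101, 102, 103, 104],
                    'Name': ['Alice', 'Bob', 'Charlie', 'David']})
df2 = pd.DataFrame({'Department_ID': [1, 2, 3],
                    'Department': ['HR', 'Finance', 'Sales'],
                    'Employee_ID': [101, 102, 105]})

right_join = df1.merge(df2, on='Employee_ID', how='right')
swapped_left = df2.merge(df1, on='Employee_ID', how='left')

# Reorder the swapped result's columns to match, then compare
same_rows = right_join.equals(swapped_left[right_join.columns])
print(same_rows)
```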


## Exercise 11

You have two DataFrames, `df1` containing product details, and `df2` containing stock information. Perform a **right join** to combine both DataFrames, keeping all stock information, even if there’s no matching product.

```python
import pandas as pd

# Create the first DataFrame (products)
data1 = {
    'Product_ID': [101, 102, 103],
    'Product_Name': ['Laptop', 'Tablet', 'Smartphone']
}
df1 = pd.DataFrame(data1)

# Create the second DataFrame (stock)
data2 = {
    'Product_ID': [102, 103, 104],
    'Stock': [50, 100, 200]
}
df2 = pd.DataFrame(data2)

# Perform a right join on 'Product_ID'
merged_df = ______  # Fill in the '___'
print(merged_df)
```

In [29]:
import pandas as pd

# Create the first DataFrame (products)
data1 = {
    'Product_ID': [101, 102, 103],
    'Product_Name': ['Laptop', 'Tablet', 'Smartphone']
}
df1 = pd.DataFrame(data1)

# Create the second DataFrame (stock)
data2 = {
    'Product_ID': [102, 103, 104],
    'Stock': [50, 100, 200]
}
df2 = pd.DataFrame(data2)

# Perform a right join on 'Product_ID'
merged_df = df1.merge(df2, on='Product_ID', how='right')
print(merged_df)

   Product_ID Product_Name  Stock
0         102       Tablet     50
1         103   Smartphone    100
2         104          NaN    200


## Exercise 12

You have two DataFrames, `df1` containing information about employees, and `df2` containing employee attendance. Perform a **full outer join** to combine both DataFrames, keeping all employees and attendance records, filling missing values with NaN where necessary.

```python
import pandas as pd

# Create the first DataFrame (employees)
data1 = {
    'Employee_ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David']
}
df1 = pd.DataFrame(data1)

# Create the second DataFrame (attendance)
data2 = {
    'Employee_ID': [102, 103, 105],
    'Attendance': ['Present', 'Absent', 'Present']
}
df2 = pd.DataFrame(data2)

# Perform an outer join on 'Employee_ID'
merged_df = _______  # Fill in the '___'
print(merged_df)
```

In [30]:
import pandas as pd

# Create the first DataFrame (employees)
data1 = {
    'Employee_ID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David']
}
df1 = pd.DataFrame(data1)

# Create the second DataFrame (attendance)
data2 = {
    'Employee_ID': [102, 103, 105],
    'Attendance': ['Present', 'Absent', 'Present']
}
df2 = pd.DataFrame(data2)

# Perform an outer join on 'Employee_ID'
merged_df = df1.merge(df2, on='Employee_ID', how='outer')
print(merged_df)

   Employee_ID     Name Attendance
0          101    Alice        NaN
1          102      Bob    Present
2          103  Charlie     Absent
3          104    David        NaN
4          105      NaN    Present
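When the key is unique in both DataFrames, an outer join produces exactly one row per key in the union of both key sets. The sketch below checks that on the exercise data and uses `validate='one_to_one'`, a `pd.merge()` option that raises `pd.errors.MergeError` if either side has duplicate keys; the `dup` DataFrame is a hypothetical example of such duplicates.

```python
import pandas as pd

df1 = pd.DataFrame({'Employee_ID': [101, 102, 103, 104],
                    'Name': ['Alice', 'Bob', 'Charlie', 'David']})
df2 = pd.DataFrame({'Employee_ID': [102, 103, 105],
                    'Attendance': ['Present', 'Absent', 'Present']})

merged_df = df1.merge(df2, on='Employee_ID', how='outer', validate='one_to_one')

# With unique keys on both sides: one row per key in the union
all_ids = set(df1['Employee_ID']) | set(df2['Employee_ID'])
print(len(merged_df), len(all_ids))   # 5 5

# Hypothetical attendance table with a duplicated key (102 appears twice)
dup = pd.DataFrame({'Employee_ID': [102, 102], 'Attendance': ['Present', 'Absent']})
try:
    df1.merge(dup, on='Employee_ID', how='outer', validate='one_to_one')
    raised = False
except pd.errors.MergeError:
    raised = True
```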


## Exercise 13

You have two DataFrames, `df1` containing product details, and `df2` containing sales data. Perform a **full outer join** to combine both DataFrames, ensuring all product and sales information is retained, even if some products or sales are missing in either DataFrame.

```python
import pandas as pd

# Create the first DataFrame (products)
data1 = {
    'Product_ID': [101, 102, 103, 104],
    'Product_Name': ['Laptop', 'Tablet', 'Smartphone', 'Monitor']
}
df1 = pd.DataFrame(data1)

# Create the second DataFrame (sales)
data2 = {
    'Product_ID': [101, 103, 105],
    'Sales': [500, 200, 150]
}
df2 = pd.DataFrame(data2)

# Perform an outer join on 'Product_ID'
merged_df = ________  # Fill in the '___'
print(_______)
```

In [31]:
import pandas as pd

# Create the first DataFrame (products)
data1 = {
    'Product_ID': [101, 102, 103, 104],
    'Product_Name': ['Laptop', 'Tablet', 'Smartphone', 'Monitor']
}
df1 = pd.DataFrame(data1)

# Create the second DataFrame (sales)
data2 = {
    'Product_ID': [101, 103, 105],
    'Sales': [500, 200, 150]
}
df2 = pd.DataFrame(data2)

# Perform an outer join on 'Product_ID'
merged_df = df1.merge(df2, on='Product_ID', how='outer')
print(merged_df)

   Product_ID Product_Name  Sales
0         101       Laptop  500.0
1         102       Tablet    NaN
2         103   Smartphone  200.0
3         104      Monitor    NaN
4         105          NaN  150.0
