# Class 10: Data Analysis with Pandas

Welcome to the tenth class of our Python course! Today, we will dive deeper into Pandas to explore advanced data manipulation techniques, such as merging, joining, and concatenating DataFrames, performing GroupBy operations, and visualizing data directly within Pandas. Let's get started!

## 1. Data Manipulation

### 1.1. Merging, Joining, and Concatenating DataFrames

In real-world data analysis, you often need to combine data from multiple sources. Pandas provides several functions to merge, join, and concatenate DataFrames.

**Merging DataFrames:**

The `merge` function works like a SQL join: it combines two DataFrames on a common column or on their indexes. The `how` argument selects the join type, such as `'inner'`, `'left'`, `'right'`, or `'outer'`.

In [None]:
import pandas as pd

# Creating two DataFrames
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40]
})

df2 = pd.DataFrame({
    'ID': [3, 4, 5, 6],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Salary': [70000, 80000, 90000, 100000]
})

# Merging the DataFrames on the 'ID' column; an inner merge keeps only
# the IDs present in both DataFrames (3 and 4)
merged_df = pd.merge(df1, df2, on='ID', how='inner')
print(merged_df)
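
To make the effect of `how` concrete, here is a minimal sketch comparing a left merge and an outer merge. The `people` and `jobs` tables and their contents are illustrative names chosen for this example, not part of the lesson's data.

```python
import pandas as pd

# Small illustrative tables (hypothetical data for this sketch)
people = pd.DataFrame({'ID': [1, 2, 3, 4],
                       'Name': ['Alice', 'Bob', 'Charlie', 'David']})
jobs = pd.DataFrame({'ID': [3, 4, 5, 6],
                     'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']})

# 'left' keeps every row of the left table; unmatched columns become NaN
left_merged = pd.merge(people, jobs, on='ID', how='left')

# 'outer' keeps rows from both tables; indicator=True adds a '_merge' column
# recording whether each row is 'left_only', 'right_only', or 'both'
outer_merged = pd.merge(people, jobs, on='ID', how='outer', indicator=True)

print(left_merged)
print(outer_merged)
```

The `_merge` column is a quick way to check which keys failed to match before deciding on a join type.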

**Joining DataFrames:**

The `join` method combines DataFrames on their index. It performs a left join by default, so pass `how` explicitly when you need a different join type.

In [None]:
# Setting the 'ID' column as the index (inplace=True modifies df1 and df2
# themselves, so later cells also see 'ID' as the index)
df1.set_index('ID', inplace=True)
df2.set_index('ID', inplace=True)

# Joining the DataFrames
joined_df = df1.join(df2, how='inner')
print(joined_df)
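
As a rough sketch of how `join` relates to `merge`, the example below builds two index-keyed tables and checks that an inner `join` matches a `merge` on both indexes. The `left` and `right` tables are hypothetical and are built directly with an index rather than with `set_index`.

```python
import pandas as pd

# Hypothetical index-keyed tables, built without mutating existing DataFrames
left = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David']},
                    index=pd.Index([1, 2, 3, 4], name='ID'))
right = pd.DataFrame({'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']},
                     index=pd.Index([3, 4, 5, 6], name='ID'))

# join defaults to how='left': every row of `left` is kept
left_joined = left.join(right)

# An inner join on the index can also be written as a merge on both indexes
inner_joined = left.join(right, how='inner')
inner_merged = pd.merge(left, right, left_index=True, right_index=True, how='inner')

print(left_joined)
print(inner_joined.equals(inner_merged))
```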

**Concatenating DataFrames:**

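The `concat` function stacks DataFrames either vertically (along rows, `axis=0`) or horizontally (along columns, `axis=1`). Unlike `merge`, it does not match on a key column; it aligns on the index or column labels instead.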

In [None]:
# Concatenating DataFrames vertically: rows are stacked, and columns
# missing from one DataFrame are filled with NaN
concat_df = pd.concat([df1, df2], axis=0)
print(concat_df)

# Concatenating DataFrames horizontally: rows are aligned on the index,
# and labels missing from one DataFrame are filled with NaN
concat_df = pd.concat([df1, df2], axis=1)
print(concat_df)
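
When stacking tables vertically, the row labels of the inputs are kept as they are, which can leave duplicate index values. The sketch below shows two common ways to handle this, `ignore_index` and `keys`. The quarterly tables `q1` and `q2` are made-up data for illustration.

```python
import pandas as pd

# Hypothetical quarterly tables sharing the same columns
q1 = pd.DataFrame({'Product': ['A', 'B'], 'Sales': [100, 150]})
q2 = pd.DataFrame({'Product': ['A', 'B'], 'Sales': [120, 130]})

# By default the original row labels are kept, so labels repeat
stacked = pd.concat([q1, q2])

# ignore_index=True renumbers the rows from 0 to n-1
renumbered = pd.concat([q1, q2], ignore_index=True)

# keys= adds an outer index level that records which input each row came from
labelled = pd.concat([q1, q2], keys=['Q1', 'Q2'])

print(stacked.index.tolist())
print(renumbered.index.tolist())
print(labelled.loc['Q2'])
```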

## 2. Data Aggregation and Grouping

### 2.1. GroupBy Operations

The `groupby` method splits your data into groups based on some criteria, applies a function to each group independently, and then combines the results. This pattern is often called split-apply-combine.

In [None]:
# Creating a DataFrame
data = {
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'Finance', 'Finance'],
    'Employee': ['John', 'Jane', 'Jim', 'Jake', 'Alice', 'Bob'],
    'Salary': [70000, 80000, 60000, 65000, 90000, 85000]
}
df = pd.DataFrame(data)

# Grouping by 'Department' and calculating the mean salary
grouped_df = df.groupby('Department')['Salary'].mean()
print(grouped_df)
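
To see what the three steps look like when written out by hand, the following sketch iterates over the groups explicitly and compares the result with the built-in call. The `staff` table reuses the salary figures from the example above under a different, illustrative name.

```python
import pandas as pd

staff = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'Finance', 'Finance'],
    'Salary': [70000, 80000, 60000, 65000, 90000, 85000]
})

# Split: iterating a GroupBy yields (group key, sub-DataFrame) pairs
# Apply: compute the mean salary of each sub-DataFrame
manual_means = {}
for department, group in staff.groupby('Department'):
    manual_means[department] = group['Salary'].mean()

# Combine: assemble the per-group results into one Series
manual_result = pd.Series(manual_means, name='Salary')

# The built-in call performs all three steps at once
builtin_result = staff.groupby('Department')['Salary'].mean()

print(manual_result)
print(builtin_result)
```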

### 2.2. Aggregation Functions

Pandas provides several aggregation functions that can be applied to grouped data, such as `mean`, `sum`, `count`, `min`, and `max`. Passing a list of function names to `agg` computes all of them at once, one column per function.

In [None]:
# Aggregating data using multiple functions
agg_df = df.groupby('Department')['Salary'].agg(['mean', 'sum', 'count'])
print(agg_df)
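
If you want to control the names of the output columns, `agg` also accepts keyword arguments of the form `new_name=(column, function)`. This is a small sketch of that style; the output column names (`mean_salary`, `total_salary`, `headcount`) are chosen here for illustration.

```python
import pandas as pd

staff = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'Finance', 'Finance'],
    'Salary': [70000, 80000, 60000, 65000, 90000, 85000]
})

# Named aggregation: each keyword becomes an output column,
# defined as (source column, aggregation function)
summary = staff.groupby('Department').agg(
    mean_salary=('Salary', 'mean'),
    total_salary=('Salary', 'sum'),
    headcount=('Salary', 'count'),
)
print(summary)
```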

### 2.3. Grouping Data

Beyond the built-in aggregations, grouping supports more complex operations: you can apply a custom function to each group, or transform values within each group while keeping the original row layout.

In [None]:
# Applying a custom function to each group
def salary_range(series):
    return series.max() - series.min()

range_df = df.groupby('Department')['Salary'].apply(salary_range)
print(range_df)
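
Transforming data within groups can be sketched with `transform`, which returns one value per original row instead of one value per group. The example below is a minimal illustration; the column names `DeptMean` and `DiffFromMean` are made up for this sketch.

```python
import pandas as pd

staff = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'Finance', 'Finance'],
    'Employee': ['John', 'Jane', 'Jim', 'Jake', 'Alice', 'Bob'],
    'Salary': [70000, 80000, 60000, 65000, 90000, 85000]
})

# transform broadcasts each group's mean back to every row of that group,
# so the result has the same length as the original DataFrame
staff['DeptMean'] = staff.groupby('Department')['Salary'].transform('mean')

# Each employee's salary relative to their department's mean
staff['DiffFromMean'] = staff['Salary'] - staff['DeptMean']
print(staff)
```

Because the result lines up with the original rows, it can be stored directly as a new column, which an aggregation such as `mean` cannot do.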

## 3. Data Visualization with Pandas

Pandas integrates with Matplotlib to allow for quick and easy data visualization directly from DataFrames.

### 3.1. Basic Plotting with Pandas

The `plot` method of a DataFrame creates simple visualizations in one call. Its `kind` argument selects the chart type, for example `'line'` or `'bar'`.

In [None]:
# Creating a simple line plot
df = pd.DataFrame({
    'Year': [2018, 2019, 2020, 2021],
    'Sales': [25000, 27000, 29000, 31000]
})
df.plot(x='Year', y='Sales', kind='line', title='Yearly Sales')

### 3.2. Customizing Plots

You can customize your plots by adding titles, labels, and legends, and by changing plot styles.

In [None]:
import matplotlib.pyplot as plt

# Creating a bar plot with customization
df.plot(x='Year', y='Sales', kind='bar', color='orange')
plt.title('Yearly Sales')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.show()
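
An alternative to the global `plt` functions is to work with the Matplotlib `Axes` object, which `DataFrame.plot` both accepts (through `ax=`) and returns. The sketch below assumes the same yearly sales figures under an illustrative name, `sales`, and is one possible style rather than the required one.

```python
import matplotlib.pyplot as plt
import pandas as pd

sales = pd.DataFrame({
    'Year': [2018, 2019, 2020, 2021],
    'Sales': [25000, 27000, 29000, 31000]
})

# Create the figure and axes first, then let Pandas draw onto that axes
fig, ax = plt.subplots(figsize=(6, 4))
sales.plot(x='Year', y='Sales', kind='bar', color='orange', legend=False, ax=ax)

# Customize through the Axes object instead of the global plt functions
ax.set_title('Yearly Sales')
ax.set_xlabel('Year')
ax.set_ylabel('Sales')
fig.tight_layout()
```

In a notebook the figure is displayed when the cell finishes; when running as a script, call `plt.show()` afterwards. Working with an explicit `ax` becomes useful once a figure contains several subplots.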

## 4. Exercises

Now it's time to practice what you've learned! Try to solve the following exercises.

### Exercise 1: Merge DataFrames

Create two DataFrames:

1. A DataFrame with columns 'ID', 'Name', and 'Age'.
2. A DataFrame with columns 'ID', 'Department', and 'Salary'.

Merge these DataFrames on the 'ID' column.

### Exercise 2: GroupBy and Aggregation

Create a DataFrame with sales data for different regions and products. Group the data by region and calculate the total sales for each region.

### Exercise 3: Custom Aggregation

Using the same sales data, group by product and calculate both the total and average sales for each product.

### Exercise 4: Basic Plotting

Create a DataFrame with monthly sales data for a year. Plot the data using a line plot and customize the plot with titles, labels, and a legend.

### Exercise 5: Advanced Plotting

Using the sales data from previous exercises, create a bar plot that shows the total sales for each product across different regions. Customize the plot with colors and labels.

Feel free to explore the powerful data manipulation and visualization capabilities of Pandas. Happy coding!