# Day 6: Matplotlib - Bar Charts and Histograms

Welcome to Day 6! You're now comfortable creating line and scatter plots. Today, we'll expand your visualization toolkit with two more fundamental plot types in Matplotlib.

Today's topics include:
1.  **Creating Bar Charts** to compare quantities across different categories.
2.  **Generating Histograms** to understand the distribution of a single numerical variable.
3.  **Customizing** these plots by adjusting the number of bins, bar colors, and edge colors.

As always, we start by setting up our environment with the necessary libraries and our familiar Iris dataset.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load the Iris dataset into a DataFrame
iris_data = load_iris()
iris_df = pd.DataFrame(data=iris_data.data, columns=iris_data.feature_names)
iris_df['target'] = iris_data.target

# Let's add the species names to the DataFrame for easier plotting
iris_df['species'] = iris_df['target'].map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})

# Preview the updated data
iris_df.head()

---

## Part 1: Bar Charts

Bar charts are ideal for comparing a numerical value across different categories. For example, what is the average sepal length of each iris species? With one bar per species, a bar chart makes the differences visible at a glance.

**Exercise 1.1:** First, we need the data to plot. Calculate the average 'sepal length (cm)' for each species. Then, create a bar chart to display these averages.

*Hint: Use pandas' `.groupby()` and `.mean()` methods to get the average values.*

In [None]:
# Step 1: Calculate the average sepal length for each species
avg_sepal_length = iris_df.groupby('species')['sepal length (cm)'].mean()
print(avg_sepal_length)

# Step 2: Create the bar chart using plt.bar()
# Your code here

**Solution 1.1:**

In [None]:
# Solution
avg_sepal_length = iris_df.groupby('species')['sepal length (cm)'].mean()

plt.bar(x=avg_sepal_length.index, height=avg_sepal_length.values)

# Don't forget labels and a title!
plt.title('Average Sepal Length by Species')
plt.xlabel('Species')
plt.ylabel('Average Sepal Length (cm)')

plt.show()
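
The same chart can also be drawn with Matplotlib's object-oriented interface, where `plt.subplots()` returns a figure and an axes object and you call methods on the axes instead of on `plt`. This sketch rebuilds the species column so it runs on its own, and it assumes Matplotlib 3.4 or later for `ax.bar_label()`, which prints each bar's height above it; remove that line on older versions.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the DataFrame and species names so this cell is self-contained
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target_names[iris.target]

avg = df.groupby('species')['sepal length (cm)'].mean()

fig, ax = plt.subplots()
bars = ax.bar(avg.index, avg.values)
ax.bar_label(bars, fmt='%.2f')  # Matplotlib 3.4+: label each bar with its height
ax.set_title('Average Sepal Length by Species')
ax.set_xlabel('Species')
ax.set_ylabel('Average Sepal Length (cm)')
plt.show()
```

The `ax.set_*` methods mirror the `plt.title()`, `plt.xlabel()`, and `plt.ylabel()` calls from the solution; the axes-based style becomes more convenient once a figure holds several plots.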

**Exercise 1.2:** Customize the bar chart from Exercise 1.1 by changing the color of the bars to `'skyblue'`.

*Hint: Use the `color` parameter within `plt.bar()`.*

In [None]:
# Your code here

**Solution 1.2:**

In [None]:
# Solution
avg_sepal_length = iris_df.groupby('species')['sepal length (cm)'].mean()

plt.bar(x=avg_sepal_length.index, height=avg_sepal_length.values, color='skyblue')

plt.title('Average Sepal Length by Species')
plt.xlabel('Species')
plt.ylabel('Average Sepal Length (cm)')

plt.show()
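
The `color` parameter also accepts a list with one color per bar, and `plt.barh()` (or `ax.barh()`) draws the same data as horizontal bars, which can be easier to read when category names are long. A minimal sketch reusing the species averages; the three color names are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target_names[iris.target]
avg = df.groupby('species')['sepal length (cm)'].mean()

# One color per bar, matched to the species in index order (arbitrary picks)
bar_colors = ['skyblue', 'salmon', 'mediumseagreen']

fig, ax = plt.subplots()
# barh swaps the axes: categories go on y, values on x
bars = ax.barh(avg.index, avg.values, color=bar_colors, edgecolor='black')
ax.set_title('Average Sepal Length by Species')
ax.set_xlabel('Average Sepal Length (cm)')
ax.set_ylabel('Species')
plt.show()
```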

---

## Part 2: Histograms

Histograms help us understand the distribution of a single continuous variable. They do this by dividing the range of the data into 'bins' and counting how many data points fall into each bin.
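
To see the binning step on its own, NumPy's `np.histogram()` returns the per-bin counts and the bin edges without drawing anything; `plt.hist()` performs the same kind of calculation before drawing the bars. The seven values below are made up purely for illustration.

```python
import numpy as np

values = np.array([1.0, 1.5, 2.0, 4.0, 4.5, 5.0, 5.5])  # hypothetical toy data

# Split the range [1.0, 5.5] into 3 equal-width bins and count the values in each
counts, edges = np.histogram(values, bins=3)

print(edges)   # [1.  2.5 4.  5.5]
print(counts)  # [3 0 4]
```

Each bin is half-open, `[left, right)`, except the last one, which also includes its right edge. That is why `4.0` is counted in the third bin rather than the second, and why the maximum value `5.5` is counted at all.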

**Exercise 2.1:** Create a histogram for the 'petal length (cm)' column of the `iris_df`. Use `plt.hist()` for this.

In [None]:
# Your code here

**Solution 2.1:**

In [None]:
# Solution
plt.hist(iris_df['petal length (cm)'])

plt.title('Distribution of Petal Length')
plt.xlabel('Petal Length (cm)')
plt.ylabel('Frequency')

plt.show()
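
`plt.hist()` also returns what it drew: an array of counts, an array of bin edges, and the bar artists. Capturing these values lets you check the numbers behind the picture. With Matplotlib's default settings a histogram uses 10 bins, so there are 11 edges.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# n: count per bin, bins: the bin edges, patches: the drawn bars
n, bins, patches = plt.hist(df['petal length (cm)'])
plt.title('Distribution of Petal Length')
plt.xlabel('Petal Length (cm)')
plt.ylabel('Frequency')
plt.show()

print(len(n), len(bins))  # one more edge than there are bins
print(int(n.sum()))       # every one of the 150 flowers falls in exactly one bin
```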

**Exercise 2.2:** Matplotlib's default of 10 bins is not always the best choice: too few bins can hide detail, while too many can make the shape look noisy. Customize the histogram from Exercise 2.1 by:
1.  Setting the number of bins to 20.
2.  Adding a black edge color to the bars for better separation.

*Hint: Use the `bins` and `edgecolor` parameters in `plt.hist()`.*

In [None]:
# Your code here

**Solution 2.2:**

In [None]:
# Solution
plt.hist(iris_df['petal length (cm)'], bins=20, edgecolor='black')

plt.title('Distribution of Petal Length (20 Bins)')
plt.xlabel('Petal Length (cm)')
plt.ylabel('Frequency')

plt.show()
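
The `bins` parameter accepts more than an integer. Passing a sequence of edges fixes the exact bin boundaries, which helps when comparing several histograms because they then share the same bins. This sketch uses 0.5 cm bins from 1.0 to 7.0 cm (a range chosen to cover every iris petal length) and overlays one semi-transparent histogram per species using `alpha`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target_names[iris.target]

# Explicit edges 1.0, 1.5, ..., 7.0 give 12 bins, each 0.5 cm wide
edges = np.arange(1.0, 7.5, 0.5)

counts_by_species = {}
for name, group in df.groupby('species'):
    # Same edges for every species, so the bars line up and compare fairly
    counts, _, _ = plt.hist(group['petal length (cm)'], bins=edges,
                            alpha=0.5, edgecolor='black', label=name)
    counts_by_species[name] = counts

plt.title('Petal Length by Species (0.5 cm Bins)')
plt.xlabel('Petal Length (cm)')
plt.ylabel('Frequency')
plt.legend()
plt.show()
```

`bins` can also be a string naming one of NumPy's binning strategies, such as `bins='auto'`, which lets NumPy choose the bin width from the data.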

---

### Great job today!

You've now worked with four essential plot types: line plots, scatter plots, bar charts, and histograms. Bar charts help you compare categories, and histograms reveal the shape of your data's distribution. Both are invaluable tools for exploratory data analysis.

Tomorrow, you'll combine everything from this first week in a mini-project that analyzes a new dataset from start to finish!