# BUAN 446 Week 5 Assignment
## Data Visualization with Matplotlib and Seaborn

---

**Name:** _____________________

**Date:** _____________________

---

### Instructions

This assignment reinforces the data visualization skills covered in Chapter 8. You will create various chart types, customize their appearance, and practice choosing appropriate visualizations for different questions.

**Rules:**
- You **MAY** use AI tools to help with this assignment
- You **MUST** understand and be able to explain all code you submit
- All visualizations must be publication-quality (proper titles, labels, legends)
- Document any AI usage in the reflection section at the end

**Grading:**
- Part 1: Basic Matplotlib Charts (25 points)
- Part 2: Customization and Styling (25 points)
- Part 3: Seaborn Statistical Plots (25 points)
- Part 4: Multi-Panel Figures (25 points)
- **Total: 100 points**

---

## Setup

Run the cell below to import libraries and load the data.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the student data
df = pd.read_csv('lehigh_students_clean.csv')

# Create derived columns
df['Completion_Rate'] = (df['Credits_Earned'] / df['Credits_Attempted'] * 100).round(1)

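def classify_standing(gpa):
    """Map a numeric GPA to an academic standing label."""
    # Note: a missing GPA (NaN) fails both comparisons and is labeled "Probation"
    if gpa >= 3.5:
        return "Dean's List"
    elif gpa >= 2.0:
        return "Good Standing"
    else:
        return "Probation"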

df['Academic_Standing'] = df['GPA'].apply(classify_standing)

# Set default figure size
plt.rcParams['figure.figsize'] = [10, 6]
plt.rcParams['figure.dpi'] = 100

print(f"Loaded {len(df)} student records")
df.head()

---

# Part 1: Basic Matplotlib Charts (25 points)

### Task 1.1: Histogram (8 points)

Create a histogram of student GPAs with the following requirements:
- Use 25 bins
- Add black edges to the bars
- Set transparency (alpha) to 0.7
- Add a title: "Distribution of Student GPAs at Lehigh University"
- Label the x-axis: "GPA"
- Label the y-axis: "Number of Students"
- Add a vertical dashed red line at the mean GPA with a label
- Add a vertical dashed green line at GPA = 3.5 (Dean's List threshold)
- Include a legend
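
**Hint (illustrative sketch, not the solution):** The reference-line technique can be tried on synthetic data first. The scores, the threshold of 85, and all labels here are made up for illustration; `ax.axvline()` draws the vertical line, and its `label` argument is what `ax.legend()` picks up.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic scores stand in for real data (hypothetical values)
rng = np.random.default_rng(42)
scores = rng.normal(loc=70, scale=10, size=500)

fig, ax = plt.subplots(figsize=(10, 6))
counts, bin_edges, bars = ax.hist(scores, bins=20, edgecolor="black", alpha=0.7)

# Dashed vertical line at the sample mean; the label appears in the legend
mean_score = scores.mean()
ax.axvline(mean_score, color="red", linestyle="--", label=f"Mean: {mean_score:.1f}")

# Second dashed line at a fixed, hypothetical threshold
ax.axvline(85, color="green", linestyle="--", label="Threshold: 85")

ax.set_title("Distribution of Synthetic Scores")
ax.set_xlabel("Score")
ax.set_ylabel("Count")
ax.legend()
```

The same calls work through the `plt.` interface (`plt.hist`, `plt.axvline`, `plt.legend`) if you prefer not to hold an `ax` object.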

In [None]:
# Your code here





plt.show()

### Task 1.2: Bar Chart (8 points)

Create a bar chart showing the number of students in each college:
- Calculate the count of students per college
- Sort the bars from most students to fewest
- Add a title: "Student Enrollment by College"
- Label the axes appropriately
- Rotate x-axis labels 45 degrees for readability
- Use `plt.tight_layout()` to prevent label cutoff
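
**Hint (illustrative sketch, not the solution):** Counting and sorting can be done in one step, since `value_counts()` returns counts ordered from most to fewest by default. The `Department` column and its values are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy data; the column name "Department" is hypothetical
toy = pd.DataFrame({"Department": ["Sales", "IT", "Sales", "HR", "IT", "Sales"]})

# value_counts() sorts in descending order of count by default
dept_counts = toy["Department"].value_counts()

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(dept_counts.index, dept_counts.values)
ax.set_title("Employees by Department")
ax.set_xlabel("Department")
ax.set_ylabel("Number of Employees")

# Rotate the category labels, then let matplotlib adjust the margins
ax.tick_params(axis="x", labelrotation=45)
fig.tight_layout()
```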

In [None]:
# Your code here





plt.show()

### Task 1.3: Scatter Plot (9 points)

Create a scatter plot showing Credits_Attempted vs Credits_Earned:
- Use alpha=0.5 for transparency (since points may overlap)
- Add a diagonal line (y = x) representing 100% completion in red
- Add a title: "Credit Completion Analysis"
- Label the axes appropriately
- Add a legend for the 100% completion line
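- Make the figure square using `figsize=(8, 8)` (note that `figsize` only sets the figure's dimensions; `plt.gca().set_aspect('equal')` is what gives both axes the same scale)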
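
**Hint (illustrative sketch, not the solution):** A y = x reference line is just a two-point line drawn across the data range. The `planned` and `completed` arrays are synthetic, and their names are made up for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
planned = rng.integers(10, 100, size=200)            # hypothetical "planned" amounts
completed = planned - rng.integers(0, 10, size=200)  # never exceeds planned

fig, ax = plt.subplots(figsize=(8, 8))
ax.scatter(planned, completed, alpha=0.5)

# The identity line spans the combined range of both variables
low = min(planned.min(), completed.min())
high = max(planned.max(), completed.max())
ax.plot([low, high], [low, high], color="red", label="y = x (100% completed)")

ax.set_aspect("equal")  # one unit on x is drawn the same length as one unit on y
ax.set_title("Planned vs Completed (Synthetic)")
ax.set_xlabel("Planned")
ax.set_ylabel("Completed")
ax.legend()
```

Points on the line completed everything that was planned; points below it fell short.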

In [None]:
# Your code here





plt.show()

---

# Part 2: Customization and Styling (25 points)

### Task 2.1: Horizontal Bar Chart with Values (10 points)

Create a horizontal bar chart showing average GPA by college:
- Calculate mean GPA for each college
- Sort from highest to lowest GPA
- Use `plt.barh()` for horizontal bars
- Add the GPA value as text at the end of each bar (hint: use `plt.text()` or `ax.text()`)
- Use a color gradient or distinct colors for each bar
- Add title and axis labels
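
**Hint (illustrative sketch, not the solution):** One way to label horizontal bars is to loop over the bar objects and place text just past each bar's end. The `Team` and `Score` columns are hypothetical. Note that `barh` draws the first row at the bottom, so sorting ascending puts the largest value on top.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Toy data with hypothetical column names
toy = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "C", "C"],
    "Score": [3.0, 3.4, 2.5, 2.9, 3.6, 3.8],
})

# Ascending sort: barh plots the first row at the bottom, so the largest ends up on top
team_means = toy.groupby("Team")["Score"].mean().sort_values()

fig, ax = plt.subplots(figsize=(8, 4))
colors = plt.cm.viridis(np.linspace(0.2, 0.8, len(team_means)))  # simple color gradient
bars = ax.barh(team_means.index, team_means.values, color=colors)

# Write each value just to the right of its bar, vertically centered
for bar, value in zip(bars, team_means.values):
    ax.text(value + 0.02, bar.get_y() + bar.get_height() / 2, f"{value:.2f}", va="center")

ax.set_xlim(0, team_means.max() * 1.15)  # leave room for the text labels
ax.set_title("Average Score by Team")
ax.set_xlabel("Average Score")
ax.set_ylabel("Team")
```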

In [None]:
# Your code here





plt.show()

### Task 2.2: Styled Pie Chart (7 points)

Create a pie chart showing the distribution of Academic Standing:
- Count students in each Academic Standing category
- Use appropriate colors (green for Dean's List, blue for Good Standing, red for Probation)
- Show percentages on each slice
- "Explode" the Probation slice slightly to highlight it
- Add a title: "Academic Standing Distribution"
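
**Hint (illustrative sketch, not the solution):** Colors and the "explode" offsets are matched to slices by position, so it is safer to build both lists from the category names rather than hard-coding their order. The `Active`/`Paused`/`Closed` categories are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy categories (hypothetical)
status = pd.Series(["Active", "Active", "Active", "Paused", "Paused", "Closed"])
counts = status.value_counts()

# Look up color and offset by name so they stay correct if the order changes
color_map = {"Active": "green", "Paused": "blue", "Closed": "red"}
colors = [color_map[name] for name in counts.index]
explode = [0.1 if name == "Closed" else 0 for name in counts.index]

fig, ax = plt.subplots(figsize=(6, 6))
wedges, texts, autotexts = ax.pie(
    counts.values,
    labels=list(counts.index),
    colors=colors,
    explode=explode,
    autopct="%1.1f%%",  # percentage label on each slice
    startangle=90,
)
ax.set_title("Status Distribution (Toy Data)")
```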

In [None]:
# Your code here





plt.show()

### Task 2.3: Saving Figures (8 points)

Recreate the histogram from Task 1.1 and save it in two formats:
1. Save as 'gpa_distribution.png' with dpi=150
2. Save as 'gpa_distribution.pdf' for vector graphics

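Use `bbox_inches='tight'` to prevent cutoff, and call `plt.savefig()` before `plt.show()`; a figure saved after `plt.show()` is typically blank.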

Print a confirmation message after each save.
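
**Hint (illustrative sketch, not the solution):** The same figure object can be saved several times in different formats; matplotlib infers the format from the file extension. The file names and the temporary output folder here are placeholders chosen for the demo.

```python
import tempfile
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
fig, ax = plt.subplots()
ax.hist(rng.normal(size=300), bins=15, edgecolor="black")
ax.set_title("Example Histogram")

# A temporary folder keeps this demo from cluttering the working directory
out_dir = Path(tempfile.mkdtemp())
png_path = out_dir / "example_hist.png"
pdf_path = out_dir / "example_hist.pdf"

# Raster output: dpi controls the pixel resolution
fig.savefig(png_path, dpi=150, bbox_inches="tight")
print(f"Saved {png_path.name}")

# Vector output: the .pdf extension selects the PDF writer
fig.savefig(pdf_path, bbox_inches="tight")
print(f"Saved {pdf_path.name}")
```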

In [None]:
# Your code here





plt.show()

---

# Part 3: Seaborn Statistical Plots (25 points)

### Task 3.1: Box Plot (8 points)

Create a seaborn box plot showing GPA distribution by Class Year:
- Order the x-axis logically: Freshman, Sophomore, Junior, Senior, Graduate
- Add horizontal reference lines at GPA = 2.0 (red) and GPA = 3.5 (green)
- Add a title: "GPA Distribution by Class Year"
- Label axes appropriately
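
**Hint (illustrative sketch, not the solution):** Seaborn's `order` parameter controls the category order on the axis, and because seaborn draws onto a regular matplotlib `Axes`, `ax.axhline()` can add reference lines afterward. The `Size` and `Rating` columns and both reference values are hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(7)
sizes = ["Small", "Medium", "Large"]  # desired left-to-right order
toy = pd.DataFrame({
    "Size": rng.choice(sizes, size=150),
    "Rating": rng.uniform(1, 5, size=150),
})

fig, ax = plt.subplots(figsize=(8, 5))
sns.boxplot(data=toy, x="Size", y="Rating", order=sizes, ax=ax)

# Horizontal reference lines drawn on the same axes
ax.axhline(2.0, color="red", linestyle="--", label="Minimum (2.0)")
ax.axhline(4.0, color="green", linestyle="--", label="Target (4.0)")

ax.set_title("Rating by Size (Toy Data)")
ax.set_xlabel("Size")
ax.set_ylabel("Rating")
ax.legend()
```

Without `order`, the categories appear in the order seaborn encounters them in the data, which is rarely the logical one.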

In [None]:
# Your code here





plt.show()

### Task 3.2: Scatter Plot with Hue (8 points)

Create a seaborn scatter plot:
- x-axis: Credits_Earned
- y-axis: GPA
- Color points by College using the `hue` parameter
- Use alpha=0.6 for transparency
- Add title: "GPA vs Credits Earned by College"
- Move the legend outside the plot if it overlaps with data
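
**Hint (illustrative sketch, not the solution):** One common way to move a legend outside the plot is `bbox_to_anchor`, which positions the legend relative to the axes (values above 1 on x fall to the right of the plot area). The `Hours`, `Score`, and `Group` columns are hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(3)
toy = pd.DataFrame({
    "Hours": rng.uniform(0, 40, size=120),
    "Score": rng.uniform(50, 100, size=120),
    "Group": rng.choice(["Red", "Blue", "Gold"], size=120),
})

fig, ax = plt.subplots(figsize=(9, 6))
sns.scatterplot(data=toy, x="Hours", y="Score", hue="Group", alpha=0.6, ax=ax)

# Anchor the legend's upper-left corner just outside the right edge of the axes
ax.legend(title="Group", bbox_to_anchor=(1.02, 1), loc="upper left")

ax.set_title("Score vs Hours by Group (Toy Data)")
fig.tight_layout()  # shrink the plot area so the outside legend is not clipped
```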

In [None]:
# Your code here





plt.show()

### Task 3.3: Violin Plot (9 points)

Create a seaborn violin plot comparing Completion Rate across Academic Standing:
- x-axis: Academic_Standing
- y-axis: Completion_Rate
- Order: Dean's List, Good Standing, Probation
- Add a title: "Completion Rate by Academic Standing"
- Add axis labels

**Interpretation:** Write 2-3 sentences below the plot explaining what the visualization reveals about the relationship between academic standing and completion rate.
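
**Hint (illustrative sketch, not the solution):** A violin plot takes the same `order` parameter as a box plot, and a quick `groupby` summary gives numbers to cite alongside the shapes when writing an interpretation. The `Tier` and `Usage` columns are hypothetical, and the data is random, so no real pattern should be read into it.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(11)
tiers = ["Gold", "Silver", "Bronze"]  # desired left-to-right order
toy = pd.DataFrame({
    "Tier": rng.choice(tiers, size=180),
    "Usage": rng.uniform(0, 100, size=180),
})

fig, ax = plt.subplots(figsize=(8, 5))
sns.violinplot(data=toy, x="Tier", y="Usage", order=tiers, ax=ax)
ax.set_title("Usage by Tier (Toy Data)")
ax.set_xlabel("Tier")
ax.set_ylabel("Usage (%)")

# Per-group medians, reordered to match the plot, support the written interpretation
medians = toy.groupby("Tier")["Usage"].median().reindex(tiers)
print(medians.round(1))
```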

In [None]:
# Your code here





plt.show()

**Your interpretation:**

*Write 2-3 sentences here*



---

# Part 4: Multi-Panel Figures (25 points)

### Task 4.1: Side-by-Side Comparison (10 points)

Create a figure with 1 row and 2 columns comparing:
- Left: Histogram of GPA
- Right: Histogram of Completion_Rate

Requirements:
- Use `fig, axes = plt.subplots(1, 2, figsize=(14, 5))`
- Each subplot should have its own title
- Add appropriate axis labels to each
- Use `plt.tight_layout()`
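
**Hint (illustrative sketch, not the solution):** With one row of subplots, `axes` is a one-dimensional array, so each panel is reached by a single index and customized through its own `set_*` methods. Both variables here are synthetic.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
heights = rng.normal(170, 8, size=400)      # synthetic heights in cm
percentages = rng.uniform(60, 100, size=400)  # synthetic percentages

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left panel: axes[0]
axes[0].hist(heights, bins=20, edgecolor="black")
axes[0].set_title("Heights")
axes[0].set_xlabel("Height (cm)")
axes[0].set_ylabel("Count")

# Right panel: axes[1]
axes[1].hist(percentages, bins=20, edgecolor="black", color="orange")
axes[1].set_title("Percentages")
axes[1].set_xlabel("Percent")
axes[1].set_ylabel("Count")

fig.tight_layout()
```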

In [None]:
# Your code here





plt.show()

### Task 4.2: Comprehensive Dashboard (15 points)

Create a 2x2 dashboard figure with the following plots:

1. **Top-left:** Box plot of GPA by College (use seaborn)
2. **Top-right:** Bar chart of student count by Class Year (in logical order)
3. **Bottom-left:** Scatter plot of Credits_Attempted vs GPA with color by Academic_Standing
4. **Bottom-right:** Histogram of Completion_Rate

Requirements:
- Use `figsize=(16, 12)`
- Each subplot must have a descriptive title
- Rotate x-axis labels where needed
- Use `plt.tight_layout()`
- Add a main title for the entire figure using `plt.suptitle()`
- Save the dashboard as 'student_dashboard.png' with dpi=150
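
**Hint (illustrative sketch, not the solution):** In a 2x2 grid, `axes` is two-dimensional and indexed as `axes[row, col]`. Seaborn functions accept an `ax=` argument to target a specific panel, and `reindex()` is one way to put counts into a chosen order. All column names and values here are hypothetical; the `rect` argument is an assumption of this sketch, used to reserve space at the top for the main title.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(9)
regions = ["North", "South", "East", "West"]  # desired display order
toy = pd.DataFrame({
    "Region": rng.choice(regions, size=200),
    "Units": rng.integers(1, 50, size=200),
    "Revenue": rng.uniform(100, 1000, size=200),
})

fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Top-left: seaborn plot routed to a panel with ax=
sns.boxplot(data=toy, x="Region", y="Revenue", order=regions, ax=axes[0, 0])
axes[0, 0].set_title("Revenue by Region")

# Top-right: counts reordered to the chosen category order
region_counts = toy["Region"].value_counts().reindex(regions)
axes[0, 1].bar(region_counts.index, region_counts.values)
axes[0, 1].set_title("Records per Region")
axes[0, 1].tick_params(axis="x", labelrotation=45)

# Bottom-left: scatter colored by category
sns.scatterplot(data=toy, x="Units", y="Revenue", hue="Region", ax=axes[1, 0])
axes[1, 0].set_title("Revenue vs Units")

# Bottom-right: plain matplotlib histogram
axes[1, 1].hist(toy["Revenue"], bins=15, edgecolor="black")
axes[1, 1].set_title("Revenue Distribution")

fig.suptitle("Toy Sales Dashboard", fontsize=16)
fig.tight_layout(rect=[0, 0, 1, 0.96])  # keep the top 4% free for the main title
```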

In [None]:
# Your code here








plt.show()

---

# Bonus Challenge (Optional, +5 points)

Create a heatmap showing the correlation between numerical variables (GPA, Credits_Attempted, Credits_Earned, Completion_Rate):
- Calculate the correlation matrix using `df[columns].corr()`
- Use `sns.heatmap()` with annotations showing the correlation values
- Use an appropriate colormap (e.g., 'coolwarm' or 'RdYlGn')
- Add a title

**Interpretation:** Which variables are most strongly correlated? Does this make sense?
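
**Hint (illustrative sketch, not the solution):** On synthetic data where one variable is built from another, the heatmap should show a strong positive correlation for that pair and a value near zero for an unrelated one. The columns `x`, `y`, and `z` are made up; fixing `vmin=-1` and `vmax=1` is a choice of this sketch so the color scale is centered on zero.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(21)
toy = pd.DataFrame({"x": rng.normal(size=200)})
toy["y"] = 2 * toy["x"] + rng.normal(scale=0.5, size=200)  # built from x, so strongly related
toy["z"] = rng.normal(size=200)                            # generated independently of x and y

columns = ["x", "y", "z"]
corr = toy[columns].corr()  # pairwise Pearson correlations

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation Matrix (Toy Data)")
```

A variable derived from others by a formula will usually correlate with its inputs; that is worth keeping in mind when judging whether a strong correlation "makes sense."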

In [None]:
# Bonus code here (optional)






**Bonus interpretation:**

*Your answer here (optional)*



---

# Reflection

Answer the following questions:

**1. When would you choose matplotlib over seaborn, and vice versa?**

*Your answer:*

**2. What makes a visualization "publication-quality"? What elements did you include to achieve this?**

*Your answer:*

**3. Which visualization from this assignment do you think communicates its message most effectively? Why?**

*Your answer:*

**4. Did you use AI assistance? If so, for which parts? What did you learn?**

*Your answer:*

---

## Submission

Submit the following files via CourseConnect:
1. This completed notebook (.ipynb)
2. gpa_distribution.png
3. gpa_distribution.pdf
4. student_dashboard.png