# BUAN 446 Midterm Project
## Student Success Data Analysis

---

**Name:** _____________________

**Date:** _____________________

---

### Instructions

This project assesses your ability to apply Python fundamentals to a realistic data analysis task. You will work with the Lehigh student dataset to answer questions about student success patterns.

**Time Estimate:** 1-2 hours

**Rules:**
- You may use your notes, the textbook, and Python documentation
- You may **NOT** use AI tools (ChatGPT, Claude, Copilot, etc.)
- You may **NOT** collaborate with other students
- Submit your completed notebook via CourseConnect by the deadline

**Grading:**
- Part 1: Data Loading (15 points)
- Part 2: Basic Analysis Functions (25 points)
- Part 3: Student Classification (20 points)
- Part 4: College Comparison (20 points)
- Part 5: Data Quality Report (20 points)
- **Total: 100 points**

**Tips:**
- Read each task completely before coding
- Test your functions with the provided examples
- Partial credit is awarded for reasonable attempts
- If stuck, move on and come back later

---

## Dataset Overview

You will work with `lehigh_students_clean.csv`, which contains 600 student records with the following columns:

| Column | Type | Description |
|--------|------|-------------|
| Student_ID | String | Unique identifier (e.g., "LU100001") |
| College | String | One of 5 Lehigh colleges |
| Major | String | Student's declared major |
| Class_Year | String | First Year, Sophomore, Junior, Senior, or Graduate |
| GPA | Float | Grade point average (0.0 - 4.0) |
| Credits_Attempted | Integer | Total credits attempted |
| Credits_Earned | Integer | Total credits earned |

The five colleges are:
- College of Business
- P.C. Rossin College of Engineering
- College of Arts and Sciences
- College of Health
- College of Education

---

# Part 1: Data Loading (15 points)

Load the student data from the CSV file into a list of dictionaries with proper type conversion.

### Task 1.1: Load the Data (10 points)

Write code to:
1. Open `lehigh_students_clean.csv` using the `csv` module
2. Read each row into a dictionary with these keys: `id`, `college`, `major`, `class_year`, `gpa`, `credits_attempted`, `credits_earned`
3. Convert `gpa` to float, and `credits_attempted` and `credits_earned` to integers
4. Store all student dictionaries in a list called `students`

**After your code runs, `students` should be a list of 600 dictionaries.**
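
As a reminder of the general pattern, the sketch below reads a small, made-up inventory table with `csv.DictReader` and converts the numeric fields as it goes. The column names, the `items` list, and the in-memory `io.StringIO` source are illustrative assumptions and are not part of the project dataset; your own code must open the real file and use the keys listed above.

```python
import csv
import io

# Hypothetical in-memory CSV standing in for a file on disk.
raw = io.StringIO("Item,Price,Quantity\nPencil,0.25,40\nNotebook,2.50,12\n")

items = []
for row in csv.DictReader(raw):
    # DictReader yields every value as a string, so convert explicitly.
    items.append({
        "item": row["Item"],
        "price": float(row["Price"]),
        "quantity": int(row["Quantity"]),
    })

print(len(items))                # 2
print(type(items[0]["price"]))   # <class 'float'>
```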

```python
import csv

# Your code here
students = []



```

### Task 1.2: Verify Your Data (5 points)

Write code to verify your data loaded correctly by printing:
1. The total number of students loaded
2. The first student's complete record
3. The data type of the first student's GPA (should be `<class 'float'>`)

```python
# Your verification code here



```

---

# Part 2: Basic Analysis Functions (25 points)

Write functions to calculate basic statistics about the student population.

### Task 2.1: Calculate Average GPA (10 points)

Write a function called `calculate_average_gpa` that:
- Takes a list of student dictionaries as input
- Returns the average GPA as a float
- Handles the case where the list is empty (return 0.0)

**Example:**
```python
avg = calculate_average_gpa(students)
print(f"Average GPA: {avg:.2f}")  # Should print something like "Average GPA: 2.95"
```

```python
def calculate_average_gpa(student_list):
    """
    Calculate the average GPA of all students in the list.

    Parameters:
        student_list: List of student dictionaries

    Returns:
        float: The average GPA, or 0.0 if list is empty
    """
    # Your code here
    pass


# Test your function
avg = calculate_average_gpa(students)
print(f"Average GPA: {avg:.2f}")
```

### Task 2.2: Calculate Credit Completion Rate (10 points)

Write a function called `calculate_completion_rate` that:
- Takes a single student dictionary as input
- Returns the percentage of credits earned out of credits attempted
- Handles the case where `credits_attempted` is 0 (return 0.0 to avoid division by zero)

**Formula:** `completion_rate = (credits_earned / credits_attempted) * 100`

**Example:**
```python
student = {"credits_attempted": 100, "credits_earned": 95}  # other keys omitted
rate = calculate_completion_rate(student)
print(f"Completion rate: {rate:.1f}%")  # Should print "Completion rate: 95.0%"
```

```python
def calculate_completion_rate(student):
    """
    Calculate the credit completion rate for a student.

    Parameters:
        student: A student dictionary with 'credits_attempted' and 'credits_earned'

    Returns:
        float: The completion rate as a percentage (0-100)
    """
    # Your code here
    pass


# Test your function with the first student
rate = calculate_completion_rate(students[0])
print(f"First student completion rate: {rate:.1f}%")
```

### Task 2.3: Count Students by College (5 points)

Write a function called `count_by_college` that:
- Takes a list of student dictionaries as input
- Returns a dictionary where keys are college names and values are counts

**Example output:**
```python
{
    'College of Business': 125,
    'P.C. Rossin College of Engineering': 118,
    ...
}
```
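
For reference, one common way to build a dictionary of counts is sketched below on unrelated, made-up data. The `responses` list and the `tally` name are hypothetical; this only illustrates the `dict.get` idiom, and other approaches (such as checking `if key in dictionary`) work as well.

```python
# Hypothetical data: one color label per survey response.
responses = ["red", "blue", "red", "green", "blue", "red"]

tally = {}
for color in responses:
    # .get returns 0 the first time a color is seen, so no KeyError occurs.
    tally[color] = tally.get(color, 0) + 1

print(tally)  # {'red': 3, 'blue': 2, 'green': 1}
```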

```python
def count_by_college(student_list):
    """
    Count the number of students in each college.

    Parameters:
        student_list: List of student dictionaries

    Returns:
        dict: College names as keys, counts as values
    """
    # Your code here
    pass


# Test your function
college_counts = count_by_college(students)
for college, count in college_counts.items():
    print(f"{college}: {count} students")
```

---

# Part 3: Student Classification (20 points)

Write functions to classify students into academic standing categories.

### Task 3.1: Determine Academic Standing (10 points)

Write a function called `get_academic_standing` that:
- Takes a GPA (float) as input
- Returns a string based on these rules:
  - GPA >= 3.5: "Dean's List"
  - GPA >= 3.0: "Good Standing"
  - GPA >= 2.0: "Satisfactory"
  - GPA < 2.0: "Academic Warning"

**Example:**
```python
get_academic_standing(3.7)  # Returns "Dean's List"
get_academic_standing(2.5)  # Returns "Satisfactory"
get_academic_standing(1.8)  # Returns "Academic Warning"
```
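
Threshold rules like these are usually written as an `if`/`elif` chain ordered from the highest cutoff down. The sketch below shows that ordering on an unrelated, hypothetical example (`describe_temperature` and its cutoffs are made up for illustration).

```python
def describe_temperature(celsius):
    # Conditions are checked top to bottom and the first true branch wins,
    # so each elif only needs the lower bound of its range.
    if celsius >= 30:
        return "Hot"
    elif celsius >= 20:
        return "Warm"
    elif celsius >= 10:
        return "Cool"
    else:
        return "Cold"


print(describe_temperature(30))    # Hot
print(describe_temperature(19.9))  # Cool
```

Note that a value exactly on a cutoff (30 here) falls into the higher category because the comparison is `>=`.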

```python
def get_academic_standing(gpa):
    """
    Determine academic standing based on GPA.

    Parameters:
        gpa: A float representing the student's GPA

    Returns:
        str: The academic standing category
    """
    # Your code here
    pass


# Test your function
test_gpas = [3.7, 3.2, 2.5, 1.8]
for gpa in test_gpas:
    print(f"GPA {gpa}: {get_academic_standing(gpa)}")
```

### Task 3.2: Count Students by Standing (10 points)

Using your `get_academic_standing` function, write code to:
1. Count how many students are in each academic standing category
2. Store the results in a dictionary called `standing_counts`
3. Print the results in a readable format

**Expected output format:**
```
Academic Standing Distribution:
Dean's List: XX students
Good Standing: XX students
Satisfactory: XX students
Academic Warning: XX students
```

```python
# Your code here
standing_counts = {}



# Print results
print("Academic Standing Distribution:")

```

---

# Part 4: College Comparison (20 points)

Analyze and compare performance across colleges.

### Task 4.1: Filter Students by College (8 points)

Write a function called `get_students_by_college` that:
- Takes a list of student dictionaries and a college name as input
- Returns a new list containing only students from that college

**Example:**
```python
business_students = get_students_by_college(students, "College of Business")
print(len(business_students))  # Should print the count of Business students
```

```python
def get_students_by_college(student_list, college_name):
    """
    Filter students to only those in a specific college.

    Parameters:
        student_list: List of student dictionaries
        college_name: String name of the college to filter by

    Returns:
        list: Students belonging to the specified college
    """
    # Your code here
    pass


# Test your function
business_students = get_students_by_college(students, "College of Business")
print(f"Business students: {len(business_students)}")
```

### Task 4.2: Compare College GPAs (12 points)

Using your `get_students_by_college` and `calculate_average_gpa` functions, write code to:
1. Calculate the average GPA for each of the 5 colleges
2. Store the results in a dictionary called `college_gpas`
3. Find and print which college has the highest average GPA
4. Find and print which college has the lowest average GPA

**Hint:** You'll need to loop through the college names and use your existing functions.
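
Once a dictionary of averages exists, one possible way to find the keys with the largest and smallest values is to pass `key=` to the built-in `max` and `min`. The sketch below uses a made-up `team_scores` dictionary (a hypothetical name with hypothetical values); a plain loop that tracks the best value seen so far works just as well.

```python
# Hypothetical averages keyed by team name.
team_scores = {"Alpha": 81.5, "Beta": 77.0, "Gamma": 90.25}

# key=team_scores.get makes max/min compare the values while returning the keys.
best = max(team_scores, key=team_scores.get)
worst = min(team_scores, key=team_scores.get)

print(f"Highest: {best} ({team_scores[best]:.2f})")   # Highest: Gamma (90.25)
print(f"Lowest: {worst} ({team_scores[worst]:.2f})")  # Lowest: Beta (77.00)
```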

```python
# List of all colleges
colleges = [
    "College of Business",
    "P.C. Rossin College of Engineering",
    "College of Arts and Sciences",
    "College of Health",
    "College of Education"
]

# Your code here
college_gpas = {}



# Print all college GPAs
print("Average GPA by College:")
print("-" * 50)



# Find and print highest and lowest
print("\n")

```

---

# Part 5: Data Quality Report (20 points)

Create a summary report with exception handling for potential data issues.

### Task 5.1: Safe GPA Lookup (8 points)

Write a function called `safe_get_student_gpa` that:
- Takes the list of students and a student ID as input
- Returns the GPA if the student is found
- Returns `None` and prints an error message if the student is not found
- Uses try/except to handle potential errors

**Example:**
```python
gpa = safe_get_student_gpa(students, "LU100001")  # Returns the GPA
gpa = safe_get_student_gpa(students, "INVALID")   # Prints error, returns None
```
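
The general shape of a "return the value or report an error" function is sketched below on unrelated, made-up data. The `prices` dictionary and the `safe_price` function are hypothetical, and a `KeyError` from a dictionary lookup is only one kind of error that `try`/`except` can catch; which exception your own function needs to handle depends on how you search the list.

```python
# Hypothetical lookup table of product codes to prices.
prices = {"A100": 4.99, "B200": 12.50}


def safe_price(code):
    """Return the price for code, or None (with a message) if it is unknown."""
    try:
        return prices[code]
    except KeyError:
        print(f"Error: no product with code {code!r}")
        return None


print(safe_price("A100"))  # 4.99
print(safe_price("Z999"))  # prints the error message, then None
```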

```python
def safe_get_student_gpa(student_list, student_id):
    """
    Safely retrieve a student's GPA by ID.

    Parameters:
        student_list: List of student dictionaries
        student_id: The ID to search for

    Returns:
        float: The student's GPA if found, None otherwise
    """
    # Your code here
    pass


# Test your function
print("Testing with valid ID:")
gpa = safe_get_student_gpa(students, "LU100001")
print(f"GPA: {gpa}")

print("\nTesting with invalid ID:")
gpa = safe_get_student_gpa(students, "INVALID123")
print(f"GPA: {gpa}")
```

### Task 5.2: Generate Summary Report (12 points)

Write a function called `generate_report` that:
- Takes the list of students as input
- Returns a dictionary containing summary statistics
- Uses try/except to handle any calculation errors

The report dictionary should contain:
- `total_students`: Total count of students
- `average_gpa`: Overall average GPA (rounded to 2 decimal places)
- `deans_list_count`: Number of students with GPA >= 3.5
- `warning_count`: Number of students with GPA < 2.0
- `unique_majors`: Number of different majors in the dataset
- `avg_completion_rate`: Average credit completion rate across all students (rounded to 1 decimal place)

**Hint:** You can reuse your earlier functions!

```python
def generate_report(student_list):
    """
    Generate a summary report of student data.

    Parameters:
        student_list: List of student dictionaries

    Returns:
        dict: Summary statistics
    """
    report = {}

    try:
        # Your code here
        pass

    except Exception as e:
        print(f"Error generating report: {e}")
        return None

    return report


# Generate and display the report
report = generate_report(students)

if report:
    print("=" * 50)
    print("LEHIGH STUDENT SUCCESS REPORT")
    print("=" * 50)
    print(f"Total Students: {report.get('total_students', 'N/A')}")
    print(f"Average GPA: {report.get('average_gpa', 'N/A')}")
    print(f"Dean's List Students: {report.get('deans_list_count', 'N/A')}")
    print(f"Students on Academic Warning: {report.get('warning_count', 'N/A')}")
    print(f"Unique Majors: {report.get('unique_majors', 'N/A')}")
    print(f"Average Credit Completion Rate: {report.get('avg_completion_rate', 'N/A')}%")
    print("=" * 50)
```

---

# Submission Checklist

Before submitting, verify that:

- [ ] All code cells run without errors
- [ ] Part 1: Data loads correctly (600 students)
- [ ] Part 2: All three functions work and produce output
- [ ] Part 3: Academic standing function works; distribution is printed
- [ ] Part 4: College comparison shows all 5 colleges with highest/lowest identified
- [ ] Part 5: Safe lookup handles invalid IDs; report generates all 6 statistics
- [ ] Your name is at the top of the notebook

**Save your notebook and submit the .ipynb file via CourseConnect.**

---

## Honor Code Statement

By submitting this project, I affirm that:
- I completed this work independently without AI assistance
- I did not collaborate with other students
- All code is my own work

**Electronic Signature (type your name):** _____________________