README for This Visual

(After working with you one-on-one on a secondary data clean for the total-counts-by-year visual, I had a much better grasp of how this works. I wanted to do the same for my other two visuals, both for practice and to provide better working code for data storage on GitHub.)
(This also includes a new visual that matches my other one but is done correctly.)

Description: 
This script visualizes the distribution of housing maintenance code violations by violation class (A, B, C, I) for the years 2019, 2021, and 2024. Each violation is categorized by severity, and the resulting bar chart compares how these categories are distributed across the selected years.

These three CSV files must be unzipped into the `data/` folder before use:
`Housing_Maintenance_Code_Violations_2019.csv`
`Housing_Maintenance_Code_Violations_2021.csv`
`Housing_Maintenance_Code_Violations_2024.csv`
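If the raw files arrive as `.zip` archives, the sketch below shows one way to extract them with Python's standard-library `zipfile` module. It assumes the archives sit in `data/` (the folder the notebook reads from) and that each archive holds its CSV at the top level; the helper name `extract_all_zips` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_all_zips(folder="data"):
    """Extract every .zip archive in `folder` into that same folder (hypothetical helper).

    Returns the list of member names that were extracted.
    """
    folder = Path(folder)
    extracted = []
    # Collect the archives first so newly extracted files are not re-scanned
    for archive in sorted(folder.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(folder)  # writes the archive's members next to the zip
            extracted.extend(zf.namelist())
    return extracted
```

Calling `extract_all_zips("data")` once before running the notebook should leave the three yearly CSVs in `data/` under the names listed above, assuming the archives contain files with those names.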

Files Used:
`Housing_Maintenance_Code_Violations_2019.csv`
`Housing_Maintenance_Code_Violations_2021.csv`
`Housing_Maintenance_Code_Violations_2024.csv`
`final_violation_classes.csv`: Cleaned file with only two columns: Class and Year

Data Cleaning & Processing Steps:
Load original datasets for 2019, 2021, and 2024
Keep only the Class column and add a Year column to each
Combine all three years into a single DataFrame
Save a reduced CSV `final_violation_classes.csv` with only Class and Year
Group by Class and Year to count violations per class per year
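The first four steps can also be written as a single loop over the years. This is a sketch, not a replacement for the cells below: the helper name `build_class_table` is hypothetical, and passing `usecols=["Class"]` to `pd.read_csv` is a swapped-in choice that reads only the needed column instead of loading each full file and then dropping the rest, which should reduce memory use on large files.

```python
import pandas as pd


def build_class_table(years=(2019, 2021, 2024), folder="data"):
    """Return a two-column DataFrame (Class, Year) built from the yearly CSVs."""
    frames = []
    for year in years:
        path = f"{folder}/Housing_Maintenance_Code_Violations_{year}.csv"
        # Read only the Class column rather than the whole file
        df = pd.read_csv(path, usecols=["Class"])
        df["Year"] = year  # label every row with its source year
        frames.append(df)
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)
```

For example, `build_class_table().to_csv("data/final_violation_classes.csv", index=False)` would write the same reduced file, and `.groupby(["Class", "Year"]).size()` on the returned frame covers the counting step.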

Visualization:
Bar chart grouped by violation class (A, B, C, I)
Each class shows three bars (for 2019, 2021, 2024)
X-axis: Violation Class  
Y-axis: Number of Violations  
Colors: Three distinct purple hues, one per year (`#9B7EDC` = 2019, `#580F41` = 2021, `#C4B7D3` = 2024)  
Chart filename: `violation_classes_by_year.png`
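In the plotting cell, each color is matched to a year only by position, because `unstack` sorts the Year columns as 2019, 2021, 2024. A minimal sketch that pins each color to its year explicitly, so the colors would not shift if a year were missing or reordered, is below. It reproduces the original color order; the names `YEAR_COLORS` and `plot_class_counts` are hypothetical, and it uses the `rot` argument of pandas' `DataFrame.plot` in place of `plt.xticks(rotation=0)`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Colors from this README, keyed by year (same order as the original plot)
YEAR_COLORS = {2019: "#9B7EDC", 2021: "#580F41", 2024: "#C4B7D3"}


def plot_class_counts(df):
    """Draw grouped bars of violation counts per Class, one bar per Year."""
    counts = df.groupby(["Class", "Year"]).size().unstack(fill_value=0)
    # Look up each column's color by its year instead of relying on column order
    colors = [YEAR_COLORS[year] for year in counts.columns]
    ax = counts.plot(kind="bar", figsize=(8, 6), color=colors, rot=0)
    ax.set_title("Violation Class Distribution by Year")
    ax.set_xlabel("Violation Class")
    ax.set_ylabel("Number of Violations")
    ax.legend(title="Year")
    return ax, counts
```

Usage would be `ax, counts = plot_class_counts(pd.read_csv("data/final_violation_classes.csv"))`, followed by `plt.savefig("violation_classes_by_year.png", bbox_inches="tight")`.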

In [None]:
import pandas as pd

# Step 1: Load full files
df19 = pd.read_csv("data/Housing_Maintenance_Code_Violations_2019.csv")
df21 = pd.read_csv("data/Housing_Maintenance_Code_Violations_2021.csv")
df24 = pd.read_csv("data/Housing_Maintenance_Code_Violations_2024.csv")

# Step 2: Keep only the 'Class' column, and add a Year label
df19 = df19[["Class"]].copy()
df19["Year"] = 2019

df21 = df21[["Class"]].copy()
df21["Year"] = 2021

df24 = df24[["Class"]].copy()
df24["Year"] = 2024

# Step 3: Combine all three years into one small DataFrame
df_combined = pd.concat([df19, df21, df24], ignore_index=True)  # fresh 0..n-1 index

# Step 4: Save to cleaned CSV
df_combined.to_csv("data/final_violation_classes.csv", index=False)
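Before plotting, it may help to confirm the saved file looks as expected. The sketch below assumes the cell above has already written `data/final_violation_classes.csv`; the function name `check_cleaned_file` is hypothetical. It raises an error if the columns or years are wrong and otherwise returns the number of rows per year, which should match the row count of each original yearly file.

```python
import pandas as pd


def check_cleaned_file(path="data/final_violation_classes.csv",
                       expected_years=(2019, 2021, 2024)):
    """Reload the cleaned CSV, verify its columns and years, and return rows per year."""
    df = pd.read_csv(path)
    if list(df.columns) != ["Class", "Year"]:
        raise ValueError(f"unexpected columns: {list(df.columns)}")
    found = sorted(df["Year"].unique().tolist())
    if found != sorted(expected_years):
        raise ValueError(f"unexpected years: {found}")
    # One count per year, sorted by year
    return df["Year"].value_counts().sort_index()
```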

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Step 1: Load the cleaned data
df = pd.read_csv("data/final_violation_classes.csv")

# Step 2: Group by class and year, then count
grouped = df.groupby(["Class", "Year"]).size().unstack(fill_value=0)

# Step 3: Plot
grouped.plot(kind="bar", figsize=(8,6), color=['#9B7EDC', '#580F41', '#C4B7D3'])
plt.title("Violation Class Distribution by Year")
plt.xlabel("Violation Class")
plt.ylabel("Number of Violations")
plt.xticks(rotation=0)
plt.legend(title="Year")
plt.tight_layout()
plt.savefig("violation_classes_by_year.png", bbox_inches='tight')
plt.show()
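To see the exact numbers behind each bar, the same Class-by-Year table can be built with `pd.crosstab`, which should give the same counts as `groupby(["Class", "Year"]).size().unstack(fill_value=0)` for this data. This is a sketch; the helper name `class_year_table` is hypothetical, and `margins=True` is an optional addition that appends an `All` row and column with totals.

```python
import pandas as pd


def class_year_table(df, totals=False):
    """Count violations per Class (rows) and Year (columns); optionally add totals."""
    # margins=True appends an "All" row and column holding the totals
    return pd.crosstab(df["Class"], df["Year"], margins=totals)
```

For example, `print(class_year_table(pd.read_csv("data/final_violation_classes.csv"), totals=True))` would print the per-class counts for each year along with overall totals.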