README for This Visual

Description 
This script visualizes the number of housing maintenance code violations by borough for the years 2019, 2021, and 2024. The data shows how violation counts vary geographically across NYC’s five boroughs and highlights changes over time.

Unzip these three CSV files into the `data/` folder before running the code:
`Housing_Maintenance_Code_Violations_2019.csv`
`Housing_Maintenance_Code_Violations_2021.csv`
`Housing_Maintenance_Code_Violations_2024.csv`

Files Used:
`Housing_Maintenance_Code_Violations_2019.csv`
`Housing_Maintenance_Code_Violations_2021.csv`
`Housing_Maintenance_Code_Violations_2024.csv`
`final_borough_counts.csv`: Cleaned file with one row per violation and only two columns, Borough and Year

Data Steps:
Load original datasets for 2019, 2021, and 2024
Keep only the Borough column and add a Year column to each
Combine all three years into one DataFrame
Save as `final_borough_counts.csv`
Group by Year and Borough to count violations per borough per year
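
As a small illustration of the grouping step, the sketch below counts rows per year and borough on a few made-up rows. The `toy` DataFrame and its values are hypothetical stand-ins, not taken from the real datasets.

```python
import pandas as pd

# Hypothetical toy rows standing in for the combined file (not real data).
toy = pd.DataFrame({
    "Borough": ["BRONX", "BRONX", "QUEENS", "BRONX", "QUEENS", "QUEENS"],
    "Year": [2019, 2019, 2019, 2021, 2021, 2021],
})

# Each row is one violation, so counting rows per (Year, Borough) gives totals.
# unstack() turns the Borough level into columns; fill_value=0 covers any
# year/borough pair that has no rows.
counts = toy.groupby(["Year", "Borough"]).size().unstack(fill_value=0)
print(counts)
```

The result has one row per year and one column per borough, which is the shape the plotting code works from.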

Visualization
Grouped bar chart comparing violation totals for each borough
Each borough has three bars, one each for 2019, 2021, and 2024
X-axis: Borough  
Y-axis: Violations  
Colors:  
'#9B7EDC' (lavender purple – 2019)  
'#580F41' (deep plum – 2021)  
'#C4B7D3' (soft lavender – 2024)  
Chart filename: `violations_by_borough_comparison.png`
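
The plotting code passes the colors as a plain list, so each color is tied to a year only by position. One possible way to make the pairing explicit is sketched below. `YEAR_COLORS` and `colors_for` are hypothetical names introduced for this illustration and are not part of the original script.

```python
# Colors from the list above, keyed by the year they represent.
YEAR_COLORS = {2019: "#9B7EDC", 2021: "#580F41", 2024: "#C4B7D3"}


def colors_for(years):
    """Return the bar colors in the same order as the given years."""
    return [YEAR_COLORS[int(y)] for y in years]


print(colors_for([2019, 2021, 2024]))
```

With the grouped table from the plotting cell, this could be used as `color=colors_for(grouped.T.columns)`, assuming the columns of `grouped.T` are the three years.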


In [None]:
import pandas as pd

# Step 1: Load original CSVs (only the 'Borough' column is needed, so skip the rest)
violation19 = pd.read_csv("data/Housing_Maintenance_Code_Violations_2019.csv", usecols=["Borough"])
violation21 = pd.read_csv("data/Housing_Maintenance_Code_Violations_2021.csv", usecols=["Borough"])
violation24 = pd.read_csv("data/Housing_Maintenance_Code_Violations_2024.csv", usecols=["Borough"])

# Step 2: Keep only 'Borough' and add 'Year'
df19 = violation19[["Borough"]].copy()
df19["Year"] = 2019

df21 = violation21[["Borough"]].copy()
df21["Year"] = 2021

df24 = violation24[["Borough"]].copy()
df24["Year"] = 2024

# Step 3: Combine and save to a cleaned CSV
borough_combined = pd.concat([df19, df21, df24], ignore_index=True)
borough_combined.to_csv("data/final_borough_counts.csv", index=False)

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Step 1: Load the cleaned CSV
df = pd.read_csv("data/final_borough_counts.csv")

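# Step 2: Count rows per Year and Borough (rows = years, columns = boroughs);
# fill_value=0 avoids NaN if a borough has no rows in a given year
grouped = df.groupby(["Year", "Borough"]).size().unstack(fill_value=0)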

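# Step 3: Plot (transpose so boroughs sit on the x-axis with one bar per year;
# colors are applied in year order: 2019, 2021, 2024)
grouped.T.plot(kind='bar', figsize=(8,6), color=['#9B7EDC', '#580F41', '#C4B7D3'])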
plt.title("Housing Violations by Borough: 2019 vs 2021 vs 2024")
plt.xlabel("Borough")
plt.ylabel("Violations")
plt.xticks(rotation=45)
plt.legend(title="Year")
plt.tight_layout()
plt.savefig("violations_by_borough_comparison.png", bbox_inches='tight')
plt.show()