# Step 1: Data Retrieval

Download up-to-date crime data in CSV format from the [UVA Open Data Portal](https://data-uvalibrary.opendata.arcgis.com/datasets/charlottesville::crime-data/about). If you don't have access to the portal, you may use the pre-downloaded dataset `Crime_Data.csv` provided in the `DATA` folder of this repository. If you download updated data, be sure to save it as `Crime_Data.csv` in the `DATA` folder, because the loading cell below reads that exact file name.

In [None]:
# Load necessary libraries
from datetime import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
import pandas as pd
from meteostat import Point, Daily
from pathlib import Path
from scipy import stats
import numpy as np

In [None]:
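# Load crime data
# __file__ is not defined inside a notebook, so resolve paths from the working
# directory (assumed here to be the folder that contains this notebook)
parent_path = Path.cwd().parent
crime_df = pd.read_csv(parent_path / "DATA" / "Crime_Data.csv")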

In [None]:
# Retrieve weather data

# Set time period based on crime data
oldest_date = datetime.strptime(crime_df['DateReported'].min(), '%Y/%m/%d %H:%M:%S+00')
newest_date = datetime.strptime(crime_df['DateReported'].max(), '%Y/%m/%d %H:%M:%S+00')

# Get daily data for Charlottesville, VA
# TODO: Research Charlottesville's coordinates and update values below
latitude = None
longitude = None

charlottesville = Point(latitude, longitude)
data = Daily(charlottesville, oldest_date, newest_date)
weather_data = data.fetch()
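
The date-range step above calls `min()` and `max()` on the raw `DateReported` strings before parsing them. The following is a minimal sketch of why that works, using made-up sample values that are only assumed to match the column's layout: zero-padded, year-first strings sort in the same order as the dates they encode.

```python
from datetime import datetime

# Hypothetical sample values in the same layout as the 'DateReported' column
samples = [
    "2023/07/04 18:30:00+00",
    "2021/01/15 02:05:00+00",
    "2024/11/30 23:59:59+00",
]

FMT = "%Y/%m/%d %H:%M:%S+00"  # the trailing "+00" is matched as literal text

# min()/max() on the strings agree with min()/max() on the parsed datetimes
oldest = datetime.strptime(min(samples), FMT)
newest = datetime.strptime(max(samples), FMT)
parsed = [datetime.strptime(s, FMT) for s in samples]

print(oldest, newest)
```

Because the offset is matched as literal text, the resulting datetimes are timezone-naive. If the real column contains a differently formatted or missing value, `strptime` raises a `ValueError`, so it is worth inspecting a few rows first.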

# Step 2: Data Processing & Visualization
Process the data retrieved above and create meaningful visualizations to guide your analysis. Create at least 2 plots using matplotlib or seaborn. Potential visualizations include:
- Bar graph of all crime offense types
- Line graph of average temperature over time
- Line graph of occurrences of crime over time (overall or per specific offense type)
- Combination of temperature and crime data over time
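
As one possible starting point for the first suggestion in the list above, here is a rough sketch of a bar graph of offense types. It runs on a tiny synthetic frame rather than the real data, and the `Offense` column name is an assumption that should be checked against the actual CSV header.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in for crime_df; the 'Offense' column name is an assumption
toy_crime = pd.DataFrame({
    "Offense": ["Larceny", "Assault", "Larceny", "Vandalism", "Larceny", "Assault"],
})

# Count reports per offense type, most frequent first
offense_counts = toy_crime["Offense"].value_counts()

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(offense_counts.index, offense_counts.values)
ax.set_xlabel("Offense type")
ax.set_ylabel("Number of reports")
ax.set_title("Reports per offense type (synthetic data)")
ax.tick_params(axis="x", rotation=45)
fig.tight_layout()
```

With the real dataset there are likely to be many distinct offense types, so limiting the plot to the most frequent ones (for example with `offense_counts.head(10)`) may keep the axis readable.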

In [None]:
# Process both datasets to be in matching formats and create a merged crime/temperature dataset

# Ensure the 'DateReported' column in crime data is in datetime format
crime_df['DateReported'] = pd.to_datetime(crime_df['DateReported'])

# Aggregate crime data by date
crime_by_date = crime_df.groupby(crime_df['DateReported'].dt.date).size().reset_index(name='Crime Count')
crime_by_date.rename(columns={'DateReported': 'Date'}, inplace=True)

# Ensure the 'time' column in weather data is in datetime format
weather_data = weather_data.reset_index()
weather_data['time'] = pd.to_datetime(weather_data['time'])

# Convert both 'Date' and 'time' columns to the same format (date only)
crime_by_date['Date'] = pd.to_datetime(crime_by_date['Date'])
weather_data['time'] = weather_data['time'].dt.date
weather_data['time'] = pd.to_datetime(weather_data['time'])

# Merge crime and weather data on the 'Date' column
merged_data = pd.merge(crime_by_date, weather_data, left_on='Date', right_on='time', how='inner')
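
To make the effect of the aggregation and the inner merge concrete, the sketch below repeats the same steps on a few synthetic rows. The toy frames and their values are invented for illustration, and `dt.normalize()` is used here as a shorter alternative to the `.dt.date` round trip in the cell above, not as the notebook's own method.

```python
import pandas as pd

# Synthetic crime reports: three on Jan 1, one on Jan 3
toy_crime = pd.DataFrame({
    "DateReported": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-01 13:30", "2024-01-01 22:10", "2024-01-03 09:45",
    ]),
})

# Synthetic daily weather for Jan 1 and Jan 2 only
toy_weather = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "tavg": [3.5, 5.0],
})

# Truncate timestamps to midnight and count reports per day
daily_counts = (
    toy_crime.groupby(toy_crime["DateReported"].dt.normalize())
    .size()
    .reset_index(name="Crime Count")
    .rename(columns={"DateReported": "Date"})
)

# how="inner" keeps only the dates that appear in both frames
toy_merged = pd.merge(daily_counts, toy_weather, left_on="Date", right_on="time", how="inner")
print(toy_merged)
```

Only Jan 1 survives the merge: Jan 2 has weather but no reports, and Jan 3 has a report but no weather. In the real data this means days with zero reported crimes are dropped rather than counted as zero, which is worth keeping in mind when interpreting the correlation.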

In [None]:
# TODO: Generate plot 1 of your choosing below

In [None]:
# TODO: Generate plot 2 of your choosing below

# Step 3: Statistical Analysis
Time to analyze! Use the Pearson correlation coefficient to measure the strength of the linear relationship between daily average temperature and daily crime counts in Charlottesville. Compute it for overall crime and for specific offense types.

In [None]:
# Drop rows with missing data
corr_df = merged_data[['Crime Count', 'tavg']].dropna()

# TODO: Calculate overall Pearson correlation coefficient and p-values
# Hint: use scipy.stats.pearsonr on the two relevant columns in corr_df
r, p_two_sided = None

# One-sided p-value for H1: r > 0 (when r <= 0 it is 1 - p/2, not 1)
p_one_sided = p_two_sided / 2 if r > 0 else 1 - p_two_sided / 2

print(f"[TOTAL] Pearson r = {r:.3f}")
print(f"[TOTAL] Two-sided p-value = {p_two_sided:.4g}")
print(f"[TOTAL] One-sided p (H1: r > 0) = {p_one_sided:.4g}")
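
For reference, here is a small self-contained sketch of the calculation on synthetic data, so the mechanics can be checked before applying them to `corr_df`. The generated temperatures and counts are invented and say nothing about the real dataset. The one-sided value uses the standard conversion from the two-sided p-value: `p / 2` when `r` is positive and `1 - p / 2` otherwise.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic daily average temperature (deg C) and counts that rise weakly with it
tavg = rng.uniform(-5, 30, size=365)
crime_count = 10 + 0.1 * tavg + rng.normal(0, 2, size=365)

# pearsonr returns the correlation coefficient and the two-sided p-value
r, p_two_sided = stats.pearsonr(tavg, crime_count)

# One-sided p-value for H1: r > 0, derived from the two-sided value
p_one_sided = p_two_sided / 2 if r > 0 else 1 - p_two_sided / 2

print(f"r = {r:.3f}, two-sided p = {p_two_sided:.4g}, one-sided p = {p_one_sided:.4g}")
```

Pearson's r only captures linear association, so a scatter plot of the two columns is a useful companion check.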

In [None]:
# TODO: Select specific offense types that you wish to analyze
# Hint: a visualization of the most common offense types may help you decide


# TODO: Repeat the above correlation calculation for each selected offense type
r, p_two_sided = None

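# One-sided p-value for H1: r > 0 (when r <= 0 it is 1 - p/2, not 1)
p_one_sided = p_two_sided / 2 if r > 0 else 1 - p_two_sided / 2

# TODO: replace "OFFENSE" with the name of the offense type being analyzed
print(f"[OFFENSE] Pearson r = {r:.3f}")
print(f"[OFFENSE] Two-sided p-value = {p_two_sided:.4g}")
print(f"[OFFENSE] One-sided p (H1: r > 0) = {p_one_sided:.4g}")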
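
One way to repeat the calculation per offense type is to loop over groups. The sketch below does this on synthetic data that already holds one row per day and offense. The `Offense` column name, the offense labels, and the generated counts are all assumptions for illustration. With the real data you would first filter `crime_df` by offense, aggregate by date, and merge with the weather data as in Step 2.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
dates = pd.date_range("2024-01-01", periods=200, freq="D")
tavg = rng.uniform(0, 30, size=len(dates))

# Synthetic per-day counts: one offense rises with temperature, one does not
frames = []
for offense, slope in [("Larceny", 0.2), ("Vandalism", 0.0)]:
    counts = rng.poisson(5 + slope * tavg)
    frames.append(pd.DataFrame({
        "Date": dates, "Offense": offense, "Crime Count": counts, "tavg": tavg,
    }))
toy = pd.concat(frames, ignore_index=True)

# Compute r and both p-values separately for each offense type
results = {}
for offense, group in toy.groupby("Offense"):
    r, p_two_sided = stats.pearsonr(group["tavg"], group["Crime Count"])
    p_one_sided = p_two_sided / 2 if r > 0 else 1 - p_two_sided / 2
    results[offense] = (r, p_two_sided, p_one_sided)
    print(f"[{offense}] Pearson r = {r:.3f}, one-sided p = {p_one_sided:.4g}")
```

Testing several offense types raises the chance of a spurious "significant" result, so treat individual p-values with some caution when many types are compared.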