# **Project Name**    - FBI Crime Time Series Forecasting

---





##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1** - Meghashyam Parab


# **Project Summary -**

📊 FBI Crime Time Series Forecasting 🔍

Predicting Tomorrow’s Crime Trends, Today

Crime patterns are not random; they follow trends, seasonality, and hidden correlations that can be uncovered with the power of time series forecasting. This project leverages machine learning and deep learning techniques to analyze historical FBI crime data and predict future crime occurrences across different categories.

-----

🚀 Key Features

🔹 Data-Driven Insights – Analyzing years of FBI crime records to detect patterns and trends.

🔹 Time Series Modeling – Using ARIMA, LSTMs, Prophet, and other forecasting techniques.

🔹 Interactive Visualizations – Bringing crime trends to life with dynamic charts.

🔹 Geospatial Analysis – Mapping crime hotspots for better policy-making.

🔹 Real-World Applications – Helping law enforcement and policymakers make data-informed decisions.

-----

🛠️ Tech Stack


📌 Python, Pandas, NumPy – Data preprocessing & analysis

📌 TensorFlow, PyTorch – Deep learning models

📌 Facebook Prophet, ARIMA – Time series forecasting

📌 Matplotlib, Seaborn, Plotly – Data visualization

📌 Power BI – Interactive dashboard

# **GitHub Link -**

https://github.com/meghashyam123/FBI-Crime-Analysis

# **Problem Statement**


Context:

The FBI Crime Data is often used to assess trends, patterns, and geographic influences on crime rates across various regions. Law enforcement agencies, urban planners, and policymakers rely on this data to allocate resources, optimize safety programs, and devise crime prevention strategies.

Problem:

The goal is to analyze the relationship between geographic factors (latitude, longitude) and crime rates to identify high-risk areas, uncover patterns, and derive insights that can aid in improving public safety measures.

Problem Statement:

Given the FBI crime data, which includes information on crime incidents across different geographic locations, we aim to analyze how geographic factors such as latitude and longitude influence crime rates. Specifically, we seek to understand:

*   Which regions experience higher crime rates based on geographic location.

*   Whether there is any correlation between latitude and crime count or longitude and crime count.

*   The potential geographic hotspots for crime and how law enforcement and urban planners can use this data to optimize resource allocation and improve crime prevention strategies.
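
One way to approach the correlation question above is to bin a coordinate, count incidents per bin, and correlate bin position with count. The sketch below is illustrative only: the incidents are synthetic, the column names `Latitude` and `Longitude` are assumptions about the real dataset, and `coordinate_count_correlation` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Synthetic incidents for illustration only; the column names are assumptions
rng = np.random.default_rng(0)
incidents = pd.DataFrame({
    "Latitude": rng.normal(47.606, 0.02, size=2000),
    "Longitude": rng.normal(-122.332, 0.03, size=2000),
})

def coordinate_count_correlation(frame, column, bins=20):
    """Bin one coordinate, count incidents per bin, and return the Pearson
    correlation between bin midpoints and incident counts."""
    binned = pd.cut(frame[column], bins=bins)
    counts = binned.value_counts(sort=False)  # one count per bin, in bin order
    midpoints = np.array([interval.mid for interval in counts.index])
    return float(np.corrcoef(midpoints, counts.to_numpy())[0, 1])

lat_corr = coordinate_count_correlation(incidents, "Latitude")
lon_corr = coordinate_count_correlation(incidents, "Longitude")
print(f"Latitude vs. crime count:  {lat_corr:.3f}")
print(f"Longitude vs. crime count: {lon_corr:.3f}")
```

A Pearson coefficient only captures linear association. A hotspot in the middle of the coordinate range can produce a coefficient near zero, so a weak correlation does not rule out strong geographic clustering.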



#### **Define Your Business Objective?**

To identify patterns and geographic factors influencing crime rates in order to support data-driven decision-making for public safety initiatives, resource allocation, urban planning, and crime prevention.

Specific Goals:

Geographic Crime Mapping:

*   Analyze the relationship between geographic coordinates (latitude and longitude) and crime counts.


Crime Prevention Strategy:

*   Use spatial data to help law enforcement and city planners develop targeted crime prevention strategies for specific regions.


Urban Development Planning:

*   Provide insights for urban planners to consider crime patterns in city zoning and infrastructure development decisions.


Predictive Analysis:

*   Forecast crime rates in different regions based on geographic factors, helping to implement proactive measures.
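
As a rough illustration of the predictive goal, a seasonal-naive baseline repeats the last observed season. This is a minimal sketch rather than the forecasting method used in this project: the monthly counts are hypothetical and `seasonal_naive_forecast` is an illustrative helper.

```python
import numpy as np
import pandas as pd

def seasonal_naive_forecast(history, horizon, season_length=12):
    """Forecast each future period with the value observed one season earlier."""
    if len(history) < season_length:
        raise ValueError("history must cover at least one full season")
    last_season = history.iloc[-season_length:].to_numpy()
    # Repeat the last season as often as needed, then trim to the horizon
    repeats = int(np.ceil(horizon / season_length))
    values = np.tile(last_season, repeats)[:horizon]
    # Continue the date index after the last observed period
    future_index = pd.date_range(
        history.index[-1], periods=horizon + 1, freq=history.index.freq
    )[1:]
    return pd.Series(values, index=future_index)

# Hypothetical monthly incident counts for two years
history = pd.Series(
    [90, 85, 100, 110, 120, 130, 140, 135, 115, 105, 95, 100,
     95, 88, 104, 112, 125, 138, 150, 142, 118, 108, 99, 103],
    index=pd.date_range("2022-01-01", periods=24, freq="MS"),
)
forecast = seasonal_naive_forecast(history, horizon=6)
print(forecast)
```

A baseline like this is mainly useful as a yardstick: a more complex model such as ARIMA or Prophet should at least beat it on held-out data.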



# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart answers the questions in the following format.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')


### Dataset Loading

In [None]:
# Load Dataset

# Path to the training data; defined once and reused when loading
file_path = "/content/Train.xlsx"
df = pd.read_excel(file_path)

### Dataset First View

In [None]:
# Dataset First Look

df.head()


In [None]:
df.tail()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count

df.shape

### Dataset Information

In [None]:
# Dataset Info

df.info()

In [None]:
df.describe()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count

# Count duplicate rows
num_duplicates = df.duplicated().sum()

# Print the result
print(f"Number of duplicate rows: {num_duplicates}")

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

missing_values = df.isnull().sum()

# Display the missing values count for each column
print(missing_values)
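
Raw counts are easier to judge next to the share of rows they represent. The sketch below is one possible way to report both; `missing_summary` is a hypothetical helper and the small frame is made-up demonstration data.

```python
import numpy as np
import pandas as pd

def missing_summary(frame):
    """Return the count and percentage of missing values per column, worst first."""
    summary = pd.DataFrame({
        "missing_count": frame.isnull().sum(),
        "missing_pct": (frame.isnull().mean() * 100).round(2),
    })
    return summary.sort_values("missing_pct", ascending=False)

# Tiny hypothetical frame used only to demonstrate the helper
example = pd.DataFrame({
    "TYPE": ["Theft", None, "Assault", "Theft"],
    "HOUR": [12.0, np.nan, np.nan, 3.0],
    "YEAR": [2010, 2011, 2012, 2013],
})
print(missing_summary(example))
```

Calling `missing_summary(df)` on the loaded dataset would give the same table for the real columns.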

In [None]:
# Visualizing the missing values


import missingno as msno
msno.matrix(df)
plt.show()

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Create the heatmap
sns.heatmap(df.isnull(), cbar=False, cmap='viridis')

# Customize the plot
plt.title('Missing Values Heatmap')
plt.xlabel('Columns')
plt.ylabel('Rows')
plt.show()

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns

# Get the columns of the DataFrame
columns = df.columns

# Print the columns
print(columns)

In [None]:
# Dataset Describe (include='all' also summarizes non-numeric columns)

df.describe(include='all')

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

for column in df.columns:
    unique_values = df[column].unique()
    print(f"Unique values for '{column}': {unique_values}")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

# Drop the 'ID' column if it exists
if 'ID' in df.columns:
    df = df.drop(columns=['ID'])
else:
    print("Column 'ID' not found in the DataFrame.")

# Columns to impute, grouped by strategy.
# Assumption: the first group is categorical (mode), the second numerical (median).
mode_columns = ['Store_Type', 'Location_Type', 'Region_Code']
median_columns = ['Holiday', '#Order', 'Sales']

# Mode imputation for categorical columns
for column in mode_columns:
    if column in df.columns:
        # Assign the result back to df; fillna(inplace=True) on a column
        # selection is not guaranteed to modify df in recent pandas versions
        df[column] = df[column].fillna(df[column].mode()[0])
    else:
        print(f"Column '{column}' not found in the DataFrame.")

# Median imputation for numerical columns
for column in median_columns:
    if column in df.columns:
        df[column] = df[column].fillna(df[column].median())
    else:
        print(f"Column '{column}' not found in the DataFrame.")

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1: Crime Type Distribution (Bar Chart)

import seaborn as sns
import matplotlib.pyplot as plt

# Set figure size
plt.figure(figsize=(10, 6))

# Count occurrences of each crime type
crime_counts = df['TYPE'].value_counts()

# Create a bar plot
sns.barplot(x=crime_counts.index, y=crime_counts.values, palette="viridis")

# Add labels and title
plt.xlabel("Crime Type", fontsize=12)
plt.ylabel("Number of Crimes", fontsize=12)
plt.title("Crime Type Distribution", fontsize=14)
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability

# Show plot
plt.show()

#### Chart - 2

In [None]:
# Chart - 2: Top 10 Crime Locations (Bar Chart)

# Count occurrences of crimes by location
top_locations = df['HUNDRED_BLOCK'].value_counts().head(10)

# Set figure size
plt.figure(figsize=(12, 6))

# Create a bar plot
sns.barplot(x=top_locations.values, y=top_locations.index, palette="magma")

# Add labels and title
plt.xlabel("Number of Crimes", fontsize=12)
plt.ylabel("Location (Hundred Block)", fontsize=12)
plt.title("Top 10 Crime Locations", fontsize=14)

# Show plot
plt.show()

#### Chart - 3

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Assuming your dataframe is called 'df'

# 1. Group data by year and crime type, then count occurrences
crime_trends = df.groupby(['YEAR', 'TYPE'])['TYPE'].count().reset_index(name='Count')

# 2. Create a line plot using Seaborn
plt.figure(figsize=(12, 6))  # Adjust figure size as needed
sns.lineplot(data=crime_trends, x='YEAR', y='Count', hue='TYPE')
plt.title('Crime Trends Over Time', fontsize=14)
plt.xlabel('Year', fontsize=12)
plt.ylabel('Number of Crimes', fontsize=12)
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
plt.legend(title='Crime Type')  # Add a legend
plt.tight_layout()  # Adjust layout for better spacing
plt.show()

#### Chart - 4

In [None]:
# Chart - 4: Crime Count by Day of the Week (Grouped Bar Chart)

# Assumes the dataframe 'df' has a 'Date' column

# 1. Extract Day of the Week from 'Date' column
df['DayOfWeek'] = pd.to_datetime(df['Date']).dt.day_name() # Extract day of the week and create a new 'DayOfWeek' column

# 2. Group data by day of the week and crime type, then count occurrences
crime_by_weekday = df.groupby(['DayOfWeek', 'TYPE'])['TYPE'].count().reset_index(name='Count')

# 3. Create a bar plot using Seaborn
plt.figure(figsize=(12, 6))
sns.barplot(data=crime_by_weekday, x='DayOfWeek', y='Count', hue='TYPE')
plt.title('Crime Count by Day of the Week', fontsize=14)
plt.xlabel('Day of the Week', fontsize=12)
plt.ylabel('Number of Crimes', fontsize=12)
plt.xticks(rotation=45)
plt.legend(title='Crime Type')
plt.tight_layout()
plt.show()



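# Alternative view of the same data: heatmap of crime type vs. weekday

# Order the weekday columns chronologically instead of alphabetically
weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# Pivot to a TYPE x DayOfWeek table; missing combinations become 0 so that
# the integer annotation format (fmt="d") does not fail on NaN values
crime_pivot = (
    crime_by_weekday.pivot(index="TYPE", columns="DayOfWeek", values="Count")
    .reindex(columns=weekday_order)
    .fillna(0)
    .astype(int)
)
plt.figure(figsize=(10, 6))
sns.heatmap(crime_pivot, annot=True, fmt="d", cmap="YlGnBu")
plt.title('Crime Types by Day of Week (Heatmap)')
plt.xlabel('Day of the Week')
plt.ylabel('Crime Type')
plt.show()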
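
The weekday chart above reads a `Date` column. If a dataset only provides separate date parts, the column could be assembled first. The sketch below assumes hypothetical column names `YEAR`, `MONTH`, and `DAY` and uses made-up rows.

```python
import pandas as pd

# Hypothetical rows; 'YEAR', 'MONTH' and 'DAY' are assumed column names
sample = pd.DataFrame({
    "YEAR": [2011, 2011, 2012],
    "MONTH": [1, 2, 12],
    "DAY": [3, 30, 25],  # 30 February is deliberately invalid
})

# pd.to_datetime can assemble dates from columns named year, month and day
parts = sample[["YEAR", "MONTH", "DAY"]].rename(
    columns={"YEAR": "year", "MONTH": "month", "DAY": "day"}
)

# errors="coerce" turns impossible dates into NaT instead of raising
sample["Date"] = pd.to_datetime(parts, errors="coerce")
sample["DayOfWeek"] = sample["Date"].dt.day_name()
print(sample)
```

Rows whose date could not be built end up with a missing `DayOfWeek`, and `groupby` leaves them out of the weekday counts by default.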


#### Chart - 5

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Chart - 5: Crime by Hour of the Day (Radial Bar Chart)

# NOTE: the counts below are illustrative sample values, not taken from 'df'.
# Replace them with the real hourly crime counts before drawing conclusions.
hours = range(24)
crime_counts = [2, 1, 0, 1, 2, 5, 8, 12, 15, 10, 8, 7, 6, 7, 9, 11, 13, 10, 8, 5, 3, 2, 1, 1]

# Set up the plot
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111, projection='polar')

# Define angles for each hour (24 hours in a circle)
theta = np.linspace(0, 2 * np.pi, len(hours), endpoint=False)

# Each bar gets slightly less than one hour's slice so neighboring bars do not overlap
bar_width = 0.9 * 2 * np.pi / len(hours)

# Create the bars
bars = ax.bar(theta, crime_counts, width=bar_width, bottom=0.0, color='skyblue', alpha=0.7)

# Customize the plot
ax.set_theta_zero_location("N")  # Set 0 degrees at the top
ax.set_theta_direction(-1)  # Clockwise direction
ax.set_xticks(theta)  # Set ticks for each hour
ax.set_xticklabels([str(h) for h in hours])  # Hour labels
ax.set_yticklabels([])  # Hide radial ticks
ax.set_title("Crime by Hour of the Day (Radial Bar Chart)", fontsize=14)

# Add a grid
ax.grid(True, linestyle='--', alpha=0.5)

plt.show()
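
The radial chart above is drawn from hand-typed sample values. With per-incident data, the 24 hourly counts could be derived as sketched below. The column name `HOUR` is an assumption about the dataset, and the rows here are made up.

```python
import pandas as pd

# Hypothetical incident rows; 'HOUR' is an assumed column name (values 0-23)
incident_hours = pd.DataFrame({"HOUR": [0, 0, 8, 8, 8, 17, 23, 23]})

# Count incidents per hour, then reindex so all 24 hours are present
# and hours without incidents appear as zero
hourly_counts = (
    incident_hours["HOUR"]
    .value_counts()
    .reindex(range(24), fill_value=0)
)
print(hourly_counts.tolist())
```

The resulting list has one entry per hour in order, so it can be passed to the chart in place of the sample `crime_counts`.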

#### Chart - 6

In [None]:
# Chart - 6: Crime Heatmap by Location (KDE + Scatter)

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# NOTE: illustrative sample data with columns 'Latitude', 'Longitude' and 'Crime_Count'.
# It is stored in 'location_df' so the real dataset in 'df' is not overwritten;
# replace it with the actual per-location crime counts.
data = {'Latitude': [47.6062, 47.6062, 47.6097, 47.6145, 47.5990],
        'Longitude': [-122.3321, -122.3390, -122.3354, -122.3321, -122.3295],
        'Crime_Count': [5, 2, 8, 1, 10]}
location_df = pd.DataFrame(data)

# Create the heatmap
plt.figure(figsize=(10, 8))  # Adjust figure size as needed
# fill/thresh replace the deprecated shade/shade_lowest arguments of kdeplot
sns.kdeplot(x=location_df['Longitude'], y=location_df['Latitude'], cmap="Reds", fill=True, thresh=0.05)
# Point size and color are both based on crime count
plt.scatter(location_df['Longitude'], location_df['Latitude'], c=location_df['Crime_Count'], cmap="Reds", s=location_df['Crime_Count'] * 10)
plt.colorbar(label='Crime Count')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('Crime Heatmap by Location')
plt.show()


# 1. Why did you pick the specific chart?
# A heatmap effectively visualizes the density of crimes across different locations. The color intensity represents the concentration of crime in specific areas, offering a quick overview of crime hotspots.

# 2. What is/are the insight(s) found from the chart?
# The insights depend on the actual data. In general, look for clusters of high crime density, which identify areas that may need more police presence.

# 3. Will the gained insights help create a positive business impact?
# Are there any insights that lead to negative growth? Justify with a specific reason.
# Yes. Law enforcement agencies can use this information to strategically allocate resources to high-crime areas. This proactive approach can lead to a reduction in crime rates and improved public safety in these locations. Conversely, areas with consistently low crime density may see resources reduced without a negative impact on public safety.


#### Chart - 7

In [None]:
# Chart - 7: Crime Distribution by Neighborhood (Treemap)

!pip install squarify  # install the squarify package used for the treemap

import pandas as pd
import matplotlib.pyplot as plt
import squarify

# NOTE: illustrative sample data with columns 'Neighborhood', 'Crime_Type' and 'Crime_Count'.
# It is stored in 'neighborhood_df' so the real dataset in 'df' is not overwritten;
# replace it with the actual data.
data = {'Neighborhood': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'A', 'B', 'C'],
        'Crime_Type': ['Theft', 'Vandalism', 'Theft', 'Assault', 'Vandalism', 'Theft', 'Assault', 'Vandalism', 'Theft', 'Theft'],
        'Crime_Count': [3, 1, 5, 2, 3, 2, 1, 1, 4, 2]}
neighborhood_df = pd.DataFrame(data)

# Group data by neighborhood and crime type
neighborhood_crime = neighborhood_df.groupby(['Neighborhood', 'Crime_Type'])['Crime_Count'].sum().reset_index()

# Build one label per rectangle: "<neighborhood> - <crime type> (<count>)"
labels = [
    f'{n} - {c} ({count})'
    for n, c, count in zip(neighborhood_crime['Neighborhood'],
                           neighborhood_crime['Crime_Type'],
                           neighborhood_crime['Crime_Count'])
]

# Create a treemap
plt.figure(figsize=(10, 6))
squarify.plot(sizes=neighborhood_crime['Crime_Count'], label=labels, alpha=.8)
plt.title('Crime Distribution by Neighborhood and Type (Treemap)')
plt.axis('off')
plt.show()



# Alternative Visualization:  Stacked Bar Chart

neighborhood_crime_pivot = neighborhood_crime.pivot(index='Neighborhood', columns='Crime_Type', values='Crime_Count').fillna(0)
neighborhood_crime_pivot.plot(kind='bar', stacked=True, figsize=(10, 6))
plt.title('Crime Distribution by Neighborhood and Type (Stacked Bar Chart)')
plt.xlabel('Neighborhood')
plt.ylabel('Number of Crimes')
plt.xticks(rotation=0)
plt.legend(title='Crime Type')
plt.tight_layout()
plt.show()

#### Chart - 8

In [None]:
# Chart - 8: Crime Hotspots Map (Scatter Plot)

import pandas as pd
import matplotlib.pyplot as plt

# NOTE: illustrative sample data with columns 'Latitude', 'Longitude' and 'Crime_Count'.
# It is stored in 'hotspot_df' so the real dataset in 'df' is not overwritten;
# replace it with the actual per-location crime counts.
data = {'Latitude': [47.6062, 47.6062, 47.6097, 47.6145, 47.5990, 47.6030, 47.6110],
        'Longitude': [-122.3321, -122.3390, -122.3354, -122.3321, -122.3295, -122.3350, -122.3270],
        'Crime_Count': [5, 2, 8, 1, 10, 3, 7]}
hotspot_df = pd.DataFrame(data)


# Create the scatter plot; point size and color both encode crime count
plt.figure(figsize=(10, 8))
plt.scatter(hotspot_df['Longitude'], hotspot_df['Latitude'], s=hotspot_df['Crime_Count'] * 20, c=hotspot_df['Crime_Count'], cmap='viridis', alpha=0.7)
plt.colorbar(label='Crime Count')

# Customize the plot
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('Crime Hotspots Map')
plt.grid(True, linestyle='--', alpha=0.5)

# Show the plot
plt.show()
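
The hotspot map above starts from a table that already contains a `Crime_Count` per location. When only per-incident coordinates are available, such counts could be produced by snapping incidents to a grid, as in this sketch. The coordinates, column names, and the 0.005-degree cell size are all illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical per-incident coordinates; column names are assumptions
points = pd.DataFrame({
    "Latitude":  [47.6062, 47.6063, 47.6097, 47.6145, 47.5990, 47.5991],
    "Longitude": [-122.3321, -122.3322, -122.3354, -122.3321, -122.3295, -122.3296],
})

cell_size = 0.005  # grid cell size in degrees (illustrative choice)

# Assign each incident to an integer grid cell
points["lat_cell"] = np.floor(points["Latitude"] / cell_size).astype(int)
points["lon_cell"] = np.floor(points["Longitude"] / cell_size).astype(int)

# Count incidents per occupied cell, busiest cells first
hotspots = (
    points.groupby(["lat_cell", "lon_cell"])
    .size()
    .reset_index(name="Crime_Count")
    .sort_values("Crime_Count", ascending=False)
)

# Convert cell indices back to the coordinates of each cell's center
hotspots["Latitude"] = (hotspots["lat_cell"] + 0.5) * cell_size
hotspots["Longitude"] = (hotspots["lon_cell"] + 0.5) * cell_size
print(hotspots)
```

The cell size controls the trade-off between detail and noise: smaller cells localize hotspots more precisely but leave fewer incidents in each cell.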


#### Chart - 9 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

import seaborn as sns
import matplotlib.pyplot as plt

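# Assuming your DataFrame is called 'df'

# Calculate the correlation matrix on numeric columns only;
# df.corr() raises an error in pandas 2.x when text columns are present
correlation_matrix = df.select_dtypes(include='number').corr()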

# Create the heatmap
plt.figure(figsize=(12, 10))  # Adjust figure size as needed
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=.5)
plt.title('Correlation Heatmap', fontsize=16)
plt.show()

#### Chart - 10 - Pair Plot

In [None]:
# Pair Plot visualization

import seaborn as sns
import matplotlib.pyplot as plt

# Assuming your DataFrame is called 'df'

# Pair plots are slow on large data, so plot a random sample of at most 2,000 rows
plot_df = df.sample(n=min(len(df), 2000), random_state=42)

# Color the points by crime type when the 'TYPE' column is available
if 'TYPE' in plot_df.columns:
    sns.pairplot(plot_df, hue='TYPE')
else:
    print("Column 'TYPE' not found. Creating pairplot without hue.")
    sns.pairplot(plot_df)  # Create pairplot without hue

plt.show()

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the Business Objective?
Explain briefly.

🎯 Business Objective: Enhancing Crime Prevention & Resource Allocation

🔍 Goal: Use predictive analytics to forecast crime trends, helping law enforcement, policymakers, and urban planners make data-driven decisions.

✅ Recommended Actions for the Client

1️⃣ Build a Crime Prediction Dashboard 📊

🔹 What? Develop an interactive web dashboard to visualize historical and forecasted crime trends.

🔹 Why? Enables real-time monitoring and helps law enforcement allocate resources proactively.

🔹 How? Use Streamlit / Flask with Python to display forecasts, crime heatmaps, and insights.

----

2️⃣ Identify Crime Hotspots & Trends 🌍

🔹 What? Perform geospatial analysis to detect high-crime areas.

🔹 Why? Helps authorities focus on specific neighborhoods for crime prevention efforts.

🔹 How? Use GeoPandas, Folium, and Plotly to create interactive crime maps.

------

3️⃣ Implement Real-Time Crime Alerts 🚨

🔹 What? Develop an early warning system based on forecasted crime spikes.

🔹 Why? Law enforcement can deploy officers to high-risk areas before crimes occur.

🔹 How? Use machine learning models (LSTM, Prophet, ARIMA) to generate alerts based on trends.
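
As a simple stand-in for a model-based alert, the sketch below flags weeks whose count is far above a rolling baseline. It uses a rolling z-score rather than LSTM, Prophet, or ARIMA. The weekly counts are synthetic, and the window and threshold are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic weekly incident counts with one injected spike (illustration only)
rng = np.random.default_rng(42)
weekly = pd.Series(
    rng.poisson(lam=50, size=52),
    index=pd.date_range("2023-01-01", periods=52, freq="W"),
    dtype=float,
)
weekly.iloc[40] = 120  # artificial spike for demonstration

window = 8       # weeks of history used as the baseline (assumption)
threshold = 3.0  # z-score above which a week is flagged (assumption)

# Baseline statistics use only past weeks (shift(1)) so the current
# value does not dilute its own anomaly score
baseline_mean = weekly.shift(1).rolling(window).mean()
baseline_std = weekly.shift(1).rolling(window).std()
z_scores = (weekly - baseline_mean) / baseline_std

# Weeks whose count exceeds the baseline by more than the threshold
alerts = weekly[z_scores > threshold]
print(alerts)
```

With a forecasting model in place, the same comparison could be made against the model's prediction interval instead of a rolling mean.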


------


4️⃣ Understand Crime Patterns by Time & Seasonality 🕒

🔹 What? Analyze seasonal crime trends (e.g., higher theft rates in holiday seasons).

🔹 Why? Helps in event-based security planning and resource allocation.

🔹 How? Use time-series decomposition and anomaly detection techniques.
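
A classical additive decomposition can be sketched with pandas alone: estimate the trend with a centered moving average, then average the detrended values per calendar month. The monthly series below is synthetic, built from a known trend and a known yearly cycle so the result is easy to check.

```python
import numpy as np
import pandas as pd

# Synthetic monthly counts: linear trend plus a yearly cycle (illustration only)
t = np.arange(48)
monthly = pd.Series(
    200 + 2 * t + 30 * np.sin(2 * np.pi * t / 12),
    index=pd.date_range("2015-01-01", periods=48, freq="MS"),
)

# Trend: 2x12 moving average. A 12-month window has no exact center,
# so two adjacent 12-month averages are averaged again to center it.
trend = (
    monthly.rolling(window=12, center=True).mean()
    .rolling(window=2).mean()
    .shift(-1)
)

# Seasonal component: mean detrended value for each calendar month (1-12)
detrended = monthly - trend
seasonal = detrended.groupby(detrended.index.month).mean()

# Residual: what remains after removing trend and seasonality
residual = monthly - trend - seasonal.loc[monthly.index.month].to_numpy()

print(seasonal.round(2))
```

The first and last six months have no trend estimate because the moving average needs a full window. The `seasonal_decompose` function in statsmodels provides a packaged version of this kind of decomposition.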


-----

5️⃣ Policy & Urban Planning Recommendations 🏙️

🔹 What? Provide crime prevention recommendations to city officials & law enforcement.

🔹 Why? Helps make informed decisions on street lighting, CCTV placements, and patrol scheduling.

🔹 How? Use AI-driven predictive analytics to suggest improvements based on crime trends.



# **Conclusion**

Analyzing the FBI crime data by latitude and longitude can uncover geographic trends and reveal which regions have higher crime rates. The findings suggest that latitude and longitude are measurably associated with crime distribution, with certain areas showing higher crime densities. This insight supports more effective allocation of law enforcement resources, proactive crime prevention, and urban planning and policy decisions that improve community safety.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***