# Exploratory Data Analysis: Crime Trends by Year and Category


### Analysis of Crime Data (2001-2014)

In this notebook, we will analyze crime data from three datasets spanning the years 2001 to 2014. The datasets include:

1. **2001-2012 Crime Data**: District-wise crimes committed under IPC (Indian Penal Code) from 2001 to 2012.
2. **2013 Crime Data**: District-wise crimes committed under IPC for the year 2013.
3. **2014 Crime Data**: District-wise crimes committed under IPC for the year 2014.

These datasets are combined into a single dataframe so that all years can be analyzed together. The goal is to identify the most prevalent crime types and how they change over the years. Data cleaning, normalization, and missing-value handling are part of the preprocessing step, so that the three sources stay consistent with one another.


#### **Notebook Structure**

1. Introduction
2. Dataset Description
3. Data Preprocessing
4. Exploratory Analysis
    - Overall crime trends
    - Crime type breakdown
    - State-wise distribution
    - Focused crime category analysis
5. Key Insights

### Preprocessing

In [None]:
# pandas
# matplotlib
# seaborn
# plotly
# scikit-learn
# streamlit
# geopandas
# ipykernel # Needed only for notebook

In [2]:
# pip install -r requirements.txt

from pathlib import Path

import pandas as pd

# Directory holding the three district-wise IPC files (machine-specific path; adjust as needed)
DATA_DIR = Path('/Users/admin/Desktop/Crime data POC/Community-Risk-Profiling-Using-FIR-Data/dataset/State-wise data from 2001 is classified according to 40+factors/crime')

# Load datasets
df_2001_2012 = pd.read_csv(DATA_DIR / '01_District_wise_crimes_committed_IPC_2001_2012.csv')
df_2013 = pd.read_csv(DATA_DIR / '01_District_wise_crimes_committed_IPC_2013.csv')
df_2014 = pd.read_csv(DATA_DIR / '01_District_wise_crimes_committed_IPC_2014.csv')

# Combine into one dataframe; columns absent from a given file are filled with NaN
df_all_years = pd.concat([df_2001_2012, df_2013, df_2014], ignore_index=True)

In [3]:
df_all_years.head()

Unnamed: 0,STATE/UT,DISTRICT,YEAR,MURDER,ATTEMPT TO MURDER,CULPABLE HOMICIDE NOT AMOUNTING TO MURDER,RAPE,CUSTODIAL RAPE,OTHER RAPE,KIDNAPPING & ABDUCTION,...,Offences promoting enmity between different groups,Promoting enmity between different groups,"Imputation, assertions prejudicial to national integration",Extortion,Disclosure of Identity of Victims,Incidence of Rash Driving,HumanTrafficking,Unnatural Offence,Other IPC crimes,Total Cognizable IPC crimes
0,ANDHRA PRADESH,ADILABAD,2001.0,101.0,60.0,17.0,50.0,0.0,50.0,46.0,...,,,,,,,,,,
1,ANDHRA PRADESH,ANANTAPUR,2001.0,151.0,125.0,1.0,23.0,0.0,23.0,53.0,...,,,,,,,,,,
2,ANDHRA PRADESH,CHITTOOR,2001.0,101.0,57.0,2.0,27.0,0.0,27.0,59.0,...,,,,,,,,,,
3,ANDHRA PRADESH,CUDDAPAH,2001.0,80.0,53.0,1.0,20.0,0.0,20.0,25.0,...,,,,,,,,,,
4,ANDHRA PRADESH,EAST GODAVARI,2001.0,82.0,67.0,1.0,23.0,0.0,23.0,49.0,...,,,,,,,,,,


In [4]:
print("Number of columns:", len(df_all_years.columns))


Number of columns: 124


In [None]:
# Clean columns, handle NaNs, normalize text
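
The placeholder above names three steps: cleaning columns, handling NaNs, and normalizing text. A minimal sketch of what that could look like follows. It assumes the text keys are `STATE/UT` and `DISTRICT` (as in the `head()` output) and that every other column holds numeric counts. The helper name `clean_crime_frame` and the toy rows are hypothetical and not part of the original notebook.

```python
import pandas as pd

def clean_crime_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: tidy column names, text keys, and numeric counts."""
    out = df.copy()
    # Normalize column names: trim, collapse internal whitespace, upper-case.
    out.columns = (
        out.columns.astype(str)
        .str.strip()
        .str.replace(r"\s+", " ", regex=True)
        .str.upper()
    )
    # Assumption: any 'Unnamed: ...' column is a leftover saved index, so drop it.
    out = out.drop(columns=[c for c in out.columns if c.startswith("UNNAMED")])
    # Normalize the text keys so 'Adilabad ' and 'ADILABAD' compare equal.
    text_cols = [c for c in ("STATE/UT", "DISTRICT") if c in out.columns]
    for col in text_cols:
        out[col] = out[col].astype("string").str.strip().str.upper()
    # Coerce the remaining columns to numbers; unparseable entries become NaN.
    count_cols = [c for c in out.columns if c not in text_cols]
    out[count_cols] = out[count_cols].apply(pd.to_numeric, errors="coerce")
    return out

# Toy rows (made-up values) standing in for one of the loaded frames.
toy = pd.DataFrame({
    " State/UT": ["Andhra Pradesh ", "ANDHRA PRADESH"],
    "District": ["adilabad", " Anantapur"],
    "Year": [2001, 2001],
    "Murder": ["101", "n/a"],
})
cleaned = clean_crime_frame(toy)
print(cleaned)
```

The sketch deliberately leaves missing counts as NaN instead of filling them with 0. A crime head that was not recorded in a given year is not the same as zero incidents, so any fill strategy is better decided per column.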

In [4]:
print("2001–2012 columns:", len(df_2001_2012.columns))
print("2013 columns:", len(df_2013.columns))
print("2014 columns:", len(df_2014.columns))


2001–2012 columns: 33
2013 columns: 33
2014 columns: 91
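
The combined frame has 124 columns, which equals 33 + 91. One reading consistent with these counts, not verified here, is that the 2014 file shares no column names with the earlier files, for example because of different letter case or wording. Comparing the column-name sets directly is a quick way to check. The helper `column_overlap` and the toy headers below are hypothetical.

```python
import pandas as pd

def column_overlap(a: pd.DataFrame, b: pd.DataFrame) -> dict:
    """Hypothetical helper: report shared and exclusive column names."""
    cols_a, cols_b = set(a.columns), set(b.columns)
    return {
        "shared": sorted(cols_a & cols_b),
        "only_in_first": sorted(cols_a - cols_b),
        "only_in_second": sorted(cols_b - cols_a),
    }

# Toy frames whose headers differ only in case and wording (made-up example).
old_style = pd.DataFrame(
    [["ANDHRA PRADESH", "ADILABAD", 2001, 101]],
    columns=["STATE/UT", "DISTRICT", "YEAR", "MURDER"],
)
new_style = pd.DataFrame(
    [["Toy State", "Toy District", 2014, 5]],
    columns=["States/UTs", "District", "Year", "Murder"],
)

report = column_overlap(old_style, new_style)
print(report["shared"])

# With no shared names, concat keeps every column from both frames side by side.
combined = pd.concat([old_style, new_style], ignore_index=True)
print(len(combined.columns))
```

In the notebook, the same call would be `column_overlap(df_2001_2012, df_2014)`. If the shared list turns out short or empty, mapping the 2014 names onto the older ones before `pd.concat` would stop the same crime head from ending up in two separate columns.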


### Analysis

#### 📊 A. Identify Top Crime Types
- Group by crime head (or a similar column) and rank the totals
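
The data shown earlier is in wide format, with one column per crime, so there is no single crime-head column to group on. A possible approach, sketched here on made-up numbers, is to melt the frame into long format first. The names `CRIME_HEAD` and `COUNT` are illustrative choices, not columns from the source files.

```python
import pandas as pd

# Toy wide-format frame (made-up counts); the real input would be df_all_years.
toy = pd.DataFrame({
    "STATE/UT": ["A", "A", "B"],
    "DISTRICT": ["A1", "A2", "B1"],
    "YEAR": [2001, 2001, 2001],
    "MURDER": [10, 5, 7],
    "RAPE": [3, 2, 4],
    "THEFT": [40, 25, 30],
})

id_cols = ["STATE/UT", "DISTRICT", "YEAR"]

# Reshape to long format: one row per (district, year, crime head).
long_df = toy.melt(id_vars=id_cols, var_name="CRIME_HEAD", value_name="COUNT")

# Sum per crime head and rank from most to least frequent.
top_crimes = (
    long_df.groupby("CRIME_HEAD")["COUNT"]
    .sum()
    .sort_values(ascending=False)
)
print(top_crimes.head(10))
```

Aggregate columns, such as the `Total Cognizable IPC crimes` column visible in the `head()` output, would need to be dropped before melting. Otherwise they would dominate the ranking.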

#### 📈 B. Trend Over Time
- Group by year
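
As a rough sketch of the yearly grouping, the snippet below sums a chosen set of crime columns per year. The toy values and the `crime_cols` selection are assumptions for illustration. In the notebook, the list would hold the real crime columns of `df_all_years`.

```python
import pandas as pd

# Toy frame (made-up counts) with several district rows per year.
toy = pd.DataFrame({
    "YEAR": [2001, 2001, 2002, 2002, 2003],
    "MURDER": [10, 5, 12, 6, 20],
    "THEFT": [40, 25, 45, 30, 80],
})

crime_cols = ["MURDER", "THEFT"]

# Sum each crime per year, then add the crimes together for one yearly total.
yearly_totals = toy.groupby("YEAR")[crime_cols].sum().sum(axis=1)
print(yearly_totals)
```

Since `YEAR` appears as a float (`2001.0`) in the combined frame, casting it to an integer type before grouping would give cleaner axis labels when the series is plotted.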



#### 🧭 C. Crime by State (or District)
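
A state-level view can be sketched by summing district rows per state. The snippet assumes that per-state aggregate rows, if the files contain any, carry the label `TOTAL` in `DISTRICT`. That label is an assumption to verify against the actual data, since counting such rows together with the districts would double the totals. All values are made up.

```python
import pandas as pd

# Toy frame (made-up counts), including hypothetical per-state 'TOTAL' rows.
toy = pd.DataFrame({
    "STATE/UT": ["A", "A", "A", "B", "B"],
    "DISTRICT": ["A1", "A2", "TOTAL", "B1", "TOTAL"],
    "YEAR": [2001] * 5,
    "MURDER": [10, 5, 15, 7, 7],
})

# Assumption: aggregate rows are labelled 'TOTAL'; drop them to avoid double counting.
districts_only = toy[toy["DISTRICT"].str.upper() != "TOTAL"]

# Sum the chosen crime per state and sort from highest to lowest.
state_totals = (
    districts_only.groupby("STATE/UT")["MURDER"]
    .sum()
    .sort_values(ascending=False)
)
print(state_totals)
```

Grouping by `["STATE/UT", "DISTRICT"]` instead would give the district-level breakdown.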

#### 🟠 D. Specific Crime Trend (e.g., Rape or Theft)
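
For a single crime, the yearly totals can be paired with the year-over-year percentage change. This is a minimal sketch on made-up numbers. The column name `RAPE` matches the `head()` output, while `TOTAL` and `PCT_CHANGE` are illustrative names.

```python
import pandas as pd

# Toy frame (made-up counts) with several district rows per year.
toy = pd.DataFrame({
    "YEAR": [2001, 2001, 2002, 2002, 2003],
    "RAPE": [40, 10, 50, 25, 90],
})

crime = "RAPE"  # swap in another column name, e.g. a theft column, to change focus

# Yearly total for the chosen crime.
trend = toy.groupby("YEAR")[crime].sum().to_frame("TOTAL")

# Percentage change relative to the previous year (NaN for the first year).
trend["PCT_CHANGE"] = trend["TOTAL"].pct_change() * 100
print(trend)
```

The percentage column makes sharp jumps easy to spot. With real data, such jumps may reflect changes in reporting or classification between files as well as changes in incidence.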