## Author: Nam Nguyen 
# Analysis of the Crime Dataset of Los Angeles (2020–2024)  
This project primarily utilizes **Tableau** to analyze the crime dataset provided by the Los Angeles Police Department (LAPD), which documents crimes reported in the city of Los Angeles from 2020 to the present. As the dataset is continuously being updated, the following analysis focuses specifically on data from 2020 to 2024.

The main **objectives** of this analysis are to answer the following questions:  
- Which areas experience the highest number of crimes?
- How has the number of crimes changed over the years? 
- What types of crime are most common?  
- Which ethnic groups are most frequently targeted?  
- Where do these crimes typically occur (e.g., streets, parks, or other locations)?

By addressing these questions, we aim to uncover patterns that may contribute to a deeper understanding of crime trends in Los Angeles.

For more information on the Uniform Crime Reporting (UCR) offense classifications, please visit [**this document**](https://www.cityofroseville.com/DocumentCenter/View/26568/Description-of-Uniform-Crime-Offenses#:~:text=Part%20I%20Offenses%20are%20ten,%E2%80%9Cless%20serious%E2%80%9D%20crime%20classifications.).


## I. Dataset Description
Before preparing data for analysis, it is essential to understand the dataset. Due to the richness of details included in the dataset, I have chosen the following fields as the focal point of this project:
- **Area Name**: The community or neighborhood for which each of the 21 Community Police Stations in Los Angeles is responsible.
- **DATE OCC**: The date the crime occurred.
- **Crm Cd Desc**: A description of each crime committed.
- **Vict Descent**: The ethnicity of the victim.  
- **Premis Desc**: The location or property where the crime took place.

<u>**NOTE**</u>: 
- In the **Vict Descent** field, there are 144,604 null values (1,003,448 rows minus the 858,844 non-null values reported by `df_crime.info()`), indicating possible non-disclosure or missing information. The dataset already includes an "Unknown" category, which may exist for specific reasons, so null values cannot simply be folded into it. Because these values were left blank, intentionally or not, the descent of the associated victims is effectively missing. To avoid ambiguity, we give these values the alias "Missing Info" and treat them as a separate category.

- The **Vehicle - Stolen** crime type refers to stolen cars but excludes other types of motorized vehicles.
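
The notebook does not show where the "Missing Info" alias is applied (it may be set directly in Tableau). If it were done in pandas instead, a minimal sketch could look like the following. The function name `label_missing_descent` and the tiny sample frame are illustrative assumptions, as is the code `X` standing for the existing "Unknown" category.

```python
import pandas as pd


def label_missing_descent(df, column="Vict Descent", alias="Missing Info"):
    """Return a copy of df in which null values in `column` are replaced by `alias`."""
    out = df.copy()
    out[column] = out[column].fillna(alias)
    return out


# Made-up sample rows; "X" is assumed to be the dataset's code for "Unknown".
sample = pd.DataFrame({"Vict Descent": ["H", None, "X", "W", None]})

labeled = label_missing_descent(sample)
counts = labeled["Vict Descent"].value_counts()
print(counts)
```

Keeping the alias separate from `X` preserves the distinction made in the note: "Unknown" was recorded deliberately, whereas "Missing Info" marks fields that were left blank.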

Other details recorded in the dataset include the weapons used, victims' ages, crime codes, the time the crime was reported, and much more.

If the dataset interests you, please feel free to further explore [**this page**](https://data.lacity.org/Public-Safety/Crime-Data-from-2020-to-Present/2nrs-mtv8/about_data). 

## II. Data Preparation

In [45]:
# Importing pandas library
import pandas as pd

In [50]:
# Loading the dataset and parsing the date columns
# (the `date_format` argument requires pandas 2.0 or later)
df_crime = pd.read_csv('Crime_Data_from_2020_to_Present.csv',
                       parse_dates=['Date Rptd', 'DATE OCC'],
                       date_format='%m/%d/%Y %I:%M:%S %p')

# Examining dataset
df_crime.info()
df_crime.head(10)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1003448 entries, 0 to 1003447
Data columns (total 28 columns):
 #   Column          Non-Null Count    Dtype         
---  ------          --------------    -----         
 0   DR_NO           1003448 non-null  int64         
 1   Date Rptd       1003448 non-null  datetime64[ns]
 2   DATE OCC        1003448 non-null  datetime64[ns]
 3   TIME OCC        1003448 non-null  int64         
 4   AREA            1003448 non-null  int64         
 5   AREA NAME       1003448 non-null  object        
 6   Rpt Dist No     1003448 non-null  int64         
 7   Part 1-2        1003448 non-null  int64         
 8   Crm Cd          1003448 non-null  int64         
 9   Crm Cd Desc     1003448 non-null  object        
 10  Mocodes         851881 non-null   object        
 11  Vict Age        1003448 non-null  int64         
 12  Vict Sex        858856 non-null   object        
 13  Vict Descent    858844 non-null   object        
 14  Premis Cd       10

| | DR_NO | Date Rptd | DATE OCC | TIME OCC | AREA | AREA NAME | Rpt Dist No | Part 1-2 | Crm Cd | Crm Cd Desc | ... | Status | Status Desc | Crm Cd 1 | Crm Cd 2 | Crm Cd 3 | Crm Cd 4 | LOCATION | Cross Street | LAT | LON |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 190326475 | 2020-03-01 | 2020-03-01 | 2130 | 7 | Wilshire | 784 | 1 | 510 | VEHICLE - STOLEN | ... | AA | Adult Arrest | 510.0 | 998.0 | | | 1900 S LONGWOOD AV | | 34.0375 | -118.3506 |
| 1 | 200106753 | 2020-02-09 | 2020-02-08 | 1800 | 1 | Central | 182 | 1 | 330 | BURGLARY FROM VEHICLE | ... | IC | Invest Cont | 330.0 | 998.0 | | | 1000 S FLOWER ST | | 34.0444 | -118.2628 |
| 2 | 200320258 | 2020-11-11 | 2020-11-04 | 1700 | 3 | Southwest | 356 | 1 | 480 | BIKE - STOLEN | ... | IC | Invest Cont | 480.0 | | | | 1400 W 37TH ST | | 34.021 | -118.3002 |
| 3 | 200907217 | 2023-05-10 | 2020-03-10 | 2037 | 9 | Van Nuys | 964 | 1 | 343 | SHOPLIFTING-GRAND THEFT ($950.01 & OVER) | ... | IC | Invest Cont | 343.0 | | | | 14000 RIVERSIDE DR | | 34.1576 | -118.4387 |
| 4 | 200412582 | 2020-09-09 | 2020-09-09 | 630 | 4 | Hollenbeck | 413 | 1 | 510 | VEHICLE - STOLEN | ... | IC | Invest Cont | 510.0 | | | | 200 E AVENUE 28 | | 34.082 | -118.213 |
| 5 | 200209713 | 2020-05-03 | 2020-05-02 | 1800 | 2 | Rampart | 245 | 1 | 510 | VEHICLE - STOLEN | ... | IC | Invest Cont | 510.0 | | | | 2500 W 4TH ST | | 34.0642 | -118.2771 |
| 6 | 200200759 | 2020-07-07 | 2020-07-07 | 1340 | 2 | Rampart | 265 | 1 | 648 | ARSON | ... | IC | Invest Cont | 648.0 | 998.0 | | | JAMES M WOOD | ALVARADO | 34.0536 | -118.2788 |
| 7 | 201308739 | 2020-03-27 | 2020-03-27 | 1210 | 13 | Newton | 1333 | 1 | 510 | VEHICLE - STOLEN | ... | IC | Invest Cont | 510.0 | | | | 3200 S SAN PEDRO ST | | 34.017 | -118.2643 |
| 8 | 201112065 | 2020-07-31 | 2020-07-30 | 2030 | 11 | Northeast | 1161 | 1 | 510 | VEHICLE - STOLEN | ... | AA | Adult Arrest | 510.0 | | | | KENMORE ST | FOUNTAIN | 34.0953 | -118.2974 |
| 9 | 200121929 | 2020-12-04 | 2020-12-03 | 2300 | 1 | Central | 105 | 1 | 510 | VEHICLE - STOLEN | ... | IC | Invest Cont | 510.0 | | | | 400 SOLANO AV | | 34.071 | -118.2302 |
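
The `date_format` string passed to `pd.read_csv` can be checked in isolation before loading the full file. The sketch below applies the same format with `pd.to_datetime`; the two raw strings are made-up examples that merely follow that format, not values copied from the source file.

```python
import pandas as pd

# Same format string as in the read_csv call: month/day/year, 12-hour clock, AM/PM.
RAW_FORMAT = "%m/%d/%Y %I:%M:%S %p"

# Hypothetical raw values shaped like the format above.
raw = pd.Series(["03/01/2020 12:00:00 AM", "02/08/2020 12:00:00 AM"])

parsed = pd.to_datetime(raw, format=RAW_FORMAT)
print(parsed)
```

Passing an explicit format makes parsing fail loudly on unexpected values instead of silently guessing between day-first and month-first dates.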


In [52]:
# Choosing relevant columns for analysis
columns = ['DATE OCC', 'AREA NAME', 'Crm Cd Desc', 'Vict Descent', 'Premis Desc']
df_crime = df_crime[columns]
df_crime.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1003448 entries, 0 to 1003447
Data columns (total 5 columns):
 #   Column        Non-Null Count    Dtype         
---  ------        --------------    -----         
 0   DATE OCC      1003448 non-null  datetime64[ns]
 1   AREA NAME     1003448 non-null  object        
 2   Crm Cd Desc   1003448 non-null  object        
 3   Vict Descent  858844 non-null   object        
 4   Premis Desc   1002860 non-null  object        
dtypes: datetime64[ns](1), object(4)
memory usage: 38.3+ MB


In [39]:
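# Exporting the selected columns as a CSV file
# (index=False avoids writing the row index as an extra unnamed column)
df_crime.to_csv('Crime_2020_LAPD.csv', index=False)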
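
The introduction limits the analysis to 2020 through 2024, while the source file keeps growing. The notebook does not show where that cutoff is applied (it may be a Tableau filter). If it were applied in pandas before the export, a minimal sketch could look like this; the helper name `filter_years` and the sample dates are illustrative assumptions.

```python
import pandas as pd


def filter_years(df, date_col="DATE OCC", start_year=2020, end_year=2024):
    """Keep only rows whose occurrence date falls within start_year..end_year (inclusive)."""
    years = df[date_col].dt.year
    return df[years.between(start_year, end_year)].copy()


# Made-up dates on both sides of the cutoff.
sample = pd.DataFrame({
    "DATE OCC": pd.to_datetime(
        ["2019-12-31", "2020-01-01", "2024-12-31", "2025-01-01"]
    )
})

filtered = filter_years(sample)
print(filtered)
```

Filtering on `DATE OCC` rather than `Date Rptd` keeps the cutoff aligned with when a crime happened, which is the field the dashboard uses.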

## III. Data Visualization

Next, we import the `Crime_2020_LAPD.csv` file into Tableau to build an interactive dashboard that visualizes key patterns and answers the questions posed earlier in the project.

Below is the embedded interactive Tableau dashboard that I have built. Now, let us dive into the analysis and uncover insights.

In [67]:
%%HTML
<div class='tableauPlaceholder' id='viz1736828222239' style='position: relative'><noscript><a href='#'><img alt='Crimes in Los Angeles: 2020 to 2024 Dashboard ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Cr&#47;Crime_2020_LAPD_Edit&#47;CrimesinLosAngeles2020to2024Dashboard&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='Crime_2020_LAPD_Edit&#47;CrimesinLosAngeles2020to2024Dashboard' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Cr&#47;Crime_2020_LAPD_Edit&#47;CrimesinLosAngeles2020to2024Dashboard&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='en-US' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1736828222239');                    var vizElement = divElement.getElementsByTagName('object')[0];                    if ( divElement.offsetWidth > 800 ) { vizElement.style.width='1080px';vizElement.style.height='777px';} else if ( divElement.offsetWidth > 500 ) { vizElement.style.width='1080px';vizElement.style.height='777px';} else { vizElement.style.width='100%';vizElement.style.height='1777px';}                     var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>

## IV. Key Insights

- **Number of Crimes Declines over Time**: Monthly crime counts rose slightly from early 2020 through the end of 2023, peaking in May 2022, before dropping sharply in March 2024, when the LAPD adopted a new crime recording system. In the following months, the number of recorded crimes kept falling through the end of the year. Because the decline coincides with the system transition, it may reflect incomplete data capture; however, it could also reflect a genuine improvement in public safety.
  
- **Uneven Crime Distribution in Different Areas**: Across the 2020–2024 period, the Central area recorded the most crimes, at almost 70,000 (6.93% of all crimes), approximately 8,000 more than the second-highest area. Other high-crime areas include 77th Street, Pacific, and Southwest. In contrast, Foothill reported the lowest crime count in the city. Most other areas fall between 40,000 and 50,000 offenses, which shows how unevenly crime is distributed.

- **Hispanic and White Victims Disproportionately Affected**: Making up around 29.50% of crime victims from 2020 to 2024, individuals of Hispanic/Latin/Mexican descent were the most frequently targeted group. This group was particularly vulnerable to battery, simple assaults involving intimate partners, and aggravated assaults with deadly weapons, and these victims were attacked most often in Newton, Mission, and 77th Street. White individuals were also heavily affected, mainly by burglary, identity theft, and battery, with these offenses occurring most often in the Pacific and West Los Angeles areas.

- **Prevalence of Car Theft**: Car theft is the most common crime in Los Angeles, accounting for 11.48% of all crimes. Although overall crime dropped in early 2024, the number of stolen cars stayed within a steady range throughout the period, between 1,200 and 2,200 incidents a month, underscoring the urgency of addressing this problem.

- **Safety Concerns at Home and on the Streets**: The streets of Los Angeles stand out as a high-risk location, with 26.01% of offenses from 2020 to 2024 occurring on public roadways. The 77th Street area was most severely impacted, and vehicle-related thefts (vehicle stealing and motor vehicle driveby theft) were the biggest problem. Even more concerning, nearly one-third of crimes occurred in single-family or multi-unit dwellings, highlighting the significant risk close to home.

- **Pattern in Victim Descent Missing Data**: Among victims whose descent is labeled "Missing Info," most were involved in vehicle-related crimes that occurred in public spaces such as streets, parking lots, or driveways.
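
The figures above come from the Tableau dashboard. As a rough cross-check, the same kinds of aggregates (monthly counts, category shares, and the premises linked to missing victim descent) could be reproduced in pandas along these lines. The helper names `monthly_counts` and `category_share` and the four sample rows are illustrative assumptions, not part of the original workflow.

```python
import pandas as pd


def monthly_counts(df, date_col="DATE OCC"):
    """Count incidents per calendar month of occurrence."""
    return df.groupby(df[date_col].dt.to_period("M")).size()


def category_share(df, column, top=5):
    """Return the `top` most frequent values of `column` with count and percent of rows."""
    counts = df[column].value_counts(dropna=False)
    table = pd.DataFrame({
        "count": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return table.head(top)


# Made-up rows that reuse the column names of the exported CSV.
sample = pd.DataFrame({
    "DATE OCC": pd.to_datetime(
        ["2020-01-05", "2020-01-20", "2020-02-11", "2020-02-12"]
    ),
    "AREA NAME": ["Central", "Central", "Pacific", "Central"],
    "Vict Descent": ["H", None, None, "W"],
    "Premis Desc": ["STREET", "STREET", "PARKING LOT", "SINGLE FAMILY DWELLING"],
})

per_month = monthly_counts(sample)
area_share = category_share(sample, "AREA NAME")

# Premises of incidents whose victim descent is null ("Missing Info").
missing_rows = sample[sample["Vict Descent"].isna()]
missing_premises = category_share(missing_rows, "Premis Desc")

print(per_month, area_share, missing_premises, sep="\n\n")
```

Calling `category_share` on `Crm Cd Desc` or `Vict Descent` in the full dataset would give the crime-type and victim-descent percentages in the same way.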

## V. Recommendations
- **Investigate the significant drop in crime numbers** in the months following March 2024 to identify contributing factors. If flaws exist in the new crime recording system, identifying them early will enable timely adjustments.

- **Advise individuals of Hispanic/Latin/Mexican and White ethnic backgrounds** to take extra precautions in situations that raise the risk of crime, especially battery, car theft, petty theft, assaults, and burglary, as these are the most common offenses targeting these groups.

- **Conduct further analysis into the persistent occurrence of car theft** to better understand its underlying motivations and patterns, since it has remained the most common crime throughout the period.

- **Enhance security measures in public spaces**, especially on streets and in parking lots, to reduce the harm caused by burglary, vehicle theft, and aggravated assaults.

- **Raise awareness** about home safety and **educate families** on protecting their homes against burglary, identity theft, and assaults. **Encourage proactive actions** such as installing security systems and reinforcing household safety practices.
