### Load the dataset

This dataset contains records of reported crimes in the City of Los Angeles, covering the period from 2020 through September 2025 (the date the data was downloaded).

In [255]:
import pandas as pd

df=pd.read_csv(r"C:\Users\mahit\Downloads\Crime_Data_from_2020_to_Present.csv")
df.head()

Unnamed: 0,DR_NO,Date Rptd,DATE OCC,TIME OCC,AREA,AREA NAME,Rpt Dist No,Part 1-2,Crm Cd,Crm Cd Desc,...,Status,Status Desc,Crm Cd 1,Crm Cd 2,Crm Cd 3,Crm Cd 4,LOCATION,Cross Street,LAT,LON
0,211507896,04/11/2021 12:00:00 AM,11/07/2020 12:00:00 AM,845,15,N Hollywood,1502,2,354,THEFT OF IDENTITY,...,IC,Invest Cont,354.0,,,,7800 BEEMAN AV,,34.2124,-118.4092
1,201516622,10/21/2020 12:00:00 AM,10/18/2020 12:00:00 AM,1845,15,N Hollywood,1521,1,230,"ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",...,IC,Invest Cont,230.0,,,,ATOLL AV,N GAULT,34.1993,-118.4203
2,240913563,12/10/2024 12:00:00 AM,10/30/2020 12:00:00 AM,1240,9,Van Nuys,933,2,354,THEFT OF IDENTITY,...,IC,Invest Cont,354.0,,,,14600 SYLVAN ST,,34.1847,-118.4509
3,210704711,12/24/2020 12:00:00 AM,12/24/2020 12:00:00 AM,1310,7,Wilshire,782,1,331,THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND ...,...,IC,Invest Cont,331.0,,,,6000 COMEY AV,,34.0339,-118.3747
4,201418201,10/03/2020 12:00:00 AM,09/29/2020 12:00:00 AM,1830,14,Pacific,1454,1,420,THEFT FROM MOTOR VEHICLE - PETTY ($950 & UNDER),...,IC,Invest Cont,420.0,,,,4700 LA VILLA MARINA,,33.9813,-118.435


In [256]:
#number of rows and columns
num_rows = df.shape[0]
num_cols = df.shape[1]
print("Number of Rows:",num_rows)
print("Number of Columns:",num_cols)

Number of Rows: 1004991
Number of Columns: 28


In [257]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1004991 entries, 0 to 1004990
Data columns (total 28 columns):
 #   Column          Non-Null Count    Dtype  
---  ------          --------------    -----  
 0   DR_NO           1004991 non-null  int64  
 1   Date Rptd       1004991 non-null  object 
 2   DATE OCC        1004991 non-null  object 
 3   TIME OCC        1004991 non-null  int64  
 4   AREA            1004991 non-null  int64  
 5   AREA NAME       1004991 non-null  object 
 6   Rpt Dist No     1004991 non-null  int64  
 7   Part 1-2        1004991 non-null  int64  
 8   Crm Cd          1004991 non-null  int64  
 9   Crm Cd Desc     1004991 non-null  object 
 10  Mocodes         853372 non-null   object 
 11  Vict Age        1004991 non-null  int64  
 12  Vict Sex        860347 non-null   object 
 13  Vict Descent    860335 non-null   object 
 14  Premis Cd       1004975 non-null  float64
 15  Premis Desc     1004403 non-null  object 
 16  Weapon Used Cd  327247 non-null   fl

In [258]:
duplicates = df[df.duplicated()]

print(duplicates)

Empty DataFrame
Columns: [DR_NO, Date Rptd, DATE OCC, TIME OCC, AREA, AREA NAME, Rpt Dist No, Part 1-2, Crm Cd, Crm Cd Desc, Mocodes, Vict Age, Vict Sex, Vict Descent, Premis Cd, Premis Desc, Weapon Used Cd, Weapon Desc, Status, Status Desc, Crm Cd 1, Crm Cd 2, Crm Cd 3, Crm Cd 4, LOCATION, Cross Street, LAT, LON]
Index: []

[0 rows x 28 columns]
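
`df.duplicated()` with no arguments only flags rows that match on every column. A narrower check on the report number alone might look like the sketch below. It uses toy values, and treating `DR_NO` as a unique key is an assumption rather than something confirmed by the documentation.

```python
import pandas as pd

# Toy frame (hypothetical values): two rows share a DR_NO but differ elsewhere.
toy = pd.DataFrame({
    "DR_NO": [101, 102, 102, 103],
    "AREA NAME": ["Central", "Pacific", "Harbor", "Central"],
})

# Compares every column, like the cell above.
full_row_dupes = toy.duplicated().sum()

# Compares the assumed key only; keep=False marks every member of a duplicate group.
key_dupes = toy.duplicated(subset=["DR_NO"], keep=False)

print(full_row_dupes)
print(toy[key_dupes])
```

If the key-only check finds rows that the full-row check misses, those records are worth inspecting before any aggregation by report number.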


### Victim Sex Column

The `Vict Sex` column records the sex of the victim.  
Upon inspection, this column contains **missing (null) values** for some records.


In [259]:
df['Vict Sex'].value_counts(dropna=False)

Vict Sex
M      403879
F      358580
NaN    144644
X       97773
H         114
-           1
Name: count, dtype: int64

According to the official dataset documentation, the `Vict Sex` field should contain only three values: M, F, and X. However, the dataset also includes NaN, H, and -, which are inconsistent with the documentation and are recoded to X below.
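
One way to surface undocumented values before recoding them is to compare the observed values against the documented set. The sketch below uses a toy Series; the names `allowed`, `unexpected`, and `cleaned` are illustrative only.

```python
import pandas as pd

allowed = {"M", "F", "X"}  # values listed in the dataset documentation

# Toy series (hypothetical values) containing the stray codes seen above.
sex = pd.Series(["M", "F", None, "X", "H", "-"])

# Values that are present, non-null, and not in the documented set.
unexpected = set(sex.dropna().unique()) - allowed

# Keep documented values; everything else (including nulls) becomes "X".
cleaned = sex.where(sex.isin(allowed), "X")

print(unexpected)
print(cleaned.value_counts())
```

Listing the allowed values rather than the bad ones means a stray code that appears in a later download is also recoded.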

In [260]:
import numpy as np

df['Vict Sex']=np.where(df['Vict Sex'].isin(['X','H','-']) | (df['Vict Sex'].isna()),'X',df['Vict Sex'])

In [261]:
df['Vict Sex'].info()

<class 'pandas.core.series.Series'>
RangeIndex: 1004991 entries, 0 to 1004990
Series name: Vict Sex
Non-Null Count    Dtype 
--------------    ----- 
1004991 non-null  object
dtypes: object(1)
memory usage: 7.7+ MB


In [262]:
df['Vict Sex'].value_counts(dropna=False)

Vict Sex
M    403879
F    358580
X    242532
Name: count, dtype: int64

### Victim Descent Column

The `Vict Descent` column records the descent code of the victim.  
Upon inspection, this column contains **missing (null) values** for some records.

In [263]:
df['Vict Descent'].value_counts(dropna=False)

Vict Descent
H      296404
W      201442
NaN    144656
B      135816
X      106685
O       78005
A       21340
K        5990
F        4838
C        4631
J        1586
V        1195
I        1015
Z         577
P         288
U         221
D          91
L          77
G          74
S          58
-           2
Name: count, dtype: int64

According to the official dataset documentation, the `Vict Descent` field should contain only the following values:

- A - Other Asian
- B - Black
- C - Chinese
- D - Cambodian
- F - Filipino
- G - Guamanian
- H - Hispanic/Latin/Mexican
- I - American Indian/Alaskan Native
- J - Japanese
- K - Korean
- L - Laotian
- O - Other
- P - Pacific Islander
- S - Samoan
- U - Hawaiian
- V - Vietnamese
- W - White
- X - Unknown
- Z - Asian Indian

However, the field also contains **NaN** and `-`, which are inconsistent and need to be cleaned.
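
The code table above can also be used to attach readable labels. The following is a minimal sketch on toy values, using only a subset of the documented codes. The `descent_labels` dictionary and the choice to send unmapped codes to "Unknown" are assumptions made for illustration.

```python
import pandas as pd

# Subset of the documented code table above, for illustration only.
descent_labels = {
    "H": "Hispanic/Latin/Mexican",
    "W": "White",
    "B": "Black",
    "X": "Unknown",
}

codes = pd.Series(["H", "W", "-", None, "B"])  # toy values

# Codes missing from the table (here "-" and null) map to NaN,
# then fall back to "Unknown".
labels = codes.map(descent_labels).fillna("Unknown")

print(labels.tolist())
```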


In [264]:
import numpy as np

df['Vict Descent']=np.where((df['Vict Descent']=='-') | (df['Vict Descent'].isna()),'X',df['Vict Descent'])

In [265]:
df['Vict Descent'].info()

<class 'pandas.core.series.Series'>
RangeIndex: 1004991 entries, 0 to 1004990
Series name: Vict Descent
Non-Null Count    Dtype 
--------------    ----- 
1004991 non-null  object
dtypes: object(1)
memory usage: 7.7+ MB


In [266]:
df['Vict Descent'].value_counts(dropna=False)

Vict Descent
H    296404
X    251343
W    201442
B    135816
O     78005
A     21340
K      5990
F      4838
C      4631
J      1586
V      1195
I      1015
Z       577
P       288
U       221
D        91
L        77
G        74
S        58
Name: count, dtype: int64

### Victim Age Column

The `Vict Age`  column represents the age details of the victim.  

Note: After the cleaning steps below, about 27% of `Vict Age` values are missing. These rows are retained with null ages and are ignored in visualizations and calculations where appropriate.



In [267]:
df['Vict Age'].describe()

count    1.004991e+06
mean     2.891706e+01
std      2.199272e+01
min     -4.000000e+00
25%      0.000000e+00
50%      3.000000e+01
75%      4.400000e+01
max      1.200000e+02
Name: Vict Age, dtype: float64

According to the documentation, ages are two-digit values from 01 to 99.
The column ranges from -4 to 120. Values of 0 or below are treated as invalid. An age of 120 may be a genuine entry, but it falls outside the documented range, so it is treated as an outlier and set to null as well.
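
The range rule can be expressed as a single mask. The sketch below is one possible formulation on toy values, not the notebook's own cells, which follow.

```python
import pandas as pd

ages = pd.Series([-4, 0, 2, 30, 99, 120])  # toy values spanning the observed range

# between() is inclusive on both ends by default.
valid = ages.between(1, 99)

# mask() replaces values with NaN where the condition is True;
# the nullable Int64 dtype keeps the remaining ages as integers.
cleaned = ages.mask(~valid).astype("Int64")

print(cleaned)
```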

In [268]:
df['Vict Age']=np.where(df['Vict Age'] <=0,np.nan,df['Vict Age'])


In [269]:
df['Vict Age']=np.where(df['Vict Age'] >99,np.nan,df['Vict Age'])

In [270]:
df['Vict Age'] = df['Vict Age'].astype('Int64')


In [271]:
df['Vict Age'].dtype


Int64Dtype()

In [272]:
df['Vict Age'].describe()

count     735631.0
mean     39.505475
std      15.571176
min            2.0
25%           28.0
50%           37.0
75%           50.0
max           99.0
Name: Vict Age, dtype: Float64

In [273]:
df['Vict Age'].info()

<class 'pandas.core.series.Series'>
RangeIndex: 1004991 entries, 0 to 1004990
Series name: Vict Age
Non-Null Count   Dtype
--------------   -----
735631 non-null  Int64
dtypes: Int64(1)
memory usage: 8.6 MB
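
The share of missing ages follows from the two counts reported above. A quick arithmetic check, using only those printed figures:

```python
total_rows = 1_004_991   # RangeIndex size reported by info()
non_null_ages = 735_631  # non-null count after cleaning

missing_share = 1 - non_null_ages / total_rows
print(f"{missing_share:.1%}")  # 26.8%
```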


### Area Column
The `AREA NAME` column represents the geographical area (patrol division).

In [274]:
df['AREA NAME'] = df['AREA NAME'].str.strip()
df['AREA NAME'].value_counts(dropna=False)

AREA NAME
Central        69670
77th Street    61758
Pacific        59514
Southwest      57441
Hollywood      52429
N Hollywood    51107
Olympic        50071
Southeast      49936
Newton         49177
Wilshire       48239
Rampart        46825
West LA        45729
Northeast      42963
Van Nuys       42883
West Valley    42156
Devonshire     41756
Harbor         41394
Topanga        41374
Mission        40351
Hollenbeck     37085
Foothill       33133
Name: count, dtype: int64

### Crime Code Description
The `Crm Cd Desc` column describes the crime committed.

In [275]:
df['Crm Cd Desc'].value_counts(dropna=False)

Crm Cd Desc
VEHICLE - STOLEN                                           115190
BATTERY - SIMPLE ASSAULT                                    74839
BURGLARY FROM VEHICLE                                       63517
THEFT OF IDENTITY                                           62537
VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)     61092
                                                            ...  
FIREARMS EMERGENCY PROTECTIVE ORDER (FIREARMS EPO)              5
FIREARMS RESTRAINING ORDER (FIREARMS RO)                        4
DISHONEST EMPLOYEE ATTEMPTED THEFT                              4
TRAIN WRECKING                                                  1
DRUNK ROLL - ATTEMPT                                            1
Name: count, Length: 140, dtype: int64

In [276]:
df['Premis Desc'].value_counts(dropna=False)

Premis Desc
STREET                                          261284
SINGLE FAMILY DWELLING                          163654
MULTI-UNIT DWELLING (APARTMENT, DUPLEX, ETC)    119011
PARKING LOT                                      69147
OTHER BUSINESS                                   47647
                                                 ...  
MTA - SILVER LINE - DOWNTOWN STREET STOPS            2
HORSE RACING/SANTA ANITA PARK*                       2
MTA - SILVER LINE - LAC/USC MEDICAL CENTER           2
DEPT OF DEFENSE FACILITY                             2
TRAM/STREETCAR(BOXLIKE WAG ON RAILS)*                1
Name: count, Length: 307, dtype: int64

In [277]:
df['Premis Desc'].isna().sum()

588

In [278]:
df['Premis Cd'].isna().sum()

16

In [279]:
# .copy() makes mapping_df independent of df, which avoids SettingWithCopyWarning
mapping_df = df[df['Premis Desc'].notna()].copy()
mapping_df['Premis Cd'] = mapping_df['Premis Cd'].astype(int)
premis_mapping = dict(zip(mapping_df['Premis Cd'], mapping_df['Premis Desc']))
list(premis_mapping.items())[:5]


[(501, 'SINGLE FAMILY DWELLING'),
 (102, 'SIDEWALK'),
 (101, 'STREET'),
 (103, 'ALLEY'),
 (502, 'MULTI-UNIT DWELLING (APARTMENT, DUPLEX, ETC)')]

In [280]:
df['Premis Desc'] = df.apply(lambda row: premis_mapping.get(row['Premis Cd']) if pd.isna(row['Premis Desc']) else row['Premis Desc'],axis=1) 

In [281]:
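# premis_mapping is keyed by code, so codes must be looked up through the inverted mapping
desc_to_code = {desc: code for code, desc in premis_mapping.items()}
df['Premis Cd'] = df.apply(lambda row: desc_to_code.get(row['Premis Desc']) if pd.isna(row['Premis Cd']) else row['Premis Cd'],axis=1)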


In [282]:
df['Premis Cd'].describe()

count    1.004975e+06
mean     3.056201e+02
std      2.193021e+02
min      1.010000e+02
25%      1.010000e+02
50%      2.030000e+02
75%      5.010000e+02
max      9.760000e+02
Name: Premis Cd, dtype: float64

In [283]:
df['Premis Desc'].describe()

count     1004403
unique        306
top        STREET
freq       261284
Name: Premis Desc, dtype: object
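
Row-wise `apply` is slow on a frame of about a million rows. A vectorized alternative for filling descriptions from codes is sketched below on toy data. It is an illustration of the idea, not a drop-in replacement verified against the full dataset, and `code_to_desc` is an illustrative name.

```python
import pandas as pd

toy = pd.DataFrame({
    "Premis Cd": [101.0, 101.0, 501.0, None],
    "Premis Desc": ["STREET", None, "SINGLE FAMILY DWELLING", None],
})

# Build code -> description from rows where both fields are present.
known = toy.dropna(subset=["Premis Cd", "Premis Desc"])
code_to_desc = dict(zip(known["Premis Cd"], known["Premis Desc"]))

# Fill missing descriptions from the code; rows with no usable code stay null.
toy["Premis Desc"] = toy["Premis Desc"].fillna(toy["Premis Cd"].map(code_to_desc))

print(toy)
```

Rows where both fields are missing cannot be recovered this way, which matches the unchanged counts in the `describe()` outputs above.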

In [284]:
# Assign the result back instead of using inplace=True on a column selection
df['Premis Cd'] = df['Premis Cd'].fillna(0)
df['Premis Desc'] = df['Premis Desc'].fillna('Not Mentioned')


### Temporal Information
- The `Date Rptd` column represents the date the crime was reported.
- The `DATE OCC` column represents the date the crime occurred.
- The `TIME OCC` column represents the time the crime occurred, in 24-hour military time.
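
Because the date and the time of occurrence are stored in separate columns, it can be handy to combine them into one timestamp. The sketch below shows one way to do so on toy rows. The `OCCURRED_AT` column is hypothetical and is not created elsewhere in this notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    "DATE OCC": ["11/07/2020 12:00:00 AM", "10/18/2020 12:00:00 AM"],
    "TIME OCC": [845, 1845],
})

# Parse the date with an explicit format matching the raw strings.
dates = pd.to_datetime(toy["DATE OCC"], format="%m/%d/%Y %I:%M:%S %p")

# Pad to four digits so 845 becomes "0845", then split into hour and minute.
hhmm = toy["TIME OCC"].astype(str).str.zfill(4)
hours = hhmm.str[:2].astype(int)
minutes = hhmm.str[2:].astype(int)

# Hypothetical combined column: date plus time of day.
toy["OCCURRED_AT"] = (
    dates
    + pd.to_timedelta(hours, unit="h")
    + pd.to_timedelta(minutes, unit="min")
)

print(toy["OCCURRED_AT"])
```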

In [285]:
# An explicit format matches the raw strings (e.g. "04/11/2021 12:00:00 AM") and avoids the format-inference warning
df["Date Rptd"]=pd.to_datetime(df["Date Rptd"], format="%m/%d/%Y %I:%M:%S %p")
df["DATE OCC"]=pd.to_datetime(df["DATE OCC"], format="%m/%d/%Y %I:%M:%S %p")


In [286]:
# Convert military time (e.g. 845) to a zero-padded four-character string (e.g. "0845")

df['TIME OCC'] = df['TIME OCC'].astype(str).str.zfill(4)
# Extract the two-digit hour as a string (first two characters)
df['HOUR_OCC'] = df['TIME OCC'].str.slice(0, 2)

In [287]:
df.head()

Unnamed: 0,DR_NO,Date Rptd,DATE OCC,TIME OCC,AREA,AREA NAME,Rpt Dist No,Part 1-2,Crm Cd,Crm Cd Desc,...,Status Desc,Crm Cd 1,Crm Cd 2,Crm Cd 3,Crm Cd 4,LOCATION,Cross Street,LAT,LON,HOUR_OCC
0,211507896,2021-04-11,2020-11-07,845,15,N Hollywood,1502,2,354,THEFT OF IDENTITY,...,Invest Cont,354.0,,,,7800 BEEMAN AV,,34.2124,-118.4092,8
1,201516622,2020-10-21,2020-10-18,1845,15,N Hollywood,1521,1,230,"ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",...,Invest Cont,230.0,,,,ATOLL AV,N GAULT,34.1993,-118.4203,18
2,240913563,2024-12-10,2020-10-30,1240,9,Van Nuys,933,2,354,THEFT OF IDENTITY,...,Invest Cont,354.0,,,,14600 SYLVAN ST,,34.1847,-118.4509,12
3,210704711,2020-12-24,2020-12-24,1310,7,Wilshire,782,1,331,THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND ...,...,Invest Cont,331.0,,,,6000 COMEY AV,,34.0339,-118.3747,13
4,201418201,2020-10-03,2020-09-29,1830,14,Pacific,1454,1,420,THEFT FROM MOTOR VEHICLE - PETTY ($950 & UNDER),...,Invest Cont,420.0,,,,4700 LA VILLA MARINA,,33.9813,-118.435,18


In [288]:
df['TIME OCC'].isna().sum()

0

In [289]:
invalid_hours = df.loc[~df['HOUR_OCC'].astype(str).str.match(r'^(?:[01]\d|2[0-3])$'), ['HOUR_OCC']]
invalid_hours

Unnamed: 0,HOUR_OCC


### Case Status Information
- The `Status` column represents the status code of the case.
- The `Status Desc` column defines the status code provided.


In [290]:
df_counts = df.groupby(['Status', 'Status Desc']).size().reset_index(name='count')


In [291]:
print(df_counts)

  Status   Status Desc   count
0     AA  Adult Arrest   87155
1     AO   Adult Other  109802
2     CC           UNK       6
3     IC   Invest Cont  802862
4     JA    Juv Arrest    3286
5     JO     Juv Other    1879


In [292]:
print(sum(df_counts["count"]))

1004990
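
The total is one short of the 1,004,991 rows in the frame. This is consistent with `Status` having a single null value (1,004,990 non-null), since `groupby` drops null keys by default. The toy example below illustrates the effect. The values are made up, and pairing the null status with a `UNK` description is an assumption.

```python
import pandas as pd

toy = pd.DataFrame({
    "Status": ["IC", "IC", "AA", None],
    "Status Desc": ["Invest Cont", "Invest Cont", "Adult Arrest", "UNK"],
})

# Default behaviour: the row with a null key is silently excluded.
default_total = toy.groupby(["Status", "Status Desc"]).size().sum()

# dropna=False keeps null keys as their own group.
full_total = toy.groupby(["Status", "Status Desc"], dropna=False).size().sum()

print(default_total, full_total)
```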


### Cleaning up the data 

In [293]:
df.drop(columns=['Crm Cd 1', 'Crm Cd 2','Crm Cd 3','Crm Cd 4'], inplace=True)


In [294]:
df.drop(columns=['Part 1-2','Cross Street','Mocodes','Weapon Used Cd','Weapon Desc'], inplace=True)


In [295]:
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1004991 entries, 0 to 1004990
Data columns (total 20 columns):
 #   Column        Non-Null Count    Dtype         
---  ------        --------------    -----         
 0   DR_NO         1004991 non-null  int64         
 1   Date Rptd     1004991 non-null  datetime64[ns]
 2   DATE OCC      1004991 non-null  datetime64[ns]
 3   TIME OCC      1004991 non-null  object        
 4   AREA          1004991 non-null  int64         
 5   AREA NAME     1004991 non-null  object        
 6   Rpt Dist No   1004991 non-null  int64         
 7   Crm Cd        1004991 non-null  int64         
 8   Crm Cd Desc   1004991 non-null  object        
 9   Vict Age      735631 non-null   Int64         
 10  Vict Sex      1004991 non-null  object        
 11  Vict Descent  1004991 non-null  object        
 12  Premis Cd     1004991 non-null  float64       
 13  Premis Desc   1004991 non-null  object        
 14  Status        1004990 non-null  object        
 15

### Validation

In [296]:
df['LAT'].describe()  
 

count    1.004991e+06
mean     3.399821e+01
std      1.610713e+00
min      0.000000e+00
25%      3.401470e+01
50%      3.405890e+01
75%      3.416490e+01
max      3.433430e+01
Name: LAT, dtype: float64

In [297]:
df['LON'].describe()  

count    1.004991e+06
mean    -1.180909e+02
std      5.582386e+00
min     -1.186676e+02
25%     -1.184305e+02
50%     -1.183225e+02
75%     -1.182739e+02
max      0.000000e+00
Name: LON, dtype: float64

In [298]:
df['LAT'] = df['LAT'].replace(0, np.nan)
df['LON'] = df['LON'].replace(0, np.nan)

In [299]:
df['LON'].describe()  

count    1.002751e+06
mean    -1.183547e+02
std      1.044454e-01
min     -1.186676e+02
25%     -1.184309e+02
50%     -1.183230e+02
75%     -1.182740e+02
max     -1.181554e+02
Name: LON, dtype: float64
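
Beyond replacing zero placeholders, a coarse bounding-box test can flag coordinates that fall far from the city. The sketch below uses toy rows, and the box limits are illustrative assumptions rather than official boundaries.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "LAT": [34.2124, 0.0, 33.9813],
    "LON": [-118.4092, 0.0, -118.4350],
})

# Treat 0 as "location not recorded" rather than a real coordinate.
toy[["LAT", "LON"]] = toy[["LAT", "LON"]].replace(0, np.nan)

# Rough box around Los Angeles; these limits are assumptions for illustration.
# between() returns False for NaN, so missing coordinates are not counted as inside.
in_box = toy["LAT"].between(33.5, 34.5) & toy["LON"].between(-119.0, -118.0)

print(in_box.tolist())
```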

In [300]:
df.head(10)

Unnamed: 0,DR_NO,Date Rptd,DATE OCC,TIME OCC,AREA,AREA NAME,Rpt Dist No,Crm Cd,Crm Cd Desc,Vict Age,Vict Sex,Vict Descent,Premis Cd,Premis Desc,Status,Status Desc,LOCATION,LAT,LON,HOUR_OCC
0,211507896,2021-04-11,2020-11-07,845,15,N Hollywood,1502,354,THEFT OF IDENTITY,31,M,H,501.0,SINGLE FAMILY DWELLING,IC,Invest Cont,7800 BEEMAN AV,34.2124,-118.4092,8
1,201516622,2020-10-21,2020-10-18,1845,15,N Hollywood,1521,230,"ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",32,M,H,102.0,SIDEWALK,IC,Invest Cont,ATOLL AV,34.1993,-118.4203,18
2,240913563,2024-12-10,2020-10-30,1240,9,Van Nuys,933,354,THEFT OF IDENTITY,30,M,W,501.0,SINGLE FAMILY DWELLING,IC,Invest Cont,14600 SYLVAN ST,34.1847,-118.4509,12
3,210704711,2020-12-24,2020-12-24,1310,7,Wilshire,782,331,THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND ...,47,F,A,101.0,STREET,IC,Invest Cont,6000 COMEY AV,34.0339,-118.3747,13
4,201418201,2020-10-03,2020-09-29,1830,14,Pacific,1454,420,THEFT FROM MOTOR VEHICLE - PETTY ($950 & UNDER),63,M,H,103.0,ALLEY,IC,Invest Cont,4700 LA VILLA MARINA,33.9813,-118.435,18
5,240412063,2024-12-11,2020-11-11,1210,4,Hollenbeck,429,354,THEFT OF IDENTITY,35,M,B,502.0,"MULTI-UNIT DWELLING (APARTMENT, DUPLEX, ETC)",IC,Invest Cont,5300 CRONUS ST,34.083,-118.1678,12
6,240317069,2024-12-16,2020-04-16,1350,3,Southwest,396,354,THEFT OF IDENTITY,21,F,B,501.0,SINGLE FAMILY DWELLING,IC,Invest Cont,900 W 40TH PL,34.01,-118.29,13
7,201115217,2020-10-29,2020-07-07,1400,11,Northeast,1133,812,CRM AGNST CHLD (13 OR UNDER) (14-15 & SUSP 10 ...,14,F,H,121.0,YARD (RESIDENTIAL/BUSINESS),AO,Adult Other,3000 ACRESITE ST,34.1107,-118.2589,14
8,241708596,2024-04-20,2020-03-02,1200,17,Devonshire,1729,354,THEFT OF IDENTITY,43,M,W,501.0,SINGLE FAMILY DWELLING,IC,Invest Cont,17700 SIMONDS ST,34.2763,-118.521,12
9,242113813,2024-12-18,2020-09-01,900,21,Topanga,2196,354,THEFT OF IDENTITY,57,M,W,501.0,SINGLE FAMILY DWELLING,IC,Invest Cont,20900 MARMORA ST,34.1493,-118.5886,9


In [301]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1004991 entries, 0 to 1004990
Data columns (total 20 columns):
 #   Column        Non-Null Count    Dtype         
---  ------        --------------    -----         
 0   DR_NO         1004991 non-null  int64         
 1   Date Rptd     1004991 non-null  datetime64[ns]
 2   DATE OCC      1004991 non-null  datetime64[ns]
 3   TIME OCC      1004991 non-null  object        
 4   AREA          1004991 non-null  int64         
 5   AREA NAME     1004991 non-null  object        
 6   Rpt Dist No   1004991 non-null  int64         
 7   Crm Cd        1004991 non-null  int64         
 8   Crm Cd Desc   1004991 non-null  object        
 9   Vict Age      735631 non-null   Int64         
 10  Vict Sex      1004991 non-null  object        
 11  Vict Descent  1004991 non-null  object        
 12  Premis Cd     1004991 non-null  float64       
 13  Premis Desc   1004991 non-null  object        
 14  Status        1004990 non-null  object        
 15

In [307]:
df['Status'].value_counts()

Status
IC    802862
AO    109802
AA     87155
JA      3286
JO      1879
CC         6
Name: count, dtype: int64

In [306]:
df['Crm Cd Desc'].value_counts()

Crm Cd Desc
VEHICLE - STOLEN                                           115190
BATTERY - SIMPLE ASSAULT                                    74839
BURGLARY FROM VEHICLE                                       63517
THEFT OF IDENTITY                                           62537
VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)     61092
                                                            ...  
FIREARMS EMERGENCY PROTECTIVE ORDER (FIREARMS EPO)              5
FIREARMS RESTRAINING ORDER (FIREARMS RO)                        4
DISHONEST EMPLOYEE ATTEMPTED THEFT                              4
TRAIN WRECKING                                                  1
DRUNK ROLL - ATTEMPT                                            1
Name: count, Length: 140, dtype: int64

In [311]:
pd.set_option('display.max_rows', None)
unique_pairs = df[['Crm Cd', 'Crm Cd Desc']].drop_duplicates()
unique_pairs


Unnamed: 0,Crm Cd,Crm Cd Desc
0,354,THEFT OF IDENTITY
1,230,"ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT"
3,331,THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND ...
4,420,THEFT FROM MOTOR VEHICLE - PETTY ($950 & UNDER)
7,812,CRM AGNST CHLD (13 OR UNDER) (14-15 & SUSP 10 ...
12,510,VEHICLE - STOLEN
13,310,BURGLARY
14,330,BURGLARY FROM VEHICLE
15,440,THEFT PLAIN - PETTY ($950 & UNDER)
16,626,INTIMATE PARTNER - SIMPLE ASSAULT


### Exporting the data

Export the cleaned DataFrame to a CSV file for loading into Tableau.

In [304]:
df.to_csv('cleaned_LA_crime_dataset.csv', index=False)
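
CSV does not store column types, so the zero-padded `TIME OCC` strings and the parsed dates are re-inferred by whatever reads the file. The sketch below shows the effect when reading back with pandas, using an in-memory buffer and toy rows. How Tableau infers types is outside what this example demonstrates.

```python
import io
import pandas as pd

toy = pd.DataFrame({
    "TIME OCC": ["0845", "1845"],
    "DATE OCC": pd.to_datetime(["2020-11-07", "2020-10-18"]),
})

buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Default read: "0845" is inferred as the integer 845.
buffer.seek(0)
naive = pd.read_csv(buffer)

# Explicit read: keep the time as text and parse the date column.
buffer.seek(0)
typed = pd.read_csv(buffer, dtype={"TIME OCC": str}, parse_dates=["DATE OCC"])

print(naive["TIME OCC"].tolist())
print(typed["TIME OCC"].tolist())
```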