### Including:
- Find the number of Zero Longitude or Latitude
- Check if the missing Latitude is in a specific category of crime
- Add date features: Year, Month
- Find extrema 

In [None]:
import pandas as pd

In [4]:
crime_df = pd.read_csv('../data/interventionscitoyendo.csv', sep=',', encoding='latin-1')

In [3]:
print(crime_df.head())

                  CATEGORIE        DATE QUART   PDQ              X  \
0  Vol de véhicule à moteur  2018-09-13  jour  30.0  294904.159001   
1  Vol de véhicule à moteur  2018-04-30  jour  30.0  294904.159001   
2  Vol de véhicule à moteur  2018-09-01  nuit   7.0  290274.565000   
3                    Méfait  2017-07-21  jour  21.0       0.000000   
4                    Méfait  2017-07-29  jour  12.0       0.000000   

              Y  LONGITUDE   LATITUDE  
0  5.047549e+06 -73.626778  45.567780  
1  5.047549e+06 -73.626778  45.567780  
2  5.042150e+06 -73.685928  45.519122  
3  0.000000e+00 -76.237290   0.000000  
4  0.000000e+00 -76.237290   0.000000  


#### Find the number of Zero Longitude or Latitude

In [16]:
crime_df.describe()

| | PDQ | X | Y | LONGITUDE | LATITUDE |
|---|---|---|---|---|---|
| count | 218697.0 | 218702.0 | 218702.0 | 218702.0 | 218702.0 |
| mean | 26.41231 | 245328.789612 | 4182488.0 | -74.062524 | 37.75836 |
| std | 14.018752 | 111431.96635 | 1897437.0 | 0.989112 | 17.129545 |
| min | 1.0 | 0.0 | 0.0 | -76.23729 | 0.0 |
| 25% | 15.0 | 288363.003992 | 5035155.0 | -73.710371 | 45.45608 |
| 50% | 26.0 | 295870.75 | 5041428.0 | -73.614295 | 45.512678 |
| 75% | 39.0 | 299220.019001 | 5045920.0 | -73.571425 | 45.553135 |
| max | 55.0 | 306389.863 | 5062496.0 | -73.479583 | 45.702351 |


In [14]:
num_zero_long = (crime_df['LONGITUDE']==0).sum()
print('Count of zeros in Column LONGITUDE: ', num_zero_long)

Count of zeros in Column LONGITUDE:  0


In [13]:
num_zero_lat = (crime_df['LATITUDE']==0).sum()
print('Count of zeros in Column LATITUDE: ', num_zero_lat)

Count of zeros in Column LATITUDE:  37328
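
In the `head()` output above, the rows with a zero latitude also have `X` and `Y` equal to zero and a longitude of -76.237290, which would explain why the longitude check finds no zeros. A minimal sketch of counting zeros in all coordinate columns at once is given below. The `sample` frame is a hypothetical stand-in built from a few of the printed rows, not the real dataset.

```python
import pandas as pd

# Toy rows modelled on the head() output above; NOT the real dataset.
sample = pd.DataFrame({
    'X': [294904.159001, 290274.565, 0.0, 0.0],
    'Y': [5047549.0, 5042150.0, 0.0, 0.0],
    'LONGITUDE': [-73.626778, -73.685928, -76.237290, -76.237290],
    'LATITUDE': [45.567780, 45.519122, 0.0, 0.0],
})

coord_cols = ['X', 'Y', 'LONGITUDE', 'LATITUDE']

# Comparing with 0 yields a boolean frame; summing counts the True values per column.
zero_counts = (sample[coord_cols] == 0).sum()
print(zero_counts)
```

On the full data, replacing `sample` with `crime_df` should give the same per-column counts as the individual cells above.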


#### Check if the missing Latitude is in a specific category of crime 

In [17]:
miss_lat_df = crime_df[crime_df['LATITUDE']==0] 

#### CATEGORIE: Nature of the event. List of values:

- **Introduction**: break and enter into a public establishment or a private residence, theft of a firearm from a residence
- **Vol dans / sur véhicule à moteur**: theft of the contents of a motor vehicle (car, truck, motorcycle, etc.) or of a vehicle part (wheel, bumper, etc.)
- **Vol de véhicule à moteur**: theft of car, truck, motorcycle, tractor, snowmobile with or without trailer, construction or farm vehicle, all-terrain vehicle
- **Méfait**: Graffiti and damage to religious property, vehicle or general damage and all other types of mischief
- **Vol qualifié**: Theft accompanied by violence of business, financial institution, person, purse, armored vehicle, vehicle, firearm, and all other types of robbery
- **Infraction entraînant la mort**: First degree murder, second degree murder, manslaughter, infanticide, criminal negligence, and all other types of offenses resulting in death

In [20]:
miss_lat_df.groupby(['CATEGORIE']).size()

CATEGORIE
Infractions entrainant la mort         38
Introduction                         2414
Méfait                               8653
Vol dans / sur véhicule à moteur    17178
Vol de véhicule à moteur             6312
Vols qualifiés                       2733
dtype: int64

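The zero-latitude rows are mostly vehicle-related: "Vol dans / sur véhicule à moteur" (17,178) and "Vol de véhicule à moteur" (6,312) together account for 23,490 of the 37,328 rows, about 63%. We will temporarily remove all zero-latitude rows.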
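
Raw counts per category can simply mirror how common each category is overall, so it may help to look at the share of zero-latitude rows within each category. The sketch below shows one way to do that. The `sample` frame and the `LAT_IS_ZERO` column name are hypothetical and used only for illustration, and the values are made up.

```python
import pandas as pd

# Toy data: category names come from the notebook, values are invented for illustration.
sample = pd.DataFrame({
    'CATEGORIE': ['Méfait', 'Méfait', 'Introduction', 'Introduction',
                  'Vol de véhicule à moteur', 'Vol de véhicule à moteur'],
    'LATITUDE': [0.0, 45.51, 45.52, 45.53, 0.0, 0.0],
})

# The mean of a boolean column is the fraction of True values in each group.
zero_share = (
    sample.assign(LAT_IS_ZERO=sample['LATITUDE'].eq(0))
          .groupby('CATEGORIE')['LAT_IS_ZERO']
          .mean()
          .sort_values(ascending=False)
)
print(zero_share)
```

Applied to `crime_df`, a category whose share is far above the others would be a stronger sign that the missing coordinates are tied to that type of event.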

In [23]:
crime_data = crime_df[crime_df['LATITUDE']!=0]
crime_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 181374 entries, 0 to 218701
Data columns (total 8 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   CATEGORIE  181374 non-null  object 
 1   DATE       181374 non-null  object 
 2   QUART      181374 non-null  object 
 3   PDQ        181374 non-null  float64
 4   X          181374 non-null  float64
 5   Y          181374 non-null  float64
 6   LONGITUDE  181374 non-null  float64
 7   LATITUDE   181374 non-null  float64
dtypes: float64(5), object(3)
memory usage: 12.5+ MB


#### Add date features: Year, Month

In [27]:
from datetime import datetime
# Parse each 'YYYY-MM-DD' string in the DATE column into a datetime object
date_data = [datetime.strptime(d, '%Y-%m-%d') for d in crime_data['DATE']]

In [29]:
date_data[0]

datetime.datetime(2018, 9, 13, 0, 0)

In [30]:
crime_year = [d.year for d in date_data]
crime_month = [d.month for d in date_data]

In [34]:
len(crime_year)

181374

In [36]:
crime_data = crime_data.assign(CRIME_YEAR=crime_year)
crime_data = crime_data.assign(CRIME_MONTH=crime_month)
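
As an alternative to building Python lists, the same two columns could be derived with pandas' vectorized date handling, which is usually faster on a frame of this size. This is a sketch on a hypothetical `sample` frame that reuses three dates from the output above. It is a different technique from the list-based cells and is shown only for comparison.

```python
import pandas as pd

# Toy frame with a DATE column in the same 'YYYY-MM-DD' format as the dataset.
sample = pd.DataFrame({'DATE': ['2018-09-13', '2018-04-30', '2017-07-30']})

# Parse the whole column at once; the result is a datetime64 Series.
parsed = pd.to_datetime(sample['DATE'], format='%Y-%m-%d')

# The .dt accessor exposes year and month for every row without a Python loop.
sample = sample.assign(CRIME_YEAR=parsed.dt.year, CRIME_MONTH=parsed.dt.month)
print(sample)
```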

In [38]:
crime_data.head()

| | CATEGORIE | DATE | QUART | PDQ | X | Y | LONGITUDE | LATITUDE | CRIME_YEAR | CRIME_MONTH |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Vol de véhicule à moteur | 2018-09-13 | jour | 30.0 | 294904.159001 | 5047549.0 | -73.626778 | 45.56778 | 2018 | 9 |
| 1 | Vol de véhicule à moteur | 2018-04-30 | jour | 30.0 | 294904.159001 | 5047549.0 | -73.626778 | 45.56778 | 2018 | 4 |
| 2 | Vol de véhicule à moteur | 2018-09-01 | nuit | 7.0 | 290274.565 | 5042150.0 | -73.685928 | 45.519122 | 2018 | 9 |
| 6 | Méfait | 2017-07-30 | jour | 38.0 | 297654.715002 | 5041877.0 | -73.591457 | 45.516776 | 2017 | 7 |
| 8 | Vol dans / sur véhicule à moteur | 2017-08-01 | jour | 39.0 | 294259.780993 | 5051450.0 | -73.635117 | 45.602873 | 2017 | 8 |


#### Find extrema

In [43]:
# low_extrema holds the minimum (longitude, latitude); high_extrema holds the maximum
low_extrema = (crime_data['LONGITUDE'].min(), crime_data['LATITUDE'].min())
high_extrema = (crime_data['LONGITUDE'].max(), crime_data['LATITUDE'].max())
print('low_extrema', low_extrema)
print('high_extrema', high_extrema)

low_extrema (-73.96895444690254, 45.40269124754524)
high_extrema (-73.47958329446993, 45.70235112098197)
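
The minimum and maximum of both coordinate columns can also be computed in one call, which gives a small bounding-box table. The sketch below uses a hypothetical `sample` frame with three coordinate pairs taken from the output above. `bounds`, `low_corner`, and `high_corner` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Toy coordinates taken from rows printed earlier; NOT the full dataset.
sample = pd.DataFrame({
    'LONGITUDE': [-73.626778, -73.685928, -73.591457],
    'LATITUDE': [45.567780, 45.519122, 45.516776],
})

# agg with a list of functions returns one row per function, one column per coordinate.
bounds = sample[['LONGITUDE', 'LATITUDE']].agg(['min', 'max'])

# Corners of the bounding box as (longitude, latitude) tuples.
low_corner = tuple(bounds.loc['min'])
high_corner = tuple(bounds.loc['max'])
print(bounds)
```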
