<h1>Data Preparation for Crime Dataset</h1>

<p>
This notebook reshapes the raw crime data into a format that is easier to read and analyze.

</p><p>Step 1: Import the data file and print its first 5 entries:
</p>

In [1]:
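import pandas as pd

# Read the comma-separated file; the first row (header=0) holds the column names
crime_data = pd.read_csv('C:/Users/Sandi/Downloads/Crime1.csv',
                         sep=',', header=0)

# Show the first 5 rows
print(crime_data.head())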


             Dates        Category                      Descript  DayOfWeek  \
0  5/13/2015 23:53        WARRANTS                WARRANT ARREST  Wednesday   
1  5/13/2015 23:53  OTHER OFFENSES      TRAFFIC VIOLATION ARREST  Wednesday   
2  5/13/2015 23:33  OTHER OFFENSES      TRAFFIC VIOLATION ARREST  Wednesday   
3  5/13/2015 23:30   LARCENY/THEFT  GRAND THEFT FROM LOCKED AUTO  Wednesday   
4  5/13/2015 23:30   LARCENY/THEFT  GRAND THEFT FROM LOCKED AUTO  Wednesday   

  PdDistrict      Resolution                    Address           X          Y  
0   NORTHERN  ARREST, BOOKED         OAK ST / LAGUNA ST -122.425892  37.774599  
1   NORTHERN  ARREST, BOOKED         OAK ST / LAGUNA ST -122.425892  37.774599  
2   NORTHERN  ARREST, BOOKED  VANNESS AV / GREENWICH ST -122.424363  37.800414  
3   NORTHERN            NONE   1500 Block of LOMBARD ST -122.426995  37.800873  
4       PARK            NONE  100 Block of BRODERICK ST -122.438738  37.771541  
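
<p>
After loading, it can help to confirm the shape and column types before going further. The sketch below is not part of the original notebook: it assumes an in-memory stand-in (<code>sample_csv</code>, a hypothetical name) built from three rows of the output above, so it runs without the local Crime1.csv file.
</p>

```python
import io
import pandas as pd

# Three rows copied from the output above, used as a stand-in for Crime1.csv.
# Fields that contain a comma ("ARREST, BOOKED") are quoted.
sample_csv = io.StringIO(
    "Dates,Category,Descript,DayOfWeek,PdDistrict,Resolution,Address,X,Y\n"
    '5/13/2015 23:53,WARRANTS,WARRANT ARREST,Wednesday,NORTHERN,'
    '"ARREST, BOOKED",OAK ST / LAGUNA ST,-122.425892,37.774599\n'
    '5/13/2015 23:53,OTHER OFFENSES,TRAFFIC VIOLATION ARREST,Wednesday,NORTHERN,'
    '"ARREST, BOOKED",OAK ST / LAGUNA ST,-122.425892,37.774599\n'
    '5/13/2015 23:33,OTHER OFFENSES,TRAFFIC VIOLATION ARREST,Wednesday,NORTHERN,'
    '"ARREST, BOOKED",VANNESS AV / GREENWICH ST,-122.424363,37.800414\n'
)
sample = pd.read_csv(sample_csv)

print(sample.shape)   # (number of rows, number of columns)
print(sample.dtypes)  # Dates is read as text; X and Y are read as floats
```

<p>
Because Dates arrives as text rather than as a datetime, it has to be split or parsed explicitly later on.
</p>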



<p>
The output above wraps its columns across two blocks, which makes rows hard to follow. The tabulate package can render the same rows as a single table.

</p><p>Step 2: Transform the output into a more understandable form:
</p>

In [2]:
from tabulate import tabulate
print(tabulate(crime_data.head(), tablefmt="grid", headers="keys"))

+----+-----------------+----------------+------------------------------+-------------+--------------+----------------+---------------------------+----------+---------+
|    | Dates           | Category       | Descript                     | DayOfWeek   | PdDistrict   | Resolution     | Address                   |        X |       Y |
|  0 | 5/13/2015 23:53 | WARRANTS       | WARRANT ARREST               | Wednesday   | NORTHERN     | ARREST, BOOKED | OAK ST / LAGUNA ST        | -122.426 | 37.7746 |
+----+-----------------+----------------+------------------------------+-------------+--------------+----------------+---------------------------+----------+---------+
|  1 | 5/13/2015 23:53 | OTHER OFFENSES | TRAFFIC VIOLATION ARREST     | Wednesday   | NORTHERN     | ARREST, BOOKED | OAK ST / LAGUNA ST        | -122.426 | 37.7746 |
+----+-----------------+----------------+------------------------------+-------------+--------------+----------------+---------------------------+----------+---
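
<p>
If tabulate is not installed, pandas alone can avoid the wrapped layout. This is a minimal sketch of an alternative, not the method used in this notebook; the <code>sample</code> frame is a hypothetical stand-in built from two rows of the output above.
</p>

```python
import pandas as pd

# Two rows copied from the output above
sample = pd.DataFrame({
    "Dates": ["5/13/2015 23:53", "5/13/2015 23:53"],
    "Category": ["WARRANTS", "OTHER OFFENSES"],
    "Descript": ["WARRANT ARREST", "TRAFFIC VIOLATION ARREST"],
    "DayOfWeek": ["Wednesday", "Wednesday"],
    "PdDistrict": ["NORTHERN", "NORTHERN"],
    "Resolution": ["ARREST, BOOKED", "ARREST, BOOKED"],
    "Address": ["OAK ST / LAGUNA ST", "OAK ST / LAGUNA ST"],
    "X": [-122.425892, -122.425892],
    "Y": [37.774599, 37.774599],
})

# to_string() renders every column on one line per row instead of
# splitting wide frames into several blocks
text = sample.to_string()
print(text)
```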


<p>
Although this output is easier to read, the dataset in the DataFrame still needs to be rearranged and enhanced.

</p><p>The Dates column holds both the date and the time. As part of data preparation, we separate these into two different columns.

</p><p>Step 3: Enhancing the data:
</p>

In [3]:
# Reorder the columns so that Category comes first
columnsTitles = ['Category', 'Dates', 'Descript', 'DayOfWeek', 'PdDistrict',
                 'Resolution', 'Address', 'X', 'Y']
crime_data = crime_data.reindex(columns=columnsTitles)

# Keep the first 3000 rows; take an explicit copy so the column
# assignments below modify an independent data frame
crime_data = crime_data[:3000].copy()

# New data frame with the Dates values split at the first space
new = crime_data["Dates"].str.split(" ", n=1, expand=True)

# Make a separate Date column from the Dates column
crime_data["Date"] = new[0]

# Make a separate Time column from the Dates column
crime_data["Time"] = new[1]

# Drop the old Dates column
crime_data.drop(columns=["Dates"], inplace=True)

print(tabulate(crime_data.head(), tablefmt="grid", headers="keys"))

+----+----------------+------------------------------+-------------+--------------+----------------+---------------------------+----------+---------+-----------+--------+
|    | Category       | Descript                     | DayOfWeek   | PdDistrict   | Resolution     | Address                   |        X |       Y | Date      | Time   |
|  0 | WARRANTS       | WARRANT ARREST               | Wednesday   | NORTHERN     | ARREST, BOOKED | OAK ST / LAGUNA ST        | -122.426 | 37.7746 | 5/13/2015 | 23:53  |
+----+----------------+------------------------------+-------------+--------------+----------------+---------------------------+----------+---------+-----------+--------+
|  1 | OTHER OFFENSES | TRAFFIC VIOLATION ARREST     | Wednesday   | NORTHERN     | ARREST, BOOKED | OAK ST / LAGUNA ST        | -122.426 | 37.7746 | 5/13/2015 | 23:53  |
+----+----------------+------------------------------+-------------+--------------+----------------+---------------------------+----------+------
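
<p>
The string split above leaves Date and Time as plain text. Where sorting or grouping by time matters, parsing the values first may be more convenient. The sketch below is an alternative, not the approach used in this notebook; it assumes every value follows the month/day/year hour:minute pattern seen in the sample rows, and the <code>Hour</code> column is added purely for illustration.
</p>

```python
import pandas as pd

# Values copied from the Dates column shown in Step 1
dates = pd.Series(["5/13/2015 23:53", "5/13/2015 23:33", "5/13/2015 23:30"])

# Parse "month/day/year hour:minute" strings into datetime values
parsed = pd.to_datetime(dates, format="%m/%d/%Y %H:%M")

frame = pd.DataFrame({
    "Date": parsed.dt.date,  # datetime.date objects
    "Time": parsed.dt.time,  # datetime.time objects
    "Hour": parsed.dt.hour,  # integer hour, useful for grouping by time of day
})
print(frame)
```

<p>
With an explicit format, a value that does not match the pattern raises an error instead of being silently misread.
</p>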


<p>
Step 4: Check for any missing data:
</p>

In [4]:
crime_data.isnull().sum()

Category      0
Descript      0
DayOfWeek     0
PdDistrict    0
Resolution    0
Address       0
X             0
Y             0
Date          0
Time          0
dtype: int64
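
<p>
Every count above is zero, so this dataset needs no further handling. For a dataset that does have gaps, the same check shows where they are. The sketch below uses a hypothetical toy frame with two deliberately missing values, and the "UNKNOWN" placeholder is an assumption chosen for illustration.
</p>

```python
import pandas as pd

# Hypothetical toy frame with two deliberately missing values
toy = pd.DataFrame({
    "Category": ["WARRANTS", None, "LARCENY/THEFT"],
    "PdDistrict": ["NORTHERN", "NORTHERN", None],
    "X": [-122.425892, -122.425892, -122.426995],
})

# Count the missing values in each column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Option 1: drop every row that has at least one missing value
dropped = toy.dropna()

# Option 2: keep all rows and fill the gaps with a placeholder label
filled = toy.fillna({"Category": "UNKNOWN", "PdDistrict": "UNKNOWN"})
print(filled)
```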