## **Safetown AI**

Crime record analysis with the aim of predicting future crimes based on key features. This work tries to approximate the geographical location (latitude and longitude) and the type of crime to be committed. The data refers to the city of Bucaramanga (Colombia).



Dataset source: https://www.datos.gov.co/Seguridad-y-Defensa/40-Delitos-en-Bucaramanga-enero-2010-a-diciembre-d/75fz-q98y

 ![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Data Overview
![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

In [293]:
# Basics
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px


# Models
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor


# Metrics
from sklearn.metrics import mean_squared_error


In [294]:
# Load dataset
dfn = pd.read_json('https://www.datos.gov.co/resource/75fz-q98y.json?$limit=135000')
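
The `$limit` query parameter in the URL above caps how many rows the endpoint returns. A minimal sketch of building such URLs with the standard library follows. It assumes the endpoint also honors `$offset` for paging, which is standard in the Socrata SODA API but was not checked against this dataset; `build_url` is a hypothetical helper, not part of the notebook.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.datos.gov.co/resource/75fz-q98y.json"


def build_url(limit, offset=0):
    """Return the dataset URL with $limit and $offset query parameters."""
    # safe="$" keeps the leading dollar sign literal instead of encoding it as %24
    query = urlencode({"$limit": limit, "$offset": offset}, safe="$")
    return f"{BASE_URL}?{query}"


print(build_url(135000))
```

Each resulting URL could be passed to `pd.read_json` in the same way as the hard-coded one.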


In [295]:
df = dfn.copy()
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 135000 entries, 0 to 134999
Data columns (total 20 columns):
 #   Column                  Non-Null Count   Dtype 
---  ------                  --------------   ----- 
 0   orden                   135000 non-null  int64 
 1   armas_medios            135000 non-null  object
 2   barrios_hecho           135000 non-null  object
 3   latitud                 128713 non-null  object
 4   longitud                128713 non-null  object
 5   zona                    135000 non-null  object
 6   nom_comuna              135000 non-null  object
 7   ano                     135000 non-null  int64 
 8   mes                     135000 non-null  object
 9   dia                     135000 non-null  int64 
 10  dia_semana              135000 non-null  object
 11  descripcion_conducta    135000 non-null  object
 12  conducta                135000 non-null  object
 13  clasificaciones_delito  135000 non-null  object
 14  edad                    135000 non-n

### Preprocessing
Encode categorical columns as numeric classes and remove unnecessary values.
 ![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

Rows containing any NaN value represent 7.32% of the dataset, so we remove them to simplify processing.
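
As a rough illustration of how such a percentage can be obtained, the sketch below counts the rows that have at least one missing value. It uses plain Python and invented toy records so that it runs on its own; `share_with_missing` is a hypothetical helper. On the notebook's DataFrame the equivalent expression, evaluated before `dropna`, would be `df.isna().any(axis=1).mean()`.

```python
def share_with_missing(rows):
    """Fraction of rows (dicts) in which at least one value is None."""
    if not rows:
        return 0.0
    incomplete = sum(1 for row in rows if any(v is None for v in row.values()))
    return incomplete / len(rows)


# Toy records, invented for illustration only
toy_rows = [
    {"latitud": "7,12", "longitud": "-73,12"},
    {"latitud": None, "longitud": None},
    {"latitud": "7,13", "longitud": "-73,11"},
    {"latitud": "7,11", "longitud": None},
]

# Two of the four toy rows are incomplete
print(f"{share_with_missing(toy_rows):.2%}")
```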

In [296]:

df.dropna(inplace=True)
df.drop(['descripcion_conducta', 'orden', 'edad', 'ano'], axis=1, inplace=True)

# Parse the latitude and longitude columns
df['latitud'] = df['latitud'].str.replace(',', '.').astype(float)
df['longitud'] = df['longitud'].str.replace(',', '.').astype(float)
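
The two `str.replace` calls suggest that the raw coordinates are strings with a decimal comma. A minimal per-value sketch of the same conversion, assuming that format (`parse_coordinate` is a hypothetical name):

```python
def parse_coordinate(text):
    """Convert a decimal-comma string such as '7,1278' to a float."""
    return float(text.strip().replace(",", "."))


print(parse_coordinate("7,1278"))
print(parse_coordinate("-73,1269"))
```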


In [297]:
# Convert columns to numeric categories

df['armas_medios'], uniques_armas_medios = pd.factorize(df['armas_medios'])
df['barrios_hecho'], uniques_barrios_hecho = pd.factorize(df['barrios_hecho'])
df['zona'], uniques_zona = pd.factorize(df['zona'])
df['nom_comuna'], uniques_nom_comuna = pd.factorize(df['nom_comuna'])
df['conducta'], uniques_conducta = pd.factorize(df['conducta'])
df['mes'], uniques_mes = pd.factorize(df['mes'])
df['dia_semana'], uniques_week = pd.factorize(df['dia_semana'])
df['clasificaciones_delito'], uniques_clasificaciones = pd.factorize(df['clasificaciones_delito'])
df['curso_de_vida'], uniques_vida = pd.factorize(df['curso_de_vida'])
df['estado_civil_persona'], uniques_estado = pd.factorize(df['estado_civil_persona'])
df['genero'], uniques_gender = pd.factorize(df['genero'])
df['movil_agresor'], uniques_agresor = pd.factorize(df['movil_agresor'])
df['movil_victima'], uniques_victima = pd.factorize(df['movil_victima'])
df['dia']= pd.to_numeric(df["dia"])
df['mes'] = df['mes'].apply(lambda x: x+1)
df['dia_semana'] = df['dia_semana'].apply(lambda x: x+1)

# Drop records outside the AMB bounding box (latitude 6 to 8, longitude -74 to -72)
df = df.loc[(df['latitud']>= 6) & (df['latitud'] <= 8)]
df = df.loc[(df['longitud'] >= -74) & (df['longitud'] <=-72)]
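
To make the encoding step concrete, here is a plain-Python sketch of what `pd.factorize` does with its default settings: each distinct value receives an integer code in order of first appearance, and the list of uniques maps codes back to labels. The function below is an illustration, not the pandas implementation, and the sample values are invented.

```python
def factorize(values):
    """Encode values as integer codes in order of first appearance."""
    index = {}    # value -> code
    uniques = []  # code -> value
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(uniques)
            uniques.append(value)
        codes.append(index[value])
    return codes, uniques


# Toy sample, invented for illustration
codes, uniques = factorize(["MARZO", "ENERO", "MARZO", "FEBRERO"])
print(codes)                        # [0, 1, 0, 2]
print(uniques)                      # ['MARZO', 'ENERO', 'FEBRERO']
print([uniques[c] for c in codes])  # decodes back to the original labels
```

Because codes follow first appearance, adding 1 to the `mes` or `dia_semana` codes yields calendar numbers only if those values first appear in calendar order in the data.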


### Training
We use Random Forest regression to predict latitude, longitude, and conduct.

 ![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

Most relevant features

In [298]:
X = df[[ 'dia_semana', 'curso_de_vida', 'nom_comuna', 'estado_civil_persona']]
y = df[[ 'latitud', 'longitud','conducta']]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True)

accuracySet = []

# Random Forest

# After analysis of the MSE, the best parameters are:
# max_depth = 5, n_estimators (trees) = 7

model = RandomForestRegressor(max_depth=5, n_estimators=7)
model.fit(X_train.values,y_train.values)

# Error
print(f'Model Error: {mean_squared_error(y_test.values, model.predict(X_test.values))}')





Model Error: 1.9408206674420914
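
The single error figure above combines all three targets: with its default settings, `mean_squared_error` averages the per-column errors uniformly, so a column on a larger scale (here the integer conduct code) can dominate the result. The sketch below shows that arithmetic in plain Python; the values are invented and `mse_per_column` is a hypothetical helper.

```python
def mse_per_column(y_true, y_pred):
    """Mean squared error of each target column, returned as a list."""
    n_rows = len(y_true)
    n_cols = len(y_true[0])
    return [
        sum((t[j] - p[j]) ** 2 for t, p in zip(y_true, y_pred)) / n_rows
        for j in range(n_cols)
    ]


# Invented values; columns are latitud, longitud, conducta code
y_true = [[7.0, -73.0, 0.0], [7.5, -73.5, 4.0]]
y_pred = [[7.0, -73.0, 2.0], [7.5, -73.5, 2.0]]

per_column = mse_per_column(y_true, y_pred)
overall = sum(per_column) / len(per_column)  # uniform average over columns

print(per_column)  # [0.0, 0.0, 4.0]
print(overall)
```

In scikit-learn, passing `multioutput='raw_values'` to `mean_squared_error` returns the per-column values instead of their average.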


### Prediction

In [299]:
def crimePrediction(lista, model=model):
    '''
    Parameters are passed as a list in this order:
    [dia_semana, curso_de_vida, nom_comuna, estado_civil_persona]

    Keep this order; otherwise the model will misinterpret the data.
    '''
    p = model.predict(np.reshape(lista, (1, 4)))[0]
    pFormal = { 'lat': p[0], 'lng':p[1], 'conduct':uniques_conducta[int(p[2])]}
    return pFormal
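
The regressor returns a continuous value for the conduct code, and `int(p[2])` truncates it toward zero before the label lookup. One possible alternative, sketched here under the assumption that the nearest code is the intended one, is to round and clamp to the valid index range. `decode_conduct` is a hypothetical helper, and the label list is an invented subset in an arbitrary order.

```python
def decode_conduct(code, labels):
    """Map a continuous predicted code to the nearest valid label."""
    nearest = int(round(code))
    # Clamp so that out-of-range predictions still map to a valid index
    clamped = max(0, min(nearest, len(labels) - 1))
    return labels[clamped]


# Invented subset and order, for illustration only
labels = ["HURTO A PERSONAS", "HOMICIDIO", "VIOLENCIA INTRAFAMILIAR"]

print(decode_conduct(1.8, labels))   # rounds up to index 2
print(decode_conduct(7.3, labels))   # clamped to the last index
print(decode_conduct(-0.4, labels))  # clamped to index 0
```

Since the codes are arbitrary labels rather than an ordered scale, either rule is only a rough mapping from the regression output to a crime type.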

In [300]:
predictions = []

for i in range(10):
  r = np.random.randint(0,6, 4)
  p = crimePrediction(r)
  predictions.append(p)
  print(f'Parameter {r}, prediction {p}')


Parameter [0 4 2 1], prediction {'lat': 7.127846403150818, 'lng': -73.12697174819111, 'conduct': 'VIOLENCIA INTRAFAMILIAR'}
Parameter [3 3 3 1], prediction {'lat': 7.121918093440207, 'lng': -73.1247604314242, 'conduct': 'HOMICIDIO'}
Parameter [5 1 1 3], prediction {'lat': 7.1262670165087565, 'lng': -73.12582464940418, 'conduct': 'HURTO A PERSONAS'}
Parameter [5 0 1 0], prediction {'lat': 7.129202081770994, 'lng': -73.12725121782817, 'conduct': 'HOMICIDIO'}
Parameter [1 1 2 2], prediction {'lat': 7.131449772816974, 'lng': -73.12716390346316, 'conduct': 'HOMICIDIO'}
Parameter [3 5 5 2], prediction {'lat': 7.122139560022405, 'lng': -73.12289263761377, 'conduct': 'HURTO A RESIDENCIAS'}
Parameter [5 0 1 5], prediction {'lat': 7.124970782475861, 'lng': -73.12493953772596, 'conduct': 'HURTO A PERSONAS'}
Parameter [5 3 3 2], prediction {'lat': 7.121918093440207, 'lng': -73.1247604314242, 'conduct': 'HOMICIDIO'}
Parameter [4 0 4 5], prediction {'lat': 7.120131575992092, 'lng': -73.1223915442021

In [301]:
#@title ###Plot Results
import folium

from folium.plugins import HeatMap

latitud = predictions[0]['lat']
longitud = predictions[0]['lng']

map_obj = folium.Map(location=[latitud, longitud], zoom_start=20)

lats_longs = [latitud, longitud]

print(lats_longs)
folium.Circle(lats_longs, fill=True).add_child(folium.Popup(predictions[0]['conduct'])).add_to(map_obj)
HeatMap([lats_longs]).add_to(map_obj)

map_obj

[7.127846403150818, -73.12697174819111]
