This is a breakdown of every shooting incident that occurred in NYC. The data is extracted manually and reviewed by the Office of Management Analysis and Planning before being posted on the NYPD website. Each record represents one shooting incident in NYC and includes information about the event, its location, and the time it occurred. Information on suspect and victim demographics is also included. The public can use this data to explore the nature of shooting and criminal activity.

**Import Library**

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

**Load Dataset**

In [None]:
data = pd.read_csv('../input/nypd-shooting-incident-data-500-line/NYPD_Shooting_Incident_Data__Historic.csv', sep=',') 

**Data Understanding**

In [None]:
# Show the first 5 rows
data.head()

The dataset covers:
1. Shootings that resulted in a death: the STATISTICAL_MURDER_FLAG column.
2. The place and time of each incident.
3. Perpetrator and victim demographics: age group, sex, and race.


In [None]:
# Show the number of rows and columns
data.shape

In [None]:
# Inspect every variable (name, non-null count, dtype)
data.info()

In [None]:
# Show summary statistics for the numerical variables
data.describe()

**Data Cleaning and Preprocessing**

In [None]:
# Drop duplicate rows
data=data.drop_duplicates()

In [None]:
# Compare the row and column counts after dropping duplicates
data.shape

In [None]:
# Check the data types
data.dtypes

In [None]:
# Fill missing values with "N/A"
data["LOCATION_DESC"]=data["LOCATION_DESC"].fillna("N/A")
data["PERP_AGE_GROUP"]=data["PERP_AGE_GROUP"].fillna("N/A")
data["PERP_SEX"]=data["PERP_SEX"].fillna("N/A")
data["PERP_RACE"]=data["PERP_RACE"].fillna("N/A")

In [None]:
# Change the data type
data["INCIDENT_KEY"]=data["INCIDENT_KEY"].astype("float64")

In [None]:
# Check the data types after the conversion
data.dtypes

**Check Missing Value**

In [None]:
# Count the missing values per column
data.isnull().sum()
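
Raw counts can be hard to compare across columns, so it may also help to look at the share of missing values. The sketch below uses a small made-up frame (`toy` and its values are hypothetical, not taken from the NYPD data) to illustrate the idea.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset
toy = pd.DataFrame({
    "PERP_SEX": ["M", np.nan, "F", np.nan],
    "PRECINCT": [75, 40, 44, 67],
})

# The mean of the boolean mask is the fraction of missing entries per column
missing_share = toy.isnull().mean()
print(missing_share)
```

The same expression applied to `data` would return one fraction per column.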

In [None]:
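# Show the distribution of PRECINCT with a histogram and a density curve
# (histplot with kde=True draws both on one axis; distplot is deprecated in recent seaborn)
plt.figure(figsize=(10, 5))
sns.histplot(data["PRECINCT"], kde=True, stat="density")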
plt.show()

In [None]:
# Show the distribution of PRECINCT with a boxplot
plt.figure(figsize=(10, 5))
sns.boxplot(x=data["PRECINCT"])
plt.show()

The boxplot shows no outliers in PRECINCT.
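
A boxplot with default whiskers flags points that lie more than 1.5 times the interquartile range beyond the quartiles. As a rough numerical cross-check of the visual reading, the same rule can be sketched as follows. The helper `iqr_outliers` and the toy values are hypothetical, introduced only for illustration.

```python
import pandas as pd

def iqr_outliers(values: pd.Series) -> pd.Series:
    """Return the entries lying outside the 1.5 * IQR whiskers."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return values[(values < lower) | (values > upper)]

# Hypothetical toy values, not taken from the dataset
toy = pd.Series([40, 44, 46, 47, 52, 300])
print(iqr_outliers(toy))
```

Calling `iqr_outliers(data["PRECINCT"])` should return an empty Series if the boxplot reading holds.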

**Handle Inconsistent Data**

In [None]:
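# Replace "POINT" with "LAT/LONG" inside each Lon_Lat string
# (str.replace matches substrings; Series.replace with a dict only matches whole cells)
replace_colsi = ["Lon_Lat"]
for i in replace_colsi :
  data[i] = data[i].str.replace('POINT', 'LAT/LONG', regex=False)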
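
The difference between whole-cell and substring replacement in pandas is easy to miss. The sketch below shows both on a single made-up value, assuming the column holds strings shaped like `POINT (lon lat)`. The coordinates are hypothetical.

```python
import pandas as pd

# Hypothetical value in the assumed "POINT (lon lat)" shape
toy = pd.Series(["POINT (-73.9 40.6)"])

# Dict-based replace only swaps cells that equal the key exactly
whole_cell = toy.replace({"POINT": "LAT/LONG"})

# str.replace rewrites the matching substring inside each cell
substring = toy.str.replace("POINT", "LAT/LONG", regex=False)

print(whole_cell[0])
print(substring[0])
```

Only the second call changes the value, because no cell is equal to the bare string `POINT`.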

**Encoding Variables**

In [None]:
from sklearn.preprocessing import LabelEncoder
# Separate the identifier, numerical, binary, and multi-category columns
Id_col = ["INCIDENT_KEY"]
num_col = ["PRECINCT","X_COORD_CD","Y_COORD_CD"]
bin_cols = ["STATISTICAL_MURDER_FLAG"]
multi_cols = ["OCCUR_DATE","OCCUR_TIME","BORO","LOCATION_DESC","PERP_AGE_GROUP","JURISDICTION_CODE","PERP_SEX","PERP_RACE","VIC_AGE_GROUP","VIC_SEX","VIC_RACE","Latitude","Longitude","Lon_Lat"]
# Label-encode the binary column (False -> 0, True -> 1)
le = LabelEncoder()
for i in bin_cols :
  data[i] = le.fit_transform(data[i])
# One-hot encode the multi-category columns
data=pd.get_dummies(data = data,columns = multi_cols,drop_first=True)
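
To make the two encodings concrete, here is a minimal sketch on a three-row made-up frame. The frame `toy` and its values are hypothetical. Only `LabelEncoder` and `pd.get_dummies` come from the notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame with one binary and one multi-category column
toy = pd.DataFrame({
    "STATISTICAL_MURDER_FLAG": [True, False, False],
    "BORO": ["BRONX", "QUEENS", "BROOKLYN"],
})

# LabelEncoder maps the sorted classes to 0..n-1, so False -> 0 and True -> 1
toy["STATISTICAL_MURDER_FLAG"] = LabelEncoder().fit_transform(toy["STATISTICAL_MURDER_FLAG"])

# drop_first=True drops the first category (BRONX here) as the baseline
encoded = pd.get_dummies(toy, columns=["BORO"], drop_first=True)
print(encoded.columns.tolist())
```

Each distinct value of an encoded column becomes its own indicator column, so high-cardinality columns such as dates or coordinates can widen the frame considerably.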

In [None]:
data

**Data Partition**

In [None]:
data=data.drop(labels=["INCIDENT_KEY"],axis=1)

In [None]:
from sklearn.model_selection import train_test_split
# Partition the data into training and testing sets
train,test = train_test_split(data,test_size = 0.20, random_state = 111)
# Separate the dependent and independent variables in each partition
train_X = train.drop(labels="STATISTICAL_MURDER_FLAG",axis=1)
train_Y = train["STATISTICAL_MURDER_FLAG"]
test_X = test.drop(labels="STATISTICAL_MURDER_FLAG",axis=1)
test_Y = test["STATISTICAL_MURDER_FLAG"]
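
Because fatal shootings are the minority class, one optional variation is to pass `stratify` so that both partitions keep similar class proportions. The sketch below runs on made-up data. Whether stratification helps on this dataset is an assumption that the notebook does not test.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame: 20 rows, 25% positives
toy = pd.DataFrame({
    "feature": range(20),
    "STATISTICAL_MURDER_FLAG": [1] * 5 + [0] * 15,
})

toy_train, toy_test = train_test_split(
    toy, test_size=0.20, random_state=111,
    stratify=toy["STATISTICAL_MURDER_FLAG"],
)

# The two partitions share no rows and keep the same positive rate
print(len(toy_train), len(toy_test))
print(toy_train["STATISTICAL_MURDER_FLAG"].mean(), toy_test["STATISTICAL_MURDER_FLAG"].mean())
```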

**SMOTE (Synthetic Minority Oversampling Technique)**

In [None]:
from imblearn.over_sampling import SMOTE
# Handle class imbalance by oversampling the minority class with SMOTE
os = SMOTE(sampling_strategy='minority',random_state = 123,k_neighbors=5)
train_smote_X,train_smote_Y = os.fit_resample(train_X,train_Y)
train_smote_X = pd.DataFrame(data = train_smote_X,columns=train_X.columns)
train_smote_Y = pd.DataFrame(data = train_smote_Y)

In [None]:
# Class counts before SMOTE
train_Y.value_counts()

In [None]:
# Class counts after SMOTE
train_smote_Y.value_counts()
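
The core idea behind SMOTE is linear interpolation between a minority-class sample and one of its nearest minority-class neighbors. The following is a simplified sketch of that single step in NumPy, not the imblearn implementation. The two points are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(123)

# Two hypothetical minority-class points that are neighbors of each other
sample = np.array([1.0, 2.0])
neighbor = np.array([3.0, 6.0])

# Draw a random fraction and place the new point on the segment between them
gap = rng.uniform(0.0, 1.0)
synthetic = sample + gap * (neighbor - sample)
print(synthetic)
```

Note that interpolating one-hot columns can produce fractional values. imblearn also offers `SMOTENC` for mixed numerical and categorical features, which may be worth considering for data like this.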

**KNN**

In [None]:
from sklearn.preprocessing import StandardScaler
ss=StandardScaler()
train_smote_X[num_col]=ss.fit_transform(train_smote_X[num_col])
test_X[num_col]=ss.transform(test_X[num_col])
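
The scaler is fitted on the training rows and then reused on the test rows, so no statistics from the test set leak into preprocessing. A minimal sketch of that pattern on made-up numbers follows. The arrays `train_toy` and `test_toy` are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy partitions with a single numeric feature
train_toy = np.array([[10.0], [20.0], [30.0]])
test_toy = np.array([[40.0]])

scaler = StandardScaler()
# fit_transform learns the mean and standard deviation from the training rows only
train_scaled = scaler.fit_transform(train_toy)
# transform reuses those training statistics on the test rows
test_scaled = scaler.transform(test_toy)

print(scaler.mean_, scaler.scale_)
print(test_scaled)
```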

In [None]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
knnc=KNeighborsClassifier()
param_grid = {
    'n_neighbors': [2,3,4,5,6],
    'metric': ['euclidean', 'manhattan']
}

In [None]:
from sklearn.model_selection import GridSearchCV
CV_knnc = GridSearchCV(estimator=knnc, param_grid=param_grid, cv=2)
CV_knnc.fit(train_smote_X, train_smote_Y)
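
After fitting, a `GridSearchCV` object exposes the winning parameter combination and its mean cross-validated score. The sketch below shows how these could be inspected, using synthetic data from `make_classification` as a stand-in (an assumption; it is not the shooting dataset) and a smaller grid.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data, not the shooting dataset
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

search = GridSearchCV(
    estimator=KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5], "metric": ["euclidean", "manhattan"]},
    cv=2,
)
search.fit(X, y)

# best_params_ holds the winning combination, best_score_ its mean CV accuracy
print(search.best_params_)
print(search.best_score_)
```

The same attributes are available on `CV_knnc` once its `fit` call has finished.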

**Evaluation**

In [None]:
pred=CV_knnc.predict(test_X)

In [None]:
print("Accuracy of the cross-validated KNN model: ",accuracy_score(test_Y,pred))

In [None]:
from sklearn.metrics import confusion_matrix
# The predictions are already integer class labels, so no rounding is needed
CF=confusion_matrix(test_Y,pred)
CF

In [None]:
from sklearn.metrics import classification_report
target_names = ["False","True"]
print(classification_report(test_Y, pred, target_names=target_names))
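
The precision and recall shown in the classification report can be derived directly from the confusion matrix. Here is a small worked sketch on made-up labels. The arrays `y_true` and `y_pred` are hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 marks a fatal shooting, 0 a non-fatal one
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 1, 0, 1, 1, 0])

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found
print(precision, recall)
```

With an imbalanced target, recall on the positive class is often more informative than overall accuracy.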

**SVM**

In [None]:
# num_col was already standardized in the KNN section, so the SVM only needs
# independent copies of the prepared frames (plain assignment would alias them)
train_smote_X_2=train_smote_X.copy()
test_X_2=test_X.copy()

In [None]:
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
svm=SVC()
param_grid = {
    "C" : [0.1, 1],
    "gamma" : [0.1, 1],
    "kernel" : ["linear", "rbf"]
}

In [None]:
from sklearn.model_selection import GridSearchCV
CV_svm = GridSearchCV(estimator=svm, param_grid=param_grid, cv=2)
CV_svm.fit(train_smote_X_2, train_smote_Y)

**Evaluation**

In [None]:
pred=CV_svm.predict(test_X_2)

In [None]:
print("Accuracy of the cross-validated SVM model: ",accuracy_score(test_Y,pred))

In [None]:
from sklearn.metrics import confusion_matrix
CF=confusion_matrix(test_Y,pred)
CF

In [None]:
from sklearn.metrics import classification_report
target_names = ["False","True"]
print(classification_report(test_Y, pred, target_names=target_names))

**Naive Bayes**

In [None]:
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
gnb = GaussianNB()
gnb = gnb.fit(train_smote_X, train_smote_Y)
hasil_gnb = gnb.predict(test_X)

**Evaluation**

In [None]:
# scikit-learn expects the true labels first, then the predictions
print(confusion_matrix(test_Y, hasil_gnb))
print(classification_report(test_Y, hasil_gnb, target_names=["False", "True"]))