# Predicting Cyberattacks in CAV using ML

## Abstract

Connected and Autonomous Vehicles (CAVs) are increasingly vulnerable to cyberattacks, particularly due to vulnerabilities within the Controller Area Network (CAN) protocol, which facilitates communication between Electronic Control Units (ECUs). This study investigates the application of machine learning (ML) for intrusion detection in CAVs, utilizing an experimental dataset from the [Hacking and Countermeasure Research Lab (HCRL)](https://ocslab.hksecurity.net/Datasets/CAN-intrusion-dataset). A **Random Forest (RF) classifier** is employed to identify cyberattacks, trained on **over 3 million records** with a **70:30 train-test split**, utilizing **200 estimators** and a **random state of 11**.

The model achieves an impressive **accuracy exceeding 92%** across a range of attack types, including **Denial-of-Service (DoS), Fuzzy Attacks, Gear Spoofing, and RPM Spoofing**. The data preprocessing techniques implemented in this study, including data cleaning and feature selection, are applicable to other ML applications, such as credit card fraud detection and financial anomaly detection. Given the potential for real-time data availability, this model holds promise for significantly enhancing CAV cybersecurity by detecting and mitigating cyber threats in real-time.


# 1. Data Preparation
## 1.1 Importing Necessary Libraries

In [1]:
import numpy as np
import pandas as pd 

## 1.2 Load Data

In [2]:
dos = pd.read_csv('/kaggle/input/car-hacking-dataset/DoS_dataset.csv')
fuzzy = pd.read_csv('/kaggle/input/car-hacking-dataset/Fuzzy_dataset.csv')
gear = pd.read_csv('/kaggle/input/car-hacking-dataset/gear_dataset.csv')
rpm = pd.read_csv('/kaggle/input/car-hacking-dataset/RPM_dataset.csv')


In [3]:
dos.columns = ['Timestamp', 'CAN ID', 'DLC', 'DATA0', 'DATA1', 'DATA2', 'DATA3', 'DATA4', 'DATA5', 'DATA6', 'DATA7', 'Flag']
fuzzy.columns = ['Timestamp', 'CAN ID', 'DLC', 'DATA0', 'DATA1', 'DATA2', 'DATA3', 'DATA4', 'DATA5', 'DATA6', 'DATA7', 'Flag']
gear.columns = ['Timestamp', 'CAN ID', 'DLC', 'DATA0', 'DATA1', 'DATA2', 'DATA3', 'DATA4', 'DATA5', 'DATA6', 'DATA7', 'Flag']
rpm.columns = ['Timestamp', 'CAN ID', 'DLC', 'DATA0', 'DATA1', 'DATA2', 'DATA3', 'DATA4', 'DATA5', 'DATA6', 'DATA7', 'Flag']


In [4]:
dos.shape, fuzzy.shape, gear.shape, rpm.shape

((3665770, 12), (3838859, 12), (4443141, 12), (4621701, 12))

In [5]:
dos.head(10)

Unnamed: 0,Timestamp,CAN ID,DLC,DATA0,DATA1,DATA2,DATA3,DATA4,DATA5,DATA6,DATA7,Flag
0,1478198000.0,018f,8,fe,5b,00,00,0,3c,00,00,R
1,1478198000.0,0260,8,19,21,22,30,8,8e,6d,3a,R
2,1478198000.0,02a0,8,64,00,9a,1d,97,02,bd,00,R
3,1478198000.0,0329,8,40,bb,7f,14,11,20,00,14,R
4,1478198000.0,0545,8,d8,00,00,8a,0,00,00,00,R
5,1478198000.0,0002,8,00,00,00,00,0,03,0b,11,R
6,1478198000.0,0153,8,00,21,10,ff,0,ff,00,00,R
7,1478198000.0,02c0,8,14,00,00,00,0,00,00,00,R
8,1478198000.0,0130,8,08,80,00,ff,31,80,0b,7f,R
9,1478198000.0,0131,8,e5,7f,00,00,48,7f,0b,ac,R
