# Description

This notebook demonstrates how to download and combine the CICIDS2017 dataset.

*Author*: **Mahendra Data** mahendra.data@dbms.cs.kumamoto-u.ac.jp

License: **BSD 3-Clause**

# Mounting Google Drive

We will save the downloaded dataset to Google Drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Downloading the dataset

The description of the CICIDS2017 dataset is available at https://www.unb.ca/cic/datasets/ids-2017.html

There are three versions available:

1. Raw network captured data (PCAPs),
2. Generated Labelled Flows, and
3. Machine Learning CSV.

In this notebook, we will download the `MachineLearningCSV.zip` version of this dataset.

We save the download as `MachineLearningCVE.zip` instead of `MachineLearningCSV.zip` because the checksum file `MachineLearningCSV.md5` lists `MachineLearningCVE.zip` as its target filename, and `md5sum -c` looks for the file under that name.

In [None]:
!wget -nc -O MachineLearningCVE.zip http://205.174.165.80/CICDataset/CIC-IDS-2017/Dataset/MachineLearningCSV.zip
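
For environments without `wget`, the skip-if-present download can be sketched in Python with the standard library. This is a minimal sketch, not the notebook's own method: the helper name `download_if_missing` is hypothetical, and the URL is the one used in the cell above.

```python
import urllib.request
from pathlib import Path

# URL taken from the wget cell in this notebook.
URL = "http://205.174.165.80/CICDataset/CIC-IDS-2017/Dataset/MachineLearningCSV.zip"


def download_if_missing(url, dest):
    """Download url to dest unless dest already exists (hypothetical helper).

    Returns True if a download happened, False if it was skipped.
    """
    dest = Path(dest)
    if dest.exists():
        # Mirrors `wget -nc`: never overwrite an existing file.
        return False
    urllib.request.urlretrieve(url, str(dest))
    return True


# Example call, saving under the name listed in the .md5 file:
# download_if_missing(URL, "MachineLearningCVE.zip")
```

Like `wget -nc`, the helper only checks that the destination exists, so a partially downloaded file is not detected here; the integrity check is what catches that case.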

# Integrity check

Download the `MachineLearningCSV.md5` file, which contains the expected checksum of the downloaded archive.

In [None]:
!wget -nc http://205.174.165.80/CICDataset/CIC-IDS-2017/Dataset/MachineLearningCSV.md5

Check the integrity of the downloaded file.

In [4]:
!md5sum -c MachineLearningCSV.md5

MachineLearningCVE.zip: OK


If the downloaded file is intact, the output should be:

`MachineLearningCVE.zip: OK`
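
The same check can be sketched in Python with `hashlib`, which is useful where `md5sum` is not installed. This is a minimal sketch under stated assumptions: the helper names `md5_of_file` and `verify_md5_file` are hypothetical, the checksum file is assumed to use the usual `md5sum` layout (`<digest>  <filename>` per line), and filenames are resolved relative to the checksum file's directory rather than the current working directory.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5_file(md5_path):
    """Check each '<digest>  <filename>' entry of an md5sum-style file.

    Returns a dict mapping filename to True (match) or False (mismatch).
    Assumption: filenames are relative to the checksum file's directory.
    """
    md5_path = Path(md5_path)
    results = {}
    for line in md5_path.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        # md5sum marks binary-mode entries with a leading '*'.
        name = name.strip().lstrip("*")
        actual = md5_of_file(md5_path.parent / name)
        results[name] = actual == expected.lower()
    return results


# Example call for the files downloaded in this notebook:
# verify_md5_file("MachineLearningCSV.md5")
```

Reading in chunks keeps memory use constant regardless of the archive size.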

# Unzip the dataset

Unzip `MachineLearningCVE.zip`.

In [5]:
!unzip -n MachineLearningCVE.zip

Archive:  MachineLearningCVE.zip


Eight files are extracted from this zip file:

1. `Monday-WorkingHours.pcap_ISCX.csv`
2. `Tuesday-WorkingHours.pcap_ISCX.csv`
3. `Wednesday-workingHours.pcap_ISCX.csv`
4. `Thursday-WorkingHours-Morning-WebAttacks.pcap_ISCX.csv`
5. `Thursday-WorkingHours-Afternoon-Infilteration.pcap_ISCX.csv`
6. `Friday-WorkingHours-Morning.pcap_ISCX.csv`
7. `Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv`
8. `Friday-WorkingHours-Afternoon-DDos.pcap_ISCX.csv`
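
To confirm that extraction produced all eight files, the list above can be compared against a directory listing. This is a minimal sketch: the helper name `missing_files` is hypothetical, and the directory name `MachineLearningCVE` is an assumption taken from the `cp -r` command used later in this notebook.

```python
from pathlib import Path

# Filenames copied from the list above (note the lowercase "w" on Wednesday).
EXPECTED_FILES = [
    "Monday-WorkingHours.pcap_ISCX.csv",
    "Tuesday-WorkingHours.pcap_ISCX.csv",
    "Wednesday-workingHours.pcap_ISCX.csv",
    "Thursday-WorkingHours-Morning-WebAttacks.pcap_ISCX.csv",
    "Thursday-WorkingHours-Afternoon-Infilteration.pcap_ISCX.csv",
    "Friday-WorkingHours-Morning.pcap_ISCX.csv",
    "Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv",
    "Friday-WorkingHours-Afternoon-DDos.pcap_ISCX.csv",
]


def missing_files(directory, expected=EXPECTED_FILES):
    """Return the expected CSV names that are absent from directory."""
    present = {path.name for path in Path(directory).glob("*.csv")}
    return [name for name in expected if name not in present]


# Example call; the directory name is an assumption (see above):
# print(missing_files("MachineLearningCVE"))
```

An empty list means every expected file is present; the comparison is case-sensitive, so the filenames must match the list exactly.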

Save the zip and extracted files to Google Drive.

In [6]:
!mkdir -p '/content/drive/My Drive/CICIDS2017/'

!cp MachineLearningCVE.zip '/content/drive/My Drive/CICIDS2017/'

!cp -r 'MachineLearningCVE' '/content/drive/My Drive/CICIDS2017/'

The dataset is now saved in the `CICIDS2017` folder of your Google Drive.