# Project Proposal: Visualization of Cybersecurity Intrusion Detection Data

- **Course:** COMP 6934 - Intro to Data Visualization
- **Student Name | Student ID:** Saravanan Ganesan | 202285057
- **Dataset:** CICIDS Collection (Kaggle)  
- **Dataset Link:** [https://www.kaggle.com/datasets/dhoogla/cicidscollection](https://www.kaggle.com/datasets/dhoogla/cicidscollection)  

## **1. Data Set Selection**

### **Dataset Overview:**
The **CICIDS Collection** dataset is a compilation of real-world intrusion detection data collected over multiple years by the Canadian Institute for Cybersecurity (CIC). It includes:
- **CIC-IDS 2017:** Covers real-world attack scenarios such as Brute Force, SQL Injection, and Botnet attacks. ([Canadian Institute for Cybersecurity](https://www.unb.ca/cic/datasets/ids-2017.html))
- **CIC-IDS 2018:** A large-scale dataset spanning multiple days, capturing a variety of network intrusions, including DDoS, Brute Force, Botnet, and Web Attacks. It contains **more than 2 GB of traffic logs across multiple CSV files**. ([Canadian Institute for Cybersecurity](https://www.unb.ca/cic/datasets/ids-2018.html))
- **CIC-DDoS 2019:** Focuses on **Distributed Denial-of-Service (DDoS) attacks**, capturing various forms of network flooding. ([Canadian Institute for Cybersecurity](https://www.unb.ca/cic/datasets/ddos-2019.html))
- **CIC-Bell-DNS 2021:** Targets **malicious DNS tunneling** and domain-based attacks to analyze covert data exfiltration. ([Canadian Institute for Cybersecurity](https://www.unb.ca/cic/datasets/bell-dns-2021.html))

### **What Analysis (Brief Justification for Selection):**
- **Sufficient Data Types & Attributes:**
  - Features include **flow duration, packet length, protocol type, service ports, header length, and attack labels**.
  - The datasets contain diverse network behaviors, making them ideal for **intrusion detection visualization**.
- **Complete Dataset Analysis:**
  - The datasets provide **millions of network flow records**, ensuring rich data for meaningful visualizations.
  - Attack classifications such as **benign vs malicious traffic** enable clear security insights.
- **Sufficient Data Volume for Interesting Visualizations:**
  - Data is **high-dimensional**, supporting multi-attribute analysis across **multiple attack vectors**.
  - The datasets allow for **comparative study** across different years and attack types.
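
As a quick sanity check of the attributes listed above, the class balance between benign and malicious flows can be summarized before any chart is drawn. The following is a minimal sketch on toy data; the column names (`Flow Duration`, `Protocol`, `Label`), the `BENIGN` label value, and the helper `label_summary` are assumptions for illustration and may not match the actual files in the collection.

```python
import pandas as pd

# Toy flow records. Column names and label values are assumptions,
# not taken from the real dataset files.
flows = pd.DataFrame({
    "Flow Duration": [1200, 35, 980, 15],
    "Protocol": [6, 17, 6, 17],
    "Label": ["BENIGN", "DDoS", "BENIGN", "Botnet"],
})

def label_summary(df, label_col="Label", benign="BENIGN"):
    """Return per-label flow counts and the fraction of non-benign flows."""
    counts = df[label_col].value_counts()
    malicious_share = float((df[label_col] != benign).mean())
    return counts, malicious_share

counts, malicious_share = label_summary(flows)
print(counts)
print(f"Malicious share: {malicious_share:.0%}")
```

A strongly imbalanced result here would suggest using a logarithmic color or size scale in the later designs so that rare attack types remain visible.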

## **2. Client Goals & Three Questions**
### **Client Goals & Munzner-Style Why Analysis:**
Since I am not working with an external client, I will act as the client and define relevant cybersecurity objectives:

1. **How do DDoS and other intrusion patterns evolve over time?**
   - **Why?** Understanding how attacks develop over months/years helps security teams improve detection.
   - **Potential Actions:** **Time-series visualizations** to highlight attack frequency trends.

2. **Which network protocols and services are most targeted?**
   - **Why?** Identifying the most vulnerable areas helps optimize firewalls and IDS configurations.
   - **Potential Actions:** A **radar chart** visualizing attack frequency by protocol type.

3. **Can we detect relationships between different attack vectors?**
   - **Why?** Identifying correlations between attack types can improve predictive security measures.
   - **Potential Actions:** A **network graph** showing attack relationships across different datasets.

## **3. Initial Ideas for Visualization Designs**

Each visualization is designed to demonstrate **creativity, multiple data attributes, and novel design approaches** beyond simple charts.

### **3.1. Cyber Attack Heatmap**
- **Purpose:** Visualize attack intensity over time.
- **Design:** Grid-based heatmap where each cell represents attack frequency at a given time interval.
- **How Analysis:**
  - Uses **flow timestamps** (binned into time intervals) and **attack labels**.
  - **Marks:** Area marks, one grid cell per time interval and attack type.
  - **Channels:** Time on the X-axis, attack type on the Y-axis, and color intensity for attack frequency.
- **Creativity:** Enhances traditional heatmaps with **streaming-style updates** that replay the recorded flows in time order.
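
The grid behind this heatmap could be built roughly as follows. This is a sketch under stated assumptions: the toy records, the column names `Timestamp` and `Label`, and the helpers `attack_frequency_grid` and `plot_heatmap` are hypothetical, and the real files may need a different timestamp column or parsing step.

```python
import pandas as pd

# Toy flow records. "Timestamp" and "Label" are assumed column names.
flows = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2017-07-05 09:05", "2017-07-05 09:40", "2017-07-05 10:15",
        "2017-07-05 10:20", "2017-07-05 10:55", "2017-07-05 11:30",
    ]),
    "Label": ["DDoS", "DDoS", "Brute Force", "DDoS", "Brute Force", "Botnet"],
})

def attack_frequency_grid(df, freq="60min"):
    """Count flows per (attack label, time bin): rows are labels, columns are bins."""
    bins = df["Timestamp"].dt.floor(freq)
    return pd.crosstab(df["Label"], bins)

def plot_heatmap(grid):
    """Draw the grid as a heatmap; color intensity encodes flow count."""
    import matplotlib.pyplot as plt  # imported lazily so the aggregation runs without it
    fig, ax = plt.subplots()
    image = ax.imshow(grid.to_numpy(), aspect="auto", cmap="Reds")
    ax.set_xticks(range(grid.shape[1]))
    ax.set_xticklabels([t.strftime("%H:%M") for t in grid.columns])
    ax.set_yticks(range(grid.shape[0]))
    ax.set_yticklabels(grid.index)
    fig.colorbar(image, ax=ax, label="Flows per interval")
    return fig

grid = attack_frequency_grid(flows)
print(grid)
```

Calling `plot_heatmap(grid)` renders the figure. Changing `freq` (for example to `"15min"`) trades temporal detail against sparser cells.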

### **3.2. Intrusion Constellation (Network Graph)**
- **Purpose:** Show relationships between attacker and target network nodes.
- **Design:**
  - Nodes = Source & Destination IPs
  - Edges = Attack occurrences
  - Color = Traffic class (benign vs. malicious) or attack type
- **How Analysis:**
  - **Marks:** Nodes and edges.
  - **Channels:** Edge thickness for attack frequency, node size for severity.
- **Creativity:** Converts **static logs into an interactive visualization** using **D3.js or NetworkX**.
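
One possible way to derive the edges for this graph is to count malicious flows per source/destination pair and store the count as an edge weight. The sketch below assumes the records expose source IP, destination IP, and a label; the toy addresses, the `BENIGN` label value, and the helpers `attack_edges` and `build_graph` are illustrative only.

```python
from collections import Counter

# Toy records: (source IP, destination IP, label). Addresses are made-up
# documentation-range values, not taken from the dataset.
records = [
    ("192.0.2.10", "198.51.100.5", "DDoS"),
    ("192.0.2.10", "198.51.100.5", "DDoS"),
    ("192.0.2.11", "198.51.100.5", "Brute Force"),
    ("192.0.2.12", "198.51.100.7", "BENIGN"),
]

def attack_edges(rows, benign_label="BENIGN"):
    """Count malicious flows per (source, destination) pair."""
    return Counter((src, dst) for src, dst, label in rows if label != benign_label)

def build_graph(edge_counts):
    """Turn the pair counts into a directed NetworkX graph with weighted edges."""
    import networkx as nx  # imported lazily so the counting step runs without it
    graph = nx.DiGraph()
    for (src, dst), count in edge_counts.items():
        graph.add_edge(src, dst, weight=count)
    return graph

edges = attack_edges(records)
print(edges)
```

The `weight` attribute produced by `build_graph(edges)` can then drive edge thickness when the graph is drawn, matching the channel choice above.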

### **3.3. Network Traffic Radar Chart**
- **Purpose:** Identify the most attacked protocols.
- **Design:** Radar chart where each axis represents a **protocol (TCP, UDP, HTTP, DNS, etc.)**.
- **How Analysis:**
  - **Marks:** A line (or filled area) connecting one point per protocol axis.
  - **Channels:** Radial distance along each protocol axis shows attack volume, optionally reinforced by color intensity.
- **Creativity:** This radar chart allows for easy **comparative security analysis**.
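
A radar chart needs one angle per protocol axis and a polygon that closes back on its first point. The sketch below shows that preparation step and an optional Matplotlib rendering; the attack counts are invented for illustration, and `radar_polygon` and `plot_radar` are hypothetical helper names.

```python
import math

# Hypothetical attack counts per protocol (not measured from the dataset).
attack_counts = {"TCP": 120, "UDP": 80, "HTTP": 45, "DNS": 30}

def radar_polygon(counts):
    """Return axis labels plus closed angle and value lists for a radar chart."""
    labels = list(counts)
    step = 2 * math.pi / len(labels)
    angles = [i * step for i in range(len(labels))]
    values = [counts[name] for name in labels]
    # Repeat the first point so the outline closes into a polygon.
    return labels, angles + angles[:1], values + values[:1]

def plot_radar(counts):
    """Draw the closed polygon on polar axes, one spoke per protocol."""
    import matplotlib.pyplot as plt  # imported lazily so the geometry runs without it
    labels, angles, values = radar_polygon(counts)
    fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
    ax.plot(angles, values)
    ax.fill(angles, values, alpha=0.25)
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(labels)
    return fig

labels, angles, values = radar_polygon(attack_counts)
print(labels, values)
```

Calling `plot_radar(attack_counts)` once per dataset or attack type and overlaying the polygons would support the comparative reading described above.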

## **4. Conclusion**
This project aims to provide a comprehensive visual exploration of cybersecurity threats using real-world intrusion detection datasets. Through **heatmaps, network graphs, and radar charts**, I will:
- **Enhance cybersecurity threat detection.**
- **Deliver intuitive, data-driven security insights.**
- **Push the boundaries of conventional cybersecurity visualizations.**

## **5. Attributions**
| Component | Source |
|-----------|--------|
| Dataset | Kaggle - CICIDS Collection ([https://www.kaggle.com/datasets/dhoogla/cicidscollection](https://www.kaggle.com/datasets/dhoogla/cicidscollection)) |
| Visualization Techniques | Inspired by cybersecurity reports and real-world security dashboards |
| Design Methodology | Munzner-style **Why-What-How** framework |
| Libraries To Be Used | Pandas, Matplotlib, Seaborn, NetworkX, Plotly |