This dataset serves as a benchmark for evaluating the performance and efficiency of anomaly detectors in east-west data center network traffic. Detailed information about the benchmark can be found in our NetVigil paper.
The dataset includes 13 distinct scenarios, each designated as an attack or a normal operation. For each scenario, a web-based e-commerce application is utilized to generate normal traffic patterns. Simultaneously, attacks are carried out using one or more compromised, malicious nodes. Traffic traces, sourced from NSG flow logs, are processed and converted into CSV files containing only the relevant properties.
The dataset is available for download from Azure Blob Storage via this link. For efficient data transfer, we suggest using `wget -i Dataset.txt` to download the dataset. The dataset is compressed in `.tar.gz` format, and each file represents a distinct scenario.
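Once the archives are downloaded, they can be unpacked in one pass. The following is a minimal sketch, assuming all `.tar.gz` files sit in a single local directory; the function name `extract_scenarios` and both directory arguments are hypothetical and not part of the dataset tooling.

```python
import tarfile
from pathlib import Path


def extract_scenarios(archive_dir, out_dir):
    """Extract every .tar.gz archive found in archive_dir into out_dir.

    Returns the names of the archives that were extracted, in sorted order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where the interpreter supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    extracted = []
    for archive in sorted(Path(archive_dir).glob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(out, **kwargs)
        extracted.append(archive.name)
    return extracted
```

For example, `extract_scenarios("downloads", "scenarios")` would unpack each archive under `scenarios/`, where both paths are placeholders for your own layout.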
After decompression, each folder corresponds to either a normal or an attack scenario and includes two files: `nsg.csv` and `label.csv`. The schema for these files is as follows:
`nsg.csv`:

- `time`: Time in UTC when the event was logged.
- `Source IP`: Source IP address.
- `Destination IP`: Destination IP address.
- `Source port`: Source port.
- `Destination port`: Destination port.
- `Protocol`: Protocol of the flow. Valid values are `T` for TCP and `U` for UDP.
- `Traffic flow`: Direction of the traffic flow. Valid values are `I` for inbound and `O` for outbound.
- `Traffic decision`: Whether traffic was allowed or denied. Valid values are `A` for allowed and `D` for denied.
- `Flow State`: State of the flow. Possible states are:
  - `B`: Begin, when a flow is created. Statistics aren't provided.
  - `C`: Continuing for an ongoing flow. Statistics are provided at 5-minute intervals.
  - `E`: End, when a flow is terminated. Statistics are provided.
- `Packets sent`: Total number of TCP packets sent from source to destination since the last update.
- `Bytes sent`: Total number of TCP packet bytes sent from source to destination since the last update. Packet bytes include the packet header and payload.
- `Packets received`: Total number of TCP packets sent from destination to source since the last update.
- `Bytes received`: Total number of TCP packet bytes sent from destination to source since the last update. Packet bytes include the packet header and payload.
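Because the packet and byte counters are reported "since the last update", the total for a flow is the sum over all of its records. The sketch below illustrates this with Python's standard `csv` module. It assumes that `nsg.csv` has a header row whose column names match the schema exactly and that `B` records leave the statistics cells empty; neither assumption is confirmed here, so check them against the actual files. The sample rows, addresses, and timestamp format are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Single-letter protocol codes documented in the schema.
PROTOCOLS = {"T": "TCP", "U": "UDP"}


def to_int(cell):
    # Assumption: Begin (B) records have empty statistics cells; count them as 0.
    return int(cell) if cell else 0


def bytes_per_flow(csv_file):
    """Sum bytes sent and received per flow 5-tuple across all of its records."""
    totals = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(csv_file):
        key = (
            row["Source IP"],
            row["Source port"],
            row["Destination IP"],
            row["Destination port"],
            PROTOCOLS[row["Protocol"]],
        )
        totals[key][0] += to_int(row["Bytes sent"])
        totals[key][1] += to_int(row["Bytes received"])
    return {key: tuple(counts) for key, counts in totals.items()}


# Made-up rows: one flow with a Begin, a Continuing, and an End record.
SAMPLE = """\
time,Source IP,Destination IP,Source port,Destination port,Protocol,Traffic flow,Traffic decision,Flow State,Packets sent,Bytes sent,Packets received,Bytes received
2023-01-01T00:00:00Z,10.0.0.4,10.0.0.5,50000,443,T,O,A,B,,,,
2023-01-01T00:05:00Z,10.0.0.4,10.0.0.5,50000,443,T,O,A,C,10,1200,8,900
2023-01-01T00:07:00Z,10.0.0.4,10.0.0.5,50000,443,T,O,A,E,2,300,1,100
"""

flows = bytes_per_flow(io.StringIO(SAMPLE))
```

With the sample above, the single flow ends up with 1500 bytes sent and 1000 bytes received. To read a real file, pass an open file handle instead of the `io.StringIO` wrapper.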
`label.csv`:

- `Source IP`: Source IP address.
- `Destination IP`: Destination IP address.
- `time`: The starting time of each 2-minute window.
- `label`: A `0` indicates normal operation for this IP pair during the 2-minute window, while a `1` denotes the presence of an attack.
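To attach a label to an individual flow record, one option is to look up the 2-minute window that contains the record's timestamp for the same IP pair. The sketch below shows that lookup. It assumes timestamps have already been parsed into `datetime` objects (the on-disk time format is not specified here), that each window is half-open, covering `[start, start + 2 minutes)`, and that the IP pair is matched in the same source/destination order as in `label.csv`. The helper names `build_label_index` and `label_for` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=2)  # window length stated in the label schema


def build_label_index(labels):
    """Index labels by IP pair.

    labels: iterable of (source_ip, destination_ip, window_start, label).
    Returns {(source_ip, destination_ip): (sorted starts, matching labels)}.
    """
    grouped = defaultdict(list)
    for src, dst, start, label in labels:
        grouped[(src, dst)].append((start, int(label)))
    index = {}
    for pair, windows in grouped.items():
        windows.sort()
        index[pair] = ([s for s, _ in windows], [l for _, l in windows])
    return index


def label_for(index, src, dst, when):
    """Return the label of the window containing `when`, or None if none does."""
    starts, labels = index.get((src, dst), ([], []))
    pos = bisect_right(starts, when) - 1  # last window starting at or before `when`
    if pos < 0 or when >= starts[pos] + WINDOW:
        return None
    return labels[pos]
```

Returning `None` for uncovered timestamps keeps missing labels distinct from a `0` label, which matters if some IP pairs or windows are absent from `label.csv`.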
Attack | Description | # flows | Ratio malicious |
---|---|---|---|
Vertical Port Scan | Run an exhaustive scan of open ports | 1429 | 0.0265 |
SYN Flood DoS attack | DoS attack where connections are rapidly initialized but not completed | 2817 | 0.0184 |
SYN Flood DDoS | DoS attack where connections are rapidly initialized but not completed (multiple attackers) | 2437 | 0.0439 |
UDP DDoS | DoS attack with UDP packets (multiple attackers) | 1473 | 0.0081 |
Distributed Stealth Port Scan | Run a targeted stealth scan of several key ports across many nodes with SYN packets | 4069 | 0.0058 |
Distributed Port Scan | Run a targeted scan of several key ports across many nodes | 4054 | 0.0051 |
Distributed UDP Port Scan | Run a targeted stealth scan of several key ports across many nodes with UDP packets | 4319 | 0.0050 |
Infection Monkey 1 | Scans key ports and launches network exploits | 2768 | 0.0122 |
Infection Monkey 2 | Scans key ports and launches network exploits (target limited number of hosts) | 1490 | 0.0107 |
Infection Monkey 3 | Scans key ports and launches network exploits (mount limited number of exploits) | 4677 | 0.0027 |
C&C communication | Compromised nodes receive commands, heartbeats, and file updates from C&C server | 2163 | 0.0254 |
DNS amplification | Attackers send DNS requests and direct responses to victim | 4410 | 0.0825 |
If you use our benchmark in your work, we would appreciate a reference to the following paper:
Kevin Hsieh, Mike Wong, Santiago Segarra, Sathiya Kumaran Mani, Trevor Eberl, Anatoliy Panasyuk, Ravi Netravali, Ranveer Chandra, and Srikanth Kandula. NetVigil: Robust and Low-Cost Anomaly Detection for East-West Data Center Security. USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2024.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.