This repository contains my previous and current research projects, including datasets, code, and papers where applicable.


regchukwuka/Datasets

Research-Projects

I have curated a comprehensive list of datasets for malware analysis and threat intelligence.

| Dataset | Description | Link |
| --- | --- | --- |
| APT Malware Dataset | Contains over 3,500 malware samples related to 12 APT groups that are allegedly sponsored by 5 different nation-states. | https://github.com/cyber-research/APTMalware |
| Malware sample library | Contains 26 categories of malware samples. | https://github.com/mstfknn/malware-sample-library |
| Malware-samples | A collection of more than 10 classes of malware samples caught by several honeypots. | https://github.com/fabrimagic72/malware-samples |
| Malware-API-class | Public malware dataset generated by Cuckoo Sandbox, based on analysis of Windows OS API calls, for cyber security researchers. | https://github.com/ocatak-zz/malware_api_class |
| MalWAReX | A collection of RAT (Remote Access Trojan) malware targeting computer networks. | https://github.com/0x48piraj/MalWAReX |
| Malicious URL | Consists of about 2.4 million URLs (examples) and 3.2 million features. | http://www.sysnet.ucsd.edu/projects/url/ |
| Malicious URL | A dataset of 651,191 URLs, of which 428,103 are benign or safe, 96,457 are defacement URLs, 94,111 are phishing URLs, and 32,520 are malware URLs. | https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset |
| UNSW-NB15 Dataset | Covers nine families of attacks: Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms. The Argus and Bro-IDS tools are used, and twelve algorithms are developed, to generate a total of 49 features with the class label. | https://research.unsw.edu.au/projects/unsw-nb15-dataset |
| Microsoft Malware Classification Challenge | A set of known malware files representing a mix of 9 different families. Each malware file has an Id (a 20-character hash value uniquely identifying the file) and a Class (an integer representing one of the 9 family names to which the malware may belong). | https://www.kaggle.com/competitions/malware-classification/data |
| CIC-MalMem-2022 | A balanced dataset made up of 50% malicious and 50% benign memory dumps. It contains a total of 58,596 records: 29,298 benign and 29,298 malicious. | https://www.unb.ca/cic/datasets/malmem-2022.html |

The data in the sources above may be messy and contain duplicates. To address this, I have used these sources to curate a cleaned-up dataset that represents all the malware classes.
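The curation procedure itself is not described here, so the following is only a minimal sketch of one common way to find duplicate samples across several sources: hashing each file's contents with SHA-256 and keeping the first file seen per hash. The function names `sha256_of_file` and `deduplicate_samples`, and the idea that each source has been downloaded into its own local directory, are assumptions made for illustration and are not part of this repository.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large samples are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def deduplicate_samples(root_dirs):
    """Group files under root_dirs by content hash (hypothetical helper).

    Returns (unique, duplicates): `unique` maps each hash to the first
    path seen with that content; `duplicates` is a list of
    (duplicate_path, original_path) pairs.
    """
    unique = {}
    duplicates = []
    for root in root_dirs:
        # Sort for a deterministic choice of which copy counts as "first".
        for path in sorted(Path(root).rglob("*")):
            if not path.is_file():
                continue
            file_hash = sha256_of_file(path)
            if file_hash in unique:
                duplicates.append((path, unique[file_hash]))
            else:
                unique[file_hash] = path
    return unique, duplicates
```

Calling `deduplicate_samples(["APTMalware", "malware-samples"])` on two locally cloned sources (directory names assumed) would return the unique files and the duplicate pairs without modifying anything on disk. Only byte-identical files are caught this way; repacked or slightly modified variants of the same malware would need a fuzzy-hashing or feature-based comparison instead. Live samples should only be handled in an isolated analysis environment.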
