Research-Projects

I have curated a list of datasets for malware analysis and threat intelligence.

Dataset	Description	Link
APT Malware Dataset	This dataset contains over 3,500 malware samples related to 12 APT groups that are allegedly sponsored by 5 different nation-states.	https://github.com/cyber-research/APTMalware
Malware sample library	This dataset contains 26 categories of malware samples	https://github.com/mstfknn/malware-sample-library
Malware-samples	A collection of 10+ classes of malware samples caught by several honeypots	https://github.com/fabrimagic72/malware-samples
Malware-API-class	Public malware dataset for cybersecurity researchers, generated with Cuckoo Sandbox from analysis of Windows OS API calls	https://github.com/ocatak-zz/malware_api_class
MalWAReX	A collection of RAT (Remote Access Trojan) malware targeted at computer networks	https://github.com/0x48piraj/MalWAReX
Malicious URL	The data set consists of about 2.4 million URLs (examples) and 3.2 million features	http://www.sysnet.ucsd.edu/projects/url/
Malicious URL	A dataset of 651,191 URLs, of which 428,103 are benign (safe), 96,457 are defacement URLs, 94,111 are phishing URLs, and 32,520 are malware URLs	https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset
UNSW-NB15 Dataset	This dataset covers nine attack families: Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. The Argus and Bro-IDS tools are used, and twelve algorithms are developed, to generate a total of 49 features with the class label	https://research.unsw.edu.au/projects/unsw-nb15-dataset
Microsoft Malware Classification Challenge	A set of known malware files representing a mix of 9 different families. Each malware file has an Id (a 20-character hash value uniquely identifying the file) and a Class (an integer representing one of the 9 family names to which the malware may belong)	https://www.kaggle.com/competitions/malware-classification/data
CIC-MalMem-2022	A balanced dataset of 58,596 records: 29,298 benign memory dumps (50%) and 29,298 malicious memory dumps (50%)	https://www.unb.ca/cic/datasets/malmem-2022.html

The data in the sources above may be messy and contain duplicates. To address this, I have used these sources to curate a cleaned-up dataset that represents all of the malware classes.
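
As a rough illustration of the duplicate problem, here is a minimal sketch of one way to drop exact duplicates, assuming the samples have been downloaded as files under a local directory. The function names and the `samples/` path are hypothetical and not part of this repository, and hashing only catches byte-identical copies, not repacked or slightly modified variants. The code only reads file bytes; it never executes a sample.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def deduplicate(root):
    """Map each distinct SHA-256 digest to the first file found with it.

    Files are visited in sorted path order, so the result is deterministic.
    """
    unique = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # setdefault keeps the first path seen for a given digest
            unique.setdefault(sha256_of(path), path)
    return unique
```

Calling `deduplicate("samples/")` would return one path per distinct file content; comparing its length with the total file count gives the number of exact duplicates.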
