CTC

Datasets used for the paper "A Robust Cybersecurity Topic Classification Tool".

Training and validation data (unprocessed English text) for the Cybersecurity Topic Classification (CTC) tool.

All validation data is natural English text, stored as individual text files.

The training dataset is too large to upload to GitHub. The full training text can be downloaded from Zenodo: https://zenodo.org/records/10655913. The training data is a JSON file containing an array of English text samples.
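As a rough illustration of the training data format described above, the following Python sketch loads a JSON file whose top level is an array of text samples. The function name `load_training_samples` and any file name passed to it are hypothetical and not part of this repository, and the check that every element is a plain string is an assumption about the file rather than something the README states.

```python
import json
from pathlib import Path


def load_training_samples(json_path):
    """Load a JSON file holding an array of text samples and return it as a list.

    Hypothetical helper: the name and the validation rules are assumptions,
    not part of the CTC repository.
    """
    with Path(json_path).open(encoding="utf-8") as handle:
        samples = json.load(handle)

    # The README describes the file as an array of English text samples,
    # so anything other than a top-level list is treated as an error.
    if not isinstance(samples, list):
        raise ValueError("expected a top-level JSON array")

    # Assumption: each sample is a plain string.
    if not all(isinstance(sample, str) for sample in samples):
        raise ValueError("expected every array element to be a string")

    return samples
```

Calling `load_training_samples(path)` with the path of the extracted training file should return a Python list of strings. Note that `json.load` reads the whole file into memory, which may matter given the size of the training set.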

All data is compressed into tar.gz format to save storage space.
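A minimal sketch of reading text files straight out of a tar.gz archive with Python's standard `tarfile` module, without unpacking to disk. The function name `read_texts_from_archive` is hypothetical, and the UTF-8 encoding is an assumption, since the README does not state the encoding or the archive layout.

```python
import tarfile


def read_texts_from_archive(archive_path, encoding="utf-8"):
    """Return a dict mapping each regular file in a tar.gz archive to its text.

    Hypothetical helper: the name and the default encoding are assumptions,
    not part of the CTC repository.
    """
    texts = {}
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            # Skip directories, links, and other non-regular entries.
            if not member.isfile():
                continue
            extracted = archive.extractfile(member)
            if extracted is None:
                continue
            with extracted:
                # Undecodable bytes are replaced rather than raising an error.
                texts[member.name] = extracted.read().decode(encoding, errors="replace")
    return texts
```

Passing the path of one of the downloaded archives should give one dictionary entry per text file, keyed by the file's path inside the archive.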

The top_500_subreddits.txt list was used to crawl a large corpus of Reddit text. Note that this list is certainly out of date now.

Folders and files:

validation_data_cybersecurity
validation_data_non_cybersecurity
Arxiv_Cybersecurity_keywords.txt
English_word_dictionary.txt
README.md
Stack_Exchange_Cybersecurity_tags.txt
cybersecurity_subreddits.txt
stack_exchange_sites.txt
top_500_subreddits.txt

epelofske-student/CTC
