
imdeepmind/Preprocessed-TREC-2007-Public-Corpus-Dataset


Preprocessed TREC 2007 Public Corpus Dataset

A preprocessed version of the TREC 2007 Public Corpus Dataset, suitable for building spam detection models. The original dataset is available at https://plg.uwaterloo.ca/~gvcormac/treccorpus07/about.html; I have only preprocessed the data so that it is easy to use.

The TREC 2007 Public Corpus Dataset is an email spam detection dataset. It contains 50199 spam emails and 25220 ham (not spam) emails.
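The counts above imply an imbalanced dataset: roughly two spam emails for every ham email. The short sketch below works through that arithmetic using only the two numbers stated in this README. The idea of a "majority-class baseline" (always predicting spam) is an added illustration, not something the original dataset defines, but it gives a rough accuracy figure that any real spam model should beat.

```python
# Class counts as stated in this README.
SPAM = 50199
HAM = 25220

total = SPAM + HAM            # 75419 emails in total
spam_share = SPAM / total     # fraction of emails that are spam
ham_share = HAM / total       # fraction of emails that are ham
spam_per_ham = SPAM / HAM     # roughly two spam emails per ham email

# Illustrative baseline (an assumption, not part of the dataset): a model that
# always predicts the larger class is correct max(spam_share, ham_share) of the time.
majority_baseline_accuracy = max(spam_share, ham_share)

if __name__ == "__main__":
    print(f"total emails: {total}")
    print(f"spam share:   {spam_share:.2%}")
    print(f"ham share:    {ham_share:.2%}")
    print(f"spam per ham: {spam_per_ham:.2f}")
    print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.2%}")
```

Because of this imbalance, accuracy alone can look good for a weak model; per-class metrics such as precision and recall on the spam class are usually more informative.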

The dataset consists of a single CSV file with five columns:

  • label: Label of the email: 1 if it is spam, otherwise ham
  • subject: Subject of the email
  • email_to: Recipient of the email
  • email_from: Sender of the email
  • message: Email body
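A minimal sketch of reading the CSV with Python's standard `csv` module and checking the class balance. It assumes the file has a header row with exactly the five column names listed above and that `label` is stored as an integer (1 for spam). The function names `load_trec07` and `label_counts` are hypothetical helpers for illustration, not part of this repository. Since email bodies can be long and may contain commas or line breaks, the sketch relies on the CSV quoting rules and raises the `csv` module's per-field size limit.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List, TextIO

# Column names as listed in this README.
EXPECTED_COLUMNS = ["label", "subject", "email_to", "email_from", "message"]


def load_trec07(stream: TextIO) -> List[dict]:
    """Read the preprocessed CSV from an open text stream (hypothetical helper)."""
    # Email bodies can exceed the csv module's default field limit
    # (131072 characters), so raise it to avoid a csv.Error on long messages.
    csv.field_size_limit(10_000_000)
    reader = csv.DictReader(stream)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        # Assumption: labels are stored as integers; 1 means spam.
        row["label"] = int(row["label"])
        rows.append(row)
    return rows


def label_counts(rows: Iterable[dict]) -> Dict[str, int]:
    """Count spam (label == 1) versus ham (any other label)."""
    counts = Counter("spam" if r["label"] == 1 else "ham" for r in rows)
    return {"spam": counts["spam"], "ham": counts["ham"]}


if __name__ == "__main__":
    # A tiny synthetic two-row sample for illustration only (not real data).
    demo = io.StringIO(
        "label,subject,email_to,email_from,message\n"
        '1,Win now,a@example.com,b@example.com,"Click here, today"\n'
        '0,Meeting,c@example.com,d@example.com,"See you at 10"\n'
    )
    print(label_counts(load_trec07(demo)))  # {'spam': 1, 'ham': 1}
```

For the real file, open it with `open(path, newline="", encoding="utf-8")` as the `csv` documentation recommends; the path and the UTF-8 encoding are assumptions here. On the full dataset, `label_counts` should report the spam and ham totals given above if the label convention holds.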

The processed data can be downloaded from this Kaggle dataset: https://www.kaggle.com/imdeepmind/preprocessed-trec-2007-public-corpus-dataset

Here is the link to the original dataset: https://plg.uwaterloo.ca/~gvcormac/treccorpus07/about.html

Acknowledgments

I am really thankful to the people and sources that provided this amazing dataset.

About

Preprocessed and easy to use TREC 2007 Public Corpus Dataset
