- Import a list of strings from the existing documents
- Parse the text into token vectors
- Train the Bayes classifier with these token vectors
- Use the classifier to determine whether an e-mail is spam
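
The parsing step above can be sketched roughly as follows. This is a minimal illustration, not the notes' own code: the helper names `text_parse`, `create_vocab_list`, and `set_of_words_to_vec`, the rule of dropping tokens of two characters or fewer, and the two sample strings are all assumptions made for the example.

```python
import re


def text_parse(text):
    """Split raw text on non-word characters and lowercase the tokens.

    Dropping tokens of length <= 2 is an assumed heuristic for this sketch.
    """
    tokens = re.split(r"\W+", text)
    return [tok.lower() for tok in tokens if len(tok) > 2]


def create_vocab_list(token_lists):
    """Collect every distinct token across all documents, in sorted order."""
    vocab = set()
    for tokens in token_lists:
        vocab.update(tokens)
    return sorted(vocab)


def set_of_words_to_vec(vocab_list, tokens):
    """Return a 0/1 vector marking which vocabulary words occur in tokens."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocab_list]


# Hypothetical sample documents, used only to exercise the helpers.
docs = ["Win a FREE prize now!!!", "Meeting moved to Monday morning"]
token_lists = [text_parse(d) for d in docs]
vocab = create_vocab_list(token_lists)
vectors = [set_of_words_to_vec(vocab, t) for t in token_lists]

for doc, vec in zip(docs, vectors):
    print(doc, "->", vec)
```

Each document becomes a vector as long as the vocabulary, so every document can be fed to the classifier in the same shape regardless of its length.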
Example: using naïve Bayes to classify email
Collect: Text files provided.
Prepare: Parse text into token vectors.
Analyze: Inspect the tokens to make sure parsing was done correctly.
Train: Use trainNB0() that we created earlier.
Test: Use classifyNB() and create a new testing function to calculate the error rate over a set of documents.
Use: Build a complete program that will classify a group of documents and print misclassified documents to the screen.
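
The Train, Test, and Use steps above might look something like the sketch below. It is self-contained and makes several assumptions: `train_nb`, `classify_nb`, and `holdout_error_rate` are illustrative stand-ins (not the `trainNB0()` and `classifyNB()` functions referenced above), the smoothing scheme is a simple add-one choice, and the six tiny word vectors and their labels are invented for the example.

```python
import math


def train_nb(train_matrix, train_labels):
    """Estimate log P(word | class) for both classes and the spam prior.

    Counts start at 1 and denominators at 2 (an assumed smoothing choice)
    so that no probability is zero; the prior is smoothed the same way.
    """
    num_words = len(train_matrix[0])
    p_spam = (sum(train_labels) + 1) / (len(train_labels) + 2)
    counts = {0: [1.0] * num_words, 1: [1.0] * num_words}
    totals = {0: 2.0, 1: 2.0}
    for vec, label in zip(train_matrix, train_labels):
        for i, x in enumerate(vec):
            counts[label][i] += x
        totals[label] += sum(vec)
    log_p0 = [math.log(c / totals[0]) for c in counts[0]]
    log_p1 = [math.log(c / totals[1]) for c in counts[1]]
    return log_p0, log_p1, p_spam


def classify_nb(vec, log_p0, log_p1, p_spam):
    """Return 1 (spam) or 0 (not spam) by comparing log-posterior scores."""
    score1 = sum(x * lp for x, lp in zip(vec, log_p1)) + math.log(p_spam)
    score0 = sum(x * lp for x, lp in zip(vec, log_p0)) + math.log(1.0 - p_spam)
    return 1 if score1 > score0 else 0


def holdout_error_rate(matrix, labels, test_idx):
    """Train on all rows except test_idx, then report the error rate and
    the indices of the misclassified test documents."""
    held_out = set(test_idx)
    train_x = [v for i, v in enumerate(matrix) if i not in held_out]
    train_y = [y for i, y in enumerate(labels) if i not in held_out]
    model = train_nb(train_x, train_y)
    errors = [i for i in test_idx if classify_nb(matrix[i], *model) != labels[i]]
    return len(errors) / len(test_idx), errors


# Invented data: columns stand for the words free, meeting, prize, report.
matrix = [
    [1, 0, 1, 0],  # spam
    [1, 0, 0, 0],  # spam
    [0, 1, 0, 1],  # not spam
    [0, 1, 0, 0],  # not spam
    [1, 0, 1, 0],  # spam (held out)
    [0, 0, 0, 1],  # not spam (held out)
]
labels = [1, 1, 0, 0, 1, 0]

rate, misclassified = holdout_error_rate(matrix, labels, test_idx=[4, 5])
print("error rate:", rate)
for i in misclassified:
    print("misclassified document index:", i, "vector:", matrix[i])
```

Working in log space keeps the product of many small probabilities from underflowing. In practice the held-out indices would usually be chosen at random and the error rate averaged over several splits.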