# Stylometric Analysis of E-mail Content for Author Identification

Identifying the author of an e-mail message is increasingly important because e-mail is more and more often used for criminal purposes, and anonymous e-mails are used to publicize critical information, potentially with catastrophic consequences. An author's unique writing style can be reduced to a characteristic pattern by measuring various stylometric features of the written text. This project predicts the author of an arbitrary e-mail from a customized set of writing-style features. An enhanced feature set, covering lexical, syntactic, content-specific and idiosyncratic attributes as well as syntactic n-grams, is used to build a cross-genre and cross-topic author identification system with four classifiers: Support Vector Machines, Gradient Boosting, Random Forest and a Voting classifier.
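To illustrate what "measuring stylometric features" can look like, here is a minimal sketch in standard-library Python. It is not the project's code: the function name `stylometric_features`, the specific measurements and the short `FUNCTION_WORDS` list are hypothetical choices made for illustration. It covers only a few lexical and character-level (idiosyncratic) features; syntactic n-grams would additionally require a parser and are left out.

```python
import re
from collections import Counter

# Hypothetical, illustrative list of function words; the project's actual
# feature set is not specified in this README.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "i", "you")

# A "word" here is simply a run of letters or apostrophes (an assumption).
WORD_RE = re.compile(r"[A-Za-z']+")


def stylometric_features(text: str) -> dict[str, float]:
    """Return a small dictionary of lexical and character-level style measurements."""
    words = WORD_RE.findall(text)
    lower = [w.lower() for w in words]
    n_words = len(words)
    n_chars = len(text)
    # Split sentences on runs of terminal punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    features = {
        # Lexical features
        "avg_word_len": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "type_token_ratio": len(set(lower)) / n_words if n_words else 0.0,
        "avg_sentence_len": n_words / len(sentences) if sentences else 0.0,
        # Character-level / idiosyncratic features
        "uppercase_ratio": sum(c.isupper() for c in text) / n_chars if n_chars else 0.0,
        "digit_ratio": sum(c.isdigit() for c in text) / n_chars if n_chars else 0.0,
        "exclamations_per_word": text.count("!") / n_words if n_words else 0.0,
    }
    # Relative frequency of each function word, a common topic-independent style cue.
    counts = Counter(lower)
    for fw in FUNCTION_WORDS:
        features[f"fw_{fw}"] = counts[fw] / n_words if n_words else 0.0
    return features


if __name__ == "__main__":
    email = "Hi Bob! The report is late. I will send it to you tomorrow!!"
    for name, value in stylometric_features(email).items():
        print(f"{name}: {value:.3f}")
```

Each e-mail becomes one feature dictionary. If scikit-learn were used (the README does not say which library the project relies on), a list of such dictionaries could be turned into a matrix with `DictVectorizer` and passed to `SVC`, `GradientBoostingClassifier`, `RandomForestClassifier`, or a `VotingClassifier` that combines them.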