BayesianBookworm: Unraveling Authorship with Bayesian Analysis

Overview

BayesianBookworm is a text analysis tool that uses Bayes' Theorem to estimate the probable author of a literary text. The project currently covers the works of Jane Austen and Charles Dickens.


📚 Current Functionality

Data Foundation

The program analyzes texts from the following novels, located in the Books/ directory:

  • Jane Austen: Emma (em), Pride and Prejudice (pp), Persuasion (pe), Sense and Sensibility (ss)
  • Charles Dickens: Great Expectations (ge), Hard Times (ht), A Tale of Two Cities (tc), Oliver Twist (ot)

📈 Word Frequency Analytics

A dictionary maps each word to its frequency for each author across these novels, and these counts are the basis for authorship prediction:

word_frequencies = {
    "officer": [220, 322]  # Austen: 220, Dickens: 322
}
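As a rough illustration of how a dictionary of this shape could be produced, here is a minimal sketch. The function names `tokenize` and `build_word_frequencies`, the tokenization rule, and the toy input strings are assumptions made for the example; the project's own code may build the table differently, and the real input would be the novels in Books/.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_word_frequencies(austen_texts, dickens_texts):
    """Return {word: [austen_count, dickens_count]} for every word seen."""
    austen = Counter()
    for text in austen_texts:
        austen.update(tokenize(text))
    dickens = Counter()
    for text in dickens_texts:
        dickens.update(tokenize(text))
    # A Counter returns 0 for words it has never seen, so a word used by
    # only one author still gets a two-element entry.
    vocabulary = sorted(set(austen) | set(dickens))
    return {word: [austen[word], dickens[word]] for word in vocabulary}


# Toy strings stand in for the novels (hypothetical data, not the real corpus).
word_frequencies = build_word_frequencies(
    ["The officer spoke to the officer."],
    ["An officer arrived."],
)
print(word_frequencies["officer"])  # [2, 1]
```

Keeping both counts in one list per word matches the layout shown above, with Austen at index 0 and Dickens at index 1.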

🔍 Identifying the Author

The guess.py script employs this frequency data within a Bayesian framework to estimate the author of a given text passage.
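This README does not show the internals of guess.py, so the following is only a sketch of one common way to apply Bayes' Theorem to such counts: a naive Bayes classifier with add-one smoothing, equal priors, and log-probabilities to avoid underflow. The function name `guess_author`, the priors, the smoothing choice, and every count in `toy_frequencies` except "officer" are assumptions for illustration.

```python
import math
import re


def guess_author(passage, word_frequencies, priors=(0.5, 0.5)):
    """Return (most likely author, posterior per author) for a passage.

    word_frequencies maps word -> [austen_count, dickens_count].
    """
    authors = ("Austen", "Dickens")
    vocab_size = len(word_frequencies)
    # Total number of counted words per author.
    totals = [sum(counts[i] for counts in word_frequencies.values())
              for i in range(2)]

    # Start from the log prior and add one log likelihood per known word.
    log_scores = [math.log(p) for p in priors]
    for word in re.findall(r"[a-z']+", passage.lower()):
        if word not in word_frequencies:
            continue  # assumption: out-of-vocabulary words are ignored
        counts = word_frequencies[word]
        for i in range(2):
            # Add-one smoothing keeps a zero count from zeroing the product.
            likelihood = (counts[i] + 1) / (totals[i] + vocab_size)
            log_scores[i] += math.log(likelihood)

    # Convert log scores to normalized posterior probabilities.
    peak = max(log_scores)
    weights = [math.exp(score - peak) for score in log_scores]
    posteriors = [w / sum(weights) for w in weights]
    best = max(range(2), key=lambda i: posteriors[i])
    return authors[best], dict(zip(authors, posteriors))


# Toy table: "officer" is from this README, the other counts are made up.
toy_frequencies = {
    "officer": [220, 322],
    "agreeable": [300, 40],
    "fog": [10, 150],
}
author, posterior = guess_author("The fog hid the officer", toy_frequencies)
print(author, posterior)
```

With these toy counts, "fog" and "officer" are both more frequent for Dickens, so the passage is attributed to Dickens. Longer passages contribute more words and usually give a sharper posterior.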


🔮 Planned Enhancements

  • Incorporating More Authors: Broadening the scope to include various authors for a more comprehensive literary analysis.
  • Enhanced Algorithm Efficiency: Optimizing the processing capabilities for handling larger datasets.
  • User Interface Development: Crafting an intuitive interface for effortless user interaction and result visualization.

BayesianBookworm applies statistical methods to classical literature to surface patterns in authorial style.