BayesianBookworm: Unraveling Authorship with Bayesian Analysis

Overview

BayesianBookworm is a text analysis tool that applies Bayes' Theorem to estimate the probable author of a literary text. The project currently covers the works of Jane Austen and Charles Dickens.

📚 Current Functionality

Data Foundation

The program analyzes texts from the following novels, located in the Books/ directory:

Jane Austen: Emma (em), Pride and Prejudice (pp), Persuasion (pe), Sense and Sensibility (ss)
Charles Dickens: Great Expectations (ge), Hard Times (ht), A Tale of Two Cities (tc), Oliver Twist (ot)

📈 Word Frequency Analytics

A dictionary maps each word to its frequency counts across these novels, one count per author, and serves as the basis for authorship prediction:

word_frequencies = {
    "officer": [220, 322]  # Austen: 220, Dickens: 322
}
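As a rough illustration of how a table in this shape could be built, here is a minimal sketch. The function names `tokenize` and `build_word_frequencies` are hypothetical and are not taken from the project, and the tokenization rule (lowercased runs of letters and apostrophes) is an assumption. The texts are passed in as strings, so reading the files from Books/ is left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def build_word_frequencies(austen_texts, dickens_texts):
    """Return {word: [austen_count, dickens_count]} for every word seen.

    Hypothetical helper: the project's own construction may differ.
    """
    austen_counts = Counter()
    for text in austen_texts:
        austen_counts.update(tokenize(text))

    dickens_counts = Counter()
    for text in dickens_texts:
        dickens_counts.update(tokenize(text))

    # A Counter returns 0 for a missing key, so words used by only one
    # author still get a two-element entry.
    vocabulary = set(austen_counts) | set(dickens_counts)
    return {word: [austen_counts[word], dickens_counts[word]] for word in vocabulary}
```

Each value keeps the Austen count first and the Dickens count second, matching the layout of the "officer" entry above.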

🔍 Identifying the Author

The guess.py script employs this frequency data within a Bayesian framework to estimate the author of a given text passage.
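The README does not show the contents of guess.py, so the following is only a sketch of one common way to apply Bayes' Theorem to such frequency data: a naive Bayes classifier with add-one (Laplace) smoothing, computed in log space. The function name `guess_author`, the equal default priors, the smoothing, and the choice to skip words missing from the table are all assumptions, not the project's confirmed method.

```python
import math
import re

AUTHORS = ("Austen", "Dickens")


def guess_author(passage, word_frequencies, priors=(0.5, 0.5)):
    """Return {author: posterior probability} for a passage.

    Hypothetical sketch: naive Bayes over a table shaped like
    {word: [austen_count, dickens_count]}. Priors must be positive.
    """
    words = re.findall(r"[a-z']+", passage.lower())
    vocab_size = len(word_frequencies)
    # Total number of word occurrences recorded for each author.
    totals = [sum(counts[i] for counts in word_frequencies.values()) for i in range(2)]

    # Work in log space so long passages do not underflow to zero.
    log_scores = [math.log(p) for p in priors]
    for word in words:
        counts = word_frequencies.get(word)
        if counts is None:
            continue  # assumption: words absent from the table carry no evidence
        for i in range(2):
            # Add-one smoothing keeps a zero count from eliminating an author.
            log_scores[i] += math.log((counts[i] + 1) / (totals[i] + vocab_size))

    # Convert log scores back to probabilities that sum to 1.
    peak = max(log_scores)
    weights = [math.exp(score - peak) for score in log_scores]
    norm = sum(weights)
    return {author: weight / norm for author, weight in zip(AUTHORS, weights)}
```

Calling `guess_author(passage, word_frequencies)` returns a probability for each author. The treatment of each word as independent evidence is the "naive" assumption in naive Bayes, and a table with many entries is needed before the result says much.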

🔮 Planned Enhancements

Incorporating More Authors: Broadening the scope to include various authors for a more comprehensive literary analysis.
Enhanced Algorithm Efficiency: Optimizing the processing capabilities for handling larger datasets.
User Interface Development: Crafting an intuitive interface for effortless user interaction and result visualization.

BayesianBookworm applies statistical methods to classic literature to surface patterns in authorial style.
