📚📊 BayesianBookworm: A text analysis tool utilizing Bayesian inference to attribute authorship of literary works, initially focusing on the distinctive styles of Austen and Dickens.

jcgonzalez25/BayesianBookworm

BayesianBookworm: Unraveling Authorship with Bayesian Analysis

Overview

BayesianBookworm is a text analysis tool that applies Bayes' Theorem to estimate the probable author of a literary text. It currently distinguishes between the works of two authors, Jane Austen and Charles Dickens.


📚 Current Functionality

Data Foundation

The program analyzes texts from the following novels, located in the Books/ directory:

  • Jane Austen: Emma (em), Pride and Prejudice (pp), Persuasion (pe), Sense and Sensibility (ss)
  • Charles Dickens: Great Expectations (ge), Hard Times (ht), A Tale of Two Cities (tc), Oliver Twist (ot)

📈 Word Frequency Analytics

A dictionary maps each word to its frequency count per author across these novels. These counts are the basis for authorship prediction:

word_frequencies = {
    "officer": [220, 322]  # Austen: 220, Dickens: 322
}
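As a rough illustration of how such a dictionary could be built, the sketch below counts word tokens in two bodies of text and pairs the counts in the `[Austen, Dickens]` order shown above. The function names `count_words` and `build_word_frequencies` and the simple regex tokenizer are assumptions made for this example; the project's actual preprocessing is not shown here.

```python
import re
from collections import Counter


def count_words(text):
    """Lowercase the text and count word tokens (letters and apostrophes)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def build_word_frequencies(austen_text, dickens_text):
    """Return {word: [austen_count, dickens_count]} for every word seen.

    Hypothetical helper: assumes each author's novels have already been
    read from disk and concatenated into a single string.
    """
    austen = count_words(austen_text)
    dickens = count_words(dickens_text)
    vocabulary = sorted(set(austen) | set(dickens))
    # Counter returns 0 for words an author never used.
    return {word: [austen[word], dickens[word]] for word in vocabulary}


if __name__ == "__main__":
    sample = build_word_frequencies(
        "The officer spoke.",
        "An officer and another officer.",
    )
    print(sample["officer"])
```

Keeping both counts in one list per word means a single lookup yields everything needed to compare the two authors for that word.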

🔍 Identifying the Author

The guess.py script employs this frequency data within a Bayesian framework to estimate the author of a given text passage.
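The contents of guess.py are not reproduced here, so the following is only a minimal sketch of one common way to apply Bayes' Theorem to word counts: a naive Bayes classifier with equal priors and add-one (Laplace) smoothing, computed in log space. The function name `guess_author`, the priors, the smoothing choice, and the tokenizer are all assumptions for illustration, not the project's confirmed method.

```python
import math
import re

AUTHORS = ("Austen", "Dickens")  # matches the [Austen, Dickens] count order


def guess_author(passage, word_frequencies, priors=(0.5, 0.5)):
    """Return (most_probable_author, posterior_probabilities).

    Assumed model: naive Bayes, i.e. words are treated as independent
    given the author, with add-one smoothing for unseen words.
    """
    # Total word count per author and vocabulary size, used for smoothing.
    totals = [sum(counts[i] for counts in word_frequencies.values())
              for i in range(len(AUTHORS))]
    vocab_size = len(word_frequencies)

    # Start from the log prior, then add one log likelihood per word.
    log_scores = [math.log(p) for p in priors]
    for word in re.findall(r"[a-z']+", passage.lower()):
        counts = word_frequencies.get(word, [0] * len(AUTHORS))
        for i in range(len(AUTHORS)):
            likelihood = (counts[i] + 1) / (totals[i] + vocab_size)
            log_scores[i] += math.log(likelihood)

    # Normalize the log scores into posterior probabilities.
    highest = max(log_scores)
    weights = [math.exp(score - highest) for score in log_scores]
    total_weight = sum(weights)
    posteriors = {author: weight / total_weight
                  for author, weight in zip(AUTHORS, weights)}
    return max(posteriors, key=posteriors.get), posteriors


if __name__ == "__main__":
    # "officer" uses the counts from the README; a real run would use the
    # full dictionary built from all eight novels.
    frequencies = {"officer": [220, 322]}
    author, probabilities = guess_author("The officer arrived.", frequencies)
    print(author, probabilities)
```

Working in log space avoids numeric underflow when a passage contains many words, and the smoothing keeps a single unseen word from forcing a probability of zero.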


🔮 Planned Enhancements

  • Incorporating More Authors: Broadening the scope to include various authors for a more comprehensive literary analysis.
  • Enhanced Algorithm Efficiency: Optimizing the processing capabilities for handling larger datasets.
  • User Interface Development: Crafting an intuitive interface for effortless user interaction and result visualization.

BayesianBookworm combines statistical methods with classical literature to surface patterns in authorial style.
