jags-programming/SmartSimpleTextGenerator

SmartSimpleTextGenerator

SmartSimpleTextGenerator is an improved text generation project that fetches Wikipedia page content, processes it using a Markov Chain model, and generates structured, human-readable text.

Improvements Over Previous Version

1. Enhanced Wikipedia Data Fetching

  • Custom User-Agent: Added a user-agent string to comply with Wikipedia API best practices.
  • Error Handling: Displays a clear error message when a page doesn't exist.
  • Text Cleaning: Extra whitespace and newline characters are removed for cleaner output.
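The three points above can be sketched as follows. This is a minimal illustration, not the code in wiki_getbio.py: it assumes the MediaWiki Action API with the TextExtracts `extracts` property, and the names `fetch_page`, `parse_extract`, `clean_text`, and the `USER_AGENT` value are hypothetical.

```python
import json
import re
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Hypothetical identifier; replace with your own project name and contact.
USER_AGENT = "SmartSimpleTextGenerator-example/0.1 (contact: you@example.com)"


def clean_text(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def parse_extract(payload, title):
    """Pull the plain-text extract out of a decoded API response.

    Assumes formatversion=2, where "pages" is a list and a missing
    page carries "missing": true.
    """
    pages = payload.get("query", {}).get("pages", [])
    if not pages or pages[0].get("missing"):
        raise LookupError(f"Wikipedia page not found: {title!r}")
    return clean_text(pages[0].get("extract", ""))


def fetch_page(title):
    """Fetch one page as plain text, sending a custom User-Agent header."""
    params = urllib.parse.urlencode({
        "action": "query", "prop": "extracts", "explaintext": 1,
        "titles": title, "format": "json", "formatversion": 2,
    })
    request = urllib.request.Request(
        f"{API_URL}?{params}", headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_extract(json.load(response), title)
```

Keeping the parsing step separate from the network call makes the missing-page check easy to exercise without contacting Wikipedia.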

2. Improved Markov Chain Model

  • Flexible N-grams: The model now supports different N-gram sizes (default is trigrams, n=3).
  • Better Tokenization: Removes punctuation and numbers for cleaner training data.
  • Fallback for Missing Sequences: If a given prompt doesn't match any known patterns, the model selects a random starting point to continue generating text.
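As a rough sketch of how these pieces could fit together (not the project's actual model code), the example below tokenizes text, builds an N-gram table, and falls back to a random known context when the prompt is unseen. The function names, the ASCII-only tokenizer, and the `rng` parameter are assumptions made for illustration; `n` is assumed to be at least 2.

```python
import random
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and keep ASCII alphabetic words only.

    This drops punctuation and digits; it is a simplification that
    also splits contractions and discards accented letters.
    """
    return re.findall(r"[a-z]+", text.lower())


def build_model(tokens, n=3):
    """Map each (n-1)-word context to the list of words observed after it."""
    model = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context].append(tokens[i + n - 1])
    return model


def generate(model, prompt, n=3, length=20, rng=random):
    """Append up to `length` predicted words to the prompt.

    When the current context was never seen in training, a random
    known context is appended first and generation continues from it.
    """
    words = tokenize(prompt)
    contexts = list(model)
    if not contexts:
        return " ".join(words)
    for _ in range(length):
        context = tuple(words[-(n - 1):])
        if context not in model:
            context = rng.choice(contexts)  # fallback: random starting point
            words.extend(context)
        words.append(rng.choice(model[context]))
    return " ".join(words)


if __name__ == "__main__":
    tokens = tokenize("The cat sat. The cat ran 42 times!")
    model = build_model(tokens, n=3)
    print(generate(model, "the cat", length=5))
```

Storing followers in a list, duplicates included, means `rng.choice` samples each next word in proportion to how often it followed the context in the training text.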

3. Post-Processing for Readability

  • Sentence Capitalization: Ensures generated text starts with a capital letter.
  • Punctuation Fixes: Adjusts spaces before punctuation marks to improve readability.
  • Automatic Sentence Termination: Ensures sentences end with a period (.) if no punctuation is present at the end.
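The post-processing steps listed above might look like the following sketch. The function name `post_process` and the exact set of punctuation marks are assumptions; the project's own rules may differ.

```python
import re


def post_process(text):
    """Tidy generated text: fix spacing, capitalize, and terminate the sentence."""
    text = re.sub(r"\s+", " ", text).strip()
    # Remove any space that precedes a punctuation mark.
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)
    if not text:
        return text
    # Capitalize only the first character, leaving the rest untouched.
    text = text[0].upper() + text[1:]
    # Add a period when the text does not already end a sentence.
    if text[-1] not in ".!?":
        text += "."
    return text
```

For example, `post_process("hello world , this is a test")` returns `"Hello world, this is a test."`.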

4. Structured Logging of Generated Text

  • Timestamped Logging: Saves generated outputs along with their prompts and timestamps.
  • File-Based Storage: Appends structured output logs to generated_output.txt.
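One way to append timestamped records is sketched below. The record layout (timestamp line, prompt, output, separator) and the function name `log_generation` are illustrative assumptions; only the file name generated_output.txt comes from this README.

```python
from datetime import datetime
from pathlib import Path


def log_generation(prompt, output, path="generated_output.txt"):
    """Append one timestamped prompt/output record to the log file."""
    timestamp = datetime.now().isoformat(timespec="seconds")
    separator = "-" * 40
    entry = f"[{timestamp}]\nPrompt: {prompt}\nOutput: {output}\n{separator}\n"
    # Mode "a" creates the file if needed and never overwrites earlier entries.
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry
```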

Installation

  1. Clone this repository:

    git clone https://github.com/yourusername/SmartSimpleTextGenerator.git
    cd SmartSimpleTextGenerator
  2. Install required dependencies:

    pip install -r requirements.txt

Usage

1. Fetch Wikipedia Pages

Run the script to download Wikipedia biographies:

python wiki_getbio.py

Modify titles in wiki_getbio.py to specify the pages you want to fetch.

2. Train & Generate Text

Run the model to generate text based on the fetched Wikipedia content:

python run_jmodel.py

3. View Generated Text

The generated text will be printed in the console and logged in generated_output.txt.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Pull requests are welcome! Please open an issue to discuss proposed changes before submitting a PR.

About

This is a further improved version of SimpleTextGenerator.
