SmartSimpleTextGenerator is an improved text generation project that fetches Wikipedia page content, processes it using a Markov Chain model, and generates structured, human-readable text.
- ✅ Custom User-Agent: Added a user-agent string to comply with Wikipedia API best practices.
- ✅ Error Handling: Displays a clear error message when a page doesn't exist.
- ✅ Text Cleaning: Extra whitespace and newline characters are removed for cleaner output.
- ✅ Flexible N-grams: The model now supports different N-gram sizes (default is trigrams, n=3).
- ✅ Better Tokenization: Removes punctuation and numbers for cleaner training data.
- ✅ Fallback for Missing Sequences: If a given prompt doesn't match any known patterns, the model selects a random starting point to continue generating text.
- ✅ Sentence Capitalization: Ensures generated text starts with a capital letter.
- ✅ Punctuation Fixes: Adjusts spaces before punctuation marks to improve readability.
- ✅ Automatic Sentence Termination: Ensures sentences end with a period (`.`) if no punctuation is present at the end.
- ✅ Timestamped Logging: Saves generated outputs along with their prompts and timestamps.
- ✅ File-Based Storage: Appends structured output logs to `generated_output.txt`.
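
The N-gram model, tokenization, and fallback behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `tokenize`, `build_model`, and `generate` are hypothetical, and the tokenizer assumes that "removing punctuation and numbers" means keeping only lowercase alphabetic words.

```python
import random
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic words only (assumed cleaning rule)."""
    return re.findall(r"[a-z]+", text.lower())


def build_model(tokens, n=3):
    """Map each (n-1)-word state to the list of words observed right after it."""
    if n < 2:
        raise ValueError("n must be at least 2")
    model = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        state = tuple(tokens[i:i + n - 1])
        model[state].append(tokens[i + n - 1])
    return model


def generate(model, prompt, length=20, n=3, rng=random):
    """Extend the prompt word by word, using a random known state as a fallback."""
    if not model:
        raise ValueError("the model is empty; train it on more text")
    words = tokenize(prompt)
    for _ in range(length):
        state = tuple(words[-(n - 1):])
        if state not in model:
            # Fallback: the current sequence was never seen in training,
            # so continue from a randomly chosen known state instead.
            state = rng.choice(list(model))
            words.extend(state)
        words.append(rng.choice(model[state]))
    return " ".join(words)
```

Calling `generate(build_model(tokenize(corpus)), "some prompt")` returns a lowercase string of words. Passing a seeded `random.Random` instance as `rng` makes the output reproducible, which is convenient when comparing different N-gram sizes.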
- Clone this repository:

      git clone https://github.com/yourusername/SmartSimpleTextGenerator.git
      cd SmartSimpleTextGenerator

- Install required dependencies:

      pip install -r requirements.txt

Run the script to download Wikipedia biographies:

    python wiki_getbio.py

Modify `titles` in `wiki_getbio.py` to specify the pages you want to fetch.
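
For orientation, the fetch step could look something like the sketch below. It is an assumption-laden illustration rather than the contents of `wiki_getbio.py`: it uses only the standard library and the MediaWiki Action API, whereas the real script may rely on a different client library. The `USER_AGENT` value, the example `titles`, and the helper names `clean_text`, `extract_text`, `fetch_page`, and `main` are all hypothetical.

```python
import json
import re
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Hypothetical User-Agent; replace with your own project name and contact.
USER_AGENT = "SmartSimpleTextGenerator/0.1 (contact: you@example.com)"

# Hypothetical example values for the `titles` list.
titles = ["Ada Lovelace", "Alan Turing"]


def clean_text(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def extract_text(payload, title):
    """Return the cleaned plain-text extract, or raise if the page is missing."""
    pages = payload.get("query", {}).get("pages", [])
    if not pages or pages[0].get("missing"):
        raise LookupError(f"Wikipedia page not found: {title!r}")
    return clean_text(pages[0].get("extract", ""))


def fetch_page(title):
    """Fetch one page as plain text via the MediaWiki Action API."""
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,
        "format": "json",
        "formatversion": 2,
        "titles": title,
    })
    request = urllib.request.Request(
        f"{API_URL}?{params}", headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_text(json.load(response), title)


def main():
    """Fetch every configured title and report missing pages clearly."""
    for title in titles:
        try:
            print(title, "->", fetch_page(title)[:80])
        except LookupError as error:
            print(f"Error: {error}")
```

Keeping the response parsing in `extract_text` separate from the network call makes the missing-page handling easy to check without contacting Wikipedia.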
Run the model to generate text based on the fetched Wikipedia content:

    python run_jmodel.py

The generated text will be printed in the console and logged in `generated_output.txt`.
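
As a rough illustration of the final clean-up and logging step, the sketch below capitalizes the first letter, removes spaces before punctuation, adds a closing period when needed, and appends a timestamped entry to the log file. The helper names `polish` and `log_output` and the exact log layout are assumptions; the format actually written by `run_jmodel.py` may differ.

```python
import re
from datetime import datetime

LOG_PATH = "generated_output.txt"


def polish(text):
    """Tidy generated text: fix spacing, capitalize, and end with punctuation."""
    # Remove whitespace that appears directly before a punctuation mark.
    text = re.sub(r"\s+([.,!?;:])", r"\1", text.strip())
    if text:
        text = text[0].upper() + text[1:]
        if text[-1] not in ".!?":
            text += "."
    return text


def log_output(prompt, text, path=LOG_PATH):
    """Append a timestamped prompt/output entry to the log file (assumed layout)."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    entry = f"[{stamp}]\nPrompt: {prompt}\nOutput: {text}\n\n"
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry
```

Opening the file in append mode (`"a"`) preserves earlier runs, so the log accumulates one entry per generation.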
This project is licensed under the MIT License. See the `LICENSE` file for details.
Pull requests are welcome! Please open an issue to discuss proposed changes before submitting a PR.