- Overview
- Task 1: Preparation and Training
- Task 2: Model Comparison and Analysis
- Task 3: Text Generation - Web Application Development
- Contributing & Support
- License
- Acknowledgments
This project focuses on developing and analyzing LSTM language models. It encompasses the training of models on the Harry Potter Books Dataset and deploying a Flask-based web application for interactive text generation.
This task involves setting up the project, preparing the data, and training the LSTM language model.
The model is inspired by the paper "Regularizing and Optimizing LSTM Language Models". The paper provides a foundational understanding of the LSTM mechanisms and optimization techniques employed in this project.
- Website: HPD Dataset Download
- Research Background of the dataset: Utilized in the study for enhancing dialogue agents and character alignment in AI models, detailed in the paper Dialogue-style Large Language Models for Character Alignment in Open-domain Dialogue Agents.
- Contains text from the Harry Potter novels (7 books) in txt format, structured for training and evaluating Language Models (LLMs) like ChatGPT and GPT4.
- Developed by: Tencent AI Lab & Department of Systems Analysis (DSA), Hong Kong University of Science and Technology (Guangzhou), and Hong Kong University of Science and Technology.
For additional details or to access the dataset, visit the HPD Dataset Download page.
-
Tokenization: Utilizes the
torchtext.data.utils.get_tokenizerfunction to tokenize the text data, effectively splitting it into individual words or tokens. -
Numericalizing: Constructs a vocabulary from the tokenized text using
torchtext.vocaband converts tokens into numerical indices, providing a standard format for model input. -
Dataset Splitting: Divides the tokenized text data into training, validation, and testing sets, ensuring a balanced distribution for model training and evaluation.
Dataset Number of Tokens Training(70%) 926,599 Validation(15%) 198,557 Test(15%) 198,558 Total 1,323,714
- LSTM Architecture: Implements an LSTM language model using PyTorch's
nn.Module. The architecture includes embedding layers, LSTM layers, dropout layers, and a fully connected layer. - Weight Initialization: Employs uniform weight initialization for embedding and fully connected layers, and specific LSTM weight initialization for input-hidden and hidden-hidden connections.
- Hidden State Management: Manages hidden states efficiently, allowing for proper handling across training iterations and providing functions to initialize and detach hidden states.
- Hyperparameter Adjustment: Configures model hyperparameters such as the number of layers, embedding dimension, hidden dimension, dropout rate, and learning rate.
- Gradient Clipping: Implements gradient clipping to prevent exploding gradients during backpropagation, maintaining model stability.
- Learning Rate Schedulers: Uses a learning rate scheduler (
ReduceLROnPlateau) to adjust the learning rate based on validation loss, aiding in model convergence and avoiding overfitting. - Batch Processing: Employs batch processing for efficient model training, reshaping the data into batches of a specified size.
- Loss Function: Utilizes Cross-Entropy Loss, suitable for multi-class classification tasks, to compute the loss between predictions and targets.
- Optimizer: Adopts the Adam optimizer for model training, updating model parameters based on computed gradients.
- Evaluation Metrics: Employs perplexity as the primary metric for model evaluation, providing insight into the model's performance in predicting the next word in a sequence.
To get started with the LSTM language model project, follow these steps to set up the environment, prepare the data, and initiate the training process:
- Environment Setup:
- Ensure you have Python 3.6+ installed.
- Create a virtual environment to manage dependencies:
python -m venv venv
- Activate the virtual environment:
- On Windows:
venv\Scripts\activate - On macOS/Linux:
source venv/bin/activate
- On Windows:
- Install the required dependencies:
pip install -r requirements.txt
This report provides an analysis of the LSTM language model's training performance over 50 epochs, focusing primarily on minimizing Train and Valid Perplexity, and offering an evaluation based on Test Perplexity.
- Total Training Time: 48 minutes and 12 seconds(On T4 GPU Linux Machine with 16GB memory ).
- Optimal Epoch: 30
- Train Perplexity: 45.028
- Valid Perplexity: 88.476
- Test Perplexity: 106.331
The model demonstrated consistent improvement throughout the training epochs, with Epoch 30 yielding the most balanced performance. The Test Perplexity, while higher than the Validation Perplexity, remains within a reasonable range, suggesting good generalization of the model to unseen data. However, the noticeable gap between Test and Validation Perplexity indicates potential areas for further model refinement.
- Early Stopping: Implement an early stopping mechanism to halt training when the Validation Perplexity no longer shows significant improvement. This approach can prevent overfitting and reduce unnecessary computational overhead.
- Hyperparameter Tuning: Conduct thorough experimentation with various sets of hyperparameters to optimize the model's performance further. Consider exploring different learning rates, batch sizes, and LSTM configurations.
- Extended Dataset: Enrich the training dataset with more diverse or comprehensive data sources. This expansion can enhance the model's ability to learn and generalize across a wider array of textual contexts.
- Advanced Architectures: Investigate the integration of more sophisticated model architectures, such as Transformer-based models or attention mechanisms, to potentially capture the dataset's complexities more effectively and improve predictive performance.
This analysis and the subsequent recommendations aim to guide further development and optimization of the LSTM language model, promoting advancements in performance and applicability.
If anyone wants to use our trained model, please download from here https://drive.google.com/file/d/1DhDYEpJT4EpHx7bfmuezCMCGKV--kgmx/view?usp=sharing & rest of the configuration files are inside the app/models directory
This task focuses on deploying the trained LSTM language model as an interactive web application using Flask. The application allows users to input custom prompts and generate text based on those prompts, showcasing the model's text generation capabilities in a user-friendly format.
- Interactive Interface: Users can input a prompt and adjust parameters such as maximum sequence length and temperature for text generation.
- Real-time Text Generation: The model generates text in real-time, providing immediate feedback based on the user's input.
- Parameter Tuning: Users can experiment with different temperatures to influence the creativity and coherence of the generated text.
-
Set Up the Flask Environment:
- Navigate to the Flask application directory.
- Ensure all dependencies are installed:
pip install -r requirements.txt.
-
Start the Flask Server:
- Run the Flask server with
python app.py
- The server typically starts at http://127.0.0.1:5000.
For Live Demo from Huggingface Space https://huggingface.co/spaces/shaficse/a2-text-gen
- Run the Flask server with
-
Interact with the Application:
- Access the web application through the provided URL.
- Input your prompt and adjust generation parameters as desired.
- Click 'Generate' to view the model's text output.
- Flask Backend: Handles requests, interacts with the LSTM model, and serves the generated text.
- Frontend Interface: Provides an intuitive UI for users to interact with the model, built using HTML, CSS, and JavaScript.
- Model Integration: Seamlessly integrates the trained LSTM model to perform text generation based on user input.
This development phase brings the LSTM language model to a wider audience, allowing for interactive engagement and demonstration of the model's text generation prowess through a web-based platform.
Contributions are welcome. For issues or questions, please open an issue in the repository.
This project is licensed under the MIT License.LICENSE
- Research Inspiration: Sincere thanks to the authors of the foundational research paper, "Regularizing and Optimizing LSTM Language Models," which provided crucial insights and methodologies for this project.
- Resource Contributions: Special appreciation to Chaklam Silpasuwanchai for his invaluable contributions. The codebase for this project drew inspiration and guidance from his Python for Natural Language Processing repository, serving as a vital resource.
- Dataset Providers: Heartfelt gratitude to the curators and maintainers of the Harry Potter dataset, whose efforts in compiling and structuring the data have been instrumental for the training and evaluation of the LSTM language models in this project.
These acknowledgments reflect the collaborative spirit and valuable contributions that have significantly enriched this project, and we extend our sincere gratitude to everyone involved.