Text classification of news articles using a pre-trained bidirectional LSTM model and a Streamlit web app.
The model was trained on the Kaggle dataset "Text Document Classification". The dataset contains 2225 text documents in five categories: politics, sport, tech, entertainment, and business. It can be used for document classification and document clustering.
- The dataset has two features: text and label.
- No. of rows: 2225
- No. of columns: 2
Text: the document text, drawn from the five categories.
Label: the category of each document, encoded as an integer from 0 to 4:
- Politics = 0
- Sport = 1
- Technology = 2
- Entertainment = 3
- Business = 4
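The label encoding above can be written down as a small lookup table. This is only a sketch: the names `LABELS`, `decode_label`, and `encode_label` are illustrative and are not taken from the repository.

```python
# Integer label ids as listed in this README.
# The helper names below are hypothetical, chosen for illustration only.
LABELS = {
    0: "politics",
    1: "sport",
    2: "technology",
    3: "entertainment",
    4: "business",
}

# Reverse lookup: category name -> integer id.
NAME_TO_ID = {name: idx for idx, name in LABELS.items()}


def decode_label(label_id: int) -> str:
    """Return the category name for an integer label id."""
    return LABELS[label_id]


def encode_label(name: str) -> int:
    """Return the integer id for a category name (case-insensitive)."""
    return NAME_TO_ID[name.strip().lower()]


if __name__ == "__main__":
    print(decode_label(2))
    print(encode_label("Business"))
```

A table like this is handy for turning the model's predicted class index back into a readable category name in the web app.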
- Load the dataset
- Preprocess the text data
  - Clean the text data
  - Tokenize the text data
  - Pad the text data
- Create the model
  - Embedding layer
  - Bidirectional LSTM layer
  - Dense layer
- Train the model
- Evaluate the model
- Save the model
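The preprocessing steps in the list above (clean, tokenize, pad) can be sketched in plain Python. This is not the code from the training notebook: the cleaning rules, the vocabulary size, the reserved ids `PAD_ID` and `OOV_ID`, and the choice of padding at the end of each sequence are all assumptions made for illustration.

```python
import re
from collections import Counter

PAD_ID = 0  # assumed id reserved for padding
OOV_ID = 1  # assumed id for words missing from the vocabulary


def clean_text(text: str) -> str:
    """Lowercase, replace non-alphanumeric characters, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def build_vocab(texts, max_words=10000):
    """Map the most frequent words to integer ids, starting at 2."""
    counts = Counter(word for t in texts for word in clean_text(t).split())
    most_common = counts.most_common(max_words - 2)
    return {word: i + 2 for i, (word, _) in enumerate(most_common)}


def texts_to_padded(texts, vocab, max_len=8):
    """Convert texts to fixed-length id sequences (truncate, then pad at the end)."""
    sequences = []
    for t in texts:
        ids = [vocab.get(w, OOV_ID) for w in clean_text(t).split()]
        ids = ids[:max_len]
        ids = ids + [PAD_ID] * (max_len - len(ids))
        sequences.append(ids)
    return sequences


if __name__ == "__main__":
    docs = ["Shares rise as markets rally!", "The team won the cup final."]
    vocab = build_vocab(docs)
    for row in texts_to_padded(docs, vocab):
        print(row)
```

Fixed-length integer sequences like these are the kind of input an embedding layer expects, which is why padding comes before model creation in the list.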
The training results show that the model is overfitting. It was trained for 14 epochs and reached 100% training accuracy against 95% validation accuracy. Inference results are also skewed toward a single category, so the model does not reliably predict the correct category for a given text. The training notebook is available as a Google Colab notebook.
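One simple way to quantify the skew described above is to look at how predictions are distributed across the classes. The sketch below is a hypothetical diagnostic, not part of the repository: the function names and the 0.5 threshold are assumptions, and the example predictions are made up.

```python
from collections import Counter


def prediction_distribution(predicted_labels):
    """Return the share of predictions that fall in each class."""
    if not predicted_labels:
        raise ValueError("predicted_labels must not be empty")
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {label: n / total for label, n in sorted(counts.items())}


def is_skewed(predicted_labels, threshold=0.5):
    """Flag the predictions as skewed if one class exceeds the threshold share.

    The threshold is an arbitrary illustrative value.
    """
    shares = prediction_distribution(predicted_labels)
    return max(shares.values()) > threshold


if __name__ == "__main__":
    preds = [4, 4, 4, 4, 1, 4, 0, 4]  # made-up example predictions
    print(prediction_distribution(preds))
    print(is_skewed(preds))
```

If the evaluation texts were spread roughly evenly over the five categories, each class would be expected to receive about a fifth of the predictions, so one class taking most of them would support the observation above.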
For the next training run, we can extend the dataset with more categories and more text data. We can also train a state-of-the-art (SOTA) architecture, such as a Transformer-based model like BERT.
The easiest way to deploy this project is with Docker. Make sure Docker is installed on your machine.
- Clone this repository
    git clone https://github.com/hiseulgi/text-classification.git
- Copy .env.example to .env and change the values
    cp .env.example .env
- Build the Docker image the first time and run the service
    # build and run compose for the first time
    bash scripts/build_docker.sh
    # run compose after the first time
    bash scripts/run_docker.sh
- Open and test the service at the API docs
    http://localhost:6969/
- Open and test the service at the web app
    http://localhost:8501/
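After the containers start, a quick reachability check can confirm that both services respond. This sketch uses only the Python standard library and the two URLs listed above; the helper name `is_up` and the timeout value are assumptions, and it makes no claim about which API routes the service exposes.

```python
import http.client
import urllib.error
import urllib.request

# URLs taken from the steps above.
SERVICES = {
    "API docs": "http://localhost:6969/",
    "Web app": "http://localhost:8501/",
}


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status code.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or malformed response.
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        status = "up" if is_up(url) else "not reachable"
        print(f"{name}: {status} ({url})")
```

If either service is reported as not reachable, checking the port values in `.env` and the container logs is a reasonable next step.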
- Add more data
- Train with a Transformer model

