Classifying Vietnamese text content
Dataset:
Google search results
Word vectors:
https://github.com/Kyubyong/wordvectors
* Cleaning the text, splitting it into words and handling punctuation and case.
* Categorizing text data.
* Building the models.
* Model evaluation.
* Building a RESTful API.
* Building web/app layout.
Check out this link for the RESTful API.
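The first step in the list above (cleaning the text, splitting it into words, and handling punctuation and case) could look roughly like the sketch below. It is only an illustration using the Python standard library: the function name `clean_text` is hypothetical and is not taken from this repository. It also splits on whitespace and punctuation only, so multi-syllable Vietnamese words come out as separate syllables. A dedicated Vietnamese word segmenter would be needed to keep them together.

```python
import re
import unicodedata


def clean_text(text):
    """Normalize, lowercase, and split text into tokens (hypothetical helper)."""
    # NFC keeps each accented Vietnamese letter as a single code point,
    # so the \w pattern below does not break tokens at combining marks.
    text = unicodedata.normalize("NFC", text)
    # Case handling: fold everything to lowercase.
    text = text.lower()
    # \w+ keeps runs of letters, digits, and underscores; punctuation is dropped.
    return re.findall(r"\w+", text)


if __name__ == "__main__":
    print(clean_text("Xin chào, Việt Nam!"))
```

The resulting token lists can then be mapped to word vectors or counted as features before the categorizing and model-building steps.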
- Install the packages:
pip install -r setup.txt
- Download the dataset and extract it into ./data
- Build the data by running build.py:
python build.py
- Optionally, compress your data:
python build-compressed.py
- Train on the full training data by running train.py:
python train.py
- Predict the result by running predict.py:
python predict.py
You can also change the algorithm in this.
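When trying a different algorithm, it helps to score its predictions against the true labels so the alternatives can be compared. The sketch below shows one simple way to do that with the standard library. The function names `accuracy` and `confusion_counts` and the example category labels are assumptions made for illustration; they are not part of this repository's scripts.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


if __name__ == "__main__":
    # Illustrative labels only; the real categories depend on the dataset.
    y_true = ["sports", "tech", "sports", "tech"]
    y_pred = ["sports", "tech", "tech", "tech"]
    print(accuracy(y_true, y_pred))
    print(confusion_counts(y_true, y_pred))
```

The pair counts show which categories an algorithm confuses with each other, which a single accuracy number hides.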