
A short research project on machine learning for Vietnamese text classification.


fecoderchinh/text-classification


Text Classification

Classifying Vietnamese text content.

Resources

Dataset:
Google search results
Word vectors:
https://github.com/Kyubyong/wordvectors

Tasks

* Cleaning the text, splitting it into words and handling punctuation and case.
* Categorizing text data.
* Building the models.
* Model evaluation.
* Building a RESTful API.
* Building the web/app layout.
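
The cleaning task above (splitting into words, handling punctuation and case) could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code from this repository; the function name `clean_text` is hypothetical. It splits on whitespace, so it yields Vietnamese syllables rather than multi-syllable words; real word segmentation would need a dedicated tokenizer.

```python
import re
import unicodedata
from typing import List

# Any character that is neither a word character nor whitespace.
# In Python 3, \w is Unicode-aware, so Vietnamese letters are kept.
_PUNCT = re.compile(r"[^\w\s]")


def clean_text(text: str) -> List[str]:
    """Normalize, lowercase, strip punctuation, and split on whitespace.

    Hypothetical helper for illustration; not part of this repository.
    """
    # NFC composes base letters and diacritics into single code points,
    # so the same word always maps to the same string.
    text = unicodedata.normalize("NFC", text)
    text = text.lower()
    # Replace punctuation with spaces so adjacent tokens stay separated.
    text = _PUNCT.sub(" ", text)
    return text.split()


tokens = clean_text("Xin chào, Việt Nam!")
print(tokens)
```

The resulting token lists can then be fed to whichever feature extraction step the model uses, for example a lookup in the pre-trained word vectors listed under Resources.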

Check out this link for the RESTful API.

Workflow

  • Install the packages: pip install -r setup.txt
  • Download the dataset and extract into ./data

  • Run python build.py or build.py to build the data
  • You can also run python build-compressed.py to compress the data

  • Run python train.py or train.py to train on the full training data

  • Run python predict.py or predict.py to predict the result
    You can also change the algorithm in this script.
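
To make the train and predict steps more concrete, here is a minimal sketch of a bag-of-words multinomial Naive Bayes classifier in pure Python. This is an assumption chosen for illustration: the README does not say which algorithms train.py and predict.py use, and the functions `train` and `predict`, the sample tokens, and the labels `sports` and `business` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Build a simple model from (tokens, label) pairs.

    Illustrative only; not the training code from this repository.
    """
    label_counts = Counter()            # documents per label
    word_counts = defaultdict(Counter)  # token counts per label
    vocab = set()
    for tokens, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"labels": label_counts, "words": word_counts, "vocab": vocab}


def predict(model, tokens):
    """Return the label with the highest log-probability for the tokens."""
    total_docs = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["labels"].items():
        counts = model["words"][label]
        total_words = sum(counts.values())
        # Start from the log prior of the label.
        score = math.log(doc_count / total_docs)
        for tok in tokens:
            # Add-one (Laplace) smoothing so unseen tokens do not
            # drive the probability to zero.
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data; the real dataset lives in ./data.
samples = [
    (["bóng", "đá", "trận", "đấu"], "sports"),
    (["đội", "tuyển", "bóng", "đá"], "sports"),
    (["giá", "vàng", "thị", "trường"], "business"),
    (["chứng", "khoán", "thị", "trường"], "business"),
]
model = train(samples)
print(predict(model, ["bóng", "đá"]))
```

Swapping the algorithm then amounts to replacing `train` and `predict` with another classifier that accepts the same inputs.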