smorzhov/comment_classifier

Toxic comment classifier

Description

Prerequisites

You will need the following things properly installed on your computer.

Installation

  • git clone https://github.com/smorzhov/comment_classifier

Running

Note that the Docker container runs Python 2.7.12.

  1. Download the test and train data and extract it (unrar or unzip) into the ~src/data directory.
  2. Download the pretrained word2vec, glove_6B, glove_840B, and FastText models. Unpack them into the ~src/data/raw directory.
  3. If you plan to use nvidia-docker, build the nvidia-docker image first. Otherwise, you can skip this step.
    nvidia-docker build -t sm_keras_tf:gpu .
    Then start the container:
    nvidia-docker run -v $PWD/src:/comment_classifier -dt --name tcc sm_keras_tf:gpu /bin/bash
  4. Start training (the optional -h flag prints the help message):
    nvidia-docker exec tcc python train.py [-h]

Advice

You can add custom stop words by placing them in the ~src/data/stopwords.txt file, one word per line.
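
As an illustration of the one-word-per-line format, here is a minimal sketch of how such a file could be read and applied. The function names `load_stopwords` and `remove_stopwords` are hypothetical and are not taken from this repository. Lowercasing and whitespace tokenization are assumptions, and the project's own preprocessing may differ. The code uses `io.open` so that it runs on the container's Python 2.7 as well as on Python 3.

```python
import io


def load_stopwords(path):
    """Read one stop word per line, skipping blank lines.

    Hypothetical helper: lowercasing is an assumption, not project behavior.
    """
    with io.open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def remove_stopwords(text, stopwords):
    """Drop whitespace-separated tokens whose lowercase form is a stop word."""
    return " ".join(tok for tok in text.split() if tok.lower() not in stopwords)
```

For example, `remove_stopwords(comment, load_stopwords(path))` returns the comment with the listed words removed, where `path` points at your stop word file.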

You can also generate files with useful information about the training data:

nvidia-docker exec tcc python info.py