Skip to content

napsternxg/DeepSequenceClassification

master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 

Twitter Follow GitHub license

Deep Sequence Classification

Generic library for training models for deep neural networks for text sequence classification tasks.

Usage:

  • Create a json file config.json (default name) using the template in config.json.sample and specify the parameters for your training.

  • Best strategy is to save the training and test files in vector format in advance and then give their paths in data_vectors parameter in the file.

  • Train a model:

python model.py --config config_multitask.json --verbose 1
  • Resume training from saved weights:
python model.py --config config_multitask.json --verbose 1 --weights output/models/model_multi_brnn_multitask_h2-45.h5 --base_epochs 45

Preprocessing:

  • Currently, we support the preprocessing for the following file formats:
<ROOT><DOC>
<DOCNO> DOCUMENT 1 </DOCNO>
     For <TIME TYPE="DATE:DATE">six years</TIME> , <ENTITY TYPE="PERSON">Shubhanshu A. B. Mishra</ENTITY> has made several programming projects after being inspired by <ENTITY TYPE="PERSON">Linus Torvalds</ENTITY>, a very renowned programmer.
</DOC></ROOT>
  • Each file can contain multiple DOCNO.
  • The dir structure consists of many folders of data split for cross validation. It is as follows:
data/
data/CV_files
data/CV_files/1/file1.xml
data/CV_files/1/file2.xml
data/CV_files/1/file3.xml
...

data/CV_files/5/file1.xml
data/CV_files/5/file2.xml

Supports:

  • Boundary and Category Detection
  • Simple RNN and Bidirectional RNN
  • Multi task sequence learning (Boundary + Category trained using same model)
  • CNN + BRNN

Coming Up:

Use Cases:

  • Named Entity Recognition
  • POS Tagging
  • Dependency Parsing

Author:

  • Shubhanshu Mishra

Dependencies:

  • Theano
  • Keras
  • BeautifulSoup (with lxml)
  • numpy
  • lxml (requires libxml2, libxslt and libxml2-dev)

Install theano and keras using the following commands:

pip install --user --upgrade --no-deps git+git://github.com/Theano/Theano.git
pip install --user --upgrade --no-deps git+git://github.com/fchollet/keras.git

About

Deep neural network based model for sequence to sequence classification

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages