VDCNN

TensorFlow 2.0 Implementation of Very Deep Convolutional Neural Network for Text Classification.

Note

This repository is a simple TensorFlow 2.0 implementation of the VDCNN model proposed by Conneau et al. in their 2016 paper. It is based on this Keras implementation by zonetrooper32.

Note: Temporal batch norm has not been implemented. "Temp batch norm applies same kind of regularization as batch norm, except that the activations in a mini-batch are jointly normalized over temporal instead of spatial locations." For now, this project uses TensorFlow's standard batch normalization.
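
Concretely, for text activations of shape (batch, time, features), the quoted description amounts to normalizing each feature channel jointly over the batch and temporal axes. The sketch below illustrates that computation with training-mode statistics only; it is not code from this repository:

import tensorflow as tf

def temporal_batch_norm(x, epsilon=1e-5):
    # x: activations of shape (batch, time, features).
    # Normalize each feature channel jointly over the batch and temporal axes.
    # A full layer would also track moving averages and learn a scale/offset;
    # this is only an illustrative sketch.
    mean, variance = tf.nn.moments(x, axes=[0, 1], keepdims=True)
    return (x - mean) / tf.sqrt(variance + epsilon)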

It should be noted that the original implementation by the authors of the VDCNN paper was done in Torch 7.
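
For orientation, the paper's basic building block is a pair of temporal (1D) convolutions with kernel size 3, each followed by batch normalization and a ReLU. The following is a hedged TF 2.0 Keras sketch of such a block; names and details are illustrative, so see the model code in this repository for the actual implementation:

import tensorflow as tf

class ConvBlock(tf.keras.layers.Layer):
    # Two temporal convolutions (kernel size 3), each followed by batch
    # normalization and ReLU, as described by Conneau et al.
    # Illustrative sketch only; not the layer classes used in this repo.
    def __init__(self, filters, **kwargs):
        super().__init__(**kwargs)
        self.conv1 = tf.keras.layers.Conv1D(filters, 3, padding="same")
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.conv2 = tf.keras.layers.Conv1D(filters, 3, padding="same")
        self.bn2 = tf.keras.layers.BatchNormalization()

    def call(self, inputs, training=False):
        x = tf.nn.relu(self.bn1(self.conv1(inputs), training=training))
        return tf.nn.relu(self.bn2(self.conv2(x), training=training))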

Prerequisites

  • Python 3
  • TensorFlow 2.0
  • numpy

Datasets

The original paper evaluates the model on several NLP datasets, including DBPedia, AG's News, and Sogou News. data_loader.py expects CSV-formatted train and test files.
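
These datasets are commonly distributed as headerless CSV files in which the first field is a 1-indexed class label and the remaining fields are text (e.g. title and description). A minimal reader under that assumption might look like the following; data_loader.py may differ in its details:

import csv

def read_samples(csv_path):
    # Assumes headerless rows of the form: class_index, title, description.
    # Returns labels shifted to start at 0 and the concatenated, lowercased text.
    # Illustrative sketch only; see data_loader.py for the actual preprocessing.
    labels, texts = [], []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            labels.append(int(row[0]) - 1)
            texts.append(" ".join(row[1:]).lower())
    return labels, texts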

Downloads of those NLP text classification datasets can be found here (Many thanks to ArdalanM):

Dataset                  Classes  Train samples  Test samples  Source
AG's News                      4        120,000         7,600  link
Sogou News                     5        450,000        60,000  link
DBPedia                       14        560,000        70,000  link
Yelp Review Polarity           2        560,000        38,000  link
Yelp Review Full               5        650,000        50,000  link
Yahoo! Answers                10      1,400,000        60,000  link
Amazon Review Full             5      3,000,000       650,000  link
Amazon Review Polarity         2      3,600,000       400,000  link

A script to generate GloVe vector embeddings from the CSV datasets is located at scripts/txt2embedding.py. It has its own dependencies, independent of the main project, listed in the script_requirements.txt file in the same folder.

Usage:

$ python txt2embedding.py \
    --dataset_path="../data/dbpedia_csv" \
    --max_len=100 \
    --embedding_dim=300 \
    --max_samples=1000
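
However the embeddings are stored, the typical way to use a precomputed GloVe matrix in TF 2.0 is as the initializer of a frozen Embedding layer. The snippet below is a hypothetical example that assumes the script produced a NumPy array of shape (vocab_size, embedding_dim); check the script's output for the actual format and filename:

import numpy as np
import tensorflow as tf

# Hypothetical path and format: assumes a (vocab_size, embedding_dim) matrix
# saved with np.save; the script's real output may differ.
embedding_matrix = np.load("glove_embeddings.npy")

embedding_layer = tf.keras.layers.Embedding(
    input_dim=embedding_matrix.shape[0],
    output_dim=embedding_matrix.shape[1],
    embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
    trainable=False,  # keep the pretrained GloVe vectors fixed
)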

Hardware

Training and testing were performed on an Ubuntu 16.04 server with an NVIDIA Quadro GP100, using the configuration defaults defined in train.py and the AG's News dataset.

References

Keras implementation by zonetrooper32

Original preprocessing code and VDCNN implementation by geduo15

Train Script and data iterator from Convolutional Neural Network for Text Classification

NLP Datasets Gathered by ArdalanM and Others
