
This repository contains source code necessary to reproduce the results presented in the paper Joint Embedding of Words and Labels for Text Classification (ACL 2018):

  title={Joint Embedding of Words and Labels for Text Classification},
  author={Guoyin Wang, Chunyuan Li, Wenlin Wang, Yizhe Zhang, Dinghan Shen, Xinyuan Zhang, Ricardo Henao, Lawrence Carin},

Comparison Illustration of proposed LEAM with traditional methods for text sequence representations

Traditional Method: Directly aggregating word embedding V for text sequence representation z.
LEAM (Label Embedding Attentive Model): We leverage the “compatibility” G between embedded words V and labels C to derive the attention score β for improved z.
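
The comparison above can be sketched in a few lines of NumPy. This is a simplified illustration, not the repository's implementation: it assumes G is the cosine similarity between each label and each word, max-pools over labels, and applies a softmax to obtain β. The full model in the paper also applies learned processing to G before pooling, which is left out here. The function names leam_attention and mean_pool are hypothetical; only the symbols V, C, G, β and z come from the text.

```python
import numpy as np


def mean_pool(V):
    """Traditional baseline: average the word embeddings V (L x P) into z (P,)."""
    return V.mean(axis=0)


def leam_attention(V, C, eps=1e-8):
    """Simplified label-attentive pooling (hypothetical helper, not the repo's code).

    V: (L, P) array of word embeddings for one sequence of length L.
    C: (K, P) array of label embeddings for K classes.
    Returns z (P,), beta (L,), and the compatibility matrix G (K, L).
    """
    # Assumption: compatibility is cosine similarity between labels and words.
    V_unit = V / (np.linalg.norm(V, axis=1, keepdims=True) + eps)
    C_unit = C / (np.linalg.norm(C, axis=1, keepdims=True) + eps)
    G = C_unit.dot(V_unit.T)                 # (K, L)

    # For each word, keep its strongest compatibility with any label.
    m = G.max(axis=0)                        # (L,)

    # Softmax over positions gives the attention score beta.
    e = np.exp(m - m.max())
    beta = e / e.sum()

    # z is the attention-weighted sum of word embeddings.
    z = beta.dot(V)                          # (P,)
    return z, beta, G
```

With uniform β this reduces to mean_pool, which is one way to see the traditional method as a special case.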


There are four steps to use this codebase to reproduce the results in the paper.

  1. Dependencies
  2. Prepare datasets
  3. Training
    1. Training on standard dataset
    2. Training on your own dataset
  4. Reproduce paper figure results


Dependencies

This code is based on Python 2.7, with the main dependencies being TensorFlow==1.7.0 and Keras==2.1.5. Additional dependencies for running experiments are numpy, scipy and gensim; cPickle and math are part of the Python 2.7 standard library.

Prepare datasets

We consider the following datasets: Yahoo, AGnews, DBPedia, yelp, yelp binary. For convenience, we provide pre-processed versions of all datasets. Data are prepared in pickle format. Each .p file has the same fields in the same order: train text, val text, test text, train label, val label, test label, dictionary and reverse dictionary.
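
As a rough sketch of how such a file might be read, the helper below unpacks the eight fields in the order listed above. It assumes each .p file holds a single pickled sequence containing those eight fields; that layout, the function name load_dataset and the dictionary keys are assumptions for illustration. The sketch is written for Python 3 and uses pickle rather than cPickle.

```python
import pickle

# Field order as described in the text; the key names themselves are hypothetical.
FIELDS = (
    "train_text", "val_text", "test_text",
    "train_label", "val_label", "test_label",
    "word2idx", "idx2word",
)


def load_dataset(path):
    """Load a .p file and return its fields as a dict keyed by FIELDS.

    Assumption: the file contains one pickled list or tuple of eight items.
    """
    with open(path, "rb") as f:
        fields = pickle.load(f)
    if len(fields) != len(FIELDS):
        raise ValueError(
            "expected %d fields, found %d" % (len(FIELDS), len(fields))
        )
    return dict(zip(FIELDS, fields))
```

Pickles written under Python 2 sometimes need pickle.load(f, encoding="latin1") when read under Python 3.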

Datasets can be downloaded here. Put the downloaded data in the data directory. Each dataset has two files: the tokenized data and the corresponding pretrained GloVe embedding.

To run on your own dataset, please follow the code in to tokenize and split the train/dev/test dataset. To build pretrained word embeddings, first download GloVe word embeddings and then follow
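
For orientation, the preparation steps can be sketched as follows. This is not the repository's preprocessing code: the whitespace tokenizer, the special tokens, the split fractions and all function names are assumptions. The only format relied on is the standard GloVe text layout, where each line is a word followed by its space-separated vector values.

```python
import random
from collections import Counter

import numpy as np


def tokenize(text):
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_vocab(token_lists, min_count=1):
    """Return (dictionary, reverse dictionary) mapping words to ids and back."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    # "<pad>" and "<unk>" are hypothetical special tokens.
    words = ["<pad>", "<unk>"] + sorted(
        w for w, c in counts.items() if c >= min_count
    )
    word2idx = {w: i for i, w in enumerate(words)}
    idx2word = {i: w for w, i in word2idx.items()}
    return word2idx, idx2word


def split_indices(n, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle range(n) and return (train, val, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    return idx[n_test + n_val:], idx[n_test:n_test + n_val], idx[:n_test]


def glove_matrix(lines, word2idx, dim, seed=0):
    """Build a (vocab_size, dim) matrix from GloVe-format text lines.

    Words missing from the GloVe lines keep a small random initialization.
    """
    rng = np.random.RandomState(seed)
    emb = rng.uniform(-0.01, 0.01, size=(len(word2idx), dim)).astype("float32")
    for line in lines:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if word in word2idx and len(values) == dim:
            emb[word2idx[word]] = [float(v) for v in values]
    return emb
```

In practice the lines argument would be an open GloVe text file, and the resulting matrix would be saved alongside the tokenized data.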


1. Training on standard dataset

To run the test, use the command python -u followed by the training script. The default test is on the Yahoo dataset. To run other default datasets, change the dataset attribute of the Option class to the corresponding dataset name. Most of the parameters are defined in the Option class.
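
To illustrate the idea of switching datasets through a single attribute, here is a minimal stand-in for such an options class. It is not the repository's Option class: the class name, attribute names and dataset keys are hypothetical. The class counts are those of the public benchmark datasets, not values taken from this repository.

```python
class Options(object):
    """Hypothetical options holder; the real class defines many more parameters."""

    # Number of classes per benchmark (keys are illustrative names).
    NUM_CLASSES = {
        "yahoo": 10,
        "agnews": 4,
        "dbpedia": 14,
        "yelp_full": 5,
        "yelp_binary": 2,
    }

    def __init__(self, dataset="yahoo"):
        if dataset not in self.NUM_CLASSES:
            raise ValueError("unknown dataset: %s" % dataset)
        # Changing this one attribute selects the dataset to train on.
        self.dataset = dataset
        self.num_class = self.NUM_CLASSES[dataset]


opt = Options()                      # default: Yahoo
opt_ag = Options(dataset="agnews")   # switch to another default dataset
```

Deriving dependent settings such as the number of classes from the dataset name keeps a dataset switch to a single edit.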

Reproduce paper figure results

Jupyter notebooks in the plots folder are used to reproduce the figures in the paper.

Note that we have copied our extracted results into the notebooks, so without modification they will output the figures in the paper. If you've run your own training and wish to plot the results, you'll have to organize your results in the same format instead.

