Data Noising as Smoothing in Neural Network Language Models

Dependencies

Overview

Based on the TensorFlow implementation here, which is in turn based on the PTB LSTM implementation here.

Implements noising for neural language modeling as described in this paper.

@inproceedings{noising2017,
  title={Data Noising as Smoothing in Neural Network Language Models},
  author={Xie, Ziang and Wang, Sida I. and Li, Jiwei and L{\'e}vy, Daniel and Nie, Aiming and Jurafsky, Dan and Ng, Andrew Y.},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2017}
}

The noising code can be found in loader.py and utils.py.
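As a rough illustration of what the noising does, the sketch below implements the paper's unigram noising scheme: each input token is independently replaced, with probability gamma, by a sample from the empirical unigram distribution. This is not the repository's code; the function and argument names are illustrative, and the implementation in loader.py also covers the other schemes (e.g. the n-gram Kneser-Ney scheme used in the example command below).

import numpy as np

def unigram_noise(tokens, gamma, unigram_probs, rng=None):
    """Replace each token, with probability gamma, by a draw from the
    unigram distribution over the vocabulary.

    tokens        -- 1-D array of integer token ids (the input sequence)
    gamma         -- noising probability in [0, 1]
    unigram_probs -- length-|V| array with the empirical unigram distribution
    """
    rng = rng or np.random.default_rng()
    tokens = np.asarray(tokens).copy()
    # Decide independently for each position whether to noise it.
    mask = rng.random(tokens.shape) < gamma
    # Draw replacement tokens from the unigram distribution.
    replacements = rng.choice(len(unigram_probs), size=tokens.shape, p=unigram_probs)
    tokens[mask] = replacements[mask]
    return tokens

In the paper's terms, blank noising instead replaces the sampled positions with a special "_" token, and the bigram Kneser-Ney scheme additionally scales gamma per token using absolute discounting.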

How to run

First download the PTB data from here and put it in the data directory. Make sure to update the paths in cfg.py to point to the data. Alternatively, you can grab the Text8 data here and then run the script data/text8/makedata-text8.sh.

Then run lm.py. Here's an example setting:

python lm.py --run_dir /tmp/lm_1500_kn  --hidden_dim 1500 --drop_prob 0.65 --gamma 0.2 --scheme ngram --ngram_scheme kn --absolute_discounting
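In this example, --gamma 0.2 sets the noising probability γ from the paper, and --scheme ngram with --ngram_scheme kn and --absolute_discounting selects the bigram Kneser-Ney-style noising scheme. --hidden_dim 1500 sets the LSTM hidden size, --drop_prob 0.65 is presumably the dropout probability, and --run_dir is presumably the output directory for checkpoints and logs.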
