A small Python package for preparing ordered language data for RNN language models.
Tokenization is not included.
from preppy import Prep sentences = ['Hello World.', 'Hello World.'] prep = Prep(sentences, reverse=False, # generate batches starting from last document batch_size=1, # batch size context_size=1, # number of back-prop-through-time steps sliding=False, # windows slide over input text ) for batch in prep.generate_batches(): pass # train model on batch
Developed on Ubuntu 18.04 and Python 3.7