phueb/Preppy

A small Python package for preparing ordered language data for RNN language models.

Tokenization is not included.
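
Because tokenization is left to the user, text may need to be tokenized before it is handed to Prep. The sketch below is one minimal way to do that with the standard library. `simple_tokenize` is a hypothetical helper, not part of Preppy, and re-joining tokens with spaces assumes that Prep treats whitespace as the token boundary, which this README does not confirm.

```python
import re


def simple_tokenize(text):
    """Split text into word and punctuation tokens (hypothetical helper, not part of Preppy)."""
    # \w+ matches a run of word characters; [^\w\s] matches a single punctuation mark
    return re.findall(r"\w+|[^\w\s]", text)


raw_sentences = ['Hello World.', 'Hello World.']

# Assumption: Prep splits on whitespace, so tokens are re-joined with single spaces
sentences = [' '.join(simple_tokenize(s)) for s in raw_sentences]
```

Any tokenizer can be substituted here, as long as its output matches the input format Prep expects.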

Usage

from preppy import Prep

sentences = ['Hello World.', 'Hello World.']

prep = Prep(sentences,
            reverse=False,   # if True, generate batches starting from the last document
            batch_size=1,    # number of windows per batch
            context_size=1,  # number of back-prop-through-time steps
            sliding=False,   # if True, windows slide over the input text
            )

for batch in prep.generate_batches():
    pass  # train model on batch
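
To make the `context_size` and `sliding` options more concrete, here is a small standalone sketch of how context windows could be cut from an ordered token sequence. It does not use Preppy and is not its implementation. `make_windows` is a hypothetical function, and it assumes that `sliding=True` advances the window one token at a time while `sliding=False` yields non-overlapping windows. Preppy's actual behavior may differ.

```python
def make_windows(tokens, context_size, sliding):
    """Return (context, target) pairs from an ordered token list (illustrative only)."""
    window_len = context_size + 1  # context tokens plus the token to predict
    # Assumption: sliding windows advance by one token, non-sliding windows do not overlap
    step = 1 if sliding else window_len
    windows = []
    for start in range(0, len(tokens) - window_len + 1, step):
        window = tokens[start:start + window_len]
        windows.append((window[:-1], window[-1]))
    return windows


tokens = ['a', 'b', 'c', 'd', 'e']
overlapping = make_windows(tokens, context_size=1, sliding=True)
disjoint = make_windows(tokens, context_size=1, sliding=False)
```

With five tokens and a context size of 1, the sliding variant yields four windows and the non-sliding variant yields two, since the trailing token cannot fill a complete window.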

Compatibility

Developed on Ubuntu 18.04 with Python 3.7.