Discovery of Rhyme Schemes in Poetry

This directory contains the accompanying code for the paper, "Unsupervised Discovery of Rhyme Schemes", Sravana Reddy and Kevin Knight.

Our implementation is in Python using NumPy and contains:

1.  The learning algorithm described in Section 3.

2.  Code to evaluate the output of the above, along with various parsing functions.
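To illustrate the kind of EM procedure item 1 refers to, here is a minimal, simplified sketch in Python/NumPy. It is not the code in this directory and may differ from the model in Section 3. It assumes each stanza is a list of line-end words and each candidate scheme is a string such as "abab". A line whose rhyme letter is new draws its word uniformly from the vocabulary. A line whose letter appeared earlier draws its word from theta[previous_word], where the earlier same-letter line is chosen uniformly. The names antecedents, stanza_likelihood, em and the smoothing parameter are hypothetical.

```python
import numpy as np


def antecedents(scheme):
    """For each line i, the indices of earlier lines that share its rhyme letter."""
    return [[j for j in range(i) if scheme[j] == scheme[i]] for i in range(len(scheme))]


def stanza_likelihood(ids, scheme, theta):
    """P(stanza | scheme) under the simplified generative story described above."""
    V = theta.shape[0]
    p = 1.0
    for i, prev in enumerate(antecedents(scheme)):
        if prev:
            # Mixture over earlier rhyming lines: one of them "generates" this word.
            p *= np.mean([theta[ids[j], ids[i]] for j in prev])
        else:
            # New rhyme letter: the word is drawn uniformly from the vocabulary.
            p *= 1.0 / V
    return p


def em(stanzas, schemes, iterations=10, smoothing=1e-3):
    """Hypothetical EM over rhyme schemes; assumes every stanza length has a candidate."""
    vocab = sorted({w for s in stanzas for w in s})
    index = {w: k for k, w in enumerate(vocab)}
    data = [[index[w] for w in s] for s in stanzas]
    V = len(vocab)
    theta = np.full((V, V), 1.0 / V)           # uniform initialization
    prior = {r: 1.0 / len(schemes) for r in schemes}
    posteriors = []
    for _ in range(iterations):
        counts = np.full((V, V), smoothing)    # smoothed expected word-pair counts
        scheme_counts = {r: 0.0 for r in schemes}
        posteriors = []
        for ids in data:
            cands = [r for r in schemes if len(r) == len(ids)]
            scores = np.array([prior[r] * stanza_likelihood(ids, r, theta) for r in cands])
            q = scores / scores.sum()          # E-step: posterior over schemes
            posteriors.append(dict(zip(cands, q)))
            for r, qr in zip(cands, q):
                scheme_counts[r] += qr
                for i, prev in enumerate(antecedents(r)):
                    if prev:
                        # Split this line's credit among the earlier lines it may rhyme with.
                        w = np.array([theta[ids[j], ids[i]] for j in prev])
                        for j, resp in zip(prev, w / w.sum()):
                            counts[ids[j], ids[i]] += qr * resp
        theta = counts / counts.sum(axis=1, keepdims=True)  # M-step: renormalize rows
        total = sum(scheme_counts.values())
        prior = {r: c / total for r, c in scheme_counts.items()}
    # posteriors come from the E-step of the final iteration.
    return theta, prior, posteriors, index
```

With uniform initialization every scheme starts out equally likely. The M-step then rewards word pairs that recur in rhyming positions across stanzas, which is what lets EM prefer, for example, "abab" over "aabb" when the same pairs keep rhyming in alternate lines.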

Right now, our implementation of the algorithm for stanza dependencies is very slow, and we would like to optimize it before releasing it. That said, it is a simple modification of the forward-backward algorithm. (Unfortunately, even the released code is fairly slow on large datasets; we apologize.)
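One way to read "a simple modification of forward-backward" is to assume that the scheme of each stanza depends on the scheme of the previous stanza in the same poem. Schemes then become the hidden states of an HMM, and the stanzas are the observations. The sketch below is a standard scaled forward-backward pass under that assumption, not the unreleased code. The inputs pi (initial scheme probabilities), A (scheme-to-scheme transitions) and E (E[t, k] = likelihood of stanza t under scheme k, however it is computed) are hypothetical.

```python
import numpy as np


def forward_backward(pi, A, E):
    """Posterior P(scheme_t = k | all stanzas in the poem) by scaled forward-backward.

    pi: (K,) initial scheme probabilities
    A:  (K, K) with A[j, k] = P(scheme_t = k | scheme_{t-1} = j)
    E:  (T, K) with E[t, k] = P(stanza_t | scheme k)
    Returns (gamma, log_likelihood), where gamma has shape (T, K).
    """
    T, K = E.shape
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    c = np.zeros(T)                        # per-step scaling factors to avoid underflow
    alpha[0] = pi * E[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * E[t]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (E[t + 1] * beta[t + 1]) / c[t + 1]
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma, np.log(c).sum()
```

In an EM loop, gamma would replace the independent per-stanza posteriors in the E-step. The expected transition counts for re-estimating A follow from the same alpha and beta arrays.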

The file allschemes.pickle is a serialization of the complete list of rhyme schemes, with overall frequencies (used in estimating the naive baseline). This should be placed in the same directory as the code.
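The README does not document the layout of allschemes.pickle. As a hypothetical example, assume it unpickles to a dict that maps scheme strings (such as "abab") to counts. A naive baseline that labels every stanza with the most frequent scheme of its length could then look like this. The function names are illustrative only.

```python
import pickle


def load_scheme_counts(path="allschemes.pickle"):
    """Load scheme frequencies. ASSUMPTION: the pickle holds a dict {scheme: count}."""
    with open(path, "rb") as f:
        return pickle.load(f)


def naive_baseline(scheme_counts):
    """Map each stanza length to its most frequent scheme (ties keep the first seen)."""
    best = {}
    for scheme, count in scheme_counts.items():
        n = len(scheme)
        if n not in best or count > scheme_counts[best[n]]:
            best[n] = scheme
    return best
```

A four-line stanza would then be assigned naive_baseline(counts)[4], whatever its words are.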

To learn rhyme schemes, use the command <goldfile> <initialization-type> <output>

<goldfile> is a gold-standard data file, like those in english_gold and french_gold. The code reads only the stanzas and makes no use of the annotations.

<initialization-type> is a single character that specifies how to initialize theta: uniformly (u), with the orthographic similarity measure (o), or with CELEX pronunciations and the definition of rhyme (p). The last option requires a copy of CELEX on your machine.
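The orthographic similarity measure itself is not described in this README. The sketch below substitutes a simple stand-in: a word pair's score is the length of their common suffix divided by the longer word's length, plus a small floor so that no pair gets zero probability. Each row of theta is then normalized. 'u' gives uniform rows, and 'p' is left unimplemented because it depends on CELEX. The names common_suffix_len, init_theta and floor are hypothetical.

```python
import numpy as np


def common_suffix_len(u, v):
    """Number of trailing characters shared by u and v."""
    n = 0
    while n < min(len(u), len(v)) and u[-1 - n] == v[-1 - n]:
        n += 1
    return n


def init_theta(vocab, init_type, floor=1e-3):
    """Row-normalized theta for a list of non-empty words (stand-in measures only)."""
    V = len(vocab)
    if init_type == "u":
        theta = np.ones((V, V))
    elif init_type == "o":
        # Stand-in orthographic score: shared-suffix fraction plus a floor.
        theta = np.array([[common_suffix_len(u, v) / max(len(u), len(v)) + floor
                           for v in vocab] for u in vocab])
    elif init_type == "p":
        raise NotImplementedError("pronunciation-based initialization needs CELEX")
    else:
        raise ValueError(f"unknown initialization type: {init_type!r}")
    return theta / theta.sum(axis=1, keepdims=True)
```

Under this stand-in, "cat" and "hat" start with far more probability mass than "cat" and "dog". EM can then refine pairs that spelling alone gets wrong.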

<output> is simply the name of the desired output file. The program writes stanzas and annotations in a format similar to the gold standard.

To evaluate the output against the gold standard, run the evaluation with <goldfile> <output>
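The README does not say which metrics the evaluation reports. Two common ways to score predicted schemes against gold ones are stanza-level accuracy and precision/recall/F1 over pairs of rhyming lines. Accuracy is computed after relabeling letters, so "cdcd" counts as "abab". The sketch below assumes the gold and predicted schemes have already been parsed into parallel lists of strings. All names are hypothetical.

```python
from itertools import combinations


def normalize(scheme):
    """Relabel letters in order of first appearance, so 'cdcd' becomes 'abab'."""
    mapping = {}
    for ch in scheme:
        if ch not in mapping:
            mapping[ch] = chr(ord("a") + len(mapping))
    return "".join(mapping[ch] for ch in scheme)


def rhyme_pairs(scheme):
    """Set of (i, j) line pairs, i < j, that the scheme marks as rhyming."""
    return {(i, j) for i, j in combinations(range(len(scheme)), 2) if scheme[i] == scheme[j]}


def evaluate(gold, predicted):
    """Stanza accuracy plus pairwise precision, recall and F1."""
    correct = sum(normalize(g) == normalize(p) for g, p in zip(gold, predicted))
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        gp, pp = rhyme_pairs(g), rhyme_pairs(p)
        tp += len(gp & pp)
        fp += len(pp - gp)
        fn += len(gp - pp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(gold), "precision": precision,
            "recall": recall, "f1": f1}
```

For example, gold ["abab", "aabb"] against predicted ["cdcd", "abab"] gives accuracy 0.5 and pairwise F1 0.5.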

For example, to learn rhyme schemes from Kipling's poetry with uniform initialization: kipling.pgold u kipling.out

Or, to learn from poetry written between 1450 and 1550 with orthographic initialization: 1415.pgold o 1415.out

To evaluate the above runs:

kipling.pgold kipling.out
1415.pgold 1415.out

E-mail us with any questions or bug fixes.