Natural Language Processing is Molecules! (Molecules, Cells → Swell)
Rhyming slang is a form of phrase construction in the English language and is especially prevalent in dialectal English from the East End of London; hence the alternative name, Cockney rhyming slang. The construction involves replacing a common word with a rhyming phrase of two or three words and then, in almost all cases, omitting the secondary rhyming word, making the origin and meaning of the phrase elusive to listeners not in the know.
This project computationally generates unique rhyming slang for common words. Here are some selected examples of its output:
- BEES HONEY → MONEY
- BOTTOMLESS PIT → QUIT
- PINT KEG → EGG
- STUPOR TRANCE → DANCE
- MEAT KNIFE → LIFE
- FOOT SPRAIN → PAIN
- WARM BRANDY → CANDY
- WALL STREET SLUMP → JUMP
- ROAST DUCK → FUCK
- ARABLE CROP → COP
- LAUGHTER TEARS → STAIRS
- HOLIDAY REST → BEST
- BOOKS RETURN → DISCERN
- HOUSES STREET → EAT
Rhyming slang is generated by combining two linguistic resources: the CMU Pronouncing Dictionary (CMU) and the Edinburgh Associative Thesaurus (EAT). CMU is used to index words according to their rhyming pattern, and EAT is used to group pronounceable words by semantic association.
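As a rough illustration of the rhyme index, here is a minimal sketch. It assumes that a "rhyming pattern" means every phoneme from the last primary-stressed vowel (marked with `1` in CMU notation) to the end of the word, which may differ from the definition the project actually uses. The names `rhyme_key`, `build_rhyme_index`, and `SAMPLE_ENTRIES` are hypothetical, and a plain Hash stands in for whatever store holds the real index.

```ruby
# A few CMU-style pronunciations, hard-coded for the example.
SAMPLE_ENTRIES = {
  "MONEY" => %w[M AH1 N IY0],
  "HONEY" => %w[HH AH1 N IY0],
  "PIT"   => %w[P IH1 T],
  "QUIT"  => %w[K W IH1 T]
}

# Hypothetical helper: build a rhyme key from a list of phonemes.
# Assumption: the key is everything from the last primary-stressed vowel
# onward, with the stress digits stripped.
def rhyme_key(phonemes)
  start = phonemes.rindex { |phoneme| phoneme.end_with?("1") } || 0
  phonemes[start..-1].map { |phoneme| phoneme.delete("0-9") }.join(" ")
end

# Group words by rhyme key so that rhymes can be looked up directly.
def build_rhyme_index(entries)
  index = Hash.new { |hash, key| hash[key] = [] }
  entries.each { |word, phonemes| index[rhyme_key(phonemes)] << word }
  index
end

index = build_rhyme_index(SAMPLE_ENTRIES)
puts index[rhyme_key(SAMPLE_ENTRIES["MONEY"])].inspect
```

Keying on the stressed tail means a lookup for one word returns all of its rhymes in a single step, including the word itself, which the caller would need to filter out.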
Once everything is indexed, the actual generation of slang is pretty straightforward. First, take a word and find its rhyme pattern. Then get all of the other words that share this pattern. Go through the rhyming words in random order until one is found whose associated words meet a particular threshold of relevance. Finally, take one of those associated words and return it along with the rhyme and the original word.
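The steps above can be sketched roughly as follows. This is not the project's actual code: `RHYME_KEY`, `ASSOCIATIONS`, and `generate_slang` are hypothetical names, the in-memory hashes stand in for the real indexes, and the association scores and the `0.1` threshold are invented purely for illustration.

```ruby
# Assumed precomputed rhyme pattern for each word (illustrative values).
RHYME_KEY = {
  "MONEY" => "AH N IY",
  "HONEY" => "AH N IY",
  "FUNNY" => "AH N IY"
}

# Assumed association strengths per word (made-up numbers, not EAT data).
ASSOCIATIONS = {
  "HONEY" => { "BEES" => 0.4, "JAR" => 0.05 },
  "FUNNY" => { "HA" => 0.02 }
}

def generate_slang(word, threshold = 0.1, rng = Random.new)
  # 1. Find the word's rhyme pattern.
  key = RHYME_KEY[word]
  return nil unless key

  # 2. Collect the other words that share that pattern.
  rhymes = RHYME_KEY.select { |other, k| k == key && other != word }.keys

  # 3. Visit the rhymes in random order until one has associated words
  #    that meet the relevance threshold.
  rhymes.shuffle(random: rng).each do |rhyme|
    relevant = (ASSOCIATIONS[rhyme] || {}).select { |_, score| score >= threshold }.keys
    next if relevant.empty?

    # 4. Pick one associated word and return it with the rhyme and the original.
    return { phrase: "#{relevant.sample(random: rng)} #{rhyme}", word: word }
  end

  nil # no rhyme had a sufficiently relevant association
end

result = generate_slang("MONEY")
puts "#{result[:phrase]} → #{result[:word]}" if result
```

With this sample data, FUNNY is always skipped because its only association falls below the threshold, so the only possible phrase for MONEY is BEES HONEY.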
- Ruby 1.9+
- Redis (with server running on
    $ git clone git://github.com/mattt/cockney.git
    $ cd cockney
    $ rake build:all
    $ ruby ./console.rb
I'll be the first to admit that the current implementation is rather simple, and generates somewhat-acceptable (but definitely not stellar) results.
One way I would improve this would be to use a corpus of short idiomatic expressions rather than simple word associations, which would get closer to the spirit of actual rhyming slang. Corpora being difficult to find as they are, I might be able to get pretty far by looking through a treebank for 3-word noun phrases (NP), and selecting ones that occur multiple times in the corpus.
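To make the treebank idea a little more concrete, here is a rough sketch. It assumes Penn Treebank-style bracketed parses and only looks at "flat" noun phrases, meaning an NP whose children are exactly three tagged words. The sample parses are invented for the example, and `FLAT_NP` and `repeated_noun_phrases` are hypothetical names.

```ruby
# Matches "(NP (TAG word) (TAG word) (TAG word))" and captures the children.
FLAT_NP = /\(NP((?: \([^()\s]+ [^()\s]+\)){3})\)/

# Count every flat 3-word NP and keep the ones seen more than once.
def repeated_noun_phrases(parses)
  counts = Hash.new(0)
  parses.each do |parse|
    parse.scan(FLAT_NP).each do |match|
      # match[0] holds the three "(TAG word)" children; pull out the words.
      words = match[0].scan(/\([^()\s]+ ([^()\s]+)\)/).flatten
      counts[words.join(" ").upcase] += 1
    end
  end
  counts.select { |_, count| count > 1 }.keys
end

# Invented sample parses, not taken from any real treebank.
parses = [
  "(S (NP (NNP Wall) (NNP Street) (NN slump)) (VP (VBD deepened)))",
  "(S (NP (PRP They)) (VP (VBD feared) (NP (NNP Wall) (NNP Street) (NN slump))))",
  "(S (NP (DT the) (JJ red) (NN door)) (VP (VBD opened)))"
]

puts repeated_noun_phrases(parses).inspect
```

A regular expression is enough here only because nested noun phrases are ignored; handling those would need a proper tree parser.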
If you have any leads on useful linguistic corpora, or have any ideas of how to improve this, please let me know!
The Cockney Rhyming Slang Generator is available under the MIT license. See the LICENSE file for more info.
The CMU Pronouncing Dictionary is Copyright (C) 1993-2008 Carnegie Mellon University. All rights reserved.
The Edinburgh Associative Thesaurus was created by George Kiss and Christine Armstrong.