Skip to content


Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

Twitterbot @parsextoto: Finding Words Inside Other Words

This is the code for a Twitter bot that finds German words inside other German words. We made this for the Hackathon at #clunc15.

Sample Output

See for sample output.

This module reads our vocabulary from the source (we used the freely available Germandict from Sourceforge) and writes each word into our database (parsextoto.sqlite). Morphological features of the words were analysed using SMOR.


The database contains all words, their POS tags and morphological features, and the word-substring pairs with their scores.

This module calculates the funniness score of each word-substring pair. Factors that contribute to funniness are:

  • the relation between the length of both words (longer substrings are better)
  • the number of morphemes of the long word that are part of the short word
  • whether the short word is just the long word minus common inflectional affixes
  • whether the short word is at the start or end of the long word, or somewhere in between
  • the number of phoneme boundaries that are crossed by the short word (this makes the pair more surprising because the connection is only apparent when you read/write it)

The funniness scores are saved in parsextoto.sqlite in the table Substring, as the attribute Score.

This module tweets semi-randomly generated sentences that take word-substring pairs as their central elements. Only pairs with a funniness score of 1 or above are taken into account. The values of consumer_key, consumer_secret, access_token and access_token_secret are taken from a module named If you want to use our code with your own Twitter account, you need to write your own with the values that you can generate at The sentences that are posted to Twitter are chosen based on the morphological features of the word and the substring. A pair that has already been posted is marked with a value of Posted = 1 in the database to avoid repetitions.

Additional thoughts

We were inspired by the Twitter bot @spellwithout by @squidlarkin, which tweets word pairs with the pattern "You can't spell [A] without [B]." We decided to use this general idea, but refine it by adding a variety of sentence patterns, and to use German words instead of English words. The advantage of German in this context is its productivity in generating compound words, which lead to some of our funniest word-substring pairs.


Das Ganze ist mehr als die Summe seiner Teile. Hackathon-Projekt von @ojahnn und @noeddl bei der #clunc15






No releases published


No packages published