Phonetic/semantic similarity stuff
Start with a list of words from Google News for which a 'semantic' distance was available.

For each of the 70k words, take its 150 most semantically similar word pairs.
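A rough sketch of this top-k selection step. It assumes each word has a vector and that 'semantic' similarity is cosine similarity between vectors; the notes do not say how the distance was actually computed. The names cosine, top_k_pairs and toy_vectors are made up for illustration, and the vectors are toy values, not real data.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_pairs(vectors, k=150):
    """For each word, collect (word, neighbour, similarity) for its k most similar words."""
    pairs = []
    for word, vec in vectors.items():
        # Score every other word against this one (brute force, O(n^2) overall).
        scored = ((cosine(vec, other_vec), other)
                  for other, other_vec in vectors.items() if other != word)
        for sim, other in heapq.nlargest(k, scored):
            pairs.append((word, other, sim))
    return pairs


# Toy, made-up vectors purely for illustration (assumption, not the real data).
toy_vectors = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
}
toy_pairs = top_k_pairs(toy_vectors, k=1)
```

With k=150 and 70k words this brute-force loop would be slow; it is only meant to show the shape of the step, and a vectorised or approximate nearest-neighbour search would likely be needed at full scale.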

Calculate phonetic similarity for each word pair (using

Remove all pairs where the two words have the same stem (using Porter stemming).
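A minimal sketch of the same-stem filter. The filter takes any stemming function, so a real Porter stemmer (for example the stem method of NLTK's PorterStemmer) could be passed in. The crude_stem function below is a hypothetical stand-in that only strips a few suffixes so the example runs without extra libraries; it is not the Porter algorithm, and drop_same_stem and sample_pairs are illustrative names.

```python
def drop_same_stem(pairs, stem):
    """Keep only the pairs whose two words reduce to different stems."""
    return [(a, b) for a, b in pairs if stem(a) != stem(b)]


def crude_stem(word):
    """Hypothetical stand-in stemmer: strips a few suffixes. NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "s"):
        # Only strip if at least three characters of the word remain.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Made-up example pairs (assumption, not taken from the real output).
sample_pairs = [("walk", "walked"), ("walk", "stroll"), ("cats", "cat")]
kept = drop_same_stem(sample_pairs, crude_stem)
```

Here the pairs walk/walked and cats/cat are dropped because both words share a stem, leaving only walk/stroll.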

Output: just under 2 million word pairs (1,959,712).

Pretty pictures and some interesting word pairs.