Phonetic/semantic similarity stuff

Given a list of words from Google News for which a 'semantic' distance was available:

Take the 150 most semantically similar word pairs for each of the 70k words.
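
A rough sketch of this step, to make it concrete. It assumes each word has a vector and that the 'semantic' distance is cosine similarity between vectors; neither is confirmed here, and the names `cosine`, `top_k_pairs` and the toy vectors are made up for illustration.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_pairs(vectors, k=150):
    """Return (word, neighbour, similarity) for each word's k nearest neighbours."""
    pairs = []
    for word, vec in vectors.items():
        # Score every other word against this one (assumed measure: cosine).
        scored = ((cosine(vec, other), w) for w, other in vectors.items() if w != word)
        for sim, neighbour in heapq.nlargest(k, scored):
            pairs.append((word, neighbour, sim))
    return pairs


# Toy 2-d vectors, invented purely for illustration.
toy = {
    "cat": [1.0, 0.1],
    "dog": [0.9, 0.2],
    "car": [0.1, 1.0],
}
pairs = top_k_pairs(toy, k=1)
```

This loop is quadratic in the vocabulary size, so for 70k words a vectorised or approximate nearest-neighbour approach would likely be needed; the sketch only shows the shape of the computation.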

Calculate phonetic similarity for each word pair (using

Remove all pairs where the two words have the same stem (using Porter stemming).
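
A minimal sketch of the same-stem filter. To stay dependency-free it uses a crude suffix stripper (`toy_stem`, a hypothetical stand-in that is NOT the Porter algorithm); the function names and sample pairs are also invented for illustration.

```python
def toy_stem(word):
    """Crude suffix stripper; a stand-in, NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def drop_same_stem(pairs, stem=toy_stem):
    """Keep only pairs whose two words reduce to different stems."""
    return [(a, b) for a, b in pairs if stem(a) != stem(b)]


candidates = [("walk", "walking"), ("walked", "walks"), ("night", "knight")]
kept = drop_same_stem(candidates)
```

Here `kept` contains only `("night", "knight")`. To use real Porter stemming, a stemmer such as NLTK's `PorterStemmer().stem` could be passed as the `stem` argument, assuming NLTK is installed.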

Output: just under 2 million word pairs (1,959,712).

Pretty pictures and some interesting word pairs.