# Word Anagrams

## Introduction

**Purpose:**

The purpose of this project is to **find anagrams in the dictionary of English words**. Two words are anagrams of each other if their letters can be rearranged to turn one word into the other. For example, "listen" and "silent" are anagrams.
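One common way to test this definition in code is to compare the sorted letters of the two words. The snippet below is a minimal sketch of that idea, not necessarily the approach used later in this project, and `is_anagram` is a hypothetical helper name.

```python
def is_anagram(word_a, word_b):
    """Return True if both words use exactly the same letters (hypothetical helper)."""
    # Sorting puts the letters of each word into a canonical order,
    # so two anagrams produce identical sorted lists.
    return sorted(word_a.lower()) == sorted(word_b.lower())


print(is_anagram("listen", "silent"))  # True
print(is_anagram("listen", "listed"))  # False
```

Note that under this check a word also counts as an anagram of itself, so any real search has to skip identical words.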

**Goals:**

1. Load a list of English words into a Python list.
2. Create a Python dictionary of anagrams, indexed by anagrammed word.
3. Group dictionary words by their length and then find the total number of anagrams in each group.
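As a rough preview of goals 2 and 3, here is a sketch on a tiny hand-written word list. It assumes that the dictionary key is the word's letters in sorted order, which is one common choice and may differ from the indexing used later. The names `sample_words`, `anagrams`, `anagram_groups`, and `by_length` are illustrative only.

```python
from collections import defaultdict

# Tiny illustrative sample, not the real word list.
sample_words = ["listen", "silent", "enlist", "google", "act", "cat", "dog"]

# Goal 2 (sketch): map a letter signature to every word that shares it.
anagrams = defaultdict(list)
for word in sample_words:
    signature = "".join(sorted(word))  # assumed key: letters in sorted order
    anagrams[signature].append(word)

# Keep only the signatures shared by at least two words.
anagram_groups = {key: group for key, group in anagrams.items() if len(group) > 1}

# Goal 3 (sketch): count the words that have at least one anagram, by word length.
by_length = defaultdict(int)
for group in anagram_groups.values():
    by_length[len(group[0])] += len(group)

print(anagram_groups)
print(dict(by_length))
```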

## Loading the English Words

We begin by loading a list of English words into Python. The data file is called `web2`; the code below reads it from `practice_python/data/web2`, so adjust the path if your copy lives elsewhere (for example, in a `../data/` folder). The file contains a list of words from *Webster's Second International Dictionary (1934)*. The 1934 copyright has lapsed, and the file is included in Unix and OS X at `/usr/share/dict/web2` as a reference word list for various uses.
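Because the file can live in different places depending on the machine, a small helper can look for it before loading. This is only a sketch: `find_wordlist` is a hypothetical helper, and the candidate paths are simply the locations mentioned in this document.

```python
from pathlib import Path


def find_wordlist(candidates):
    """Return the first candidate that is an existing file, or None (hypothetical helper)."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            return path
    return None


# Candidate locations taken from this document; adjust them to your own setup.
found_path = find_wordlist([
    "practice_python/data/web2",
    "../data/web2",
    "/usr/share/dict/web2",
])
print(found_path)  # None if the file is in none of these places
```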

Open and read in the data:

In [108]:
with open('practice_python/data/web2', 'r') as f:  # The file is closed when the block ends
    words = f.readlines()
words = [line.rstrip() for line in words]  # Strip trailing whitespace, including '\n'

print(len(words))

235886


We can do the same in just one line:

In [109]:
words = [line.rstrip() for line in open('practice_python/data/web2', 'r')]

print(len(words))
words[:10]

235886


['A',
 'a',
 'aa',
 'aal',
 'aalii',
 'aam',
 'Aani',
 'aardvark',
 'aardwolf',
 'Aaron']

We should also convert all the words to lowercase, since letter case does not matter for anagrams. We can open the file, read in its contents, strip the newline characters, and change case in just one line:

In [110]:
words = [line.rstrip().lower() for line in open('practice_python/data/web2', 'r')]

print(len(words))
words[:10]

235886


['a',
 'a',
 'aa',
 'aal',
 'aalii',
 'aam',
 'aani',
 'aardvark',
 'aardwolf',
 'aaron']

Notice that the lowercased word list now contains duplicates: the terms 'A' and 'a', for example, both appear as 'a'. To remove the duplicates, convert the list to a set. A set is unordered, so the alphabetical ordering of the terms is lost in the conversion, and we sort the unique words afterwards, which also gives us back a list.

In [111]:
words_unique = sorted(set(words))  # sorted() accepts any iterable and returns a list

print(len(words_unique))
words_unique[:10]

234371


['a',
 'aa',
 'aal',
 'aalii',
 'aam',
 'aani',
 'aardvark',
 'aardwolf',
 'aaron',
 'aaronic']
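If the original file order should be kept instead of re-sorting, one alternative (not used in this project) is `dict.fromkeys`, which keeps the first occurrence of each key in insertion order on Python 3.7 and later. The sketch below compares the two approaches on a made-up sample list.

```python
# Made-up sample in non-alphabetical order, with duplicates.
sample = ["zebra", "apple", "zebra", "mango", "apple"]

# Approach used in this document: a set removes duplicates but has no order, so sort afterwards.
unique_sorted = sorted(set(sample))

# Alternative: dict keys are unique and keep insertion order (Python 3.7+).
unique_in_order = list(dict.fromkeys(sample))

print(unique_sorted)    # ['apple', 'mango', 'zebra']
print(unique_in_order)  # ['zebra', 'apple', 'mango']
```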

**We can open the file, read in its contents, remove newline characters, change the terms to lowercase, and remove duplicates *in a single step*:**

In [112]:
words_unique = sorted(set(line.rstrip().lower() for line in open('practice_python/data/web2', 'r')))

print(len(words_unique))
words_unique[:10]

234371


['a',
 'aa',
 'aal',
 'aalii',
 'aam',
 'aani',
 'aardvark',
 'aardwolf',
 'aaron',
 'aaronic']

**We can wrap these steps in a function** to accomplish the same result.

In [113]:
data_path = 'practice_python/data/web2'

# =============================================================================
def get_wordlist(data_path):
    """
    Returns a clean, sorted list of unique lowercase English words read in from
    the data file at the given path (one word per line).
    """
    # The with-block closes the file as soon as reading is done.
    with open(data_path, 'r') as f:
        return sorted(set(line.rstrip().lower() for line in f))

In [114]:
words_unique = get_wordlist(data_path)

print(len(words_unique))
words_unique[:10]

234371


['a',
 'aa',
 'aal',
 'aalii',
 'aam',
 'aani',
 'aardvark',
 'aardwolf',
 'aaron',
 'aaronic']
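To check the function without the full dictionary file, it can be run against a small temporary file. The sketch below repeats an equivalent definition of `get_wordlist` so that it runs on its own; the sample words and the names `sample_path` and `sample_words` are made up for illustration.

```python
import os
import tempfile


def get_wordlist(data_path):
    """Return a sorted list of unique lowercase words from a one-word-per-line file."""
    with open(data_path, 'r') as f:
        return sorted(set(line.rstrip().lower() for line in f))


# Write a tiny sample file with mixed case and a duplicate after lowercasing.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("A\na\nAaron\naardvark\n")
    sample_path = tmp.name

sample_words = get_wordlist(sample_path)
os.remove(sample_path)  # Clean up the temporary file

print(sample_words)  # ['a', 'aardvark', 'aaron']
```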

## Finding Anagrams