# Computing Anagrams

## Definition

> An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.

From [Wikipedia](https://en.wikipedia.org/wiki/Anagram)

## Implementation
Write a MapReduce job that finds anagrams in the `holmes.txt` dataset.

Output only groups of anagrams that contain at least three different words.

- e.g. `["own", "now", "won"]`
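
The usual trick is that two words are anagrams exactly when their sorted letters are identical, so the sorted letters can serve as a grouping key. The following is a minimal sketch of that idea in plain Python, without MapReduce. The names `signature`, `find_anagram_groups`, and the sample sentence are illustrative assumptions, not part of the exercise.

```python
import re
from collections import defaultdict

def signature(word):
    """Return the word's letters in sorted order; anagrams share this string."""
    return ''.join(sorted(word))

def find_anagram_groups(text, min_words=3):
    """Group the distinct words of `text` by signature (hypothetical helper)."""
    groups = defaultdict(set)
    for token in text.lower().split():
        # Strip punctuation, mirroring the cleanup an anagram job would need.
        word = re.sub(r'[^\w\s]', '', token)
        if word:
            groups[signature(word)].add(word)
    # Keep only groups with at least `min_words` different words.
    return [sorted(words) for words in groups.values() if len(words) >= min_words]

# Made-up sample text, not taken from holmes.txt.
sample = "Now we own what we won; now it is ours."
print(find_anagram_groups(sample))  # prints [['now', 'own', 'won']]
```

In a MapReduce setting, the signature becomes the key emitted by the mapper and the grouping is done by the shuffle phase, so the reducer only has to deduplicate and filter.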


In [None]:
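%%writefile anagram.py
#!/usr/bin/python3
from mrjob.job import MRJob
import re

class MyJob(MRJob):

    def mapper(self, key, line):
        # Emit (sorted letters, word) so that all anagrams share one key.
        for word in line.strip().split():
            # Lowercase and strip punctuation before computing the key.
            word = re.sub(r'[^\w\s]', '', word.lower())
            if word:  # skip tokens that consisted of punctuation only
                yield ''.join(sorted(word)), word

    def reducer(self, sorted_word, words):
        # Deduplicate so that repeated occurrences of a word count once.
        distinct_words = set(words)
        length = len(distinct_words)

        # Keep only groups with at least three different words.
        if length > 2:
            yield length, list(distinct_words)

if __name__ == '__main__':
    MyJob.run()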

### Running the Job


In [None]:
!python anagram.py /data/dataset/text/holmes.txt
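
With mrjob's default output protocol, each result line should contain the JSON-encoded key, a tab, and the JSON-encoded value. As a rough sketch under that assumption, the output could be read back like this. The helper `parse_output_line` and the sample line are hypothetical and only illustrate the expected shape.

```python
import json

def parse_output_line(line):
    """Split one 'key<TAB>value' line and decode both JSON parts (hypothetical helper)."""
    key_json, value_json = line.rstrip("\n").split("\t", 1)
    # Sort the words so the group prints in a stable order.
    return json.loads(key_json), sorted(json.loads(value_json))

# Made-up example line in the assumed output format.
sample_line = '3\t["own", "now", "won"]\n'
count, words = parse_output_line(sample_line)
print(count, words)  # prints 3 ['now', 'own', 'won']
```

Sorting is done here because the reducer builds each group from a `set`, so the order of words within a group is not guaranteed.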