# Problem C

For the third problem we will compare two words to determine whether they are anagrams.

Consider these three pairs of words:

    EARTH - HEART
    CAT - DOG
    DORMITORY - DIRTY ROOM
    
Here we will write some programs that compare the words in each pair and report whether they are anagrams by returning a boolean (True or False).

We will explore four ways you could do this:

    1. Test that each character in word 1 actually occurs in word 2.
    2. Sort the letters in the word alphabetically and compare.
    3. Create a complete list of all the possible strings from word 1 and then compare this list with word 2. (NOTE - only test this with short words!)
    4. Count the number of times each character occurs in each word and then compare the result.

Write an algorithm for methods 1, 2 and 4 above. Method 3 is more advanced and has been written for you - you do not need to code it but should instead think about <b>how</b> you might code it. How does the number of calculations performed by each method differ? Give approximate answers - you don't need to count every single calculation.

In [4]:
import numpy as np

In [1]:
# Answer here - method 1


In [2]:
# Answer here - method 2


In [6]:
# Becky's solution for (method 3) using the itertools module
# itertools.permutations returns an iterator over all the possible orderings of the input list.
# See for example the printed outputs for the input ['d','o','g']. Coding up a function to do this would
# itself be a great example problem for this notebook!

def anagramSolution3(s1,s2):
    import itertools
    import math
    
    alist1 = list(s1)
    alist2 = list(s2)
    
    # n letters can be arranged in n! (n factorial) ways
    no_of_ways = math.factorial(len(alist1))
    print('no. of solutions :', no_of_ways)
    permutations = list(itertools.permutations(alist1))
    
    matches = False
    for i in permutations:
        print(i)
        if list(i) == alist2:
            matches = True
            return matches
    
    return matches
    
print(anagramSolution3('dog','cat'))


# In the worst case we require n! (n factorial) comparisons, where n is the length of the input.
# When n is large this will be *very* slow!

no. of solutions : 6
('d', 'o', 'g')
('d', 'g', 'o')
('o', 'd', 'g')
('o', 'g', 'd')
('g', 'd', 'o')
('g', 'o', 'd')
False
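
The comment in the cell above suggests that generating the permutations by hand would itself make a good exercise. One possible sketch, assuming a simple recursive approach, is shown here: fix each letter in first place in turn, then permute whatever is left. The name `my_permutations` is hypothetical and is not part of the original notebook or of `itertools`.

```python
def my_permutations(items):
    """Yield every ordering of items as a tuple (hypothetical helper)."""
    items = list(items)
    if len(items) <= 1:
        # Zero or one item: there is only one possible ordering.
        yield tuple(items)
        return
    for i in range(len(items)):
        # Fix items[i] in first place, then permute the remaining items.
        rest = items[:i] + items[i + 1:]
        for tail in my_permutations(rest):
            yield (items[i],) + tail


for p in my_permutations('dog'):
    print(p)
```

For `'dog'` this produces the same six tuples as the printed output above. Each level of the recursion loops over the remaining letters, which is where the n*(n-1)*...*1 = n! growth comes from.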


In [7]:
# Answer here - method 4


We tend to consider the number of calculations or steps in your code as a mark of its efficiency. We use the 'order of magnitude' notation big 'O' to report this. 
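
To get a feel for what these orders of magnitude mean in practice, the short sketch below prints rough step counts for a few input lengths. The function name `step_counts` is made up for this illustration, and the numbers are the pure growth rates rather than exact operation counts for any particular program.

```python
import math


def step_counts(n):
    """Return rough step counts (n, n^2, n!) for an input of length n."""
    return n, n ** 2, math.factorial(n)


for n in (3, 5, 10, 20):
    linear, quadratic, factorial = step_counts(n)
    print(f"n={n:<3} O(n): {linear:<4} O(n^2): {quadratic:<5} O(n!): {factorial}")
```

Even at n = 10 the factorial count is already in the millions, while the linear and quadratic counts stay small.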

For the four methods above you worked out roughly how many calculations each one performs. In big O notation we would conclude the following:

1. Two nested for loops are required, e.g. for each letter in word 1, loop through word 2, so this is an $O(n^2)$ algorithm.

2. The comparison is quick, $O(n)$, but the sort is again two nested for loops (if using a bubble sort), so it dominates the process and this is again an $O(n^2)$ algorithm. A faster sort, such as Python's built-in `sorted`, brings this down to $O(n \log n)$.

3. Searching through every possible ordering of the letters requires $n \times (n-1) \times (n-2) \times \dots \times 1 = n!$ operations, so this is $O(n!)$. For big $n$, $n!$ is *way* larger than $n^2$. This is not efficient, though the approach might be the most familiar to humans, as it is how we try to make words from anagrams.

4. There are 26 letters in the alphabet, so this algorithm only requires one for loop through each word, adding one to a letter's count each time that letter turns up, followed by a comparison of at most 26 counts. The operation is therefore linear, $O(n)$. This may not be the first thing you would have thought of, but it is the most efficient process. (Of course, you may not be using the Roman alphabet, or you may want to include or ignore spaces and punctuation, but all of this could be handled while still being $O(n)$.)
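
One way methods 1, 2 and 4 could be written is sketched below. This is only one possible set of answers: the function names and the `normalise` helper are invented for this sketch, and ignoring case, spaces and punctuation is an assumption made so that DORMITORY - DIRTY ROOM counts as an anagram. Method 2 uses Python's built-in `sorted` rather than a hand-written bubble sort, and method 4 uses `collections.Counter` from the standard library in place of a list of 26 counts.

```python
from collections import Counter


def normalise(word):
    """Lower-case the word and keep only its letters (assumed convention)."""
    return [ch for ch in word.lower() if ch.isalpha()]


def anagram_check_off(s1, s2):
    """Method 1: find each letter of s1 in s2 and cross it off (about n*n steps)."""
    letters1, remaining = normalise(s1), normalise(s2)
    if len(letters1) != len(remaining):
        return False
    for ch in letters1:
        if ch not in remaining:   # 'in' scans the list: the hidden inner loop
            return False
        remaining.remove(ch)      # cross off so a repeated letter is not reused
    return True


def anagram_sorted(s1, s2):
    """Method 2: sort both words and compare; the sort dominates the cost."""
    return sorted(normalise(s1)) == sorted(normalise(s2))


def anagram_counts(s1, s2):
    """Method 4: count each letter in one pass per word, then compare the counts."""
    return Counter(normalise(s1)) == Counter(normalise(s2))


pairs = [('EARTH', 'HEART'), ('CAT', 'DOG'), ('DORMITORY', 'DIRTY ROOM')]
for w1, w2 in pairs:
    print(w1, '-', w2, ':',
          anagram_check_off(w1, w2), anagram_sorted(w1, w2), anagram_counts(w1, w2))
```

All three functions agree on the example pairs: True for EARTH - HEART and DORMITORY - DIRTY ROOM, False for CAT - DOG. Crossing letters off in method 1 matters for words with repeated letters, since simply testing membership would wrongly accept a pair such as 'aab' and 'abb'.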

## Lessons learned

* Take time to consider different methods of solving a problem - no need to rush to code
* Consider what might be best for your algorithm - e.g. is it a one-off such that brute force works, does it need to be generalised so it can work in many situations, does it need to be quick?