# Advent of Code 2017: [Day 4](http://adventofcode.com/2017/day/4)

## Problem statement

>A new system policy has been put in place that requires all accounts to use a passphrase instead of simply a password. A passphrase consists of <font color='green'>a series of words (lowercase letters) separated by spaces</font>.

>To ensure security, a valid passphrase must contain <font color='blue'>no duplicate words</font>.

>For example:

>- '`aa bb cc dd ee`' is valid.
>- '`aa bb cc dd aa`' is not valid - the word aa appears more than once.
>- '`aa bb cc dd aaa`' is valid - '`aa`' and '`aaa`' count as different words.

>The system's full <font color='green'>passphrase list</font> is available as your puzzle input. **<font color='red'>How many</font> passphrases are valid?**

## Breaking down the problem
- **Task**: Count the number of valid passphrases given a passphrase list
- <font color='green'>Input</font>: A list of passphrases (strings)
- <font color='blue'>Process the data</font>: Filter out passphrases with duplicate words
- <font color='red'>Compute</font>: Find the length of the remaining passphrase list

## Implementation
The core of the solution is a check for whether a particular passphrase is valid, that is, whether it contains no duplicate words. Python's `set` type reduces a list to its unique elements, so if the set built from a list (`words` in this case) has the same length as the list itself, every element must be unique.

In [69]:
def valid(phrase):
    words = phrase.split()
    return len(set(words)) == len(words)
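
As a quick illustration of why the length comparison works, the sketch below converts two of the example phrases to sets and compares lengths. The variable names are illustrative only and are not used elsewhere in this notebook.

```python
# Illustrative only: show how set() collapses duplicate words.
unique_words = 'aa bb cc dd ee'.split()
repeated_words = 'aa bb cc dd aa'.split()

# No duplicates: the set keeps all five words.
print(len(unique_words), len(set(unique_words)))      # 5 5

# 'aa' appears twice: the set keeps only four distinct words.
print(len(repeated_words), len(set(repeated_words)))  # 5 4
```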

## Check against test cases
We can check this against the toy example given in the problem.

In [70]:
phrase_list = [
    'aa bb cc dd ee',
    'aa bb cc dd aa',
    'aa bb cc dd aaa'
]

for passphrase in phrase_list:
    print('\'{}\' is {}'.format(passphrase, 'valid' if valid(passphrase) else 'not valid'))

'aa bb cc dd ee' is valid
'aa bb cc dd aa' is not valid
'aa bb cc dd aaa' is valid


## Solve problem
Since the data is all one block of text, it needs to be preprocessed by splitting it into lines, one passphrase per line, and dropping any empty lines. The first five passphrases are shown as an example.

In [71]:
def load_data():
    with open('day4_input.txt') as f:
        return [phrase
                for phrase in f.read().split('\n')
                if len(phrase) > 0]
    
phrase_list = load_data()

for i, phrase in zip(range(5), phrase_list):
    print('{}: {}'.format(i, phrase))

0: sayndz zfxlkl attjtww cti sokkmty brx fhh suelqbp
1: xmuf znkhaes pggrlp zia znkhaes znkhaes
2: nti rxr bogebb zdwrin
3: sryookh unrudn zrkz jxhrdo gctlyz
4: bssqn wbmdc rigc zketu ketichh enkixg bmdwc stnsdf jnz mqovwg ixgken
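
The same preprocessing step can be tried without the input file. The sketch below assumes a small in-memory string (`raw_text`, a made-up stand-in for the file contents) and uses `str.splitlines()`, which is one possible alternative to `split('\n')`.

```python
# Hypothetical stand-in for the contents of the puzzle input file.
raw_text = 'aa bb cc dd ee\naa bb cc dd aa\n\naa bb cc dd aaa\n'

# Split into lines and drop any that are empty or whitespace-only.
phrases = [line for line in raw_text.splitlines() if line.strip()]

print(phrases)
# ['aa bb cc dd ee', 'aa bb cc dd aa', 'aa bb cc dd aaa']
```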


The answer to the problem is simply the length of the list that results from filtering out any passphrases that are not valid.

In [72]:
# Count the passphrases that pass the `valid` check defined above
print(len([passphrase for passphrase in phrase_list
           if valid(passphrase)]))

383
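
An equivalent way to count, sketched here as an alternative rather than the approach used above, is to sum the boolean results directly, since `True` counts as 1 in Python. The block repeats the part-one `valid` check so that it runs on its own, and `sample_phrases` is just the toy example from the problem statement.

```python
def valid(phrase):
    # Part-one check, repeated here so the block is self-contained.
    words = phrase.split()
    return len(set(words)) == len(words)

# Toy example from the problem statement, not the real puzzle input.
sample_phrases = [
    'aa bb cc dd ee',
    'aa bb cc dd aa',
    'aa bb cc dd aaa',
]

# Each True contributes 1 to the sum, each False contributes 0.
valid_count = sum(valid(phrase) for phrase in sample_phrases)
print(valid_count)  # 2
```

This avoids building an intermediate list, which makes little difference for an input of this size.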


For **part two**, a passphrase is also not valid if any two of its words are anagrams of each other. This works exactly the same as in part one, except that we redefine what it means for words to be unique.

If two words are anagrams of each other then they must contain the same characters, albeit in a different order. Therefore both words should have the same result after sorting the characters alphabetically. Using phrase `4` from above:

In [73]:
print(sorted('enkixg'))
print(sorted('ixgken'))

['e', 'g', 'i', 'k', 'n', 'x']
['e', 'g', 'i', 'k', 'n', 'x']
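
The comparison can be wrapped in a small helper to make the idea explicit. `are_anagrams` is a hypothetical name introduced only for this illustration; the words are taken from phrase `4` above.

```python
def are_anagrams(first, second):
    # Two words are anagrams when their sorted letters match exactly.
    return sorted(first) == sorted(second)

print(are_anagrams('enkixg', 'ixgken'))  # True
print(are_anagrams('bssqn', 'wbmdc'))    # False
```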


By implementing this as the hash and equality operators of a Python class, we redefine what it means for two instances to be distinct, and therefore which instances remain after converting a phrase to a set.

In [74]:
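class Password(str):
    # Anagrams have the same sorted letters, so they get the same hash...
    def __hash__(self):
        return hash(str(sorted(self)))

    # ...and compare equal, so a set keeps only one of them.
    def __eq__(self, other):
        return sorted(self) == sorted(other)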
    
def valid(phrase, accept_anagrams=True):
    words = phrase.split()
    
    if not accept_anagrams:
        words = [Password(word) for word in words]

    return len(set(words)) == len(words)

This new definition of '`valid`' is consistent with the original result when called with its default arguments, and it additionally filters out anagrams when called with `accept_anagrams=False`.

In [75]:
print(len([passphrase for passphrase in phrase_list
           if valid(passphrase)]))

383


In [76]:
print(len([passphrase for passphrase in phrase_list
           if valid(passphrase, accept_anagrams=False)]))

265
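
For comparison, the same check can be sketched without a custom class by mapping each word to a canonical form (its letters sorted and joined back into a string) before building the set. This is an alternative to the `Password` approach above, not a replacement for it, and `valid_no_anagrams` is a hypothetical name used only here.

```python
def valid_no_anagrams(phrase):
    # Canonical form: anagrams of each other map to the same string.
    keys = [''.join(sorted(word)) for word in phrase.split()]
    # If any two keys collide, the set is shorter than the list.
    return len(set(keys)) == len(keys)

# Phrase 4 from the input sample contains 'enkixg' and 'ixgken'.
phrase_4 = ('bssqn wbmdc rigc zketu ketichh enkixg '
            'bmdwc stnsdf jnz mqovwg ixgken')

print(valid_no_anagrams('aa bb cc dd ee'))  # True
print(valid_no_anagrams(phrase_4))          # False
```

Plain strings keep their normal hashing and equality this way, at the cost of building a second list of keys.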
