## Day 2

### Part 1 

Your flight departs in a few days from the coastal airport; the easiest way down to the coast from here is via toboggan.

The shopkeeper at the North Pole Toboggan Rental Shop is having a bad day. "Something's wrong with our computers; we can't log in!" You ask if you can take a look.

Their password database seems to be a little corrupted: some of the passwords wouldn't have been allowed by the Official Toboggan Corporate Policy that was in effect when they were chosen.

To try to debug the problem, they have created a list (your puzzle input) of passwords (according to the corrupted database) and the corporate policy when that password was set.

For example, suppose you have the following list:

- 1-3 a: abcde
- 1-3 b: cdefg
- 2-9 c: ccccccccc

Each line gives the password policy and then the password. The password policy indicates the lowest and highest number of times a given letter must appear for the password to be valid. For example, 1-3 a means that the password must contain a at least 1 time and at most 3 times.

In the above example, 2 passwords are valid. The middle password, cdefg, is not; it contains no instances of b, but needs at least 1. The first and third passwords are valid: they contain one a or nine c, both within the limits of their respective policies.

How many passwords are valid according to their policies?
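
Before touching the real input, the rule can be checked against the three example lines. This is a minimal sketch in plain Python; the names `EXAMPLE`, `LINE_RE` and `is_valid_count` are made up for illustration, and the regular expression assumes every line follows the `low-high letter: password` layout shown above.

```python
import re

# The three example lines from the puzzle statement
EXAMPLE = ["1-3 a: abcde", "1-3 b: cdefg", "2-9 c: ccccccccc"]

# Assumed line layout: "<low>-<high> <letter>: <password>"
LINE_RE = re.compile(r"(\d+)-(\d+) (\w): (\w+)")

def is_valid_count(line):
    """Return True if the letter appears between low and high times (inclusive)."""
    low, high, letter, password = LINE_RE.fullmatch(line).groups()
    return int(low) <= password.count(letter) <= int(high)

# Booleans sum as 0/1, so this counts the valid lines
valid = sum(is_valid_count(line) for line in EXAMPLE)
print(valid)  # 2
```

The result agrees with the statement: the first and third passwords pass, the middle one does not.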

In [1]:
# Import pandas
import pandas as pd

In [2]:
# Read in data
df = pd.read_csv("input_data/Day2.txt", sep=' ', header=None)

In [3]:
df.shape

(1000, 3)

In [4]:
df.head()

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 2-8 | t: | pncmjxlvckfbtrjh |
| 1 | 8-9 | l: | lzllllldsl |
| 2 | 3-11 | c: | ccchcccccclxnkcmc |
| 3 | 3-10 | h: | xcvxkdqshh |
| 4 | 4-5 | s: | gssss |


In [5]:
df.columns=['range','letter','password']

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   range     1000 non-null   object
 1   letter    1000 non-null   object
 2   password  1000 non-null   object
dtypes: object(3)
memory usage: 23.6+ KB


In [7]:
df.head()

| | range | letter | password |
|---|---|---|---|
| 0 | 2-8 | t: | pncmjxlvckfbtrjh |
| 1 | 8-9 | l: | lzllllldsl |
| 2 | 3-11 | c: | ccchcccccclxnkcmc |
| 3 | 3-10 | h: | xcvxkdqshh |
| 4 | 4-5 | s: | gssss |


In [8]:
# We can pick out a row
df.loc[2]

range                    3-11
letter                     c:
password    ccchcccccclxnkcmc
Name: 2, dtype: object

In [9]:
# or a column
df['password']

0         pncmjxlvckfbtrjh
1               lzllllldsl
2        ccchcccccclxnkcmc
3               xcvxkdqshh
4                    gssss
              ...         
995            cccjncjsccr
996     xkswshrhghxlnmhqzr
997    kkkkkkkhkkkklkkkknk
998             ttttttttnt
999            xxxxxxxxxcv
Name: password, Length: 1000, dtype: object

In [10]:
# We can filter the rows easily by creating a boolean mask
df['password'].str.count('s')>0

0      False
1       True
2      False
3       True
4       True
       ...  
995     True
996     True
997    False
998    False
999    False
Name: password, Length: 1000, dtype: bool

In [11]:
# And then filter the DataFrame with that mask
df[df['password'].str.count('s')>0]

| | range | letter | password |
|---|---|---|---|
| 1 | 8-9 | l: | lzllllldsl |
| 3 | 3-10 | h: | xcvxkdqshh |
| 4 | 4-5 | s: | gssss |
| 6 | 3-12 | n: | grnxnbsmzttnzbnnn |
| 9 | 6-8 | t: | qtlwttsqg |
| ... | ... | ... | ... |
| 983 | 4-6 | d: | gvqdwrclzsdmhglrz |
| 991 | 12-16 | h: | nkvzdqlbsptvnrzh |
| 993 | 4-11 | q: | vqsllpqnqdcbbtvqrqxb |
| 995 | 4-10 | c: | cccjncjsccr |


In [12]:
# And if we only want two columns where this is true?
df[df['password'].str.count('s')>0][['range','letter']]

| | range | letter |
|---|---|---|
| 1 | 8-9 | l: |
| 3 | 3-10 | h: |
| 4 | 4-5 | s: |
| 6 | 3-12 | n: |
| 9 | 6-8 | t: |
| ... | ... | ... |
| 983 | 4-6 | d: |
| 991 | 12-16 | h: |
| 993 | 4-11 | q: |
| 995 | 4-10 | c: |
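
The mask-then-columns selection above can also be written as a single `.loc` call, which avoids chaining two indexing steps. A small sketch, assuming a stand-in frame called `sample` built from the first five rows shown earlier (it is not the real puzzle input):

```python
import pandas as pd

# Stand-in frame holding the first five rows of the data shown above
sample = pd.DataFrame({
    "range": ["2-8", "8-9", "3-11", "3-10", "4-5"],
    "letter": ["t:", "l:", "c:", "h:", "s:"],
    "password": ["pncmjxlvckfbtrjh", "lzllllldsl", "ccchcccccclxnkcmc",
                 "xcvxkdqshh", "gssss"],
})

# Boolean mask: True where the password contains the literal letter 's'
mask = sample["password"].str.contains("s", regex=False)

# Select rows by mask and columns by label in one step
subset = sample.loc[mask, ["range", "letter"]]
print(subset)
```

On this sample the mask keeps rows 1, 3 and 4, the same rows that lead the filtered output above.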


In [13]:
# Let's look at our df again
df.head()

| | range | letter | password |
|---|---|---|---|
| 0 | 2-8 | t: | pncmjxlvckfbtrjh |
| 1 | 8-9 | l: | lzllllldsl |
| 2 | 3-11 | c: | ccchcccccclxnkcmc |
| 3 | 3-10 | h: | xcvxkdqshh |
| 4 | 4-5 | s: | gssss |


In [14]:
# So now let's split the range into separate min and max columns
df[['min_range','max_range']] = df['range'].str.split('-', expand=True)
# and strip the trailing colon from the letter
df['letter'] = df['letter'].str.strip(':')

In [15]:
df = df.drop(columns='range')

In [16]:
df['min_range'] = df['min_range'].astype(int)
df['max_range'] = df['max_range'].astype(int)
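
The split, drop and cast steps above could also be folded into one pass with `str.extract`, where named regex groups become the new column names. This is only a sketch of an alternative, using a tiny stand-in frame called `sample` rather than the real `df`:

```python
import pandas as pd

# Stand-in frame with the same raw columns as the puzzle data
sample = pd.DataFrame({
    "range": ["2-8", "8-9", "3-11"],
    "letter": ["t:", "l:", "c:"],
})

# Named groups in the pattern become the column names of the result
bounds = sample["range"].str.extract(r"(?P<min_range>\d+)-(?P<max_range>\d+)")
bounds = bounds.astype(int)  # extracted values are strings, so cast to int

# Attach the new columns, drop the raw range, tidy the letter
sample = sample.join(bounds).drop(columns="range")
sample["letter"] = sample["letter"].str.rstrip(":")
print(sample)
```

Either route ends with integer `min_range` and `max_range` columns; the regex version simply makes the expected format explicit.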

In [17]:
df.head()

| | letter | password | min_range | max_range |
|---|---|---|---|---|
| 0 | t | pncmjxlvckfbtrjh | 2 | 8 |
| 1 | l | lzllllldsl | 8 | 9 |
| 2 | c | ccchcccccclxnkcmc | 3 | 11 |
| 3 | h | xcvxkdqshh | 3 | 10 |
| 4 | s | gssss | 4 | 5 |


In [18]:
df.describe()

| | min_range | max_range |
|---|---|---|
| count | 1000.0 | 1000.0 |
| mean | 5.932 | 9.376 |
| std | 4.19726 | 4.601282 |
| min | 1.0 | 2.0 |
| 25% | 3.0 | 6.0 |
| 50% | 5.0 | 9.0 |
| 75% | 9.0 | 13.0 |
| max | 19.0 | 20.0 |


In [19]:
# We'll create a new column which contains the letter count for each password
df['letter_count'] = df.apply(lambda x: x['password'].count(x['letter']), axis=1)

In [20]:
df.head()

| | letter | password | min_range | max_range | letter_count |
|---|---|---|---|---|---|
| 0 | t | pncmjxlvckfbtrjh | 2 | 8 | 1 |
| 1 | l | lzllllldsl | 8 | 9 | 7 |
| 2 | c | ccchcccccclxnkcmc | 3 | 11 | 11 |
| 3 | h | xcvxkdqshh | 3 | 10 | 2 |
| 4 | s | gssss | 4 | 5 | 4 |


In [21]:
# Now count the passwords whose letter count falls within the policy range
df.apply(lambda x: x['min_range'] <= x['letter_count'] <= x['max_range'], axis=1).sum()

666
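
The range check itself does not need a row-wise `apply`; comparing whole columns gives the same booleans and lets us keep them as a column. A sketch on the first five rows, where the frame `sample` and the column name `valid` are illustrative rather than part of the notebook:

```python
import pandas as pd

# First five rows of the cleaned data shown above
sample = pd.DataFrame({
    "letter": ["t", "l", "c", "h", "s"],
    "password": ["pncmjxlvckfbtrjh", "lzllllldsl", "ccchcccccclxnkcmc",
                 "xcvxkdqshh", "gssss"],
    "min_range": [2, 8, 3, 3, 4],
    "max_range": [8, 9, 11, 10, 5],
})

# Count each row's letter in its own password
sample["letter_count"] = [
    pw.count(ch) for pw, ch in zip(sample["password"], sample["letter"])
]

# Column-wise comparison: True where min_range <= letter_count <= max_range
sample["valid"] = (
    (sample["min_range"] <= sample["letter_count"])
    & (sample["letter_count"] <= sample["max_range"])
)
print(sample["valid"].sum())  # 2
```

Of these five rows only the `c` and `s` passwords fall inside their ranges.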

### Part 2 

While it appears you validated the passwords correctly, they don't seem to be what the Official Toboggan Corporate Authentication System is expecting.

The shopkeeper suddenly realizes that he just accidentally explained the password policy rules from his old job at the sled rental place down the street! The Official Toboggan Corporate Policy actually works a little differently.

Each policy actually describes two positions in the password, where 1 means the first character, 2 means the second character, and so on. (Be careful; Toboggan Corporate Policies have no concept of "index zero"!) Exactly one of these positions must contain the given letter. Other occurrences of the letter are irrelevant for the purposes of policy enforcement.

Given the same example list from above:

- 1-3 a: abcde is valid: position 1 contains a and position 3 does not.
- 1-3 b: cdefg is invalid: neither position 1 nor position 3 contains b.
- 2-9 c: ccccccccc is invalid: both position 2 and position 9 contain c.

How many passwords are valid according to the new interpretation of the policies?
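
The new rule is an exclusive-or on two 1-based positions, which is easy to check on the example lines. A minimal sketch in plain Python; `EXAMPLE`, `LINE_RE` and `is_valid_position` are illustrative names, and the pattern assumes the same line layout as in Part 1.

```python
import re

# The three example lines from the puzzle statement
EXAMPLE = ["1-3 a: abcde", "1-3 b: cdefg", "2-9 c: ccccccccc"]

# Assumed line layout: "<first>-<second> <letter>: <password>"
LINE_RE = re.compile(r"(\d+)-(\d+) (\w): (\w+)")

def is_valid_position(line):
    """Return True if exactly one of the two 1-based positions holds the letter."""
    first, second, letter, password = LINE_RE.fullmatch(line).groups()
    # Policies are 1-based, Python strings are 0-based, hence the -1
    first_match = password[int(first) - 1] == letter
    second_match = password[int(second) - 1] == letter
    # != on two booleans is an exclusive-or
    return first_match != second_match

results = [is_valid_position(line) for line in EXAMPLE]
print(results)  # [True, False, False]
```

Only the first example passes, matching the explanation in the list above.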

In [22]:
df.head()

| | letter | password | min_range | max_range | letter_count |
|---|---|---|---|---|---|
| 0 | t | pncmjxlvckfbtrjh | 2 | 8 | 1 |
| 1 | l | lzllllldsl | 8 | 9 | 7 |
| 2 | c | ccchcccccclxnkcmc | 3 | 11 | 11 |
| 3 | h | xcvxkdqshh | 3 | 10 | 2 |
| 4 | s | gssss | 4 | 5 | 4 |


In [23]:
# Let's rework our df for this new problem
df.rename(columns={'min_range':'first_pos', 'max_range':'second_pos'}, inplace=True)

In [24]:
df.drop(columns='letter_count', inplace=True)

In [25]:
df.head()

| | letter | password | first_pos | second_pos |
|---|---|---|---|---|
| 0 | t | pncmjxlvckfbtrjh | 2 | 8 |
| 1 | l | lzllllldsl | 8 | 9 |
| 2 | c | ccchcccccclxnkcmc | 3 | 11 |
| 3 | h | xcvxkdqshh | 3 | 10 |
| 4 | s | gssss | 4 | 5 |


In [26]:
# Create two new boolean columns indicating whether the letter sits at the first and second positions (1-based, hence the -1)
df['fp_match'] = df.apply(lambda x: x['password'][x['first_pos']-1]==x['letter'], axis=1)
df['sp_match'] = df.apply(lambda x: x['password'][x['second_pos']-1]==x['letter'], axis=1)
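
One thing to be aware of: indexing with `position - 1` raises an `IndexError` if a position is larger than the password length. That does not seem to happen in this input, but a defensive helper is cheap. This sketch uses a hypothetical function `letter_at` and assumes an out-of-range position should simply count as "no match", which the puzzle statement does not spell out.

```python
def letter_at(password, position, letter):
    """Check a 1-based position; out-of-range positions count as no match (assumption)."""
    index = position - 1  # convert from 1-based policy position to 0-based index
    return 0 <= index < len(password) and password[index] == letter

# Row 3 from the data above: letter h, positions 3 and 10
print(letter_at("xcvxkdqshh", 3, "h"))   # False
print(letter_at("xcvxkdqshh", 10, "h"))  # True
# A position past the end returns False instead of raising
print(letter_at("gssss", 9, "s"))        # False
```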

In [27]:
df.head()

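| | letter | password | first_pos | second_pos | fp_match | sp_match |
|---|---|---|---|---|---|---|
| 0 | t | pncmjxlvckfbtrjh | 2 | 8 | False | False |
| 1 | l | lzllllldsl | 8 | 9 | False | False |
| 2 | c | ccchcccccclxnkcmc | 3 | 11 | True | False |
| 3 | h | xcvxkdqshh | 3 | 10 | False | True |
| 4 | s | gssss | 4 | 5 | True | True |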


In [28]:
# A password is valid when exactly one of the two positions matches
(df['fp_match'] != df['sp_match']).sum()

670