# Monday, November 4th, 2024

## Dictionaries

We've been working extensively with lists, which we can think of as mappings from an index to an object.

In [1]:
my_list = ['hello', 5, 'a', 'bye', 72]

In [2]:
my_list[0]

'hello'

With this list, the index `0` maps to the string `'hello'`.

In [3]:
my_list[4]

72

Dictionaries are similar, except that we're not restricted to integers as our indices. Instead, dictionaries consist of `<key>:<value>` pairs, where each key plays the role of the "index" and maps to a value.

We can define dictionaries as follows:

In [4]:
my_dict = {0: 'hello',
           1: 5,
           2: 'a',
           3: 'bye',
           4: 72}

In [6]:
my_dict[2]

'a'

We can use many different things for keys, not just integers.

In [7]:
my_dict = {'name': 'Jonathan Lottes',
           'age': 35,
           'city': 'Buffalo'}

In [8]:
my_dict['name']

'Jonathan Lottes'

In [9]:
my_dict['age']

35

In [10]:
my_dict['city']

'Buffalo'

We can add elements to a dictionary using the syntax `my_dict[<key>] = <value>`:

In [11]:
my_dict['birth date'] = 'August 18, 1989'

In [13]:
my_dict

{'name': 'Jonathan Lottes',
 'age': 35,
 'city': 'Buffalo',
 'birth date': 'August 18, 1989'}

We can check if a key is in a dictionary using `in`:

In [14]:
'name' in my_dict

True

In [15]:
'last name' in my_dict

False

Note: not everything can work as a dictionary key:

In [16]:
my_dict[[1,2,3]] = 'hello'

TypeError: unhashable type: 'list'

In [17]:
my_dict[(1,2,3)] = 'hello'

In [18]:
my_dict

{'name': 'Jonathan Lottes',
 'age': 35,
 'city': 'Buffalo',
 'birth date': 'August 18, 1989',
 (1, 2, 3): 'hello'}
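
The `TypeError` above arises because dictionary keys must be hashable. A minimal sketch of the distinction, assuming only built-in types: strings, integers, and tuples of hashable items can be hashed, while lists cannot. The variable names here are illustrative.

```python
# Immutable built-ins such as str, int, and tuple are hashable,
# so they can serve as dictionary keys.
for candidate in ['name', 42, (1, 2, 3)]:
    hash(candidate)  # succeeds without raising
    print(type(candidate).__name__, 'is hashable')

# A list is mutable and therefore unhashable.
try:
    hash([1, 2, 3])
except TypeError as err:
    error_message = str(err)

print(error_message)

# Converting the list to a tuple gives a usable key.
example_dict = {}
example_dict[tuple([1, 2, 3])] = 'hello'
print(example_dict)
```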

## Code breakers

In [22]:
ord('a')

97

In [23]:
ord('A')

65

In [24]:
ord(';')

59

In [26]:
chr(97)

'a'

In [27]:
chr(59)

';'

In [28]:
chr(58)

':'
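
The cells above use `ord`, which returns the integer code of a single character, and `chr`, which goes the other way. A short sketch showing that the two undo each other (the variable names are illustrative):

```python
# ord maps a character to its integer code; chr maps the code back.
for c in ['a', 'A', ';', ':']:
    code = ord(c)
    assert chr(code) == c
    print(repr(c), '->', code)

# Round-tripping every character of a string recovers the string.
round_trip = ''.join(chr(ord(c)) for c in 'buffalo')
print(round_trip)
```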

### Encryption example

In [29]:
secret_key = 'buffalo'

secret_message = 'Top secret!'

**Exercise:** Write a function that will take in a string and return a corresponding list of ASCII codes.

**Exercise:** Use this function to apply the encryption algorithm to `secret_message` using `secret_key`.

In [30]:
len(secret_key)

7

In [31]:
len(secret_message)

11

In [32]:
def str_to_ascii(s):
    ascii_list = []
    for c in s:
        ascii_list.append(ord(c))
    return ascii_list

In [61]:
secret_key_ascii = str_to_ascii(secret_key)
secret_key_ascii

[98, 117, 102, 102, 97, 108, 111]

In [37]:
secret_message_ascii = str_to_ascii(secret_message)
secret_message_ascii

[84, 111, 112, 32, 115, 101, 99, 114, 101, 116, 33]

In [38]:
(secret_key_ascii[0] + secret_message_ascii[0]) % 128

54
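
These notes do not spell out the full encryption algorithm. The cell above, together with the encrypted message given in Wednesday's first exercise, is consistent with adding each message code to the matching key code (repeating the key as needed) and reducing modulo 128. A sketch under that assumption; `encrypt` is a name chosen here for illustration:

```python
def str_to_ascii(s):
    """Return the list of ASCII codes for the characters of s."""
    return [ord(c) for c in s]

def encrypt(key_codes, message_codes):
    """Assumed algorithm: add the key codes cyclically, modulo 128."""
    encrypted = []
    for i, code in enumerate(message_codes):
        # i % len(key_codes) wraps around when the message outlasts the key
        key_code = key_codes[i % len(key_codes)]
        encrypted.append((code + key_code) % 128)
    return encrypted

secret_key_ascii = str_to_ascii('buffalo')
secret_message_ascii = str_to_ascii('Top secret!')

encrypted_message = encrypt(secret_key_ascii, secret_message_ascii)
print(encrypted_message)
```

The key has 7 characters and the message has 11, so the index `i % len(key_codes)` reuses the start of the key for the last four characters.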

### Working with files in Python:

Note: I've downloaded a file called `5desk.txt` from the Code Breakers project page into my weekly notebook folder.

In [64]:
f = open('5desk.txt')

In [86]:
#help(f)

`f.read(n)` returns the next `n` characters from the file, starting at the current position (initially the beginning of the file).

In [67]:
print(f.read(10))

A
a
Aachen


Running again prints the next `10` characters:

In [68]:
print(f.read(10))


Aalborg
a


In [69]:
print(f.read(50))

ardvark
Aarhus
Aaron
AB
Ab
abaci
aback
abacus
Abad


`f.seek()` can be used to set the current position in the file. For example, `f.seek(0)` returns to the beginning:

In [72]:
f.seek(0)

0

In [73]:
print(f.read(20))

A
a
Aachen
Aalborg
a


In [74]:
f.seek(0)

0

For our purposes, we generally will just use `f.read()` with no argument to read in the entire file.

In [75]:
s = f.read()

In [77]:
print(s[:20])

A
a
Aachen
Aalborg
a


It is generally advisable to close a file once we're done working with it:

In [78]:
f.close()

A cleaner way to load a file into a string is to use a `with` block:

In [79]:
with open('5desk.txt') as f:
    s = f.read()
    
# Once we leave the `with` block, the file will automatically close
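
The file operations in this section can be tried without `5desk.txt`. The sketch below writes a small stand-in file (the name `sample_words.txt` and its contents are made up for illustration) and then repeats the `read(n)`, `seek(0)`, and `read()` steps inside a `with` block.

```python
import os
import tempfile

# Write a small stand-in file so the example does not depend on 5desk.txt.
path = os.path.join(tempfile.mkdtemp(), 'sample_words.txt')
with open(path, 'w') as f:
    f.write('A\na\nAachen\nAalborg\naardvark\n')

with open(path) as f:
    first = f.read(4)        # next 4 characters from the current position
    f.seek(0)                # jump back to the beginning
    everything = f.read()    # no argument: read the rest of the file

print(repr(first))
print(everything.split())
print(f.closed)              # the with block has already closed the file
```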

In [81]:
print(s[:20])

A
a
Aachen
Aalborg
a


### Working with strings:

As previously mentioned, the `.split()` method applied to a string returns a list of all substrings that are separated by whitespace (spaces, tabs, and newlines).

In [82]:
words = s.split()

In [83]:
words[:10]

['A',
 'a',
 'Aachen',
 'Aalborg',
 'aardvark',
 'Aarhus',
 'Aaron',
 'AB',
 'Ab',
 'abaci']

# Wednesday, November 6th, 2024

**Exercise:** Write a function that will apply the decryption algorithm using the secret key `buffalo` and the encrypted message `[54, 100, 86, 6, 84, 81, 82, 84, 90, 90, 7]`.

**Exercise:** Write a function that will take a sequence of ASCII codes and return a corresponding string of characters.

**Exercise:** 
- Download your encrypted file from the project page
- Open the file in Python and read the contents to a string
- Convert the string to a sequence of integers
- Play around with "decrypting" this sequence using some random passwords
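
One possible approach to the first two exercises, assuming the encryption step adds the key codes cyclically modulo 128 (as the earlier cell computing `(secret_key_ascii[0] + secret_message_ascii[0]) % 128` suggests). Decryption then subtracts the key codes modulo 128. The names `ascii_to_str` and `decrypt` are choices made here, not names fixed by the project page.

```python
def str_to_ascii(s):
    """Return the list of ASCII codes for the characters of s."""
    return [ord(c) for c in s]

def ascii_to_str(codes):
    """Return the string whose characters have the given ASCII codes."""
    return ''.join(chr(code) for code in codes)

def decrypt(key_codes, encrypted_message):
    """Assumed algorithm: subtract the key codes cyclically, modulo 128."""
    decrypted = []
    for i, code in enumerate(encrypted_message):
        key_code = key_codes[i % len(key_codes)]
        # Python's % always returns a value in 0..127 here, even when
        # code - key_code is negative.
        decrypted.append((code - key_code) % 128)
    return decrypted

encrypted_message = [54, 100, 86, 6, 84, 81, 82, 84, 90, 90, 7]
key = str_to_ascii('buffalo')

decrypted_text = ascii_to_str(decrypt(key, encrypted_message))
print(decrypted_text)
```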

In [None]:
for i, char in enumerate(message):
    ...

In [1]:
my_str = 'hello'

In [41]:
with open('5desk.txt') as f:
    s = f.read()

In [42]:
words = s.split()

In [15]:
print(words[:10])

['A', 'a', 'Aachen', 'Aalborg', 'aardvark', 'Aarhus', 'Aaron', 'AB', 'Ab', 'abaci']


In [None]:
password = words[0]

key = str_to_ascii(password)

decrypt(key, encrypted_message)

In [16]:
with open('akuhn.txt') as f:
    s = f.read()

In [18]:
print(s[:100])

119 48 81 77 9 82 70 102 1 55 90 94 96 81 13 85 75 77 9 81 84 91 84 87 90 94 81 89 92 83 3 88 94 98 
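
The file contents are a string of integers separated by spaces, so they need to be converted to a list of `int` values before decrypting. A minimal sketch using only the first few values printed above (the name `encrypted_message` is illustrative):

```python
# First few values from the file contents shown above.
s = '119 48 81 77 9 82 70 102 1 55 '

# split() breaks on whitespace and ignores the trailing space;
# int() converts each piece from a string to an integer.
encrypted_message = [int(token) for token in s.split()]
print(encrypted_message)
```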


### Some ideas for identifying the correct keyword

We will need to try every word in our list of words as a potential keyword. For each keyword, we will decrypt the message and somehow measure how correct we believe the decrypted message to be.

We would expect the correctly decrypted message to contain English words. We could take our decrypted message, identify "words" (i.e. split on whitespace), and check how many of those "words" are actual words.

In [25]:
# Check if something is a word:

'aardvark' in words

True

In [26]:
'Aardvark' in words

False

How to deal with capitalization?

We can use the `.lower()` method to return a string with all letters lowercase.

In [28]:
words = [word.lower() for word in words]

In [31]:
word = 'Aardvark'

print(word)
print(word.lower())

print(word in words)
print(word.lower() in words)

Aardvark
aardvark
False
True


What about punctuation?

In [32]:
word = 'ran.'

print(word in words)

False


In [33]:
print('ran' in words)

True


Can we deal with punctuation in a reasonable way?

One way is to use the `.replace()` method.

In [34]:
my_string = 'Hello, my name is Jon. Today is Wednesday.'

In [35]:
help(my_string.replace)

Help on built-in function replace:

replace(old, new, count=-1, /) method of builtins.str instance
    Return a copy with all occurrences of substring old replaced by new.
    
      count
        Maximum number of occurrences to replace.
        -1 (the default value) means replace all occurrences.
    
    If the optional argument count is given, only the first count occurrences are
    replaced.



In [36]:
print(my_string.replace('Jon', 'Luke'))

Hello, my name is Luke. Today is Wednesday.


In [37]:
print(my_string.replace('Wednesday', 'Thursday'))

Hello, my name is Jon. Today is Thursday.


In [39]:
my_string2 = my_string.replace('.', '')
my_string3 = my_string2.replace(',', '')

# Also remove !, ?, :, ;
# Can we do this in a more automated way with a for loop?

print(my_string3)

Hello my name is Jon Today is Wednesday
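
The question in the comment above can be answered with a loop over the punctuation marks. A sketch, assuming the six marks mentioned so far are the only ones we care about:

```python
my_string = 'Hello, my name is Jon. Today is Wednesday.'

# Remove each punctuation mark in turn by replacing it with the empty string.
cleaned = my_string
for mark in '.,!?:;':
    cleaned = cleaned.replace(mark, '')

print(cleaned)
```

Strings are immutable, so each call to `.replace()` returns a new string, which is why the result is reassigned to `cleaned` on every pass.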


Once we've counted how many actual words appear in our decrypted message, we could find the keyword that produced the most actual words.

In [None]:
def get_clean_message(decrypted_message):
    # Remove punctuation
    # Get rid of capital letters
    ...

In [None]:
def get_word_count(decrypted_message):
    clean_message = get_clean_message(decrypted_message)
    decrypted_words = clean_message.split()
    ...
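
One possible way to fill in the two stubs above. This is a sketch rather than the intended solution: the extra `known_words` parameter and the tiny stand-in word list are assumptions made so that the example runs without `5desk.txt`.

```python
def get_clean_message(decrypted_message, punctuation='.,!?:;'):
    """Lowercase the message and strip the listed punctuation marks."""
    clean_message = decrypted_message.lower()
    for mark in punctuation:
        clean_message = clean_message.replace(mark, '')
    return clean_message

def get_word_count(decrypted_message, known_words):
    """Count how many whitespace-separated tokens are known words."""
    clean_message = get_clean_message(decrypted_message)
    decrypted_words = clean_message.split()
    return sum(1 for word in decrypted_words if word in known_words)

# Tiny stand-in for the lowercased word list. Storing it as a set
# makes each `in` check fast compared with searching a long list.
known_words = {'hello', 'my', 'name', 'is', 'today', 'wednesday'}

count = get_word_count('Hello, my name is Jon. Today is Wednesday.', known_words)
print(count)
```

Here the count is 7 out of 8 tokens, since `'jon'` is not in the stand-in word list.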

How do we find the keyword that generated the most actual words in its decrypted message?

In [44]:
keywords = words[:5]
print(keywords)

# Suppose we've counted the number of actual words that appear 
# in the decrypted message for each of the keywords

actual_word_counts = [5, 7, 4, 122, 10]

['A', 'a', 'Aachen', 'Aalborg', 'aardvark']


In [45]:
max(actual_word_counts)

122

The `max` function returns the largest element in a list.

The `np.argmax()` function from NumPy will instead return the index of the largest element in a list.

In [47]:
import numpy as np

keyword_index = np.argmax(actual_word_counts)

keywords[keyword_index]

'Aalborg'
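
The same lookup can be done without NumPy. The sketch below shows two plain-Python alternatives; neither is required by the project, and the variable names are illustrative.

```python
keywords = ['A', 'a', 'Aachen', 'Aalborg', 'aardvark']
actual_word_counts = [5, 7, 4, 122, 10]

# Option 1: find the largest count, then ask the list where it occurs.
keyword_index = actual_word_counts.index(max(actual_word_counts))
best_keyword = keywords[keyword_index]
print(keyword_index, best_keyword)

# Option 2: pair each keyword with its count and compare pairs by count.
best_pair = max(zip(keywords, actual_word_counts), key=lambda pair: pair[1])
print(best_pair)
```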

Suppose we instead store the counts in a dictionary that maps each keyword to its count:

In [51]:
actual_word_counts = {'A': 5,
                      'a': 7,
                      'Aachen': 4,
                      'Aalborg': 122,
                      'aardvark': 10,
                      'aaaaaaaaa': 5}

In [53]:
max(actual_word_counts)

'aardvark'

In [52]:
max(actual_word_counts, key = len)

'aaaaaaaaa'

In [54]:
def get_count(key):
    return actual_word_counts[key]

In [55]:
max(actual_word_counts, key=get_count)

'Aalborg'
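
Calling `max` on a dictionary iterates over its keys, so with no `key` argument it compares the keyword strings themselves rather than their counts. The `key` argument changes what is compared. A short sketch, which also uses the built-in `dict.get` method in place of a hand-written `get_count`:

```python
actual_word_counts = {'A': 5,
                      'a': 7,
                      'Aachen': 4,
                      'Aalborg': 122,
                      'aardvark': 10,
                      'aaaaaaaaa': 5}

# No key function: the keys are compared as strings.
largest_key = max(actual_word_counts)

# key=actual_word_counts.get: each key is ranked by its stored count.
best_keyword = max(actual_word_counts, key=actual_word_counts.get)

print(largest_key)
print(best_keyword, actual_word_counts[best_keyword])
```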

In [56]:
for item in actual_word_counts.items():
    print(item)

('A', 5)
('a', 7)
('Aachen', 4)
('Aalborg', 122)
('aardvark', 10)
('aaaaaaaaa', 5)


In [58]:
def f(item):
    key, value = item
    return value

In [59]:
max(actual_word_counts.items(), key = f)

('Aalborg', 122)

In [61]:
max(actual_word_counts.items(), key=lambda item: item[1])

('Aalborg', 122)
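
Putting the pieces together, the keyword search described earlier could look like the sketch below. It assumes the add-the-key-modulo-128 algorithm suggested by the encryption example, and it uses a five-word stand-in for the word list and a made-up test message, so every name and value here is illustrative rather than the project's actual data.

```python
def str_to_ascii(s):
    return [ord(c) for c in s]

def ascii_to_str(codes):
    return ''.join(chr(code) for code in codes)

def shift(key_codes, codes, sign):
    """sign=+1 encrypts and sign=-1 decrypts (assumed algorithm)."""
    return [(code + sign * key_codes[i % len(key_codes)]) % 128
            for i, code in enumerate(codes)]

def get_word_count(text, known_words):
    """Count tokens that are known words after removing punctuation."""
    for mark in '.,!?:;':
        text = text.replace(mark, '')
    return sum(1 for word in text.lower().split() if word in known_words)

# Stand-ins for the real word list and the real encrypted file.
words = ['aardvark', 'buffalo', 'secret', 'top', 'abacus']
known_words = set(words)
encrypted_message = shift(str_to_ascii('buffalo'), str_to_ascii('Top secret!'), +1)

# Try every word as the keyword and score the resulting text.
scores = {}
for keyword in words:
    decrypted = ascii_to_str(shift(str_to_ascii(keyword), encrypted_message, -1))
    scores[keyword] = get_word_count(decrypted, known_words)

best_keyword = max(scores, key=scores.get)
print(best_keyword, scores[best_keyword])
```

Only the correct keyword produces text that splits into recognizable words, so it receives the highest score.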