## Set 1

This is the story of my journey through [the cryptopals crypto challenges](https://cryptopals.com/). I ran into Cryptopals online while trying to solve some other ones-and-zeroes puzzle I was totally unequipped for, and I took up the challenges because they seemed... not easy, but digestible. And they are.

But they also led me into some areas of Python where I rarely tread, so I'm laying down a path and taking notes. The notes are really for me, but if someone else finds them useful then spread the love.

###  Challenge 1: Convert hex to base64

> The string:
>
> `49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d`
>
> Should produce
>
> `SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t`
>
> So go ahead and make that happen.

---

#### A big discovery!

`49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d` is _not_ a string. "String," of course, isn't a type in Python: the data types are called **str** and **bytes**, but that behemoth isn't one of those either.

It's a _hexadecimal number_. In Python, you can pass its digits as a **str** to the built-in class method `bytes.fromhex` to turn it into a **bytes** object, which is more human-readable. But for the most part, we'll be looking at it and its kin in their less-readable form, a sequence of zeroes and ones.

[Link to more background.](sequence-types.ipynb)
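
As a small sketch of the two readings, here are the first eight bytes of the challenge input treated once as an integer and once as a byte sequence. The variable names are only for illustration.

```python
hex_digits = '49276d206b696c6c'  # first 8 bytes of the challenge input

# Read the digits as one big base-16 integer...
as_int = int(hex_digits, 16)

# ...or as a sequence of bytes, two hex digits per byte.
as_bytes = bytes.fromhex(hex_digits)

print(as_bytes)            # b"I'm kill"
print(list(as_bytes)[:3])  # [73, 39, 109]

# Both readings describe the same bits.
print(as_int == int.from_bytes(as_bytes, 'big'))  # True
```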

#### Solution

In [21]:
import base64

def to_base64(hexstring):
    """
    :param hexstring: a hexadecimal-encoded string
    :return: the base64 encoding of `hexstring` as a `bytes`
    """
    return base64.b64encode(bytes.fromhex(hexstring))
    
to_base64('49276d206b696c6c696e6720796f757220627261696e206c'
          '696b65206120706f69736f6e6f7573206d757368726f6f6d')

b'SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t'

### Challenge 2: Fixed XOR

> Write a function that takes two equal-length buffers and produces their XOR combination.
> 
> If your function works properly, then when you feed it the string:
> 
> `1c0111001f010100061a024b53535009181c`
> 
> ... after hex decoding, and when XOR'd against:
> 
> `686974207468652062756c6c277320657965`
>
> ... should produce:
> 
> `746865206b696420646f6e277420706c6179`

---

Looking ahead, I think the Cryptopals are trying to get us used to the idea of bitwise calculations, like literally iterating one bit at a time:

        m = 11100000
        n = 11010000
    ----------------
    m ^ n = 00110...
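
The same calculation can be checked in Python, where `^` is the bitwise XOR operator on ints. This is only a sketch of the idea, using the two example values from the diagram above.

```python
m = 0b11100000
n = 0b11010000

# format(..., '08b') pads the result to eight binary digits
xor_bits = format(m ^ n, '08b')
print(xor_bits)  # 00110000

# Iterating over a bytes object yields ints, so the same operator works byte by byte.
xor_bytes = bytes(x ^ y for x, y in zip(b'\xe0', b'\xd0'))
print(xor_bytes)  # b'0'  (0x30 is the ASCII code for the digit zero)
```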

#### Solution

Looking ahead again, it seems useful for this method to take two **bytes** as parameters instead of two **str** objects.

In [22]:
def fixed_xor(m, n):
    """
    Given `m` and `n` as bytes, calculate the bitwise exclusive-or of `m` and `n`.
    
    :param m: a bytes
    :param n: another bytes
    :return: a bytes
    """
    return bytes(x ^ y for x, y in zip(m, n))
                 
fixed_xor(bytes.fromhex('1c0111001f010100061a024b53535009181c'),
          bytes.fromhex('686974207468652062756c6c277320657965')).hex()

'746865206b696420646f6e277420706c6179'

### Challenge 3: Single-byte XOR cipher

> The hex encoded string:

> `1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736`

> ... has been XOR'd against a single character. Find the key, decrypt the message.

Single-byte XOR means we have a one-character (i.e. one byte) key $k$, and we replace each byte $b$ of our message with $b \oplus k$.
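
One property worth seeing in action: since $(b \oplus k) \oplus k = b$, the same operation both encrypts and decrypts. A minimal sketch, where `xor_with_byte` is a hypothetical helper and the message and key are made up for the example:

```python
def xor_with_byte(data, key):
    """Hypothetical helper: XOR every byte of `data` with the int `key`."""
    return bytes(b ^ key for b in data)

message = b'hello'
key = 0x58  # an arbitrary one-byte key chosen for the example

ciphertext = xor_with_byte(message, key)
print(ciphertext.hex())                # 303d343437

# Applying the same key a second time restores the original bytes.
print(xor_with_byte(ciphertext, key))  # b'hello'
```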

#### Solution — part 1

It turns out that the following function is an internal helper that shouldn't be exposed outside of Challenge 3, so in the long run it should take a **bytes** object rather than a hex string.

In [23]:
def single_byte_xor(encrypted, key):
    """
    Decrypt a message encoded with a single-byte key.
    
    :param encrypted: the message, a bytes
    :param key: the key, 0 <= key < 2 ** 8
    :return: the plaintext bytes
    """
    return bytes(c ^ key for c in encrypted)

With the `single_byte_xor` function, all we need to do is try our encrypted message against each of the 255 nonzero single-byte keys (key 0 would leave the message unchanged) and see which one produces genuine English text.

The problem statement includes:

> You can do this by hand. But don't: write code to do it for you.

> How? Devise some method for "scoring" a piece of English plaintext. Character frequency is a good metric. Evaluate each output and choose the one with the best score.

My hypothesis was that the return values of `single_byte_xor` with some of the keys would include garbage bytes that, in ASCII, map to junk like "vertical tab" or "delete". Even though it's not stated in the problem, I'm assuming that all characters in the plaintext are _printable_.

#### Another discovery!

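A _printable_ character, here, is a character `c` such that `c in string.printable`. TIL that `'\n'.isprintable() != ('\n' in string.printable)`: `str.isprintable` rejects a newline, but `string.printable` includes it, and lots of plaintext has newlines in it. `string.printable` also includes the rest of ASCII whitespace (tab, carriage return, vertical tab, form feed), which is why a few of the candidates further down contain `\x0b` and `\x0c`.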
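
A quick way to see the two notions of "printable" disagree, using only the standard library:

```python
import string

newline_isprintable = '\n'.isprintable()
newline_in_table = '\n' in string.printable

print(newline_isprintable)      # False
print(newline_in_table)         # True

# string.printable ends with string.whitespace, which is where '\n' comes from.
print(repr(string.whitespace))  # ' \t\n\r\x0b\x0c'
```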

In [24]:
import string

def is_printable(text):
    return all(ch in string.printable for ch in text)


class Decryption:
    def __init__(self, plaintext, key):
        self.plaintext = [chr(c) for c in plaintext]
        self.key = key

    def is_printable(self):
        return all(c in string.printable for c in self.plaintext)

    def get_plaintext(self):
        return ''.join(self.plaintext)

    def __iter__(self):
        return iter(self.plaintext)

    def __repr__(self):
        return "Decryption(plaintext={!r}, key={})".format(''.join(self.plaintext), self.key)

    @staticmethod
    def create(ciphertext, key):
        return Decryption(single_byte_xor(ciphertext, key), key)

In [25]:
def try_decrypt_single_byte_xor(encrypted, keys):
    """
    Apply single-byte XOR decryption to `encrypted` for each of the given keys. Return only the printable Decryptions.

    :param encrypted: the message, as bytes
    :param keys: an iterable of keys to try
    :return: an iterable of printable Decryptions
    """
    decryptions = [Decryption.create(encrypted, key) for key in keys]

    # filter out plaintexts with non-printable characters
    return list(filter(lambda d: d.is_printable(), decryptions))

In [26]:
decrypted = try_decrypt_single_byte_xor(
    bytes.fromhex('1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736'), range(1, 2 ** 8))
[''.join(d.plaintext) for d in decrypted]

['\\pptvqx?R\\8l?svtz?~?opjq{?py?}~|pq',
 'Q}}y{|u2_Q5a2~{yw2s2b}g|v2}t2psq}|',
 'Vzz~|{r5XV2f5y|~p5t5ez`{q5zs5wtvz{',
 'Txx|~yp7ZT0d7{~|r7v7gxbys7xq7uvtxy',
 'Kggcafo(EK/{(dacm(i(xg}fl(gn(jikgf',
 'Jffb`gn)DJ.z)e`bl)h)yf|gm)fo)khjfg',
 'Hdd`bel+FH,x+gb`n+j+{d~eo+dm+ijhde',
 'Nbbfdcj-@N*~-adfh-l-}bxci-bk-olnbc',
 'Maaeg`i.CM)}.bgek.o.~a{`j.ah.loma`',
 "Cooking MC's like a pound of bacon",
 'Bnnjhof!LB&r!mhjd!`!qntoe!ng!c`bno',
 'Ammikle"OA%q"nkig"c"rmwlf"md"`caml',
 '@llhjmd#N@$p#ojhf#b#slvmg#le#ab`lm',
 'Gkkomjc$IG#w$hmoa$e$tkqj`$kb$fegkj',
 'Fjjnlkb%HF"v%iln`%d%ujpka%jc%gdfjk',
 'Eiimoha&KE!u&jomc&g&vishb&i`&dgeih',
 "Dhhlni`'JD t'knlb'f'whric'ha'efdhi",
 'iEEACDM\ngi\rY\nFCAO\nK\nZE_DN\nEL\nHKIED',
 'hDD@BEL\x0bfh\x0cX\x0bGB@N\x0bJ\x0b[D^EO\x0bDM\x0bIJHDE',
 'oCCGEBK\x0cao\x0b_\x0c@EGI\x0cM\x0c\\CYBH\x0cCJ\x0cNMOCB',
 'nBBFDCJ\r`n\n^\rADFH\rL\r]BXCI\rBK\rOLNBC']

Only 21 keys produce something that looks like text, and it's easy to scan the list and discover one with English (mostly) words separated by spaces. But now that we know what the answer is, let's see if the computer can find it with the character frequency metric.

Did some Googling and found a table of letter frequencies in English:

In [27]:
letter_frequencies = {
    'a': 0.0834,
    'b': 0.0154,
    'c': 0.0273,
    'd': 0.0414,
    'e': 0.126,
    'f': 0.0203,
    'g': 0.0192,
    'h': 0.0611,
    'i': 0.0671,
    'j': 0.0023,
    'k': 0.0086,
    'l': 0.0424,
    'm': 0.0253,
    'n': 0.068,
    'o': 0.077,
    'p': 0.0166,
    'q': 0.0009,
    'r': 0.0568,
    's': 0.0611,
    't': 0.0937,
    'u': 0.0285,
    'v': 0.0106,
    'w': 0.0234,
    'x': 0.002,
    'y': 0.0204,
    'z': 0.0006
}

This might need some refinement because I'm not a statistician, but it seemed reasonable to me to judge how likely a plaintext is to be English by summing, for each letter, the absolute difference between the letter's expected frequency and the fraction of the plaintext's letters it actually accounts for. The lower the sum, the closer the match. There might be subtleties to this that I'm missing, but it worked for this challenge and the next one.
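
Written as a formula, with symbols introduced here only for illustration:

$$\mathrm{score}(t) = \sum_{\ell \in \{a, \dots, z\}} \left| \frac{c_\ell(t)}{N(t)} - f_\ell \right|$$

Here $c_\ell(t)$ is the number of times the letter $\ell$ occurs in the text $t$ (ignoring case), $N(t)$ is the total number of letters in $t$, and $f_\ell$ is the frequency of $\ell$ from the table above. A text with no letters at all has $N(t) = 0$, so it gets a score of infinity instead.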

In [32]:
from collections import Counter, namedtuple

Score = namedtuple('Score', ['score', 'decryption'])

nil_score = Score(float('inf'), decryption=None)

def score(text):
    """
    "Scores" the likelihood that a piece of text is in English.
    
    :param text: the text in question, as a str
    :return: a value indicating how close `text` is to expected English text, in terms of letter frequency. 
        The lower the value, the more likely the text is English.
    """
    # ignore non-letter characters
    counter = Counter(c.lower() for c in text if c.isalpha())
    letter_count = sum(counter.values())
    
    if not letter_count:
        return float('inf')
    
    total_variance = 0.0
    for letter, frequency in letter_frequencies.items():
        total_variance += abs(counter[letter] / letter_count - frequency)
        
    return total_variance

def decrypt_single_byte_xor(encrypted):
    """
    You are given a message, as bytes, that has been single-byte XOR'd against one character. Decrypt the message.

    Assume that all characters in the plaintext are in string.printable.

    :param encrypted: the message, as bytes
    :return: the best (lowest) Score when the message is decrypted against every nonzero single-byte key,
        or `nil_score` if no key yields printable text.
    """
    decryptions = try_decrypt_single_byte_xor(encrypted, range(1, 2 ** 8))
    if not decryptions:
        return nil_score

    # compare on the score alone: on a tie, whole-tuple comparison would try to order Decryption objects
    return min((Score(score(p), p) for p in decryptions), key=lambda s: s.score)

msg = bytes.fromhex('1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736')
res = decrypt_single_byte_xor(msg)
''.join(res.decryption.plaintext)

"Cooking MC's like a pound of bacon"

### Challenge 4: Detect single-character XOR

I don't see a way to do this without decrypting each line with every possible key until we find some plaintext that looks like English.

In [37]:
import requests

req = requests.get('https://cryptopals.com/static/challenge-data/4.txt')
res = min((decrypt_single_byte_xor(bytes.fromhex(line.strip())), idx) 
          for idx, line in enumerate(req.text.splitlines()))
res[1]

170

### Challenge 5: Implement repeating-key XOR

Repeating-key XOR is the generalized case of single-byte XOR.

Given a `key`, each byte of the plaintext at index i is XOR'd with `key[i % len(key)]` and the results converted to two-character hexadecimal numbers, which are concatenated together.
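
The indexing rule can be sketched on a short input. `xor_repeating` below is a hypothetical bytes-in, bytes-out variant written just for this illustration. `itertools.cycle` pairs byte `i` with `key[i % len(key)]` without any explicit index arithmetic.

```python
import itertools

def xor_repeating(data, key):
    """Hypothetical helper: XOR `data` against `key`, repeating the key as needed."""
    return bytes(b ^ k for b, k in zip(data, itertools.cycle(key)))

plaintext = b"Burning 'em"
ciphertext = xor_repeating(plaintext, b'ICE')
print(ciphertext.hex())  # 0b3637272a2b2e63622c2e

# XOR is its own inverse, so the same call with the same key decrypts.
print(xor_repeating(ciphertext, b'ICE') == plaintext)  # True
```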

In [22]:
import itertools
from functools import singledispatch

@singledispatch
def repeating_key_xor(text, key):
    """
    :param text: the text, either plaintext or ciphertext, as bytes
    :param key: the key, as a str
    :return: a hex-encoded string
    """
    encrypted = [p ^ k for p, k in zip(text, itertools.cycle([ord(c) for c in key]))]
    return bytes(encrypted).hex()


@repeating_key_xor.register(str)
def _(text, key):
    """
    :param text: the text, either plaintext or ciphertext, as a str
    :param key: the key, as a str
    :return: a hex-encoded string
    """
    encrypted = [p ^ k for p, k in zip((ord(c) for c in text), itertools.cycle([ord(c) for c in key]))]
    return bytes(encrypted).hex()

In [23]:
repeating_key_xor("Burning 'em, if you ain't quick and nimble\nI go crazy when I hear a cymbal", "ICE")

'0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a26226324272765272a282b2f20430a652e2c652a3124333a653e2b2027630c692b20283165286326302e27282f'

### Challenge 6: Break repeating-key XOR



In [24]:
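def bitwise_hamming_distance(m, n):
    """
    Count the bits that differ between two equal-length byte sequences.

    :param m: a bytes
    :param n: another bytes
    :return: the number of differing bits, an int
    """
    if len(m) != len(n):
        raise ValueError("m and n should be the same length in bytes")

    def count_ones(b):
        # b & (b - 1) clears the lowest set bit, so the loop runs once per 1 bit
        count = 0
        while b > 0:
            b = b & (b - 1)
            count += 1
        return count

    return sum(count_ones(x ^ y) for x, y in zip(m, n))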
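
The challenge statement gives a test vector for this step: the distance between `this is a test` and `wokka wokka!!!` should be 37. A self-contained sanity check, using a hypothetical one-line stand-in (`hamming_bits`) that counts 1 bits via `bin()` rather than the loop above:

```python
def hamming_bits(m, n):
    """Hypothetical stand-in: number of differing bits between equal-length bytes."""
    return sum(bin(x ^ y).count('1') for x, y in zip(m, n))

distance = hamming_bits(b'this is a test', b'wokka wokka!!!')
print(distance)  # 37
```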

In [25]:
def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)
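
To make the chunk-and-transpose step concrete, here is a sketch of what `grouper` produces on a tiny input and how transposing the chunks collects every byte that would have been XOR'd with the same key byte. The input and the key size of 3 are made up for the example.

```python
import itertools

def grouper(iterable, n, fillvalue=None):
    # n references to the SAME iterator, so zip_longest pulls n items per tuple
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

chunks = list(grouper(b'ABCDEFGH', 3, fillvalue=0))
print(chunks)   # [(65, 66, 67), (68, 69, 70), (71, 72, 0)]

# Transposing: column i holds the bytes at positions i, i + 3, i + 6, ...
columns = list(zip(*chunks))
print(columns)  # [(65, 68, 71), (66, 69, 72), (67, 70, 0)]
```

Note that the fill value ends up inside the last column, so padded columns carry one byte that was never part of the ciphertext.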

In [26]:
KeySize = namedtuple('KeySize', ['normalized_hamming_distance', 'key_size'])

import heapq

def break_repeating_key_xor(encrypted):
    def hamming_distance(keysize):
        return bitwise_hamming_distance(encrypted[:keysize], encrypted[keysize:2 * keysize])

    heap = [KeySize(hamming_distance(n) / n, n) for n in range(2, 40)]
    heapq.heapify(heap)

    def fail_fast_decrypt(columns):
        res = []
        for column in columns:
            dec = decrypt_single_byte_xor(column)
            if dec is nil_score:
                return None
            res.append(dec)
        return res

    while True:
        if not heap:
            raise ValueError("No key length worked!")

        chunks = list(grouper(encrypted, heapq.heappop(heap).key_size, fillvalue=0))
        columns = list(itertools.zip_longest(*chunks))

        individual_keys = fail_fast_decrypt(columns)
        if individual_keys:
            yield ''.join([chr(score.decryption.key) for score in individual_keys])
            
def decrypt_repeating_key_xor(encrypted):
    """
    Set 1, challenge 6.

    :param encrypted: the ciphertext, as bytes
    :return: the plaintext for the first candidate key, as bytes
    """
    keys = break_repeating_key_xor(encrypted)
    return bytes.fromhex(repeating_key_xor(encrypted, next(keys)))

In [21]:
with requests.get('https://cryptopals.com/static/challenge-data/6.txt') as request:
    unencoded = base64.b64decode(request.content)
    print(decrypt_repeating_key_xor(unencoded))

TypeError: fromhex() argument must be str, not tuple

In [27]:
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend

def decrypt_aes_with_ecb(ciphertext, key):
    cipher = Cipher(algorithms.AES(key), modes.ECB(), default_backend())
    decryptor = cipher.decryptor()
    return decryptor.update(ciphertext) + decryptor.finalize()

def encrypt_aes_with_ecb(plaintext, key):
    cipher = Cipher(algorithms.AES(key), modes.ECB(), default_backend())
    encryptor = cipher.encryptor()
    return encryptor.update(plaintext) + encryptor.finalize()

In [33]:
abc = requests.get('https://cryptopals.com/static/challenge-data/7.txt').text

In [40]:
cipher = Cipher(algorithms.AES(b'YELLOW SUBMARINE'), modes.ECB(), default_backend())
cipher

<cryptography.hazmat.primitives.ciphers.base.Cipher at 0x104147580>

In [41]:
decryptor = cipher.decryptor()
decryptor

<cryptography.hazmat.primitives.ciphers.base._CipherContext at 0x104147bb0>

In [42]:
decryptor.update(abc)

TypeError: from_buffer() cannot return the address of a unicode object
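
The traceback suggests that `abc` is still a **str** (it came from `requests`' `.text`), while `decryptor.update` wants **bytes**. The challenge file is base64-encoded, so one possible fix is to decode it first. A minimal sketch using only the standard library, where `to_ciphertext_bytes` is a hypothetical helper name:

```python
import base64

def to_ciphertext_bytes(b64_text):
    """Hypothetical helper: turn base64 text (a str, newlines allowed) into raw bytes."""
    # With the default validate=False, characters outside the base64 alphabet
    # (such as the newlines in the challenge file) are discarded before decoding.
    return base64.b64decode(b64_text)

sample = base64.b64encode(b'YELLOW SUBMARINE').decode('ascii')  # a str, like .text
raw = to_ciphertext_bytes(sample)
print(type(sample).__name__, '->', type(raw).__name__)  # str -> bytes
print(len(raw))  # 16
```

With the cells above, `decryptor.update(to_ciphertext_bytes(abc)) + decryptor.finalize()` should then be handed the **bytes** it expects.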