
## Step 1 - Convert hex to base64

> The string:
> ```
> 49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d
> ```
> Should produce:
> ```
> SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
> ```
> So go ahead and make that happen. You'll need to use this code for the rest of the exercises.
>
> ### Comment
>
> Always operate on raw bytes, never on encoded strings. Only use hex and base64 for pretty-printing.
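Following the comment above, one way to sanity-check a hand-written converter is to let Python's standard `base64` module make the same hex → bytes → base64 trip. This is a minimal sketch. `hex_input`, `raw`, and `encoded` are illustrative names and not part of the exercise.

```python
import base64

hex_input = "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d"

# Decode the hex once. All further work happens on these raw bytes.
raw = bytes.fromhex(hex_input)

# Encode to base64 only for printing. b64encode returns bytes, so decode to str.
encoded = base64.b64encode(raw).decode("ascii")
print(encoded)

# Round trip: decoding the base64 text gives back the same raw bytes.
assert base64.b64decode(encoded) == raw
```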

In [None]:
# your code here ... (put some comments to explain what you did)

# converts hex string -> base64 string
# Decode the hex once, then work on raw bytes: each 3-byte (24-bit) group
# becomes four 6-bit indices into the base64 alphabet.
def convert(hex_string):
    raw_bytes = bytes.fromhex(hex_string)
    encoded = ""
    base64_alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

    i = 0
    while i < len(raw_bytes):
        block = raw_bytes[i:i + 3]
        i += 3

        if len(block) == 3: # 24 bits -> 4 characters, no padding
            bits = (block[0] << 16) | (block[1] << 8) | block[2]
            indices = [
                (bits >> 18) & 0x3F,
                (bits >> 12) & 0x3F,
                (bits >> 6) & 0x3F,
                bits & 0x3F,
            ]
            for index in indices:
                encoded += base64_alphabet[index]

        elif len(block) == 2: # 16 bits, zero-filled -> 3 characters + one '='
            bits = (block[0] << 16) | (block[1] << 8)
            indices = [
                (bits >> 18) & 0x3F,
                (bits >> 12) & 0x3F,
                (bits >> 6) & 0x3F,
            ]
            for index in indices:
                encoded += base64_alphabet[index]
            encoded += "="

        elif len(block) == 1: # 8 bits, zero-filled -> 2 characters + two '='
            bits = block[0] << 16
            indices = [
                (bits >> 18) & 0x3F,
                (bits >> 12) & 0x3F,
            ]
            for index in indices:
                encoded += base64_alphabet[index]
            encoded += "=="
    return encoded

res = convert("49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d")
print(res)



SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
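Step 6 hands over its ciphertext as base64, so the inverse of `convert` is needed too. Below is a minimal hand-written decoder sketch that mirrors the 4-characters-to-3-bytes grouping, assuming standard padded base64 with the same alphabet. `base64_decode` is a hypothetical helper name; the standard library's `base64.b64decode` does the same job.

```python
BASE64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def base64_decode(text: str) -> bytes:
    """Decode padded base64 text into raw bytes (hypothetical helper)."""
    # Ignore whitespace such as the line breaks in a base64 file.
    text = "".join(text.split())
    if len(text) % 4 != 0:
        raise ValueError("base64 input length must be a multiple of 4")

    out = bytearray()
    for i in range(0, len(text), 4):
        quad = text[i:i + 4]
        padding = quad.count("=")
        bits = 0
        # Each real character contributes 6 bits; '=' contributes nothing.
        for ch in quad.rstrip("="):
            bits = (bits << 6) | BASE64_ALPHABET.index(ch)  # ValueError if invalid
        bits <<= 6 * padding        # restore the zero-filled low bits
        chunk = bits.to_bytes(3, "big")
        out += chunk[:3 - padding]  # drop the bytes that were only padding
    return bytes(out)


print(base64_decode("SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t"))
# b"I'm killing your brain like a poisonous mushroom"
```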


## Step 2 - Fixed XOR

> Write a function that takes two equal-length buffers and produces their XOR combination.
>
> If your function works properly, then when you feed it the string:
> ```
> 1c0111001f010100061a024b53535009181c
> ```
> ... after hex decoding, and when XOR'd (bitwise) against:
> ```
> 686974207468652062756c6c277320657965
> ```
> ... should produce:
> ```
> 746865206b696420646f6e277420706c6179
> ```
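In line with the Step 1 comment about operating on raw bytes, a variant could take and return `bytes` and leave hex encoding to the caller. A minimal sketch; `xor_bytes` is a hypothetical name. It raises on unequal lengths instead of letting `zip` silently truncate.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("buffers must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


buf1 = bytes.fromhex("1c0111001f010100061a024b53535009181c")
buf2 = bytes.fromhex("686974207468652062756c6c277320657965")
result = xor_bytes(buf1, buf2)
print(result.hex())  # 746865206b696420646f6e277420706c6179
print(result)        # b"the kid don't play"
```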

In [4]:
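# your code with comments ... (feel free to add as many helper functions as you need!)

def XOR(hex1, hex2):
    # Decode both hex strings into raw bytes; the XOR itself works on bytes.
    bytes1 = bytes.fromhex(hex1)
    bytes2 = bytes.fromhex(hex2)

    # zip() would silently truncate to the shorter buffer, so enforce equal lengths.
    if len(bytes1) != len(bytes2):
        raise ValueError("XOR expects two equal-length buffers")

    res = bytearray()

    # Combine the buffers byte by byte.
    for b1, b2 in zip(bytes1, bytes2):
        res.append(b1 ^ b2)

    # Hex-encode only for display.
    return res.hex()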

hex1 = "1c0111001f010100061a024b53535009181c"
hex2 = "686974207468652062756c6c277320657965"
res = XOR(hex1, hex2)
print(res)

746865206b696420646f6e277420706c6179


## Step 3 - Single-byte XOR cipher

> The hex encoded string:
> ```
> 1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736
> ```
> ... has been XOR'd against a single character. Find the key (which is one byte) and decrypt the message. The message is a meaningful sentence in English!
>
> You should write code to find the key and decrypt the message. Don't do it manually!
>
> ### Comment
> There are several mini steps to achieve this! First, you need a strategy for searching the key space. Second, you need a test/scoring mechanism to check whether the decrypted message is meaningful or not (i.e., detecting garbage vs. the correct output). You can read more about the *"Caesar"* cipher to get some ideas and more background!

#### Description
*A brief description of your approach. Don't just put the code. First explain what you did and WHY you did it!*

<p> I'm given a hex-encoded ciphertext that decrypts to an English sentence. The task is to find the 1-byte key and use it to recover that sentence. I use brute force: decrypt with every possible key, compute a "score" that estimates how likely each result is to be English, and keep the highest-scoring one. Brute force is a reasonable strategy here because a 1-byte key has only 256 possible values, so trying all of them takes negligible computation. <br>
...
</p>
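As an alternative to the hand-tuned scorer in the code cell that follows, the frequency analysis used against Caesar ciphers suggests weighting each letter by how common it is in English. A minimal sketch that works directly on bytes, so no decoding step can fail. The frequency table holds approximate, commonly cited values, and the space weight (13.0), the junk-byte penalty (-10.0), and the function names are assumptions.

```python
import string
from typing import Tuple

# Approximate English letter frequencies (%); commonly cited rounded values (assumption).
LETTER_FREQ = {
    "a": 8.2, "b": 1.5, "c": 2.8, "d": 4.3, "e": 12.7, "f": 2.2, "g": 2.0,
    "h": 6.1, "i": 7.0, "j": 0.15, "k": 0.77, "l": 4.0, "m": 2.4, "n": 6.7,
    "o": 7.5, "p": 1.9, "q": 0.095, "r": 6.0, "s": 6.3, "t": 9.1, "u": 2.8,
    "v": 0.98, "w": 2.4, "x": 0.15, "y": 2.0, "z": 0.074,
}


def english_score(data: bytes) -> float:
    """Higher means 'more like English text'."""
    total = 0.0
    for b in data:
        ch = chr(b)
        if ch == " ":
            total += 13.0  # space weighted like the most common letter (assumption)
        elif ch in string.ascii_letters:
            total += LETTER_FREQ[ch.lower()]
        elif ch not in string.printable:
            total -= 10.0  # control bytes and bytes >= 0x80 suggest a wrong key
        # other printable ASCII (digits, punctuation, newlines) scores 0
    return total


def break_single_byte_xor(ciphertext: bytes) -> Tuple[int, bytes]:
    """Try all 256 keys; return (best_key, plaintext) under english_score."""
    best_key, best_plain, best_score = 0, b"", float("-inf")
    for key in range(256):
        plain = bytes(b ^ key for b in ciphertext)
        score = english_score(plain)
        if score > best_score:
            best_key, best_plain, best_score = key, plain, score
    return best_key, best_plain


ciphertext = bytes.fromhex(
    "1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"
)
key, plaintext = break_single_byte_xor(ciphertext)
print(key, chr(key), plaintext)  # 88 X b"Cooking MC's like a pound of bacon"
```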

In [13]:
# your code with comment
import string

def score(text):
    score = 0
    common_letters = "ETAOIN SHRDLU"
    for char in text.upper():
        if char in common_letters:
            score += 2  
        elif char in string.ascii_uppercase:
            score += 1  
        elif char == " ":
            score += 2 
        elif char in string.printable:
            score += 0.5  
        else:
            score -= 5 
    return score

hex_string = "1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"
raw_bytes = bytes.fromhex(hex_string)

best_score = float("-inf")
best_candidate = None
best_key = None

# Brute force: try all 256 single-byte keys and keep the best-scoring plaintext.
for key in range(256):
    candidate_bytes = bytes([b ^ key for b in raw_bytes])
    # Many keys produce bytes >= 0x80, which strict ASCII decoding rejects
    # (UnicodeDecodeError); replace them with U+FFFD so score() penalizes them.
    candidate_text = candidate_bytes.decode('ascii', errors='replace')
    candidate_score = score(candidate_text)
    if candidate_score > best_score:
        best_score = candidate_score
        best_candidate = candidate_text
        best_key = key

print(best_key)
print(chr(best_key))
print("english: ")
print(best_candidate)

In [9]:
import string

def score(text):
    """
    Scores a piece of text based on how likely it is to be English.
    This scoring function rewards common English letters and spaces,
    while penalizing non-printable characters.
    """
    score_val = 0
    common_letters = "ETAOIN SHRDLU"
    for char in text.upper():
        if char in common_letters:
            score_val += 2  
        elif char in string.ascii_uppercase:
            score_val += 1  
        elif char == " ":
            score_val += 2 
        elif char in string.printable:
            score_val += 0.5  
        else:
            score_val -= 5 
    return score_val

# The given hex string.
hex_string = "1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"

# Convert the hex string into raw bytes.
raw_bytes = bytes.fromhex(hex_string)

best_score = float("-inf")
best_candidate = None
best_key = None

# Try every possible single-byte key (0-255)
for key in range(256):
    # XOR each byte of raw_bytes with the key.
    candidate_bytes = bytes([b ^ key for b in raw_bytes])
    
    # Try to decode the candidate bytes into an ASCII string.
    try:
        candidate_text = candidate_bytes.decode('ascii')
    except UnicodeDecodeError:
        continue  # If decoding fails, skip this candidate.
    
    # Score the candidate text.
    candidate_score = score(candidate_text)
    
    # Keep track of the candidate with the highest score.
    if candidate_score > best_score:
        best_score = candidate_score
        best_candidate = candidate_text
        best_key = key

print("Best key (as integer):", best_key)
print("Best key (as character):", chr(best_key))
print("Decrypted message:")
print(best_candidate)



## Step 4 - Detect single-character XOR

> One of the 60-character strings in [this file](data/04.txt) has been encrypted by single-character XOR (each line is one string).
>
> Find it.
>
> ### Comment
> You should use your code in Step 3 to test each line. One line should output a meaningful message. Remember that you don't know the key either, but you can find it for each line (if any).

#### Description
*A brief description of your approach. Don't just put the code. First explain what you did and WHY you did it!*

<p> (your description)<br>
...
</p>
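Reusing the Step 3 idea, the line-by-line search can be sketched as: break every line with its best single-byte key, then keep the line whose decryption scores highest overall. The letter-frequency scorer, its weights, and the names `break_single_byte_xor` and `detect_single_byte_xor` are assumptions for illustration. Because `data/04.txt` isn't bundled here, the demo builds synthetic input: seeded random hex lines plus one sample sentence encrypted with an arbitrary key (0x35).

```python
import random
import string
from typing import List, Tuple

# Approximate English letter frequencies (%); commonly cited rounded values (assumption).
LETTER_FREQ = {
    "a": 8.2, "b": 1.5, "c": 2.8, "d": 4.3, "e": 12.7, "f": 2.2, "g": 2.0,
    "h": 6.1, "i": 7.0, "j": 0.15, "k": 0.77, "l": 4.0, "m": 2.4, "n": 6.7,
    "o": 7.5, "p": 1.9, "q": 0.095, "r": 6.0, "s": 6.3, "t": 9.1, "u": 2.8,
    "v": 0.98, "w": 2.4, "x": 0.15, "y": 2.0, "z": 0.074,
}


def english_score(data: bytes) -> float:
    total = 0.0
    for b in data:
        ch = chr(b)
        if ch == " ":
            total += 13.0
        elif ch in string.ascii_letters:
            total += LETTER_FREQ[ch.lower()]
        elif ch not in string.printable:
            total -= 10.0  # junk bytes suggest a wrong key
    return total


def break_single_byte_xor(ciphertext: bytes) -> Tuple[int, bytes, float]:
    """Return (key, plaintext, score) for the best of all 256 single-byte keys."""
    best = (0, b"", float("-inf"))
    for key in range(256):
        plain = bytes(b ^ key for b in ciphertext)
        score = english_score(plain)
        if score > best[2]:
            best = (key, plain, score)
    return best


def detect_single_byte_xor(hex_lines: List[str]) -> Tuple[int, int, bytes]:
    """Return (line_index, key, plaintext) for the most English-looking line."""
    best = None
    for index, line in enumerate(hex_lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline in the file
        key, plain, score = break_single_byte_xor(bytes.fromhex(line))
        if best is None or score > best[0]:
            best = (score, index, key, plain)
    if best is None:
        raise ValueError("no non-empty lines to test")
    return best[1], best[2], best[3]


# Synthetic demo input: five random lines, with the encrypted line at index 2.
rng = random.Random(4)
lines = [bytes(rng.randrange(256) for _ in range(30)).hex() for _ in range(5)]
secret = bytes(b ^ 0x35 for b in b"Now that the party is jumping\n")
lines.insert(2, secret.hex())

index, key, plain = detect_single_byte_xor(lines)
print(index, hex(key), plain)
```

With the real file, the same function would receive its lines, e.g. `open("data/04.txt").read().splitlines()`.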

In [None]:
# your code with comment

## Step 5 - Implement repeating-key XOR

> Here is the opening stanza of an important work of the English language:
> ```
> Burning 'em, if you ain't quick and nimble
> I go crazy when I hear a cymbal
> ```
> Encrypt it, under the key "ICE", using repeating-key XOR.
>
> In repeating-key XOR, you'll sequentially apply each byte of the key; the first byte of plaintext will be XOR'd against I, the next C, the next E, then I again for the 4th byte, and so on.
>
> It should come out to:
> ```
> 0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a26226324272765272
> a282b2f20430a652e2c652a3124333a653e2b2027630c692b20283165286326302e27282f
> ```
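A minimal sketch of repeating-key XOR: byte `i` of the input is XORed with byte `i % len(key)` of the key, here via `itertools.cycle`. It assumes the two stanza lines are joined by a single `\n` with no trailing newline, and that the two hex lines quoted above are one string wrapped for display. `repeating_key_xor` is a hypothetical name.

```python
from itertools import cycle


def repeating_key_xor(data: bytes, key: bytes) -> bytes:
    """XOR data[i] with key[i % len(key)]; the same call also decrypts."""
    if not key:
        raise ValueError("key must be non-empty")
    # zip stops at the end of data; cycle(key) repeats the key forever.
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


stanza = b"Burning 'em, if you ain't quick and nimble\nI go crazy when I hear a cymbal"
ciphertext = repeating_key_xor(stanza, b"ICE")
print(ciphertext.hex())

# XOR is its own inverse, so applying the same key again recovers the plaintext.
assert repeating_key_xor(ciphertext, b"ICE") == stanza
```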


In [None]:
# your code with comments

## Step 6 (Main Step) - Break repeating-key XOR

> There's a file [here](data/06.txt). It's been base64'd after being encrypted with repeating-key XOR.
>
> Decrypt it.
>
> Here's how:
>
> - Let KEYSIZE be the guessed length of the key; try values from 2 to (say) 40.
>
> - Write a function to compute the edit distance/Hamming distance between two strings. The Hamming distance is just the number of differing bits. The distance between `"this is a test"` and `"wokka wokka!!!"` is 37. Make sure your code agrees before you proceed.
>
> - For each KEYSIZE, take the first KEYSIZE worth of bytes, and the second KEYSIZE worth of bytes, and find the edit distance between them. Normalize this result by dividing by KEYSIZE.
>
> - The KEYSIZE with the smallest normalized edit distance is probably the key size. You could also proceed with the 2-3 KEYSIZE values that have the smallest distances. Or take 4 KEYSIZE blocks instead of 2 and average the distances.
>
> - Now that you probably know the KEYSIZE: break the ciphertext into blocks of KEYSIZE length.
>
> - Now transpose the blocks: make a block that is the first byte of every block, and a block that is the second byte of every block, and so on.
>
> - Solve each block as if it was single-character XOR. You already have code to do this.
> For each block, the single-byte XOR key that produces the best looking histogram is the repeating-key XOR key byte for that block. Put them together and you have the key.
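The Hamming-distance check and the key-size ranking from the steps above, as a minimal sketch. It follows the "take 4 KEYSIZE blocks and average" option by averaging over all pairs of the first `num_blocks` blocks; the function names and default values are assumptions. Multiples of the true key size also give low distances, which is one reason to keep the best 2-3 candidates.

```python
import itertools
from typing import List


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    # XOR leaves a 1 exactly where the bits differ; count those 1s.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def rank_keysizes(ciphertext: bytes, min_size: int = 2, max_size: int = 40,
                  num_blocks: int = 4) -> List[int]:
    """Candidate key sizes, most likely first (lowest normalized distance)."""
    if num_blocks < 2:
        raise ValueError("need at least two blocks to compare")
    results = []
    for size in range(min_size, max_size + 1):
        blocks = [ciphertext[i * size:(i + 1) * size] for i in range(num_blocks)]
        if len(blocks[-1]) < size:
            break  # not enough ciphertext for num_blocks full blocks of this size
        pairs = list(itertools.combinations(blocks, 2))
        avg = sum(hamming_distance(a, b) for a, b in pairs) / len(pairs)
        results.append((avg / size, size))  # normalize by KEYSIZE
    results.sort()  # ties go to the smaller key size
    return [size for _, size in results]


print(hamming_distance(b"this is a test", b"wokka wokka!!!"))  # 37

# Toy ciphertext: an all-zero plaintext XORed with "ICE" is just "ICE" repeated.
print(rank_keysizes(b"ICE" * 60)[:3])  # [3, 6, 9]
```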

#### Description
*A brief description of your approach. Don't just put the code. First explain what you did and WHY you did it!*

<p> (your description)<br>
...
</p>
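The transpose-and-solve steps as a sketch, assuming the key size is already known. Slicing `ciphertext[j::keysize]` collects every byte that was XORed with key byte `j`, which is the same as splitting into KEYSIZE blocks and transposing them. The scorer is a simple letter-frequency heuristic with approximate frequency values (an assumption), and all function names are hypothetical. The demo encrypts a repeated pangram under the arbitrary key `b"KEY"`.

```python
import string

# Approximate English letter frequencies (%); commonly cited rounded values (assumption).
LETTER_FREQ = {
    "a": 8.2, "b": 1.5, "c": 2.8, "d": 4.3, "e": 12.7, "f": 2.2, "g": 2.0,
    "h": 6.1, "i": 7.0, "j": 0.15, "k": 0.77, "l": 4.0, "m": 2.4, "n": 6.7,
    "o": 7.5, "p": 1.9, "q": 0.095, "r": 6.0, "s": 6.3, "t": 9.1, "u": 2.8,
    "v": 0.98, "w": 2.4, "x": 0.15, "y": 2.0, "z": 0.074,
}


def english_score(data: bytes) -> float:
    """Frequent letters and spaces score high; junk bytes score negative."""
    total = 0.0
    for b in data:
        ch = chr(b)
        if ch == " ":
            total += 13.0
        elif ch in string.ascii_letters:
            total += LETTER_FREQ[ch.lower()]
        elif ch not in string.printable:
            total -= 10.0
    return total


def best_single_byte_key(column: bytes) -> int:
    """Single-byte XOR key whose decryption of `column` looks most like English."""
    return max(range(256), key=lambda k: english_score(bytes(b ^ k for b in column)))


def recover_key(ciphertext: bytes, keysize: int) -> bytes:
    """Transpose into keysize columns and solve each as single-byte XOR."""
    # Column j holds bytes j, j + keysize, j + 2*keysize, ...: exactly the bytes
    # that were XORed with key byte j.
    columns = [ciphertext[j::keysize] for j in range(keysize)]
    return bytes(best_single_byte_key(column) for column in columns)


def repeating_key_xor(data: bytes, key: bytes) -> bytes:
    """Encrypts and decrypts, since XOR is its own inverse."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Demo with a known answer: a repeated pangram under the arbitrary key b"KEY".
plaintext = b"the quick brown fox jumps over the lazy dog " * 10
ciphertext = repeating_key_xor(plaintext, b"KEY")
key = recover_key(ciphertext, 3)
print(key)  # b'KEY'
print(repeating_key_xor(ciphertext, key)[:44])
```

For the actual challenge, the ciphertext would come from base64-decoding `data/06.txt` (for example with `base64.b64decode`), and `recover_key` would be tried for the 2-3 best key sizes, keeping whichever decryption scores best.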

In [None]:
# your code with comments