### Encryption and decryption

Welcome to a new Python session for getting familiar with Jupyter Notebooks. This time, we will concentrate on encrypting and decrypting data. In simpler terms, this notebook will help you become more familiar with ways of sending a hidden 
message or code, and of deciphering it. 

We certainly will not crack the Enigma code like Alan Turing after only one session, nor will we become Robert Langdon 
searching for hidden manuscripts in the Vatican library. We will, however, get more familiar with some cryptographic methods.

In [None]:
import graphviz
from graphviz import Digraph

Generally, the scheme for encrypting a file, or message, is the following:

In [None]:
g = Digraph()

g.node('0', 'Emitter', shape='circle')
g.node('a', 'Original file', shape='circle')
g.node('b', 'Key', shape='circle', style='filled', fillcolor='grey')
g.node('c', 'Encrypted file', shape='circle')
g.node('d', 'Email-sending', shape='circle')
g.node('e', 'Key', shape='circle', style='filled', fillcolor='grey')
g.node('f', 'Decrypted file', shape='circle')
g.node('1', 'Receiver', shape='circle')

g.edge('0', 'a', constraint = 'false')
g.edge('a', 'b', constraint = 'false')
g.edge('b', 'c', constraint = 'false')
g.edge('c', 'd', constraint = 'false')
g.edge('d', 'e', constraint = 'false')
g.edge('e', 'f', constraint = 'false')
g.edge('f', '1', constraint = 'false')

g

The trick is to choose a key that is as sophisticated as possible, in order to get a hidden message that cannot be cracked. The more sophisticated the key is, the harder it will be to break the code. When the protection is too weak, hackers can break into the software of various companies and banks, sometimes with damage beyond repair: https://www.zdnet.com/article/the-biggest-hacks-data-breaches-of-2020/

The key must be known only by the entity who sends the message (the emitter) and the one who receives it (the receiver).

Let us start with a very simple example of what encryption involves by working with the ASCII code. Each character
on the keyboard (Latin alphabet, digits, the $%#&* characters, the space) can be represented by an integer (example: a is represented as 97), which is in turn stored as a binary number (we will come to that later).

How does that work? Type the following command in Python:

In [None]:
ord('a')

The character a is represented as 97 in the ASCII code. Try with as many characters as you would like:

In [None]:
ord('z')

Let us check the validity of the ASCII code: according to the commands above, there are $(122 - 97) + 1 = 26$ integers 
standing for the lowercase letters a to z. This is exactly the number of letters in the Latin alphabet. From this perspective, the ASCII 
table does indeed make sense. 

Let us try now this:

In [None]:
ord('A')

In [None]:
ord('Z')

The capital letters occupy lower positions in the ASCII table than the a-z characters. What is there, in this case,
between the positions 90 and 97? Here is the full ASCII table, which you can consult at your leisure: https://www.ascii-code.com/. 
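
A quick way to answer the question above is to loop over the codes lying strictly between `ord('Z')` and `ord('a')` and convert each one back into a character. This is only a small sketch; the variable name `between` is made up for illustration.

```python
# Characters whose ASCII codes lie strictly between 'Z' (90) and 'a' (97)
between = [chr(code) for code in range(ord('Z') + 1, ord('a'))]
print(between)
```

The six positions turn out to be occupied by punctuation symbols rather than letters.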

We know how to convert any character into its ASCII code equivalent using the given Python function. However, how can we do it
the other way around and convert an ASCII code into its original character? Use the function below:

In [None]:
chr(97)

Test: turn any character into its ASCII code and then use the chr function to get the original character back:

In [None]:
test_character = 'p' # Just choose a random character
test_ASCII_char = ord(test_character) # Turn the random character into its ASCII code equivalent
returned_char = chr(test_ASCII_char)  # Turn the ASCII code back into the character
print(returned_char)                  # Print the returned value

Of course, this could have been done much faster:

In [None]:
chr(ord('a'))

We can also return a different character, using only basic arithmetic:

In [None]:
chr(ord('a') + 1)

Let us play with a sequence of characters, which is called a <b>string</b>. You can check the datatype of any variable with the type function:

In [None]:
new_string = "It is very nice outside"
type(new_string)

What if the string consists of a single character?

In [None]:
test_1 = "a"
test_2 = 'a'
print(type(test_1))
print(type(test_2))

Unlike some other programming languages, Python does not have separate char and string datatypes: there is only the string datatype (str), and a single character is simply a string of length one. 
There are some peculiarities about the string datatype, as we shall soon discover:

#### Challenge: advance all the characters of the string by one position in the Latin alphabet

Example: a becomes b, A becomes B, and so on. We can do this the old-school way, of course, with pen and paper, but the 
computer gives a much faster result. Let us iterate through the string in the usual way:

In [None]:
for i in range(len(new_string)):
    new_string[i] = chr(ord(new_string[i]) + 1)  # raises TypeError: strings do not support item assignment
    
new_string

This is one particular aspect of the string datatype to discuss: it does not support item assignment in Python. 
Strings are immutable: once a string has been created, you cannot assign a new value to any element inside
it. The string cannot be modified in place.
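
The error can also be caught explicitly, which makes the behaviour easier to inspect. The sketch below assumes the same example string as before; the name `modified` is hypothetical. It also shows that building a brand new string out of pieces of the old one is perfectly allowed.

```python
new_string = "It is very nice outside"

try:
    new_string[0] = 'J'          # attempt to overwrite the first character
except TypeError as error:
    print("Not allowed:", error)

# Building a brand new string from pieces of the old one is allowed
modified = 'J' + new_string[1:]
print(modified)
print(new_string)                # the original string is unchanged
```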

As a result, if we still want the output, then <b>we will have to build a new string instead of modifying the old 
one</b>. One way is to collect the shifted characters in a list (lists can be modified) and then glue them together with the built-in join method of strings. 

In [None]:
test_string = []  # a list can be modified, unlike a string

for element in new_string:
    test_string.append(chr(ord(element) + 1))
    
print(test_string)    
print(''.join(test_string))
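
The same result can be obtained more compactly by passing a generator expression straight to `join`. This is just an alternative sketch of the loop above, with the hypothetical name `shifted` for the result.

```python
new_string = "It is very nice outside"

# Shift every character by one position and join the results in one step
shifted = ''.join(chr(ord(element) + 1) for element in new_string)
print(shifted)
```

Note that the spaces are shifted too: the space (code 32) becomes an exclamation mark (code 33).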

The resulting string is much harder to read. You can already see where this is going: towards data encryption. For more practice, let us create a function whose purpose is to shift every character of the string by one position:

In [None]:
def change_string(s):
    """Shift every character of s one position forward in the ASCII table."""
    test_string = []
    
    for element in s:
        test_string.append(chr(ord(element) + 1))
        
    return ''.join(test_string)

We can test this on much larger strings, to get an idea of how different the final message can be:

In [None]:
test_0 = "One of the highlights of Scottish history is the war against England in early 14-th century, with the uprising led by William Wallace and Robert Bruce"
change_string(test_0)

To make it even more complicated, we can push each character (space included) two units forward, not one. Example: a becomes c, b becomes d, and so on. Or we can move it three or four units forward. We can actually control the amount by which we push the characters with a very small modification:

<b> Challenge: </b> What do I have to change to achieve this? Hint: I can control this amount simply by adding another
parameter to the definition of the function.

In [None]:
def go_forward(s, i):
    """Shift every character of s by i positions forward in the ASCII table."""
    test_string = []
    
    for element in s:
        test_string.append(chr(ord(element) + i))
        
    return ''.join(test_string)

In [None]:
go_forward(test_0,2)
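
Note that go_forward moves characters along the whole ASCII table, so z shifted by 2 becomes | rather than b, and spaces and punctuation change as well. If we want letters to stay letters, one possible variant wraps around inside the alphabet using the remainder of a division by 26. The function name `caesar_shift` is hypothetical and this is only a sketch of the idea, not part of the original exercise.

```python
def caesar_shift(s, i):
    """Shift letters by i positions, wrapping around inside the alphabet."""
    result = []
    for element in s:
        if 'a' <= element <= 'z':
            start = ord('a')
        elif 'A' <= element <= 'Z':
            start = ord('A')
        else:
            result.append(element)   # leave spaces, digits, punctuation alone
            continue
        # Position in the alphabet (0-25), shifted and wrapped with modulo 26
        result.append(chr(start + (ord(element) - start + i) % 26))
    return ''.join(result)

print(caesar_shift("xyz XYZ", 2))
```

With this variant, a negative value of `i` shifts the letters backwards, so the same function can undo its own work.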

Try go_forward as many times as you want, with different shifts, to convince yourselves that every time the output is astonishingly different from the previous
one. If we now want to make the message <b>impossible</b> to recover (to the point that every time I run the same function I get
different results and the original text is lost for good), we can work with the random function from the numpy library. Again, there are not too many modifications to make:

In [None]:
import numpy as np

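def change_string(s):
    """Shift every character of s by its own random amount between 0 and 4."""
    test_string = []
    
    for element in s:
        i = np.random.randint(5)  # random integer from 0 to 4 (5 is excluded)
        test_string.append(chr(ord(element) + i))
        
    return ''.join(test_string)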

Try now with the given input:

In [None]:
change_string(test_0)

### A moment of analysis (optional, not part of the CS exam)

<b> Question: </b> What are the chances of getting the same text after the function has been run? Well, let's check the length of the string. The test string is very long, so to get a clearer idea of the chances of getting the original back, let's do the maths on the smaller string:

In [None]:
len(new_string) # Ironically, the old string: It is very nice outside

So there are 23 characters in the string. In order to get the same text after applying the function, we need $i=0$ for all the characters. However, np.random.randint(5) has 5 possible outcomes, $\{0,1,2,3,4\}$ (type help(np.random.randint) to convince yourselves). Each of the 23 characters therefore has access to 5 possible choices, so there are $5^{23}$ possible outputs. Only one is favourable, $i=0$ everywhere, so the chance of getting this result in one shot is $p = 5^{-23} \approx 8 \times 10^{-17}$, an extremely small number. To get the original string back by repeated trials, you would need roughly $\frac{1}{p} = 5^{23} \approx 1.2 \times 10^{16}$ attempts on average; at one attempt per second, that is about 380 million years. To give you a scale, imagine we double the length of the string: the expected number of trials becomes $5^{46} \approx 1.4 \times 10^{32}$, each of them taking one second, if we are quick. For reference, the age of the Universe is roughly $4 \times 10^{17}$ seconds, so we would have to wait hundreds of trillions of times the age of the Universe! This clearly cannot happen, so it is astonishingly unlikely to ever get the same string again.
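
These orders of magnitude are easy to check directly in Python, since integers can be arbitrarily large. The sketch below assumes one trial per second and an age of the Universe of about 13.8 billion years; all variable names are made up for illustration.

```python
n_chars = 23                      # length of "It is very nice outside"
outcomes = 5 ** n_chars           # 5 choices for each character
p = 1 / outcomes                  # probability of getting the original back

seconds_per_year = 365.25 * 24 * 3600
print(outcomes)                   # number of equally likely outputs
print(p)
print(outcomes / seconds_per_year, "years at one trial per second")

age_universe = 13.8e9 * seconds_per_year   # assumed age: about 13.8 billion years
print(5 ** (2 * n_chars) / age_universe, "ages of the Universe for 46 characters")
```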

Even from a string of reasonable length, we can get astonishingly high numbers with some basic operations. This is what encryption and decryption (and computer science in general) deal with: reducing the number of cases to be analysed. 

### Back to the exercises

OK, we played with some ways of encoding a message, but what about decoding it? As an example, we encoded a to b, b to c, c to d, and so on, but now we want b to go back to a, c to go back to b, et cetera. How do we do it? Well, we will just go the other way around, using the ASCII code again. To convince yourselves of the way to go, a little exercise is shown below:

In [None]:
chr(ord('b')-1)

The same procedure is applied to all the characters in the string. However, let us work directly with the index i, meaning that characters are shifted backwards by i positions, where i is specified, as before, as an input: 

In [None]:
def go_back(s,i):
    """Shift every character of s by i positions backwards in the ASCII table."""
    test_string = []
    
    for element in s:
        test_string.append(chr(ord(element) - i))
        
    return ''.join(test_string)

Now run the code for the test input:

In [None]:
go_back(test_0,2)

As a check that all went well, you can test some of the output for yourselves: the first character of the text was O, and after a backward shift of two positions it became M, which is indeed shown in the output. Apply similar reasoning to some of the other characters to convince yourselves that the program works as it should.

Now we arrive at the key point in the encryption-decryption procedure: it all relies on the key being accessible only to the emitter and the receiver. In our case, the key is shifting the characters i positions forward. Obviously, the way to decode the message is simply to shift all the characters i positions backwards. Anyone can do that, but with pen and paper it can take an enormous amount of time for large enough texts. As shown before, this is done much better by a computer.

We know the key for both encrypting and decrypting. <b> If the key works, we will get the original text. </b> 

Let us consider the following scenario: Alice has a message, which she encrypts and sends to Bob:

In [None]:
message_Alice = go_forward(test_0, 2)
message_Alice

Bob receives the encrypted message and has the key for decrypting it:

In [None]:
message_Bob = go_back(message_Alice, 2)
message_Bob

Why is it very important for the encryption and decryption keys to match? What if they do not match?

In [None]:
wrong_message = go_back(message_Alice, 3)
wrong_message

This is nothing like what it should look like... The encryption associates each character with another character. This is exactly what a mathematical function does, so we can just as well express everything in mathematical language: to a finite and discrete set $S$ (there is only a finite number of characters, and their codes are whole numbers) we apply a function $f$ which transforms the set into another set, $f(S)$. The decryption key is associated with a function $g$ which, acting upon the encrypted message, gives back the original set: $g(f(S)) = S$, so $g$ is the inverse function: $g = f^{-1}$. Of course, this also works the other way around: $f^{-1}(f(S)) = f(f^{-1}(S)) = S$.
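
The inverse-function property can be checked numerically for several shifts at once. The sketch below redefines go_forward and go_back in compact form so that it runs on its own; the loop bounds are an arbitrary choice.

```python
def go_forward(s, i):
    return ''.join(chr(ord(element) + i) for element in s)

def go_back(s, i):
    return ''.join(chr(ord(element) - i) for element in s)

S = "It is very nice outside"
for i in range(1, 6):
    assert go_back(go_forward(S, i), i) == S    # g(f(S)) = S
    assert go_forward(go_back(S, i), i) == S    # f(g(S)) = S
print("f and g are inverses for every shift tested")
```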

As you may have already noticed, the problem with encryption and decryption is that someone outside the party may intercept the message as well, since it travels through antennas, microwave links and other physical channels. Suppose Bill also receives the message, but he does not know the key. He tries to work out the original message, but he gets it wrong many times:

In [None]:
Bill_message = go_back(message_Alice, 5)
Bill_message

Bill has no idea about the key; the above example was just a way to show that he can get the wrong message very easily. However, an IT agency, government department or bank with advanced equipment will work out the key in a short time, especially if the algorithm is poorly designed. This notebook is just a small taster of what encryption methods involve: the algorithm itself is trivial and has been used since ancient times. It even has a name, Caesar's cipher: https://en.wikipedia.org/wiki/Caesar_cipher. There are lots of Artificial Intelligence algorithms for detecting trivial and simple patterns in text processing, so surely no one can be fooled by this algorithm. Let us see just how easily this message can be deciphered if at least a vague idea of the key is known.

Suppose the CIA intercepts the message and already knows that the characters are shifted i positions forward. For the sake of argument, let us say that Alice shifted the letters 20 positions forward and the CIA knows the algorithm itself, but not by how much the characters were shifted. We will see in how short a time the message can be found:

In [None]:
import time

Alice_message = go_forward(test_0, 20)

t_initial = time.time()
for i in range(30):
    print(go_back(Alice_message, i))
t_final = time.time()

print("Processing time: " + str(t_final - t_initial) + " seconds")

So it took only about $0.007$ seconds (the exact value depends on your machine) to process all 30 candidate messages. Can you already spot where the readable message is? Even if there were ten times more messages, advanced AI algorithms could easily detect the decrypted message for such a rudimentary algorithm.
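
Spotting the readable candidate does not even require advanced AI. A very crude score, such as the fraction of characters that are letters or spaces, is often enough for plain English text. The sketch below is an illustration under that assumption; the function `readable_fraction`, the shorter test sentence and the other names are hypothetical, and go_forward and go_back are redefined in compact form so that the block runs on its own.

```python
def go_forward(s, i):
    return ''.join(chr(ord(element) + i) for element in s)

def go_back(s, i):
    return ''.join(chr(ord(element) - i) for element in s)

def readable_fraction(text):
    """Fraction of characters that are letters or spaces (a crude score)."""
    return sum(c.isalpha() or c == ' ' for c in text) / len(text)

secret = go_forward("The war against England was led by William Wallace", 20)

# Try every candidate shift and keep the one whose output looks most like text
best_shift = max(range(30), key=lambda i: readable_fraction(go_back(secret, i)))
print(best_shift, go_back(secret, best_shift))
```

A more robust attack would compare letter frequencies with those of English, but the principle is the same: with so few possible keys, every one of them can be tried.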

### Euclidean algorithm

There are plenty of ways to encrypt and decrypt data, and a lot of research on this topic nowadays. Many of them rely, a bit unexpectedly, on an ancient algorithm proposed by the Greek mathematician Euclid. Its implementation is quite simple, and yet a perfect example of how recursion works. Before showing the code, let us look at the algorithm in a bit more detail:

Suppose we pick two numbers, 45 and 63, and we want to find their greatest common divisor. At a quick glance, the two numbers can be written as $45 = 9 \times 5$ and $63 = 9 \times 7$, with 5 and 7 having no common divisor apart from 1. Hence, the greatest common divisor is 9. How does that work, though, in smaller steps?

In [2]:
a = 45
b = 63

Subtract the smaller number from the greater number: $63 - 45 = 18$. Remaining numbers: 45 and 18

Subtract the smaller number from the greater number: $45 - 18 = 27$. Remaining numbers: 27 and 18

Subtract the smaller number from the greater number: $27 - 18 = 9$. Remaining numbers: 18 and 9

Subtract the smaller number from the greater number: $18 - 9 = 9$. Remaining numbers: 9 and 9

Now the two numbers are equal (their difference is 0), so this common value, 9, is the greatest common divisor. A very simple implementation is:

In [3]:
while(a != b):
    
    if(a > b):
        a = a - b
    else:
        b = b - a
print(a)

9


Of course, this can also be done through recursion (It is a bit more advanced, but good practice for future coding exercises)

In [6]:
a = 45
b = 63

def gcd(a,b):
     
    # breakpoint()
    # Everything divides 0 
    if (a == 0):
        return b
    if (b == 0):
        return a
 
    # base case
    if (a == b):
        return a
 
    # a is greater
    if (a > b):
        return gcd(a-b, b)
    return gcd(a, b-a)

print(gcd(a,b))

9
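
Repeated subtraction can take many steps when one number is much larger than the other. A common variant of Euclid's algorithm replaces the subtractions with a single remainder operation. The sketch below uses the hypothetical name `gcd_mod` and compares the result with `math.gcd` from the standard library.

```python
import math

def gcd_mod(a, b):
    """Euclid's algorithm using remainders instead of repeated subtraction."""
    while b != 0:
        a, b = b, a % b      # replace (a, b) by (b, a mod b)
    return a

print(gcd_mod(45, 63))
print(math.gcd(45, 63))      # the standard library agrees
```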


What about the least common multiple? There are, of course, plenty of ways to achieve this, but let us recall a fundamental
property in number theory which you have studied:

$a \cdot b = \gcd(a,b) \cdot \operatorname{lcm}(a,b)$

In [12]:
def lcm(a,b):
    # Integer division keeps the result an integer (a*b is always divisible by the gcd)
    return a*b//gcd(a,b)

lcm(a,b)

315

### Small taster of cryptography

Although they seem quite trivial, much of modern cryptography relies on the Euclidean algorithm shown above and some more number theory for manipulating huge numbers. For instance, calculating the remainder $2^{83} \bmod 5$ by brute force is a huge pain without some proper mathematical knowledge. However, there are theorems which come in handy for exactly this type of problem. In this way, digits and characters are encoded in combinations that are very hard to untangle, in order to encrypt the messages. The RSA algorithm is a typical application of what we have just studied here: https://en.wikipedia.org/wiki/RSA_%28cryptosystem%29 
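
One such theorem is Fermat's little theorem, which tells us that $2^4$ leaves remainder 1 when divided by 5. Since $83 = 4 \cdot 20 + 3$, only $2^3$ matters for the remainder. In Python, the built-in pow function accepts a third argument and performs this kind of modular exponentiation efficiently. The three lines below are a small sketch comparing the different routes.

```python
# Three-argument pow performs modular exponentiation without building 2**83 first
print(pow(2, 83, 5))

# Fermat's little theorem: 2**4 leaves remainder 1 when divided by 5,
# and 83 = 4 * 20 + 3, so only 2**3 matters
print(pow(2, 83 % 4, 5))

# Brute-force check (fine here, impractical for cryptographic sizes)
print(2 ** 83 % 5)
```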

RSA makes use of discrete algebra to encode messages. Cracking the encoded message is a significant problem and an active topic of research. Why? Huge numbers (which are typically used for encoding) are very hard to factorize, and factorizing them is the way to get to the original message. One possible solution is given by quantum computing (https://www.technologyreview.com/2019/05/30/65724/how-a-quantum-computer-could-break-2048-bit-rsa-encryption-in-8-hours/), but this is a totally different matter.
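
To give a flavour of how the pieces fit together, here is a toy version of RSA with deliberately tiny primes. It is only an illustrative sketch: the primes 61 and 53, the exponent 17 and all names are choices made for this example, real keys use primes hundreds of digits long, and real implementations add padding and other safeguards. The private exponent is found with the extended form of the Euclidean algorithm.

```python
def extended_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

# Toy parameters, far too small to be secure
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent, must satisfy gcd(e, phi) == 1
g, x, _ = extended_gcd(e, phi)
d = x % phi                    # private exponent: e * d leaves remainder 1 mod phi

message = ord('A')             # encode one character as its ASCII code (65)
cipher = pow(message, e, n)    # encryption with the public key (e, n)
decoded = pow(cipher, d, n)    # decryption with the private key (d, n)
print(cipher, chr(decoded))
```

Anyone who can factorize n into p and q can recompute d, which is why the difficulty of factorizing huge numbers matters so much.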