![Coding_Club_Header](https://raw.githubusercontent.com/nhs-pycom/coding-club/main/img/coding_club_header_small.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nhs-pycom/coding-club/blob/main/introduction-to-applied-cryptography/introduction-to-applied-cryptography.ipynb)

In [None]:
!sudo apt update
!pip install IPython --upgrade
!pip install cryptography
!pip install libnum


# Coding Club - Applied Cryptography

Cryptography is one component of information security (infosec), more commonly referred to as **Cybersecurity**.

At the heart of security is the CIA...

Not those guys, the acronym which stands for Confidentiality, Integrity, Availability (CIA).

Most of the tools extend beyond Python, so we are limited to the parts we can meaningfully explore with Python.

If you take away two things today, let them be:

- Security is not an algorithm, it is a set of practices. Good OpSec (operational security) is just as important as the algorithms, tools and technologies used to secure data.

- Do not create your own crypto and deploy it to production.

## Agenda

- Cipher Fundamentals
- Symmetric Keys
- Hashing
- Asymmetric Encryption
- Interactive Quiz


To start we need the [cryptography](https://cryptography.io/en/latest/installation/) library:

    pip install cryptography



In [None]:
from cryptography.fernet import Fernet
key = Fernet.generate_key()
f = Fernet(key)
token = f.encrypt(b"My deep dark secret: I auditioned for the position of scary spice in the Spice Girls!")
print(token)
print(key)

In [None]:
f.decrypt(token)

That's it, we can all go home. I hope my secret is safe with you.

OK, your turn: generate a key, encrypt your secret and share the token.

In [None]:
key = Fernet.generate_key()
f = Fernet(key)
token = f.encrypt(b"My deep dark secret: ...")

In [None]:
token = b'gAAAAABjD1MB8CQY_4IeQfnuZjzTp8yFkXG77PCj8GGTaunaUzk7yQ9BJf_bOLRjlYwswKK_0H_QFlbSJGm_YjlPwFiUOZzpn8vafGxH8ft4X5FeUCA0uwkKqjNz7XO0eyszodVcf-lulXdnHtUQts24tZle1pilUH4AbtR1Jgd_nTNlryqCfTLkUzG-tK-4NQPaJDo0oVSK'
key = b'dC_0Fw_qiWfP3-DyIl5nhtJ_sleJ1UAPDCflOX2OhNs='
f = Fernet(key)
f.decrypt(token)

## Bob and Alice?

Archetypes are used when discussing how different actors participate in cryptographic systems. You will hear the names Bob and Alice used a lot, and for good reason: each archetype encapsulates a common behaviour or concept in a cryptographic system. There are many different characters, but the ones below are frequently mentioned:

- Bob and Alice - Exchange messages or cryptographic keys.
- Eve - The eavesdropper.
- Trent - Trusted third party.
- Mallory - Malicious Actor.

## Part 1 - Cipher Fundamentals

- Encoding
    - ASCII
    - Base64
    - Binary
    - Hexadecimal
- Prime Factorisation
- Modulus and Exponentiation

To understand what is going on here we have to cover the fundamentals of cryptography and cryptographic primitives.
We start with encoding.

ASCII is the most common format for entering characters into a computer. Base64 maps each 6-bit value (0-63) to one of 64 printable ASCII characters. The table below lists the Base64 character (Enc) for each value (Val):

| Val | Enc   | Val  | Enc   | Val   | Enc   | Val   | Enc   |
|:---:|:-----:|:----:|:-----:|:-----:|:-----:|:-----:|:-----:|
|  0  |  A | 16 | Q | 32 | g | 48 | w |
|  1  |  B | 17 | R | 33 | h | 49 | x |
|  2  | C  | 18 | S | 34 | i | 50 | y |
|  3  |  D | 19 | T | 35 | j | 51 | z |
|  4  |  E | 20 | U | 36 | k | 52 | 0 |
|  5  |  F | 21 | V | 37 | l | 53 | 1 |
|  6  |  G | 22 | W | 38 | m | 54 | 2 |
|  7  |  H | 23 | X | 39 | n | 55 | 3 |
|  8  |  I | 24 | Y | 40 | o | 56 | 4 |
|  9  |  J | 25 | Z | 41 | p | 57 | 5 |
| 10  |  K | 26 | a | 42 | q | 58 | 6  |
| 11  |  L | 27 | b | 43 | r | 59 | 7 |
| 12  |  M | 28 | c | 44 | s | 60 | 8 |
| 13  |  N | 29 | d | 45 | t | 61 | 9 |
| 14  |  O | 30 | e | 46 | u | 62 | + |
| 15  |  P | 31 | f | 47 | v | 63 | / |

In [None]:
import base64
import binascii

greeting = 'hello'

greeting_binary = ''.join(format(i, '08b') for i in bytearray(greeting, encoding='utf_8' ))
greeting_hexadecimal = binascii.hexlify(bytearray(greeting, encoding='utf_8' )).decode()
greeting_bytes = bytes(greeting, encoding='utf_8')
greeting_base64 = base64.b64encode(bytearray(greeting, encoding='utf_8' ), altchars=None)

print(f"The binary representation of the greeting hello is: {greeting_binary}")
print(f"The hexadecimal representation of the greeting hello is: {greeting_hexadecimal}")
print(f"The bytes representation of the greeting hello is: {greeting_bytes}")
print(f"The base64 representation of the greeting hello is: {greeting_base64}")

So, what is going on under the hood?

The 8-bit binary representation of the ASCII for "hello" is:

\begin{align}
\nonumber "hello" \rightarrow \overbrace{
\underbrace{01101000}_{h=104_{10}}
\quad 
\underbrace{01100101}_{e=101_{10}} 
\quad 
\underbrace{01101100}_{l=108_{10}} 
\quad
\underbrace{01101100}_{l=108_{10}} 
\quad 
\underbrace{01101111}_{o=111_{10}}}^{binary}
\end{align}

To convert the 8-bit binary ASCII to hexadecimal, group the binary into 4-bit nibbles (right to left). Each nibble maps to one hexadecimal character (0-9, a-f), so the hexadecimal for "hello" is **68656c6c6f**.

\begin{align}
\nonumber "hello" \rightarrow \overbrace{
\underbrace{0110 \quad 1000}_{h=68_{16}} 
\quad
\underbrace{0110 \quad 0101}_{e=65_{16}}
\quad 
\underbrace{0110 \quad 1100}_{l=6c_{16}} 
\quad 
\underbrace{0110 \quad 1100}_{l=6c_{16}} 
\quad 
\underbrace{0110 \quad 1111}_{o=6f_{16}}}^{hexadecimal} 
\end{align}
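
The nibble grouping above can be sketched in a few lines of Python. This is only an illustration; the function name `manual_hex` is an assumption and not part of any library.

```python
def manual_hex(text: str) -> str:
    """Convert text to hexadecimal by grouping its bits into 4-bit nibbles."""
    # One 8-bit binary string per byte, joined together.
    bits = "".join(format(byte, "08b") for byte in text.encode("utf-8"))
    # Split into 4-bit groups; 8 is a multiple of 4, so no padding is needed.
    nibbles = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    # Each nibble becomes one hexadecimal character.
    return "".join(format(int(nibble, 2), "x") for nibble in nibbles)

print(manual_hex("hello"))
```

The result should match `binascii.hexlify` used earlier.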

To convert the 8-bit binary ASCII to Base64, group the binary into 6 bits (left to right). The binary representation for "hello" when converted to Base64 is **aGVsbG8=**:

\begin{align}
\nonumber "hello" \rightarrow \overbrace{
\underbrace{011010}_{a=26_{base64}}
\quad 
\underbrace{000110}_{G=6_{base64}} 
\quad 
\underbrace{010101}_{V=21_{base64}} 
\quad
\underbrace{101100}_{s=44_{base64}} 
\quad 
\underbrace{011011}_{b=27_{base64}}
\quad 
\underbrace{000110}_{G=6_{base64}}
\quad 
\underbrace{111100}_{8=60_{base64}}}^{base64}
\end{align}
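
The 6-bit grouping can be reproduced by hand in Python as a rough sketch. The names `B64_ALPHABET` and `manual_base64` are illustrative assumptions; in practice you would simply call `base64.b64encode`.

```python
import base64

# Standard Base64 alphabet: index 0-63 maps to one character.
B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def manual_base64(text: str) -> str:
    """Encode text as Base64 by grouping its bits into 6-bit values."""
    # One 8-bit binary string per byte, joined together.
    bits = "".join(format(byte, "08b") for byte in text.encode("utf-8"))
    # Right-pad with zero bits so the length is a multiple of 6.
    bits += "0" * (-len(bits) % 6)
    # Look up each 6-bit group in the alphabet.
    chars = [B64_ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Add '=' characters so the output length is a multiple of 4.
    return "".join(chars) + "=" * (-len(chars) % 4)

print(manual_base64("hello"))
print(base64.b64encode(b"hello").decode())
```

Both lines should print the same string.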

### Question 1:

Do you notice anything significant about the last group and the ASCII representation?

### Exercise 1:

Now that we know the difference between encoding formats, please use the functions below to find the ASCII representation of the following values:

- 01110000011000010111000001110010011010010110101101100001 (hint: split into 8-bit groups, e.g. `[int('01001101', 2), ...]`)
-  cG9zaA==
-  6c656f70617264207072696e74206f6e736965


In [None]:
def decode_base64(stream: bytes): 
  return base64.b64decode(stream)

def decode_hex(stream: str): 
  return bytes.fromhex(stream).decode('utf-8')

def decode_binary(*stream): 
  return "".join([chr(i) for i in stream])

In [None]:
a = [int('01110000', 2), int('01100001',2), int('01110000', 2), int('01110010', 2), int('01101001', 2), int('01101011', 2), int('01100001', 2)]
b = b'cG9zaA=='
c = '6c656f70617264207072696e74206f6e736965'

Changing format may make it less readable by humans but *obfuscation* is not encryption. It should not be as easy as converting to another encoding format to decrypt a message. To underscore the point try the next exercise.

### Exercise 2:

These database entries have been "encrypted" using base64. Crack the database and retrieve the passwords.

In [None]:
spots_store_db = {
    "Michael Phelps": b'Ymx1ZQ==',
    "Jesse Owens": b'cmVk',
    "Usain Bolt": b'Z3JlZW4=',
    "Kelly Holmes": b'YnJvd24=',
    "Linford Christie": b'YmxhY2s=',
    "Carl Lewis": b'eWVsbG93',
    "Mo Farah": b'b3Jhbmdl',
    "Simone Biles": b'Z3JlZW4=',
    "Mohammed Ali":b'b3Jhbmdl'
}

Did you notice anything about the repeated password entries?

### Prime Factorisation and Greatest Common Divisor (GCD)

Large prime numbers are very important in cryptography. Prime numbers have interesting properties that make them useful when building cryptographic primitives like hash functions and encryption ciphers. 

In [None]:
def prime_factorisation(p): 
  d, prime_factors = 2, [] 
  while d*d <= p:
    while (p % d) == 0: 
      prime_factors.append(d) 
      p //= d
    d += 1 
  if p > 1:
    prime_factors.append(p) 
  return prime_factors
p = 956
print(prime_factorisation(p))

### Exercise 3:

Factorise the following prime numbers (please go in sequential order):

1.   229 (8 bit)
2.   51131 (16 bit)
3.   3991116511 (32 bit)
4.   17919019621785889583 (64 bit)
5.   311230909699249272075466980068705556013 (128 bit)

What do you observe?

Now imagine trying to factorise the product of two large primes (the secret values). The RSA algorithm, covered later, relies on this being hard.
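
As a small, hedged illustration of that asymmetry, the sketch below multiplies two primes (instant) and then recovers them by trial division (slower). The primes are arbitrary illustrative choices, the 10,000th and 100,000th primes, and `trial_division` is a hypothetical helper that follows the same idea as `prime_factorisation`. Real RSA primes are far larger, typically 1024 bits or more each.

```python
import time

def trial_division(n: int) -> list:
    """Return the prime factors of n using trial division."""
    d, factors = 2, []
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

# Illustrative primes only; not taken from the exercises.
p, q = 104729, 1299709
n = p * q  # multiplying is effectively instant

start = time.perf_counter()
factors = trial_division(n)
elapsed = time.perf_counter() - start

print(f"n = {n}")
print(f"factors = {factors} (found in {elapsed:.4f} seconds)")
```

Try larger primes and watch how quickly the factoring time grows compared with the multiplication.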

In general, the security of a well-tested cryptographic hash function or encryption scheme can be increased by increasing the number of bits used for the secrets or modulus.

Brute forcing a key can be made more difficult by increasing the number of bits.
A 4-bit key (0000 - 1111) has only $2^{4}=16$ possible values. Each extra bit doubles the **keyspace**, and with it the computational power and time required to search it.

The sweet spot where it becomes infeasible to find a key within an adequate time is at around $72$ bits. The time needed to crack a key can be estimated by multiplying the expected number of attempts, which is approximately half the keyspace, by the time taken to try one key. For example, a 64-bit key has $1.84\times10^{19}$ combinations. If it can be cracked in $0.9\times10^{19}$ attempts, using a 1 GHz clock (1 ns per cycle) and one cycle for each attempt, then it will take:

\begin{align}
	\nonumber 1ns=1\times10^{-9} \quad\text{seconds} \\
	\nonumber \therefore(1\times10^{-9})\times0.9\times10^{19} = 0.9\times10^{10} \\
	\nonumber = 9,000,000,000 \quad\text{seconds} \\
	\nonumber = 9,000,000,000\div60=150,000,000 \quad\text{minutes} \\
	\nonumber = 150,000,000\div60= 2,500,000\quad\text{hours} \\
	\nonumber = 2,500,000\div24=104,166.6667\quad\text{days} \\
	\nonumber = 104,166.6667\div365\approx285\quad\text{years (3 s.f.)}
\end{align}
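
The same estimate can be sketched in Python for different key sizes. The function name `brute_force_years` and the default of one attempt per nanosecond are assumptions carried over from the example, not a model of real hardware. Using the exact value $2^{63}$ rather than the rounded $0.9\times10^{19}$ gives about 292 years for a 64-bit key.

```python
def brute_force_years(key_bits: int, seconds_per_try: float = 1e-9) -> float:
    """Estimate the years needed to find a key by searching half the keyspace."""
    keyspace = 2 ** key_bits
    expected_tries = keyspace / 2          # on average the key is found halfway
    seconds = expected_tries * seconds_per_try
    return seconds / (60 * 60 * 24 * 365)  # seconds -> years

for bits in (40, 56, 64, 72, 128):
    print(f"{bits:>3}-bit key: {brute_force_years(bits):.3g} years")
```

Each extra bit doubles the estimate, so the jump from 64 to 72 bits multiplies the time by 256.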


### Question 2:

Why would a well tested and secure algorithm need increased security?

### Greatest Common Divisor (GCD)

The GCD is the largest integer that divides two numbers without a remainder. Below is an example of the Euclidean algorithm. Two numbers are said to be co-prime if they share no common factors apart from 1.

In [None]:
def gcd(a:int, b:int): 
  rem: int = 0
  while b != 0: 
    rem = a % b
    a=b
    b = rem 
  return a
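
To see what the Euclidean algorithm does at each iteration, here is a small sketch that records every `(a, b, remainder)` step. The helper name `euclid_steps` and the sample numbers are illustrative assumptions; the result is checked against the standard library's `math.gcd`.

```python
import math

def euclid_steps(a: int, b: int):
    """Run the Euclidean algorithm and record each (a, b, a % b) step."""
    steps = []
    while b != 0:
        steps.append((a, b, a % b))
        a, b = b, a % b
    return a, steps

result, steps = euclid_steps(48, 18)
for a, b, rem in steps:
    print(f"{a} mod {b} = {rem}")
print("gcd:", result, "| math.gcd agrees:", result == math.gcd(48, 18))

# Two numbers are co-prime when their gcd is 1.
print("35 and 64 co-prime:", euclid_steps(35, 64)[0] == 1)
```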

### Exercise 4:

Find the gcd for the following. Which pair is co-prime?

1.  (42, 56)
2.  (96, 172)
3.  (5435, 634)

### The Modulus and Exponentiation

And so we come to the modulus. In cryptography the mod operation usually uses a prime number as the modulus.

\begin{align}
C= yx  \mod \text{ p}
\end{align}

Suppose we are given values for $x$ and $y$ of 50 and 12 respectively, and a prime of 229. Knowing only the result $C$ and the prime, it is difficult to find the exact value of $x$ used in a scheme, since many combinations of values give the same result. This property makes the secret keys used to encipher messages difficult to recover.

In [None]:
C=12 * 50 % 229
print(C)

### Exercise 5:

Find some more values of $x$ and $y$ that give the result $C=142$.

Exponentiation ciphers are similar and take the form:


\begin{align}
C= M^{e} \mod \text{ p}
\end{align}
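
A minimal sketch of modular exponentiation follows, using the square-and-multiply approach and checking it against Python's built-in three-argument `pow`. The function name `mod_exp` is hypothetical, and the values of `M`, `e` and `p` are borrowed from the earlier example purely for illustration.

```python
def mod_exp(base: int, exponent: int, modulus: int) -> int:
    """Compute (base ** exponent) % modulus with square-and-multiply."""
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                      # lowest bit set: multiply in
            result = (result * base) % modulus
        base = (base * base) % modulus        # square for the next bit
        exponent >>= 1
    return result

M, e, p = 50, 12, 229   # illustrative values only
C = mod_exp(M, e, p)
print(C, C == pow(M, e, p))
```

The built-in `pow(M, e, p)` is the idiomatic choice; it never builds the huge intermediate value `M ** e`.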



### Summary

We have covered:
- How to encode and decode ASCII, binary, hexadecimal and base64.

- The significance of large prime numbers for prime factorisation and their utility in cryptography. 


## Part 2 - Symmetric Key

There are two main types of symmetric key encryption:

- Block
- Stream

Symmetric Key Encryption Algorithms include:

- 3DES - Block
- AES - Block
- Twofish - Block
- RC4 - Stream 
- ChaCha20 - Stream

Symmetric key encryption algorithms use a single secret key that is shared by both parties.

Below is a code snippet from my course (Source: Professor Bill Buchanan OBE).

In [None]:
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.primitives import padding
import hashlib

def encrypt(plaintext,key, mode): 
  method=algorithms.AES(key)
  cipher = Cipher(method, mode)
  encryptor = cipher.encryptor()
  ct = encryptor.update(plaintext) + encryptor.finalize() 
  return(ct)

def decrypt(ciphertext,key, mode): 
  method=algorithms.AES(key) 
  cipher = Cipher(method, mode)
  decryptor = cipher.decryptor()
  pl = decryptor.update(ciphertext) + decryptor.finalize() 
  return(pl)

def pad(data,size=128):
  padder = padding.PKCS7(size).padder() 
  padded_data = padder.update(data) 
  padded_data += padder.finalize() 
  return(padded_data)

def unpad(data,size=128):
  padder = padding.PKCS7(size).unpadder() 
  unpadded_data = padder.update(data) 
  unpadded_data += padder.finalize() 
  return(unpadded_data)

In [None]:
val= 'hello' 
password= 'secret' 
plaintext=val
# a key is generated from a hash of the password. In the example we did at
# the beginning this key is created for you.
key = hashlib.sha256(password.encode()).digest()
plaintext=pad(plaintext.encode())
print("After padding (CMS): ",binascii.hexlify(bytearray(plaintext)))
ciphertext = encrypt(plaintext,key,modes.ECB())
print("Cipher (ECB): ",binascii.hexlify(bytearray(ciphertext)))
plaintext = decrypt(ciphertext,key,modes.ECB())
plaintext = unpad(plaintext)
print(" decrypt: ",plaintext.decode())

### Salt/ Initialisation Vector (IV)

ECB has a well-known problem. Much like you saw with base64, the same plaintext always returns the same ciphertext. To avoid this, other AES modes such as CBC mix in a random value known as an initialisation vector (IV), sometimes loosely called a salt.
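
The difference can be seen with a short sketch, assuming the `cryptography` library installed earlier. The helper `aes_encrypt` and the sample message are illustrative assumptions; the key and IVs are generated with `os.urandom`.

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)            # random 256-bit AES key
block = b"sixteen byte msg"     # exactly one 16-byte AES block
plaintext = block * 2           # two identical blocks, so no padding is needed

def aes_encrypt(data: bytes, mode) -> bytes:
    """Encrypt data with AES under the given mode of operation."""
    encryptor = Cipher(algorithms.AES(key), mode).encryptor()
    return encryptor.update(data) + encryptor.finalize()

ecb = aes_encrypt(plaintext, modes.ECB())
cbc_1 = aes_encrypt(plaintext, modes.CBC(os.urandom(16)))  # fresh random IV
cbc_2 = aes_encrypt(plaintext, modes.CBC(os.urandom(16)))  # another fresh IV

print("ECB: both ciphertext blocks identical?", ecb[:16] == ecb[16:])
print("CBC: both ciphertext blocks identical?", cbc_1[:16] == cbc_1[16:])
print("CBC: two encryptions identical?       ", cbc_1 == cbc_2)
```

With ECB the repeated block leaks through to the ciphertext. With CBC and a fresh IV, neither the blocks nor the two encryptions should match.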


### Padding Value

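With PKCS7 (CMS) padding, the value of each padding byte equals the number of padding bytes added. For example, "hello" is 5 bytes, so 11 bytes with the value `0b` are added to fill the 16-byte AES block.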
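
A plain-Python sketch of this padding rule is shown below. The helpers `pkcs7_pad` and `pkcs7_unpad` are hypothetical names for illustration; the unpad helper does no validation, so real code should prefer `padding.PKCS7` from the `cryptography` library.

```python
def pkcs7_pad(data: bytes, block_size: int = 16) -> bytes:
    """Append n bytes, each of value n, to reach a multiple of block_size."""
    n = block_size - (len(data) % block_size)   # always between 1 and block_size
    return data + bytes([n]) * n

def pkcs7_unpad(padded: bytes) -> bytes:
    """Strip the padding by reading the last byte (no validation, sketch only)."""
    n = padded[-1]
    return padded[:-n]

padded = pkcs7_pad(b"hello")
print(padded.hex())           # 5 data bytes followed by 11 bytes of 0b
print(pkcs7_unpad(padded))
```

Note that input which already fills a block still gets a whole extra block of padding, so the unpadding step is never ambiguous.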

## Part 3 - Hashing

A hash function is a cryptographic primitive that takes an input of any size and produces a fixed-size message digest.

A cryptographic hash is:

- Deterministic  - The same input will yield the same output. 
- Collision resistant - It is computationally infeasible to find two different inputs that yield the same output.
- One way function - Prohibitively hard to derive the inverse.
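
These properties can be observed with `hashlib`. The sketch below is illustrative only; `sha256_hex` is a hypothetical helper and the second message differs from the first by a single letter.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

a = sha256_hex("The quick brown fox jumps over the lazy dog")
b = sha256_hex("The quick brown fox jumps over the lazy dog")
c = sha256_hex("The quick brown fox jumps over the lazy cog")

print("Deterministic:", a == b)
print("Fixed size:", len(a), len(sha256_hex("")), "hex characters")

# Count how many of the 256 output bits differ after a one-letter change.
diff_bits = bin(int(a, 16) ^ int(c, 16)).count("1")
print("Bits changed by one letter:", diff_bits, "of 256")
```

A good hash function changes roughly half of the output bits for any small change to the input.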

Examples of a hash function are:

- MD5
- SHA1
- SHA256
- BLAKE2b

In [None]:
message="The quick brown fox jumps over the lazy dog".encode('utf-8')
message_digest = hashlib.sha256()
message_digest.update(message)
message_digest.hexdigest()

### Exercise 6:

Using the example above generate a hash value for a fruit (singular, all lowercase) and stick the resulting message digest in the chat.

In [None]:
from string import hexdigits

fruits = [ "Apple", "Apricot", "Avocado", "Banana", "Bilberry", "Blackberry", 
              "Blackcurrant", "Blueberry", "Boysenberry", "Currant", "Cherry", 
              "Cherimoya", "Chico fruit", "Cloudberry","Coconut", "Cranberry", 
              "Cucumber", "Custard apple", "Damson", "Date", "Dragonfruit", 
              "Durian", "Elderberry", "Feijoa", "Fig", "Goji berry", "Gooseberry",
              "Grape", "Raisin", "Grapefruit", "Guava", "Honeyberry", 
              "Huckleberry", "Jabuticaba", "Jackfruit", "Jambul", "Jujube", 
              "Juniper berry", "Kiwano", "Kiwifruit", 
              "Kumquat", "Lemon", "Lime", "Loquat", "Longan", "Lychee", "Mango",
              "Mangosteen", "Marionberry", "Melon", "Cantaloupe", "Honeydew", 
              "Watermelon", "Miracle fruit", "Mulberry", "Nectarine", "Nance", 
              "Olive", "Orange", "Blood orange", "Clementine", "Mandarine", 
              "Tangerine", "Papaya", "Passionfruit", "Peach", "Pear", "Persimmon",
              "Physalis", "Plantain", "Plum", "Prune", "Pineapple", "Plumcot", 
              "Pomegranate", "Pomelo", "Purple mangosteen", "Quince", "Raspberry", 
              "Salmonberry", "Rambutan", "Redcurrant", "Salal berry", "Salak", 
              "Satsuma", "Soursop", "Star fruit", "Solanum quitoense", "Strawberry", 
              "Tamarillo", "Tamarind","Ugli fruit", "Yuzu", "Tomato" ]

# Lowercase and encode the fruit list
fruits_lc = [ i.encode('utf-8').lower() for i in fruits]
print(fruits_lc)

hash_table = {}

# create our hash table
for i in  fruits_lc:
  fruit_digest = hashlib.sha256()
  fruit_digest.update(i)
  hash_table.update({fruit_digest.hexdigest(): i})

print(hash_table)

# person as key, hash as value
code_club_fruits_hash = {}

for k, v in code_club_fruits_hash.items():
  if v in hash_table.keys():
    print( "\n",f"{k}:", hash_table.get(v))

### Exercise 7:

These database passwords have been "encrypted" using sha256 hash functions. All the passwords are colours (lowercase). Crack the database and reveal the passwords.

In [None]:
colours = ["red", "blue", "black", "orange", "yellow", "green", "purple"]

# create a hash table with hexdigest as key and colour as value
hash_table = {}

guitar_store_db = {
    "Jimmi Hendrix" : 'b1f51a511f1da0cd348b8f8598db32e61cb963e5fc69e2b41485bf99590ed75a',
    "Prince": '8e0a1b0ada42172886fd1297e25abf99f14396a9400acbd5f20da20289cff02f',
    "Victor Wooten": '1b4c9133da73a711322404314402765ab0d23fd362a167d6f0c65bb215113d94',
    "John Myung": 'c006c7e3ab14d686f63524136f1ec7c5e553d839bc01c851e4dc9de2bdbfc589',
    "Eric Clapton": 'c685a2c9bab235ccdd2ab0ea92281a521c8aaf37895493d080070ea00fc7f5d7',
    "Slash": '16477688c0e00699c6cfa4497a3612d7e83c532062b64b250fed8908128ed548',
    "BB King": '1b4c9133da73a711322404314402765ab0d23fd362a167d6f0c65bb215113d94',
}

for k, v in guitar_store_db.items():
  if v in hash_table.keys():
    print( "\n",f"{k}:", hash_table.get(v))

There are tools, such as Hashcat and John the Ripper, that help automate the hash cracking process.

## Part 4 - Asymmetric Key


The workhorse of the internet is the asymmetric key algorithm RSA. 


- Generate two large primes $p$ and $q$, such that $p\neq q$.
- Calculate $N=pq$ and $\phi=(p-1)(q-1)$, where $\phi$ is the value of Euler's $\phi$-function $\phi(N)$.
- Using trial and error, find $e$ such that $1<e<\phi$ and $\gcd(e, \phi)=1$.
- Calculate the value of the decryption key $d$ such that $ed\equiv1\text{ mod }\phi$ and $1<d<\phi$.


\begin{align}
	N \text{ (modulus)} = p\cdot q \\
	\phi = (p-1)(q-1) \\
	\nonumber \text{Select a value for $e$ that has no common factors with $\phi$} \\
	\nonumber \text{Public Encryption Key $[e,N]$ } \\
	\nonumber \text{Find $d$ (the inverse of $e$ mod $\phi$), the value for the decryption key} \\
	\nonumber \text{Decryption Key $[d,N]$ } \\
	\text{if}\quad p=3, q=11, e = 3 \\
	N=33 \\
	E(3,33) \\
	\phi = (3-1)(11-1) = 20 \\
	(d\times e)\mod 20 = 1 \\
	(d\times 3)\mod 20 = 1 \\
	d = 7 \\
	D(7, 33) \\
	M=5 \quad \text{M=Message} \\
	Cipher=M^{e}\mod N \\
	Decipher = C^{d} \mod N \\
	\nonumber\therefore \\
	Cipher = 5^{3}\mod 33 = 26 \\
	Decipher = (26)^{7} \mod 33 = 5
\end{align}
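
The worked example can be checked with the standard library alone. This is a toy sketch with tiny primes, not a secure implementation: the trial-and-error search for `e` simply takes the smallest valid candidate, and `pow(e, -1, phi)` (the modular inverse) assumes Python 3.8 or later.

```python
from math import gcd

p, q = 3, 11
N = p * q                      # modulus
phi = (p - 1) * (q - 1)        # Euler's phi for N

# Trial and error: the smallest e with 1 < e < phi and gcd(e, phi) == 1.
e = next(c for c in range(2, phi) if gcd(c, phi) == 1)

# d is the modular inverse of e, so (d * e) % phi == 1.
d = pow(e, -1, phi)

M = 5                          # message
C = pow(M, e, N)               # encrypt with the public key [e, N]
recovered = pow(C, d, N)       # decrypt with the private key [d, N]

print(f"N={N}, phi={phi}, e={e}, d={d}")
print(f"cipher={C}, decipher={recovered}")
```

The printed values should match the worked example: $e=3$, $d=7$, cipher 26 and decipher 5.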


In [None]:
import libnum

p=11
q=17
N=p*q
PHI=(p-1)*(q-1)
OM=81
e=3
C=pow(OM,e,N)
d=libnum.invmod(e,PHI)
DM=pow(C,d,N)
print(DM)

## Quiz

I hope you have enjoyed the session. To finish off let's have a little quiz. 

Take a look at these links to explore the different roles and responsibilities in the cybersecurity field:

- [The UK Cybersecurity Council](https://www.ukcybersecuritycouncil.org.uk/qualifications-and-careers/careers-route-map/)

Other useful links:

- [The National Cyber Security Centre](https://www.ncsc.gov.uk)
- [asecuritysite](https://asecuritysite.com)