# Part 4, Topic 2: CPA on Firmware Implementation of AES

**SUMMARY**: *By now, you'll have used a DPA attack to break AES. While this method has its place in side channel attacks, it often requires a large number of traces to break AES and can suffer from additional issues like ghost peaks.*

*We also learned in the previous lab that there is a nearly linear relationship between the Hamming weight of the SBox output and the power consumption at that point. Instead of checking average power consumption over many traces to see if a guessed subkey is correct, we can check whether the Hamming weights predicted by our guessed subkey have this linear relationship with the device's power consumption across a set of traces. Like with DPA, we'll need to repeat this measurement at each point in time along the power trace.*

*To get an objective measurement of how linear this relationship is, we'll be developing some code to calculate the Pearson correlation coefficient.*

**LEARNING OUTCOMES:**
* Developing an algorithm based on a mathematical description
* Verify that correlation can be used to break a single byte of AES
* Extend the single byte attack to the rest of the key

## Prerequisites

This notebook will build upon previous ones. Make sure you've completed the following tutorials and their prerequisites:

* ☑ Part 3 notebooks (you should be comfortable with running an attack on AES)
* ☑ Power and Hamming Weight Relationship (we'll be using information from this tutorial)

## AES Trace Capture

Our first step will be to send some plaintext to the target device and observe its power consumption during the encryption. The capture loop will be the same as in the DPA attack. This time, however, we'll only need 50 traces to recover the key, a major improvement over the last attack!

Depending on what you are using, you can complete this by either:

* Capturing new traces from a physical device.
* Reading pre-recorded data from a file.

You get to choose your adventure: see the two notebooks with the same name as this one, but marked `(SIMULATED)` or `(HARDWARE)`, to continue. Inside those notebooks you should find some code to copy into the following section, which will define the capture function.

Be sure you get the `"✔️ OK to continue!"` print once you run the cell afterwards, otherwise things will fail later on!

In [None]:
# There is a file with the same name as this lab but with (HARDWARE) in the title for using CW-Nano/CW-Lite/CW-Pro
# There is a file with the same name as this lab but with (SIMULATED) in the title for using recorded data
raise NotImplementedError("Insert code from (HARDWARE) or (SIMULATED) Notebook Here")

In [None]:
assert len(trace_array) == 50
print("✔️ OK to continue!")

Again, let's quickly plot a trace to make sure everything looks as expected:

In [None]:
%matplotlib notebook
import matplotlib.pylab as plt

# ###################
# Add your code here
# ###################
raise NotImplementedError("Add your code here, and delete this.")

## AES Model and Hamming Weight

Like with the previous tutorial, we'll need to be able to easily grab what the SBox output will be for a given plaintext byte and key byte, as well as get the Hamming weight of numbers between 0 and 255:
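If you no longer have those helpers to hand, here is one possible sketch. To keep it short, it derives the SBox from its mathematical definition (multiplicative inverse in GF(2^8) followed by the AES affine transform) instead of listing all 256 entries; pasting the usual 256-entry table works just as well. The helper names `gf_mul`, `rotl8` and `build_sbox` are illustrative and not part of the lab, while `HW` and `aes_internal` are the names the verification cell expects.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def rotl8(x, shift):
    """Rotate an 8-bit value left by `shift` bits."""
    return ((x << shift) | (x >> (8 - shift))) & 0xFF


def build_sbox():
    """Build the AES SBox: GF(2^8) inverse, then the affine transform."""
    sbox = []
    for x in range(256):
        # 0 has no inverse and is mapped to 0 by convention
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= rotl8(inv, shift)
        sbox.append(s ^ 0x63)
    return sbox


sbox = build_sbox()

# Hamming weight (number of set bits) of every byte value 0..255
HW = [bin(n).count("1") for n in range(256)]


def aes_internal(inputdata, key):
    """SBox output of the first AES round for one plaintext/key byte pair."""
    return sbox[inputdata ^ key]
```

Building the table this way takes a moment longer than a hard-coded list, but it only runs once per session.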

In [None]:
# ###################
# Add your code here
# ###################
raise NotImplementedError("Add your code here, and delete this.")

Verify that your model is correct:

In [None]:
assert HW[aes_internal(0xA1, 0x79)] == 3
assert HW[aes_internal(0x22, 0xB1)] == 5
print("✔️ OK to continue!")

## Developing our Correlation Algorithm 

As we discussed earlier, we'll be testing how good our guess is using a measurement called the Pearson correlation coefficient, which measures the linear correlation between two datasets. 

The actual algorithm is as follows for datasets $X$ and $Y$ of length $N$, with means of $\bar{X}$ and $\bar{Y}$, respectively:

$$r = \frac{cov(X, Y)}{\sigma_X \sigma_Y}$$

$cov(X, Y)$ is the covariance of `X` and `Y` and can be calculated as follows:

$$cov(X, Y) = \sum_{n=1}^{N}[(Y_n - \bar{Y})(X_n - \bar{X})]$$

$\sigma_X$ and $\sigma_Y$ are the standard deviations of the two datasets. Each can be calculated with the following equation (shown for $X$; $\sigma_Y$ is the same with $Y$ in place of $X$):

$$\sigma_X = \sqrt{\sum_{n=1}^{N}(X_n - \bar{X})^2}$$

As you can see, the calculation breaks down nicely into smaller chunks that we can implement with some simple functions. While we could use a library to calculate all of this for us, being able to implement a mathematical algorithm in code is a useful skill to develop.
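Note that the expressions for $cov(X, Y)$ and $\sigma_X$ used here leave out the usual $\frac{1}{N}$ normalisation. A short derivation shows why this does not change $r$: writing the normalised forms out, the factors cancel between numerator and denominator.

$$r = \frac{\frac{1}{N}\sum_{n=1}^{N}(X_n - \bar{X})(Y_n - \bar{Y})}{\sqrt{\frac{1}{N}\sum_{n=1}^{N}(X_n - \bar{X})^2}\,\sqrt{\frac{1}{N}\sum_{n=1}^{N}(Y_n - \bar{Y})^2}} = \frac{\sum_{n=1}^{N}(X_n - \bar{X})(Y_n - \bar{Y})}{\sqrt{\sum_{n=1}^{N}(X_n - \bar{X})^2}\,\sqrt{\sum_{n=1}^{N}(Y_n - \bar{Y})^2}}$$

The test vectors later in this notebook assume the unnormalised versions, so `std_dev` and `cov` should not divide by $N$.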

To start, build the following functions:

1. `mean(X)` to calculate the mean of a dataset (the mean being `X_bar` that will be used elsewhere).
1. `std_dev(X, X_bar)` to calculate the standard deviation of a dataset. We'll need to reuse the mean for the covariance, so it makes more sense to calculate it once and pass it in to each function.
1. `cov(X, X_bar, Y, Y_bar)` to calculate the covariance of two datasets. Again, we can just pass in the means we calculated for `std_dev` here.

**HINT: You can use `np.sum(X, axis=0)` to replace all of the $\sum$ from earlier. The argument `axis=0` sums down each column (that is, across traces), giving one result per sample point and allowing us to use a single `mean`, `std_dev`, and `cov` call for the entire power trace.**
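If you get stuck, one possible way to write the three functions is sketched below. It follows the unnormalised formulas from this section directly and relies on NumPy broadcasting, assuming that `X` has one row per trace and that `Y` is a single column with the same number of rows. The small arrays `x` and `y` are made up for illustration.

```python
import numpy as np


def mean(X):
    """Column-wise mean: one value per sample point."""
    return np.sum(X, axis=0) / len(X)


def std_dev(X, X_bar):
    """Unnormalised standard deviation of each column (no 1/N factor)."""
    return np.sqrt(np.sum((X - X_bar) ** 2, axis=0))


def cov(X, X_bar, Y, Y_bar):
    """Unnormalised covariance of each column of X with the single column Y."""
    return np.sum((X - X_bar) * (Y - Y_bar), axis=0)


# Tiny example: 3 "traces" of 2 points each, and one model value per trace in y
x = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 9.0]])
y = np.array([[1.0], [2.0], [3.0]])
x_bar, y_bar = mean(x), mean(y)
r = cov(x, x_bar, y, y_bar) / (std_dev(x, x_bar) * std_dev(y, y_bar))
print(r)  # one correlation per column of x
```

The first column of `x` is identical to `y`, so its correlation comes out as 1 (up to rounding); the second column is only roughly linear in `y`, so its correlation is a little lower.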

In [None]:
def mean(X):
    raise NotImplementedError("Add your code here, and delete this.")

def std_dev(X, X_bar):
    raise NotImplementedError("Add your code here, and delete this.")

def cov(X, X_bar, Y, Y_bar):
    raise NotImplementedError("Add your code here, and delete this.")

Let's quickly check to make sure everything's as expected. The following blocks will run some test vectors on your functions so you can confirm that you get the correct answers:

In [None]:
a = np.array([[5, 3, 4, 4, 5, 6],
              [27, 2, 3, 4, 12, 6],
              [1, 3, 5, 4, 5, 6],
              [1, 2, 3, 4, 5, 6],
              ]).transpose()
a_bar = mean(a)
b = np.array([[5, 4, 3, 2, 1, 3]]).transpose()
b_bar = mean(b)

o_a = std_dev(a, a_bar)
o_b = std_dev(b, b_bar)

ab_cov = cov(a, a_bar, b, b_bar)

In [None]:
assert (a_bar == np.array([4.5, 9., 4., 3.5])).all()
assert (b_bar == np.array([3.])).all()
assert (o_a[3] > 4.1833001 and o_a[3] < 4.1833002)
assert (o_b[0] > 3.162277 and o_b[0] < 3.162278)
assert (ab_cov == np.array([-1., 28., -9., -10.])).all()
print("✔️ OK to continue!")

Now that we've got all the building blocks to our correlation function, let's see if we can put everything together and break a single byte of AES. In order to do this, let's take a closer look at what we're trying to do and the data we've got.

## Correlation Data

Remember that the general correlation formula for two datasets $X$  and $Y$ is:

$$r = \frac{cov(X, Y)}{\sigma_X \sigma_Y}$$

We are going to be correlating a power measurement (`trace_array`) with the Hamming weight predicted by a key guess. First let's look at our power trace array:

In [None]:
print(trace_array)

You should have something like the following:
```python
[
    [point_0, point_1, point_2, ...], # trace 0
    [point_0, point_1, point_2, ...], # trace 1
    [point_0, point_1, point_2, ...], # trace 2
    ...
]
```

where the rows of the array are the different traces we captured and the columns of the array are the different points in those traces. Each column here will be one of the two datasets for our correlation equation. The other dataset will be the Hamming weight of the SBox output, for a given *key guess* `key` of the byte we are looking at:

```python
[
      [HW[aes_internal(plaintext0[0], key[0])]], # trace 0
      [HW[aes_internal(plaintext1[0], key[0])]], # trace 1
      [HW[aes_internal(plaintext2[0], key[0])]], # trace 2
      ...
]
```

which we'll shorten to:

```python
[
      [hw], # trace 0
      [hw], # trace 1
      [hw], # trace 2
      ...
]
```

Like with the DPA attack, we don't know where the encryption is occurring, meaning we have to repeat the correlation calculation for each column in the trace array, with the largest correlation being our best guess for where the SBox output is happening. We obviously also don't know the key (that's the thing we're trying to find!), so we'll also need to repeat the best correlation calculation for each possible value of `key[0]` (0 to 255). The key with the highest absolute correlation is our best guess for the value of the key byte.

## Correlation Attack Implementation

The correlation attack boils down to calculating this:

$$r = \frac{cov(X, Y)}{\sigma_X \sigma_Y}$$

Where:

* $X$ is a power trace sample point
* $Y$ is an internal state guess

Remember that you have already defined (and tested) the functions that calculate `cov(X, Y)` and `std_dev(X)` ($\sigma_X$). The actual API for those functions requires you to pass in the mean as a separate argument (it is passed in for computational efficiency, since it is reused).

### Hint: Using Vectors

We should mention a few ways to improve your work.

A really nice feature of numpy is that we can do the correlation calculations across the entire trace at once (mean, std_dev, cov). That means there's no need to do:

```python
t_bar = []
for point_num in range(len(trace_array[0])):
    t_bar.append(mean(trace_array[:,point_num]))
    # and so on...

t_bar = np.array(t_bar)
```

when we can do

```python
t_bar = mean(trace_array)
```

and get the same thing back. The only caveat is that the rows and columns of our arrays must be the right way around (i.e. make sure your Hamming weight array has 50 rows and 1 column, not the other way around). If you find it easier to construct an array one way and not the other, you can use the `.transpose()` method to swap the rows and columns.
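The following sketch illustrates both points on stand-in data: the column-wise call matches the explicit loop, and a Hamming weight array built as a row needs transposing before it lines up with the traces. The random arrays are placeholders for your captured `trace_array` and computed Hamming weights, and the mean is written out with `np.sum` so the snippet runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)
trace_array = rng.normal(size=(50, 8))       # stand-in: 50 traces, 8 points each

# Point-by-point loop...
t_bar_loop = np.array([np.mean(trace_array[:, p]) for p in range(trace_array.shape[1])])
# ...versus a single column-wise call
t_bar = np.sum(trace_array, axis=0) / len(trace_array)
print(np.allclose(t_bar, t_bar_loop))

# The model values must be a column (50 rows, 1 column) to line up with the traces
hws_row = rng.integers(0, 9, size=(1, 50))   # stand-in Hamming weights, built as a row
hws = hws_row.transpose()                    # shape (50, 1)
diff = (trace_array - t_bar) * (hws - hws.mean(axis=0))
print(hws.shape, diff.shape)
```

With the column shape, broadcasting pairs each trace with its own Hamming weight at every sample point, which is what the covariance calculation needs.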

### Finding Largest Correlation

Once you've got all your correlations for a particular key guess, you want to find the largest absolute correlation. We're taking the absolute value of the correlation here since we only care that the relation between hamming weight and the power trace is linear, not that the slope is positive or negative. `max(abs(correlations))` will do that for you.
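As a small illustration of why the absolute value matters, consider the made-up correlations below, where the strongest relationship happens to have a negative slope:

```python
import numpy as np

# Made-up correlations at five sample points for one key guess
correlations = np.array([0.05, -0.12, -0.83, 0.31, 0.02])

print(max(correlations))             # 0.31: misses the strong negative peak
print(max(abs(correlations)))        # 0.83: the value we actually want
print(np.argmax(abs(correlations)))  # 2: the sample point where it occurs
```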

### Enumerating Guesses

Perform this for every possible value of the key byte (0 to 255); the one with the largest correlation is your best guess for the key. It's up to you how you want to extract this information from your loop, but one way of doing it is to store the largest absolute correlation for each key guess in an array (`maxcpa` in the code below). Once you've gone through all the key guesses, you can extract the best guess with `np.argmax(maxcpa)` and the correlation of that guess with `max(maxcpa)`.

### Equation to Python

We can take the earlier equation and plug in some of our Python variable names to give you a good starting point. We are using:

* $r$ = `cpaoutput`
* $X$ = `t` or `trace_array` (its mean is called `t_bar`).
* $Y$ = `hws` (its mean is called `hws_bar`).

Our equation now looks something like this:

$$cpaoutput = \frac{cov(X, Y)}{\sigma_X \sigma_Y}$$

This should almost directly convert to Python code!
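To see the whole procedure run end to end without a target attached, here is a minimal sketch on synthetic data. Several things in it are assumptions made for illustration and are not part of the lab: a fixed random byte permutation `table` stands in for the AES SBox, the traces are Gaussian noise with a Hamming weight leak added at one sample point (`leak_point`), and `cpa_byte` is a hypothetical helper name. The mean, standard deviation and covariance are written inline so the block is self-contained.

```python
import numpy as np

rng = np.random.default_rng(1)

# ASSUMPTION: a fixed random byte permutation stands in for the AES SBox
table = rng.permutation(256)
HW = np.array([bin(n).count("1") for n in range(256)])

# Synthetic capture: 50 traces of 20 points; point 7 leaks HW of the table output
true_key = 0x2B
n_traces, n_points, leak_point = 50, 20, 7
textin_array = rng.integers(0, 256, size=(n_traces, 16))
trace_array = rng.normal(scale=0.2, size=(n_traces, n_points))
trace_array[:, leak_point] += HW[table[textin_array[:, 0] ^ true_key]]


def cpa_byte(traces, textins, bnum):
    """Return the max |correlation| for each of the 256 guesses of key byte bnum."""
    t_bar = np.sum(traces, axis=0) / len(traces)
    o_t = np.sqrt(np.sum((traces - t_bar) ** 2, axis=0))
    maxcpa = np.zeros(256)
    for kguess in range(256):
        # Hypothetical leakage for this guess, as a column: one row per trace
        hws = HW[table[textins[:, bnum] ^ kguess]].reshape(-1, 1)
        hws_bar = np.sum(hws, axis=0) / len(hws)
        o_hws = np.sqrt(np.sum((hws - hws_bar) ** 2, axis=0))
        covariance = np.sum((traces - t_bar) * (hws - hws_bar), axis=0)
        cpaoutput = covariance / (o_t * o_hws)
        maxcpa[kguess] = max(abs(cpaoutput))
    return maxcpa


maxcpa = cpa_byte(trace_array, textin_array, 0)
guess = int(np.argmax(maxcpa))
guess_corr = max(maxcpa)
print("Key guess: ", hex(guess))
print("Correlation: ", guess_corr)
```

Because the byte index is a parameter, calling the same function with `bnum` from 0 to 15 on real traces is one way to extend the attack to the full key. In this synthetic capture only byte 0 leaks, so only that byte can be recovered here.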

In [None]:
maxcpa = [0] * 256

# we don't need to redo the mean and std dev calculations 
# for each key guess
t_bar = mean(trace_array) 
o_t = std_dev(trace_array, t_bar)

for kguess in tnrange(0, 256):
    hws = np.array([[HW[aes_internal(textin[0],kguess)] for textin in textin_array]]).transpose()
    
    # ###################
    # Add your code here  
    # ###################
    cpaoutput = ???
    maxcpa[kguess] = ???  
    raise NotImplementedError("Add your code here, and delete this.")
    
print("Key guess: ", hex(guess))
print("Correlation: ", guess_corr)

Let's make sure we've recovered the byte correctly:

In [None]:
assert guess == 0x2b
print("✔️ OK to continue!")

To break the rest of the key, simply repeat the attack for the rest of the bytes of the key. Don't forget to update your code from above to use the correct byte of the plaintext!

In [None]:
t_bar = np.sum(trace_array, axis=0)/len(trace_array)
o_t = np.sqrt(np.sum((trace_array - t_bar)**2, axis=0))

cparefs = [0] * 16 #put your key byte guess correlations here
bestguess = [0] * 16 #put your key byte guesses here

for bnum in tnrange(0, 16):
    maxcpa = [0] * 256
    for kguess in range(0, 256):
        # ###################
        # Add your code here
        # ###################
        raise NotImplementedError("Add your code here, and delete this.")

print("Best Key Guess: ", end="")
for b in bestguess: print("%02x " % b, end="")
print("\n", cparefs)

With one final check to make sure you've got the correct key:

In [None]:
for bnum in range(16):
    assert bestguess[bnum] == key[bnum], \
    "Byte {} failed, expected {:02X} got {:02X}".format(bnum, key[bnum], bestguess[bnum])
print("✔️ OK to continue!")

We're done! There's actually a lot of room to expand on this attack:

1. Currently, the loop needs to go through all the traces before it can return a correlation. This isn't too bad for a short attack, but for a much longer one (think 10k+ traces) we won't get any feedback from the attack until it's finished. Also, if we didn't capture enough traces for the attack, the entire analysis calculation needs to be repeated! Instead of using the original correlation equation, we can use an equivalent "online" version that can be easily updated with more traces: $$r_{i,j} = \frac{D\sum_{d=1}^{D}h_{d,i}t_{d,j}-\sum_{d=1}^{D}h_{d,i}\sum_{d=1}^{D}t_{d,j}}{\sqrt{\left((\sum_{d=1}^{D}h_{d,i})^2-D\sum_{d=1}^{D}h_{d,i}^2\right)\left((\sum_{d=1}^{D}t_{d,j})^2-D\sum_{d=1}^{D}t_{d,j}^2\right)}}$$
where $D$ is the number of traces processed so far and

| **Equation** | **Python Variable** | **Value**  | 
|--------------|---------------------|------------|
|  d           |       tnum          | trace number |
|  i           |       kguess        | subkey guess |
| j | j index trace point | sample point in trace |
| h | hypint | guess for power consumption | 
| t | traces | traces | 

2. There's a lot more we can learn from the attack other than the key. For example, we could plot how far away the correct key guess is from the top spot (called the partial guessing entropy or PGE) vs. how many traces we used, giving us a better idea of how many traces we needed to actually recover the correct key. We also might want to plot how correlation for a given key guess changes over time.
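As a rough sketch of the PGE idea for a single key byte, the rank of the known correct byte can be read off the per-guess scores. The function name `pge` and the made-up scores are illustrative assumptions; ties between guesses are broken arbitrarily by the sort.

```python
import numpy as np


def pge(maxcpa, known_byte):
    """Rank of the known key byte among all guesses (0 = top guess)."""
    ranking = np.argsort(maxcpa)[::-1]          # guesses ordered best to worst
    return int(np.where(ranking == known_byte)[0][0])


# Made-up scores for illustration: guess 0x2B is best, 0x10 second best
maxcpa = np.full(256, 0.1)
maxcpa[0x2B] = 0.9
maxcpa[0x10] = 0.5
print(pge(maxcpa, 0x2B))  # 0
print(pge(maxcpa, 0x10))  # 1
```

Recomputing this rank after each batch of traces and plotting it against the trace count shows roughly how many traces the attack needed.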

This "online" correlation equation is the one that the subject of the next tutorial, ChipWhisperer Analyzer, actually uses. It also provides functions and methods for gathering and plotting some interesting statistics.
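The running-sum form of Pearson's correlation can be sketched as follows. This is a minimal illustration under stated assumptions, not ChipWhisperer Analyzer's actual implementation: the class name `OnlineCorrelation` and its methods are hypothetical, it tracks a single key guess (one model value `h` per trace), and the data is random stand-in data. The denominator is written as the product of the two variance terms, each in the form $D\sum x^2 - (\sum x)^2$.

```python
import numpy as np


class OnlineCorrelation:
    """Running sums for Pearson correlation between one model value h per
    trace and every sample point t[j] of that trace."""

    def __init__(self, n_points):
        self.D = 0
        self.sum_h = 0.0
        self.sum_hh = 0.0
        self.sum_t = np.zeros(n_points)
        self.sum_tt = np.zeros(n_points)
        self.sum_ht = np.zeros(n_points)

    def update(self, h, t):
        """Add one trace t (1-D array) with its model value h."""
        self.D += 1
        self.sum_h += h
        self.sum_hh += h * h
        self.sum_t += t
        self.sum_tt += t * t
        self.sum_ht += h * t

    def correlation(self):
        """Correlation at every sample point from the sums collected so far."""
        D = self.D
        num = D * self.sum_ht - self.sum_h * self.sum_t
        den = np.sqrt((D * self.sum_hh - self.sum_h ** 2) *
                      (D * self.sum_tt - self.sum_t ** 2))
        return num / den


rng = np.random.default_rng(2)
h = rng.integers(0, 9, size=200).astype(float)     # stand-in model values
t = rng.normal(size=(200, 5)) + 0.5 * h[:, None]   # stand-in traces

acc = OnlineCorrelation(n_points=5)
for d in range(200):
    acc.update(h[d], t[d])

# Compare against NumPy's batch computation, one sample point at a time
reference = np.array([np.corrcoef(h, t[:, j])[0, 1] for j in range(5)])
print(np.allclose(acc.correlation(), reference))
```

Since only the sums are stored, `correlation()` can be called after any number of traces, and capturing more traces just means more `update` calls instead of restarting the analysis.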

---
<small>NO-FUN DISCLAIMER: This material is Copyright (C) NewAE Technology Inc., 2015-2020. ChipWhisperer is a trademark of NewAE Technology Inc., claimed in all jurisdictions, and registered in at least the United States of America, European Union, and Peoples Republic of China.

Tutorials derived from our open-source work must be released under the associated open-source license, and notice of the source must be *clearly displayed*. Only original copyright holders may license or authorize other distribution - while NewAE Technology Inc. holds the copyright for many tutorials, the github repository includes community contributions which we cannot license under special terms and **must** be maintained as an open-source release. Please contact us for special permissions (where possible).

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.</small>