# Introduction to DPA & HW Assumption

Supported setups:

SCOPES:

* OPENADC

PLATFORMS:

* CWLITEARM
* CWLITEXMEGA

In [None]:
SCOPETYPE = 'OPENADC'
PLATFORM = 'CWLITEARM'
CRYPTO_TARGET = 'TINYAES128C'

In [None]:
%%bash -s "$PLATFORM" "$CRYPTO_TARGET"
cd ../hardware/victims/firmware/simpleserial-aes
make PLATFORM=$1 CRYPTO_TARGET=$2

## DPA Attack Theory

We suggest completing PA_Intro_1 and PA_Intro_2 first, since they introduce Jupyter and show how to interface with ChipWhisperer using Python. Assuming you've done that, let's look at what we are trying to accomplish here. Going back to the theory, remember that we assume a relationship between the number of bits set on the data lines and the measured power consumption. You can see this in the following:

![Power lines](img/dpa_4bits_powerhw_scaled.png)

How do we check that this holds? Let's plot the power traces alongside the Hamming weight (HW) of the data being processed! We are going to use the AES algorithm (the choice of algorithm doesn't matter here), because its firmware is easy to use as part of our attack.
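
Before touching hardware, the assumption can be made concrete with a small simulation. This is only a sketch under an idealized leakage model (measurement = offset + slope × HW + Gaussian noise); `offset`, `slope`, and `noise_std` are made-up values for illustration, not measurements from any target, and the sign of `slope` is arbitrary here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hamming weight lookup table for every possible byte value
HW = np.array([bin(n).count("1") for n in range(256)])

# Hypothetical leakage model parameters (illustrative, not measured)
offset = 0.10
slope = 0.01
noise_std = 0.02

data = rng.integers(0, 256, size=5000)  # random data bytes
hw = HW[data]
simulated_power = offset + slope * hw + rng.normal(0, noise_std, size=data.size)

# Average the simulated measurement for each Hamming weight class (0..8)
class_means = np.array([simulated_power[hw == w].mean() for w in range(9)])
corr = np.corrcoef(np.arange(9), class_means)[0, 1]

print(class_means)
print("correlation between HW and class mean:", corr)
```

Individual simulated samples are noisy, but the per-class averages line up closely with the Hamming weight. The rest of this tutorial checks whether real traces behave the same way.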

## Capturing Power Traces

Capturing power traces will be very similar to previous tutorials, except this time we'll be using a loop to capture multiple traces, as well as numpy to store them. It's not necessary, but we'll also plot the trace we get using `bokeh`.

### Setup

We'll use some helper scripts to make setup and programming easier. If you're using an XMEGA or STM (CWLITEARM) target, the correct binaries should be set up for you:

In [None]:
%run "Helper_Scripts/Setup_Generic.ipynb"

In [None]:
fw_path = "../hardware/victims/firmware/simpleserial-aes/simpleserial-aes-{}.hex".format(PLATFORM)

In [None]:
cw.program_target(scope, prog, fw_path)

### Capturing Traces

Below you can see the capture loop. The main body of the loop loads some new plaintext, arms the scope, sends the key and plaintext, then finally records and appends our new trace to the `traces[]` list. At the end, we convert the trace data to numpy arrays, since that's what we'll be using for analysis.

In [None]:
%run "Helper_Scripts/plot.ipynb"
plot = real_time_plot(plot_len=3000)

In [None]:
#Capture Traces
from tqdm import tnrange
import numpy as np
import time

ktp = cw.ktp.Basic()

traces = []
N = 1000  # Number of traces

for i in tnrange(N, desc='Capturing traces'):
    key, text = ktp.next()  # manual creation of a key, text pair can be substituted here

    trace = cw.capture_trace(scope, target, text, key)
    
    if trace is None:
        continue
    traces.append(trace)
    plot.send(trace)

#Convert traces to numpy arrays
trace_array = np.asarray([trace.wave for trace in traces])  # if you prefer to work with numpy array for number crunching
textin_array = np.asarray([trace.textin for trace in traces])
known_keys = np.asarray([trace.key for trace in traces])  # for fixed key, these keys are all the same

Now that we have our traces, we can also plot them individually:

In [None]:
%matplotlib notebook
import matplotlib.pylab as plt
plt.plot(trace_array[0])

In [None]:
# cleanup the connection to the target and scope
scope.dis()
target.dis()

## Trace Analysis

### Using the Trace Data

Now that we have some traces, let's look at what we've actually recorded. Looking at the earlier parts of the script, we can see that the trace data is in `trace_array`, while `textin_array` stores what we sent to our target to be encrypted. For now, let's get some basic information (the total number of traces, as well as the number of sample points in each trace) about the traces, since we'll need that later:

In [None]:
numtraces = np.shape(trace_array)[0] #total number of traces
numpoints = np.shape(trace_array)[1] #samples per trace

For the analysis, we'll need to loop over every byte in the key we want to attack, as well as every trace:
```python
for bnum in range(0, 16):
    for tnum in range(0, numtraces):
        pass
```
Though we didn't loop over them, note that each trace is made up of a bunch of sample points.
Let's take a closer look at AES so that we can replace that `pass` with some actual code.

### Calculating Hamming weight (HW) of Data

Now that we have some power traces of our target, we can move on to the next steps of our attack. Looking back at how AES works, remember that we are effectively attempting to target the position at the bottom of this figure:

![S-Box HW Leakage Point](img/Sbox_cpa_detail.png)

The objective is thus to determine the output of the S-Box, where the S-Box is defined as follows:

In [None]:
sbox = (
    0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5, 0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76,
    0xca, 0x82, 0xc9, 0x7d, 0xfa, 0x59, 0x47, 0xf0, 0xad, 0xd4, 0xa2, 0xaf, 0x9c, 0xa4, 0x72, 0xc0,
    0xb7, 0xfd, 0x93, 0x26, 0x36, 0x3f, 0xf7, 0xcc, 0x34, 0xa5, 0xe5, 0xf1, 0x71, 0xd8, 0x31, 0x15,
    0x04, 0xc7, 0x23, 0xc3, 0x18, 0x96, 0x05, 0x9a, 0x07, 0x12, 0x80, 0xe2, 0xeb, 0x27, 0xb2, 0x75,
    0x09, 0x83, 0x2c, 0x1a, 0x1b, 0x6e, 0x5a, 0xa0, 0x52, 0x3b, 0xd6, 0xb3, 0x29, 0xe3, 0x2f, 0x84,
    0x53, 0xd1, 0x00, 0xed, 0x20, 0xfc, 0xb1, 0x5b, 0x6a, 0xcb, 0xbe, 0x39, 0x4a, 0x4c, 0x58, 0xcf,
    0xd0, 0xef, 0xaa, 0xfb, 0x43, 0x4d, 0x33, 0x85, 0x45, 0xf9, 0x02, 0x7f, 0x50, 0x3c, 0x9f, 0xa8,
    0x51, 0xa3, 0x40, 0x8f, 0x92, 0x9d, 0x38, 0xf5, 0xbc, 0xb6, 0xda, 0x21, 0x10, 0xff, 0xf3, 0xd2,
    0xcd, 0x0c, 0x13, 0xec, 0x5f, 0x97, 0x44, 0x17, 0xc4, 0xa7, 0x7e, 0x3d, 0x64, 0x5d, 0x19, 0x73,
    0x60, 0x81, 0x4f, 0xdc, 0x22, 0x2a, 0x90, 0x88, 0x46, 0xee, 0xb8, 0x14, 0xde, 0x5e, 0x0b, 0xdb,
    0xe0, 0x32, 0x3a, 0x0a, 0x49, 0x06, 0x24, 0x5c, 0xc2, 0xd3, 0xac, 0x62, 0x91, 0x95, 0xe4, 0x79,
    0xe7, 0xc8, 0x37, 0x6d, 0x8d, 0xd5, 0x4e, 0xa9, 0x6c, 0x56, 0xf4, 0xea, 0x65, 0x7a, 0xae, 0x08,
    0xba, 0x78, 0x25, 0x2e, 0x1c, 0xa6, 0xb4, 0xc6, 0xe8, 0xdd, 0x74, 0x1f, 0x4b, 0xbd, 0x8b, 0x8a,
    0x70, 0x3e, 0xb5, 0x66, 0x48, 0x03, 0xf6, 0x0e, 0x61, 0x35, 0x57, 0xb9, 0x86, 0xc1, 0x1d, 0x9e,
    0xe1, 0xf8, 0x98, 0x11, 0x69, 0xd9, 0x8e, 0x94, 0x9b, 0x1e, 0x87, 0xe9, 0xce, 0x55, 0x28, 0xdf,
    0x8c, 0xa1, 0x89, 0x0d, 0xbf, 0xe6, 0x42, 0x68, 0x41, 0x99, 0x2d, 0x0f, 0xb0, 0x54, 0xbb, 0x16)

Thus we need to write a function that takes a single byte of input and a single byte of the key, and returns the output of the S-Box:

In [None]:
def intermediate(pt, keyguess):
    return sbox[pt ^ keyguess]

Finally, remember we want the Hamming weight of the S-Box output. Our assumption is that the system is leaking the Hamming weight of the output of that S-Box. As a naive solution, we could first convert every number to binary and count the 1's:

```python
>>> bin(0x1F)
'0b11111'
>>> bin(0x1F).count('1')
5
```
This will ultimately be fairly slow. Instead we make a lookup table using this idea:

In [None]:
HW = [bin(n).count("1") for n in range(0, 256)]

def intermediate(pt, key):
    return sbox[pt ^ key]

#Example - PlainText is 0x12, key is 0xAB
HW[intermediate(0x12, 0xAB)]
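
With `HW` and `intermediate()` in place, the nested loop from earlier can be filled in. The sketch below is one possible way to do that and is not part of the original tutorial: it computes the Hamming weight of the S-Box output for every key byte of every trace, first with plain loops and then with numpy indexing. To stay self-contained it uses random stand-in plaintexts and a made-up fixed key instead of captured data, and it regenerates the S-Box arithmetically (`make_aes_sbox` and `rotl8` are helpers introduced only for this example). In the notebook you would use `textin_array`, `known_keys`, and the `sbox` tuple defined above.

```python
import numpy as np

def rotl8(x, shift):
    """Rotate an 8-bit value left by `shift` bits."""
    return ((x << shift) | (x >> (8 - shift))) & 0xFF

def make_aes_sbox():
    """Build the AES S-Box: GF(2^8) inverse followed by the affine transform."""
    table = [0] * 256
    p, q = 1, 1
    while True:
        p = (p ^ (p << 1) ^ (0x1B if p & 0x80 else 0)) & 0xFF  # multiply p by 3
        q ^= (q << 1) & 0xFF                                   # divide q by 3
        q ^= (q << 2) & 0xFF
        q ^= (q << 4) & 0xFF
        if q & 0x80:
            q ^= 0x09
        table[p] = q ^ rotl8(q, 1) ^ rotl8(q, 2) ^ rotl8(q, 3) ^ rotl8(q, 4) ^ 0x63
        if p == 1:
            break
    table[0] = 0x63  # zero has no inverse, so its entry is set separately
    return table

sbox = make_aes_sbox()
HW = [bin(n).count("1") for n in range(256)]

rng = np.random.default_rng(1)
numtraces = 50
textin = rng.integers(0, 256, size=(numtraces, 16), dtype=np.uint8)  # stand-in plaintexts
key = np.arange(16, dtype=np.uint8)                                  # made-up fixed key

# Loop version: one table lookup per (key byte, trace) pair
hw_loop = np.zeros((numtraces, 16), dtype=int)
for bnum in range(16):
    for tnum in range(numtraces):
        hw_loop[tnum, bnum] = HW[sbox[int(textin[tnum, bnum]) ^ int(key[bnum])]]

# Vectorized version: the same lookups done with array indexing
sbox_np = np.array(sbox, dtype=np.uint8)
hw_np = np.array(HW)
hw_vec = hw_np[sbox_np[textin ^ key]]

print(np.array_equal(hw_loop, hw_vec))
```

Both versions produce the same `(numtraces, 16)` array; the vectorized form is usually preferable once the number of traces grows.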

### Plotting HW

Finally, we are going to plot each of the different "classes" in a different color. With this we should be able to see whether some location shows a relatively obvious difference between Hamming weights. We get that easily using the `HW` array and `intermediate()` function we defined earlier, plus a loop to plot all of the traces.

To make this easier, we can zoom in on a specific area. In the following example, only a small subset of the full capture is plotted. You can more easily figure out where this point should be by using the CPA attack (which we'll talk about later), since it provides more information about where the leakage is happening. For now, let's pretend we already know what a "good" point is:

In [None]:
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.palettes import brewer

output_notebook()
p = figure()

#Must run S-Box() script first to define the HW[] array and intermediate() function

#Must adjust these points for different compilers/targets
if PLATFORM == "CWLITEARM" or PLATFORM == "CW308_STM32F3":
    plot_start = 900
    plot_end = 1050
elif PLATFORM == "CWLITEXMEGA" or PLATFORM == "CW303":
    plot_start = 1370
    plot_end = 1410
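elif PLATFORM == "CWNANO":
    plot_start = 590
    plot_end = 620
else:
    # Without this, an unlisted platform fails later with a NameError
    raise ValueError("No default plot range for PLATFORM {}; set plot_start and plot_end manually".format(PLATFORM))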


xrange = range(len(traces[0].wave))[plot_start:plot_end]
bnum = 0

color_mapper = brewer['Reds'][9]

for trace in traces:
    hw_of_byte = HW[intermediate(trace.textin[bnum], trace.key[bnum])]
    p.line(xrange, trace.wave[plot_start:plot_end], line_color=color_mapper[hw_of_byte])

show(p)

### Finding Average at Locations

To find the best place to get a good correlation between Hamming weight and voltage, we can write a loop that checks the correlation at each point and keeps the maximum. Many point ranges will work; just pick one that spans 500-1000 points, which should be enough to find a good correlation (you will usually need far fewer than that). Since we do not care whether the correlation is positive or negative, we use its absolute value when looking for the maximum.

In [None]:
import numpy as np

max_corr = 0
point = 0
for avg_point in range(500, 1400):
    hw_list = [[], [], [], [], [], [], [], [], []]
    for trace in traces:
        hw_of_byte = HW[intermediate(trace.textin[bnum], trace.key[bnum])]
        hw_list[hw_of_byte].append(trace.wave[avg_point])
    
    hw_mean_list = [np.mean(hw_list[i]) for i in range(0, 9)]
    
    corr = abs(np.corrcoef(range(1,9), hw_mean_list[1:9])[0,1])
    if corr > max_corr:
        max_corr = corr
        point = avg_point
        
print(max_corr, point)
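
The same search can also be written with numpy array operations, which is usually much faster than nested Python loops. The following is a sketch on synthetic data rather than the captured traces: `fake_traces`, `fake_hw`, and `leak_point` are invented for the example, and the leakage is planted at a known sample so the result can be checked. As in the loop above, HW 0 is skipped.

```python
import numpy as np

rng = np.random.default_rng(2)

numtraces, numpoints = 5000, 200
leak_point = 120  # hypothetical sample index where the planted leakage sits

# Stand-in for HW[intermediate(...)] of each trace: HW of a random byte
HW = np.array([bin(n).count("1") for n in range(256)])
fake_hw = HW[rng.integers(0, 256, size=numtraces)]

# Noise everywhere, plus a HW-dependent dip at leak_point
fake_traces = rng.normal(0, 0.01, size=(numtraces, numpoints))
fake_traces[:, leak_point] -= 0.01 * fake_hw

# Mean of every sample point for each HW class 1..8 -> shape (8, numpoints)
classes = np.arange(1, 9)
class_means = np.array([fake_traces[fake_hw == w].mean(axis=0) for w in classes])

# Pearson correlation between HW and class mean, for all points at once
x = classes - classes.mean()
y = class_means - class_means.mean(axis=0)
corr = (x @ y) / np.sqrt((x @ x) * (y * y).sum(axis=0))

best_point = int(np.argmax(np.abs(corr)))
print(abs(corr[best_point]), best_point)
```

On real traces you would replace `fake_traces` with `trace_array` (or a slice of it) and `fake_hw` with the per-trace Hamming weights.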

If you go back to the previous plot and look at the point with the highest correlation, you'll notice that you could have identified good candidate points visually by looking for distinctive color gradients. Try a few points that do not have a distinct gradient and see what the Hamming weight vs. voltage plots look like in the later steps.

So, with an idea that there are differences, let's actually plot them to see how "linear" they are in real life. We're going to pick a point (again) and use that to get the averages. The following code finds and prints the averages; it is the same code we used above to determine the point with the highest correlation. We are using it now to evaluate the voltage vs. Hamming weight values at that point:

In [None]:
# from point of max correlation
avg_point = point

hw_list = [[], [], [], [], [], [], [], [], []]
for trace in traces:
    hw_of_byte = HW[intermediate(trace.textin[bnum], trace.key[bnum])]
    hw_list[hw_of_byte].append(trace.wave[avg_point])
    
hw_mean_list = [np.mean(hw_list[i]) for i in range(0, 9)]
print(hw_list[8])
    
for hw in range(1, 9):
    print("HW " + str(hw) + ": " + str(hw_mean_list[hw]))

The above should look somewhat linear. Let's get a nice plot of this to see it visually. If it is linear, try some other points and see how they compare.

In [None]:
from bokeh.plotting import figure, show
from bokeh.io import output_notebook

output_notebook()
p = figure(title="HW vs Voltage Measurement")
p.line(range(1, 9), hw_mean_list[1:9], line_color="red")
p.xaxis.axis_label = "Hamming Weight of Intermediate Value"
p.yaxis.axis_label = "Average Value of Measurement"
show(p)

That's it! You should see a nice linear plot as a result. If not, you might have selected the wrong point (are you running on an STM32F3 device, and did the compiler change, maybe?). You might also notice that the slope is the opposite of what you expect.

This happens for a good reason. If you remember how we are measuring the current into the device, you'll find out that the voltage will go DOWN for an INCREASE in current. You can see this in the following figure:

![Measurepoint point](img/vmeasure.png)

The measurement is based on the voltage drop across the shunt resistor. An increase in the current causes a larger drop across the resistor; when no current flows, there is no drop at all. But since we only measure the voltage at a single end of the resistor, rather than the difference across it, we see the highest voltage when no current flows and a lower voltage as the current increases.
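
As a quick numeric illustration, assume the measurement is taken on the device side of the shunt, so that the measured voltage is the supply voltage minus the drop. The supply voltage, shunt resistance, and currents below are made-up round numbers chosen for easy arithmetic, not values from any ChipWhisperer board.

```python
# Hypothetical values, chosen only to make the arithmetic easy to follow
v_supply = 3.3   # volts
r_shunt = 10.0   # ohms

def measured_voltage(current_amps):
    """Voltage on the device side of the shunt: supply minus the I*R drop."""
    return v_supply - current_amps * r_shunt

for current in (0.0, 0.010, 0.020):
    print(f"I = {current * 1000:4.0f} mA -> V = {measured_voltage(current):.2f} V")
```

More current means a larger drop, so the measured value falls as the device draws more power.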

We can fix the slope by simply inverting the measurement direction (adding a `-` in front of the measurement).
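
A small sketch of that fix, using invented class means in place of `hw_mean_list` (the numbers are illustrative, not captured data): negating the measurement flips the sign of both the fitted slope and the correlation, while the magnitude of the correlation stays the same.

```python
import numpy as np

hw_values = np.arange(1, 9)
# Invented averages that fall as the Hamming weight rises
fake_means = np.array([0.080, 0.072, 0.065, 0.057, 0.049, 0.042, 0.034, 0.026])

# Slope of a straight-line fit, before and after inverting the measurement
slope_raw = np.polyfit(hw_values, fake_means, 1)[0]
slope_inverted = np.polyfit(hw_values, -fake_means, 1)[0]

corr_raw = np.corrcoef(hw_values, fake_means)[0, 1]
corr_inverted = np.corrcoef(hw_values, -fake_means)[0, 1]

print(slope_raw, slope_inverted)
print(corr_raw, corr_inverted)
```

This is also why the correlation search earlier takes the absolute value: the sign only reflects the measurement direction.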

## Tests

In [None]:
corr = np.corrcoef(range(1,9), hw_mean_list[1:9])[0,1]

In [None]:
assert (abs(corr) > 0.9), "Low HW correlation of {}. Compiler may have changed best spot".format(corr)

In [None]:
print(corr)