# Breaking Software ECC <u>without</u> TraceWhisperer

## Background

The [uecc_part1_trace.ipynb](uecc_part1_trace.ipynb) notebook was written to show how to find and exploit side-channel leakage in the [micro-ecc library](https://github.com/newaetech/chipwhisperer/tree/develop/hardware/victims/firmware/crypto/micro-ecc) using Arm trace and [TraceWhisperer](https://github.com/newaetech/tracewhisperer).

Arm trace [is](https://developer.arm.com/documentation/ihi0029/latest/)
[very](https://developer.arm.com/documentation/ihi0014/q)
[complex](https://developer.arm.com/documentation/ddi0403/ee); it's also only available on Arm targets. It seemed worthwhile to do a variant of uecc_part1_trace.ipynb which does not use trace, and so this notebook came to be.

This notebook carries out what is essentially the same attack as uecc_part1_trace.ipynb, but using SAD instead of Arm trace to find the leakage.

The two notebooks are similar and independent of one another; you may wish to do just one, or both:
- If your main interest is in learning about ECC side-channel attacks, then you can start with either notebook.
- If your main interest is in learning how Arm trace or SAD can be used to find vulnerable segments of power traces, then start with the corresponding notebook.

## Supported Hardware

This notebook is written specifically for **CW-Husky** and the **STM32F3** target.

It is definitely possible to run this notebook on the SAM4S target, but you'll have to make some small tweaks. If you have a SAM4S, it's recommended to first run the notebook as-is on the STM32 target (if you don't have one, use the pre-recorded STM32 traces) to understand how the attack works; then switch to your live SAM4S target and use the hints provided in order to succeed (look for the "**⚠️ SAM4S tip**" notes that pop up throughout).

It should also be possible (and encouraged!) to run on any other target (as long as the [micro-ecc library](https://github.com/newaetech/chipwhisperer/tree/develop/hardware/victims/firmware/crypto/micro-ecc) can be compiled), but in this case no hints are provided (in all likelihood, the places where SAM4S hints are provided are also where other targets will need their own tweaks).

In principle, it should be possible to succeed with CW-Pro, but the CW-Pro lacks the trigger timestamping feature that is heavily used here to help fine-tune SAD parameters.

CW-Lite/Nano cannot be used since they have no SAD triggering capability.

If you don't have the required hardware, you can still follow along the entire notebook with the pre-recorded traces, by setting `TRACES = 'SIMULATED'`.

In [None]:
PLATFORM = 'CW308_STM32F3'
#PLATFORM = 'CW308_SAM4S'

In [None]:
#TRACES = 'HARDWARE' # if you have the required capture+target hardware: capture actual traces
TRACES = 'SIMULATED' # if you don't have capture+target hardware: use pre-captured traces (these traces were obtained using CW-Husky with a STM32F3)

In [None]:
import chipwhisperer as cw

if TRACES == 'SIMULATED':
    # fake out the CW scope and target: this will allow us to set attributes and read back previously set attributes
    scope = type('', (), {'adc': type('', (), {})(),
                          'gain': type('', (), {})(),
                          'io': type('', (), {})(),
                          'UARTTrigger': type('', (), {})(),
                          'SAD': type('', (), {})(),
                          'trigger': type('', (), {})(),
                          'trace': type('', (), {})(),
                          'fpga_buildtime': type('', (), {})(),
                          'clock': type('', (), {})()})()

    target = type('', (), {'baud': type('', (), {})()})()

else:
    scope = cw.scope()
    %run "../../Setup_Scripts/Setup_Generic.ipynb"

In [None]:
scope.trace.target = target
trace = scope.trace
scope.clock.clkgen_freq = 10e6
scope.clock.clkgen_src = 'system'
scope.clock.adc_mul = 1
target.baud = 38400 * 10 / 7.37
scope.io.glitch_trig_mcx = 'trigger'
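
The `target.baud` expression above scales the UART rate along with the target clock. Here is a minimal sketch of that arithmetic, assuming (as the expression suggests) that the firmware's nominal 38400 baud is defined for a 7.37 MHz clock and that the baud rate tracks the clock linearly. `scaled_baud` is a hypothetical helper for illustration, not part of the ChipWhisperer API:

```python
def scaled_baud(clock_hz, nominal_baud=38400, nominal_clock_hz=7.37e6):
    """Baud rate obtained when a UART set up for nominal_clock_hz runs at clock_hz."""
    # Assumption: the UART divider is fixed, so baud scales linearly with the clock.
    return nominal_baud * clock_hz / nominal_clock_hz

baud_10mhz = scaled_baud(10e6)
print(round(baud_10mhz))
```

If you change `scope.clock.clkgen_freq`, the same scaling would presumably apply to `target.baud`.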

## Attack Details

Refer to the [Attack Details](uecc_part1_trace.ipynb#Attack-Details) section of uecc_part1_trace.ipynb for some background on the target code.

In this notebook, since we're not using Arm trace to trigger, we instrument the code to help guide us:

<table>
<tr>
    <th>Original uecc_part1_trace.ipynb target code</th>
    <th>Modified target code</th>
</tr>
<tr>
<td>
  
```C
for (i = num_bits - 2; i > 0; --i) {


    
    nb = !uECC_vli_testBit(scalar, i);
    XYcZ_addC(Rx[1 - nb], Ry[1 - nb], Rx[nb], Ry[nb], curve);

    
    
    XYcZ_add(Rx[nb], Ry[nb], Rx[1 - nb], Ry[1 - nb], curve);
}
```
  
</td>
<td>

```c
    for (i = num_bits - 2; i > 0; --i) {
#ifdef FW_TRIGGER
        trigger_high();
#endif
        nb = !uECC_vli_testBit(scalar, i);
        XYcZ_addC(Rx[1 - nb], Ry[1 - nb], Rx[nb], Ry[nb], curve);
#ifdef FW_TRIGGER
        trigger_low();
#endif
        XYcZ_add(Rx[nb], Ry[nb], Rx[1 - nb], Ry[1 - nb], curve);
    }

```
</td>
</tr>
</table>

We'll first use a version of the firmware compiled with `FW_TRIGGER` defined, to help guide and build the attack; later we'll switch to the triggerless firmware build, for the actual realistic attack.

### Program target:

**Warning**: if you make any changes to the target firmware (including compiler version and switches), there is a chance that the attack parameters used in this notebook won't work for you anymore. So, for your first run-through, stick with the provided binary.

But, making changes to the target firmware is a great way to get practice with side-channel attacks, so once you've had success with the provided firmware, do go ahead and try some changes!

In [None]:
#%%bash -s "$PLATFORM"
#cd ../../hardware/victims/firmware/simpleserial-ecc-notrace
#make PLATFORM=$1 CRYPTO_TARGET=MICROECC FW_TRIGGER=1

In [None]:
fw_path = '../../../hardware/victims/firmware/simpleserial-ecc-notrace/simpleserial-ecc-fwtrigger-{}.hex'.format(PLATFORM)

In [None]:
if TRACES != 'SIMULATED':
    if (PLATFORM == 'CW308_STM32F3') or (PLATFORM == 'CWLITEARM'):
        prog = cw.programmers.STM32FProgrammer
    elif PLATFORM == 'CW308_SAM4S':
        prog = cw.programmers.SAM4SProgrammer

    cw.program_target(scope, prog, fw_path)

    reset_target(scope)
    target.simpleserial_write('i', b'')
    time.sleep(0.1)
    print(target.read())

    scope.clock.reset_adc()
    time.sleep(0.2)
    assert (scope.clock.adc_locked), "ADC failed to lock"

### Customized functions to run and capture ECC power traces:

In [None]:
%run "ECC_capture.ipynb"

## First trace

To get our bearings, let's see what a trace looks like when $k_r$ has a very regular pattern. We'll use a random point.
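
For context on what $k_r$ is: micro-ecc "regularizes" the scalar by adding the curve order $n$ once or twice, so that the value fed to the multiplication loop always has the same bit length. The sketch below illustrates that idea, assuming the secp256r1 curve. `regularize_sketch` is a hypothetical name; the notebook's own `regularized_k()` comes from ECC_capture.ipynb and may differ in detail (for example in how it treats the top bit):

```python
# Order n of the secp256r1 (NIST P-256) curve.
N_P256 = 0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551

def regularize_sketch(k, n=N_P256, n_bits=256):
    """Return k+n if that sum overflows n_bits, else k+2n (always n_bits+1 bits)."""
    k0 = k + n
    return k0 if (k0 >> n_bits) else k0 + n

k_demo = 0xf0000000fffffffefffffffffffffff04319055258e8617b0c46353d039cdaaf
kr_demo = regularize_sketch(k_demo)
print(hex(kr_demo))
```

Under these assumptions, this particular $k$ was clearly chosen so that $k + n$ consists of a long run of ones followed by a long run of zeros.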

In [None]:
# big block of 1's, big block of 0's:
k = 0xf0000000fffffffefffffffffffffff04319055258e8617b0c46353d039cdaaf
kr = regularized_k(k)
hex(kr)

We could capture a full target operation but let's just examine a subset from the middle of the operation which should be long enough to capture the processing of a few bits of $k$.

(Feel free to extend the capture to the full trace, if you're curious.)

In [None]:
scope.adc.samples = int(100e3)
scope.adc.presamples = 0
scope.adc.offset = int(2e6)
scope.adc.segments = 1
scope.adc.stream_mode = False
scope.gain.db = 21

> **⚠️ SAM4S tip**: try a higher gain.

In [None]:
rtrace = capture_ecc_traces(k, N=1, step='1')[0]

In [None]:
from bokeh.plotting import figure, show
from bokeh.resources import INLINE
from bokeh.io import output_notebook
output_notebook(INLINE)

In [None]:
s = figure(plot_width=2000)
s.line(range(len(rtrace.wave)), rtrace.wave)
show(s)

As mentioned above, the initial version of the target firmware that we're using here asserts the GPIO4 trigger at every iteration of the main loop which iterates over the bits of $k$, in the `ECCPoint_mult()` function of uECC.c:

```c
    for (i = num_bits - 2; i > 0; --i) {
#ifdef FW_TRIGGER
        trigger_high();
#endif
        nb = !uECC_vli_testBit(scalar, i);
        XYcZ_addC(Rx[1 - nb], Ry[1 - nb], Rx[nb], Ry[nb], curve);
#ifdef FW_TRIGGER
        trigger_low();
#endif
        XYcZ_add(Rx[nb], Ry[nb], Rx[1 - nb], Ry[1 - nb], curve);
    }

```

There is lots of information contained in this trace.

First, we see periodic activity with peaks occurring very regularly every ~2000 samples.

We also see some shifts up and down every ~22K samples. Could these be directly tied to $k$? It turns out that these shifts are caused by the GPIO4 trigger activity (we'll show this later), so we won't try to directly build an attack on them (the final attack will be against a version of the firmware compiled with `FW_TRIGGER` undefined).

But we *will* use the triggers to *help build* the attack. Husky has a nifty feature where if you're triggering multiple times, it will timestamp those triggers; we can use this to learn exactly when `trigger_high()` is being called.

In order to get these timestamps, we must do a segmented capture (where we capture a trace segment on each trigger) of the full capture. Let's capture 200 samples on every trigger, hoping this will catch some leakage that we can use towards an attack:

In [None]:
scope.adc.segments = 255
scope.adc.samples = 200
scope.adc.presamples = 0
scope.adc.offset = 0
scope.adc.segment_cycle_counter_en = 0

In [None]:
ftrace = capture_ecc_traces(k, N=1, step='2')[0]
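
A segmented capture comes back as one flat array, with the segments laid end to end; that is the layout implied by the `t.wave[i*samples:(i+1)*samples]` indexing used throughout this notebook. A small sketch of splitting such an array, using synthetic data and a hypothetical `split_segments` helper:

```python
def split_segments(wave, samples_per_segment):
    """Split a flat capture into equal-length segments, dropping any partial tail."""
    count = len(wave) // samples_per_segment
    return [wave[i * samples_per_segment:(i + 1) * samples_per_segment]
            for i in range(count)]

# Synthetic stand-in for a capture: 3 segments of 4 samples each.
flat = list(range(12))
segments = split_segments(flat, 4)
print(segments[1])
```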

The trigger times can be read back with `scope.trigger.get_trigger_times()`.

`get_ttimes()` is a wrapper around that function; we use it to deal with pre-recorded trigger times when `TRACES = 'SIMULATED'`.

In [None]:
get_ttimes??

In [None]:
ttimes = get_ttimes(step='2')

In [None]:
len(ttimes), min(ttimes), max(ttimes)

This tells us two important things:

1. The FW trigger fired 255 times, as expected (`len(ttimes)` is 254 because it's the time between *successive triggers* that is recorded).
2. The time delta between triggers is not constant, but it has a fairly narrow range.

One question that may come to mind is whether there is time-based leakage. Our special $k$ allows us to easily check if there is a statistical difference between the time it takes to process a bit of $k$ that is "1" vs a bit that is "0":

In [None]:
np.average(ttimes[:128]) / np.average(ttimes[128:])

If there is an actual difference, it is very small. We won't pursue this any further (note that if you repeat this for a different point `(Px, Py)`, you'll get a different result: execution times depend on **both** $k$ **and** $P$).
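
The ratio above only compares means. If you'd like a rough idea of whether a small difference is more than noise, one option (not something the original analysis does) is Welch's t statistic. The sketch below uses made-up cycle counts purely for illustration; `welch_t` is a hypothetical helper:

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

# Made-up trigger deltas, for illustration only (not measured data).
ones_times = [21010, 21030, 20990, 21020, 21000, 21015]
zeros_times = [21005, 21025, 20995, 21018, 21002, 21011]

ratio = mean(ones_times) / mean(zeros_times)
t_stat = welch_t(ones_times, zeros_times)
print(ratio, t_stat)
```

A |t| well below 2 or so suggests the two groups are not distinguishable by timing alone, at least with this many observations.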

Out of curiosity, let's annotate the trigger times on our earlier power trace:

In [None]:
abs_trigger_times = []
counter = 0
for t in ttimes:
    counter += t
    if 2e6 < counter < 2.1e6: # this is the interval captured by our first trace
        abs_trigger_times.append(counter-int(2e6))
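
The loop above turns trigger-to-trigger deltas into absolute positions with a running sum. The same idea can be sketched with `itertools.accumulate`; the deltas here are made up, and `triggers_in_window` is a hypothetical helper:

```python
from itertools import accumulate

def triggers_in_window(deltas, window_start, window_len):
    """Trigger positions, relative to window_start, that fall inside the window."""
    return [t - window_start
            for t in accumulate(deltas)
            if window_start < t < window_start + window_len]

# Made-up deltas (in ADC samples), for illustration only.
demo_deltas = [21000, 21500, 20800, 21200]
print(triggers_in_window(demo_deltas, 40000, 30000))
```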

In [None]:
from bokeh.models import Span
s = figure(plot_width=2000)
for t in abs_trigger_times:
    s.renderers.extend([Span(location=t, dimension='height', line_color='black', line_width=2)])
s.line(range(len(rtrace.wave)), rtrace.wave)
show(s)

## First step of the attack: establish distinguishing markers


Let's collect several traces so that we can average them and see if we can spot differences between when the target is processing a $k$ bit that is one versus a $k$ bit that is zero.

To facilitate this, each trace uses the same $k$, but a different base point. Using a different point allows us to "average out" the contribution of the base point to the power trace, to better focus on the effect of $k$.

In [None]:
scope.adc.segments = 255
scope.adc.samples = 170
scope.adc.presamples = 0
scope.adc.offset = 0
scope.adc.segment_cycle_counter_en = 0
scope.adc.stream_mode = False

> **⚠️ SAM4S tip**: the distinguishing markers are easier to find with a $kr$ that has alternating ones and zeros:
> `kr = 0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa`
> 
> You'll have to modify the `avg_ones` and `avg_zeros` calculations accordingly (e.g. ones are from even-numbered segments, zeros from odd-numbered segments).
> 
> Finally, increase N to 200 traces for better results.

In [None]:
rtraces = capture_ecc_traces(k, N=30, step='3')

In [None]:
samples = scope.adc.samples

avg_trace = np.zeros(samples)

for t in rtraces:
    for i in range(1,255):
        avg_trace += t.wave[i*samples:(i+1)*samples]

avg_trace /= (255*len(rtraces))

In [None]:
avg_ones = np.zeros(samples)
avg_zeros = np.zeros(samples)

for t in rtraces:
    for i in range(128):
        avg_ones += t.wave[i*samples:(i+1)*samples]
        avg_zeros += t.wave[(i+127)*samples:(i+128)*samples]

avg_ones /= (127*len(rtraces))
avg_zeros /= (127*len(rtraces))

In [None]:
s = figure(plot_width=2000)

xrange = range(len(avg_trace))
s.line(xrange, avg_trace-100, line_color="black")
s.line(xrange, avg_ones-100, line_color="red")
s.line(xrange, avg_zeros-100, line_color="blue")
s.line(xrange, (avg_ones - avg_zeros)*20, line_color="orange")

show(s)

We're looking for sample points which allow us to reliably distinguish between bits of $k$ that are 1 and bits of $k$ that are 0; the orange line suggests there are many!

Let's grab the sample indices for the largest negative peak:

In [None]:
THRESHOLD = -50/20
poi = list(np.where((avg_ones - avg_zeros) < THRESHOLD)[0])
print(poi)
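
What we just did is a difference-of-means test: average the segments for "1" bits, average the segments for "0" bits, and keep the sample indices where the two averages differ the most. A toy version on synthetic segments (the helper names are hypothetical):

```python
def mean_trace(segments):
    """Sample-wise mean of equal-length segments."""
    return [sum(col) / len(segments) for col in zip(*segments)]

def points_below(ones, zeros, threshold):
    """Indices where mean(ones) - mean(zeros) falls below threshold."""
    diff = [a - b for a, b in zip(mean_trace(ones), mean_trace(zeros))]
    return [i for i, d in enumerate(diff) if d < threshold]

# Synthetic segments: only sample 2 depends strongly on the bit value.
ones_segs = [[10, 20, 5, 30], [11, 19, 6, 31]]
zeros_segs = [[10, 21, 9, 30], [11, 20, 10, 30]]
print(points_below(ones_segs, zeros_segs, threshold=-2.5))
```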

Then we define a function to score each bit of $k$ using the sum of the power trace values at each point of `poi`:

In [None]:
def calc_sumdata(poi, ptraces, trim=None):
    if trim:
        samples = trim
    else:
        samples = scope.adc.samples
    sumdata = np.zeros(255)
    for i in range(255):
        for t in ptraces:
            for p in poi:
                sample = t.wave[i*samples+abs(p)]
                if p >= 0:
                    sumdata[i] += sample
                else:
                    sumdata[i] -= sample
    return sumdata/len(ptraces)


and plot the results:

In [None]:
sd = calc_sumdata(poi, rtraces)

s2 = figure(plot_width=2000)

xrange = range(len(sd))
s2.line(xrange, sd, line_color="red", line_width=2)

show(s2)

This seems to work really well: the difference between ones and zeros is extremely clear.

Before we carry on with this, let's see what happens when $k$ has more 0/1 transitions:

In [None]:
kr = 0xf0ccccccccccccccccccccccccccccccaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa0f
k = input_k(kr)

In [None]:
r01traces = capture_ecc_traces(k, N=30, step='4')

In [None]:
sd01 = calc_sumdata(poi, r01traces)

s3 = figure(plot_width=2000)

xrange = range(len(sd))
s3.line(xrange, sd, line_color="blue", line_width=2)
s3.line(xrange, sd01, line_color="red", line_width=2)

show(s3)

It looks like our first stab at a distinguisher works really well when $k$ has contiguous blocks of ones or zeros, but really badly when $k$ has alternating ones and zeros.

Let's try the other, smaller peaks further down the trace, just before sample 100, between the two dotted vertical lines:

In [None]:
s.renderers.extend([Span(location=80, dimension='height', line_color='black', line_width=2, line_dash='dotted')])
s.renderers.extend([Span(location=95, dimension='height', line_color='black', line_width=2, line_dash='dotted')])
show(s)

In [None]:
# if using a different target, adjust as needed:
START=80
STOP=95
PTHRESH = 6/20
NTHRESH = 8/20

In [None]:
poi = list(np.where((avg_ones[START:STOP] - avg_zeros[START:STOP]) > PTHRESH)[0] + START)
poi.extend(list(-(np.where((avg_ones[START:STOP] - avg_zeros[START:STOP]) < -NTHRESH)[0] + START)))
print(poi)

In [None]:
# this is what you should get with the STM32 target:
assert poi == [82, 89, 90, -84, -92]
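
Note the sign convention in this `poi` list: a positive entry marks a sample where "1" bits read higher and is added to the score, while a negative entry marks a sample where "1" bits read lower and is subtracted. This mirrors what `calc_sumdata()` does; here it is reduced to a single segment, with a hypothetical `score_segment` helper and made-up samples:

```python
def score_segment(segment, signed_poi):
    """Add samples at positive POI, subtract samples at negative POI."""
    score = 0
    for p in signed_poi:
        sample = segment[abs(p)]
        score += sample if p >= 0 else -sample
    return score

demo_segment = [0, 7, 3, 9, 4]
print(score_segment(demo_segment, [1, 3, -2]))  # 7 + 9 - 3
```

One limitation of encoding the sign this way is that sample index 0 can only ever be a positive POI, since `-0 == 0`.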

In [None]:
sd = calc_sumdata(poi, rtraces)
sd01 = calc_sumdata(poi, r01traces)

s = figure(plot_width=2000)

xrange = range(len(sd))
s.line(xrange, sd + 5, line_color="blue", line_width=2)
s.line(xrange, sd01, line_color="red", line_width=2)

show(s)

Now we have something that works equally well for long sequences of static values and alternating values.

It's noisier than the first `poi`, but the 0/1 transitions are all quite clear.

We're off to the races!

> **⚠️ SAM4S tip**: better results are obtained with:
> ```
> START=0
> STOP=283
> PTHRESH = 4/20
> NTHRESH = 5/20
> ```
> which should yield something similar to `poi = [24, 121, 122, 125, 126, 127, 153, 162, 167, -130, -131, -148, -150, -198, -246, -247]`

## The Attack

This time let's play for real: we'll generate a random $k$ and see whether our attack can retrieve it.

Since we only need a few samples, let's trim down our capture by pushing it out 80 samples and capturing only around the POI (for storage efficiency).

> **⚠️ SAM4S tip**: adjust according to the range of your `poi`.

In [None]:
scope.adc.samples = 13
scope.adc.offset = 80

In [None]:
# adjust POI accordingly:
for i in range(len(poi)):
    if poi[i] > 0:
        poi[i] -= 80
    else:
        poi[i] += 80
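
Keep in mind that the cell above modifies `poi` in place, so running it twice shifts the POI twice. A non-mutating variant can be sketched as follows (`rebase_poi` is a hypothetical helper, not something defined in ECC_capture.ipynb):

```python
def rebase_poi(signed_poi, shift):
    """Move each POI `shift` samples earlier, preserving its sign."""
    rebased = []
    for p in signed_poi:
        index = abs(p) - shift
        if index < 0:
            raise ValueError('POI %d falls before the new capture start' % p)
        if index == 0 and p < 0:
            raise ValueError('a negative POI cannot land on sample 0')
        rebased.append(index if p > 0 else -index)
    return rebased

print(rebase_poi([82, 89, 90, -84, -92], 80))
```

Because it returns a new list, the original `poi` stays available if you need to start over with a different offset.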

In [None]:
k = random_k()
kr = regularized_k(k)
hex(k), hex(kr)

> **⚠️ SAM4S tip**: the attack tends to require a few more traces (around 150) to succeed.

In [None]:
ptraces = capture_ecc_traces(k, N=60, step='5')

In [None]:
# In the case of pre-recorded traces, set k to what was used for those traces:
if TRACES == 'SIMULATED':
    k = ptraces[0].textin['k']
    kr = regularized_k(k)

In [None]:
sd = calc_sumdata(poi, ptraces)

In [None]:
s = figure(plot_width=2000)

xrange = range(len(sd))
s.line(xrange, sd, line_color="red", line_width=2)

show(s)

If your plot looks good (where "looks good" = a sequence of ~well-distinguished highs and lows, approaching a noisy square wave), proceed to guessing all the bits.

Otherwise, you may need to tweak your POI set (use the recommended values), or simply try a fresh trace acquisition.

In [None]:
def attack(poi, straces, trim=None, verbose=True):
    sd = calc_sumdata(poi, straces, trim=trim)

    # guess all bits from waveform:
    guess = ''
    for i in range(1,255):
        if sd[i] > np.average(sd):
            guess += '1'
        else:
            guess += '0'

    # first and last bit are unknown, so enumerate the possibilities:
    guesses = []
    for first in (['0', '1']):
        for last in (['0', '1']):
            guesses.append(int(first + guess + last, 2))

    kr = regularized_k(k)
    wrong_bits = []
    if kr in guesses:
        if verbose: print('✅ Guessed right!')
    else:
        for kbit in range(1,255): # guess[0..253] covers bits 254..1
            if int(guess[kbit-1]) != ((kr >> (255-kbit)) & 1):
                wrong_bits.append(255-kbit)
        if verbose:
            print('Attack failed.')
            print('Guesses: %s' % hex(guesses[0]))
            print('         %s' % hex(guesses[1]))
            print('         %s' % hex(guesses[2]))
            print('         %s' % hex(guesses[3]))
            print('Correct: %s' % hex(kr))
            print('%d wrong bits' % len(wrong_bits))
    return wrong_bits

In [None]:
attack(poi, ptraces)

The attack should have succeeded (or at least guessed almost all of the 256 bits correctly).

The last step is to see how well the attack works as we reduce the number of traces used.

In [None]:
fwtrigger_wrong_bits = []
for attack_traces in range(1, len(ptraces)+1):
    print('Attacking with %d traces... ' % attack_traces,  end='')
    wrong_bits = attack(poi, ptraces[:attack_traces], None, False)
    if wrong_bits:
        print('failed, %d wrong bits' % len(wrong_bits))
    else:
        print('passed ✅')
    fwtrigger_wrong_bits.append(len(wrong_bits))

What's remarkable is that even with a single trace, most bits are guessed correctly.

## Next step: adding SAD

So we have a successful attack, but it's not very realistic because it relies on the firmware toggling the trigger line to tell us when it's processing each bit.

Our ultimate goal is to remove the FW trigger, and we eventually will, but not yet. The next step is to identify the bits from a SAD pattern: we'll keep the FW trigger in place, but we won't actually use it.

We re-use the traces from our previous captures to guide us. Let's overlay all of the 255 trace segments from a single target operation:

In [None]:
from bokeh.palettes import inferno
from bokeh.plotting import figure, show
from bokeh.resources import INLINE
from bokeh.io import output_notebook
from bokeh.models import Span, Legend, LegendItem
import itertools
output_notebook(INLINE)
B = figure(plot_width=1800)
colors = itertools.cycle(inferno(255))
for i in range(255):
    B.line(range(samples), rtraces[0].wave[i*samples:(i+1)*samples], color=next(colors))
show(B)

We see that all the segments align very well until sample 151. It's no surprise that there is a divergence, since we know that the bits of $k$ are not processed in constant-time.

This is very good, because a SAD-triggered capture needs trace segments that overlay nicely.

To do a Husky-triggered SAD capture, we need two things:
1. a reference waveform;
2. a threshold.
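
In case SAD (sum of absolute differences) is new to you: the reference is slid along the incoming samples, and a match is declared wherever the summed per-sample differences are small enough. The sketch below is a plain software version on synthetic data. It is only meant to convey the concept; details of the Husky hardware (such as whether the comparison against the threshold is strict) are not covered here, and the helper names are hypothetical:

```python
def sad(reference, window):
    """Sum of absolute differences between two equal-length sample sequences."""
    return sum(abs(r - w) for r, w in zip(reference, window))

def sad_matches(reference, wave, threshold):
    """Start indices of every window of `wave` whose SAD is at or below threshold."""
    n = len(reference)
    return [i for i in range(len(wave) - n + 1)
            if sad(reference, wave[i:i + n]) <= threshold]

# Synthetic wave containing one exact and one noisy copy of the reference.
ref = [5, 9, 2, 7]
wave = [0, 0, 5, 9, 2, 7, 0, 0, 0, 5, 8, 3, 7, 0]
print(sad_matches(ref, wave, threshold=2))
```

Lowering the threshold to 0 would keep only the exact copy, which is the trade-off we'll be tuning later on.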

We can set the SAD reference to 192 or 96 samples. Because of the divergence at sample 151, we'll use a 96-sample reference (another option would be to increase the number of samples per clock, `scope.clock.adc_mul`).

We could find a good SAD reference and threshold with trial and error on real captures; instead, let's compute SAD in software over our already-acquired traces:

> **⚠️ SAM4S notes**: the divergence is at sample 284. While a 96-sample reference ending at sample 283 scores really well, in practice you will later see that this would fire at too many points (i.e. multiple times per bit).
> 
> Good results were obtained with a 96-sample reference starting at sample 149.


In [None]:
def calc_sads(ref_trace, traces, start, length=96):
    stop = start+length
    sads = np.zeros(len(traces))#, dtype=np.int32)
    a = ref_trace[start:stop]
    for j,t in enumerate(traces):
        b = t.wave[start:stop]
        sad = 0
        for i in range(len(a)):
            #asamp = int(256*(a[i]+0.5))
            #bsamp = int(256*(b[i]+0.5))
            asamp = int(a[i])
            bsamp = b[i]
            sad += abs(asamp-bsamp)
        sads[j] = sad
    return sads

bestmax = [2**16, 0]
bestavg = [2**16, 0]
bestvar = [2**16, 0]
maxmax = 0

for start in range(0, 151-96):
    sads = calc_sads(avg_trace, rtraces, start)
    maxsad = np.max(sads)
    avgsad = np.average(sads)
    varsad = np.var(sads)
    if maxsad < bestmax[0]:
        bestmax = [maxsad, start]
    if avgsad < bestavg[0]:
        bestavg = [avgsad, start]
    if varsad < bestvar[0]:
        bestvar = [varsad, start]
    if maxsad > maxmax:
        maxmax = maxsad

In [None]:
print('Starting SAD point with the lowest maximum SAD score (%d): %d' % (bestmax[0], bestmax[1]))
print('Starting SAD point with the lowest SAD variance (%d): %d' % (bestvar[0], bestvar[1]))
print('Starting SAD point with the lowest average SAD score (%d): %d' % (bestavg[0], bestavg[1]))

The "best starting SAD points" can vary from run to run; this attack was developed using 35 as a starting point.

Even if you obtained different "best points" above, you should find the metrics for 35 in line with the results above. (You can also choose to go ahead with a different starting point but you'll have to modify some indices accordingly.)

In [None]:
starting_sample = 35
sads = calc_sads(avg_trace, rtraces, starting_sample)
print('Max: %d (best max: %d)' % (np.max(sads), bestmax[0]))
print('Avg: %d (best avg: %d)' % (np.average(sads), bestavg[0]))
print('Var: %d (best var: %d)' % (np.var(sads), bestvar[0]))

In [None]:
# avg_trace is floats; change it to a bytearray of ints for setting scope.SAD.reference
ref_trace = []
for i in range(starting_sample, starting_sample+96):
    ref_trace.append(int(avg_trace[i]))

In [None]:
scope.SAD.half_pattern = True
scope.SAD.reference = bytearray(ref_trace)
scope.SAD.multiple_triggers = True
scope.trigger.module = 'SAD'

Then we set the `scope.adc` parameters to grab the POI samples that we need for our attack, keeping in mind the SAD trigger module's latency.

A capture triggered by SAD starts `scope.SAD.sad_reference_length + scope.SAD.latency` after the first SAD sample; in our case this means that it starts on sample `35 + scope.SAD.sad_reference_length + scope.SAD.latency = 143`.

Our POI are from sample 82 to sample 92, so we have to set `scope.adc.presamples` to 143-82 = 61 in order to capture the POI.

Finally, while we only need 11 consecutive samples to cover our POI, `scope.adc.samples` must be greater than `scope.adc.presamples`, so we set that to 69 (it must also be a multiple of 3 here, because of segments).
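
The arithmetic above can be written out as a small sketch. The value 108 used for "reference length plus latency" is simply what the text implies (143 - 35); it is an assumption here rather than a value read back from the hardware, and `sad_capture_settings` is a hypothetical helper:

```python
def sad_capture_settings(starting_sample, ref_plus_latency, poi_first, poi_last):
    """Return (trigger_sample, presamples, poi_span) for a SAD-triggered capture."""
    trigger_sample = starting_sample + ref_plus_latency
    presamples = trigger_sample - poi_first
    poi_span = poi_last - poi_first + 1
    return trigger_sample, presamples, poi_span

trigger_sample, presamples, poi_span = sad_capture_settings(35, 108, 82, 92)
samples = 69  # the value used in this notebook

# Constraints described in the text above.
assert samples > presamples and samples % 3 == 0 and samples >= poi_span
print(trigger_sample, presamples, poi_span)
```

If your `starting_sample` or POI differ, re-running this with your own numbers gives the `presamples` to use and lets you check a candidate `samples` value.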

> **⚠️ SAM4S notes**: adjust these settings as per your own `starting_sample` value.


In [None]:
# if your starting_sample != 35, or if your POI are different, calculate these values as per the explanation above:
scope.adc.samples = 69
scope.adc.presamples = 61
scope.adc.offset = 0

Let's guess at a good SAD threshold and see what happens. The default suggested here may not work for you; some trial and error may be required. A good starting value is "a little bit higher" than the maximum SAD we calculated above.

A "good SAD threshold" is one which results in 255 or 256 matches(\*), as reported by `scope.SAD.num_triggers_seen`. So that's one tool to help determine how to get a good threshold: if SAD triggers more than 256 times, then the threshold is too high; less than 255 times means the threshold is too low.

Setting `scope.SAD.always_armed` means that the SAD module will trigger even after the capture is done; we need this on to know whether the SAD is triggering too often.

(\*) Why 255 or 256? If you refer back to uECC.c, recall that we are triggering in the `for` loop, near the call to `XYcZ_addC()`; this loop iterates 255 times, but `XYcZ_addC()` is also called one last time after the loop, and that may (or may not) trigger a SAD match.
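
The tuning rule described above boils down to a simple decision, sketched here with a hypothetical `judge_threshold` helper that takes the trigger count you read back after a capture:

```python
def judge_threshold(num_triggers, expected=(255, 256)):
    """Suggest which way to move the SAD threshold, given the trigger count."""
    if num_triggers in expected:
        return 'ok'
    return 'lower it' if num_triggers > max(expected) else 'raise it'

for count in (0, 254, 255, 256, 300):
    print(count, judge_threshold(count))
```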

Let's start with a single capture. There is no saved trace for this since it's intended to help users with actual hardware to set good SAD parameters:

In [None]:
scope.SAD.always_armed = True

In [None]:
scope.SAD.threshold = 200 # adjust as needed

In [None]:
if TRACES != 'SIMULATED': # no point in doing this step with pre-recorded traces
    Px, Py = new_point()
    strace = capture_ecc_trace(k, Px, Py)
    if scope.SAD.num_triggers_seen in [255,256]:
        print('Looks good! Got %d triggers. ✅' % scope.SAD.num_triggers_seen)
    else:
        print('❌ Got %d triggers; try again.' % scope.SAD.num_triggers_seen)

It's useful to experiment with different thresholds to get a feel for it, even if you get the correct results on the first try.

Once you get the right number of triggers, it's worth checking that there are no outliers in their timestamps:

> **⚠️ SAM4S notes**: the expected range for `ttimes` is 26000 to 28000.

In [None]:
if TRACES != 'SIMULATED': # no point in doing this step with pre-recorded traces
    ttimes = scope.trigger.get_trigger_times()
    assert 20000 < min(ttimes) < 23000
    assert 20000 < max(ttimes) < 23000

In [None]:
if TRACES != 'SIMULATED': # no point in doing this step with pre-recorded traces
    print(min(ttimes), max(ttimes))
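
The fixed 20000 to 23000 window above is specific to this target and clock. A more portable check, sketched below with made-up numbers, is to flag any delta that strays too far from the median; `trigger_outliers` and its 10% tolerance are assumptions for illustration:

```python
from statistics import median

def trigger_outliers(deltas, tolerance=0.1):
    """Indices of deltas differing from the median by more than tolerance (fractional)."""
    mid = median(deltas)
    return [i for i, d in enumerate(deltas) if abs(d - mid) > tolerance * mid]

# Made-up deltas: index 2 looks like a missed trigger, index 4 like a spurious one.
demo = [21000, 21400, 42300, 20900, 800, 21100]
print(trigger_outliers(demo))
```

A delta near twice the median hints at a missed match (threshold too low), while a very short delta hints at an extra match (threshold too high).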

We can also visually check whether the captured segments are aligning as they should:

In [None]:
if TRACES != 'SIMULATED': # no point in doing this step with pre-recorded traces
    avg_trace_offset = 82 # first element of our original POI list
    samples = scope.adc.samples
    B = figure(plot_width=1800)
    colors = itertools.cycle(inferno(255))
    for i in range(255):
        B.line(range(samples), strace.wave[i*samples:(i+1)*samples], color=next(colors))
    B.line(range(samples), avg_trace[avg_trace_offset:avg_trace_offset+samples], color='black', line_width=3)
    show(B)

Finally, if you have a logic analyzer, you can drive the SAD's trigger onto Husky's trigger/glitch output MCX port (or USERIO), and confirm visually whether the SAD is triggering every time that the target FW issues a trigger on GPIO4.

If all is good, let's proceed to capture lots of traces for an attack.

With a good SAD threshold, most captures should be "good", but there may be some outliers which we'll simply discard.

Let's also use a random $k$, to set ourselves up for an attack with these traces:

In [None]:
k = random_k()
kr = regularized_k(k)
hex(k), hex(kr)

We turn on the `check_sad_triggers` and `check_ttimes` options, to discard traces that don't look good:

In [None]:
#scope.SAD.threshold = 195

In [None]:
straces = capture_ecc_traces(k, N=60, step='6', check_sad_triggers=True, check_ttimes=True, trim=13)

In [None]:
if TRACES == 'SIMULATED':
    k = straces[0].textin['k']

A few discarded traces are OK, but if you get a lot you can try adjusting `scope.SAD.threshold` for better results, or increase `N` to compensate.

We also need to shift the POI array back by two samples, since those samples will now be located starting at sample 0 in our captures:

In [None]:
for i in range(len(poi)):
    if poi[i] >= 0:
        poi[i] -= 2
    else:
        poi[i] += 2

In [None]:
poi

Now we're ready to try the attack; it should work exactly as before.

In [None]:
sd = calc_sumdata(poi, straces, trim=13)

In [None]:
s = figure(plot_width=2000)

xrange = range(len(sd))
s.line(xrange, sd, line_color="red", line_width=2)

show(s)

In [None]:
attack(poi, straces, trim=13)

In [None]:
sadtrigger_wrong_bits = []
for attack_traces in range(1, len(straces)+1):
    print('Attacking with %d traces... ' % attack_traces,  end='')
    wrong_bits = attack(poi, straces[:attack_traces], 13, False)
    if wrong_bits:
        print('failed, %d wrong bits' % len(wrong_bits))
    else:
        print('passed ✅')
    sadtrigger_wrong_bits.append(len(wrong_bits))

You should find that the attack works just as well as before:

In [None]:
s = figure(plot_width=2000, x_axis_label='Number of traces', y_axis_label='Number of wrong bits')
s.line(range(len(fwtrigger_wrong_bits)), fwtrigger_wrong_bits, color='blue')
s.line(range(len(sadtrigger_wrong_bits)), sadtrigger_wrong_bits, color='red')
show(s)

## Removing the firmware trigger

Although we're triggering our captures using only SAD, the firmware is still pulsing GPIO4 255 times in its PMUL loop.

Can the attack succeed without any modifications if we remove the GPIO4 trigger? The answer is not necessarily yes. Turning on a GPIO draws a fair amount of current and can certainly influence power traces (see for example [this note](https://github.com/newaetech/tracewhisperer/blob/master/trace_noise.md) on the effect of trace activity on power traces).

So let's see: we'll switch to a different firmware, compiled from the same source but with the `trigger_high()` and `trigger_low()` calls ifdef'd out.

In [None]:
fw_path = '../../../hardware/victims/firmware/simpleserial-ecc-notrace/simpleserial-ecc-nofwtrigger-{}.hex'.format(PLATFORM)

In [None]:
if TRACES != 'SIMULATED':
    cw.program_target(scope, prog, fw_path)

    target.simpleserial_write('i', b'')
    time.sleep(0.1)
    print(target.read())

Let's see what happens if we try to capture using our existing SAD reference...

In [None]:
if TRACES != 'SIMULATED':
    ptrace = capture_ecc_trace(k, Px, Py)
    print('SAD triggered %d times' % scope.SAD.num_triggers_seen)

...nothing: the SAD never triggers (which is why there are no pre-recorded traces for this). Whether the cause is the absence of the GPIO4 pulse or a change in the power trace brought on by recompiling the ECC code, we're not done yet.

In [None]:
if TRACES != 'SIMULATED':
    scope.errors.clear()

There are different ways to go forward here. We'll try capturing the full PMUL trace and see what we can do with it.

But how do we trigger our capture? Again there are different approaches. ADC-level triggering is an option, depending on the target. On the STM32F3, the idle power isn't much lower than the "active" power, so that might not work great.

Let's instead trigger on the UART messages that kick off the target operation.

When `capture_ecc_trace()` is run, the last thing sent to the target is $k$, so let's trigger on that.

In [None]:
if TRACES != 'SIMULATED':
    scope.UARTTrigger.enabled = True
    scope.UARTTrigger.baud = int(target.baud)
    scope.UARTTrigger.set_pattern_match(0, 'k' + hex(k)[2:9]) # match the 'k'... command that we send, which is what kicks off the PMUL operation:
    scope.UARTTrigger.trigger_source = 0
    scope.UARTTrigger.rules_enabled = [0]

    scope.trigger.module = 'UART'
    scope.trigger.triggers = 'tio2'

    assert scope.trace.clock.swo_clock_locked
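The pattern above is `'k'` followed by the first seven hex digits of $k$. Here is a minimal sketch of how that string is built, assuming the command goes out as `'k'` followed by the scalar as 64 lowercase hex characters (32 bytes, zero-padded on the left). `uart_pattern` is a hypothetical helper for illustration, not part of ChipWhisperer. Under that assumption `hex(k)` drops leading zeros, so a zero-padded format may be safer when $k$ happens to be small:

```python
def uart_pattern(k, nbytes=32, pattern_len=8):
    """Build the 'k...' prefix to match on the UART line.

    Hypothetical helper: assumes the scalar is sent as 'k' followed by
    2*nbytes lowercase hex characters, zero-padded on the left.
    """
    hex_k = '{:0{width}x}'.format(k, width=2 * nbytes)
    return ('k' + hex_k)[:pattern_len]


k_big = 0xdeadbeef << 224      # top hex digit is non-zero
k_small = 0x1234               # many leading zeros once padded to 32 bytes

print(uart_pattern(k_big))     # same string as 'k' + hex(k_big)[2:9]
print(uart_pattern(k_small))   # differs from 'k' + hex(k_small)[2:9]
```

If your captures never trigger for some values of $k$, this is one thing worth checking.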

In [None]:
scope.adc.samples = int(6e6)
scope.adc.presamples = 0
scope.adc.offset = 0
scope.adc.stream_mode = True
scope.adc.segments = 1

In [None]:
fulltrace = capture_ecc_traces(k, N=1, step='7')[0]

In [None]:
import holoviews as hv
from holoviews.operation import decimate
from holoviews.operation.datashader import datashade
hv.extension('bokeh')
datashade(hv.Curve(fulltrace.wave)).opts(width=2000, height=900)

At this point it might be easier to understand why SAD matching is not working by computing SAD scores in software.

Let's study a slice of the full capture. We know that processing one bit of $k$ takes around 21K cycles.

It follows that any slice of 100K samples should cover 4 to 5 bits of $k$ (100K / 21K ≈ 4.8).

We'll pick some point in the middle of the target operation and look for our SAD pattern.

In [None]:
def calc_sad(ref_trace, wave):
    sad = 0
    for i in range(len(ref_trace)):
        #asamp = int(256*(ref_trace[i]+0.5))
        #bsamp = int(256*(wave[i]+0.5))
        asamp = int(ref_trace[i])
        bsamp = wave[i]
        sad += abs(asamp-bsamp)
    return sad
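The Python loop above is easy to follow but slow when it is repeated for every offset. As a sketch of an alternative (not what the notebook itself uses), the same sum of absolute differences can be computed for all offsets at once with NumPy. `sliding_sad` is a hypothetical helper name, and the code assumes integer ADC samples, as the `int()` conversions above suggest; the data here is synthetic:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_sad(ref, wave):
    """SAD between `ref` and every len(ref)-sample window of `wave`.

    Element i of the result is the SAD for wave[i:i+len(ref)].
    """
    ref = np.asarray(ref, dtype=np.int64)
    wave = np.asarray(wave, dtype=np.int64)
    windows = sliding_window_view(wave, len(ref))   # a view, no copy
    return np.abs(windows - ref).sum(axis=1)


# Synthetic stand-ins for ref_trace and fulltrace.wave (integer samples):
rng = np.random.default_rng(0)
demo_wave = rng.integers(0, 256, size=5000)
demo_ref = demo_wave[1234:1234 + 96].copy()

demo_sads = sliding_sad(demo_ref, demo_wave)
print(int(np.argmin(demo_sads)), int(demo_sads.min()))
```

With a 96-sample reference, `sliding_sad(ref_trace, fulltrace.wave[start:stop + 95])` should give the same `stop - start` scores as the loop in the next cell. The subtraction materialises one 96-wide row per offset, so for the full 6M-sample trace it would need to be applied in chunks.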

In [None]:
# Do SW SAD over a small range, where we can more easily spot a smaller number of SAD matches:
start = int(3e6)
stop = int(start + 100e3)

sads = []
for i in tnrange(start, stop):
    #sads.append(calc_sad(avg_trace[54:54+96], fulltrace.wave[i:i+96]))
    sads.append(calc_sad(ref_trace, fulltrace.wave[i:i+96]))

In [None]:
B = figure(plot_width=1800)
B.line(range(len(sads)), sads)
show(B)

The SAD score has some dips, but it never gets nearly as low as it did on the original firmware, where the minimum was a bit below the value you set for `scope.SAD.threshold`:

In [None]:
print('SAD threshold for previous FW: %d' % scope.SAD.threshold)
print('Minimum SW-calculated SAD for trigger-less FW: %d' % min(sads))

This SAD plot appears to have some periodicity, but not enough to confidently establish markers that distinguish our 4 $k$ bits.

(If you're lucky you may see 4 distinct minima; however, this isn't always the case.)

At this point it's useful to remember two very important points:
1. ChipWhisperer does not measure the actual power consumption of the target. It measures the voltage drop across a shunt resistor, and there are all sorts of analog effects that you may have noticed before.
2. The SAD computation done by Husky does not accommodate shifted or scaled power traces. If the SAD reference is actually contained in the new firmware's power trace but in a shifted, scaled, or stretched form, Husky's SAD trigger is unlikely to match it.

However, by doing SAD matching in *software*, we can easily compensate for shifts and scaling.

The idea is really simple: for every SAD computation, shift and scale the candidate trace so that it matches the *range* of the SAD reference. Thus, if a shifted/scaled version of the SAD reference exists in the power trace, we will find it.

If this works, we can then define a new SAD reference, and successfully go back to SAD triggering on the ChipWhisperer capture hardware.

In [None]:
def calc_sad_scaled(ref_trace, wave):
    refmin = min(ref_trace)
    refmax = max(ref_trace)
    refrange = refmax - refmin

    rawmin = min(wave)
    rawmax = max(wave)
    rawrange = rawmax - rawmin

    scaled = np.asarray(wave, dtype=np.float64) - rawmin
    scaled *= refrange/rawrange
    scaled += refmin

    sad = 0
    for i in range(len(ref_trace)):
        #asamp = int(256*(ref_trace[i]+0.5))
        #bsamp = int(256*(scaled[i]+0.5))
        asamp = int(ref_trace[i])
        bsamp = int(scaled[i])
        sad += abs(asamp-bsamp)
    return sad
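To see why the range normalisation helps, here is a small self-contained sketch on synthetic data. `sad_plain` and `sad_rescaled` are simplified, hypothetical restatements of the two functions above (they skip the truncation to integers, and `sad_rescaled` adds a guard for flat windows that the notebook version does not have). The "distorted" window is the reference at half the gain with a DC offset:

```python
import numpy as np


def sad_plain(ref, wave):
    """Plain sum of absolute differences on integer samples."""
    diff = np.asarray(ref, dtype=np.int64) - np.asarray(wave, dtype=np.int64)
    return int(np.abs(diff).sum())


def sad_rescaled(ref, wave):
    """Map `wave` onto the [min, max] range of `ref`, then compute the SAD."""
    ref = np.asarray(ref, dtype=np.float64)
    wave = np.asarray(wave, dtype=np.float64)
    wave_range = wave.max() - wave.min()
    if wave_range == 0:
        # A flat window cannot be rescaled; treat it as a non-match.
        return float('inf')
    scaled = (wave - wave.min()) * (ref.max() - ref.min()) / wave_range + ref.min()
    return float(np.abs(ref - scaled).sum())


# Synthetic 96-sample reference, and a copy with half the gain and a DC offset:
rng = np.random.default_rng(1)
ref = rng.integers(60, 200, size=96)
distorted = ref // 2 + 40

print(sad_plain(ref, distorted))     # large: same shape, different levels
print(sad_rescaled(ref, distorted))  # small: only the rounding from // 2 remains
```

The plain SAD penalises the level difference even though the shape is the same, which is the situation the hardware SAD is in; the rescaled SAD only sees the shape.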

In [None]:
scaled_sads = []
for i in tnrange(start, stop):
    scaled_sads.append(calc_sad_scaled(ref_trace, fulltrace.wave[i:i+96]))

In [None]:
B = figure(plot_width=1800)
B.line(range(len(scaled_sads)), scaled_sads)
show(B)

In [None]:
min(sads), min(scaled_sads)

This is an improvement! The dips should be both lower and more distinct from the rest of the plot. With an appropriate `THRESHOLD` we may be able to find appropriately spaced match points:

In [None]:
THRESHOLD = 210
matches = np.where(np.asarray(scaled_sads)  < THRESHOLD)[0]
deltas = []
for m in range(1, len(matches)):
    deltas.append(matches[m] - matches[m-1])
print(deltas)

Ideally we'd like to find three deltas, each of around 21K, but don't stress too much if you can't get three nice deltas; it turns out we don't need the "ideal" scenario to improve our SAD reference.

Let's run the scaled SAD match on the full waveform:

In [None]:
scaled_sads = []
for i in tnrange(len(fulltrace.wave)-96):
    scaled_sads.append(calc_sad_scaled(ref_trace, fulltrace.wave[i:i+96]))

*If you have a slow computer, you can shorten this by doing the calculation over a fraction of the waveform, instead of the full trace; for example you can change the `tnrange()` argument to `len(fulltrace.wave)//4-96`, in which case you would expect to get a quarter of the matches.*
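Another way to cut the run time is to vectorise the computation. The sketch below is an assumption-laden alternative rather than the notebook's own method: `sliding_scaled_sad` is a hypothetical helper that mirrors the arithmetic of `calc_sad_scaled()` (including the truncation to integers) but works on blocks of windows, with `chunk` bounding the memory used. Flat windows, which the loop version cannot rescale, are mapped onto the reference minimum here. The demo data is synthetic:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_scaled_sad(ref, wave, chunk=20_000):
    """Range-normalised SAD for every len(ref)-sample window of `wave`.

    Mirrors calc_sad_scaled(), but processes `chunk` windows at a time.
    """
    ref = np.asarray(ref)
    n = len(ref)
    ref_int = ref.astype(np.int64)
    refmin = ref.min()
    refrange = ref.max() - refmin

    windows = sliding_window_view(np.asarray(wave, dtype=np.float64), n)
    out = np.empty(len(windows), dtype=np.int64)
    for lo in range(0, len(windows), chunk):
        w = windows[lo:lo + chunk]
        wmin = w.min(axis=1, keepdims=True)
        wrange = w.max(axis=1, keepdims=True) - wmin
        # Flat windows (zero range) cannot be rescaled: give them zero gain.
        gain = np.divide(refrange, wrange, out=np.zeros_like(wrange), where=wrange != 0)
        scaled = (w - wmin) * gain + refmin
        out[lo:lo + chunk] = np.abs(ref_int - scaled.astype(np.int64)).sum(axis=1)
    return out


# Synthetic check: hide a scaled and shifted copy of one window in random data.
rng = np.random.default_rng(2)
demo_wave = rng.integers(0, 256, size=3000)
demo_ref = demo_wave[500:596] // 2 + 30
demo = sliding_scaled_sad(demo_ref, demo_wave, chunk=1000)
print(int(np.argmin(demo)))
```

In the notebook, `scaled_sads = sliding_scaled_sad(ref_trace, fulltrace.wave)` would be the equivalent call; it returns one more score than the loop above, because the loop stops one window short of the end.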

We want to pick a threshold that results in approximately 255 matches. However it doesn't need to be exactly 255, so don't try too hard:

In [None]:
THRESHOLD = 250
len(np.where(np.asarray(scaled_sads) < THRESHOLD)[0])
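Rather than adjusting `THRESHOLD` by hand, you could sweep a few candidates and keep the one whose match count is closest to 255. This is only a sketch: `pick_threshold` is a hypothetical helper, the candidate range is an arbitrary assumption, and the scores below are synthetic (255 low values in a high background):

```python
import numpy as np


def pick_threshold(sad_scores, target_matches=255, candidates=range(100, 401, 10)):
    """Return (threshold, match_count) whose count is closest to target_matches."""
    scores = np.asarray(sad_scores)
    counts = [(t, int(np.count_nonzero(scores < t))) for t in candidates]
    # min() keeps the first (lowest) threshold among equally good candidates.
    return min(counts, key=lambda tc: abs(tc[1] - target_matches))


# Synthetic scores: a background of 600 with 255 dips between 180 and 219.
demo_scores = np.full(10000, 600)
dip_positions = np.arange(255) * 39
demo_scores[dip_positions] = 180 + (np.arange(255) % 40)

print(pick_threshold(demo_scores))
```

On real data several neighbouring samples can fall below the threshold around a single dip, so the count is only a rough guide, which is in line with not needing exactly 255.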

Now we look at the time delta between successive matches:

In [None]:
found = np.where(np.asarray(scaled_sads) < THRESHOLD)[0]
deltas = []
for i in range(1, len(found)):
    deltas.append(found[i] - found[i-1])
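The loop above is equivalent to `np.diff(found)`. One thing to keep in mind is that adjacent samples around the same dip can all fall below the threshold, which shows up as deltas of 1 or 2. The sketch below shows one possible way to collapse such runs to their best sample before computing the deltas; `match_deltas` and its `min_gap` parameter are hypothetical, and the scores are synthetic:

```python
import numpy as np


def match_deltas(scores, threshold, min_gap=1000):
    """Return (match_indices, deltas) with runs of nearby matches collapsed.

    Within each run of below-threshold indices spaced less than `min_gap`
    apart, only the index with the lowest score is kept.
    """
    scores = np.asarray(scores)
    hits = np.flatnonzero(scores < threshold)
    if hits.size == 0:
        return hits, np.array([], dtype=np.int64)
    # Split wherever two consecutive hits are at least min_gap apart:
    runs = np.split(hits, np.flatnonzero(np.diff(hits) >= min_gap) + 1)
    best = np.array([run[np.argmin(scores[run])] for run in runs])
    return best, np.diff(best)


# Synthetic scores: three dips, each three samples wide, 21000 samples apart.
demo_scores = np.full(60000, 500)
for centre in (10000, 31000, 52000):
    demo_scores[centre - 1:centre + 2] = [150, 100, 140]

raw_hits = np.flatnonzero(demo_scores < 200)
print(np.diff(raw_hits))                     # includes the deltas of 1
print(match_deltas(demo_scores, 200)[1])     # one delta per pair of dips
```

The averaging step later in this section filters on the delta range anyway, so this collapsing is optional; it mostly makes the delta plot easier to read.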

In [None]:
B = figure(plot_width=1800)
B.line(range(len(deltas)), deltas)
show(B)

We're going to assume that deltas of ~22000 cycles are the "correct" matches, and that the others are "bad" matches caused by an untuned threshold and/or SAD reference.

The idea then is to take all "correct" matches and average them to synthesize a better SAD reference.

> **⚠️ SAM4S notes**: recall that the expected range for successive triggers shifts to between 26000 and 28000.

In [None]:
new_ref = np.zeros(96)
used_segments = 0
for i in range(1, len(found)):
    if 20000 < found[i] - found[i-1] < 25000:
        used_segments += 1
        new_ref += fulltrace.wave[found[i]:found[i]+96]
new_ref /= used_segments
print('Used %d SW-matched references to build the new SAD reference.' % used_segments)
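If no pair of matches falls in the expected delta range, `used_segments` stays at zero and the division above yields a reference full of NaNs. As a sketch under that observation, the same averaging can be written with an explicit guard. `average_reference` is a hypothetical helper, and the wave here is synthetic (a noisy sine pattern repeated every 21000 samples, plus one spurious match):

```python
import numpy as np


def average_reference(wave, found, ref_len=96, min_delta=20000, max_delta=25000):
    """Average the windows whose distance to the previous match looks plausible."""
    wave = np.asarray(wave, dtype=np.float64)
    found = np.asarray(found)
    deltas = np.diff(found)
    good = found[1:][(deltas > min_delta) & (deltas < max_delta)]
    good = good[good + ref_len <= len(wave)]     # drop windows running off the end
    if good.size == 0:
        raise ValueError('no matches with a plausible spacing; adjust THRESHOLD')
    segments = np.stack([wave[i:i + ref_len] for i in good])
    return segments.mean(axis=0), int(good.size)


# Synthetic wave: the same 96-sample pattern, plus noise, every 21000 samples.
rng = np.random.default_rng(3)
pattern = 100 + 50 * np.sin(np.linspace(0, 4 * np.pi, 96))
demo_wave = rng.normal(100, 5, size=200_000)
starts = np.arange(1000, 190_000, 21_000)
for s in starts:
    demo_wave[s:s + 96] = pattern + rng.normal(0, 5, size=96)

demo_found = np.sort(np.append(starts, 50_000))   # one spurious match
demo_ref, used = average_reference(demo_wave, demo_found)
print(used, float(np.abs(demo_ref - pattern).mean()))
```

Averaging several well-spaced matches suppresses the noise of any single window, which is why the synthesized reference can end up cleaner than the one it was found with.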

How does the new reference compare to the old one? It looks like just a "little" shift... but if we compute the SAD between these two references, it's quite large!

This explains why our initial attempt at capturing with the FW-trigger-derived reference was totally unsuccessful.

In [None]:
calc_sad(ref_trace, new_ref)

In [None]:
B = figure(plot_width=1800)
B.line(range(len(new_ref)), ref_trace, line_color='black')
B.line(range(len(new_ref)), new_ref, line_color='red')
show(B)

# The SAD attack

We are finally(!) ready to move on to a Husky SAD-triggered capture.

In [None]:
scope.SAD.reference = bytearray(new_ref.astype(np.int8))
scope.trigger.module = 'SAD'

In [None]:
scope.adc.stream_mode = False
scope.adc.segments = 255

# see previous section for how these values were obtained; if your starting_sample != 37, or if your POI are different, calculate these values as per the explanation above:
scope.adc.samples = 69
scope.adc.presamples = 61
scope.adc.offset = 0

As before, we need to first establish a good threshold.

(As before, no recorded traces here since this is to help tune for the acquisition of real traces.)

In [None]:
#scope.SAD.threshold = 165

In [None]:
if TRACES != 'SIMULATED':
    strace = capture_ecc_trace(k, Px, Py)
    if scope.SAD.num_triggers_seen in [255,256]:
        print('Looks good! Got %d triggers. ✅' % scope.SAD.num_triggers_seen)
    else:
        print('❌ Got %d triggers; try again.' % scope.SAD.num_triggers_seen)

Adjust the threshold until you get the right number of triggers. 

Then we check that there are no outliers in their timestamps:

In [None]:
if TRACES != 'SIMULATED':
    ttimes = scope.trigger.get_trigger_times()
    assert 20000 < min(ttimes) < 23000
    assert 20000 < max(ttimes) < 23000
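When one of these assertions fails, it helps to know which intervals are off and by how much. The sketch below assumes, as the bounds above suggest, that `ttimes` holds the intervals between successive triggers; `ttime_outliers` is a hypothetical helper and the data is made up:

```python
import numpy as np


def ttime_outliers(ttimes, lo=20000, hi=23000):
    """Return (index, value) pairs for trigger intervals outside (lo, hi)."""
    t = np.asarray(ttimes)
    bad = np.flatnonzero((t <= lo) | (t >= hi))
    return [(int(i), int(t[i])) for i in bad]


# Made-up intervals: one missed trigger (double interval), one spurious trigger.
demo_ttimes = [21000] * 10
demo_ttimes[4] = 42000
demo_ttimes[7] = 3000

print(ttime_outliers(demo_ttimes))
```

An interval of roughly twice the expected value points to a missed trigger (threshold too tight), while a much shorter one points to a spurious trigger (threshold too loose).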

In [None]:
if TRACES != 'SIMULATED':
    print(min(ttimes), max(ttimes))

We should now be able to reliably capture traces for the attack. Some discarded traces are to be expected, but the majority of captures should succeed.

In [None]:
k = random_k()
kr = regularized_k(k)
hex(k), hex(kr)

> **⚠️ SAM4S notes**: around 150 good traces are required.
> With the STM32 target, it's usually possible to tune the SAD reference and threshold so that virtually every trace capture is a good one.
> 
> With the SAM4S, this seems more difficult; expect to lose a few traces, due to both not enough and too many triggers. You can simply compensate by increasing N.

In [None]:
straces = capture_ecc_traces(k, N=60, step='8', check_sad_triggers=True, check_ttimes=True, trim=13)

In [None]:
len(straces)

In [None]:
# In the case of pre-recorded traces, set k to what was used for those traces:
if TRACES == 'SIMULATED':
    k = straces[0].textin['k']
    kr = regularized_k(k)

In [None]:
sd = calc_sumdata(poi, straces, trim=13)

In [None]:
s = figure(plot_width=2000)

xrange = range(len(sd))
s.line(xrange, sd, line_color="red", line_width=2)

show(s)

If your plot looks good, proceed to guessing all the bits.

Otherwise, you can try a tighter threshold, or simply try a fresh trace acquisition.

In [None]:
attack(poi, straces, 13)

In [None]:
finalattack_wrong_bits = []
for attack_traces in range(1, len(straces)+1):
    print('Attacking with %d traces... ' % attack_traces,  end='')
    wrong_bits = attack(poi, straces[:attack_traces], 13, False)
    if wrong_bits:
        print('failed, %d wrong bits' % len(wrong_bits))
    else:
        print('passed ✅')
    finalattack_wrong_bits.append(len(wrong_bits))

In [None]:
s = figure(plot_width=2000, x_axis_label='Number of traces', y_axis_label='Number of wrong bits')
s.line(range(len(fwtrigger_wrong_bits)), fwtrigger_wrong_bits, color='blue')
s.line(range(len(sadtrigger_wrong_bits)), sadtrigger_wrong_bits, color='red')
s.line(range(len(finalattack_wrong_bits)), finalattack_wrong_bits, color='green', line_width=2)
show(s)

# Conclusion

The last plot shows that the final triggerless SAD-based attack (in green) performs slightly worse than the firmware-triggered attack (in blue).

This may lead one to believe that the SAD triggering isn't as accurate as the firmware-based triggering... Here's an experiment you can do to dispel that possibility: **for a fixed set** of `k`, `Px` and `Py`, run a FW-triggered capture and save the trigger times. Then, repeat that capture (**with the same** `k`, `Px` and `Py`) with SAD triggering (several times if you want), and compare the trigger times with those of the FW capture. If your SAD parameters are good, the trigger times should be 100% identical.
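The comparison step of that experiment could look something like the sketch below. It assumes you saved the trigger times of each capture (for example with `scope.trigger.get_trigger_times()`, as used earlier) into two lists; `compare_trigger_times` is a hypothetical helper and the numbers are made up:

```python
import numpy as np


def compare_trigger_times(fw_ttimes, sad_ttimes):
    """Return (differing_indices, length_difference) for two trigger-time lists.

    Lists of different lengths are compared up to the shorter length.
    """
    fw = np.asarray(fw_ttimes)
    sad = np.asarray(sad_ttimes)
    n = min(len(fw), len(sad))
    diff_idx = np.flatnonzero(fw[:n] != sad[:n])
    return diff_idx.tolist(), len(fw) - len(sad)


# Made-up trigger times for the same k, Px and Py:
fw_times = [21000, 21010, 20990, 21000]
sad_same = list(fw_times)
sad_off = [21000, 21012, 20988, 21000]

print(compare_trigger_times(fw_times, sad_same))   # identical lists
print(compare_trigger_times(fw_times, sad_off))    # differs at two positions
```

An empty index list with a length difference of zero is the "100% identical" outcome described above.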

So why the difference? It could simply be bad luck. Either way, the objectives of this lab have been reached.

If you're so inclined, you can take this attack further:
- it's likely possible to improve these results (better POI?)
- apply the Hidden Number Problem to transform this into a more realistic attack (see the [uecc_part1_trace.ipynb](uecc_part1_trace.ipynb) notebook for a discussion of HNP)

If you completed this notebook and part 1, it's highly recommended to move on to [part 3](uecc_part3_trace_sad.ipynb) to learn about the advantages of *combining* trace and SAD triggering.