# CW-Husky SAD Triggering

Sum of Absolute Differences (SAD) triggering on Husky is quite different from SAD triggering on CW-Pro.

The API has changed, the way that SAD is computed has changed, and there are a lot more options. But fear not, we've also tried to make it as easy as possible to use.

First, a reference trace must be established. This is very target- and compiler-dependent, so we'll pull in pre-compiled AES firmware for the SAM4S target.

**SAD can absolutely be used for different targets, but for learning it's best to stick with this.**

If you're using a different target, you'll have to select a "good" SAD reference, which can be a bit of a black art. You should try! But only after you've learned the basics.

In [None]:
PLATFORM = 'CW308_SAM4S'
SS_VER = "SS_VER_1_1"

In [None]:
import chipwhisperer as cw
scope = cw.scope()

In [None]:
%run ../../Setup_Scripts/Setup_Generic.ipynb

In [None]:
cw.program_target(scope, prog, "../../../firmware/mcu/simpleserial-trace/simpleserial-trace-{}.hex".format(PLATFORM))

In [None]:
reset_target(scope)

Let's start with a regular TIO4-triggered AES capture.

In [None]:
scope.trigger.module = 'basic'
scope.trigger.triggers = 'tio4'

scope.adc.samples = 35000
scope.adc.presamples = 0
scope.adc.segments = 1
scope.adc.bits_per_sample = 8  # SAD is done at 8 bits per sample

scope.gain.db = 22

In [None]:
reftrace = cw.capture_trace(scope, target, bytearray(16), bytearray(16), as_int=True)
assert scope.adc.trig_count == 31864, "Unexpected trigger count. Are you running the correct firmware?"

In [None]:
refstart = 17372

from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.models import Span

output_notebook()
p = figure(width=1800, tools='pan, box_zoom, hover, reset, save')

xrange = list(range(len(reftrace.wave)))
p.line(xrange, reftrace.wave)
p.renderers.extend([Span(location=refstart, dimension='height', line_color='black', line_width=2)])
p.renderers.extend([Span(location=refstart+scope.SAD.sad_reference_length, dimension='height', line_color='black', line_width=2)])

show(p)

With this target, the AES rounds should be fairly obvious. Our goal is to find a SAD reference that matches each of the 10 AES rounds.

We've pre-selected something that looks like a good reference starting at sample 17372; the vertical black lines in the plot above show the SAD reference that we'll use below.

# Model time

It's often easier to check if we have good SAD parameters by running software SAD on our trace. ChipWhisperer includes a software model of its hardware SAD implementation.

To make it easier to go back-and-forth between the software model and Husky's actual FPGA implementation, the software SAD model is configured via the `scope.SAD` object; once the model is configured to our liking, the hardware will be ready to go!

Let's configure `scope.SAD`:

In [None]:
scope.SAD.reference = reftrace.wave[refstart:]
scope.SAD.threshold = 20
scope.SAD.interval_threshold = 20
scope.SAD.multiple_triggers = True
scope.SAD.emode = False

All of these parameters will be explained later; one important setting that we'll explain here is `scope.SAD.multiple_triggers`:

- If `scope.SAD.multiple_triggers` is set, then once the scope is armed, a trigger will be issued whenever the SAD threshold is met.
- If `scope.SAD.multiple_triggers` is not set, then no triggers are issued after the first one, until the scope is re-armed.

Here we want to trigger on each of the 10 AES rounds, and so we set `multiple_triggers`.

Let's run the model and print the results. This will take a while; computing SAD in software across a full trace is very slow! (The model is designed for accuracy, not speed: [it is used for verifying the Verilog implementation](https://github.com/newaetech/chipwhisperer-husky-fpga/blob/sad/fpga/sim/test_sad.py).)

While this runs, sit back and marvel at how Husky's hardware SAD implementation does this in real-time on incoming power samples, in the blink of an eye.

In [None]:
sad_model = cw.SADModelWrapper(scope.SAD)
sad_model.run(reftrace.wave)
print(sad_model)

We should have matched on each of the 10 rounds:

In [None]:
assert sad_model.num_triggers == 10

Now let's see what the SAD scores actually look like.

With SAD, the smaller the number, the better the match: a perfectly matching trace would have a SAD score of 0.

The red horizontal line is our `scope.SAD.threshold` setting. Whenever a SAD score equal to or less than `scope.SAD.threshold` is found, a trigger is issued.
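
The sketch below makes the scoring and threshold rule concrete in plain NumPy. It assumes the interval-matching score that Husky uses, which is described in the SAD Explorer notes further down: each sample outside reference +/- `interval_threshold` adds 1 to the score. It ignores the hardware's triggering latency and does not model how overlapping matches are handled. The function names are our own and are not ChipWhisperer API. `cw.SADModelWrapper` remains the accurate model. `sliding_window_view` needs NumPy 1.20 or newer.

```python
import numpy as np

def interval_sad_scores(trace, reference, interval_threshold):
    """Interval-matching SAD score for every window of `trace`.

    scores[i] counts how many samples of trace[i:i + len(reference)] fall
    outside reference +/- interval_threshold; 0 is a perfect match.
    """
    # Signed 16-bit so that differences of 8-bit samples cannot wrap around.
    trace = np.asarray(trace, dtype=np.int16)
    reference = np.asarray(reference, dtype=np.int16)
    # windows[i] is a view of trace[i:i + len(reference)] (no copy is made).
    windows = np.lib.stride_tricks.sliding_window_view(trace, len(reference))
    misses = np.abs(windows - reference) > interval_threshold
    return misses.sum(axis=1)

def sad_trigger_indices(trace, reference, threshold, interval_threshold,
                        multiple_triggers=True):
    """Sample index at which each trigger would fire (simplified: no latency)."""
    scores = interval_sad_scores(trace, reference, interval_threshold)
    # A window matches when its score is equal to or less than the threshold.
    starts = np.flatnonzero(scores <= threshold)
    # The trigger can only fire once the last reference sample has arrived.
    ends = [int(s) + len(reference) - 1 for s in starts]
    # Without multiple_triggers, only the first match triggers until re-armed.
    return ends if multiple_triggers else ends[:1]

# Usage: embed a random 64-sample "reference" twice in a random trace.
rng = np.random.default_rng(0)
reference = rng.integers(0, 256, 64)
trace = rng.integers(0, 256, 1000)
trace[100:164] = reference
trace[400:464] = reference
print(sad_trigger_indices(trace, reference, threshold=3, interval_threshold=2))
```

Each trigger index is the position of the last matching sample. This is why, on the real hardware, `scope.adc.presamples` is needed to capture the matched segment.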

A properly-tuned SAD trigger should have low peaks that are clearly distinct, and `scope.SAD.threshold` roughly halfway between the lowest peaks and the "rest". That is what you should see here:

In [None]:
p = figure(width=1800)
dat = sad_model.SADS

xrange = list(range(len(dat)))
p.line(xrange, dat)
p.renderers.extend([Span(location=scope.SAD.threshold, dimension='width', line_color='red', line_width=2)])
show(p)

We can also visualize the 10 waveform segments that would be captured with these SAD settings:

In [None]:
from bokeh.palettes import inferno
from bokeh.plotting import figure, show
import itertools

SAMPLES = scope.SAD.sad_reference_length

numplots = len(sad_model.match_times)
xrange = list(range(SAMPLES))
p = figure(width=1800)
colors = itertools.cycle(inferno(numplots))
for i in range(numplots):
    offset = sad_model.match_times[i] - scope.SAD.sad_reference_length
    p.line(xrange, reftrace.wave[offset:offset+SAMPLES], color=next(colors))

p.line(xrange, scope.SAD.reference[:SAMPLES], line_color='grey', line_width=3, line_dash='dotted')
show(p)

# Real SAD Triggering

Now that we have found good SAD parameters, let's run some actual hardware-based SAD triggering with Husky.

`scope.SAD` is already set up, so we don't need to touch any of its settings. We just need to switch the trigger module from `basic` to `SAD` and set our `scope.adc` parameters to capture the number of segments and samples that we want.

Remember that SAD will be triggering on every AES round, so we set `scope.adc.segments = 10`. The capture duration of each segment must be shorter than an AES round, which is roughly `scope.adc.trig_count / 10` = 3186 samples. Otherwise, you will get a "segmenting error".

The SAD module triggers at the **end** of the SAD reference: it can't go back in time to issue the trigger at the start of the reference! However, we can use the `scope.adc.presamples` feature to effectively go back in time and capture the matching power trace.

To line up the segments perfectly with the SAD reference, we need to set `scope.adc.presamples` to the length of the SAD reference (`scope.SAD.sad_reference_length`), plus the SAD module's small fixed triggering latency (`scope.SAD.latency`).

Finally, since we are using segmenting *and* presamples, `scope.adc.samples` must be a multiple of 3 (this is a limitation of the Husky capture mechanism).
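
The arithmetic above fits in a small helper. This function is hypothetical and is not part of ChipWhisperer. The reference length (128) and latency (5) in the usage line are made-up placeholders. On real hardware, read them from `scope.SAD.sad_reference_length` and `scope.SAD.latency`.

```python
def segment_settings(ref_length, latency, extra_samples, trig_count, segments):
    """Hypothetical helper returning (presamples, samples) per segment.

    ref_length, latency and trig_count stand in for
    scope.SAD.sad_reference_length, scope.SAD.latency and scope.adc.trig_count.
    """
    # Presamples reach back over the whole reference plus the trigger latency.
    presamples = ref_length + latency
    samples = presamples + extra_samples
    # With segments and presamples, samples per segment must be a multiple of 3.
    samples -= samples % 3
    if samples < presamples:
        raise ValueError("extra_samples too small to cover the presamples")
    # Each segment must be shorter than one round (roughly trig_count / segments).
    max_samples = trig_count // segments
    if samples >= max_samples:
        raise ValueError(f"{samples} samples per segment exceeds ~{max_samples} "
                         "samples per round; expect a segmenting error")
    return presamples, samples

# Placeholder reference length/latency, NOT Husky's real values.
print(segment_settings(128, 5, 100, trig_count=31864, segments=10))  # (133, 231)
```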

In [None]:
scope.trigger.module = 'SAD'
scope.adc.presamples = scope.SAD.sad_reference_length + scope.SAD.latency
scope.adc.samples = scope.adc.presamples + 100 # let's capture a bit more than the SAD reference
scope.adc.samples -= scope.adc.samples %3 # when using segments with presamples, the number of samples per segment (scope.adc.samples) must be a multiple of 3.
scope.adc.segments = 10

In [None]:
sadtrace = cw.capture_trace(scope, target, bytearray(16), bytearray(16), as_int=True)

`scope.SAD.num_triggers_seen` tells us how many times the SAD module triggered; there should have been 10 triggers, one for each AES round:

In [None]:
assert scope.SAD.num_triggers_seen == 10

Husky also handily logs the time (in ADC clock cycles) between successive triggers. Let's look at those intervals:

In [None]:
ttimes = scope.trigger.get_trigger_times()
print(ttimes)

Each round except for the last one takes exactly the same number of ADC clock cycles, which is not surprising; this gives us confidence that our SAD reference works well as an AES round marker.
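
`get_trigger_times()` reports the *intervals* between successive triggers, so a running sum recovers each trigger's position relative to the first one. This sketch assumes the result is a plain list of integers. The helper names are our own, and the interval values are placeholders, not real Husky measurements.

```python
from itertools import accumulate

def trigger_offsets(deltas):
    """Offset of every trigger relative to the first one, from the intervals
    between successive triggers."""
    return [0] + list(accumulate(deltas))

def regular_except_last(deltas):
    """True if every interval except the last one is identical."""
    return len(deltas) >= 2 and len(set(deltas[:-1])) == 1

# Placeholder intervals for 10 triggers (9 gaps); NOT real measurements.
deltas = [3148] * 8 + [3020]
print(trigger_offsets(deltas)[-1], regular_except_last(deltas))  # 28204 True
```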

These trigger times should be exactly what we obtained from the model:

In [None]:
assert sad_model.match_time_deltas == ttimes

# SAD Explorer

To make it easier to explore and understand the many `scope.SAD` parameters, ChipWhisperer has a new interactive "SAD Explorer" module.

When we launch the SAD Explorer, it will start with the same capture parameters that we have been using.

We provide the explorer with the `scope` and `target` objects, the **full** reference trace (not just our chosen `scope.SAD.reference`: this is so that we can experiment with changing `scope.SAD.reference`), the starting index of the SAD reference that we have chosen, and the maximum number of segments that we wish to capture.

Note that while `scope.SAD` can work with `scope.adc.bits_per_sample` set to either 8 or 12, `SADExplorer` requires it to be 8. It also requires the reference trace to be captured as integers (i.e. `cw.capture_trace(..., as_int=True)`).

Push the "run SAD capture" button that appears after you run the cell below; the plot should briefly flash yellow, then green, then plot the 10 captured segments.

In [None]:
explorer = cw.SADExplorer(scope, target, reftrace.wave, refstart, max_segments=10)

There is a **lot** to play with here. Some basic usage notes:
1. The vertical red lines denote the start and end of the SAD reference.
2. If the plot flashes green, the capture was successful; if it stays red, the capture failed (look at the resulting messages to find out why).
3. The gray area is defined by the SAD reference and `scope.SAD.interval_threshold`. Husky now uses the "interval matching" method defined by [Beckers et al.](https://www.esat.kuleuven.be/cosic/publications/article-2626.pdf):
    - for each incoming sample, if the incoming waveform sample is within the range of the SAD reference sample +/- `scope.SAD.interval_threshold`, the SAD score remains unchanged; otherwise it is increased by 1
    - this effectively splits the old single `scope.SAD.threshold` parameter into *two* separate threshold parameters. While this increases the configuration space, in practice you should find that this makes SAD easier to tune.
    - SAD computations always use the most significant 8 bits of each sample (regardless of the `scope.adc.bits_per_sample` setting); consequently the maximum setting for `scope.SAD.interval_threshold` is 255.
    - the maximum `scope.SAD.threshold` value is less than `scope.SAD.sad_reference_length` (for FPGA resource optimizations). If you find yourself needing a higher value than what's possible, either:
       1. increase `scope.SAD.interval_threshold`;
       2. find a better reference;
       3. shorten the reference via `enabled_samples` and/or `trigger_sample` (which are explained below).      
4. You can turn on legends, shown in the plot and/or in a separate text cell, with or without software-computed SAD scores for the captured segments. The more of these you turn on, the slower the capture gets. Remember that green is good: it means that the capture was successful and that the plot is still being updated.
5. It's possible to exclude arbitrary samples from the SAD computation via the "excluded samples" dialog:
    - you can list samples to exclude like this: "1, 10, 20:30"
    - in the Python API, specify these as: `scope.SAD.enabled_samples[1] = False`; `scope.SAD.enabled_samples[10:20] = [False]*10`
6. It's also possible to shorten the SAD reference and advance the SAD trigger accordingly by reducing `scope.SAD.trigger_sample` arbitrarily; samples after `scope.SAD.trigger_sample` are excluded from the SAD computation. (Turning off samples via "excluded samples" does not advance the trigger.)
7. You can increase `scope.adc.samples` and/or "extra presamples" to explore areas around the chosen reference. (You can't set `scope.adc.presamples` directly: `SADExplorer` sets it automatically to ensure that the reference trace is fully captured.)
8. Use the "show diff" option to plot the *absolute difference* between the captured samples and the reference. This can be particularly useful for tuning the thresholds and deciding whether some samples should be excluded.
9. You can **double** the length of the SAD reference by turning on `scope.SAD.emode`. More on this below.
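
Items 3, 5 and 6 can be sketched together in plain NumPy. These helpers are hypothetical and are not `SADExplorer` or `scope.SAD` internals. Two parts are our assumptions about the exact semantics:

- treating "20:30" as a half-open Python slice;
- ignoring every sample from index `trigger_sample` onwards.

The 12-bit to 8-bit reduction also assumes unsigned integer samples, such as those captured with `as_int=True`.

```python
import numpy as np

def parse_excluded(spec, length):
    """Hypothetical parser for an "excluded samples" string like "1, 10, 20:30".

    Returns a boolean mask in the spirit of scope.SAD.enabled_samples.
    ASSUMPTION: "a:b" behaves like a Python slice (a included, b excluded).
    """
    enabled = np.ones(length, dtype=bool)
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if ":" in item:
            start, stop = (int(x) for x in item.split(":"))
            enabled[start:stop] = False
        else:
            enabled[int(item)] = False
    return enabled

def to_sad_bits(samples, bits_per_sample):
    """Keep the 8 most significant bits of unsigned integer ADC samples."""
    return np.asarray(samples, dtype=np.int32) >> (bits_per_sample - 8)

def masked_score(window, reference, interval_threshold, enabled, trigger_sample):
    """Interval-matching score that skips disabled samples and, by
    ASSUMPTION, every sample at index >= trigger_sample."""
    w = np.asarray(window, dtype=np.int16)[:trigger_sample]
    r = np.asarray(reference, dtype=np.int16)[:trigger_sample]
    misses = np.abs(w - r) > interval_threshold
    return int(np.count_nonzero(misses & enabled[:trigger_sample]))

reference = np.zeros(8, dtype=np.int16)
window = reference.copy()
window[[2, 6]] = 50  # two samples far outside reference +/- 20
print(masked_score(window, reference, 20, parse_excluded("", 8), 8))   # 2
print(masked_score(window, reference, 20, parse_excluded("2", 8), 8))  # 1
print(masked_score(window, reference, 20, parse_excluded("2", 8), 6))  # 0
```

The last call shows the difference between the two mechanisms. Excluding sample 2 only removes it from the score. Lowering `trigger_sample` also drops sample 6, which in the real hardware also moves the trigger earlier.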

### Some ideas of things to try:
1. The first round has a few samples at the start that diverge considerably from the reference; exclude those samples and make the thresholds tighter. The "show diff" option can be very helpful here.
2. Turn on "emode" to use a longer reference.
3. What happens when the thresholds are too loose, or too many samples are excluded?
4. Increase `scope.adc.samples` and `scope.adc.presamples` to hunt around for a better `refstart` nearby (i.e. one which doesn't require samples to be excluded).
5. How small can you make `scope.SAD.trigger_sample` and still reliably capture all 10 rounds? *(be sure to check that the trigger times are still good, i.e. that triggers are not occurring ~randomly!)*
6. Try to find a totally different reference segment that works well.

If you mess things up, you can return to our known good settings with the cell below (run that and then re-run the `SADExplorer` instantiation cell above).

In [None]:
refstart = 17372
scope.SAD.threshold = 30
scope.SAD.interval_threshold = 20
scope.SAD.emode = False
scope.SAD.always_armed = False
scope.SAD.reference = reftrace.wave[refstart:]

scope.adc.samples = 300
scope.adc.segments = 10

## Extended Mode

`scope.SAD.emode` doubles the length of the SAD reference. Hardware SAD is **expensive** (in terms of FPGA resources), so how do we manage to do this? By taking a chance...

When `scope.SAD.emode` is `False`, a power trace that "matches" the reference as per the SAD threshold values will **always** result in the match being detected and a trigger being issued.

When `scope.SAD.emode` is `True`, there is a *(very, very)* small chance that some valid matches are missed.

Considering that SAD triggering *in practice* is probabilistic (noisy traces mean that SAD triggering may not work 100% of the time), it's reasonable to accept the risk of `emode` missing triggers. If you want to fully understand how and why missed triggers can arise, read on...

### Extended Mode Details 
In order for `emode` to miss triggers, there must be some periodicity in the power trace of length `scope.SAD.sad_reference_length / 2` (note that `scope.SAD.sad_reference_length` depends on the `scope.SAD.emode` setting; use the value given when `emode` is `True`).

For example, if `scope.SAD.sad_reference_length` is 512, and the power trace has a repeating pattern that has a period of 256 samples, then `emode` may not work very well; in this scenario you would probably get more reliable SAD triggering with `emode` turned off (alternatively, change `scope.clock.adc_mul` if you're able to).
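
One rough way to check a trace for this kind of periodicity before enabling `emode` is to correlate it with a copy of itself shifted by half the reference length. This heuristic is our own suggestion and is not something ChipWhisperer provides. A value near 1.0 only hints that `emode` may miss triggers; it does not prove it.

```python
import numpy as np

def half_period_similarity(trace, reference_length):
    """Pearson correlation between the trace and itself shifted by
    reference_length // 2 samples (heuristic, not a ChipWhisperer API)."""
    x = np.asarray(trace, dtype=float)
    lag = reference_length // 2
    if not 0 < lag < len(x) - 1:
        raise ValueError("trace too short for this reference length")
    a = x[:-lag] - x[:-lag].mean()
    b = x[lag:] - x[lag:].mean()
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    # A constant trace has no variance; report 0 rather than dividing by zero.
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

rng = np.random.default_rng(1)
periodic = np.tile(rng.integers(0, 256, 256), 20)  # repeats every 256 samples
noise = rng.integers(0, 256, 5120)
# With sad_reference_length = 512 (emode on), the risky period is 256 samples.
print(half_period_similarity(periodic, 512))  # close to 1.0
print(half_period_similarity(noise, 512))     # close to 0.0
```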

Our software SAD model can be used to illustrate how this happens. Let's synthesize a fictional reference and power trace to help illustrate. We'll use one full period of a 512-sample sine wave as a reference:

In [None]:
import numpy as np
sineref = ((np.sin(np.arange(0, np.pi*2, np.pi*2/512))/2 + 0.5)*255).astype(np.uint8)

The fictional incoming power trace is made of random samples; we then insert the reference in two locations:

In [None]:
import random
trace = np.asarray([random.randint(0,255) for _ in range(5000)], dtype=np.uint8)

trace[1000:1512] = sineref
trace[3000:3512] = sineref

Since the reference is contained exactly, the SAD thresholds can be set to minimal values:

In [None]:
scope.SAD.threshold = 2
scope.SAD.interval_threshold = 1
scope.SAD.reference = sineref
scope.SAD.emode = True

In [None]:
sad_model = cw.SADModelWrapper(scope.SAD)
sad_model.run(trace)
print(sad_model)

The model triggers twice, as expected. Because `scope.SAD.emode = True`, the model has a new `uncovered_samples` property which contains two elements.

If an incoming trace that matches the reference started at any of the `sad_model.uncovered_samples` indices, the match would be missed.

Here you can see that the `uncovered_samples` elements are at exactly the halfway point of the incoming sequences that later do result in a trigger, and that is no coincidence: in order to double the length of the SAD reference that is checked, Husky is unable to consider a potential match that starts at the halfway point.

Let's synthesize a special case which **does** result in a missed trigger to illustrate this:

In [None]:
trace = np.asarray([random.randint(0,255) for _ in range(5000)], dtype=np.uint8)
trace[1000:1256] = sineref[:256]
trace[1256:1768] = sineref[:512]

This trace construction contains the first half of the reference at indices [1000:1256], followed immediately by the full reference at indices [1256:1768].

Running the model on this trace shows that the full reference is not caught:

In [None]:
sad_model = cw.SADModelWrapper(scope.SAD, catch_emisses=True)
sad_model.run(trace)
print(sad_model)

Running the model with `catch_emisses=True` runs the model twice: once in a "normal Husky model mode", and then again in a "full SAD mode" which doesn't miss *any* triggers (unlike Husky).

As a result we get both the `uncovered_samples` and `missed_triggers` information.

Printing the `sad_model` object gives the summarized results; the specific results of the normal Husky model are at `sad_model.sad`, and the results of the full SAD model are at `sad_model.fsad`.

You can play around with inserting segments in different ways to get a better feel for the situations which result in missed triggers.

The other "gotcha" of `emode` to be aware of is that at the halfway point of the full reference, the current SAD score must be less than **half** of `scope.SAD.threshold` in order for a potential match to go ahead.

To illustrate, first we run the model with a single sample outside of `scope.SAD.interval_threshold`, which results in a match:

In [None]:
trace = np.asarray([random.randint(0,255) for _ in range(5000)], dtype=np.uint8)
trace[1000:1512] = sineref
trace[1032] += 2

scope.SAD.threshold = 4

In [None]:
sad_model = cw.SADModelWrapper(scope.SAD, catch_emisses=False)
sad_model.run(trace)
print(sad_model)

assert sad_model.num_triggers == 1
assert sad_model.sad.match_scores == [1]

But if we move two additional samples outside of `scope.SAD.interval_threshold`, no match occurs. This is so even though the number of samples exceeding the threshold (3) is less than `scope.SAD.threshold` (4). It happens because all of the exceeding samples are in the first half of the pattern:

In [None]:
trace[1040] += 2
trace[1050] += 2

sad_model.run(trace)
print(sad_model)

assert sad_model.num_triggers == 0

The solution is to increase `scope.SAD.threshold` to at least twice the number of offending samples.

Because we've changed `scope.SAD`, we need to re-instantiate the model:

In [None]:
scope.SAD.threshold = 8

sad_model = cw.SADModelWrapper(scope.SAD, catch_emisses=False)
sad_model.run(trace)
print(sad_model)

assert sad_model.num_triggers == 1
assert sad_model.sad.match_scores == [3]

Go back to the `SADExplorer` to see how this can happen with real traces: the first AES round is an excellent example of this since it has several diverging samples early on in the reference pattern!
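
The two `emode` conditions demonstrated above can be written as a toy predicate:

- the full score must be equal to or less than `scope.SAD.threshold`;
- the score reached by the halfway point must be less than half of it.

This is a simplified reading of the text, not Husky's actual logic. Whether the halfway comparison is strict is an assumption, and the examples above do not distinguish between the two. Miss indices are positions within the reference, so `trace[1032]` in the example corresponds to index 32.

```python
def emode_match(miss_indices, reference_length, threshold):
    """Toy model of the two emode conditions described in the text.

    miss_indices: positions (within the reference) of samples that fall
    outside reference +/- interval_threshold.
    """
    half = reference_length // 2
    first_half_score = sum(1 for i in miss_indices if i < half)
    total_score = len(miss_indices)
    # ASSUMPTION: strict "less than half" at the halfway point, as worded above.
    return first_half_score < threshold / 2 and total_score <= threshold

print(emode_match([32], 512, threshold=4))             # True: first example
print(emode_match([32, 40, 50], 512, threshold=4))     # False: 3 early misses
print(emode_match([32, 40, 50], 512, threshold=8))     # True: threshold raised to 8
print(emode_match([300, 310, 320], 512, threshold=4))  # True: same misses, second half
```

The last line shows the asymmetry: under this model, the same three misses are tolerated with a threshold of 4 as long as they fall in the second half of the reference.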

# ECC target

Next we show how to tune SAD to work against [micro-ecc](https://github.com/kmackay/micro-ecc).

Again we use pre-compiled firmware to have known good settings.

In [None]:
cw.program_target(scope, prog, "../../../firmware/mcu/simpleserial-ecc-notrace/simpleserial-ecc-fwtrigger-{}.hex".format(PLATFORM))
reset_target(scope)

In [None]:
import time

target.simpleserial_write('i', b'')
time.sleep(0.1)
print(target.read())

In [None]:
TRACES = 'HARDWARE'
%run "../../courses/sca205/ECC_capture.ipynb"

In [None]:
scope.trigger.module = 'basic'
scope.trigger.triggers = 'tio4'

scope.adc.stream_mode = True
scope.adc.presamples = 0
scope.adc.samples = int(16e6)
scope.adc.segments = 1

The micro-ecc target firmware is not constant time; in order to have a known good SAD reference, we stick to these parameters:

In [None]:
k = 0x526a13ac66957d13622a9d872ff9302c47d6393237efaa4c0fc92c08febc5d2c
Px = 0xe479bb253840235126427b2cdff9a862601e1577c2abbc274d4b5372a45656ec
Py = 0x561fbeb30f276006b91ba1b81df8e3f3edf40f8ea000593b3a622610af02a50

In [None]:
# try different k / Px / Py!
#k = random_k()
#Px, Py = new_point()

In [None]:
reftrace = capture_ecc_trace(k, Px, Py)
scope.errors.clear()
print(scope.adc.trig_count)
assert scope.adc.trig_count == 15788560

In [None]:
import holoviews as hv
from holoviews.operation import decimate
from holoviews.operation.datashader import datashade
hv.extension('bokeh')
datashade(hv.Curve(reftrace.wave)).opts(width=2000, height=900)

The target firmware triggers at the start of each bit being processed by the point multiplication algorithm. Let's peek at how long each bit takes:

In [None]:
ttimes = scope.trigger.get_trigger_times()

In [None]:
print(ttimes[:10])
print('Min: %d' % min(ttimes))
print('Max: %d' % max(ttimes))

# sanity check:
assert min(ttimes) == 105024 and max(ttimes) == 110900

Finding a good SAD reference for this target is much harder than it was for AES. The following reference, found through much trial and error, should work:

In [None]:
scope.SAD.emode = True
refstart = 1028600

Let's plot a subset of the reference trace that covers about 3 bits somewhere in the middle of the operation:

In [None]:
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.models import Span

start = int(1e6)
samples = 300000
output_notebook()
p = figure(width=1800, tools='pan, box_zoom, hover, reset, save')

xrange = list(range(samples))
p.line(xrange, reftrace.wave[start:start+samples])

p.renderers.extend([Span(location=refstart-start, dimension='height', line_color='black', line_width=1)])
p.renderers.extend([Span(location=refstart-start+scope.SAD.sad_reference_length, dimension='height', line_color='black', line_width=1)])

show(p)

In [None]:
scope.trigger.module = 'SAD'
scope.SAD.threshold = 15
scope.SAD.interval_threshold = 20
scope.SAD.emode = True
scope.SAD.always_armed = False
scope.SAD.multiple_triggers = True
scope.SAD.reference = reftrace.wave[refstart:]
scope.adc.stream_mode = False

Due to Husky's sample storage limitations, you'll have to reduce `scope.adc.segments` to 187 (or less).

However, by turning on `scope.SAD.always_armed`, you can still see whether the expected number of SAD matches (255) occur.

You can also verify that the range of trigger times is in line with what was observed above.

This shows how a custom capture function can be provided to `SADExplorer`:

In [None]:
explorer = cw.SADExplorer(scope, target, reftrace.wave, refstart, max_segments=255, capture_function=lambda: capture_ecc_trace(k, Px, Py))

## Things to try:
1. Turn on `scope.SAD.always_armed` to make sure that 255 matches are seen.
2. If you turn off `scope.SAD.emode`, you can capture all 255 segments.
3. How few samples are needed to reliably match? (reduce `scope.SAD.trigger_sample`).
4. We used a specific k/Px/Py to find our SAD reference. If you change any/all of k/Px/Py, does that reference still work?
5. Can you find another suitable SAD reference?
6. Can you use this towards an attack against micro-ecc? (see the [sca205](https://github.com/newaetech/chipwhisperer-jupyter/tree/ches2024sad/courses/sca205) series of notebooks to see how such an attack can be done)

In [None]:
# to use different k / Px / Py:
k = random_k()
Px, Py = new_point()

*Known good parameters are in the (hidden) cell below:*

In [None]:
refstart = 1028600

# re-acquire reftrace if these changed!
k = 0x526a13ac66957d13622a9d872ff9302c47d6393237efaa4c0fc92c08febc5d2c
Px = 0xe479bb253840235126427b2cdff9a862601e1577c2abbc274d4b5372a45656ec
Py = 0x561fbeb30f276006b91ba1b81df8e3f3edf40f8ea000593b3a622610af02a50

scope.SAD.threshold = 15
scope.SAD.interval_threshold = 20
scope.SAD.emode = True
scope.SAD.always_armed = False
scope.SAD.reference = reftrace.wave[refstart:]

scope.adc.stream_mode = False
scope.adc.samples = 450
scope.adc.segments = 200

# Next Steps

The beauty of SAD is that it can be used with any target. We used a very specific target in this notebook because it's easier to teach SAD with known good parameters... but the next step is for **you** to gain experience with finding good SAD parameters from scratch, so go explore!