# Breaking Hardware ECC on CW305 FPGA, part 4

This builds on the CW305_ECC part 1, 2 and 3 notebooks; be sure to digest them before starting this one.

In this notebook, we study a fourth and final modification to the target Verilog source code to reduce the side-channel leakage.

This modification will reveal some side-channel leakage which had not been noted previously.

The tutorial was developed with a CW-Pro with the CW305 100t target FPGA; the observations made in the attack's development should be accurate if you're using the same, but other combinations of CW-Pro / CW-Lite / CW-Husky / CW305 100t / 35t / CW312T-A35 may behave somewhat differently (some definitely do!).

## Setup

See CW305_ECC_part1.ipynb for explanations which are not repeated here.

In [None]:
#PLATFORM = 'CWLITE'
#PLATFORM = 'CWPRO'
PLATFORM = 'CWHUSKY'

In [None]:
TARGET_PLATFORM = 'CW305_100t'
#TARGET_PLATFORM = 'CW305_35t'
#TARGET_PLATFORM = 'CW312T_A35'

In [None]:
TRACES = 'HARDWARE' # if you have the required capture+target hardware: capture actual traces
#TRACES = 'SIMULATED' # if you don't have capture+target hardware: use pre-captured traces (these traces were obtained using CW-Husky with a CW305_100t)

In [None]:
import chipwhisperer as cw
import time

if TRACES != 'SIMULATED':
    scope = cw.scope()
    if TARGET_PLATFORM == 'CW312T_A35':
        scope.io.hs2 = 'clkgen'
        fpga_id = 'cw312t_a35'
        platform = 'ss2'
    else:
        scope.io.hs2 = "disabled"
        platform = 'cw305'
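        if TARGET_PLATFORM == 'CW305_100t':
            fpga_id = '100t'
        elif TARGET_PLATFORM == 'CW305_35t':
            fpga_id = '35t'
        else:
            # fail early instead of hitting an undefined fpga_id below
            raise ValueError("Unsupported TARGET_PLATFORM: %s" % TARGET_PLATFORM)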

    target = cw.target(scope, cw.targets.CW305_ECC, force=False, fpga_id=fpga_id, platform=platform)
    
    # ensure ADC is locked:
    time.sleep(0.5)
    scope.clock.reset_adc()
    assert (scope.clock.adc_locked), "ADC failed to lock"

%run "CW305_ECC_setup.ipynb"

Occasionally the ADC will fail to lock on the first try; when that happens, the above assertion will fail (and on the CW-Lite, the red LED will be on). Simply re-running the above cell should fix things.

# Attempt #4

While attempt #3 was shown to be effective, we will now compare it with a much more expensive approach: a complete duplication of the target core.

Since the side-channel leakage originates from the differences when a bit of $k$ is 0 or 1, and since the time required to process each bit of $k$ is always the same, it stands to reason that instantiating a second copy of the target core which processes the inverse of $k$ in parallel with (at the same time as) the original core could also be an effective (albeit expensive!) countermeasure.
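
To make the intuition concrete, here is a toy model (an illustration only, not a model of the actual CW305 design). It assumes that each core contributes a fixed amount of power per processed bit, depending only on whether that bit is 0 or 1. The constants `LEAK_ZERO` and `LEAK_ONE` and all function names are made up for this sketch.

```python
# Toy model: per-bit power contribution of one core (arbitrary units, hypothetical values).
LEAK_ZERO = 1.0
LEAK_ONE = 1.5

def bits_of(k, nbits):
    """Return the bits of k, most significant first."""
    return [(k >> i) & 1 for i in reversed(range(nbits))]

def core_leakage(k, nbits):
    """Idealized leakage of a single core: one value per processed bit."""
    return [LEAK_ONE if b else LEAK_ZERO for b in bits_of(k, nbits)]

def duplicated_leakage(k, nbits):
    """Two cores in parallel: one processes k, the other its bitwise inverse."""
    k_inv = ~k & ((1 << nbits) - 1)
    return [a + b for a, b in zip(core_leakage(k, nbits), core_leakage(k_inv, nbits))]

k_demo = 0b11010010
single = core_leakage(k_demo, 8)
dual = duplicated_leakage(k_demo, 8)
print(single)  # varies with the bits of k_demo
print(dual)    # identical for every bit in this idealized model
```

In this idealized model the combined leakage is the same for every bit. Real hardware need not behave this ideally, which is what the measurements below are for.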

In [None]:
change_bitfile('attempt4')

In [None]:
k = 0xffffffffffffffffffffffffffffffff00000000000000000000000000000000
traces = get_traces(1, k, 'part4_1', full=True)

Let's begin by looking at the raw difference between ones and zeros, as we did for the other attempts:

In [None]:
samples = 4204
trace = traces[0]
avg_ones = np.zeros(samples)
for start in cycles[1:128]:
    avg_ones += trace.wave[start:start+samples]
avg_ones /= len(cycles[1:128])  # 127 segments are summed above, so divide by 127 rather than 128

avg_zeros = np.zeros(samples)
for start in cycles[128:256]:
    avg_zeros += trace.wave[start:start+samples]
avg_zeros /= 128

In [None]:
from bokeh.plotting import figure, show
from bokeh.resources import INLINE
from bokeh.io import push_notebook, output_notebook
from ipywidgets import interact, Layout

output_notebook(INLINE)
s = figure(plot_width=2000)

xrange = range(len(avg_ones))
s.line(xrange, avg_ones - avg_zeros, line_color="orange")

In [None]:
show(s)

This is a **drastically** different picture from the one we are used to!

Let's again compare it to the leakage from the original target bitfile:

In [None]:
change_bitfile('original')

In [None]:
otraces = get_traces(1, k, 'part1_1', full=True)
otrace = otraces[0]

oavg_ones = np.zeros(samples)
for start in cycles[1:128]:
    oavg_ones += otrace.wave[start:start+samples]
oavg_ones /= len(cycles[1:128])  # 127 segments are summed above, so divide by 127 rather than 128

oavg_zeros = np.zeros(samples)
for start in cycles[128:256]:
    oavg_zeros += otrace.wave[start:start+samples]
oavg_zeros /= 128

In [None]:
from bokeh.models import Legend, LegendItem

diff = figure(plot_width=2000)

odiff = oavg_ones - oavg_zeros
newdiff = avg_ones - avg_zeros

xrange = range(len(newdiff))
O = diff.line(xrange, odiff, line_color="black")
N = diff.line(xrange, newdiff, line_color="orange")

legend = Legend(items=[
    LegendItem(label='original 0/1 difference', renderers=[O]),
    LegendItem(label='new 0/1 difference', renderers=[N]),
])
diff.add_layout(legend)


In [None]:
show(diff)

Zooming in around cycles 6 and 4202 shows that leakage still peaks around those clock cycles, but now we find numerous other peaks in addition to the original peaks.

If we amplify the original leakage, we find that the new leakage observed with attempt 4 also appears with the original bitfile; it's just that these leakage points are much weaker there:

(This is evident if you zoom in around many of the larger orange peaks, for example cycles 1600 or 3100.)

In [None]:
diff = figure(plot_width=2000)

scale = np.max(newdiff) / np.max(odiff[100:4000])

xrange = range(len(newdiff))
diff.line(xrange, odiff*scale, line_color="black")
diff.line(xrange, newdiff, line_color="orange")

In [None]:
show(diff)

Let's visualize the DoM distinguisher, first using only our original DoM markers (cycles 6, 7, 4202 and 4203):

In [None]:
change_bitfile('attempt4')

In [None]:
k = 0x0000ffffffffff000000000000ffff00aaaa0000cccc00001111000033330000
traces = get_traces(30, k, 'part4_2', full=False, samples_per_segment=64)

In [None]:
poi = [4202, -4203, -6, 7]

In [None]:
def update_corrected_plot(no_traces):
    SSC.data_source.data['y'] = get_corrected_sums(traces[:no_traces], poi)
    push_notebook()

In [None]:
SC = figure(plot_width=1200, x_axis_label='k bit index', y_axis_label='D')

xrange = range(len(cycles)-1)
sums = get_corrected_sums(traces[:15], poi)
SSC = SC.line(xrange, sums, line_color='black')
SC.xaxis.axis_label_text_font_size = '20pt'
SC.yaxis.axis_label_text_font_size = '20pt'
SC.xaxis.major_label_text_font_size = '14pt'
SC.yaxis.major_label_text_font_size = '14pt'
SC.title.text_font_size = '20pt'

In [None]:
show(SC, notebook_handle=True)

In [None]:
interact(update_corrected_plot, no_traces=(1, len(traces)))

Visually, the distinguisher does seem weaker than with the original target.

We'll go through our usual attack, but before we get to that let's see what happens if we add the newly identified leakage markers. There seem to be *a lot* of these so let's use an automated process to extract them.

Let's re-center our 1 vs 0 plot, and pick `POS_THRESHOLD` and `NEG_THRESHOLD` as the minimum thresholds for selecting the largest peaks for our new list of markers.

In [None]:
from bokeh.models import Span

POS_THRESHOLD = 0.045
NEG_THRESHOLD = -0.045
    
# in case samples were recorded as ints, translate result to make it as though they were floats, so that the *THRESHOLDS can cover both cases:
if 'int' in str(type(traces[0].wave[0])):
    if PLATFORM == 'CWPRO':
        div = 2**10
    # infer whether trace was collected with 8 or 12 bits per sample:
    elif max(abs(traces[0].wave)) > 255:
        div = 2**12
    else:
        div = 2**8
else:
    div = 1

diff = figure(plot_width=2000)

xrange = range(len(avg_ones))
diff.line(xrange, (newdiff - np.average(newdiff))/div, line_color="red")

pos_threshold = Span(location=POS_THRESHOLD, dimension='width', line_color='black')
neg_threshold = Span(location=NEG_THRESHOLD, dimension='width', line_color='black')
diff.renderers.extend([pos_threshold, neg_threshold])

In [None]:
show(diff)

The default values for `POS_THRESHOLD` and `NEG_THRESHOLD` should work well for CW-Husky with a 100t target; you may need to adjust them if you end up with too few or too many markers, but this is not an exact science.

In [None]:
avg = (newdiff - np.average(newdiff))/div
poi = list(np.where(avg > POS_THRESHOLD)[0]) + list(-np.where(avg < NEG_THRESHOLD)[0])

assert len(poi) > 100 and len(poi) < 400, "Got %d markers; goal is >100 and <400. Tweak POS_THRESHOLD and NEG_THRESHOLD until this passes." % len(poi)

In [None]:
if TRACES == 'HARDWARE':
    num_traces = 30
else:
    num_traces = 1
# need to acquire a full trace to use all the POIs; we can show some results with a single trace:
traces = get_traces(num_traces, k, 'part4_3', full=True)

In [None]:
from bokeh.models import Label

SCnew = figure(plot_width=1200, x_axis_label='k bit index', y_axis_label='D')

xrange = range(len(cycles)-1)
sums = get_corrected_sums([traces[0]], poi)
SSCnew = SCnew.line(xrange, sums, line_color='black')
SCnew.xaxis.axis_label_text_font_size = '20pt'
SCnew.yaxis.axis_label_text_font_size = '20pt'
SCnew.xaxis.major_label_text_font_size = '14pt'
SCnew.yaxis.major_label_text_font_size = '14pt'
SCnew.title.text_font_size = '20pt'

k_text = Label(x=5, y=-10, text='k = {16 zeros, 40 ones, ...}')

SCnew.add_layout(k_text)

In [None]:
show(SCnew)

This is a staggering result: from a **single trace**, we get an astonishingly clear distinguisher which coincides with the location of the leading one bit of $k$.

The source of this new leakage can be readily found with a quick look at a simulation waveform (return to part 1 for how to do this): we find that intermediate results written to the `bram_rz` target memory are `256'd1` (255 zeros followed by a single 1) as long as the target is processing leading zeros; when the first 1 is encountered, the data written to `bram_rz` changes to random-looking data with a Hamming weight of around 128.
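
As a quick numeric illustration of why this difference is so visible, compare the Hamming weight of `256'd1` with that of random 256-bit values. This sketch uses Python's `random` module with an arbitrary seed purely as a stand-in for the "random-looking" intermediate results; it does not reproduce the actual values written to `bram_rz`.

```python
import random

def hamming_weight(x):
    """Number of bits set in a non-negative integer."""
    return bin(x).count("1")

rng = random.Random(0)  # arbitrary seed, only for repeatability
random_words = [rng.getrandbits(256) for _ in range(1000)]
avg_hw = sum(hamming_weight(w) for w in random_words) / len(random_words)

print("HW of 256'd1:", hamming_weight(1))
print("average HW of random 256-bit words: %.1f" % avg_hw)
```

A write that flips from a Hamming weight of 1 to a Hamming weight near 128 is a large change in switching activity, which is consistent with the sharp step seen in the plot above.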

Go back and play with different values of $k$ to confirm that this is what is happening.

In the original core, only the leading one is leaked -- you can confirm this by re-running the capture above with the original bitfile (but keeping the new expanded list of markers).

However, with attempt #4, the leading zero is **also** leaked (since that ends up being the leading one for the second core). **This is a great example of the unintended consequences of countermeasures!**

None of the other bits of $k$ are leaked from this marker, so on the surface these new markers may appear less useful than the ones we had used until now. But the result above suggests that this marker could be 100% reliable at finding the leading 0 and 1 from a single trace, which avoids having to deal with bad guesses and the throwing away of guesses for which we do not have sufficient confidence.

So it's possible that an improved attack could be built from this. But for the sake of finishing our comparison with our three other countermeasure attempts (and the original design), we will continue here with our original attack.

Before doing so, let's have a closer look at the DoM components. Skip this if you're using pre-recorded traces, since the full set of traces it needs has been omitted due to space constraints:

In [None]:
if TRACES == 'HARDWARE':
    poicomponents = figure(plot_width=1200, x_axis_label='k bit index', y_axis_label='D')

    xrange = range(len(cycles)-1)
    poi = [4202, -4203, -6, 7]

    sumsall = get_corrected_sums(traces, [4202, -4203, -6, 7])
    sumscomp = get_corrected_sums(traces, [-4202, 4203, -6, 7])
    sums6 = get_corrected_sums(traces, [-6, 7])
    sums4202 = get_corrected_sums(traces, [4202, -4203])

    poicomponents.line(xrange, sumsall, line_color='black')
    poicomponents.line(xrange, sumscomp, line_color='red')
    poicomponents.line(xrange, sums6, line_color='blue', line_width=2)
    poicomponents.line(xrange, sums4202, line_color='orange')
    poicomponents.xaxis.axis_label_text_font_size = '20pt'
    poicomponents.yaxis.axis_label_text_font_size = '20pt'
    poicomponents.xaxis.major_label_text_font_size = '14pt'
    poicomponents.yaxis.major_label_text_font_size = '14pt'
    poicomponents.title.text_font_size = '20pt'

    show(poicomponents)

This plot shows the countermeasure's effectiveness: while the leakage at cycles 6-7 is preserved (blue curve), the leakage at cycles 4202-4203 (orange curve) now only reveals the leading 1.

If we combine all the markers (black curve), the leading 1 markers effectively cancel each other out. This could be addressed by complementing the 4202-4203 component, but this adds noise to the rest of the measurements (red curve).

It appears we are better off using only the leakage from cycles 6-7 for our attack.
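
For reference, the signed marker lists used above (for example `[-6, 7]` or `[4202, -4203]`) can be read as follows: a positive marker adds the sample at that offset and a negative marker subtracts it. `get_corrected_sums` is defined in `CW305_ECC_setup.ipynb` and is not reproduced here; the sketch below is only a simplified stand-in under that assumption (single trace, no averaging, no correction step), using the hypothetical name `signed_poi_sums` and synthetic data.

```python
import numpy as np

def signed_poi_sums(wave, starts, poi):
    """For each bit window, add the samples at positive markers and subtract
    the samples at negative markers (marker = offset from the window start).
    Offset 0 cannot carry a sign in this convention and is treated as negative."""
    sums = []
    for start in starts:
        total = 0.0
        for p in poi:
            sign = 1.0 if p > 0 else -1.0
            total += sign * wave[start + abs(p)]
        sums.append(total)
    return np.array(sums)

# Synthetic trace: 4 windows of 10 samples. Windows 1 and 3 stand for a "1" bit,
# which raises sample 3 and lowers sample 6 (made-up leakage).
window = 10
wave = np.zeros(4 * window)
for bit_index in (1, 3):
    wave[bit_index * window + 3] += 0.2
    wave[bit_index * window + 6] -= 0.2

starts = [i * window for i in range(4)]
demo_sums = signed_poi_sums(wave, starts, poi=[3, -6])
flipped_sums = signed_poi_sums(wave, starts, poi=[-3, 6])
print(demo_sums)     # larger for windows 1 and 3
print(flipped_sums)  # same magnitudes with the opposite sign
```

Flipping the signs of a marker pair, as done for the red curve, negates that pair's contribution to the sum.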

Some small tweaks are required for our guessing methodology:
1. there is now a single guessing threshold (not two)
2. we can't guess the first bit: we will leave it unknown and end with 4 possible values of $k$ for each guess (instead of 2).
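
These two tweaks can be sketched as follows. `poi_guess` and `check_guess` come from the setup notebook and are not shown here, so `guess_bits` and `candidates` below are hypothetical helpers. The polarity (a sum above the threshold is read as a 1) is an assumption, and since this notebook does not restate which bit was already unknown in the earlier parts, the demo arbitrarily uses the last one.

```python
from itertools import product

def guess_bits(sums, threshold):
    """Classify each per-bit sum against a single threshold.
    The first bit is left unknown (None). Assumes sum > threshold means 1."""
    bits = [1 if s > threshold else 0 for s in sums]
    bits[0] = None
    return bits

def candidates(bits):
    """Enumerate every value of k consistent with a guess (MSB first),
    trying both 0 and 1 for each unknown bit."""
    unknown = [i for i, b in enumerate(bits) if b is None]
    for fill in product((0, 1), repeat=len(unknown)):
        trial = list(bits)
        for i, b in zip(unknown, fill):
            trial[i] = b
        yield int("".join(str(b) for b in trial), 2)

demo_sums = [0.3, 0.9, 0.1, 0.8, 0.2]   # made-up values
guess = guess_bits(demo_sums, threshold=0.5)
guess[-1] = None                        # pretend one more bit is unknown
print(guess)
print(sorted(candidates(guess)))
```

With two unknown bits there are four candidate values, matching the count given in the list above.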

In [None]:
if TRACES == 'SIMULATED':
    traces = get_traces(30, k, 'part4_4', full=False, samples_per_segment=64)

In [None]:
poi = [-6, 7]
sums = get_corrected_sums(traces, poi)

poi_init_threshold = None
poi_reg_threshold = (np.average(sums[104:119]) - np.average(sums[56:103]))/2 + np.average(sums[56:103])

print('threshold: %3.2f' % poi_reg_threshold)

attempt4thresholds = [poi_init_threshold, poi_reg_threshold]

And we carry out our usual sanity check, to see that we correctly guess $k$ when multiple traces are averaged:

In [None]:
sums = get_corrected_sums(traces, poi)
guess = poi_guess(sums, attempt4thresholds)
print("DoM: %s" % check_guess(guess, k)[0])

Now we can finally see how many errors we make on single-trace attacks, on average:

In [None]:
traces = get_traces(100, k, 'part4_5', randomize_k=True, full=False, samples_per_segment=64)

In [None]:
wrong_bits = []
for trace in traces:
    sums = get_corrected_sums([trace], poi)
    guess = poi_guess(sums, attempt4thresholds)
    wrong_bits.append(check_guess(guess, trace.textin['k'])[1])

print('Average wrong bits per trace: %f' % np.average(wrong_bits))
print('Minimum wrong bits per trace: %f' % min(wrong_bits))
print('Maximum wrong bits per trace: %f' % max(wrong_bits))

attempt4_average_wrong_bits = np.average(wrong_bits)

Despite its flaws, this countermeasure is effective against our attack (which, as noted above, could be improved for this target): the average number of wrong bits per single-trace guess should be almost as high as what we obtained with attempt #3 in part 3 of this series.

# The Hidden Number Problem

As in part 3, we conclude by measuring the number of traces with sufficient and good consecutive guesses, using segmented traces for efficiency.

Because of storage constraints, these traces are not saved, so you'll need the required hardware to run this.

In [None]:
trace_segments = get_trace_segments(N=5000, poi=poi, randomize_k=True, husky_timed_segments=True)

In [None]:
consecutives(trace_segments=trace_segments, poi=poi, distance_threshold=0.91, thresholds=attempt4thresholds)

Some adjustment of `distance_threshold` may be required; you should find substantially fewer good traces than with the original target in part 2, but more than with attempt #3.
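
The `consecutives` helper used above comes from the setup notebook and its internals are not shown here. Purely as an illustration of what "consecutive good guesses" can mean, the sketch below marks a guess as confident when its sum lies far enough from the decision threshold, then finds the longest run of confident guesses. The function name, the margin-based confidence rule and the numbers are all assumptions for this example, not the notebook's actual criterion.

```python
def longest_confident_run(sums, threshold, margin):
    """Return (start, length) of the longest run of consecutive sums whose
    distance from `threshold` is at least `margin`."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, s in enumerate(sums):
        if abs(s - threshold) >= margin:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len

demo_sums = [0.9, 0.55, 0.1, 0.95, 0.05, 0.9, 0.45, 0.1]  # made-up values
demo_run = longest_confident_run(demo_sums, threshold=0.5, margin=0.3)
print(demo_run)
```

Under this reading, a stricter margin yields fewer but more trustworthy runs, which is the kind of trade-off being tuned above.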

# Conclusion

In this part we learned more about the unintended effects of countermeasures.

Attempt #4 is much more expensive than attempt #3, yet it performs worse against our attack, **and** it introduces additional leakage which could be leveraged by a different attack.

In part 5 we'll take a look at what TVLA can tell us about our target.