# Breaking Hardware ECC on CW305 FPGA, part 2

This builds on CW305_ECC.ipynb; be sure to digest that one first before starting this one.

In this notebook, we improve the original attack and show that the Difference of Means (DoM) approach can work better than originally thought, with some small tweaks.

The tutorial was developed with a CW-Pro with the 100t FPGA; the observations made in the attack's development should be accurate if you're using the same, but other combinations of CW-Pro / CW-Lite / CW-Husky / 100t / 35t may behave somewhat differently.

## Setup

See CW305_ECC.ipynb for explanations which are not repeated here.

In [None]:
#PLATFORM = 'CWLITE'
PLATFORM = 'CWPRO'
#PLATFORM = 'CWHUSKY'

In [None]:
import chipwhisperer as cw
scope = cw.scope()
target = cw.target(scope, cw.targets.CW305_ECC, fpga_id='100t', force=False) # or fpga_id='35t', as appropriate

In [None]:
%run "CW305_ECC_setup.ipynb"

In [None]:
change_bitfile('original')

In [None]:
# ensure ADC is locked:
scope.clock.reset_adc()
assert (scope.clock.adc_locked), "ADC failed to lock"

Occasionally the ADC will fail to lock on the first try; when that happens, the above assertion will fail (and on the CW-Lite, the red LED will be on). Re-running the above cell should fix things.

## Trace Capture
We start just like we did in the first part of this tutorial, by using a scalar for which we can very easily distinguish ones from zeros. Remember that k is the secret that we want to be able to retrieve with our side-channel attack.

In [None]:
k = 0xffffffffffffffffffffffffffffffff00000000000000000000000000000000
traces = get_traces(20, k)

In the first part, we omitted cycle 4203; including it improves the results:

In [None]:
poi = [4202, -4203, -6, 7]

In [None]:
def update_plot(no_traces):
    SS.data_source.data['y'] = get_sums(traces[:no_traces], poi)
    push_notebook()

In [None]:
from bokeh.plotting import figure, show
from bokeh.resources import INLINE
from bokeh.io import push_notebook, output_notebook
from ipywidgets import interact, Layout

output_notebook(INLINE)

S = figure(plot_width=1200, x_axis_label='k bit index', y_axis_label='D')

xrange = range(len(cycles))
sums = get_sums(traces, poi)
SS = S.line(xrange, sums, line_color='black')
S.xaxis.axis_label_text_font_size = '20pt'
S.yaxis.axis_label_text_font_size = '20pt'
S.xaxis.major_label_text_font_size = '14pt'
S.yaxis.major_label_text_font_size = '14pt'
S.title.text_font_size = '20pt'

In [None]:
show(S, notebook_handle=True)

In [None]:
interact(update_plot, no_traces=(1, len(traces)))

In the first part of the tutorial, we learned that the leakage from cycles 6 and 7 during the processing of bit $i$ is actually linked to the value of bit $i-1$.

The correlation attack accounted for this, but the difference of means attack did not! Let's correct that and see the effect on the results:

In [None]:
def update_corrected_plot(no_traces):
    SSC.data_source.data['y'] = get_corrected_sums(traces[:no_traces], poi)
    push_notebook()

In [None]:
SC = figure(plot_width=1200, x_axis_label='k bit index', y_axis_label='D')

xrange = range(len(cycles)-1)
sums = get_corrected_sums(traces, poi)
SSC = SC.line(xrange, sums, line_color='black')
SC.xaxis.axis_label_text_font_size = '20pt'
SC.yaxis.axis_label_text_font_size = '20pt'
SC.xaxis.major_label_text_font_size = '14pt'
SC.yaxis.major_label_text_font_size = '14pt'
SC.title.text_font_size = '20pt'

In [None]:
show(SC, notebook_handle=True)

In [None]:
interact(update_corrected_plot, no_traces=(1, len(traces)))

On the above plots this doesn't seem to make much difference. But the fix only matters when successive $k$ bits differ, which for these traces occurs only once.
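To make the effect of the correction concrete, here is a minimal sketch on synthetic data. It is not the `get_corrected_sums()` helper from the setup notebook: the function `dom_metric`, the array layout (one row of cycles per $k$ bit), the sign convention for `poi` entries (a negative entry means the sample is subtracted) and the toy leakage model are all assumptions made for illustration.

```python
import numpy as np

CYCLES_PER_BIT = 4204           # processing time of one k bit, in cycles
POI = [4202, -4203, -6, 7]      # signed cycle indices (sign = add or subtract; assumed convention)

def dom_metric(windows, poi, corrected):
    """Toy DoM metric. windows[i, c] is the sample at cycle c while bit i is processed.

    With corrected=True, the early cycles (6 and 7) are read from the *next*
    window, since they leak the value of the previous bit.
    """
    n_out = windows.shape[0] - 1 if corrected else windows.shape[0]
    out = np.zeros(n_out)
    for i in range(n_out):
        for p in poi:
            cycle = abs(p)
            is_early = cycle < CYCLES_PER_BIT // 2
            row = i + 1 if (corrected and is_early) else i
            out[i] += np.sign(p) * windows[row, cycle]
    return out

# Synthetic leakage: a 1 bit moves each POI sample by one unit in the direction of its sign.
bits = [1, 0, 1, 0, 1, 1, 0, 0]
windows = np.zeros((len(bits), CYCLES_PER_BIT))
for i, b in enumerate(bits):
    windows[i, 4202] += b
    windows[i, 4203] -= b
    if i + 1 < len(bits):       # cycles 6 and 7 of the next window leak bit i
        windows[i + 1, 6] -= b
        windows[i + 1, 7] += b

uncorrected = dom_metric(windows, POI, corrected=False)
corrected = dom_metric(windows, POI, corrected=True)
print(uncorrected)  # alternating bits collapse onto the same value
print(corrected)    # each value depends on a single bit
```

In this toy model the uncorrected metric is identical for the alternating bits at the start of `bits`, while the corrected metric separates them. The corrected metric also has one fewer entry, which is consistent with the corrected plots using `range(len(cycles)-1)`.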

Let's now measure traces with a patterned $k$.

In [None]:
k = 0x0000ffffffffff000000000000ffff00aaaa0000cccc00001111000033330000
traces = get_traces(30, k)

If you re-run the previous cells to get the interactive plot for this new set of traces, you'll see that alternating 1/0 bits are properly distinguished (recall that they were not in part 1).

We also see that the initial zeros have a different signature, and that the leading one also has a distinct signature.

The SNR for these appears at least as good as that of the rest of the bits, so this should not pose a problem.

Next we extract decision thresholds from the collected traces, based on our known fixed $k$. In part 1, this was done manually, by visual inspection. By doing it programmatically here, we should be immune to differences in target and/or capture equipment.

We'll later use these thresholds to guess arbitrary $k$.
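Each threshold is the midpoint between the average metric over a run of bits known to be 0 and a run of bits known to be 1. A minimal sketch of that idea on synthetic data follows; the helper name `midpoint_threshold` and the made-up metric values are assumptions for illustration and are not part of the setup notebook.

```python
import numpy as np

def midpoint_threshold(metric, zeros_region, ones_region):
    """Halfway point between the mean metric of known-0 and known-1 bits."""
    mean_zeros = np.average(metric[zeros_region])
    mean_ones = np.average(metric[ones_region])
    return mean_zeros + (mean_ones - mean_zeros) / 2

# Synthetic metric: 8 known zeros around 1.0, then 8 known ones around 5.0.
rng = np.random.default_rng(0)
metric = np.concatenate([1.0 + 0.1 * rng.standard_normal(8),
                         5.0 + 0.1 * rng.standard_normal(8)])
threshold = midpoint_threshold(metric, slice(0, 8), slice(8, 16))
print('threshold: %3.2f' % threshold)
```

Because the regions are chosen from the known $k$, the same computation adapts to whatever amplitudes a given target and capture setup produce.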

In [None]:
sums = get_corrected_sums(traces, poi)

In [None]:
poi_init_threshold = sums[16] - (sums[16] - np.average(sums[:16]))/2
poi_reg_threshold = (np.average(sums[103:119]) - np.average(sums[56:103]))/2 + np.average(sums[56:103])
thresholds = [poi_init_threshold, poi_reg_threshold]

print('Init threshold: %3.2f, regular threshold: %3.2f' % (poi_init_threshold, poi_reg_threshold))

In [None]:
from bokeh.models import Span

S = figure(plot_width=1200, x_axis_label='k bit index', y_axis_label='D')

xrange = range(len(cycles)-1)
S.line(xrange, sums, line_color='black')

ithreshold = Span(location=poi_init_threshold, dimension='width', line_color='green', line_width=2)
rthreshold = Span(location=poi_reg_threshold, dimension='width', line_color='blue', line_width=2)
S.renderers.extend([ithreshold, rthreshold])

S.xaxis.axis_label_text_font_size = '20pt'
S.yaxis.axis_label_text_font_size = '20pt'
S.xaxis.major_label_text_font_size = '14pt'
S.yaxis.major_label_text_font_size = '14pt'
S.title.text_font_size = '20pt'

In [None]:
show(S)

Let's do the same automatic decision threshold extraction for the correlation attack:

In [None]:
rupdate_offset = 4195
rupdate_cycles = 8
rxread_offset = 205
ryread_offset = 473
rzread_offset = 17

In [None]:
corrs = get_corrs(traces)
corr_init_threshold = (np.average(corrs[1:16]) - corrs[16])/2 + corrs[16]
corr_reg_threshold = (np.average(corrs[16:56]) - np.average(corrs[56:104]))/2 + np.average(corrs[56:104])

print('Init threshold: %3.2f, regular threshold: %3.2f' % (corr_init_threshold, corr_reg_threshold))

In [None]:
from bokeh.models import Span

CC = figure(plot_width=1200, x_axis_label='k bit index', y_axis_label='D')

xrange = range(len(cycles)-1)
CC.line(xrange, corrs, line_color='red')

ithreshold = Span(location=corr_init_threshold, dimension='width', line_color='green', line_width=2)
rthreshold = Span(location=corr_reg_threshold, dimension='width', line_color='blue', line_width=2)
CC.renderers.extend([ithreshold, rthreshold])


In [None]:
show(CC)

Just going by the visual appearance of these results, we can be pretty sure that the DoM metric will outperform the correlation metric: here some $k$ bits are *very* close to the decision threshold. This is not the case for the DoM metric.

We now have all that's required to check guesses, so let's do it:

In [None]:
sums = get_corrected_sums(traces, poi)
guess = poi_guess(sums, thresholds)
print("DoM: %s" % check_guess(guess, k)[0])

corrs = get_corrs(traces)
guess = corr_guess(corrs)
print("Correlation: %s" % check_guess(guess, k)[0])

The DoM metric should be successful; the correlation metric is *usually* successful but it's possible to have a handful of wrong bits.

Next we'll pick a random $k$, collect many traces, and see how many traces are required to fully recover $k$, and at the same time see how many bits of $k$ are correctly guessed as we reduce the number of traces used in the attack.

We'll apply the correlation attack at the same time to compare the two approaches.
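Counting wrongly guessed bits amounts to a bitwise comparison of the guess with the known $k$. The sketch below shows one way to do this when both are plain integers; it is not the `check_guess()` helper from the setup notebook, and the function name `wrong_bit_positions` and the convention that index 0 is the most significant (first processed) bit are assumptions.

```python
def wrong_bit_positions(guess, k, nbits=256):
    """Indices (0 = most significant bit) at which guess and k differ."""
    diff = guess ^ k
    return [i for i in range(nbits) if (diff >> (nbits - 1 - i)) & 1]

k = 0xffffffffffffffffffffffffffffffff00000000000000000000000000000000
guess = k ^ (1 << 255) ^ 1      # flip the first and last bits processed
wrong = wrong_bit_positions(guess, k)
print("%d wrong bits, at indices %s" % (len(wrong), wrong))
```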

In [None]:
k = random_k()
traces = get_traces(20, k)

In [None]:
print("Number of bits guessed wrong:")
print("# traces DoM Correlation")
for no_traces in range(len(traces), 0, -1):
    sums = get_corrected_sums(traces[:no_traces], poi)
    guess = poi_guess(sums, thresholds)
    print("%3d    %3d " % (no_traces, check_guess(guess, k)[1]), end='')

    corrs = get_corrs(traces[:no_traces])
    guess = corr_guess(corrs)
    print("%3d" % check_guess(guess, k)[1])

Clearly the DoM metric outperforms the correlation metric. From this point we will use only the DoM metric.

The next step is to see how many bits are guessed correctly from a *single* trace, on average. We are no longer averaging traces: we collect a single trace for a random $k$ and run the attack on that single trace, and we repeat this many times to get the average number of correctly guessed bits.

We do this to get closer to a real-world attack: recall that in normal usage, $k$ is only used for a single point multiply operation, which means that an attacker does not get to take the average of several traces.

In [None]:
traces = get_traces(100, randomize_k=True)

In [None]:
wrong_bits = []
for trace in traces:
    sums = get_corrected_sums([trace], poi)
    guess = poi_guess(sums, thresholds)
    wrong_bits.append(check_guess(guess, trace.textin['k'])[1])

print('Average wrong bits per trace: %f' % np.average(wrong_bits))
print('Minimum wrong bits per trace: %f' % min(wrong_bits))
print('Maximum wrong bits per trace: %f' % max(wrong_bits))

These are great results: from a *single trace*, we can correctly guess most bits of $k$!

Unfortunately, we don't know (yet) *which* bits we are correctly guessing, so we're not done yet.

But we might be able to do more: it's reasonable to assume that correctly guessed bits tend to be further away from the decision thresholds. This may allow us to determine which bits can be correctly guessed. Let's see if this is the case.

To do this, we need to define a new threshold to express how far the DoM measurement needs to be from the decision threshold in order for a bit guess to be accepted.

In the `poi_guess_threshold()` function, we look at the distance that each DoM metric is from the decision threshold; we then take the difference between the maximum distance and the average distance, `base`. We then accept guesses as "good" if they are at least `threshold * base` away from the decision threshold. `threshold` must be greater than 0; the larger it is, the fewer bit guesses are accepted.

There is no exact science behind this -- just heuristics!
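As a rough illustration, the sketch below follows the description above literally, on made-up numbers. It is not the actual `poi_guess_threshold()` from the setup notebook, which may differ in detail; the function name `accept_confident_guesses`, the use of a single decision threshold, and the assumption that ones lie above it are all simplifications.

```python
import numpy as np

def accept_confident_guesses(metric, decision_threshold, threshold):
    """Return (bit guesses, indices of guesses far enough from the decision threshold)."""
    metric = np.asarray(metric, dtype=float)
    guesses = (metric > decision_threshold).astype(int)  # assumed: ones lie above the threshold
    distance = np.abs(metric - decision_threshold)
    base = distance.max() - distance.mean()              # max distance minus average distance
    accepted = np.flatnonzero(distance >= threshold * base)
    return guesses, accepted.tolist()

metric = [0.1, 4.9, 2.4, 2.7, 0.3, 5.2, 3.4, 1.2]
for threshold in [0.1, 0.5, 0.9]:
    _, accepted = accept_confident_guesses(metric, decision_threshold=2.5, threshold=threshold)
    print("threshold = %.1f: accepting bits %s" % (threshold, accepted))
```

The values closest to the decision threshold are dropped first as `threshold` grows, which is the behaviour we want from the filter.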

Let's see how many bit guesses get accepted as we vary the threshold, using the last collected power trace:

In [None]:
for threshold in [0.1, 0.3, 0.5, 0.7, 0.9]:
    guess, guessed_bits = poi_guess_threshold(sums, threshold, thresholds)
    print("Threshold = %f: accepting %d guesses" % (threshold, len(guessed_bits)))

To visually validate, let's plot the location of accepted guesses on the DoM plot:

In [None]:
guess, guessed_bits = poi_guess_threshold(sums, 0.7, thresholds)

In [None]:
# Plot DoM sums, with decision thresholds, and accepted guesses annotated:

T = figure(plot_width=1800)
xrange = range(len(sums))
T.line(xrange, sums, line_color="red", line_width=2)
rthreshold = Span(location=poi_reg_threshold, dimension='width', line_color='blue', line_width=2)
ithreshold = Span(location=poi_init_threshold, dimension='width', line_color='green', line_width=2)
T.renderers.extend([rthreshold, ithreshold])
for b in guessed_bits:
    T.renderers.extend([Span(location=b, dimension='height', line_color='black', line_width=1)])


In [None]:
show(T)

This should show that we are indeed picking the bits where the DoM is furthest away from the decision threshold.

The next question is, can we set `threshold` such that all accepted guesses are actually good guesses?

In [None]:
threshold = 0.7

wrong_bits = []
solid_guessed_bits = []
total_wrong_bits = 0
total_solid_guessed_bits = 0
total_right_solid_guesses = 0
total_wrong_solid_guesses = 0
correct_solid_guesses = []
all_wrong_bits = []
    
for trace in traces:
    sums = get_corrected_sums([trace], poi)
        
    guess, tguessed_bits = poi_guess_threshold(sums, threshold, thresholds)
    (status, num_wrong_bits, twrong_bits) = check_guess(guess, trace.textin['k'])

    total_wrong_bits += num_wrong_bits
    all_wrong_bits.append(num_wrong_bits)
    total_solid_guessed_bits += len(tguessed_bits)
    
    wrong_solid_guesses = len(set(twrong_bits) & set(tguessed_bits))
    right_solid_guesses = len(tguessed_bits) - wrong_solid_guesses
    
    total_wrong_solid_guesses += wrong_solid_guesses
    total_right_solid_guesses += right_solid_guesses
        
    wrong_bits.append(twrong_bits)
    solid_guessed_bits.append(tguessed_bits)
    
    correct_solid_guesses.append(list(set(tguessed_bits) - set(twrong_bits)))
    
print('All results are per-trace averages:')
print('Average number of wrong bits (all guesses):     %5.1f' % (total_wrong_bits/len(traces)))
print('Average number of solid guessed bits:           %5.1f' % (total_solid_guessed_bits/len(traces)))
print('Average number of correct solid guessed bits:   %5.1f' % (total_right_solid_guesses/len(traces)))
print('Average number of incorrect solid guessed bits: %5.1f' % (total_wrong_solid_guesses/len(traces)))

In the output above, "solid guessed bits" are guesses which we have accepted because they are above our filtering threshold.

The last line, "incorrect solid guessed bits", is the number of accepted bit guesses that we hope are good but that are actually incorrect.

Using `threshold=0.7`, you should find averages of around 14 correct guesses and 2 incorrect guesses (these numbers can vary a bit; you may need to tweak `threshold`).

Increasing `threshold` to around 0.85 should bring the average number of incorrect guesses below 1.

## The Hidden Number Problem

Thankfully, we are not stuck: it turns out that if you correctly guess enough bits and repeat this many times (i.e. make a lot of single-trace guesses, each for a different $k$), you can recover a full $k$ even if you could only guess a handful of bits for any given trace. This is thanks to the **Hidden Number Problem** (HNP). We won't go into the details of it because it's not simple, but HNP is a well-known and commonly used approach for side-channel analysis of public-key cryptography.

A recent paper which used this technique is [A Side Journey to Titan](https://ninjalab.io/a-side-journey-to-titan/) from NinjaLab. We follow their approach, which consists of collecting a large number of traces and keeping only the traces which meet at least one of the following conditions:
1. at least three runs of 3 correct consecutive bit guesses
2. at least two runs of 4 correct consecutive bit guesses
3. at least one run of 5 correct consecutive bit guesses
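The selection rule above can be sketched as follows. This is only an illustration, not the `consecutives()` helper from the setup notebook; the function names `run_lengths` and `keep_trace` are hypothetical, and it assumes that a run longer than $n$ counts once as a run of $n$ (long runs are not split into several shorter ones).

```python
def run_lengths(bit_indices):
    """Lengths of the runs of consecutive indices in a collection of bit positions."""
    runs = []
    previous = None
    for idx in sorted(set(bit_indices)):
        if previous is not None and idx == previous + 1:
            runs[-1] += 1       # extends the current run
        else:
            runs.append(1)      # starts a new run
        previous = idx
    return runs

def keep_trace(bit_indices):
    """True if the guessed bit positions satisfy at least one of the three conditions."""
    runs = run_lengths(bit_indices)

    def count(n):
        return sum(1 for r in runs if r >= n)

    return count(3) >= 3 or count(4) >= 2 or count(5) >= 1

print(keep_trace([1, 2, 3, 10, 11, 20]))                 # one run of 3 only
print(keep_trace([1, 2, 3, 10, 11, 12, 20, 21, 22]))     # three runs of 3
```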

We leave the actual solving of the HNP as a (non-trivial!) exercise to the reader; we finish here by seeing what is the percentage of traces that meet the above conditions.

To get a good estimate, we need to collect a large number of traces. Since the traces are long, and yet we only care about $4 \times 256 = 1024$ measurements from each 1.2 million point trace, it makes sense to modify our trace capture function to only save the points of interest from each trace.

On Husky, we can get faster captures by using the new timed segmented feature. This also allows us to turn off streaming mode, which means that the target clock could be increased for even faster captures (we don't do this here, and if you do, you'll have to re-establish the decision thresholds). The timed segmented capture works like this:

First, `scope.adc.segments` sets the number of segments to capture: 256 (one segment for each bit of $k$).

Then, `scope.adc.segment_cycles` sets the interval at which we want to capture segments: 4204, the processing time for one bit, so that each segment starts at the same time index within the bit processing time.

Finally, `scope.adc.samples` sets the number of samples to collect *per segment*: 10 (for a total of 10 * `scope.adc.segments` = 2560 samples).
```
        scope.adc.segments = 256
        scope.adc.segment_cycles = 4204
        scope.adc.samples = 10
```

We also set `scope.adc.offset` so that we capture the samples we are interested in (e.g. our `poi` samples: `[6, 7, 4202, 4203]`); in this case we set it to 42 (start of first bit) + 4202 (offset into POI sample number 4202 within the bit) + 3 (Husky ADC offset). Then, once the segmented trace has been captured, we reconstruct a trace which retains only the four POI samples for each bit.
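The arithmetic behind these settings can be checked offline with a small sketch. It does not touch the scope: a plain array of sample indices stands in for a full-length capture, and the helper names `segment` and `locate`, as well as the assumption of one sample per target clock cycle, are ours.

```python
import numpy as np

BIT_START = 42          # cycle at which the first bit starts (from the text)
ADC_OFFSET = 3          # Husky ADC offset (from the text)
SEGMENT_CYCLES = 4204   # one segment per k bit
SAMPLES = 10            # samples per segment
SEGMENTS = 256

offset = BIT_START + 4202 + ADC_OFFSET

# Stand-in for a full-length capture: each "sample" is simply its own index.
full_trace = np.arange(offset + SEGMENTS * SEGMENT_CYCLES + SAMPLES)

def segment(i):
    """The 10 samples that segment i would retain."""
    start = offset + i * SEGMENT_CYCLES
    return full_trace[start:start + SAMPLES]

def locate(sample_index):
    """Map a raw sample index to (bit number, cycle within that bit)."""
    return divmod(sample_index - BIT_START - ADC_OFFSET, SEGMENT_CYCLES)

seg = segment(5)
for position in (0, 1, 8, 9):
    bit, cycle = locate(seg[position])
    print("segment 5, sample %d -> bit %d, cycle %d" % (position, bit, cycle))
```

Under these assumptions, samples 0 and 1 of segment $i$ are cycles 4202 and 4203 of bit $i$, while samples 8 and 9 are cycles 6 and 7 of bit $i+1$, which is why 10 samples per segment are enough to cover all four POIs.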

In [None]:
poi = [-6, 7, 4202, -4203] # pois need to be in this particular order for the Husky timed segmented capture to work
trace_segments = get_trace_segments(N=5000, poi=poi, randomize_k=True, husky_timed_segments=True)

In [None]:
consecutives(trace_segments=trace_segments, poi=poi, distance_threshold=0.70, thresholds=thresholds)

You should obtain approximately 4 good traces and 1 bad trace; some tweaking of `distance_threshold` may be required.

The 5000 traces collected here are likely not sufficient for solving the EHNP (the Extended Hidden Number Problem, a variant of the HNP suited to scattered runs of known bits); this is only intended as a demonstration to show that it is possible to make sufficient consecutive accurate guesses.

## Conclusion

In this part of the demo we've provided a roadmap towards an attack against real-world ECDSA, where each $k$ is only ever used once. Solving the EHNP is left as an exercise to the reader.

In the next part we'll shift gears and look at design improvements to reduce the side-channel leakage. We'll use the attack developed here to evaluate the efficacy of our new countermeasures.