# Breaking Hardware ECC on CW305 FPGA, part 3

This builds on CW305_ECC.ipynb and CW305_ECC_part2.ipynb; be sure to digest them before starting this one.

In this notebook, we study whether small modifications to the target Verilog source code can reduce the side-channel leakage.

We'll try three different approaches; for each, we'll evaluate the countermeasure efficacy by running the attacks developed in the previous notebook in this series.

The tutorial was developed with a CW-Pro with the 100t FPGA; the observations made in the attack's development should be accurate if you're using the same, but other combinations of CW-Pro / CW-Lite / CW-Husky / 100t / 35t may behave somewhat differently.

## Setup

See CW305_ECC.ipynb for explanations which are not repeated here.

In [None]:
#PLATFORM = 'CWLITE'
#PLATFORM = 'CWPRO'
PLATFORM = 'CWHUSKY'

In [None]:
import chipwhisperer as cw
scope = cw.scope()
target = cw.target(scope, cw.targets.CW305_ECC, fpga_id='100t', force=False) # or fpga_id='35t', as appropriate

In [None]:
%run "CW305_ECC_setup.ipynb"

In [None]:
# ensure ADC is locked:
scope.clock.reset_adc()
assert (scope.clock.adc_locked), "ADC failed to lock"

Occasionally the ADC will fail to lock on the first try; when that happens, the above assertion will fail (and on the CW-Lite, the red LED will be on). Simply re-running the above cell should fix things.

# Attempt #1

The first countermeasure is a naive attempt at masking the leakage caused by `move_inhibit` (see part 1 for a discussion of this). Briefly: we duplicate the BRAMs which are the destination memories for the `move_inhibit`-controlled memory writes, so that when `move_inhibit` is set, we write the result to the new memories (instead of blocking the write to the original memories).

The new memories are never read from in normal operation; they are dummy memories, there for the sole purpose of carrying out a memory write, to make the power signature "look" the same independent of `move_inhibit`.

We take some care to ensure that the original and dummy memories are as alike as possible, but we don't take any precautions for their relative placement on the FPGA.

Each countermeasure attempt has its own FPGA bitfile; we begin by loading the appropriate bitfile:

In [None]:
change_bitfile('attempt1')

In [None]:
k = 0xffffffffffffffffffffffffffffffff00000000000000000000000000000000
traces = get_traces(1, k)

Let's begin by looking at the raw difference between ones and zeros, as we did in part 1:

In [None]:
samples = 4204
trace = traces[0]
avg_ones = np.zeros(samples)
for start in cycles[1:128]:
    avg_ones += trace.wave[start:start+samples]
avg_ones /= len(cycles[1:128])  # 127 segments were summed (the first bit is skipped)

avg_zeros = np.zeros(samples)
for start in cycles[128:256]:
    avg_zeros += trace.wave[start:start+samples]
avg_zeros /= 128
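
The two averaging loops above can also be written as a NumPy reduction. The sketch below is not part of the notebook: it uses a synthetic trace and synthetic segment start indices (`wave` and `starts` are stand-ins for `trace.wave` and `cycles`), and it checks that stacking the segments and taking their mean matches the explicit accumulate-and-divide loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): 16 segments of 50 samples each.
samples = 50
starts = np.arange(16) * samples
wave = rng.normal(size=16 * samples)

def segment_mean(wave, starts, samples):
    """Average the equal-length segments of `wave` beginning at each start."""
    segments = np.stack([wave[s:s + samples] for s in starts])
    return segments.mean(axis=0)

# Difference of means between the first and second half of the segments.
mean_first = segment_mean(wave, starts[:8], samples)
mean_second = segment_mean(wave, starts[8:], samples)
dom = mean_first - mean_second

# Same result as an explicit accumulate-and-divide loop.
acc = np.zeros(samples)
for s in starts[:8]:
    acc += wave[s:s + samples]
loop_mean = acc / len(starts[:8])
print(np.allclose(mean_first, loop_mean))
```

Taking the mean over the stacked segments divides by the number of segments actually used, so the scale stays correct if the slice bounds change.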

In [None]:
from bokeh.plotting import figure, show
from bokeh.resources import INLINE
from bokeh.io import push_notebook, output_notebook
from ipywidgets import interact, Layout

output_notebook(INLINE)
s = figure(plot_width=2000)

xrange = range(len(avg_ones))
s.line(xrange, avg_ones - avg_zeros, line_color="orange")

In [None]:
show(s)

The leakage is still present! Let's quickly compare it to the leakage from the original target bitfile:

In [None]:
change_bitfile('original')

In [None]:
otraces = get_traces(1, k)
otrace = otraces[0]

oavg_ones = np.zeros(samples)
for start in cycles[1:128]:
    oavg_ones += otrace.wave[start:start+samples]
oavg_ones /= len(cycles[1:128])  # 127 segments were summed (the first bit is skipped)

oavg_zeros = np.zeros(samples)
for start in cycles[128:256]:
    oavg_zeros += otrace.wave[start:start+samples]
oavg_zeros /= 128

In [None]:
diff = figure(plot_width=2000)

odiff = oavg_ones - oavg_zeros
newdiff = avg_ones - avg_zeros

compressed_odiff = np.append(odiff[0:15], odiff[4195:])
compressed_newdiff = np.append(newdiff[0:15], newdiff[4195:])
xrange = range(len(compressed_newdiff))
diff.line(xrange, compressed_odiff, line_color="black")
diff.line(xrange, compressed_newdiff, line_color="orange", line_width=2)

In [None]:
show(diff)

Not only is the leakage still present -- we've actually increased it! 

The leakage at cycles 6-7 is easiest to understand. First, a bit more background which wasn't explicitly discussed in part 1, but which you may have guessed: from the simulation, we can see that during cycles 6-7, the core reads back the data which was written during cycles 4202-4203. So when `move_inhibit` is **not** set, the core writes data and then reads the same data back; when it **is** set, the core does not write data and then reads **different** data back. If the leakage is due to reading back the data which was just written, then our countermeasure has done nothing to prevent it.

As for the leakage at cycles 4202-4203, our original hypothesis was that this leakage was due to the act of writing (versus not writing) the target memory. These results suggest that the leakage originates from the *control logic* for the writes, rather than the writes themselves. The countermeasure did not eliminate the secret-dependent write control logic; it merely altered it.

This attempt is included to illustrate that hiding leakage is not as easy as it looks! Before moving on, let's look at how many bits we can correctly guess on a single trace.

First we reload the new bitfile:

In [None]:
change_bitfile('attempt1')

Then we establish the decision thresholds using a known $k$:

In [None]:
k = 0x0000ffffffffff000000000000ffff00aaaa0000cccc00001111000033330000
traces = get_traces(30, k)

The results above show that the leakage times at the end of the bit processing have shifted by one cycle, so we make a change to the `poi` array; then we establish the thresholds as usual:

In [None]:
poi = [4201, -4202, -6, 7]

sums = get_corrected_sums(traces, poi)

poi_init_threshold = sums[16] - (sums[16] - np.average(sums[:16]))/2
poi_reg_threshold = (np.average(sums[103:119]) - np.average(sums[56:103]))/2 + np.average(sums[56:103])

print('Init threshold: %3.2f, regular threshold: %3.2f' % (poi_init_threshold, poi_reg_threshold))

attempt1thresholds = [poi_init_threshold, poi_reg_threshold]
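
Both thresholds above are midpoints between the average sums of two groups of known bits. The sketch below illustrates that idea on synthetic data only; the cluster positions, noise level and the `known_bits` array are assumptions, and which side of the threshold corresponds to a 1 depends on the sign convention of the POIs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic per-bit sums (assumption): zeros cluster near 0.0, ones near 1.0.
known_bits = np.array([0] * 40 + [1] * 40)
sums = known_bits + rng.normal(scale=0.1, size=known_bits.size)

mean_zeros = sums[known_bits == 0].mean()
mean_ones = sums[known_bits == 1].mean()

# Midpoint threshold: halfway between the two group means.
threshold = mean_zeros + (mean_ones - mean_zeros) / 2

# Classify each bit by which side of the threshold its sum falls on.
guess = (sums > threshold).astype(int)
print("threshold: %.3f, wrong bits: %d" % (threshold, np.sum(guess != known_bits)))
```

When the two clusters overlap, as they may on a single noisy trace, some sums land on the wrong side of the midpoint; those are the wrong bit guesses counted later in this notebook.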

As a sanity check, we verify that we correctly guess $k$ when multiple traces are averaged:

In [None]:
sums = get_corrected_sums(traces, poi)
guess = poi_guess(sums, attempt1thresholds)
print("DoM: %s" % check_guess(guess, k)[0])

Now we can finally see how many errors we make on single-trace attacks, on average:

In [None]:
traces = get_traces(100, randomize_k=True)

In [None]:
wrong_bits = []
for trace in traces:
    sums = get_corrected_sums([trace], poi)
    guess = poi_guess(sums, attempt1thresholds)
    wrong_bits.append(check_guess(guess, trace.textin['k'])[1])

print('Average wrong bits per trace: %f' % np.average(wrong_bits))
print('Minimum wrong bits per trace: %f' % min(wrong_bits))
print('Maximum wrong bits per trace: %f' % max(wrong_bits))

attempt1_average_wrong_bits = np.average(wrong_bits)
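
To make the "wrong bits" metric concrete, here is a minimal sketch of counting differing bits between a guess and the true $k$. The helper name `count_wrong_bits` is hypothetical; it is not the `check_guess` function from the setup notebook, which may report its results differently.

```python
def count_wrong_bits(guess: int, k: int, nbits: int = 256) -> int:
    """Number of bit positions (out of nbits) where guess and k differ."""
    mask = (1 << nbits) - 1
    return bin((guess ^ k) & mask).count("1")

k = 0x0000ffffffffff000000000000ffff00aaaa0000cccc00001111000033330000
guess = k ^ 0b1011  # flip three of the low bits
print(count_wrong_bits(guess, k))
```

For reference, a guess made by flipping a fair coin for each bit of a 256-bit $k$ would get about 128 bits wrong on average, so that is the ceiling an ideal countermeasure would push the attack towards.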

This should be slightly lower than what you've seen with the original bitfile in part 2.

This confirms that this countermeasure attempt has actually *increased* the side-channel leakage.

# Attempt #2

Next, we double down on our first approach to illustrate how misguided it really is: instead of doubling the target memory space, let's **quadruple** it!

We set aside the cycle 6-7 leakage for now and take a second shot at hiding the leakage at cycles 4202-4203 by attempting to better decouple the memory write control logic from $k$.

In attempt #1, each half of the intermediate result memories played a static role (one always held good intermediate results, the other always held unused intermediate results).

Now, we quadruple the target memory space and alter the write destination logic so that the good intermediate results can go to any of the four memory sections, and ensure that the destination memory changes at every bit of $k$, regardless of its value.

As before, we start by comparing the raw leakage:

In [None]:
change_bitfile('attempt2')

In [None]:
k = 0xffffffffffffffffffffffffffffffff00000000000000000000000000000000
attempt2traces = get_traces(1, k)

In [None]:
attempt2trace = attempt2traces[0]
attempt2avg_ones = np.zeros(samples)
for start in cycles[1:128]:
    attempt2avg_ones += attempt2trace.wave[start:start+samples]
attempt2avg_ones /= len(cycles[1:128])  # 127 segments were summed (the first bit is skipped)

attempt2avg_zeros = np.zeros(samples)
for start in cycles[128:256]:
    attempt2avg_zeros += attempt2trace.wave[start:start+samples]
attempt2avg_zeros /= 128

In [None]:
diff = figure(plot_width=2000)

attempt2diff = attempt2avg_ones - attempt2avg_zeros

compressed_attempt2diff = np.append(attempt2diff[0:15], attempt2diff[4195:])
xrange = range(len(compressed_newdiff))
diff.line(xrange, compressed_odiff, line_color="black")
diff.line(xrange, compressed_newdiff, line_color="orange", line_width=2)
diff.line(xrange, compressed_attempt2diff, line_color="red", line_width=3)

In [None]:
show(diff)

In [None]:
s = figure(plot_width=2000)

xrange = range(len(attempt2diff))
s.line(xrange, attempt2diff, line_color="red")

In [None]:
show(s)

On the first plot, we find that the leakage seen previously is somewhat reduced but still present, and stretched out over more cycles at the end.

However, the second plot shows *new* strong leakage at cycles 1428-1429.

Let's carry on as we did for attempt #1 -- the only change required is `poi` as follows:

In [None]:
poi = [4199, -4200, 4201, -4202, -6, 7, -1428, 1429]
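
The `poi` entries come in pairs of opposite sign. A plausible reading, stated here as an assumption, is that the sign selects whether the sample at that cycle is added or subtracted when the per-bit sum is formed. The sketch below shows that reading on a synthetic segment; `signed_poi_sum` is a hypothetical helper, and the real `get_corrected_sums` may apply further corrections that are not modelled here.

```python
import numpy as np

def signed_poi_sum(segment, poi):
    """Add the samples at positive POIs and subtract those at negative POIs."""
    total = 0.0
    for p in poi:
        sign = 1.0 if p >= 0 else -1.0
        total += sign * segment[abs(p)]
    return total

# Synthetic one-bit segment (assumption): flat except for two marked samples.
segment = np.zeros(4204)
segment[1429] = 0.5
segment[1428] = -0.25

poi = [4199, -4200, 4201, -4202, -6, 7, -1428, 1429]
print(signed_poi_sum(segment, poi))
```

Pairing adjacent cycles with opposite signs means a constant offset in the trace cancels out, while a swing between the two cycles is reinforced.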

In [None]:
k = 0x0000ffffffffff000000000000ffff00aaaa0000cccc00001111000033330000
traces = get_traces(30, k)
sums = get_corrected_sums(traces, poi)

poi_init_threshold = sums[16] - (sums[16] - np.average(sums[:16]))/2
poi_reg_threshold = (np.average(sums[103:119]) - np.average(sums[56:103]))/2 + np.average(sums[56:103])
print('Init threshold: %3.2f, regular threshold: %3.2f' % (poi_init_threshold, poi_reg_threshold))
attempt2thresholds = [poi_init_threshold, poi_reg_threshold]

guess = poi_guess(sums, attempt2thresholds)
print("DoM: %s" % check_guess(guess, k)[0])

In [None]:
traces = get_traces(100, randomize_k=True)

In [None]:
wrong_bits = []
for trace in traces:
    sums = get_corrected_sums([trace], poi)
    guess = poi_guess(sums, attempt2thresholds)
    wrong_bits.append(check_guess(guess, trace.textin['k'])[1])

print('Average wrong bits per trace: %f' % np.average(wrong_bits))
print('Minimum wrong bits per trace: %f' % min(wrong_bits))
print('Maximum wrong bits per trace: %f' % max(wrong_bits))

attempt2_average_wrong_bits = np.average(wrong_bits)

You should find that the average number of wrong bits per trace is close to that of attempt #1, and perhaps even lower.

Clearly, this isn't working!

# Attempt #3

Instead of devising additional measures of increasing complexity to hide the leakage, we now take a completely different approach: adding "noise".

The earlier improvements are abandoned; the $k$-dependent write logic and the target memories are returned to their original state.

Instead, we add dummy logic which operates in tandem with the original leaky logic; the objective of this new logic is to add noise which hides the leakage.

We do this by instantiating additional copies of the target memories. These copies are exercised with the same control logic as the real target memories, except that an LFSR is used to pseudo-randomly enable or disable the writes. The goal is for the noise memories to be active at the same time as the leakage, but in a way that does not depend on $k$.

Experimentally, we find that adding a single "noise" memory for each of the 3 target memories does not help much, so we crank the noise up to 16 noise memories per target memory (48 in total).
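
As a rough software model of the idea, the sketch below uses a 16-bit Fibonacci LFSR to produce one pseudo-random write enable per noise memory at each step. Everything specific here is an assumption: the tap positions (the common maximal-length polynomial $x^{16} + x^{14} + x^{13} + x^{11} + 1$), the seed, and the choice of one state bit per memory are for illustration and are not taken from the target's Verilog.

```python
def lfsr16_step(state: int) -> int:
    """One step of a 16-bit Fibonacci LFSR, taps 16, 14, 13, 11 (assumed)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def noise_enables(seed: int, n_steps: int, n_memories: int = 16):
    """Yield, for each step, a list of per-memory write enables (0 or 1)."""
    state = seed
    for _ in range(n_steps):
        state = lfsr16_step(state)
        yield [(state >> i) & 1 for i in range(n_memories)]

# The enables depend only on the seed, never on the secret k.
enables = list(noise_enables(seed=0xACE1, n_steps=1000))
active = [sum(e) for e in enables]
print("mean active noise memories per step: %.2f" % (sum(active) / len(active)))
```

The number of noise memories written in any given step varies pseudo-randomly, which is what adds uncorrelated variation on top of the $k$-dependent write.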

In [None]:
change_bitfile('attempt3')

Due to the LFSR, some additional setup is required for this bitfile:

In [None]:
# Initialize the LFSR:
target.fpga_write(0xe, [101,22,35,43])
target.fpga_write(0xd, [1]) 

# enable all 16 noise memories:
target.fpga_write(0x11, [0xff, 0xff])

# Noise memories are controlled by the 16-bit register at address 0x11; each bit enables one noise memory.
#target.fpga_write(0x11, [0x00, 0xff]) # enable half of the noise memories
#target.fpga_write(0x11, [0x00, 0x03]) # enable two noise memories
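
If you want to sweep the number of enabled noise memories, a small helper can build the two-byte mask. This is a sketch: `noise_mask_bytes` is a hypothetical name, and the byte order (high byte first) is inferred from the commented examples in the cell above rather than from the register documentation.

```python
def noise_mask_bytes(n_enabled: int) -> list:
    """Two-byte mask with the lowest n_enabled bits set, high byte first."""
    if not 0 <= n_enabled <= 16:
        raise ValueError("n_enabled must be between 0 and 16")
    mask = (1 << n_enabled) - 1
    return list(mask.to_bytes(2, "big"))

for n in (16, 8, 2, 1):
    print(n, [hex(b) for b in noise_mask_bytes(n)])
```

Under those assumptions, `target.fpga_write(0x11, noise_mask_bytes(4))` would enable four noise memories.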

In [None]:
k = 0xffffffffffffffffffffffffffffffff00000000000000000000000000000000
attempt3traces = get_traces(1, k)

In [None]:
attempt3trace = attempt3traces[0]
attempt3avg_ones = np.zeros(samples)
for start in cycles[1:128]:
    attempt3avg_ones += attempt3trace.wave[start:start+samples]
attempt3avg_ones /= len(cycles[1:128])  # 127 segments were summed (the first bit is skipped)

attempt3avg_zeros = np.zeros(samples)
for start in cycles[128:256]:
    attempt3avg_zeros += attempt3trace.wave[start:start+samples]
attempt3avg_zeros /= 128

In [None]:
diff = figure(plot_width=2000)

attempt3diff = attempt3avg_ones - attempt3avg_zeros
compressed_attempt3diff = np.append(attempt3diff[0:15], attempt3diff[4195:])

xrange = range(len(compressed_newdiff))
diff.line(xrange, compressed_odiff, line_color="black")
diff.line(xrange, compressed_newdiff, line_color="orange", line_width=2)
diff.line(xrange, compressed_attempt2diff, line_color="red", line_width=3)
diff.line(xrange, compressed_attempt3diff, line_color="green", line_width=6)

In [None]:
show(diff)

At first glance the results are not encouraging, until you realize that when we look at the difference between the ones and zeros averages, we are comparing the average power trace segments of two groups of 128 bits from the same trace.

Averaging over 128 bits allows the newly added noise to average out quite well.

We hope that when we move to single-trace attacks, where each bit of k is treated individually, without the benefits of averaging, we'll see better results.
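
The effect of averaging can be illustrated numerically. The sketch below assumes the added noise is independent and Gaussian from bit to bit, which is an idealization and not a measured property of the target; under that assumption, averaging 128 bits shrinks the noise standard deviation by a factor of $\sqrt{128} \approx 11.3$, while a single-trace attack sees the full noise on every bit.

```python
import numpy as np

rng = np.random.default_rng(2)

sigma = 1.0      # per-bit noise standard deviation (assumed)
n_avg = 128      # bits averaged in the difference-of-means plots
n_trials = 20000

# Noise seen when each bit is judged individually (single-trace attack).
single = rng.normal(scale=sigma, size=n_trials)

# Noise left after averaging 128 independent bits.
averaged = rng.normal(scale=sigma, size=(n_trials, n_avg)).mean(axis=1)

print("single-bit noise std: %.3f" % single.std())
print("128-bit average std:  %.3f" % averaged.std())
print("expected ratio:       %.1f" % np.sqrt(n_avg))
```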

First we reset `poi` to the values used for attempt #1:

In [None]:
poi = [4201, -4202, -6, 7]

Then we extract the thresholds:

In [None]:
k = 0x0000ffffffffff000000000000ffff00aaaa0000cccc00001111000033330000
traces = get_traces(30, k)
sums = get_corrected_sums(traces, poi)
poi_init_threshold = sums[16] - (sums[16] - np.average(sums[:16]))/2
poi_reg_threshold = (np.average(sums[103:119]) - np.average(sums[56:103]))/2 + np.average(sums[56:103])
print('Init threshold: %3.2f, regular threshold: %3.2f' % (poi_init_threshold, poi_reg_threshold))
attempt3threshold = [poi_init_threshold, poi_reg_threshold]

guess = poi_guess(sums, attempt3threshold)
print("DoM: %s" % check_guess(guess, k)[0])

And we repeat the single-trace attack:

In [None]:
traces = get_traces(100, randomize_k=True)

wrong_bits = []
for trace in traces:
    sums = get_corrected_sums([trace], poi)
    guess = poi_guess(sums, attempt3threshold)
    wrong_bits.append(check_guess(guess, trace.textin['k'])[1])

print('Average wrong bits per trace: %f' % np.average(wrong_bits))
print('Minimum wrong bits per trace: %f' % min(wrong_bits))
print('Maximum wrong bits per trace: %f' % max(wrong_bits))

attempt3_average_wrong_bits = np.average(wrong_bits)

Finally, you should see a big jump (to around 70) in the average number of wrong bits per trace. This countermeasure works!

Summarizing the results so far:

In [None]:
print('Average number of wrong bit guesses for a single trace attack:')
print('Attempt 1: %5.1f' % attempt1_average_wrong_bits)
print('Attempt 2: %5.1f' % attempt2_average_wrong_bits)
print('Attempt 3: %5.1f' % attempt3_average_wrong_bits)

If you re-run this section with fewer noise memories enabled, you should see a corresponding decrease in the number of wrong bit guesses (don't forget to re-establish the thresholds).

With a single noise memory enabled, there should not be much difference from the results obtained from the original bitfile.

# The Hidden Number Problem

To finish, let's see how effective the attempt #3 countermeasure may be at increasing the number of traces required for recovering $k$ in a real-world attack.

We'll run this with the attempt #3 bitfile, but you can run it on the other attempts if you wish. If you use a bitfile with different POIs, some adjustments to the Husky segmented capture are required; alternatively, the (slightly slower) CWPRO segmented capture routine works for Husky without any modifications.

First, make sure that the previous section was run with all noise memories enabled (or not, if you want to see different results!).

We'll acquire segmented traces, like we did in part 2.

In [None]:
poi = [-6, 7, 4201, -4202] # pois need to be in this particular order for the Husky timed segmented capture to work
trace_segments = get_trace_segments(N=5000, poi=poi, randomize_k=True)

In [None]:
consecutives(trace_segments=trace_segments, poi=poi, distance_threshold=0.80, thresholds=attempt3threshold) # 0.8

Some adjustment on `distance_threshold` may be required, but you should find substantially fewer good traces compared to the original target results from part 2.
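
One way to think about a distance threshold is as a confidence margin: a bit guess is only trusted when its sum lies far enough from the decision threshold. The sketch below is a hypothetical illustration of that idea on made-up numbers; `confident_prefix_length` is not the `consecutives` function from part 2, whose exact selection rule may differ.

```python
import numpy as np

def confident_prefix_length(sums, threshold, margin):
    """Count leading bits whose sum lies at least `margin` from `threshold`."""
    count = 0
    for s in sums:
        if abs(s - threshold) < margin:
            break
        count += 1
    return count

threshold = 0.5
clean = np.array([0.0, 1.0, 1.0, 0.0, 0.55, 1.0])   # fifth bit is ambiguous
noisy = np.array([0.0, 0.6, 1.0, 0.0, 0.55, 1.0])   # second bit is ambiguous

print(confident_prefix_length(clean, threshold, margin=0.2))
print(confident_prefix_length(noisy, threshold, margin=0.2))
```

With more noise, sums drift closer to the threshold more often, so fewer traces yield a long run of trusted bits; this matches the drop in good traces described above.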

# Conclusion

In this part we tried different countermeasures: some which don't work very well, and one that does. Hopefully this gave you some insight into the challenges of addressing side-channel leakage from the implementer's side.

From the attacker's side, this also showed how the attack developed in part 2 can still be effective when countermeasures are added.

In part 4 we'll look at a final countermeasure which will reveal that our target has additional side-channel leakage which we haven't leveraged yet.