# Clock Alignment for Trace Capture

This notebook shows how to shift the phase of the trace capture clock so that trace data is sampled correctly.

It only runs on the CW-Husky because it uses Husky's logic analyzer, but the same principles apply to the CW610 platform.

On the CW610, you'll have to make the phase adjustments somewhat blindly, although you can use an external logic analyzer to probe the external trace signals, and from that, infer how the clock should be shifted. While you won't be able to run this notebook on the CW610, you can still read through it to understand what needs to be done.

All this is irrelevant on the CW305 platform because no alignment is required there.

Finally, this is written for our MK82F target with the CW308, running our simpleserial-trace firmware.

Given all this, there aren't really any options left to choose:

In [None]:
TRACE_PLATFORM = 'Husky' # other platforms (CW610/CW305) are not supported
PLATFORM = 'CW308_K82F' # other targets not supported
TRACE_INTERFACE = 'parallel'
RAW_CAPTURE = True # using raw capture will make it easier to see how things are working

In [None]:
# platform setup:
SCOPETYPE = 'OPENADC'
%run "Helper_Scripts/Setup_Generic.ipynb"
scope.trace.target = target
trace = scope.trace
trace.enabled = True
scope.adc.clip_errors_disabled = True
scope.adc.lo_gain_errors_disabled = True

In [None]:
assert scope._is_husky, "This notebook is only for CW-Husky."

In [None]:
# required after programming some targets:
def target_reset():
    if TRACE_PLATFORM == 'CW610' or TRACE_PLATFORM == 'Husky':
        scope.io.nrst = 'low'
        time.sleep(0.05)
        scope.io.nrst = 'high'
        time.sleep(0.05)

In [None]:
target_reset()

In [None]:
# target info and buildtimes:
print(trace.phywhisperer_name())
print(trace.get_fw_buildtime())
if TRACE_PLATFORM == 'Husky':
    print(scope.fpga_buildtime)
else:
    print(trace.fpga_buildtime)

In [None]:
assert 'ChipWhisperer simpleserial-trace, compiled' in trace.get_fw_buildtime(), "Looks like you have the wrong firmware, please compile and program the firmware in this directory of your ChipWhisperer installation: hardware/victims/firmware/simpleserial-trace"

We set the target clock faster than the default 7.37 MHz because phase shifting the trace clock doesn't work if the target clock is slower than 10 MHz.

In [None]:
clock = 20e6
scope.clock.clkgen_freq = clock
scope.clock.adc_mul = 1
time.sleep(0.1)
assert scope.clock.pll.pll_locked == True
assert scope.clock.adc_freq == clock
target.baud = 38400 * clock / 1e6 / 7.37
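
The baud-rate line above relies on the target's UART baud rate scaling linearly with its clock, starting from 38400 baud at the default 7.37 MHz. A minimal sketch of that arithmetic, where `scaled_baud` is a hypothetical helper name and not part of the ChipWhisperer API:

```python
DEFAULT_CLOCK_HZ = 7.37e6   # default target clock mentioned above
DEFAULT_BAUD = 38400        # baud rate that goes with the default clock

def scaled_baud(clock_hz, default_clock_hz=DEFAULT_CLOCK_HZ, default_baud=DEFAULT_BAUD):
    """Scale the default baud rate in proportion to the target clock."""
    return default_baud * clock_hz / default_clock_hz

# Baud rate for the 20 MHz clock used in this notebook.
print(round(scaled_baud(20e6)))
```

If you pick a different target clock, the same proportional scaling should apply, provided the firmware derives its UART divider from that clock.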

Next, we set up trace in much the same way as TraceWhisperer.ipynb does. Refer to that notebook for explanations on what these commands do.

In [None]:
trace.trace_mode = 'parallel'

trace.capture.trigger_source = 'firmware trigger'
trace.capture.raw = True
trace.capture.rules_enabled = []
trace.capture.mode = 'while_trig'

trace.set_isync_matches(addr0=0x3ef0, addr1=0x3f1c, match='both')
trace.set_periodic_pc_sampling(enable=1)

sstarget = trace._ss

We'll start by using the target clock, since this works out-of-the-box on our K82 target.

This way, you can see what a capture is expected to look like.

In [None]:
trace.clock.fe_clock_src = 'target_clock'
assert trace.clock.fe_clock_alive, "Hmm, the clock you chose doesn't seem to be active."
trace.resync()

We'll also turn on the shifted trace clock, even though we're not going to use it yet:

In [None]:
trace.clock.trace_clock_shift_enable = True
trace.clock.trace_clock_set_freq(10e6)
trace.clock.trace_clock_shift_steps = 0

Next we set up Husky's logic analyzer to capture raw trace data waveforms. We're triggering the `scope.LA` capture on the falling edge of the `USERIO D4` pin, which is bit 0 of the 4-bit parallel trace data bus.

(Most trace data pins would work just as well, except for bit 3 (`USERIO D7`) because it periodically toggles even when the trace bus is idle; if we triggered on bit 3, we wouldn't observe the rest of the data bus toggling.)

In [None]:
trace.clock._warning_frequency = 401e6
scope.trace.enabled = False
scope.LA.enabled = True
scope.LA.trigger_source = 'falling_userio_d4'
scope.LA.oversampling_factor = 20
scope.LA.capture_depth = 400
scope.LA.capture_group = 'internal trace 2'

In [None]:
scope.LA.arm()
cw.capture_trace(scope, sstarget, bytearray(16), bytearray(16))
assert scope.LA.fifo_empty() == False

In [None]:
raw = scope.LA.read_capture_data()
target_clk        = scope.LA.extract(raw, 0)
trace_clk_in      = scope.LA.extract(raw, 1)
trace_d0          = scope.LA.extract(raw, 3)
trace_d1          = scope.LA.extract(raw, 4)
trace_d2          = scope.LA.extract(raw, 5)
trace_d3          = scope.LA.extract(raw, 6)

In [None]:
from bokeh.plotting import figure, show
from bokeh.resources import INLINE
from bokeh.models import Span, Legend, LegendItem
from bokeh.io import output_notebook
import numpy as np
output_notebook(INLINE)

o = figure(plot_width=1800)

xrange = range(len(target_clk))
T0 = o.line(xrange, target_clk        + 10, line_color='black')
T1 = o.line(xrange, trace_clk_in      + 8, line_color='red')
T3 = o.line(xrange, trace_d0          + 6,  line_color='orange')
T4 = o.line(xrange, trace_d1          + 4,  line_color='green')
T5 = o.line(xrange, trace_d2          + 2,  line_color='brown')
T6 = o.line(xrange, trace_d3          + 0,  line_color='black')

legend = Legend(items=[
    LegendItem(label='target clock', renderers=[T0]),
    LegendItem(label='trace clock', renderers=[T1]),
    LegendItem(label='trace data[0]', renderers=[T3]),
    LegendItem(label='trace data[1]', renderers=[T4]),
    LegendItem(label='trace data[2]', renderers=[T5]),
    LegendItem(label='trace data[3]', renderers=[T6]),
])
o.add_layout(legend)

In [None]:
# add markers at the data and clock edges:
def find_transitions(data, pattern):
    return [i for i in range(0,len(data)) if list(data[i:i+len(pattern)])==pattern]

data_edge = find_transitions(trace_d3[200:], [0,1])[0]
target_clock_edge = find_transitions(target_clk[200+data_edge:], [0,1])[0]
trace_clock_edge = find_transitions(trace_clk_in[200+data_edge-10:], [0,1])[0]

transitions = [data_edge+200+1, target_clock_edge+200+data_edge+1, trace_clock_edge+200+data_edge-10+1]

o.renderers.extend([Span(location=transitions[0], dimension='height', line_color='blue',  line_width=1, line_dash='dashed')])
o.renderers.extend([Span(location=transitions[1], dimension='height', line_color='green', line_width=1, line_dash='dashed')])
o.renderers.extend([Span(location=transitions[2], dimension='height', line_color='red',   line_width=1, line_dash='dashed')])
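
To see what `find_transitions` returns, here is a self-contained sketch on a synthetic waveform (the sample values are made up for illustration). It also shows an equivalent vectorized NumPy search for rising edges, which may be handy on longer captures; `rising_edges` is a hypothetical helper name:

```python
import numpy as np

def find_transitions(data, pattern):
    """Return every index where `pattern` starts in `data` (same logic as the cell above)."""
    return [i for i in range(len(data)) if list(data[i:i + len(pattern)]) == pattern]

def rising_edges(data):
    """Vectorized equivalent for the [0, 1] pattern: index of the last 0 before each rise."""
    data = np.asarray(data)
    return np.flatnonzero((data[:-1] == 0) & (data[1:] == 1)).tolist()

wave = np.array([0, 0, 1, 1, 0, 1, 0, 0, 1])  # synthetic 1-bit signal
print(find_transitions(wave, [0, 1]))  # [1, 4, 7]
print(rising_edges(wave))              # [1, 4, 7]
```

Both functions report the index of the last low sample, which is why the marker positions in the cell above add 1: the dashed lines then sit on the first high sample.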

In [None]:
show(o)

The top black clock is the target clock, generated by Husky and provided to the target.

The four bottom lines are the raw DDR trace data lines (the same signals you would observe if you hooked up a logic analyzer to your target's trace data pins).

The red clock is the trace clock generated by the target.

The vertical blue dashed line shows when all 4 trace data lines transition from 0 to 1.

The green dashed line shows when this `TRACEDATA = 0xf` nibble would be sampled by the target clock; all is well if the target clock is used to sample the trace data.

However, the red dashed line shows that the rising edges of the trace clock occur *just before* the trace data changes, which results in missampled data.
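
To put a number on "just before", the marker positions can be converted from sample indices to time. This sketch assumes the logic analyzer samples at `oversampling_factor` times the 20 MHz target clock (400 MS/s, or 2.5 ns per sample, with the settings used here); `sample_period_ns` and `margin_ns` are hypothetical helpers, and the indices in the example are made up:

```python
def sample_period_ns(clock_hz, oversampling_factor):
    """Duration of one logic analyzer sample, assuming LA rate = clock * oversampling."""
    return 1e9 / (clock_hz * oversampling_factor)

def margin_ns(clock_edge_idx, data_edge_idx, clock_hz=20e6, oversampling_factor=20):
    """Signed time from a data transition to a clock edge.

    A negative result means the clock edge comes before the data changes.
    """
    return (clock_edge_idx - data_edge_idx) * sample_period_ns(clock_hz, oversampling_factor)

# Made-up marker positions: clock edge 2 samples ahead of the data edge.
print(margin_ns(clock_edge_idx=258, data_edge_idx=260))  # -5.0
```

In the notebook, the real indices would come from the `transitions` list computed for the dashed markers.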

We'll fix this problem by shifting the trace clock.

We'll do this by sweeping the trace clock phase shift across one full trace clock period. This will let us visually identify a good phase shift setting.

We first switch to selecting the shifted trace clock as the sampling clock:

In [None]:
trace.clock.fe_clock_src = 'trace_clock'
trace.clock.trace_clock_shift_enable = True

In [None]:
STEPS = scope.LA.oversampling_factor * 2
increment = trace.clock.trace_clock_shift_range // STEPS
start = 0

import numpy as np
trace_clk_in  = np.zeros((STEPS, scope.LA.capture_depth))
trace_clk_shifted = np.zeros((STEPS, scope.LA.capture_depth))
trace_data3 = np.zeros((STEPS, scope.LA.capture_depth))
steps = []

from tqdm.notebook import tnrange

trace.clock.trace_clock_shift_steps = start

for o in tnrange(STEPS):
    steps.append(trace.clock.trace_clock_shift_steps)
    scope.LA.arm()
    cw.capture_trace(scope, sstarget, bytearray(16), bytearray(16))
    raw = scope.LA.read_capture_data()
    trace_clk_in[o]   = scope.LA.extract(raw, 1)
    trace_clk_shifted[o]  = scope.LA.extract(raw, 2)
    trace_data3[o]  = scope.LA.extract(raw, 6)
    trace.clock.trace_clock_shift_steps += increment
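
The loop above visits `STEPS` evenly spaced phase settings. As a sketch of the values it is expected to record in `steps`, assuming the hardware accepts each requested value unchanged (the shift range below is an arbitrary example, not the value reported by Husky, and `sweep_steps` is a hypothetical helper):

```python
def sweep_steps(shift_range, n_steps, start=0):
    """Phase-shift settings visited by a sweep of n_steps equal increments."""
    increment = shift_range // n_steps  # integer division, as in the loop above
    return [start + i * increment for i in range(n_steps)]

example_steps = sweep_steps(shift_range=1000, n_steps=40)
print(example_steps[:3], example_steps[-1])  # [0, 25, 50] 975
```

Because of the integer division, the sweep can stop slightly short of one full period when the shift range is not a multiple of the step count. That is harmless for picking an offset visually.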

In [None]:
def update_plot(offset):
    S1.data_source.data['y'] = trace_clk_in[offset] + 4
    S2.data_source.data['y'] = trace_clk_shifted[offset] + 2
    S3.data_source.data['y'] = trace_data3[offset] + 0
    push_notebook()

In [None]:
from ipywidgets import interact, Layout
from bokeh.io import push_notebook
from bokeh.models import Span, Legend, LegendItem

o = 0

S = figure(plot_width=1800)

xrange = range(len(trace_clk_in[o]))
S1 = S.line(xrange, trace_clk_in[o]  + 4, line_color='red')
S2 = S.line(xrange, trace_clk_shifted[o] + 2, line_color='blue')
S3 = S.line(xrange, trace_data3[o] + 0, line_color='black')


In [None]:
show(S, notebook_handle=True)

In [None]:
interact(update_plot, offset=(0, STEPS-1))

By moving the "offset" slider across its range, you should observe the (blue) shifted trace clock travel one full period.

The interactive plot also shows one of the trace data lines for reference. Some of the trace data values vary across the captures, but this should still serve as a useful reference for picking a good phase shift.

Ideally, you want to pick an offset such that the *falling edge* of the shifted clock is roughly in the middle of the single-bit zero that's around x=360. The screen capture below illustrates this.

![offset](images/offset.png)

Play around with the interactive plot to find a good offset value for your setup.
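
As a rough cross-check on the visual choice, the same criterion can be sketched in code: for each offset, find the falling edge of the shifted clock that lies closest to the center of the zero pulse on the data line. This is only a sketch on synthetic data. It assumes the zero samples inside the search window form a single pulse, and the helper names and window bounds are hypothetical:

```python
import numpy as np

def falling_edges(signal):
    """Indices of the first low sample after each 1 -> 0 transition."""
    signal = np.asarray(signal)
    return np.flatnonzero((signal[:-1] == 1) & (signal[1:] == 0)) + 1

def zero_pulse_center(data, lo, hi):
    """Center index of the zero samples of `data` within [lo, hi)."""
    zeros = np.flatnonzero(np.asarray(data[lo:hi]) == 0) + lo
    if zeros.size == 0:
        raise ValueError("no zero samples in window")
    return (zeros[0] + zeros[-1]) / 2

def best_offset(clk_shifted, data, lo, hi):
    """Offset whose shifted-clock falling edge is closest to the zero pulse center."""
    best, best_dist = None, np.inf
    for offset in range(len(clk_shifted)):
        center = zero_pulse_center(data[offset], lo, hi)
        edges = falling_edges(clk_shifted[offset])
        if edges.size == 0:
            continue  # no falling edge captured at this offset
        dist = np.min(np.abs(edges - center))
        if dist < best_dist:
            best, best_dist = offset, dist
    return best, best_dist

# Synthetic sweep: 40-sample clock period, shifted by 0, 10, 20 and 30 samples.
n = 400
clk = np.array([((np.arange(n) - s) % 40 < 20).astype(int) for s in (0, 10, 20, 30)])
data = np.ones((4, n), dtype=int)
data[:, 350:370] = 0  # single-bit zero centered at 359.5

offset, dist = best_offset(clk, data, lo=340, hi=380)
print(offset, float(dist))
```

With the arrays from the sweep, the call might look like `best_offset(trace_clk_shifted, trace_data3, 340, 380)`. Since some trace data values vary across captures, treat the result as a hint to confirm on the interactive plot.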

To get the actual phase shift step value corresponding to the "offset" value on the interactive slider, use the slider value to index the `steps` array:

In [None]:
trace.clock.trace_clock_shift_steps = steps[11]

Now let's use this shifted trace clock to actually sample the trace data:

In [None]:
trace.clock.fe_clock_src = 'trace_clock'
trace.clock.trace_clock_shift_enable = True
assert trace.clock.trace_clock_shift_locked == True

Even when idle, the parallel trace data port emits periodic synchronization frames.

A successful call to `trace.resync()` indicates that Husky is able to identify these synchronization frames, so that's a good first sign that we're sampling correctly.

(Synchronization frames have a single 0 bit followed by a long string of ones, so this doesn't prove we can sample everything correctly, but it's a start.)

In [None]:
trace.resync()

Now, let's see that the trace data we're collecting can actually be parsed. Let's run an actual trace capture and see what we get.

`scope.LA` and `scope.trace` can't both be active at the same time, so we need to disable the logic analyzer.

In [None]:
scope.LA.enabled = False
trace.enabled = True

In [None]:
trace.arm_trace()
powertrace = cw.capture_trace(scope, sstarget, bytearray(16), bytearray(16))
assert powertrace is not None, 'Capture failed'

Then we read the raw trace data and segment it into raw frames.

If data was sampled correctly, you will get an output of many lines starting with:

`Pseudoframe: 03 17 (...)`

or:

`Pseudoframe: 03 08 (...)`

In [None]:
raw = trace.read_capture_data()

In [None]:
frames = trace.get_raw_trace_packets(raw, removesyncs=True, verbose=True)

Like in `TraceWhisperer.ipynb`, we use Orbuculum to parse the raw trace data. This is the test for whether the trace data was sampled correctly; if it wasn't, Orbuculum won't be able to make much sense of it:

In [None]:
# first, write out the raw trace data to a file:
trace.write_raw_capture(frames, 'raw.bin')

In [None]:
# change the path to where the orbuculum executable resides on your own system:

In [None]:
%%bash
/home/jpnewae/git/orbuculum/ofiles/orbuculum -t -f raw.bin -P -e
cat hwevent

If decoding was successful, the output above should be a fairly long list of frames.

Refer to the Orbuculum documentation for more information, but basically you should see two types of entries in the output above:
1. Starts with '2': periodic PC sample; last field is the PC value
2. Starts with '8': Isync match; last field is the PC value
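
The two entry types can also be told apart in Python. This is a minimal sketch that assumes each line is comma-separated, with the entry type as the first field and the PC value as the last, as described above. The sample lines are illustrative only, and the middle field of the '2' entry is a placeholder:

```python
from collections import Counter

def parse_hwevent_line(line):
    """Split one comma-separated line into (event type, PC value)."""
    fields = line.strip().split(',')
    return fields[0], int(fields[-1], 16)

def count_event_types(lines):
    """Count entries by their first field, skipping blank lines."""
    return Counter(parse_hwevent_line(l)[0] for l in lines if l.strip())

# Illustrative lines only, not real capture output.
sample = ["8,21,0x00003ef2", "2,5,0x00001234", "8,21,0x00003f1c"]
counts = count_event_types(sample)
print(counts['8'], counts['2'])  # 2 1
```

On a real capture, the lines would come from the `hwevent` file, for example `with open('hwevent') as f: counts = count_event_types(f)`.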

In our case, we should find 20 Isync match events (and lots more periodic PC events):

In [None]:
%%bash
grep -c ^8 hwevent
grep -c ^2 hwevent

Moreover, the PC value for the isync match frames should alternate between `0x00003f1c` and `0x00003ef2`:

In [None]:
%%bash
grep ^8 hwevent
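
The alternation can be checked programmatically as well. A small sketch, where `alternates` is a hypothetical helper and the PC sequence is made up:

```python
def alternates(values, expected):
    """True if every value is in `expected` and no two consecutive values are equal."""
    if any(v not in expected for v in values):
        return False
    return all(a != b for a, b in zip(values, values[1:]))

pcs = [0x3f1c, 0x3ef2, 0x3f1c, 0x3ef2]  # made-up sequence of Isync PC values
print(alternates(pcs, {0x3ef2, 0x3f1c}))  # True
```

With only two allowed values, requiring consecutive entries to differ is the same as requiring strict alternation.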

If you repeat the trace capture with a `trace.clock.trace_clock_shift_steps` value that places the sampling edge very close to the data edge but still achieves synchronization (e.g. 10), you may find that the sampled raw trace data looks quite different, and that Orbuculum has trouble decoding it.

Now let's sample more trace data to ensure it's always being sampled properly.

Unfortunately, raw trace data is non-trivial to parse. Orbuculum can do this for us, but we can't call it in a notebook loop. We can however check the first two bytes of each trace "frame" without decoding the trace data:

In [None]:
num_traces = 10
total_bytes_checked = 0

from tqdm.notebook import tnrange
for i in tnrange(num_traces):
    trace.arm_trace()
    cw.capture_trace(scope, sstarget, bytearray(16), bytearray(16))
    raw = trace.read_capture_data()
    frames = trace.get_raw_trace_packets(raw, removesyncs=True, verbose=False)
    for f in frames:
        assert f[1][0] == 0x03, "Expected 0x03, got 0x%0x" % f[1][0]
        assert f[1][1] in [0x17,0x08], "Expected 0x17 or 0x08, got 0x%0x" % f[1][1]
        total_bytes_checked += 2

print("Number of trace bytes checked: %d" % total_bytes_checked)
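
The asserts above stop at the first bad frame. When comparing several phase shift settings, it can be more informative to measure what fraction of frames have the expected header instead. A sketch under the same assumption as the loop above, namely that element 1 of each frame holds the frame bytes (`header_ok`, `good_header_fraction` and the demo frames are hypothetical):

```python
def header_ok(frame, first=0x03, second=(0x17, 0x08)):
    """Check the first two bytes of one raw frame.

    Assumes `frame[1]` holds the frame bytes, as in the loop above.
    """
    payload = frame[1]
    return len(payload) >= 2 and payload[0] == first and payload[1] in second

def good_header_fraction(frames):
    """Fraction of frames with the expected header; 0.0 for an empty list."""
    if not frames:
        return 0.0
    return sum(header_ok(f) for f in frames) / len(frames)

# Made-up frames as (placeholder, bytes); the third has a corrupted header.
frames_demo = [(0, [0x03, 0x17, 0xAA]), (1, [0x03, 0x08]), (2, [0x07, 0x2E])]
print(good_header_fraction(frames_demo))
```

A fraction below 1.0 for a given shift setting suggests that the sampling edge is too close to the data edge.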

To check all the trace data, we need to use Orbuculum.

We'll change the capture a bit to maximize how much trace data we can obtain and check in a single execution.

First, we'll set Husky to capture *all* trace events:

In [None]:
trace.capture.mode = 'count_writes'
trace.capture.count = 0

Then, we'll have the target run 500 back-to-back AES encryptions.

We'll disable the periodic PC sampling, so that the only trace data we see are the two specific PC matches (isync frames).

Husky's trace module will capture all the trace data it sees until it runs out of storage space.

In [None]:
segments = 500
trace.set_periodic_pc_sampling(enable=0)
trace.arm_trace()
target.set_key(bytearray(16))
target.simpleserial_write('n', int.to_bytes(segments, length=2, byteorder='little'))
scope.arm()
target.simpleserial_write('f', bytearray(16))
ret = scope.capture()
raw = trace.read_capture_data()
frames = trace.get_raw_trace_packets(raw, removesyncs=True, verbose=False)

Then we call Orbuculum to decode the trace data. We expect only two types of frames, so it's easy to check whether all of the trace data was sampled correctly.

In [None]:
# first, write out the raw trace data to a file:
trace.write_raw_capture(frames, 'raw.bin')

In [None]:
%%bash
/home/jpnewae/git/orbuculum/ofiles/orbuculum -t -f raw.bin -P -e
ls -l hwevent

In [None]:
parsed_trace_frames = open('hwevent', 'r')

In [None]:
errors = 0
for frame in parsed_trace_frames:
    frame = frame.strip()
    if frame not in ['8,21,0x00003ef2', '8,21,0x00003f1c']:
        print("Got unexpected frame: %s" % frame)
        errors += 1
parsed_trace_frames.close()
print("Unexpected frames: %d" % errors)

Now count how many raw trace bytes we've checked:

In [None]:
total_bytes_checked = 0
for f in frames:
    total_bytes_checked += len(f[1])
print('Total raw frame bytes checked: %d' % total_bytes_checked)

If you want to further increase your confidence in Husky's trace sampling, go back and re-run the last few cells (starting from `segments = 500...`) as many times as you'd like.