# Introduction to Debug Tracing for Side-Channel Analysis

This notebook shows some of the things you can do with "TraceWhisperer".

TraceWhisperer pairs Arm trace debug with ChipWhisperer.

If you have a ChipWhisperer Husky, then TraceWhisperer is included.

If you have a CW-lite or CW-pro, then you'll need a CW610 (PhyWhisperer) for trace collection.

Or, if you have a CW305 FPGA target board, then we have a target bitfile which co-locates the TraceWhisperer functionality directly on the target FPGA.

If you're not using the CW305 target, then supported targets are:
* CW308 multi-target board with the MK82F target
* CW308 with an STM32 target
* CWLITEARM

It should be possible to use any other target with an exposed trace interface, but the above targets will work as-is.

This notebook tries to highlight some of the many things that can be done with trace, and so it is not meant to be run "straight through". Read the instructions carefully!

Set the following defines as per your setup, but note that the CW305 platform only supports the parallel trace interface.

In [None]:
TRACE_PLATFORM = 'CW610' # AKA PhyWhisperer
#TRACE_PLATFORM = 'CW305' # CW305 FPGA target board
#TRACE_PLATFORM = 'Husky'

#PLATFORM = 'CW305'
#PLATFORM = 'CWLITEARM'
#PLATFORM = 'CW308_STM32F3'
PLATFORM = 'CW308_K82F'

TRACE_INTERFACE = 'parallel'
#TRACE_INTERFACE = 'swo'

#RAW_CAPTURE = True
RAW_CAPTURE = False
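
Not every combination of the settings above is usable. As a rough sketch, the constraints stated in this notebook can be checked before connecting to any hardware. `check_trace_config` is a hypothetical helper written for illustration; it is not part of ChipWhisperer.

```python
# Hypothetical helper (not a ChipWhisperer API): sanity-check the settings above.
def check_trace_config(trace_platform, trace_interface):
    """Return a list of problems with the chosen settings (empty if none)."""
    problems = []
    if trace_platform not in ('CW610', 'CW305', 'Husky'):
        problems.append(f"unknown TRACE_PLATFORM {trace_platform!r}")
    if trace_interface not in ('parallel', 'swo'):
        problems.append(f"unknown TRACE_INTERFACE {trace_interface!r}")
    # The CW305 platform only supports the parallel trace interface.
    if trace_platform == 'CW305' and trace_interface == 'swo':
        problems.append("CW305 only supports the parallel trace interface")
    return problems


print(check_trace_config('CW610', 'parallel'))
print(check_trace_config('CW305', 'swo'))
```

An empty list means the combination is consistent with the restrictions described here; it says nothing about wiring or firmware.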

In [None]:
import chipwhisperer as cw

In [None]:
# platform setup:
if TRACE_PLATFORM == 'Husky':
    SCOPETYPE = 'OPENADC'
    %run "Helper_Scripts/Setup_Generic.ipynb"
    scope.trace.target = target
    trace = scope.trace
    # TODO! set scope.clock

elif TRACE_PLATFORM == 'CW610':
    from chipwhisperer.capture.trace.TraceWhisperer import TraceWhisperer
    SCOPETYPE = 'OPENADC'
    %run "Helper_Scripts/Setup_Generic.ipynb"
    trace = TraceWhisperer(target, scope, force_bitfile=True)
    scope.clock.adc_src = "clkgen_x4"

else:
    from chipwhisperer.capture.trace.TraceWhisperer import TraceWhisperer
    %run "Helper_Scripts/Setup_CW305_DST.ipynb"
    trace = TraceWhisperer(target, scope)
    
if TRACE_PLATFORM == 'Husky':
    scope.adc.samples = 31000
else:
    scope.adc.samples = 24400

trace.enabled = True

In [None]:
if TRACE_PLATFORM == 'CW305':
    scope.gain.setGain(30)
elif TRACE_PLATFORM == 'Husky':
    scope.gain.db = 12
elif PLATFORM == 'CW308_K82F':
    scope.gain.setGain(20)
elif PLATFORM == 'CW308_STM32F3':
    scope.gain.setGain(25)
elif PLATFORM == 'CWLITEARM':
    scope.gain.setGain(25)


### Program STM32 target:

If you're using the K82 target, you'll need an external programmer.

In [None]:
if (PLATFORM == 'CW308_STM32F3') or (PLATFORM == 'CWLITEARM'):
    fw_path = '../../cw_develop/hardware/victims/firmware/simpleserial-trace/simpleserial-trace-CW308_STM32F3.hex'
    prog = cw.programmers.STM32FProgrammer
    cw.program_target(scope, prog, fw_path)

In [None]:
import time

# required after programming some targets:
def target_reset():
    if TRACE_PLATFORM == 'CW610' or TRACE_PLATFORM == 'Husky':
        scope.io.nrst = 'low'
        time.sleep(0.05)
        scope.io.nrst = 'high'
        time.sleep(0.05)

In [None]:
target_reset()

In [None]:
# target info and buildtimes:
print(trace.phywhisperer_name())
print(trace.get_fw_buildtime())
if TRACE_PLATFORM == 'Husky':
    print(scope.fpga_buildtime)
else:
    print(trace.fpga_buildtime)

In [None]:
if PLATFORM != 'CW305':
    assert 'ChipWhisperer simpleserial-trace, compiled' in trace.get_fw_buildtime(), "Looks like you have the wrong firmware, please compile and program the firmware in this directory of your ChipWhisperer installation: hardware/victims/firmware/simpleserial-trace"

## Set the trace operation mode.

Arm processors can output trace data on a parallel trace port or on a serial SWO pin.

Parallel trace mode operation is pretty straightforward. SWO is a bit more complicated to set up.

First, Arm processors which support JTAG and SWD come out of reset in JTAG mode. In order to get trace data out of the SWO pin, we need to switch it over to SWD mode.

The `jtag_to_swd()` call below runs a special sequence on the TMS and TCK pins to do this switchover. However, different processors (such as the STM32) may have *additional* requirements to enable the SWO pin. The `simpleserial-trace` firmware handles this for the STM32. Other targets may have their own requirements. One sure-fire way to get a target into SWD mode is to use an external debugger. In that case, do not call `jtag_to_swd()`, as this could result in contention on the TMS/TCK pins.
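
For reference, the switching sequence defined by the Arm Debug Interface specification is: at least 50 TCK cycles with TMS high, the 16-bit value `0xE79E` sent least-significant bit first, then at least 50 more cycles with TMS high. The sketch below only builds that list of TMS levels. The helper name and the default of 56 reset cycles are illustrative choices, and this is not necessarily what `jtag_to_swd()` does internally.

```python
# Illustrative sketch: the Arm JTAG-to-SWD switching sequence as a list of
# TMS levels, one per TCK cycle. Hypothetical helper, not a TraceWhisperer API.
JTAG_TO_SWD_KEY = 0xE79E  # 16-bit select sequence, sent least-significant bit first


def jtag_to_swd_tms_bits(reset_cycles=56):
    """Return TMS levels for: line reset, 16-bit key, line reset."""
    if reset_cycles < 50:
        raise ValueError("a line reset needs at least 50 cycles with TMS high")
    line_reset = [1] * reset_cycles
    key_bits = [(JTAG_TO_SWD_KEY >> i) & 1 for i in range(16)]  # LSB first
    return line_reset + key_bits + line_reset


bits = jtag_to_swd_tms_bits()
print(len(bits))
print(''.join(str(b) for b in bits[56:72]))
```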

Unless you're on the CW305 platform, you'll need some jumper cables.

For parallel trace, you need to connect the target's trace pins to the PhyWhisperer D[4:7] and CK pins.

For SWO trace, you need to connect the target's TMS/TCK/TDO pins to the PhyWhisperer D0/D1/D2 pins.

If you wish to use the target clock (recommended!), you must connect it to HS2 on the 20-pin connector.

If you wish to use the target trigger, you must connect it to PC on the 20-pin connector.

Here is the setup for parallel trace mode.

Trace data can be captured with either the trace clock or the target clock. There are no advantages or disadvantages to either, so use whichever is available.

However, on CW305 the target clock must be used (the reasons are technical and have to do with the internal DDR trace data bus; using the target clock avoids this).

In [None]:
if TRACE_INTERFACE == 'parallel':
    trace.clock.fe_clock_src = 'target_clock'
    #trace.clock.fe_clock_src = 'trace_clock'
    assert trace.clock.fe_clock_alive, "Hmm, the clock you chose doesn't seem to be active."
    trace.trace_mode = 'parallel'

For SWO mode, the target clock is the best choice; if it's not available, you can use the USB clock, but since this is not synchronous to the target clock, the timing of your trace measurements will have more jitter.

SWO setup is a bit more complicated, and you need to understand a bit of how SWO data can be generated.

The target's `TPI.ACPR` register determines the length of an SWO data bit: (`TPI.ACPR` + 1) target clock cycles. A value of zero gives the highest bandwidth. Positive integers are also allowed.

In general, 0 is the better choice for what we do with trace, but if you run into problems with recovering the trace data, try a higher value.

This covers the SWO trace generation side. On the recovery side, we need a clock which is some multiple of the SWO baud rate. We use the clock chosen by `trace.clock.fe_clock_src`; we multiply this by some integer, then set `trace.swo_div` to tell the FPGA how many clock cycles there are per SWO bit.
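
The arithmetic can be sketched in plain Python, assuming the front-end clock is the target clock. `swo_settings` is a hypothetical helper for illustration only; it mirrors the relationships described above and touches no hardware.

```python
# Hypothetical helper: relate TPI.ACPR, the clock multiplier and swo_div.
def swo_settings(target_clock_hz, acpr=0, freq_mul=8):
    """Return (swo_bit_rate, swo_clock_freq, swo_div) for the given settings."""
    if acpr < 0 or freq_mul < 1:
        raise ValueError("acpr must be >= 0 and freq_mul must be >= 1")
    # One SWO bit lasts (ACPR + 1) target clock cycles.
    swo_bit_rate = target_clock_hz / (acpr + 1)
    # The recovery clock is an integer multiple of the front-end clock.
    swo_clock_freq = target_clock_hz * freq_mul
    # Number of recovery clock cycles per SWO bit.
    swo_div = freq_mul * (acpr + 1)
    return swo_bit_rate, swo_clock_freq, swo_div


bit_rate, clock_freq, div = swo_settings(8_000_000, acpr=1, freq_mul=8)
print(bit_rate, clock_freq, div)
```

Whatever values are chosen, `swo_clock_freq / swo_div` should equal the SWO bit rate; raising `acpr` lowers the bit rate and raises `swo_div` by the same factor.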

In [None]:
if TRACE_INTERFACE == 'swo':
    assert TRACE_PLATFORM == 'CW610' or TRACE_PLATFORM == 'Husky', "Not supported :-("
    trace.clock.fe_clock_src = 'target_clock'
    #trace.clock.fe_clock_src = 'usb_clock'
    assert trace.clock.fe_clock_alive, "Hmm, the clock you chose doesn't seem to be active."
    trace.trace_mode = 'SWO'
    trace.jtag_to_swd() # switch target into SWD mode

    # Now the complicated bit:
    acpr = 0
    trigger_freq_mul = 8
    trace.clock.swo_clock_freq = scope.clock.clkgen_freq * trigger_freq_mul
    trace.target_registers.TPI_ACPR = acpr
    trace.swo_div = trigger_freq_mul * (acpr + 1)
    assert trace.clock.swo_clock_locked, "Trigger/UART clock not locked"
    assert scope.userio.status & 0x4, "SWO line not high"

#### Check that the target is alive.
If `get_fw_buildtime()` produces no output, the target may have become unresponsive after the above changes; it may simply require a reset.

In [None]:
print(trace.get_fw_buildtime())

### Disable sync frames for SWO:

By default, periodic sync frames are emitted every 16 million clock cycles. If you're bringing up an SWO target for the first time, this is helpful to confirm that the link is "alive".
However, if these sync frames occur during our trace capture, they will delay the trace events that we care about, so it's best to disable them.

Sync frames on the parallel trace port cannot be disabled.

In [None]:
if TRACE_INTERFACE == 'swo':
    trace.target_registers.DWT_CTRL = 0x40000021

### Trigger trace capture from target FW:

In [None]:
#trace.use_soft_trigger()
trace.capture.trigger_source = 'firmware trigger'

### What to capture:
There are two trace capture modes:
1. Raw mode captures the complete raw trace data.
2. Non-raw mode captures only matching rule IDs. To use this, set up some pattern match rules (see below); only the ID of the matching rule will be captured.
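
A pattern match rule pairs a pattern with a mask. The sketch below assumes the usual masked-compare semantics: a trace byte matches when it agrees with the pattern byte on every bit that is set in the mask byte. That is an assumption about how the rules behave, and `rule_matches` is a hypothetical helper, not a TraceWhisperer API.

```python
# Hypothetical helper illustrating a masked pattern compare (assumed semantics).
def rule_matches(data, pattern, mask):
    """True if data equals pattern on every bit selected by mask."""
    if not (len(data) == len(pattern) == len(mask)):
        raise ValueError("data, pattern and mask must have the same length")
    return all((d & m) == (p & m) for d, p, m in zip(data, pattern, mask))


pattern = [3, 8, 32, 0, 0, 0, 0, 0]
mask = [255, 255, 255, 0, 0, 0, 0, 0]

# Only the first three bytes are compared; the rest are "don't care".
print(rule_matches([3, 8, 32, 0x58, 0x18, 0x00, 0x08, 0x01], pattern, mask))
print(rule_matches([3, 8, 33, 0x58, 0x18, 0x00, 0x08, 0x01], pattern, mask))
# An all-zero mask matches anything.
print(rule_matches([1, 2, 3, 4, 5, 6, 7, 8], [0] * 8, [0] * 8))
```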

In [None]:
if RAW_CAPTURE:
    trace.capture.raw = True
    trace.capture.rules_enabled = []

else:
    trace.capture.raw = False

    # match on any PC match (isync) trace packet:
    trace.set_pattern_match(0, [3, 8, 32, 0, 0, 0, 0, 0], [255, 255, 255, 0, 0, 0, 0, 0])

    # match on anything:
    #trace.set_pattern_match(0, [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0])

    # enable matching rule:
    trace.capture.rules_enabled = [0]

### Select the trigger source:

If we set a pattern match rule, it can be used to trigger trace capture. Alternatively, trace capture is initiated by the target FW's soft trigger.

Be aware that the pattern match is not a stable trigger source; expect jitter up to 6 clock cycles. See the DesignStartTrace [README](https://github.com/newaetech/DesignStartTrace/blob/master/README.md) for more information on jitter.

In [None]:
trace.capture.trigger_source = 'firmware trigger'
#trace.capture.trigger_source = 0 # match using pattern rule #0

### How long to capture for:
By default, trace data is captured while the target's trigger line is high. This is probably what you want to do (unless your target doesn't drive the trigger line, e.g. you're using a trace pattern match as the trigger).

Alternatively, you can manually specify how many cycles or events the capture should last.

In [None]:
trace.capture.mode = 'while_trig'
#trace.capture.mode = 'count_writes'
#trace.capture.count = 500

### Set PC addresses to match on:
Let's use the start of the `SubBytes()` and `AddRoundKey()` functions.

This will set the target's `DWT.COMP0`, `DWT.COMP1`, and `ETM.TEEVR` registers.

If you recompile, adjust accordingly.

In [None]:
if TRACE_PLATFORM == 'CW305':
    trace.set_isync_matches(addr0=0x3bc0, addr1=0x3aa8, match='both')
elif PLATFORM == 'CWLITEARM' or PLATFORM == 'CW308_STM32F3':
    trace.set_isync_matches(addr0=0x08001858, addr1=0x08001820, match='both')
elif PLATFORM == 'CW308_K82F':
    trace.set_isync_matches(addr0=0x3e84, addr1=0x3eb0, match='both')

### Enable or disable periodic PC sampling:

This can also be done directly via the `DWT.CTRL` register; with `set_periodic_pc_sampling()`, however, PC sampling is turned on at trigger time to ensure that the capture doesn't start in the middle of a trace frame, which would prevent automatic parsing.

In [None]:
trace.set_periodic_pc_sampling(enable=0)

# Capture power and debug trace:

In [None]:
if TRACE_PLATFORM == 'CW610':
    print("*** Don't forget the jumper cable from CW308 GPIO4/TRIG pin to PhyWhisperer PC pin on side connector! ***")

In [None]:
sstarget = trace._ss

In [None]:
# force resynchronization, ensure we are sync'd:
if TRACE_INTERFACE == 'parallel':
    trace.resync()

In SWO mode, trace data can be clocked by either the target clock or the 96MHz USB clock. You should use the target clock unless it's not available, so that the collected trace data is synchronous with the target clock.

In parallel mode, there is additionally the option of using the trace clock. This works with the K82F/CW610 combo, but on Husky the trace data tends to get mis-sampled. The same may occur with different targets. The target clock seems to work better. (In the future, a programmable delay for the trace data lines will be added to Husky, so that mis-sampling problems can be resolved.)

In [None]:
# arm trace sniffer:
trace.arm_trace()

In [None]:
from tqdm.notebook import tnrange
import numpy as np

ktp = cw.ktp.Basic()

powertraces = []
num_traces = 1

for i in tnrange(num_traces, desc='Capturing traces'):
    key, text = ktp.next()  # manual creation of a key, text pair can be substituted here
    powertrace = cw.capture_trace(scope, sstarget, text, key)
    if powertrace is None:
        continue
    powertraces.append(powertrace)

#Convert traces to numpy arrays
# the loop variable is named pt so that it cannot be confused with the TraceWhisperer object `trace`
trace_array = np.asarray([pt.wave for pt in powertraces])  # if you prefer to work with numpy array for number crunching
textin_array = np.asarray([pt.textin for pt in powertraces])
known_keys = np.asarray([pt.key for pt in powertraces])  # for fixed key, these keys are all the same

### Read the raw trace data:

In [None]:
raw = trace.read_capture_data()

### If we captured raw data, parse out raw 'frames' from it:
This will *not* work if you captured in non-raw mode (`trace.capture.raw = False`)!

When using the parallel trace port, the sync frames are used as frame delimiters.
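
To illustrate the idea of sync frames as delimiters, here is a deliberately naive sketch that splits a byte stream on the TPIU full synchronization packet (`0x7FFFFFFF`). It assumes the bytes are stored least-significant byte first, in arrival order. `split_on_sync` is a hypothetical helper and is not how `get_raw_trace_packets()` is implemented.

```python
# Naive illustration only: split a raw byte stream on TPIU full-sync packets.
# Assumes 0x7FFFFFFF appears in the stream as the bytes FF FF FF 7F.
TPIU_FULL_SYNC = bytes([0xFF, 0xFF, 0xFF, 0x7F])


def split_on_sync(stream):
    """Return the non-empty chunks found between sync packets."""
    return [chunk for chunk in bytes(stream).split(TPIU_FULL_SYNC) if chunk]


stream = (b'\x01\x02' + TPIU_FULL_SYNC + b'\x03'
          + TPIU_FULL_SYNC + TPIU_FULL_SYNC + b'\x04\x05')
for chunk in split_on_sync(stream):
    print(chunk.hex())
```

A real parser also has to cope with captures that start mid-frame and with payload bytes that happen to resemble a sync packet, which this sketch ignores.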

In [None]:
if RAW_CAPTURE:
    if TRACE_INTERFACE == 'parallel':
        frames = trace.get_raw_trace_packets(raw, removesyncs=True, verbose=True)
    else:
        frames = trace.get_raw_trace_packets(raw, removesyncs=False, verbose=True)
        
else:
    times = trace.get_rule_match_times(raw, rawtimes=False, verbose=True)

For rule-based capture, we can get more information about the capture:

- `trace.capture.matched_pattern_data` shows the actual trace data which last matched one of the match rules
- `trace.capture.matched_pattern_counts` shows how many times each rule was matched

In [None]:
if not RAW_CAPTURE:
    print(trace.capture.matched_pattern_data)
    print(trace.capture.matched_pattern_counts)

# Parse the raw trace data with Orbuculum:
For raw capture mode (`trace.capture.raw = True`) only.

Orbuculum allows you to make sense out of the cryptic TPIU-encoded trace data. It can be installed from: https://github.com/orbcode/orbuculum

In [None]:
# first, write out the raw trace data to a file:
trace.write_raw_capture(frames, 'raw.bin')

In [None]:
# change the path to where the orbuculum executable resides on your own system:

In [None]:
%%bash
/home/jpnewae/git/orbuculum/ofiles/orbuculum -t -f raw.bin -P -e
cat hwevent

Refer to Orbuculum documentation for more information, but for the example shown here you'll get two types of entries out of Orbuculum:
1. Starts with '2': periodic PC sample; last field is the PC value
2. Starts with '8': Isync match; last field is the PC value

The middle field is the timestamp inferred by Orbuculum, which is inaccurate here since TraceWhisperer strips out most of the sync frames for storage efficiency and records its own timestamps instead.

# Plotting Example
For the code below, go back above and re-run a trace capture with non-raw capture mode, using one or two PC addresses that are of interest to you.
Skip over the Orbuculum cells since we aren't capturing raw trace packets.

The default PC match values, for the target executable in the repository, are the start of the `SubBytes()` and `MixColumns()` functions.

The code below overlays black vertical lines on top of the power trace, for each rule match event.

Note that 18 matches are obtained (not 20) because the last round uses a different code path.

In order to overlay the power and debug trace data, we must match their timescales. `multiplier` expresses the ratio of the power trace sampling rate to the debug trace sampling rate. First, we account for the x1 or x4 power trace sampling rate.

Then, we account for the debug trace sampling rate. In parallel trace mode this is straightforward -- the debug trace rate is equal to the target processor speed.

In SWO mode, there is another factor to account for: the debug trace sampling rate and target clock rate can be different.

See https://github.com/newaetech/DesignStartTrace/blob/master/hardware/tracewhisperer/clocks.md for details on what's happening here.
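
As a minimal sketch of the timescale conversion, the helper below scales debug-trace timestamps by `multiplier` and drops any that fall outside the power trace. `match_times_to_sample_indices` is a hypothetical name used for illustration; with the data captured above you would pass something like `[t[0] for t in times]` and `len(powertraces[0].wave)`.

```python
# Hypothetical helper: map debug-trace timestamps onto power-trace sample indices.
def match_times_to_sample_indices(match_times, multiplier, num_samples):
    """Scale each timestamp by multiplier; keep only indices inside the trace."""
    indices = []
    for t in match_times:
        idx = int(round(t * multiplier))
        if 0 <= idx < num_samples:
            indices.append(idx)
    return indices


# With a x4 ADC clock, a match at trace cycle 10 lands at power sample 40.
print(match_times_to_sample_indices([10, 25, 7000], multiplier=4, num_samples=24400))
```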

In [None]:
if scope._is_husky:
    multiplier = scope.clock.adc_mul
elif scope.clock.adc_src == 'clkgen_x4' or scope.clock.adc_src == 'extclk_x4':
    multiplier = 4
else:
    multiplier = 1

if TRACE_INTERFACE == 'swo':
    #multiplier /= trace.swo_target_clock_ratio
    #multiplier //= (trace.clock.fe_freq / scope.clock.clkgen_freq)
    pass

In [None]:
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.resources import INLINE
from bokeh.models import Span

output_notebook(INLINE)
p = figure(plot_width=1200)  # on Bokeh >= 3.0, plot_width was removed; use width=1200 instead

xrange = range(len(powertraces[0].wave))
p.line(xrange, powertraces[0].wave, line_color="red")

vlines = []
for t in times:
    vlines.append(Span(location=t[0]*multiplier, dimension='height', line_color='black', line_width=2))
p.renderers.extend(vlines)

In [None]:
show(p)

# Next steps:

1. The [pc_sample_annotate.ipynb](pc_sample_annotate.ipynb) notebook shows an example of something else you can do with trace that's pretty neat: annotating a power waveform with the functions being executed.
2. The [uecc.ipynb](https://github.com/newaetech/chipwhisperer-jupyter/blob/master/demos/uecc.ipynb) notebook shows how trace can be used to help execute a side-channel attack on a software ECC target.