# Introduction to Debug Tracing for Side-Channel Analysis

Setup depends on the trace platform, target platform, and trace interface, so make the correct choices below.

In addition to a ChipWhisperer capture platform (Pro or Lite), you need either:
* CW305
* CW610 (PhyWhisperer); in this case you also need one of:
    * CW308 multi-target board with the MK82F target
    * CW308 with an STM32 target
    * CWLITEARM

This notebook tries to highlight some of the many things that can be done with trace, and so it is not meant to be run "straight through". Read the instructions carefully!

In [None]:
TRACE_PLATFORM = 'CW610' # AKA PhyWhisperer
#TRACE_PLATFORM = 'CW305' # CW305 FPGA target board

#PLATFORM = 'CW305'
#PLATFORM = 'CWLITEARM'
#PLATFORM = 'CW308_STM32F3'
PLATFORM = 'CW308_K82F'

#TRACE_INTERFACE = 'parallel'
TRACE_INTERFACE = 'swo'

In [None]:
import chipwhisperer as cw
from chipwhisperer.capture.trace.TraceWhisperer import TraceWhisperer

In [None]:
##### TODO: point to standard bitfile and defines ########
defines = ['../hardware/CW305_DesignStart/hdl/defines_trace.v', '../hardware/phywhisperer/software/phywhisperer/firmware/defines_pw.v']

In [None]:
!ls -l ../hardware/tracewhisperer/vivado/tracewhisperer.runs/impl_no_ilas/*bit

In [None]:
# platform setup:
if TRACE_PLATFORM == 'CW610':
    SCOPETYPE = 'OPENADC'
    %run "Helper_Scripts/Setup_Generic.ipynb"
    trace = TraceWhisperer(target, scope, force_bitfile=False, defines_files=defines)
    #trace = TraceWhisperer(target, scope, force_bitfile=True, bs='../hardware/tracewhisperer/vivado/tracewhisperer.runs/impl_no_ilas/tracewhisperer_top.bit', defines_files=defines)
    # on this platform, minimum trace frequency is 10 MHz, so minimum target frequency is twice that; increase baud rate accordingly:
    #scope.clock.clkgen_freq = 20e6
    #target.baud = 104000
    scope.clock.adc_src = "clkgen_x4"
    if PLATFORM == 'CWLITEARM':
        scope.adc.samples = 24400
    else:
        scope.adc.samples = 30000
    scope.gain.setGain(20)

else:
    %run "Helper_Scripts/Setup_CW305_DST.ipynb"
    scope.adc.samples = 35000
    trace = TraceWhisperer(target, scope, defines_files=defines)

In [None]:
trace

In [None]:
trace.target_registers

In [None]:
type(trace.target_registers.DWT_CTRL)

In [None]:
trace.target_registers.DWT_CTRL = '40000021'

In [None]:
trace.target_registers.set('DWT_CTRL', '4000007f')

In [None]:
hex(trace.target_registers.DWT_CTRL)

In [None]:
trace.target_registers.regs['DWT_CTRL']

In [None]:
'%02x' % 4

In [None]:
trace.ARM_debug_registers.DWT_CTRL

In [None]:
trace.fpga_write(trace.REG_FE_CLOCK_SEL, [0])

In [None]:
#%run "Helper_Scripts/Setup_Generic.ipynb"
#trace = TraceWhisperer(target, scope, force_bitfile=False, defines_files=defines)

In [None]:
trace.fpga_buildtime

In [None]:
trace

In [None]:
trace.fe_clock_alive

In [None]:
trace.fe_freq

In [None]:
trace.trigger_freq = 7384586*4

In [None]:
scope.clock.clkgen_freq

In [None]:
scope.clock

In [None]:
# TODO: set gain appropriately for each target/platform
if TRACE_PLATFORM == 'CW305':
    scope.gain.setGain(30)
elif PLATFORM == 'CW308_K82F':
    scope.gain.setGain(20)
elif PLATFORM == 'CW308_STM32F3':
    scope.gain.setGain(25)
elif PLATFORM == 'CWLITEARM':
    scope.gain.setGain(25)

### Program STM32 target:

If you're using the K82 target, you'll need an external programmer.

In [None]:
if (PLATFORM == 'CW308_STM32F3') or (PLATFORM == 'CWLITEARM'):
    fw_path = '../../cw_dev3/hardware/victims/firmware/simpleserial-trace/simpleserial-trace-CW308_STM32F3.hex'
    prog = cw.programmers.STM32FProgrammer
    cw.program_target(scope, prog, fw_path)

In [None]:
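# required after programming some targets:
import time  # needed for the sleeps below

def target_reset():
    # pulse the target's reset line; only wired up on the CW610 setup
    if TRACE_PLATFORM == 'CW610':
        scope.io.nrst = 'low'
        time.sleep(0.05)
        scope.io.nrst = 'high'
        time.sleep(0.05)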

In [None]:
target_reset()

In [None]:
# target info and buildtimes:
print(trace.get_target_name())
print(trace.get_fw_buildtime())
print(trace.get_fpga_buildtime())

In [None]:
scope.clock

In [None]:
trace

In [None]:
trace.fe_clock_alive

## Set trace or SWO operation mode:

Trace mode operation is pretty straightforward. SWO is a bit more complicated to set up - mostly because there are more knobs to tune.

First, Arm processors which support JTAG and SWD come out of reset in JTAG mode. In order to get trace data out of the SWO pin, we need to switch it over to SWD mode.

The `jtag_to_swd()` call below runs a special sequence on the TMS and TCK pins to do this switchover. However, different processors (such as the STM32) may have *additional* requirements to enable the SWO pin. The `simpleserial-trace` firmware handles this for the STM32. Other targets may have their own requirements. One sure-fire way to get a target into SWD mode is to use an external debugger. In that case, do not call `jtag_to_swd()`, as this could result in contention on the TMS/TCK pins.
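
For reference, the JTAG-to-SWD switchover defined by the Arm Debug Interface specification is a line reset (at least 50 clocks with TMS high), the 16-bit value `0xE79E` sent LSB first, and then another line reset. The sketch below only builds that TMS bit sequence. It illustrates the standard sequence and is not a copy of what `jtag_to_swd()` does internally; `jtag_to_swd_tms_bits` is a hypothetical helper name.

```python
JTAG_TO_SWD_SELECT = 0xE79E  # 16-bit switch code from the Arm Debug Interface spec

def jtag_to_swd_tms_bits(reset_clocks=50):
    """Return the TMS level for each TCK cycle of a JTAG-to-SWD switch."""
    if reset_clocks < 50:
        raise ValueError("a line reset needs at least 50 clocks with TMS high")
    line_reset = [1] * reset_clocks
    # the select code is shifted out least-significant bit first
    select = [(JTAG_TO_SWD_SELECT >> i) & 1 for i in range(16)]
    return line_reset + select + line_reset

bits = jtag_to_swd_tms_bits()
print(len(bits), bits[50:66])
```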

Additionally, there are lots of knobs in setting the SWO bit rate. Sensible default settings are used here, but if you want to modify them, you'll first have to understand what the knobs do. The variables at play are:
- the target clock
- the TPI.ACPR register, which defines the number of clock cycles per SWO bit
- the CW610's internal UART, which runs at 192 MHz and has a configurable number of clock cycles per SWO bit.

Look at TraceWhisperer.py's `set_trace_mode()` to see how it's all done. One thing to understand is that the target clock is determined by the `swo_div` parameter.
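
The arithmetic behind these knobs can be sketched in a few lines. The target emits one SWO bit every `ACPR + 1` clock cycles, and the receiver has to span each bit with a whole number of its own clock cycles. The helper names and the 24 MHz example clock are illustrative assumptions; how the receiver's cycle count maps onto `swo_div` is best checked in `set_trace_mode()` itself.

```python
def swo_bit_rate(target_clock_hz, acpr):
    """SWO bit rate produced by the target: one bit every (ACPR + 1) clocks."""
    return target_clock_hz / (acpr + 1)

def uart_cycles_per_bit(uart_clock_hz, bit_rate_hz):
    """Receiver clock cycles spanned by one SWO bit."""
    return uart_clock_hz / bit_rate_hz

# illustrative values: 24 MHz target, ACPR = 0, 192 MHz receiver clock
bit_rate = swo_bit_rate(24e6, acpr=0)
cycles = uart_cycles_per_bit(192e6, bit_rate)
print(bit_rate, cycles)
```

A non-integer result from `uart_cycles_per_bit()` suggests that the chosen target clock and `ACPR` do not line up with the receiver clock.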

In [None]:
# NEW target-clocked SWO setting. This assumes trigger_clock = 2x target clock.

# inputs:
swo_div = 8
acpr = 1
new_target_clock = 10e6

trace.swo_mode = True
# Next we set the target clock and update CW baud rate accordingly:
#new_target_clock = int(target._uart_clock / (swo_div * (acpr+1)))
scope.clock.clkgen_freq = new_target_clock
target.baud = int(trace._base_baud * (new_target_clock/trace._base_target_clock))

# K82 needs this after changing clocks:
target_reset()
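
The baud-rate line in the cell above scales the serial link in proportion to the target clock. A standalone sketch of that arithmetic follows. The base values of 7.37 MHz and 38400 baud are assumptions standing in for `trace._base_target_clock` and `trace._base_baud`, and `scaled_baud` is a hypothetical helper, so check the values against your own setup.

```python
def scaled_baud(new_target_clock, base_target_clock=7.37e6, base_baud=38400):
    """Scale the serial baud rate linearly with the target clock."""
    return int(base_baud * (new_target_clock / base_target_clock))

for clock in (7.37e6, 10e6, 20e6):
    print(clock, scaled_baud(clock))
```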

In [None]:
trace.set_reg('TPI_SPPR', '00000002')
trace.set_reg('TPI_ACPR', '%08x' % acpr)
trace.fpga_write(trace.REG_SWO_BITRATE_DIV, [swo_div-1]) # not a typo: hardware requires -1; doing this is easier than fixing the hardware
trace.fpga_write(trace.REG_SWO_ENABLE, [1])
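
The value written to `TPI_SPPR` in the cell above selects the trace pin protocol. As a reading aid, the Armv7-M TPIU defines three values for the low two bits of that register. The lookup below is a small sketch; `describe_sppr` is a hypothetical helper, not part of TraceWhisperer.

```python
# TXMODE encodings of the TPIU Selected Pin Protocol Register (Armv7-M)
SPPR_MODES = {
    0: 'parallel trace port',
    1: 'SWO, Manchester encoding',
    2: 'SWO, NRZ (UART-style) encoding',
}

def describe_sppr(hex_value):
    """Translate an 8-digit hex register string into a protocol name."""
    mode = int(hex_value, 16) & 0x3
    return SPPR_MODES.get(mode, 'reserved')

print(describe_sppr('00000002'))
```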

In [None]:
trace.jtag_to_swd()

In [None]:
trace

In [None]:
#trace.set_trace_mode('trace')

In [None]:
if TRACE_INTERFACE == 'parallel':
    if TRACE_PLATFORM == 'CW610':
        print("*** Don't forget the jumper cables from the target's trace pins to the PhyWhisperer D[4:7] and CK pins ***")
    trace.set_trace_mode('trace')
    if TRACE_PLATFORM == 'CW610': # here the target clock must be at least 20 MHz, otherwise the CW610's PLL may fail to lock:
        scope.clock.clkgen_freq = 20e6
        target.baud = 104000
else:
    print("*** Don't forget the jumper cables from the target's TMS/TCK/TDO pins to the PhyWhisperer D0/D1/D2 pins ***")
    trace.set_trace_mode('swo', swo_div=8, acpr=0)
    trace.jtag_to_swd()

trace.check_clocks()

In [None]:
scope.clock.reset_adc()
assert (scope.clock.adc_locked), "ADC failed to lock"

#### Check that the target is alive.
If `get_fw_buildtime()` produces no output, the target may have become unresponsive after the above changes; it may simply require a reset.

In [None]:
print(trace.get_fw_buildtime())

In [None]:
#trace.jtag_to_swd()
#trace.set_trace_mode('swo', swo_div=8, acpr=0)

# reset if needed
#target_reset()

### Disable sync frames for SWO:

By default, periodic sync frames are emitted every 16 million clock cycles. If you're bringing up an SWO target for the first time, this helps confirm that the link is "alive".
However, these sync frames delay the trace events we care about if they occur during a capture, so it's best to disable them.

Sync frames on the parallel trace port cannot be disabled.
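
In the Armv7-M architecture, the sync packet rate is set by the `SYNCTAP` field of `DWT_CTRL` (bits 11:10): 0 disables sync packets, and 1 taps bit 24 of the cycle counter, i.e. one packet every 2^24 (about 16 million) cycles. The sketch below decodes a few fields so that a value such as `40000021` is easier to read. `decode_dwt_ctrl` is a hypothetical helper and shows only a subset of the register's fields.

```python
def decode_dwt_ctrl(value):
    """Decode selected DWT_CTRL fields (Armv7-M bit positions)."""
    return {
        'CYCCNTENA': value & 0x1,           # bit 0: cycle counter enable
        'POSTPRESET': (value >> 1) & 0xF,   # bits 4:1
        'POSTINIT': (value >> 5) & 0xF,     # bits 8:5
        'CYCTAP': (value >> 9) & 0x1,       # bit 9
        'SYNCTAP': (value >> 10) & 0x3,     # bits 11:10, 0 = sync packets off
        'PCSAMPLENA': (value >> 12) & 0x1,  # bit 12: periodic PC sampling
        'NUMCOMP': (value >> 28) & 0xF,     # bits 31:28: number of comparators
    }

fields = decode_dwt_ctrl(int('40000021', 16))
print(fields)
```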

In [None]:
trace.ARM_debug_registers._set_reg('DWT_CTRL', '40000021')

In [None]:
trace.ARM_debug_registers

In [None]:
[None]*10

In [None]:
#trace.set_reg('TPI_SPPR', '00000002')

In [None]:
trace.get_reg('TPI_SPPR')

### Trigger trace capture from target FW:

In [None]:
trace.use_soft_trigger()

### What to capture:
There are two trace capture modes:
1. Raw mode captures the complete raw trace data.
2. Non-raw mode captures only matching rule IDs. To use this, set up some pattern match rules (see below); only the ID of the matching rule will be captured.

In [None]:
trace.fpga_write(trace.REG_CAPTURE_RAW, [1])

### Alternatively, set a pattern matching rule and capture only rule match IDs:

In [None]:
trace.fpga_write(trace.REG_CAPTURE_RAW, [0])

# match on any PC match (isync) trace packet:
trace.set_pattern_match(0, [3, 8, 32, 0, 0, 0, 0, 0], [255, 255, 255, 0, 0, 0, 0, 0])

# match on anything:
#trace.set_pattern_match(0, [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0])

# enable matching rule:
trace.fpga_write(trace.REG_PATTERN_ENABLE, [1])
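
To make the pattern and mask arguments above more concrete, here is a software sketch of a masked comparison. It assumes that a mask bit of 1 means "this bit must equal the pattern" and a mask bit of 0 means "don't care", which is consistent with the all-zero mask in the "match on anything" example but is not taken from the hardware description. `pattern_matches` is a hypothetical helper.

```python
def pattern_matches(data, pattern, mask):
    """Compare byte lists, looking only at bits selected by the mask."""
    return all((d & m) == (p & m) for d, p, m in zip(data, pattern, mask))

pattern = [3, 8, 32, 0, 0, 0, 0, 0]
mask = [255, 255, 255, 0, 0, 0, 0, 0]

# first three bytes must match exactly; the remaining five are ignored
print(pattern_matches([3, 8, 32, 0x84, 0x3e, 0, 0, 0], pattern, mask))
print(pattern_matches([3, 9, 32, 0x84, 0x3e, 0, 0, 0], pattern, mask))
```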

### Optionally, a pattern matching rule can be used to trigger trace capture (instead of the target FW soft trigger):

Be aware that this is not a stable trigger: expect jitter of up to 6 clock cycles. See the DesignStartTrace [README](https://github.com/newaetech/DesignStartTrace/blob/master/README.md) for more information on jitter.

In [None]:
trace.use_trace_trigger(rule=0)

### How long to capture for:
By default, trace data is captured while the target's trigger line is high. This is probably what you want to do (unless your target doesn't drive the trigger line, e.g. you're using a trace pattern match as the trigger).

Alternatively, you can manually specify how many cycles or events the capture should last.

In [None]:
trace.set_capture_mode('while_trig') # capture as long as the target trigger pin is high
#trace.set_capture_mode('count_cycles', 24000) # capture for 24000 clock cycles
#trace.set_capture_mode('count_writes', 50000) # capture 50000 events (combination of raw trace bytes and timestamps)

### Set PC addresses to match on:
Let's use the start of the `SubBytes()` and `MixColumns()` functions.

This will set the target's `DWT.COMP0`, `DWT.COMP1`, and `ETM.TEEVR` registers.
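
If you rebuild the firmware, the function addresses will move. One way to find them again is to read the symbol table of your ELF file, for example from the text output of a tool like `nm`. The sketch below parses such output; the sample text is made up for illustration and does not come from the repository's firmware, and `parse_nm_output` is a hypothetical helper. Bit 0 is cleared because some tools report Thumb function addresses with that bit set.

```python
def parse_nm_output(nm_text):
    """Map function names to addresses from 'address type name' lines."""
    symbols = {}
    for line in nm_text.splitlines():
        fields = line.split()
        # keep only text-section (code) symbols
        if len(fields) == 3 and fields[1] in ('T', 't'):
            symbols[fields[2]] = int(fields[0], 16) & ~1
    return symbols

# made-up sample, NOT real addresses from the simpleserial-trace build
SAMPLE_NM_OUTPUT = """\
00001000 T SubBytes
00001041 T MixColumns
20000000 D sbox
"""

symbols = parse_nm_output(SAMPLE_NM_OUTPUT)
print({name: hex(addr) for name, addr in symbols.items()})
```

The resulting addresses could then be passed as `addr0` and `addr1` to `set_isync_matches()`.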

In [None]:
if TRACE_PLATFORM == 'CW305':
    trace.set_isync_matches(addr0=0x3bc0, addr1=0x3aa8, match='both')
elif PLATFORM == 'CWLITEARM' or PLATFORM == 'CW308_STM32F3':
    trace.set_isync_matches(addr0=0x08001728, addr1=0x08001736, match='both')
elif PLATFORM == 'CW308_K82F':
    #trace.set_isync_matches(addr0=0x1d70, addr1=0x1d7c, match='both')
    trace.set_isync_matches(addr0=0x3e84, addr1=0x3eb0, match='both')

In [None]:
trace.get_reg('DWT_COMP0'), trace.get_reg('DWT_COMP1')

### Enable or disable periodic PC sampling:

In [None]:
trace.set_periodic_pc_sampling(enable=0)

# Capture power and debug trace:

In [None]:
if TRACE_PLATFORM == 'CW610':
    print("*** Don't forget the jumper cable from CW308 GPIO4/TRIG pin to PhyWhisperer PC pin on side connector! ***")

In [None]:
if TRACE_PLATFORM == 'CW610':
    sstarget = target
else:
    sstarget = trace._ss

In [None]:
# force resynchronization, ensure we are sync'd:
if TRACE_INTERFACE == 'parallel':
    trace.resync()

In [None]:
trace

In [None]:
trace.fpga_read(trace.REG_LED_SELECT, 1)[0]

In [None]:
trace.fpga_read(trace.REG_FE_CLOCK_SEL, 1)[0]

In [None]:
trace.fpga_write(trace.REG_FE_CLOCK_SEL, [0])

In [None]:
trace.check_clocks()

In [None]:
trace.fpga_write(trace.REG_LED_SELECT, [1])

In [None]:
# arm trace sniffer:
trace.arm_trace()

In [None]:
trace.fpga_read(trace.REG_STAT, 1)[0]

In [None]:
scope.adc.samples = 24400

In [None]:
from tqdm.notebook import tnrange
import numpy as np

ktp = cw.ktp.Basic()

powertraces = []
num_traces = 1

for i in tnrange(num_traces, desc='Capturing traces'):
    key, text = ktp.next()  # manual creation of a key, text pair can be substituted here
    powertrace = cw.capture_trace(scope, sstarget, text, key)
    if powertrace is None:
        continue
    powertraces.append(powertrace)

# Convert traces to numpy arrays (loop variable named `pt` to avoid confusion with the `trace` object)
trace_array = np.asarray([pt.wave for pt in powertraces])  # if you prefer to work with numpy array for number crunching
textin_array = np.asarray([pt.textin for pt in powertraces])
known_keys = np.asarray([pt.key for pt in powertraces])  # for fixed key, these keys are all the same

### Read the raw trace data:

In [None]:
raw = trace.read_capture_data()
len(raw)

In [None]:
trace.fpga_read(trace.REG_STAT, 1)[0]

In [None]:
raw[:20]

In [None]:
trace

In [None]:
trace.errors = None

In [None]:
trace.check_clocks()

### If we captured raw data, parse out raw 'frames' from it:
This will *not* work if you used `trace.fpga_write(trace.REG_CAPTURE_RAW, [0])`!

When using the parallel trace port, the sync frames are used as frame delimiters.

In [None]:
if TRACE_INTERFACE == 'parallel':
    frames = trace.get_raw_trace_packets(raw, removesyncs=True, verbose=True)
else:
    frames = trace.get_raw_trace_packets(raw, removesyncs=False, verbose=True)

#### Each entry of `frames` contains a timestamp (# of clock cycles elapsed since trigger) and a payload:
This only works in parallel trace mode because we can use the sync frames as frame delimiters. But fear not, Orbuculum can parse the trace data either way (a few cells down).

In [None]:
frames = trace.get_raw_trace_packets(raw, removesyncs=True, verbose=True)

In [None]:
if TRACE_INTERFACE == 'parallel':
    frames[:3]

### If we captured matching rule events, this will list timestamped matching rule IDs:
This will *not* work if you used `trace.fpga_write(trace.REG_CAPTURE_RAW, [1])`!

In [None]:
times = trace.get_rule_match_times(raw, rawtimes=False, verbose=True)

In [None]:
len(times)

# Parse the raw trace data with Orbuculum:
For the case where `REG_CAPTURE_RAW = 1` only.

Orbuculum allows you to make sense out of the cryptic TPIU-encoded trace data. It can be installed from: https://github.com/orbcode/orbuculum

In [None]:
# first, write out the raw trace data to a file:
trace.write_raw_capture(frames, 'raw.bin')

In [None]:
# change the path to where the orbuculum executable resides on your own system:

In [None]:
%%bash
/home/jpnewae/git/orbuculum/ofiles/orbuculum -t -f raw.bin -P -e
cat hwevent

In [None]:
trace

In [None]:
trace.swo_divv = 10

In [None]:
8*2.5

In [None]:
def mul_data(mul):
    muldiv2 = int(mul/2)
    lo = muldiv2
    if mul%2:
        hi = lo+1
    else:
        hi = lo
    if hi >= 2**6:
        raise ValueError("Internal error: calculated hi/lo value exceeding range")
    raw = lo + (hi<<6) + 0x1000
    return(hex(raw))


In [None]:
trace.mmcm.get_main_div(), trace.mmcm.get_sec_div(), trace.mmcm.get_mul()

In [None]:
scope.clock.clkgen_freq

In [None]:
trace.trigger_freq

In [None]:
trace.trigger_freq = 100e6

In [None]:
trace.check_clocks()

In [None]:
trace.mmcm.set_sec_div(49)

In [None]:
trace.mmcm.set_main_div(1)

In [None]:
mul_data(46), mul_data(48)

In [None]:
trace.mmcm.set_mul(48)

In [None]:
target.baud

In [None]:
trace.get_reg('TPI_ACPR')

In [None]:
trace.check_clocks()

In [None]:
trace.fpga_read(trace.REG_MMCM_LOCKED, 1)[0]

In [None]:
scope.clock

Refer to Orbuculum documentation for more information, but for the example shown here you'll get two types of entries out of Orbuculum:
1. Starts with '2': periodic PC sample; last field is the PC value
2. Starts with '8': Isync match; last field is the PC value

The middle field is the timestamp inferred by Orbuculum, which is inaccurate here since TraceWhisperer strips out most of the sync frames for storage efficiency and records its own timestamps instead.
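
The two entry types can be separated with a few lines of Python. This is a minimal sketch that assumes each line has comma-separated fields, with the entry type first and a hexadecimal PC last; check this against the output of your own Orbuculum version. The sample lines are made up for illustration, and `parse_hwevent` is a hypothetical helper.

```python
ENTRY_KINDS = {'2': 'pc_sample', '8': 'isync_match'}

def parse_hwevent(text, delimiter=','):
    """Return (kind, pc) tuples for PC sample and Isync match lines."""
    events = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split(delimiter)]
        if len(fields) < 3 or fields[0] not in ENTRY_KINDS:
            continue  # skip entry types this sketch doesn't handle
        events.append((ENTRY_KINDS[fields[0]], int(fields[-1], 16)))
    return events

# made-up sample lines, not real captured output
sample = "2,10,0x00003e90\n8,12,0x00003e84\n1,5,junk\n"
events = parse_hwevent(sample)
print([(kind, hex(pc)) for kind, pc in events])
```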

# Plotting Example
For the code below, go back above and re-run a trace capture with non-raw capture mode, using one or two PC addresses that are of interest to you.
Skip over the Orbuculum cells since we aren't capturing raw trace packets.

The default PC match values, for the target executable in the repository, are the start of the `SubBytes()` and `MixColumns()` functions.

The code below overlays black vertical lines on top of the power trace, for each rule match event.

Note that 18 matches are obtained (not 20) because the last round uses a different code path.

In order to overlay the power and debug trace data, we must match their timescales. `multiplier` expresses the ratio of the power trace sampling rate to the debug trace sampling rate. First, we account for the x1 or x4 power trace sampling rate.

Then, we account for the debug trace sampling rate. In parallel trace mode this is straightforward -- the debug trace rate is equal to the target processor speed.

In SWO mode, there is another factor to account for: the debug trace sampling rate and target clock rate can be different.

See https://github.com/newaetech/DesignStartTrace/blob/master/hardware/tracewhisperer/clocks.md for details on what's happening here.
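
As a worked example of this scaling, the sketch below converts debug trace timestamps into power trace sample indices. The clock ratio of 2.0 and the timestamps are made-up values, and both function names are hypothetical; on real hardware the ratio comes from `trace.swo_target_clock_ratio`.

```python
def power_samples_per_trace_tick(adc_src, trace_interface, swo_target_clock_ratio=1.0):
    """Ratio of the power trace sampling rate to the debug trace sampling rate."""
    multiplier = 4 if adc_src.endswith('_x4') else 1
    if trace_interface == 'swo':
        multiplier /= swo_target_clock_ratio
    return multiplier

def to_sample_index(timestamp, multiplier):
    """Map a debug trace timestamp onto the power trace's sample axis."""
    return int(round(timestamp * multiplier))

m = power_samples_per_trace_tick('clkgen_x4', 'swo', swo_target_clock_ratio=2.0)
print(m, [to_sample_index(t, m) for t in (100, 250)])
```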

In [None]:
if scope.clock.adc_src == 'clkgen_x4' or scope.clock.adc_src == 'extclk_x4':
    multiplier = 4
else:
    multiplier = 1

if TRACE_INTERFACE == 'swo':
    multiplier /= trace.swo_target_clock_ratio

In [None]:
multiplier

In [None]:
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.resources import INLINE
from bokeh.models import Span

output_notebook(INLINE)
p = figure(width=1200)  # Bokeh 3.x name; older Bokeh releases used plot_width

xrange = range(len(powertraces[0].wave))
p.line(xrange, powertraces[0].wave, line_color="red")

vlines = []
for t in times:
    vlines.append(Span(location=t[0]*multiplier, dimension='height', line_color='black', line_width=2))
p.renderers.extend(vlines)

In [None]:
show(p)

# Next steps:

1. The [pc_sample_annotate.ipynb](pc_sample_annotate.ipynb) notebook shows an example of something else you can do with trace that's pretty neat: annotating a power waveform with the functions being executed.
2. The [uecc.ipynb](https://github.com/newaetech/chipwhisperer-jupyter/blob/master/demos/uecc.ipynb) notebook shows how trace can be used to help execute a side-channel attack on a software ECC target.