
Add correlation-enhanced collision attack #18

Closed · wants to merge 4 commits

Conversation

@vogelpi (Collaborator) commented Oct 28, 2020

This attack is supposed to work both on unmasked AES implementations and on implementations using the masked Canright S-Box. It uses the existing simple_capture_traces.py for the capture stage.

Note: Currently, the attack is not successful. I suspect this is due to noise in the traces or a wrong scope configuration. I tried the ResyncSAD class of the ChipWhisperer API and also implemented some very basic filtering, both without success. I am now collecting fresh traces with the FTDI disconnected.

This is related to #11.
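
For readers new to the technique, the core idea is roughly the following (a minimal sketch, not the PR's code; names, shapes, and the assumption that every plaintext byte value occurs in the trace set are all illustrative):

import numpy as np

def mean_traces_per_value(traces, pt_bytes):
    # Average all traces sharing the same plaintext byte value alpha.
    m_alpha = np.zeros((256, traces.shape[1]))
    for alpha in range(256):
        m_alpha[alpha] = traces[pt_bytes == alpha].mean(axis=0)
    return m_alpha

def best_delta(m_a, m_b):
    # m_a, m_b: (256, num_samples) mean traces for byte positions a and b,
    # cut to equal-length windows where each byte is processed. Colliding
    # S-Box computations make the correct key-byte difference delta
    # correlate strongest.
    rhos = np.zeros(256)
    for delta in range(256):
        shifted = m_b[np.arange(256) ^ delta]
        per_sample = [np.corrcoef(m_a[:, s], shifted[:, s])[0, 1]
                      for s in range(m_a.shape[1])]
        rhos[delta] = max(per_sample)
    return int(np.argmax(rhos)), rhos.max()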

@alphan (Contributor) left a comment

Thanks @vogelpi! Just some initial feedback:


from util import plot

# Open trace file
@alphan (Contributor):

Can you add an if __name__ == '__main__' guard at the bottom and move this code to main()?
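
(For reference, the standard pattern; the body is a placeholder:)

def main():
    # Code previously at module scope goes here.
    ...

if __name__ == '__main__':
    main()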

@vogelpi (Collaborator, Author):

That's done now.

Comment on lines 35 to 39
# Create a local copy of the traces. This makes the remaining operations much
# faster.
traces = np.empty((num_traces, num_samples_use), np.double)
for i_trace in range(num_traces):
traces[i_trace] = project.waves[i_trace][start_sample_use:stop_sample_use]
@alphan (Contributor):

Is this because cw loads traces lazily? I wonder if there is something in the API to do this for us.

@moidx (Collaborator):

I believe you can do the following:

traces = np.array(project.waves)[:num_traces, start_sample:stop_sample]

I recommend the following:

import pandas as pd

traces = pd.DataFrame(np.array(project.waves)[:num_traces, start_sample:stop_sample])

# Then you can do the following
mean = traces.mean(axis=0)
std = traces.std(axis=0)

# The following requires pandas_bokeh
pd.set_option("plotting.backend", "pandas_bokeh")
mean.plot()

@vogelpi (Collaborator, Author):

The problem is that project.waves is a memory-mapped object that is not dense/contiguous and is only accessed indirectly. This is really slow for large data sets.

I played around with this when trying to parallelize the computation. The way the memory mapping is implemented in the CW API, parallelization using multiprocessing etc. fails in most cases: either the unpickling failed, or I got errors that too many files were open (giving each thread just the part of project.waves it needs actually creates a new memory mapping for every thread). In the end, I opened the project file separately in every thread. This worked, but it wasn't really efficient.

In contrast, creating this local dense copy is orders of magnitude faster. For example, the whole script now takes around 80 seconds on my machine. Working on the memory-mapped traces, as in the approach of @moidx, the filtering alone takes around 3-4 minutes.
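
For reference, a minimal sketch of the per-process workaround described above (assuming cw.open_project and a caller-supplied sample window; not the exact code that was tried):

import multiprocessing as mp
import numpy as np
import chipwhisperer as cw

_project = None

def _init_worker(project_file):
    # Each worker opens its own copy of the project, avoiding
    # pickling of the memory-mapped trace container.
    global _project
    _project = cw.open_project(project_file)

def _load_slice(args):
    i_trace, start, stop = args
    return np.asarray(_project.waves[i_trace][start:stop])

def load_traces(project_file, num_traces, start, stop):
    with mp.Pool(initializer=_init_worker, initargs=(project_file,)) as pool:
        rows = pool.map(_load_slice, [(i, start, stop) for i in range(num_traces)])
    return np.vstack(rows)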

Comment on lines 47 to 55
stop_sample = start_sample + num_samples
mean_trace = np.zeros(num_samples, np.double)
mean_sq_trace = np.zeros(num_samples, np.double)
for i in range(len(traces)):
mean_trace += traces[i][start_sample:stop_sample]
mean_sq_trace += (traces[i][start_sample:stop_sample]**2)
mean_trace /= num_traces
mean_sq_trace /= num_traces
return mean_trace, mean_sq_trace
@alphan (Contributor):

I think we can use something like np.mean(traces, axis=0)[start_sample:stop_sample] instead of the loop.

@vogelpi (Collaborator, Author):

That's changed, thanks for the suggestion.

Comment on lines 47 to 52
stop_sample = start_sample + num_samples
mean_trace = np.zeros(num_samples, np.double)
mean_sq_trace = np.zeros(num_samples, np.double)
for i in range(len(traces)):
mean_trace += traces[i][start_sample:stop_sample]
mean_sq_trace += (traces[i][start_sample:stop_sample]**2)
@alphan (Contributor):

traces is already truncated above using start_sample_use and stop_sample_use. The calls seem to be OK, but could you consider removing this slicing for the sake of simplicity?

@vogelpi (Collaborator, Author):

Done.

Comment on lines 72 to 73
mean_trace, mean_sq_trace = get_mean_sq_traces(traces, num_samples_use, 0)
sigma_trace = np.sqrt(mean_sq_trace - (mean_trace**2))
@alphan (Contributor):

I think we can just use numpy.std and remove mean_sq_trace.

@vogelpi (Collaborator, Author):

Yes absolutely. I wasn't aware that these functions exist...
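
For reference, the loop then collapses to (a sketch, using the names from the snippet under review):

import numpy as np

# Per-sample mean and standard deviation across all traces; replaces
# the manual accumulation of mean_trace and mean_sq_trace.
mean_trace = np.mean(traces, axis=0)
sigma_trace = np.std(traces, axis=0)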

@moidx (Collaborator) left a comment

One initial comment. I will continue tomorrow.


@vogelpi (Collaborator, Author) commented Oct 30, 2020

I am obviously not that experienced with Python yet :-) but happy to improve my skills. Thanks @moidx and @alphan for your feedback!

This attack is supposed to work both on unmasked AES implementations and
implementations using the masked Canright S-Box. It uses the existing
simple_capture_traces.py for the capture stage.

Signed-off-by: Pirmin Vogel <vogelpi@lowrisc.org>
This attack can be performed on the first round (recovering the deltas
between bytes in the initial key) or the last round (recovering the deltas
between bytes in the final round key).

Signed-off-by: Pirmin Vogel <vogelpi@lowrisc.org>
@vogelpi (Collaborator, Author) commented Nov 20, 2020

Hey @moidx and @alphan,
I've finally managed to get this attack working on the unmasked implementation. It produces the following output:

$ ./correlation-enhanced_collision_attack.py 
Will work with 997740/1000000 traces.
known_key: b'2b7e151628aed2a6abf7158809cf4f3c'
key guess: b'2b7e151628aed2a6abf7158809cf4f3c'
SUCCESS!
86/120 deltas guessed correctly.

I also needed to normalize the correlations so that we indeed select the relationships with the strongest peaks. Below is the correlation plot for the last round key: Byte 0 xor Byte 1 (green), Byte 0 xor Byte 2 (orange), Byte 0 xor Byte (blue):

[Figure: bokeh plot of the normalized correlations]

Could you please take another look at the PR?
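
(One way to implement such a normalization; this is a sketch of the idea, not necessarily the PR's exact scheme: standardize the per-delta correlation scores of each byte pair so peak heights are comparable before selecting the maximum.)

import numpy as np

def normalized_best_delta(rhos):
    # rhos[delta]: raw correlation for each of the 256 candidate deltas
    # of one byte pair (hypothetical input).
    z = (rhos - rhos.mean()) / rhos.std()
    delta = int(np.argmax(z))
    return delta, z[delta]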

…tack

This commit adds the possibility to sweep the number of traces used for the
attack, which allows determining the minimum number of traces needed
to successfully perform the attack.

Signed-off-by: Pirmin Vogel <vogelpi@lowrisc.org>
When sweeping the number of traces used, most of the time is spent
computing the m_alpha_j-s. This commit parallelizes the corresponding
code to speed up the attack.

Signed-off-by: Pirmin Vogel <vogelpi@lowrisc.org>
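
(A minimal sketch of such a parallelization, assuming the m_alpha_j are the per-byte-value mean traces and splitting the work per byte position; the actual split in the commit may differ:)

import multiprocessing as mp
import numpy as np

def m_alpha_for_byte(args):
    # Mean trace per plaintext byte value alpha for one byte position.
    traces, pt_bytes = args
    m = np.zeros((256, traces.shape[1]))
    for alpha in range(256):
        m[alpha] = traces[pt_bytes == alpha].mean(axis=0)
    return m

def all_m_alpha(traces, pt, num_bytes=16):
    # Note: a real implementation would use shared memory to avoid
    # pickling the full trace array into every worker.
    with mp.Pool() as pool:
        return pool.map(m_alpha_for_byte,
                        [(traces, pt[:, j]) for j in range(num_bytes)])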
@vogelpi (Collaborator, Author) commented Nov 23, 2020

Update: I've added functionality to sweep the number of traces used for the attack. This allows producing a plot like this:

[Figure: percentage of correct guesses vs. number of traces]

The x-axis is the number of traces in steps of 100000; the y-axis is the percentage of correct guesses. The orange curve is for the key bytes (max 16), the green curve for the key byte differences (max 120).
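
(A sketch of what such a sweep can look like; run_attack is a hypothetical stand-in for the attack entry point, not the PR's API:)

# Step size taken from the comment above.
for num in range(100000, 1000001, 100000):
    correct_bytes, correct_deltas = run_attack(traces[:num], pt[:num])
    print(num, correct_bytes / 16, correct_deltas / 120)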

@vogelpi (Collaborator, Author) commented Dec 4, 2020

Based on yesterday's discussion, should we go ahead and merge this @alphan ?

@alphan (Contributor) left a comment

Thank you and kudos for successfully implementing the attack @vogelpi! I think we should go ahead and merge this. We will revisit this as we make progress on the distributed implementation anyway.

for byte in range(15):
# Take the most promising delta that involves the available
# key bytes.
for rho, delta, a, b in max_rho_deltas:
@alphan (Contributor):

Would you mind adding a comment here saying that this is a single-pass heuristic? Since we choose the delta value for each byte sequentially from 0 to 15, we don't consider cases where byte x could use the delta value of byte y with y > x.

@vogelpi (Collaborator, Author):

Yes sure. I'll add a comment here.
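
(The added comment could read roughly like this, over the loop quoted above:)

# Single-pass heuristic: key bytes are resolved sequentially from 0 to 15,
# so a delta assigned to byte x is never reconsidered for a later byte y
# (y > x), even if swapping would give a better overall assignment.
for byte in range(15):
    ...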

@vogelpi (Collaborator, Author) commented Dec 4, 2020

Thanks @alphan!

I have now added the comment and force-pushed, but the update isn't shown here. I don't understand why.
