Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0 random coefficient can lead to degraded polynomial #2

Open
bernardpaulus opened this issue Jun 8, 2017 · 21 comments
Open

0 random coefficient can lead to degraded polynomial #2

bernardpaulus opened this issue Jun 8, 2017 · 21 comments

Comments

@bernardpaulus
Copy link
Contributor

The generated coefficients can be 0. Though improbable, if all coefficients are 0, all the shares would be the secret.

In general, the fact that the random generator can produce 0 introduces the risk of lower-order polynomials and therefore a risk of a lower number of shares needed to recover the share than required.

@fletcher
Copy link
Owner

fletcher commented Jun 8, 2017

First -- thanks for submitting! This is a niche project, and I haven't really promoted it in any way. But still, it's been quiet!

And thanks for pointing out the flaw!

@fletcher fletcher closed this as completed Jun 8, 2017
@fletcher fletcher reopened this Jun 8, 2017
@fletcher
Copy link
Owner

fletcher commented Jun 8, 2017

@bernardpaulus -- Reading through code again, and wondering if I am missing something. Do you subtract 1 from the prime twice? once before calling nonzero_random() and once inside it? Is this on purpose?

@fletcher
Copy link
Owner

fletcher commented Jun 8, 2017

I guess it seems from Wikipedia that this is on purpose. So it seems that my initial code was incorrect in two ways, that partially cancelled each other out. By not adding 1, I allowed a coefficient to be 0. By not subtracting 1, when I fixed the first problem I would introduce the problem of allowing a coefficient to be equal to the prime, rather than 1 less than the prime.

If my understanding is incorrect, please let me know. Otherwise, ignore. ;)

Thanks!

@fletcher fletcher closed this as completed Jun 8, 2017
@bernardpaulus
Copy link
Contributor Author

Hello Fletcher!

Not too fast! I actually modified the wikipedia article to gain more visibility on this and quicker feedback and I'm regularly checking it for changes.

Now you said you looked up wikipedia to check my change, I realized I was wrong and reverted the edit.

=> more details follow

@fletcher fletcher reopened this Jun 8, 2017
@fletcher
Copy link
Owner

fletcher commented Jun 8, 2017

In reading from Shamir's 1979 paper (copied from OCR'd image, so forgive "typos"):

The coefficients a~..... ak_~ in q(x) are randomly chosen from a uniform distribution over the integers in [0, p), and the values D~..... Dn are computed modulo p.

The key is "[0,p)", which would imply that the coefficients can be 0, but cannot be equal to the prime.

Correct?

@fletcher
Copy link
Owner

fletcher commented Jun 8, 2017

@bernardpaulus
Copy link
Contributor Author

So, yes, the idea was to stop generating random numbers in the interval [0, 255], but instead generate them on [1, 256]

To be more clear about the change, we moved from:

coef[i] = rand() % 256; // [0, 255]

to

coef[i] = rand() % 255 + 1; // [1, 255]

the same applies for arc4random_uniform.

I think this is required to avoid weak polynomials. Indeed, if you take wikipedia's example:

(a0 = 1234 ; a1 = 166; a2 = 94)

And, if, by chance a2 is 0 instead, the above is equivalent to:

splitting phase

(a0 = 1234 ; a1 = 166 ; a2  = 0 )
f(x) = 1234 + 166 x
Ds = [(1, 1400), (2, 1566), (3, 1732), (4, 1898), (5, 2064), (6, 2230)]

recovery phase

now, only two shares, say (x0 = 1, y0 =1400) and (x1 = 2, y1 = 1566), are sufficient to recover the secret:

Lagrange interpolation with two datapoints, check the wikipedia article for the formulas.

 l0(x) = (x - x1) / (x0 - x1) = (x - 2) / (1 - 2) = -x + 2
 l1(x) = (x - x0) / (x1 - x0) = (x - 1) / (2 - 1) = x - 1
 L(x) = y0 * l0(x) + y1 * l1(x) = 1400 * ( -x + 2 ) + 1566 ( x - 1 )
        = -1400x + 2800 + 1566 x - 1566 = 166x + 1234

we recovered our polynomial with only two shares, when 3 are supposed to be required.
Let's recover the secret:

L(0) = 0x + 1234 = 1234

It works similarly if a1 is 0 instead of a2 in the example. Also, in the general case, each factor equal to 0 reduces by one the number of shares required to find out the secret. This is a matter of degrees of freedom of the polynomial.

Caveat: I don't have enough time to transpose this to finite field arithmetic right now.

possible attack

Given an attacker that attempts to recover the secret (= a password)
with k - 1 shares instead of the k required
and who as access to an oracle (= a login form) to verify if his guess is true,
the attacker therefore has a non-negligible probability (= better than brute force) of recovering the secret (= being able to log in)
by just attempting a recovery with his k - 1 shares.

Given:

  • (p = 257), the size of the finite field,

  • (k = 3) the required number of shares

  • and uniformly distributed coefficients in [0, p-1] = [0, 256]
    The probability of him recovering the password by attempting a recovery with k - 1 keys is thus:

    P(p, k) = 1 - ((p-1) / p) ** (k-1) // NOT(the probability that no coefficient is 0)

For the above parameters, the attacker has the following probability to recover the shares with only two shares:

 P(257, 3) = 0.008 // rounded up, 0.8%

Which is better than brute-forcing.

To prevent anyone to have a non-negligible chance of recovering the secret with less than the k required shares, we have to ensure the underlying polynomial has the required degrees of freedom, e.g. that no coefficient is equal to 0.

@bernardpaulus
Copy link
Contributor Author

Cryptostatis' answer is in line with the above (just upvoted it)

@bernardpaulus
Copy link
Contributor Author

Hum... the simplest would be trying it out with the code.

Will do that tonight :)

@fletcher
Copy link
Owner

fletcher commented Jun 8, 2017

I am not a mathematician or cryptologist (just a physician with a college background in engineering.)

But, if this is truly a flaw in Shamir's scheme, wouldn't there be published criticism discussing this somewhere? There may be, but I haven't found it yet. My search is limited to google, and not very refined since I am not particularly experienced in focused searches of the literature in this field. I have found other descriptions of weaknesses, but these seem to refer to limitations of how this can be used, not actually weaknesses in the methodology or strength of "encryption." They all seem to involve dishonest shareholders, not outsiders with no information about the shares.

EDIT: For completeness, I'll add that it's possible I am misunderstanding something in the paper.

@fletcher
Copy link
Owner

fletcher commented Jun 8, 2017

Another page with a purported flaw in Shamir's scheme that seems to not be true (based on comments):

https://lardbucket.org/blog/archives/2007/10/30/a-flaw-in-shamirs-secret-sharing-method/

I don't claim to understand finite fields well enough to understand all the implications of them on Shamir's scheme, but I think that is an important part of this from what I'm reading.

@fletcher
Copy link
Owner

fletcher commented Jun 8, 2017

As a first test, I tweaked the code to always generate 0 for the first coefficient, and random numbers for the rest. It still requires the specified number of shares to reconstruct the key in my tests.

Conversely, when I set all of the coefficients to 0, then it only requires a single share (they are all identical) to recover the key. When I set all coefficients to 1, it again requires all the shares to recover the key.

I'm not saying this, by itself, disproves your theory. Just trying to add data.

@fletcher
Copy link
Owner

fletcher commented Jun 8, 2017

Here's a challenge.... ;)

I encrypted a short phrase using a modified version of the code. It is set to require 4 of 4 shares to reconstruct the secret. I modified the source to force one of the coefficients to be 0 (I won't tell you which one....).

I also included a similar example (different phrase) using the regular non-zero coefficients. You have a 50/50 chance of guessing which is which.

Here are 3 of the 4 required shares - example A:

0104AA312146142D8BB6
0204AA6B95CDE66B1170
0304AAA84BBE717861D0

And, example B:

0104AA3FA2C366FEECD4
0204AA0DA2E673BF0535
0304AAAB3DFD0884E081

Of course, the object is to determine the phrase, showing your work to prove it's not just using psychology to determine what i might have picked as phrases.

@fletcher
Copy link
Owner

fletcher commented Jun 8, 2017

In reading through your comment and trying to understand it, I have a question/comment.

I understand your comments through the splitting phase.

But it seems that you make an assumption in your recovery phase, namely you assume that the x^3 coefficient is 0, but you only know that because you set it to 0 in advance. Therefore you use the formulas to solve for a 2nd degree polynomial, and are given two data points. Therefore, of course you found the right answer. But you only know this because you are both the dealer and attacker.

What would happen if you (as attacker), appropriately used the equations for a 3rd degree polynomial with the same two data points? Would you still get an answer? Or instead, as I suspect, would you get a family of equally likely answers?

Would the same thing not also be true if the attacker chose to fix the largest coefficient to another set value, e.g. 5? If 5 happened to be correct, then they could solve with a smaller number of keys.

So, if an attacker accurately guesses the first coefficient, then they could solve the problem with one fewer key than nominally required. But this would seem to be the same as a brute force attack, no? The attacker would have no way of knowing whether the guess about the first coefficient was correct.

Am I understanding this correctly? If not, what am I missing?

Thanks!!

@bernardpaulus
Copy link
Contributor Author

Hello,

A quick answer, because I didn't had the time and won't have more time this weekend:

  • indeed, the attack above only works if, by chance the coefficient of the term of the highest degree is 0
  • the more general attack requires the attacker to try out each of the different coefficients, resulting in (k - 1) possible passwords. This requires another approach to recover the secret, based on the equation system instead of the lagrange polynomial.

Coming back to you on that next week :)

@fletcher
Copy link
Owner

@bernardpaulus The suspense is killing me! ;)

@ferrerrajc
Copy link

ferrerrajc commented Jul 25, 2017

Hi there.

It is my understanding that if your coefficients can be zero, then the security of the algorithm is not reduced. Even if you know k - 1 points on the curve, all values for s are equally likely. As a matter of fact, if coefficients are not allowed to be zero, then some values of s are not possible. Take, for example, p = 3 and k = 2. With the point (1, 1) and the knowledge that a1 is nonzero, it must be that s is 0 or 2. Otherwise, 1 = 1 + a1 which implies a1 = 0. This is exaggerated with such a small prime, but you get the picture.

However, if it is the case that some coefficients will be zero (or any other known value), then the security of the algorithm is crippled. An attacker needs to know fewer shares to guess the secret. If there's one known coefficient value (zero or otherwise), then once the attacker has k - 1 shares, they only need to solve some k - 1 linear systems of equations to get as many possible secrets.

@fletcher
Copy link
Owner

@ferrerrajc That is my understanding as well. In this case, 0 is equally as likely as any other coefficient (except for the intentionally crippled examples I posted.)

@bernardpaulus
Copy link
Contributor Author

Hi,

Sorry for not getting back at you earlier! @ferrerrajc the likeliness argument is very interesting, but could you specify the size of the finite field in your example? (P in wikipedia's notation)

Cheers :)

@ferrerrajc
Copy link

Hey @bernardpaulus,

In my example, I used p = 3 for my prime. So the secret can only take one of the values 0, 1, or 2.
I wanted a very small prime and polynomial to demonstrate the effect of limiting coefficient values.

@fletcher
Copy link
Owner

I pushed a change to develop that restores the full range for the random numbers (e.g. 0 to 256, with prime of 257).

Based on everything I have been able to read, I believe this is correct and the range stated in Shamir's paper.

If I have misrepresented the original paper, then I definitely want to know and fix it.

If there is widely accepted proof that Shamir was wrong, and 0 should not be allowed, I am happy to change that as well.

Otherwise, individuals can modify their own version to remove 0 at the risk of actually decreasing the security of their implementation slightly (by taking the number of possibilities from 257 to 256).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants