Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/rand: add func Text #67057

Closed
FiloSottile opened this issue Apr 26, 2024 · 59 comments
Closed

crypto/rand: add func Text #67057

FiloSottile opened this issue Apr 26, 2024 · 59 comments
Labels
Proposal Proposal-Accepted Proposal-Crypto Proposal related to crypto packages or other security issues
Milestone

Comments

@FiloSottile
Copy link
Contributor

FiloSottile commented Apr 26, 2024

Update, Jun 27 2024: The current proposal is #67057 (comment)


Random strings are useful as passwords, bearer tokens, and 2FA codes.

Generating them without bias from crypto/rand is not trivial at all, and applications are lured into using math/rand for it. See greenpau/caddy-security#265 for example.

I propose we add a top-level function that takes a charset and returns a string of elements randomly selected from it.

The length of the string is selected automatically to provide 128 bits of security. If uppercase and lowercase letters and digits are used, a string will be ceil(log62(2^128)) = 22 characters long.

If more than 2^48 strings are generated, the chance of collision becomes higher than 2^32, but that's also true of UUID v4. Callers that reach those scales could call String twice and concatenate the results, but it doesn't feel worth documenting.

There is no error return value on the assumption that #66821 is accepted. That allows convenient slicing if necessary.

// String returns a random sequence of characters from the supplied alphabet.
//
// The length of the returned string is selected to provide 128 bits of entropy,
// enough to make a brute-force attack infeasible. If a shorter string is
// needed, for example as a one-time token, the caller may truncate the result.
//
// The alphabet is interpreted as a sequence of runes, and must contain at least
// two Unicode characters, or String will panic.
func String(alphabet string) string

This can't really be implemented in constant time, but since it always runs randomly, attackers can get only a single timing sample, which limits the maximum theoretical leak to a few bits.

Do we already have constants for the most common charsets defined somewhere?

/cc @golang/security @golang/proposal-review

@FiloSottile FiloSottile added Proposal Proposal-Crypto Proposal related to crypto packages or other security issues labels Apr 26, 2024
@FiloSottile FiloSottile added this to the Proposal milestone Apr 26, 2024
@robpike
Copy link
Contributor

robpike commented Apr 26, 2024

Is there a reason to set the entropy to a hidden value rather than provide it as a parameter (other than Fillipo knows best)? What if I need more randomness tomorrow?

The idea is cool but it seems too rigid to me, but I am not a security maven.

@FiloSottile
Copy link
Contributor Author

I started with taking an int parameter for the entropy, then started going back and forth on whether it should be the length of the returned string (what if the caller does the math wrong, or does it right but later changes the alphabet to be smaller thinking the parameter is the entropy and doesn't need to change?) or the bits of entropy (what if the caller thinks it's the length of the string and passes a way too small number?).

This is one of those cases where we can do the work for the user and do the secure thing directly. So yeah, we can know what the best answer is, so we can save callers the work and risk.

If a caller knows they are fine with less entropy (e.g. for a PAKE or 2FA token), they can slice the string. The performance overhead is unlikely to be a bottleneck, and the slicing can alert a security reviewer to pay attention.

There's no need for more entropy than 128 bits against brute force attacks. If making hundreds of thousands of billions of strings users might need more entropy against collisions, but they will probably have someone on the team that knows that by the time they design such a system, and they can just call String() + String(). UUIDv4 made the same assumption and seems fine.

@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Apr 26, 2024
@ulikunitz
Copy link
Contributor

I find the String() function with a parameter strange because the String() method doesn't usually have parameters.

The approach I usually apply is to use crypto/rand.Reader to get 16 random bytes and converting it with base64.RawURLEncoding.EncodeToString(). The resulting string reflects all the bits in the random value. Using this approach a RandomStringer with a String() method is quite easy to implement. crypto/rand could have a String() function that uses an internal default RandomStringer value. IMHO this approach would be more efficient than a String function that has to validate the alphabet parameter at every call.

@jfrech
Copy link

jfrech commented Apr 28, 2024

Would String("aab") exhibit a bias?

@FiloSottile
Copy link
Contributor Author

@ulikunitz Efficiency is not a primary concern here, I don't think applications generate passwords or tokens in a hot loop, at least not hot enough that checking that a string is valid UTF-8 will matter.

Would String("aab") exhibit a bias?

Yes, a would be selected twice as often as b. We should document that.

@ulikunitz
Copy link
Contributor

Is String("aaa") a valid use? What about String("")? Both strings are valid UTF-8. Josua Bloch stated an API should be hard to misuse.

This API can be misused to generate always the same string or biased strings.

The following API doesn't have these weaknesses:

func NewAlphabet(s string) (ab *Alphabet, err error)

func (ab *Alphabet) RandomString() string

const Base32DouglasCrockford = "0123456789ABCDEFGHJKMNPQRSTVWXZ"

// String generates a random string using Douglas Crockford's
// [base32] alphabet.
// [base32]: https://www.crockford.com/base32.html
func String() string

The alphabet will produce slightly larger strings, but according to Crockford is a good compromise between compactness and error resistance. I named the method RandomString() to be explicit about what the method does. The String() function is ok, because it will be used as rand.String() in almost all cases.

I assume that the String function will not support Unicode combining characters.

Some people might want to use the function to create keys for databases. They might be concerned about performance.

@ericlagergren
Copy link
Contributor

I like the proposal, but I'm also concerned about the alphabet just being a plain string. Inevitably somebody will read the alphabet in from a config file or basically anything other than const MyAlphabet = "..." (like attacker controlled data!).

@jfrech
Copy link

jfrech commented May 2, 2024

@ulikunitz

Is String("aaa") a valid use? What about String("")? Both strings are valid UTF-8.

Please refer to the proposed function's documentation's last paragraph (where "two" is sensibly read as "two different"):

// (...)
// The alphabet is interpreted as a sequence of runes, and must contain at least
// two Unicode characters, or String will panic.
func String(alphabet string) string

I urge you not to dilute the expressive power of the standard library by inventing novel concepts (such as *Alphabet) where plain strings suffice.

@ulikunitz
Copy link
Contributor

ulikunitz commented May 2, 2024

I apologize for overseeing the panic condition. I suggest however to check for the runes in the string to be unique or as my math teacher used to say pairwise different.

I did an experimental implementation and testing for uniqueness is not dramatically slower than testing for at least two different runes for short alphabets. And with 2 ms per call an Alphabet type is not required.

goos: linux
goarch: amd64
pkg: github.com/ulikunitz/randstr
cpu: AMD Ryzen 7 3700X 8-Core Processor             
BenchmarkString1/base10-16         	  896253	      2265 ns/op
BenchmarkString1/base26-16         	  782935	      1845 ns/op
BenchmarkString1/base36-16         	  490089	      2165 ns/op
BenchmarkString1/base62-16         	  466987	      2162 ns/op
BenchmarkString2/base10-16         	  591403	      2227 ns/op
BenchmarkString2/base26-16         	  631990	      1747 ns/op
BenchmarkString2/base36-16         	  944583	      2174 ns/op
BenchmarkString2/base62-16         	  488211	      2238 ns/op

String1 tests for at least two different runes. String2 checks for a uniqueness.

The implementation can be found here: https://github.com/ulikunitz/randstr/blob/main/string.go

(Updated the comment, had a problem with the computation of the number of runes required.)

@ulikunitz
Copy link
Contributor

After review I observed that I needed more than 128 random bits to ensure that the last character is selected from the full alphabet. The benchmark results are now:

goos: linux
goarch: amd64
pkg: github.com/ulikunitz/randstr
cpu: AMD Ryzen 7 3700X 8-Core Processor             
BenchmarkString1/base10-16                668701              2355 ns/op
BenchmarkString1/base26-16                554954              2759 ns/op
BenchmarkString1/base36-16                526098              2544 ns/op
BenchmarkString1/base62-16                634251              2497 ns/op
BenchmarkString2/base10-16                369050              2782 ns/op
BenchmarkString2/base26-16                552849              2251 ns/op
BenchmarkString2/base36-16                426348              2476 ns/op
BenchmarkString2/base62-16                423656              2515 ns/op

@AlexanderYastrebov
Copy link
Contributor

Previous discussion #53447

@rsc
Copy link
Contributor

rsc commented May 8, 2024

I go back and forth on the name String. I wonder if 'Text' is better.

rand.Text("abcdef")

somehow seems clearer than

rand.String("abcdef")

I agree that 128 bits of entropy is the right amount and unlikely to need to change.

Is the idea that the implementation will always just copy the alphabet into a []rune (or as an optimization maybe a []byte) and then index the right number of times, and then discard that copy?

@FiloSottile
Copy link
Contributor Author

I go back and forth on the name String. I wonder if 'Text' is better.

I like rand.Text but I wonder if I would think it's more like Lorem Ipsum reading the docs the first time.

Is the idea that the implementation will always just copy the alphabet into a []rune (or as an optimization maybe a []byte) and then index the right number of times, and then discard that copy?

Correct.

As for alphabet validation, if we wanted to be strict, we could disallow repeated characters, as well as Unicode joiner characters, to avoid someone putting in a multi-rune character and being surprised when they are selected separately.

I am a little worried about applications letting attacker-controlled alphabets in and causing panics. A solution would be taking a page out of safeweb, and defining a private string type, so that only string constants and literals are allowed.

type constString string

func String(alphabet constString) string

Not sure why applications would take alphabets as a parameter, and we could just add a line to the docs recommending against it.

@rsc rsc moved this from Incoming to Active in Proposals May 8, 2024
@rsc
Copy link
Contributor

rsc commented May 8, 2024

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@mateusz834
Copy link
Member

mateusz834 commented May 9, 2024

Same as in #66821 (comment), String() should not be documented that it uses rand.Reader and should not use rand.Reader directly, because it can be replaced, it is a var and errors might get ignored/it might throw.

@ulikunitz
Copy link
Contributor

Will the function ensure that every character of the returned string will be selected from the whole alphabet? If yes than in some cases more than 128 bits of entropy will be required. For example an alphabet of 32 characters will require 26*5 = 130 bits.

It should be a requirement to ensure that all same-length substrings of the generated string represent the same amount of entropy.

@jfrech
Copy link

jfrech commented May 9, 2024

I am probably bikeshedding, but I would welcome a name akin to Token128. Then the paranoid could define their own Token256 and one doesn't burn a name committing to one entropy amount. Possibly I am just squeamish in light of SHAttered.

@AlexanderYastrebov
Copy link
Contributor

If there would be a digit alphabet-based number formatter (say strconv.Format) this could be expressed like:

strconv.Format(rand.Int64(), alphabet) + strconv.Format(rand.Int64(), alphabet)

@nightlyone
Copy link
Contributor

The real requirements of such strings are usually a bit more evolved in that certain parts of the alphabet must actually be present in the output.

The classic "it must contain at least one from 0-9, a-z and a symbol from a predefined symbol set like _-+:" has a different API:

StringN(min, max int, alphabets ...string) (string, error)

which can cover more use cases as well as reduce the amount of abuse by requiring explicit error handling for any of the more interesting cases instead of requiring clever choices of defaults.

Also the real world constraints of valid character ranges and constantly changing min/max constraints on what is a secure password/token means without requiring code changes as this can now be configured.

Last but not least one could measure the amount of bits used by providing a companion function that sets min/max to the same value.

If min/max is too much config, then

StringN(n int, alphabets... string) (string, error)

might be more suitable. Here the bits used could even be returned additionally to facilitate tuning N given those alphabets to match a security target for bits of randomness used.

@bjorndm
Copy link

bjorndm commented May 15, 2024

Like @nightlyone states for corporate policies not only an alphabet but also a minimum and maximum length are required. And offer there is a policy of "it should include this character but only once". While those policies are debatable, they are also often non negotiable.

Something like rand.GenerateString(alphabet, once string, min, max int) (string, error) seems like the best suited for this. The characters in once are guaranteed to be in the generated string, but only once.

@FiloSottile
Copy link
Contributor Author

I don't think compatibility with arbitrary password policies is a goal here. As you state, they can be complex and arbitrary. Good password managers can write their own code to handle this (and usually are not written in Go because they run either in the browser or as native applications).

This is for Go applications that need a random string as an access token, a key, a session ID, and similar settings. Maybe also a password, if they control the policy, but that's secondary.

@bjorndm
Copy link

bjorndm commented May 15, 2024

Actually the requirement is often that the Go backend must generate a random password that matches such a policy. There are now maybe a dozen of open source Go libraries of various quality of that allow this. This reminds me of the situation with structured logging. There is a clear need for such a random password generator and that is why I think it passes the bar of x_in_std.

@Rican7
Copy link

Rican7 commented Jun 5, 2024

This is an interesting addition. Of note, PHP 8.3 recently added a similar function to it's standard library, though it looks like this:

public Random\Randomizer::getBytesFromString(string $string, int $length): string

(The RFC explaining the addition is here.)

I do think the length parameter is nice, and at first I echoed @robpike's sentiment, but I also understand your points, @FiloSottile, for the "we can do the work for the user and do the secure thing directly" bit.

I do think the alphabet bit is important, and documentation could be provided to strongly suggest that devs not take the alphabet as a user-input parameter, but yea without the alphabet bit I likely wouldn't use it myself.

An argument could be made for a more general function (akin to the new PHP one) that is then "aliased" of sorts to a strong default... there's prior art for this kind of thing in the standard library:

As far as this question goes:

Do we already have constants for the most common charsets defined somewhere?

Sort of, but not really...

@rsc
Copy link
Contributor

rsc commented Jun 27, 2024

crockford.Random which returns 2^64 random bits in Crockford32 format. ... I am still using crockford.Random in production. It would be nice if this were compatible with that.

Obviously it would be nice to be compatible with existing code, but it seems clear to me that Text should not return only 64 bits. The discussion here was about whether 128 is enough. I think we all agree 64 is too close to "not enough" for comfort. And unless you are decoding the texts, it doesn't matter exactly how they are decoded. You could probably do rand.Text()[:N] for a suitable N if you did want to replace your existing code.

@rsc
Copy link
Contributor

rsc commented Jun 27, 2024

Have all remaining concerns about this proposal been addressed?

The proposal is to add to crypto/rand a single function:

// Text returns a random string over from the standard RFC 4648 base32 alphabet
// for use when a secret string, token, password, or other text is needed.
// The result contains at least 128 bits of randomness, enough to prevent brute force
// guessing attacks and to make the likelihood of collisions vanishingly small.
// A future version may return longer texts as needed to maintain those properties.
func Text() string

Other features like custom alphabets and custom lengths are deferred to future proposals once we see how well Text works.

@ericlagergren
Copy link
Contributor

@rsc maybe this is better left as a comment on the CL, but the docs for Text should probably include the words "cryptographically secure."

@jfrech
Copy link

jfrech commented Jul 3, 2024

@rsc s/over from/over/ ; Why was "bits of entropy" replaced by "bits of randomness"?

@rsc
Copy link
Contributor

rsc commented Jul 24, 2024

@ericlagergren, this is package crypto/rand, but sure.
@jrech, in general I try to say randomness instead of entropy because I believe the meaning of "randomness" obvious to developers who are unfamiliar with crypto terminology; "entropy" not so much.

@rsc
Copy link
Contributor

rsc commented Jul 25, 2024

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

The proposal is to add to crypto/rand a single function:

// Text returns a cryptographically random string over from the standard RFC 4648 base32 alphabet
// for use when a secret string, token, password, or other text is needed.
// The result contains at least 128 bits of randomness, enough to prevent brute force
// guessing attacks and to make the likelihood of collisions vanishingly small.
// A future version may return longer texts as needed to maintain those properties.
func Text() string

Other features like custom alphabets and custom lengths are deferred to future proposals once we see how well Text works.

@rsc rsc moved this from Active to Likely Accept in Proposals Jul 25, 2024
@robpike
Copy link
Contributor

robpike commented Jul 25, 2024

There is a spurious "over" in the first line of the comment.

I still think we want a way to set the alphabet if you need passwords, but that can be added later.

@jfrech
Copy link

jfrech commented Jul 30, 2024

But for the "over from" typo, I am fine with the proposal.

@rsc
Copy link
Contributor

rsc commented Jul 31, 2024

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

The proposal is to add to crypto/rand a single function:

// Text returns a cryptographically random string using the standard RFC 4648 base32 alphabet
// for use when a secret string, token, password, or other text is needed.
// The result contains at least 128 bits of randomness, enough to prevent brute force
// guessing attacks and to make the likelihood of collisions vanishingly small.
// A future version may return longer texts as needed to maintain those properties.
func Text() string

Other features like custom alphabets and custom lengths are deferred to future proposals once we see how well Text works.

@rsc rsc moved this from Likely Accept to Accepted in Proposals Jul 31, 2024
@rsc rsc changed the title proposal: crypto/rand: add func Text crypto/rand: add func Text Jul 31, 2024
@rsc rsc modified the milestones: Proposal, Backlog Jul 31, 2024
@magical
Copy link
Contributor

magical commented Aug 1, 2024

Apologies for the late response.

  1. It is unclear to me what the intended use cases for this fixed-format Text function are. RSC lamented a lack of uses for custom alphabets, to which several people responded with ideas. Where are the use cases for a fixed alphabet? Personally, I can think of a few hypothetical cases where i may want a string sampled from a particular alphabet, but none where i don't care what the alphabet or length are, and none where i would specifically want it to be base32. The original proposal mentioned "passwords, bearer tokens, and 2FA codes", but as crypto/rand: add func Text #67057 (comment) explained, none of these seem well-served by the latest proposal.

  2. If this must be added, i would prefer it be called rand.Base32. It seems unlikely that the character set can ever be changed once it is added; expanding the character set may break applications which assume it is base32; reducing it would make the strings longer for little benefit. rand.Text is a good, general name and it seems a shame to use it up on such a special case.

  3. Assuming someone wants 128 bits of random base32 text (or any other standard encoding), it seems easy enough to get it:

    var buf = make([]byte, 128/8)
    rand.Read(buf)
    return base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(buf)

    The main difference being that, in this snippet, the final character will have only 3 bits of entropy instead of 5, whereas the string returned from Text would sample every character evenly. The proposal fails to motivate why this last-character bias is important enough to address. (And if it is that important, why is it not mentioned in the doc comment?).

@ericlagergren
Copy link
Contributor

ericlagergren commented Aug 1, 2024

The main difference being that, in this snippet, the final character will have only 3 bits of entropy instead of 5, whereas the string returned from Text would sample every character evenly. The proposal fails to motivate why this last-character bias is important enough to address.

This is not the "bias" mentioned in the proposal. The "bias" is people using math/rand or people writing code with modulo bias:

var s string
for len(s) < 11 {
    var buf [8]uint64
    rand.Read(buf[:])
    v := binary.LittleEndian.Uint64(buf[:])
    // INSECURE: modulo bias.
    s += alphabet[v % len(alphabet)]
}

The snippet you posted is indeed secure since the encoding E(x) -> s outputs a cryptographically secure string if x is cryptographically secure and E is injective.

@earthboundkid
Copy link
Contributor

The proposed name change to rand.Base32() seems fine to me.

Of "passwords, bearer tokens, and 2FA codes" this doesn't seem like a fit for passwords (too many arbitrary requirements) or 2FA codes (too hard to type) but it's perfect for a bearer token, and that's a pretty common need.

@rsc
Copy link
Contributor

rsc commented Aug 7, 2024

The proposal was accepted as rand.Text. That's still a good name.

@earthboundkid
Copy link
Contributor

This is accepted and the freeze is starting soon. Is anyone working on getting it in before the freeze?

@magical
Copy link
Contributor

magical commented Nov 14, 2024

Shh — i was hoping that everyone had forgotten. ;)

@rolandshoemaker
Copy link
Member

I don't believe anyone from our team is working on it. If anyone wants to send a CL for it we can probably review it before the freeze.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/627477 mentions this issue: crypto/rand: add Text for secure random strings

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Proposal Proposal-Accepted Proposal-Crypto Proposal related to crypto packages or other security issues
Projects
Status: Accepted
Development

No branches or pull requests