Alphanumeric samples bytes instead of chars #1012

Merged
merged 3 commits on Aug 5, 2020

Conversation

vks
Collaborator

@vks vks commented Aug 1, 2020

Includes and thus closes #935.

qoh and others added 2 commits August 1, 2020 21:15
Sampling a random alphanumeric string by collecting chars (that are known to be ASCII) into a String involves re-allocation, because String is encoded as UTF-8. For example:

```rust
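use rand::{distributions::Alphanumeric, thread_rng, Rng};
use std::iter;
let mut rng = thread_rng();
let chars: String = iter::repeat(())
    .map(|()| rng.sample(Alphanumeric))
    .take(7)
    .collect();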
```

I wanted to get rid of the clearly unnecessary re-allocations in my applications, so I needed to be able to access the ASCII characters as simple bytes. It seems that this was already what Alphanumeric was doing internally; it just wasn't exposed.

This PR changes the `Distribution<char>` impl to provide `u8`s (which it generates internally) instead, and implements the previous `Distribution<char>` using it. One could then, for example, do this:

```rust
let mut rng = thread_rng();
let bytes: Vec<u8> = (0..7).map(|_| rng.sample(ByteAlphanumeric)).collect();
// SAFETY: every sampled byte is ASCII alphanumeric, which is valid UTF-8.
let chars = unsafe { String::from_utf8_unchecked(bytes) };
```
This corresponds more closely to the internally used types and can be
easily converted to a `char` via `From` and `Into`, while being more
flexible to use.

This is a breaking change.
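
As a rough illustration of the byte-based approach described in the commit messages above, the following sketch builds a `String` from sampled ASCII bytes without `unsafe`. It uses only the standard library: `ToyRng`, `CHARSET`, and `alphanumeric_string` are hypothetical names invented for this example and are not part of `rand`. The checked `String::from_utf8` takes ownership of the existing buffer after validating it, so no second allocation is needed.

```rust
/// The 62 ASCII characters an alphanumeric distribution draws from.
const CHARSET: &[u8] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

/// Hypothetical stand-in for an RNG: a tiny linear congruential generator.
/// It is NOT part of `rand` and is not suitable for real use.
struct ToyRng(u64);

impl ToyRng {
    /// Return a pseudo-random index in `0..bound`.
    fn next_index(&mut self, bound: usize) -> usize {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Use the high bits, which are better distributed in an LCG.
        ((self.0 >> 33) as usize) % bound
    }
}

/// Sample `len` alphanumeric bytes and turn them into a `String`.
fn alphanumeric_string(rng: &mut ToyRng, len: usize) -> String {
    let bytes: Vec<u8> = (0..len)
        .map(|_| CHARSET[rng.next_index(CHARSET.len())])
        .collect();
    // Checked conversion: validates UTF-8, then reuses the Vec's buffer.
    String::from_utf8(bytes).expect("alphanumeric bytes are ASCII")
}

fn main() {
    let mut rng = ToyRng(42);
    let s = alphanumeric_string(&mut rng, 7);
    assert_eq!(s.len(), 7);
    assert!(s.bytes().all(|b| b.is_ascii_alphanumeric()));
    println!("{}", s);
}
```

The checked conversion costs one validation pass over the bytes; the `unsafe` variant in the commit message skips that pass by relying on the bytes being ASCII.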
@@ -249,7 +250,7 @@ mod tests {
'\u{ed692}',
'\u{35888}',
]);
-test_samples(&Alphanumeric, 'a', &['h', 'm', 'e', '3', 'M']);
+test_samples(&Alphanumeric, 0, &[104, 109, 101, 51, 77]);
Member

Now we need .map(..) for distributions?

Possible I think, and in a way it makes sense. What do you think?

Collaborator Author

> Now we need .map(..) for distributions?

What do you mean by that? You need that if you want char, but I think that makes sense, because the conversion from u8 to char is trivial while the other direction is not. I think it makes more sense to see Alphanumeric as a distribution of bytes, because this type is narrower, and it's unfortunate to throw away that compile-time knowledge by forcing a conversion to char.

If you prefer, we can also use this for the test:

Suggested change
-test_samples(&Alphanumeric, 0, &[104, 109, 101, 51, 77]);
+test_samples(&Alphanumeric, b'a', &[b'h', b'm', b'e', b'3', b'M']);
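
The asymmetry between the two conversion directions mentioned in this comment can be sketched with the standard library alone. The helper names `byte_to_char` and `char_to_byte` are made up for this example; the values are the ones from the test above.

```rust
use std::convert::TryFrom;

/// u8 -> char never fails: every byte maps to a code point in U+0000..=U+00FF.
fn byte_to_char(b: u8) -> char {
    char::from(b)
}

/// char -> u8 can fail: only code points up to U+00FF fit into a byte.
fn char_to_byte(c: char) -> Option<u8> {
    u8::try_from(c).ok()
}

fn main() {
    // The byte values from the test correspond to these chars.
    let bytes = [104u8, 109, 101, 51, 77];
    let chars: Vec<char> = bytes.iter().copied().map(byte_to_char).collect();
    assert_eq!(chars, vec!['h', 'm', 'e', '3', 'M']);

    // Going back works for ASCII ...
    assert_eq!(char_to_byte('M'), Some(77));
    // ... but not for an arbitrary char such as one from the `char` test above.
    assert_eq!(char_to_byte('\u{35888}'), None);
}
```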

Member

@dhardy dhardy Aug 2, 2020

(Edit: start again.)

The point is that here you could replace &Alphanumeric with &Alphanumeric.map(char::from) and keep the other args to test_samples as chars. Of course that doesn't matter for this test, but may be mildly useful elsewhere — though maybe not often, since we can already do Alphanumeric.sample_iter(rng).map(char::from).

Collaborator Author

At this point, we might as well implement Iterator for distributions, no?

Member

We do: the .sample_iter(rng) method. The RNG has to be attached somehow.

Collaborator Author

I think it is preferable to convert the distribution into an iterator for such cases. This also supports all the other Iterator methods, without adding more API.
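
To make the idea concrete, here is a minimal, std-only sketch of turning a distribution into an iterator by attaching an RNG, after which ordinary adapters such as `.map(char::from)` and `.take(n)` apply. Everything here (`ToyRng`, `ToyDistribution`, `SampleIter`, `ToyAlphanumeric`) is a hypothetical toy written for this illustration. It mirrors the general shape of `sample_iter` discussed above but is not `rand`'s actual implementation.

```rust
/// Hypothetical RNG stand-in (a small LCG); not part of `rand`.
struct ToyRng(u64);

impl ToyRng {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

/// Toy distribution trait, for illustration only.
trait ToyDistribution {
    type Output;

    fn sample(&self, rng: &mut ToyRng) -> Self::Output;

    /// Attach an RNG and get an ordinary (endless) iterator.
    fn sample_iter(self, rng: ToyRng) -> SampleIter<Self>
    where
        Self: Sized,
    {
        SampleIter { dist: self, rng }
    }
}

/// Iterator that owns both the distribution and the RNG.
struct SampleIter<D> {
    dist: D,
    rng: ToyRng,
}

impl<D: ToyDistribution> Iterator for SampleIter<D> {
    type Item = D::Output;

    fn next(&mut self) -> Option<D::Output> {
        Some(self.dist.sample(&mut self.rng))
    }
}

/// Toy byte-valued alphanumeric distribution.
struct ToyAlphanumeric;

impl ToyDistribution for ToyAlphanumeric {
    type Output = u8;

    fn sample(&self, rng: &mut ToyRng) -> u8 {
        const CHARSET: &[u8] =
            b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
        CHARSET[(rng.next_u64() % CHARSET.len() as u64) as usize]
    }
}

fn main() {
    // Bytes become chars through a plain Iterator adapter; no extra API needed.
    let s: String = ToyAlphanumeric
        .sample_iter(ToyRng(1))
        .map(char::from)
        .take(7)
        .collect();
    assert_eq!(s.len(), 7);
    assert!(s.chars().all(|c| c.is_ascii_alphanumeric()));
    println!("{}", s);
}
```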

Member

@dhardy dhardy left a comment

Agreed — this appears the best option.

@vks vks merged commit bbb0dff into rust-random:master Aug 5, 2020
@vks vks deleted the alphanumeric-bytes branch August 5, 2020 11:25
3 participants