Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HexSHA256 function for computing SHA256 checksums #4893

Merged
merged 1 commit into from
Jun 3, 2022

Conversation

fingolfin
Copy link
Member

This uses C kernel code directly taken from the crypting packages, with the kernel functions renamed to avoid conflicting with crypting.

Resolves #4844.

Some concerns and ideas:

  1. I picked HexSHA256String based on the name of the SHA256String in crypting, which does the same except the return value is not a hex string but rather a list of integers. So "String" seems to encode the argument type. But is this useful here? Can't we just call it HexSHA256? Well, I guess we might want a variant which takes a filename as argument, and then calling that e.g. HexSHA256File would be natural -- but this could be implemented as HexSHA256String(StringFile(filename)), so do we really need this? Hmm, thinking about it, perhaps yes: to deal with checksumming huge files, which we don't want to read into memory all at once?
  2. It would probably be a useful to add a variant HexSHA256Stream which takes an input stream as argument. And again: this could alternatively be also called HexSHA256, and then we can decide what to do based on the type of the argument. (If we handle streams, that also alleviates the issue of checksumming huge files without reading them into memory all at once).
  3. How to ensure a smooth transition for crypting? Well, for now this PR tries hard to be compatible with crypting, and I just checked, it is. So we could leave things at that. But we could also tweak crypting to not load its kernel extension when it detects that the GAP kernel provides the required functionality. But there is no pressing need for that right now, I think? As far as I know, only JupyterKernel needs crypting, anyway.
  4. Maybe we want other APIs beyond HexSHA256String (or whatever else to call it)? Suggestions, wishes?

@fingolfin fingolfin added kind: enhancement Label for issues suggesting enhancements; and for pull requests implementing enhancements topic: kernel labels May 19, 2022
src/sha256.c Outdated Show resolved Hide resolved
Copy link
Contributor

@ThomasBreuer ThomasBreuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this addition.
Concerning the function name(s) and possible arguments, one function HexSha256 with argument either a string or a stream would be sufficient for all use cases I have in mind.

gap> HexSHA256String("abcd");
"88d4266fd4e6338d13b845fcf289579d209c897823b9217da3e161936f031589"
gap> HexSHA256String("abcd\n");
"fc4b5fd6816f75a7c81fc8eaa9499d6a299bd803397166e8c4cf9280b801d62c"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very low level, but maybe add:

HexSHA256String(['a', 'b', 'c', 'd']); (same as string).
HexSHA256String(""); (should be 0xe3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 according to wikipedia?)
`HexSHA256String("abcd\r\n"); (just to check nothing stupid happens on windows, can't imagine how it would).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good ideas. Also, the documentation needs to be hooked up so it actually appears in the manual ;-), and a few other things. I'll work on it later

@fingolfin
Copy link
Member Author

Hrrrmmmm... it turns out that our input streams don't offer a way to, say "read the next 4096 bytes or until EOF". You can either read one byte, one line or all data. And reading one byte at a time of course is prohibitively slow. Reading a line also is not fast. This seems like a major hole in the stream API. And I guess this might be part of why people consider GAP's built-in streams as too slow ?

Anyway, I pushed an update which supports streams via ReadAll -- this is not great for huge files, but least it works, and we can improve it later if someone needs it.

I also hooked up the documentation into the ref manual, and added more tests etc.

@ThomasBreuer ThomasBreuer added the release notes: use title For PRs: the title of this PR is suitable for direct use in the release notes label Jun 2, 2022
Copy link
Contributor

@ThomasBreuer ThomasBreuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pull request provides SHA256 checksums in GAP, I like this, and the changes are fine.

Concerning the documentation, HexSHA256 will appear in the "Strings" chapter (number 27) not in the "Streams" chapter (number 10), thus it might be difficult to find the function for streams.
If one searches for ??sha then one gets a hit, perhaps it would be useful to add explicit index entries for "hash function" and "checksum" that point to HexSHA256 (and also to CrcFile, CrcString?).

(I have added the label that causes the title to appear in the release notes, I hope this will be enough for the release notes.)

@fingolfin
Copy link
Member Author

Good ideas. We could also add "see also..." links from CrcString/CrcFile to the new function. (CrcString already points this way to CrcFile -- but not the other way around. So perhaps that, too, should be added)

@ThomasBreuer
Copy link
Contributor

@fingolfin Yes, links from CrcFile/CrcString to HexSHA256 (and from CrcFile to CrcString) would be very useful.

@fingolfin fingolfin changed the title Add HexSHA256String function for computing SHA256 checksums Add HexSHA256 function for computing SHA256 checksums Jun 2, 2022
This uses C kernel code directly taken from the crypting packages,
with the kernel functions renamed to avoid conflicting with crypting.
@fingolfin fingolfin merged commit 13e65f8 into gap-system:master Jun 3, 2022
@fingolfin fingolfin deleted the mh/sha256 branch June 3, 2022 11:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: enhancement Label for issues suggesting enhancements; and for pull requests implementing enhancements kind: new feature release notes: use title For PRs: the title of this PR is suitable for direct use in the release notes topic: kernel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add SHA256 function(s) to GAP itself?
3 participants