Skip to content

Conversation

chschneider
Copy link
Contributor

By using https://github.com/gvanas/KeccakCodePackage for the sha3 implementation we get a ~30 times speedup in a simple benchmark.
Currently it compiles either the generic64lc or the generic32lc version of KeccakCodePackage depending on the machine architecture, not sure if this is necessary or whether the 64 bit version could be used all the time.
It should also be easier to update the code one KeccakCodePackage releases a new version because it is stored in a separate directory without any source code modifications.

@krakjoe krakjoe added the Feature label Apr 6, 2017
@krakjoe
Copy link
Member

krakjoe commented Apr 6, 2017

This appears to cause failures in the hash extension tests ...

@chschneider
Copy link
Contributor Author

I looked at the AppVeyor build failure but I couldn't figure out how Windows builds handle include paths. Who should I get in touch with for this?

@weltling
Copy link
Contributor

weltling commented Apr 6, 2017

You'll have to edit ext/hash/config.w32. I guess best to take look at the ext/gd/config.w32, most likely it'll changes to use CHECK_HEADER_ADD_INCLUDE(), ADD_SOURCES() and anything else necessary.

Thanks.

@chschneider
Copy link
Contributor Author

Hmm, I don't have Windows around to test it so it's kind of a shot in the dark.
Do you think there is someone I could ask to help out?

@weltling
Copy link
Contributor

@chschneider if no one beats me to it, i'll be able to support you with the matter later this week. While on it, could you please link to your benchmarks?

Thanks.

@chschneider
Copy link
Contributor Author

@weltling Thanks! I tried setting up a Windows dev environment but no luck so far :-(

I used a simple loop like
for ($i = 0; $i < 1000000; $i++)
$x = hash("sha3-256", "abc");
as benchmark and on my machine it was ~30x faster.
Bigger strings increased the time for both versions +/- linearly so it didn't make a difference for the speedup.

@weltling
Copy link
Contributor

@chschneider please check to cherry-pick chschneider/php-src@sha3...weltling:sha3

The benchmark on Windows x64 shows 7x improvement best, still significant.

Yet some points

  • KeccakHash.h should be installed, so sync on m4 side needed
  • couldn't check 32-bit builds yet, should be done. I'd expect the specialized 64- or 32-bit version were best for the corresponding arch
  • should be tested on some big endian system, i've no at hand atm :(

Thanks.

@chschneider
Copy link
Contributor Author

  • If I understand correctly then your config.w32 installs include/php/ext/hash/sha3/generic64lc/KeccakHash.h which I could easily add to config.m4. But a) when would this file be used, b) this file in turn includes KeccakSponge.h, shouldn't than then be installed as well? c) How would someone using these files get the proper include path or should these files be installed directly into ext/hash?
  • Is there a procedure / person for testing it on 32bit and big endian systems?

@weltling
Copy link
Contributor

Keccak*.h is included in the ext header, which gets installed. Any depending ext will have to set the include path, yep. Thinking a bit more about it - probably indeed it would be better to hide this dependency by including Keccak*.h in the .c file only, thus introducing no new include path dependency. I'd suggest to do so.

Regarding the test procedure - nope, there is none. For 32-bit I could test as well, for big endian it were great to have it tested. Maybe @remicollet could play a part ;)

Thanks.

@chschneider
Copy link
Contributor Author

KeccakHash.h only included in php_hash_sha3.h which in turn is only included in hash_sha3.c.
Therefore I'm not sure why php_hash_sha3.h is installed in the first place. Possibly it makes more sense to remove php_hash_sha3.h from the list of files to install, right?
What I try to avoid is having to modify the files in ext/hash/sha3/ because they are straight copies from the KeccakPackage which would make updating those files to a newer version a lot simpler.
What do you think?

@weltling
Copy link
Contributor

Those headers being installed is correct, as they provide publicly available symbols. With that, algorithms are usable directly without the hash factory.

I didn't mean to modify the bundled files. Instead, the bundled header should be included only in the actual .c file hash_sha3.c where they're needed. In php_hash_sha3.h it could be abstracted the way Keccak*.h inclusion to be not required.

Thanks.

@chschneider
Copy link
Contributor Author

I moved the dependency on Keccak*.h to hash_sha3.c which meant I use void * in php_hash_sha3.h.
I hope this is ok.
Currently the travis-ci test failed with something unrelated to my patch so I guess there is nothing I can/should do?

@nikic nikic removed their assignment Apr 25, 2017
@nikic
Copy link
Member

nikic commented Apr 25, 2017

Yeah, that's a normal intermittent failure.

/cc @sgolemon as you added this functionality originally

@chschneider
Copy link
Contributor Author

Excuse my ignorance, I haven't really worked with Github before: Do you mean I should send a message (Mail? Github?) to sgolemon or was that already done with your comment?

@staabm
Copy link
Contributor

staabm commented Apr 26, 2017

she got a ping because of the comment.

@weltling
Copy link
Contributor

@chschneider with void * alone, the functions are still not usable directly. Also, the *_CTX symbols sohuld not be removed. It would rather make sense to provide a wrapper struct, similar like it's done now struct PHP_SHA_CTX { void *ctx; };, then it can be declared on stack and passed to subsequent calls. For an example, you can see how the approach is used in md5() userspace implemtation.

Thanks.

@chschneider
Copy link
Contributor Author

I don't get the struct PHP_SHA_CTX { void ctx; }; part: As long as the proper structure (or at least its size plus alignment) is not provided it cannot be used externally anyway. How would you allocate the amount of memory needed for the struct?
Now to get that size you have to include Keccak
.h (from the appropriate directory) which is something we tried to avoid.
If by the md5() implementation you mean PHP_MD5_CTX in ext/hash/php_hash_md.h then this uses the proper structure, I saw nothing like a void * in there.
Are you suggesting providing a mock struct with a size at least as big as the original struct? Sounds very hacky to me.
But if that's what necessary then I'd say it should be something like
typedef struct { unsigned char state[256] } PHP_SHA3_CTX; /* XXX 256 is a placeholder bigger than the actual structure size to ensure we allocate enough space */

Side-note: On my 64bit system the actual struct size is 224 bytes right now...

Is that what you mean?

@weltling
Copy link
Contributor

Nope, i was talking about the usage, not about the exact declaration of the PHP_MD5_CTX. Of course, every hashing algo will have some difference. The example like here https://github.com/php/php-src/blob/master/ext/standard/md5.c#L47 is how it would be used by some dependent code. Having to pass void * would mean a var of type void somewhere else :) Instead, in the init function, you'd have to alloc the required bundled struct and assign it to the ctx member. In the factory, it allocates the context like it does here https://github.com/php/php-src/blob/master/ext/hash/hash.c#L148, when the function is used directly, the context would be allocated on the stack.

Thanks.

…not need to know about implementation details while keeping API backward compatible to original sha3 implementation
@chschneider
Copy link
Contributor Author

I think I finally got it. Thank for being patient with me.
Now I'm allocating the necessary structure from Keccak*.h in the *_Init functions and free it again in *_Final.
Is that what you meant? This way the API for sha3 for other extensions should be unchanged.

@weltling
Copy link
Contributor

Yeah, thanks for being patient yourself ;) Seems the PR is close to finish. Please check the travis fails yet, seems that clone causes some mem leaks.

Thanks.

@chschneider
Copy link
Contributor Author

I fixed the memory leak but eh travis-check has some weird failures again. Is there a way to re-run it?

@staabm
Copy link
Contributor

staabm commented Apr 27, 2017

open and close the PR

@nikic
Copy link
Member

nikic commented Apr 27, 2017

I've restarted the build.

@weltling
Copy link
Contributor

Usually you close and reopen, yep. Seems Travis was still not reliable, lets retry once. Otherwise, I plan to check manually anyway, hopefully next days, will ping back yet.

Thanks.

@weltling weltling closed this Apr 28, 2017
@weltling weltling reopened this Apr 28, 2017
@weltling
Copy link
Contributor

weltling commented May 1, 2017

I finally was able to do additional tests. The current patch shows no regressions so far. The given bench snippet in #2453 (comment) shows improvements, here's by platform and factor

  • Windows 64-bit - 7x
  • Windows 32-bit - 7x
  • Linux 64-bit - 8x
  • Linux 32-bit - 5x
  • FreeBSD 64-bit - 3x

Though I could not compare absolute numbers, as all those were test VMs in different locations, so it only witnesses there's improvement on the same given VM. I couldn't see 30x speedup on any available machine with the given snippet, still the bench results look good, given also the current tests pass.

I think, the chances are good for the PR to be merged. Probably would be still nice to hear from @remicollet, if he could find time for some big endian platform test :) Anyway, in worst case, the old code might have to be kept, and the old code used where the new code is not supported. Time to let the state ripen and see whether some more comments and tests are to come.

Thanks.

@weltling
Copy link
Contributor

If there are no further comments or objections, I'm going to give this patch a try in master anytime soon.

Thanks.

@weltling
Copy link
Contributor

Squashed and merged as 91663a9.

Thanks

@remicollet
Copy link
Member

remicollet commented Sep 29, 2017

Only for memory, this only work on LittleEndian.

#error "Not yet implemented"

Should be fixed by #2789

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants