Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make HashContexts serializable #5702

Closed
wants to merge 5 commits into from
Closed

Conversation

kohler
Copy link
Contributor

@kohler kohler commented Jun 12, 2020

  • Modify php_hash_ops to contain the algorithm name and
    serialize and unserialize methods.
  • Implement __serialize and __unserialize magic methods on
    HashContext.

Note that serialized HashContexts are not necessarily portable
between PHP versions or from architecture to architecture.
(Most are, but fast SHA3s are not necessarily.)
An UnexpectedValueException is thrown when an unsupported
serialization is attempted.

@kohler
Copy link
Contributor Author

kohler commented Jun 12, 2020

Thanks for your comments @sgolemon!

  1. Consider using zend_parse_parameters_throws() and family so that the exception which is thrown contains the type error information rather than the generic RETURN_THROWS() macros.

Those functions appear not to be used on master very much?

  1. Consider using hex or base64 to serialize the contexts. This will reduce various transport/storage issues.

This does not seem worthwhile, since the serialization of binary strings is already binary.

  1. It's great that you've thought about endianness, but the current implementation simply bails on endian mismatch. It'd be a nice-to-have for the user if these serializations were portable. I know this represents a lot of work for sort of an edge case so I won't hold it against you if you say 'no' and/or save this for later work if demand surfaces.

The current version does a better job of this.

  1. Storing $key makes me nervous. I don't have a good solution to this since the deserialization doesn't actually give us a chance to specify it in the deserialization process. I wish I'd made $key/hmac an option to hash_final rather than hash_init. Maybe we can think about allowing that to be specified at either end. Let's expand on this topic while you work on your RFC.

It would be fine with me to refuse to serialize an HASH_HMAC context. But if we serialize such a context, then we need to serialize the key.

  1. Yeah... I think you need an RFC because of Issue-60742: Added FilesystemIterator::OTHER_MODE_MASK #5. Sorry.
  2. TABS v SPACES indentation issues.

There are still a couple of those, sorry. Will fix them after a round of comments

ext/hash/hash.c Outdated

ops = php_hash_fetch_ops(Z_STR_P(algo_zv));
if (!ops) {
zend_throw_exception(spl_ce_UnexpectedValueException, "Unknown hash algorithm", 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of SPL's UnexpectedValueExceptions, lately we throw ValueErrors in similar cases via zend_argument_value_error and zend_value_error. Probably the examples that are tonbe found in the sodium extension are the most interesting for you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I did this for all except the LogicError (when attempting to call unserialize on an already-initialized context), I wasn't sure what was appropriate there.

ext/hash/hash.c Show resolved Hide resolved
@nikic nikic changed the title HashContexts are serializable. Make HashContexts serializable Jun 16, 2020
@sgolemon
Copy link
Contributor

Consider using hex or base64 to serialize the contexts. This will reduce various transport/storage issues.

This does not seem worthwhile, since the serialization of binary strings is already binary.

The serialization of the context is binary (8-bit, non-ascii transparent), the output of base64_encode() or bin2hex() is ASCII (7-bit). The latter had fewer storage and transport issues than the former.

@sgolemon
Copy link
Contributor

sgolemon commented Jun 21, 2020

It would be fine with me to refuse to serialize an HASH_HMAC context. But if we serialize such a context, then we need to serialize the key.

Sorry, had to go back and refer to the HMAC spec. Yeah. We need the secret at the start AND at the end of stream. Ugh.

Okay. It doesn't need to be in this project, BUT I'd like to consider adding a couple API methods to HashContext.

class HashContext {
    public function forgetHMACSecret(): void { ZEND_ZERO_MEMORY(ctx->key); }
    public function setHMACSecret(string $key): void { hash_hmac_prep_key(...); }
}

Then security conscious callers could do:

$ctx = hash_init($algo, HASH_HMAC, 'password');
$ctx->forgetHMACSecret();
/* maybe call hash_update a few times */
/* At some point a serialize happens to save the state */

/* Some time later, we get unserialized */
/* Maybe more hash_update calls happen */
$ctx->setHMACSecret('password');
$result = hash_final($ctx);

This way the HMAC secret never needs to leave the app, but you have to opt-into that extra layer of security.

Another option might be a global setting for a secret encryptor/decryptor (maybe calling into sodium? Maybe just a userspace callback which can figure out encryption/stripping/decryption as needed).

I just want us to avoid spilling secrets before we push this through.

Edit to add:

It would be fine with me to refuse to serialize an HASH_HMAC context.

We can put this issue down the road by initially refusing to serialize HMAC. The window for 8.0 is closing soon and I'd hate to lose general hash serialization because of an issue specific to HMAC.

ext/hash/tests/hash_copy_001.phpt Outdated Show resolved Hide resolved
ext/hash/tests/hash-clone.phpt Outdated Show resolved Hide resolved
ext/hash/tests/hash-clone.phpt Outdated Show resolved Hide resolved
foreach ($algos as $algo) {
var_dump($algo);
$ctx0 = hash_init($algo);
$serial = serialize($ctx0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Output the serialized string. We want to make sure the expect which we get on LE arches match what we get on BE arches.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this for a selection of algorithms and not all of them? In particular if fast-SHA3 is enabled, the serialization is still endian-sensitive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new hash_serialize_003.phpt covers this case.


var_dump($algo);
$ctx0 = hash_init($algo, HASH_HMAC, $key);
$serial = serialize($ctx0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, I'd like to include the serialized state in the expect so that we can confirm endianness is accounted for correctly.

ext/hash/hash_joaat.c Outdated Show resolved Hide resolved
@kohler
Copy link
Contributor Author

kohler commented Jun 21, 2020

This does not seem worthwhile, since the serialization of binary strings is already binary.

The serialization of the context is binary (8-bit, non-ascii transparent), the output of base64_encode() or bin2hex() is ASCII (7-bit). The latter had fewer storage and transport issues than the former.

I understand, but this patch is designed to fit well into the existing PHP serialization framework, which does not generate 7-bit safe output. For instance the serialization of a binary string contains binary characters. (serialize("\xF2") === "s:1:\"\xF2\";") I do not see the point of making the serialization of a HashContext 7-bit safe when serializations in general are not 7-bit safe.

@kohler
Copy link
Contributor Author

kohler commented Jun 21, 2020

Thanks very much for the comments! Seems I should:

  1. make a separate PR for joaat changes,
  2. fix TABS/SPACES,
  3. throw an exception instead of serializing HASH_HMAC contexts,
  4. fix some tests.

At that point might we be good to merge?

@kohler
Copy link
Contributor Author

kohler commented Jun 22, 2020

@kohler
Copy link
Contributor Author

kohler commented Jun 22, 2020

So it turns out that many hash algorithms do not fully initialize portions of their contexts. For instance the input buffers in MD4, MD5, SHA1, etc. This is not a huge problem since those bytes are never read by the hash algorithm, but it does mean that the serialized version of a HashContext contains some bytes with indeterminate values, and we can't rely on the output of serialize() equaling any particular string.

I can see a couple ways forward, including (1) ensuring that HashContexts are always fully initialized (this might slow down hashing a tiny bit), (2) changing the tests to use particular serialization strings (generated on a mix of big- and little-endian). Any preference?

@kohler kohler force-pushed the hash-serialize branch 2 times, most recently from b986a77 to 3106e92 Compare June 22, 2020 12:55
@kohler
Copy link
Contributor Author

kohler commented Jun 22, 2020

Hi all, thanks very much for the suggestions! I added a test validating that specific serialized representations can be un-serialized. This found some bugs, in particular with 32-bit vs. 64-bit architectures—so thanks for pushing me to add those tests. (I added more internal documentation too.) At this point all CI tests pass.

Commit 15d2b65 isn't a critical part of this PR, and I could make a separate PR for it, but it seems valuable anyway.

@kohler
Copy link
Contributor Author

kohler commented Jun 23, 2020

Exposing uninitialized memory is bad, so now hash context memory is always zeroed before use. There was no performance impact in a test that did a couple million hashes of short data.

Copy link
Contributor

@sgolemon sgolemon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with this. I still personally think making an effort to be ASCII clean is worthwhile is serialize output, but I do understand that there's no expectation of ASCII safety currently, so I don't need to lose sleep over it.

ext/hash/php_hash.h Show resolved Hide resolved
@nikic nikic self-requested a review June 23, 2020 19:33
@nikic
Copy link
Member

nikic commented Jun 26, 2020

The serialization of the sha3 context seems to be unsafe. Using this base64-encoded unserialize payload:

TzoxMToiSGFzaENvbnRleHQiOjU6e2k6MDtzOjg6InNoYTMtNTEyIjtpOjE7aTowO2k6MjtzOjI1
NjoiAAAAAAAAAAD/////////////////////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAP//////////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA//////////8AAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAP///////wAA/wAA/wAAAAAAAAAAAAAAAP//////////
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAAgAAAAAABQAAAAAAAgAABgAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACI7aTozO2k6MDtpOjQ7YTowOnt9fQ==

We get:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==525302==ERROR: AddressSanitizer: SEGV on unknown address 0x6110050188c0 (pc 0x000000b7c7e7 bp 0x7ffdc1e12220 sp 0x7ffdc1e121d0 T0)
==525302==The signal is caused by a WRITE memory access.
    #0 0xb7c7e7 in KeccakWidth1600_SpongeAbsorbLastFewBits /home/nikic/php/php-src-fuzz/ext/hash/sha3/generic64lc/KeccakSponge.inc:228:5
    #1 0xb7b6e3 in Keccak_HashFinal /home/nikic/php/php-src-fuzz/ext/hash/sha3/generic64lc/KeccakHash.c:66:34
    #2 0xaf53fb in php_hashcontext_dtor /home/nikic/php/php-src-fuzz/ext/hash/hash.c:1366:3
    #3 0x1767dd9 in zend_objects_store_del /home/nikic/php/php-src-fuzz/Zend/zend_objects_API.c:178:4
    #4 0x179246e in LLVMFuzzerTestOneInput /home/nikic/php/php-src-fuzz/sapi/fuzzer/fuzzer-unserialize.c:65:3
    #5 0x63c521 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/home/nikic/php/php-src-fuzz/sapi/fuzzer/php-fuzz-unserialize+0x63c521)
    #6 0x63bc65 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool*) (/home/nikic/php/php-src-fuzz/sapi/fuzzer/php-fuzz-unserialize+0x63bc65)
    #7 0x63df07 in fuzzer::Fuzzer::MutateAndTestOne() (/home/nikic/php/php-src-fuzz/sapi/fuzzer/php-fuzz-unserialize+0x63df07)
    #8 0x63ec05 in fuzzer::Fuzzer::Loop(std::__Fuzzer::vector<fuzzer::SizedFile, fuzzer::fuzzer_allocator<fuzzer::SizedFile> >&) (/home/nikic/php/php-src-fuzz/sapi/fuzzer/php-fuzz-unserialize+0x63ec05)
    #9 0x62d5be in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/home/nikic/php/php-src-fuzz/sapi/fuzzer/php-fuzz-unserialize+0x62d5be)
    #10 0x656402 in main (/home/nikic/php/php-src-fuzz/sapi/fuzzer/php-fuzz-unserialize+0x656402)
    #11 0x7ffa743a10b2 in __libc_start_main /build/glibc-YYA7BZ/glibc-2.31/csu/../csu/libc-start.c:308:16
    #12 0x60235d in _start (/home/nikic/php/php-src-fuzz/sapi/fuzzer/php-fuzz-unserialize+0x60235d)

and various variations thereon. I haven't looked closer at the cause.

Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks to be in good shape, I only have some cosmetic comments.

However, I'm not convinced about the safety of this. I pointed out the sha3 issue above and fuzzing hasn't found issues with other hashes. However, this is pure unserialize fuzzing and I'm pretty sure we'd be seeing issues if we fed in any data after unserializing. From a cursory code inspection

if (context->length) {
i = 32 - context->length;
memcpy(&context->buffer[context->length], input, i);
SnefruTransform(context, context->buffer);
}
looks like something that could be problematic.

ext/hash/hash.c Outdated Show resolved Hide resolved
ext/hash/hash.c Show resolved Hide resolved
ext/hash/hash.c Show resolved Hide resolved
ext/hash/hash.c Show resolved Hide resolved
ext/hash/hash.c Outdated Show resolved Hide resolved
ext/hash/hash.c Outdated Show resolved Hide resolved
ext/hash/hash.c Outdated
if (!hash->ops->hash_serialize) {
goto serialize_failure;
} else if (hash->options & PHP_HASH_HMAC) {
zend_value_error("HashContext with HASH_HMAC option cannot be serialized");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe for serialization failures it is necessary to use an Exception subclass (just Exception itself is fine), not an Error subclass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I was advised to go for zend_value_error above. Can you help out with a suggeste change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with Exception.

ext/hash/hash.c Outdated Show resolved Hide resolved
ext/hash/hash.c Outdated
|| !options_zv || Z_TYPE_P(options_zv) != IS_LONG
|| !hash_zv
|| !members_zv || Z_TYPE_P(members_zv) != IS_ARRAY) {
zend_value_error("Incomplete or ill-formed serialization data");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same in this function, we shouldn't use ValueError.

ext/hash/hash.c Outdated Show resolved Hide resolved
@nikic
Copy link
Member

nikic commented Jun 26, 2020

To give an example of the snefru issue:

<?php
  
$payload = <<<STR
O:11:"HashContext":5:{i:0;s:6:"snefru";i:1;i:0;i:2;a:20:{i:0;i:0;i:1;i:0;i:2;i:0;i:3;i:0;i:4;i:0;i:5;i:0;i:6;i:0;i:7;i:0;i:8;i:0;i:9;i:0;i:10;i:0;i:11;i:0;i:12;i:0;i:13;i:0;i:14;i:0;i:15;i:0;i:16;i:0;i:17;i:0;i:18;i:255;i:19;s:32:"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0";}i:3;i:2;i:4;a:0:{}}
STR;
$ctx = unserialize($payload);
hash_update($ctx, "foo");
var_dump(hash_final($ctx));
/home/nikic/php/php-src-fuzz/ext/hash/hash_snefru.c:156:12: runtime error: index 255 out of bounds for type 'unsigned char [32]'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /home/nikic/php/php-src-fuzz/ext/hash/hash_snefru.c:156:12 in 

Would cause an out-of-bounds memory access without ubsan.

The premise of serializing the raw internal state seems to be on pretty shaky ground.

@kohler
Copy link
Contributor Author

kohler commented Jun 26, 2020

First off, thanks!!!

The premise of serializing the raw internal state seems to be on pretty shaky ground.

I audited all the hash functions for possible out-of-bounds memory accesses and similar issues. Gost, md2, sha3, snefru, tiger, & whirlpool states contain positions into “sponge” buffers that absorb input. I updated the unserialize codes to validate these positions, ensuring they're within bounds. The other hash functions look safe to me already. Some (e.g. adler32) work bytewise and have no sponge buffers. Some (e.g. sha256) encode their sponge buffer positions as a bitfield; when reading the position, they mask off uninteresting bits, so no input value will cause a problem.

After ac607dc, the bad snefru serialization produces:

Fatal error: Uncaught ValueError: HashContext for algorithm "snefru" cannot be unserialized, format may be non-portable (code -2000) in Standard input code:6
Stack trace:
#0 [internal function]: HashContext->__unserialize(Array)
#1 Standard input code(6): unserialize('O:11:"HashConte...')
#2 {main}
  thrown in Standard input code on line 6

Hash serialization is a useful feature, so I'd love to hear how I could help raise the comfort level around the implementation and find the bugs that remain.

@kohler
Copy link
Contributor Author

kohler commented Jun 26, 2020

One more note: The Keccak implementation of SHA3 is now serialized semantically, rather than just by serializing the entire raw internal state. That means every hash function is serialized semantically.

@nikic
Copy link
Member

nikic commented Jun 26, 2020

The premise of serializing the raw internal state seems to be on pretty shaky ground.

I audited all the hash functions for possible out-of-bounds memory accesses and similar issues. Gost, md2, sha3, snefru, tiger, & whirlpool states contain positions into “sponge” buffers that absorb input. I updated the unserialize codes to validate these positions, ensuring they're within bounds. The other hash functions look safe to me already. Some (e.g. adler32) work bytewise and have no sponge buffers. Some (e.g. sha256) encode their sponge buffer positions as a bitfield; when reading the position, they mask off uninteresting bits, so no input value will cause a problem.

After ac607dc, the bad snefru serialization produces:

Fatal error: Uncaught ValueError: HashContext for algorithm "snefru" cannot be unserialized, format may be non-portable (code -2000) in Standard input code:6
Stack trace:
#0 [internal function]: HashContext->__unserialize(Array)
#1 Standard input code(6): unserialize('O:11:"HashConte...')
#2 {main}
  thrown in Standard input code on line 6

Hash serialization is a useful feature, so I'd love to hear how I could help raise the comfort level around the implementation and find the bugs that remain.

That looks like a reasonable approach in general. We just need to be somewhat confident that everything that needs checking has been checked.

I've done a fuzzing run with the following patch: https://gist.github.com/nikic/34d798a0c579e5069ac9e9eccf8511c8 Which turned up the following issue:

TzoxMToiSGFzaENvbnRleHQiOjU6e2k6MDtzOjk6IndoaXJscG9vbCI7aToxO2k6MDtpOjI7YToy
MDp7aTowO2k6MDtpOjE7aTowO2k6MjtpOjA7aTozO2k6MDtpOjQ7aTowO2k6NTtpOjA7aTo2O2k6
MDtpOjc7aTowO2k6ODtpOjA7aTo5O2k6MDtpOjEwO2k6MDtpOjExO2k6MDtpOjEyO2k6MDtpOjEz
O2k6MDtpOjE0O2k6MDtpOjE1O2k6MDtpOjE2O3M6MzI6IgAAAAAAAAABAAAAAZHgZwAAAAAAAAAA
AAAAAAAAAAAAIjtpOjE3O2k6NTtpOjE4O2k6MDtpOjE5O3M6NjQ6IgAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABgAAAAAAAAAAAAAiO31pOjM7aToy
O2k6NDthOjA6e318eGVlZWVlZWVlZWVlZf//////////////OjAwMDAwMTE1NTQwMjc3OTtpOjk7
aToxMzU5ODE5AAAAOjE=

The part before | is the serialized string, the part after is the hash_update payload.

==741172==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60f000004be8 at pc 0x000000ae0c4d bp 0x7ffc15aa1d40 sp 0x7ffc15aa1d38
WRITE of size 1 at 0x60f000004be8 thread T0
    #0 0xae0c4c in PHP_WHIRLPOOLUpdate /home/nikic/php/php-src-fuzz/ext/hash/hash_whirlpool.c:366:27
    #1 0xa7ed81 in zif_hash_update /home/nikic/php/php-src-fuzz/ext/hash/hash.c:666:2
    #2 0x1115f41 in zend_call_function /home/nikic/php/php-src-fuzz/Zend/zend_execute_API.c
    #3 0x1114545 in _call_user_function_ex /home/nikic/php/php-src-fuzz/Zend/zend_execute_API.c:637:9
    #4 0x153d1e4 in fuzzer_call_php_func_zval /home/nikic/php/php-src-fuzz/sapi/fuzzer/fuzzer-sapi.c:247:2
    #5 0x153bcd5 in LLVMFuzzerTestOneInput /home/nikic/php/php-src-fuzz/sapi/fuzzer/fuzzer-unserialize.c:75:4
    #6 0x63c521 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/home/nikic/php/php-src-fuzz/sapi/fuzzer/php-fuzz-unserialize+0x63c521)
    #7 0x63ee1a in fuzzer::Fuzzer::MinimizeCrashLoop(std::__Fuzzer::vector<unsigned char, fuzzer::fuzzer_allocator<unsigned char> > const&) (/home/nikic/php/php-src-fuzz/sapi/fuzzer/php-fuzz-unserialize+0x63ee1a)
    #8 0x629a51 in fuzzer::MinimizeCrashInputInternalStep(fuzzer::Fuzzer*, fuzzer::InputCorpus*) (/home/nikic/php/php-src-fuzz/sapi/fuzzer/php-fuzz-unserialize+0x629a51)
    #9 0x62d50f in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/home/nikic/php/php-src-fuzz/sapi/fuzzer/php-fuzz-unserialize+0x62d50f)
    #10 0x656402 in main (/home/nikic/php/php-src-fuzz/sapi/fuzzer/php-fuzz-unserialize+0x656402)
    #11 0x7f5839f8a0b2 in __libc_start_main /build/glibc-YYA7BZ/glibc-2.31/csu/../csu/libc-start.c:308:16
    #12 0x60235d in _start (/home/nikic/php/php-src-fuzz/sapi/fuzzer/php-fuzz-unserialize+0x60235d)

0x60f000004be8 is located 0 bytes to the right of 168-byte region [0x60f000004b40,0x60f000004be8)
allocated by thread T0 here:
    #0 0x70208d in malloc (/home/nikic/php/php-src-fuzz/sapi/fuzzer/php-fuzz-unserialize+0x70208d)
    #1 0x1085739 in __zend_malloc /home/nikic/php/php-src-fuzz/Zend/zend_alloc.c:2992:14
    #2 0x1085c83 in _malloc_custom /home/nikic/php/php-src-fuzz/Zend/zend_alloc.c:2417:10
    #3 0x1085c83 in _emalloc /home/nikic/php/php-src-fuzz/Zend/zend_alloc.c:2536:10
    #4 0x1085c83 in _ecalloc /home/nikic/php/php-src-fuzz/Zend/zend_alloc.c:2608:6
    #5 0xa87464 in php_hash_alloc_context /home/nikic/php/php-src-fuzz/ext/hash/php_hash.h:151:9
    #6 0xa87464 in zim_HashContext___unserialize /home/nikic/php/php-src-fuzz/ext/hash/hash.c:1521:18
    #7 0x1115f41 in zend_call_function /home/nikic/php/php-src-fuzz/Zend/zend_execute_API.c
    #8 0x1117fbd in zend_call_known_function /home/nikic/php/php-src-fuzz/Zend/zend_execute_API.c:890:15
    #9 0xf06bda in zend_call_known_instance_method /home/nikic/php/php-src-fuzz/Zend/zend_API.h:579:2
    #10 0xf06bda in zend_call_known_instance_method_with_1_params /home/nikic/php/php-src-fuzz/Zend/zend_API.h:591:2
    #11 0xf06bda in var_destroy /home/nikic/php/php-src-fuzz/ext/standard/var_unserializer.re:267:6
    #12 0xf0612d in php_var_unserialize_destroy /home/nikic/php/php-src-fuzz/ext/standard/var_unserializer.re:82:3
    #13 0x153b959 in LLVMFuzzerTestOneInput /home/nikic/php/php-src-fuzz/sapi/fuzzer/fuzzer-unserialize.c:68:3
    #14 0x63c521 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/home/nikic/php/php-src-fuzz/sapi/fuzzer/php-fuzz-unserialize+0x63c521)
    #15 0x63ee1a in fuzzer::Fuzzer::MinimizeCrashLoop(std::__Fuzzer::vector<unsigned char, fuzzer::fuzzer_allocator<unsigned char> > const&) (/home/nikic/php/php-src-fuzz/sapi/fuzzer/php-fuzz-unserialize+0x63ee1a)
    #16 0x629a51 in fuzzer::MinimizeCrashInputInternalStep(fuzzer::Fuzzer*, fuzzer::InputCorpus*) (/home/nikic/php/php-src-fuzz/sapi/fuzzer/php-fuzz-unserialize+0x629a51)
    #17 0x62d50f in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/home/nikic/php/php-src-fuzz/sapi/fuzzer/php-fuzz-unserialize+0x62d50f)
    #18 0x656402 in main (/home/nikic/php/php-src-fuzz/sapi/fuzzer/php-fuzz-unserialize+0x656402)
    #19 0x7f5839f8a0b2 in __libc_start_main /build/glibc-YYA7BZ/glibc-2.31/csu/../csu/libc-start.c:308:16

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/nikic/php/php-src-fuzz/ext/hash/hash_whirlpool.c:366:27 in PHP_WHIRLPOOLUpdate
Shadow bytes around the buggy address:
  0x0c1e7fff8920: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
  0x0c1e7fff8930: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c1e7fff8940: fd fa fa fa fa fa fa fa fa fa fd fd fd fd fd fd
  0x0c1e7fff8950: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
  0x0c1e7fff8960: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
=>0x0c1e7fff8970: 00 00 00 00 00 00 00 00 00 00 00 00 00[fa]fa fa
  0x0c1e7fff8980: fa fa fa fa fa fa fd fd fd fd fd fd fd fd fd fd
  0x0c1e7fff8990: fd fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c1e7fff89a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c1e7fff89b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c1e7fff89c0: fa fa fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==741172==ABORTING

@kohler
Copy link
Contributor Author

kohler commented Jun 27, 2020

Fixed the whirlpool issue. Also committed an unserializehash fuzzer based on your gist. Maybe it would be good to have around?

@kohler
Copy link
Contributor Author

kohler commented Jun 27, 2020

Hi! I rebased and compressed the commits, updated some documentation, switched to using Exception instead of ValueError, and moved the configury around. An overnight fuzzing run of the fuzzer committed above didn't find anything after 131M runs. (I checked that the fuzzer worked by undoing the Snefru fix; it found the problem after 200K runs.) Happy to answer any further questions or problems.

Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks good to me. The only thing that's missing is some test coverage for the error conditions in the unserialize implementation (we should at least have the distinct exceptions thrown once).

ext/hash/hash.c Outdated

PHP_HASH_API int php_hash_serialize_spec(const php_hashcontext_object *hash, zend_long *magic, zval *zv, const char *spec) /* {{{ */
{
size_t pos = 0, max_alignment = 1, sz, count;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, what I meant here is that you can write size_t count = ... in the while loop below, as the value does not need to be available outside the loop.

Previously, the Keccak_HashInstance was separately allocated.
This could cause memory leaks on errors. For instance,
in php_hash_do_hash_hmac, the following code cleans up after
a file read error:

    if (n < 0) {
    	efree(context);
    	efree(K);
    	zend_string_release(digest);
    	RETURN_FALSE;
    }

This does not call the context's hash_final operation, which
was the only way to free the separately-allocated Keccak state.

The simplest fix is simply to place the Keccak_HashInstance state
inside the context object. Then it doesn't need to be freed.

As a result, there is no need to call hash_final in the
HashContext destructor: HashContexts cannot contain internally
allocated resources.
@kohler
Copy link
Contributor Author

kohler commented Jun 29, 2020

  • Added a test for HashContext unserialize error detection (hash_serialize_004.phpt)
  • Added expected Keccak SHA-3 serializations, which caught issues concerning 32-bit vs. 64-bit architectures; now the serialization format depends on sizeof(long)
  • Got that size_t count thing
  • Rebased & squashed into fewer commits
  • Thanks!!

* Modify php_hash_ops to contain the algorithm name and
  serialize and unserialize methods.

* Implement __serialize and __unserialize magic methods on
  HashContext.

Note that serialized HashContexts are not necessarily portable
between PHP versions or from architecture to architecture.
(Most are, though Keccak and slow SHA3s are not.)

An exception is thrown when an unsupported serialization is
attempted.

Because of security concerns, HASH_HMAC contexts are not
currently serializable; attempting to serialize one throws
an exception.

Serialization exposes the state of HashContext memory, so ensure
that memory is zeroed before use by allocating it with a new
php_hash_alloc_context function. Performance impact is
negligible.

Some hash internal states have logical pointers into a buffer,
or sponge, that absorbs input provided in bytes rather than
chunks. The unserialize functions for these hash functions
must validate that the logical pointers are all within bounds,
lest future hash operations cause out-of-bounds memory accesses.

* Adler32, CRC32, FNV, joaat: simple state, no buffer positions
* Gost, MD2, SHA3, Snefru, Tiger, Whirlpool: buffer positions
  must be validated
* MD4, MD5, SHA1, SHA2, haval, ripemd: buffer positions encoded
  bitwise, forced to within bounds on use; no need to validate
Unlike the straight unserialize fuzzer, this runs only on HashContexts,
and it does an update and finalize on the contexts it creates.

With @nikic.
@kohler
Copy link
Contributor Author

kohler commented Jun 29, 2020

OK, the AppVeyor build demonstrated that some platforms have sizeof(long) == 4 but use the 64-bit Keccak implementation. (I had based my check on the ext/hash/config.m4 configury.) It is safer to use a symbol defined by the included SHA3 implementation to determine which serialization format to use, so this version does that.

@kohler
Copy link
Contributor Author

kohler commented Jun 30, 2020

I think that's it.

Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

@nikic
Copy link
Member

nikic commented Jun 30, 2020

Merged as 1e9ff7e, ff69a8a, dc85be5, ada776c, 75ada66. Thanks!

@nikic nikic closed this Jun 30, 2020
@kohler
Copy link
Contributor Author

kohler commented Jun 30, 2020

Thanks @nikic!

@carusogabriel carusogabriel added this to the PHP 8.0 milestone Jun 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants