Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HHVM incorrectly returns a "valid" (but not initialized) object from a broken serialization string #6032

Open
derickr opened this issue Aug 18, 2015 · 9 comments

Comments

@derickr
Copy link
Contributor

derickr commented Aug 18, 2015

I am working on a driver for MongoDB, mimicking the same API as one for PHP. In our extension, we have a couple of built-in classes, that we don't want to serialize (or inherit from). We can signal that by setting class handlers for serialize and unserialize:

    ce->ce_flags    |= ZEND_ACC_FINAL_CLASS;
    ce->serialize    = zend_class_serialize_deny;
    ce->unserialize  = zend_class_unserialize_deny;

zend_class_serialize_deny and zend_class_unserialize_deny both throw an exception in the case such an object are attempted to be serialized or deserialized:

ZEND_API int zend_class_serialize_deny(zval *object, unsigned char **buffer, zend_uint *buf_len, zend_serialize_data *data TSRMLS_DC) /* {{{ */
{
    zend_class_entry *ce = Z_OBJCE_P(object);
    zend_throw_exception_ex(NULL, 0 TSRMLS_CC, "Serialization of '%s' is not allowed", ce->name);
    return FAILURE;
}

I was attempting to do imitate the same behaviour with HHVM, by creating a trait:

trait NoSerialize {
    public function __sleep()
    {
        throw Exception("Serialization of '" . get_class($this) . "' is not allowed");
    }

    public function __wakeUp()
    {
        throw Exception("Unserialization of '" . get_class($this) . "' is not allowed");
    }
}

But the methods are never called, and HHVM insists on just throwing a warning: Warning: Attempted to serialize unserializable builtin class MongoDB\Driver\Manager in

This happens because of https://github.com/facebook/hhvm/blob/HHVM-3.9/hphp/runtime/base/object-data.cpp#L855

(In master, it is now at https://github.com/facebook/hhvm/blob/master/hphp/runtime/base/variable-serializer.cpp#L1517)

It will always do this, unless isCppSerializable is part of the class info flags, which I don't seem to be able to set myself.

So the question is really:

derickr added a commit to mongodb/mongo-hhvm-driver that referenced this issue Aug 18, 2015
@paulbiss
Copy link
Contributor

The way this API currently works requires that the native classes declare C++ sleep and wakeup functions. To do this declare functions named sleep and wakeup on the native-data struct for your class.

The sleep function takes no arguments and returns the Variant to be serialized, the wakeup function takes a context argument (Variant) and an ObjectData argument (the object being unserialized), and returns void.

You should be able to throw the exceptions directly fromt these functions.

@derickr
Copy link
Contributor Author

derickr commented Aug 19, 2015

I've tested this now, and implemented it as follows:

        void wakeup (const Variant& context, ObjectData* obj) {                                                                                                                      
            throw Object(SystemLib::AllocExceptionObject("Unserialization of MongoDB\\Driver\\Manager is not allowed"));                                                             
        }                                                                                                                                                                            

        Variant sleep() const {                                                                                                                                                      
            throw Object(SystemLib::AllocExceptionObject("Serialization of MongoDB\\Driver\\Manager is not allowed"));                                                               
        } 

This works fine, I get the exception, however, I also get an "improper" object.

Code:

<?php
$str = 'O:22:"MongoDB\Driver\Manager":0:{}';
var_dump(unserialize($str));

PHP does:

[PHP: 5.6.12-dev  ]
[HHVM: 3.9.0-dev]
derick@whisky:~/dev/php/10gen-labs-mongo-hhvm-driver-prototype $ php -n -dextension=mongodb.so /tmp/test.php

Warning: Erroneous data format for unserializing 'MongoDB\Driver\Manager' in /tmp/test.php on line 3
bool(false)

HHVM does:

derick@whisky:~/dev/php/10gen-labs-mongo-hhvm-driver-prototype $ hhvm -vDynamicExtensions.0=mongodb.so /tmp/test.php
object(MongoDB\Driver\Manager)#1 (0) {
}

An exception thrown in a constructor (or wakeup), should not result in a "valid" object, which subsequently crashes when using it, as it has not been setup correctly (hence the throwing of an exception):

Code:

<?php
$str = 'O:22:"MongoDB\Driver\Manager":0:{}';
$m = unserialize($str);

$rp = new MongoDB\Driver\ReadPreference(MongoDB\Driver\ReadPreference::RP_PRIMARY);
$m->selectServer( $rp );

PHP:

derick@whisky:~/dev/php/10gen-labs-mongo-hhvm-driver-prototype $ php -n -dextension=mongodb.so /tmp/test.php

Warning: Erroneous data format for unserializing 'MongoDB\Driver\Manager' in /tmp/test.php on line 3

Fatal error: Call to a member function selectServer() on boolean in /tmp/test.php on line 6

HHVM:

derick@whisky:~/dev/php/10gen-labs-mongo-hhvm-driver-prototype $ hhvm -vDynamicExtensions.0=mongodb.so /tmp/test.php
Segmentation fault

So how do I make it not return a "valid" object, but instead return "false", like PHP does?

@paulbiss
Copy link
Contributor

Looking at the serialize/unserialize code I think what you actually want to do is throw an Exception object, as this is what we catch in order to raise a notice and return false in unserialize_ex.

throw Exception("Unserialization of MongoDB\\Driver\\Manager is not allowed");                                                             

@derickr
Copy link
Contributor Author

derickr commented Aug 20, 2015

Okay - there are actually two things at play here:

  1. The serialization and deserialization handlers. Your first hint, and my implementation: throw Object(SystemLib::AllocExceptionObject("Serialization of MongoDB\\Driver\\Manager is not allowed")); actually works just fine. I'll be adding that to the cookbook. Your second suggestion (throw Exception("Unserialization of MongoDB\\Driver\\Manager is not allowed");) makes HHVM throw a fatal error instead of a real exception, which I can't catch. I don't think this very useful.
  2. The deserialization of "broken" strings is where the real problem is, so let's focus on that. I have (I will) amend the title of this issue with this.

Code:

<?php
$str = 'O:22:"MongoDB\Driver\Manager":0:{}';
$m = unserialize($str);
var_dump($m);

PHP throws a warning, and returns "NULL" while deserialization of a broken string:

derick@whisky:~/dev/php/10gen-labs-mongo-hhvm-driver-prototype $ php -n -dextension=mongodb.so /tmp/test.php 

Warning: Erroneous data format for unserializing 'MongoDB\Driver\Manager' in /tmp/test.php on line 3
bool(false)

HHVM doesn't throw a warning, and returns an improperly initialised object:

object(MongoDB\Driver\Manager)#1 (0) {
}

Making use of this improperly initialized object makes things crash - because the constructor would have filled in some internal structures. This is what needs fixing in HHVM. It should ideally throw a warning, but certainly not return a pseudo-valid object.

@derickr derickr changed the title Incompatibility with denying serialization/unserialization of built-in classes HHVM incorrectly returns a "valid" (but not initialized) object from a broken serialization string Aug 20, 2015
@paulbiss
Copy link
Contributor

So, the way you're doing things for serialize sounds right. What I don't follow is how throwing an Exception in unserialize is resulting in a fatal. We catch the two separately and an Exception should trigger a notice and return false when unserializing:

try {
v = vu.unserialize();
} catch (FatalErrorException &e) {
throw;
} catch (Exception &e) {
raise_notice("Unable to unserialize: [%s]. %s.", str,
e.getMessage().c_str());
return false;
}

@derickr
Copy link
Contributor Author

derickr commented Aug 21, 2015

I used GDB in anger this morning, and I found the following while going through

if (cls) {
// Only unserialize CPP extension types which can actually
// support it. Otherwise, we risk creating a CPP object
// without having it initialized completely.
if (cls->instanceCtor() && !cls->isCppSerializable()) {
assert(obj.isNull());
throw_null_pointer_exception();
} else {
obj = Object{cls};
if (UNLIKELY(collections::isType(cls, CollectionType::Pair) &&
(size != 2))) {
throw Exception("Pair objects must have exactly 2 elements");
}
}
} else {
obj = Object{SystemLib::s___PHP_Incomplete_ClassClass};
obj->o_set(s_PHP_Incomplete_Class_Name, clsName);
}
assert(!obj.isNull());
self = obj;
if (size > 0) {
// Check stack depth to avoid overflow.
check_recursion_throw();
if (type == 'O') {
// Collections are not allowed
if (obj->isCollection()) {
throw Exception("%s does not support the 'O' serialization "
"format", clsName.data());
}
Variant serializedNativeData = init_null();
bool hasSerializedNativeData = false;
/*
Count backwards so that i is the number of properties
remaining (to be used as an estimate for the total number
of dynamic properties when we see the first dynamic prop).
see getVariantPtr
*/
for (int64_t i = size; i--; ) {
Variant v;
unserializeVariant(v, uns, UnserializeMode::Key);
String key = v.toString();
int ksize = key.size();
const char *kdata = key.data();
int subLen = 0;
if (key == s_serializedNativeDataKey) {
unserializeVariant(serializedNativeData, uns);
hasSerializedNativeData = true;
} else if (kdata[0] == '\0') {
if (UNLIKELY(!ksize)) {
raise_error("Cannot access empty property");
}
// private or protected
subLen = strlen(kdata + 1) + 2;
if (UNLIKELY(subLen >= ksize)) {
if (subLen == ksize) {
raise_error("Cannot access empty property");
} else {
throw Exception("Mangled private object property");
}
}
String k(kdata + subLen, ksize - subLen, CopyString);
Class* ctx = (Class*)-1;
if (kdata[1] != '*') {
ctx = Unit::lookupClass(
String(kdata + 1, subLen - 2, CopyString).get());
}
unserializeProp(uns, obj.get(), k, ctx, key, i + 1);
} else {
unserializeProp(uns, obj.get(), key, nullptr, key, i + 1);
}
if (i > 0) {
auto lastChar = uns->peekBack();
if ((lastChar != ';') && (lastChar != '}')) {
throw Exception("Object property not terminated properly");
}
}
}
// nativeDataWakeup is called last to ensure that all properties are
// already unserialized. We also ensure that nativeDataWakeup is
// invoked regardless of whether or not serialized native data exists
// within the serialized content.
if (obj->getAttribute(ObjectData::HasNativeData) &&
obj->getVMClass()->getNativeDataInfo()->isSerializable()) {
Native::nativeDataWakeup(obj.get(), serializedNativeData);
} else if (hasSerializedNativeData) {
raise_warning("%s does not expect any serialized native data.",
clsName.data());
}
} else {
assert(type == 'V' || type == 'K');
if (!obj->isCollection()) {
throw Exception("%s is not a collection class", clsName.data());
}
unserializeCollection(obj.get(), uns, size, type);
}
}
uns->expectChar('}');
if (uns->type() != VariableUnserializer::Type::DebuggerSerialize ||
(cls && cls->instanceCtor() && cls->isCppSerializable())) {
// Don't call wakeup when unserializing for the debugger, except for
// natively implemented classes.
obj->invokeWakeup();
}
check_request_surprise_unlikely();

The code never bothers calling the wakeup() that I define in my NativeData struct. I do need to define it though, because serialization does call sleep and without wakeup defined, I get the following assertion:

hhvm: /home/derick/dev/facebook-hhvm/hphp/runtime/vm/native-data.cpp:53: void HPHP::Native::registerNativeDataInfo(const HPHP::StringData*, size_t, HPHP::Native::NativeDataInfo::InitFunc, HPHP::Native::NativeDataInfo::CopyFunc, HPHP::Native::NativeDataInfo::DestroyFunc, HPHP::Native::NativeDataInfo::SweepFunc, HPHP::Native::NativeDataInfo::SleepFunc, HPHP::Native::NativeDataInfo::WakeupFunc): Assertion `(sleep == nullptr && wakeup == nullptr) || (sleep != nullptr && wakeup != nullptr)' failed.

However, it does call the HHVM_METHOD function __wakeup (

if (uns->type() != VariableUnserializer::Type::DebuggerSerialize ||
(cls && cls->instanceCtor() && cls->isCppSerializable())) {
// Don't call wakeup when unserializing for the debugger, except for
// natively implemented classes.
obj->invokeWakeup();
).

In the end, I have made it work with this:

class MongoDBDriverManagerData
{
    public:
…
        void wakeup (const Variant& context, ObjectData* obj) {
            std::cout << "NEVER GETS RUN\n";
            throw Object(SystemLib::AllocExceptionObject("Unserialization of MongoDB\\Driver\\Manager is not allowed"));
        }

        Variant sleep() const {
            throw Object(SystemLib::AllocExceptionObject("Serialization of MongoDB\\Driver\\Manager is not allowed"));
        }
<<__NativeData("MongoDBDriverManager")>>
class Manager {
    <<__Native>>
    function __wakeUp() : void;
}
void HHVM_METHOD(MongoDBDriverManager, __wakeup)
{       
    throw Object(SystemLib::AllocExceptionObject("Unserialization of MongoDB\\Driver\\Manager is not allowed"));
}

So there seem the be the following issues:

@derickr
Copy link
Contributor Author

derickr commented Aug 27, 2015

Can I please have a comment on both points?

@paulbiss
Copy link
Contributor

The native wakeup function is only called if the serialized string contains a native-data key ("\0native"). On the second point, HHVM raises an notice and returns false when it catches an exception in unserialize, see the link below.

} catch (Exception &e) {
raise_notice("Unable to unserialize: [%.1000s]. %s.", str,
e.getMessage().c_str());
return false;
}

@derickr
Copy link
Contributor Author

derickr commented Aug 28, 2015

| The native wakeup function is only called if the serialized string contains a native-data key ("\0native").

What's the point with that? How do I define a wake-up handler in my class then? Do I really have to use

void HHVM_METHOD(MongoDBDriverManager, __wakeup)

Even though there is a native method, which just doesn't happen to be called? I can't just add a "\0native" into my unserialized data to trigger the wakeup function on the ObjectData...

| On the second point, HHVM raises an notice and returns false when it catches an exception in unserialize, see the link below.

Yes, it does.

However, what it does not do is detect that the string: 'O:22:"MongoDB\Driver\Manager":0:{}' is not a valid serializable representation. In PHP, it triggers the code: https://github.com/php/php-src/blob/master/ext/standard/var_unserializer.re#L450-L468

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants