Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

py/runtime: Avoid crash on calling members of uninitialized native type. #9997

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

laurensvalk
Copy link
Contributor

@laurensvalk laurensvalk commented Nov 17, 2022

When subclassing a native type, calling native members in __init__ before super().__init__() has been called could cause a crash. In this situation, self in mp_convert_member_lookup is the native_base_init_wrapper_obj. This check ensures that a TypeError is raised when this happens.

Also fix a typo in a related comment.


This is an attempt at fixing the following crash in Pybricks MicroPython

from pybricks.parameters import Port
from pybricks.pupdevices import Motor  # Builtin type

class NewMotor(Motor):

    def __init__(self, port):

        # Would crash. After this PR it raises a TypeError.
        print(self.angle())

        super().__init__(port)

        # Would work.
        print(self.angle())

motor = NewMotor(Port.A)

EDIT: This isn't quite the right solution as seen in the tests, so I'll have to have another look.

This variant appears to work without breaking the other tests, but still needs some cleaning up:

diff --git a/py/objtype.c b/py/objtype.c
index e0753aace..101a82beb 100644
--- a/py/objtype.c
+++ b/py/objtype.c
@@ -91,7 +91,7 @@ STATIC mp_obj_t native_base_init_wrapper(size_t n_args, const mp_obj_t *args) {
     self->subobj[0] = native_base->make_new(native_base, n_args - 1, 0, args + 1);
     return mp_const_none;
 }
-STATIC MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(native_base_init_wrapper_obj, 1, MP_OBJ_FUN_ARGS_MAX, native_base_init_wrapper);
+MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(native_base_init_wrapper_obj, 1, MP_OBJ_FUN_ARGS_MAX, native_base_init_wrapper);
 
 #if !MICROPY_CPYTHON_COMPAT
 STATIC
diff --git a/py/runtime.c b/py/runtime.c
index 5a7474fba..62254af0d 100644
--- a/py/runtime.c
+++ b/py/runtime.c
@@ -1111,13 +1111,13 @@ void mp_convert_member_lookup(mp_obj_t self, const mp_obj_type_t *type, mp_obj_t
                 #endif
                 else {
                     // Return a (built-in) bound method, with self being this object.
-                    #if MICROPY_BUILTIN_METHOD_CHECK_SELF_ARG
-                    // Ensures that self is not uninitialized object.
-                    dest[0] = mp_obj_new_checked_fun(type, member);
-                    #else 
                     dest[0] = member;
-                    #endif
                     dest[1] = self;
+                    extern const mp_obj_fun_builtin_var_t native_base_init_wrapper_obj;
+                    if (self == MP_OBJ_FROM_PTR(&native_base_init_wrapper_obj)) {
+                        // object not initialized yet
+                        mp_raise_msg(&mp_type_RuntimeError, MP_ERROR_TEXT("call super().__init__() first"));
+                    }
                 }
             } else {
                 // Return a bound method, with self being this object.

@dlech
Copy link
Sponsor Contributor

dlech commented Nov 17, 2022

I added a CPython diff for this. It works with the proposed revised patch.
image

We also need to add some tests to get code coverage. For example, in addition to one like the cpydiff that calls a method, we should check cases where MP_TYPE_FLAG_BUILTIN_FUN is not set, like operators and non-function attributes.

@laurensvalk
Copy link
Contributor Author

laurensvalk commented Nov 21, 2022

Updated now with a variant that does not fail the tests. I've also added a test to reproduce this (relies on Timer class from #10038).

But this variant only checks for this crash for lookups in locals_dict. It still needs some generalization to get this right for other lookups like attr, but without repeating the same check multiple times.

@laurensvalk laurensvalk marked this pull request as draft November 21, 2022 20:47
@laurensvalk
Copy link
Contributor Author

Now that the timer test has been merged, the bug in question can be reproduced as follows:

from cexample import Timer

class BadTimer(Timer):
    def __init__(self):
        print(self.time())   # Reads from arbitrary memory. More complex funcs will segfault.
        super().__init__(self)

watch = BadTimer()

image

I'm basically looking for the right (and minimal) place to ensure we don't attempt any bad lookups in the native_base_init_wrapper_obj object.

This adds a separate AdvancedTimer class that demonstrates
a few more advanced concepts usch as custom handlers for
printing and attributes.

Signed-off-by: Laurens Valk <laurens@pybricks.com>
@github-actions
Copy link

github-actions bot commented Dec 6, 2022

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 

@laurensvalk
Copy link
Contributor Author

I think this is ready for review now.

I have expanded the native class example with a few more features. This is useful for the example in itself, but also helps build the test case for this PR.

This ensures we can also test that calls to .attr() don't crash (they don't).

A few considerations:

  • Do we only want to check this if MICROPY_BUILTIN_METHOD_CHECK_SELF_ARG or always?
  • What's the best exception type? I had RuntimeError initially, but I changed it to AttributeError after seeing that this is already used for trying to access attributes which are not ready at that point.

laurensvalk and others added 2 commits December 6, 2022 10:29
When subclassing a native type, calling native members in `__init__`
before `super().__init__()` has been called could cause a crash. In
this situation, `self` in `mp_convert_member_lookup` is the
`native_base_init_wrapper_obj`. This check ensures that an
AttributeError is raised before this happens, which is consistent with
other failed lookups.

Also fix a typo in a related comment.

Signed-off-by: Laurens Valk <laurens@pybricks.com>
This adds a CPython diff that explains why calling super().__init__()
is required in MicroPython when subclassing a native type (because
__new__ and __init__ are not separate functions).

Signed-off-by: David Lechner <david@pybricks.com>
@laurensvalk laurensvalk marked this pull request as ready for review December 6, 2022 09:32
@github-actions
Copy link

github-actions bot commented Dec 6, 2022

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 

#if MICROPY_BUILTIN_METHOD_CHECK_SELF_ARG
if (obj_obj == MP_OBJ_FROM_PTR(&native_base_init_wrapper_obj)) {
// But we shouldn't attempt lookups on object that is not yet instantiated.
mp_raise_msg(&mp_type_AttributeError, MP_ERROR_TEXT("call super().__init__() first"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be possible to just fix it and make it work like CPython (as much as that's possible without having both __new__ and __init__ in MicroPython) by doing this:

if (obj_obj == ...) {
    native_base_init_wrapper(0, NULL);
    obj_obj = obj->subobj[0];
}

That will call __init__ automatically and reload the newly-created native instance.

Then I wouldn't wrap it in MICROPY_BUILTIN_METHOD_CHECK_SELF_ARG, just do it unconditionally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(if you do this, please don't remove the C example module or tests, they are good!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that segfault (null dereference)? I started digging based on your suggestion. Perhaps did you mean something like this? This seems to work, with no AttributeError required, so closer to CPython indeed.

diff --git a/py/objtype.c b/py/objtype.c
index a202b023b..f5100e239 100644
--- a/py/objtype.c
+++ b/py/objtype.c
@@ -170,12 +170,12 @@ STATIC void mp_obj_class_lookup(struct class_lookup_data *lookup, const mp_obj_t
                     if (obj != NULL && mp_obj_is_native_type(type) && type != &mp_type_object /* object is not a real type */) {
                         // If we're dealing with native base class, then it applies to native sub-object
                         obj_obj = obj->subobj[0];
-                        #if MICROPY_BUILTIN_METHOD_CHECK_SELF_ARG
                         if (obj_obj == MP_OBJ_FROM_PTR(&native_base_init_wrapper_obj)) {
-                            // But we shouldn't attempt lookups on object that is not yet instantiated.
-                            mp_raise_msg(&mp_type_AttributeError, MP_ERROR_TEXT("call super().__init__() first"));
+                            // Attempted lookup on uninstantiated native object, so instantiate now.
+                            mp_obj_t args[] = {obj};
+                            native_base_init_wrapper(1, args);
+                            obj_obj = obj->subobj[0];
                         }
-                        #endif // MICROPY_BUILTIN_METHOD_CHECK_SELF_ARG
                     } else {
                         obj_obj = MP_OBJ_FROM_PTR(obj);
                     }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's right!

(We had a fairly long discussion around this today and wrapped up the conclusion a bit quickly :) )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the instant feedback. While updating the tests, I noticed this would cause make_new to be called twice:

class SubTimer(AdvancedTimer):
    def __init__(self):

        # At this point, self does not yet represent a AdvancedTimer instance.
        print(self)  # <function>

        print(self.time())   # <--- with proposed variant, this will trigger make_new instead of raising AttributeError

        # Initialize base class.
        super().__init__(self)  # <---- But then this calls make_new again

watch = SubTimer()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's similar to how CPython works: CPython can init a type (eg list) multiple times if you call __init__ multiple times.

So I think it's OK that MicroPython does that. You just have to be aware of it.

Copy link
Contributor Author

@laurensvalk laurensvalk Dec 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior is still a bit odd (and different from CPython). Take this slightly more elaborate class, which takes a parameter to __init__:

from pybricks.parameters import Port
from pybricks.iodevices import PUPDevice

class NewSensor(PUPDevice):

    def __init__(self, port):

        # This would now try to call make_new under the hood, but it expects
        # an argument, so raises TypeError: 'port' argument required
        print(self.info())

        # Never gets here.
        super().__init__(port)

sensor = NewSensor(Port.A)

I suppose I would OK with that (anything that doesn't segfault is fine), but I just wanted to make sure. We can add an updated CPython difference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed exactly this scenario... It makes sense to me that this should fail, but I agree that the error message is confusing because it looks like info() is missing the port argument.

What's the CPython behaviour that you'd expect here? (I'm struggling to think of an example where this could come up with a built-in type. It is different to the behaviour of a non-native base of course)

Note there's another CPython diff here which is that if you don't explicitly call super().__init__ and you have a native base, then by default we pass the derived type's __init__ args to the native base. (CPython only does this when the base is tuple and frozenset, perhaps others).

class MyList(list):
    def __init__(self, a):
        pass

class MyTuple(tuple):
    def __init__(self, a):
        pass

print(MyList([1,2,3]))  # CPython: []. MicroPython: [1,2,3]
print(MyTuple([1,2,3])) # Both: (1,2,3)

Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think MiroPython's make_new is really the equivalent of __new__ without __init__ in CPython. This is why immutable types like tuple have the different behavior as above - since they are immutable, all of the "init" has to be done in __new__. So I don't think we should be calling make_new more than once, otherwise it forces all implementations to have to be aware of which call it is (if first call treat as __new__, otherwise treat as __init__) or risk leaking resources or other unexpected behaviors (e.g. if make_new opens a file descriptor, then is called again, that file descriptor has to be closed, otherwise it is leaked with no way to close it later).

Copy link
Sponsor Contributor

@dlech dlech Dec 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we implement something like this?

  • If self is accessed in an __init__ overload of a native type and super.__init__() was not called yet, then call make_new with the args of the overload.
  • If super().__init__() is called after make_new was called because of the condition above, raise an exception.
  • Otherwise, if super.__init__() is called before something triggered the call to make_new, then call make_new with the args from super.__init__().
  • Finally, if nothing triggered make_new before the __init__ overload returns, call make_new with the args from the overload.

The recommendation in the MicroPython docs should be to call super().__init__() as the first line of any __init__ overload of a native type (or at least before using self). But at least in other cases, it might work as expected or at least not segfault.

@projectgus
Copy link
Contributor

This is an automated heads-up that we've just merged a Pull Request
that removes the STATIC macro from MicroPython's C API.

See #13763

A search suggests this PR might apply the STATIC macro to some C code. If it
does, then next time you rebase the PR (or merge from master) then you should
please replace all the STATIC keywords with static.

Although this is an automated message, feel free to @-reply to me directly if
you have any questions about this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants