Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Introduce macros to allocate C++ classes on gc_heap #8691

Open
Gadgetoid opened this issue May 23, 2022 · 8 comments
Open

Proposal: Introduce macros to allocate C++ classes on gc_heap #8691

Gadgetoid opened this issue May 23, 2022 · 8 comments

Comments

@Gadgetoid
Copy link
Contributor

One of the biggest issues we face building our batteries-included MicroPython port is out-of-heap memory usage.

A trivial C++ class that allocates some amount of memory for a buffer or lookup table would typically be allocated outside of gc_heap. This region is very limited (roughly 9% of RP2040's available RAM) and would very quickly cause OOM errors invisible to MicroPython. Not to mention this resource is not tracked by MicroPython at all, so it's not possible for an end-user to manage RAM usage carefully.

Recently I rediscovered C++'s "placement new". This is a variation on the "new" syntax that allocates a class in a given region of memory. Combining this with m_new in the form: new(m_new(Class, 1)) Class(...) allocates a C++ class on the gc_heap where it's accountable to MicroPython's GC.

A pointer to this class is then stored within the type struct:

typedef struct _some_class_obj_t {
    mp_obj_base_t base;
    Class *ptr;
} some_class_obj_t;

So that it's reachable via gc during the lifetime of the parent MicroPython object.

I don't know how popular writing MicroPython extensions in C++ outside of our project is, but I propose at least the following as part of a C++-specific header to encourage more robust allocation of C++ classes:

#include <new>
#define m_new_class(cls, ...) new(m_new(cls, 1)) cls(__VA_ARGS__)
#define m_del_class(cls, ptr) ptr->~cls();m_del(cls, ptr, 1)

Is there a better approach to this problem? There are other forms of new (ie: new uint8_t[n]) which would not be covered by these very specific functions. However I believe those are somewhat covered by the basic m_new and m_del.

A real-world (abridged) example:

#define MP_OBJ_TO_PTR2(o, t) ((t *)(uintptr_t)(o))

typedef struct _Hub75_obj_t {
    mp_obj_base_t base;
    Hub75* hub75;  // Pointer to class instance for GC to follow
} _Hub75_obj_t;

mp_obj_t Hub75_make_new(const mp_obj_type_t *type, size_t n_args, size_t n_kw, const mp_obj_t *all_args) {
    ... <snip> ...

    hub75_obj = m_new_obj_with_finaliser(_Hub75_obj_t); // Parent object
    hub75_obj->base.type = &Hub75_type;
    hub75_obj->hub75 = m_new_class(Hub75, width, height, buffer, paneltype, stb_invert); // Child C++ class

    return MP_OBJ_FROM_PTR(hub75_obj);
}

mp_obj_t Hub75___del__(mp_obj_t self_in) {
    _Hub75_obj_t *self = MP_OBJ_TO_PTR2(self_in, _Hub75_obj_t);
    self->hub75->stop(dma_complete);
    m_del_class(Hub75, self->hub75);
    return mp_const_none;
}

Note, there is no urgency to this request- I merely raise it here to give us a chance to discuss and establish best practice (or for someone to inevitably point me to the canonical solution to this problem which I have overlooked). Presently we just include these macros in our modules directly.

For the interminably curious you can find the PR against our pimoroni-pico batteries-included-MicroPython-for-RP2040 repository that introduces placement new here: pimoroni/pimoroni-pico#365

@dpgeorge
Copy link
Member

A pointer to this class is then stored within the type struct:

This seems reasonable if the C++ instance is intended to be wrapped within a C-based uPy object, which it looks like it is. Then the GC can manage both the C uPy instance and the C++ instance, and free them both when they are no longer reachable.

But:

mp_obj_t Hub75___del__(mp_obj_t self_in) {
    _Hub75_obj_t *self = MP_OBJ_TO_PTR2(self_in, _Hub75_obj_t);
    self->hub75->stop(dma_complete);
    m_del_class(Hub75, self->hub75);
    return mp_const_none;
}

This is wrong: you cannot call gc_free(ptr) from a finaliser, because the GC is not reentrant. It's luckily just a no-op (gc_free() will just silently return if the GC is running), but better not to call it. Instead just rely on the GC to reclaim this pointer, which it will do as part of this GC sweep (since it's calling the finaliser it should also see that all sub-pointers within this instance are unreachable).

OTOH, if you don't want to have the C++ instance as part of a C uPy object then you can use the new m_tracked_calloc/free functions. But memory allocated that way requires explicit free'ing.


#include <new>
#define m_new_class(cls, ...) new(m_new(cls, 1)) cls(__VA_ARGS__)
#define m_del_class(cls, ptr) ptr->~cls();m_del(cls, ptr, 1)

These macros seem reasonable, and having them part of the core would save other C++ devs time trying to work them out.

@stinos do you have any comment on this?

@Gadgetoid
Copy link
Contributor Author

Aha! Thank you for the heads up regarding the finaliser issue. I see exactly what you mean.

In that case I suspect it would suffice just to ptr->~Cls() to call the class delete method, but keep m_del_class as-is for explicitly freeing if you need to bail out of a constructor cleanly. Assuming GC wont just handle that for you anyway.

Luckily there are only a few instances of finalisers using this macro which I can clean up.

Tracked alloc could be very useful for module lifetime allocations, but I'm trying to move us away from modules where classes make sense. I think there are a few places I might eventually deploy it. One step at a time!

Thank you for your feedback!

@stinos
Copy link
Contributor

stinos commented May 30, 2022

Then the GC can manage both the C uPy instance and the C++ instance, and free them both when they are no longer reachable.

I guess this works, but I'm not sure if this is beneficial over using the C++ heap. I mean you can just do obj->hub75 = new Hub75() and then delete obj->hub75 in the __del__ implementation. No macros, clearer and, perhaps oversimplified but maybe more performant as it means less allocations on the uPy heap? But perhaps I'm missing something.

@ZodiusInfuser
Copy link

The issue with that is that the C++ heap is tiny compared to the uPy one, and causes a hardlock if overfilled. We know this because it's how we did our C++ classes for the past year. Allocating on the uPy heap means that the GC is aware of the usage, and if there's not enough memory available, will return an error to the user.

@dpgeorge
Copy link
Member

Is it possible to increase the size of the C++ heap?

I guess the main benefit of a unified heap, using the uPy GC heap for both uPy and C++ objects, is that you don't have to decide at build time how to split the RAM between the two.

@stinos
Copy link
Contributor

stinos commented May 31, 2022

Unless you globally override all new and delete operators to forward to uPy allocation, the C++ heap will still be used though. But maybe that works out for you?

@ZodiusInfuser
Copy link

I suspect this issue is most apparent with our batteries-included MicroPython.

In it we have several C++ classes that contain rather large buffers (10s of K in some cases https://github.com/pimoroni/pimoroni-pico/blob/99672e27609634b57bfdd4899d3b101cd40babfc/drivers/pwm/pwm_cluster.hpp#L111). Originally we just allocated these classes on the C++ heap using new but quickly ran into hardlock situations where that heap would get filled, either by the user creating too many objects or by objects from a previous program run staying around because the GC had yet to do a collection.

Is it possible to increase the size of the C++ heap?

Increasing the share of the heap available to C++ helped, but only delayed the inevitable hardlocks.

As a workaround for this, we changed our most problematic classes to have additional constructors that would accept pointers to buffers created outside of themselves. Our uPy bindings then allocated byte arrays on the uPy GC heap and passed them in, storing pointers in the C uPy instance. This solution fixed a lot of our issues, as the GC was now better aware of the real memory size of our classes.

We still had some classes that used many smaller variables that this technique was impractical for, but they were small enough that hardlocks would only occur if someone were to intentionally create many hundreds in a loop so we quietly ignored it.

Then @Gadgetoid stumbled across C++'s "placement new", which lets us entirely allocate our C++ classes on the uPy GC heap, and return the buffers to being regular class members. We have now been trialling this in our builds and it has been working great so think it will be of benefit to any other devs who may be doing similar batteries-included style builds.

Unless you globally override all new and delete operators to forward to uPy allocation, the C++ heap will still be used though. But maybe that works out for you?

You are correct, what we are suggesting only deals with the initial construction of these uPy wrapped C++ objects, not anything that is allocated during the existence of such objects. In our batteries-included build though, pretty much all our C++ classes just contain variables and buffers needed for their lifetime, so this initial construction has been our main problem.

For those few points in our code where a variable is dynamically allocated, the C++ heap should be sufficient now the bulk of the memory is handled by uPy, and now we are better aware of the system we can strive to move such things over to the uPy GC heap in the future.

I hope that helps explain our reasoning for this PR.

@Gadgetoid
Copy link
Contributor Author

Most general runtime allocation (variables, little buffers and such) is stack and not heap afaik so we broadly survive by the skin of our teeth. But yes, MicroPython bindings that bind classes which can use arbitrary amounts of C stack/heap are always going to be something we have to tiptoe around.

Nonetheless wrapping placement new so that a class with known buffer sizes gets allocated into gc_heap is very much preferable to the defensive programming needed to allocate, pass and track buffers with m_new. This is not something we need to upstream, but I thought it at least worth raising.

Buffers with sizes not known at compile time still need to be allocated manually, but these are generally quite a lot easier to reason about. IE: for PicoGraphics we need a framebuffer of fairly arbitrary size depending on the configured screen width/height/pixel format etc.

Actually properly unifying the heap could be a much better answer, though I must admit it falls a little beyond the scope of my understanding at the moment. m_new_class is a rather clumsy attempt at this, and it does work very well for us. I'm aware that accepting such macros into MicroPython may amount to endorsing our bad design decisions.

Increasing the C++ heap would necessitate decreasing the gc_heap, which is fixed to 192K in main.c. I have actually produced builds with this changed to tackle these problems, but I think better tracing of C++ module memory usage in gc_heap is the more correct answer.

Obviously our single biggest problem is that we're taking significant liberties with MicroPython by shipping so many C++ modules, so I am not unsympathetic of any push back here! I don't think I've made it a secret to anyone that I'd love to have a clean and consistent suite of MicroPython drivers for all of our sensors. Though some of the stuff we're doing really... just is better handled as wrapped C++.

Appreciate the feedback so far!

RetiredWizard pushed a commit to RetiredWizard/micropython that referenced this issue Dec 30, 2023
…ker-zero-samd21-8.2.x

Submit a PR based on branch 8.2.x to remove Cytron Maker Zero SAMD21 from repo
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants