Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: py/objnone: Add mp_get_none() helper. #5314

Closed
wants to merge 1 commit into from

Conversation

Jongy
Copy link
Contributor

@Jongy Jongy commented Nov 10, 2019

mp_const_none is widely used throughout the project, in many many different functions. Ideally, this constant should be loaded as easily as possible in different architectures (i.e, fit in different "load immediate" opcodes).
But that's not the case at the moment - mp_const_none is an object in the data section, and e.g in the case of ARM/Thumb, we get the ldr r0, pc, ...; .word mp_const_none_obj sequence. Sometimes the linker manages to merge multiple immediates as such, but not always.
In order to make the loading of it easier, we can either make the value a small immediate like MP_OBJ_NULL (and perhaps translate it back to the pointer in a common spot, mp_call_function*). Or we can add a helper that'll retrieve the value, and that's what I'm proposing here.

TL;DR by changing all return mp_const_none; in the project to return mp_get_none(); I managed to get a 500~ bytes decrease in nrf, 200~ in stm32 and cc3200 and lower diffs in others.
Now, that's not optimized - not all accesses should be replaced with the helper call; In the case of ARM, only when it'll save the inclusion of the "load immediate" constant. I'm sure we can get better results if only selected call sites will be modified.
The cool thing with this is that it can be easily disabled based on a config - just #define mp_get_none() mp_const_none and it's done. So ports which are affected badly by this (e.g esp32) can opt-out.

What do you think?

@dpgeorge
Copy link
Member

In order to make the loading of it easier, we can either make the value a small immediate like MP_OBJ_NULL (and perhaps translate it back to the pointer in a common spot, mp_call_function*).

I think that would be difficult to get right, because there are lots of uses which are not return mp_const_none (eg storing it into a list/tuple). But it could lead to quite good code size reduction, (and maybe even improved efficiency). Eg:

#define mp_const_none ((mp_obj_t)0x100)
#define mp_const_false ((mp_obj_t)0x200)
#define mp_const_true ((mp_obj_t)0x300)

Then a few macros to help test for these values. Probably a fair amount of work to get this working, and would potentially break a lot of existing code.

I'm sure we can get better results if only selected call sites will be modified.

That would be hard because it'd depend on the architecture, and the compiler version, optimisation, etc.

The cool thing with this is that it can be easily disabled based on a config

That is good!

What do you think?

Also need to test for change in performance (just for informative purposes to see if it has a big/little/no impact), and also C stack usage (see -fstack-usage). If it can be shown to decrease code, not increase C stack usage and not be too messy (in terms of code readability), then it might be a useful change, and ports can opt in if they want to reduce code size.

@Jongy
Copy link
Contributor Author

Jongy commented Nov 11, 2019

The #define mp_const_none ((mp_obj_t)0x100) ... approach would probably be even better. It will "just work" whenever mp_obj_t's are compared to mp_const_none, but when they are used just as objects it will require to translate that value to the actual address...

Unless, of course, we can force the linker into that :) I'll look into it.

About the current approach - I'll provide more info about performance on relevant ports.

@dpgeorge
Copy link
Member

TL;DR by changing all return mp_const_none; in the project to return mp_get_none(); I managed to get a 500~ bytes decrease in nrf, 200~ in stm32 and cc3200 and lower diffs in others.

A simpler way to do the same thing with only a few lines changed would be (in py/obj.h):

#if MICROPY_REDUCE_CONST_LOAD_SIZE
mp_obj_t mp_const_none_fun(void); // define in obj.c
#define mp_const_none mp_const_none_fun()
#else 
#define mp_const_none (MP_OBJ_FROM_PTR(&mp_const_none_obj))
#endif 

Could also try doing this for True and False consts.

@Jongy
Copy link
Contributor Author

Jongy commented Nov 20, 2019

A simpler way to do the same thing with only a few lines changed would be (in py/obj.h):

Yeah, but as I said earlier, I think it should be decided on a per-callsite basis. That's why I refrained from changing the globally used macro.

Actually my original intention here was to improve all functions that return mp_const_none; - instead, they would return mp_get_none(); and that'll be optimized as a tail-call. But apparently TCO is not that useful on thumb as I expected, for 2 reasons:

  1. The pop {...., pc} opcodes are usually 2 bytes, and there's no 2 bytes version of pop {...., lr}. These are usually 4 bytes.
  2. The call itself is emitted as a long call, so that's another 4 bytes instead of 2.

So, while changing some call sites is an improvement, changing others actually does harm in this regard.

If I'd go over them and select manually, will this change be accepted, despite the slight performance hit?

@dpgeorge
Copy link
Member

If I'd go over them and select manually, will this change be accepted, despite the slight performance hit?

I don't think that's a great idea because it's not very maintainable: the change/optimisation depends strongly on architecture and compiler. Different archs/compilers would likely require a different set to be changed/not changed.

I do like the approach in #5320 better.

@Jongy
Copy link
Contributor Author

Jongy commented Nov 21, 2019

Yeah, I agree on the maintainability part. Manual selection is okay a single time, but it's too much to keep up when things change.

Well, on some ports this is effective even when replacing everything with a call.

I also prefer the #5320 better so let's finish with that one and then see what can be done here, possibly for nrf.

@dpgeorge
Copy link
Member

See #5429 for an alternative approach that makes None/False/True small immediate values.

@dpgeorge
Copy link
Member

Superseded by #5429

@dpgeorge dpgeorge closed this Jan 12, 2020
tannewt pushed a commit to tannewt/circuitpython that referenced this pull request Sep 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants