MAINT: remove duplicated integer code #8856

juliantaylor · 2017-03-27T18:09:49Z

Our historical integer names contain a duplicate for example long and
longlong are int64 on 64 bit platforms or int and long are int32 on 32
bit platforms.
This causes quite significant code bloat, in particular for functions
compiled for multiple instruction sets.

This duplication can't be removed as it is part of our API but we can
avoid some of the bloat by only generating code for once the each real
type and correctly mapping them to the right slot in the ufuncs type
list.
This is achieved by defining appropriate macros for the current platform
and creating the correct function names via macro concatenation.

juliantaylor · 2017-03-27T18:11:57Z

this removes about 200KiB (~20%) code of the loops.o

eric-wieser · 2017-03-27T18:18:30Z

Looks pretty good to me.

This doesn't help with my changeset in #8853 though, which includes some function names with integer typecodes (bhilq). Any used of TypeDescription(..., FullTypeDescr, ...) in generate_umath.py with these arguments is likely to suffer from this.

What do you propose I do instead? Should we adjust Typedescription to take i8,i16 instead of bh?

eric-wieser · 2017-03-27T18:23:59Z

To be clear, what you have doesn't appear to break anything I want to do, but it doesn't go quite far enough to be used in those places too.

juliantaylor · 2017-03-27T18:29:51Z

NPY_FUNCNAME needs to be used in all places, shouldn't be an issue. Let's merge #8853 first.

charris · 2017-03-27T18:32:25Z

I'm not sure this is worth doing, as it complicates the code with no real gain except some smallering.

eric-wieser · 2017-03-27T18:43:27Z

Let's merge #8853 first.

Actually, I think that might have a bug, which this PR fixes. #8853 only adds handling for longlong - whereas what it actually cares about is int64

charris · 2017-03-27T18:50:53Z

This has been discussed before many years ago... My first choice if we go this way would be to eliminate long altogether.

njsmith · 2017-03-27T18:51:56Z

I think the goal is good, because we've seen multiple bugs due to code expecting longlong and getting int64 or similar. I haven't looked at the code yet though :-)

njsmith · 2017-03-27T18:57:42Z

Oh, I see, this only affects some deep internals, not the typecodes API. No strong opinion then.

eric-wieser · 2017-03-27T19:00:21Z

I'm a definite +1 on this if we can adjust how typecodes are resolved to function names with FullTypeDescr as well

Our historical integer names contain a duplicate for example long and longlong are int64 on 64 bit platforms or int and long are int32 on 32 bit platforms. This causes quite significant code bloat, in particular for functions compiled for multiple instruction sets. This duplication can't be removed as it is part of our API but we can avoid some of the bloat by only generating code once for each real type and correctly mapping them to the right slot in the ufuncs type list. This is achieved by defining appropriate macros for the current platform and creating the correct function names via macro concatenation.

eric-wieser · 2017-03-27T19:16:59Z

Would this work on platforms that don't have a 64-bit integer type? Wouldn't we end up referring to a type like np_int64 that doesn't exist?

juliantaylor · 2017-03-27T19:22:49Z

I don't think the new code is too bad and the duplication and the increased compilation time does bother me. Though an issue is that system macros can get in the way of the function name expansion, e.g. copysign is a macro on windows. This avoided by adding the underscore in the first NPY_FUNCNAME expansion but it remains a theoretical possibility.

This doesn't change the typecodes, while I'd love to change them I don't think we can, they are baked into our API.

I don't think we support systems without 64 bit integers at the compiler level, if we do that configuration likely hasn't been tested in 20 years.

eric-wieser · 2017-03-27T19:28:34Z

I don't think we support systems without 64 bit integers, if we do that configuration likely hasn't been tested in 20 years.

There's an alarming line then in npy_common of:

#if NPY_BITSOF_LONGLONG == 8

Note that C99 defines long long as at least 64 bits - but right now, we're coding to an older version of C.

juliantaylor · 2017-03-27T19:30:42Z

that code exists specifically to handle the issue that we don't know what size the types have, so we check them at compile time in a large ifdef. The exact same approach is used in this PR. It would also work for longlong = char should such a system exist (spoiler, it doesn't).

eric-wieser · 2017-03-27T19:32:37Z

This doesn't change the typecodes

I'm not proposing we change those - just change the mapping between typecodes and the function names used in loops.

The exact same approach is used in this PR. It would also work for longlong = char should such a system exist

Except it wouldn't in the context of loops.c.src. Whether or not UINT64 is defined by your code, we'll still loop over it in our repeat template, but now we'll output the type npy_uint64 which may not have been typedeffed, unlike npy_longlong which always exists. Our templates need to only iterate over the types that exist on the system.

More importantly, this doesn't define a UINT128 type for the case when sizeof(long long) = 16

charris · 2017-03-27T19:37:27Z

My compiles are pretty quick as is. The question is simplicity and maintenance, especially for new folks who aren't familiar with the code.

Note that compile times on my system are significantly faster if I delete previous builds. That is weird...

juliantaylor · 2017-03-27T19:37:42Z

ah yes you are correct there, still it shouldn't really matter in practice. The situation would also be easily fixable with an ifdef.

eric-wieser · 2017-03-27T19:39:26Z

I think suddenly not having any loops for UINT128 is a thing that matters in practice ;)

njsmith · 2017-03-27T22:49:48Z

We have to continue to accept all old typecodes so long as we maintain ABI compatibility, but we could normalize them internally and on output. We could even make the typecodes numerically identical so new source builds always use the normalized codes regardless of how they spell them.

charris · 2020-12-28T19:47:42Z

Where does this stand in light of current and ongoing work.

charris · 2021-04-21T22:46:23Z

@eric-wieser Still interested in this?

charris · 2021-04-21T23:06:29Z

This looks reasonable to me. We could also use the integer types in stdint.h now that we require C99.

mattip · 2022-06-15T16:36:46Z

I am going to close this. At some point we could remove the redundant int types across the board, and this PR could point the way but I think the conflicts indicate the code has changed significantly since this was proposed. If someone disagrees, they can reopen or resubmit the changes as a new PR and champion getting it updated and reviewed.

eric-wieser added 03 - Maintenance component: numpy._core labels Mar 27, 2017

juliantaylor force-pushed the remove-int-dups branch from 82e9611 to 2f7c97c Compare March 27, 2017 19:07

eric-wieser mentioned this pull request Dec 12, 2017

Enumerated Types documentation #9974

Open

Base automatically changed from master to main March 4, 2021 02:03

charris added the triage review Issue/PR to be discussed at the next triage meeting label Jun 10, 2022

mattip removed the triage review Issue/PR to be discussed at the next triage meeting label Jun 15, 2022

mattip closed this Jun 15, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MAINT: remove duplicated integer code #8856

MAINT: remove duplicated integer code #8856

juliantaylor commented Mar 27, 2017

juliantaylor commented Mar 27, 2017

eric-wieser commented Mar 27, 2017 •

edited

eric-wieser commented Mar 27, 2017

juliantaylor commented Mar 27, 2017

charris commented Mar 27, 2017

eric-wieser commented Mar 27, 2017 •

edited

charris commented Mar 27, 2017

njsmith commented Mar 27, 2017

njsmith commented Mar 27, 2017

eric-wieser commented Mar 27, 2017 •

edited

eric-wieser commented Mar 27, 2017

juliantaylor commented Mar 27, 2017 •

edited

eric-wieser commented Mar 27, 2017

juliantaylor commented Mar 27, 2017

eric-wieser commented Mar 27, 2017 •

edited

charris commented Mar 27, 2017

juliantaylor commented Mar 27, 2017 •

edited

eric-wieser commented Mar 27, 2017

njsmith commented Mar 27, 2017

charris commented Dec 28, 2020

charris commented Apr 21, 2021

charris commented Apr 21, 2021

mattip commented Jun 15, 2022

MAINT: remove duplicated integer code #8856

MAINT: remove duplicated integer code #8856

Conversation

juliantaylor commented Mar 27, 2017

juliantaylor commented Mar 27, 2017

eric-wieser commented Mar 27, 2017 • edited

eric-wieser commented Mar 27, 2017

juliantaylor commented Mar 27, 2017

charris commented Mar 27, 2017

eric-wieser commented Mar 27, 2017 • edited

charris commented Mar 27, 2017

njsmith commented Mar 27, 2017

njsmith commented Mar 27, 2017

eric-wieser commented Mar 27, 2017 • edited

eric-wieser commented Mar 27, 2017

juliantaylor commented Mar 27, 2017 • edited

eric-wieser commented Mar 27, 2017

juliantaylor commented Mar 27, 2017

eric-wieser commented Mar 27, 2017 • edited

charris commented Mar 27, 2017

juliantaylor commented Mar 27, 2017 • edited

eric-wieser commented Mar 27, 2017

njsmith commented Mar 27, 2017

charris commented Dec 28, 2020

charris commented Apr 21, 2021

charris commented Apr 21, 2021

mattip commented Jun 15, 2022

eric-wieser commented Mar 27, 2017 •

edited

eric-wieser commented Mar 27, 2017 •

edited

eric-wieser commented Mar 27, 2017 •

edited

juliantaylor commented Mar 27, 2017 •

edited

eric-wieser commented Mar 27, 2017 •

edited

juliantaylor commented Mar 27, 2017 •

edited