-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integer Type System: Refactor #337
Conversation
66d2f5b
to
b1aefcf
Compare
@franzpoeschel I am sorry, but this PR will affect your current json backend please feel free to raise any questions in issues that might occur from this change! |
8cf1fa1
to
46eef88
Compare
Oh yes, I'm excited 😵 |
46eef88
to
56f91d6
Compare
case DT::UINT64: | ||
case DT::VEC_UINT64: | ||
case DT::ULONG: | ||
case DT::VEC_ULONG: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're very likely going to have problems with ADIOS in this regard.
From what I can see, they do NOT make the destinction that causes problems with OSX and MSVC. Rather, they treat the bit-widths as I did initally (see adios_types.h
, which explicitly lists sizes of the datatypes).
As e.g. int
may now be either 16 or 32 bit, and ADIOS assumes a size of 32, this might explode in our face.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I know but that's ADIOS' portability problem not ours: ornladios/ADIOS#187
(Side note: ADIOS1 only compiles on Linux and OSX, not on MSVC.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh wait, you mean we will not be able to store a (u)int64_t
since ADIOS has no long long
... Yeah... Well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's ADIOS' portability problem not ours
So the baseline here is to rely on the C standard and ignore potentially faulty behaviour in one of our backends. Not ideal, but probably the only thing we can should do.
you mean we will not be able to store a (u)int64_t
We most likely can, unless we're on exotic platforms (MSVC, anyone?) where LONG
and ULONG
are 32 Bits wide... Which again comes back to the standard compliance stated above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I try to say is: if adios_long
is defined as 8 byte on all platforms, we will implement it as such and convert properly in the backend from whatever matches (int
or long
or long long
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fileformat is responsible for portability, e.g. HDF5 stores automatically platform information with the stored types
Hi, just so I get this right: When reading e.g. a long
from some stored file, I will also have to take into account the platform that wrote the value and use the reading platform's datatype that corresponds with the actual length of the value stored, so possibly not a long
but an int
for example?
If so, it's good you mention this, because floating point datatypes already have the same issue and I ignored that so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will also have to take into account the platform
My idea is to handle it exactly the other way around. Our backends, such as HDF5, are portable across platforms. If HDF5 stores for example a long
(aka int64_t
) on x86-64 Linux it does store in the back automatically the type of the platform. If you read this file back on x86-64 Windows, the HDF type presented to you will be long long
(still aka int64_t
) since int
(and long
) are int32_t
there. The precision of the data is preserved.
JSON
Let's migrate that discussion here: #65
Note: The leftover HDF5 CI errors on OSX and MSVC are "just" the Note2: we might have to fake types in the ADIOS1 backend on OSX in case it's really assuming fixed sizes behind it's |
src/ParticlePatches.cpp
Outdated
@@ -69,7 +69,7 @@ ParticlePatches::read() | |||
IOHandler->flush(); | |||
|
|||
using DT = Datatype; | |||
if( DT::UINT64 != *dOpen.dtype ) | |||
if( DT::ULONG != *dOpen.dtype ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong, but this does not strictly enforce the standard, ULONG
may be 32 or 64.
if( determineDatatype< uint64_t >() != *dOpen.dtype )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exactly, that's the leftover from my replace marathon and your suggestion is correct!
src/Series.cpp
Outdated
@@ -768,7 +768,7 @@ Series::readBase() | |||
aRead.name = "openPMDextension"; | |||
IOHandler->enqueue(IOTask(this, aRead)); | |||
IOHandler->flush(); | |||
if( *aRead.dtype == DT::UINT32 ) | |||
if( *aRead.dtype == DT::UINT ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if( *aRead.dtype == determineDatatype< uint32_t >() )
return py::dtype("uint32"); | ||
else if( brc.getDatatype() == DT::UINT64 ) | ||
else if( brc.getDatatype() == DT::ULONG ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now probably not precise enough as ULONG
may be 32 or 64.
Numpy provides a number of compatible C types, but not all of the applicable ones:
https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.scalars.html#built-in-scalar-types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you recorgnized the problem in RecordComponent.cpp
already. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, pls feel free to annotate any line you catch :)
test/SerialIOTest.cpp
Outdated
@@ -749,19 +749,19 @@ TEST_CASE( "hzdr_hdf5_sample_content_test", "[serial][hdf5]" ) | |||
|
|||
RecordComponent& e_positionOffset_x = e_positionOffset["x"]; | |||
REQUIRE(e_positionOffset_x.unitSI() == 2.599999993753294e-07); | |||
REQUIRE(e_positionOffset_x.getDatatype() == Datatype::INT32); | |||
REQUIRE(e_positionOffset_x.getDatatype() == Datatype::INT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same problem for all datatypes read back from a file.
INT
may be 16, 32 or 64.
REQUIRE(e_positionOffset_x.getDatatype() == determineDatatype< int32_t >());
10bc435
to
9d421be
Compare
return py::dtype("int"); | ||
// missing in numpy: covered by uint or ulonglong | ||
// else if( brc.getDatatype() == DT::LONG ) | ||
// return py::dtype("long"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now this is weird.
https://docs.scipy.org/doc/numpy/user/basics.types.html
int_
| Default integer type (same as Clong
; normally eitherint64
orint32
)
https://docs.scipy.org/doc/numpy-1.14.2/reference/arrays.scalars.html
int_
| compatible: Python int |'l'
https://docs.python.org/3/c-api/long.html
(Note that there's two meanings of long
here: (a) C long
(b) Pythons's 'bignum' integer of arbitrary precision)
PyLong_FromLong(long v)
Note the lacking PyLong_FromInt(int v)
So there seems to be an equivalent of long
in int_
, but no counterpart to unsigned long
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://docs.scipy.org/doc/numpy/user/basics.types.html#array-types-and-conversions-between-types
Additionally to
intc
the platform dependent C integer typesshort
,long
,longlong
and their unsigned versions are defined.
Quick test:
In [1]: import numpy as np
In [2]: np.dtype('long')
Out[2]: dtype('int64')
Haven't figured out how unsigned long
works yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs are wrong. The C integer type
long
is callednp.int_
, and the integer typeunsigned long
is callednp.uint
So there we have it:
C int
/ unsigned int
is Python intc
/uintc
C long
/unsigned long
is Python int_
/ uint
In [1]: import numpy as np
In [2]: np.dtype('short')
Out[2]: dtype('int16')
In [3]: np.dtype('intc')
Out[3]: dtype('int32')
In [4]: np.dtype('int_')
Out[4]: dtype('int64')
In [5]: np.dtype('longlong')
Out[5]: dtype('int64')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, as fancy as my float-confusion in Python xD Thanks a ton for checking!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, should be fixed now together with my float question in numpy's docs: numpy/numpy#11837
else if( r.getDatatype() == Datatype::SHORT ) dtype = py::dtype("short"); | ||
else if( r.getDatatype() == Datatype::INT ) dtype = py::dtype("int"); | ||
// missing in numpy: covered by int or longlong | ||
// else if( r.getDatatype() == Datatype::LONG ) dtype = py::dtype("long"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment.
else if( a.dtype().is(py::dtype("int")) ) | ||
r.storeChunk( offset, extent, shareRaw( (int*)a.mutable_data() ) ); | ||
// missing in numpy: covered by int or longlong | ||
// else if( a.dtype().is(py::dtype("long")) ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment.
test/CoreTest.cpp
Outdated
REQUIRE(Datatype::INT == a.dtype); | ||
a = Attribute(static_cast< int >(0)); | ||
REQUIRE(Datatype::LONG == a.dtype); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scary thought:
There might be platforms where
sizeof(short) > 2u
making it impossible to write or read 16 Bit wide data (as int16_t
does not alias to any of our provided integer types).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, that would be weird. But if a platform does not implement a 2 byte fundamental integer type, there will also be no alias int16_t
for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to throw out this SO answer again, this is present for char
on certain (albeit rare) systems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, excuse the avalanche of comments.
Python dtypes need adjustment and a few questions need to be resolved.
Refactor the integer type system to support all fundamental C/C++ integer types: `short`, `int`, `long`, `long long` plus the unsigned versions of those. Fixed size ints are aliases of the above, leading to issues on OSX and MSVC since there is no fixed int alias for "long" in the old design.
176d37d
to
01e54fd
Compare
(just pushing again since travis forgot to report its success state) |
01e54fd
to
92dceaa
Compare
92dceaa
to
0cdcb65
Compare
Besides returning true for the same types, identical implementations on some platforms, e.g. if long and long long are the same or double and long double will also return true. Affected by https://stackoverflow.com/questions/44515148/why-is-operator-overload-of-enum-ambiguous-in-msvc on MSVC.
0cdcb65
to
159d8f4
Compare
Refactor the integer type system to support all fundamental C/C++ integer types:
short
,int
,long
,long long
plus the unsigned versions of those.Fixed size and "least size" ints are aliases of the above, leading to issues on MSVC since there is no fixed int alias for
long
in the old design.Another detail is that
uint64_t
is an alias forlong
on 64bit Linux but an alias forlong long
on 64bit OSX. On both platforms,long
andlong long
are 64 bit.User Changes
Users will see no changes as long as they use helpers such as
determineDatatype<T>()
. Otherwise, they must be aware thatDatatype
s such asDT::INT32
are now in one ofDT::SHORT
,DT::INT
, etc. Types such asint32_t
can still be used for attributes and data chunks, since they are simple aliases.Backend Changes
Instead of implementing types such as
INT16
,INT32
,INT64
, ... one now needs to implement the four typesshort
,int
,long
andlong long
. Throw runtime errors on missing types, e.g. ADIOS 1.13.1 does not supportlong long
yet.The fileformat is responsible for portability, e.g. HDF5 stores automatically platform information with the stored types, making sure a stored
long
on one platform is anint32_t
on one and andint64_t
on another platform (and will not show up aslong
on both, potentially).Related
Related and depending issues are #330 and #333.