Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integer Type System: Refactor #337

Merged
merged 2 commits into from
Sep 11, 2018
Merged

Integer Type System: Refactor #337

merged 2 commits into from
Sep 11, 2018

Conversation

ax3l
Copy link
Member

@ax3l ax3l commented Sep 7, 2018

Refactor the integer type system to support all fundamental C/C++ integer types: short, int, long, long long plus the unsigned versions of those.

Fixed size and "least size" ints are aliases of the above, leading to issues on MSVC since there is no fixed int alias for long in the old design.
Another detail is that uint64_t is an alias for long on 64bit Linux but an alias for long long on 64bit OSX. On both platforms, long and long long are 64 bit.

User Changes

Users will see no changes as long as they use helpers such as determineDatatype<T>(). Otherwise, they must be aware that Datatypes such as DT::INT32 are now in one of DT::SHORT, DT::INT, etc. Types such as int32_t can still be used for attributes and data chunks, since they are simple aliases.

Backend Changes

Instead of implementing types such as INT16, INT32, INT64, ... one now needs to implement the four types short, int, long and long long. Throw runtime errors on missing types, e.g. ADIOS 1.13.1 does not support long long yet.

The fileformat is responsible for portability, e.g. HDF5 stores automatically platform information with the stored types, making sure a stored long on one platform is an int32_t on one and and int64_t on another platform (and will not show up as long on both, potentially).

Related

Related and depending issues are #330 and #333.

@ax3l
Copy link
Member Author

ax3l commented Sep 7, 2018

@franzpoeschel I am sorry, but this PR will affect your current json backend
@lge0303 I am verry sorry, but this PR will affect your current netcdf backend

please feel free to raise any questions in issues that might occur from this change!

@C0nsultant
Copy link
Member

... your current json backend

Oh yes, I'm excited 😵

case DT::UINT64:
case DT::VEC_UINT64:
case DT::ULONG:
case DT::VEC_ULONG:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're very likely going to have problems with ADIOS in this regard.
From what I can see, they do NOT make the destinction that causes problems with OSX and MSVC. Rather, they treat the bit-widths as I did initally (see adios_types.h, which explicitly lists sizes of the datatypes).
As e.g. int may now be either 16 or 32 bit, and ADIOS assumes a size of 32, this might explode in our face.

Copy link
Member Author

@ax3l ax3l Sep 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I know but that's ADIOS' portability problem not ours: ornladios/ADIOS#187

(Side note: ADIOS1 only compiles on Linux and OSX, not on MSVC.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh wait, you mean we will not be able to store a (u)int64_t since ADIOS has no long long... Yeah... Well.

Copy link
Member

@C0nsultant C0nsultant Sep 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's ADIOS' portability problem not ours

So the baseline here is to rely on the C standard and ignore potentially faulty behaviour in one of our backends. Not ideal, but probably the only thing we can should do.

you mean we will not be able to store a (u)int64_t

We most likely can, unless we're on exotic platforms (MSVC, anyone?) where LONG and ULONG are 32 Bits wide... Which again comes back to the standard compliance stated above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I try to say is: if adios_long is defined as 8 byte on all platforms, we will implement it as such and convert properly in the backend from whatever matches (int or long or long long).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fileformat is responsible for portability, e.g. HDF5 stores automatically platform information with the stored types

Hi, just so I get this right: When reading e.g. a long from some stored file, I will also have to take into account the platform that wrote the value and use the reading platform's datatype that corresponds with the actual length of the value stored, so possibly not a long but an int for example?
If so, it's good you mention this, because floating point datatypes already have the same issue and I ignored that so far.

Copy link
Member Author

@ax3l ax3l Sep 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will also have to take into account the platform

My idea is to handle it exactly the other way around. Our backends, such as HDF5, are portable across platforms. If HDF5 stores for example a long (aka int64_t) on x86-64 Linux it does store in the back automatically the type of the platform. If you read this file back on x86-64 Windows, the HDF type presented to you will be long long (still aka int64_t) since int (and long) are int32_t there. The precision of the data is preserved.

JSON

Let's migrate that discussion here: #65

@ax3l
Copy link
Member Author

ax3l commented Sep 7, 2018

Note: The leftover HDF5 CI errors on OSX and MSVC are "just" the shape type for constant records.

Note2: we might have to fake types in the ADIOS1 backend on OSX in case it's really assuming fixed sizes behind it's adios_int/long/... types.

@@ -69,7 +69,7 @@ ParticlePatches::read()
IOHandler->flush();

using DT = Datatype;
if( DT::UINT64 != *dOpen.dtype )
if( DT::ULONG != *dOpen.dtype )
Copy link
Member

@C0nsultant C0nsultant Sep 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong, but this does not strictly enforce the standard, ULONG may be 32 or 64.

if( determineDatatype< uint64_t >() != *dOpen.dtype )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactly, that's the leftover from my replace marathon and your suggestion is correct!

src/Series.cpp Outdated
@@ -768,7 +768,7 @@ Series::readBase()
aRead.name = "openPMDextension";
IOHandler->enqueue(IOTask(this, aRead));
IOHandler->flush();
if( *aRead.dtype == DT::UINT32 )
if( *aRead.dtype == DT::UINT )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if( *aRead.dtype == determineDatatype< uint32_t >() )

return py::dtype("uint32");
else if( brc.getDatatype() == DT::UINT64 )
else if( brc.getDatatype() == DT::ULONG )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now probably not precise enough as ULONG may be 32 or 64.
Numpy provides a number of compatible C types, but not all of the applicable ones:
https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.scalars.html#built-in-scalar-types

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you recorgnized the problem in RecordComponent.cpp already. 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, pls feel free to annotate any line you catch :)

@@ -749,19 +749,19 @@ TEST_CASE( "hzdr_hdf5_sample_content_test", "[serial][hdf5]" )

RecordComponent& e_positionOffset_x = e_positionOffset["x"];
REQUIRE(e_positionOffset_x.unitSI() == 2.599999993753294e-07);
REQUIRE(e_positionOffset_x.getDatatype() == Datatype::INT32);
REQUIRE(e_positionOffset_x.getDatatype() == Datatype::INT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same problem for all datatypes read back from a file.
INT may be 16, 32 or 64.

REQUIRE(e_positionOffset_x.getDatatype() == determineDatatype< int32_t >());

@ax3l ax3l force-pushed the topic-intTypeSystem branch 3 times, most recently from 10bc435 to 9d421be Compare September 8, 2018 08:01
@ax3l ax3l mentioned this pull request Sep 8, 2018
Closed
return py::dtype("int");
// missing in numpy: covered by uint or ulonglong
// else if( brc.getDatatype() == DT::LONG )
// return py::dtype("long");
Copy link
Member

@C0nsultant C0nsultant Sep 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now this is weird.
https://docs.scipy.org/doc/numpy/user/basics.types.html

int_ | Default integer type (same as C long; normally either int64 or int32)

https://docs.scipy.org/doc/numpy-1.14.2/reference/arrays.scalars.html

int_ | compatible: Python int | 'l'

https://docs.python.org/3/c-api/long.html
(Note that there's two meanings of longhere: (a) C long (b) Pythons's 'bignum' integer of arbitrary precision)

PyLong_FromLong(long v)

Note the lacking PyLong_FromInt(int v)

So there seems to be an equivalent of long in int_, but no counterpart to unsigned long.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.scipy.org/doc/numpy/user/basics.types.html#array-types-and-conversions-between-types

Additionally to intc the platform dependent C integer types short, long, longlong and their unsigned versions are defined.

Quick test:

In [1]: import numpy as np

In [2]: np.dtype('long')
Out[2]: dtype('int64')

Haven't figured out how unsigned long works yet.

Copy link
Member

@C0nsultant C0nsultant Sep 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numpy/numpy#10678 (comment)

Docs are wrong. The C integer type long is called np.int_, and the integer type unsigned long is called np.uint

So there we have it:
C int/ unsigned int is Python intc/uintc
C long/unsigned long is Python int_/ uint

In [1]: import numpy as np

In [2]: np.dtype('short')
Out[2]: dtype('int16')

In [3]: np.dtype('intc')
Out[3]: dtype('int32')

In [4]: np.dtype('int_')
Out[4]: dtype('int64')

In [5]: np.dtype('longlong')
Out[5]: dtype('int64')

Copy link
Member Author

@ax3l ax3l Sep 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, as fancy as my float-confusion in Python xD Thanks a ton for checking!

Copy link
Member Author

@ax3l ax3l Sep 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, should be fixed now together with my float question in numpy's docs: numpy/numpy#11837

else if( r.getDatatype() == Datatype::SHORT ) dtype = py::dtype("short");
else if( r.getDatatype() == Datatype::INT ) dtype = py::dtype("int");
// missing in numpy: covered by int or longlong
// else if( r.getDatatype() == Datatype::LONG ) dtype = py::dtype("long");
Copy link
Member

@C0nsultant C0nsultant Sep 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment.

else if( a.dtype().is(py::dtype("int")) )
r.storeChunk( offset, extent, shareRaw( (int*)a.mutable_data() ) );
// missing in numpy: covered by int or longlong
// else if( a.dtype().is(py::dtype("long")) )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment.

REQUIRE(Datatype::INT == a.dtype);
a = Attribute(static_cast< int >(0));
REQUIRE(Datatype::LONG == a.dtype);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scary thought:
There might be platforms where

sizeof(short) > 2u

making it impossible to write or read 16 Bit wide data (as int16_t does not alias to any of our provided integer types).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that would be weird. But if a platform does not implement a 2 byte fundamental integer type, there will also be no alias int16_t for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to throw out this SO answer again, this is present for char on certain (albeit rare) systems.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:D

Copy link
Member

@C0nsultant C0nsultant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, excuse the avalanche of comments.
Python dtypes need adjustment and a few questions need to be resolved.

@franzpoeschel franzpoeschel mentioned this pull request Sep 10, 2018
6 tasks
Refactor the integer type system to support all fundamental
C/C++ integer types: `short`, `int`, `long`, `long long` plus
the unsigned versions of those.

Fixed size ints are aliases of the above, leading to issues on
OSX and MSVC since there is no fixed int alias for "long" in the
old design.
@ax3l
Copy link
Member Author

ax3l commented Sep 11, 2018

(just pushing again since travis forgot to report its success state)

Besides returning true for the same types, identical implementations
on some platforms, e.g. if long and long long are the same or double
and long double will also return true.

Affected by
  https://stackoverflow.com/questions/44515148/why-is-operator-overload-of-enum-ambiguous-in-msvc

on MSVC.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants