index and dimension types in for loops #28

rppawlo · 2015-06-18T20:08:12Z

Below are a bunch of emails on the proper way of indexing kokkos views in for loops and preventing signed/unsigned comparison warnings. We didn't come to a conclusion, but this issue can impact performance.

Carter and Christian,

There's been something bothering me about kokkos and I just wanted to discuss. The indexing into an array is templated for the index type, but the size returned by dimension is based on the memory space. This leads to lots of warnings for signed/unsigned comparisons if you do

for (int i=0; i < a.dimension_1(); ++i)

So what is the recommended index type? I am getting tons of compiler warnings and need to do the following:

for (Kokkos::View<double***,execution_space>::size_type i=0; i < a.dimension_1(); ++i)

or

for (int i=0; i < static_cast(a.dimension_1()); ++i)

or

for (int i=0; i < Teuchos::as(a.dimension_1()); ++i)

What's bothersome is that I seem to have to write quite a bit of superfluous code to make sure we don't generate warnings and remain consistent. In looking at the kokkos objects, there is a mix of usage of "size_t" and memory_space::size_type. For the host space, these are the same, but for cuda space, the size_type is defined to be unsigned, not size_t, so there is a chance that these could be out of sync (especially if someone writes their own memory space). In looking at the programming guide, I did not see guidance on this, although size_t seemed to be what the examples use. But when I look at the fenl example, it was using "unsigned" in its functor loops. Is "size_t"
always guaranteed to be "unsigned"? If not, then there may be some consistency issues in kokkos for different platforms/execution spaces. I have not explored that deeply. What do you recommend?

Can you add a section to the programming guide discussing indexing types?

I was under the impression that int can be faster than unsigned. Is this still the case? To be able to assess the impact of the index type, we define the index_size_type in the phalanx namespace which is set at compile time to change all functors at once:

for (PHX::index_size_type i=0; i <
Teuchos::asPHX::index_size_type(a.dimension_1()), ++i)

Again, I'd like to avoid the static cast (especially when drekar/Charon2 have 200+ functors) but it doesn't seem safe in the current design.

Also, I noticed that you are including stddef.h to get at size_t definition, which pollutes the global namespace with size_t. Why isn't "size_t" defined in the kokkos namespace instead?

Thanks!
Roger

I recommend:

const int n = a.dimension_1();
for ( int i = 0 ; i < n ; ++i )

as long as you know n < 2^31

--Carter

I have actually done that for some of the functors in the phalanx examples. Well, actually I use:

const int n = Teuchos::as(a.dimension_1());
for (int i=0; i < n; ++i)

since the Teuchos::as should check in debug builds for the integer size violation.

It just seems to be alot of superfluous code to write (especially when having to change over so many functors). I can see people not wanting to adopt kokkos when the basic loop mechanism is not as simple as what they are used to writing:

for (int i=0; i < a.dimension_1(); ++i)

(that's assuming they care about compiler warnings).

Is there a clear boundary on when to use memory_space::size_type vs size_t?

Roger

Hi

there is no good way to solve the issue (but maybe you have an idea).
The underlying issue is that we need to support more than 32bit, while in many cases it is better to use int as the loop index. In particular with Vectorization that can help significantly with performance. Now there are not many nice options.

(i) Template views on size_type: Good you can decide what to do, Bad: another template parameter
(ii) Template dimension call on size_type: Good slightly less code than casting, we can internally do the size checking, Bad: you always need to write at least dimension_0<>()
(iii) You cast the thing: Good: no additional interface stuff for Kokkos, Bad: more code to write
(iv) We always use size_t: Good smallest interface: Bad up to 30% observed performance degradation vs using int in index calculation heavy code.

Personally I was in favour of (iii) in particular since many people will not care about warnings, and the ones who do are typically prepared to write more code.
If you have other suggestions / thoughts let us know.

Christian

I am in the early stages of a complete refactorization of the View internals, so now is actually a good time to address the issue.

Some issues to consider given: for (int i=0; i < a.dimension_1(); ++i)

Could the potential dimensions of 'a' be > 2^31 ?
Optimizers tend to do better with 'int' as the iteration type.
Assuming all dimension bounds are 64bit consumes more registers.
If 'a' is non-const the compiler must pessimistically assume 'a.dimension_1()' could change, preventing optimization.

An idea: for ( int i = 0 ; i < a.dimension<int,1>() ; ++i )

Hide the cast
Could in debug mode check that the dimension extent is OK to cast

--Carter

Hi

we already have the dimension(int) call which accepts the runtime dimension id.
This is useful for loops over dimensions etc.

Does that call collide with a
dimension<int,3>()
or
dimension(3)

Christian

Currently, our functors in Albany, Charon2 and Drekar are using the dimension(int). If this is not compatible with template versions, then it would require us to coordinate the change of all 200+ functors when the kokkos change is pushed. Yuck! But writing the template parameter on dimension is certainly less work than writing all the casting. Should we talk about this more at the Wednesday meeting with the larger group?

Roger

Thats exactly my thought.

One option would be to use the index_type you give as the return type for dimension(int)

Christian

yes. That may be the way to go.

One other comment. I believe that templating the dimension methods will require the "template" keyword when calling methods from templated code (see attached example). This can be confusing for non-template savvy users since it is not always obvious what the true error is. This might be worse than the casting for some user bases.

Roger

Yeah - that's why we have 'dimension_1()' and not 'dimension<1>()'

Carter

One problem with doing

template
iType dimension(const iType& k);

is that k can always be an int (or short for that matter), but the return value might need to be larger.

As I said no great solutions to the the issue.

Christian

Since it could impact performance if the user picks the wrong index type, maybe just add a discussion on this to the user guide - possible options, the advantages and drawbacks, etc... Maybe I missed it and it is already there is some form?

Roger

I'd really like a better API solution, just haven't found one yet that covers all the issues.

Another 'brainstorming' thought:
for ( int i = 0 ; i < dimension<0,int>(a) ; ++i )
vs.
for ( int i = 0 ; i < a.template dimension<0,int>() ; ++i )
vs.
for ( int i = 0 ; i < static_cast( a.dimension_0() ) ; ++i )

At least if Kokkos is in charge then the cast can be verified compatible.

--Carter

Just thought of something. In phalanx, we are wrapping the kokkos views in a phalanx mdfield object, so the issue with a backwards compatible change to kokkos could be controlled (and removed from influencing the best long term solution).

I like the template version, its just the debugging when forgetting to write "template" that is tough for people.

Could a non-member function(s) be useful here? I could imagine hard coding for certain types like int to avoid template argument:

for (int i=0; i < Kokkos::dim_int(a,1); ++i)

Maybe that just makes things harder.

Roger

If we don't mind verbose member function names then:

a.dimension_0_int()
a.dimension_0_long()
a.dimension_1_int()
a.dimension_1_long()

for the fundamental signed integral types.

You mentioned you are using 'dimension(0)' instead of 'dimension_0()' ? Necessary for functionality or desired for syntax? The '(n)' version has an 'if-else-if-...' implementation.

--Carter

For backwards compatibility. The phalanx mdfield originally had dimension(n) to be consistent with shards array and intrepid (man that was a long time ago). But we have equivalent dimension_n() now. We can't afford to change all evaluators for kokkos at this point, so we support old syntax and update evaluators as needed to the new syntax.

KOKKOS_FORCEINLINE_FUNCTION
size_type dimension_7() const
{return m_field_data.dimension_7();}

template<typename iType>
KOKKOS_FORCEINLINE_FUNCTION
size_type dimension(const iType& ord) const;

Roger

I have usecases where it is much more convenient to loop over dimensions when manually unroll. For those I use the dimension(n) call.

Christian

The text was updated successfully, but these errors were encountered:

crtrott · 2015-06-18T20:48:05Z

I still feel like the best option is to leave in the static_cast as the method of choice:
(i) its only for warning prevention
(ii) people who care about those warnings know what to do, people who don't care might not be familiar with a.template dimension(0) syntax (the .template part of it). This would hurt adoption more I think than the warning thingy.

That said if you really want something I think we need a templated option. The reason is that I imagine templating a functor on the index type and instantiating both a 32 bit and 64 bit variant depending on my problem size. We already do that in the Trilinos Kernels package for Sparse MatVec.
But I don't want to get rid of the non-templated dimension function. Therefore my proposal:

template<typename RETURN_TYPE, typename ITYPE>
const RETURN_TYPE& dimension_cast(const ITYPE& dim);

Any thoughts?

crtrott · 2015-06-18T20:48:33Z

P.S. this _cast would also imply that we might do a downcast safety check.

rppawlo · 2015-06-18T21:01:12Z

If we use Teuchos::as instead of static_cast, then in debug builds, this should check that the downcast in within the limits. We should check this though. Since SIERRA builds with warnings as errors, this will definitely impact their entire dev team.

crtrott · 2015-06-18T21:21:53Z

We can't use Teuchos::as since Kokkos doesn't depend on Teuchos. You can use it for manual user side casts though.

mhoemmen · 2015-06-18T21:54:15Z

Roger: Teuchos::as never actually did what it promised until I rewrote parts of it a few years ago. You should check Teuchos_as.hpp to make sure that it works like you think it should in a debug build. Furthermore, if the loop is long enough that using int helps performance, then why not always check for overflow? Unit tests generally don't exercise big problems, but end users (who build RELEASE) do.

rppawlo · 2015-06-19T14:13:08Z

Mark - good point - I agree that check like that only make sense at
large scale where quite often a general debug mode flag could make the
entire code too slow due to too many checks. But if there is a chance
on impacting performance, I would like to be able to disable. It should
probably be a variable that the user can specifically toggle at
configure time.

On 06/18/2015 05:54 PM, Mark Hoemmen wrote:

Roger: Teuchos::as never actually did what it promised until I rewrote
parts of it a few years ago. You should check |Teuchos_as.hpp| to make
sure that it works like you think it should in a debug build.
Furthermore, if the loop is long enough that using |int| helps
performance, then why not /always/ check for overflow? Unit tests
generally don't exercise big problems, but end users (who build
RELEASE) do.

—
Reply to this email directly or view it on GitHub
#28 (comment).

maherou · 2015-08-07T16:31:29Z

Kokkos developers: Please resolve this situation by either declaring that the API will or will not change to accommodate the need for indexing needs. Users need to know how to best go forward. Thanks.

crtrott · 2015-08-07T17:44:53Z

I didn't hear from users yet whether the proposed solution of a RETURN_TYPE dimension_cast<typename RETURNTYPE, typename ITYPE>(ITYPE i); as a member of View would be a better solution for anybody than what we do now. The loop would look like this:

for(int i=0; i < a.template dimension_cast(0); i++) {
}

As opposed to the current:

for(int i=0; i < static_cast(a.dimension(0)); i++) {
}

I don't see much of a benefit in readability or line length.

The other option mentioned earlier is named functions:

for(int i=0; i < a.dimension_int(0); i++) {
}

But I don't like that much because it means I need a lot of different member functions.

Christian

crtrott · 2015-08-07T17:46:21Z

Ok to clarify again: I want expressed user preferences. So far I only have "we don't like using static_cast".

rppawlo · 2015-08-07T17:58:44Z

I would probably avoid:

for(int i=0; i < a.template dimension_cast(0); i++)

Because it forces users to remember to add the template keyword. You
will spend your days answering this error for many users. So would lean
towards static_cast instead.

How about an extra template parameter on the kokkos view that allows
users to define the dimension_X() return types (that defaults to the
node type definition as currently implemented)?

Roger

On 08/07/2015 01:44 PM, Christian Trott wrote:

I didn't hear from users yet whether the proposed solution of a
RETURN_TYPE dimension_cast(ITYPE i); as a member of View would be a
better solution for anybody than what we do now. The loop would look
like this:

for(int i=0; i < a.template dimension_cast(0); i++) {
}

As opposed to the current:

for(int i=0; i < static_cast(a.dimension(0)); i++) {
}

I don't see much of a benefit in readability or line length.

The other option mentioned earlier is named functions:

for(int i=0; i < a.dimension_int(0); i++) {
}

But I don't like that much because it means I need a lot of different
member functions.

Christian

—
Reply to this email directly or view it on GitHub
#28 (comment).

mhoemmen · 2015-08-07T18:46:41Z

I seriously don't see this as a bug. The same issue comes up with std::vector, whose size() method returns size_t (usually unsigned). Furthermore, using int exposes users to run-time overflow errors, if they iterate over more than 2 billion things. Kokkos requires C++11, so users could always do this:

for (decltype (a.dimension (0)) i = 0; i < a.dimension (0); ++i) { ... }

Furthermore, if users want int for performance reasons, the code should make it obvious that they are doing a cast, in order to help future debugging.

Users have already decided to write in C++, not Lisp; fixnums don't automatically get promoted to bignums. The preferred common case should always be correct, even if that sacrifices a bit of convenience.

hcedwar · 2015-08-07T19:49:20Z

The probable best (least worst) is to fix the return type to 'size_t' even when the internal value is something else (e.g., unsigned on GPU). That fix is compatible with C++ usage and maintains the exact same usability warning issues as with standard C++ constructs:
std::vector a(N);
for ( int i = 0 ; i < a.size(); ++i ) {...} // warning about signed/unsigned comparison
User who know they have N <= 2^31, want to be warning free, and want every bit of performance should do:
const int n = a.size();
for ( int i = 0 ; i < n ; ++i ) { ... }

eric-c-cyr · 2015-08-07T20:09:41Z

In response to "I want expressed user preferences. " I think this would be ideal (and the thing I ideally want)

for(int i=0;i<a.dimension(0);i++)

The static_cast is frustrating, because it takes a simple loop and makes it longer with some "C++ Magic" included. Additionally, I find people, even "experienced" C++ developers, don't like any calls *_cast (this baffles me, but I think its accurate).

One option would be to have a non member function that does this for you...

for(int i=0;i<dimension(0,a);i++)

Of course this function would have to be in the namespace, to get rid of the Kokkos (alternatively its an ad that "this loop brought to you by Kokkos"!). This would automatically convert to int with a static cast underneath. A few problems though...

The int as default may make it inflexible for the future. Though a simple compile time type definition would conceivably make this go away.
There may be many such non member functions
You may have already considered and dismissed this as an option

hcedwar · 2015-08-07T20:33:25Z

Just need to know how to handle situation when 2^31 < dimension(r)

eric-c-cyr · 2015-08-07T20:36:31Z

Tough luck I suppose. You can always detect this (right, I honestly don't know) and throw an informative error suggesting the user recompile with a bigger type used in the method.

hcedwar · 2015-08-07T20:38:26Z

Can detect at the runtime expense of checking at every loop iteration.

eric-c-cyr · 2015-08-07T20:39:17Z

No you can't, though the current "static_cast" situation is not better is it?

hcedwar · 2015-08-07T20:40:41Z

static_cast does not check at runtime, it just trusts that you know what you are doing.

eric-c-cyr · 2015-08-07T20:41:42Z

All this nonmember function does is hides a static cast. It will allow you to interject with a debug mode though...

hcedwar · 2015-08-07T20:46:29Z

Is all you (as our user representative) want is an 'int' interface? If so then we could boil down to just two methods:
size_t dimension(unsigned) const ;
int dimension_int(unsigned) const ;
We would in the documentation warn that 'dimension_int' may silently truncate and you will only know it if you turn on bounds checking.
Would this suffice?

crtrott · 2015-08-07T21:34:23Z

Eric I don't get how the non member function would solve the casting problem if its not templated.

One of those will do a warning:
for(int i=0;i<dimension(0,a);i++)
for(size_t i=0;i<dimension(0,a);i++)

mhoemmen · 2015-08-07T21:36:48Z

It could work if you pass in i instead of a.

etphipp · 2015-08-07T21:44:46Z

I have to say, I really don't understand what's wrong with this approach:

const int n = a.dimension_0();
for (int i=0; i<n; ++i) { ... }

It's clear, avoids any issues with signed/unsigned comparisons regardless of what the return type of dimension_0() is, makes debugging easier and saves the compiler from having to optimize away the repeated calls to dimension_0(). Every single kokkos kernel I have written does it this way, and its a good habit to get into in general.

-Eric

From: Mark Hoemmen <notifications@github.com mailto:notifications@github.com>
Reply-To: kokkos/kokkos <reply@reply.github.com mailto:reply@reply.github.com>
Date: Friday, August 7, 2015 at 3:36 PM
To: kokkos/kokkos <kokkos@noreply.github.com mailto:kokkos@noreply.github.com>
Subject: [EXTERNAL] Re: [kokkos] index and dimension types in for loops (#28)

It could work if you pass in i instead of a.

Reply to this email directly or view it on GitHubhttps://github.com//issues/28#issuecomment-128838783.

mhoemmen · 2015-08-07T22:16:49Z

@etphipp It probably even reduces compilation time. The compiler no longer has to try to prove that the dimension of a (which is nonconst) can change during the loop.

hcedwar · 2015-08-07T22:33:36Z

const int n = a.dimension_0();
This is absolutely "best practice," with the caveat that you will get a warning from the implicit conversion to 'int'. Thus the offer to add just the 'dimension_int(0)' interface to suppress warnings and add optional bounds checking.

crtrott · 2015-08-07T22:36:36Z

I agree that that is the best practice. I am still opposed a bit to add dimension_int() because I believe that either you don't care as a user or you are prepared to write:

const int n = static_cast(a.dimension_0());

Or whatever your preferred method of conversion is.

Christian

eric-c-cyr · 2015-08-07T22:53:18Z

Christian, it would behave exactly like dimension_int. I don't think its any different really so it would contain a static cast, and I would just force int or some specified index_type that was guaranteed to be fast.

I don't mind the structure:

const int n = a.dimension_0();
for (int i=0; i<n; ++i) { ... }

Though the warning is a little disconcerting and it does increase the size of my loop bodies visually which I'm not in love with. Particularly with nested loops where I might not put braces.

Regarding the static_cast. Its clear that cut and paste might be sufficient. Though that depends on how sloppy people are. What I think is likely is that a project will have lots of developer who don't care, and a few (say Mark) who do and it becomes incumbent on them to clean the warnings periodically. Is that acceptable (Mark?)? My concern (and at this point I will go with whatever decision you make soon) is that this creates yet another maintenance headache and one that is rather silly.

Is there a sense that unsigned will be more uniformly supported in the future?

In the end I'm fine with whatever decision you go with, as long as one is made.

eric-c-cyr · 2015-08-07T23:09:07Z

After a little more thought, I think I prefer the dimension_int approach. It has two aspects I like

Keeping static_casts from the user code (these should be exceptional as opposed to ordinary)
Keeping my loop bodies visually compact

Now I'm done (really)

crtrott · 2015-08-09T09:22:55Z

a) does anybody second eric?
b) does that mean you also want a dimension_long, dimension_int64, dimension_size_t and dimension_unsigned?

hcedwar · 2015-08-10T14:25:12Z

No, only the following
size_t dimension(i)
int dimension_int(i)

rppawlo · 2015-08-10T15:16:36Z

I will second Eric Cyr's request. Just add the one "int" return type.

Just to be clear, the dimension(i) is not guaranteed to return size_t.
It returns a typedef traits::size_type in the view. This trait is
specified by the node type. It is almost always typedef'd to size_t,
but I think the cuda node(?) returns a different value (at least one
node type did when I traced through this a few months ago).

Roger

On 08/10/2015 10:25 AM, hcedwar wrote:

No, only the following
size_t dimension(i)
int dimension_int(i)

—
Reply to this email directly or view it on GitHub
#28 (comment).

mhoemmen · 2015-08-10T15:30:23Z

It would make sense for the function to raise an error on overflow, at least in a debug build. The documentation should also point out that this function is a compromise and show readers what they should do instead.

rppawlo · 2016-01-07T17:38:59Z

I think we came to the conclusion in this issue that kokkos would hard code the int return type for dimension to the view:

int dimension_int(i)

Did this ever get added? Any time frame for this?

hcedwar · 2016-01-09T23:52:17Z

The Kokkos::Experimental::View now has 'int extent_int(i)'. The name 'extent' was used instead of 'dimension' for compatibility with 'std::extent' vocabulary, and the multidimensional array view proposal going forward in the ISO C++ committee.

rppawlo · 2016-01-11T15:28:57Z

Thanks Carter! A few more questions:

Since this is only in the experimental view, what's the time frame for making the experimental view the default?
Will there be corresponding functions for extent_int_0(), extent_int_1(), ... as well?

hcedwar · 2016-01-11T15:41:22Z

ASAP - had targeted pre-holiday, but the best laid plans...
No need, given 'constexpr' declaration then extent_int(0) is just as optimized as extent_int_0(), as long as the calling argument is a literal value or constexpr value and not a variable.

hcedwar · 2016-03-17T13:44:26Z

The new view has this in master.

Pinned memory

hcedwar added Enhancement Improve existing capability; will potentially require voting InDevelop labels Jan 9, 2016

hcedwar closed this as completed Mar 17, 2016

rppawlo mentioned this issue Jul 27, 2016

Panzer: Make Workset::num_cells an int trilinos/Trilinos#530

Closed

Rombur added a commit to Rombur/kokkos that referenced this issue Jan 27, 2020

Pinned memory (kokkos#28)

70d1138

Pinned memory

ndellingwood mentioned this issue Nov 4, 2020

Added "TRIBITS Pthread" to PTHREAD CMake configuration in Kokkos. #3558

Merged

index and dimension types in for loops #28

index and dimension types in for loops #28

Comments

rppawlo commented Jun 18, 2015

crtrott commented Jun 18, 2015

crtrott commented Jun 18, 2015

rppawlo commented Jun 18, 2015

crtrott commented Jun 18, 2015

mhoemmen commented Jun 18, 2015

rppawlo commented Jun 19, 2015

maherou commented Aug 7, 2015

crtrott commented Aug 7, 2015

crtrott commented Aug 7, 2015

rppawlo commented Aug 7, 2015

mhoemmen commented Aug 7, 2015

hcedwar commented Aug 7, 2015

eric-c-cyr commented Aug 7, 2015

hcedwar commented Aug 7, 2015

eric-c-cyr commented Aug 7, 2015

hcedwar commented Aug 7, 2015

eric-c-cyr commented Aug 7, 2015

hcedwar commented Aug 7, 2015

eric-c-cyr commented Aug 7, 2015

hcedwar commented Aug 7, 2015

crtrott commented Aug 7, 2015

mhoemmen commented Aug 7, 2015

etphipp commented Aug 7, 2015

mhoemmen commented Aug 7, 2015

hcedwar commented Aug 7, 2015

crtrott commented Aug 7, 2015

eric-c-cyr commented Aug 7, 2015

eric-c-cyr commented Aug 7, 2015

crtrott commented Aug 9, 2015

hcedwar commented Aug 10, 2015

rppawlo commented Aug 10, 2015

mhoemmen commented Aug 10, 2015

rppawlo commented Jan 7, 2016

hcedwar commented Jan 9, 2016

rppawlo commented Jan 11, 2016

hcedwar commented Jan 11, 2016

hcedwar commented Mar 17, 2016