Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

index and dimension types in for loops #28

Closed
rppawlo opened this issue Jun 18, 2015 · 37 comments
Closed

index and dimension types in for loops #28

rppawlo opened this issue Jun 18, 2015 · 37 comments
Labels
Enhancement Improve existing capability; will potentially require voting

Comments

@rppawlo
Copy link
Contributor

rppawlo commented Jun 18, 2015

Below are a bunch of emails on the proper way of indexing kokkos views in for loops and preventing signed/unsigned comparison warnings. We didn't come to a conclusion, but this issue can impact performance.



Carter and Christian,

There's been something bothering me about kokkos and I just wanted to discuss. The indexing into an array is templated for the index type, but the size returned by dimension is based on the memory space. This leads to lots of warnings for signed/unsigned comparisons if you do

for (int i=0; i < a.dimension_1(); ++i)

So what is the recommended index type? I am getting tons of compiler warnings and need to do the following:

for (Kokkos::View<double***,execution_space>::size_type i=0; i < a.dimension_1(); ++i)

or

for (int i=0; i < static_cast(a.dimension_1()); ++i)

or

for (int i=0; i < Teuchos::as(a.dimension_1()); ++i)

What's bothersome is that I seem to have to write quite a bit of superfluous code to make sure we don't generate warnings and remain consistent. In looking at the kokkos objects, there is a mix of usage of "size_t" and memory_space::size_type. For the host space, these are the same, but for cuda space, the size_type is defined to be unsigned, not size_t, so there is a chance that these could be out of sync (especially if someone writes their own memory space). In looking at the programming guide, I did not see guidance on this, although size_t seemed to be what the examples use. But when I look at the fenl example, it was using "unsigned" in its functor loops. Is "size_t"
always guaranteed to be "unsigned"? If not, then there may be some consistency issues in kokkos for different platforms/execution spaces. I have not explored that deeply. What do you recommend?

Can you add a section to the programming guide discussing indexing types?

I was under the impression that int can be faster than unsigned. Is this still the case? To be able to assess the impact of the index type, we define the index_size_type in the phalanx namespace which is set at compile time to change all functors at once:

for (PHX::index_size_type i=0; i <
Teuchos::asPHX::index_size_type(a.dimension_1()), ++i)

Again, I'd like to avoid the static cast (especially when drekar/Charon2 have 200+ functors) but it doesn't seem safe in the current design.

Also, I noticed that you are including stddef.h to get at size_t definition, which pollutes the global namespace with size_t. Why isn't "size_t" defined in the kokkos namespace instead?

Thanks!
Roger



I recommend:

const int n = a.dimension_1();
for ( int i = 0 ; i < n ; ++i )

as long as you know n < 2^31

--Carter



I have actually done that for some of the functors in the phalanx examples. Well, actually I use:

const int n = Teuchos::as(a.dimension_1());
for (int i=0; i < n; ++i)

since the Teuchos::as should check in debug builds for the integer size violation.

It just seems to be alot of superfluous code to write (especially when having to change over so many functors). I can see people not wanting to adopt kokkos when the basic loop mechanism is not as simple as what they are used to writing:

for (int i=0; i < a.dimension_1(); ++i)

(that's assuming they care about compiler warnings).

Is there a clear boundary on when to use memory_space::size_type vs size_t?

Roger



Hi

there is no good way to solve the issue (but maybe you have an idea).
The underlying issue is that we need to support more than 32bit, while in many cases it is better to use int as the loop index. In particular with Vectorization that can help significantly with performance. Now there are not many nice options.

(i) Template views on size_type: Good you can decide what to do, Bad: another template parameter
(ii) Template dimension call on size_type: Good slightly less code than casting, we can internally do the size checking, Bad: you always need to write at least dimension_0<>()
(iii) You cast the thing: Good: no additional interface stuff for Kokkos, Bad: more code to write
(iv) We always use size_t: Good smallest interface: Bad up to 30% observed performance degradation vs using int in index calculation heavy code.

Personally I was in favour of (iii) in particular since many people will not care about warnings, and the ones who do are typically prepared to write more code.
If you have other suggestions / thoughts let us know.

Christian



I am in the early stages of a complete refactorization of the View internals, so now is actually a good time to address the issue.

Some issues to consider given: for (int i=0; i < a.dimension_1(); ++i)

  • Could the potential dimensions of 'a' be > 2^31 ?
  • Optimizers tend to do better with 'int' as the iteration type.
  • Assuming all dimension bounds are 64bit consumes more registers.
  • If 'a' is non-const the compiler must pessimistically assume 'a.dimension_1()' could change, preventing optimization.

An idea: for ( int i = 0 ; i < a.dimension<int,1>() ; ++i )

  • Hide the cast
  • Could in debug mode check that the dimension extent is OK to cast

--Carter



Hi

we already have the dimension(int) call which accepts the runtime dimension id.
This is useful for loops over dimensions etc.

Does that call collide with a
dimension<int,3>()
or
dimension(3)

Christian



Currently, our functors in Albany, Charon2 and Drekar are using the dimension(int). If this is not compatible with template versions, then it would require us to coordinate the change of all 200+ functors when the kokkos change is pushed. Yuck! But writing the template parameter on dimension is certainly less work than writing all the casting. Should we talk about this more at the Wednesday meeting with the larger group?

Roger



Thats exactly my thought.

One option would be to use the index_type you give as the return type for dimension(int)

Christian



yes. That may be the way to go.

One other comment. I believe that templating the dimension methods will require the "template" keyword when calling methods from templated code (see attached example). This can be confusing for non-template savvy users since it is not always obvious what the true error is. This might be worse than the casting for some user bases.

Roger



Yeah - that's why we have 'dimension_1()' and not 'dimension<1>()'

Carter



One problem with doing

template
iType dimension(const iType& k);

is that k can always be an int (or short for that matter), but the return value might need to be larger.

As I said no great solutions to the the issue.

Christian



Since it could impact performance if the user picks the wrong index type, maybe just add a discussion on this to the user guide - possible options, the advantages and drawbacks, etc... Maybe I missed it and it is already there is some form?

Roger



I'd really like a better API solution, just haven't found one yet that covers all the issues.

Another 'brainstorming' thought:
for ( int i = 0 ; i < dimension<0,int>(a) ; ++i )
vs.
for ( int i = 0 ; i < a.template dimension<0,int>() ; ++i )
vs.
for ( int i = 0 ; i < static_cast( a.dimension_0() ) ; ++i )

At least if Kokkos is in charge then the cast can be verified compatible.

--Carter



Just thought of something. In phalanx, we are wrapping the kokkos views in a phalanx mdfield object, so the issue with a backwards compatible change to kokkos could be controlled (and removed from influencing the best long term solution).

I like the template version, its just the debugging when forgetting to write "template" that is tough for people.

Could a non-member function(s) be useful here? I could imagine hard coding for certain types like int to avoid template argument:

for (int i=0; i < Kokkos::dim_int(a,1); ++i)

Maybe that just makes things harder.

Roger



If we don't mind verbose member function names then:

a.dimension_0_int()
a.dimension_0_long()
a.dimension_1_int()
a.dimension_1_long()

for the fundamental signed integral types.

You mentioned you are using 'dimension(0)' instead of 'dimension_0()' ? Necessary for functionality or desired for syntax? The '(n)' version has an 'if-else-if-...' implementation.

--Carter



For backwards compatibility. The phalanx mdfield originally had dimension(n) to be consistent with shards array and intrepid (man that was a long time ago). But we have equivalent dimension_n() now. We can't afford to change all evaluators for kokkos at this point, so we support old syntax and update evaluators as needed to the new syntax.

KOKKOS_FORCEINLINE_FUNCTION
size_type dimension_7() const
{return m_field_data.dimension_7();}

template<typename iType>
KOKKOS_FORCEINLINE_FUNCTION
size_type dimension(const iType& ord) const;

Roger



I have usecases where it is much more convenient to loop over dimensions when manually unroll. For those I use the dimension(n) call.

Christian



@crtrott
Copy link
Member

crtrott commented Jun 18, 2015

I still feel like the best option is to leave in the static_cast as the method of choice:
(i) its only for warning prevention
(ii) people who care about those warnings know what to do, people who don't care might not be familiar with a.template dimension(0) syntax (the .template part of it). This would hurt adoption more I think than the warning thingy.

That said if you really want something I think we need a templated option. The reason is that I imagine templating a functor on the index type and instantiating both a 32 bit and 64 bit variant depending on my problem size. We already do that in the Trilinos Kernels package for Sparse MatVec.
But I don't want to get rid of the non-templated dimension function. Therefore my proposal:

template<typename RETURN_TYPE, typename ITYPE>
const RETURN_TYPE& dimension_cast(const ITYPE& dim);

Any thoughts?

@crtrott
Copy link
Member

crtrott commented Jun 18, 2015

P.S. this _cast would also imply that we might do a downcast safety check.

@rppawlo
Copy link
Contributor Author

rppawlo commented Jun 18, 2015

If we use Teuchos::as instead of static_cast, then in debug builds, this should check that the downcast in within the limits. We should check this though. Since SIERRA builds with warnings as errors, this will definitely impact their entire dev team.

@crtrott
Copy link
Member

crtrott commented Jun 18, 2015

We can't use Teuchos::as since Kokkos doesn't depend on Teuchos. You can use it for manual user side casts though.

@mhoemmen
Copy link
Contributor

Roger: Teuchos::as never actually did what it promised until I rewrote parts of it a few years ago. You should check Teuchos_as.hpp to make sure that it works like you think it should in a debug build. Furthermore, if the loop is long enough that using int helps performance, then why not always check for overflow? Unit tests generally don't exercise big problems, but end users (who build RELEASE) do.

@rppawlo
Copy link
Contributor Author

rppawlo commented Jun 19, 2015

Mark - good point - I agree that check like that only make sense at
large scale where quite often a general debug mode flag could make the
entire code too slow due to too many checks. But if there is a chance
on impacting performance, I would like to be able to disable. It should
probably be a variable that the user can specifically toggle at
configure time.

On 06/18/2015 05:54 PM, Mark Hoemmen wrote:

Roger: Teuchos::as never actually did what it promised until I rewrote
parts of it a few years ago. You should check |Teuchos_as.hpp| to make
sure that it works like you think it should in a debug build.
Furthermore, if the loop is long enough that using |int| helps
performance, then why not /always/ check for overflow? Unit tests
generally don't exercise big problems, but end users (who build
RELEASE) do.


Reply to this email directly or view it on GitHub
#28 (comment).

@maherou
Copy link

maherou commented Aug 7, 2015

Kokkos developers: Please resolve this situation by either declaring that the API will or will not change to accommodate the need for indexing needs. Users need to know how to best go forward. Thanks.

@crtrott
Copy link
Member

crtrott commented Aug 7, 2015

I didn't hear from users yet whether the proposed solution of a RETURN_TYPE dimension_cast<typename RETURNTYPE, typename ITYPE>(ITYPE i); as a member of View would be a better solution for anybody than what we do now. The loop would look like this:

for(int i=0; i < a.template dimension_cast(0); i++) {
}

As opposed to the current:

for(int i=0; i < static_cast(a.dimension(0)); i++) {
}

I don't see much of a benefit in readability or line length.

The other option mentioned earlier is named functions:

for(int i=0; i < a.dimension_int(0); i++) {
}

But I don't like that much because it means I need a lot of different member functions.

Christian

@crtrott
Copy link
Member

crtrott commented Aug 7, 2015

Ok to clarify again: I want expressed user preferences. So far I only have "we don't like using static_cast".

@rppawlo
Copy link
Contributor Author

rppawlo commented Aug 7, 2015

I would probably avoid:

for(int i=0; i < a.template dimension_cast(0); i++)

Because it forces users to remember to add the template keyword. You
will spend your days answering this error for many users. So would lean
towards static_cast instead.

How about an extra template parameter on the kokkos view that allows
users to define the dimension_X() return types (that defaults to the
node type definition as currently implemented)?

Roger

On 08/07/2015 01:44 PM, Christian Trott wrote:

I didn't hear from users yet whether the proposed solution of a
RETURN_TYPE dimension_cast(ITYPE i); as a member of View would be a
better solution for anybody than what we do now. The loop would look
like this:

for(int i=0; i < a.template dimension_cast(0); i++) {
}

As opposed to the current:

for(int i=0; i < static_cast(a.dimension(0)); i++) {
}

I don't see much of a benefit in readability or line length.

The other option mentioned earlier is named functions:

for(int i=0; i < a.dimension_int(0); i++) {
}

But I don't like that much because it means I need a lot of different
member functions.

Christian


Reply to this email directly or view it on GitHub
#28 (comment).

@mhoemmen
Copy link
Contributor

mhoemmen commented Aug 7, 2015

I seriously don't see this as a bug. The same issue comes up with std::vector, whose size() method returns size_t (usually unsigned). Furthermore, using int exposes users to run-time overflow errors, if they iterate over more than 2 billion things. Kokkos requires C++11, so users could always do this:

for (decltype (a.dimension (0)) i = 0; i < a.dimension (0); ++i) { ... }

Furthermore, if users want int for performance reasons, the code should make it obvious that they are doing a cast, in order to help future debugging.

Users have already decided to write in C++, not Lisp; fixnums don't automatically get promoted to bignums. The preferred common case should always be correct, even if that sacrifices a bit of convenience.

@hcedwar
Copy link
Contributor

hcedwar commented Aug 7, 2015

The probable best (least worst) is to fix the return type to 'size_t' even when the internal value is something else (e.g., unsigned on GPU). That fix is compatible with C++ usage and maintains the exact same usability warning issues as with standard C++ constructs:
std::vector a(N);
for ( int i = 0 ; i < a.size(); ++i ) {...} // warning about signed/unsigned comparison
User who know they have N <= 2^31, want to be warning free, and want every bit of performance should do:
const int n = a.size();
for ( int i = 0 ; i < n ; ++i ) { ... }

@eric-c-cyr
Copy link

In response to "I want expressed user preferences. " I think this would be ideal (and the thing I ideally want)

for(int i=0;i<a.dimension(0);i++)

The static_cast is frustrating, because it takes a simple loop and makes it longer with some "C++ Magic" included. Additionally, I find people, even "experienced" C++ developers, don't like any calls *_cast (this baffles me, but I think its accurate).

One option would be to have a non member function that does this for you...

for(int i=0;i<dimension(0,a);i++)

Of course this function would have to be in the namespace, to get rid of the Kokkos (alternatively its an ad that "this loop brought to you by Kokkos"!). This would automatically convert to int with a static cast underneath. A few problems though...

  1. The int as default may make it inflexible for the future. Though a simple compile time type definition would conceivably make this go away.
  2. There may be many such non member functions
  3. You may have already considered and dismissed this as an option

@hcedwar
Copy link
Contributor

hcedwar commented Aug 7, 2015

Just need to know how to handle situation when 2^31 < dimension(r)

@eric-c-cyr
Copy link

Tough luck I suppose. You can always detect this (right, I honestly don't know) and throw an informative error suggesting the user recompile with a bigger type used in the method.

@hcedwar
Copy link
Contributor

hcedwar commented Aug 7, 2015

Can detect at the runtime expense of checking at every loop iteration.

@eric-c-cyr
Copy link

No you can't, though the current "static_cast" situation is not better is it?

@hcedwar
Copy link
Contributor

hcedwar commented Aug 7, 2015

static_cast does not check at runtime, it just trusts that you know what you are doing.

@eric-c-cyr
Copy link

All this nonmember function does is hides a static cast. It will allow you to interject with a debug mode though...

@hcedwar
Copy link
Contributor

hcedwar commented Aug 7, 2015

Is all you (as our user representative) want is an 'int' interface? If so then we could boil down to just two methods:
size_t dimension(unsigned) const ;
int dimension_int(unsigned) const ;
We would in the documentation warn that 'dimension_int' may silently truncate and you will only know it if you turn on bounds checking.
Would this suffice?

@crtrott
Copy link
Member

crtrott commented Aug 7, 2015

Eric I don't get how the non member function would solve the casting problem if its not templated.

One of those will do a warning:
for(int i=0;i<dimension(0,a);i++)
for(size_t i=0;i<dimension(0,a);i++)

@mhoemmen
Copy link
Contributor

mhoemmen commented Aug 7, 2015

It could work if you pass in i instead of a.

@etphipp
Copy link
Contributor

etphipp commented Aug 7, 2015

I have to say, I really don't understand what's wrong with this approach:

const int n = a.dimension_0();
for (int i=0; i<n; ++i) { ... }

It's clear, avoids any issues with signed/unsigned comparisons regardless of what the return type of dimension_0() is, makes debugging easier and saves the compiler from having to optimize away the repeated calls to dimension_0(). Every single kokkos kernel I have written does it this way, and its a good habit to get into in general.

-Eric

From: Mark Hoemmen <notifications@github.commailto:notifications@github.com>
Reply-To: kokkos/kokkos <reply@reply.github.commailto:reply@reply.github.com>
Date: Friday, August 7, 2015 at 3:36 PM
To: kokkos/kokkos <kokkos@noreply.github.commailto:kokkos@noreply.github.com>
Subject: [EXTERNAL] Re: [kokkos] index and dimension types in for loops (#28)

It could work if you pass in i instead of a.

Reply to this email directly or view it on GitHubhttps://github.com//issues/28#issuecomment-128838783.

@mhoemmen
Copy link
Contributor

mhoemmen commented Aug 7, 2015

@etphipp It probably even reduces compilation time. The compiler no longer has to try to prove that the dimension of a (which is nonconst) can change during the loop.

@hcedwar
Copy link
Contributor

hcedwar commented Aug 7, 2015

const int n = a.dimension_0();
This is absolutely "best practice," with the caveat that you will get a warning from the implicit conversion to 'int'. Thus the offer to add just the 'dimension_int(0)' interface to suppress warnings and add optional bounds checking.

@crtrott
Copy link
Member

crtrott commented Aug 7, 2015

I agree that that is the best practice. I am still opposed a bit to add dimension_int() because I believe that either you don't care as a user or you are prepared to write:

const int n = static_cast(a.dimension_0());

Or whatever your preferred method of conversion is.

Christian

@eric-c-cyr
Copy link

Christian, it would behave exactly like dimension_int. I don't think its any different really so it would contain a static cast, and I would just force int or some specified index_type that was guaranteed to be fast.

I don't mind the structure:

const int n = a.dimension_0();
for (int i=0; i<n; ++i) { ... }

Though the warning is a little disconcerting and it does increase the size of my loop bodies visually which I'm not in love with. Particularly with nested loops where I might not put braces.

Regarding the static_cast. Its clear that cut and paste might be sufficient. Though that depends on how sloppy people are. What I think is likely is that a project will have lots of developer who don't care, and a few (say Mark) who do and it becomes incumbent on them to clean the warnings periodically. Is that acceptable (Mark?)? My concern (and at this point I will go with whatever decision you make soon) is that this creates yet another maintenance headache and one that is rather silly.

Is there a sense that unsigned will be more uniformly supported in the future?

In the end I'm fine with whatever decision you go with, as long as one is made.

@eric-c-cyr
Copy link

After a little more thought, I think I prefer the dimension_int approach. It has two aspects I like

  1. Keeping static_casts from the user code (these should be exceptional as opposed to ordinary)
  2. Keeping my loop bodies visually compact

Now I'm done (really)

@crtrott
Copy link
Member

crtrott commented Aug 9, 2015

a) does anybody second eric?
b) does that mean you also want a dimension_long, dimension_int64, dimension_size_t and dimension_unsigned?

@hcedwar
Copy link
Contributor

hcedwar commented Aug 10, 2015

No, only the following
size_t dimension(i)
int dimension_int(i)

@rppawlo
Copy link
Contributor Author

rppawlo commented Aug 10, 2015

I will second Eric Cyr's request. Just add the one "int" return type.

Just to be clear, the dimension(i) is not guaranteed to return size_t.
It returns a typedef traits::size_type in the view. This trait is
specified by the node type. It is almost always typedef'd to size_t,
but I think the cuda node(?) returns a different value (at least one
node type did when I traced through this a few months ago).

Roger

On 08/10/2015 10:25 AM, hcedwar wrote:

No, only the following
size_t dimension(i)
int dimension_int(i)


Reply to this email directly or view it on GitHub
#28 (comment).

@mhoemmen
Copy link
Contributor

It would make sense for the function to raise an error on overflow, at least in a debug build. The documentation should also point out that this function is a compromise and show readers what they should do instead.

@rppawlo
Copy link
Contributor Author

rppawlo commented Jan 7, 2016

I think we came to the conclusion in this issue that kokkos would hard code the int return type for dimension to the view:

int dimension_int(i)

Did this ever get added? Any time frame for this?

@hcedwar
Copy link
Contributor

hcedwar commented Jan 9, 2016

The Kokkos::Experimental::View now has 'int extent_int(i)'. The name 'extent' was used instead of 'dimension' for compatibility with 'std::extent' vocabulary, and the multidimensional array view proposal going forward in the ISO C++ committee.

@hcedwar hcedwar added Enhancement Improve existing capability; will potentially require voting InDevelop labels Jan 9, 2016
@rppawlo
Copy link
Contributor Author

rppawlo commented Jan 11, 2016

Thanks Carter! A few more questions:

  1. Since this is only in the experimental view, what's the time frame for making the experimental view the default?
  2. Will there be corresponding functions for extent_int_0(), extent_int_1(), ... as well?

@hcedwar
Copy link
Contributor

hcedwar commented Jan 11, 2016

  1. ASAP - had targeted pre-holiday, but the best laid plans...
  2. No need, given 'constexpr' declaration then extent_int(0) is just as optimized as extent_int_0(), as long as the calling argument is a literal value or constexpr value and not a variable.

@hcedwar
Copy link
Contributor

hcedwar commented Mar 17, 2016

The new view has this in master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Improve existing capability; will potentially require voting
Projects
None yet
Development

No branches or pull requests

7 participants