Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Porting to POWER and remove assumption of X86 default #103

Closed
nmhamster opened this issue Oct 7, 2015 · 21 comments
Closed

Porting to POWER and remove assumption of X86 default #103

nmhamster opened this issue Oct 7, 2015 · 21 comments
Assignees
Labels
Enhancement Improve existing capability; will potentially require voting
Milestone

Comments

@nmhamster
Copy link
Contributor

Building unit tests for Kokkos on POWER yields error (see below). Seems default in Kokkos is to use X86 assembler which shouldn't be allowed?

g++ -I./ -I../../core/src -I../../containers/src -I../../algorithms/src --std=c++11 -I/home/projects/power8/gtest/20151006/src -I../../core/unit_test -I/home/projects/power8/gtest/20151006/include -I/home/projects/power8/gtest/20151006/src/src -O3 -c TestSynchronic.cpp
/tmp/ccJLQMjA.s: Assembler messages:
/tmp/ccJLQMjA.s:4451: Error: unrecognized opcode: rep' /tmp/ccJLQMjA.s:5024: Error: unrecognized opcode:rep'
/tmp/ccJLQMjA.s:5097: Error: unrecognized opcode: rep' /tmp/ccJLQMjA.s:5106: Error: unrecognized opcode:rep'
/tmp/ccJLQMjA.s:5144: Error: unrecognized opcode: rep' /tmp/ccJLQMjA.s:5697: Error: unrecognized opcode:rep'

@crtrott
Copy link
Member

crtrott commented Oct 7, 2015

Yeah I noticed that. This is experimental code to prototype synchronization mechanism for C++14 or 17 or something and not used in any real part of Kokkos. Should either be fixed or simply disabled for non-x86 builds.

@nmhamster
Copy link
Contributor Author

I'd rather have the low-level primitives abstracted out (possibly with a default slow mechanism) and then using assembly on X86. Better in my view to have the functionality across platforms (even if slow) than have only some features on some platforms.

@crtrott
Copy link
Member

crtrott commented Oct 7, 2015

What I meant is that this functionality is never used except in that unit test. So for now we can simply disable it for Power and ARM, and then later fix the code with correct stuff when we actually want to use it. I believe this was more of a side project. But Carter can comment more.

@hcedwar hcedwar changed the title Default Unit Test Build of Synchronic uses X86 Assembler (doesn't work on POWER or ARM) Porting to POWER and remove assumption of X86 default Oct 7, 2015
@hcedwar hcedwar added Feature Request Create new capability; will potentially require voting Enhancement Improve existing capability; will potentially require voting and removed bug - experimental feature Feature Request Create new capability; will potentially require voting labels Oct 7, 2015
@hcedwar
Copy link
Contributor

hcedwar commented Oct 7, 2015

Much bigger undertaking than synchronic. Need a systematic analysis/porting effort.

@crtrott
Copy link
Member

crtrott commented Oct 7, 2015

Btw: the fact that the unit tests caught this while all the miniApps and LAMMPS actually work and report back correct results (at least most of the times) confirms that our design intend of making the unit tests very susceptible to race conditions actually worked 👍

@jgfouca
Copy link
Contributor

jgfouca commented Oct 7, 2015

OK, I will fix.

@jgfouca
Copy link
Contributor

jgfouca commented Oct 7, 2015

Obvious question, anyone know of a POWER machine that I could jump onto to confirm the fix?

@nmhamster
Copy link
Contributor Author

white.sandia.gov

@crtrott
Copy link
Member

crtrott commented Oct 7, 2015

White. Its a testbed you can request from webcars.

On 10/07/2015 01:21 PM, James Foucar wrote:

Obvious question, anyone know of a POWER machine that I could jump onto to confirm the fix?


Reply to this email directly or view it on GitHub:
#103 (comment)

@jgfouca
Copy link
Contributor

jgfouca commented Oct 7, 2015

Ty, I should have known that...

@jgfouca
Copy link
Contributor

jgfouca commented Oct 7, 2015

I found these in the preprocessor list:

#define __powerpc64__ 1
#define __powerpc__ 1
#define _ARCH_PPCGR 1
#define __PPC64__ 1
#define _ARCH_PPC 1
#define __PPC__ 1
#define _ARCH_PPC64 1

Any preference on which one to check?

@jgfouca
Copy link
Contributor

jgfouca commented Oct 7, 2015

Sorry, I finally searched the right thing in google.

#if defined(__powerpc__) || defined(__ppc__) || defined(__PPC__)

@jgfouca
Copy link
Contributor

jgfouca commented Oct 7, 2015

OK, fix pushed to develop.

@jgfouca jgfouca closed this as completed Oct 7, 2015
@jgfouca jgfouca reopened this Oct 7, 2015
@jgfouca
Copy link
Contributor

jgfouca commented Oct 7, 2015

Carter pointed-out that this issue goes beyond just disabling this unit test. What he'd like to see is all non-synchronic unit tests passing on POWER before this can be closed. With that in mind, should we assign someone else to own to task?

@nmhamster
Copy link
Contributor Author

We are looking at this. It's going to need some work on POWER atomics which I've been prototyping before we can make progress.

S

Si Hammond
Scalable Computer Architectures
Sandia National Laboratories, NM
[Sent remotely, please excuse typing errors]


From: James Foucar notifications@github.com
Sent: Wednesday, October 7, 2015 3:57:01 PM
To: kokkos/kokkos
Cc: Hammond, Simon David (-EXP)
Subject: [EXTERNAL] Re: [kokkos] Porting to POWER and remove assumption of X86 default (#103)

Carter pointed-out that this issue goes beyond just disabling this unit test. What he'd like to see is all non-synchronic unit tests passing on POWER before this can be closed. With that in mind, should we assign someone else to own to task?

Reply to this email directly or view it on GitHubhttps://github.com//issues/103#issuecomment-146344600.

@jgfouca jgfouca assigned nmhamster and unassigned jgfouca Oct 7, 2015
@crtrott
Copy link
Member

crtrott commented Nov 23, 2015

I think we need to do this stepwise:
(i) Fix the OpenMP backend now in Issue #139
(ii) Fix the Threads backend now in Issue #140
(iii) Fix atomics. now in Issue #141

@crtrott
Copy link
Member

crtrott commented Dec 11, 2015

The OpenMP backend did now pass the unit tests.

@nmhamster
Copy link
Contributor Author

I am still seeing failures in the TeamReduces on POWER8 Pthreads but think I have a fix for that, testing now.

@crtrott
Copy link
Member

crtrott commented Dec 17, 2015

Yeah I didn't touch the Threads backend yet. I actually would be surprised if its just the TeamReduce which is failing, or did someone else add stuff to Pthreads? In the OpenMP backend I had to fix the fan_in, fan_out on the team level.

@nmhamster
Copy link
Contributor Author

As a minimum the TeamReduction in develop is failing about 1in 4 or so tests. The others are still probably error prone but we don't see because of so few threads. I can see lots more memory_fence() calls have been added to the code which is probably helping but, I'm trying to work out how to switch these into store_fence and load_fence where possible. There is a branch to optimize the fences and a separate branch to fix dispatch issues on weakly ordered memory systems (P8, ARM ..).

@hcedwar hcedwar added this to the Spring 2016 milestone Mar 17, 2016
@hcedwar
Copy link
Contributor

hcedwar commented Apr 1, 2016

Additional Power8 + CUDA porting issue: trilinos/Trilinos#258

@hcedwar hcedwar closed this as completed May 4, 2016
@hcedwar hcedwar reopened this May 4, 2016
@hcedwar hcedwar closed this as completed May 4, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Improve existing capability; will potentially require voting
Projects
None yet
Development

No branches or pull requests

4 participants