OpenCL error: Irreducible ControlFlow Detected #2986

jchodera · 2021-01-18T00:39:31Z

Any idea what might cause an error like this (on the Folding@home version, core22 0.0.14)?

Failed to create OpenCL context:
Error compiling kernel: "C:\Users\Owner\AppData\Local\Temp\OCL5264T24.cl", line 21: warning: OpenCL
          extension is now part of core
  #pragma OPENCL EXTENSION cl_khr_fp64 : enable
                           ^

Error:E010:Irreducible ControlFlow Detected

The configuration is:

************************************ System ************************************
        CPU: Intel(R) Pentium(R) CPU G840 @ 2.80GHz
     CPU ID: GenuineIntel Family 6 Model 42 Stepping 7
       CPUs: 2
     Memory: 15.98GiB
Free Memory: 11.53GiB
    Threads: WINDOWS_THREADS
 OS Version: 6.2
Has Battery: false
 On Battery: false
 UTC Offset: -5
        PID: 5264
        CWD: C:\ProgramData\FAHClient\work
************************************ OpenMM ************************************
   Revision: 189320d0
********************************************************************************
  -- 0 --
  PROFILE = FULL_PROFILE
  VERSION = OpenCL 2.1 AMD-APP (3188.4)
  NAME = AMD Accelerated Parallel Processing
  VENDOR = Advanced Micro Devices, Inc.

(1) device(s) found on platform 0:
  -- 0 --
  DEVICE_NAME = Capeverde
  DEVICE_VENDOR = Advanced Micro Devices, Inc.
  DEVICE_VERSION = OpenCL 1.2 AMD-APP (3188.4)
  DRIVER_VERSION = 3188.4

cc: https://foldingforum.org/viewtopic.php?p=348173#p348173

peastman · 2021-01-18T00:51:18Z

The part about the extension is just a warning. You can ignore it.
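As a rough illustration of that distinction, the sketch below splits a build log like the one above into warnings and errors. The `warning:` marker and the `Error:E...:` prefix are inferred from this single log, so treating them as a stable format is an assumption, and `split_build_log` is a hypothetical helper rather than anything in OpenMM.

```python
import re

# Patterns inferred from the one AMD build log quoted in this thread;
# other drivers may format their messages differently (assumption).
ERROR_RE = re.compile(r"^Error:(E\d+):(.*)$")
WARNING_RE = re.compile(r"line (\d+): warning:")


def split_build_log(log):
    """Return (warning_line_numbers, errors) found in an OpenCL build log."""
    warnings, errors = [], []
    for line in log.splitlines():
        line = line.strip()
        err = ERROR_RE.match(line)
        if err:
            errors.append((err.group(1), err.group(2).strip()))
            continue
        warn = WARNING_RE.search(line)
        if warn:
            warnings.append(int(warn.group(1)))
    return warnings, errors


build_log = r'''"C:\Users\Owner\AppData\Local\Temp\OCL5264T24.cl", line 21: warning: OpenCL
          extension is now part of core
  #pragma OPENCL EXTENSION cl_khr_fp64 : enable
                           ^

Error:E010:Irreducible ControlFlow Detected'''

warnings, errors = split_build_log(build_log)
print(warnings)
print(errors)
```

Only the entry in `errors` is what stops the build; the warning on line 21 is reported in the same log but is harmless.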

Regarding the error, this looks relevant: https://community.khronos.org/t/errorirreducible-controlflow-detected/1986. Is that all we have in the log? Unfortunately, it doesn't give any indication about what kernel is causing the problem.

jchodera · 2021-01-18T01:45:06Z

It's likely one of the Custom forces since the next WU did not have any of those and ran successfully.

Would we need to try to build and run a system with one Force at a time in order to debug? Is there a way to step through compiling kernel by kernel?
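One way to picture the one-Force-at-a-time approach is the sketch below. It is deliberately independent of OpenMM: `context_builds` is a hypothetical callable that, in a real test, would build a System containing only the named Force and try to create an OpenCL Context for it. Which Force fails in the demo is an arbitrary choice, not a finding.

```python
def find_failing_forces(force_names, context_builds):
    """Try each force in isolation and return the names that fail to build."""
    failing = []
    for name in force_names:
        try:
            ok = context_builds(name)
        except Exception:
            # A kernel compilation error would surface as an exception.
            ok = False
        if not ok:
            failing.append(name)
    return failing


# Stand-in for the real check. The failing force is picked arbitrarily
# for the demonstration; it says nothing about the actual bug.
def fake_context_builds(name):
    if name == "CustomBondForce":
        raise RuntimeError("Error:E010:Irreducible ControlFlow Detected")
    return True


forces = ["HarmonicBondForce", "CustomBondForce", "NonbondedForce"]
print(find_failing_forces(forces, fake_context_builds))
```

Testing forces individually would miss a failure that only appears when two forces are combined, so a clean result from this kind of scan would not be conclusive.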

peastman · 2021-01-18T04:09:54Z

If we can add debugging code to the core, we could just have it print out the source of each kernel before compiling it.

bb30994 · 2021-01-18T15:20:39Z

We've seen the "Irreducible ControlFlow Detected" message before, though not frequently enough to identify a pattern. Is there a reasonable way to add in-line diagnostic information to that particular error? That just isn't enough information to answer John's question.

bdenhollander · 2021-01-21T01:42:33Z

> Regarding the error, this looks relevant: https://community.khronos.org/t/errorirreducible-controlflow-detected/1986. Is that all we have in the log? Unfortunately, it doesn't give any indication about what kernel is causing the problem.

I searched all the .cl and .cc files for loops like the one described in that thread and found one instance where the loop variable is initialized outside the for statement. It should be obvious to the compiler, but it is stylistically inconsistent with the rest of the code base.

openmm/platforms/common/src/kernels/verlet.cc

Lines 57 to 58 in fce2608

    
    int index = GLOBAL_ID;
    for (; index < numAtoms; index += GLOBAL_SIZE) {

I don't know what the while loop below does, but it looks suspicious since tbx is never changed inside it.

openmm/platforms/common/src/kernels/gbsaObc.cc

Lines 206 to 220 in fce2608

    
    // Skip over tiles that have exclusions, since they were already processed.
    SYNC_WARPS;
    while (skipTiles[tbx+TILE_SIZE-1] < pos) {
        SYNC_WARPS;
        if (skipBase+tgx < NUM_TILES_WITH_EXCLUSIONS) {
            int2 tile = exclusionTiles[skipBase+tgx];
            skipTiles[LOCAL_ID] = tile.x + tile.y*NUM_BLOCKS - tile.y*(tile.y+1)/2;
        }
        else
            skipTiles[LOCAL_ID] = end;
        skipBase += TILE_SIZE;
        currentSkipIndex = tbx;
        SYNC_WARPS;
    }

customGBValueN2.cc and nonbonded.cl have similar loops. The CPU version looks more likely to break out of the loop.

openmm/platforms/common/src/kernels/gbsaObc_cpu.cc

Lines 212 to 221 in fce2608

    
    // Skip over tiles that have exclusions, since they were already processed.
    while (nextToSkip < pos) {
        if (currentSkipIndex < NUM_TILES_WITH_EXCLUSIONS) {
            int2 tile = exclusionTiles[currentSkipIndex++];
            nextToSkip = tile.x + tile.y*NUM_BLOCKS - tile.y*(tile.y+1)/2;
        }
        else
            nextToSkip = end;
    }

peastman · 2021-01-21T20:16:53Z

That loop is scanning through the exclusionTiles array to find a particular index. The exit condition isn't based on tbx changing. It's based on the values of the latest data that got loaded into skipTiles.
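To make the exit condition concrete, here is a Python transliteration of the simpler CPU loop quoted earlier (from gbsaObc_cpu.cc). Only the loop logic and the tile index formula come from the quoted source. The values of `NUM_BLOCKS`, `pos`, and `end`, the tile lists, and the starting value of -1 are invented for the example, and it assumes `pos` never exceeds `end`.

```python
NUM_BLOCKS = 4  # made-up value for the example


def tile_index(x, y):
    """Linear index of tile (x, y), using the formula from the quoted kernel."""
    return x + y * NUM_BLOCKS - y * (y + 1) // 2


def advance_skip(pos, end, exclusion_tiles, current_skip_index, next_to_skip):
    """Scan forward until next_to_skip >= pos, as in the CPU kernel's loop."""
    while next_to_skip < pos:
        if current_skip_index < len(exclusion_tiles):
            x, y = exclusion_tiles[current_skip_index]
            current_skip_index += 1
            next_to_skip = tile_index(x, y)
        else:
            # No exclusion tiles left: 'end' is a sentinel that stops the loop,
            # provided pos <= end (assumed here).
            next_to_skip = end
    return current_skip_index, next_to_skip


# Loop stops as soon as a loaded tile index reaches pos.
print(advance_skip(5, 10, [(0, 0), (1, 1), (2, 2), (3, 3)], 0, -1))
# Loop stops via the sentinel once the tile list is exhausted.
print(advance_skip(5, 10, [(0, 0)], 0, -1))
```

In both calls the loop ends because of the value just loaded, not because a counter in the loop condition changed. The GPU version applies the same idea to the data staged in `skipTiles`.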

Did the system with the error involve implicit solvent? What integrator did it use? That will tell us whether the above code could be related.

jchodera · 2021-01-22T01:50:41Z

This was FAH project 13438, for the COVID Moonshot, which involves a hybrid alchemical system with a good number of Custom*Force terms and NonbondedForce perturbation groups.

I've attached serialized XML files of the RUN that failed if that is of interest.

PROJ13438-RUN12681.zip

peastman · 2021-01-22T18:41:40Z

It doesn't have a GBSAOBCForce, and it uses a CustomIntegrator instead of a VerletIntegrator. So none of the loops mentioned above is involved.

peastman · 2021-01-23T22:40:18Z

I can't reproduce this on an AMD Navi GPU. The following script runs without problems.

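from simtk.openmm import *

# Load the serialized system, integrator, and state from the failing RUN.
system = XmlSerializer.deserialize(open('system.xml').read())
integrator = XmlSerializer.deserialize(open('integrator.xml').read())
state = XmlSerializer.deserialize(open('state.xml').read())
# Creating the Context on the OpenCL platform is the step that failed in the report.
context = Context(system, integrator, Platform.getPlatformByName('OpenCL'))
context.setState(state)
# Evaluate forces once to exercise the force kernels.
context.getState(getForces=True)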

It's a different GPU of course, and also a different OS (Ubuntu 20.04). Cape Verde is a pretty old GPU, released in 2012.

jchodera · 2021-01-23T23:01:01Z

Thanks for trying this out!

Is there any instrumentation we can add to the core to bring back more information?

Failing that, we will keep trying to find someone experiencing this issue.

It's unclear to me whether Cape Verde refers to the first release in 2012 or the architecture, which has been in production for many years.

jchodera · 2021-01-23T23:03:17Z

Hm, I might be misreading the info about which GPUs featured Cape Verde:
https://www.techpowerup.com/gpu-specs/amd-cape-verde.g100

peastman · 2021-01-23T23:40:11Z

Cape Verde was a specific GPU. It was based on the GCN 1.0 architecture.

If you can find someone who is experiencing the problem, we definitely could create an instrumented core that would provide more information.

weisspe · 2021-01-25T17:01:42Z

I have a Cape Verde card and am currently experiencing this issue on Windows.

I wasn't getting it last week. I think my last GPU work unit was completed Friday night, and nothing has changed since then as far as I know. I have both Windows updates and my graphics driver updates configured to notify me of available updates but not install anything automatically, so I feel confident about that.

I'd be happy to run a modified version of FAH to gather more information about this issue. Given that this started without any software changes, it is possible this issue is related to the work units being issued by the server, or to something else that could change and 'fix' itself on its own, so we'll have to cross our fingers that it continues long enough to test.

peastman · 2021-01-25T17:42:07Z

Great! @jchodera are you set up for building cores? Here are the lines where it compiles kernels:

openmm/platforms/opencl/src/OpenCLContext.cpp

Lines 616 to 622 in 9008050

    
    cl::Program::Sources sources({src.str()});
    cl::Program program(context, sources);
    try {
        program.build(vector<cl::Device>(1, device), options.c_str());
    } catch (cl::Error err) {
        throw OpenMMException("Error compiling kernel: "+program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(device));
    }

Immediately before those lines, add the line

cout<<src.str()<<endl;

That will make it print the source for each kernel to the console (which I believe gets redirected to one of the logs?) just before attempting to compile it. Then we can see which kernel it last attempted to compile.
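The idea can be sketched independently of the C++ code. In the Python mock-up below, `fake_compile` stands in for `program.build` and is entirely hypothetical, as are the kernel strings. The point is only that writing each source to the log before compiling it means the last source in the log is the one that failed.

```python
import io


def build_all(sources, compile_kernel, log):
    """Log each kernel source, then compile it."""
    for src in sources:
        log.write(src + "\n")  # plays the role of `cout<<src.str()<<endl;`
        compile_kernel(src)    # may raise, like program.build


def fake_compile(src):
    # Pretend the second kernel trips the compiler (arbitrary choice for the demo).
    if "kernel_b" in src:
        raise RuntimeError("Error:E010:Irreducible ControlFlow Detected")


log = io.StringIO()
kernels = [
    "__kernel void kernel_a() {}",
    "__kernel void kernel_b() {}",
    "__kernel void kernel_c() {}",
]
try:
    build_all(kernels, fake_compile, log)
except RuntimeError:
    pass

# The last logged source is the one whose compilation failed.
last_logged = log.getvalue().strip().splitlines()[-1]
print(last_logged)
```

Logging before the compile call, rather than after, is what makes this work: a source that crashes the compiler is already in the log by the time the failure happens.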

weisspe · 2021-01-25T19:37:54Z

I haven't done any sort of development for the project, so I'm not sure if I'm set up for building cores. Based on the log messages, it seems like the Folding@home client may be doing the building for me. If you can point me to the common location for the core code, I'd be happy to modify it and see what I get in my logs.

weisspe · 2021-01-25T19:40:13Z

I just realized that I have the file path from the logs, so that solves that. However, that's a temp directory and the file no longer exists for me. It seems like the Folding@home client is downloading the kernel code and cleaning it up rather quickly, so I'm not sure of the best way to jump in and intervene. Any suggestions?
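One possible workaround, offered only as a sketch, is to poll the temp directory and copy any file matching the `OCL*.cl` pattern seen in the log before it is deleted. `snapshot_cl_files` is a hypothetical helper, and whether polling is fast enough to catch these short-lived files is an open assumption.

```python
import shutil
import time
from pathlib import Path


def snapshot_cl_files(temp_dir, dest_dir, seconds=1.0, interval=0.05):
    """Poll temp_dir and copy any OCL*.cl file into dest_dir before it disappears."""
    temp_dir, dest_dir = Path(temp_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = set()
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        for path in temp_dir.glob("OCL*.cl"):
            if path.name not in copied:
                try:
                    shutil.copy2(path, dest_dir / path.name)
                    copied.add(path.name)
                except OSError:
                    pass  # file vanished or was locked between listing and copying
        time.sleep(interval)
    return sorted(copied)


# Example call (path taken from the log in the first post; adjust as needed):
# snapshot_cl_files(r"C:\Users\Owner\AppData\Local\Temp", "captured_kernels", seconds=60)
```

This would need to be started just before the work unit begins, and a file that exists for less than one polling interval can still be missed.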

jchodera · 2021-03-01T18:42:56Z

Apologies we haven't been able to make progress on this yet. We're still working on automating core22 builds with @dotsdl but hope to have something soon we can use to help debug this.

jchodera · 2021-03-01T18:44:48Z

This seems to be specific to Custom*Forces, since it's only appearing with my COVID Moonshot alchemical free energy calculations.

This issue may be related?

gunnarre · 2021-05-02T22:31:13Z

I am still seeing some of these errors on the Radeon 7770 HD under Windows 10 on project 13446.

22:20:16:WU00:FS00:0x22:*************************** Core22 Folding@home Core ***************************
22:20:16:WU00:FS00:0x22:       Core: Core22
22:20:16:WU00:FS00:0x22:       Type: 0x22
22:20:16:WU00:FS00:0x22:    Version: 0.0.13
(....)
22:20:16:WU00:FS00:0x22:************************************ OpenMM ************************************
22:20:16:WU00:FS00:0x22:   Revision: 189320d0
22:20:16:WU00:FS00:0x22:********************************************************************************
22:20:16:WU00:FS00:0x22:Project: 13446 (Run 6351, Clone 17, Gen 0)
22:20:16:WU00:FS00:0x22:Unit: 0x00000000000000000000000000000000
22:20:16:WU00:FS00:0x22:Reading tar file core.xml
22:20:16:WU00:FS00:0x22:Reading tar file integrator.xml.bz2
22:20:16:WU00:FS00:0x22:Reading tar file state.xml.bz2
22:20:16:WU00:FS00:0x22:Reading tar file system.xml.bz2
22:20:16:WU00:FS00:0x22:Digital signatures verified
22:20:16:WU00:FS00:0x22:Folding@home GPU Core22 Folding@home Core
22:20:16:WU00:FS00:0x22:Version 0.0.13
22:20:17:WU00:FS00:0x22:  Checkpoint write interval: 50000 steps (5%) [20 total]
22:20:17:WU00:FS00:0x22:  JSON viewer frame write interval: 10000 steps (1%) [100 total]
22:20:17:WU00:FS00:0x22:  XTC frame write interval: 250000 steps (25%) [4 total]
22:20:17:WU00:FS00:0x22:  Global context and integrator variables write interval: 25000 steps (2.5%) [40 total]
22:20:17:WU00:FS00:0x22:There are 3 platforms available.
22:20:17:WU00:FS00:0x22:Platform 0: Reference
22:20:17:WU00:FS00:0x22:Platform 1: CPU
22:20:17:WU00:FS00:0x22:Platform 2: OpenCL
22:20:17:WU00:FS00:0x22:  opencl-device 0 specified
22:20:34:WU00:FS00:0x22:Attempting to create OpenCL context:
22:20:34:WU00:FS00:0x22:  Configuring platform OpenCL
22:20:42:WU00:FS00:0x22:Failed to create OpenCL context:
22:20:42:WU00:FS00:0x22:Error compiling kernel: "C:\Users\admin\AppData\Local\Temp\OCL6916T24.cl", line 21: warning: OpenCL
22:20:42:WU00:FS00:0x22:          extension is now part of core
22:20:42:WU00:FS00:0x22:  #pragma OPENCL EXTENSION cl_khr_fp64 : enable
22:20:42:WU00:FS00:0x22:                           ^
22:20:42:WU00:FS00:0x22:
22:20:42:WU00:FS00:0x22:Error:E010:Irreducible ControlFlow Detected
22:20:42:WU00:FS00:0x22:ERROR:125: Failed to create a GPU-enabled OpenMM Context.
22:20:42:WU00:FS00:0x22:Saving result file ..\logfile_01.txt
22:20:42:WU00:FS00:0x22:Saving result file science.log
22:20:42:WU00:FS00:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
22:20:42:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
22:20:42:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:13446 run:6351 clone:17 gen:0 core:0x22 unit:0x000000110000000000003486000018cf
22:20:42:WU00:FS00:Uploading 2.82KiB to 54.157.202.86
22:20:42:WU00:FS00:Connecting to 54.157.202.86:8080
22:20:43:WU00:FS00:Upload complete
22:20:43:WU00:FS00:Server responded WORK_ACK (400)
22:20:43:WU00:FS00:Cleaning up
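For triage, a report like this one can be reduced to the work unit identity plus the compiler error. The sketch below does that for the log format shown above. The regular expressions are inferred from this excerpt only, and `summarize_failure` is a hypothetical helper, not part of the Folding@home client.

```python
import re

# Both patterns are guesses based on the log excerpt in this thread.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
COMPILER_ERROR_RE = re.compile(r"Error:(E\d+):(.+)$")


def summarize_failure(log_text):
    """Return project/run/clone/gen and the compiler error found in a core22 log."""
    summary = {}
    for line in log_text.splitlines():
        m = PROJECT_RE.search(line)
        if m:
            keys = ("project", "run", "clone", "gen")
            summary.update(zip(keys, map(int, m.groups())))
        m = COMPILER_ERROR_RE.search(line)
        if m:
            summary["error_code"] = m.group(1)
            summary["error_message"] = m.group(2).strip()
    return summary


excerpt = """\
22:20:16:WU00:FS00:0x22:Project: 13446 (Run 6351, Clone 17, Gen 0)
22:20:42:WU00:FS00:0x22:Error:E010:Irreducible ControlFlow Detected
22:20:42:WU00:FS00:0x22:ERROR:125: Failed to create a GPU-enabled OpenMM Context.
"""
summary = summarize_failure(excerpt)
print(summary)
```

The match is case-sensitive on purpose, so the core's own `ERROR:125:` line is not mistaken for the compiler's `Error:E010:` line.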

jchodera · 2021-05-31T18:00:37Z

@peastman Did we ever figure out where this is coming from? I'm still seeing a ton of this on Folding@home.

peastman · 2021-06-01T01:42:03Z

Not that I know of. I gave some suggestions above on how we could begin tracking it down.

bdenhollander · 2023-10-28T18:42:29Z

Double precision FP was an extension to OpenCL 1.0 and 1.1. It became an optional part of OpenCL 1.2, but the extension was kept for backwards compatibility. Alternatively, clGetDeviceInfo can be used to check that CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE is greater than 0 to confirm a device supports double precision FP. Listing cl_khr_fp64 in CL_DEVICE_EXTENSIONS is still required in OpenCL 3.0 (pg. 77), so it will continue to be valid as a check for double precision.
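The two checks can be written down as a small sketch. It takes the values that clGetDeviceInfo would return as plain arguments, so it runs without an OpenCL installation. `double_precision_checks` is a hypothetical helper, and the sample extension strings are invented.

```python
def double_precision_checks(extensions, native_vector_width_double):
    """Evaluate both double-precision checks described above.

    extensions: the space-separated CL_DEVICE_EXTENSIONS string.
    native_vector_width_double: the CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE value.
    Returns (has_extension, has_native_width).
    """
    # Split into tokens so that only an exact extension name matches.
    has_extension = "cl_khr_fp64" in extensions.split()
    has_native_width = native_vector_width_double > 0
    return has_extension, has_native_width


# Invented sample values, for illustration only.
print(double_precision_checks("cl_khr_fp64 cl_khr_icd", 1))
print(double_precision_checks("cl_khr_icd", 0))
```

The function returns both results separately rather than combining them, since how to treat a device where the two disagree is a policy decision this sketch does not make.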

An overzealous driver was probably to blame for throwing a warning when cl_khr_fp64 was explicitly enabled on OpenCL 1.2+.

openmm/platforms/opencl/src/OpenCLContext.cpp

Lines 606 to 607 in 76520ce

    
    if (supportsDoublePrecision)
        src << "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n";

Wrapping this pragma inside an OpenCL version check may avoid having the issue reappear.
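A version check of that kind might look like the sketch below, written in Python rather than the C++ of OpenCLContext.cpp so that it can run on its own. It assumes the check uses the device version string (as in the `DEVICE_VERSION` line of the original report). `parse_opencl_version` and `fp64_pragma` are hypothetical names, and the "OpenCL 1.1 Example" string is invented.

```python
import re


def parse_opencl_version(version_string):
    """Extract (major, minor) from a string such as 'OpenCL 1.2 AMD-APP (3188.4)'."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("unrecognized OpenCL version string: %r" % version_string)
    return int(match.group(1)), int(match.group(2))


def fp64_pragma(version_string, supports_double_precision):
    """Return the pragma line only for devices older than OpenCL 1.2."""
    if not supports_double_precision:
        return ""
    if parse_opencl_version(version_string) >= (1, 2):
        return ""  # from 1.2 on the pragma is not needed
    return "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n"


print(repr(fp64_pragma("OpenCL 1.2 AMD-APP (3188.4)", True)))
print(repr(fp64_pragma("OpenCL 1.1 Example", True)))
```

Comparing `(major, minor)` tuples avoids the pitfalls of comparing version strings or floats directly.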

peastman · 2023-10-30T15:47:05Z

A PR removing the pragma would be welcome! There's no need for a version check. We don't support versions earlier than 1.2 anymore.

jchodera added the bug label May 31, 2021

bdenhollander added a commit to bdenhollander/openmm that referenced this issue Oct 30, 2023

Remove OpenCL cl_khr_fp64 pragma no longer required in OpenCL 1.2

063f21d

Closes openmm#2986

bdenhollander mentioned this issue Oct 30, 2023

Remove OpenCL cl_khr_fp64 pragma no longer required in OpenCL 1.2 #4289

Merged

peastman changed the title ~~OpenCL error: extension is now part of core~~ OpenCL error: Irreducible ControlFlow Detected Oct 30, 2023

peastman closed this as completed in 335820c Oct 31, 2023
