
Gunrock races reported in PLDI 2018 paper #344

Open
jowens opened this issue Jun 26, 2018 · 6 comments

@jowens
Contributor

jowens commented Jun 26, 2018

"CURD: a dynamic CUDA race detector", https://dl.acm.org/citation.cfm?id=3192368

Yuanfeng Peng (@stilltracy) reported the following two races (the relevant line numbers are marked in comments in the snippets below):

On ../../gunrock/util/soa_tuple.cuh, line 136:

    template <typename TupleSlice>
    __host__ __device__ __forceinline__ void Set(
        const TupleSlice &tuple,
        const int row,
        const int col)
    {   
        t0[row][col] = tuple.t0; // line 136
        t1[row][col] = tuple.t1;
    }

and ../../gunrock/util/scan/warp_scan.cuh, line 159:

    template <typename T, typename WarpscanT, typename ReductionOp>
    static __device__ __forceinline__ T Invoke(
        T current_partial,                          // Input partial
        T &total_reduction,                         // Total reduction (out param)
        WarpscanT warpscan[][NUM_ELEMENTS],     // Smem for warpscanning.  Contains at least two segments of size NUM_ELEMENTS (the first being initialized to identity)
        ReductionOp scan_op,                        // Scan operator
        int warpscan_tid = threadIdx.x)             // Thread's local index into a segment of NUM_ELEMENTS items
    {   
        const int WIDTH = 1 << STEPS;
        T inclusive_partial = Iterate<1, WIDTH>::Invoke( // line 159
            current_partial,
            warpscan,
            scan_op,
            warpscan_tid);

@sgpyc can you take a look? @yzhwang it would be helpful if you looked as well.

@jowens
Contributor Author

jowens commented Jun 27, 2018

New info from @stilltracy!

Hi John,

I re-ran CURD on bfs and re-collected the race reports, and it seems the line numbers I sent you previously were outdated and didn't correspond to the right source locations (since CURD instruments the source files), sorry!

Below are the up-to-date reports from CURD:

intra-block(SMem) (W,R) race by 0 and 1 on < 260, 4>, between (../../gunrock/util/soa_tuple.cuh,line 136) and (../../gunrock/util/soa_tuple.cuh,line 154)

    template <typename TupleSlice>
    __host__ __device__ __forceinline__ void Set(
        const TupleSlice &tuple,
        const int row,
        const int col)
    {
        t0[row][col] = tuple.t0;     // 136
        t1[row][col] = tuple.t1;
    }

    template <typename TupleSlice>
    __host__ __device__ __forceinline__ void Get(
        TupleSlice &retval,
        const int row,
        const int col) const
    {
        retval = TupleSlice(t0[row][col], t1[row][col]);     // 154
    }

The race seems to be on accesses to t0[row][col], by threads n and n+1.
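
For illustration, here is a minimal standalone sketch of the flagged pattern (hypothetical kernel and array names, not the Gunrock code): lane t writes its partial into the two shared SOA arrays via Set, then reads lane t-1's slot via Get. Without a warp-level barrier between the write and the read, lane t's read of the neighbor's slot races with lane t-1's write of that same slot, which matches a conflict between threads n and n+1 on t0[row][col].

    // Minimal sketch of the Set/Get neighbor-access pattern (hypothetical
    // names; int payloads stand in for the tuple's t0/t1 slices).
    __global__ void soa_neighbor_read(int *out)
    {
        __shared__ int smem_t0[32];   // stand-in for the t0 slice
        __shared__ int smem_t1[32];   // stand-in for the t1 slice

        int lane = threadIdx.x;
        int t0 = lane, t1 = 2 * lane;

        // "Set": store this lane's partial (cf. soa_tuple.cuh:136)
        smem_t0[lane] = t0;
        smem_t1[lane] = t1;

        // Without this barrier, the reads below race with the neighbor's
        // writes above on GPUs with independent thread scheduling (Volta+).
        __syncwarp();

        if (lane >= 1) {
            // "Get": read the neighbor's partial (cf. soa_tuple.cuh:154)
            t0 += smem_t0[lane - 1];
            t1 += smem_t1[lane - 1];
        }

        out[lane] = t0 + t1;
    }

    int main()
    {
        int *d_out;
        cudaMalloc(&d_out, 32 * sizeof(int));
        soa_neighbor_read<<<1, 32>>>(d_out);
        cudaDeviceSynchronize();
        cudaFree(d_out);
        return 0;
    }

The original code appears to rely on implicit warp-synchronous execution plus volatile shared memory rather than an explicit barrier, which is the kind of pattern a dynamic detector will flag even when the lockstep assumption happens to hold on a given GPU.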

Another one is inside ../../gunrock/util/scan/soa/warp_soa_scan.cuh:

    template <int OFFSET_LEFT, int WIDTH>
    struct Iterate
    {
        // Scan
        template <
            typename Tuple,
            typename WarpscanSoa,
            typename ReductionOp>
        static __device__ __forceinline__ Tuple Scan(
            Tuple partial,
            WarpscanSoa warpscan_partials,
            ReductionOp scan_op,
            int warpscan_tid)
        {
            // Store exclusive partial
            warpscan_partials.Set(partial, 1, warpscan_tid);

            if (!WarpscanSoa::VOLATILE) __threadfence_block();

            if (ReductionOp::IDENTITY_STRIDES || (warpscan_tid >= OFFSET_LEFT)) {

                // Load current partial
                Tuple current_partial;
                warpscan_partials.Get(current_partial, 1, warpscan_tid - OFFSET_LEFT);     // 73, Read

                // Compute inclusive partial from exclusive and current partials
                partial = scan_op(current_partial, partial);                               // 76, Write
            }

This one is tricky, and I'm not 100% sure about it. I'll give my two cents here: line 73 reads from current_partial and the scan_op() on line 76 writes to it.
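
One reading (a sketch, not a verified diagnosis): the conflicting pair is more likely lane t's Get() of slot [warpscan_tid - OFFSET_LEFT] and the neighboring lane's Set() of that same slot; __threadfence_block() only orders a thread's own memory accesses and does not synchronize execution across lanes. Below is a minimal sketch of one scan step with a hedged fix (hypothetical names, int payload instead of a tuple; __syncwarp() requires CUDA 9+):

    // One step of a warp-level inclusive (Hillis-Steele) scan, using warp
    // barriers instead of relying on lockstep execution.
    #include <cstdio>

    template <int OFFSET_LEFT>
    __device__ int scan_step(int partial, volatile int *warpscan, int lane)
    {
        // Store this lane's partial (analogous to warpscan_partials.Set(...))
        warpscan[lane] = partial;

        // A fence alone is not enough; synchronize the warp before reading
        // a slot written by another lane.
        __syncwarp();

        if (lane >= OFFSET_LEFT) {
            // Load the neighbor's partial (analogous to warpscan_partials.Get(...))
            int current = warpscan[lane - OFFSET_LEFT];
            partial = current + partial;   // stand-in for scan_op(current, partial)
        }

        // Barrier again so the next step's Set() doesn't overwrite a slot
        // that a lagging lane is still reading.
        __syncwarp();
        return partial;
    }

    __global__ void scan_kernel(int *out)
    {
        __shared__ int warpscan[32];
        int lane = threadIdx.x;
        int partial = 1;                  // each lane contributes 1
        partial = scan_step<1>(partial, warpscan, lane);
        partial = scan_step<2>(partial, warpscan, lane);
        partial = scan_step<4>(partial, warpscan, lane);
        partial = scan_step<8>(partial, warpscan, lane);
        partial = scan_step<16>(partial, warpscan, lane);
        out[lane] = partial;              // inclusive scan: out[lane] == lane + 1
    }

    int main()
    {
        int h_out[32], *d_out;
        cudaMalloc(&d_out, sizeof(h_out));
        scan_kernel<<<1, 32>>>(d_out);
        cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
        printf("out[31] = %d (expect 32)\n", h_out[31]);
        cudaFree(d_out);
        return 0;
    }

Whether the original code needs this on pre-Volta GPUs depends on whether the warp really executes in lockstep there; the barrier version is the conservative choice.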

@sgpyc
Member

sgpyc commented Jun 28, 2018

The code pointed out is for warp/block-level scan and iteration; it is quite deep, and is used by the TWC_advance and CULL_filter operators. The data races are very difficult to detect. If @stilltracy has the conditions (dataset + command-line parameters) or any other info that can narrow down when the race could happen, it will save us a lot of time.

By the way, do the data races from CUB occur in similar operations?

@jowens
Contributor Author

jowens commented Jun 28, 2018

@stilltracy these are theoretical races, right? You didn't find them in practice; this is through program analysis?

@neoblizz
Member

Would this also be useful on the new dev-refactor branch with refactored code?

@stilltracy

@jowens these are actual races that were caught by CURD on a real execution.
@sgpyc I ran test/bfs with rgg and observed the reported races. Unfortunately, CURD currently can't give more detailed information about the call sites, such as call stacks.
Yeah, some of the data races in CUB are also on warp/block-level scans and similar operations.

@jowens
Contributor Author

jowens commented Jun 29, 2018

@sgpyc I don't know this low-level code, but is it possible to look at the code itself and figure out where the race might be, just from first principles?
