
moldyn-FST.txt
--------------
NOTE: see moldyn-FST.in for the omega calculator commands used to experiment with the example in this file.

Goals
1) How would the composed inspector and executor be generated if we use a library approach where data dependences and data mappings are explicitly generated as hypergraphs?
    Subgoal:
        -get a data and iteration reordering composition working


----------------------
Index Array Generators
----------------------
BFSHyper
    Input:
        hypergraph and dual
    Output:
        Permutation of nodes in hypergraph
        
CPack
    Input:
        hypergraph
    Output:
        Permutation of hyperedges in hypergraph
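
A minimal sketch of what a CPack-style generator could compute, assuming the hypergraph has been flattened into the sequence of data indices in the order the iterations touch them.  The name cpack_sketch and the flat-array input are assumptions made for illustration, not the interface of the real IAG.  New locations are handed out in first-touch order, and data that is never touched is placed at the end.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: first-touch (consecutive packing) ordering.
 * touched[0..n_touch-1] lists data indices in traversal order.
 * On return sigma[old] = new location for 0 <= old < n_data.
 * Returns 0 on success, -1 on allocation failure. */
int cpack_sketch(const int *touched, int n_touch, int n_data, int *sigma)
{
    int next = 0;
    int k;
    unsigned char *seen;

    if (n_data <= 0) return 0;
    seen = calloc((size_t)n_data, 1);
    if (seen == NULL) return -1;

    for (k = 0; k < n_touch; k++) {
        int d = touched[k];
        assert(0 <= d && d < n_data);   /* index array value constraint */
        if (!seen[d]) {
            seen[d] = 1;
            sigma[d] = next++;
        }
    }
    /* Data that was never touched keeps its relative order at the end. */
    for (k = 0; k < n_data; k++) {
        if (!seen[k]) sigma[k] = next++;
    }
    free(seen);
    return 0;
}
```

For the moldyn loop, touched would be inter1[0], inter2[0], inter1[1], inter2[1], and so on.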


---------------------
Transformation Writer
---------------------
    Data reordering RTRT
        Constraints: 
            -1D data space
            -No dependences between index arrays, or uninterpreted function symbols used to represent them.  Is this an issue?  See 1D iteration space below.
            -Perfectly nested loop?  1D loop?
        Possible IAGS:
            list these
        Input: 
            -Data mapping relation
            -Select IAG
        Output:
            -Permutation mapping for data
            
    Iteration reordering RTRT
        Constraints:
            -1D iteration space?  I don't think this is necessary because the inspector can generate hypergraph representing computation no matter what.  Is the problem generating the transformed executor?
        Possible IAGS:
            list these
        Input:
            -Data mapping relation
            -Select IAG
        Output:
            -Permutation mapping for iterations
    
    SparseTiling RTRT
    Tile Packing RTRT
    
-----------------------
What the user specifies
-----------------------
Code
    for (ii=0; ii<n_inter; ii++) {
        // simplified computations
        fx[inter1[ii]] += x[inter1[ii]] - x[inter2[ii]]; 
        fx[inter2[ii]] += x[inter1[ii]] - x[inter2[ii]]; 
    }

Iteration Space
    I_0 := { [ii,1] : 0 <= ii <= (n_inter-1)  }
          union { [ii,2] : 0 <= ii <= (n_inter-1)  }

Data Spaces (probably need a name field for these as well so they can be matched with uninterpreted function symbols.  For now use the same string for the name and the UFS.)
    X_0 := name = "x", { [k] : 0 <= k <= (N-1) }, data array
    FX_0 := name = "fx", { [k] : 0 <= k <= (N-1) }, data array
    INTER1_0 := name = "inter1", { [k] : 0 <= k <= (n_inter-1) }, index array
    INTER2_0 := name = "inter2", { [k] : 0 <= k <= (n_inter-1) }, index array

Index Array Value Constraints
    // if (0 <= ii <= (n_inter-1)) then (0 <= inter1(ii) <= (N-1))
    { [ii] -> [inter1(ii)] : not (0 <= ii <= (n_inter-1)) || (0 <= inter1(ii) <= (N-1)) }
    
    // if (0 <= ii <= (n_inter-1)) then (0 <= inter2(ii) <= (N-1))
    { [ii] -> [inter2(ii)] : not (0 <= ii <= (n_inter-1)) || (0 <= inter2(ii) <= (N-1)) }
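
The constraints above are implications over the index array values.  A small sketch of how an inspector could check one of them at runtime follows.  The function name index_array_in_bounds is hypothetical, and whether the generated inspector should perform such a check at all is an open choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical runtime check of an index array value constraint:
 * returns 1 if 0 <= idx[ii] <= n_data-1 holds for every
 * 0 <= ii <= n_idx-1, and 0 otherwise. */
int index_array_in_bounds(const int *idx, int n_idx, int n_data)
{
    int ii;
    assert(idx != NULL || n_idx == 0);
    for (ii = 0; ii < n_idx; ii++) {
        if (idx[ii] < 0 || idx[ii] >= n_data) return 0;
    }
    return 1;
}
```

For the example this would be called once with (inter1, n_inter, N) and once with (inter2, n_inter, N).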
        

Access Relations
    These actually need to be specified per data access in each statement.
    Need access relations for index arrays as well.
    The ones to index arrays should be derived from the access relations to data that involve uninterpreted function symbols?  (Unless used out of the context of a data array access).
    
    statement   access#     DataSpace   AccessRelation
    [ii,1]      1a           INTER1_0    { [ii,1] -> [ ii ] }
    [ii,1]      1b           FX_0        { [ii,1] -> [ inter1(ii) ] }
    
    [ii,1]      2a           INTER1_0    { [ii,1] -> [ ii ] }
    [ii,1]      2b           X_0         { [ii,1] -> [ inter1(ii) ] }
    
    [ii,1]      3a           INTER2_0    { [ii,1] -> [ ii ] }
    [ii,1]      3b           X_0         { [ii,1] -> [ inter2(ii) ] }
    
    [ii,2]      1a           INTER2_0    { [ii,2] -> [ ii ] }
    [ii,2]      1b           FX_0        { [ii,2] -> [ inter2(ii) ] }
    
    [ii,2]      2a           INTER1_0    { [ii,2] -> [ ii ] }
    [ii,2]      2b           X_0         { [ii,2] -> [ inter1(ii) ] }
    
    [ii,2]      3a           INTER2_0    { [ii,2] -> [ ii ] }
    [ii,2]      3b           X_0         { [ii,2] -> [ inter2(ii) ] }
    
    // summary access relations are the union of all access relations
    // to a particular data space
    A_I0_to_X0 := name = "A_I0_to_X0", { [ii,j] -> [ inter1(ii) ] : 1<=j<=2 } 
                   union { [ii,j] -> [ inter2(ii) ] : 1<=j<=2 }
                   
    A_I0_to_FX0 := name = "A_I0_to_FX0", { [ii,1] -> [ inter1(ii) ] } 
                   union { [ii,2] -> [ inter2(ii) ] }

    A_I0_to_INTER10 := name = "A_I0_to_INTER10", { [ii,1] -> [ ii ] } 
                       union { [ii,2] -> [ ii ] }
                       
    A_I0_to_INTER20 := name = "A_I0_to_INTER20", { [ii,1] -> [ ii ] } 
                       union { [ii,2] -> [ ii ] }


Data Dependences
    The loop carries only reduction dependences.  They still need to be recorded, because they mean each iteration must be executed atomically if the loop is parallelized.

Composition and choice of RTRTs
    data reordering
        DataPermuteRTRT
            data_reordering = { [ k ] -> [ sigma( k ) ] }
            iteration_space = I_0
            data_spaces = [ X_0, FX_0 ]
            access_relation = A_I0_to_X0
            iter_sub_space_relation = { [ ii, j ] -> [ ii ] }
            iag_func_name = CPackHyper
            iag_type = IAG_Permute
    
    Naming standards
        After any transformation, we are going to assume that a space with the subscript n will be transformed into the same named space with the subscript n+1.  For example I_0 will become I_1.
    
    iteration reordering
        IterPermuteRTRT
            iter_reordering = { [ i ] -> [ delta( i ) ] }
            iteration_space = I_1
            access_relation = A_I1_to_X1
            iter_sub_space_relation = { [ ii, j ] -> [ ii ] }
            iag_func_name = LexMin
            iag_type = IAG_Permute

------------------------------------------------------------
Automatically Generating Straight-Forward Inspector/Executor
------------------------------------------------------------

Generating the composed inspector

    1) DataPermuteRTRT
        a) restrict access_relation to iteration subspace
            input:
                access_relation
                iter_sub_space_relation

            output:
                access_relation_to_traverse = new AccessRelation
                    name = access_relation.name
                    iter_space = iter_sub_space_relation (     
                                    [access_relation.iter_space] )
                    data_space = access_relation.data_space
                    iter_to_data =  inverse 
                    ( iter_sub_space_relation compose (inverse access_relation))
                
                
            For the example, output is the following:
                access_relation_to_traverse = new AccessRelation
                    name = "A_I0_to_X0"
                    iter_space = { [ii] : 0 <= ii < n_inter  }
                    data_space = X0                    
                    iter_to_data := { [ii] -> [ inter1(ii) ] } 
                                       union { [ii] -> [ inter2(ii) ] }
                                 
                
        b) generate code that explicitly creates hypergraph representation of access relation at runtime.
            input:
                access_relation_to_traverse (artt)
            output:
                code
                    // data index calc for each disjunct in access relation
                    #define s1(i_1, ..., i_d) data_index = ...; \
                        Hypergraph_ordered_insert_node([artt.name], \
                        count,data_index); \
                        count++;
                        
                    // initialize variables
                    Hypergraph* [artt.name] = Hypergraph_ctor();
                    int count=0;
                    int data_index=0;
                    int t1, ..., ;  // declare index variables omega will use
                    
                    // loop that traverses [artt.iter_space]
                    for ...
                        s1(i_1, ..., i_d);
                        
                    Hypergraph_finalize([artt.name]);
                    
           
            For the example, output is the following:
                    // this macro captures A_I0_to_X0, count, and data_index
                    #define s1(t1) data_index = inter1[t1]; \
                        Hypergraph_ordered_insert_node(A_I0_to_X0,count, \
                        data_index); \
                        data_index = inter2[t1]; \
                        Hypergraph_ordered_insert_node(A_I0_to_X0,count, \
                        data_index); \
                        count++
                        
                    // initialize variables
                    Hypergraph* A_I0_to_X0 = Hypergraph_ctor();
                    int count=0;
                    int data_index=0;
                    int t1;  // index variable used by the generated loop
                    
                    // this specific example
                    for(t1 = 0; t1 <= n_inter-1; t1++) {
                        s1(t1);
                    }

                    Hypergraph_finalize(A_I0_to_X0);
                        
            algorithm:
                1) use omega codegen to generate loop over iteration sub space
                    codegen iter_sub_space;
                    
                2) generate a macro definition for the statement in the loop body from access_relation_to_traverse
                    a) for each conjunct in the DNF // see omega lib docs
                         -gen code that solves for the out variable, data_index, using the input iterator values
                         -gen code that makes the following call:
                            Hypergraph_ordered_insert_node( hgraph, iter_count, data_index )
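
To make the ctor / ordered insert / finalize sequence concrete, here is a minimal stand-in for the hypergraph container.  The type HGSketch and the hg_* functions are hypothetical and are not the library's Hypergraph API.  The sketch assumes "ordered" means that insertions arrive with non-decreasing iteration count, and it has finalize build a CSR-style offset array.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the Hypergraph type used above. */
typedef struct {
    int *iter;    /* iter[k]: iteration count of the k-th insertion   */
    int *data;    /* data[k]: data index of the k-th insertion        */
    int  n, cap;  /* insertions used / allocated                      */
    int *start;   /* after finalize: insertions of iteration c are    */
                  /* data[start[c]] .. data[start[c+1]-1]             */
    int  n_iter;  /* number of iteration counts seen                  */
} HGSketch;

HGSketch *hg_ctor(void)
{
    return calloc(1, sizeof(HGSketch));
}

/* Returns 0 on success, -1 on allocation failure. */
int hg_ordered_insert(HGSketch *h, int iter_count, int data_index)
{
    assert(iter_count >= 0);
    assert(h->n == 0 || h->iter[h->n - 1] <= iter_count);
    if (h->n == h->cap) {
        int cap = h->cap ? 2 * h->cap : 16;
        int *it, *da;
        it = realloc(h->iter, (size_t)cap * sizeof(int));
        if (it == NULL) return -1;
        h->iter = it;
        da = realloc(h->data, (size_t)cap * sizeof(int));
        if (da == NULL) return -1;
        h->data = da;
        h->cap = cap;
    }
    h->iter[h->n] = iter_count;
    h->data[h->n] = data_index;
    h->n++;
    return 0;
}

/* Builds the offset array; returns 0 on success, -1 on failure. */
int hg_finalize(HGSketch *h)
{
    int k, c;
    h->n_iter = h->n ? h->iter[h->n - 1] + 1 : 0;
    h->start = calloc((size_t)h->n_iter + 1, sizeof(int));
    if (h->start == NULL) return -1;
    for (k = 0; k < h->n; k++) h->start[h->iter[k] + 1]++;
    for (c = 0; c < h->n_iter; c++) h->start[c + 1] += h->start[c];
    return 0;
}

void hg_dtor(HGSketch *h)
{
    if (h == NULL) return;
    free(h->iter);
    free(h->data);
    free(h->start);
    free(h);
}
```

Because insertions are ordered by iteration count, data[] is already grouped per iteration and finalize only has to compute the offsets.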

        c) Automatically generate the IndexArray specification for sigma
            input:
                DataPermuteRTRT instance
                access_relation_to_traverse
                
            output:
                iag_spec = new IAG_Permute(
                    String name = rtrt.iag_func_name
                    AccessRelation input = access_relation_to_traverse
                    IndexArray result = new IndexArray(
                        data_space = [name of uninterp func in data_reordering],
                                     [specification from data space in RTRT],
                                     index array,
                        index_value_constraints = { [k] -> [ uninterp(k) ] : 
                            not [k has same bounds as data space] || 
                            [uninterp(k) has same bounds as data space] }   
                    )
                )
                        
            Initial Assumptions:
                -only one uninterpreted function in data reordering
                -that uninterpreted function has a domain with dimensionality 1
                    
        
            For this example:
                IAG_cpack = new IAG_Permute (
                    String name = "CPackHyper"       // function name
                    AccessRelation input = A_I0_to_X0
                    IndexArray result = new IndexArray( 
                        data_space = "sigma", X_0.spec, index array,
                        index_value_constraints = { [k] -> [sigma(k)] : not (0 <= k <= (N-1)) || (0 <= sigma(k) <= (N-1)) }
                    )

                )        
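
The index value constraint only bounds sigma(k).  Assuming IAG_Permute is meant to promise a true permutation (each location used exactly once), a stricter check could look like the sketch below.  The name is_permutation is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical check: returns 1 if p[0..n-1] is a permutation of
 * 0..n-1, and 0 otherwise (also 0 if the scratch array cannot be
 * allocated). */
int is_permutation(const int *p, int n)
{
    int k, ok = 1;
    unsigned char *seen;

    assert(n >= 0);
    if (n == 0) return 1;
    seen = calloc((size_t)n, 1);
    if (seen == NULL) return 0;
    for (k = 0; k < n && ok; k++) {
        if (p[k] < 0 || p[k] >= n || seen[p[k]]) {
            ok = 0;             /* out of bounds or used twice */
        } else {
            seen[p[k]] = 1;
        }
    }
    free(seen);
    return ok;
}
```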
        
        
        d) generate code that passes hypergraph to IAG
            input: 
                iag_spec for sigma
                
            output:
                code
                    [iag_spec.result.data_space.name] = malloc( ... );
                    [iag_spec.name]( [iag_spec.input.name],
                                     [iag_spec.result.data_space.name] );
                    
            For this example:
                    int *sigma;
                    MALLOC(sigma,int,N);
                    CPackHyper( A_I0_to_X0, sigma );
                    
         
        e) generate code that does data reordering
            input:
                iag_spec for sigma
                DataPermuteRTRT.data_spaces to be reordered
                
            output:
                code that reorders the specified data spaces with sigma
                // one call to reorderArray for each data space being reordered
                reorderArray((unsigned char*) [data_space.name], 
                             sizeof([data_space.datatype]), 
                             [data_space.num_elem],
                             [iag_spec.result.data_space.name]);
        
            For this example:
                reorderArray((unsigned char *)x, sizeof(double), NUM_NODES, sigma);
                reorderArray((unsigned char *)fx, sizeof(double), NUM_NODES, sigma);
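
A sketch of what a routine with the reorderArray signature could do.  The assumed semantics are that element k moves to slot perm[k], which matches an executor that reads x[sigma[inter1[ii]]].  The real reorderArray may differ, so the sketch uses the hypothetical name reorder_array_sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reorderArray-like routine.
 * Assumed semantics: the element at old index k ends up at index perm[k].
 * Returns 0 on success, -1 on allocation failure. */
int reorder_array_sketch(unsigned char *base, size_t elem_size,
                         int num_elem, const int *perm)
{
    size_t bytes;
    unsigned char *tmp;
    int k;

    if (num_elem <= 0 || elem_size == 0) return 0;
    assert(base != NULL && perm != NULL);
    bytes = elem_size * (size_t)num_elem;
    tmp = malloc(bytes);
    if (tmp == NULL) return -1;
    memcpy(tmp, base, bytes);           /* keep a copy of the old layout */
    for (k = 0; k < num_elem; k++) {
        memcpy(base + (size_t)perm[k] * elem_size,
               tmp + (size_t)k * elem_size,
               elem_size);
    }
    free(tmp);
    return 0;
}
```

Working on raw bytes with an element size is what lets one routine serve both the double arrays here and the int index arrays reordered later.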

        
        f) generate executor code for data reordering only
        Note: eventually we want to generate the composed inspector/executor only, but for testing we need to be able to test as we go.
        
            input:
                DataPermuteRTRT.data_spaces
                
                
            output:
                // statement macros
                #define s1(t1, ..., ) ...    
        
                ...
                
                // new executor loop
                for ...
        
            algorithm:
                for each statement in computation
                    for each data access
                        if accessing array that was reordered
                            modify array access relation as per DataPermuteRTRT.data_reordering specification
                            
                for each statement in computation
                    generate macro code for statement
                    
                have omega generate code for transformed iteration space
        
        
            output for this example:
                #define e1(ii,g)    if ((g)==1) { \
                    fx[sigma[inter1[ii]]] += x[sigma[inter1[ii]]] \
                             - x[sigma[inter2[ii]]]; \
                } else { \
                    fx[sigma[inter2[ii]]] += x[sigma[inter1[ii]]] \
                             - x[sigma[inter2[ii]]]; \
                }

                ...
            
                // executor - execute the computation with modified arrays
                for(t1 = 0; t1 <= n_inter-1; t1++) {
                    e1(t1,1);
                    e1(t1,2);
                }
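
Since we want to test as we go, here is a sketch of a check that the data-reordered executor computes the same fx as the original loop, up to the permutation sigma.  It uses the simplified computation from the user-specified loop.  All function names are hypothetical, and it assumes data is moved so that old element k lands at sigma[k].

```c
#include <assert.h>
#include <stdlib.h>

/* Original loop from the user-specified code. */
static void run_original(int n_inter, const int *inter1, const int *inter2,
                         const double *x, double *fx)
{
    int ii;
    for (ii = 0; ii < n_inter; ii++) {
        fx[inter1[ii]] += x[inter1[ii]] - x[inter2[ii]];
        fx[inter2[ii]] += x[inter1[ii]] - x[inter2[ii]];
    }
}

/* Executor after data reordering: x1 and fx1 are stored in sigma order. */
static void run_reordered(int n_inter, const int *inter1, const int *inter2,
                          const int *sigma, const double *x1, double *fx1)
{
    int ii;
    for (ii = 0; ii < n_inter; ii++) {
        fx1[sigma[inter1[ii]]] += x1[sigma[inter1[ii]]] - x1[sigma[inter2[ii]]];
        fx1[sigma[inter2[ii]]] += x1[sigma[inter1[ii]]] - x1[sigma[inter2[ii]]];
    }
}

/* Returns 1 if fx1[sigma[k]] == fx0[k] for all k, 0 if not,
 * and -1 on allocation failure. */
int check_data_reordering(int n, int n_inter, const int *inter1,
                          const int *inter2, const int *sigma,
                          const double *x)
{
    int k, ok = 1;
    double *fx0, *x1, *fx1;

    assert(n > 0);
    fx0 = calloc((size_t)n, sizeof(double));
    x1  = malloc((size_t)n * sizeof(double));
    fx1 = calloc((size_t)n, sizeof(double));
    if (fx0 == NULL || x1 == NULL || fx1 == NULL) {
        free(fx0); free(x1); free(fx1);
        return -1;
    }
    for (k = 0; k < n; k++) x1[sigma[k]] = x[k];   /* assumed data move */
    run_original(n_inter, inter1, inter2, x, fx0);
    run_reordered(n_inter, inter1, inter2, sigma, x1, fx1);
    for (k = 0; k < n; k++) {
        if (fx0[k] != fx1[sigma[k]]) ok = 0;
    }
    free(fx0); free(x1); free(fx1);
    return ok;
}
```

An exact comparison is fine here because a data reordering alone does not change the order in which the reductions are applied.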
        
    2) Modify iteration space and access functions based on previous transformation.
    
        I1 := I0    // iteration space was not modified
        X1 := X0    // same size
        FX1 := FX0    // same size
        A_I1_to_X1 = data_reordering(A_I0_to_X0) 
                   = { [ii,j] -> [ sigma(inter1(ii)) ] : 1<=j<=2 } 
                     union { [ii,j] -> [ sigma(inter2(ii)) ] : 1<=j<=2 }
                     
        A_I1_to_FX1 = data_reordering(A_I0_to_FX0)
                    = { [ii,1] -> [ sigma(inter1(ii)) ] } 
                      union { [ii,2] -> [ sigma(inter2(ii)) ] }

    3) IterPermuteRTRT
        a) restrict access_relation to iteration subspace (same as 1a above)

            For the example, output is the following:
                access_relation_to_traverse = new AccessRelation
                    name = "A_I1_to_X1"
                    iter_space = { [ii] : 0 <= ii < n_inter  }
                    data_space = X1                    
                    iter_to_data := { [ii] -> [ sigma(inter1(ii)) ] } 
                                       union { [ii] -> [ sigma(inter2(ii)) ] }
                                 
                
        b) generate code that explicitly creates hypergraph representation of access relation at runtime (same as 1b above)
        
           
            For the example, output is the following:
                    // this macro captures A_I1_to_X1, 
                    // count, sigma, inter1, inter2, and data_index
                    #define s2(t1) data_index = sigma[inter1[t1]]; \
                        Hypergraph_ordered_insert_node(A_I1_to_X1,count, \
                        data_index); \
                        data_index = sigma[inter2[t1]]; \
                        Hypergraph_ordered_insert_node(A_I1_to_X1,count, \
                        data_index); \
                        count++
                        
                    // initialize variables
                    Hypergraph* A_I1_to_X1 = Hypergraph_ctor();
                    int count=0;
                    int data_index=0;
                    
                    // this specific example
                    for(t1 = 0; t1 <= n_inter-1; t1++) {
                        s2(t1);
                    }

                    Hypergraph_finalize(A_I1_to_X1);
                        

        c) Automatically generate the IndexArray specification for delta.
        Should this algorithm be a method associated with IterPermuteRTRT?
            input:
                IterPermuteRTRT instance
                access_relation_to_traverse
                
            output: (almost the same as 1c, just replace data space with iteration space)
                iag_spec = new IAG_Permute(
                    String name = rtrt.iag_func_name
                    AccessRelation input = access_relation_to_traverse
                    IndexArray result = new IndexArray(
                        data_space = [name of uninterp func in iter_reordering],
                                     [rtrt.iter_space],
                                     index array,
                        index_value_constraints = { [k] -> [ uninterp(k) ] : 
                            not [k has same bounds as rtrt.iter_space] || 
                            [uninterp(k) has same bounds as rtrt.iter_space] }   
                    )
                )
                        
            Initial Assumptions:
                -only one uninterpreted function in iteration reordering
                -that uninterpreted function has a domain with dimensionality 1
                    
        
            For this example:
                IAG_lexmin = new IAG_Permute (
                    String name = "IAG_lexmin"       // function name
                    AccessRelation input = A_I1_to_X1
                    IndexArray result = new IndexArray( 
                        data_space = "delta", I_1.spec, index array,
                        index_value_constraints = { [k] -> [delta(k)] : not (0 <= k <= (n_inter-1)) || (0 <= delta(k) <= (n_inter-1)) }
                    )

                )        
        
        
        d) generate code that passes hypergraph to IAG (same as 1d)

            For this example:
                    int *delta;
                    MALLOC(delta,int,n_inter);
                    IAG_lexmin( A_I1_to_X1, delta );
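
What the lexmin generator computes is not spelled out in these notes.  One plausible reading, stated here purely as an assumption, is that iterations are sorted lexicographically by the reordered data locations they touch (smaller location first, then larger).  The sketch below implements that reading under the hypothetical name lexmin_sketch, taking the two location arrays directly instead of a hypergraph.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sort key: the two locations an iteration touches. */
typedef struct { int lo, hi, iter; } IterKey;

static int cmp_iter_key(const void *a, const void *b)
{
    const IterKey *p = (const IterKey *)a;
    const IterKey *q = (const IterKey *)b;
    if (p->lo != q->lo) return (p->lo > q->lo) - (p->lo < q->lo);
    if (p->hi != q->hi) return (p->hi > q->hi) - (p->hi < q->hi);
    return (p->iter > q->iter) - (p->iter < q->iter);  /* keep ties stable */
}

/* loc1[i], loc2[i]: data locations touched by iteration i, for example
 * sigma[inter1[i]] and sigma[inter2[i]].
 * On return delta[old iteration] = new position.
 * Returns 0 on success, -1 on allocation failure. */
int lexmin_sketch(int n_inter, const int *loc1, const int *loc2, int *delta)
{
    int i;
    IterKey *keys;

    assert(n_inter >= 0);
    if (n_inter == 0) return 0;
    keys = malloc((size_t)n_inter * sizeof(IterKey));
    if (keys == NULL) return -1;
    for (i = 0; i < n_inter; i++) {
        keys[i].lo = loc1[i] < loc2[i] ? loc1[i] : loc2[i];
        keys[i].hi = loc1[i] < loc2[i] ? loc2[i] : loc1[i];
        keys[i].iter = i;
    }
    qsort(keys, (size_t)n_inter, sizeof(IterKey), cmp_iter_key);
    for (i = 0; i < n_inter; i++) delta[keys[i].iter] = i;
    free(keys);
    return 0;
}
```

The result has the same old-to-new orientation as sigma, so it can be fed to the same reordering routine that moves the index arrays.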
                    
         
        e) generate code that does iteration reordering, which in this case is just a data reordering of the index arrays (same as 1e)
        
            For this example:
                reorderArray((unsigned char *)inter1, sizeof(int), n_inter, delta);        
                reorderArray((unsigned char *)inter2, sizeof(int), n_inter, delta);

        
        f) generate executor code for data reordering followed by iteration reordering
        If we treat this like a data reordering of the index arrays inter1 and inter2, then we can avoid extra indirection?  It really is an iteration reordering though.  So we need to calculate the new iteration space and generate code for it.
        
            I_1 := { [ii,1] : 0 <= ii <= (n_inter-1)  }
                   union { [ii,2] : 0 <= ii <= (n_inter-1)  }
                   
            iter_reordering := { [ i ] -> [ delta( i ) ] }
            
            iter_sub_space_relation := { [ ii, j ] -> [ ii ] }
        
            I_2 = (inverse iter_sub_space_relation) compose iter_reordering(iter_sub_space_relation(I_1)) 
                = { [ii',j] : Exists (ii: ii'=delta(ii) && 0 <= ii <= (n_inter-1)) && 1 <= j <=2 };
        
                Naive code gen for above
                for (ii=0; ii<n_inter; ii++) {
                    ii_new = delta[ii];  // ii' in I_2
                    ...
                }
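
The naive loop still walks the original iterations in their original order and only computes where each one lands.  To actually visit iterations in the new order without moving inter1 and inter2, one option is to invert delta first.  This is a sketch under that assumption, with hypothetical function names.

```c
#include <assert.h>

/* delta maps old iteration -> new position; build the inverse map
 * so that delta_inv[new position] = old iteration. */
void invert_permutation(const int *delta, int n, int *delta_inv)
{
    int i;
    for (i = 0; i < n; i++) {
        assert(0 <= delta[i] && delta[i] < n);
        delta_inv[delta[i]] = i;
    }
}

/* Executor over I_2: positions are visited in increasing order and the
 * original iteration is looked up through delta_inv. */
void run_iter_reordered(int n_inter, const int *delta_inv,
                        const int *inter1, const int *inter2,
                        const double *x, double *fx)
{
    int p;
    for (p = 0; p < n_inter; p++) {
        int ii = delta_inv[p];
        double d = x[inter1[ii]] - x[inter2[ii]];
        fx[inter1[ii]] += d;    /* statement [ii,1] */
        fx[inter2[ii]] += d;    /* statement [ii,2] */
    }
}
```

This costs one extra indirection per iteration.  Permuting inter1 and inter2 by delta, as in step e, removes that indirection and leaves a plain loop over the new positions.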
        
Now what if we extend the iteration space to the following:

Code
    for (i=0; i<n_atom; i++) {
        x[i] = i*5;
    }
    
    for (ii=0; ii<n_inter; ii++) {
        // simplified computations
        fx[inter1[ii]] += x[inter1[ii]] - x[inter2[ii]]; 
        fx[inter2[ii]] += x[inter1[ii]] - x[inter2[ii]]; 
    }

Iteration Space
    I_0 := { [1,i,1] : 0 <= i <= (n_atom-1)  }
           union { [2,ii,1] : 0 <= ii <= (n_inter-1)  }
           union { [2,ii,2] : 0 <= ii <= (n_inter-1)  }

Iteration Reordering

    iter_reordering := { [ i ] -> [ delta( i ) ] }
            
    iter_sub_space_relation := { [ k, ii, j ] -> [ ii ] : k==2 }
    

# Hmmmm, now I am thinking that the iteration reorderings need to be specified as
# transformations on the whole iteration space, even if the access relation used
# to compute the index array only involves a subspace of the iteration space.

    iter_reordering := { [1,i,1] -> [1,i,1] } 
                       union { [2,ii,j] -> [2,delta(ii),j] };
                       
    full_sparse_tiling := { [s1,i,s2] -> [t,s1,i,s2] : t = tile(s1,i) };
    
    full_sparse_tiling(iter_reordering) = { [t,1,i,1] : t = tile(1,i) }
                                          union { [t,2,ii',j] : (Exists ii: ii' = delta(ii) and t = tile(2,ii')) };
                                          