Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DependenceAnalysis] Check memref dependencies on each loop nest #5287

Merged
merged 3 commits into from
Jun 10, 2023

Conversation

matth2k
Copy link
Contributor

@matth2k matth2k commented May 26, 2023

This my attempt at fixing #5284 and #4814.

@matth2k matth2k requested a review from mikeurbach May 26, 2023 22:57
Copy link
Contributor

@mikeurbach mikeurbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some quick questions, but this general approach makes sense to me. Do you mind adding a test case that exhibits the problem and passes with this patch?

lib/Analysis/DependenceAnalysis.cpp Outdated Show resolved Hide resolved
lib/Analysis/DependenceAnalysis.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@mikeurbach mikeurbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few more minor nits but this looks good to me after those are addressed and/or comments added.

lib/Analysis/DependenceAnalysis.cpp Outdated Show resolved Hide resolved
lib/Analysis/DependenceAnalysis.cpp Outdated Show resolved Hide resolved
lib/Analysis/DependenceAnalysis.cpp Outdated Show resolved Hide resolved
lib/Analysis/DependenceAnalysis.cpp Outdated Show resolved Hide resolved
@matth2k
Copy link
Contributor Author

matth2k commented May 31, 2023

The test I added is a copy loop. We see that the store and load on %arg0 is flagged with a dependence, which checks out. But how about the lack of dependence on Ops that use %alloc? It's not a loop carried dependence, but a dependence nonetheless.

  func.func @test9(%arg0: memref<?xi32>, %arg1: memref<?xi32>, %arg2: memref<?xi32>) {
    %alloc = memref.alloc() : memref<256xi32>
    %c1_i32 = arith.constant 1 : i32
    affine.store %c1_i32, %arg0[1] {dependences = [[]]} : memref<?xi32>
    affine.for %arg3 = 0 to 256 {
      %0 = affine.load %arg0[%arg3] {dependences = [[]]} : memref<?xi32>
      affine.store %0, %alloc[%arg3] {dependences = []} : memref<256xi32>
    }
    affine.for %arg3 = 0 to 128 {
      %0 = affine.load %alloc[%arg3] {dependences = []} : memref<256xi32>
      %1 = affine.load %alloc[%arg3 + 128] {dependences = []} : memref<256xi32>
      affine.store %0, %arg1[%arg3] {dependences = []} : memref<?xi32>
      affine.store %1, %arg2[%arg3 + 128] {dependences = []} : memref<?xi32>
    }
    return
  }

@mikeurbach
Copy link
Contributor

But how about the lack of dependence on Ops that use %alloc?

Is this because now it's only calling checkMemrefAccessDependence for the memory ops within a single loop? I guess maybe it needs to check across loops, but at the proper depth? Maybe we should revisit how this utility is used upstream and double check we are doing the right thing. For example, some of the Affine analyses use it: https://github.com/llvm/llvm-project/blob/0442d08fdb173d89b0779d32eb929957a344f5e6/mlir/lib/Dialect/Affine/Utils/LoopFusionUtils.cpp#L186

@matth2k
Copy link
Contributor Author

matth2k commented May 31, 2023

I think I get something closer to the correct solution when I in addition collect all the Ops across loop nests together and checkMemrefDependence at depth of 0. Does that intuition sound correct?

This adds the missing dependence on %alloc, but it also results in duplicate distance vector entries. Here is an example of the duplicates:

func.func @test2(%arg0: memref<?xi32>, %arg1: memref<?xi32>) {
    affine.for %arg2 = 0 to 10 {
      %0 = affine.load %arg0[%arg2] {dependences = [[[-3, -3]]]} : memref<?xi32>
      affine.if #set(%arg2) {
        %1 = affine.load %arg0[%arg2 - 3] {dependences = [[[3, 3]], [[3, 3]]]} : memref<?xi32>
        %2 = arith.addi %0, %1 : i32
        affine.store %2, %arg1[%arg2 - 3] {dependences = []} : memref<?xi32>
      }
    }
    return
  }

Calling checkMemrefDependence at depth 0 now adds a dependence on the first load with distance -3, where before it was empty brackets []. As for the middle load, it now has a duplicate distance vector in it.

I don't get why the code can find the distance of 3, if its checking a depth of 0. So my naive attempt at just checking across loops did not work completely.

maybe it needs to check across loops, but at the proper depth?

Is the proper depth for a src and dst in different nestings ever greater than 0? Maybe the checkMemrefDependence helper function needs to change how it iterates through the operation passed to it (to stop duplicate dependencies).

@matth2k
Copy link
Contributor Author

matth2k commented May 31, 2023

What if I revert all the changes to MemoryDependenceAnalysis(Operation *op) and instead add a line to check the depth in checkMemrefDependence():

      // Look for inter-iteration dependences on the same memory location.
      MemRefAccess src(source);
      MemRefAccess dst(destination);
      FlatAffineValueConstraints dependenceConstraints;
      SmallVector<DependenceComponent, 2> depComps;

      // matth2k: Request depth might not be valid (i.e. they only share a partial loop nesting together)
      if (depth >
          getInnermostCommonLoopDepth(SmallVector({source, destination})))
        continue;

      DependenceResult result = checkMemrefAccessDependence(
          src, dst, depth, &dependenceConstraints, &depComps, true);

      results[destination].emplace_back(source, result.value, depComps);

Doing this gives the following result:

func.func @test9(%arg0: memref<?xi32>, %arg1: memref<?xi32>, %arg2: memref<?xi32>) {
    %alloc = memref.alloc() : memref<256xi32>
    %c1_i32 = arith.constant 1 : i32
    affine.store %c1_i32, %arg0[1] {dependences = []} : memref<?xi32>
    affine.for %arg3 = 0 to 256 {
      %0 = affine.load %arg0[%arg3] {dependences = []} : memref<?xi32>
      affine.store %0, %alloc[%arg3] {dependences = []} : memref<256xi32>
    }
    affine.for %arg3 = 0 to 128 {
      %0 = affine.load %alloc[%arg3] {dependences = []} : memref<256xi32>
      %1 = affine.load %alloc[%arg3 + 128] {dependences = []} : memref<256xi32>
      affine.store %0, %arg1[%arg3] {dependences = []} : memref<?xi32>
      affine.store %1, %arg2[%arg3 + 128] {dependences = []} : memref<?xi32>
    }
    return
  }

It also passes all the previous tests.

@mikeurbach
Copy link
Contributor

What if I revert all the changes to MemoryDependenceAnalysis(Operation *op) and instead add a line to check the depth in checkMemrefDependence()

I think this sounds reasonable as well. I've been out on vacation and I still need to take a look at how the upstream tools are expected to work, but I think this makes sense.

@matth2k
Copy link
Contributor Author

matth2k commented Jun 9, 2023

Alright, I made those changes. If you think they make sense, I can merge them in.

Copy link
Contributor

@mikeurbach mikeurbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, I like this solution. Thanks for digging into the tools and understanding them enough to propose this.

@matth2k matth2k merged commit dbc3ede into llvm:main Jun 10, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants