Skip to content

Conversation

@abhinavgaba
Copy link
Contributor

@abhinavgaba abhinavgaba commented Oct 28, 2025

For cases like:

  map(alloc: x) map(to: x)

If the entry of map(to: x) is encountered after the entry for map(alloc:x), we still want to do a data-transfer even though the ref-count of x was already 0, because the new allocation for x happened as part of the current directive.

Similarly, for:

  ... map(alloc: x) map(from: x)

If the entry for map(from:x) is encountered before the entry for map(alloc:x), we want to do a data-transfer even though the ref-count was not 0 when looking at the from entry, because by the end of the directive, the ref-count of x will go down to zero.

For:

  ... map(from : x) map(alloc, present: x)

If the "present" entry is encountered after the "from" entry, then it becomes a no-op, as the "from" entry will do an allocation if no match was found.

And for the following:

  ... map(delete: x) map(from: x)
  ... map(always, from : x) map(always, from: x)

We need to make sure that only one data transfer for from happens, and it happens even when delete is encountered before from.

In this PR, these are handled by the runtime via the following:

  • For to and present, we also look-up in the existing table where we tracked new allocations when making the decision for the entry.
  • For from, we keep track of any deferred data transfers and when the ref-count of a pointer goes to zero, see if there were any previously deferred from transfers for that pointer.
  • For handling multiple from and delete + from cases, we keep track of any successful delete entries and from transfers.

This can be done in the compiler, and that would avoid any runtime overhead, but it would require creating two separate offload struct entries for the entry and exit mappings (even for the target construct), with properly decayed maps, and either:
(1) Sorted in the following order:

  • present > to/tofrom > ... for the implied target enter data; and
  • from/tofrom > ... for the target exit data e.g.
  #pragma omp target map(to: x) map(present, alloc: x) map(always, from: x)
  // has to be broken into:
  // from becomes alloc on entry:
  //  #pragma omp target enter data map(present, alloc: x)
  //                                map(to: x)
  //                                map(alloc: x)
  //
  //  "present" and "to" just "decay" into "alloc"
  //  #pragma omp target exit data map(always, from: x)
  //                               map(alloc: x)
  //                               map(alloc: x)

Or,
(2) Merged into one entry each on the target enter/exit data directives.

  #pragma omp target map(to: x) map(present, alloc: x) map(always, from: x)
  // has to be broken into:
  // from becomes alloc on entry:
  //  #pragma omp target enter data map(present, to: x)
  //
  //  "present" and "to" just "decay" into "alloc"
  //  #pragma omp target exit data map(always, from: x)

The number of entries on the two would need to stay the same on the two to avoid ref-count mismatch.

(1) would be simpler, but won't work for cases like:

  ... map(delete: x) map(from:x)

as there is no clear "winner" between the two. So, for such cases, the compiler would likely have to do (2), which is the cleanest solution, but will take longer to implement. For EXPR comparisons, it can build-upon the AttachPtrExprComparator that was implemented as part of #153683, but that should probably wait for the PR to be merged to avoid conflicts.

Another alternative is to sort the entries in the runtime, which may be slower than on-demand lookups/updates that this PR does, because we always would be doing this sorting even when not needed, but may be faster in others where the constant-time overhead of map/set insertions/lookups becomes too large because of the number of maps. But that will still have to worry about the from + delete case.

…espective of order.

For cases like:
```c
  map(alloc: x) map(to: x)
```
If the entry of `map(to: x)` is encountered after the entry for
`map(alloc:x)`, we still want to do a data-transfer even though the
ref-count of `x` was already 0, because the new allocation for `x`
happened as part of the current directive.

Similarly, for:
```c
  ... map(alloc: x) map(from: x)
```

If the entry for `map(from:x)` is encountered before the entry for
`map(alloc:x)`, we want to do a data-transfer even though the
ref-count was not 0 when looking at the `from` entry, because by the end of
the directive, the ref-count of `x` will go down to zero.

And for:
```c
  ... map(from : x) map(alloc, present: x)
```

If the "present" entry is encountered after the "from" entry, then it becomes
a no-op, as the "from" entry will do an allocation if no match was found.

In this PR, these are handled by the runtime via the following:
* For `to` and `present`, we also look-up in the existing table where we tracked
new allocations when making the decision for the entry.
* For `from`, we keep track of any deferred data transfers and when the
ref-count of a pointer goes to zero, see if there were any previously
deferred `from` transfers for that pointer.

This can be done in the compiler, and that would avoid any runtime
overhead, but it would require creating two separate offload struct entries
for the entry and exit mappings (even for the `target` construct),
with properly decayed maps, and either:
(1) sorted in order of:
* `present > to > ...` for the implied `target enter data`; and
* `from > ...` for the `target exit data`
e.g.

```c
  #pragma omp target map(to: x) map(present, alloc: x) map(always, from: x)
  // has to be broken into:
  // from becomes alloc on entry:
  //  #pragma omp target enter data map(present, alloc: x)
  //                                map(to: x)
  //                                map(alloc: x)
  //
  //  "present" and "to" just "decay" into "alloc"
  //  #pragma omp target exit data map(always, from: x)
  //                               map(alloc: x)
  //                               map(alloc: x)
```
Or,
(2) Merged into one entry each on the `target enter/exit data`
directives.

```c
  #pragma omp target map(to: x) map(present, alloc: x) map(always, from: x)
  // has to be broken into:
  // from becomes alloc on entry:
  //  #pragma omp target enter data map(present, to: x)
  //
  //  "present" and "to" just "decay" into "alloc"
  //  #pragma omp target exit data map(always, from: x)
```

The number of entries on the two would need to stay the same on the two to avoid
ref-count mismatch.

(1) would be simpler, but won't likely work for cases like:
```c
  ... map(delete: x) map(from:x)
```
as there is no clear "winner" between the two. So, for such cases, the compiler
would likely have to do (2), which is the cleanest solution, but will take
longer to implement. For EXPR comparisons, it can build-upon the
`AttachPtrExprComparator` that was implemented as part of llvm#153683,
but that should probably wait for the PR to be merged to avoid
conflicts.

Another alternative is to sort the entries in the runtime, which may be
slower than on-demand lookups/updates that this PR does, because we
always would be doing this sorting even when not needed, but may be
faster in others where the constant-time overhead of map/set
insertions/lookups becomes too large because of the number of maps. But
that will still have to worry about the `from` + `delete` case.
@abhinavgaba abhinavgaba changed the title [OpenMP][Offload] Handle for non-memberof present/to/from entries irrespective of order. [OpenMP][Offload] Handle non-memberof present/to/from entries irrespective of order. Oct 29, 2025
@mjklemm mjklemm self-requested a review October 29, 2025 08:37
@ro-i ro-i self-requested a review October 29, 2025 11:17
@ro-i
Copy link
Contributor

ro-i commented Oct 30, 2025

Another alternative is to sort the entries in the runtime, which may be slower than on-demand lookups/updates that this PR does, because we always would be doing this sorting even when not needed, but may be faster in others where the constant-time overhead of map/set insertions/lookups becomes too large because of the number of maps. But that will still have to worry about the from + delete case.

I thought that the re-ordering may be an equivalent but simpler approach. But I think that it may not be. The issue for target entry, for example:

  • My solution just pushed all "from" mappers to the end of the list and left the other mappers as they are. This solved the conflicts between separate to- and from-mappers, but we would still have conflicts between alloc- and to-mappers as shown by your tests.
  • Trying to pull all "to" mappers to the front of the list, and leave the other mappers as they are, would be even worse.
    While this would solve the conflicts between separate to- and from-mappers, it would potentially even add new conflicts between alloc- and to-mappers. When we have e.g. map(alloc: a[0:4]) map(to: a[1]), it is imperative that the larger map is performed first. So, we cannot just pull to-mappers to the start of the list without further analysis.

It's questionable if the re-ordering approach is still simpler if we need to check for overlapping memory.

@abhinavgaba
Copy link
Contributor Author

abhinavgaba commented Oct 30, 2025

We don't have to worry about cases like map(alloc: a[0:4]) map(to: a[1]) since OpenMP guarantees that the two list-items should be identical in this case.

OpenMP 6.0 page 286:

18 • Two list items of the map clauses on the same construct must not share original storage
19   unless one of the following is true: they are the same list item, one is the containing structure
20   of the other, at least one is an assumed-size array, or at least one is implicitly mapped due to
21   the list item also appearing in a use_device_addr clause.

However, the following is allowed. @dreachem might be able to confirm:

 int *p = ...;
 #pragma omp target map(to : p[0:4]) map(present, from: p[0:4]) map(delete, storage: p[:])

Which can be handled by re-ordering for the enter_data part similar to (1) , but not the exit_data part:

// (3.1)
#pragma omp target_enter_data map(present, from: p[0:4]) // "from" decays into "storage"
                              map(to: p[0:4]) // No transfer because "present"
                              map(delete, storage: p[:])) // No-op

// ...
// (3.2)
#pragma omp target_exit_data map(delete, storage: p[:]))
                             map(present, from: p[0:4])
                             map(to: p[0:4]) // "to" decays into "storage"

There is no clear way to order "delete" vs "from" if they are on different maps.

This can still be handled by the compiler like the following (or in the runtime by maintaining state, like this PR), but this case makes it more complicated for the compiler since the expressions are not identical.

// Possible compiler codegen
// (4.1)
#pragma omp target_enter_data map(present, to: p[0:4]) // use list-item from "to"
                                                       // and add "present".

...
// (4.2)
#pragma omp target_exit_data map(present, delete, from: p[0:4])) // Use list-item from "from"
                                                                 // and add "present", "delete"

Another thing to think of, but maybe from the spec's perspective, is that should the modifiers from a from/to clause be completely ignored on enter/exit_data?

For the following:

int x;

int main(void) {
  #pragma omp target map(delete, to: x) map(present, from: x)
  ;
}

@dreachem, @mjklemm, is this expected to run OK? If we decay from into storage, and then apply it to enter_data, then it would seem to imply that it's equivalent to doing map(present, to: x), in which case, this will cause a runtime-error-termination.

// Applying "present" to "enter_data" would cause runtime-error
// (5.1)
#pragma omp target_enter_data(present, to: x)
...
// (5.2)
#pragma omp target_exit_data(present, delete, from: x)

But the user may be expecting to only do a present-check on exit_data here, which would means we need to ignore any modifiers from a from/to clause that's decayed into storage on target_enter/exit_data.

// Ignoring modifiers from the decayed-to-storage clause.
// (6.1)
#pragma omp target_enter_data(to: x) // delete ignored for enter_data
...
// (6.2)
#pragma omp target_exit_data(present, from: x)

But if we do so, then delete gets ignored on target_exit_data, since it came from a decayed to. Maybe we only special-case present and not apply it to decayed-to-storage maps.

@ro-i
Copy link
Contributor

ro-i commented Oct 30, 2025

We don't have to worry about cases like map(alloc: a[0:4]) map(to: a[1]) since OpenMP guarantees that the two list-items should be identical in this case.

Hm, is this also true if the alloc map exists due to an implicit mapping?

@dreachem
Copy link
Contributor

@abhinavgaba

(1) would be simpler, but won't work for cases like:

  ... map(delete: x) map(from:x)

as there is no clear "winner" between the two. So, for such cases, the compiler would likely have to do (2), which is the cleanest solution, but will take longer to implement. For EXPR comparisons, it can build-upon the AttachPtrExprComparator that was implemented as part of #153683, but that should probably wait for the PR to be merged to avoid conflicts.

Why wouldn't from just be sorted before delete in that case? The behavior should be "Do a target_update from(x) if ref count equals 1, and then delete it".

@dreachem
Copy link
Contributor

@abhinavgaba

However, the following is allowed. @dreachem might be able to confirm:

 int *p = ...;
 #pragma omp target map(to : p[0:4]) map(present, from: p[0:4]) map(delete, storage: p[:])

Which can be handled by re-ordering for the enter_data part similar to (1) , but not the exit_data part:

// (3.1)
#pragma omp target_enter_data map(present, from: p[0:4]) // "from" decays into "storage"
                              map(to: p[0:4]) // No transfer because "present"
                              map(delete, storage: p[:])) // No-op

// ...
// (3.2)
#pragma omp target_exit_data map(delete, storage: p[:]))
                             map(present, from: p[0:4])
                             map(to: p[0:4]) // "to" decays into "storage"

There is no clear way to order "delete" vs "from" if they are on different maps.

The "delete" should just be handled at the end, after everything else. I'm not too familiar with the LLVM runtime implementation, so it's not clear to me what the technical difficulty is here. Is there a reason the algorithm in the map clause section (based on "mappable storage blocks") can't be used? It does require grouping individual maps into sets that apply to a common storage block. These are the pertinent sentences:

All map clause list items that share storage or have the same containing structure or containing array result in a single mappable storage block that contains the storage of the list items, unless otherwise specified. The storage for each other map clause list item becomes a distinct mappable storage block. If a list item is a referencing variable that has a containing structure, the behavior is as if only the storage for its referring pointer is part of that structure. In general, if a list item is a referencing variable then the storage for its referring pointer and its referenced pointee occupy distinct mappable storage blocks.

Two list items of the map clauses on the same construct must not share original storage unless one of the following is true: they are the same list item, one is the containing structure of the other, at least one is an assumed-size array, or at least one is implicitly mapped due to the list item also appearing in a use_device_addr clause.

If multiple list items are explicitly mapped on the same construct and have the same containing array or have base pointers that share original storage, and if any of the list items do not have corresponding list items that are present in the device data environment prior to a task encountering the construct, then the list items must refer to the same array elements of either the containing array or the implicit array of the base pointers.

In general, the runtime would need to be involved in some of the matching because we allow overlapping storage for assumed-size arrays and use_device_addr. Other than these cases, the compiler should in theory be able to group the maps into sets where each set applies to the same storage block.

@dreachem
Copy link
Contributor

@abhinavgaba

Another thing to think of, but maybe from the spec's perspective, is that should the modifiers from a from/to clause be completely ignored on enter/exit_data?

For the following:

int x;

int main(void) {
  #pragma omp target map(delete, to: x) map(present, from: x)
  ;
}

@dreachem, @mjklemm, is this expected to run OK? If we decay from into storage, and then apply it to enter_data, then it would seem to imply that it's equivalent to doing map(present, to: x), in which case, this will cause a runtime-error-termination.

// Applying "present" to "enter_data" would cause runtime-error
// (5.1)
#pragma omp target_enter_data(present, to: x)
...
// (5.2)
#pragma omp target_exit_data(present, delete, from: x)

But the user may be expecting to only do a present-check on exit_data here, which would means we need to ignore any modifiers from a from/to clause that's decayed into storage on target_enter/exit_data.

// Ignoring modifiers from the decayed-to-storage clause.
// (6.1)
#pragma omp target_enter_data(to: x) // delete ignored for enter_data
...
// (6.2)
#pragma omp target_exit_data(present, from: x)

But if we do so, then delete gets ignored on target_exit_data, since it came from a decayed to. Maybe we only special-case present and not apply it to decayed-to-storage maps.

So, you are describing target as if it there is a target_enter_data on entry and a target_exit_data on exit, but this isn't how the specification describes it. That's the description for target_data.

For target, a map(delete, to: x) clause has the effect of map(to: x) on entry and map(delete,storage:x) on exit, and a map(present,from:x) clause has the effect of a map(present,storage:x) on entry and a map(from:x) on exit. So, putting it together, for that target construct you effectively get the equivalent of map(present, to: x) on entry and map(delete,from:x) on exit.

Let's suppose the construct was instead target_data map(delete,to: x) map(present,from: x). In that case, we describe it as equivalent to:

// note that "delete" on target_enter_data is just ignored
#pragma omp target_enter_data map(delete,to: x) map(present,storage: x)
#pragma omp task if(0) // a merged "sharing task"
;
#pragma omp target_exit_data map(delete,storage: x) map(present,from: x)

Here, it looks like we hit a possible unintended discrepancy. When we decided to make target_data composite, I'm not sure we accounted for the behavior of the present modifier possibly changing. Since there are now 2 data-mapping regions each with their own entry and exit, as opposed to a single target_data region with one entry and one exit, there are two points at which the present modifier can have an effect. I don't think this behavior is unreasonable for target_data, but just wanted to call it out.

@mjklemm
Copy link
Contributor

mjklemm commented Oct 31, 2025

Tagging @ro-i for awareness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants