Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RemoveDIs] Update migration doc to explain how to handle textual IR updates #91725

Merged
merged 1 commit into from
Jun 10, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
101 changes: 101 additions & 0 deletions llvm/docs/RemoveDIsDebugInfo.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,107 @@ The second matter is that if you transfer sequences of instructions from one pla

For a more in-depth overview of how to update existing code to support debug records, see [the guide below](#how-to-update-existing-code).

## Textual IR Changes

As we change from using debug intrinsics to debug records, any tools that depend on parsing IR produced by LLVM will need to handle the new format. For the most part, the difference between the printed form of a debug intrinsic call and a debug record is trivial:

1. An extra 2 spaces of indentation are added.
2. The text `(tail|notail|musttail)? call void @llvm.dbg.<type>` is replaced with `#dbg_<type>`.
3. The leading `metadata ` is removed from each argument to the intrinsic.
4. The DILocation changes from being an instruction attachment with the format `!dbg !<Num>`, to being an ordinary argument, i.e. `!<Num>`, that is passed as the final argument to the debug record.

Following these rules, we have this example of a debug intrinsic and the equivalent debug record:

```
; Debug Intrinsic:
call void @llvm.dbg.value(metadata i32 %add, metadata !10, metadata !DIExpression()), !dbg !20
; Debug Record:
#dbg_value(i32 %add, !10, !DIExpression(), !20)
```

### Test updates

Any tests downstream of the main LLVM repo that test the IR output of LLVM may break as a result of the change to using records. Updating an individual test to expect records instead of intrinsics should be trivial, given the update rules above. Updating many tests may be burdensome however; to update the lit tests in the main repository, the following steps were used:

1. Collect the list of failing lit tests into a single file, `failing-tests.txt`, separated by (and ending with) newlines.
2. Use the following line to split the failing tests into tests that use update_test_checks and tests that don't:
```
$ while IFS= read -r f; do grep -q "Assertions have been autogenerated by" "$f" && echo "$f" >> update-checks-tests.txt || echo "$f" >> manual-tests.txt; done < failing-tests.txt
```
3. For the tests that use update_test_checks, run the appropriate update_test_checks script - for the main LLVM repo, this was achieved with:
```
$ xargs ./llvm/utils/update_test_checks.py --opt-binary ./build/bin/opt < update-checks-tests.txt
$ xargs ./llvm/utils/update_cc_test_checks.py --llvm-bin ./build/bin/ < update-checks-tests.txt
```
4. The remaining tests can be manually updated, although if there is a large number of tests then the following scripts may be useful; firstly, a script used to extract the check-line prefixes from a file:
```
$ cat ./get-checks.sh
#!/bin/bash

# Always add CHECK, since it's more effort than it's worth to filter files where
# every RUN line uses other check prefixes.
# Then detect every instance of "check-prefix(es)=..." and add the
# comma-separated arguments as extra checks.
for filename in "$@"
do
echo "$filename,CHECK"
allchecks=$(grep -Eo 'check-prefix(es)?[ =][A-Z0-9_,-]+' $filename | sed -E 's/.+[= ]([A-Z0-9_,-]+).*/\1/g; s/,/\n/g')
for check in $allchecks; do
echo "$filename,$check"
done
done
```
Then a second script to perform the work of actually updating the check-lines in each of the failing tests, with a series of simple substitution patterns:
```
$ cat ./substitute-checks.sh
#!/bin/bash

file="$1"
check="$2"

# Any test that explicitly tests debug intrinsic output is not suitable to
# update by this script.
if grep -q "write-experimental-debuginfo=false" "$file"; then
exit 0
fi

sed -i -E -e "
/(#|;|\/\/).*$check[A-Z0-9_\-]*:/!b
/DIGlobalVariableExpression/b
/!llvm.dbg./bpostcall
s/((((((no|must)?tail )?call.*)?void )?@)?llvm.)?dbg\.([a-z]+)/#dbg_\7/
:postcall
/declare #dbg_/d
s/metadata //g
s/metadata\{/{/g
s/DIExpression\(([^)]*)\)\)(,( !dbg)?)?/DIExpression(\1),/
/#dbg_/!b
s/((\))?(,) )?!dbg (![0-9]+)/\3\4\2/
s/((\))?(, ))?!dbg/\3/
" "$file"
```
Both of these scripts combined can be used on the list in `manual-tests.txt` as follows:
```
$ cat manual-tests.txt | xargs ./get-checks.sh | sort | uniq | awk -F ',' '{ system("./substitute-checks.sh " $1 " " $2) }'
```
These scripts dealt successfully with the vast majority of checks in `clang/test` and `llvm/test`.
5. Verify the resulting tests pass, and detect any failing tests:
```
$ xargs ./build/bin/llvm-lit -q < failing-tests.txt
********************
Failed Tests (5):
LLVM :: DebugInfo/Generic/dbg-value-lower-linenos.ll
LLVM :: Transforms/HotColdSplit/transfer-debug-info.ll
LLVM :: Transforms/ObjCARC/basic.ll
LLVM :: Transforms/ObjCARC/ensure-that-exception-unwind-path-is-visited.ll
LLVM :: Transforms/SafeStack/X86/debug-loc2.ll


Total Discovered Tests: 295
Failed: 5 (1.69%)
```
6. Some tests may have failed - the update scripts are simplistic and preserve no context across lines, and so there are cases that they will not handle; the remaining cases must be manually updated (or handled by further scripts).

# C-API changes

Some new functions that have been added are temporary and will be deprecated in the future. The intention is that they'll help downstream projects adapt during the transition period.
Expand Down
Loading