Revert commit ba8565fbcb975e2d067ce3ae5a7dbaae4953edd3 #3

Closed
wants to merge 720 commits into from


madhur13490
Owner

jhuber6 and others added 30 commits October 18, 2023 12:52
- [Libomptarget] Make the references to 'malloc' and 'free' weak.
- [Libomptarget][NFC] Use C++ style attributes instead
…ment (llvm#69405)

When the code is built with -mbackchain, it is possible to retrieve the
caller's frame and return addresses. GCC can already do this; add this
support to Clang as well. Use RISCVTargetLowering and GCC's
s390_return_addr_rtx() as inspiration. Add tests based on what GCC is
emitting.
Use of llvm::Optional was migrated to std::optional. This included a
change in the constructor of ArrayRef.
However, there are still two places in the SubtargetEmitter that use
llvm::None, causing a compile error when emitted.
Specializing for 8-bit integers to ensure values are printed as integers

Fixes llvm#69310
Fix 'Bullet list ends without a blank line; unexpected unindent.'
stack-uar.c is flaky (1 in 256 executions) because the random tag
may be zero (llvm#69221).

This patch works around the issue in the same way as deep-recursion.c
(llvm@aa4dfd3), by falling back to a neighboring object, which must have
a different (non-zero) tag.

This patch also does a minor cleanup of the aforementioned
deep-recursion.c, for consistency with stack-uar.c.

Co-authored-by: Thurston Dang <thurston@google.com>
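The flake rate and the fallback can be illustrated with a small model. This is only a sketch, not the test's actual code: it assumes an 8-bit tag (which is what the quoted 1-in-256 rate suggests), and `zero_tag_probability` and `pick_tagged_object` are hypothetical names.

```python
from fractions import Fraction

TAG_BITS = 8  # assumption: 8-bit tags, consistent with the 1-in-256 rate


def zero_tag_probability(bits: int = TAG_BITS) -> Fraction:
    """Chance that a uniformly random tag of `bits` bits is zero."""
    return Fraction(1, 2 ** bits)


def pick_tagged_object(tags):
    """Return the index of the first object whose tag is non-zero.

    `tags` holds the tags of adjacent objects. Neighbors are assumed to
    carry different tags, so if one object drew tag 0 its neighbor did not.
    """
    for index, tag in enumerate(tags):
        if tag != 0:
            return index
    raise ValueError("no object with a non-zero tag")
```

With this model, a test that would have used the object at index 0 moves to index 1 only in the rare case where the first tag is zero.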
The user of CodeExtractor should be able to specify that
the aggregate argument should be passed as a pointer in zero address
space.

CodeExtractor is used to generate the outlined functions required by the
OpenMP runtime. The arguments of the outlined functions for OpenMP GPU code
are in address space 0, but address space 0 need not be the default
address space for a GPU device. The user of CodeExtractor therefore needs
a way to specify that the allocated aggregate parameter
is passed as a pointer in address space 0.
Document clang support for function pointers and virtual functions with
HIP
…ion (llvm#68628)

When MergeFuncs creates a thunk, it does not modify the function in
place, but creates a new one altogether. If type metadata is not
properly forwarded to this new function, LowerTypeTests will be unable
to put this thunk into the dispatch table.

The fix here is to just forward the type metadata to the newly created
functions.
The current test cases to guard against speculative execution can actually be
safely speculated because the denominator is known to be not 0 or -1, and
isSafeToSpeculativelyExecuteWithOpcode will account for this. This adds some
more test cases and rejigs some existing ones to use an unknown variable
instead.
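Why the denominator matters can be sketched as follows. The model assumes a 32-bit signed division, which the message does not state, and both helpers are hypothetical stand-ins rather than the LLVM function named above.

```python
from typing import Optional

INT32_MIN = -2 ** 31


def sdiv_is_undefined(dividend: int, divisor: int) -> bool:
    """True if a 32-bit signed division is undefined for these operands."""
    return divisor == 0 or (dividend == INT32_MIN and divisor == -1)


def safe_to_speculate(known_divisor: Optional[int]) -> bool:
    """A division can be hoisted only if its divisor is a known safe value.

    `None` models an unknown variable, which is what the reworked tests use
    so that speculation really has to be rejected.
    """
    return known_divisor is not None and known_divisor not in (0, -1)
```

A constant divisor such as 4 passes the check, so a test built on it does not actually exercise the guard; an unknown divisor does.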
This makes it a lot easier to make wide ranging changes like I am
about to do in https://llvm.org/D150610.
…ctions (llvm#69407)

This will make it easier to implement new(nothrow) without calling the
throwing version of new when exceptions are disabled. See
https://llvm.org/D150610 for the full discussion.
The MVETRUNC operation can perform the same truncate of two vectors, without
requiring lane inserts/extracts from every vector lane. This moves the concat
i1 lowering to use it for v8i1 and v16i1 result types, trading a bit of extra
stack space for fewer instructions.
…n to disable memcpy instrumentation (llvm#69240)

Deploying llvm#67766 to a large internal codebase uncovers many bugs (many are
probably benign but need cleaning up). There are also issues in high-profile
open-source projects like v8. Add a cl::opt to disable builtin instrumentation
for -fsanitize=alignment to help large codebase users.

In the long term, this cl::opt option may still be useful to debug
-fsanitize=alignment instrumentation on builtins, so we probably want to
keep it around.
This patch fixes:

  compiler-rt/lib/builtins/int_to_fp_impl.inc:36:10: error: expression
  is not an integer constant expression; folding it to a constant is a
  GNU extension [-Werror,-Wgnu-folding-constant]
This enables reading block sparse from file using libgen! (and soon also
direct IR codegen)
When compiling for Darwin, sigset is not initialized. 

When -Werror,-Wuninitialized-const-reference are enabled we see the
error:
asan_interceptors.cpp:260:38: error: variable 'sigset' is uninitialized
when passed as a const reference argument here
[-Werror,-Wuninitialized-const-reference]

This fixes the error.
We can use std::pop_heap first and then retrieve the top priority item
with pop_back_val, saving one line of code.
InlinerOrder::front was removed by:

  commit d3b95ec
  Author: Kazu Hirata <kazu@google.com>
  Date:   Sun Sep 18 08:49:44 2022 -0700

This patch removes a mention of front.
Example:

dimLevelType = [ "compressed", "compressed" ] to
map = (d0, d1) -> (d0 : compressed, d1 : compressed)
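The rewrite in this example can be expressed as a small string transformation. The sketch below is purely illustrative: `dim_level_types_to_map` is a hypothetical helper, and it covers only the identity dimension order shown in the example.

```python
def dim_level_types_to_map(level_types):
    """Render a list of level types in the `map = ...` form.

    Only the identity ordering (d0, d1, ...) from the example is handled.
    """
    dims = [f"d{i}" for i in range(len(level_types))]
    lhs = ", ".join(dims)
    rhs = ", ".join(f"{dim} : {lt}" for dim, lt in zip(dims, level_types))
    return f"({lhs}) -> ({rhs})"


print(dim_level_types_to_map(["compressed", "compressed"]))
```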
With the legality checks in place it is now safe to do. S_MOV_B64 shall
not be used with wide literals, thus updating the test.
…nder with undef elements. (llvm#69482)

Division/remainder by undef is immediate UB across the entire vector.
Skip TrailingAnnotation when looking for TrailingReturnArrow.

Fixes llvm#69234.
…lvm#69510)

Broke https://lab.llvm.org/buildbot/#/builders/181/builds/24470.

Could we build the example/tutorial code in the submit checks? This
breakage wasn't caught at submit time.
…#69495)

Reduce memory usage by only extracting unit DIEs when cloning clang
modules. We don't need the full debug info yet at this stage. This
reduces peak memory usage of dsymutil when linking the swift driver by
multiple gigabytes.

rdar://117156180
nikic and others added 25 commits October 20, 2023 09:59
perf2bolt launches a few perf script commands and stores the output in
temporary files before processing the output and cleaning them up before
it exits.

The command `perf script --show-mmap-events` outputs PERF_RECORD_MMAP2
and instruction tracing data, but when processed it only looks for
PERF_RECORD_MMAP2 and the instruction tracing data is ignored. This is
fine for small amounts of instruction trace data, but when I've recorded
Arm ETM or Intel PT AUX I get lots of it.

Adding `--no-itrace` makes it show only the PERF_RECORD_MMAP2 records,
which saves time running `perf script`, disk space storing the
output, and time parsing the output.

It is the same for `perf script --show-task-events` where BOLT is only
interested in the PERF_RECORD_COMM & PERF_RECORD_FORK records.

### Data

| Perf Record | Perf Data Size  | MMap Size | MMap No Itrace Size |
|---|---|---|---|
| perf record -e cs_etm/@tmc_etr0/u | 137K | 4468K | 0.632K |
| perf record -e intel_pt//u | 890K | 33378K | 0.673K |
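The following sketch builds the two `perf script` command lines described above and computes the size reduction from the table. The `-i` input option and the `perf.data` default are assumptions about how the tool is invoked, and the helper names are hypothetical; the sizes are the ones reported in the table.

```python
def perf_script_command(event_flag, data_file="perf.data"):
    """Build a `perf script` argv that skips instruction trace decoding."""
    return ["perf", "script", event_flag, "--no-itrace", "-i", data_file]


def reduction_factor(before_kb, after_kb):
    """How many times smaller the output is with --no-itrace."""
    return before_kb / after_kb


mmap_cmd = perf_script_command("--show-mmap-events")
task_cmd = perf_script_command("--show-task-events")

# Sizes in KB, copied from the table: (MMap size, MMap size with --no-itrace).
measurements = {
    "cs_etm/@tmc_etr0/u": (4468, 0.632),
    "intel_pt//u": (33378, 0.673),
}
for event, (before, after) in measurements.items():
    print(f"{event}: about {reduction_factor(before, after):.0f}x smaller")
```

For the two recordings in the table this works out to roughly a 7,000-fold and a 50,000-fold reduction in the mmap output.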
ISO_Fortran_binding.h was only added to GCC in version 10.0. Flang should be
buildable with older versions. Remove the test until there is a safe way to
check that the compiler can run the test (that it is clang from the
build, for instance).

Fixes bot failure https://lab.llvm.org/buildbot/#/builders/181/builds/24526
Also in:

https://lab.llvm.org/buildbot/#/builders/160
https://lab.llvm.org/buildbot/#/builders/268
https://lab.llvm.org/buildbot/#/builders/181
As described in: ARM-software/acle#257

Patch by: Rosie Sumpter <rosie.sumpter@arm.com>

Reviewed By: dtemirbulatov

Differential Revision: https://reviews.llvm.org/D151709
This makes the docs a little nicer to read, as these otherwise show up
as "«unnamed»".

The extra include is needed as naming means getters are generated, and
the getters use the LLVM types.
…#69192)

In TOSA MLIR dialect, fix the definition of the Clamp op to
accept fp16 & bf16 datatype for the min_fp and max_fp attributes.
Add ClampOp verifier to check attributes types compatibility.
Add related test cases in Tosa/ops.mlir.

Signed-off-by: Fabrizio Indirli <Fabrizio.Indirli@arm.com>
…LFIR (llvm#69441)

The code in `copyHostAssociateVar` is using `createSomeArrayAssignment`
for arrays, which relies on the soon-to-be legacy expression lowering. Update
the copy to use hlfir.assign instead.

I used the temporary_lhs flag to mimic the current behavior, but maybe
user-defined assignment should be called when needed. This flag also
prevents any finalizers from being called on the LHS if the LHS type has
finalizers (which would occur otherwise in normal intrinsic assignment).
Again, I am not sure what the OpenMP spec wants here.

Also, I added special handling for ALLOCATABLE. The current code seems
broken to me, since it basically copies the descriptor, which would
lead to a memory leak given that the TEMP was previously allocated with the
shape of the variable in createHostAssociateVarClone. So copying the
DATA instead seemed like the right thing to do.
Power10 does not support Hardware Transactional Memory instructions.
Remove to keep consistency.
)

As requested in (llvm#66521)

I confirmed a crash with "return" instead of "continue" in
setVectorizedCallDecision's fmuladd reduction recognition.
Type extension is currently handled in FIR by inlining the parent's
components as the first members of the record type.

This is not correct from a memory layout point of view, since the storage
size of the parent type may be bigger than the sum of the sizes of its
components (due to alignment requirements). To avoid making FIR types
target dependent and fix this issue, make the parent component a single
component with the parent type at the beginning of the record type.
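The layout difference can be reproduced outside of FIR with C-style structs. The sketch below uses Python's `ctypes` as a stand-in for a target ABI; the type names and field choices are hypothetical and only illustrate the padding effect described above.

```python
import ctypes


class Parent(ctypes.Structure):
    # 5 bytes of components; tail padding rounds the size up to the alignment.
    _fields_ = [("a", ctypes.c_int32), ("b", ctypes.c_int8)]


class ChildInlined(ctypes.Structure):
    # Old scheme: the parent's components are inlined, so `c` lands in what
    # would have been the parent's tail padding.
    _fields_ = [("a", ctypes.c_int32), ("b", ctypes.c_int8), ("c", ctypes.c_int8)]


class ChildNested(ctypes.Structure):
    # New scheme: the parent is one component, so `c` starts after its full
    # storage size.
    _fields_ = [("parent", Parent), ("c", ctypes.c_int8)]


print("parent size:", ctypes.sizeof(Parent))
print("offset of c when inlined:", ChildInlined.c.offset)
print("offset of c when nested:", ChildNested.c.offset)
```

Where `int32` is 4-byte aligned, the inlined layout places `c` inside the parent's storage, which is exactly the mismatch the nested representation avoids.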

This also simplifies addressing since parent component is now a "normal"
component that can be designated with hlfir.designate.

StructureComponent lowering however is a bit more complex since the
symbols in the structure component may refer to subcomponents of parent
types.

Notes:
1. The fix is only done in HLFIR for now, a similar fix should be done
in ConvertExpr.cpp to fix the path without HLFIR (I will likely still do
it in a new patch since it would be an annoying bug to investigate for
people testing flang without HLFIR).
2. The private component extra mangling is useless after this patch. I
will remove it after 1.
3. The "parent component" TODO in constant CTOR is free to implement for
HLFIR after this patch, but I would rather remove it and test it in a
different patch.
…69615)

This adds a flag to the `TransformDialectInterpreter` that relaxes the
requirement for only a single top-level transform op.
This is useful for supporting transforms that take transform IR as
payload.

This also aligns the function `findTopLevelTransform`
[here](llvm@7b0f4c9#diff-551f92bb609487ccf981daf9571f0f1b1703ab2330560a388a5f0d133e520be4L59)
with its documentation:
In the presence of multiple top-level transform ops it now correctly
returns the first of them after reporting the error instead of returning
a `nullptr`.
This uses the fast-check allowlist added in the previous commit.
This is behind a config option to allow users/developers to enable checks
we haven't timed yet, and to allow the --check-tidy-time flag to work.

Fixes clangd/clangd#1337

Differential Revision: https://reviews.llvm.org/D138505
Add AfterPlacementNew option to SpaceBeforeParensOptions to have more
control on placement new expressions.

Fixes llvm#41501
Relates to llvm#54703

Differential Revision: https://reviews.llvm.org/D127270
Fix two issues:
* If a constant is used in another constant, we need to insert the newly
  created instructions into the worklist so that the constants used in them
  are converted.
* Set the debug info of the original instruction on the newly created instructions.
A recent change modified the parameter tileSize from Value to
OpFoldResult. Therefore we should call getAsOpFoldResult before passing
on the tileSize.
Adjust a test regarding this new behavior.
Drop code inserting pointer casts. Check pointer types instead of
address spaces.
Fixes llvm#67761
Calling `getDimSize()` before checking for 0-ranked tensors triggers assertion
failures. This PR ensures that the rank is checked first.
Or should we throw an error if we have a 0-ranked tensor in a tosa
operation?
…lvm#69700)

If PyYAML is not installed, the `-export-fixes` can be used to specify a
directory (not a file).

Mentioning @PiotrZSL @dyung 

Follows llvm#69453
This patch adds a "nice-to-have" feature to lit:
it prints the total number of discovered tests at the beginning. It is
convenient to see the total number of tests without scrolling up to the
beginning of the log.

Further, this patch also prints the percentage of tests.
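A rough sketch of such a summary is shown below. The message does not say what the percentage is relative to or how lit formats it, so this assumes each result category is reported as a share of the discovered total; `summarize` and the output strings are hypothetical.

```python
def summarize(results):
    """Return summary lines: the total first, then one line per category.

    `results` maps a result category name to its test count. Each category
    is shown with its share of the discovered total (an assumption).
    """
    total = sum(results.values())
    lines = [f"Total Discovered Tests: {total}"]
    for name, count in results.items():
        share = 100.0 * count / total if total else 0.0
        lines.append(f"  {name}: {count} ({share:.2f}%)")
    return lines


for line in summarize({"Passed": 3, "Failed": 1}):
    print(line)
```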

Reviewed By: RoboTux, jdenny-ornl

Co-authored-by: Madhur A <madhura@nvidia.com>
@madhur13490 madhur13490 changed the title from "revert patch" to "Revert commit ba8565fbcb975e2d067ce3ae5a7dbaae4953edd3" Oct 20, 2023
@madhur13490 madhur13490 marked this pull request as ready for review October 20, 2023 12:38
@madhur13490 madhur13490 changed the base branch from master to main October 20, 2023 12:38
madhur13490 pushed a commit that referenced this pull request Mar 12, 2024
TestCases/Misc/Linux/sigaction.cpp fails because dlsym() may call malloc
on failure. And then the wrapped malloc appears to access thread local
storage using global dynamic accesses, thus calling
___interceptor___tls_get_addr, before REAL(__tls_get_addr) has
been set, so we get a crash inside ___interceptor___tls_get_addr. For
example, this can happen when looking up __isoc23_scanf which might not
exist in some libcs.

Fix this by marking the thread local variable accessed inside the
debug checks as "initial-exec", which does not require __tls_get_addr.

This is probably a better alternative to llvm#83886.

This fixes a different crash but is related to llvm#46204.

Backtrace:
```
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff6a9d89e in ___interceptor___tls_get_addr (arg=0x7ffff6b27be8) at /path/to/llvm/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:2759
#2 0x00007ffff6a46bc6 in __sanitizer::CheckedMutex::LockImpl (this=0x7ffff6b27be8, pc=140737331846066) at /path/to/llvm/compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp:218
#3 0x00007ffff6a448b2 in __sanitizer::CheckedMutex::Lock (this=0x7ffff6b27be8, this@entry=0x730000000580) at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_mutex.h:129
#4 __sanitizer::Mutex::Lock (this=0x7ffff6b27be8, this@entry=0x730000000580) at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_mutex.h:167
#5 0x00007ffff6abdbb2 in __sanitizer::GenericScopedLock<__sanitizer::Mutex>::GenericScopedLock (mu=0x730000000580, this=<optimized out>) at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_mutex.h:383
#6 __sanitizer::SizeClassAllocator64<__tsan::AP64>::GetFromAllocator (this=0x7ffff7487dc0 <__tsan::allocator_placeholder>, stat=stat@entry=0x7ffff570db68, class_id=11, chunks=chunks@entry=0x7ffff5702cc8, n_chunks=n_chunks@entry=128) at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_allocator_primary64.h:207
#7 0x00007ffff6abdaa0 in __sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__tsan::AP64> >::Refill (this=<optimized out>, c=c@entry=0x7ffff5702cb8, allocator=<optimized out>, class_id=<optimized out>)
 at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_allocator_local_cache.h:103
#8 0x00007ffff6abd731 in __sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__tsan::AP64> >::Allocate (this=0x7ffff6b27be8, allocator=0x7ffff5702cc8, class_id=140737311157448)
 at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_allocator_local_cache.h:39
#9 0x00007ffff6abc397 in __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator64<__tsan::AP64>, __sanitizer::LargeMmapAllocatorPtrArrayDynamic>::Allocate (this=0x7ffff5702cc8, cache=0x7ffff6b27be8, size=<optimized out>, size@entry=175, alignment=alignment@entry=16)
 at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_allocator_combined.h:69
#10 0x00007ffff6abaa6a in __tsan::user_alloc_internal (thr=0x7ffff7ebd980, pc=140737331499943, sz=sz@entry=175, align=align@entry=16, signal=true) at /path/to/llvm/compiler-rt/lib/tsan/rtl/tsan_mman.cpp:198
#11 0x00007ffff6abb0d1 in __tsan::user_alloc (thr=0x7ffff6b27be8, pc=140737331846066, sz=11, sz@entry=175) at /path/to/llvm/compiler-rt/lib/tsan/rtl/tsan_mman.cpp:223
#12 0x00007ffff6a693b5 in ___interceptor_malloc (size=175) at /path/to/llvm/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:666
#13 0x00007ffff7fce7f2 in malloc (size=175) at ../include/rtld-malloc.h:56
#14 __GI__dl_exception_create_format (exception=exception@entry=0x7fffffffd0d0, objname=0x7ffff7fc3550 "/path/to/llvm/compiler-rt/cmake-build-all-sanitizers/lib/linux/libclang_rt.tsan-x86_64.so",
 fmt=fmt@entry=0x7ffff7ff2db9 "undefined symbol: %s%s%s") at ./elf/dl-exception.c:157
#15 0x00007ffff7fd50e8 in _dl_lookup_symbol_x (undef_name=0x7ffff6af868b "__isoc23_scanf", undef_map=<optimized out>, ref=0x7fffffffd148, symbol_scope=<optimized out>, version=<optimized out>, type_class=0, flags=2, skip_map=0x7ffff7fc35e0) at ./elf/dl-lookup.c:793
#16 0x00007ffff656d6ed in do_sym (handle=<optimized out>, name=0x7ffff6af868b "__isoc23_scanf", who=0x7ffff6a3bb84 <__interception::InterceptFunction(char const*, unsigned long*, unsigned long, unsigned long)+36>, vers=vers@entry=0x0, flags=flags@entry=2) at ./elf/dl-sym.c:146
#17 0x00007ffff656d9dd in _dl_sym (handle=<optimized out>, name=<optimized out>, who=<optimized out>) at ./elf/dl-sym.c:195
#18 0x00007ffff64a2854 in dlsym_doit (a=a@entry=0x7fffffffd3b0) at ./dlfcn/dlsym.c:40
#19 0x00007ffff7fcc489 in __GI__dl_catch_exception (exception=exception@entry=0x7fffffffd310, operate=0x7ffff64a2840 <dlsym_doit>, args=0x7fffffffd3b0) at ./elf/dl-catch.c:237
#20 0x00007ffff7fcc5af in _dl_catch_error (objname=0x7fffffffd368, errstring=0x7fffffffd370, mallocedp=0x7fffffffd367, operate=<optimized out>, args=<optimized out>) at ./elf/dl-catch.c:256
#21 0x00007ffff64a2257 in _dlerror_run (operate=operate@entry=0x7ffff64a2840 <dlsym_doit>, args=args@entry=0x7fffffffd3b0) at ./dlfcn/dlerror.c:138
#22 0x00007ffff64a28e5 in dlsym_implementation (dl_caller=<optimized out>, name=<optimized out>, handle=<optimized out>) at ./dlfcn/dlsym.c:54
#23 ___dlsym (handle=<optimized out>, name=<optimized out>) at ./dlfcn/dlsym.c:68
#24 0x00007ffff6a3bb84 in __interception::GetFuncAddr (name=0x7ffff6af868b "__isoc23_scanf", trampoline=140737311157448) at /path/to/llvm/compiler-rt/lib/interception/interception_linux.cpp:42
#25 __interception::InterceptFunction (name=0x7ffff6af868b "__isoc23_scanf", ptr_to_real=0x7ffff74850e8 <__interception::real___isoc23_scanf>, func=11, trampoline=140737311157448)
 at /path/to/llvm/compiler-rt/lib/interception/interception_linux.cpp:61
#26 0x00007ffff6a9f2d9 in InitializeCommonInterceptors () at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_common_interceptors.inc:10315
```

Reviewed By: vitalybuka, MaskRay

Pull Request: llvm#83890