
Tests with asan failures (mostly leaks) #5715

Closed
15 of 19 tasks
GMNGeoffrey opened this issue May 3, 2021 · 2 comments
Labels
bug 🐞 Something isn't working good first issue 🌱 Good for newcomers infrastructure Relating to build systems, CI, or testing

Comments


GMNGeoffrey commented May 3, 2021

A bunch of our tests have leaks (and perhaps other asan failures). I'm filing a separate bug for leaks we get via vulkan, since that's one big thing instead of a bunch of little things.

Logs: https://gist.github.com/GMNGeoffrey/0babfa9925b4545ad9673aa25c3747ef

The full list of failing tests is:

  • "iree/base/testing/dynamic_library_test"
  • "iree/base/internal/wait_handle_test"
  • "iree/samples/static_library/static_library_demo_test"
  • "iree/samples/simple_embedding/simple_embedding_vulkan_test"
  • "iree/modules/check/check_test"
  • "iree/hal/cts/semaphore_test"
  • "iree/hal/cts/semaphore_submission_test"
  • "iree/hal/cts/executable_layout_test"
  • "iree/hal/cts/event_test"
  • "iree/hal/cts/driver_test"
  • "iree/hal/cts/descriptor_set_layout_test"
  • "iree/hal/cts/command_buffer_test"
  • "iree/hal/cts/buffer_mapping_test"
  • "iree/hal/cts/allocator_test"
  • "iree/base/internal/file_io_test"
  • "iree/tools/test/iree-benchmark-module.mlir.test"
  • "iree/tools/test/iree-run-module.mlir.test"
  • "iree/tools/test/multiple_exported_functions.mlir.test"
  • "bindings/tflite/smoke_test"

Canonical location is the list of exclusions from the ASan build.

Fix a test, then remove it from the list.

@GMNGeoffrey GMNGeoffrey added bug 🐞 Something isn't working good first issue 🌱 Good for newcomers labels May 3, 2021
@GMNGeoffrey GMNGeoffrey self-assigned this May 3, 2021
@GMNGeoffrey

iree/base/internal/file_io_test appears to have been fixed by c8cd350

hanhanW pushed a commit to hanhanW/iree that referenced this issue May 21, 2021
@GMNGeoffrey GMNGeoffrey removed their assignment Nov 12, 2021
ScottTodd added a commit that referenced this issue Mar 15, 2022
Progress on #5715 and #5716

Leaks in the Vulkan-related libraries we use were hidden behind incomplete handling of shared library loading/unloading in ASan. By disabling calls to `dlclose()` in both `iree/base/internal/dynamic_library_posix.c` and the Vulkan Loader (`libvulkan.so.1`) so those libraries remained open for ASan to reference, I was able to get useful leak reports. Those reports showed that my NVIDIA system Vulkan ICD (`libnvidia-glcore.so`) was leaking and an up-to-date SwiftShader (`libvk_swiftshader.so`) was _not_ leaking.

This PR updates SwiftShader to a commit that doesn't leak (with our usage, anyways) and enables most of the Vulkan tests that were previously excluded from running under ASan.

---

A few tests are still failing with crashes in ASan, with logs like this:
```
Tracer caught signal 11: addr=0x0 pc=0x50c558 sp=0x7fb28fdffd10
==50923==LeakSanitizer has encountered a fatal error.
==50923==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
==50923==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)
```

([full logs here](https://source.cloud.google.com/results/invocations/a37ab871-4cab-4591-a5d6-8ad849f196e3/targets/iree%2Fgcp_ubuntu%2Fcmake%2Flinux%2Fx86-swiftshader-asan%2Fpresubmit/log)), so I'm keeping those disabled explicitly.
stellaraccident pushed a commit that referenced this issue Apr 22, 2022
mariecwhite pushed a commit to mariecwhite/iree that referenced this issue May 6, 2022
@julianwa julianwa added the infrastructure Relating to build systems, CI, or testing label Jun 13, 2022
benvanik added a commit that referenced this issue Aug 24, 2022
Same behavior as the other excluded tests from #5715.
benvanik added a commit that referenced this issue Aug 25, 2022
Same behavior as the other excluded tests from #5715.
benvanik added a commit that referenced this issue Aug 30, 2022
commit f62ec3b
Author: bjacob <benoitjacob@google.com>
Date:   Tue Aug 30 14:51:10 2022 -0400

    VMVX mmt4d ukernel (#10239)

    This brings an initial (unoptimized, reference code only) mmt4d ukernel - both `f32f32f32` and `i8i8i32`.

    It is covered by the e2e matmul tests: if you purposefully introduce a numerical bug in the ukernel function `iree_vmvx_mmt4d_f32f32f32`, then this test fails: `iree/tests/e2e/matmul/e2e_matmul_mmt4d_f32_small_ukernel_vmvx_local-task`. Ditto for `i8i8i32`.

    That the whole reference code currently lives in `module.c`, as opposed to being nicely isolated in `iree/builtins/ukernel`, is temporary. I have a few questions to ask about the placeholders in this directory, but it will be much more concrete to discuss after we are done reviewing this PR, so I hope it's OK to split that off as a separate code move in another PR.

    A couple of nontrivial decisions in this PR:

    * In `LowerLinalgMicrokernels.cpp` there was an `isUnitInnerStride` helper function. It was only applied to 2D memrefs. The underlying question is how much layout generality we want ukernels to support; the existing code embodied a decision on this for 2D arrays, but mmt4d deals with 4D arrays, so the question was how to generalize from 2D to 4D. I chose to generalize `isUnitInnerStride` into `areInnerDimsContiguousRowMajor`. See the comment where it is defined. The lit test, `lower_linalg_microkernels.mlir`, has testcases to cover several edge cases here.

    * Similar to what we decided last week for matmul in #10211, there was the question of how to deal with the accumulator, which is nonzero in the general case but which we know will often be zero in practice, so we want to retain the ability to take advantage of that. This is handled here exactly like it was for matmul in #10211. I even reused the flag symbolic constant rather than create a separate one. Yay for weak typing.
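The layout check described in the first bullet can be sketched on plain size/stride lists; `inner_dims_contiguous_row_major` below is a hypothetical stand-in for the real `areInnerDimsContiguousRowMajor` helper, which operates on memref types:

```python
def inner_dims_contiguous_row_major(sizes, strides, num_inner_dims):
    """Return True if the innermost `num_inner_dims` dims are contiguous
    row-major: the last stride is 1 and each next-outer stride equals the
    product of the sizes of the dims inside it."""
    expected = 1
    for i in range(len(sizes) - 1, len(sizes) - 1 - num_inner_dims, -1):
        if strides[i] != expected:
            return False
        expected *= sizes[i]
    return True

# A 2x3x4x5 fully row-major buffer has strides (60, 20, 5, 1).
print(inner_dims_contiguous_row_major([2, 3, 4, 5], [60, 20, 5, 1], 4))
# Same shape with a padded inner row (stride 8 instead of 5): not contiguous.
print(inner_dims_contiguous_row_major([2, 3, 4, 5], [96, 32, 8, 1], 2))
```

The 2D `isUnitInnerStride` behavior falls out as the `num_inner_dims == 1` case.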

commit ddaaaaa
Author: Jerry Wu <cheyuw@google.com>
Date:   Tue Aug 30 11:10:40 2022 -0700

    Generate CMake rules to download and import models (#10167)

commit 753ac4d
Author: Geoffrey Martin-Noble <gcmn@google.com>
Date:   Tue Aug 30 11:04:59 2022 -0700

    Remove RV32 Mobile Bert Compilation Benchmark (#10234)

    Building the benchmarks is currently the critical path in CI latency,
    taking almost 25 minutes for just that job, after it waits 25 minutes
    for the TF integrations binaries (was that always so slow??).

    [![ci_run_graph](https://user-images.githubusercontent.com/5732088/187279027-21137775-5a3b-4ddf-ae4d-42e39051e7b2.png)](https://github.com/iree-org/iree/actions/runs/2950708667)


    Of that time, 20 minutes is spent compiling this one vmfb, which we
    only do so we can get statistics on how long it takes to compile. I
    sampled the ten slowest build actions from a local build of the
    benchmarks:

    ```
    1179.39 benchmark_suites/TFLite/vmfb/mobilebert-baseline-tf2-quant.tflite.mlir-22179362840f853977acc734ee75e6ce.vmfb
    216.321 benchmark_suites/TFLite/vmfb/mobilebert-baseline-tf2-quant.tflite.mlir-53b16b00b2d02162b1706d73ab6270b4.vmfb
    159.585 benchmark_suites/TFLite/vmfb/mobilebert-baseline-tf2-quant.tflite.mlir-3bcb3f959e9f123bbaa01aa4d237bab8.vmfb
    146.027 benchmark_suites/TFLite/vmfb/mobilebert-baseline-tf2-quant.tflite.mlir-73879267ae95d3551e73c7f078f4410d.vmfb
    109.922 benchmark_suites/TFLite/vmfb/mobilebert-baseline-tf2-quant.tflite.mlir-cf781c710ad5c59b5e7f205b17b3c37b.vmfb
    104.864 benchmark_suites/TFLite/vmfb/mobilebertsquad.tflite.mlir-fddd07b06a1abf9f5d4ea97225066f01.vmfb
    88.665 benchmark_suites/TFLite/vmfb/mobilebertsquad.tflite.mlir-4fe50b8684bdd4684941c8a5698d3a48.vmfb
    88.316 benchmark_suites/TFLite/vmfb/mobilebertsquad.tflite.mlir-833fba075c9cf413b8acbea9be0acade.vmfb
    87.238 benchmark_suites/TFLite/vmfb/mobilebert-baseline-tf2-float.tflite.mlir-8a916ab990bd1cb5521dce6dd6a5ac6a.vmfb
    86.905 benchmark_suites/TFLite/vmfb/mobilebert-baseline-tf2-float.tflite.mlir-e304860762a8369f86b813d45b3c699a.vmfb
    ```

    This one is the clear winner. I don't think it's worth running this
    compilation only to discover what we already know (it is very slow to
    run this compilation).

    Tested:
    - `build_benchmarks` for this PR ran in 7 minutes instead of 25.
    - Ran riscv benchmark pipeline.

commit a456db6
Author: CindyLiu <hcindyl@google.com>
Date:   Mon Aug 29 13:57:29 2022 -0700

    Add iree_bytecode_module and iree_c_module static lib support (#10231)

    Check and parse `iree-llvm-static-library-output-path` flag to add
    static library object support.

    This makes secondary functions like iree_static_linker_test cleaner.

commit 6d4b129
Author: Thomas <thomasraoux@google.com>
Date:   Mon Aug 29 13:32:42 2022 -0700

    Fix gcc build (#10235)

    Prevent ambiguous constructor call.

commit 6fa18e0
Author: Kojo Acquah <KoolJBlack@users.noreply.github.com>
Date:   Mon Aug 29 12:20:04 2022 -0700

    Implementation of GPU Shared Memory Transpose Pipeline (#10209)

    Currently only `32x32`-aligned 2D transposes are supported. Based on
    https://developer.nvidia.com/blog/efficient-matrix-transpose-cuda-cc/,
    uses a fixed tile size of `32x32` and workgroup size of `{8x32}` to
    perform a vectorized copy for the transpose. The tile is padded to `32x33` to
    reduce bank conflicts. Note that bank conflicts aren't fully eliminated
    due to the use of width-4 vector loads/stores.

    Todo:
    * Move beyond a single hard-coded workgroup and tile size?
    * Handle non-aligned transposes
    * Handle dynamic-sized transposes

    Related to #10005
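The effect of the `32x33` padding can be shown with a quick model (a simplification assuming 32 banks and one element per bank slot; illustrative only, not IREE code):

```python
NUM_BANKS = 32
TILE = 32

def bank(row, col, row_stride):
    """Shared-memory bank that a tile element maps to, assuming one
    element per bank slot and NUM_BANKS banks."""
    return (row * row_stride + col) % NUM_BANKS

# Reading one column of an unpadded 32x32 tile: every row hits bank 0.
unpadded = {bank(r, 0, TILE) for r in range(TILE)}
# With the row stride padded to 33, the 32 rows spread over all 32 banks.
padded = {bank(r, 0, TILE + 1) for r in range(TILE)}
print(len(unpadded), len(padded))  # 1 32
```

Since 33 ≡ 1 (mod 32), consecutive rows of a column land in consecutive banks, which is why the one-element pad eliminates the column-access conflict.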

commit 546ffcb
Author: Jerry Wu <cheyuw@google.com>
Date:   Mon Aug 29 11:49:58 2022 -0700

    Fix the typos of riscv names in CI (#10232)

commit da21e83
Author: Han-Chung Wang <hanchung@google.com>
Date:   Tue Aug 30 02:29:41 2022 +0800

    Optimize tiling sizes heuristics for elementwise dispatches. (#10179)

    In the past, small tile sizes could be picked because we want vectorization
    enabled for all the kernels. The PR picks more reasonable tiling sizes
    and addresses tiny-dispatch issues.

    The peeling pipeline works in IREE, and the PR moves elementwise dispatches
    (and copy-only dispatches) to the peeling approach. In this case, we're
    still able to vectorize the dispatches.

    This PR changes the logic to limit the unroll factor when computing the
    vector-level tiling sizes. It avoids generating many operations, which saves
    a lot of compilation time and binary size for quantized models. It also
    improves the performance of models that IREE is tracking for all CPU backends.

    Fixes #9660

commit 249c813
Author: Thomas <thomasraoux@google.com>
Date:   Mon Aug 29 11:06:31 2022 -0700

    [LLVMGPU] Move bufferization after vectorization for matmulSIMT (#10217)

    Transition the matmul SIMT pipeline to do vectorization before bufferization.
    This relies on the alloc_tensor op to model shared memory promotion and
    foreach_thread for the tiling at the tensor level.

    Also significantly simplify the vectorization pass by removing patterns
    that are no longer needed.

    This will allow us to do more optimizations at the tensor level going forward.

commit 3f173de
Author: Geoffrey Martin-Noble <gcmn@google.com>
Date:   Mon Aug 29 09:52:45 2022 -0700

    Pin GitHub runner configuration to a specific commit (#10218)

    This changes the startup script on the runners to fetch configuration
    from a specific commit, rather than directly from tip of tree on
    `main`. That makes it possible to actually test, canary, and roll back
    configuration changes, almost as if this were a real production system.

    There are some early-stage scripts to automate the creation of
    templates and managed instance group roll-outs. I've also set up
    functionality to have testing runner groups. Because of the way
    targeting runners works, that means that workflows have to explicitly
    specify the environment so that testing runners *don't* pick up the
    job. The testing group will allow testing new runner configurations on
    presubmit as much as possible.

    Of course, for this change, I actually can't do the safe thing because
    I can't test adding the extra tag to the runners. I've still pushed
    a new template to the testing instance group and set the `build_all`
    job for this PR to run on it by targeting a specific instance by
    hostname: https://github.com/iree-org/iree/runs/8027570693. (Note that
    that run actually had a failure in the asan workflow, but that wasn't
    running on my runner and I don't think it could possibly be related).

    Note that because this doesn't alter the `config/` directory,
    submitting it will not have any effect on the current runners.

    skip-ci

    Peeled out of #10133

commit c50bac3
Author: Geoffrey Martin-Noble <gcmn@google.com>
Date:   Mon Aug 29 09:49:10 2022 -0700

    Build Linux releases on big managed runners (#10126)

    This speeds the Linux builds up a bit, bringing the time for the longest
    job down from ~5 hours to ~20 minutes. Note that this is *only* the
    Linux jobs. The mac ones still take about 4 hours. This should still
    help when iterating on the release though and for faster failure
    indicators (it was indeed helpful when I was iterating here).

    I ran into issues when testing because I was using a package suffix in
    the workflow dispatch, which evidently had never actually been tested
    because it was totally broken. This gave me a lot of wonderful
    opportunity to bash my head against bash and I reworked a lot of the
    `build_linux_package.sh` script. In retrospect, I wish I'd just removed
    the `package_suffix` feature.

    Test run: https://github.com/iree-org/iree/actions/runs/2923210349

    skip-ci

commit c338ae9
Author: MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com>
Date:   Fri Aug 26 15:25:22 2022 -0700

    Cherry pick D132720 (#10227)

    Cherry pick : llvm/llvm-project@a235562
    Cherry pick : llvm/llvm-project@766f5d8

commit df4c96e
Author: MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com>
Date:   Fri Aug 26 14:09:45 2022 -0700

    Cherry-pick llvm/llvm-project@7744253 (#10226)

    Towards landing #10177

commit b533909
Author: bjacob <benoitjacob@google.com>
Date:   Fri Aug 26 15:16:39 2022 -0400

    Support the i8i8i32 case in vmvx matmul ukernel. (#10222)

commit 62d2be5
Author: Scott Todd <scotttodd@google.com>
Date:   Fri Aug 26 11:46:33 2022 -0700

    [NFC] Slight cleanup in HAL compiler passes. (#10223)

commit 8a48e10
Author: Thomas <thomasraoux@google.com>
Date:   Fri Aug 26 11:09:30 2022 -0700

    Cherry-pick llvm/llvm-project@2e34599b and llvm/llvm-project@1ee0d60a (#10221)

    * commit 2e34599bfd01e5b20e09bd6af590a52d6a63a64c
    * commit 1ee0d60a9be5dcbe3234b81a1c93e6a206a88154

commit cf5a5d5
Author: MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com>
Date:   Fri Aug 26 10:10:40 2022 -0700

    Find root by traversing the compute ops in reverse. (#10210)

    Since most of the code generation uses tile + fuse, where the consumer
    is tiled and the producer is fused with it, find the root by
    traversing the ops in reverse.

    Issue #10208
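The traversal change amounts to scanning the dispatch region's compute ops in reverse and taking the first suitable one; a minimal sketch, where `is_root_candidate` is a hypothetical stand-in for IREE's real root heuristics:

```python
def find_root_op(compute_ops, is_root_candidate):
    """With tile-and-fuse codegen the consumer is tiled and its producers are
    fused into it, so the root op is found by scanning in reverse."""
    for op in reversed(compute_ops):
        if is_root_candidate(op):
            return op
    return None

# Ops listed in program order; the last qualifying op (the consumer) wins.
ops = ["fill", "matmul", "generic"]
print(find_root_op(ops, lambda op: op in ("matmul", "generic")))  # generic
```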

commit 272ea37
Author: MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com>
Date:   Fri Aug 26 10:00:56 2022 -0700

    Change `softmax` test to use `maxf`. (#10219)

    The e2e softmax test uses `cmpf` -> `select` for max operations. Use
    `maxf` instead. This allows the op to be vectorized. The TOSA to Linalg
    lowering has been recently updated to do the same (and this test was
    derived from an older TOSA to Linalg lowering).

    Related to PR #10177

commit 233795f
Author: bjacob <benoitjacob@google.com>
Date:   Fri Aug 26 12:00:09 2022 -0400

    Tidy the VMVX ukernels matmul interface (#10211)

    This makes the VMVX ukernel interface for matmul somewhat sustainable and generalizable.

    It's official now that the only supported case is when all operands are row-major (more general support might be wanted in the future, but it would have to allow a separate storage order for each operand to be likely to be used).

    The only flag now is one bit to tell whether to accumulate into an existing accumulator, or just zero it. At the moment we always accumulate but could soon generate calls without the accumulate flag when compiling code where the accumulator operand is known to be zero-filled. In terms of optimized runtime code, it is nearly zero overhead to support that boolean degree of generality in the ukernel.

    The "reference" ukernel impl is changed to be a little more suggestive of how an optimized impl would look.

    The alpha and beta parameters are gone. They were hard to generalize to integer data types, and they were mostly gratuitous generality anyway (they didn't do the same thing as their namesake BLAS GEMM parameters).
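The accumulate-or-zero flag described above can be sketched for a plain row-major matmul; the names and the flag bit are illustrative, not the actual VMVX ukernel interface:

```python
FLAG_ACCUMULATE = 1 << 0  # illustrative flag bit, not the real constant

def matmul_ref(lhs, rhs, out, m, k, n, flags):
    """Reference row-major matmul: out[m x n] (+)= lhs[m x k] @ rhs[k x n]."""
    for i in range(m):
        for j in range(n):
            # Start from the existing accumulator only when the flag is set;
            # otherwise zero it, as the compiler could do when the init
            # operand is known to be zero-filled.
            acc = out[i * n + j] if flags & FLAG_ACCUMULATE else 0
            for p in range(k):
                acc += lhs[i * k + p] * rhs[p * n + j]
            out[i * n + j] = acc

lhs = [1, 2, 3, 4]          # 2x2
rhs = [5, 6, 7, 8]          # 2x2
out = [100, 100, 100, 100]  # stale contents, overwritten when flag is clear
matmul_ref(lhs, rhs, out, 2, 2, 2, 0)
print(out)  # [19, 22, 43, 50]
```

Supporting this one boolean in an optimized kernel only changes how the accumulator registers are initialized, which is why it is nearly zero overhead.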

commit 094ec6d
Author: Lei Zhang <antiagainst@google.com>
Date:   Thu Aug 25 19:34:02 2022 -0400

    Integrate llvm/llvm-project@71604f4c4c30 (#10204)

    * Reset third_party/llvm-project: 8f45b5a7a90f24ae1dabeff161e22594039a8b0a (2022-08-24 20:26:48 +0000): RISCV: permit unaligned nop-slide padding emission
    * Updated tensorflow/tensorflow@aed7775
    * Updated tensorflow/mlir-hlo@3b1b023
    * Fixed mhlo include paths

commit 4f0c5b1
Author: Jakub Kuderski <kubak@google.com>
Date:   Thu Aug 25 19:05:01 2022 -0400

    Add debug option to dump LLVMCPU/GPU pass pipeline (#10214)

    This is enabled using the
    `--debug-only=iree-llvm-cpu-lowering-pass-pipeline` and `--debug-only=iree-llvm-gpu-lowering-pass-pipeline` flags.
    The SPIR-V codegen path has a similar option.

commit acb7355
Author: bjacob <benoitjacob@google.com>
Date:   Thu Aug 25 16:24:21 2022 -0400

    Add e2e matmul tests on vmvx+ukernels (float32-only for now) (#10193)

    Other types than `float32` are blocked on vmvx ukernels support for those (#9903).  I'm interested in landing float32 support early because the path to supporting other data types goes through breaking changes in the existing vmvx ukernel interface for matmul (limiting the generality of the BLAS-inspired interface, particularly the `alpha` and `beta` parameters) so I want to have e2e tests in place at the start of that process.

commit 8863f9e
Author: MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com>
Date:   Thu Aug 25 12:02:11 2022 -0700

    Cherry pick llvm/llvm-project@71604f4 (#10207)

    Fixes #10194

commit 22e6bd4
Author: bjacob <benoitjacob@google.com>
Date:   Thu Aug 25 15:00:18 2022 -0400

    try to be compatible with more pyyaml versions (#10206)

commit da6829d
Author: Scott Todd <scotttodd@google.com>
Date:   Thu Aug 25 11:36:04 2022 -0700

    Replace dedicated host_tools CI job with superset build_all. (#10195)

    Relates to #9855

    These builds shared the same options but just built different targets. Just building the tools _is_ faster than building the tools and tests, but not by enough to justify having a separate job. The build_host_tools.sh script is still referenced by some samples, so I think it's worth keeping for a bit.

    * Spell out `build-dir-gcs-artifact` and `binaries-gcs-artifact` to match other output names
    * Remove host_tools.yml
    * Replace host_tools_assertions with build_all. Note that this uses GCS instead of upload-artifact/download-artifact for transferring archives between jobs
    * Sort jobs in `needs:` so the summary graph groups jobs as expected

    Note: `${BUILD_DIR}/install` is implicit. It could be made explicit with more plumbing.

    Co-authored-by: Geoffrey Martin-Noble <gcmn@google.com>

commit 38e718e
Author: bjacob <benoitjacob@google.com>
Date:   Thu Aug 25 13:32:55 2022 -0400

    Fix printing of matrices on test failure: was overflowing (#10202)

commit d8cabf7
Author: Kevin Gleason <gleasonk@google.com>
Date:   Thu Aug 25 12:20:52 2022 -0400

    Allow blank issues to be created (#10197)

    Currently clicking the "Blank Issue" button loops you back to the issue choose page because blank issues are disabled.

    When disabled, the following redirect is in place:
    https://github.com/iree-org/iree/issues/new  --> https://github.com/iree-org/iree/issues/new/choose

    Background: I based the StableHLO issues config off this file, and noticed that the blank issues are not working on that repo because they are disabled. Flipping this boolean did the trick in openxla/stablehlo.

commit 579d527
Author: Matthias Springer <springerm@google.com>
Date:   Thu Aug 25 09:37:22 2022 +0200

    Add CPU matmul benchmark test (#10174)

    This test illustrates how a simple matmul example can be compiled with
    the transform dialect and then benchmarked. Parameter search will use
    the commands that are used in this test.

commit 85171e9
Author: Lei Zhang <antiagainst@google.com>
Date:   Wed Aug 24 21:30:51 2022 -0400

    Cherry-pick MHLO dependency fix to fix release (#10198)

commit 1adcebb
Merge: 8301a5c 7fe1437
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Wed Aug 24 18:14:42 2022 -0700

    Merge pull request #10181 from iree-org/benvanik-execute-commands

    Secondary command buffers can now be executed from primary command buffers via iree_hal_command_buffer_execute_commands. During recording of nested command buffers push descriptors can indirectly reference slots in a binding table provided with each execution request. This enables the same reusable command buffer to be executed many times with unique bindings (even with prior execution in-flight), which is a common pattern with queue-ordered allocations.

    In the future we could allow the indirect bindings on primary command buffers as well but that requires more work in each backend to support and for now making it nested-only lets us turn on the feature incrementally. For now nothing supports either nested or indirect bindings so this is pure plumbing.

    The compiler has the HAL ops modeled but nothing is lowering into them yet; a pass that memoizes portions of streams and sets up the indirect binding references is required.

    Progress on #10144.
    Bumps bytecode version due to HAL changes.
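The indirect-binding idea can be illustrated with a toy model (conceptual only, not the `iree_hal_command_buffer_execute_commands` API): the command buffer records binding-table *slots* once, and each execution resolves them against a fresh table:

```python
class ReusableCommandBuffer:
    """Toy model of a nested command buffer with indirect bindings."""

    def __init__(self):
        self._commands = []  # recorded once, replayed many times

    def record_dispatch(self, entry_point, slot):
        # A push descriptor referencing a binding-table slot, not a buffer.
        self._commands.append((entry_point, slot))

    def execute(self, binding_table):
        # Resolve slot references against this submission's binding table.
        return [(entry, binding_table[slot]) for entry, slot in self._commands]

cb = ReusableCommandBuffer()
cb.record_dispatch("matmul", slot=0)
# The same recorded commands run against different buffers per submission,
# even with a prior execution still in flight.
print(cb.execute(["bufferA"]))
print(cb.execute(["bufferB"]))
```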

commit 7fe1437
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Tue Aug 23 22:10:56 2022 -0700

    Disabling ASAN fully_connected.mlir test due to swiftshader issue.
    Same behavior as the other excluded tests from #5715.

commit dd93b3c
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Tue Aug 23 16:28:16 2022 -0700

    Bumping bytecode version due to breaking HAL changes.

commit 9bd7031
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Tue Aug 23 10:44:30 2022 -0700

    Plumbing support for nested command buffers and binding tables.
    Secondary command buffers can now be executed from primary command
    buffers via iree_hal_command_buffer_execute_commands. During recording
    of nested command buffers push descriptors can indirectly reference
    slots in a binding table provided with each execution request. This
    enables the same reusable command buffer to be executed many times
    with unique bindings (even with prior execution in-flight), which is
    a common pattern with queue-ordered allocations.

    In the future we could allow the indirect bindings on primary command
    buffers as well but that requires more work in each backend to support
    and for now making it nested-only lets us turn on the feature
    incrementally.

    The compiler has the HAL ops modeled but nothing is lowering into them
    yet; a pass that memoizes portions of streams and sets up the indirect
    binding references is required.

    Progress on #10144.

commit 8301a5c
Author: Scott Todd <scotttodd@google.com>
Date:   Wed Aug 24 16:39:27 2022 -0700

    Rework build_benchmarks to reuse already built host tools. (#10190)

    This should address #4662 (comment). This workflow is currently our slowest, taking ~32 minutes (half of which is spent rebuilding `iree-compile`, and that's 30 minutes _after_ blocking on the 20-minute build_tf_integrations job).

    New timing is ~20 minutes (saving 10 minutes): https://github.com/iree-org/iree/runs/8004780350?check_suite_focus=true

commit cca2ff6
Author: bjacob <benoitjacob@google.com>
Date:   Wed Aug 24 18:46:11 2022 -0400

    Handle rank-reducing subviews in ResolveBufferDescriptors (#10192)

commit d9e6eb7
Author: CindyLiu <hcindyl@google.com>
Date:   Wed Aug 24 10:54:06 2022 -0700

    Update the candidate commitish value with the last green commit (#10183)

    Make it consistent with the rest of the release steps.

commit 1d55c6c
Author: Thomas <thomasraoux@google.com>
Date:   Wed Aug 24 10:06:44 2022 -0700

    clean up workaround after upstream fix (#10188)

commit c9e9482
Author: MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com>
Date:   Wed Aug 24 08:20:05 2022 -0700

    Cherry-pick llvm/llvm-project@a7bfdc2 (#10150)

commit 00d34d1
Author: MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com>
Date:   Wed Aug 24 07:59:13 2022 -0700

    NFC: Refactoring to make extending fusion heuristics in dispatch formation easier. (#10187)

    Minor refactoring to allow for extending fusion heuristics for fusing
    root with producers.

commit 3c69ea9
Author: Jakub Kuderski <kubak@google.com>
Date:   Wed Aug 24 10:31:08 2022 -0400

    [iree-run-module] Do not abort when `Run` fails. (#10186)

commit 63d4693
Author: Jakub Kuderski <kubak@google.com>
Date:   Wed Aug 24 10:30:50 2022 -0400

    [iree-run-module] Clarify how to pass scalar inputs. NFC. (#10185)

    Be more explicit and provide an example.

commit 2ec165b
Author: Lei Zhang <antiagainst@google.com>
Date:   Wed Aug 24 00:33:04 2022 -0400

    Integrate llvm/llvm-project@4332b049edf6 (#10180)

    * Reset third_party/llvm-project: 4332b049edf6ccf98c9e31dcc983760a89f01d40 (2022-08-23 17:37:12 +0800): [docs] Add examples for printing asynchronous stack for coroutines
    * Updated tensorflow/tensorflow@55791c2
    * Updated tensorflow/mlir-hlo@184a76a
    * Fixed mhlo/chlo enum split.

commit ae72b95
Author: CindyLiu <hcindyl@google.com>
Date:   Tue Aug 23 15:26:08 2022 -0700

    Add llvm static library linker test targets (#10149)

    * Add llvm static library linker test targets

    Add a cmake function to build/test llvm static library modules with
    the llvm-cpu compiler target backend, executed using the
    local-sync runtime HAL driver. The executable is linked
    to a simple runtime runner generated from a template.

    Add simple e2e mlir linker tests in `tests/e2e/models`.

commit e4dc88c
Author: Rob Suderman <suderman@google.com>
Date:   Tue Aug 23 10:54:02 2022 -0700

    Update flex ops test for the TFLite front-end test (#10164)

commit 57ec69d
Author: Thomas <thomasraoux@google.com>
Date:   Tue Aug 23 09:01:23 2022 -0700

    [LLVMGPU] Start transitioning to scf.foreach for second level tiling (#10166)

    This will allow doing distribution at the tensor level.

commit bd33104
Merge: 35d28b9 d1ca241
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Tue Aug 23 08:55:34 2022 -0700

    Merge pull request #10170 from iree-org/benvanik-pipeline-layout-3

    Replacing descriptor set layout usage with a flag bitfield.
    Descriptor sets are only used in layouts and the usage is now always push-only today. As we support things like binding tables
    we may want to indicate which bindings may come from tables and if we want to carry access information (which bindings are read-only, etc) we'll need somewhere for that too: instead of having 4 enums with 2 options each we'll just mash them together for now.

    This also adds a per-descriptor flag that can be used for indicating binding behavior. Today it's got a bit indicating whether the particular descriptor is read-only but we could extend it to support caching behavior (non-temporal, atomics, etc).

    The upstream bitfield enum has some glitchy behavior with lowercase strings (hardcoded to look for "None" instead of "none", etc) - I have a refresh of the HAL dialect to do at some point and will normalize things then.

    Progress on #10144.
    VMFB version bumped because of breaking type/export name change.

commit 35d28b9
Author: Matthias Springer <springerm@google.com>
Date:   Tue Aug 23 15:31:15 2022 +0200

    Support multiple target ops in clone_succeeding_op_into_dispatch_region (#10035)

    The target ops are sorted topologically before cloning them one-by-one.
    This ensures that there are no dominance violations.

commit b5bf9d5
Author: Matthias Springer <springerm@google.com>
Date:   Tue Aug 23 14:31:02 2022 +0200

    Add clone_succeeding_op_into_dispatch_region transform op (#10022)

    This op is symmetric to `clone_preceding_op_into_dispatch_region` and
    can be used to build heuristics for dispatch region formation.

commit 7e8c831
Author: Matthias Springer <springerm@google.com>
Date:   Tue Aug 23 11:56:07 2022 +0200

    Support multiple target ops in clone_preceding_op_into_dispatch_region (#10020)

    The target ops are sorted topologically before cloning them one-by-one.
    This ensures that there are no dominance violations.

commit dc06d95
Author: Han-Chung Wang <hanchung@google.com>
Date:   Tue Aug 23 14:26:26 2022 +0800

    [NFC] Remove outdated method arguments from KernelConfig. (#10165)

    The distribution tiling was done at flow level, and it's moved to a
    stage after setting kernel configurations. We no longer need the
    tiledLoop information when setting configurations.

    Also apply minor cleanups when revisiting the file -- use `.empty()`
    method instead of `.size() > 0`.

commit d1ca241
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Mon Aug 22 21:42:35 2022 -0700

    Bumping bytecode version due to breaking HAL changes.

commit a4da601
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Mon Aug 22 15:50:25 2022 -0700

    Replacing descriptor set layout usage with a flag bitfield.
    Descriptor sets are only used in layouts and the usage is now
    always push-only today. As we support things like binding tables
    we may want to indicate which bindings may come from tables and
    if we want to carry access information (which bindings are read-only,
    etc) we'll need somewhere for that too: instead of having 4 enums
    with 2 options each we'll just mash them together for now.

    This also adds a per-descriptor flag that can be used for indicating
    binding behavior. Today it's got a placeholder read-only value but we
    can add more in the future controlling cache behavior and such.

    Progress on #10144.

commit 88795f5
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Mon Aug 22 16:22:05 2022 -0700

    Fixing deprecation warnings on mlir::OptionalParseResult.

commit d86e3a7
Author: Scott Todd <scotttodd@google.com>
Date:   Mon Aug 22 18:45:18 2022 -0700

    Remove ArithmeticExpandOpsPass from SPIRV and VMVX lowerings. (#10162)

    Based on discussion at #10142 (comment). This "fixes" one case of `spv.IsNan` ops getting introduced while lowering `arith.minf`, but it does not generally address NaNs coming from other sources (user-space or internal to the compiler).

    ## Rationale

    The `ArithmeticExpandOpsPass` pass (declaration [here](https://github.com/llvm/llvm-project/blob/af29db64b2c7091070dd623c81872559657e7b3d/mlir/include/mlir/Dialect/Arithmetic/Transforms/Passes.td#L31-L34) and [here](https://github.com/llvm/llvm-project/blob/af29db64b2c7091070dd623c81872559657e7b3d/mlir/include/mlir/Dialect/Arithmetic/Transforms/Passes.h#L23-L24)) is overly specific to a particular lowering to LLVM. The `minf` and `maxf` lowerings in particular generate IR like
    ```mlir
      %8 = arith.cmpf ult, %7, %5 : vector<1x5xf32>
      %9 = arith.select %8, %7, %5 : vector<1x5xi1>, vector<1x5xf32>
      %10 = arith.cmpf uno, %5, %5 : vector<1x5xf32>
      %11 = arith.select %10, %5, %9 : vector<1x5xi1>, vector<1x5xf32>
    ```
    rather than tunnel down to intrinsics like [`llvm.minnum`](https://llvm.org/docs/LangRef.html#llvm-minnum-intrinsic). Digging through the history a bit, I see where the min/max ops were added in https://reviews.llvm.org/D110540, which carries forward some rationale for using `select` to implement min/max.

    For our uses, quoting @benvanik ,
    > Yeah, that cmp/select/cmp/select dance is really bad as IIRC LLVM/other backends can't/don't practically ever simplify that again while retaining the same semantics. The behavior that nearly everything uses is "return the non-nan value if either value is nan" (GLSL min, OpenCL fminf, C/C++ fminf, CUDA fminf, numpy.fmin, AVX minps, etc), aka "between a NaN and a numeric value, the numeric value is chosen". We need to make sure that if that's the intent of the model (which I hope it is, as it's the only thing that makes sense) we can propagate that all the way to backends. There's some ISAs that do weird things but it'd be better to pay the cost there rather than everywhere like we do today.

    So this PR removes the `ArithmeticExpandOpsPass` from our SPIRV and VMVX lowerings, allowing us to lower min/max/ceil/floor directly from `arith` to the backend dialects (e.g. `spv.GL.FMin`). The LLVM-based backends would need direct lowerings implemented for us to drop the pass there too (e.g. I see errors like `error: failed to legalize operation 'arith.maxf' that was explicitly marked illegal` if I remove it from the LLVMGPU pipeline used for CUDA).

commit 8ea0009
Author: Geoffrey Martin-Noble <gcmn@google.com>
Date:   Mon Aug 22 18:19:14 2022 -0700

    Add a script for deploying to PyPi (#10169)

    The old Python script just downloaded the release artifacts, which can
    be accomplished with the GitHub CLI. We need to repair the wheels for
    reasons that aren't quite clear (and this step should probably be moved
    to the release if we can't fix it directly), but this works for now.

    skip-ci

    Tested:
    Deployed a release to PyPi with this script.

      > View at:
      > https://pypi.org/project/iree-tools-tf/20220811.232/
      > https://pypi.org/project/iree-runtime-instrumented/20220811.232/
      > https://pypi.org/project/iree-tools-tflite/20220811.232/
      > https://pypi.org/project/iree-tools-xla/20220811.232/
      > https://pypi.org/project/iree-compiler/20220811.232/
      > https://pypi.org/project/iree-runtime/20220811.232/

commit c0fd1dc
Author: Jerry Wu <cheyuw@google.com>
Date:   Mon Aug 22 17:52:46 2022 -0700

    Define some IREE benchmarks as an example (#10115)

    Co-authored-by: Geoffrey Martin-Noble <gcmn@google.com>

commit 0ee5c15
Author: Han-Chung Wang <hanchung@google.com>
Date:   Tue Aug 23 07:39:30 2022 +0800

    Fix tests for midair collision. (#10163)

commit ef27692
Author: Lei Zhang <antiagainst@google.com>
Date:   Mon Aug 22 18:02:38 2022 -0400

    Integrate llvm/llvm-project@72136d8ba266 (#10159)

    * Reset third_party/llvm-project: 72136d8ba266eea6ce30fbc0e521c7b01a13b378 (2022-08-19 21:02:07 +0700): [Test] Add test for miscompile described in PR57247
    * Update third_party/mlir-hlo to 5e324a40db4aa956f7cbf24e9417557776e7a84f
    * Update tensorflow to 8a7764be0d32a72ad6d93ff3216520af184e26a0
    * Renamed `Confined` to `ConfinedAttr`
    * Updated `flow.dispatch.tensor.{load|store}` op assembly to use `custom<DynamicIndexList>`
    * Updated `operand_segment_sizes` to `DenseI32ArrayAttr`

commit d4ba930
Author: Han-Chung Wang <hanchung@google.com>
Date:   Tue Aug 23 05:26:35 2022 +0800

    Add a verifier and tuning examples for CPU convolution codegen. (#10147)

commit 3263ccd
Merge: c234161 b902d33
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Mon Aug 22 14:23:29 2022 -0700

    Merge pull request #10158 from iree-org/benvanik-pipeline-layout-2

    [NFC] Renaming "executable layout" to "pipeline layout".

commit b902d33
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Mon Aug 22 13:35:57 2022 -0700

    Bumping vmfb version due to break from renaming !hal.executable_layout.

commit b6afa47
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Mon Aug 22 13:35:00 2022 -0700

    Renaming `!hal.executable_layout` to `!hal.pipeline_layout`
    And similarly the runtime side to `iree_hal_pipeline_layout`.

    Progress on #10144.

commit 347660c
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Mon Aug 22 11:27:44 2022 -0700

    Starting rename of executable_layout -> pipeline_layout.

    Progress on #10144.

commit c234161
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Mon Aug 22 12:50:50 2022 -0700

    [NFC] Merging descriptor_set_layout.h into executable_layout.h. (#10154)

    Now that the layouts are only used together keeping them in the same
    place will make it easier to see how they fit and make them easier to
    refactor.

    Progress on #10144.

commit 8775cfe
Author: bjacob <benoitjacob@google.com>
Date:   Mon Aug 22 15:04:26 2022 -0400

    Script improvements (#10136)

    Post-merge review comments from #10132.

commit 1750213
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Mon Aug 22 10:40:28 2022 -0700

    Removing !hal.descriptor_set/iree_hal_descriptor_set_t. (#10146)

    It was never fully implemented and the combination of push descriptors
    and upcoming binding tables should be sufficient for our uses.

    Not a breaking change as the compiler had never emitted code using them.
    Progress on #10144.

commit e33c64c
Author: Thomas <thomasraoux@google.com>
Date:   Mon Aug 22 08:51:08 2022 -0700

    Cherry-pick mlir fix in linalg tiling (#10153)

    cherry-pick commit 06c02d5dbb13f6d2a10eaa75c236f3c61cdf5b91

commit 9b092fb
Author: Marius Brehler <marius.brehler@iml.fraunhofer.de>
Date:   Mon Aug 22 17:27:11 2022 +0200

    Don't explicitly set MLIR_PDLL_TABLEGEN_EXE (#10151)

    With llvm/llvm-project@91b6f76, the variable `MLIR_PDLL_TABLEGEN_EXE` is
    set as a cache variable in MLIR upstream.

commit 52e8625
Author: Han-Chung Wang <hanchung@google.com>
Date:   Sat Aug 20 08:03:19 2022 +0800

    Update default tiling sizes for ARM convolution configurations. (#10086)

    This is the first round of tuning for ARM normal convolution codegen. The parameters are derived from experiments for 3x3 kernel cases.

    Benchmark file:

    ```mlir
    util.global private @"__iree_flow_lhs" {noinline} = dense<1.0> : tensor<1x51x41x512xf32>
    util.global private @"__iree_flow_rhs" {noinline} = dense<1.0> : tensor<3x3x512x512xf32>
    func.func @conv_3x3filter() -> tensor<1x25x20x512xf32> {
      %lhs_ptr = util.global.address @"__iree_flow_lhs" : !util.ptr<tensor<1x51x41x512xf32>>
      %rhs_ptr = util.global.address @"__iree_flow_rhs" : !util.ptr<tensor<3x3x512x512xf32>>
      %lhs = util.global.load.indirect %lhs_ptr : !util.ptr<tensor<1x51x41x512xf32>> -> tensor<1x51x41x512xf32>
      %rhs = util.global.load.indirect %rhs_ptr : !util.ptr<tensor<3x3x512x512xf32>> -> tensor<3x3x512x512xf32>

      %cst = arith.constant 0.000000e+00 : f32
      %2 = linalg.init_tensor [1, 25, 20, 512] : tensor<1x25x20x512xf32>
      %3 = linalg.fill ins(%cst : f32) outs(%2 : tensor<1x25x20x512xf32>) -> tensor<1x25x20x512xf32>
      %4 = linalg.conv_2d_nhwc_hwcf
        { dilations = dense<1> : tensor<2xi64>, strides = dense<2> : tensor<2xi64>}
        ins(%lhs, %rhs : tensor<1x51x41x512xf32>, tensor<3x3x512x512xf32>)
        outs(%3 : tensor<1x25x20x512xf32>) -> tensor<1x25x20x512xf32>
      return %4 : tensor<1x25x20x512xf32>
    }
    ```

    Before:

    ```
    # 1-threaded, taskset 80
    -----------------------------------------------------------------------------------
    Benchmark                                         Time             CPU   Iterations
    -----------------------------------------------------------------------------------
    BM_conv_3x3filter/process_time/real_time       1164 ms         1126 ms            1

    # 4-threaded, taskset f0
    -----------------------------------------------------------------------------------
    Benchmark                                         Time             CPU   Iterations
    -----------------------------------------------------------------------------------
    BM_conv_3x3filter/process_time/real_time        643 ms         1764 ms            1
    ```

    After:

    ```
    # 1-threaded, taskset 80
    -----------------------------------------------------------------------------------
    Benchmark                                         Time             CPU   Iterations
    -----------------------------------------------------------------------------------
    BM_conv_3x3filter/process_time/real_time        160 ms          155 ms            4

    # 4-threaded, taskset f0
    -----------------------------------------------------------------------------------
    Benchmark                                         Time             CPU   Iterations
    -----------------------------------------------------------------------------------
    BM_conv_3x3filter/process_time/real_time       65.6 ms          160 ms            9
    ```

commit 42244e7
Author: Stella Laurenzo <laurenzo@google.com>
Date:   Fri Aug 19 16:20:43 2022 -0700

    NFC: Convert util transforms to declarative registration. (#10143)

commit 979d6ea
Author: Thomas <thomasraoux@google.com>
Date:   Fri Aug 19 12:31:23 2022 -0700

    Integrate llvm-project and bump dependencies. (#10140)

    * llvm-project: 619fd8c2ab505d8f79cbbbe3fd09b02f6640e1b1
    * mlir-hlo: cb55a7168c1841d05287677746a39a5de7cb855f
    * tensorflow: fc4021a8dd654606cd95e61a033691157853e122

    Additional changes:
    * Rename member functions for tensor ops
    * Remove reluN tosa tests
    * Carry patches for llvm and mhlo

commit cb0f8d4
Merge: e8ea103 65a9beb
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Fri Aug 19 11:40:59 2022 -0700

    Merge pull request #10141 from iree-org/benvanik-queue-barrier

    Adding iree_hal_device_queue_barrier helper and fixing pool enum.

commit 65a9beb
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Fri Aug 19 10:40:41 2022 -0700

    Changing iree_hal_allocator_pool_id_t to iree_hal_allocator_pool_t.
    I originally intended this to be a bitfield but forgot when plumbing.

commit 4c84f4a
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Fri Aug 19 10:28:36 2022 -0700

    Adding iree_hal_device_queue_barrier helper.

commit e8ea103
Author: Thomas <thomasraoux@google.com>
Date:   Fri Aug 19 03:54:33 2022 -0700

    [LLVMGPU] Add barriers when bufferization inserts shared memory copy (#10137)

    This is a conservative solution to avoid having race conditions when
    bufferization decides to emit shared memory copies.
@GMNGeoffrey

This has been fixed and we are no longer excluding specific tests. We run Vulkan tests without LeakSanitizer because SwiftShader itself fails LSan.
