[lit] Add a flag to disable lit time tests #98270
Conversation
@llvm/pr-subscribers-testing-tools @llvm/pr-subscribers-libcxx

Author: Vincent Lee (thevinster)

Changes: LLVM lit assumes control of the test parallelism when running a test suite. This style of testing doesn't play nicely with build systems like Buck or Bazel, which prefer finer-grained actions at a per-test level. In order for external build systems to control the test parallelism, add an option to disable test time recording. An additional change has been made to refactor the original `--time-tests` flag.

Full diff: https://github.com/llvm/llvm-project/pull/98270.diff

8 Files Affected:
diff --git a/.ci/monolithic-linux.sh b/.ci/monolithic-linux.sh
index b78dc59432b65..04395621a8f27 100755
--- a/.ci/monolithic-linux.sh
+++ b/.ci/monolithic-linux.sh
@@ -47,7 +47,7 @@ cmake -S "${MONOREPO_ROOT}"/llvm -B "${BUILD_DIR}" \
-D LLVM_ENABLE_ASSERTIONS=ON \
-D LLVM_BUILD_EXAMPLES=ON \
-D COMPILER_RT_BUILD_LIBFUZZER=OFF \
- -D LLVM_LIT_ARGS="-v --xunit-xml-output ${BUILD_DIR}/test-results.xml --timeout=1200 --time-tests" \
+ -D LLVM_LIT_ARGS="-v --xunit-xml-output ${BUILD_DIR}/test-results.xml --timeout=1200 --time-tests-histogram" \
-D LLVM_ENABLE_LLD=ON \
-D CMAKE_CXX_FLAGS=-gmlt \
-D LLVM_CCACHE_BUILD=ON \
@@ -124,6 +124,6 @@ if [[ "${runtimes}" != "" ]]; then
-D LIBCXXABI_TEST_PARAMS="enable_modules=clang"
echo "--- ninja runtimes clang modules"
-
+
ninja -vC "${RUNTIMES_BUILD_DIR}" ${runtime_targets}
fi
diff --git a/.ci/monolithic-windows.sh b/.ci/monolithic-windows.sh
index 91e719c52d436..b64856af044be 100755
--- a/.ci/monolithic-windows.sh
+++ b/.ci/monolithic-windows.sh
@@ -53,7 +53,7 @@ cmake -S "${MONOREPO_ROOT}"/llvm -B "${BUILD_DIR}" \
-D LLVM_ENABLE_ASSERTIONS=ON \
-D LLVM_BUILD_EXAMPLES=ON \
-D COMPILER_RT_BUILD_LIBFUZZER=OFF \
- -D LLVM_LIT_ARGS="-v --xunit-xml-output ${BUILD_DIR}/test-results.xml --timeout=1200 --time-tests" \
+ -D LLVM_LIT_ARGS="-v --xunit-xml-output ${BUILD_DIR}/test-results.xml --timeout=1200 --time-tests-histogram" \
-D COMPILER_RT_BUILD_ORC=OFF \
-D CMAKE_C_COMPILER_LAUNCHER=sccache \
-D CMAKE_CXX_COMPILER_LAUNCHER=sccache \
diff --git a/libcxx/utils/ci/run-buildbot b/libcxx/utils/ci/run-buildbot
index f1c20b9d72190..a59c5930b5252 100755
--- a/libcxx/utils/ci/run-buildbot
+++ b/libcxx/utils/ci/run-buildbot
@@ -119,7 +119,7 @@ function generate-cmake-base() {
-DLIBCXX_ENABLE_WERROR=YES \
-DLIBCXXABI_ENABLE_WERROR=YES \
-DLIBUNWIND_ENABLE_WERROR=YES \
- -DLLVM_LIT_ARGS="-sv --xunit-xml-output test-results.xml --timeout=1500 --time-tests" \
+ -DLLVM_LIT_ARGS="-sv --xunit-xml-output test-results.xml --timeout=1500 --time-tests-histogram" \
"${@}"
}
@@ -381,7 +381,7 @@ bootstrapping-build)
-DLLVM_TARGETS_TO_BUILD="host" \
-DRUNTIMES_BUILD_ALLOW_DARWIN=ON \
-DLLVM_ENABLE_ASSERTIONS=ON \
- -DLLVM_LIT_ARGS="-sv --xunit-xml-output test-results.xml --timeout=1500 --time-tests"
+ -DLLVM_LIT_ARGS="-sv --xunit-xml-output test-results.xml --timeout=1500 --time-tests-histogram"
echo "+++ Running the LLDB libc++ data formatter tests"
${NINJA} -vC "${BUILD_DIR}" check-lldb-api-functionalities-data-formatter-data-formatter-stl-libcxx
diff --git a/llvm/docs/CommandGuide/lit.rst b/llvm/docs/CommandGuide/lit.rst
index 799ee34e9f9ff..c63f679e74a4b 100644
--- a/llvm/docs/CommandGuide/lit.rst
+++ b/llvm/docs/CommandGuide/lit.rst
@@ -151,7 +151,12 @@ EXECUTION OPTIONS
feature that can be used to conditionally disable (or expect failure in)
certain tests.
-.. option:: --time-tests
+.. option:: --no-time-tests
+
+ Disable tracking the wall time individual tests take to execute. This is useful
+ for external build systems to orchestrate the scheduled tests.
+
+.. option:: --time-tests-histogram
Track the wall time individual tests take to execute and includes the results
in the summary output. This is useful for determining which tests in a test
diff --git a/llvm/utils/lit/lit/cl_arguments.py b/llvm/utils/lit/lit/cl_arguments.py
index b9122d07afd8a..8df76ef4b8c82 100644
--- a/llvm/utils/lit/lit/cl_arguments.py
+++ b/llvm/utils/lit/lit/cl_arguments.py
@@ -154,11 +154,6 @@ def parse_args():
action="append",
default=[],
)
- execution_group.add_argument(
- "--time-tests",
- help="Track elapsed wall time for each test",
- action="store_true",
- )
execution_group.add_argument(
"--no-execute",
dest="noExecute",
@@ -209,6 +204,17 @@ def parse_args():
action="store_true",
help="Exit with status zero even if some tests fail",
)
+ execution_test_time_group = execution_group.add_mutually_exclusive_group()
+ execution_test_time_group.add_argument(
+ "--no-time-tests",
+ help="Do not track elapsed wall time for each test",
+ action="store_true",
+ )
+ execution_test_time_group.add_argument(
+ "--time-tests-histogram",
+ help="Track elapsed wall time for each test in a histogram",
+ action="store_true",
+ )
selection_group = parser.add_argument_group("Test Selection")
selection_group.add_argument(
diff --git a/llvm/utils/lit/lit/main.py b/llvm/utils/lit/lit/main.py
index db9f24f748d9e..8cc784dbf6471 100755
--- a/llvm/utils/lit/lit/main.py
+++ b/llvm/utils/lit/lit/main.py
@@ -124,13 +124,14 @@ def main(builtin_params={}):
run_tests(selected_tests, lit_config, opts, len(discovered_tests))
elapsed = time.time() - start
- record_test_times(selected_tests, lit_config)
+ if not opts.no_time_tests or opts.time_tests_histogram:
+ record_test_times(selected_tests, lit_config)
selected_tests, discovered_tests = GoogleTest.post_process_shard_results(
selected_tests, discovered_tests
)
- if opts.time_tests:
+ if opts.time_tests_histogram:
print_histogram(discovered_tests)
print_results(discovered_tests, elapsed, opts)
diff --git a/offload/cmake/OpenMPTesting.cmake b/offload/cmake/OpenMPTesting.cmake
index 11eafeb764260..fa767bb291d8a 100644
--- a/offload/cmake/OpenMPTesting.cmake
+++ b/offload/cmake/OpenMPTesting.cmake
@@ -58,7 +58,7 @@ if (${OPENMP_STANDALONE_BUILD})
set(DEFAULT_LIT_ARGS "${DEFAULT_LIT_ARGS} --no-progress-bar")
endif()
if (${CMAKE_SYSTEM_NAME} MATCHES "AIX")
- set(DEFAULT_LIT_ARGS "${DEFAULT_LIT_ARGS} --time-tests --timeout=1800")
+ set(DEFAULT_LIT_ARGS "${DEFAULT_LIT_ARGS} --time-tests-histogram --timeout=1800")
endif()
set(OPENMP_LIT_ARGS "${DEFAULT_LIT_ARGS}" CACHE STRING "Options for lit.")
separate_arguments(OPENMP_LIT_ARGS)
diff --git a/openmp/cmake/OpenMPTesting.cmake b/openmp/cmake/OpenMPTesting.cmake
index c67ad8b1cbd9c..fb58cad04c019 100644
--- a/openmp/cmake/OpenMPTesting.cmake
+++ b/openmp/cmake/OpenMPTesting.cmake
@@ -58,7 +58,7 @@ if (${OPENMP_STANDALONE_BUILD})
set(DEFAULT_LIT_ARGS "${DEFAULT_LIT_ARGS} --no-progress-bar")
endif()
if (${CMAKE_SYSTEM_NAME} MATCHES "AIX")
- set(DEFAULT_LIT_ARGS "${DEFAULT_LIT_ARGS} --time-tests --timeout=3000")
+ set(DEFAULT_LIT_ARGS "${DEFAULT_LIT_ARGS} --time-tests-histogram --timeout=3000")
endif()
set(OPENMP_LIT_ARGS "${DEFAULT_LIT_ARGS}" CACHE STRING "Options for lit.")
separate_arguments(OPENMP_LIT_ARGS)
I can't really evaluate whether the API change is good or bad. However, we should document this change in the release notes to avoid surprising people with an API break.
I think this looks good (in that it does what it sets out to do), but I just want to make sure I understand the motivation properly.
By default, lit records the time each individual test takes in a file. If that file exists, it runs the tests in order from slowest to fastest, to avoid a long tail. I think the bit I'm uncertain about is how this interferes with what other build systems do? From what you've written, I believe you're saying that other build systems run subsets of the tests independently of other subsets; because these subsets are running at the same time as each other, multiple lit invocations potentially attempt to write to the same file at the same time, which obviously isn't going to go well. Am I correct about this?
Would the "ordered by time taken" feature still be useful for these other systems (ignoring for a moment that this is controlled via the test times file)? If the feature would still be useful, I'm thinking that this PR is actually not really solving the heart of the problem, namely that running multiple lit invocations at the same time won't go well. Indeed, arguably the external build systems aren't even that relevant here: I've been known in the past to run lit in two different LLVM test directories myself at the same time for various reasons. Thoughts?
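For reference, the "ordered by time taken" behaviour being discussed can be sketched roughly like this (a hypothetical illustration, not lit's actual code; the function name and the rule for unknown tests are assumptions):

```python
# Hypothetical sketch of slowest-first scheduling from a recorded-times map.
# Tests with no recorded time sort first, on the assumption they might be slow.
def order_by_recorded_time(tests, recorded):
    return sorted(tests, key=lambda t: recorded.get(t, float("inf")), reverse=True)

recorded = {"fast.ll": 0.1, "mid.ll": 3.5, "slow.ll": 42.0}
print(order_by_recorded_time(["fast.ll", "new.ll", "mid.ll", "slow.ll"], recorded))
# → ['new.ll', 'slow.ll', 'mid.ll', 'fast.ll']
```

Running the slowest tests first keeps the tail of the run filled with short tests, which is exactly the benefit at stake if the times file is disabled.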
You can control the concurrency for lit by specifying the number of workers. Is that not sufficient? Can't you implement what you want without breaking the API? Just leave out the flag you don't want?
Good idea. If this is the direction taken, then I'll update the release notes.
I'm going to use Buck here as the example, since that's what I'm using. The way we've modeled it is to have Buck take control over each test file (instead of a single test suite). This means that within a given test suite, things can run in parallel, which creates a race condition when writing to `.lit_test_times.txt`.
Yep, that's correct.
It's less of a concern, since parallelizing all the tests within a test suite already achieves the theoretical maximum. At the moment, there isn't any reason to need to know which test takes the longest, since that's not a bottleneck in any situation (when looking at overall testing times compared to build times).
IIUC, you're suggesting implementing some sharding mechanism within a test suite? If so, that would also work, but here I'm choosing the path of least resistance and disabling it, since I don't see any reason for needing it at the moment.
You can control the concurrency, but that's in the context of lit being the orchestrator. The goal here is to have external build systems be aware of each test instead of a single test suite (which deviates from the model of how lit behaves). The breaking change was meant to make things more readable. I can avoid breaking changes by just introducing a new flag instead. I have no strong opinions on either approach; it depends on what the community agrees is the best path forward.
Thanks for the example, that helped me confirm my thoughts. Orthogonal to the real issue here, but have you considered the additional overhead of spinning up lit per test? I would expect (but it's always possible I'm wrong) that spinning up lit 10 times to run 10 tests individually would be slower than spinning it up once to run them all.
Not exactly. The issue I'm illustrating is that the test times file is already unsafe in the face of concurrent usage of the same test suite (even when the runs don't overlap), so perhaps we need to revisit how that file is written, rather than work around it. One random idea that I haven't really given much thought to, in terms of how effective it would be (or even whether it would actually work), would be to avoid a single centralised file and instead have a file per test; these would be read in following test discovery to form a combined database of sorts for determining order within that set of tests. In this way, there's only a risk of concurrency issues if the same test is run multiple times simultaneously, but that's already a limitation in lit anyway. If I'm not mistaken, this solves the issue both for your use case and for the more general case I've highlighted. Does this make sense? Of course, other implementations might be better.
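The per-test file idea could look something like this minimal sketch (hypothetical; the file naming scheme and merge step are assumptions, not an actual lit implementation):

```python
import os
import tempfile

# Hypothetical sketch: one small time file per test instead of a shared
# .lit_test_times.txt; after test discovery the files are merged into a
# combined map used for ordering. Concurrent suites then only contend
# when the very same test runs twice at once.
def record_time(root, test_name, seconds):
    with open(os.path.join(root, "." + test_name + ".time"), "w") as f:
        f.write(str(seconds))

def load_times(root, test_names):
    times = {}
    for name in test_names:
        path = os.path.join(root, "." + name + ".time")
        if os.path.exists(path):
            with open(path) as f:
                times[name] = float(f.read())
    return times

root = tempfile.mkdtemp()
record_time(root, "a.ll", 1.5)
record_time(root, "b.ll", 0.2)
print(load_times(root, ["a.ll", "b.ll", "missing.ll"]))
# → {'a.ll': 1.5, 'b.ll': 0.2}
```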
It's possible that lit itself could be slower by having to spin up N times for N tests, but the net result for us, using a distributed build system like Buck, should still be faster, since we distribute these tests across multiple machines and aren't limited by the cores on a single machine. If we had modeled our testing approach on lit's model of taking in a test suite, we would be sacrificing the flexibility of being able to run a single lit test (for those who may only want to debug a single test during development).
Yep, this would be a potential solution. The problem I see with this approach is the lit overhead of doing I/O just to write a timestamp for each file. To better illustrate: LLVM has ~60000 lit tests. Having to do I/O for ~60000 files might not scale and could end up further slowing down the current behavior (which wouldn't be ideal), whereas disabling it via a flag would maintain the status quo. I'm open to other potential solutions, but currently, disabling via a flag is the best compromise (even though it may seem like a workaround).
Coming back to the idea of breaking changes, I think that's better suited to another change (if the community agrees). I'll scope this change to just a new flag (renamed for better clarity).
Thanks for the explanation, it makes sense. I didn't realise you were distributing the lit tests across multiple devices.
Ultimately, this is a form of database, so we could look into things like SQL etc., but I suspect the overhead isn't worth it in this context; plus there'll always be the risk of contention accessing the database, which would be especially bad in your use case, since you'd be accessing it once per test, rather than once per lit test suite. Given the right concurrent database solution, it might work reasonably well (i.e. something with the ability to read/write in parallel as long as the different threads don't access the same database entry). However, I'm not a database expert, so I can't say whether such a system would be either fast or easily integrable into lit.

Side-note: on the assumption you don't have access to at least as many build threads/processes across all the machines as you have tests, you'll end up losing out on the benefit of the test times file by simply disabling it, i.e. you could end up with a long tail of a small number of very slow tests being executed right at the end of the test set.
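The long-tail effect in that side-note is easy to quantify with a toy scheduling simulation (illustrative only; the durations and worker count are made up):

```python
import heapq

# Greedy simulation: each test is assigned to whichever worker frees up first;
# the makespan is when the last worker finishes.
def makespan(durations, workers):
    finish = [0.0] * workers
    heapq.heapify(finish)
    for d in durations:
        heapq.heappush(finish, heapq.heappop(finish) + d)
    return max(finish)

tests = [10.0] + [1.0] * 8          # one slow test, eight fast ones
print(makespan(tests, 2))           # slowest first → 10.0
print(makespan(sorted(tests), 2))   # slowest last → 14.0
```

With two workers, scheduling the one slow test last stretches the total wall time from 10 to 14 units, which is exactly the tail the times file exists to avoid.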
You should update lit's own test suite to exercise the new option.
llvm/docs/CommandGuide/lit.rst (Outdated)

.. option:: --skip-test-recording

   Disable tracking the wall time individual tests take to execute. This allows
   external build systems to orchestrate the scheduled tests.
I feel like the option name should have "time" in the title somewhere, e.g. `--skip-test-time-recording`.
llvm/utils/lit/lit/main.py (Outdated)

@@ -124,7 +124,7 @@ def main(builtin_params={}):
     run_tests(selected_tests, lit_config, opts, len(discovered_tests))
     elapsed = time.time() - start

-    if not opts.skip_test_recording or opts.time_tests:
+    if not opts.skip_test_time_recording or opts.time_tests:
Should `--time-tests` really overwrite `--skip-test-time-recording` here?
The original idea was to prevent the case where both `--skip-test-time-recording` and `--time-tests` are passed. If it didn't overwrite, then we'd run into the situation where we wouldn't time the tests, but would still attempt to print the histogram.

Now that I think about it, the `--time-tests` check is redundant, since you cannot enable both `--time-tests` and `--skip-test-time-recording` as they are mutually exclusive args.
llvm/utils/lit/tests/time-tests.py (Outdated)

# RUN: FileCheck < %t.out %s
# RUN: rm %{inputs}/time-tests/.lit_test_times.txt

# CHECK-NOFILE: cannot access 'Inputs/time-tests/.lit_test_times.txt': No such file or directory
Is there some mechanism in place that guarantees English locale?
Could you explain what you mean in detail? Not sure I'm following here.
I'm not sure which command is supposed to output this line (rm, FileCheck, or the file redirect?). If this system output is localized, the check will fail if the locale is not set to something like LANG=en_US.
If you are referring to the `CHECK-NOFILE` line, it's being used on L5, where I verify that this file isn't generated by lit when using `--skip-test-time-recording`. I have the `rm` command to clean up that file after the lit run on L9. This test mirrors other lit tests in this directory, and they don't have any checks for locale independence. I'm not aware of anything here that would require it (everything I'm using seems to be standard commands).
Not locale related as such, but `No such file or directory` is platform-specific and this will fail on other platforms. In other parts of LLVM, we have %errc_... substitutions to allow specifying the message via the FileCheck command line. I don't know if those are available here?
> Not locale related as such, but No such file or directory is platform-specific and this will fail on other platforms.

Would a more reliable change be to verify that the `ls` command failed? I can't think of any other reason why `ls` would fail unless the file isn't found.

> In other parts of llvm, we have %errc_... substitutions to allow specifying the message via the FileCheck command-line.

I don't think we have that here. Lit's own test suite isn't as comprehensive as other test suites. We could add that functionality, but I'm not sure it's worth it for this one specific use case.
A few nits, otherwise LGTM. Would be worth giving it a bit of time for others to weigh in.
llvm/docs/CommandGuide/lit.rst (Outdated)

   Disable tracking the wall time individual tests take to execute. This allows
   external build systems to orchestrate the scheduled tests.
I'm not sure the second sentence is strictly necessary. In the future, other use-cases might come up and the sentence might become redundant or even false.
llvm/utils/lit/lit/cl_arguments.py (Outdated)

    )
    execution_test_time_group.add_argument(
        "--time-tests",
        help="Track elapsed wall time for each test in a histogram",
"in a histogram" doesn't feel right: the times aren't tracked in the histogram, they're printed in that histogram (and recorded in a file).
Clarified to say "printed in a histogram"
llvm/utils/lit/tests/time-tests.py (Outdated)

# RUN: %{lit-no-order-opt} --skip-test-time-recording %{inputs}/time-tests
# RUN: not ls %{inputs}/time-tests/.lit_test_times.txt

## Check that --time-tests generates a printed histogram
Suggested change:
- ## Check that --time-tests generates a printed histogram
+ ## Check that --time-tests generates a printed histogram.
LLVM lit assumes control of the test parallelism when running a test suite. This style of testing doesn't play nicely with build systems like Buck or Bazel, which prefer finer-grained actions at a per-test level. In order for external build systems to control the test parallelism, add an option to disable `.lit_test_times.txt` under the `--skip-test-time-recording` flag, thus allowing other build systems to determine the parallelism and avoid race conditions when writing to that file. I went for `--skip-test-time-recording` instead of `--time-tests` in order to preserve the original functionality of writing to `.lit_test_times.txt` as the default behavior, making it opt-in only for those who do _not_ want the `.lit_test_times.txt` file.
Differential Revision: https://phabricator.intern.facebook.com/D60251198
LLVM lit assumes control of the test parallelism when running a test suite. This style of testing doesn't play nicely with build systems like Buck or Bazel, which prefer finer-grained actions at a per-test level. In order for external build systems to control the test parallelism, add an option to disable `.lit_test_times.txt` under the `--no-time-tests` flag, thus allowing other build systems to determine the parallelism and avoid race conditions when writing to that file. I went for `--no-time-tests` instead of `--time-tests` in order to preserve the original functionality of writing to `.lit_test_times.txt` as the default behavior, making it opt-in only for those who do _not_ want the `.lit_test_times.txt` file.

An additional change has been made to refactor the original `--time-tests` flag to `--time-tests-histogram` to better reflect the original functionality. This is technically a breaking API change, since external users who build LLVM out-of-tree can fail, but I've made sure all the uses within LLVM have been switched to use the new flag.