Debug CI crash #5621

Closed
nvsekkin wants to merge 2 commits into isaac-sim:develop from nvsekkin:dev/esekkin/verbose-ci

@nvsekkin
Collaborator

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.
List any dependencies that are required for this change.

Fixes # (issue)

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (existing functionality will not work without user modification)
  • Documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

CI Debugging Changes Review

This PR adds diagnostic tooling to help debug CI crashes. The changes are straightforward and low-risk.

Summary of Changes

.github/actions/run-tests/action.yml

  • Adds PYTHONFAULTHANDLER=1 environment variable to the Docker test container

tools/conftest.py

  • Adds -s flag (disables output capture, shows print statements)
  • Adds -vv flag (extra verbose output)

Analysis

✅ What Works Well

  1. PYTHONFAULTHANDLER - Excellent choice for debugging segfaults and crashes. When enabled, Python will dump a traceback on fatal signals (SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL), which is invaluable for diagnosing low-level crashes in CI.

  2. Pytest verbosity flags - The -s and -vv flags will provide much more insight into test execution, particularly useful for identifying where crashes occur.
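
As a minimal sketch of what the environment variable does: setting PYTHONFAULTHANDLER=1 has the same effect as calling faulthandler.enable() at interpreter startup. The helper name below is hypothetical and not part of this PR; it only checks that a child interpreter started with the variable reports the handler as enabled.

```python
import os
import subprocess
import sys


def child_faulthandler_enabled(extra_env: dict[str, str]) -> bool:
    """Start a child interpreter and report whether faulthandler is active in it.

    Hypothetical helper for illustration; not part of tools/conftest.py.
    """
    env = os.environ.copy()
    env.pop("PYTHONFAULTHANDLER", None)  # start from a known state
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import faulthandler; print(faulthandler.is_enabled())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    print(child_faulthandler_enabled({"PYTHONFAULTHANDLER": "1"}))
```

With the handler enabled, a fatal signal in the child makes Python write the traceback to stderr before the process dies, so the runner needs to keep stderr to benefit from it.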

⚠️ Considerations

  1. Temporary vs Permanent: These flags significantly increase log verbosity. Consider whether this is intended as a temporary debugging measure or a permanent change. If temporary, consider adding a comment or creating a follow-up issue to revert once the crash is diagnosed.

  2. CI Log Size: The -s -vv flags will substantially increase CI log output. This is fine for debugging but may impact log readability and storage for routine CI runs if left permanently enabled.

  3. Performance: Minimal impact expected. PYTHONFAULTHANDLER has negligible overhead, and the pytest verbosity flags should not meaningfully affect test execution time, although writing much more output adds some I/O cost.

Verdict

The changes are safe to merge for debugging purposes. The implementation is correct and follows the existing patterns in the codebase.

Suggestion: If these changes are intended to be temporary, consider adding a TODO comment or creating a tracking issue to revert them once the CI crash is resolved.


Update (39a7a1e): New commit adds include-files: "test_sensor_base.py" to narrow test-isaaclab-core-2 to a single test file for SIGSEGV reproduction. This is a reasonable debugging approach to isolate the crash. Previous concerns about debug flags still apply once the issue is diagnosed.

@greptile-apps
Contributor

greptile-apps Bot commented May 14, 2026

Greptile Summary

This PR adds two diagnostic aids to help investigate CI crashes. PYTHONFAULTHANDLER=1 is injected as a Docker environment variable so that any Python process in the container emits a traceback on a hard crash (segfault, SIGABRT, etc.). The -s and -vv flags are added to the pytest subprocess invocation in conftest.py: -s disables pytest's output capture so test output goes straight to the subprocess pipe, and -vv increases reporting verbosity.

  • action.yml: Adds -e PYTHONFAULTHANDLER=1 to the Docker run env block. This complements the existing per-subprocess env["PYTHONFAULTHANDLER"] = "1" already set in conftest.py line 318, extending coverage to any Python process spawned outside run_individual_tests.
  • conftest.py: Inserts -s and -vv before --no-header in the pytest subprocess command. -s passes all test stdout directly to the pipe captured by capture_test_output_with_timeout, while -vv raises pytest's reporting verbosity (per-test result lines with full test IDs and fuller assertion diffs) to aid crash diagnosis.
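
To make the flag placement concrete, here is a rough sketch of how such a command and environment could be assembled. The function name and the use of `python -m pytest` are assumptions for illustration; the real command in run_individual_tests is not shown in this review and may differ.

```python
import os
import sys


def build_pytest_invocation(test_file: str) -> tuple[list[str], dict[str, str]]:
    """Return (command, environment) for running one test file in a subprocess.

    Hypothetical stand-in for the command built inside run_individual_tests.
    """
    cmd = [
        sys.executable,
        "-m",
        "pytest",
        "-s",   # disable pytest's output capture
        "-vv",  # more verbose reporting
        "--no-header",
        "--tb=short",
        test_file,
    ]
    env = os.environ.copy()
    # Per-subprocess setting described in the summary above.
    env["PYTHONFAULTHANDLER"] = "1"
    return cmd, env


if __name__ == "__main__":
    command, _ = build_pytest_invocation("test_sensor_base.py")
    print(" ".join(command))
```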

Confidence Score: 4/5

Safe to merge for its stated debugging purpose; no functional regressions introduced, though log verbosity increases substantially on every CI run going forward.

Both changes are additive diagnostic flags with no effect on test pass/fail outcomes. The main risk is -s routing all Isaac Sim subprocess output through an in-memory accumulation buffer in capture_test_output_with_timeout, which could balloon the runner process's memory on verbose tests. The flags also lack any comment signalling that they are temporary, meaning they may remain permanently on develop after the crash is resolved.

tools/conftest.py — the -s flag interacts with the in-memory pipe accumulation in capture_test_output_with_timeout

Important Files Changed

| Filename | Overview |
| --- | --- |
| .github/actions/run-tests/action.yml | Adds PYTHONFAULTHANDLER=1 Docker env var to emit Python crash tracebacks from any process in the container, complementing the existing per-subprocess setting in conftest.py |
| tools/conftest.py | Adds -s (no output capture) and -vv (extra verbose) pytest flags to run_individual_tests subprocess; increases diagnostic output but may produce very large logs for Isaac Sim tests |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Docker Container<br/>PYTHONFAULTHANDLER=1<br/>new"] --> B["conftest.py<br/>run_individual_tests"]
    B --> C["subprocess.Popen<br/>stdout=PIPE, stderr=PIPE"]
    C --> D["pytest subprocess<br/>-s -vv new<br/>--no-header<br/>--tb=short"]
    D -->|stdout/stderr ALL output unbuffered| E["capture_test_output_with_timeout<br/>accumulates stdout_data in memory"]
    E -->|crash / segfault| F[PYTHONFAULTHANDLER traceback]
    E -->|normal exit| G[JUnit XML + log output]
    E -->|timeout / hang| H[kill + system diagnostics]

Reviews (1): Last reviewed commit: "Add PYTHONFAULTHANDLER"

Comment thread tools/conftest.py
Comment on lines +344 to +345
"-s",
"-vv",
Contributor

P2 Unbounded memory growth with -s flag

-s disables pytest's internal output capture, so all test stdout flows directly into the subprocess pipe. capture_test_output_with_timeout accumulates the entire pipe content in memory (stdout_data += chunk). Isaac Sim tests are extremely verbose (simulation logs, Kit messages, shader compilation output), and with -s none of that output is captured or discarded by pytest. On a long-running test, stdout_data can grow to hundreds of MB inside the conftest runner process, which could trigger an OOM kill of the runner itself rather than the test under investigation.
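
One possible mitigation, sketched under assumptions: replace the unbounded stdout_data += chunk accumulation with a buffer that keeps only the most recent bytes. The class below is hypothetical and not part of tools/conftest.py; it keeps the tail of the stream because a crash traceback would normally be among the last bytes written.

```python
class TailBuffer:
    """Keep only the last `max_bytes` bytes appended to it."""

    def __init__(self, max_bytes: int) -> None:
        if max_bytes <= 0:
            raise ValueError("max_bytes must be positive")
        self.max_bytes = max_bytes
        self.dropped = 0  # number of bytes discarded from the front
        self._data = bytearray()

    def append(self, chunk: bytes) -> None:
        """Add a chunk, discarding the oldest bytes once the cap is exceeded."""
        self._data += chunk
        overflow = len(self._data) - self.max_bytes
        if overflow > 0:
            del self._data[:overflow]
            self.dropped += overflow

    def getvalue(self) -> bytes:
        """Return the retained tail as an immutable bytes object."""
        return bytes(self._data)


if __name__ == "__main__":
    buf = TailBuffer(max_bytes=8)
    buf.append(b"abcdef")
    buf.append(b"ghijkl")
    print(buf.getvalue(), buf.dropped)
```

The trade-off is that early output is lost once the cap is reached; streaming chunks to a file as they arrive would avoid that, at the cost of more disk I/O.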

Comment thread tools/conftest.py
Comment on lines +344 to +345
"-s",
"-vv",
Contributor

P2 Debug flags appear intended as temporary but have no revert marker

-s and -vv are diagnostic flags that significantly increase log volume and are typically added temporarily while investigating a specific crash. There is no comment in the code or the PR description linking these to a follow-up issue or indicating they are intended to be permanent. If these land on develop without a revert plan, every subsequent CI run will produce substantially larger logs, which can make it harder to find signal in the output once the crash is resolved.
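
One way to keep the flags from lingering, sketched here as a suggestion only: gate them behind an opt-in environment variable and leave a TODO next to them. The variable name ISAACLAB_CI_DEBUG and the helper are hypothetical; nothing with these names exists in the PR.

```python
import os
from collections.abc import Mapping

# Hypothetical variable name; not defined anywhere in this PR.
DEBUG_ENV_VAR = "ISAACLAB_CI_DEBUG"


def debug_pytest_flags(environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the extra pytest flags only when the debug switch is set to "1"."""
    # TODO: remove this switch once the CI crash has been diagnosed.
    if environ.get(DEBUG_ENV_VAR, "0") == "1":
        return ["-s", "-vv"]
    return []


if __name__ == "__main__":
    print(debug_pytest_flags({DEBUG_ENV_VAR: "1"}))
    print(debug_pytest_flags({}))
```

Routine CI runs would then keep their current log volume, and a debugging run would set the variable in the workflow for the job under investigation.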

