Auto-init when CUDA_INJECTION64_PATH=none is set #979
Conversation
Summary: Setting CUDA_INJECTION64_PATH causes Kineto to skip auto-init. However, `CUDA_INJECTION64_PATH=none` is a commonly used config option, and auto-init should not be skipped when the path is set to none. This diff fixes the scenario where `CUDA_INJECTION64_PATH=none` is set and on-demand tracing should still be enabled.

Reviewed By: briancoutinho

Differential Revision: D61478101

Pulled By: aaronenyeshi
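To make the intended behavior concrete, here is a minimal C++ sketch of the check described above (the helper name `shouldSkipAutoInit` is hypothetical, not Kineto's actual API): auto-init is skipped only when the variable points at a real injection library, while an unset variable or the value `none` still allows auto-init and on-demand tracing.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper illustrating the intended behavior: auto-init is
// skipped only when CUDA_INJECTION64_PATH names an actual injection
// library, not when the variable is unset or explicitly set to "none".
static bool shouldSkipAutoInit() {
  const char* injectionPath = std::getenv("CUDA_INJECTION64_PATH");
  // Unset: no injection library, so run Kineto's normal auto-init.
  if (injectionPath == nullptr) {
    return false;
  }
  // "none" is a common config value meaning "no injection library";
  // treat it the same as unset so on-demand tracing keeps working.
  if (std::strcmp(injectionPath, "none") == 0) {
    return false;
  }
  // Any other value is assumed to be a path to an injection library
  // that will initialize profiling itself, so skip auto-init.
  return true;
}
```

In this sketch, both the unset and `none` cases fall through to normal auto-init, which matches the fix described in the summary.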
This pull request was exported from Phabricator. Differential Revision: D61478101
@aaronenyeshi merged this pull request in 120cbc2.
Summary: Pull Request resolved: pytorch#979 CUDA_INJECTION64_PATH being set causes Kineto to skip auto-init. However CUDA_INJECTION64_PATH=none is a commonly used config option, and we should not skip auto-init since the path is set to none. This diff fixes the scenario where `CUDA_INJECTION64_PATH=none` is set and on-demand trace should be enabled. Test Plan: CI Ran locally with `CUDA_INJECTION64_PATH=none` and on-demand profiling works! ``` $ CUDA_INJECTION64_PATH=none buck run mode/dev-nosan kineto/libkineto/fb/integration_tests:pytorch_resnet_integration_test Buck UI: https://www.internalfb.com/buck2/282115c8-6c29-414a-9d39-c4afb73d4e52 Network: Up: 0B Down: 0B Jobs completed: 19751. Time elapsed: 1.5s. BUILD SUCCEEDED [INFO: args.py: 113]: Setting master address to localhost and port to 62421 [INFO: pytorch_resnet_integration_test.py: 256]: Start: ready to do work 2114906 WARNING: Logging before InitGoogleLogging() is written to STDERR I0819 09:07:08.277374 2116870 C10dScubaLogger.cpp:32] Registering C10dScubaLogger... I0819 09:07:08.584887 2116870 ProcessGroupNCCL.cpp:865] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0 I0819 09:07:08.584949 2116870 ProcessGroupNCCL.cpp:874] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.18.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_ENABLE_MONITORING: 0, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 2000, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1 INFO:2024-08-19 09:07:11 2116870:2116870 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1 INFO:2024-08-19 09:07:11 2116870:2116870 CuptiActivityProfiler.cpp:241] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000 INFO:2024-08-19 09:07:11 2116870:2119791 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 09:07:12 2116870:2119791 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 I0819 09:07:15.780548 2116870 ProcessGroupNCCL.cpp:2130] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL broadcast unique ID through store took 0.059698 ms NCCL version 2.18.3+cudaCUDA_MAJOR.CUDA_MINOR I0819 09:07:15.851341 2116870 ProcessGroupNCCL.cpp:2240] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL created ncclComm_ 0x7f67ce8ec000 on CUDA device: I0819 09:07:15.851400 2116870 ProcessGroupNCCL.cpp:2245] [PG ID 0 PG GUID 0(default_pg) Rank 0] NCCL_DEBUG: WARN [INFO: pytorch_resnet_integration_test.py: 199]: step: 0, peak allocated GPU mem: 2.97GB, peak active GPU mem: 2.97GB, peak reserved GPU mem: 3.07GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 20, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 40, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 60, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 80, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. 
[INFO: pytorch_resnet_integration_test.py: 199]: step: 100, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 120, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 140, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 160, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 180, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 200, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 220, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. INFO:2024-08-19 09:07:36 2116870:2119791 ConfigLoader.cpp:271] Received config from dyno: ACTIVITIES_COMPRESSION_ALGORITHM=GZIP REQUEST_GROUP_TRACE_ID=E54247FF9C32A885E23EC0A21BC9DD96 REQUEST_TRACE_ID=E54247FF9C32A885E23EC0A21BC9DD96 ACTIVITIES_DURATION_MSECS=500 PROFILE_REPORT_INPUT_SHAPES=true PROFILE_PROFILE_MEMORY=false PROFILE_WITH_STACK=false PROFILE_WITH_FLOPS=false PROFILE_WITH_MODULES=false ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/0/1724083651/localhost/ REQUEST_TRACE_ID=5172367202475739244 [INFO: pytorch_resnet_integration_test.py: 199]: step: 240, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. INFO:2024-08-19 09:07:37 2116870:2121591 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724083663089205388 INFO:2024-08-19 09:07:37 2116870:2121591 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_2116870.json Log file: /tmp/libkineto_activities_2116870.json Trace start time: 2024-08-19 09:07:43 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync Manifold bucket: gpu_traces Manifold object: tree/traces/dynocli/0/1724083651/localhost/libkineto_activities_2116870.json Trace compression enabled: 1 TTL in seconds: 31536000 (365 days) INFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB) INFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler. INFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler. 
INFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s INFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s [INFO: pytorch_resnet_integration_test.py: 199]: step: 260, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 280, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 300, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 320, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. STAGE:2024-08-19 09:07:43 2116870:2121591 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up INFO:2024-08-19 09:07:43 2116870:2121591 CuptiActivityProfiler.cpp:1194] Tracing started INFO:2024-08-19 09:07:43 2116870:2121591 CuptiActivityProfiler.cpp:1219] Tracing complete. STAGE:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection [INFO: pytorch_resnet_integration_test.py: 199]: step: 340, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. INFO:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers INFO:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:307] Processed 690712 GPU records (34396720 bytes) INFO:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 172266, Blocklisted runtime = 388516, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0 INFO:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:1269] Traces Recorded: INFO:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations INFO:2024-08-19 09:07:44 2116870:2121591 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_2116870.json INFO:2024-08-19 09:07:44 2116870:2121591 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_2116870.json STAGE:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing I0819 09:07:45.141360 2122707 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused) [INFO: pytorch_resnet_integration_test.py: 199]: step: 360, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 380, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. 
INFO:2024-08-19 09:07:48 2116870:2122676 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/0/1724083651/localhost/libkineto_activities_2116870.json.gz INFO:2024-08-19 09:07:48 2116870:2122676 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser: INFO:2024-08-19 09:07:48 2116870:2122676 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724083651/localhost/libkineto_activities_2116870.json.gz&bucket=gpu_traces INFO:2024-08-19 09:07:48 2116870:2122676 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 3486626 us STAGE:2024-08-19 09:07:48 2116870:2122676 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push [INFO: pytorch_resnet_integration_test.py: 199]: step: 400, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 420, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 440, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 460, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 480, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 500, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 520, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 540, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 560, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 580, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 600, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 620, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 640, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 660, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 680, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 700, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 720, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 740, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. 
[INFO: pytorch_resnet_integration_test.py: 199]: step: 760, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 780, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 800, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 820, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 840, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 860, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 880, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 900, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 920, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 940, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 960, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 980, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 216]: running function took 83.87466025352478 seconds to complete I0819 09:08:26.611253 2116870 ProcessGroupNCCL.cpp:1164] [PG ID 0 PG GUID 0(default_pg) Rank 0] Launching ProcessGroupNCCL abort asynchrounously. I0819 09:08:26.612347 2125072 ProcessGroupNCCL.cpp:1111] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL destroying ncclComm_ 0x7f67ce8ec000 on CUDA device: 0 I0819 09:08:26.612396 2125072 NCCLUtils.hpp:407] Aborting ncclComm_ 0x7f67ce8ec000 with reason: No abort reason provided. I0819 09:08:27.125118 2125072 ProcessGroupNCCL.cpp:1130] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL destroyed communicator on CUDA device: 0 with stream: 3 I0819 09:08:27.125389 2116870 ProcessGroupNCCL.cpp:1065] [PG ID 0 PG GUID 0(default_pg) Rank 0] future is successfully executed for: ProcessGroup abort I0819 09:08:27.125425 2116870 ProcessGroupNCCL.cpp:1170] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL aborts successfully. I0819 09:08:27.127446 2116870 ProcessGroupNCCL.cpp:1179] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL destructor entered. I0819 09:08:27.127477 2116870 ProcessGroupNCCL.cpp:1200] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL watchdog thread joined. I0819 09:08:27.127542 2116870 ProcessGroupNCCL.cpp:1204] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL heart beat monitor thread joined. 
/usr/local/fbcode/platform010/lib/python3.10/tempfile.py:869: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpzjstec6w'> [INFO: pytorch_resnet_integration_test.py: 267]: Finish: Completed test workload ``` Also, ran dyno/rust/kineto_integration_test: ``` $ buck2 run //dyno/rust/kineto_integration_test:dyno_gputrace_test -- --skip-trace-validation -j dacluster/dauser/daname/2 --collect-iters File changed: fbsource//xplat/kineto/libkineto/fb/init.cpp File changed: fbcode//kineto/libkineto/fb/init.cpp Buck UI: https://www.internalfb.com/buck2/9bee8d11-3d2d-4295-bd12-a699a50ed8dd Network: Up: 0B Down: 2.9MiB (reSessionID-8f330210-f582-4fed-bc14-cb4004759ccb) Jobs completed: 76046. Time elapsed: 24.5s. Cache hits: 100%. Commands: 3512 (cached: 3512, remote: 0, local: 0) BUILD SUCCEEDED ---------------------------------------- Trace collector program = common-on-demand-tracing-library ---------------------------------------- Autodetected job type Tw for job id dacluster/dauser/daname/2 Building trace tester Running /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester Soft Error: source_directory_includes_subpackage: Directory `v2.17.1-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.17.1-1/src/tests`. Soft Error: source_directory_includes_subpackage: Directory `v2.18.3-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.18.3-1/src/tests`. Soft Error: source_directory_includes_subpackage: Directory `v2.19.3-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.19.3-1/src/tests`. Buck UI: https://www.internalfb.com/buck2/9bbcd130-82f9-4e51-982d-c9d88ed8686d Network: Up: 17KiB Down: 2.9MiB (reSessionID-695fce22-aaa1-4cde-9364-bf07ee49f27e) Jobs completed: 182380. Time elapsed: 1:10.3s. Cache hits: 99%. Commands: 3516 (cached: 3514, remote: 1, local: 1) BUILD SUCCEEDED Running ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based Waiting 20 for the trace tester to start up... ---------------------------------------- Collecting gpu trace on application with pid = n/a, job_id = dacluster/dauser/daname/2, collect iters = true ---------------------------------------- I0819 12:39:03.595689 3167910 OnDemandTracingCommon.cpp:99] Found 1 matching PIDs (Busy: 0 activity) I0819 12:39:03.595943 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 1 process(es) Trace Urls: [PidTraceUrlPair { pid: 3175601, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json.gz&bucket=gpu_traces" }] Trace tester is running in background, please be patient... Trace tester application stdout: --- --- Trace tester application stderr: --- I0819 12:38:32.286183 3175601 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers. INFO:2024-08-19 12:38:32 3175601:3175601 CuptiActivityProfiler.cpp:241] CUDA versions. 
CUPTI: 18; Runtime: 12000; Driver: 12000 INFO:2024-08-19 12:38:32 3175601:3175844 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:38:33 3175601:3175844 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:38:34 3175601:3175601 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1 W0819 12:38:57.471963 3175601 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator()) INFO:2024-08-19 12:39:07 3175601:3175844 ConfigLoader.cpp:271] Received config from dyno: ACTIVITIES_COMPRESSION_ALGORITHM=GZIP REQUEST_GROUP_TRACE_ID=test_group_trace_id ACTIVITIES_DURATION_MSECS=500 PROFILE_REPORT_INPUT_SHAPES=true PROFILE_PROFILE_MEMORY=false PROFILE_WITH_STACK=false PROFILE_WITH_FLOPS=false PROFILE_WITH_MODULES=false ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/ REQUEST_TRACE_ID=17832852696324168180 INFO:2024-08-19 12:39:08 3175601:3178719 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724096354388906844 INFO:2024-08-19 12:39:08 3175601:3178719 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3175601.json Log file: /tmp/libkineto_activities_3175601.json Trace start time: 2024-08-19 12:39:14 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync Manifold bucket: gpu_traces Manifold object: tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json Trace compression enabled: 1 TTL in seconds: 31536000 (365 days) INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB) INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler. INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler. INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s STAGE:2024-08-19 12:39:14 3175601:3178719 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up INFO:2024-08-19 12:39:14 3175601:3178719 CuptiActivityProfiler.cpp:1194] Tracing started INFO:2024-08-19 12:39:14 3175601:3178719 CuptiActivityProfiler.cpp:1219] Tracing complete. STAGE:2024-08-19 12:39:15 3175601:3178719 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection INFO:2024-08-19 12:39:15 3175601:3178719 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers INFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:307] Processed 933012 GPU records (44806976 bytes) INFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 104243, Blocklisted runtime = 514615, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? 
= 0 INFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:1269] Traces Recorded: INFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations INFO:2024-08-19 12:39:16 3175601:3178719 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3175601.json INFO:2024-08-19 12:39:16 3175601:3178719 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3175601.json STAGE:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing I0819 12:39:16.931658 3179183 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused) W0819 12:39:20.854436 3179426 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. Please initialize this api instance with an appropriate ConfigSignatureVerifier E0819 12:39:20.919210 3179426 api.cpp:491] The specified logical config name() is not valid W0819 12:39:20.919265 3179426 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File '' I0819 12:39:20.919397 3179426 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid' INFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json.gz INFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser: INFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json.gz&bucket=gpu_traces INFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 6317931 us STAGE:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push WARNING:2024-08-19 12:40:12 3175601:3175844 DynoConfigLoader.cpp:35] (x1) Failed to read config: No dyno config client --- Autodetected job type Tw for job id dacluster/dauser/daname/2 Building trace tester Running /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester Buck UI: https://www.internalfb.com/buck2/bf79fa5f-e636-4d39-ab0c-cd237b3189d1 Network: Up: 0B Down: 0B Jobs completed: 5. Time elapsed: 0.1s. BUILD SUCCEEDED Running ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based Waiting 20 for the trace tester to start up... 
---------------------------------------- Collecting gpu trace on application with pid = 3181985, job_id = dacluster/dauser/daname/2, collect iters = true ---------------------------------------- I0819 12:40:33.999593 3167910 OnDemandTracingCommon.cpp:99] Found 1 matching PIDs (Busy: 0 activity) I0819 12:40:33.999762 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 1 process(es) Trace Urls: [PidTraceUrlPair { pid: 3181985, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json.gz&bucket=gpu_traces" }] Trace tester is running in background, please be patient... Trace tester application stdout: --- --- Trace tester application stderr: --- I0819 12:40:16.687461 3181985 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers. INFO:2024-08-19 12:40:16 3181985:3181985 CuptiActivityProfiler.cpp:241] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000 INFO:2024-08-19 12:40:16 3181985:3182373 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:40:18 3181985:3182373 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:40:18 3181985:3181985 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1 INFO:2024-08-19 12:40:36 3181985:3182373 ConfigLoader.cpp:271] Received config from dyno: ACTIVITIES_COMPRESSION_ALGORITHM=GZIP REQUEST_GROUP_TRACE_ID=test_group_trace_id ACTIVITIES_DURATION_MSECS=500 PROFILE_REPORT_INPUT_SHAPES=true PROFILE_PROFILE_MEMORY=false PROFILE_WITH_STACK=false PROFILE_WITH_FLOPS=false PROFILE_WITH_MODULES=false ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/ REQUEST_TRACE_ID=17968765238600337472 INFO:2024-08-19 12:40:37 3181985:3183119 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724096443817688761 INFO:2024-08-19 12:40:37 3181985:3183119 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3181985.json Log file: /tmp/libkineto_activities_3181985.json Trace start time: 2024-08-19 12:40:43 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync Manifold bucket: gpu_traces Manifold object: tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json Trace compression enabled: 1 TTL in seconds: 31536000 (365 days) INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB) INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler. INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler. 
INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s W0819 12:40:41.175002 3181985 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator()) STAGE:2024-08-19 12:40:43 3181985:3183119 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up INFO:2024-08-19 12:40:43 3181985:3183119 CuptiActivityProfiler.cpp:1194] Tracing started INFO:2024-08-19 12:40:44 3181985:3183119 CuptiActivityProfiler.cpp:1219] Tracing complete. STAGE:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:307] Processed 942895 GPU records (44635768 bytes) INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 100281, Blocklisted runtime = 505241, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0 INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1269] Traces Recorded: INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations INFO:2024-08-19 12:40:45 3181985:3183119 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3181985.json INFO:2024-08-19 12:40:45 3181985:3183119 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3181985.json STAGE:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing I0819 12:40:46.369321 3183880 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused) W0819 12:40:49.872421 3184196 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. 
Please initialize this api instance with an appropriate ConfigSignatureVerifier E0819 12:40:49.927807 3184196 api.cpp:491] The specified logical config name() is not valid W0819 12:40:49.927846 3184196 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File '' I0819 12:40:49.927968 3184196 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid' INFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json.gz INFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser: INFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json.gz&bucket=gpu_traces INFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 5519986 us STAGE:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push WARNING:2024-08-19 12:41:51 3181985:3182373 DynoConfigLoader.cpp:35] (x1) Failed to read config: No dyno config client --- Building trace tester Running /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester Buck UI: https://www.internalfb.com/buck2/40dd7a0b-6f22-4732-89ca-84d22a4e62f5 Network: Up: 0B Down: 0B Jobs completed: 5. Time elapsed: 0.2s. BUILD SUCCEEDED Running ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based Waiting 20 for the trace tester to start up... ---------------------------------------- Collecting gpu trace on application with pid = 3190209, job_id = n/a, collect iters = true ---------------------------------------- I0819 12:42:13.637505 3167910 OnDemandTracingCommon.cpp:99] Found 1 matching PIDs (Busy: 0 activity) I0819 12:42:13.637663 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 1 process(es) Trace Urls: [PidTraceUrlPair { pid: 3190209, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json.gz&bucket=gpu_traces" }] Trace tester is running in background, please be patient... Trace tester application stdout: --- --- Trace tester application stderr: --- I0819 12:41:56.295006 3190209 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers. INFO:2024-08-19 12:41:56 3190209:3190209 CuptiActivityProfiler.cpp:241] CUDA versions. 
CUPTI: 18; Runtime: 12000; Driver: 12000 INFO:2024-08-19 12:41:56 3190209:3190382 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:41:57 3190209:3190382 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:41:57 3190209:3190209 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1 INFO:2024-08-19 12:42:16 3190209:3190382 ConfigLoader.cpp:271] Received config from dyno: ACTIVITIES_COMPRESSION_ALGORITHM=GZIP REQUEST_GROUP_TRACE_ID=test_group_trace_id ACTIVITIES_DURATION_MSECS=500 PROFILE_REPORT_INPUT_SHAPES=true PROFILE_PROFILE_MEMORY=false PROFILE_WITH_STACK=false PROFILE_WITH_FLOPS=false PROFILE_WITH_MODULES=false ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/0/1724096533/localhost/ REQUEST_TRACE_ID=3582354066664069411 INFO:2024-08-19 12:42:17 3190209:3191748 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724096543391277741 INFO:2024-08-19 12:42:17 3190209:3191748 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3190209.json Log file: /tmp/libkineto_activities_3190209.json Trace start time: 2024-08-19 12:42:23 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync Manifold bucket: gpu_traces Manifold object: tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json Trace compression enabled: 1 TTL in seconds: 31536000 (365 days) INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB) INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler. INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler. INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s W0819 12:42:19.165912 3190209 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator()) STAGE:2024-08-19 12:42:23 3190209:3191748 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up INFO:2024-08-19 12:42:23 3190209:3191748 CuptiActivityProfiler.cpp:1194] Tracing started INFO:2024-08-19 12:42:23 3190209:3191748 CuptiActivityProfiler.cpp:1219] Tracing complete. STAGE:2024-08-19 12:42:24 3190209:3191748 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection INFO:2024-08-19 12:42:24 3190209:3191748 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers INFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:307] Processed 981694 GPU records (47235736 bytes) INFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 109007, Blocklisted runtime = 543555, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? 
= 0 INFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:1269] Traces Recorded: INFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations INFO:2024-08-19 12:42:25 3190209:3191748 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3190209.json INFO:2024-08-19 12:42:25 3190209:3191748 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3190209.json STAGE:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing I0819 12:42:25.831236 3192298 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused) W0819 12:42:29.783102 3192507 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. Please initialize this api instance with an appropriate ConfigSignatureVerifier E0819 12:42:29.839179 3192507 api.cpp:491] The specified logical config name() is not valid W0819 12:42:29.839231 3192507 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File '' I0819 12:42:29.839358 3192507 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid' INFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json.gz INFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser: INFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json.gz&bucket=gpu_traces INFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 6102519 us STAGE:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push --- Building trace tester Running /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester Buck UI: https://www.internalfb.com/buck2/b5cfc916-b247-432d-98a2-7b289b5552d7 Network: Up: 0B Down: 0B Jobs completed: 5. Time elapsed: 0.2s. BUILD SUCCEEDED Running ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based Waiting 20 for the trace tester to start up... ---------------------------------------- Collecting gpu trace on application with pid = n/a, job_id = n/a, collect iters = true ---------------------------------------- I0819 12:43:48.965962 3167910 OnDemandTracingCommon.cpp:99] Found 2 matching PIDs (Busy: 0 activity) I0819 12:43:48.966220 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 2 process(es) Trace Urls: [PidTraceUrlPair { pid: 3196469, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json.gz&bucket=gpu_traces" }, PidTraceUrlPair { pid: 3190209, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3190209.json.gz&bucket=gpu_traces" }] Trace tester is running in background, please be patient... 
Trace tester application stdout: --- --- Trace tester application stderr: --- I0819 12:43:31.657953 3196469 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers. INFO:2024-08-19 12:43:31 3196469:3196469 CuptiActivityProfiler.cpp:241] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000 INFO:2024-08-19 12:43:31 3196469:3196920 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:43:32 3196469:3196920 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:43:33 3196469:3196469 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1 INFO:2024-08-19 12:43:51 3196469:3196920 ConfigLoader.cpp:271] Received config from dyno: ACTIVITIES_COMPRESSION_ALGORITHM=GZIP REQUEST_GROUP_TRACE_ID=test_group_trace_id ACTIVITIES_DURATION_MSECS=500 PROFILE_REPORT_INPUT_SHAPES=true PROFILE_PROFILE_MEMORY=false PROFILE_WITH_STACK=false PROFILE_WITH_FLOPS=false PROFILE_WITH_MODULES=false ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/0/1724096628/localhost/ REQUEST_TRACE_ID=6007547554162055103 INFO:2024-08-19 12:43:52 3196469:3198770 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724096638759081016 INFO:2024-08-19 12:43:52 3196469:3198770 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3196469.json Log file: /tmp/libkineto_activities_3196469.json Trace start time: 2024-08-19 12:43:58 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync Manifold bucket: gpu_traces Manifold object: tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json Trace compression enabled: 1 TTL in seconds: 31536000 (365 days) INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB) INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler. INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler. INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s W0819 12:43:58.627271 3196469 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator()) STAGE:2024-08-19 12:43:58 3196469:3198770 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up INFO:2024-08-19 12:43:58 3196469:3198770 CuptiActivityProfiler.cpp:1194] Tracing started INFO:2024-08-19 12:43:59 3196469:3198770 CuptiActivityProfiler.cpp:1219] Tracing complete. 
STAGE:2024-08-19 12:43:59 3196469:3198770 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:307] Processed 816296 GPU records (38657456 bytes) INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 86644, Blocklisted runtime = 437744, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0 INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:1269] Traces Recorded: INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations INFO:2024-08-19 12:44:00 3196469:3198770 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3196469.json INFO:2024-08-19 12:44:00 3196469:3198770 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3196469.json STAGE:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing I0819 12:44:01.344978 3199372 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused) W0819 12:44:04.267980 3199473 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. Please initialize this api instance with an appropriate ConfigSignatureVerifier E0819 12:44:04.327500 3199473 api.cpp:491] The specified logical config name() is not valid W0819 12:44:04.327549 3199473 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File '' I0819 12:44:04.327661 3199473 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid' INFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json.gz INFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser: INFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json.gz&bucket=gpu_traces INFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 4973015 us STAGE:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push --- ``` Reviewed By: briancoutinho Differential Revision: D61478101 Pulled By: aaronenyeshi fbshipit-source-id: d122f1098301d3acddefed4aa08bc6863358a8cf
/usr/local/fbcode/platform010/lib/python3.10/tempfile.py:869: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpzjstec6w'> [INFO: pytorch_resnet_integration_test.py: 267]: Finish: Completed test workload ``` Also, ran dyno/rust/kineto_integration_test: ``` $ buck2 run //dyno/rust/kineto_integration_test:dyno_gputrace_test -- --skip-trace-validation -j dacluster/dauser/daname/2 --collect-iters File changed: fbsource//xplat/kineto/libkineto/fb/init.cpp File changed: fbcode//kineto/libkineto/fb/init.cpp Buck UI: https://www.internalfb.com/buck2/9bee8d11-3d2d-4295-bd12-a699a50ed8dd Network: Up: 0B Down: 2.9MiB (reSessionID-8f330210-f582-4fed-bc14-cb4004759ccb) Jobs completed: 76046. Time elapsed: 24.5s. Cache hits: 100%. Commands: 3512 (cached: 3512, remote: 0, local: 0) BUILD SUCCEEDED ---------------------------------------- Trace collector program = common-on-demand-tracing-library ---------------------------------------- Autodetected job type Tw for job id dacluster/dauser/daname/2 Building trace tester Running /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester Soft Error: source_directory_includes_subpackage: Directory `v2.17.1-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.17.1-1/src/tests`. Soft Error: source_directory_includes_subpackage: Directory `v2.18.3-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.18.3-1/src/tests`. Soft Error: source_directory_includes_subpackage: Directory `v2.19.3-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.19.3-1/src/tests`. Buck UI: https://www.internalfb.com/buck2/9bbcd130-82f9-4e51-982d-c9d88ed8686d Network: Up: 17KiB Down: 2.9MiB (reSessionID-695fce22-aaa1-4cde-9364-bf07ee49f27e) Jobs completed: 182380. Time elapsed: 1:10.3s. Cache hits: 99%. Commands: 3516 (cached: 3514, remote: 1, local: 1) BUILD SUCCEEDED Running ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based Waiting 20 for the trace tester to start up... ---------------------------------------- Collecting gpu trace on application with pid = n/a, job_id = dacluster/dauser/daname/2, collect iters = true ---------------------------------------- I0819 12:39:03.595689 3167910 OnDemandTracingCommon.cpp:99] Found 1 matching PIDs (Busy: 0 activity) I0819 12:39:03.595943 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 1 process(es) Trace Urls: [PidTraceUrlPair { pid: 3175601, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json.gz&bucket=gpu_traces" }] Trace tester is running in background, please be patient... Trace tester application stdout: --- --- Trace tester application stderr: --- I0819 12:38:32.286183 3175601 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers. INFO:2024-08-19 12:38:32 3175601:3175601 CuptiActivityProfiler.cpp:241] CUDA versions. 
CUPTI: 18; Runtime: 12000; Driver: 12000 INFO:2024-08-19 12:38:32 3175601:3175844 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:38:33 3175601:3175844 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:38:34 3175601:3175601 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1 W0819 12:38:57.471963 3175601 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator()) INFO:2024-08-19 12:39:07 3175601:3175844 ConfigLoader.cpp:271] Received config from dyno: ACTIVITIES_COMPRESSION_ALGORITHM=GZIP REQUEST_GROUP_TRACE_ID=test_group_trace_id ACTIVITIES_DURATION_MSECS=500 PROFILE_REPORT_INPUT_SHAPES=true PROFILE_PROFILE_MEMORY=false PROFILE_WITH_STACK=false PROFILE_WITH_FLOPS=false PROFILE_WITH_MODULES=false ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/ REQUEST_TRACE_ID=17832852696324168180 INFO:2024-08-19 12:39:08 3175601:3178719 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724096354388906844 INFO:2024-08-19 12:39:08 3175601:3178719 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3175601.json Log file: /tmp/libkineto_activities_3175601.json Trace start time: 2024-08-19 12:39:14 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync Manifold bucket: gpu_traces Manifold object: tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json Trace compression enabled: 1 TTL in seconds: 31536000 (365 days) INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB) INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler. INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler. INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s STAGE:2024-08-19 12:39:14 3175601:3178719 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up INFO:2024-08-19 12:39:14 3175601:3178719 CuptiActivityProfiler.cpp:1194] Tracing started INFO:2024-08-19 12:39:14 3175601:3178719 CuptiActivityProfiler.cpp:1219] Tracing complete. STAGE:2024-08-19 12:39:15 3175601:3178719 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection INFO:2024-08-19 12:39:15 3175601:3178719 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers INFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:307] Processed 933012 GPU records (44806976 bytes) INFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 104243, Blocklisted runtime = 514615, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? 
= 0 INFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:1269] Traces Recorded: INFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations INFO:2024-08-19 12:39:16 3175601:3178719 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3175601.json INFO:2024-08-19 12:39:16 3175601:3178719 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3175601.json STAGE:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing I0819 12:39:16.931658 3179183 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused) W0819 12:39:20.854436 3179426 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. Please initialize this api instance with an appropriate ConfigSignatureVerifier E0819 12:39:20.919210 3179426 api.cpp:491] The specified logical config name() is not valid W0819 12:39:20.919265 3179426 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File '' I0819 12:39:20.919397 3179426 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid' INFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json.gz INFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser: INFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json.gz&bucket=gpu_traces INFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 6317931 us STAGE:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push WARNING:2024-08-19 12:40:12 3175601:3175844 DynoConfigLoader.cpp:35] (x1) Failed to read config: No dyno config client --- Autodetected job type Tw for job id dacluster/dauser/daname/2 Building trace tester Running /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester Buck UI: https://www.internalfb.com/buck2/bf79fa5f-e636-4d39-ab0c-cd237b3189d1 Network: Up: 0B Down: 0B Jobs completed: 5. Time elapsed: 0.1s. BUILD SUCCEEDED Running ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based Waiting 20 for the trace tester to start up... 
---------------------------------------- Collecting gpu trace on application with pid = 3181985, job_id = dacluster/dauser/daname/2, collect iters = true ---------------------------------------- I0819 12:40:33.999593 3167910 OnDemandTracingCommon.cpp:99] Found 1 matching PIDs (Busy: 0 activity) I0819 12:40:33.999762 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 1 process(es) Trace Urls: [PidTraceUrlPair { pid: 3181985, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json.gz&bucket=gpu_traces" }] Trace tester is running in background, please be patient... Trace tester application stdout: --- --- Trace tester application stderr: --- I0819 12:40:16.687461 3181985 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers. INFO:2024-08-19 12:40:16 3181985:3181985 CuptiActivityProfiler.cpp:241] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000 INFO:2024-08-19 12:40:16 3181985:3182373 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:40:18 3181985:3182373 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:40:18 3181985:3181985 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1 INFO:2024-08-19 12:40:36 3181985:3182373 ConfigLoader.cpp:271] Received config from dyno: ACTIVITIES_COMPRESSION_ALGORITHM=GZIP REQUEST_GROUP_TRACE_ID=test_group_trace_id ACTIVITIES_DURATION_MSECS=500 PROFILE_REPORT_INPUT_SHAPES=true PROFILE_PROFILE_MEMORY=false PROFILE_WITH_STACK=false PROFILE_WITH_FLOPS=false PROFILE_WITH_MODULES=false ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/ REQUEST_TRACE_ID=17968765238600337472 INFO:2024-08-19 12:40:37 3181985:3183119 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724096443817688761 INFO:2024-08-19 12:40:37 3181985:3183119 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3181985.json Log file: /tmp/libkineto_activities_3181985.json Trace start time: 2024-08-19 12:40:43 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync Manifold bucket: gpu_traces Manifold object: tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json Trace compression enabled: 1 TTL in seconds: 31536000 (365 days) INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB) INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler. INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler. 
INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s W0819 12:40:41.175002 3181985 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator()) STAGE:2024-08-19 12:40:43 3181985:3183119 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up INFO:2024-08-19 12:40:43 3181985:3183119 CuptiActivityProfiler.cpp:1194] Tracing started INFO:2024-08-19 12:40:44 3181985:3183119 CuptiActivityProfiler.cpp:1219] Tracing complete. STAGE:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:307] Processed 942895 GPU records (44635768 bytes) INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 100281, Blocklisted runtime = 505241, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0 INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1269] Traces Recorded: INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations INFO:2024-08-19 12:40:45 3181985:3183119 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3181985.json INFO:2024-08-19 12:40:45 3181985:3183119 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3181985.json STAGE:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing I0819 12:40:46.369321 3183880 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused) W0819 12:40:49.872421 3184196 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. 
Please initialize this api instance with an appropriate ConfigSignatureVerifier E0819 12:40:49.927807 3184196 api.cpp:491] The specified logical config name() is not valid W0819 12:40:49.927846 3184196 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File '' I0819 12:40:49.927968 3184196 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid' INFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json.gz INFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser: INFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json.gz&bucket=gpu_traces INFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 5519986 us STAGE:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push WARNING:2024-08-19 12:41:51 3181985:3182373 DynoConfigLoader.cpp:35] (x1) Failed to read config: No dyno config client --- Building trace tester Running /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester Buck UI: https://www.internalfb.com/buck2/40dd7a0b-6f22-4732-89ca-84d22a4e62f5 Network: Up: 0B Down: 0B Jobs completed: 5. Time elapsed: 0.2s. BUILD SUCCEEDED Running ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based Waiting 20 for the trace tester to start up... ---------------------------------------- Collecting gpu trace on application with pid = 3190209, job_id = n/a, collect iters = true ---------------------------------------- I0819 12:42:13.637505 3167910 OnDemandTracingCommon.cpp:99] Found 1 matching PIDs (Busy: 0 activity) I0819 12:42:13.637663 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 1 process(es) Trace Urls: [PidTraceUrlPair { pid: 3190209, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json.gz&bucket=gpu_traces" }] Trace tester is running in background, please be patient... Trace tester application stdout: --- --- Trace tester application stderr: --- I0819 12:41:56.295006 3190209 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers. INFO:2024-08-19 12:41:56 3190209:3190209 CuptiActivityProfiler.cpp:241] CUDA versions. 
CUPTI: 18; Runtime: 12000; Driver: 12000 INFO:2024-08-19 12:41:56 3190209:3190382 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:41:57 3190209:3190382 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:41:57 3190209:3190209 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1 INFO:2024-08-19 12:42:16 3190209:3190382 ConfigLoader.cpp:271] Received config from dyno: ACTIVITIES_COMPRESSION_ALGORITHM=GZIP REQUEST_GROUP_TRACE_ID=test_group_trace_id ACTIVITIES_DURATION_MSECS=500 PROFILE_REPORT_INPUT_SHAPES=true PROFILE_PROFILE_MEMORY=false PROFILE_WITH_STACK=false PROFILE_WITH_FLOPS=false PROFILE_WITH_MODULES=false ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/0/1724096533/localhost/ REQUEST_TRACE_ID=3582354066664069411 INFO:2024-08-19 12:42:17 3190209:3191748 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724096543391277741 INFO:2024-08-19 12:42:17 3190209:3191748 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3190209.json Log file: /tmp/libkineto_activities_3190209.json Trace start time: 2024-08-19 12:42:23 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync Manifold bucket: gpu_traces Manifold object: tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json Trace compression enabled: 1 TTL in seconds: 31536000 (365 days) INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB) INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler. INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler. INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s W0819 12:42:19.165912 3190209 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator()) STAGE:2024-08-19 12:42:23 3190209:3191748 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up INFO:2024-08-19 12:42:23 3190209:3191748 CuptiActivityProfiler.cpp:1194] Tracing started INFO:2024-08-19 12:42:23 3190209:3191748 CuptiActivityProfiler.cpp:1219] Tracing complete. STAGE:2024-08-19 12:42:24 3190209:3191748 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection INFO:2024-08-19 12:42:24 3190209:3191748 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers INFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:307] Processed 981694 GPU records (47235736 bytes) INFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 109007, Blocklisted runtime = 543555, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? 
= 0 INFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:1269] Traces Recorded: INFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations INFO:2024-08-19 12:42:25 3190209:3191748 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3190209.json INFO:2024-08-19 12:42:25 3190209:3191748 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3190209.json STAGE:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing I0819 12:42:25.831236 3192298 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused) W0819 12:42:29.783102 3192507 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. Please initialize this api instance with an appropriate ConfigSignatureVerifier E0819 12:42:29.839179 3192507 api.cpp:491] The specified logical config name() is not valid W0819 12:42:29.839231 3192507 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File '' I0819 12:42:29.839358 3192507 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid' INFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json.gz INFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser: INFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json.gz&bucket=gpu_traces INFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 6102519 us STAGE:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push --- Building trace tester Running /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester Buck UI: https://www.internalfb.com/buck2/b5cfc916-b247-432d-98a2-7b289b5552d7 Network: Up: 0B Down: 0B Jobs completed: 5. Time elapsed: 0.2s. BUILD SUCCEEDED Running ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based Waiting 20 for the trace tester to start up... ---------------------------------------- Collecting gpu trace on application with pid = n/a, job_id = n/a, collect iters = true ---------------------------------------- I0819 12:43:48.965962 3167910 OnDemandTracingCommon.cpp:99] Found 2 matching PIDs (Busy: 0 activity) I0819 12:43:48.966220 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 2 process(es) Trace Urls: [PidTraceUrlPair { pid: 3196469, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json.gz&bucket=gpu_traces" }, PidTraceUrlPair { pid: 3190209, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3190209.json.gz&bucket=gpu_traces" }] Trace tester is running in background, please be patient... 
Trace tester application stdout: --- --- Trace tester application stderr: --- I0819 12:43:31.657953 3196469 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers. INFO:2024-08-19 12:43:31 3196469:3196469 CuptiActivityProfiler.cpp:241] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000 INFO:2024-08-19 12:43:31 3196469:3196920 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:43:32 3196469:3196920 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:43:33 3196469:3196469 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1 INFO:2024-08-19 12:43:51 3196469:3196920 ConfigLoader.cpp:271] Received config from dyno: ACTIVITIES_COMPRESSION_ALGORITHM=GZIP REQUEST_GROUP_TRACE_ID=test_group_trace_id ACTIVITIES_DURATION_MSECS=500 PROFILE_REPORT_INPUT_SHAPES=true PROFILE_PROFILE_MEMORY=false PROFILE_WITH_STACK=false PROFILE_WITH_FLOPS=false PROFILE_WITH_MODULES=false ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/0/1724096628/localhost/ REQUEST_TRACE_ID=6007547554162055103 INFO:2024-08-19 12:43:52 3196469:3198770 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724096638759081016 INFO:2024-08-19 12:43:52 3196469:3198770 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3196469.json Log file: /tmp/libkineto_activities_3196469.json Trace start time: 2024-08-19 12:43:58 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync Manifold bucket: gpu_traces Manifold object: tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json Trace compression enabled: 1 TTL in seconds: 31536000 (365 days) INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB) INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler. INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler. INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s W0819 12:43:58.627271 3196469 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator()) STAGE:2024-08-19 12:43:58 3196469:3198770 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up INFO:2024-08-19 12:43:58 3196469:3198770 CuptiActivityProfiler.cpp:1194] Tracing started INFO:2024-08-19 12:43:59 3196469:3198770 CuptiActivityProfiler.cpp:1219] Tracing complete. 
STAGE:2024-08-19 12:43:59 3196469:3198770 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:307] Processed 816296 GPU records (38657456 bytes) INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 86644, Blocklisted runtime = 437744, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0 INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:1269] Traces Recorded: INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations INFO:2024-08-19 12:44:00 3196469:3198770 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3196469.json INFO:2024-08-19 12:44:00 3196469:3198770 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3196469.json STAGE:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing I0819 12:44:01.344978 3199372 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused) W0819 12:44:04.267980 3199473 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. Please initialize this api instance with an appropriate ConfigSignatureVerifier E0819 12:44:04.327500 3199473 api.cpp:491] The specified logical config name() is not valid W0819 12:44:04.327549 3199473 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File '' I0819 12:44:04.327661 3199473 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid' INFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json.gz INFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser: INFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json.gz&bucket=gpu_traces INFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 4973015 us STAGE:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push --- ``` Reviewed By: briancoutinho Differential Revision: D61478101 Pulled By: aaronenyeshi fbshipit-source-id: d122f1098301d3acddefed4aa08bc6863358a8cf
Summary:
CUDA_INJECTION64_PATH being set causes Kineto to skip auto-init. However, `CUDA_INJECTION64_PATH=none` is a commonly used config option, and we should not skip auto-init when the path is set to `none`.
This diff fixes the scenario where `CUDA_INJECTION64_PATH=none` is set and on-demand trace should be enabled.
Reviewed By: briancoutinho
Differential Revision: D61478101
Pulled By: aaronenyeshi
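
For readers unfamiliar with the auto-init gating, below is a minimal sketch of the behavior this diff describes: treat `CUDA_INJECTION64_PATH=none` the same as an unset variable when deciding whether to auto-initialize. The helper names (`hasCudaInjectionPath`, `maybeAutoInit`) are illustrative only and are not the actual Kineto functions or file layout.

```cpp
// Illustrative sketch only -- not the actual Kineto source.
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns true only if an injection library path is
// genuinely configured, i.e. the variable is set to something other than "none".
static bool hasCudaInjectionPath() {
  const char* path = std::getenv("CUDA_INJECTION64_PATH");
  return path != nullptr && std::strcmp(path, "none") != 0;
}

// Hypothetical auto-init gate: before this change, any non-null value
// (including "none") would have suppressed auto-init.
void maybeAutoInit() {
  if (hasCudaInjectionPath()) {
    // An injection library is expected to initialize Kineto itself; skip auto-init.
    return;
  }
  // Auto-initialize so on-demand tracing can be enabled later, e.g. via
  // libkineto_init(/*cpuOnly=*/false, /*logOnError=*/true);
}
```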