Auto-init when CUDA_INJECTION64_PATH=none is set #979
Conversation
Summary: Setting CUDA_INJECTION64_PATH causes Kineto to skip auto-init. However, `CUDA_INJECTION64_PATH=none` is a commonly used config option, and auto-init should not be skipped when the path is set to none. This diff fixes the scenario where `CUDA_INJECTION64_PATH=none` is set and on-demand tracing should still be enabled.

Reviewed By: briancoutinho

Differential Revision: D61478101

Pulled By: aaronenyeshi
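To make the intended behavior concrete, here is a minimal C++ sketch of the check described above (the helper name `shouldSkipAutoInit` is hypothetical, not Kineto's actual API): auto-init is skipped only when the variable points at a real injection library, while an unset variable or the value `none` still allows auto-init and on-demand tracing.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper illustrating the intended behavior: auto-init is
// skipped only when CUDA_INJECTION64_PATH names an actual injection
// library, not when the variable is unset or explicitly set to "none".
static bool shouldSkipAutoInit() {
  const char* injectionPath = std::getenv("CUDA_INJECTION64_PATH");
  // Unset: no injection library, so run Kineto's normal auto-init.
  if (injectionPath == nullptr) {
    return false;
  }
  // "none" is a common config value meaning "no injection library";
  // treat it the same as unset so on-demand tracing keeps working.
  if (std::strcmp(injectionPath, "none") == 0) {
    return false;
  }
  // Any other value is assumed to be a path to an injection library
  // that will initialize profiling itself, so skip auto-init.
  return true;
}
```

In this sketch, both the unset and `none` cases fall through to normal auto-init, which matches the fix described in the summary.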
This pull request was exported from Phabricator. Differential Revision: D61478101
@aaronenyeshi merged this pull request in 120cbc2.
Summary: Pull Request resolved: pytorch#979 CUDA_INJECTION64_PATH being set causes Kineto to skip auto-init. However CUDA_INJECTION64_PATH=none is a commonly used config option, and we should not skip auto-init since the path is set to none. This diff fixes the scenario where `CUDA_INJECTION64_PATH=none` is set and on-demand trace should be enabled. Test Plan: CI Ran locally with `CUDA_INJECTION64_PATH=none` and on-demand profiling works! ``` $ CUDA_INJECTION64_PATH=none buck run mode/dev-nosan kineto/libkineto/fb/integration_tests:pytorch_resnet_integration_test Buck UI: https://www.internalfb.com/buck2/282115c8-6c29-414a-9d39-c4afb73d4e52 Network: Up: 0B Down: 0B Jobs completed: 19751. Time elapsed: 1.5s. BUILD SUCCEEDED [INFO: args.py: 113]: Setting master address to localhost and port to 62421 [INFO: pytorch_resnet_integration_test.py: 256]: Start: ready to do work 2114906 WARNING: Logging before InitGoogleLogging() is written to STDERR I0819 09:07:08.277374 2116870 C10dScubaLogger.cpp:32] Registering C10dScubaLogger... I0819 09:07:08.584887 2116870 ProcessGroupNCCL.cpp:865] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0 I0819 09:07:08.584949 2116870 ProcessGroupNCCL.cpp:874] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.18.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_ENABLE_MONITORING: 0, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 2000, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1 INFO:2024-08-19 09:07:11 2116870:2116870 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1 INFO:2024-08-19 09:07:11 2116870:2116870 CuptiActivityProfiler.cpp:241] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000 INFO:2024-08-19 09:07:11 2116870:2119791 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 09:07:12 2116870:2119791 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 I0819 09:07:15.780548 2116870 ProcessGroupNCCL.cpp:2130] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL broadcast unique ID through store took 0.059698 ms NCCL version 2.18.3+cudaCUDA_MAJOR.CUDA_MINOR I0819 09:07:15.851341 2116870 ProcessGroupNCCL.cpp:2240] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL created ncclComm_ 0x7f67ce8ec000 on CUDA device: I0819 09:07:15.851400 2116870 ProcessGroupNCCL.cpp:2245] [PG ID 0 PG GUID 0(default_pg) Rank 0] NCCL_DEBUG: WARN [INFO: pytorch_resnet_integration_test.py: 199]: step: 0, peak allocated GPU mem: 2.97GB, peak active GPU mem: 2.97GB, peak reserved GPU mem: 3.07GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 20, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 40, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 60, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 80, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. 
[INFO: pytorch_resnet_integration_test.py: 199]: step: 100, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 120, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 140, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 160, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 180, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 200, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 220, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. INFO:2024-08-19 09:07:36 2116870:2119791 ConfigLoader.cpp:271] Received config from dyno: ACTIVITIES_COMPRESSION_ALGORITHM=GZIP REQUEST_GROUP_TRACE_ID=E54247FF9C32A885E23EC0A21BC9DD96 REQUEST_TRACE_ID=E54247FF9C32A885E23EC0A21BC9DD96 ACTIVITIES_DURATION_MSECS=500 PROFILE_REPORT_INPUT_SHAPES=true PROFILE_PROFILE_MEMORY=false PROFILE_WITH_STACK=false PROFILE_WITH_FLOPS=false PROFILE_WITH_MODULES=false ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/0/1724083651/localhost/ REQUEST_TRACE_ID=5172367202475739244 [INFO: pytorch_resnet_integration_test.py: 199]: step: 240, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. INFO:2024-08-19 09:07:37 2116870:2121591 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724083663089205388 INFO:2024-08-19 09:07:37 2116870:2121591 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_2116870.json Log file: /tmp/libkineto_activities_2116870.json Trace start time: 2024-08-19 09:07:43 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync Manifold bucket: gpu_traces Manifold object: tree/traces/dynocli/0/1724083651/localhost/libkineto_activities_2116870.json Trace compression enabled: 1 TTL in seconds: 31536000 (365 days) INFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB) INFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler. INFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler. 
INFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s INFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s [INFO: pytorch_resnet_integration_test.py: 199]: step: 260, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 280, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 300, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 320, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. STAGE:2024-08-19 09:07:43 2116870:2121591 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up INFO:2024-08-19 09:07:43 2116870:2121591 CuptiActivityProfiler.cpp:1194] Tracing started INFO:2024-08-19 09:07:43 2116870:2121591 CuptiActivityProfiler.cpp:1219] Tracing complete. STAGE:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection [INFO: pytorch_resnet_integration_test.py: 199]: step: 340, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. INFO:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers INFO:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:307] Processed 690712 GPU records (34396720 bytes) INFO:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 172266, Blocklisted runtime = 388516, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0 INFO:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:1269] Traces Recorded: INFO:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations INFO:2024-08-19 09:07:44 2116870:2121591 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_2116870.json INFO:2024-08-19 09:07:44 2116870:2121591 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_2116870.json STAGE:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing I0819 09:07:45.141360 2122707 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused) [INFO: pytorch_resnet_integration_test.py: 199]: step: 360, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 380, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. 
INFO:2024-08-19 09:07:48 2116870:2122676 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/0/1724083651/localhost/libkineto_activities_2116870.json.gz INFO:2024-08-19 09:07:48 2116870:2122676 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser: INFO:2024-08-19 09:07:48 2116870:2122676 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724083651/localhost/libkineto_activities_2116870.json.gz&bucket=gpu_traces INFO:2024-08-19 09:07:48 2116870:2122676 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 3486626 us STAGE:2024-08-19 09:07:48 2116870:2122676 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push [INFO: pytorch_resnet_integration_test.py: 199]: step: 400, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 420, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 440, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 460, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 480, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 500, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 520, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 540, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 560, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 580, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 600, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 620, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 640, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 660, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 680, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 700, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 720, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 740, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. 
[INFO: pytorch_resnet_integration_test.py: 199]: step: 760, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 780, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 800, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 820, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 840, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 860, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 880, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 900, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 920, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 940, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 960, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 199]: step: 980, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB. [INFO: pytorch_resnet_integration_test.py: 216]: running function took 83.87466025352478 seconds to complete I0819 09:08:26.611253 2116870 ProcessGroupNCCL.cpp:1164] [PG ID 0 PG GUID 0(default_pg) Rank 0] Launching ProcessGroupNCCL abort asynchrounously. I0819 09:08:26.612347 2125072 ProcessGroupNCCL.cpp:1111] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL destroying ncclComm_ 0x7f67ce8ec000 on CUDA device: 0 I0819 09:08:26.612396 2125072 NCCLUtils.hpp:407] Aborting ncclComm_ 0x7f67ce8ec000 with reason: No abort reason provided. I0819 09:08:27.125118 2125072 ProcessGroupNCCL.cpp:1130] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL destroyed communicator on CUDA device: 0 with stream: 3 I0819 09:08:27.125389 2116870 ProcessGroupNCCL.cpp:1065] [PG ID 0 PG GUID 0(default_pg) Rank 0] future is successfully executed for: ProcessGroup abort I0819 09:08:27.125425 2116870 ProcessGroupNCCL.cpp:1170] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL aborts successfully. I0819 09:08:27.127446 2116870 ProcessGroupNCCL.cpp:1179] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL destructor entered. I0819 09:08:27.127477 2116870 ProcessGroupNCCL.cpp:1200] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL watchdog thread joined. I0819 09:08:27.127542 2116870 ProcessGroupNCCL.cpp:1204] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL heart beat monitor thread joined. 
/usr/local/fbcode/platform010/lib/python3.10/tempfile.py:869: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpzjstec6w'> [INFO: pytorch_resnet_integration_test.py: 267]: Finish: Completed test workload ``` Also, ran dyno/rust/kineto_integration_test: ``` $ buck2 run //dyno/rust/kineto_integration_test:dyno_gputrace_test -- --skip-trace-validation -j dacluster/dauser/daname/2 --collect-iters File changed: fbsource//xplat/kineto/libkineto/fb/init.cpp File changed: fbcode//kineto/libkineto/fb/init.cpp Buck UI: https://www.internalfb.com/buck2/9bee8d11-3d2d-4295-bd12-a699a50ed8dd Network: Up: 0B Down: 2.9MiB (reSessionID-8f330210-f582-4fed-bc14-cb4004759ccb) Jobs completed: 76046. Time elapsed: 24.5s. Cache hits: 100%. Commands: 3512 (cached: 3512, remote: 0, local: 0) BUILD SUCCEEDED ---------------------------------------- Trace collector program = common-on-demand-tracing-library ---------------------------------------- Autodetected job type Tw for job id dacluster/dauser/daname/2 Building trace tester Running /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester Soft Error: source_directory_includes_subpackage: Directory `v2.17.1-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.17.1-1/src/tests`. Soft Error: source_directory_includes_subpackage: Directory `v2.18.3-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.18.3-1/src/tests`. Soft Error: source_directory_includes_subpackage: Directory `v2.19.3-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.19.3-1/src/tests`. Buck UI: https://www.internalfb.com/buck2/9bbcd130-82f9-4e51-982d-c9d88ed8686d Network: Up: 17KiB Down: 2.9MiB (reSessionID-695fce22-aaa1-4cde-9364-bf07ee49f27e) Jobs completed: 182380. Time elapsed: 1:10.3s. Cache hits: 99%. Commands: 3516 (cached: 3514, remote: 1, local: 1) BUILD SUCCEEDED Running ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based Waiting 20 for the trace tester to start up... ---------------------------------------- Collecting gpu trace on application with pid = n/a, job_id = dacluster/dauser/daname/2, collect iters = true ---------------------------------------- I0819 12:39:03.595689 3167910 OnDemandTracingCommon.cpp:99] Found 1 matching PIDs (Busy: 0 activity) I0819 12:39:03.595943 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 1 process(es) Trace Urls: [PidTraceUrlPair { pid: 3175601, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json.gz&bucket=gpu_traces" }] Trace tester is running in background, please be patient... Trace tester application stdout: --- --- Trace tester application stderr: --- I0819 12:38:32.286183 3175601 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers. INFO:2024-08-19 12:38:32 3175601:3175601 CuptiActivityProfiler.cpp:241] CUDA versions. 
CUPTI: 18; Runtime: 12000; Driver: 12000 INFO:2024-08-19 12:38:32 3175601:3175844 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:38:33 3175601:3175844 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:38:34 3175601:3175601 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1 W0819 12:38:57.471963 3175601 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator()) INFO:2024-08-19 12:39:07 3175601:3175844 ConfigLoader.cpp:271] Received config from dyno: ACTIVITIES_COMPRESSION_ALGORITHM=GZIP REQUEST_GROUP_TRACE_ID=test_group_trace_id ACTIVITIES_DURATION_MSECS=500 PROFILE_REPORT_INPUT_SHAPES=true PROFILE_PROFILE_MEMORY=false PROFILE_WITH_STACK=false PROFILE_WITH_FLOPS=false PROFILE_WITH_MODULES=false ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/ REQUEST_TRACE_ID=17832852696324168180 INFO:2024-08-19 12:39:08 3175601:3178719 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724096354388906844 INFO:2024-08-19 12:39:08 3175601:3178719 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3175601.json Log file: /tmp/libkineto_activities_3175601.json Trace start time: 2024-08-19 12:39:14 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync Manifold bucket: gpu_traces Manifold object: tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json Trace compression enabled: 1 TTL in seconds: 31536000 (365 days) INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB) INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler. INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler. INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s STAGE:2024-08-19 12:39:14 3175601:3178719 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up INFO:2024-08-19 12:39:14 3175601:3178719 CuptiActivityProfiler.cpp:1194] Tracing started INFO:2024-08-19 12:39:14 3175601:3178719 CuptiActivityProfiler.cpp:1219] Tracing complete. STAGE:2024-08-19 12:39:15 3175601:3178719 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection INFO:2024-08-19 12:39:15 3175601:3178719 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers INFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:307] Processed 933012 GPU records (44806976 bytes) INFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 104243, Blocklisted runtime = 514615, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? 
= 0 INFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:1269] Traces Recorded: INFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations INFO:2024-08-19 12:39:16 3175601:3178719 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3175601.json INFO:2024-08-19 12:39:16 3175601:3178719 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3175601.json STAGE:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing I0819 12:39:16.931658 3179183 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused) W0819 12:39:20.854436 3179426 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. Please initialize this api instance with an appropriate ConfigSignatureVerifier E0819 12:39:20.919210 3179426 api.cpp:491] The specified logical config name() is not valid W0819 12:39:20.919265 3179426 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File '' I0819 12:39:20.919397 3179426 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid' INFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json.gz INFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser: INFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json.gz&bucket=gpu_traces INFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 6317931 us STAGE:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push WARNING:2024-08-19 12:40:12 3175601:3175844 DynoConfigLoader.cpp:35] (x1) Failed to read config: No dyno config client --- Autodetected job type Tw for job id dacluster/dauser/daname/2 Building trace tester Running /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester Buck UI: https://www.internalfb.com/buck2/bf79fa5f-e636-4d39-ab0c-cd237b3189d1 Network: Up: 0B Down: 0B Jobs completed: 5. Time elapsed: 0.1s. BUILD SUCCEEDED Running ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based Waiting 20 for the trace tester to start up... 
---------------------------------------- Collecting gpu trace on application with pid = 3181985, job_id = dacluster/dauser/daname/2, collect iters = true ---------------------------------------- I0819 12:40:33.999593 3167910 OnDemandTracingCommon.cpp:99] Found 1 matching PIDs (Busy: 0 activity) I0819 12:40:33.999762 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 1 process(es) Trace Urls: [PidTraceUrlPair { pid: 3181985, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json.gz&bucket=gpu_traces" }] Trace tester is running in background, please be patient... Trace tester application stdout: --- --- Trace tester application stderr: --- I0819 12:40:16.687461 3181985 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers. INFO:2024-08-19 12:40:16 3181985:3181985 CuptiActivityProfiler.cpp:241] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000 INFO:2024-08-19 12:40:16 3181985:3182373 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:40:18 3181985:3182373 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:40:18 3181985:3181985 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1 INFO:2024-08-19 12:40:36 3181985:3182373 ConfigLoader.cpp:271] Received config from dyno: ACTIVITIES_COMPRESSION_ALGORITHM=GZIP REQUEST_GROUP_TRACE_ID=test_group_trace_id ACTIVITIES_DURATION_MSECS=500 PROFILE_REPORT_INPUT_SHAPES=true PROFILE_PROFILE_MEMORY=false PROFILE_WITH_STACK=false PROFILE_WITH_FLOPS=false PROFILE_WITH_MODULES=false ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/ REQUEST_TRACE_ID=17968765238600337472 INFO:2024-08-19 12:40:37 3181985:3183119 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724096443817688761 INFO:2024-08-19 12:40:37 3181985:3183119 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3181985.json Log file: /tmp/libkineto_activities_3181985.json Trace start time: 2024-08-19 12:40:43 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync Manifold bucket: gpu_traces Manifold object: tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json Trace compression enabled: 1 TTL in seconds: 31536000 (365 days) INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB) INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler. INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler. 
INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s W0819 12:40:41.175002 3181985 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator()) STAGE:2024-08-19 12:40:43 3181985:3183119 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up INFO:2024-08-19 12:40:43 3181985:3183119 CuptiActivityProfiler.cpp:1194] Tracing started INFO:2024-08-19 12:40:44 3181985:3183119 CuptiActivityProfiler.cpp:1219] Tracing complete. STAGE:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:307] Processed 942895 GPU records (44635768 bytes) INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 100281, Blocklisted runtime = 505241, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0 INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1269] Traces Recorded: INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations INFO:2024-08-19 12:40:45 3181985:3183119 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3181985.json INFO:2024-08-19 12:40:45 3181985:3183119 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3181985.json STAGE:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing I0819 12:40:46.369321 3183880 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused) W0819 12:40:49.872421 3184196 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. 
Please initialize this api instance with an appropriate ConfigSignatureVerifier E0819 12:40:49.927807 3184196 api.cpp:491] The specified logical config name() is not valid W0819 12:40:49.927846 3184196 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File '' I0819 12:40:49.927968 3184196 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid' INFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json.gz INFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser: INFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json.gz&bucket=gpu_traces INFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 5519986 us STAGE:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push WARNING:2024-08-19 12:41:51 3181985:3182373 DynoConfigLoader.cpp:35] (x1) Failed to read config: No dyno config client --- Building trace tester Running /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester Buck UI: https://www.internalfb.com/buck2/40dd7a0b-6f22-4732-89ca-84d22a4e62f5 Network: Up: 0B Down: 0B Jobs completed: 5. Time elapsed: 0.2s. BUILD SUCCEEDED Running ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based Waiting 20 for the trace tester to start up... ---------------------------------------- Collecting gpu trace on application with pid = 3190209, job_id = n/a, collect iters = true ---------------------------------------- I0819 12:42:13.637505 3167910 OnDemandTracingCommon.cpp:99] Found 1 matching PIDs (Busy: 0 activity) I0819 12:42:13.637663 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 1 process(es) Trace Urls: [PidTraceUrlPair { pid: 3190209, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json.gz&bucket=gpu_traces" }] Trace tester is running in background, please be patient... Trace tester application stdout: --- --- Trace tester application stderr: --- I0819 12:41:56.295006 3190209 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers. INFO:2024-08-19 12:41:56 3190209:3190209 CuptiActivityProfiler.cpp:241] CUDA versions. 
CUPTI: 18; Runtime: 12000; Driver: 12000 INFO:2024-08-19 12:41:56 3190209:3190382 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:41:57 3190209:3190382 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:41:57 3190209:3190209 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1 INFO:2024-08-19 12:42:16 3190209:3190382 ConfigLoader.cpp:271] Received config from dyno: ACTIVITIES_COMPRESSION_ALGORITHM=GZIP REQUEST_GROUP_TRACE_ID=test_group_trace_id ACTIVITIES_DURATION_MSECS=500 PROFILE_REPORT_INPUT_SHAPES=true PROFILE_PROFILE_MEMORY=false PROFILE_WITH_STACK=false PROFILE_WITH_FLOPS=false PROFILE_WITH_MODULES=false ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/0/1724096533/localhost/ REQUEST_TRACE_ID=3582354066664069411 INFO:2024-08-19 12:42:17 3190209:3191748 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724096543391277741 INFO:2024-08-19 12:42:17 3190209:3191748 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3190209.json Log file: /tmp/libkineto_activities_3190209.json Trace start time: 2024-08-19 12:42:23 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync Manifold bucket: gpu_traces Manifold object: tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json Trace compression enabled: 1 TTL in seconds: 31536000 (365 days) INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB) INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler. INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler. INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s W0819 12:42:19.165912 3190209 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator()) STAGE:2024-08-19 12:42:23 3190209:3191748 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up INFO:2024-08-19 12:42:23 3190209:3191748 CuptiActivityProfiler.cpp:1194] Tracing started INFO:2024-08-19 12:42:23 3190209:3191748 CuptiActivityProfiler.cpp:1219] Tracing complete. STAGE:2024-08-19 12:42:24 3190209:3191748 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection INFO:2024-08-19 12:42:24 3190209:3191748 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers INFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:307] Processed 981694 GPU records (47235736 bytes) INFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 109007, Blocklisted runtime = 543555, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? 
= 0 INFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:1269] Traces Recorded: INFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations INFO:2024-08-19 12:42:25 3190209:3191748 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3190209.json INFO:2024-08-19 12:42:25 3190209:3191748 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3190209.json STAGE:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing I0819 12:42:25.831236 3192298 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused) W0819 12:42:29.783102 3192507 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. Please initialize this api instance with an appropriate ConfigSignatureVerifier E0819 12:42:29.839179 3192507 api.cpp:491] The specified logical config name() is not valid W0819 12:42:29.839231 3192507 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File '' I0819 12:42:29.839358 3192507 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid' INFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json.gz INFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser: INFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json.gz&bucket=gpu_traces INFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 6102519 us STAGE:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push --- Building trace tester Running /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester Buck UI: https://www.internalfb.com/buck2/b5cfc916-b247-432d-98a2-7b289b5552d7 Network: Up: 0B Down: 0B Jobs completed: 5. Time elapsed: 0.2s. BUILD SUCCEEDED Running ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based Waiting 20 for the trace tester to start up... ---------------------------------------- Collecting gpu trace on application with pid = n/a, job_id = n/a, collect iters = true ---------------------------------------- I0819 12:43:48.965962 3167910 OnDemandTracingCommon.cpp:99] Found 2 matching PIDs (Busy: 0 activity) I0819 12:43:48.966220 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 2 process(es) Trace Urls: [PidTraceUrlPair { pid: 3196469, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json.gz&bucket=gpu_traces" }, PidTraceUrlPair { pid: 3190209, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3190209.json.gz&bucket=gpu_traces" }] Trace tester is running in background, please be patient... 
Trace tester application stdout: --- --- Trace tester application stderr: --- I0819 12:43:31.657953 3196469 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers. INFO:2024-08-19 12:43:31 3196469:3196469 CuptiActivityProfiler.cpp:241] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000 INFO:2024-08-19 12:43:31 3196469:3196920 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:43:32 3196469:3196920 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:43:33 3196469:3196469 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1 INFO:2024-08-19 12:43:51 3196469:3196920 ConfigLoader.cpp:271] Received config from dyno: ACTIVITIES_COMPRESSION_ALGORITHM=GZIP REQUEST_GROUP_TRACE_ID=test_group_trace_id ACTIVITIES_DURATION_MSECS=500 PROFILE_REPORT_INPUT_SHAPES=true PROFILE_PROFILE_MEMORY=false PROFILE_WITH_STACK=false PROFILE_WITH_FLOPS=false PROFILE_WITH_MODULES=false ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/0/1724096628/localhost/ REQUEST_TRACE_ID=6007547554162055103 INFO:2024-08-19 12:43:52 3196469:3198770 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724096638759081016 INFO:2024-08-19 12:43:52 3196469:3198770 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3196469.json Log file: /tmp/libkineto_activities_3196469.json Trace start time: 2024-08-19 12:43:58 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync Manifold bucket: gpu_traces Manifold object: tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json Trace compression enabled: 1 TTL in seconds: 31536000 (365 days) INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB) INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler. INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler. INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s W0819 12:43:58.627271 3196469 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator()) STAGE:2024-08-19 12:43:58 3196469:3198770 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up INFO:2024-08-19 12:43:58 3196469:3198770 CuptiActivityProfiler.cpp:1194] Tracing started INFO:2024-08-19 12:43:59 3196469:3198770 CuptiActivityProfiler.cpp:1219] Tracing complete. 
STAGE:2024-08-19 12:43:59 3196469:3198770 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:307] Processed 816296 GPU records (38657456 bytes) INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 86644, Blocklisted runtime = 437744, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0 INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:1269] Traces Recorded: INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations INFO:2024-08-19 12:44:00 3196469:3198770 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3196469.json INFO:2024-08-19 12:44:00 3196469:3198770 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3196469.json STAGE:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing I0819 12:44:01.344978 3199372 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused) W0819 12:44:04.267980 3199473 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. Please initialize this api instance with an appropriate ConfigSignatureVerifier E0819 12:44:04.327500 3199473 api.cpp:491] The specified logical config name() is not valid W0819 12:44:04.327549 3199473 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File '' I0819 12:44:04.327661 3199473 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid' INFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json.gz INFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser: INFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json.gz&bucket=gpu_traces INFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 4973015 us STAGE:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push --- ``` Reviewed By: briancoutinho Differential Revision: D61478101 Pulled By: aaronenyeshi fbshipit-source-id: d122f1098301d3acddefed4aa08bc6863358a8cf
/usr/local/fbcode/platform010/lib/python3.10/tempfile.py:869: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpzjstec6w'> [INFO: pytorch_resnet_integration_test.py: 267]: Finish: Completed test workload ``` Also, ran dyno/rust/kineto_integration_test: ``` $ buck2 run //dyno/rust/kineto_integration_test:dyno_gputrace_test -- --skip-trace-validation -j dacluster/dauser/daname/2 --collect-iters File changed: fbsource//xplat/kineto/libkineto/fb/init.cpp File changed: fbcode//kineto/libkineto/fb/init.cpp Buck UI: https://www.internalfb.com/buck2/9bee8d11-3d2d-4295-bd12-a699a50ed8dd Network: Up: 0B Down: 2.9MiB (reSessionID-8f330210-f582-4fed-bc14-cb4004759ccb) Jobs completed: 76046. Time elapsed: 24.5s. Cache hits: 100%. Commands: 3512 (cached: 3512, remote: 0, local: 0) BUILD SUCCEEDED ---------------------------------------- Trace collector program = common-on-demand-tracing-library ---------------------------------------- Autodetected job type Tw for job id dacluster/dauser/daname/2 Building trace tester Running /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester Soft Error: source_directory_includes_subpackage: Directory `v2.17.1-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.17.1-1/src/tests`. Soft Error: source_directory_includes_subpackage: Directory `v2.18.3-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.18.3-1/src/tests`. Soft Error: source_directory_includes_subpackage: Directory `v2.19.3-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.19.3-1/src/tests`. Buck UI: https://www.internalfb.com/buck2/9bbcd130-82f9-4e51-982d-c9d88ed8686d Network: Up: 17KiB Down: 2.9MiB (reSessionID-695fce22-aaa1-4cde-9364-bf07ee49f27e) Jobs completed: 182380. Time elapsed: 1:10.3s. Cache hits: 99%. Commands: 3516 (cached: 3514, remote: 1, local: 1) BUILD SUCCEEDED Running ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based Waiting 20 for the trace tester to start up... ---------------------------------------- Collecting gpu trace on application with pid = n/a, job_id = dacluster/dauser/daname/2, collect iters = true ---------------------------------------- I0819 12:39:03.595689 3167910 OnDemandTracingCommon.cpp:99] Found 1 matching PIDs (Busy: 0 activity) I0819 12:39:03.595943 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 1 process(es) Trace Urls: [PidTraceUrlPair { pid: 3175601, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json.gz&bucket=gpu_traces" }] Trace tester is running in background, please be patient... Trace tester application stdout: --- --- Trace tester application stderr: --- I0819 12:38:32.286183 3175601 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers. INFO:2024-08-19 12:38:32 3175601:3175601 CuptiActivityProfiler.cpp:241] CUDA versions. 
CUPTI: 18; Runtime: 12000; Driver: 12000 INFO:2024-08-19 12:38:32 3175601:3175844 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:38:33 3175601:3175844 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:38:34 3175601:3175601 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1 W0819 12:38:57.471963 3175601 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator()) INFO:2024-08-19 12:39:07 3175601:3175844 ConfigLoader.cpp:271] Received config from dyno: ACTIVITIES_COMPRESSION_ALGORITHM=GZIP REQUEST_GROUP_TRACE_ID=test_group_trace_id ACTIVITIES_DURATION_MSECS=500 PROFILE_REPORT_INPUT_SHAPES=true PROFILE_PROFILE_MEMORY=false PROFILE_WITH_STACK=false PROFILE_WITH_FLOPS=false PROFILE_WITH_MODULES=false ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/ REQUEST_TRACE_ID=17832852696324168180 INFO:2024-08-19 12:39:08 3175601:3178719 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724096354388906844 INFO:2024-08-19 12:39:08 3175601:3178719 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3175601.json Log file: /tmp/libkineto_activities_3175601.json Trace start time: 2024-08-19 12:39:14 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync Manifold bucket: gpu_traces Manifold object: tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json Trace compression enabled: 1 TTL in seconds: 31536000 (365 days) INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB) INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler. INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler. INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s STAGE:2024-08-19 12:39:14 3175601:3178719 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up INFO:2024-08-19 12:39:14 3175601:3178719 CuptiActivityProfiler.cpp:1194] Tracing started INFO:2024-08-19 12:39:14 3175601:3178719 CuptiActivityProfiler.cpp:1219] Tracing complete. STAGE:2024-08-19 12:39:15 3175601:3178719 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection INFO:2024-08-19 12:39:15 3175601:3178719 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers INFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:307] Processed 933012 GPU records (44806976 bytes) INFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 104243, Blocklisted runtime = 514615, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? 
= 0 INFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:1269] Traces Recorded: INFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations INFO:2024-08-19 12:39:16 3175601:3178719 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3175601.json INFO:2024-08-19 12:39:16 3175601:3178719 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3175601.json STAGE:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing I0819 12:39:16.931658 3179183 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused) W0819 12:39:20.854436 3179426 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. Please initialize this api instance with an appropriate ConfigSignatureVerifier E0819 12:39:20.919210 3179426 api.cpp:491] The specified logical config name() is not valid W0819 12:39:20.919265 3179426 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File '' I0819 12:39:20.919397 3179426 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid' INFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json.gz INFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser: INFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json.gz&bucket=gpu_traces INFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 6317931 us STAGE:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push WARNING:2024-08-19 12:40:12 3175601:3175844 DynoConfigLoader.cpp:35] (x1) Failed to read config: No dyno config client --- Autodetected job type Tw for job id dacluster/dauser/daname/2 Building trace tester Running /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester Buck UI: https://www.internalfb.com/buck2/bf79fa5f-e636-4d39-ab0c-cd237b3189d1 Network: Up: 0B Down: 0B Jobs completed: 5. Time elapsed: 0.1s. BUILD SUCCEEDED Running ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based Waiting 20 for the trace tester to start up... 
---------------------------------------- Collecting gpu trace on application with pid = 3181985, job_id = dacluster/dauser/daname/2, collect iters = true ---------------------------------------- I0819 12:40:33.999593 3167910 OnDemandTracingCommon.cpp:99] Found 1 matching PIDs (Busy: 0 activity) I0819 12:40:33.999762 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 1 process(es) Trace Urls: [PidTraceUrlPair { pid: 3181985, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json.gz&bucket=gpu_traces" }] Trace tester is running in background, please be patient... Trace tester application stdout: --- --- Trace tester application stderr: --- I0819 12:40:16.687461 3181985 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers. INFO:2024-08-19 12:40:16 3181985:3181985 CuptiActivityProfiler.cpp:241] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000 INFO:2024-08-19 12:40:16 3181985:3182373 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:40:18 3181985:3182373 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:40:18 3181985:3181985 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1 INFO:2024-08-19 12:40:36 3181985:3182373 ConfigLoader.cpp:271] Received config from dyno: ACTIVITIES_COMPRESSION_ALGORITHM=GZIP REQUEST_GROUP_TRACE_ID=test_group_trace_id ACTIVITIES_DURATION_MSECS=500 PROFILE_REPORT_INPUT_SHAPES=true PROFILE_PROFILE_MEMORY=false PROFILE_WITH_STACK=false PROFILE_WITH_FLOPS=false PROFILE_WITH_MODULES=false ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/ REQUEST_TRACE_ID=17968765238600337472 INFO:2024-08-19 12:40:37 3181985:3183119 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724096443817688761 INFO:2024-08-19 12:40:37 3181985:3183119 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3181985.json Log file: /tmp/libkineto_activities_3181985.json Trace start time: 2024-08-19 12:40:43 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync Manifold bucket: gpu_traces Manifold object: tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json Trace compression enabled: 1 TTL in seconds: 31536000 (365 days) INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB) INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler. INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler. 
INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s W0819 12:40:41.175002 3181985 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator()) STAGE:2024-08-19 12:40:43 3181985:3183119 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up INFO:2024-08-19 12:40:43 3181985:3183119 CuptiActivityProfiler.cpp:1194] Tracing started INFO:2024-08-19 12:40:44 3181985:3183119 CuptiActivityProfiler.cpp:1219] Tracing complete. STAGE:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:307] Processed 942895 GPU records (44635768 bytes) INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 100281, Blocklisted runtime = 505241, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0 INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1269] Traces Recorded: INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations INFO:2024-08-19 12:40:45 3181985:3183119 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3181985.json INFO:2024-08-19 12:40:45 3181985:3183119 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3181985.json STAGE:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing I0819 12:40:46.369321 3183880 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused) W0819 12:40:49.872421 3184196 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. 
Please initialize this api instance with an appropriate ConfigSignatureVerifier E0819 12:40:49.927807 3184196 api.cpp:491] The specified logical config name() is not valid W0819 12:40:49.927846 3184196 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File '' I0819 12:40:49.927968 3184196 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid' INFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json.gz INFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser: INFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json.gz&bucket=gpu_traces INFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 5519986 us STAGE:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push WARNING:2024-08-19 12:41:51 3181985:3182373 DynoConfigLoader.cpp:35] (x1) Failed to read config: No dyno config client --- Building trace tester Running /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester Buck UI: https://www.internalfb.com/buck2/40dd7a0b-6f22-4732-89ca-84d22a4e62f5 Network: Up: 0B Down: 0B Jobs completed: 5. Time elapsed: 0.2s. BUILD SUCCEEDED Running ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based Waiting 20 for the trace tester to start up... ---------------------------------------- Collecting gpu trace on application with pid = 3190209, job_id = n/a, collect iters = true ---------------------------------------- I0819 12:42:13.637505 3167910 OnDemandTracingCommon.cpp:99] Found 1 matching PIDs (Busy: 0 activity) I0819 12:42:13.637663 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 1 process(es) Trace Urls: [PidTraceUrlPair { pid: 3190209, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json.gz&bucket=gpu_traces" }] Trace tester is running in background, please be patient... Trace tester application stdout: --- --- Trace tester application stderr: --- I0819 12:41:56.295006 3190209 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers. INFO:2024-08-19 12:41:56 3190209:3190209 CuptiActivityProfiler.cpp:241] CUDA versions. 
CUPTI: 18; Runtime: 12000; Driver: 12000 INFO:2024-08-19 12:41:56 3190209:3190382 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:41:57 3190209:3190382 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:41:57 3190209:3190209 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1 INFO:2024-08-19 12:42:16 3190209:3190382 ConfigLoader.cpp:271] Received config from dyno: ACTIVITIES_COMPRESSION_ALGORITHM=GZIP REQUEST_GROUP_TRACE_ID=test_group_trace_id ACTIVITIES_DURATION_MSECS=500 PROFILE_REPORT_INPUT_SHAPES=true PROFILE_PROFILE_MEMORY=false PROFILE_WITH_STACK=false PROFILE_WITH_FLOPS=false PROFILE_WITH_MODULES=false ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/0/1724096533/localhost/ REQUEST_TRACE_ID=3582354066664069411 INFO:2024-08-19 12:42:17 3190209:3191748 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724096543391277741 INFO:2024-08-19 12:42:17 3190209:3191748 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3190209.json Log file: /tmp/libkineto_activities_3190209.json Trace start time: 2024-08-19 12:42:23 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync Manifold bucket: gpu_traces Manifold object: tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json Trace compression enabled: 1 TTL in seconds: 31536000 (365 days) INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB) INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler. INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler. INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s W0819 12:42:19.165912 3190209 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator()) STAGE:2024-08-19 12:42:23 3190209:3191748 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up INFO:2024-08-19 12:42:23 3190209:3191748 CuptiActivityProfiler.cpp:1194] Tracing started INFO:2024-08-19 12:42:23 3190209:3191748 CuptiActivityProfiler.cpp:1219] Tracing complete. STAGE:2024-08-19 12:42:24 3190209:3191748 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection INFO:2024-08-19 12:42:24 3190209:3191748 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers INFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:307] Processed 981694 GPU records (47235736 bytes) INFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 109007, Blocklisted runtime = 543555, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? 
= 0 INFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:1269] Traces Recorded: INFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations INFO:2024-08-19 12:42:25 3190209:3191748 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3190209.json INFO:2024-08-19 12:42:25 3190209:3191748 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3190209.json STAGE:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing I0819 12:42:25.831236 3192298 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused) W0819 12:42:29.783102 3192507 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. Please initialize this api instance with an appropriate ConfigSignatureVerifier E0819 12:42:29.839179 3192507 api.cpp:491] The specified logical config name() is not valid W0819 12:42:29.839231 3192507 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File '' I0819 12:42:29.839358 3192507 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid' INFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json.gz INFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser: INFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json.gz&bucket=gpu_traces INFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 6102519 us STAGE:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push --- Building trace tester Running /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester Buck UI: https://www.internalfb.com/buck2/b5cfc916-b247-432d-98a2-7b289b5552d7 Network: Up: 0B Down: 0B Jobs completed: 5. Time elapsed: 0.2s. BUILD SUCCEEDED Running ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based Waiting 20 for the trace tester to start up... ---------------------------------------- Collecting gpu trace on application with pid = n/a, job_id = n/a, collect iters = true ---------------------------------------- I0819 12:43:48.965962 3167910 OnDemandTracingCommon.cpp:99] Found 2 matching PIDs (Busy: 0 activity) I0819 12:43:48.966220 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 2 process(es) Trace Urls: [PidTraceUrlPair { pid: 3196469, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json.gz&bucket=gpu_traces" }, PidTraceUrlPair { pid: 3190209, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3190209.json.gz&bucket=gpu_traces" }] Trace tester is running in background, please be patient... 
Trace tester application stdout: --- --- Trace tester application stderr: --- I0819 12:43:31.657953 3196469 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers. INFO:2024-08-19 12:43:31 3196469:3196469 CuptiActivityProfiler.cpp:241] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000 INFO:2024-08-19 12:43:31 3196469:3196920 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:43:32 3196469:3196920 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0 INFO:2024-08-19 12:43:33 3196469:3196469 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1 INFO:2024-08-19 12:43:51 3196469:3196920 ConfigLoader.cpp:271] Received config from dyno: ACTIVITIES_COMPRESSION_ALGORITHM=GZIP REQUEST_GROUP_TRACE_ID=test_group_trace_id ACTIVITIES_DURATION_MSECS=500 PROFILE_REPORT_INPUT_SHAPES=true PROFILE_PROFILE_MEMORY=false PROFILE_WITH_STACK=false PROFILE_WITH_FLOPS=false PROFILE_WITH_MODULES=false ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/0/1724096628/localhost/ REQUEST_TRACE_ID=6007547554162055103 INFO:2024-08-19 12:43:52 3196469:3198770 ActivityProfilerController.cpp:147] Received on-demand activity trace request by profile timestamp = 1724096638759081016 INFO:2024-08-19 12:43:52 3196469:3198770 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3196469.json Log file: /tmp/libkineto_activities_3196469.json Trace start time: 2024-08-19 12:43:58 Trace duration: 500ms Warmup duration: 5s Max GPU buffer size: 128MB Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync Manifold bucket: gpu_traces Manifold object: tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json Trace compression enabled: 1 TTL in seconds: 31536000 (365 days) INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB) INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler. INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler. INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler. INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s W0819 12:43:58.627271 3196469 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator()) STAGE:2024-08-19 12:43:58 3196469:3198770 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up INFO:2024-08-19 12:43:58 3196469:3198770 CuptiActivityProfiler.cpp:1194] Tracing started INFO:2024-08-19 12:43:59 3196469:3198770 CuptiActivityProfiler.cpp:1219] Tracing complete. 
STAGE:2024-08-19 12:43:59 3196469:3198770 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:307] Processed 816296 GPU records (38657456 bytes) INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 86644, Blocklisted runtime = 437744, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0 INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:1269] Traces Recorded: INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations INFO:2024-08-19 12:44:00 3196469:3198770 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3196469.json INFO:2024-08-19 12:44:00 3196469:3198770 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3196469.json STAGE:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing I0819 12:44:01.344978 3199372 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused) W0819 12:44:04.267980 3199473 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. Please initialize this api instance with an appropriate ConfigSignatureVerifier E0819 12:44:04.327500 3199473 api.cpp:491] The specified logical config name() is not valid W0819 12:44:04.327549 3199473 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File '' I0819 12:44:04.327661 3199473 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid' INFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json.gz INFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser: INFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json.gz&bucket=gpu_traces INFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 4973015 us STAGE:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push --- ``` Reviewed By: briancoutinho Differential Revision: D61478101 Pulled By: aaronenyeshi fbshipit-source-id: d122f1098301d3acddefed4aa08bc6863358a8cf
Summary:
CUDA_INJECTION64_PATH being set causes Kineto to skip auto-init. However, `CUDA_INJECTION64_PATH=none` is a commonly used config option, and we should not skip auto-init when the path is set to `none`.
This diff fixes the scenario where `CUDA_INJECTION64_PATH=none` is set and on-demand trace should be enabled.
Reviewed By: briancoutinho
Differential Revision: D61478101
Pulled By: aaronenyeshi
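
For readers unfamiliar with the auto-init gating, below is a minimal sketch of the behavior this diff describes: treat `CUDA_INJECTION64_PATH=none` the same as an unset variable when deciding whether to auto-initialize. The helper names (`hasCudaInjectionPath`, `maybeAutoInit`) are illustrative only and are not the actual Kineto functions or file layout.

```cpp
// Illustrative sketch only -- not the actual Kineto source.
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns true only if an injection library path is
// genuinely configured, i.e. the variable is set to something other than "none".
static bool hasCudaInjectionPath() {
  const char* path = std::getenv("CUDA_INJECTION64_PATH");
  return path != nullptr && std::strcmp(path, "none") != 0;
}

// Hypothetical auto-init gate: before this change, any non-null value
// (including "none") would have suppressed auto-init.
void maybeAutoInit() {
  if (hasCudaInjectionPath()) {
    // An injection library is expected to initialize Kineto itself; skip auto-init.
    return;
  }
  // Auto-initialize so on-demand tracing can be enabled later, e.g. via
  // libkineto_init(/*cpuOnly=*/false, /*logOnError=*/true);
}
```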