add libkineto/test/CMakeLists.txt #869
Overall looks great, really happy to get unit tests working in OSS for Kineto. We should try to remove the dependency on folly if possible!
I'm not sure if you have permission to see GitHub signals, but I'm getting this error in the CI. Could we skip CUDA when it is missing?
Yes, I can see the error in the CI. CUDA_SOURCE_DIR should be set when building libkineto. If CUDA_SOURCE_DIR is not set, libkineto will fail to build.
Thank you, and LGTM!
The original LIBKINETOCI uses a basic Linux environment. I've just landed a new CI build, libkineto PR Test (#870), that should contain the CUDA libraries. Running it now.
The unit tests depend on CUDA, so LIBKINETOCI should set KINETO_BUILD_TESTS to OFF, and libkineto PR Test should set CUDA_SOURCE_DIR before building.
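A minimal sketch of that split as a shell step, assuming the workflow passes `KINETO_BUILD_TESTS` to CMake as a `-D` option. The helper name `kineto_tests_flag` and the example paths are hypothetical and not part of kineto:

```shell
#!/bin/sh
# Hypothetical helper (not part of kineto): print the CMake option that
# enables the unit tests only when the given CUDA directory exists.
kineto_tests_flag() {
  cuda_dir="$1"
  if [ -n "$cuda_dir" ] && [ -d "$cuda_dir" ]; then
    echo "-DKINETO_BUILD_TESTS=ON"
  else
    # No CUDA toolkit found: build libkineto without its unit tests.
    echo "-DKINETO_BUILD_TESTS=OFF"
  fi
}

# Possible use in a CI step (source path is an assumption):
#   cmake "$(kineto_tests_flag "$CUDA_SOURCE_DIR")" ../libkineto
```

With this shape, a CPU-only job and a CUDA job could share the same step and differ only in whether a CUDA directory is present on the runner.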
Good catch, the Docker image uses CUDA_HOME. Can you please update the CUDA_SOURCE_DIR logic to check CUDA_HOME instead? We could also update LIBKINETOCI directly here: https://github.com/pytorch/kineto/blob/main/.github/workflows/libkineto_ci.yml#L42-L56
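The request above is for a change in the CMake files. As a workflow-side alternative, the same fallback could be sketched in shell before invoking CMake. The function name `resolve_cuda_source_dir` is hypothetical, and the precedence (explicit `CUDA_SOURCE_DIR` first, then `CUDA_HOME`) is an assumption rather than what the PR implemented:

```shell
#!/bin/sh
# Hypothetical helper (not part of kineto): prefer an explicitly set
# CUDA_SOURCE_DIR, otherwise fall back to CUDA_HOME from the Docker image.
resolve_cuda_source_dir() {
  if [ -n "${CUDA_SOURCE_DIR:-}" ]; then
    printf '%s\n' "$CUDA_SOURCE_DIR"
  elif [ -n "${CUDA_HOME:-}" ]; then
    printf '%s\n' "$CUDA_HOME"
  else
    # Neither variable is set: signal failure so the caller can skip CUDA.
    return 1
  fi
}

# Possible use in a CI step:
#   if dir="$(resolve_cuda_source_dir)"; then export CUDA_SOURCE_DIR="$dir"; fi
```

Returning a non-zero status instead of printing an empty string lets the caller distinguish "no CUDA available" from a valid path.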
Could you please check whether the CUDA_HOME environment variable is actually set in the Docker container, and whether a GPU device is installed? @aaronenyeshi
It should be set. We are using pytorch/benchmark's Docker image: https://github.com/pytorch/benchmark/blob/main/docker/gcp-a100-runner-dind.dockerfile#L48. @xuzhao9 mentioned it may be a bug: https://stackoverflow.com/questions/55044846/no-cmake-cuda-compiler-could-be-found
Can you please trigger the workflows again? @aaronenyeshi
Awesome!
@aaronenyeshi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary:
Finished CMakeLists.txt in the test folder and added the folly repo needed by CuptiActivityProfilerTest. Some problems remain:
- ConfigTest: though gtest passed, the log shows "ERROR" as below:
  log: ERROR:2024-02-15 20:45:30 1682464:1682464 AbstractConfig.cpp:145] Invalid config line:
- CuptiProfilerApiTest: shows "Aborted (core dumped)"
- LoggerObserverTest: though gtest passed, the log shows "ERROR"

**system info:**
os: Ubuntu 20.04.5 LTS
cuda version: cuda 11.7

**test result:**
```
$ make test
Running tests...
Test project /kineto/libkineto/build
      Start  1: ParseTest.Whitespace
 1/47 Test  #1: ParseTest.Whitespace .................................. Passed 0.00 sec
      Start  2: ParseTest.Comment
 2/47 Test  #2: ParseTest.Comment ..................................... Passed 0.00 sec
      Start  3: ParseTest.Format
 3/47 Test  #3: ParseTest.Format ...................................... Passed 0.00 sec
      Start  4: ParseTest.DefaultActivityTypes
 4/47 Test  #4: ParseTest.DefaultActivityTypes ........................ Passed 0.00 sec
      Start  5: ParseTest.ActivityTypes
 5/47 Test  #5: ParseTest.ActivityTypes ............................... Passed 0.00 sec
      Start  6: ParseTest.SamplePeriod
 6/47 Test  #6: ParseTest.SamplePeriod ................................ Passed 0.00 sec
      Start  7: ParseTest.MultiplexPeriod
 7/47 Test  #7: ParseTest.MultiplexPeriod ............................. Passed 0.00 sec
      Start  8: ParseTest.ReportPeriod
 8/47 Test  #8: ParseTest.ReportPeriod ................................ Passed 0.00 sec
      Start  9: ParseTest.SamplesPerReport
 9/47 Test  #9: ParseTest.SamplesPerReport ............................ Passed 0.00 sec
      Start 10: ParseTest.EnableSigUsr2
10/47 Test #10: ParseTest.EnableSigUsr2 ............................... Passed 0.00 sec
      Start 11: ParseTest.DeviceMask
11/47 Test #11: ParseTest.DeviceMask .................................. Passed 0.00 sec
      Start 12: ParseTest.RequestTime
12/47 Test #12: ParseTest.RequestTime ................................. Passed 0.00 sec
      Start 13: ParseTest.ProfileStartTime
13/47 Test #13: ParseTest.ProfileStartTime ............................ Passed 0.00 sec
      Start 14: CuptiActivityProfiler.AsyncTrace
14/47 Test #14: CuptiActivityProfiler.AsyncTrace ...................... Passed 0.72 sec
      Start 15: CuptiActivityProfiler.AsyncTraceUsingIter
15/47 Test #15: CuptiActivityProfiler.AsyncTraceUsingIter ............. Passed 0.70 sec
      Start 16: CuptiActivityProfilerTest.SyncTrace
16/47 Test #16: CuptiActivityProfilerTest.SyncTrace ................... Passed 0.75 sec
      Start 17: CuptiActivityProfilerTest.GpuNCCLCollectiveTest
17/47 Test #17: CuptiActivityProfilerTest.GpuNCCLCollectiveTest ....... Passed 0.75 sec
      Start 18: CuptiActivityProfilerTest.GpuUserAnnotationTest
18/47 Test #18: CuptiActivityProfilerTest.GpuUserAnnotationTest ....... Passed 0.74 sec
      Start 19: CuptiActivityProfilerTest.SubActivityProfilers
19/47 Test #19: CuptiActivityProfilerTest.SubActivityProfilers ........ Passed 0.72 sec
      Start 20: CuptiActivityProfilerTest.BufferSizeLimitTestWarmup
20/47 Test #20: CuptiActivityProfilerTest.BufferSizeLimitTestWarmup ... Passed 0.74 sec
      Start 21: CuptiCallbackApiTest.SimpleTest
21/47 Test #21: CuptiCallbackApiTest.SimpleTest ....................... Passed 0.00 sec
      Start 22: CuptiCallbackApiTest.AllCallbacks
22/47 Test #22: CuptiCallbackApiTest.AllCallbacks ..................... Passed 0.00 sec
      Start 23: CuptiCallbackApiTest.ContentionTest
23/47 Test #23: CuptiCallbackApiTest.ContentionTest ................... Passed 0.91 sec
      Start 24: CuptiCallbackApiTest.Bechmark
24/47 Test #24: CuptiCallbackApiTest.Bechmark ......................... Passed 0.00 sec
      Start 25: CuptiRangeProfilerApiTest.contextTracking
25/47 Test #25: CuptiRangeProfilerApiTest.contextTracking ............. Passed 0.01 sec
      Start 26: CuptiRangeProfilerApiTest.asyncLaunchUserRange
26/47 Test #26: CuptiRangeProfilerApiTest.asyncLaunchUserRange ........ Passed 0.00 sec
      Start 27: CuptiRangeProfilerApiTest.asyncLaunchAutoRange
27/47 Test #27: CuptiRangeProfilerApiTest.asyncLaunchAutoRange ........ Passed 0.00 sec
      Start 28: CuptiRangeProfilerConfigTest.ConfigureProfiler
28/47 Test #28: CuptiRangeProfilerConfigTest.ConfigureProfiler ........ Passed 0.00 sec
      Start 29: CuptiRangeProfilerConfigTest.RangesDefaults
29/47 Test #29: CuptiRangeProfilerConfigTest.RangesDefaults ........... Passed 0.00 sec
      Start 30: CuptiRangeProfilerTest.BasicTest
30/47 Test #30: CuptiRangeProfilerTest.BasicTest ...................... Passed 0.00 sec
      Start 31: CuptiRangeProfilerTest.UserRangeTest
31/47 Test #31: CuptiRangeProfilerTest.UserRangeTest .................. Passed 0.01 sec
      Start 32: CuptiRangeProfilerTest.AutoRangeTest
32/47 Test #32: CuptiRangeProfilerTest.AutoRangeTest .................. Passed 0.00 sec
      Start 33: CuptiStringsTest.Valid
33/47 Test #33: CuptiStringsTest.Valid ................................ Passed 0.00 sec
      Start 34: CuptiStringsTest.Invalid
34/47 Test #34: CuptiStringsTest.Invalid .............................. Passed 0.00 sec
      Start 35: PercentileTest.Create
35/47 Test #35: PercentileTest.Create ................................. Passed 0.00 sec
      Start 36: PercentileTest.Normalize
36/47 Test #36: PercentileTest.Normalize .............................. Passed 0.00 sec
      Start 37: EventTest.SumSamples
37/47 Test #37: EventTest.SumSamples .................................. Passed 0.00 sec
      Start 38: EventTest.Percentiles
38/47 Test #38: EventTest.Percentiles ................................. Passed 0.00 sec
      Start 39: MetricTest.Calculate
39/47 Test #39: MetricTest.Calculate .................................. Passed 0.00 sec
      Start 40: EventGroupSetTest.CollectSample
40/47 Test #40: EventGroupSetTest.CollectSample ....................... Passed 0.00 sec
      Start 41: EventProfilerTest.ConfigureFailure
41/47 Test #41: EventProfilerTest.ConfigureFailure .................... Passed 0.00 sec
      Start 42: EventProfilerTest.ConfigureBase
42/47 Test #42: EventProfilerTest.ConfigureBase ....................... Passed 0.00 sec
      Start 43: EventProfilerTest.ConfigureOnDemand
43/47 Test #43: EventProfilerTest.ConfigureOnDemand ................... Passed 0.00 sec
      Start 44: EventProfilerTest.ReportSample
44/47 Test #44: EventProfilerTest.ReportSample ........................ Passed 0.00 sec
      Start 45: LoggerObserverTest.SingleCollectorObserver
45/47 Test #45: LoggerObserverTest.SingleCollectorObserver ............ Passed 0.00 sec
      Start 46: LoggerObserverTest.FourCollectorObserver
46/47 Test #46: LoggerObserverTest.FourCollectorObserver .............. Passed 0.05 sec
      Start 47: ThreadNameTest.setAndGet
47/47 Test #47: ThreadNameTest.setAndGet .............................. Passed 0.00 sec

100% tests passed, 0 tests failed out of 47

Total Test time (real) = 6.21 sec
```

Pull Request resolved: pytorch#869
Differential Revision: D53869794
Pulled By: aaronenyeshi
@aaronenyeshi merged this pull request in 9ae2311.