add libkineto/test/CMakeLists.txt #869

Closed

Contributor

@wubai wubai commented Feb 15, 2024

Finished the CMakeLists.txt in the test folder and added the folly repo, which CuptiActivityProfilerTest needs. Some problems remain:

  • ConfigTest:

    Although the gtest run passed, the log shows an "ERROR", as below:
    log: ERROR:2024-02-15 20:45:30 1682464:1682464 AbstractConfig.cpp:145] Invalid config line:

  • CuptiProfilerApiTest:

    Shows "Aborted (core dumped)".

  • LoggerObserverTest:

    Although the gtest run passed, the log shows an "ERROR".

system info:
os: Ubuntu 20.04.5 LTS
cuda version: cuda 11.7

test result:

$ make test
Running tests...
Test project /kineto/libkineto/build
      Start  1: ParseTest.Whitespace
 1/47 Test  #1: ParseTest.Whitespace ..................................   Passed    0.00 sec
      Start  2: ParseTest.Comment
 2/47 Test  #2: ParseTest.Comment .....................................   Passed    0.00 sec
      Start  3: ParseTest.Format
 3/47 Test  #3: ParseTest.Format ......................................   Passed    0.00 sec
      Start  4: ParseTest.DefaultActivityTypes
 4/47 Test  #4: ParseTest.DefaultActivityTypes ........................   Passed    0.00 sec
      Start  5: ParseTest.ActivityTypes
 5/47 Test  #5: ParseTest.ActivityTypes ...............................   Passed    0.00 sec
      Start  6: ParseTest.SamplePeriod
 6/47 Test  #6: ParseTest.SamplePeriod ................................   Passed    0.00 sec
      Start  7: ParseTest.MultiplexPeriod
 7/47 Test  #7: ParseTest.MultiplexPeriod .............................   Passed    0.00 sec
      Start  8: ParseTest.ReportPeriod
 8/47 Test  #8: ParseTest.ReportPeriod ................................   Passed    0.00 sec
      Start  9: ParseTest.SamplesPerReport
 9/47 Test  #9: ParseTest.SamplesPerReport ............................   Passed    0.00 sec
      Start 10: ParseTest.EnableSigUsr2
10/47 Test #10: ParseTest.EnableSigUsr2 ...............................   Passed    0.00 sec
      Start 11: ParseTest.DeviceMask
11/47 Test #11: ParseTest.DeviceMask ..................................   Passed    0.00 sec
      Start 12: ParseTest.RequestTime
12/47 Test #12: ParseTest.RequestTime .................................   Passed    0.00 sec
      Start 13: ParseTest.ProfileStartTime
13/47 Test #13: ParseTest.ProfileStartTime ............................   Passed    0.00 sec
      Start 14: CuptiActivityProfiler.AsyncTrace
14/47 Test #14: CuptiActivityProfiler.AsyncTrace ......................   Passed    0.72 sec
      Start 15: CuptiActivityProfiler.AsyncTraceUsingIter
15/47 Test #15: CuptiActivityProfiler.AsyncTraceUsingIter .............   Passed    0.70 sec
      Start 16: CuptiActivityProfilerTest.SyncTrace
16/47 Test #16: CuptiActivityProfilerTest.SyncTrace ...................   Passed    0.75 sec
      Start 17: CuptiActivityProfilerTest.GpuNCCLCollectiveTest
17/47 Test #17: CuptiActivityProfilerTest.GpuNCCLCollectiveTest .......   Passed    0.75 sec
      Start 18: CuptiActivityProfilerTest.GpuUserAnnotationTest
18/47 Test #18: CuptiActivityProfilerTest.GpuUserAnnotationTest .......   Passed    0.74 sec
      Start 19: CuptiActivityProfilerTest.SubActivityProfilers
19/47 Test #19: CuptiActivityProfilerTest.SubActivityProfilers ........   Passed    0.72 sec
      Start 20: CuptiActivityProfilerTest.BufferSizeLimitTestWarmup
20/47 Test #20: CuptiActivityProfilerTest.BufferSizeLimitTestWarmup ...   Passed    0.74 sec
      Start 21: CuptiCallbackApiTest.SimpleTest
21/47 Test #21: CuptiCallbackApiTest.SimpleTest .......................   Passed    0.00 sec
      Start 22: CuptiCallbackApiTest.AllCallbacks
22/47 Test #22: CuptiCallbackApiTest.AllCallbacks .....................   Passed    0.00 sec
      Start 23: CuptiCallbackApiTest.ContentionTest
23/47 Test #23: CuptiCallbackApiTest.ContentionTest ...................   Passed    0.91 sec
      Start 24: CuptiCallbackApiTest.Bechmark
24/47 Test #24: CuptiCallbackApiTest.Bechmark .........................   Passed    0.00 sec
      Start 25: CuptiRangeProfilerApiTest.contextTracking
25/47 Test #25: CuptiRangeProfilerApiTest.contextTracking .............   Passed    0.01 sec
      Start 26: CuptiRangeProfilerApiTest.asyncLaunchUserRange
26/47 Test #26: CuptiRangeProfilerApiTest.asyncLaunchUserRange ........   Passed    0.00 sec
      Start 27: CuptiRangeProfilerApiTest.asyncLaunchAutoRange
27/47 Test #27: CuptiRangeProfilerApiTest.asyncLaunchAutoRange ........   Passed    0.00 sec
      Start 28: CuptiRangeProfilerConfigTest.ConfigureProfiler
28/47 Test #28: CuptiRangeProfilerConfigTest.ConfigureProfiler ........   Passed    0.00 sec
      Start 29: CuptiRangeProfilerConfigTest.RangesDefaults
29/47 Test #29: CuptiRangeProfilerConfigTest.RangesDefaults ...........   Passed    0.00 sec
      Start 30: CuptiRangeProfilerTest.BasicTest
30/47 Test #30: CuptiRangeProfilerTest.BasicTest ......................   Passed    0.00 sec
      Start 31: CuptiRangeProfilerTest.UserRangeTest
31/47 Test #31: CuptiRangeProfilerTest.UserRangeTest ..................   Passed    0.01 sec
      Start 32: CuptiRangeProfilerTest.AutoRangeTest
32/47 Test #32: CuptiRangeProfilerTest.AutoRangeTest ..................   Passed    0.00 sec
      Start 33: CuptiStringsTest.Valid
33/47 Test #33: CuptiStringsTest.Valid ................................   Passed    0.00 sec
      Start 34: CuptiStringsTest.Invalid
34/47 Test #34: CuptiStringsTest.Invalid ..............................   Passed    0.00 sec
      Start 35: PercentileTest.Create
35/47 Test #35: PercentileTest.Create .................................   Passed    0.00 sec
      Start 36: PercentileTest.Normalize
36/47 Test #36: PercentileTest.Normalize ..............................   Passed    0.00 sec
      Start 37: EventTest.SumSamples
37/47 Test #37: EventTest.SumSamples ..................................   Passed    0.00 sec
      Start 38: EventTest.Percentiles
38/47 Test #38: EventTest.Percentiles .................................   Passed    0.00 sec
      Start 39: MetricTest.Calculate
39/47 Test #39: MetricTest.Calculate ..................................   Passed    0.00 sec
      Start 40: EventGroupSetTest.CollectSample
40/47 Test #40: EventGroupSetTest.CollectSample .......................   Passed    0.00 sec
      Start 41: EventProfilerTest.ConfigureFailure
41/47 Test #41: EventProfilerTest.ConfigureFailure ....................   Passed    0.00 sec
      Start 42: EventProfilerTest.ConfigureBase
42/47 Test #42: EventProfilerTest.ConfigureBase .......................   Passed    0.00 sec
      Start 43: EventProfilerTest.ConfigureOnDemand
43/47 Test #43: EventProfilerTest.ConfigureOnDemand ...................   Passed    0.00 sec
      Start 44: EventProfilerTest.ReportSample
44/47 Test #44: EventProfilerTest.ReportSample ........................   Passed    0.00 sec
      Start 45: LoggerObserverTest.SingleCollectorObserver
45/47 Test #45: LoggerObserverTest.SingleCollectorObserver ............   Passed    0.00 sec
      Start 46: LoggerObserverTest.FourCollectorObserver
46/47 Test #46: LoggerObserverTest.FourCollectorObserver ..............   Passed    0.05 sec
      Start 47: ThreadNameTest.setAndGet
47/47 Test #47: ThreadNameTest.setAndGet ..............................   Passed    0.00 sec

100% tests passed, 0 tests failed out of 47

Total Test time (real) =   6.21 sec

Member

@aaronenyeshi aaronenyeshi left a comment

Overall looks great, really happy to get unit tests working in OSS for Kineto. We should try to remove the dependency on folly if possible!

@aaronenyeshi
Member

I'm not sure whether you have permission to see the GitHub signals, but I'm getting this error in the CI. Could we skip CUDA when it is missing?

CMake Error at /usr/local/share/cmake-3.28/Modules/Internal/CMakeCUDAFindToolkit.cmake:104 (message):
  Failed to find nvcc.

  Compiler requires the CUDA toolkit.  Please set the CUDAToolkit_ROOT
  variable.
Call Stack (most recent call first):
  /usr/local/share/cmake-3.28/Modules/CMakeDetermineCUDACompiler.cmake:89 (cmake_cuda_find_toolkit)
  test/CMakeLists.txt:132 (enable_language)


-- Configuring incomplete, errors occurred!
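
One way to skip CUDA when the toolkit is missing is CMake's CheckLanguage module, which probes for a compiler without aborting the configure step. The following is only a sketch, not the change made in this PR; the status message text is an assumption.

```cmake
# Sketch: enable CUDA only when a CUDA compiler can be found.
include(CheckLanguage)
check_language(CUDA)  # sets CMAKE_CUDA_COMPILER, or leaves it as NOTFOUND

if(CMAKE_CUDA_COMPILER)
  enable_language(CUDA)
else()
  # Hypothetical message; CUDA-dependent targets would need to be skipped too.
  message(STATUS "No CUDA compiler found; skipping CUDA-dependent tests")
endif()
```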

@wubai
Contributor Author

wubai commented Feb 16, 2024

Yes, I can see the error in the CI. CUDA_SOURCE_DIR needs to be set when building libkineto; if it is not set, libkineto will fail to build.

Member

@aaronenyeshi aaronenyeshi left a comment

Thank you, and LGTM!

@aaronenyeshi
Member

The original LIBKINETOCI workflow uses a basic Linux environment. I've just landed a new CI build, libkineto PR Test (#870), that should contain the CUDA libraries. Running it now.

@wubai
Contributor Author

wubai commented Feb 16, 2024

The unit tests depend on CUDA, so LIBKINETOCI should set KINETO_BUILD_TESTS to OFF, and libkineto PR Test should set CUDA_SOURCE_DIR before building.
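
A minimal sketch of how a KINETO_BUILD_TESTS switch might gate the test directory, assuming the option is declared in libkineto's top-level CMakeLists.txt; the default value and the description string are assumptions.

```cmake
# Sketch: build the unit tests only when KINETO_BUILD_TESTS is ON.
option(KINETO_BUILD_TESTS "Build the libkineto unit tests" ON)  # default is an assumption

if(KINETO_BUILD_TESTS)
  enable_testing()
  add_subdirectory(test)  # the test/CMakeLists.txt added by this PR
endif()
```

A CUDA-free job could then configure with -DKINETO_BUILD_TESTS=OFF.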

@aaronenyeshi
Member

Good catch: the Docker image uses CUDA_HOME. Could you please update the CUDA_SOURCE_DIR logic to check CUDA_HOME instead? We could also update LIBKINETOCI directly here: https://github.com/pytorch/kineto/blob/main/.github/workflows/libkineto_ci.yml#L42-L56
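
A minimal sketch of such a fallback, assuming CUDA_SOURCE_DIR may arrive either as a CMake variable or as an environment variable; the lookup order and the message are illustrative, not the PR's actual change.

```cmake
# Sketch: resolve CUDA_SOURCE_DIR, falling back to the CUDA_HOME environment variable.
if(NOT CUDA_SOURCE_DIR)
  if(DEFINED ENV{CUDA_SOURCE_DIR})
    set(CUDA_SOURCE_DIR "$ENV{CUDA_SOURCE_DIR}")
  elseif(DEFINED ENV{CUDA_HOME})
    set(CUDA_SOURCE_DIR "$ENV{CUDA_HOME}")
  endif()
endif()
message(STATUS "CUDA_SOURCE_DIR = ${CUDA_SOURCE_DIR}")
```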

@wubai
Contributor Author

wubai commented Feb 16, 2024

Could you please check whether the CUDA_HOME environment variable really exists in the Docker container, and whether a GPU device is installed? @aaronenyeshi

@aaronenyeshi
Member

aaronenyeshi commented Feb 16, 2024

It should be set; we are using pytorch/benchmark's Docker image: https://github.com/pytorch/benchmark/blob/main/docker/gcp-a100-runner-dind.dockerfile#L48

@xuzhao9 mentioned it may be a bug: https://stackoverflow.com/questions/55044846/no-cmake-cuda-compiler-could-be-found
and may require manually adding -DCMAKE_CUDA_COMPILER:PATH=/usr/local/cuda/bin/nvcc. I'm trying to set up my Docker environment to reproduce the issue.
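
The same workaround can be expressed inside CMake by pointing CMAKE_CUDA_COMPILER at nvcc before the language is enabled. This is a sketch, assuming nvcc lives at the path quoted above; it is not the fix adopted in this PR.

```cmake
# Sketch: help CMake find nvcc when it is not on PATH.
# Equivalent to passing -DCMAKE_CUDA_COMPILER:PATH=/usr/local/cuda/bin/nvcc.
if(NOT CMAKE_CUDA_COMPILER AND EXISTS "/usr/local/cuda/bin/nvcc")
  set(CMAKE_CUDA_COMPILER "/usr/local/cuda/bin/nvcc")
endif()
enable_language(CUDA)
```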

@wubai
Contributor Author

wubai commented Feb 16, 2024

Can you please trigger the workflows again? @aaronenyeshi

Member

@aaronenyeshi aaronenyeshi left a comment

Awesome!

@facebook-github-bot
Contributor

@aaronenyeshi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

aaronenyeshi pushed a commit to aaronenyeshi/kineto that referenced this pull request Feb 16, 2024

Pull Request resolved: pytorch#869

Differential Revision: D53869794

Pulled By: aaronenyeshi
@facebook-github-bot
Contributor

@aaronenyeshi merged this pull request in 9ae2311.
