Skip to content

Document generation CI is broken#13308

Merged
skottmckay merged 16 commits into
mainfrom
skottmckay/FixKernelDocumentationGeneration
Oct 27, 2022
Merged

Document generation CI is broken#13308
skottmckay merged 16 commits into
mainfrom
skottmckay/FixKernelDocumentationGeneration

Conversation

@skottmckay
Copy link
Copy Markdown
Contributor

@skottmckay skottmckay commented Oct 13, 2022

Description

Fix document generation CI. It's not currently updating the docs as we're skipping the tests, which is the invocation of build.py that would have generated the documentation.

Setup specific task to generate documentation for greater clarity.

Motivation and Context

Operator kernel documentation is not getting updated and is now out of date.

@skottmckay skottmckay requested a review from fdwr October 13, 2022 11:17
fdwr
fdwr previously approved these changes Oct 14, 2022
@skottmckay skottmckay marked this pull request as ready for review October 14, 2022 03:58
@skottmckay skottmckay requested a review from a team October 14, 2022 03:58
buildArch: x64
additionalBuildFlags: --gen_doc validate --skip_tests --enable_pybind --use_dml --use_cuda --cuda_version=11.6 --cuda_home="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6" --enable_cuda_profiling --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=52 --cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF
# note: need to specify `--gen_doc` when creating the build config so it has to be in additionalBuildFlags
additionalBuildFlags: --gen_doc --skip_tests --enable_pybind --use_dml --use_cuda --cuda_version=11.6 --cuda_home="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6" --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=52 --cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF
Copy link
Copy Markdown
Contributor

@snnn snnn Oct 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: In the past, "--cmake_extra_defines" can only be specified once in a command line. Though we made some hack now it works, I hope we don't need to rely on it. Here you can put two defines together.

@snnn
Copy link
Copy Markdown
Contributor

snnn commented Oct 14, 2022

I created a PR to move ORT GPU pipelines to T4, #13170. If that PR was merged, the "Windows GPU CI Pipeline" of this PR would not pass. It is because DML needs WDDM, but T4 doesn't WDDM. So I created a new machine pool and a new VM image that is just for DML. You can't assume a single machine that can run both DML and CUDA, especially someone just suggested me to move the DML pipeline to AMD GPUs. It wouldn't make sense to install CUDA libraries on all AMD GPU machines.

@fdwr
Copy link
Copy Markdown
Contributor

fdwr commented Oct 14, 2022

You can't assume a single machine that can run both DML and CUDA

Well we shouldn't need to actually run any GPU-specific code of DML or CUDA beyond basic initialization just for document generation anyway - we just need the kernels registered. So how about an internal overload of DMLProviderFactoryCreator::Create which addGlobalSchemaFunctions() can call in onnxruntime_pybind_schema.cc to avoid the IsSoftwareAdapter check? (maybe an optional bool flag)

fdwr
fdwr previously approved these changes Oct 17, 2022
@snnn
Copy link
Copy Markdown
Contributor

snnn commented Oct 18, 2022

@fdwr , so what's the plan?

@skottmckay
Copy link
Copy Markdown
Contributor Author

skottmckay commented Oct 18, 2022

Can we check this in to fix the operator documentation at least? either way we need to figure out how this works going forward.

One option is that we will be able to set a config value in the session options with this PR that the DML EP could read and not error out if there's no gpu when we're generating docs. updating the DML EP factory creator is fine as well.

@tianleiwu
Copy link
Copy Markdown
Contributor

@skottmckay, Please check in ASAP to unblock operator documentation update.

@snnn
Copy link
Copy Markdown
Contributor

snnn commented Oct 21, 2022

The "onnxruntime-Win2019-GPU-dml" machine pool should not need to have CUDA installed. I think both @fdwr and @skottmckay's suggestions are good, but can we make it happen?

@snnn
Copy link
Copy Markdown
Contributor

snnn commented Oct 21, 2022

Will PR #13318 be merged soon, or would you like me to create an internal overload of DMLProviderFactoryCreator::Create which addGlobalSchemaFunctions() can call in onnxruntime_pybind_schema.cc to avoid the IsSoftwareAdapter check?

@snnn
Copy link
Copy Markdown
Contributor

snnn commented Oct 24, 2022

Anything I can help?

@fdwr
Copy link
Copy Markdown
Contributor

fdwr commented Oct 24, 2022

@snnn Sorry, I wasn't sure who was doing what, what Scott was adding it or I should. I can add it in a few hours after today's meetings. Was thinking DMLProviderFactoryCreator::Create(int device_id, bool skip_software_adapter_check) which would be false by default.

@fdwr
Copy link
Copy Markdown
Contributor

fdwr commented Oct 25, 2022

#13428

@skottmckay
Copy link
Copy Markdown
Contributor Author

@snnn or @tianleiwu could you please signoff so it can be checked in?

fdwr added a commit that referenced this pull request Oct 25, 2022
… DML EP to initialize on software-only devices (#13428)

### Description
The documentation pipeline does not require an actual GPU, and running
on GPU-capable agents costs more. So to enable running on CPU-only
devices and to potentially consolidate future pipelines, and since the
tests are not actually executed on this device anyway (it just needs to
initialize the EP for the sake of operator kernel enumeration), add an
initialization flag to skip the software device check - this is only an
internal overload not exposed in the public API. See
#13308.

### Motivation and Context
- *If it fixes an open issue, please link to the issue here.* NA
Comment thread tools/ci_build/github/azure-pipelines/win-gpu-ci-pipeline.yml Outdated
Update pool used for doc generation.
  - add temporary diff to validate it works
Copy link
Copy Markdown
Contributor

@snnn snnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@skottmckay skottmckay merged commit ab71c4b into main Oct 27, 2022
@skottmckay skottmckay deleted the skottmckay/FixKernelDocumentationGeneration branch October 27, 2022 21:20
linnealovespie pushed a commit that referenced this pull request Oct 28, 2022
… DML EP to initialize on software-only devices (#13428)

### Description
The documentation pipeline does not require an actual GPU, and running
on GPU-capable agents costs more. So to enable running on CPU-only
devices and to potentially consolidate future pipelines, and since the
tests are not actually executed on this device anyway (it just needs to
initialize the EP for the sake of operator kernel enumeration), add an
initialization flag to skip the software device check - this is only an
internal overload not exposed in the public API. See
#13308.

### Motivation and Context
- *If it fixes an open issue, please link to the issue here.* NA
linnealovespie pushed a commit that referenced this pull request Oct 28, 2022
### Description
<!-- Describe your changes. -->
Fix document generation CI. It's not currently updating the docs as
we're skipping the tests, which is the invocation of build.py that would
have generated the documentation.

Setup specific task to generate documentation for greater clarity. 

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Operator kernel documentation is not getting updated and is now out of
date.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants