Fix gen_doc for training build#10980

Closed
pengwa wants to merge 1 commit into master from pengwa/fix_gen_doc

@pengwa pengwa commented Mar 23, 2022

Description: Fix gen_doc for training build.

Notes: after discussing with @mindest, there are a few issues:

  1. There is a CI check that tests the diff of the MD files. So if we let the existing "docs/ContribOperators.md" and "docs/OperatorKernels.md" contain training ops, it will fail. Also, there would be conflicts between inferencing devs and training devs.
  2. Regarding splitting training ops into new MD files: currently training ops are registered in both MS domains and ONNX domains, so it is hard to extract training-specific ops and generate training-specific MD files unless we do a refactoring, which would need strong motivation.
  3. There are some complex kernel defs, for example LambOptimizer, which mess up the MD files unless we truncate the content.
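
To illustrate the first point, a doc check of this kind can be sketched roughly as follows. This is a minimal sketch and not the actual CI code: `docs_differ` is a hypothetical helper, the op names are placeholders, and the only assumption is that the check compares freshly generated Markdown against the checked-in file.

```python
import difflib


def docs_differ(checked_in: str, generated: str) -> list:
    """Return unified-diff lines between checked-in and generated Markdown.

    Hypothetical helper for illustration; an empty list means no difference.
    """
    return list(
        difflib.unified_diff(
            checked_in.splitlines(),
            generated.splitlines(),
            fromfile="checked_in",
            tofile="generated",
            lineterm="",
        )
    )


# Checked-in doc from an inference build versus the output of a build
# that registers one extra op (placeholder names, not real schemas).
checked_in = "### OpA\n"
generated = "### OpA\n### TrainingOnlyOp\n"

diff = docs_differ(checked_in, generated)
if diff:
    print("\n".join(diff))  # a non-empty diff is what would fail the check
```

Any op that only exists in a training build shows up as an added line, which is why regenerating the shared MD files from a training build would trip such a check.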

Since this issue impacts training devs only in limited scenarios (updating shared operator defs and kernels), there is a workaround people find acceptable: remove enable_training from the build flags to update the MD files.

So let's revisit this issue once we need public training op defs some day. Closing this PR.


  • When we do a training build with "--gen_doc", we get the following failure:

    ./build.sh --config $flavor --use_cuda --enable_training  --build_wheel --parallel 8 --use_mpi --skip_tests --enable_training_torch_interop --cuda_version=11.3 --gen_doc
    
    2022-03-23 02:37:56,715 util.run [DEBUG] - Subprocess completed. Return code: 0
    2022-03-23 02:37:56,716 util.run [INFO] - Running subprocess in '/bert_ort/pengwa/ort_private/build/Linux/RelWithDebInfo'
      /bert_ort/pengwa/py38/bin/python3 gen_contrib_doc.py --output_path /bert_ort/pengwa/ort_private/docs/ContribOperators.md --domains com.microsoft
    Traceback (most recent call last):
      File "gen_contrib_doc.py", line 20, in <module>
        from onnxruntime.capi.onnxruntime_pybind11_state import schemadef  # noqa: F401
    **ImportError: cannot import name 'schemadef' from 'onnxruntime.capi.onnxruntime_pybind11_state'** (/bert_ort/pengwa/ort_private/build/Linux/RelWithDebInfo/onnxruntime/capi/onnxruntime_pybind11_state.so)
    Traceback (most recent call last):
      File "/bert_ort/pengwa/ort_private/tools/ci_build/build.py", line 2427, in <module>
        sys.exit(main())
      File "/bert_ort/pengwa/ort_private/tools/ci_build/build.py", line 2414, in main
        generate_documentation(source_dir, build_dir, configs, args.gen_doc == 'validate')
      File "/bert_ort/pengwa/ort_private/tools/ci_build/build.py", line 2006, in generate_documentation
        run_subprocess([sys.executable, 'gen_contrib_doc.py', '--output_path', contrib_op_doc_path,
      File "/bert_ort/pengwa/ort_private/tools/ci_build/build.py", line 655, in run_subprocess
        return run(*args, cwd=cwd, capture_stdout=capture_stdout, shell=shell, env=my_env)
      File "/bert_ort/pengwa/ort_private/tools/python/util/run.py", line 42, in run
        completed_process = subprocess.run(
      File "/bert_ort/pengwa/py38/lib/python3.8/subprocess.py", line 516, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['/bert_ort/pengwa/py38/bin/python3', 'gen_contrib_doc.py', '--output_path', '/bert_ort/pengwa/ort_private/docs/ContribOperators.md', '--domains', 'com.microsoft']' returned non-zero exit status 1
    
  • Fix a build warning in fused_ops_frontend.cpp.
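
The traceback above comes down to the name `schemadef` not being importable from the pybind module in this build. A minimal sketch of checking for such a name up front, so that a doc script could report a clearer message, might look like this. `module_exports` is a hypothetical helper, and it is demonstrated on the standard-library `math` module rather than on onnxruntime so that it runs anywhere.

```python
import importlib


def module_exports(module_name: str, attr: str) -> bool:
    """Return True if `module_name` can be imported and exposes `attr`.

    Hypothetical helper for illustration only.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself is missing or failed to import.
        return False
    return hasattr(module, attr)


# Demonstration on a stdlib module: `sqrt` exists, `schemadef` does not.
print(module_exports("math", "sqrt"))
print(module_exports("math", "schemadef"))
```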

Motivation and Context

  • Why is this change required? What problem does it solve?
  • If it fixes an open issue, please link to the issue here.

@pengwa pengwa added the "training" label (issues related to ONNX Runtime training; typically submitted using template) Mar 23, 2022
@pengwa pengwa requested review from Lafi7e, mindest and skottmckay March 23, 2022 07:56

pengwa commented Mar 23, 2022

Adding @skottmckay since this PR introduces a lot of training-specific op definition updates into "docs/ContribOperators.md" and "docs/OperatorKernels.md", to make sure there are no big concerns.

pengwa commented Mar 23, 2022

Looks like our LambOptimizer schema is too long to load here: https://github.com/microsoft/onnxruntime/blob/647db5d40fa136f9c0e957c489fbc0fcdcc905ea/docs/ContribOperators.md

I would like to hear your thoughts on whether it is necessary to include training ops in those MD files. If they are needed anyway, we may truncate the content in our gen_doc Python tools.
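
For reference, the truncation idea could be as simple as the sketch below. This is only an assumption about how it might be done: `truncate_doc_text`, its `max_chars` limit, and the marker string are hypothetical and not part of the existing gen_doc scripts.

```python
def truncate_doc_text(text: str, max_chars: int = 200,
                      marker: str = " ... (truncated)") -> str:
    """Shorten overly long schema text before writing it to an MD file.

    Hypothetical helper: text at or under `max_chars` is returned unchanged;
    longer text is cut at `max_chars` and suffixed with `marker`.
    """
    if len(text) <= max_chars:
        return text
    return text[:max_chars].rstrip() + marker


# A very long description is cut down; a short one passes through as-is.
long_description = "x" * 500
print(len(truncate_doc_text(long_description)))
print(truncate_doc_text("short description"))
```

A character limit is the simplest option; cutting per input or per type constraint would keep the output more readable but needs knowledge of the schema layout.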
