Conversation

walterddr
Contributor

No description provided.

@jeffdaily
Collaborator

The ROCm build errors are due to the hipify script not running. The cmake error add_subdirectory given source "src/THH" which is not an existing directory is because the hipify script creates the THH directory.

@dr-ci

dr-ci bot commented Sep 8, 2020

💊 CI failures summary and remediations

As of commit 28da9a3 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

2 jobs timed out:

  • binary_linux_libtorch_3_7m_cu101_gcc5_4_cxx11-abi_nightly_shared-with-deps_build
  • binary_linux_libtorch_3_7m_cu101_gcc5_4_cxx11-abi_nightly_shared-without-deps_build

ci.pytorch.org: 1 failed


This comment has been revised 16 times.

@walterddr
Contributor Author

walterddr commented Sep 8, 2020

> The ROCm build errors are due to the hipify script not running. The cmake error add_subdirectory given source "src/THH" which is not an existing directory is because the hipify script creates the THH directory.

Could you share which exact build failure you were referring to?

BTW, here are the changes I have made since last time:

  1. Disabled the libtorch build because of the cmake configuration issue (which I think is related to the hipify script not running).
  2. Trying to disable the binary test because the pytorch/builder repo's check_binary.sh script does not recognize rocm and still expects cuda (preparing a PR for it now).
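
To illustrate the second point, here is a minimal sketch of how a check could tell a rocm variant apart from a cuda one. This is not how check_binary.sh works; the function name `parse_build_variant` is hypothetical, and the variant strings are assumed to look like the `cu101` and `rocm3.7` names that appear elsewhere in this thread.

```python
import re


def parse_build_variant(variant):
    """Split a variant string such as 'cu101' or 'rocm3.7' into (platform, version).

    Hypothetical helper: the naming scheme is an assumption based on the
    job and folder names mentioned in this conversation.
    """
    rocm = re.fullmatch(r"rocm(\d+\.\d+)", variant)
    if rocm:
        return ("rocm", rocm.group(1))
    # For cuda, the last digit is taken as the minor version, the rest as major.
    cuda = re.fullmatch(r"cu(\d+)(\d)", variant)
    if cuda:
        return ("cuda", f"{cuda.group(1)}.{cuda.group(2)}")
    if variant == "cpu":
        return ("cpu", None)
    raise ValueError(f"unrecognized build variant: {variant!r}")
```

A check written this way fails loudly on an unknown variant instead of silently assuming cuda.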

Also, do we plan to enable all build variants (namely conda and libtorch) at the nightly stage for now, or is it OK to just do the manywheel + rocm combination (on 3.6~3.8)?

@jeffdaily
Collaborator

The first stage of this pytorch rocm wheel effort was to get a successful nightly manywheel + rocm combination. The conda and libtorch builds, though ultimately desired, are secondary efforts.

That said, the libtorch build is failing as commented here:
pytorch/builder#511 (comment)
There could be additional errors, but it is primarily failing due to not running build_amd.py prior to setup.py install.
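
The ordering described above can be sketched roughly as follows. This is only an illustration, not the builder scripts themselves: `plan_rocm_build` is a hypothetical helper, the `tools/amd_build/build_amd.py` location is an assumption about where the hipify script lives, and the caller supplies the directory that hipify is expected to create.

```python
import sys
from pathlib import Path


def plan_rocm_build(repo_root, hipified_dir,
                    hipify_script="tools/amd_build/build_amd.py"):
    """List the commands for a ROCm build, putting the hipify step first when needed.

    hipified_dir and hipify_script are assumed paths relative to repo_root.
    """
    root = Path(repo_root)
    commands = []
    # The hipify step creates the THH directory that cmake later adds as a
    # subdirectory, so it has to run before setup.py if that directory is missing.
    if not (root / hipified_dir).is_dir():
        commands.append([sys.executable, str(root / hipify_script)])
    commands.append([sys.executable, "setup.py", "install"])
    return commands
```

Skipping hipify when the directory already exists is a simplification; always running it first would be the safer choice if stale hipified sources are a concern.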

@walterddr
Contributor Author

Sounds good. I will focus on getting manywheel + rocm working for this PR and will create fixes for libtorch/hipify in the next one.

Member

@seemethere seemethere left a comment

Looks mostly good to me. I'm curious to know how this interacts with the upload script.

For example, what are the filenames for these wheels, and which folder on S3 do they actually get uploaded to?

Just had a deeper look, and it looks like these binaries will get uploaded to the rocm3.7 S3 folder.

@walterddr walterddr force-pushed the gh/walterddr/rocm_nightly branch from 0b26074 to 28da9a3 Compare September 8, 2020 21:44
Contributor

@facebook-github-bot facebook-github-bot left a comment

@walterddr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@walterddr merged this pull request in 0351d31.

@facebook-github-bot facebook-github-bot deleted the gh/walterddr/rocm_nightly branch January 27, 2021 18:26