Conversation

scopatz
Contributor

@scopatz scopatz commented Aug 28, 2020

@codecov

codecov bot commented Aug 28, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@6d31213).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master   #43771   +/-   ##
=========================================
  Coverage          ?   69.34%           
=========================================
  Files             ?      378           
  Lines             ?    46698           
  Branches          ?        0           
=========================================
  Hits              ?    32381           
  Misses            ?    14317           
  Partials          ?        0           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6d31213...677b919.

tools/nightly.py Outdated
    if platform.startswith("win"):
        return False
    src = os.path.join(source_dir, "__nightly_test__")
    trg = os.path.join(target_dir, "__nightky_test__")
Contributor

nightky?

@ezyang
Contributor

ezyang commented Aug 28, 2020

Can you briefly describe how you tested this fixed the bug in question?

Also, another common fix for this cross-link problem is to just put the tmp files in the same directory as where the eventual location will be; see also https://rachelbythebay.com/w/2020/08/11/files/
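
For illustration only (this is not code from the PR): a minimal sketch of the same-directory idea, assuming the goal is to produce a file at its final location without ever moving data across filesystems. The helper name `write_atomically` and the `.tmp-` prefix are hypothetical.

```python
import os
import tempfile


def write_atomically(target_path: str, data: bytes) -> None:
    """Write data to target_path via a temp file created in the same directory."""
    target_dir = os.path.dirname(os.path.abspath(target_path))
    # Creating the temp file next to the destination keeps both paths on one
    # filesystem, so the final rename never crosses a device boundary.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # os.replace renames the temp file over the destination.
        os.replace(tmp_path, target_path)
    except BaseException:
        # Remove the partial temp file if anything went wrong, then re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The trade-off raised later in this thread still applies: temp files created inside the target directory have to be cleaned up by the program itself if it is interrupted.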

@scopatz
Contributor Author

scopatz commented Aug 28, 2020

> Can you briefly describe how you tested this fixed the bug in question?

I tested it locally as did @lezcano (who originally reported the issue).

> Also, another common fix for this cross-link problem is to just put the tmp files in the same directory as where the eventual location will be; see also https://rachelbythebay.com/w/2020/08/11/files/

That is true, but I thought this was a more reasonable solution given the existing architecture. Most of the files in the conda package will not actually be linked, so in the event of a failure the OS will clean them up automatically, whereas if the tmpdir is in the target dir you are relying on Python to clean them up. Basically, this seems more robust overall and less likely to get the entirety of the nightly pytorch accidentally committed to the repo.

tools/nightly.py Outdated
Comment on lines 485 to 499
    src = os.path.join(source_dir, "__nightly_test__")
    trg = os.path.join(target_dir, "__nightky_test__")
    try:
        with open(src, "w"):
            pass
        os.link(src, trg)
        linkable = True
    except OSError:
        linkable = False
    finally:
        if os.path.isfile(trg):
            os.remove(trg)
        if os.path.isfile(src):
            os.remove(src)
    return linkable
Contributor

@malfet malfet Aug 28, 2020

Hmm, wouldn't using os.stat be a much simpler solution?

Suggested change
-    src = os.path.join(source_dir, "__nightly_test__")
-    trg = os.path.join(target_dir, "__nightky_test__")
-    try:
-        with open(src, "w"):
-            pass
-        os.link(src, trg)
-        linkable = True
-    except OSError:
-        linkable = False
-    finally:
-        if os.path.isfile(trg):
-            os.remove(trg)
-        if os.path.isfile(src):
-            os.remove(src)
-    return linkable
+    try:
+        # Hard linking is possible between two folders on the same device
+        return os.stat(source_dir).st_dev == os.stat(target_dir).st_dev
+    except OSError:
+        # Hard linking between non-existent or inaccessible directories is not possible
+        return False

Contributor Author

Sorry for taking a long time to get back to this. I don't believe that using stat to test the device is semantically correct here. We want to test whether we can link, so the way to do that is to try linking a temporary file. Which devices allow linking to which other devices is an implementation detail: a matching device is not a strong enough guarantee that linking can happen, though it may be a necessary condition on a given system.
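
For illustration only: a minimal sketch of a "try to link a temporary file" probe. Unlike the PR's code, which uses fixed file names, this version assumes unique names generated by `tempfile.mkstemp`; the function name `can_hard_link`, the `linktest-` prefix, and the `.link` suffix are hypothetical.

```python
import os
import tempfile


def can_hard_link(source_dir: str, target_dir: str) -> bool:
    """Return True if a file created in source_dir can be hard-linked into target_dir."""
    src = None
    linked_path = None
    try:
        # mkstemp picks an unused name, so an existing file is never clobbered.
        fd, src = tempfile.mkstemp(dir=source_dir, prefix="linktest-")
        os.close(fd)
        trg = os.path.join(target_dir, os.path.basename(src) + ".link")
        os.link(src, trg)
        linked_path = trg  # only remember the link if we actually created it
        return True
    except OSError:
        # Covers cross-device links, missing directories, and permission errors.
        return False
    finally:
        # Remove only the files this probe created.
        for path in (linked_path, src):
            if path is not None and os.path.exists(path):
                os.remove(path)
```

A probe like this still returns False when the source directory is not writable, so it answers "can this process create and link a file here", not strictly "is linking between these directories possible".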

Contributor

Good point. On the other hand, the existing implementation does not, strictly speaking, answer the question of whether files in the two folders can be linked: it will return False if the user is not allowed to create files in the source folder, if __nightly_test__ already exists in the source folder, or if __nightky_test__ already exists in the destination folder.
Considering all that, I think the ideal approach would be to just try to hard-link and, if that fails, to copy, as suggested in https://github.com/pytorch/pytorch/pull/43771/files#r479539719
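
For illustration only (not the code that was merged): a minimal per-file sketch of "try to hard-link, copy on failure". The helper name `link_or_copy` and its return values are hypothetical.

```python
import os
import shutil
import tempfile


def link_or_copy(src: str, dst: str) -> str:
    """Hard-link src to dst; if linking fails, copy the file instead."""
    try:
        os.link(src, dst)
        return "linked"
    except OSError:
        # Any link failure (e.g. a cross-device link) falls back to a copy.
        shutil.copy2(src, dst)
        return "copied"


if __name__ == "__main__":
    # Small demo in a throwaway directory; the result depends on the filesystem.
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "a.txt")
        with open(src, "w") as f:
            f.write("example")
        print(link_or_copy(src, os.path.join(d, "b.txt")))
```

With this shape no separate probe is needed: the operation itself reports whether linking worked, and errors from the copy still propagate to the caller.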

Contributor

@malfet malfet left a comment

Please consider a simpler implementation of _can_link, or perhaps just use try/except to fall back from _link_files to _copy_files.

tools/nightly.py Outdated
Comment on lines 510 to 513
    if _can_link(platform, source_dir, target_dir):
        _link_files(listing, source_dir, target_dir)
    else:
        _copy_files(listing, source_dir, target_dir)
Contributor

Wouldn't a try/except paradigm be better than if/else in this case?

Suggested change
-    if _can_link(platform, source_dir, target_dir):
-        _link_files(listing, source_dir, target_dir)
-    else:
-        _copy_files(listing, source_dir, target_dir)
+    try:
+        _link_files(listing, source_dir, target_dir)
+    except OSError:
+        _copy_files(listing, source_dir, target_dir)

Contributor Author

The try is in the _can_link() function. There is nothing particularly exceptional about choosing which function should be used to transfer files.

Contributor

This approach eliminates the need for the _can_link function completely, doesn't it?

@ailzhang ailzhang added the triaged label Aug 31, 2020
@ezyang
Contributor

ezyang commented Sep 18, 2020

@malfet are you satisfied by the response here, or is there more stuff we have to do? It would be good to move this along.

@scopatz
Contributor Author

scopatz commented Sep 18, 2020

OK, here is a version that works with try-except and removes the _can_link() function

@dr-ci

dr-ci bot commented Sep 18, 2020

💊 CI failures summary and remediations

As of commit 677b919 (more details on the Dr. CI page):


None of the CI failures appear to be your fault 💚



🚧 1 ongoing upstream failure:

These were probably caused by upstream breakages that are not fixed yet:


🚧 1 fixed upstream failure:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch.

If your commit is newer than viable/strict, you can try basing on an older, stable commit:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase --onto FETCH_HEAD $(git merge-base origin/master HEAD)

If your commit is older than viable/strict:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

This comment has been revised 2 times.

Contributor

@malfet malfet left a comment

Thank you for the fix!

Contributor

@facebook-github-bot facebook-github-bot left a comment

@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@malfet merged this pull request in 7de512c.

@scopatz scopatz deleted the nightly-ln branch September 23, 2020 15:53

Labels

Merged, open source, triaged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

nightly.py error

7 participants