Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoid PermissionError on Lustre when copying files (fixes #25354) #27247

Merged
merged 5 commits into from Feb 20, 2023

Conversation

BenWibking
Copy link
Contributor

@BenWibking BenWibking commented Nov 5, 2021

There is a bug in Python versions < 3.7.4 that causes PermissionError to occur when using shutil.copy2 to copy read-only files that have extended attributes set by Lustre (which is used by lib/spack/llnl/util/filesystem.py:504 to copy directory trees). The upstream Python bug is here.

Setting SPACK_PYTHON to point to a python executable >= version 3.7.4 fixes the problem. However, for Python versions below this, we monkeypatch shutil.copystat to set extended attributes before setting permissions. This means that copying read-only files that have extended attributes set by Lustre can succeed, which is necessary to install openjdk (and fix #25354). I have confirmed on my cluster that this patch fixes openjdk installation when spack is installed on a Lustre FS when using Python 3.6.

@BenWibking
Copy link
Contributor Author

It looks like the CI fails due to this function accessing a member variable that only exists in Python 3.3+, but I've guarded against accessing it for earlier verions:

          # use the real function only if it exists
           # *and* it supports follow_symlinks
           def lookup(name):
               fn = getattr(os, name, _nop)
               if sys.version_info >= (3, 3):
                   if fn in os.supports_follow_symlinks:
                       return fn
               return _nop

@mwkrentel
Copy link
Member

I tried this on theta at ANL. /lus/theta-fs0 is lustre.

$ mount | grep  lustre
172.22.10.92@o2ib:172.22.10.93@o2ib:/snx11214 on /lus/theta-fs0 type lustre (rw,flock,lazystatfs)

On /lus/theta-fs0, current spack develop fails for me as:

$ spack install openjdk 
==> Installing openjdk-11.0.12_7-cw7hhkrfe4x2fk73jlxw22pefvf5n4j5
==> No binary for openjdk-11.0.12_7-cw7hhkrfe4x2fk73jlxw22pefvf5n4j5 found: installing from source
==> Using cached archive: /lus/theta-fs0/projects/CSC250STTO11/krentel/spack/spack-repo/source-cache/_source-cache/archive/87/8770f600fc3b89bf331213c7aa21f8eedd9ca5d96036d1cd48cb2748a3dbefd2.tar.gz
==> No patches needed for openjdk
==> openjdk: Executing phase: 'install'
==> Error: PermissionError: [Errno 13] Permission denied: '/lus/theta-fs0/projects/CSC250STTO11/krentel/spack/install/cray-cnl7-x86_64/gcc-11.2.0/openjdk-11.0.12_7-cw7hhkrfe4x2fk73jlxw22pefvf5n4j5/legal/jdk.localedata/thaidict.md'

/lus/theta-fs0/projects/CSC250STTO11/krentel/spack/spack-repo/var/spack/repos/builtin/packages/openjdk/package.py:144, in install:
        142    def install(self, spec, prefix):
        143        top_dir = 'Contents/Home/' if platform.system() == "Darwin" else '.'
  >>    144        install_tree(top_dir, prefix)

With the patch to filesystem.py it works.

$ spack install openjdk
==> Installing openjdk-11.0.12_7-cw7hhkrfe4x2fk73jlxw22pefvf5n4j5
==> No binary for openjdk-11.0.12_7-cw7hhkrfe4x2fk73jlxw22pefvf5n4j5 found: installing from source
==> Using cached archive: /lus/theta-fs0/projects/CSC250STTO11/krentel/spack/spack-repo/source-cache/_source-cache/archive/87/8770f600fc3b89bf331213c7aa21f8eedd9ca5d96036d1cd48cb2748a3dbefd2.tar.gz
==> No patches needed for openjdk
==> openjdk: Executing phase: 'install'
==> openjdk: Successfully installed openjdk-11.0.12_7-cw7hhkrfe4x2fk73jlxw22pefvf5n4j5
  Fetch: 0.49s.  Build: 6.69s.  Total: 7.18s.
[+] /lus/theta-fs0/projects/CSC250STTO11/krentel/spack/install/cray-cnl7-x86_64/gcc-11.2.0/openjdk-11.0.12_7-cw7hhkrfe4x2fk73jlxw22pefvf5n4j5

So, I can confirm the bug on lustre and the patch works for me.

Btw, it's not just the install_tree and module_roots directories
where it fails, it also fails when build_stage is on lustre.

This is with current spack develop as of today, Nov 9, 2021.

  • Spack: 0.17.0-48-0322431c5c
  • Python: 3.6.12
  • Platform: cray-sles15-haswell
  • Concretizer: clingo

@mwkrentel
Copy link
Member

I tested this a week ago, and I was able to reproduce the problem and
confirm the solution works. So, I approve.

@tldahlgren You understand I don't have commit access (and don't really want it), right?

@tldahlgren
Copy link
Contributor

I tested this a week ago, and I was able to reproduce the problem and confirm the solution works. So, I approve.

@tldahlgren You understand I don't have commit access (and don't really want it), right?

Yes, I do understand.

I was just flagging PRs with the people who have provided reviews so it's visible on the PR list. IMO it's an easier way to distinguish PRs with real feedback (e.g., something other than spackbot comments).

Perhaps this is a good topic to clarify in the Spack User meeting?

@BenWibking
Copy link
Contributor Author

@tgamblin @becker33 Is there any timeframe on when this can get reviewed? This patch fixes an issue I reported over 3 months ago, and this PR itself is 20 days old...

@haampie
Copy link
Member

haampie commented Nov 25, 2021

Not sure why this was not reviewed, but one issue is we would add PSF-licensed
code, if I understand correctly.

2. Subject to the terms and conditions of this License Agreement, PSF hereby
   grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce,
   analyze, test, perform and/or display publicly, prepare derivative works,
   distribute, and otherwise use Python 3.10.0 alone or in any derivative
   version, provided, however, that PSF's License Agreement and PSF's notice of
   copyright, i.e., "Copyright © 2001-2021 Python Software Foundation; All Rights
   Reserved" are retained in Python 3.10.0 alone or in any derivative version
   prepared by Licensee.

3. In the event Licensee prepares a derivative work that is based on or
   incorporates Python 3.10.0 or any part thereof, and wants to make the
   derivative work available to others as provided herein, then Licensee hereby
   agrees to include in any such work a brief summary of the changes made to Python
   3.10.0.

Can you rebase the PR?

@BenWibking
Copy link
Contributor Author

Rebased.

I didn't realize the license issue. How does this work for vendored dependencies? Would adding shutil (from 3.7.4 or higher) as a vendored dependency fix this problem?

@BenWibking
Copy link
Contributor Author

The error message for the unittests (2.7, clingo) check is inscrutable to me. Is this a CI problem, or an actual problem with this patch?

@BenWibking
Copy link
Contributor Author

@haampie @tgamblin @becker33 Just trying to get traction on getting this merged. Is there anything blocking merging this?

@BenWibking
Copy link
Contributor Author

@haampie @tgamblin @becker33 Checking in on this...any possibility of getting this merged?

@haampie
Copy link
Member

haampie commented Jan 4, 2022

There's still some issue with Python 2.7, see the failing test.

TypeError: stat() takes no keyword arguments

@spackbot-app spackbot-app bot added the core PR affects Spack core functionality label Aug 5, 2022
@wyphan
Copy link
Contributor

wyphan commented Jan 17, 2023

@spackbot fix style

@spackbot-app
Copy link

spackbot-app bot commented Jan 17, 2023

Let me see if I can fix that for you!

@spackbot-app
Copy link

spackbot-app bot commented Jan 17, 2023

I was able to run spack style --fix for you!

spack style --fix
==> Running style checks on spack
  selected: isort, black, flake8, mypy
==> Modified files
  lib/spack/llnl/util/filesystem.py
==> Running isort checks
  isort checks were clean
==> Running black checks
reformatted lib/spack/llnl/util/filesystem.py
All done! ✨ 🍰 ✨
1 file reformatted.
  black checks were clean
==> Running flake8 checks
  flake8 checks were clean
==> Running mypy checks
Success: no issues found in 555 source files
  mypy checks were clean
==> spack style checks were clean
Keep in mind that I cannot fix your flake8 or mypy errors, so if you have any you'll need to fix them and update the pull request. If I was able to push to your branch, if you make further changes you will need to pull from your updated branch before pushing again.

I've updated the branch with style fixes.

@BenWibking
Copy link
Contributor Author

There's still some issue with Python 2.7, see the failing test.

TypeError: stat() takes no keyword arguments

Since Python 2.7 support has been dropped, is there anything currently blocking this?

@wyphan
Copy link
Contributor

wyphan commented Feb 17, 2023 via email

@BenWibking
Copy link
Contributor Author

I've checked the "Allow edits and access to secrets by maintainers" box. Is there anything else I need to do?

@haampie haampie merged commit e8238fe into spack:develop Feb 20, 2023
koysean pushed a commit to koysean/spack that referenced this pull request Feb 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core PR affects Spack core functionality utilities
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Installation issue: openjdk
5 participants