
CPU autodetect failing due to failing pip install reframe-hpc==4.3.3 #3023

@casparvl


I'm (again) having some issues with CPU autodetect. Full output:

--- /home/casparvl/rfm.hba5v3pz/rfm-detect-job.sh ---
#!/bin/bash
#SBATCH --job-name="rfm-detect-job"
#SBATCH --ntasks=1
#SBATCH --output=rfm-detect-job.out
#SBATCH --error=rfm-detect-job.err
#SBATCH --partition=aarch64-generic-node
#SBATCH --export=NONE

_onerror()
{
    exitcode=$?
    echo "-reframe: command \`$BASH_COMMAND' failed (exit code: $exitcode)"
    exit $exitcode
}

trap _onerror ERR

python3 -m venv venv.reframe
source venv.reframe/bin/activate
pip install reframe-hpc==4.3.3
reframe --detect-host-topology=topo.json
deactivate

--- /home/casparvl/rfm.hba5v3pz/rfm-detect-job.sh ---
job finished
--- /home/casparvl/rfm.hba5v3pz/rfm-detect-job.out ---
Collecting reframe-hpc==4.3.3
  Using cached https://files.pythonhosted.org/packages/bc/cc/99e6cbb183c49edc21c3bb9afa91316797884ff8b6f0fb521fec54ef1869/ReFrame_HPC-4.3.3-py3-none-any.whl
Collecting lxml (from reframe-hpc==4.3.3)
  Using cached https://files.pythonhosted.org/packages/30/39/7305428d1c4f28282a4f5bdbef24e0f905d351f34cf351ceb131f5cddf78/lxml-4.9.3.tar.gz
    Complete output from command python setup.py egg_info:
    Building lxml version 4.9.3.
    Building without Cython.
    Error: Please make sure the libxml2 and libxslt development packages are installed.

    ----------------------------------------
-reframe: command `pip install reframe-hpc==4.3.3' failed (exit code: 1)

--- /home/casparvl/rfm.hba5v3pz/rfm-detect-job.out ---
--- /home/casparvl/rfm.hba5v3pz/rfm-detect-job.err ---
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-6t929r68/lxml/
You are using pip version 9.0.3, however version 23.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

--- /home/casparvl/rfm.hba5v3pz/rfm-detect-job.err ---
WARNING: failed to retrieve remote processor info: [Errno 2] No such file or directory: 'topo.json'
Traceback (most recent call last):
  File "/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/software/ReFrame/4.3.3/lib/python3.11/site-packages/reframe/frontend/autodetect.py", line 173, in _remote_detect
    topo_info = json.loads(_contents('topo.json'))
                           ^^^^^^^^^^^^^^^^^^^^^^
  File "/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/software/ReFrame/4.3.3/lib/python3.11/site-packages/reframe/frontend/autodetect.py", line 30, in _contents
    with open(filename) as fp:
         ^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'topo.json'

> device auto-detection is not supported

This happens only on some nodes (ARM) in our virtual cluster, probably because the libxml2 and libxslt development packages are not in that image. However, as someone else pointed out to me: "you would not need libxml2 in the image if pip was up to date as lxml wheel is available for aarch64 in PyPI"

Interactively trying

python3 -m venv /tmp/reframe-venv
source /tmp/reframe-venv/bin/activate
python3 -m pip install reframe-hpc==4.3.3

indeed fails with the same error, while

python3 -m venv /tmp/reframe-venv
source /tmp/reframe-venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install reframe-hpc==4.3.3

completes just fine.
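
The workaround above upgrades pip unconditionally. A more conservative variant would upgrade only when the pip inside the venv is older than some threshold. The following is a minimal sketch of that idea, not something ReFrame does: the helper names (`version_ge`, `parse_pip_version`, `maybe_upgrade_pip`) are hypothetical, the comparison relies on GNU `sort -V`, and the default threshold of 19.3 is an assumption (to my knowledge that is the release in which pip started recognising manylinux2014 wheels, so adjust it if lxml's aarch64 wheels need something newer).

```shell
#!/bin/bash

# Return 0 if version $1 >= version $2. Relies on GNU `sort -V`.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Extract the version from the output of `pip --version`,
# e.g. "pip 9.0.3 from /some/path (python 3.6)" -> "9.0.3".
parse_pip_version() {
    printf '%s\n' "$1" | awk '{print $2}'
}

# Upgrade pip in the active environment only if it is older than the
# given minimum. The default of 19.3 is an assumption, see the text above.
maybe_upgrade_pip() {
    local min="${1:-19.3}"
    local current
    current="$(parse_pip_version "$(python3 -m pip --version)")"
    if ! version_ge "$current" "$min"; then
        python3 -m pip install --upgrade pip
    fi
}
```

In the interactive test, `maybe_upgrade_pip` would be called right after `source /tmp/reframe-venv/bin/activate` and before installing reframe-hpc. With the pip 9.0.3 reported in the job's error output the upgrade would run; with a recent pip it would be skipped.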

Now, I'm not sure what the right approach is here. One option would be to inject a pip install --upgrade pip into the CPU detection script. On the other hand, I can imagine you might be reluctant to do that: it might cause other issues (though I would expect fewer). Another option is to give the user more customizability over what the CPU autodetection script looks like. I've raised that topic before, although note that the option suggested there, some form of prerun_cmds, wouldn't have helped in this case.
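
To make the customization option concrete, here is a rough sketch of a detection script with a user-controlled hook between activating the venv and installing ReFrame. It is purely illustrative: `emit_detect_script` is a hypothetical shell function, not ReFrame's actual script generator, and the `#SBATCH` header and error trap from the real script are left out for brevity.

```shell
#!/bin/bash

# Hypothetical generator: prints a detection script for the ReFrame version
# given as $1. Any further arguments are emitted verbatim, one per line,
# between the venv activation and the `pip install` of ReFrame.
emit_detect_script() {
    local version="$1"
    shift
    cat <<EOF
#!/bin/bash
python3 -m venv venv.reframe
source venv.reframe/bin/activate
EOF
    # User-supplied commands that must run inside the activated venv
    local cmd
    for cmd in "$@"; do
        printf '%s\n' "$cmd"
    done
    cat <<EOF
pip install reframe-hpc==${version}
reframe --detect-host-topology=topo.json
deactivate
EOF
}
```

Calling `emit_detect_script 4.3.3 'python3 -m pip install --upgrade pip'` prints the script from the job output above with the pip upgrade placed inside the venv, before ReFrame is installed. Without extra arguments it prints the same commands as the current script.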

Any suggestions? Sure, you could argue "simply install those system packages", but I don't always have the permissions or the possibility to do that everywhere.
