From e35e17a3c8cedd5861a59cf05110bfcd20bd1685 Mon Sep 17 00:00:00 2001 From: Enrique Gonzalez Paredes Date: Mon, 17 Mar 2025 12:40:45 +0100 Subject: [PATCH 1/4] Update venv-squasfs documentation with support for relocatable venvs with uv --- docs/guides/storage.md | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/docs/guides/storage.md b/docs/guides/storage.md index a0d0a5cc..062a45d7 100644 --- a/docs/guides/storage.md +++ b/docs/guides/storage.md @@ -33,7 +33,6 @@ This file can be mounted as a read-only [Squashfs](https://en.wikipedia.org/wiki #### Step 1: create the virtual environment The first step is to create the virtual environment using the usual workflow. -This might be slow, because we are not optimising this stage for file system performance. ```bash # for the example create a working path on SCRATCH @@ -67,6 +66,24 @@ pip install torch torchvision torchaudio \ In our "simple" pytorch example, I counted **22806 inodes**! +##### Alternative virtual environment creation using uv + +The installation process described above is not optimized for file system performance and will still be slow on Lustre filesystems. An alternative way to create the virtual environment is to use the [uv](https://docs.astral.sh/uv/) tool, which supports _relocatable_ virtual environments and asynchronous package downloads for better installation times. This way, the installation process is much shorter and the resulting squashfs image can be shared across projects, as the virtual environment can be safely used from any location. 
+ +```bash +# activate the uenv as before +uenv start prgenv-gnu/24.11:v1 --view=default + +# create and activate a new relocatable venv using uv +uv venv --relocatable --link-mode=copy /dev/shm/sqfs-demo/.venv +cd /dev/shm/sqfs-demo +source .venv/bin/activate + +# install software in the virtual environment using uv +uv pip install --link-mode=copy torch torchvision torchaudio \ + --index-url https://download.pytorch.org/whl/cu126 +``` + #### Step 2: make a squashfs image of the virtual environment The next step is to create a single squashfs file that contains the whole `$SCRATCH/sqfs-demo/.pyenv` path. @@ -117,7 +134,7 @@ Note that the original virtual environment is still installed in `$SCRATCH/sqfs- A benefit of this approach is that the squashfs file can be copied to a location that is not subject to the Scratch cleaning policy. !!! warning - Virtual environment are usually not relocatable as they contain symlinks to absolute locations inside the virtual environment. Therefore, you need to mount the image in the exact same location where you created the virtual environment. + Virtual environments are not relocatable by default as they contain symlinks to absolute locations inside the virtual environment. This means that the squashfs file must be mounted in the exact same location where the virtual environment was created, unless it contains a virtual environment specifically created using a tool with support for relocatable virtual environments (e.g. `uv venv --relocatable` as mentioned in step 1), in which case it can be mounted in any location. 
#### Step 4: (optional) regenerate the virtual environment From f52aab26b99247da31ba3f50df6cb447bdb35b53 Mon Sep 17 00:00:00 2001 From: Enrique Gonzalez Paredes Date: Mon, 17 Mar 2025 13:11:13 +0100 Subject: [PATCH 2/4] Enhance explanations and add bytecode precompilation step --- docs/guides/storage.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/docs/guides/storage.md b/docs/guides/storage.md index 062a45d7..c28e95e0 100644 --- a/docs/guides/storage.md +++ b/docs/guides/storage.md @@ -68,20 +68,24 @@ pip install torch torchvision torchaudio \ ##### Alternative virtual environment creation using uv -The installation process described above is not optimized for file system performance and will still be slow on Lustre filesystems. An alternative way to create the virtual environment is to use the [uv](https://docs.astral.sh/uv/) tool, which supports _relocatable_ virtual environments and asynchronous package downloads for better installation times. This way, the installation process is much shorter and the resulting squashfs image can be shared across projects, as the virtual environment can be safely used from any location. +The installation process described above is not optimized for file system performance and will still be slow on Lustre filesystems. An alternative way to create the virtual environment is to use the [uv](https://docs.astral.sh/uv/) tool, which supports _relocatable_ virtual environments and asynchronous package downloads. The main benefit of a relocatable virtual environment is that it does not need to be created in the final path from where it will be used. This allows the use of shared memory to speed up the creation and initialization of the virtual environment and, since the virtual environment can be used from any location, the resulting squashfs image can be safely shared across projects. 
```bash # activate the uenv as before uenv start prgenv-gnu/24.11:v1 --view=default # create and activate a new relocatable venv using uv -uv venv --relocatable --link-mode=copy /dev/shm/sqfs-demo/.venv +# in this case we explicitly select python 3.12 +uv venv -p 3.12 --relocatable --link-mode=copy /dev/shm/sqfs-demo/.venv cd /dev/shm/sqfs-demo source .venv/bin/activate # install software in the virtual environment using uv uv pip install --link-mode=copy torch torchvision torchaudio \ --index-url https://download.pytorch.org/whl/cu126 +# optionally, to reduce the import times, precompile all +# python modules to bytecode before creating the squashfs image +python -m compileall .venv/lib/python3.12/site-packages ``` #### Step 2: make a squashfs image of the virtual environment @@ -145,4 +149,3 @@ If you need to modify the virtual environment, run the original uenv without the !!! hint If you save the updated copy in a different file, you can now "roll back" to the old version of the environment by mounting the old image. 
- From 3526b0220ea4d76461fb1722804a3551dfa121fd Mon Sep 17 00:00:00 2001 From: Enrique Gonzalez Paredes Date: Mon, 17 Mar 2025 14:37:13 +0100 Subject: [PATCH 3/4] Compile to bytecode in parallel for multiple optimization levels --- docs/guides/storage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guides/storage.md b/docs/guides/storage.md index c28e95e0..646a0c23 100644 --- a/docs/guides/storage.md +++ b/docs/guides/storage.md @@ -85,7 +85,7 @@ uv pip install --link-mode=copy torch torchvision torchaudio \ --index-url https://download.pytorch.org/whl/cu126 # optionally, to reduce the import times, precompile all # python modules to bytecode before creating the squashfs image -python -m compileall .venv/lib/python3.12/site-packages +python -m compileall -j 8 -o 1 -o 2 .venv/lib/python3.12/site-packages ``` #### Step 2: make a squashfs image of the virtual environment From 0016de939b924d0b8b3726b2c69f5f9366d9264e Mon Sep 17 00:00:00 2001 From: Enrique Gonzalez Paredes Date: Mon, 17 Mar 2025 17:06:12 +0100 Subject: [PATCH 4/4] Use tabs for the uv/venv installation process --- docs/guides/storage.md | 145 +++++++++++++++++++++++++---------------- 1 file changed, 88 insertions(+), 57 deletions(-) diff --git a/docs/guides/storage.md b/docs/guides/storage.md index 646a0c23..8eade8c1 100644 --- a/docs/guides/storage.md +++ b/docs/guides/storage.md @@ -34,25 +34,55 @@ This file can be mounted as a read-only [Squashfs](https://en.wikipedia.org/wiki The first step is to create the virtual environment using the usual workflow. 
-```bash -# for the example create a working path on SCRATCH -mkdir $SCRATCH/sqfs-demo -cd $SCRATCH/sqfs-demo - -# start the uenv -# in this case the "default" view of prgenv-gnu provides python, cray-mpich, -# and other useful tools -uenv start prgenv-gnu/24.11:v1 --view=default - -# create and activate the empty venv -python -m venv ./.pyenv -source ./.pyenv/bin/activate - -# install software in the virtual environment -# in this case we install install pytorch -pip install torch torchvision torchaudio \ - --index-url https://download.pytorch.org/whl/cu126 -``` +=== "uv" + + The recommended way to create a new virtual environment is to use the [uv](https://docs.astral.sh/uv/) tool, which supports _relocatable_ virtual environments and asynchronous package downloads. The main benefit of a relocatable virtual environment is that it does not need to be created in the final path from where it will be used. This allows the use of shared memory to speed up the creation and initialization of the virtual environment and, since the virtual environment can be used from any location, the resulting squashfs image can be safely shared across projects. 
+ + ```bash + # start the uenv + # in this case the "default" view of prgenv-gnu provides python, cray-mpich, + # and other useful tools + uenv start prgenv-gnu/24.11:v1 --view=default + + # create and activate a new relocatable venv using uv + # in this case we explicitly select python 3.12 + uv venv -p 3.12 --relocatable --link-mode=copy /dev/shm/sqfs-demo/.venv + cd /dev/shm/sqfs-demo + source .venv/bin/activate + + # install software in the virtual environment using uv + # in this case we install pytorch + uv pip install --link-mode=copy torch torchvision torchaudio \ + --index-url https://download.pytorch.org/whl/cu126 + + # optionally, to reduce the import times, precompile all + # python modules to bytecode before creating the squashfs image + python -m compileall -j 8 -o 1 -o 2 .venv/lib/python3.12/site-packages + ``` + +=== "venv" + + A new virtual environment can also be created using the standard `venv` module. However, virtual environments created by `venv` are not relocatable, and thus they need to be created and initialized in the path from where they will be used. This implies that the installation process cannot be optimized for file system performance and will still be slow on Lustre filesystems. + + ```bash + # start the uenv + # in this case the "default" view of prgenv-gnu provides python, cray-mpich, + # and other useful tools + uenv start prgenv-gnu/24.11:v1 --view=default + + # for the example create a working path on SCRATCH + mkdir $SCRATCH/sqfs-demo + cd $SCRATCH/sqfs-demo + + # create and activate the empty venv + python -m venv ./.venv + source ./.venv/bin/activate + + # install software in the virtual environment + # in this case we install pytorch + pip install torch torchvision torchaudio \ + --index-url https://download.pytorch.org/whl/cu126 + ``` ??? example "how many files did that create?" An inode is created for every file, directory and symlink on a file system. 
@@ -60,44 +90,32 @@ pip install torch torchvision torchaudio \ The following command can be used to count the number of inodes: ``` - find $SCRATCH/sqfs-demo/.pyenv -exec stat --format="%i" {} + | sort -u | wc -l + find $SCRATCH/sqfs-demo/.venv -exec stat --format="%i" {} + | sort -u | wc -l ``` `find` is used to list every path and file, and `stat` is called on each of these to get the inode, and then `sort` and `wc` are used to count the number of unique inodes. In our "simple" pytorch example, I counted **22806 inodes**! -##### Alternative virtual environment creation using uv -The installation process described above is not optimized for file system performance and will still be slow on Lustre filesystems. An alternative way to create the virtual environment is to use the [uv](https://docs.astral.sh/uv/) tool, which supports _relocatable_ virtual environments and asynchronous package downloads. The main benefit of a relocatable virtual environment is that it does not need to be created in the final path from where it will be used. This allows the use of shared memory to speed up the creation and initialization of the virtual environment and, since the virtual environment can be used from any location, the resulting squashfs image can be safely shared across projects. +#### Step 2: make a squashfs image of the virtual environment -```bash -# activate the uenv as before -uenv start prgenv-gnu/24.11:v1 --view=default +The next step is to create a single squashfs file that contains the whole virtual environment folder (i.e. `/dev/shm/sqfs-demo/.venv` or `$SCRATCH/sqfs-demo/.venv`). -# create and activate a new relocatable venv using uv -# in this case we explicitly select python 3.12 -uv venv -p 3.12 --relocatable --link-mode=copy /dev/shm/sqfs-demo/.venv -cd /dev/shm/sqfs-demo -source .venv/bin/activate +This is performed using the `mksquashfs` command, which is installed on all Alps clusters. 
-# install software in the virtual environment using uv -uv pip install --link-mode=copy torch torchvision torchaudio \ - --index-url https://download.pytorch.org/whl/cu126 -# optionally, to reduce the import times, precompile all -# python modules to bytecode before creating the squashfs image -python -m compileall -j 8 -o 1 -o 2 .venv/lib/python3.12/site-packages -``` +=== "uv" -#### Step 2: make a squashfs image of the virtual environment - -The next step is to create a single squashfs file that contains the whole `$SCRATCH/sqfs-demo/.pyenv` path. + ```bash + mksquashfs /dev/shm/sqfs-demo/.venv py_venv.squashfs \ + -no-recovery -noappend -Xcompression-level 3 + ``` -This is performed using the `mksquashfs` command, that is installed on all Alps clusters. +=== "venv" -```bash -mksquashfs $SCRATCH/sqfs-demo/.pyenv pyenv.squashfs \ - -no-recovery -noappend -Xcompression-level 3 -``` + ```bash + mksquashfs $SCRATCH/sqfs-demo/.venv py_venv.squashfs \ + -no-recovery -noappend -Xcompression-level 3 + ``` !!! hint The `-Xcompression-level` flag sets the compression level to a value between 1 and 9, with 9 being the most compressed. @@ -126,26 +144,39 @@ mksquashfs $SCRATCH/sqfs-demo/.pyenv pyenv.squashfs \ To use the optimised virtual environment, mount the squashfs image at the location of the original virtual environment when starting the uenv. -```bash -cd $SCRATCH/sqfs-demo -uenv start --view=default \ - prgenv-gnu/24.11:v1,$PWD/pyenv.squashfs:$SCRATCH/sqfs-demo/.pyenv -source .pyenv/bin/activate -``` +=== "uv" + + ```bash + cd $SCRATCH/sqfs-demo + uenv start --view=default \ + prgenv-gnu/24.11:v1,$PWD/py_venv.squashfs:$SCRATCH/sqfs-demo/.venv + source .venv/bin/activate + ``` + + Remember that virtual environments created by `uv` are relocatable only if the `--relocatable` option flag is passed to the `uv venv` command as mentioned in step 1. 
In that case, the generated environment is relocatable and thus it is possible to mount it in multiple locations without problems. + +=== "venv" + + ```bash + cd $SCRATCH/sqfs-demo + uenv start --view=default \ + prgenv-gnu/24.11:v1,$PWD/py_venv.squashfs:$SCRATCH/sqfs-demo/.venv + source .venv/bin/activate + ``` -Note that the original virtual environment is still installed in `$SCRATCH/sqfs-demo/.pyenv`, however the squashfs image has been mounted on top of it, so the single squashfs file is being accessed instead of the many files in the original version. + Note that the original virtual environment is still installed in `$SCRATCH/sqfs-demo/.venv`, however the squashfs image has been mounted on top of it, so the single squashfs file is being accessed instead of the many files in the original version. -A benefit of this approach is that the squashfs file can be copied to a location that is not subject to the Scratch cleaning policy. + A benefit of this approach is that the squashfs file can be copied to a location that is not subject to the Scratch cleaning policy. -!!! warning - Virtual environments are not relocatable by default as they contain symlinks to absolute locations inside the virtual environment. This means that the squashfs file must be mounted in the exact same location where the virtual environment was created, unless it contains a virtual environment specifically created using a tool with support for relocatable virtual environments (e.g. `uv venv --relocatable` as mentioned in step 1), in which case it can be mounted in any location. + !!! warning + Virtual environments created by `venv` are not relocatable as they contain symlinks to absolute locations inside the virtual environment. This means that the squashfs file must be mounted in the exact same location where the virtual environment was created. 
#### Step 4: (optional) regenerate the virtual environment -The squashfs file is immutable - it is not possible to modify the contents of `.pyenv` while it is mounted. +The squashfs file is immutable - it is not possible to modify the contents of `.venv` while it is mounted. This means that it is not possible to `pip install` more packages in the virtual environment. -If you need to modify the virtual environment, run the original uenv without the squashfs file mounted, make changes, and run step 2 again to generate a new image. +If you need to modify the virtual environment, run the original uenv without the squashfs file mounted, make changes to the virtual environment, and run step 2 again to generate a new image. !!! hint If you save the updated copy in a different file, you can now "roll back" to the old version of the environment by mounting the old image.