# Setup Container

On local machine

On the local machine, TorchFort is installed as instructed in the manual: <https://nvidia.github.io/TorchFort/installation.html>. This builds a Docker image named `torchfort`. From the manual:

"We provide a Dockerfile which contains all relevant dependencies and builds using the NVIDIA HPC SDK software libraries and compilers, which is our **recommended way** to build TorchFort. In order to build TorchFort using Docker, simply clone the repo and call:

    docker build -t torchfort:latest -f docker/Dockerfile .

from the top level directory of the repo. Inside the container, TorchFort will be installed in `/opt/torchfort` "

Then, just for testing, conda and JupyterLab are installed in a container started from that image, a user `x` is created, and the container is committed to produce the working image.

In [1]:
! sudo docker images

REPOSITORY    TAG                       IMAGE ID       CREATED         SIZE
torchfort     v4                        80e7a7a5bfcd   11 days ago     32GB
torchfort     v3                        588b0fba0bad   3 weeks ago     32GB
torchfort     v2                        5ef57bf9de13   3 weeks ago     30.3GB
torchfort     latest                    f4a1749ea5e5   4 weeks ago     28.9GB
nvidia/cuda   12.3.1-base-ubuntu22.04   bcdbb14063fa   17 months ago   243MB


In [3]:
! sudo docker ps

CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES


In [4]:
! sudo docker ps -a

CONTAINER ID   IMAGE          COMMAND             CREATED       STATUS                     PORTS                                                                                  NAMES
e8006476e793   torchfort:v3   "/bin/sh -c bash"   3 weeks ago   Exited (255) 2 weeks ago   0.0.0.0:8895->8895/tcp, [::]:8895->8895/tcp, 0.0.0.0:2222->22/tcp, [::]:2222->22/tcp   t01


In [6]:
! sudo docker run -d -it --gpus=all \
    -v $HOME:$HOME -w $HOME \
    --name t02 \
    torchfort:latest

affc598bd3c3c2f9214cecb5b096e1aee47dcc711f496549089b20cf5a735aec


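Other options that can be passed to `docker run` (strip the trailing comments before use: a backslash only continues a line when it is the last character on it):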

In [None]:
    --privileged \               # root in a container and on the host system
    -p 8895:8895 \               # jupyter server port
    -p 2222:22 \                 # ssh port
    -v /home/x:/mnt/x \          # access to the host home dir
    -v /torchfort:/torchfort \   # alternative host dir
    --workdir /home/x \
    --user 1000:1000 \
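
Since the option list changes from container to container, one possible way to keep it manageable is to assemble the `docker run` arguments in Python and print the resulting command. This is only a sketch: the helper name `docker_run_command` and its parameters are hypothetical and not part of Docker or TorchFort; the flags it emits are the ones already listed above.

```python
import shlex


def docker_run_command(image, name, *, gpus="all", ports=(), volumes=(),
                       workdir=None, user=None, privileged=False):
    """Return the argument list for a detached, interactive `docker run`.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = ["docker", "run", "-d", "-it", f"--gpus={gpus}"]
    if privileged:
        cmd.append("--privileged")
    for host_port, container_port in ports:
        cmd += ["-p", f"{host_port}:{container_port}"]
    for host_dir, container_dir in volumes:
        cmd += ["-v", f"{host_dir}:{container_dir}"]
    if workdir:
        cmd += ["--workdir", workdir]
    if user:
        cmd += ["--user", user]
    cmd += ["--name", name, image]
    return cmd


# Same options as listed above, applied to container t01 from image torchfort:v3.
cmd = docker_run_command(
    "torchfort:v3", "t01",
    privileged=True,
    ports=[(8895, 8895), (2222, 22)],
    volumes=[("/home/x", "/mnt/x"), ("/torchfort", "/torchfort")],
    workdir="/home/x",
    user="1000:1000",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell (prefixed with `sudo` if needed), or the list can be handed to `subprocess.run(cmd, check=True)`.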

In [2]:
! sudo docker ps -a

CONTAINER ID   IMAGE          COMMAND             CREATED       STATUS                      PORTS                                                                                  NAMES
e8006476e793   torchfort:v3   "/bin/sh -c bash"   10 days ago   Exited (255) 18 hours ago   0.0.0.0:8895->8895/tcp, [::]:8895->8895/tcp, 0.0.0.0:2222->22/tcp, [::]:2222->22/tcp   t01


Start the existing container `t01` in the background:

In [4]:
! sudo docker start t01

t01


The container `t01` was created with:

In [19]:
%%bash
# Options used below:
#   --privileged               root in a container and on the host system
#   -p 8895:8895               jupyter server port
#   -p 2222:22                 ssh port
#   -v /home/x:/mnt/x          access to the host home dir
#   -v /torchfort:/torchfort   alternative host dir
sudo docker run -d -it --gpus=all \
    --privileged \
    -p 8895:8895 \
    -p 2222:22 \
    -v /home/x:/mnt/x \
    -v /torchfort:/torchfort \
    --workdir /home/x \
    --user 1000:1000 \
    --name t01 \
    torchfort:v3

e8006476e7930589db3484eed3d1b9b5da4d7352f212ea5917ff4a405feb97dc


Check if it is running in the background:

In [5]:
! sudo docker ps

CONTAINER ID   IMAGE          COMMAND             CREATED       STATUS         PORTS                                                                                  NAMES
e8006476e793   torchfort:v3   "/bin/sh -c bash"   10 days ago   Up 7 seconds   0.0.0.0:8895->8895/tcp, [::]:8895->8895/tcp, 0.0.0.0:2222->22/tcp, [::]:2222->22/tcp   t01


Start JupyterLab inside the container:

In [25]:
! sudo docker exec t01 bash /home/x/startjupyterlab

The JupyterLab server can then be reached at `localhost:8895` in the browser.
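
Before opening the browser, it can help to confirm that something is actually listening on the mapped port. The following is a minimal sketch using only the Python standard library; the function name `port_is_open` is hypothetical, and it only checks that a TCP connection succeeds, not that the listener is really JupyterLab.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if port_is_open("localhost", 8895):
    print("Port 8895 is reachable")
else:
    print("Nothing is listening on port 8895 yet")
```

The same check with port 2222 would cover the SSH mapping used for `t01`.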

## Create new image from existing container

In [1]:
! sudo docker commit t01 torchfort:v4

sha256:80e7a7a5bfcd551394890049564bbbb238553574611f5702fb4f8e4ddae1a3ce


In [2]:
! sudo docker images

REPOSITORY    TAG                       IMAGE ID       CREATED          SIZE
torchfort     v4                        80e7a7a5bfcd   13 seconds ago   32GB
torchfort     v3                        588b0fba0bad   13 days ago      32GB
torchfort     v2                        5ef57bf9de13   2 weeks ago      30.3GB
torchfort     latest                    f4a1749ea5e5   2 weeks ago      28.9GB
nvidia/cuda   12.3.1-base-ubuntu22.04   bcdbb14063fa   17 months ago    243MB


## Convert to Singularity container

In [3]:
! singularity --version

singularity-ce version 4.3.1-jammy


Build a Singularity image file (.sif) from an image that is already loaded into the local Docker daemon:

In [15]:
! sudo singularity build torchfort_v4.sif docker-daemon://torchfort:v4

INFO:    Starting build...
INFO:    Fetching OCI image...
INFO:    Extracting OCI image...
INFO:    Inserting Singularity configuration...
INFO:    Creating SIF file...
INFO:    Build complete: torchfort_v4.sif
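
The build command follows a fixed pattern: an output `.sif` name plus a `docker-daemon://image:tag` source. As a rough sketch, the pattern could be generated for any tag as below. The helper `singularity_build_command` and its default file naming (`<image>_<tag>.sif`) are assumptions made for illustration, modeled on the `torchfort_v4.sif` name used above.

```python
import shlex


def singularity_build_command(image, tag, sif_path=None):
    """Return the argument list for building a SIF from a local Docker image.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    if sif_path is None:
        # Assumed naming convention: torchfort + v4 -> torchfort_v4.sif
        sif_path = f"{image.replace('/', '_')}_{tag}.sif"
    return ["singularity", "build", sif_path, f"docker-daemon://{image}:{tag}"]


build_cmd = singularity_build_command("torchfort", "v4")
print(shlex.join(build_cmd))
```

An explicit `sif_path` can be passed when the output name should differ from the tag, as with `torchfort_v1.sif` built from `torchfort:latest`.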


In [17]:
! ls -lh torchfort_v4.sif

-rwxr-xr-x 1 x x 13G mai  4 21:37 torchfort_v4.sif


In [7]:
! sudo singularity build torchfort_v1.sif docker-daemon://torchfort:latest

INFO:    Starting build...
INFO:    Fetching OCI image...
INFO:    Extracting OCI image...
INFO:    Inserting Singularity configuration...
INFO:    Creating SIF file...
INFO:    Build complete: torchfort_v1.sif


In [19]:
! ls /prj

aux  conda  radnn


In [20]:
! mkdir /prj/containers

Copy the image to the Syncthing directory that is synchronized with SDumont. The image should appear on SDumont once the sync is complete.

In [21]:
! cp torchfort_v4.sif /prj/containers/

In [22]:
! ls -lh /prj/containers/torchfort_v4.sif

-rwxr-xr-x 1 x x 13G mai  4 22:52 /prj/containers/torchfort_v4.sif
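
Matching file sizes do not guarantee that a 13G copy is intact, so a checksum comparison is a reasonable extra check, both after the local copy and later on the remote side. The sketch below uses only the Python standard library; the function names `sha256sum` and `same_content` are hypothetical, and the paths are the ones used in the cells above.

```python
import hashlib
import os


def sha256sum(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Return True if both files have the same SHA-256 digest."""
    return sha256sum(path_a) == sha256sum(path_b)


src = "torchfort_v4.sif"
dst = "/prj/containers/torchfort_v4.sif"
if os.path.exists(src) and os.path.exists(dst):
    print("copy matches source:", same_content(src, dst))
```

Running `sha256sum` on the synchronized file on the remote machine and comparing the digest with the local one gives the same check across the sync.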


In [2]:
%cd ~/containers

/home/x/containers


In [3]:
! sudo time singularity build torchfort.sif docker-daemon://torchfort:latest

INFO:    Starting build...
INFO:    Fetching OCI image...
INFO:    Extracting OCI image...
INFO:    Inserting Singularity configuration...
INFO:    Creating SIF file...
INFO:    Build complete: torchfort.sif
