## Launch and set up a VM instance with python-chi

We will use the `python-chi` Python API to Chameleon to provision our VM server.

We will execute the cells in this notebook inside the Chameleon Jupyter environment.

Run the following cell, and make sure the correct project is selected.

In [1]:
from chi import server, context
import chi, os, time, datetime

context.version = "1.0" 
context.choose_project()
context.choose_site(default="KVM@TACC")

VBox(children=(Dropdown(description='Select Project', options=('CHI-251409',), value='CHI-251409'), Output()))

VBox(children=(Dropdown(description='Select Site', index=7, options=('CHI@TACC', 'CHI@UC', 'CHI@EVL', 'CHI@NCA…

We will bring up an `m1.medium` flavor server with the `CC-Ubuntu24.04` disk image.

> **Note**: the following cell brings up a server only if you don’t already have one with the same name, regardless of that server’s state. If you already have a server in ERROR state, delete it in the Horizon GUI before you run this cell.

In [2]:
username = os.getenv('USER') # current user name (not used in the fixed server name below)
s = server.Server(
    "node-team27",
    image_name="CC-Ubuntu24.04",
    flavor_name="m1.medium"
)
s.submit(idempotent=True)

Waiting for server node-team27's status to become ACTIVE. This typically takes 10 minutes, but can take up to 20 minutes.


HBox(children=(Label(value=''), IntProgress(value=0, bar_style='success')))

Server has moved to status ACTIVE


Attribute,node-team27
Id,7f5562e8-2a73-4207-b3fb-75d8818e3894
Status,ACTIVE
Image Name,CC-Ubuntu24.04
Flavor Name,m1.medium
Addresses,sharednet1:  IP: 10.56.1.191 (v4)  Type: fixed  MAC: fa:16:3e:68:57:a0
Network Name,sharednet1
Created At,2025-05-08T18:42:19Z
Keypair,trovi-4e85aec
Reservation Id,
Host Id,2e5ef5bcab67d00c12f80400fd31856f7e4b68020e92971f03508c96


Then, we’ll associate a floating IP with the instance:

In [3]:
s.associate_floating_ip()

In [5]:
s.refresh()
s.check_connectivity()

Checking connectivity to 129.114.25.179 port 22.


HBox(children=(Label(value=''), IntProgress(value=0, bar_style='success')))

Connection successful


In the output below, make a note of the floating IP that has been assigned to your instance (in the “Addresses” row).

In [6]:
s.refresh()
s.show(type="widget")

Attribute,node-team27
Id,7f5562e8-2a73-4207-b3fb-75d8818e3894
Status,ACTIVE
Image Name,CC-Ubuntu24.04
Flavor Name,m1.medium
Addresses,sharednet1:  IP: 10.56.1.191 (v4)  Type: fixed  MAC: fa:16:3e:68:57:a0  IP: 129.114.25.179 (v4)  Type: floating  MAC: fa:16:3e:68:57:a0
Network Name,sharednet1
Created At,2025-05-08T18:42:19Z
Keypair,trovi-4e85aec
Reservation Id,
Host Id,2e5ef5bcab67d00c12f80400fd31856f7e4b68020e92971f03508c96
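
If you would rather extract the floating IP programmatically than read it off the table, a minimal sketch follows. It assumes the “Addresses” value has the text layout shown above; the helper name `floating_ips` is hypothetical and is not part of `python-chi`.

```python
import re

# Matches entries such as "IP: 129.114.25.179 (v4)  Type: floating"
_ADDRESS_RE = re.compile(r"IP:\s*(\S+)\s*\(v\d\)\s*Type:\s*(\w+)")

def floating_ips(addresses: str) -> list[str]:
    """Return every address whose type is 'floating' in an Addresses string."""
    return [ip for ip, kind in _ADDRESS_RE.findall(addresses) if kind == "floating"]

# The Addresses value copied from the output above.
addresses = (
    "sharednet1:  IP: 10.56.1.191 (v4)  Type: fixed  MAC: fa:16:3e:68:57:a0  "
    "IP: 129.114.25.179 (v4)  Type: floating  MAC: fa:16:3e:68:57:a0"
)
print(floating_ips(addresses))
```

The function returns a list because nothing in the text format rules out more than one floating address; for the server above it contains the single address `129.114.25.179`.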


By default, all connections to VM resources are blocked, as a security measure. We need to attach one or more “security groups” to our VM resource, to permit access over the Internet to specified ports.

The following security groups will be created (if they do not already exist in our project) and then added to our server:

In [7]:
security_groups = [
  {'name': "allow-ssh", 'port': 22, 'description': "Enable SSH traffic on TCP port 22"},
  {'name': "allow-5000", 'port': 5000, 'description': "Enable TCP port 5000 (used by Flask)"},
  {'name': "allow-8000", 'port': 8000, 'description': "Enable TCP port 8000 (used by FastAPI)"},
  {'name': "allow-8888", 'port': 8888, 'description': "Enable TCP port 8888 (used by Jupyter)"},
  {'name': "allow-3000", 'port': 3000, 'description': "Enable TCP port 3000 (used by Grafana)"},
  {'name': "allow-9090", 'port': 9090, 'description': "Enable TCP port 9090 (used by Prometheus)"},
  {'name': "allow-8080", 'port': 8080, 'description': "Enable TCP port 8080 (used by cAdvisor, Label Studio)"}
]

In [8]:
# configure openstacksdk for actions unsupported by python-chi
os_conn = chi.clients.connection()
nova_server = chi.nova().servers.get(s.id)

for sg in security_groups:
    # create the group and its ingress rule only if the group is not already in the project
    if not os_conn.get_security_group(sg['name']):
        os_conn.create_security_group(sg['name'], sg['description'])
        os_conn.create_security_group_rule(sg['name'], port_range_min=sg['port'], port_range_max=sg['port'], protocol='tcp', remote_ip_prefix='0.0.0.0/0')

    # attach the group to our server
    nova_server.add_security_group(sg['name'])

print(f"updated security groups: {[group.name for group in nova_server.list_security_group()]}")

updated security groups: ['allow-3000', 'allow-5000', 'allow-8000', 'allow-8080', 'allow-8888', 'allow-9090', 'allow-ssh', 'default']
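
To confirm from your own machine that a port is actually reachable, a simple TCP connection test can help. This is a sketch using only the Python standard library; `port_is_open` is a hypothetical helper name, and `A.B.C.D` stands for your floating IP.

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

# Example: replace A.B.C.D with your floating IP before running.
# for port in (22, 5000, 8000, 8888, 3000, 9090, 8080):
#     print(port, port_is_open("A.B.C.D", port))
```

Note that a port also reads as closed when the security group permits it but no service is listening yet, so at this stage only port 22 (SSH) would be expected to succeed.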


### Retrieve code and notebooks on the instance

Now, we can use `python-chi` to execute commands on the instance, to set it up. We’ll start by retrieving the code and other materials on the instance.

In [9]:
s.execute("git clone https://github.com/sai-navyanth-p/ResearchPaperSummarizer")

Cloning into 'ResearchPaperSummarizer'...


<Result cmd='git clone https://github.com/sai-navyanth-p/ResearchPaperSummarizer' exited=0>
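
Running the clone cell a second time would make `git clone` exit with an error, because the target directory already exists. One way around this, sketched below, is to build a command that clones only when the directory is missing. The helper name `clone_if_missing` is hypothetical, and the sketch assumes the default checkout directory name that `git clone` derives from the URL.

```python
import shlex

def clone_if_missing(repo_url: str) -> str:
    """Build a shell command that clones repo_url only if its directory is absent."""
    # git names the checkout after the last URL path component, minus ".git".
    directory = repo_url.rstrip("/").rsplit("/", 1)[-1].removesuffix(".git")
    return f"test -d {shlex.quote(directory)} || git clone {shlex.quote(repo_url)}"

cmd = clone_if_missing("https://github.com/sai-navyanth-p/ResearchPaperSummarizer")
print(cmd)
# The resulting string could then be passed to s.execute(cmd).
```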

### Set up Docker

Here, we will install Docker, the container framework, and add our user to the `docker` group.

In [10]:
s.execute("curl -sSL https://get.docker.com/ | sudo sh")
s.execute("sudo groupadd -f docker; sudo usermod -aG docker $USER")

# Executing docker install script, commit: 53a22f61c0628e58e1d6680b49e82993d304b449


+ sh -c apt-get -qq update >/dev/null
+ sh -c DEBIAN_FRONTEND=noninteractive apt-get -y -qq install ca-certificates curl >/dev/null

Running kernel seems to be up-to-date.

Restarting services...
 systemctl restart packagekit.service

No containers need to be restarted.

No user sessions are running outdated binaries.

No VM guests are running outdated hypervisor (qemu) binaries on this host.
+ sh -c install -m 0755 -d /etc/apt/keyrings
+ sh -c curl -fsSL "https://download.docker.com/linux/ubuntu/gpg" -o /etc/apt/keyrings/docker.asc
+ sh -c chmod a+r /etc/apt/keyrings/docker.asc
+ sh -c echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu noble stable" > /etc/apt/sources.list.d/docker.list
+ sh -c apt-get -qq update >/dev/null
+ sh -c DEBIAN_FRONTEND=noninteractive apt-get -y -qq install docker-ce docker-ce-cli containerd.io docker-compose-plugin docker-ce-rootless-extras docker-buildx-plugin >/dev/null

Running kernel seems to be up-to

Client: Docker Engine - Community
 Version:           28.1.1
 API version:       1.49
 Go version:        go1.23.8
 Git commit:        4eba377
 Built:             Fri Apr 18 09:52:14 2025
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          28.1.1
  API version:      1.49 (minimum version 1.24)
  Go version:       go1.23.8
  Git commit:       01f442b
  Built:            Fri Apr 18 09:52:14 2025
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.27
  GitCommit:        05044ec0a9a75232cad458027ca83437aae3f4da
 runc:
  Version:          1.2.5
  GitCommit:        v1.2.5-0-g59923ef
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0


To run Docker as a non-privileged user, consider setting up the
Docker daemon in rootless mode for your user:

    dockerd-rootless-setuptool.sh install

Visit https://docs.docker.com/go/rootless/ to learn about rootless mode.


<Result cmd='sudo groupadd -f docker; sudo usermod -aG docker $USER' exited=0>

In [11]:
s.execute("curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list")
s.execute("sudo apt update")
s.execute("sudo apt-get install -y nvidia-container-toolkit")
s.execute("sudo nvidia-ctk runtime configure --runtime=docker")
# for https://github.com/NVIDIA/nvidia-container-toolkit/issues/48
s.execute("sudo jq 'if has(\"exec-opts\") then . else . + {\"exec-opts\": [\"native.cgroupdriver=cgroupfs\"]} end' /etc/docker/daemon.json | sudo tee /etc/docker/daemon.json.tmp > /dev/null && sudo mv /etc/docker/daemon.json.tmp /etc/docker/daemon.json")
s.execute("sudo systemctl restart docker")

deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /
#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/experimental/deb/$(ARCH) /






Get:1 https://nvidia.github.io/libnvidia-container/stable/deb/amd64  InRelease [1477 B]
Hit:2 https://download.docker.com/linux/ubuntu noble InRelease
Hit:3 http://security.ubuntu.com/ubuntu noble-security InRelease
Get:4 http://nova.clouds.archive.ubuntu.com/ubuntu noble InRelease [256 kB]
Get:5 https://nvidia.github.io/libnvidia-container/stable/deb/amd64  Packages [18.6 kB]
Hit:6 http://nova.clouds.archive.ubuntu.com/ubuntu noble-updates InRelease
Hit:7 http://nova.clouds.archive.ubuntu.com/ubuntu noble-backports InRelease
Fetched 276 kB in 1s (242 kB/s)
Reading package lists...
Building dependency tree...
Reading state information...
239 packages can be upgraded. Run 'apt list --upgradable' to see them.
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  libnvidia-container-tools libnvidia-container1 nvidia-container-toolkit-base
The following NEW packages will be installed:
  libnvidia-container-t

debconf: unable to initialize frontend: Dialog
debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell buffer, or without a controlling terminal.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 


Fetched 5844 kB in 0s (34.5 MB/s)
Selecting previously unselected package libnvidia-container1:amd64.
(Reading database ... 89661 files and directories currently installed.)
Preparing to unpack .../libnvidia-container1_1.17.6-1_amd64.deb ...
Unpacking libnvidia-container1:amd64 (1.17.6-1) ...
Selecting previously unselected package libnvidia-container-tools.
Preparing to unpack .../libnvidia-container-tools_1.17.6-1_amd64.deb ...
Unpacking libnvidia-container-tools (1.17.6-1) ...
Selecting previously unselected package nvidia-container-toolkit-base.
Preparing to unpack .../nvidia-container-toolkit-base_1.17.6-1_amd64.deb ...
Unpacking nvidia-container-toolkit-base (1.17.6-1) ...
Selecting previously unselected package nvidia-container-toolkit.
Preparing to unpack .../nvidia-container-toolkit_1.17.6-1_amd64.deb ...
Unpacking nvidia-container-toolkit (1.17.6-1) ...
Setting up nvidia-container-toolkit-base (1.17.6-1) ...
Setting up libnvidia-container1:amd64 (1.17.6-1) ...
Setting up libn

debconf: unable to initialize frontend: Dialog
debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell buffer, or without a controlling terminal.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype

Running kernel seems to be up-to-date.

No services need to be restarted.

No containers need to be restarted.

No user sessions are running outdated binaries.

No VM guests are running outdated hypervisor (qemu) binaries on this host.
time="2025-05-08T19:40:54Z" level=info msg="Config file does not exist; using empty config"
time="2025-05-08T19:40:54Z" level=info msg="Wrote updated config to /etc/docker/daemon.json"
time="2025-05-08T19:40:54Z" level=info msg="It is recommended that docker daemon be restarted."


<Result cmd='sudo systemctl restart docker' exited=0>
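
The `jq` filter in the cell above adds an `exec-opts` entry to `/etc/docker/daemon.json` only when that key is not already present. The same logic can be sketched in Python to make it easier to follow. The function name `ensure_exec_opts` and the sample `before` configuration are illustrative assumptions, not the actual contents of the file on the instance.

```python
import json

def ensure_exec_opts(config: dict) -> dict:
    """Mirror the jq filter: add exec-opts only when the key is absent."""
    if "exec-opts" in config:
        return dict(config)  # leave an existing setting untouched
    return {**config, "exec-opts": ["native.cgroupdriver=cgroupfs"]}

# Illustrative input only; the real daemon.json may differ.
before = {"runtimes": {"nvidia": {"path": "nvidia-container-runtime", "args": []}}}
print(json.dumps(ensure_exec_opts(before), indent=2))
```

Because an existing `exec-opts` value is never overwritten, applying the function (or the `jq` filter) repeatedly gives the same result as applying it once.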

In [12]:
s.execute("sudo apt update")
s.execute("sudo apt -y install nvtop")





Hit:1 https://nvidia.github.io/libnvidia-container/stable/deb/amd64  InRelease
Hit:2 https://download.docker.com/linux/ubuntu noble InRelease
Hit:3 http://security.ubuntu.com/ubuntu noble-security InRelease
Get:4 http://nova.clouds.archive.ubuntu.com/ubuntu noble InRelease [256 kB]
Hit:5 http://nova.clouds.archive.ubuntu.com/ubuntu noble-updates InRelease
Hit:6 http://nova.clouds.archive.ubuntu.com/ubuntu noble-backports InRelease
Fetched 256 kB in 1s (222 kB/s)
Reading package lists...
Building dependency tree...
Reading state information...
239 packages can be upgraded. Run 'apt list --upgradable' to see them.






Reading package lists...
Building dependency tree...
Reading state information...
The following NEW packages will be installed:
  nvtop
0 upgraded, 1 newly installed, 0 to remove and 239 not upgraded.
Need to get 62.8 kB of archives.
After this operation, 180 kB of additional disk space will be used.
Get:1 http://nova.clouds.archive.ubuntu.com/ubuntu noble/multiverse amd64 nvtop amd64 3.0.2-1 [62.8 kB]


debconf: unable to initialize frontend: Dialog
debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell buffer, or without a controlling terminal.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 


Fetched 62.8 kB in 1s (98.2 kB/s)
Selecting previously unselected package nvtop.
(Reading database ... 89685 files and directories currently installed.)
Preparing to unpack .../nvtop_3.0.2-1_amd64.deb ...
Unpacking nvtop (3.0.2-1) ...
Setting up nvtop (3.0.2-1) ...
Processing triggers for man-db (2.12.0-4build2) ...


debconf: unable to initialize frontend: Dialog
debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell buffer, or without a controlling terminal.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype

Running kernel seems to be up-to-date.

No services need to be restarted.

No containers need to be restarted.

No user sessions are running outdated binaries.

No VM guests are running outdated hypervisor (qemu) binaries on this host.


<Result cmd='sudo apt -y install nvtop' exited=0>

In [14]:
s.execute("docker build -t jupyter-mlflow -f ResearchPaperSummarizer/docker/Dockerfile.jupyter-torch-mlflow-cuda .")

ERROR: resolve : lstat ResearchPaperSummarizer/docker: no such file or directory


UnexpectedExit: Encountered a bad command exit code!

Command: 'docker build -t jupyter-mlflow -f ResearchPaperSummarizer/docker/Dockerfile.jupyter-torch-mlflow-cuda .'

Exit code: 1

Stdout: already printed

Stderr: already printed
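
The error above indicates that the path `ResearchPaperSummarizer/docker` does not exist in the clone. A possible way to diagnose this, sketched below, is to list every file in the repository whose name starts with `Dockerfile` and then adjust the `-f` argument to match. The helper name `find_dockerfiles_cmd` is hypothetical; where the Dockerfile actually lives in the repository is not known from this notebook.

```python
import shlex

def find_dockerfiles_cmd(repo_dir: str) -> str:
    """Build a shell command listing files under repo_dir named Dockerfile*."""
    # The pattern is quoted so the shell passes it to find unexpanded.
    return f"find {shlex.quote(repo_dir)} -type f -name {shlex.quote('Dockerfile*')}"

cmd = find_dockerfiles_cmd("ResearchPaperSummarizer")
print(cmd)
# The resulting string could then be passed to s.execute(cmd).
```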



## Open an SSH session

Finally, open an SSH session on your server. From your local terminal, run

    ssh -i ~/.ssh/id_rsa_chameleon cc@A.B.C.D

where

-   in place of `~/.ssh/id_rsa_chameleon`, substitute the path to your own key that you uploaded to KVM@TACC
-   in place of `A.B.C.D`, use the floating IP address you just associated with your instance.
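
To avoid pasting the placeholder by mistake, the command can also be assembled in Python. This is a sketch using only the standard library; `ssh_command` is a hypothetical helper, and the default user `cc` is taken from the command shown above.

```python
import ipaddress
import os
import shlex

def ssh_command(key_path: str, floating_ip: str, user: str = "cc") -> str:
    """Build the ssh command line, rejecting strings that are not valid IP addresses."""
    ipaddress.ip_address(floating_ip)  # raises ValueError for e.g. "A.B.C.D"
    # Expand "~" here, since quoting the path would stop the shell from doing so.
    key = shlex.quote(os.path.expanduser(key_path))
    return f"ssh -i {key} {user}@{floating_ip}"

print(ssh_command("~/.ssh/id_rsa_chameleon", "129.114.25.179"))
```

Calling the function with the literal placeholder `A.B.C.D` raises a `ValueError`, which is a cheap reminder to substitute the real floating IP.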