# AIMRC Data Science Core Workshop on Using AHPCC

Before running this notebook, make sure the required Python packages (for example, psutil and Pillow) are installed in the environment you are using.

The data for this workshop is in the Google Drive folder linked below. Please download it and save it somewhere on your AHPCC drive, or on your local computer if you don't have access to AHPCC.  
> https://drive.google.com/drive/folders/1_IlfCe9hak_ggr7v2aydl-Zg21vIep3K?usp=sharing

## Exercise 1: Here we go!

In this exercise, you learned how to log into the Pinnacle Desktop and how to navigate the system. This exercise was done outside of the Jupyter notebook environment.

## Exercise 2: Basic Shell Commands

In this exercise, you learned about logging into the Pinnacle server using SSH and running basic shell commands. This exercise was done outside of the Jupyter notebook environment.

## Exercise 3: What have I gotten myself into?

### Get basic information about the hardware

In [1]:
# import psutil, which lets us query system resources and running processes
import psutil

# print the number of CPU cores available and the percent usage
print("Number of CPU cores:", psutil.cpu_count())
print("CPU usage:", psutil.cpu_percent(interval=1), "%")

# print the total and used memory on the system
memory = psutil.virtual_memory()
print("Total memory:", memory.total / (1024 * 1024 * 1024), "GB")
print("Used memory:", memory.used / (1024 * 1024 * 1024), "GB")

# psutil does not report GPU memory (psutil.virtual_memory() returns system RAM),
# so GPU memory is queried with nvidia-smi in the next cell

# print the storage information on the system
disk_partitions = psutil.disk_partitions()
for d_p in disk_partitions:
    print("Device: " + d_p.device + "; "
          "Mountpoint: " + d_p.mountpoint + "; "
          "File system type: " + d_p.fstype)
disk_usage = psutil.disk_usage('/')
print("Total disk space:", disk_usage.total / (1024 * 1024 * 1024), "GB")
print("Used disk space:", disk_usage.used / (1024 * 1024 * 1024), "GB")

Number of CPU cores: 32
CPU usage: 0.0 %
Total memory: 187.20373153686523 GB
Used memory: 5.00103759765625 GB
Device: /dev/mapper/system-root; Mountpoint: /; File system type: xfs
Device: /dev/sda1; Mountpoint: /boot; File system type: xfs
Total disk space: 218.95944213867188 GB
Used disk space: 34.81142044067383 GB
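
The values above are byte counts divided by 1024³, so strictly speaking they are GiB, and they are printed with many decimal places. A minimal sketch of a formatting helper follows. `format_bytes` is a hypothetical name, and the example uses the standard library's `shutil.disk_usage` rather than psutil so that it runs without extra packages.

```python
import shutil


def format_bytes(num_bytes, precision=2):
    """Return a byte count as a readable string using binary (1024-based) units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(num_bytes)
    for unit in units:
        # stop at the first unit where the value is below 1024, or at the largest unit
        if value < 1024 or unit == units[-1]:
            return f"{value:.{precision}f} {unit}"
        value /= 1024


usage = shutil.disk_usage("/")
print("Total disk space:", format_bytes(usage.total))
print("Used disk space:", format_bytes(usage.used))
```

The same helper can be applied to the psutil values, for example `format_bytes(memory.total)`.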


We can get detailed information about the NVIDIA GPU with the nvidia-smi utility.  
Note that you can run shell commands in Jupyter notebooks by prefixing the command with an exclamation mark (!).

In [2]:
!nvidia-smi

Tue Jul 23 14:23:28 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla V100-PCIE-32GB           On  |   00000000:3B:00.0 Off |                    0 |
| N/A   32C    P0             25W /  250W |       1MiB /  32768MiB |      0%   E. Process |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
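
The table above is laid out for people to read. If you need the GPU memory figures inside Python, one option is to ask nvidia-smi for CSV output and parse it. The following is a minimal sketch, assuming nvidia-smi is on the PATH. `parse_gpu_csv` and `query_gpus` are hypothetical helper names, and the function returns an empty list on machines without an NVIDIA GPU.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse lines of 'name, total, used' (memory in MiB) into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        try:
            name, total, used = [field.strip() for field in line.split(",")]
            gpus.append({"name": name, "total_mib": int(total), "used_mib": int(used)})
        except ValueError:
            continue  # skip lines that do not match the expected format
    return gpus


def query_gpus():
    """Return GPU memory info from nvidia-smi, or an empty list if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


for gpu in query_gpus():
    print(f"{gpu['name']}: {gpu['used_mib']} MiB used of {gpu['total_mib']} MiB")
```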
                                     

### Get information about the packages installed

In [3]:
# print the list of packages installed in the current environment
!conda list

# packages in environment at /share/apps/python/anaconda-3.14:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
anyio                     4.2.0           py311h06a4308_0  
archspec                  0.2.3              pyhd3eb1b0_0  
argon2-cffi               21.3.0             pyhd3eb1b0_0  
argon2-cffi-bindings      21.2.0          py311h5eee18b_0  
asttokens                 2.0.5              pyhd3eb1b0_0  
async-lru                 2.0.4           py311h06a4308_0  
attrs                     23.1.0          py311h06a4308_0  
babel                     2.11.0          py311h06a4308_0  
beautifulsoup4            4.12.2          py311h06a4308_0  
bleach                    4.1.0              pyhd3eb1b0_0  
boltons                   23.0.0          py311h06a4308_0  
brotli-python             1.0.9           py31
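
The conda listing covers every package in the environment. To check only the few packages this notebook relies on, a short script can look them up by name. This is a minimal sketch using the standard library's `importlib.metadata`. The `REQUIRED` list and the `installed_version` helper are assumptions for illustration, based on the imports used in this notebook.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names assumed from the imports used in this notebook.
REQUIRED = ["psutil", "pillow"]

for name in REQUIRED:
    version = installed_version(name)
    print(f"{name}: {version if version else 'NOT INSTALLED'}")
```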

## Exercise 4: Running a simple Python script in Jupyter Notebook

In this exercise, we will run a simple Python program in Jupyter Notebook. We write a program to crop images and save the cropped images in a new folder.

In [5]:
!pip install pillow

Defaulting to user installation because normal site-packages is not writeable
Collecting pillow
  Downloading pillow-10.4.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.2 kB)
Downloading pillow-10.4.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.5 MB)
Installing collected packages: pillow
Successfully installed pillow-10.4.0


In [4]:
from pathlib import Path
from PIL import Image

DATA_FOLDER = '/home/prateek/Downloads/NFFA images sampled'
TARGET_FOLDER = '/home/prateek/Downloads/NFFA images sampled cropped'

# Create target folder if it does not exist
Path(TARGET_FOLDER).mkdir(parents=True, exist_ok=True)

In [5]:
# get the list of all files in the source folder
files = list(Path(DATA_FOLDER).rglob('*.jpg'))
print(len(files))

# get the dimension of the first image
img = Image.open(files[0])
print(img.size)

250
(1024, 768)


In [6]:
# define the cropping function
def crop_image(file, data_folder, target_folder):
    img = Image.open(file)
    # crop box is (left, upper, right, lower): keep the top 600 rows of the 1024x768 image
    img = img.crop((0, 0, 1024, 600))
    # save the cropped image in the target folder matching the subfolder structure
    save_to_path = str(file).replace(data_folder, target_folder)
    # make sure the subfolder exists
    Path(save_to_path).parent.mkdir(parents=True, exist_ok=True)
    img.save(save_to_path)
    print('saved ' + save_to_path)
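
The `str.replace` call in the function above swaps the source folder for the target folder, but it would also rewrite any later occurrence of the same text inside the path. A slightly more defensive variant, sketched below, builds the destination from the path relative to the source folder. `mirrored_path` is a hypothetical helper name and the example paths are made up for illustration.

```python
from pathlib import Path


def mirrored_path(file, data_folder, target_folder):
    """Map a file under data_folder to the same relative location under target_folder."""
    # relative_to raises ValueError if the file is not inside data_folder
    relative = Path(file).relative_to(data_folder)
    return Path(target_folder) / relative


example = mirrored_path(
    "/data/images/Porous_Sponge/sample.jpg",  # hypothetical file path
    "/data/images",
    "/data/images cropped",
)
print(example)
```

Inside `crop_image`, the result could stand in for `save_to_path`, since both `Path.parent.mkdir` and `img.save` accept a `Path` object.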

In [7]:
# crop all images
for file in files:
    crop_image(file, DATA_FOLDER, TARGET_FOLDER)

saved /home/prateek/Downloads/NFFA images sampled cropped/Porous_Sponge/ee3dd72b977b5f4b32f3d25410831662.jpg
saved /home/prateek/Downloads/NFFA images sampled cropped/Porous_Sponge/76790e49cb19cde496a02ac9f4d69bc9.jpg
saved /home/prateek/Downloads/NFFA images sampled cropped/Porous_Sponge/cf9f11694caad1e286bedf582bb45456.jpg
saved /home/prateek/Downloads/NFFA images sampled cropped/Porous_Sponge/7a872105aa1ac4c258cfaccfa006927d.jpg
saved /home/prateek/Downloads/NFFA images sampled cropped/Porous_Sponge/35361fc57920d961451d3eb6bcae119e.jpg
saved /home/prateek/Downloads/NFFA images sampled cropped/Porous_Sponge/371464b90891905ec859741039765370.jpg
saved /home/prateek/Downloads/NFFA images sampled cropped/Porous_Sponge/a7c79a822fc5f72396daa9f523426b81.jpg
saved /home/prateek/Downloads/NFFA images sampled cropped/Porous_Sponge/cc59dc40bfb2954475b5f65aff32305a.jpg
saved /home/prateek/Downloads/NFFA images sampled cropped/Porous_Sponge/3ac8acf63c790f286602439f0039cc15.jpg
saved /home/prateek

## Exercise 5: Running jobs on the Pinnacle cluster

In this exercise, you learned two ways to log into the Pinnacle server and create and run simple jobs: (1) through Open OnDemand (OOD), and (2) over SSH. This exercise was done outside of the Jupyter notebook environment.

## Exercise 6: Fiji

In this exercise, you learned about logging into the Pinnacle Desktop, browsing the internet, downloading and installing Fiji, and using Fiji to open and process an image file.

## Exercise 7: A machine learning experiment

In this exercise, we write a simple machine learning program that trains a small convolutional neural network (CNN) to classify the images you downloaded from the Google Drive folder.
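
Before any model can be trained, the images need labels and a held-out validation set. The sketch below is one possible way to prepare them, not necessarily the approach used in the workshop. It assumes each image's class is the name of its parent folder (as in the Porous_Sponge subfolder seen in Exercise 4). `list_labelled_images`, `stratified_split`, and the folder location are hypothetical.

```python
import random
from collections import defaultdict
from pathlib import Path


def list_labelled_images(data_folder, pattern="*.jpg"):
    """Return (path, label) pairs, using each image's parent folder name as its label."""
    return [(path, path.parent.name) for path in sorted(Path(data_folder).rglob(pattern))]


def stratified_split(samples, val_fraction=0.2, seed=0):
    """Split (path, label) pairs into train and validation lists, class by class."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_val = int(len(items) * val_fraction)
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val


data_folder = Path("NFFA images sampled")  # hypothetical location; adjust to your copy
if data_folder.is_dir():
    train, val = stratified_split(list_labelled_images(data_folder))
    print(len(train), "training images,", len(val), "validation images")
```

Splitting within each class keeps the class proportions roughly equal between the training and validation sets.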