# AIMRC Data Science Core Workshop on Using AHPCC

Before running this notebook, make sure the required Python packages are installed in the environment you are using.

The workshop data is in the Google Drive folder linked below. Please download the data and save it on your AHPCC drive, or on your local computer if you don't have access to AHPCC.  
> https://drive.google.com/drive/folders/1_IlfCe9hak_ggr7v2aydl-Zg21vIep3K?usp=sharing

## Exercise 1: Here we go!

In this exercise, you learned how to log into the Pinnacle Desktop and how to navigate the system. This exercise was done outside of the Jupyter notebook environment, of course.

## Exercise 2: Basic Shell Commands

In this exercise, you learned about logging into the Pinnacle server using SSH and running basic shell commands. This exercise was done outside of the Jupyter notebook environment.

## Exercise 3: What have I gotten myself into?

### Get basic information about the hardware

In [None]:
# import psutil, which lets us query system resources (CPU, memory, disks) and processes
import psutil

# print the number of logical CPU cores on the machine and the percent usage
print("Number of logical CPU cores:", psutil.cpu_count())
print("CPU usage:", psutil.cpu_percent(interval=1), "%")

# print the total and used memory on the system
memory = psutil.virtual_memory()
print("Total memory:", memory.total / (1024 * 1024 * 1024), "GB")
print("Used memory:", memory.used / (1024 * 1024 * 1024), "GB")

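# print the amount of GPU memory on the system
# (psutil cannot see GPU memory, so we query nvidia-smi if it is available)
import shutil
import subprocess
if shutil.which("nvidia-smi"):
    query = ["nvidia-smi", "--query-gpu=name,memory.total,memory.used", "--format=csv"]
    print(subprocess.run(query, capture_output=True, text=True).stdout)
else:
    print("nvidia-smi not found; no NVIDIA GPU information available")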

# print the storage information on the system
disk_partitions = psutil.disk_partitions()
for d_p in disk_partitions:
    print("Device: " + d_p.device + "; "
          "Mountpoint: " + d_p.mountpoint + "; "
          "File system type: " + d_p.fstype)
disk_usage = psutil.disk_usage('/')
print("Total disk space:", disk_usage.total / (1024 * 1024 * 1024), "GB")
print("Used disk space:", disk_usage.used / (1024 * 1024 * 1024), "GB")
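On a shared cluster node, the core count reported above describes the whole machine, which can be more than what your session is allowed to use. The sketch below compares the two numbers. It assumes a Linux node (where `os.sched_getaffinity` exists) and that the scheduler restricts CPU affinity for jobs, which depends on how the cluster is configured; the helper name `usable_cpu_count` is made up for this illustration.

```python
import os

def usable_cpu_count():
    """Return the number of CPUs this process is allowed to run on."""
    # os.sched_getaffinity is only available on some platforms (e.g. Linux);
    # fall back to the machine-wide count elsewhere
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1

print("CPUs on the node:", os.cpu_count())
print("CPUs usable by this process:", usable_cpu_count())
```

If the second number is smaller than the first, size any parallel work to the smaller one.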

We can get detailed information about the NVIDIA GPU by using the `nvidia-smi` utility.  
Note that you can use shell commands in Jupyter Notebooks by prefixing the command with an exclamation mark (`!`).

In [None]:
!nvidia-smi

### Get information about the packages installed

In [None]:
# print the list of packages installed in the current environment
!conda list
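If you only want to confirm that the packages this notebook imports are present, a short check from Python can be quicker than reading the full list. This is a minimal sketch using the standard library; the `REQUIRED` mapping is an assumption based on the imports that appear in this notebook (`psutil` and `PIL`, which is distributed as Pillow), and `check_packages` is a hypothetical helper name.

```python
from importlib import metadata, util

# Assumed list: import name -> distribution name (extend as needed)
REQUIRED = {"psutil": "psutil", "PIL": "Pillow"}

def check_packages(required):
    """Return {import_name: version string, or None if not importable}."""
    status = {}
    for import_name, dist_name in required.items():
        # find_spec returns None when a top-level module cannot be found
        if util.find_spec(import_name) is None:
            status[import_name] = None
            continue
        try:
            status[import_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # importable, but no installed distribution metadata under that name
            status[import_name] = "unknown version"
    return status

for name, version in check_packages(REQUIRED).items():
    print(f"{name}: {version if version else 'NOT INSTALLED'}")
```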

## Exercise 4: Running a simple Python script in Jupyter Notebook

In this exercise, we will run a simple Python program in Jupyter Notebook. We write a program to crop images and save the cropped images in a new folder.

In [None]:
from pathlib import Path
from PIL import Image

DATA_FOLDER = 'PATH/TO/DATA'
TARGET_FOLDER = 'PATH/TO/TARGET'

# Create target folder if it does not exist
Path(TARGET_FOLDER).mkdir(parents=True, exist_ok=True)

In [None]:
# get the list of all files in the source folder
files = list(Path(DATA_FOLDER).rglob('*.jpg'))
print(len(files))

# get the dimension of the first image
img = Image.open(files[0])
print(img.size)

In [None]:
# define the cropping function
def crop_image(file, data_folder, target_folder):
    # build the output path in the target folder, mirroring the subfolder structure
    # (relative_to is safer than str.replace, which rewrites every occurrence of the folder name)
    save_to_path = Path(target_folder) / Path(file).relative_to(data_folder)
    # make sure the subfolder exists
    save_to_path.parent.mkdir(parents=True, exist_ok=True)
    # open the image, keep the top-left 1024x600 region, and save it
    with Image.open(file) as img:
        img.crop((0, 0, 1024, 600)).save(save_to_path)
    print(f'saved {save_to_path}')
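The output path has to mirror the subfolder structure of the input. One robust way to compute it is `pathlib`'s `relative_to`, sketched in isolation below; plain string replacement would also rewrite any later occurrence of the folder name inside the path. The helper name `mirror_path` and the example paths are hypothetical.

```python
from pathlib import Path

def mirror_path(file, data_folder, target_folder):
    """Map a file under data_folder to the same relative location under target_folder."""
    # relative_to raises ValueError if file is not inside data_folder
    relative = Path(file).relative_to(data_folder)
    return Path(target_folder) / relative

example = mirror_path("data/classA/img_001.jpg", "data", "cropped")
print(example.as_posix())  # cropped/classA/img_001.jpg
```

Because `relative_to` fails loudly for files outside the data folder, a misconfigured path shows up as an error instead of an image silently saved in the wrong place.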

In [None]:
# crop all images
for file in files:
    crop_image(file, DATA_FOLDER, TARGET_FOLDER)

## Exercise 5: Running jobs on the Pinnacle cluster

In this exercise, you learned how to create and run simple jobs on the Pinnacle cluster in two ways: (1) through the OOD and (2) over SSH. This exercise was done outside of the Jupyter notebook environment.

## Exercise 6: Fiji

In this exercise, you learned about logging into the Pinnacle Desktop, browsing the internet, downloading and installing Fiji and using Fiji to open and process an image file.

## Exercise 7: A machine learning experiment

In this exercise, we write a simple machine learning program to classify the images you downloaded from the Google Drive folder. We will train a simple CNN model to classify the images.