# AIMRC Data Science Core Workshop on Using AHPCC

- Link to [Google Drive Data folder](https://drive.google.com/drive/folders/1_IlfCe9hak_ggr7v2aydl-Zg21vIep3K?usp=sharing). Please download the `workshop-hpc-data` folder.
- Link to [GitHub Repository](https://github.com/pv-is-nrt/aimrc-data-science-core). Please download the repository by going to Code (green button) > Download ZIP.
- Link to [download PuTTY Installer](https://the.earth.li/~sgtatham/putty/latest/w64/putty-64bit-0.81-installer.msi) for Windows
- Link to [download Fiji](https://downloads.imagej.net/fiji/latest/fiji-linux64.zip) for Linux

## Exercise 1: Here we go!

In this exercise, you learned how to log into the Pinnacle Desktop and how to navigate the system. This exercise was done outside of the Jupyter notebook environment.

Make sure you:
1. Logged in to Pinnacle in your browser.
2. Opened a Pinnacle Desktop session within your browser.
3. Opened the Firefox browser within the Pinnacle Desktop session.
4. Downloaded the `workshop-hpc-data` folder from the Google Drive link above.
5. Downloaded the GitHub repository from the link above.
6. Extracted both downloaded zip files to your Home/Downloads directory: right-click each downloaded zip file and click Extract Here.
7. You are free to download and extract your files to any directory you prefer, but if you followed the instructions above, your Home/Downloads directory should look like this:
    
    ```bash
    /home/username/Downloads/
        ├── aimrc-data-science-core-main
        ├── workshop-hpc-data
        ├── aimrc-data-science-core-main.zip
        └── workshop-hpc-data-20240724T162726Z-001.zip
    ```
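If you prefer to extract the archives from Python rather than with the right-click menu, the standard-library `zipfile` module can do the same job. The sketch below is only an illustration: the helper name `extract_zip` is hypothetical, and it assumes the zip files sit in a directory you can write to.

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest_dir=None):
    """Extract zip_path into dest_dir (default: the folder that holds the zip)."""
    zip_path = Path(zip_path)
    dest = Path(dest_dir) if dest_dir is not None else zip_path.parent
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # return the archive's entries so the caller can check what was unpacked
        return sorted(archive.namelist())
```

For example, `extract_zip(Path.home() / "Downloads" / "aimrc-data-science-core-main.zip")` would unpack the repository archive next to the zip file, which matches the effect of Extract Here.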

## Exercise 2: Basic Shell Commands

In this exercise, you learned about logging into the Pinnacle server using SSH and running basic shell commands. You downloaded and installed PuTTY (link above) for this. 

Type the following in the PuTTY Configuration window and click 'Open':  
Host Name or IP address:  
*username@hpc-portal2.hpc.uark.edu*

Practice running the following commands:
```bash
env
who
whoami
hostname
date
df
cd /home/username/Downloads
ls -al
```

This exercise was done outside of the Jupyter notebook environment.
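Some of the information printed by `whoami`, `hostname`, `date`, and `df` can also be read from Python, which becomes useful once you move into the notebook. The following is a minimal sketch using only the standard library; the function name `session_summary` and the dictionary keys are illustrative choices, not part of the workshop material.

```python
import getpass
import shutil
import socket
from datetime import datetime


def session_summary(path="/"):
    """Collect roughly what whoami, hostname, date and df report."""
    try:
        user = getpass.getuser()
    except Exception:
        # getuser() can fail in minimal environments with no user database entry
        user = "unknown"
    usage = shutil.disk_usage(path)
    return {
        "user": user,
        "hostname": socket.gethostname(),
        "date": datetime.now().isoformat(timespec="seconds"),
        "disk_total_gib": round(usage.total / 1024**3, 2),
        "disk_free_gib": round(usage.free / 1024**3, 2),
    }
```

Calling `session_summary("/home")` instead of the default reports the file system that holds the home directories, assuming `/home` exists on the machine.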

## Exercise 3: What have I gotten myself into?

### Get basic information about the hardware

In [1]:
# import psutil, which lets us query system resources and running processes
import psutil

# print the number of CPU cores available and the percent usage
print("Number of CPU cores:", psutil.cpu_count())
print("CPU usage:", psutil.cpu_percent(interval=1), "%")

# print the total and used memory on the system
memory = psutil.virtual_memory()
print("Total memory:", memory.total / (1024 * 1024 * 1024), "GB")
print("Used memory:", memory.used / (1024 * 1024 * 1024), "GB")

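# NOTE: psutil has no GPU API. virtual_memory() returns system RAM again, so the
# two "GPU" lines below report system memory, not GPU memory.
# Use nvidia-smi (next cell) for the actual GPU memory.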
gpu_memory = psutil.virtual_memory()
print("Total GPU memory:", gpu_memory.total / (1024 * 1024 * 1024), "GB")
print("Used GPU memory:", gpu_memory.used / (1024 * 1024 * 1024), "GB")

# print the storage information on the system
disk_partitions = psutil.disk_partitions()
for d_p in disk_partitions:
    print("Device: " + d_p.device + "; "
          "Mountpoint: " + d_p.mountpoint + "; "
          "File system type: " + d_p.fstype)
disk_usage = psutil.disk_usage('/')
print("Total disk space:", disk_usage.total / (1024 * 1024 * 1024), "GB")
print("Used disk space:", disk_usage.used / (1024 * 1024 * 1024), "GB")

Number of CPU cores: 8
CPU usage: 28.1 %
Total memory: 31.960521697998047 GB
Used memory: 16.200885772705078 GB
Total GPU memory: 31.960521697998047 GB
Used GPU memory: 16.200843811035156 GB
Device: C:\; Mountpoint: C:\; File system type: NTFS
Device: D:\; Mountpoint: D:\; File system type: NTFS
Device: E:\; Mountpoint: E:\; File system type: NTFS
Device: P:\; Mountpoint: P:\; File system type: 
Total disk space: 932.6171836853027 GB
Used disk space: 915.7942085266113 GB


We can get detailed information about the NVIDIA GPU by using the `nvidia-smi` utility.  
Note that you can run shell commands in Jupyter Notebooks by prefixing the command with an exclamation mark (`!`).

In [2]:
!nvidia-smi

Wed Jul 24 01:45:40 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 536.23                 Driver Version: 536.23       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce GTX 1070      WDDM  | 00000000:01:00.0  On |                  N/A |
| 10%   54C    P0              29W / 185W |   1775MiB /  8192MiB |      4%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
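Because `psutil` does not report GPU memory, one way to get those numbers into Python is to ask `nvidia-smi` for machine-readable output and parse it. The sketch below assumes `nvidia-smi` is on the `PATH` and that the memory fields come back as plain integers (a GPU that reports `[N/A]` would need extra handling); the function names are hypothetical.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'total, used' rows (in MiB) as printed by the query in query_gpu_memory."""
    rows = []
    for line in csv_text.strip().splitlines():
        total, used = (int(field.strip()) for field in line.split(","))
        rows.append({"total_mib": total, "used_mib": used})
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return per-GPU memory; empty list if it is unavailable."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=memory.total,memory.used",
        "--format=csv,noheader,nounits",
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_memory(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it possible to check the parser on a machine without a GPU, for instance with the values from the table above: `parse_gpu_memory("8192, 1775")`.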
                                                                    

### Get information about the packages installed

In [3]:
# print the list of packages installed in the current environment
!conda list

'conda' is not recognized as an internal or external command,
operable program or batch file.
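The error above means `conda` is not available in the shell that the notebook uses. As a fallback, the packages visible to the running Python interpreter can be listed with the standard-library `importlib.metadata` module. This is a sketch, not a replacement for `conda list`: it only sees Python distributions, and the helper name `installed_packages` is made up for illustration.

```python
from importlib import metadata


def installed_packages():
    """Return sorted (name, version) pairs for the current Python environment."""
    packages = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions whose metadata is missing a name
            packages.add((name, dist.version))
    return sorted(packages, key=lambda item: item[0].lower())


if __name__ == "__main__":
    for name, version in installed_packages():
        print(f"{name}=={version}")
```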


## Exercise 4: Running a simple Python script in Jupyter Notebook

In this exercise, we will run a simple Python program in Jupyter Notebook. We write a program to crop images and save the cropped images in a new folder.

In [5]:
!pip install pillow

Defaulting to user installation because normal site-packages is not writeable
Collecting pillow
  Downloading pillow-10.4.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.2 kB)
Downloading pillow-10.4.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.5/4.5 MB 20.1 MB/s eta 0:00:00
Installing collected packages: pillow
Successfully installed pillow-10.4.0


In [4]:
from pathlib import Path
import PIL.Image as Image

DATA_FOLDER = '/home/prateek/Downloads/workshop-hpc-data/NFFA images sampled'
TARGET_FOLDER = '/home/prateek/Downloads/workshop-hpc-data/NFFA images sampled cropped'

# Create target folder if it does not exist
Path(TARGET_FOLDER).mkdir(parents=True, exist_ok=True)

In [5]:
# get the list of all files in the source folder
files = list(Path(DATA_FOLDER).rglob('*.jpg'))
print(len(files), 'total files found')

# get the dimension of the first image
img = Image.open(files[0])
print('Dimensions of the first image:', img.size)

250
(1024, 768)


In [6]:
# define the cropping function
def crop_image(file, data_folder, target_folder):
    img = Image.open(file)
    img = img.crop((0, 0, 1024, 600))
    # save the cropped image in the target folder matching the subfolder structure
    save_to_path = str(file).replace(data_folder, target_folder)
    # make sure the subfolder exists
    Path(save_to_path).parent.mkdir(parents=True, exist_ok=True)
    img.save(save_to_path)
    print('saved ' + save_to_path)
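The cropping function builds the output path with `str.replace`, which replaces every occurrence of the source folder string and hard-codes the image width of 1024. A slightly more defensive variant, sketched below, derives the output path with `Path.relative_to` and the crop box from the image size. The helper names `mirrored_path` and `crop_box` are hypothetical, and the 600-pixel height is taken from the cell above.

```python
from pathlib import Path


def mirrored_path(file, data_folder, target_folder):
    """Map a file under data_folder to the same relative location under target_folder."""
    # relative_to raises ValueError if file is not inside data_folder
    relative = Path(file).relative_to(data_folder)
    return Path(target_folder) / relative


def crop_box(size, keep_height=600):
    """Build a (left, upper, right, lower) box that keeps the top keep_height rows."""
    width, height = size
    return (0, 0, width, min(height, keep_height))
```

Inside `crop_image`, these could be used as `img.crop(crop_box(img.size))` and `save_to_path = mirrored_path(file, data_folder, target_folder)`, so images with other dimensions are never cropped beyond their own bounds.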

In [7]:
# crop all images
for file in files:
    crop_image(file, DATA_FOLDER, TARGET_FOLDER)

saved /home/prateek/Downloads/NFFA images sampled cropped/Porous_Sponge/ee3dd72b977b5f4b32f3d25410831662.jpg
saved /home/prateek/Downloads/NFFA images sampled cropped/Porous_Sponge/76790e49cb19cde496a02ac9f4d69bc9.jpg
saved /home/prateek/Downloads/NFFA images sampled cropped/Porous_Sponge/cf9f11694caad1e286bedf582bb45456.jpg
saved /home/prateek/Downloads/NFFA images sampled cropped/Porous_Sponge/7a872105aa1ac4c258cfaccfa006927d.jpg
saved /home/prateek/Downloads/NFFA images sampled cropped/Porous_Sponge/35361fc57920d961451d3eb6bcae119e.jpg
saved /home/prateek/Downloads/NFFA images sampled cropped/Porous_Sponge/371464b90891905ec859741039765370.jpg
saved /home/prateek/Downloads/NFFA images sampled cropped/Porous_Sponge/a7c79a822fc5f72396daa9f523426b81.jpg
saved /home/prateek/Downloads/NFFA images sampled cropped/Porous_Sponge/cc59dc40bfb2954475b5f65aff32305a.jpg
saved /home/prateek/Downloads/NFFA images sampled cropped/Porous_Sponge/3ac8acf63c790f286602439f0039cc15.jpg
saved /home/prateek

## Exercise 5: Running jobs on the Pinnacle cluster

In this exercise, you learned about submitting jobs to the Pinnacle server through the SSH terminal. This exercise was done outside of the Jupyter notebook environment.

Make sure you:
- changed into the /workshops/ahpcc directory  
    `cd /home/USERNAME/Downloads/aimrc-data-science-core-main/workshops/ahpcc`
- activated the base environment  
    `module load python/3.10-anaconda`  
    `which python`  
    `source /share/apps/bin/conda-3.10.sh`  
- submitted a Python script as a job (independent of previous jobs)  
    `sinfo`  
    `squeue`  
    `sbatch --partition pcon06 --constraint 'aimrc' --nodes=1 myjob.sh`  

## Exercise 6: Fiji

In this exercise, you learned about logging into the Pinnacle Desktop, browsing the internet, downloading and installing Fiji, and using Fiji to open and process an image file.

Make sure you:
- Downloaded (link at the top) and extracted Fiji
- Launched the `Fiji` executable file
- Used one of the NFFA microscopy images to perform an image-processing operation of your choice in Fiji and saved your result in your Documents folder.

## Exercise 7: A machine learning experiment

In this exercise, we write a simple machine learning program that trains a small CNN (convolutional neural network) to classify the images you downloaded from the Google Drive folder.
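Before any model can be trained, each image needs a class label. A minimal sketch of that preparation step follows, assuming (this is not confirmed by the text above) that the subfolder names inside the data folder, such as `Porous_Sponge`, are the class labels. The helper names are hypothetical, and the sketch stops short of defining or training a CNN.

```python
from collections import Counter
from pathlib import Path


def labelled_files(data_folder, pattern="*.jpg"):
    """Pair every image under data_folder with the name of its parent folder."""
    files = sorted(Path(data_folder).rglob(pattern))
    return [(path, path.parent.name) for path in files]


def class_counts(samples):
    """Count how many images each label has, to spot class imbalance early."""
    return Counter(label for _, label in samples)
```

Checking `class_counts(labelled_files(DATA_FOLDER))` before training shows whether the classes are roughly balanced, which affects how the classification accuracy should be read.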