# Slideflow Setup Guide

This Jupyter Notebook is meant to be your starting point for working with Slideflow. It includes important information about using Linux, Bash, and CUDA, plus other packages and commands that you will need to know to run experiments with Slideflow on your local workstation or HPC.

Everything has been set up to run within a Jupyter Notebook so that you can experiment with the commands.

**This guide assumes that you have already installed Slideflow and created a conda environment for it.**

--------

Table of Contents:
- [Import libraries, set up environment](#import-libraries-set-environment-variables-check-gpus)
- [Setting up Projects and Data in Slideflow](#setting-up-projects-and-data-in-slideflow)
- [Advanced section](#advanced)
    - [Bash commands in Jupyter Notebooks](#Bash-commands-in-Jupyter-Notebooks)
    - [Magic commands](#Magic-commands)
    - [Importing packages and modules from different locations](#importing-packages-and-modules-from-different-locations)
    - [CUDA help](#cuda)
    - [System monitoring and information](#system-monitoring-and-information)
    - [Running code in a standalone script](#running-code-in-a-standalone-script)
    - [Multiprocessing help](#multiprocessing-help)

## Import libraries, set environment variables, check GPUs

### Full code

In [None]:
# import libraries
import os
import slideflow as sf

# Set environment variables 
os.environ['SF_BACKEND'] = 'torch' # Alternative is 'tensorflow'
os.environ['SF_SLIDE_BACKEND'] = 'cucim' # Alternative is 'libvips'
os.environ['CUDA_VISIBLE_DEVICES'] = '0' # Set which GPU(s) to use 

# Set verbose logging
import logging
logging.getLogger('slideflow').setLevel(logging.INFO)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0' # 0 = show all TensorFlow messages

# Check if slideflow was properly installed
sf.about()

# Check if GPU is available
if os.environ['SF_BACKEND'] == 'torch':
    import torch
    print('GPU available: ', torch.cuda.is_available())
    print('GPU count: ', torch.cuda.device_count())
    # current_device() raises an error when no GPU is visible, so guard it
    if torch.cuda.is_available():
        print('GPU current: ', torch.cuda.current_device())
        print('GPU name: ', torch.cuda.get_device_name(torch.cuda.current_device()))
elif os.environ['SF_BACKEND'] == 'tensorflow':
    import tensorflow as tf
    print("GPU: ", len(tf.config.list_physical_devices('GPU')))

### Walkthrough

The ```os``` module allows Python to interact with the operating system. It provides functions for creating and removing directories, fetching directory contents, and identifying the current directory. The ```os.path``` submodule provides functions for working with file paths.

In [None]:
import os
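As a quick illustration of the functions mentioned above, here is a minimal sketch that works inside a temporary directory so nothing on your system is touched. The folder name `TEST_PROJECT` is only an example.

```python
import os
import tempfile

# Work inside a throwaway directory that is deleted automatically afterwards.
with tempfile.TemporaryDirectory() as tmp:
    project_dir = os.path.join(tmp, "TEST_PROJECT")  # build a path portably
    os.makedirs(project_dir, exist_ok=True)          # create a directory
    listing_before = os.listdir(tmp)                 # fetch directory contents
    is_dir = os.path.isdir(project_dir)              # check that the path is a directory
    name = os.path.basename(project_dir)             # last component of the path
    os.rmdir(project_dir)                            # remove an (empty) directory
    listing_after = os.listdir(tmp)

print(listing_before, is_dir, name, listing_after)
print(os.getcwd())                                   # identify the current directory
```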

Slideflow is installed with pip as a software package within your conda environment. Alternatively, you can clone Slideflow directly from GitHub (see [Advanced:Importing Packages and Modules from different locations](#importing-packages-and-modules-from-different-locations) below). 


In [None]:
import slideflow as sf

We also must set our environment variables. These are variables that are set in the operating system (OS) and are accessible to all programs running in that OS. 
- ```SF_BACKEND``` determines if Slideflow will use PyTorch ('torch') or TensorFlow ('tensorflow') as the backend for machine learning related functionality. We recommend using PyTorch as it is more intuitive and its documentation is better. 
- ```SF_SLIDE_BACKEND``` determines if Slideflow will use 'libvips' or 'cucim' as the image processing library for whole slide images. [cucim](https://github.com/rapidsai/cucim) is much faster but works with fewer file formats, [libvips](https://www.libvips.org/) is slower but adds support for *.scn, *.mrxs, *.ndpi, *.vms, and *.vmu files. We recommend cucim for its speed. 
- ```CUDA_VISIBLE_DEVICES``` determines which GPU(s) Slideflow should use for GPU-accelerated tasks and processes. Every GPU is assigned an integer ID. When working on a multi-GPU system, if you do not specify which GPU to use, a GPU already in use by another user may be chosen. Your process may try to assign GPU memory that is already in use, which won't work and could kill your process and the other user's process. You can use ```nvidia-smi``` from the command line to see which GPUs are in use (see [Advanced:CUDA](#cuda) for more).

In [None]:
# Set environment variables 
os.environ['SF_BACKEND'] = 'torch' # Alternative is 'tensorflow'
os.environ['SF_SLIDE_BACKEND'] = 'cucim' # Alternative is 'libvips'
os.environ['CUDA_VISIBLE_DEVICES'] = '0' # Set which GPU(s) to use 

The logging library allows us to print messages to the console. This is useful for debugging and for keeping track of what is happening in the program. We set Slideflow's logger to ```INFO```, and because we want the most information from TensorFlow as well, we set the environment variable ```TF_CPP_MIN_LOG_LEVEL``` to ```'0'``` (```0``` shows all messages; ```1```, ```2```, and ```3``` progressively filter out INFO, WARNING, and ERROR messages).

In [None]:
# Set verbose logging
import logging
logging.getLogger('slideflow').setLevel(logging.INFO)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0' # 0 = show all TensorFlow messages

Check to make sure that Slideflow was properly imported (if it wasn't, this command won't work), what version you are using, and what your backends are.

In [None]:
sf.about()

Check that GPUs are available, allowing for GPU-accelerated processing. The code to check for GPUs differs depending on whether you are using PyTorch or TensorFlow.

FYI: GPU acceleration is enabled by the system-package CUDA, developed by NVIDIA (#1 GPU manufacturer). CUDA is a parallel computing platform and programming model that makes using a GPU for general purpose computing simple and elegant. The developer writes functions called kernels, which the GPU runs across many threads in parallel, and the CUDA runtime and driver take care of the rest. (See [Advanced:CUDA](#cuda) for more information.)

In [None]:
# Check if GPU is available
if os.environ['SF_BACKEND'] == 'torch':
    import torch
    print('GPU available: ', torch.cuda.is_available())
    print('GPU count: ', torch.cuda.device_count())
    # current_device() raises an error when no GPU is visible, so guard it
    if torch.cuda.is_available():
        print('GPU current: ', torch.cuda.current_device())
        print('GPU name: ', torch.cuda.get_device_name(torch.cuda.current_device()))
elif os.environ['SF_BACKEND'] == 'tensorflow':
    import tensorflow as tf
    print("GPU: ", len(tf.config.list_physical_devices('GPU')))

# Setting up Projects and Data in Slideflow

After you have successfully created your conda environment and installed Slideflow, you can start making your first project. In this tutorial, we will create a project that will be used to test the functionality of Slideflow.

Slideflow deals in **Projects** and in **Data**. 

This is the typical data directory structure that is recommended for working with Slideflow:

- ```PROJECTS/```: directory where all projects are stored
    - ```TEST_PROJECT_1/```
        - ```models/```: folder containing trained model folders
        - ```eval/```: folder containing result folders from model evaluation 
        - ```annotations.csv```: annotations file
        - ```log.txt```: Slideflow's console output log (you can manually set the desired logging level)
        - ```settings.json```: project settings which should be edited for each project
        - ```script.py``` or ```notebook.ipynb```: your experiment scripts/notebook with your code
- ```DATA/```: the below directories can be anywhere, pointed to in ```datasets.json```, and each should contain a subdirectory specific to each dataset
    - ```SLIDES/```: slide image directory 
    - ```ROI/```: region of interest CSV files generated in QuPath by ```export_rois.groovy``` script
    - ```TILES/```: folder used to temporarily store extracted tiles prior to saving as TFRecords; typically tiles are deleted once TFRecords are created
    - ```TFRECORDS/```: folder used to store TFRecords
- ```datasets.json```: address book for dataset directories
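
If you want to lay out this folder structure by hand, a minimal sketch with ```pathlib``` could look like the following. The helper name `make_skeleton` and the dataset name `DATASET_1` are made up for illustration, and this is not something Slideflow requires you to run. It only creates empty folders (inside a temporary directory here, so it is safe to try).

```python
import tempfile
from pathlib import Path

def make_skeleton(root, project="TEST_PROJECT_1", dataset="DATASET_1"):
    """Create the recommended folder layout under `root`; return the created directories."""
    root = Path(root)
    dirs = [
        root / "PROJECTS" / project / "models",
        root / "PROJECTS" / project / "eval",
    ]
    # One dataset-specific subdirectory inside each data folder.
    dirs += [root / "DATA" / kind / dataset
             for kind in ("SLIDES", "ROI", "TILES", "TFRECORDS")]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
    return dirs

# Try it in a throwaway directory; replace `tmp` with a real path to keep the folders.
with tempfile.TemporaryDirectory() as tmp:
    created = [d.relative_to(tmp).as_posix() for d in make_skeleton(tmp)]

print("\n".join(created))
```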
 
  
---------

We have created a project plus test data which you can download from [here](). You will need to update the paths in ```datasets.json``` and ```settings.json``` to point to your data directories.

In [None]:
# TODO
dl_path=""

Alternatively, you can create a project programmatically using Slideflow's API.

In [None]:
import slideflow as sf
P = sf.create_project(
    root='project_path',
    annotations="./annotations.csv",
    slides='/path/to/slides/'
)

##### ```settings.json```

The ```settings.json``` file should be in your project folder. Paths can be relative (```./``` is notation for the current directory), but the path to ```datasets.json``` should be an absolute path. ```"sources"``` is a list of the source names defined in ```datasets.json```.

Here is an example of what ```settings.json``` should look like. 
```
{
    "name": "TEST_PROJECT",
    "annotations": "./annotations.csv",
    "dataset_config": "/home/user/DATA/datasets.json",
    "sources": [
        "SOURCE_1",
        "SOURCE_2"
    ],
    "models_dir": "./models",
    "eval_dir": "./eval",
    "mixed_precision": false,
    "batch_train_config": "./sweep.json"
}
```
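
To make the relative-versus-absolute distinction concrete, here is a small sketch that parses a ```settings.json``` like the one above and shows what each path entry would point to if relative paths are resolved against the project folder. The function name `resolve_settings` and the project location `/home/user/PROJECTS/TEST_PROJECT` are assumptions for illustration. This is not Slideflow's own loading code.

```python
import json
from pathlib import Path

PATH_KEYS = ("annotations", "dataset_config", "models_dir", "eval_dir", "batch_train_config")

def resolve_settings(settings_text, project_dir):
    """Parse settings.json text and turn relative path entries into absolute paths."""
    settings = json.loads(settings_text)
    project_dir = Path(project_dir)
    resolved = dict(settings)
    for key in PATH_KEYS:
        if key in settings:
            path = Path(settings[key])
            # Absolute paths are kept; relative ones are joined to the project folder.
            resolved[key] = str(path if path.is_absolute() else project_dir / path)
    return resolved

example = """{
    "name": "TEST_PROJECT",
    "annotations": "./annotations.csv",
    "dataset_config": "/home/user/DATA/datasets.json",
    "sources": ["SOURCE_1", "SOURCE_2"],
    "models_dir": "./models",
    "eval_dir": "./eval"
}"""

resolved = resolve_settings(example, "/home/user/PROJECTS/TEST_PROJECT")
for key in ("annotations", "dataset_config", "models_dir"):
    print(key, "->", resolved[key])
```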

##### ```datasets.json``` 

Slideflow does not require your directories to all be in one place: your slides & ROIs can be stored in one place, the tiles & TFRecords in another, the Project folders in another. Slideflow *does* need an "address book" which lists the paths to the data for each different dataset ("datasets" are called "sources", as you saw in ```settings.json``` above). The "address book" is the file ```datasets.json```, and its purpose is to act as the one place where all the paths to your data are logged.

Here is what ```datasets.json``` should look like. This file requires absolute paths to your data (not relative paths).

```
{
  "SOURCE_1":
  {
    "slides": "/directory",
    "roi": "/directory",
    "tiles": "/directory",
    "tfrecords": "/directory"
  },
  "SOURCE_2":
  {
    "slides": "/directory",
    "roi": "/directory",
    "tiles": "/directory",
    "tfrecords": "/directory"
  }
}
```
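
If you edit ```datasets.json``` by hand, it is easy to introduce a small mistake. The sketch below (the function name `check_datasets` is made up) checks two things with the standard library only: that the text parses as JSON (Python's parser rejects trailing commas, for example) and that every path is absolute. It does not check that the directories exist.

```python
import json
import os

def check_datasets(text):
    """Return a list of problems found in datasets.json text (an empty list means OK)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg} (line {err.lineno})"]
    problems = []
    for source, paths in config.items():
        for key, value in paths.items():
            if not os.path.isabs(value):
                problems.append(f"{source}: '{key}' is not an absolute path ({value})")
    return problems

good = '{"SOURCE_1": {"slides": "/directory", "tfrecords": "/directory"}}'
relative = '{"SOURCE_1": {"slides": "./slides", "tfrecords": "/directory"}}'
trailing_comma = '{"SOURCE_1": {"slides": "/directory", "tfrecords": "/directory",}}'

for label, text in [("good", good), ("relative", relative), ("trailing comma", trailing_comma)]:
    print(label, "->", check_datasets(text))
```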

You can either add the lines to the JSON file manually or you can add a source to a project with the below code:

In [None]:
import slideflow as sf
P = sf.load_project('/path/to/project/directory')
P.add_source(
    name="SOURCE_NAME",
    slides="/slides/directory",
    roi="/roi/directory",
    tiles="/tiles/directory",
    tfrecords="/tfrecords/directory"
)

Once your Project has been created and your data paths have been added to the ```datasets.json``` file, you can start working with Slideflow.

In [None]:
# TODO add in section for running slideflow test.py to check that everything works + add the section for running a script from within a Jupyter notebook

# Advanced

Below is everything that I think you need to understand to perform computational work on our servers. I've included some extra information that I think is useful to know, but not necessary to know.

Table of Contents:
- [Bash commands in Jupyter Notebooks](#Bash-commands-in-Jupyter-Notebooks)
- [Magic commands](#Magic-commands)
- [Importing packages and modules from different locations](#importing-packages-and-modules-from-different-locations)
- [CUDA help](#cuda)
- [System monitoring and information](#system-monitoring-and-information)
- [Running code in a standalone script](#running-code-in-a-standalone-script)
- [Multiprocessing help](#multiprocessing-help)

## Bash commands in Jupyter Notebooks

**To execute bash or other shell commands in a Jupyter Notebook, you should use the prefix ```!``` before the command you wish to run**. It can be really convenient to run bash commands directly within Jupyter Notebooks. 

Some equivalent ways to execute bash commands in Jupyter Notebooks/VSCode:
- Use Python package equivalents (like from the ```os``` or ```sys``` libraries) 
- Execute the equivalent bash commands directly from the Terminal in VSCode
- Use the line magic ```%``` for single-line commands or the cell magic ```%%bash``` for multi-line bash scripts (see ```Magic commands``` section below). Not every bash command will work with the line magic ```%```, but many will.

**Running bash scripts in Jupyter Notebooks**

You can use the cell magic ```%%bash``` to run a bash script in a Jupyter Notebook cell. In bash, ```$varname``` references the value of a variable.

In [5]:
%%bash

# Define directory and file names
dir_name="example_dir"
file_name="example_file.txt"

# Create a new directory
echo "Creating a directory named $dir_name"
mkdir -p $dir_name

# Navigate into the directory
cd $dir_name
echo "Current working directory:"
pwd

# Create a new file and write some content to it
echo "Writing to $file_name"
echo "Hello, this is a test file." > $file_name

# Display the contents of the file
echo "Displaying contents of $file_name:"
cat $file_name

# Remove the file
echo "Removing $file_name"
rm $file_name

# Navigate back to the original directory and delete the new directory
cd ..
echo "Removing $dir_name"
rm -r $dir_name
echo "Current working directory:"
pwd

Creating a directory named example_dir
Current working directory:
/Users/sarakochanny/Python/example_dir
Writing to example_file.txt
Displaying contents of example_file.txt:
Hello, this is a test file.
Removing example_file.txt
Removing example_dir
Current working directory:
/Users/sarakochanny/Python
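
For comparison, the same steps can be written with Python's standard library instead of bash. This is a rough equivalent rather than a line-by-line translation: it works in a temporary directory instead of your current working directory and does not change directories.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as base:   # scratch area, removed automatically
    dir_path = Path(base) / "example_dir"
    file_path = dir_path / "example_file.txt"

    dir_path.mkdir(parents=True, exist_ok=True)            # mkdir -p example_dir
    file_path.write_text("Hello, this is a test file.\n")  # echo "..." > example_file.txt
    contents = file_path.read_text()                       # cat example_file.txt
    print(contents, end="")

    file_path.unlink()                                     # rm example_file.txt
    dir_path.rmdir()                                       # remove the now-empty directory
    still_there = dir_path.exists()

print("Directory still exists:", still_there)
```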


### Useful Bash commands

Simple bash commands for working with files and directories: 

- ```ls```: Lists the contents of a directory. ```ls -l```: Provides detailed list including file permissions, number of links, owner, group, size, and modification date. ```ls -a```: Lists all files, including hidden files.
- ```cd /path/to/directory```: Changes to the specified directory. ```cd ..```: Moves one directory up. ```cd ~```: Moves to the home directory.
- ```pwd```: Prints the working directory. Useful to double-check what directory you are in.
- ```mkdir <new_directory_name>```: Creates a new directory.
- ```rm <filename>```: Remove file **permanently**. ```rm -r <directory_name>```: Recursively removes a directory and its contents **permanently** (**for the love of God, be careful with this command. Executing ```rm -r /``` will delete everything on your computer**).
- ```touch <filename>```: Creates a new empty file or updates the timestamp of an existing file (this second function is useful if there is some sort of "unused file deletion time limit" set, like on certain HPC scratch spaces).
- ```cp <file_source> <file_destination>```: Copies files and directories. ```cp -r <source_directory> <destination_directory>```: Recursively copies a directory.
- ```mv <file_source> <file_destination>```: Moves files and directories. Also can be used to rename a file or directory (```mv <old_name> <new_name>```).
- ```cat <file_name>```: Displays the content of a file.
- ```grep "pattern" <file_name>```: Searches for a pattern in a file.
- ```find /path/to/search -name "file_pattern"```: Searches for files in a directory hierarchy and finds files matching the given pattern.
- ```echo <text_string>```: Displays a line of text/string that is passed as an argument (e.g. ```echo "Hello World"```: Prints "Hello World".) Equivalent to Python ```print()```. Useful for status updates in bash scripts. ```echo``` is also useful for printing the value of environment variables (e.g. ```echo $CUDA_VISIBLE_DEVICES```).
- ```head <file_name>```: Shows the first 10 lines of a file.
- ```tail <file_name>```: Shows the last 10 lines of a file.
- ```wc -l```: Counts the newline characters in a file
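
The ```!``` prefix only works in notebooks and IPython. In a standalone Python script, one common alternative is the standard-library ```subprocess``` module. A minimal sketch, using ```echo``` as the example command:

```python
import subprocess

# The program and its arguments are given as list items (no shell involved).
# check=True raises an error if the command exits with a non-zero status.
result = subprocess.run(
    ["echo", "Hello World"],
    capture_output=True,  # collect stdout/stderr instead of printing them
    text=True,            # decode the output as str rather than bytes
    check=True,
)
print(result.stdout.strip())
```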

In [2]:
# Test them out here
!ls -la

total 507032
drwxrwxr-x   36 sarakochanny  staff       1152 Dec 19 13:41 .
drwxr-xr-x+ 101 sarakochanny  staff       3232 Dec 18 14:48 ..
-rw-r--r--@   1 sarakochanny  staff      30724 Dec 19 13:41 .DS_Store
-rw-rw-r--    1 sarakochanny  staff         35 May 25  2023 .gitattributes
drwxrwxr-x    5 sarakochanny  staff        160 Jul  5 12:56 .github
-rw-rw-r--    1 sarakochanny  staff        401 May 25  2023 .gitmodules
drwxrwxr-x    3 sarakochanny  staff         96 May 25  2023 .ipynb_checkpoints
drwxr-xr-x   25 sarakochanny  staff        800 Dec 12 15:35 Abbvie
drwxr-xr-x    4 sarakochanny  staff        128 Mar  6  2020 AdditionalModules
drwxr-xr-x@  16 sarakochanny  staff        512 Sep  1  2022 Cheatsheets
-rw-r--r--@   1 sarakochanny  staff   16767083 Apr  4  2019 Cheatsheets-Tidyverse.zip
-rw-r--r--@   1 sarakochanny  staff      54481 Oct 25 13:14 Custom Feature Extractors — slideflow 2.1.0 documentation.

You can install pip packages directly from within Jupyter Notebook. (Commented out so it doesn't install if you don't want it to.)

In [None]:
#!pip install tqdm

See [Advanced:System monitoring and information](#system-monitoring-and-information) for other useful software programs you can run from the command line.

## Magic commands

Magic commands are special commands that are designed to perform some common tasks you may want to do from within a Jupyter Notebook.  They are not part of the Python language, but are instead provided by the IPython Kernel (the IPython Kernel is the computational engine that executes the code in a Jupyter Notebook). 

Magic commands begin with either ```%``` or ```%%```. ```%``` is for single-line magics and ```%%``` is for cell magics. 

You can see a detailed description of all commands by running the command ```%magic```, or just a list of them with ```%lsmagic```.

In [8]:
%magic


In [11]:
%lsmagic

Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%

There are a lot of them, but this is a list of some of the most useful ones:

- ```%time```: Times the execution of a single statement
- ```%pwd```: Prints the working directory path
- ```%cd /path/to/dir```: Changes the working directory
- ```%ls```: Lists the contents of the working directory
- ```%run <filename>```: Executes a Python script inside a cell
- ```%%writefile <filename>```: Writes the contents of a cell to a file
- ```%pycat <filename>```: Shows the content of an external file and highlights the syntax (```%cat``` just prints the contents of the file)
- ```%debug```: Drops you into the built-in Python debugger (pdb) when encountering an error. It allows for interactive debugging and variable inspection.
- ```%env```: Lists all environment variables 
- ```%env <variable>=<value>```: Sets the environment variable <variable> to <value> (alternative to using ```os.environ['VARIABLE'] = 'VALUE'``` or bash's ```export VARIABLE=VALUE```)
- ```%who```: Display variables that exist in the global scope (```%whos``` provides more detailed information) 
- ```%reset```: Reset the namespace by removing all variables and their values from memory
- ```%%html```: Renders the cell as HTML
- ```%matplotlib inline```: Displays matplotlib plots inline instead of a new window
- ```%%bash```: Execute a multi-line bash script within the cell
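
Magic commands only exist inside IPython/Jupyter, so they will not work in a standalone ```.py``` script. For a few of the magics above, rough plain-Python equivalents look like this (the variable name `EXAMPLE_VARIABLE` is made up):

```python
import os
import time

# %pwd -> current working directory
cwd = os.getcwd()

# %ls -> contents of the working directory
entries = sorted(os.listdir(cwd))

# %env VARIABLE=value -> set an environment variable for this process
os.environ["EXAMPLE_VARIABLE"] = "1"
value = os.environ.get("EXAMPLE_VARIABLE")

# %time statement -> time a single statement
start = time.perf_counter()
total = sum(range(1_000_000))
elapsed = time.perf_counter() - start

print(f"cwd={cwd}, {len(entries)} entries, EXAMPLE_VARIABLE={value}")
print(f"sum={total}, took {elapsed:.4f} s")
```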

In [None]:
# Test them out here
%pwd

If you are doing development work actively and wish to test your changes live in a Jupyter Notebook, you can use the magic command ```%load_ext autoreload``` to automatically reload the module every time you make a change to it. 

In [None]:
%load_ext autoreload
%autoreload 2

## Importing packages and modules from different locations

**Why this is useful**

This section is useful if you want to do fancy things like work with an experimental branch of Slideflow, have a Slideflow directory that you are actively doing development on, or just keep multiple versions of Slideflow. You can use git to clone the repository directly (```git clone https://github.com/jamesdolezal/slideflow.git```) and then insert the path to the cloned repo into Python's package search path.  

You can use the below methods to import any package or module from any location on your computer.

------

Python uses a system of modules and packages to organize code. A module is a single file (e.g. ```slideflow.py```) while a package is a collection of modules (e.g. ```slideflow/```), often organized as subdirectory folders within the main package directory. Packages can get pretty complicated, with module files within subdirectories within subdirectories. 

When you import a module or package, Python searches for it in an **ordered** list of directories that are stored in the ```sys.path``` variable (```sys``` manages system-specific parameters and functions). You can see the directories in ```sys.path``` using ```print(sys.path)```. 

```
import sys
print(sys.path)

['/home/pearsonlab/PROJECTS/TEST_PROJECT',
 '/home/pearsonlab/anaconda3/envs/sf/lib/python38.zip',
 '/home/pearsonlab/anaconda3/envs/sf/lib/python3.8',
 '/home/pearsonlab/anaconda3/envs/sf/lib/python3.8/lib-dynload',
 '/home/pearsonlab/.local/lib/python3.8/site-packages',
 '/home/pearsonlab/anaconda3/envs/sf/lib/python3.8/site-packages']
```

In [None]:
import sys
print(sys.path)

By default, Python will search for modules and packages in the current working directory (the directory you are in when you start Python, in this example ```/home/pearsonlab/PROJECTS/TEST_PROJECT```).

In [3]:
import os
os.getcwd() # you can also use %pwd or !pwd

'/Users/sarakochanny/Python'

If you want to import a module or package from a different directory, you can add that directory to ```sys.path```. 

So for example, if you have a specific slideflow directory (perhaps from a specific branch or with local changes you've made) you want to work from, you can set the path to that slideflow directory and slideflow will be imported from there instead. 

```
import sys
sf_path = "/home/pearsonlab/sf_dev/"
sys.path.insert(0, sf_path)
import slideflow as sf
print(sys.path)

['/home/pearsonlab/sf_dev/',
 '/home/pearsonlab/PROJECTS/TEST_PROJECT',
 '/home/pearsonlab/anaconda3/envs/sf/lib/python38.zip',
 '/home/pearsonlab/anaconda3/envs/sf/lib/python3.8',
 '/home/pearsonlab/anaconda3/envs/sf/lib/python3.8/lib-dynload',
 '/home/pearsonlab/.local/lib/python3.8/site-packages',
 '/home/pearsonlab/anaconda3/envs/sf/lib/python3.8/site-packages']
```

As you can see, the path to ```sf_dev``` was inserted first, so Slideflow is imported from the directory ```sf_dev``` instead of the directory where you are currently working (```/home/pearsonlab/PROJECTS/TEST_PROJECT```), or the conda env directory (```/home/pearsonlab/anaconda3/envs/sf/lib```) where slideflow is installed.

In [None]:
import sys
sf_path = "/home/pearsonlab/sf_dev/" # change me
sys.path.insert(0, sf_path)
import slideflow as sf
print(sys.path)
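
To double-check where a package will actually be imported from, one option is ```importlib.util.find_spec```, which looks a package up on ```sys.path``` and reports its file location. The sketch below uses the standard-library package ```json``` as a stand-in so that it runs anywhere; swap in ```"slideflow"``` to check your own setup. The helper name `where_is` is made up.

```python
import importlib.util

def where_is(package_name):
    """Return the file a package would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(package_name)
    return None if spec is None else spec.origin

print(where_is("json"))                    # stand-in; try "slideflow" on your system
print(where_is("not_a_real_package_xyz"))  # made-up name, expected to be missing
```

Once a package has been imported, its ```__file__``` attribute (e.g. ```sf.__file__```) gives the same information. Keep in mind that Python caches imported modules, so if Slideflow was already imported in the current session, changing ```sys.path``` afterwards will not switch it; restart the kernel first.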

In [12]:
# TODO add 'which slideflow' step - wait does this even work?

**NOTE 1: The above steps work for importing modules (single Python ```.py``` files) as well.**

Let's say that you have a standalone Python file that contains a bunch of functions and classes you have written yourself. You want to import those functions and classes into your Jupyter Notebook or standalone Python script. To do this, you follow the very same steps as above, but instead of importing a package, you are importing a module. 

```
import sys
module_dir_path = "/path/to/dir/with/my/python/file/"
sys.path.insert(0, module_dir_path)
import module
```

This will import all of the functions and classes from ```module.py``` into your Jupyter Notebook. I also sometimes use ```from module import *``` which can be easier because then I don't have to write the module name (like in ```module.function()```) each time I want to use a function from the module.
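
Here is a self-contained version of the same idea that you can run as-is. It writes a tiny module file into a temporary directory, adds that directory to ```sys.path```, and imports it. The module name `my_helpers` and its function `double` are made up for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as module_dir:
    # Write a tiny module file to import.
    Path(module_dir, "my_helpers.py").write_text("def double(x):\n    return 2 * x\n")

    sys.path.insert(0, module_dir)   # make the directory searchable
    importlib.invalidate_caches()    # the file was created after Python started
    try:
        import my_helpers
        result = my_helpers.double(21)
    finally:
        sys.path.remove(module_dir)  # leave sys.path as we found it

print(result)
```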

**NOTE 2: If you are doing development work actively and wish to test your changes live in a Jupyter Notebook, you can use the magic command ```%load_ext autoreload``` to automatically reload the module every time you make a change to it.**

In [11]:
%load_ext autoreload
%autoreload 2

## CUDA

**Why this section is useful**

This section is useful so you understand how CUDA works. It must be installed by a system admin with sudo privileges, and sometimes it can be a pain to get it working if your paths are not set up properly. You want to be able to check that it is installed, part of your system PATH, visible by your deep learning library (Pytorch or Tensorflow), and working properly. Useful commands are also included.

--------

CUDA (Compute Unified Device Architecture) is a specialized programming approach for instructing NVIDIA GPUs. When you train a neural network, each layer's operations, such as convolutions and matrix multiplications, can be computed in parallel on a GPU. CUDA provides the necessary tools and language extensions (in C/C++) to developers to write programs that harness this parallelism. You will not interact with the CUDA libraries directly, but you'll interface with CUDA via Tensorflow or Pytorch. 

Before using these commands and functions, you must have the necessary NVIDIA drivers, CUDA Toolkit, and appropriate Python libraries installed. Your system admins (James or Sara) have ensured that CUDA is installed on the Pearson Lab servers (this requires sudo privileges) BUT you need to make sure that its install location is part of your PATH variable (where the OS looks for programs to run). 

### Checking that CUDA is installed

There are a few ways to check whether CUDA is installed.

#### Option 1: nvcc

Check the CUDA version with the command ```nvcc --version```. This will print the version of the CUDA compiler driver. If this command doesn't work, then either CUDA isn't installed, or its install location isn't part of your PATH variable (see below).

Sometimes, the ```nvcc``` command won't work because the CUDA ```bin``` directory isn't in your PATH. In that case you can still run it by specifying the full path to the nvcc executable. For example, on the Pearson Lab servers, the full path is ```/usr/local/cuda/bin/nvcc```. This is true of almost any command that you want to run from the command line. To see which executable your shell resolves for a command, use ```which```. For example, ```which nvcc``` returns ```/usr/local/cuda/bin/nvcc``` when that directory is in your PATH. Note that ```which``` only searches PATH, so it prints nothing if the command is not there.

In [None]:
!nvcc --version
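
The same lookup can be done from Python. The sketch below first searches PATH with ```shutil.which``` and then falls back to the conventional location mentioned above. The helper name `find_nvcc` is made up, and the default path is an assumption that may not match your system.

```python
import os
import shutil

def find_nvcc(default="/usr/local/cuda/bin/nvcc"):
    """Return the path to nvcc, or None if it cannot be found."""
    on_path = shutil.which("nvcc")   # same lookup your shell does via PATH
    if on_path:
        return on_path
    # Not on PATH: fall back to the conventional install location, if it exists.
    if os.path.isfile(default) and os.access(default, os.X_OK):
        return default
    return None

nvcc_path = find_nvcc()
print(nvcc_path if nvcc_path else "nvcc not found on PATH or at the default location")
```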

#### Option 2: Use torch or tensorflow

In [None]:
import os
os.environ['SF_BACKEND'] = 'torch' # Alternative is 'tensorflow'

# Check if GPU is available
if os.environ['SF_BACKEND'] == 'torch':
    import torch
    print('GPU available: ', torch.cuda.is_available())
    print('GPU count: ', torch.cuda.device_count())
    # current_device() raises an error when no GPU is visible, so guard it
    if torch.cuda.is_available():
        print('GPU current: ', torch.cuda.current_device())
        print('GPU name: ', torch.cuda.get_device_name(torch.cuda.current_device()))
elif os.environ['SF_BACKEND'] == 'tensorflow':
    import tensorflow as tf
    print("GPU: ", len(tf.config.list_physical_devices('GPU')))

In [None]:
# Checking CUDA's path

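Commands like ```export PATH=/usr/local/cuda/bin:$PATH``` and ```export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH``` set up the environment for the CUDA toolkit: the first lets your shell find CUDA executables such as ```nvcc```, and the second lets programs find the CUDA shared libraries.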
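
From Python, you can inspect PATH and prepend a directory for the current process. This is a minimal sketch: the helper `prepend_dir` is made up, and ```/usr/local/cuda/bin``` is the location used in this guide, which may differ on your machine. The change only affects this Python process and anything it launches, not your login shell.

```python
import os

def prepend_dir(path_value, new_dir):
    """Return `path_value` with `new_dir` placed first, without duplicating it."""
    parts = [p for p in path_value.split(os.pathsep) if p and p != new_dir]
    return os.pathsep.join([new_dir] + parts)

cuda_bin = "/usr/local/cuda/bin"
current = os.environ.get("PATH", "")
print("CUDA bin directory already on PATH:", cuda_bin in current.split(os.pathsep))

# Update PATH for this process only.
os.environ["PATH"] = prepend_dir(current, cuda_bin)
print(os.environ["PATH"].split(os.pathsep)[0])
```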

### Useful CUDA commands

```nvidia-smi```: Displays information about NVIDIA GPU(s) on your system, including usage, temperature, memory, and driver version. It's a go-to command for monitoring GPU health and activity.

In [None]:
!nvidia-smi

```nvcc --version``` or ```nvcc -V```: Shows the version of the NVIDIA CUDA Compiler (NVCC). Useful for checking your CUDA toolkit version. This won't work if your CUDA path is messed up (see [Checking that CUDA is installed](#checking-that-cuda-is-installed) for more).

In [None]:
!nvcc --version

```export CUDA_VISIBLE_DEVICES=0,1```: This command sets the environment variable to specify which GPUs should be accessible to CUDA applications.

The environment variable ```CUDA_VISIBLE_DEVICES``` determines which GPU(s) Slideflow should use for GPU-accelerated tasks and processes. Every GPU is assigned an integer ID. When working on a multi-GPU system, if you do not specify which GPU to use, a GPU already in use by another user may be chosen. Your process may try to assign GPU memory that is already in use, which won't work and could kill your process and the other user's process. You can use ```nvidia-smi``` from the command line to see which GPUs are in use.
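
To see at a glance which GPU is least busy before choosing a value, you can ask ```nvidia-smi``` for machine-readable output and parse it. The sketch below is one way to do that. The helper names are made up, the sample numbers are invented, and memory in use is only a rough proxy for whether someone else is using a GPU. The live query is skipped if ```nvidia-smi``` is not installed.

```python
import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"]

def parse_gpu_memory(csv_text):
    """Turn 'index, used, total' lines (MiB) into a list of (index, used, total) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        try:
            index, used, total = (int(field) for field in line.split(","))
        except ValueError:
            continue  # skip lines that are not three integers
        rows.append((index, used, total))
    return rows

def least_used_gpu(rows):
    """Return the index of the GPU with the least memory in use, or None if there are none."""
    return min(rows, key=lambda row: row[1])[0] if rows else None

if shutil.which("nvidia-smi"):
    proc = subprocess.run(QUERY, capture_output=True, text=True)
    if proc.returncode == 0:
        print("Least-used GPU:", least_used_gpu(parse_gpu_memory(proc.stdout)))
    else:
        print("nvidia-smi failed:", proc.stderr.strip())
else:
    print("nvidia-smi not found; skipping the live query")

# Invented sample output, to show the parsing without a GPU.
sample = "0, 10240, 24576\n1, 3, 24576\n"
print(least_used_gpu(parse_gpu_memory(sample)))
```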

In [None]:
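# Each `!` command runs in its own subshell, so `!export ...` would not persist.
# `%env` sets the variable for this notebook's kernel and the commands it launches.
%env CUDA_VISIBLE_DEVICES=0,1
!echo $CUDA_VISIBLE_DEVICES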

## System monitoring and information

You want to be able to monitor your processes and what is going on with them. You may also want to be able to monitor the system as a whole. 

### tmux

[Tmux](https://github.com/tmux/tmux/wiki) is an open-source terminal multiplexer for Unix-based operating systems. It allows users to create multiple windows and panes within the same terminal. This is useful for running multiple programs with a single connection, such as when you're remotely connecting to a machine using Secure Shell (SSH). **Most importantly, ```tmux``` allows you to detach from a session and reattach later, which is useful if you have a long-running process that you want to keep running even if you disconnect from the server.**

**You must run tmux in the Terminal/Command Line.** 

```tmux``` relies on keyboard shortcuts (e.g. ```Ctrl+b``` followed by ```0```) to control the session and navigate between windows and panes. You can also press ```Ctrl+b``` followed by ```:``` to open a prompt for typing commands. You can see a full list of keyboard shortcuts and commands [here](https://tmuxcheatsheet.com/).

Some useful commands:
1. ```tmux new -s <session_name>```: Creates a new tmux session with the name <session_name>.
2. ```tmux a -t mysession```: Reattach to previously created session named "mysession".
3. ```tmux kill-session -t mysession```: Kill the session named "mysession".
4. ```Ctrl+b + d```: A keyboard shortcut within tmux. Detach from tmux session.
5. ```: split-window -v```: Input command within tmux. Splits the current pane into top and bottom panes (```-h``` splits it side by side).

In [None]:
# tmux is interactive, so start it from a terminal; here we only check that it is installed
!tmux -V

### Glances

Glances ([website](https://nicolargo.github.io/glances/)) is a cross-platform system monitoring tool written in Python. It monitors & shows usage for CPU, GPU, RAM, network, disk I/O, disk usage, IP address, and more. It's fantastic. 

It also has a web interface so that you can monitor your system from a web browser.

**NOTE: Glances looks utterly terrible in a Jupyter Notebook. Use a terminal window.**

In [2]:
!glances

Run the cell below for an example image.

In [13]:
%%HTML
<img src="https://nicolargo.github.io/glances/public/images/screenshot-wide.png" style="height:500px">

### inxi

```inxi``` ([docs](https://smxi.org/docs/inxi.htm)) is an *extraordinarily* useful tool if you want to get information about hardware specifications or OS/kernel versions.  

The command ```inxi``` shows system hardware information based on the flag (```-b``` is basic info, ```-F``` is full output). Specific outputs: CPU (```-C```), graphics (```-G```), hard disks (```-D```), RAM (```-m```), IP address (```-i```), network (```-n```), general info (```-I```), and much more (```-h``` for full list of options).

It is usually not installed on Linux machines by default, so you may need to ask your sys admin to install it. On macOS, it is available through Homebrew.

In [11]:
# basic info
!inxi -b

System:
  Host: Saras-MacBook-Pro.local Kernel: 23.2.0 arch: arm64 bits: 64
    Desktop: Notion OS: Darwin 23.2.0
Machine:
  Type: N/A Mobo: N/A model: N/A serial: N/A BIOS: N/A v: N/A date: N/A
CPU:
  Info: 8-core Apple M1 [MCP] speed: 0
Graphics:
  Message: No ARM data found for this feature.
  Display: server: X.Org v: 21.1.6 driver: N/A resolution: 1920x1956~1Hz
  API: OpenGL v: 2.1 vendor: apple v: N/A renderer: Apple M1
Network:
  Message: No ARM data found for this feature.
Drives:
  Local Storage: total: dmesg.boot not found used: 0 KiB
Info:
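If you want this information inside a Python script, the call can be wrapped with ```subprocess```. The following is a minimal sketch; the function name ```basic_system_info``` is hypothetical, and it assumes that ```inxi```'s ```-c 0``` option turns off colored output (check ```inxi -h``` on your machine). When ```inxi``` is not installed, it falls back to a few facts from Python's standard library, which is far less detailed.

```python
import platform
import shutil
import subprocess


def basic_system_info():
    """Return `inxi -b` output without color codes, or a stdlib fallback."""
    if shutil.which("inxi") is not None:
        # -b: basic info; -c 0: assumed to disable colored output
        result = subprocess.run(
            ["inxi", "-b", "-c", "0"],
            capture_output=True,
            text=True,
            check=False,
        )
        if result.returncode == 0 and result.stdout.strip():
            return result.stdout
    # Fallback: limited information from the Python standard library
    uname = platform.uname()
    return (
        f"Host: {uname.node}\n"
        f"OS: {uname.system} {uname.release}\n"
        f"Arch: {uname.machine}\n"
        f"Python: {platform.python_version()}\n"
    )


print(basic_system_info())
```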


### ncdu

The typical bash command for checking disk usage is ```du```, but it's not very pretty or human readable. ```ncdu``` (NCurses Disk Usage) ([docs](https://dev.yorhel.nl/ncdu)) on the other hand is fantastic and easy to use. It lets you use the arrow keys to navigate the directory tree and see disk usage for each directory. You can delete files and directories from within the program. It may need to be installed by your sys admin if it isn't already.

Run the cell below for an example image.

In [8]:
%%HTML
<img src="https://ostechnix.com/wp-content/uploads/2022/08/Check-Disk-Space-Usage-With-Ncdu.png" style="height:400px">
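If ```ncdu``` is not available, a rough version of its per-directory summary can be sketched in Python. This is a minimal sketch using only the standard library; the function names ```dir_size``` and ```largest_subdirs``` are hypothetical, and this is not how ```ncdu``` itself is implemented. It sums file sizes as reported by the file system, so the numbers can differ from the disk blocks that ```du``` reports.

```python
import os


def dir_size(path):
    """Total size in bytes of all files under `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            try:
                if not os.path.islink(file_path):
                    total += os.path.getsize(file_path)
            except OSError:
                # The file vanished or is unreadable; skip it
                pass
    return total


def largest_subdirs(path, top=10):
    """Return (size_in_bytes, name) for immediate subdirectories, largest first."""
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((dir_size(entry.path), entry.name))
    return sorted(sizes, reverse=True)[:top]


if __name__ == "__main__":
    # Summarize the current working directory
    for size, name in largest_subdirs("."):
        print(f"{size / 1e6:10.1f} MB  {name}")
```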

### ntfy

```ntfy``` is a notification service with a phone app that can alert you when a command finishes running. Useful for long-running processes.

Steps: 
1. Download ntfy [here](https://ntfy.sh/) to your desired device.
2. Create a unique name for your "topic" (e.g. the experiment name), which generates a unique URL.
3. Add the below (example) code within your Python script, which will post a message to your topic.
4. You will receive a notification on your phone when the message is posted.

In [None]:
# first argument is the unique topic URL, data is the message to send
import requests
requests.post("https://ntfy.sh/mytopic", data="Backup successful 😀".encode(encoding='utf-8'))
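To be notified about failures as well as successes, the call can be wrapped around the job itself. The following is a minimal sketch, not an official ntfy client: the function names ```build_ntfy_request```, ```post_request```, and ```run_and_notify``` are hypothetical, ```https://ntfy.sh/mytopic``` is the same placeholder topic as above, and it uses ```urllib``` from the standard library instead of ```requests``` so that it has no extra dependencies. The ```Title``` header sets the notification title.

```python
import urllib.request


def build_ntfy_request(topic_url, message, title=None):
    """Build a POST request for an ntfy topic; the message goes in the body."""
    headers = {"Title": title} if title else {}
    return urllib.request.Request(
        topic_url, data=message.encode("utf-8"), headers=headers, method="POST"
    )


def post_request(request):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def run_and_notify(job, topic_url, send=post_request):
    """Run `job()` and post to `topic_url` whether it succeeds or fails."""
    try:
        result = job()
    except Exception as exc:
        send(build_ntfy_request(topic_url, f"Job failed: {exc!r}", title="Experiment"))
        raise  # re-raise so the error is still visible in the script
    send(build_ntfy_request(topic_url, "Job finished", title="Experiment"))
    return result
```

For example, ```run_and_notify(train, "https://ntfy.sh/mytopic")``` would run a function named ```train``` (a hypothetical name for your own code) and send one notification when it ends.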

## Running code in a standalone script

The advantage of NOT using a Jupyter Notebook is that you escape the constraints of the notebook environment. For example, parallel code can be unreliable in a Jupyter Notebook: depending on your operating system, child processes started with ```multiprocessing``` may not be able to find functions that were defined in notebook cells. A long-running job also stops if the notebook kernel is interrupted or restarted.

Instead, you can put your code in a standalone Python script (e.g. ```experiment.py```) and then execute this script with Python from the command line: ```python3 experiment.py``` (use the full path if needed).

Ideally, you should execute scripts on the command line from within a [tmux](#tmux) session to ensure that long-running processes are not interrupted by a lost connection to your remote server.

In [20]:
%%writefile experiment.py
# use the above line to write the contents of this cell to a file and then execute it

# test script
print("hello world")

# import libraries
import os
import slideflow as sf

# Check if slideflow was properly installed
sf.about()

Writing experiment.py


In [21]:
# Execute the file (which should be done on the command line, ideally in a TMUX session)
!python3 experiment.py

hello world


In [22]:
# Remove the file
%rm experiment.py

## Multiprocessing help

**Why this section is useful**

Multiprocessing in Python is a means to perform parallel processing by using multiple processors on a machine (as opposed to sequential/serial processing). This is particularly useful for CPU-bound tasks that can be parallelized. 

A *program* (such as a Python script) is static: it is the set of instructions and data that needs to be executed. A *process* is a running instance of that program, loaded into memory and under the control of the CPU.

The ```multiprocessing``` library allows a "parent" process to start "child" processes that run in parallel (instead of all work running sequentially). Each child process has its own memory (RAM) and its own *thread* of execution (a thread is the sequence of instructions given to a CPU). This can massively speed up processing, but it also requires more memory, because each child process needs its own memory (so memory use grows with the number of processes). So beware: what you gain in speed, you pay for in memory usage.

Threads run within a process. A process can have more than one thread, and each thread **shares** the memory and resources of the process (which means they can access shared data). Multithreading (versus multiprocessing) is how you can speed up computation while avoiding the higher memory overhead, although in standard Python (CPython) the global interpreter lock means that threads help most with I/O-bound work. For example, Slideflow's ```extract_tiles()``` utilizes multithreading to speed up tile extraction.
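The shared-memory behavior of threads can be illustrated with a short example. This is a minimal sketch and not Slideflow's implementation: ```fake_io_task``` is a hypothetical stand-in for I/O-bound work, and ```time.sleep``` simulates waiting on disk or network. Every thread writes into the same dictionary, and a lock guards those writes.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

results = {}                      # one dictionary shared by every thread
results_lock = threading.Lock()   # guards writes to the shared dictionary


def fake_io_task(task_id):
    """Stand-in for I/O-bound work such as reading a file (hypothetical)."""
    time.sleep(0.1)               # simulate waiting on disk or network
    with results_lock:
        results[task_id] = task_id * 2
    return task_id


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns results in the same order as the inputs
    finished = list(pool.map(fake_io_task, range(8)))
elapsed = time.perf_counter() - start

print(f"{len(results)} results from {len(finished)} tasks in {elapsed:.2f}s")
```

Because the four worker threads wait at the same time, the eight tasks should finish faster than the roughly 0.8 seconds they would take one after another.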

In [None]:
import multiprocessing

In [None]:
# TODO examples coming soon!
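Until fuller examples are added, the following is a minimal sketch of ```multiprocessing.Pool``` applied to a CPU-bound toy task. The function names ```sum_of_squares``` and ```run_parallel``` are hypothetical. It is meant to be saved and run as a standalone script (for example with ```%%writefile``` as shown earlier) rather than executed in a notebook cell, because child processes may not be able to find functions defined in a notebook.

```python
import multiprocessing
import os


def sum_of_squares(n):
    """CPU-bound toy task: the sum of i*i for i in range(n)."""
    return sum(i * i for i in range(n))


def run_parallel(inputs, processes=2):
    """Apply sum_of_squares to each input using a pool of child processes."""
    with multiprocessing.Pool(processes=processes) as pool:
        # map() splits the inputs across the workers and keeps input order
        return pool.map(sum_of_squares, inputs)


if __name__ == "__main__":
    # This guard keeps child processes from re-running the block below when
    # they import the script (needed with the "spawn" start method).
    print("CPU cores available:", os.cpu_count())
    print(run_parallel([10, 100, 1000]))  # [285, 328350, 332833500]
```

Each worker is a separate process with its own memory, so raising ```processes``` increases memory usage as well as speed.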