# Local Installation Setup Tutorial

Hello! This tutorial shows you how to set up and tear down your workspace in a Jupyter Lab notebook (or ipython environment) in order to execute tasks from the luna pathology library. Here are the steps we will review:

1. Prerequisites
2. Set up your virtual environment
3. Install Pyluna
4. Set up Luna home and configurations
5. References

## 1. Prerequisites

You need a machine with at least 16 GB of free memory. A laptop with only 16 GB of total memory will not be able to run this tutorial.

It is assumed you have a Jupyter Lab environment set up for executing these notebooks. If not, follow the instructions at https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html to install Jupyter Lab on your host system of choice.

The software prerequisites listed here must be installed on the host system, not through the Jupyter Lab (or IPython) environment.

If Apache Spark is not already installed on your computer, download it from https://spark.apache.org/downloads.html.

Make sure that compatible versions of Java, Scala, Python, and R are installed and available on your PATH. Apache Spark runs on Java 8/11, Scala 2.12, Python 3.6+ and R 3.5+.

Here are the installation links for Java, Scala, Python, and R. Again, make sure you download compatible versions:

- Java AdoptOpenJDK: https://adoptopenjdk.net/installation.html
- Scala: https://www.scala-lang.org/download/
- Python: https://www.python.org/downloads/
- R: https://www.r-project.org/
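
Before installing anything, it may help to see which of these tools are already on your `PATH`. The sketch below uses only the Python standard library. The helper name `find_missing_tools` and the command names (`java`, `scala`, `python3`, `R`) are assumptions for illustration; your system may use different command names.

```python
import shutil

def find_missing_tools(names):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]

# Assumed command names; adjust them for your system.
missing = find_missing_tools(["java", "scala", "python3", "R"])
print("Missing tools:", missing or "none")
```

A tool that is found on `PATH` may still be an incompatible version, so this check complements the version commands rather than replacing them.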

It is important to have the path to your Java installation in your JAVA_HOME environment variable. 

In [6]:
!java -version
!python3 --version

import os, shutil
# JAVA_HOME must point to the Java installation directory (the parent of bin/),
# not to the java binary itself
os.environ['JAVA_HOME'] = os.path.dirname(os.path.dirname(shutil.which('java')))
!echo 'JAVA_HOME=' $JAVA_HOME

openjdk version "1.8.0_275"
OpenJDK Runtime Environment (build 1.8.0_275-b01)
OpenJDK 64-Bit Server VM (build 25.275-b01, mixed mode)
Python 3.6.9
JAVA_HOME= /gpfs/mskmindhdp_emc/sw/env
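
To check the reported Java version against the supported ones (8 or 11) programmatically, something like the following sketch could be used. The helper name `java_major_version` is hypothetical, and the parsing assumes the usual `version "..."` line that `java -version` prints (to stderr, not stdout).

```python
import re

def java_major_version(version_output):
    """Extract the Java major version from `java -version` output.

    Handles the legacy "1.8.0_275" scheme (major version 8) and the
    newer "11.0.9" scheme (major version 11). Returns None when no
    version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Legacy versions are written as 1.<major>.<minor>
    if first == 1 and second is not None:
        return int(second)
    return first

sample = 'openjdk version "1.8.0_275"'
print("supported Java:", java_major_version(sample) in (8, 11))
```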


You must also install Hadoop on your computer. On macOS, you may install it with this command:

    brew install hadoop

If you install Hadoop manually instead, the single-node cluster setup guide may serve as a reference: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html.

Next, install OpenSlide (https://openslide.org/download/). This library is used to read SVS images and their tiles. On macOS, you may install it with this command:

    brew install openslide
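
One rough way to confirm that the OpenSlide shared library is visible to Python is the standard library's `ctypes.util.find_library`. This is only a sketch: the helper name `shared_library_found` is hypothetical, and a negative result does not always mean the library is missing, since some installations place it in a directory the default lookup does not search.

```python
from ctypes.util import find_library

def shared_library_found(name):
    """Return True if the default library lookup can locate `name`."""
    return find_library(name) is not None

print("OpenSlide found:", shared_library_found("openslide"))
```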

Lastly, you must find where Spark is installed on your machine and set the SPARK_HOME environment variable yourself. You may find your Spark installation directory by executing:

    which spark-submit
    
If, for example, the output is "/opt/spark-3.0.0-bin-hadoop3.2/bin/spark-submit", then set your SPARK_HOME environment variable to "/opt/spark-3.0.0-bin-hadoop3.2" by running the code below in a code cell.

    import os
    os.environ['SPARK_HOME']='/opt/spark-3.0.0-bin-hadoop3.2'
    !echo $SPARK_HOME
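
The step from the `spark-submit` path to the SPARK_HOME value can also be expressed in code. The sketch below assumes the conventional layout `<SPARK_HOME>/bin/spark-submit`; the helper name `spark_home_from_submit` is hypothetical. If `which spark-submit` returns a symlink (for example, one created by a package manager), resolve it first with `os.path.realpath`, otherwise the result will point at the symlink's directory rather than the Spark installation.

```python
from pathlib import PurePosixPath

def spark_home_from_submit(spark_submit_path):
    """Return the Spark install directory for a given spark-submit path.

    Assumes the conventional layout <SPARK_HOME>/bin/spark-submit.
    """
    path = PurePosixPath(spark_submit_path.strip())
    if path.name != "spark-submit" or path.parent.name != "bin":
        raise ValueError("unexpected spark-submit location: %s" % path)
    return str(path.parent.parent)

print(spark_home_from_submit("/opt/spark-3.0.0-bin-hadoop3.2/bin/spark-submit"))
# prints /opt/spark-3.0.0-bin-hadoop3.2
```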

## 2. Set up your virtual environment

Next, set up your virtual environment within the jupyter lab (or ipython) environment. The end of this tutorial has steps for tearing down this virtual environment. 

Open a terminal in your Jupyter Lab environment by selecting File -> New -> Terminal and execute the following commands. It is assumed that the default Python environment on your host system has python3-venv installed (on Debian or Ubuntu: sudo apt-get install python3-venv -y).

    # change directory to your pathology tutorial sandbox directory
    cd [LOCATION-WHERE-YOU-WANT-TO-CREATE-THE-VIRTUAL-ENV]

    # create the virtual environment
    python3 -m venv pt-venv
    
    # activate the virtual environment
    source pt-venv/bin/activate 
    
    # upgrade pip
    pip install --upgrade pip
    
    # install ipykernel
    pip install ipykernel

    # Register this env with jupyter lab. It’ll now show up in the
    # launcher & kernels list once you refresh the page
    python3 -m ipykernel install --user --name pt-venv --display-name "pathology venv"

    # List kernels to ensure it was created successfully
    jupyter kernelspec list
    
    # deactivate the virtual environment in the terminal
    deactivate

Now, apply the new kernel to your notebook by clicking the current kernel name (typically "Python 3") and then selecting your new kernel "pathology venv" from the drop-down list. **NOTE:** It may take a minute for the drop-down list to update.

Any Python packages you pip install from the notebook will now be installed only in this virtual environment.
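
To confirm that the notebook is really running on the new kernel, you can compare the interpreter's prefix with its base prefix. In a venv the two differ. This is a minimal sketch; the helper name `is_virtualenv` is hypothetical.

```python
import sys

def is_virtualenv(prefix, base_prefix):
    """A venv interpreter reports a prefix that differs from its base prefix."""
    return prefix != base_prefix

print("interpreter:", sys.executable)
print("inside a virtual environment:", is_virtualenv(sys.prefix, sys.base_prefix))
```

If the kernel switch worked, the interpreter path should lie inside the `pt-venv` directory created earlier.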


## 3. Install Pyluna

In [7]:
%pip install -q pyluna
%pip install -q pyluna-pathology

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


Verify that the pyluna packages were installed.

In [8]:
%pip list | grep luna

pyluna             0.0.3
pyluna-common      0.0.3
pyluna-core        0.0.3
pyluna-pathology   0.0.3
Note: you may need to restart the kernel to use updated packages.


If you have followed all of these steps so far, your jupyter installation should be set up! Try importing the luna libraries.

In [10]:
import luna
import luna.pathology

You should have no errors with this step. Congratulations, you are ready to move on to the dataset prep!
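
If an import fails, it can be useful to see exactly which module could not be found. The sketch below checks the two modules imported above with `importlib.util.find_spec`; the helper name `missing_modules` is hypothetical. Note that looking up a dotted name such as `luna.pathology` imports its parent package as a side effect.

```python
import importlib.util

def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Raised when the parent package of a dotted name is absent
            spec = None
        if spec is None:
            missing.append(name)
    return missing

print("Missing modules:", missing_modules(["luna", "luna.pathology"]) or "none")
```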

## 4. Set up Luna home and configurations


First, let's get our bearings.

In [27]:
%%bash
echo 'current notebooks dir:'
echo
ls .
echo
echo 'tutorial root dir:'
echo
ls ../

current notebooks dir:

dataset-prep.ipynb
dsa-tools.ipynb
end-to-end-pipeline.ipynb
inference-and-visualization.ipynb
model-training.ipynb
setup.ipynb
teardown.ipynb
tiling.ipynb

tutorial root dir:

classifier
conf
dockerfile
img
Makefile
notebooks
README.md


Next, let's set up `$LUNA_HOME`. This is required to run tasks from luna packages.

In [28]:
!mkdir -p tutorial_sandbox
%env LUNA_HOME=tutorial_sandbox

env: LUNA_HOME=tutorial_sandbox


In [29]:
# check $LUNA_HOME
!echo $LUNA_HOME

tutorial_sandbox
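
Note that `tutorial_sandbox` is a relative path, so it only resolves correctly while the working directory stays the notebooks directory. One possible alternative, sketched below, is to store an absolute path instead. The helper name `set_luna_home` is hypothetical, and whether the Luna packages need an absolute path is not stated here; the sketch only removes the dependence on the working directory.

```python
import os

def set_luna_home(path):
    """Create `path` if needed and export its absolute form as LUNA_HOME."""
    absolute = os.path.abspath(path)
    os.makedirs(absolute, exist_ok=True)
    os.environ["LUNA_HOME"] = absolute
    return absolute

# Example call from the notebooks directory:
# set_luna_home("tutorial_sandbox")
```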


Next, copy the `conf/` directory from the tutorial to `$LUNA_HOME/conf`. The conf directory contains prespecified configuration files that are used in this tutorial.

We have two baseline configuration files.

- **conf/datastore.cfg**: configuration for your data stores. POSIX and MinIO stores are supported.

- **conf/logging.cfg**: configuration for logging level and optional central logging in MongoDB.

In [30]:
!cp -R ../conf $LUNA_HOME/
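
After copying, you may want to confirm that both configuration files landed in the expected place. The sketch below only checks that the files exist and does not parse them. The helper name `missing_conf_files` is hypothetical, and the fallback value `tutorial_sandbox` is the directory name used earlier in this tutorial.

```python
import os

# The two baseline configuration files described above
EXPECTED_CONF_FILES = ("datastore.cfg", "logging.cfg")

def missing_conf_files(luna_home):
    """Return the expected files that are absent from <luna_home>/conf."""
    conf_dir = os.path.join(luna_home, "conf")
    return [
        name for name in EXPECTED_CONF_FILES
        if not os.path.isfile(os.path.join(conf_dir, name))
    ]

luna_home = os.environ.get("LUNA_HOME", "tutorial_sandbox")
print("Missing configuration files:", missing_conf_files(luna_home) or "none")
```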

## 5. References

Use Virtual Environments Inside Jupyter Notebooks & Jupyter Lab [Best Practices] -
https://www.zainrizvi.io/blog/jupyter-notebooks-best-practices-use-virtual-environments/

Installing the IPython kernel - 
https://ipython.readthedocs.io/en/stable/install/kernel_install.html#kernels-for-python-2-and-3