## Description
This notebook runs NNI experiments on the hosted runtimes provided by Google Colab, using a GPU to speed up execution.<br>
<i>To enable GPU usage: navigate to Edit→Notebook Settings, then select GPU from the Hardware Accelerator drop-down.</i><br>
Because these free resources come with time limits, we have to accept some trade-offs and use a few workarounds to get as much benefit as possible from Colab.<br>
To monitor the experiments through the graphical interface provided by NNI, we use ngrok: Colab does not expose a public IP address or ports, which would otherwise prevent access to the NNI web UI.

<b>Step 0</b><br>
Have a look at what NNI is and how to use it in Google Colab:<br>
&emsp; https://nni.readthedocs.io/en/v2.2/Overview.html<br>
&emsp; https://nni.readthedocs.io/en/stable/sharings/nni_colab_support.html<br>

## Download the course-dedicated folder and install/import packages

In [1]:
%%capture
! pip uninstall --yes gdown
! pip install gdown==4.3.1
#! pip install gdown -U --no-cache-dir
! pip uninstall --yes numpy
! pip install numpy==1.19.5
! pip uninstall --yes tensorflow


import os
import gdown
from gdown.download_folder import download_and_parse_google_drive_link
import time
import datetime
from google.colab import files
from google.colab import drive

In [2]:
if "NNI" not in os.listdir("./"):
  NNI_created = True
  ! mkdir ./NNI
else:
  NNI_created = False

In [3]:
cd NNI

/content/NNI


In [4]:
%%capture

folder_link = "https://drive.google.com/drive/folders/1L6jKkxdshdByQ6_v0PVVhPQMYMmpb609"

return_code, gdrive_folder = download_and_parse_google_drive_link(folder_link,
                                                                  quiet=True)
if NNI_created:
  for ii in gdrive_folder.children:
    if "folder" in ii.type:
      ! mkdir ./$ii.name
      if ii.children:
        for jj in ii.children:
          while jj.name not in os.listdir("./"+ii.name):
            gdown.download(id=jj.id,
                          output="./"+ii.name+"/"+jj.name,
                          quiet=True)
    else:
      while ii.name not in os.listdir("./"):
        gdown.download(id=ii.id,
                      quiet=True)
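
The `while` loops above retry until the file appears, so they never terminate if a download keeps failing. A minimal sketch of a bounded retry, assuming a hypothetical helper `download_until_present` (not part of gdown or of this notebook) that accepts any zero-argument download callable:

```python
import os

def download_until_present(download, target_path, max_attempts=5):
    """Call download() until target_path exists, at most max_attempts times.

    Returns the number of download calls that were made.
    Hypothetical helper, shown only as an illustration.
    """
    for calls_made in range(max_attempts):
        if os.path.exists(target_path):
            return calls_made
        download()
    if os.path.exists(target_path):
        return max_attempts
    raise RuntimeError(
        "Giving up on {} after {} attempts".format(target_path, max_attempts)
    )
```

In the cell above, the callable could be `lambda: gdown.download(id=jj.id, output=path, quiet=True)` with `path = "./"+ii.name+"/"+jj.name`.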

In [5]:
%%capture
if "requirements_pip.txt" not in os.listdir("./"):
  raise SystemError('requirements_pip.txt missing!')
else:
  ! pip install -r ./requirements_pip.txt

#os.kill(os.getpid(), 9)

## <i>Mount Drive</i>

In [19]:
NNI_output_path = "/content/drive/MyDrive/Polito/MLIA/Project"
drive.mount('/content/drive')

if NNI_output_path:
  if os.path.isdir(NNI_output_path):
    if "evaluations" not in os.listdir(NNI_output_path):
      ! mkdir $NNI_output_path/evaluations
  else:
    if "evaluations" not in os.listdir("/content/drive/MyDrive"):
      ! mkdir /content/drive/MyDrive/evaluations
else:
  if "evaluations" not in os.listdir("/content/drive/MyDrive"):
    ! mkdir /content/drive/MyDrive/evaluations

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
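
The folder selection above can also be written without shell calls. A minimal sketch, assuming a hypothetical helper `ensure_evaluations_dir` (not part of the notebook) and relying on `os.makedirs(..., exist_ok=True)` so that an existing folder is not an error:

```python
import os

def ensure_evaluations_dir(preferred_root, fallback_root):
    """Create (if needed) and return the 'evaluations' folder.

    preferred_root is used when it is a non-empty path to an existing
    directory; otherwise fallback_root is used. Hypothetical helper.
    """
    if preferred_root and os.path.isdir(preferred_root):
        root = preferred_root
    else:
        root = fallback_root
    evaluations = os.path.join(root, "evaluations")
    os.makedirs(evaluations, exist_ok=True)  # no error if it already exists
    return evaluations
```

With the variables of this notebook, the call would be `ensure_evaluations_dir(NNI_output_path, "/content/drive/MyDrive")`.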


## Download ngrok and save your personal token

After signing up for ngrok, paste your personal authtoken into the cell below.

In [10]:
ngrok_personal_token = "<YOUR_NGROK_AUTHTOKEN>"  # paste your own token here and keep it private

In [11]:
%%capture
! wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
! unzip ngrok-stable-linux-amd64.zip

In [12]:
! ./ngrok authtoken $ngrok_personal_token

Authtoken saved to configuration file: /root/.ngrok2/ngrok.yml
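
To avoid leaving the token in a shared notebook, it can be supplied at run time instead. A minimal sketch, assuming a hypothetical helper `load_ngrok_token` and an environment variable named `NGROK_AUTHTOKEN` (a name chosen here for illustration); when the variable is unset, the helper falls back to a hidden prompt:

```python
import os
from getpass import getpass

def load_ngrok_token(env_var="NGROK_AUTHTOKEN", prompt=getpass):
    """Return the token from the environment, or ask for it interactively.

    Hypothetical helper; env_var is an assumed variable name.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        token = prompt("ngrok authtoken: ").strip()
    if not token:
        raise ValueError("No ngrok authtoken provided.")
    return token
```

The result can then be assigned to `ngrok_personal_token` before running `./ngrok authtoken`.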


## Check that the correct TensorFlow version (2.4.1) is installed and that a GPU is available

In [13]:
import tensorflow as tf

print("TF version: {}\n".format(tf.__version__))

#tf.config.list_physical_devices('GPU')
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found. This might be a CPU-based runtime. Be sure GPU is not required for the upcoming NNI experiment.')
else:
  print("GPU check succeeded!")
  print('GPU index: {}'.format(device_name[device_name.rfind(":")+1:]))

TF version: 2.4.1

GPU check succeeded!
GPU index: 0


#### *Check reference directories*

In [14]:
print("Root directory:\n{}\n".format(os.path.expanduser('~')))
print("Check the working directory is NNI:")
! pwd

Root directory:
/root

Check the working directory is NNI:
/content/NNI


## Keep the runtime working
The following cell prevents the runtime from disconnecting after a few minutes of idle time. We need it to keep the notebook active for as long as possible (about 6 hours on CPU and about 3 hours on GPU).

In [15]:
from google.colab.output import eval_js

js = """
function ClickConnect(){
console.log("Working"); 
document.querySelector("colab-toolbar-button#toolbar-add-code").click() 
}setInterval(ClickConnect,300000)
"""

eval_js(js)

1

## Start NNI experiment

**NOTE: the dataset information has to be included within the .py file**

Select which network architecture will be considered in the NNI experiment

In [16]:
net_type = "slmu"
configuration_filename = "nni_"+net_type+"_trial.yml"

if net_type == "scnn":
  print("REMEMBER that a previous CNN experiment is needed to have a reference network structure.")

Start the NNI experiment

In [17]:
! nnictl create --config configurations/$configuration_filename --port 5000 &
get_ipython().system_raw('./ngrok http 5000 &')

INFO: expand searchSpacePath: ../searchspaces/nni_SearchSpace_slmu.json to /content/NNI/configurations/../searchspaces/nni_SearchSpace_slmu.json
INFO: expand codeDir: ../experiments/ to /content/NNI/configurations/../experiments/
INFO: Starting restful server...
INFO: Successfully started Restful server!
INFO: Starting experiment...
INFO: Successfully started experiment!
------------------------------------------------------------------------------------
The experiment id is K05SGDA1
The Web UI urls are: http://127.0.0.1:5000   http://172.28.0.2:5000
------------------------------------------------------------------------------------

You can use these commands to get more information about the experiment
------------------------------------------------------------------------------------
         commands                       description
1. nnictl experiment show        show the 

Save the running experiment ID as a variable, since it will be useful later on

In [20]:
experiment_IDs = [ii for ii in os.listdir(os.path.expanduser('~')+"/nni-experiments") if ii[0]!="."]

experiment_folders = [os.path.join(os.path.expanduser('~'),"nni-experiments", jj) for jj in experiment_IDs]

experiment_folders.sort(key=lambda x: os.path.getmtime(x))

exp_ID = experiment_folders[-1][experiment_folders[-1].rfind("/")+1:]

print("Current experiment ID: {}".format(exp_ID))

if NNI_output_path:
  if os.path.isdir(NNI_output_path):
    local_experiment_folder = NNI_output_path+"/evaluations/Experiment_{}_{}/".format(net_type,exp_ID)
  else:
    # Fallback: absolute path to the Drive root (leading slash required)
    local_experiment_folder = "/content/drive/MyDrive/evaluations/Experiment_{}_{}/".format(net_type,exp_ID)
else:
  local_experiment_folder = "/content/drive/MyDrive/evaluations/Experiment_{}_{}/".format(net_type,exp_ID)
  
print("Local experiment folder: {}".format(local_experiment_folder))
! mkdir $local_experiment_folder

Current experiment ID: K05SGDA1
Local experiment folder: /content/drive/MyDrive/Polito/MLIA/Project/evaluations/Experiment_slmu_K05SGDA1/
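
The lookup of the most recent experiment can be isolated in a small function. A minimal sketch, assuming a hypothetical helper `latest_experiment_id`; unlike the cell above, it only considers directories, so stray files in `nni-experiments` are ignored:

```python
import os

def latest_experiment_id(experiments_root):
    """Return the name of the most recently modified experiment folder.

    Hidden entries and plain files are skipped. Hypothetical helper.
    """
    candidates = [
        entry.path
        for entry in os.scandir(experiments_root)
        if entry.is_dir() and not entry.name.startswith(".")
    ]
    if not candidates:
        raise FileNotFoundError("No experiment folders in " + experiments_root)
    return os.path.basename(max(candidates, key=os.path.getmtime))
```

In this notebook the argument would be `os.path.join(os.path.expanduser('~'), "nni-experiments")`.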


## View the NNI experiment
Open the first `public_url` shown in the output of the following cell to access the NNI web UI

In [21]:
! curl -s http://localhost:4040/api/tunnels # don't change the port number 4040

{"tunnels":[{"name":"command_line","uri":"/api/tunnels/command_line","public_url":"https://e3bf-34-86-113-101.ngrok.io","proto":"https","config":{"addr":"http://localhost:5000","inspect":true},"metrics":{"conns":{"count":0,"gauge":0,"rate1":0,"rate5":0,"rate15":0,"p50":0,"p90":0,"p95":0,"p99":0},"http":{"count":0,"rate1":0,"rate5":0,"rate15":0,"p50":0,"p90":0,"p95":0,"p99":0}}},{"name":"command_line (http)","uri":"/api/tunnels/command_line%20%28http%29","public_url":"http://e3bf-34-86-113-101.ngrok.io","proto":"http","config":{"addr":"http://localhost:5000","inspect":true},"metrics":{"conns":{"count":0,"gauge":0,"rate1":0,"rate5":0,"rate15":0,"p50":0,"p90":0,"p95":0,"p99":0},"http":{"count":0,"rate1":0,"rate5":0,"rate15":0,"p50":0,"p90":0,"p95":0,"p99":0}}}],"uri":"/api/tunnels"}
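
Rather than reading the URL out of the raw JSON by eye, the reply can be parsed. A minimal sketch, assuming a hypothetical helper `https_public_url` and the field names visible in the output above (`tunnels`, `proto`, `public_url`):

```python
import json

def https_public_url(tunnels_json):
    """Return the first HTTPS public URL from the ngrok tunnels API reply.

    Hypothetical helper; expects the JSON text returned by the local
    endpoint http://localhost:4040/api/tunnels.
    """
    tunnels = json.loads(tunnels_json).get("tunnels", [])
    for tunnel in tunnels:
        if tunnel.get("proto") == "https":
            return tunnel["public_url"]
    raise LookupError("No HTTPS tunnel found; is ngrok running?")
```

The JSON text can come from the `curl` call above or from any HTTP client pointed at the same local endpoint.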


## Periodically download experiment data

In [25]:
def zip_folder(folder_path):
  now = datetime.datetime.now().strftime("%Y-%m-%d_%H.%M.%S") # UTC+0 time
  destination = folder_path+"_"+now+".zip"
  origin = folder_path+"/"
  ! zip -r $destination $origin &> /dev/null
  return destination

def download_output():
  path = "./output"
  source = zip_folder(path)
  while True:
    if source[source.rfind("/")+1:] in os.listdir("./"):
      break
  if NNI_output_path:
    if os.path.isdir(NNI_output_path):
      # "/" added before "evaluations" so the path matches download_experiment()
      destination = NNI_output_path+"/evaluations/Experiment_"+net_type+"_"+exp_ID+"/"
    else:
      destination = "/content/drive/MyDrive/evaluations/Experiment_"+net_type+"_"+exp_ID+"/"
  else:
    destination = "/content/drive/MyDrive/evaluations/Experiment_"+net_type+"_"+exp_ID+"/"

  ! scp $source $destination
  os.remove(source)
  for ii in os.listdir(destination):
    if ("output" in ii) & (ii != source[source.rfind("/")+1:]):
      os.remove(destination+ii)

def download_experiment():
  path = os.path.expanduser('~')+"/nni-experiments/"+exp_ID
  source = zip_folder(path)
  while True:
    if source[source.rfind("/")+1:] in os.listdir(os.path.expanduser('~')+"/nni-experiments/"):
      break
  if NNI_output_path:
    if os.path.isdir(NNI_output_path):
      destination = NNI_output_path+"/evaluations/Experiment_"+net_type+"_"+exp_ID+"/"
    else:
      # Absolute fallback path (leading slash required)
      destination = "/content/drive/MyDrive/evaluations/Experiment_"+net_type+"_"+exp_ID+"/"
  else:
    destination = "/content/drive/MyDrive/evaluations/Experiment_"+net_type+"_"+exp_ID+"/"
  ! scp $source $destination
  os.remove(source)
  for ii in os.listdir(destination):
    if (exp_ID in ii) & (ii != source[source.rfind("/")+1:]):
      os.remove(destination+ii)
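
The zip, copy and prune steps shared by the two functions above can be sketched with the standard library alone. This is an illustration under assumptions, not the notebook's method: `snapshot_folder` is a hypothetical helper, and `shutil.make_archive` stores paths relative to the archived folder, whereas `zip -r` keeps the leading path.

```python
import datetime
import os
import shutil

def snapshot_folder(folder_path, destination_dir):
    """Zip folder_path into destination_dir and delete older snapshots.

    Archives are named <folder>_<timestamp>.zip; older archives with the
    same <folder>_ prefix are removed. Hypothetical helper.
    """
    base = os.path.basename(os.path.normpath(folder_path))
    stamp = datetime.datetime.now().strftime("%Y-%m-%d_%H.%M.%S")
    os.makedirs(destination_dir, exist_ok=True)
    # make_archive appends ".zip" and returns the path of the new archive
    archive = shutil.make_archive(
        os.path.join(destination_dir, base + "_" + stamp), "zip", folder_path
    )
    newest = os.path.basename(archive)
    for entry in os.listdir(destination_dir):
        is_snapshot = entry.startswith(base + "_") and entry.endswith(".zip")
        if is_snapshot and entry != newest:
            os.remove(os.path.join(destination_dir, entry))
    return archive
```

For example, `snapshot_folder("./output", local_experiment_folder)` would write the archive directly into the Drive folder, so no separate copy step is needed.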

In [23]:
timer_download = 15*60 # minutes*seconds

In [None]:
t0 = time.time()
ready2download = False
while True:
  t1 = time.time()
  if t1-t0 >= timer_download:
    t0 = t1
    ready2download = True
    #break
  if ready2download:
    download_output()
    download_experiment()
    ready2download = False
  time.sleep(1) # pause between checks so the loop does not busy-wait
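
A variant of this loop sleeps until the next deadline instead of checking the clock repeatedly, which leaves the CPU free for the NNI trials. A minimal sketch, assuming a hypothetical helper `run_periodically`; the `clock` and `sleep` parameters exist only so the timing can be replaced in tests:

```python
import time

def run_periodically(task, interval_s, max_runs=None,
                     clock=time.time, sleep=time.sleep):
    """Call task() every interval_s seconds.

    Runs forever when max_runs is None, otherwise stops after max_runs
    calls and returns the number of calls made. Hypothetical helper.
    """
    runs = 0
    next_run = clock() + interval_s
    while max_runs is None or runs < max_runs:
        remaining = next_run - clock()
        if remaining > 0:
            sleep(remaining)  # idle until the next deadline
        task()
        runs += 1
        next_run += interval_s
    return runs
```

With the functions defined earlier, the task could be `lambda: (download_output(), download_experiment())` and the interval `timer_download`.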

## List running experiments

In [None]:
! nnictl experiment list

----------------------------------------------------------------------------------------
                Experiment information
Id: 8FR0aMQp    Name: MLinApp    Status: STOPPING    Port: 5000    Platform: local    StartTime: 2022-08-02 13:20:28    EndTime: N/A

----------------------------------------------------------------------------------------

## Stop running NNI experiment

In [None]:
! nnictl stop

INFO: Stopping experiment ky3fSVnO
INFO: Stop experiment success.