## IDC Classification Trainer

This notebook provides the functionality to train an Inception V3 model for classification of invasive ductal carcinoma (IDC) using transfer learning. We are training the model on the latest Skylake cluster (c009) on Intel A.I. DevCloud (Colfax Cluster) and will use the model with Intel Movidius.

This tutorial is part of Machine Learning and Mammography by Adam Milton-Barker.

# Create data sorter job

The first step is to create a script that can be used to create a job on the A.I. DevCloud for sorting the uploaded data. Before you run the following block, make sure you have followed all of the steps at the beginning of the README file in the home directory of this project.

In [10]:
%%writefile IDC-DevCloud-Data-Sorter
cd $PBS_O_WORKDIR
echo "* Hello world from compute server `hostname` on the A.I. DevCloud!"
echo "* The current directory is ${PWD}."
echo "* Compute server's CPU model and number of logical CPUs:"
lscpu | grep 'Model name\|^CPU(s)'
echo "* Python available to us:"
export PATH=/glob/intel-python/python3/bin:$PATH;
which python
python --version
echo "* This job sorts the data for the IDC Classifier on the Colfax Cluster"
python DevCloudTrainer.py DataSort
sleep 10
echo "*Adios"
# Remember to have an empty line at the end of the file; otherwise the last command will not run


Writing IDC-DevCloud-Data-Sorter


# Check the data sorter job script was created

Now check that the data sorter job script was created successfully by executing the following block which will print out the files located in the current directory. If all was successful, you should see the file "IDC-DevCloud-Data-Sorter". You can also open this file to confirm that the contents are correct.

In [18]:
%ls

DevCloudTrainer.ipynb  IDC-DevCloud-Data-Sorter  tools/
DevCloudTrainer.py     model/
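
The comment at the end of the job script warns that the last command will not run unless the file ends with an empty line. As a minimal sketch, the check below could be run in a notebook cell before submitting; the helper name `ends_with_newline` is hypothetical and not part of this project.

```python
from pathlib import Path


def ends_with_newline(path):
    """Return True if the file at `path` ends with a newline character."""
    return Path(path).read_bytes().endswith(b"\n")


# Example: check the data sorter job script if it is present.
script = Path("IDC-DevCloud-Data-Sorter")
if script.exists():
    print(script.name, "ends with a newline:", ends_with_newline(script))
```

If the check returns `False`, re-run the `%%writefile` cell with a blank line after the last command.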


# Submit the data sorter job script

Now it is time to submit your data sorter job script. This will queue the data sorter script for execution and return your job ID.

In [19]:
!qsub IDC-DevCloud-Data-Sorter

71698.c009


# Check the status of the job

Now you can monitor the status of the job by executing the following block. You may need to do this a number of times before the job completes. 

JOB STATUSES

R: Running  
Q: Waiting in queue

In [20]:
!qstat

Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
71615.c009                 ...ub-singleuser u13339          00:00:15 R jupyterhub     
71698.c009                 ...d-Data-Sorter u13339                 0 Q batch          
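
Rather than reading the table by eye each time, the `qstat` output can be reduced to a mapping from job ID to state. This is a sketch that assumes the six-column layout shown above; `parse_qstat` is a hypothetical helper, not part of the DevCloud tooling.

```python
def parse_qstat(text):
    """Map each job ID in `qstat` table output to its one-letter state."""
    states = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if len(fields) < 6 or fields[0] == "Job" or set(fields[0]) == {"-"}:
            continue
        # The state is the second-to-last column (S), just before Queue.
        states[fields[0]] = fields[-2]
    return states


sample = """Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
71615.c009                 ...ub-singleuser u13339          00:00:15 R jupyterhub
71698.c009                 ...d-Data-Sorter u13339                 0 Q batch
"""
print(parse_qstat(sample))  # {'71615.c009': 'R', '71698.c009': 'Q'}
```

In a notebook, the text could come from `!qstat` captured into a variable and joined with newlines.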


You can also get a full list of stats for the job by executing the following block, replacing the ID with your job ID:

In [21]:
!qstat -f 71698

Job Id: 71698.c009
    Job_Name = IDC-DevCloud-Data-Sorter
    Job_Owner = u13339@c009-n001
    job_state = Q
    queue = batch
    server = c009
    Checkpoint = u
    ctime = Sat Apr 21 14:53:49 2018
    Error_Path = c009-n001:/home/u13339/IDC-Colfax-Trainer/IDC-DevCloud-Data-S
	orter.e71698
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = n
    mtime = Sat Apr 21 14:53:49 2018
    Output_Path = c009-n001:/home/u13339/IDC-Colfax-Trainer/IDC-DevCloud-Data-
	Sorter.o71698
    Priority = 0
    qtime = Sat Apr 21 14:53:49 2018
    Rerunable = True
    Resource_List.nodect = 1
    Resource_List.nodes = 1:ppn=2
    Resource_List.walltime = 06:00:00
    Variable_List = PBS_O_QUEUE=batch,PBS_O_HOME=/home/u13339,
	PBS_O_LOGNAME=u13339,
	PBS_O_PATH=/glob/intel-python/python3/bin/:/glob/intel-python/python3
	/bin/:/glob/intel-python/python2/bin/:/glob/development-tools/versions
	/intel-parallel-studio-2018-update2/compilers_and_libraries_2018.
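
Long values in the `qstat -f` output are wrapped onto continuation lines, which splits paths such as `Error_Path` in two. The sketch below rejoins them into a dictionary. It assumes continuation lines begin with a tab, as they appear to in the output above; `parse_qstat_full` is a hypothetical helper.

```python
def parse_qstat_full(text):
    """Parse `qstat -f` output into a dict, joining tab-wrapped continuation lines."""
    attrs = {}
    key = None
    for line in text.splitlines():
        if line.startswith("\t") and key is not None:
            # Continuation of the previous value: append it without the tab.
            attrs[key] += line.strip()
        elif " = " in line:
            key, value = line.strip().split(" = ", 1)
            attrs[key] = value
        elif line.startswith("Job Id:"):
            attrs["Job Id"] = line.split(":", 1)[1].strip()
    return attrs


sample_full = (
    "Job Id: 71698.c009\n"
    "    Job_Name = IDC-DevCloud-Data-Sorter\n"
    "    job_state = Q\n"
    "    Error_Path = c009-n001:/home/u13339/IDC-Colfax-Trainer/IDC-DevCloud-Data-S\n"
    "\torter.e71698\n"
    "    Hold_Types = n\n"
)
info = parse_qstat_full(sample_full)
print(info["job_state"], info["Error_Path"])
```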

# Check error and output files

Once the above job has finished, you will see two new files in your current directory. The job ID in my case was 71698, so my error file ends with e71698 and my output file ends with o71698. In this case the error file contained a FutureWarning. The output file shows the full output of your program.

In [24]:
%ls

DevCloudTrainer.ipynb     IDC-DevCloud-Data-Sorter.e71698  tools/
DevCloudTrainer.py        IDC-DevCloud-Data-Sorter.o71698
IDC-DevCloud-Data-Sorter  model/
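
The two file names follow directly from the job script name and the numeric part of the job ID. A small sketch of that naming rule, based only on the listing above (`job_log_names` is a hypothetical helper):

```python
def job_log_names(job_name, job_id):
    """Return (output_file, error_file) names for a finished job."""
    number = job_id.split(".", 1)[0]  # "71698.c009" -> "71698"
    return f"{job_name}.o{number}", f"{job_name}.e{number}"


out_file, err_file = job_log_names("IDC-DevCloud-Data-Sorter", "71698.c009")
print(out_file)  # IDC-DevCloud-Data-Sorter.o71698
print(err_file)  # IDC-DevCloud-Data-Sorter.e71698
```

The files can then be read in the notebook with `!cat` or `open(out_file).read()`.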


# Create training job

Now it is time to create your training job. The script required for this is almost identical to the one created above; all we need to change is the filename and the command-line argument.

In [25]:
%%writefile IDC-DevCloud-Trainer
cd $PBS_O_WORKDIR
echo "* Hello world from compute server `hostname` on the A.I. DevCloud!"
echo "* The current directory is ${PWD}."
echo "* Compute server's CPU model and number of logical CPUs:"
lscpu | grep 'Model name\|^CPU(s)'
echo "* Python available to us:"
export PATH=/glob/intel-python/python3/bin:$PATH;
which python
python --version
echo "* This job trains the IDC Classifier on the Colfax Cluster"
python DevCloudTrainer.py Train
sleep 10
echo "*Adios"
# Remember to have an empty line at the end of the file; otherwise the last command will not run


Writing IDC-DevCloud-Trainer
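
Because the job scripts differ only in their description and final command, they could also be generated from one template instead of being copied by hand. The following is a sketch, not how this project does it: `JOB_TEMPLATE` and `render_job_script` are hypothetical names, and the informational `echo` and `lscpu` lines are left out for brevity.

```python
# Trimmed version of the job script; {description} and {command} are filled in per job.
JOB_TEMPLATE = """cd $PBS_O_WORKDIR
export PATH=/glob/intel-python/python3/bin:$PATH;
which python
python --version
echo "* {description}"
{command}
sleep 10
echo "*Adios"
"""


def render_job_script(description, command):
    """Return the text of a job script that runs `command`, ending with a newline."""
    return JOB_TEMPLATE.format(description=description, command=command)


trainer_script = render_job_script(
    "This job trains the IDC Classifier on the Colfax Cluster",
    "python DevCloudTrainer.py Train",
)
print(trainer_script)
```

Writing the returned text to a file such as `IDC-DevCloud-Trainer` gives the same result as the `%%writefile` cell, and the template's trailing newline ensures the last command still runs.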


# Check the training job script was created

Now check that the trainer job script was created successfully by executing the following block which will print out the files located in the current directory. If all was successful, you should see the file "IDC-DevCloud-Trainer". You can also open this file to confirm that the contents are correct.

In [26]:
%ls

DevCloudTrainer.ipynb     IDC-DevCloud-Data-Sorter.e71698  model/
DevCloudTrainer.py        IDC-DevCloud-Data-Sorter.o71698  tools/
IDC-DevCloud-Data-Sorter  IDC-DevCloud-Trainer


# Submit the training job script

Now it is time to submit your training job script. This will queue the training script for execution and return your job ID. In this command we set the walltime to 24 hours, which should give our script enough time to complete without being killed.

In [32]:
!qsub -l walltime=24:00:00 IDC-DevCloud-Trainer

71782.c009
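
The walltime is passed in `HH:MM:SS` form. If you want to experiment with other limits, a helper like the one below can build the argument list for `qsub`. This is an illustrative sketch; `format_walltime` and `qsub_command` are hypothetical names, and only the `-l walltime=` option shown above is used.

```python
def format_walltime(hours, minutes=0, seconds=0):
    """Format a duration as HH:MM:SS for the `-l walltime=` option."""
    total = int(hours * 3600 + minutes * 60 + seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def qsub_command(script, hours):
    """Build the qsub argument list for `script` with a walltime limit in hours."""
    return ["qsub", "-l", f"walltime={format_walltime(hours)}", script]


print(" ".join(qsub_command("IDC-DevCloud-Trainer", 24)))
# qsub -l walltime=24:00:00 IDC-DevCloud-Trainer
```

The resulting list can be passed to `subprocess.run` on a machine where `qsub` is available.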


# Check the status of the job

In [33]:
!qstat

Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
71615.c009                 ...ub-singleuser u13339          00:12:26 R jupyterhub     
71782.c009                 ...Cloud-Trainer u13339                 0 Q batch          
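
A training job can run for hours, so re-running `!qstat` by hand becomes tedious. The sketch below polls until the job is no longer queued or running. It makes two assumptions that are not confirmed by this notebook: that a finished job either disappears from the listing or shows a state other than `Q` or `R`, and that you supply `get_states`, a hypothetical callable returning a dict of job ID to state.

```python
import time


def wait_for_job(job_id, get_states, poll_seconds=30, max_polls=None, sleep=time.sleep):
    """Poll until `job_id` is neither queued (Q) nor running (R).

    Returns the number of polls made. Raises TimeoutError if `max_polls`
    is reached while the job is still pending.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        state = get_states().get(job_id)
        if state not in ("Q", "R"):
            return polls
        sleep(poll_seconds)
    raise TimeoutError(f"{job_id} still pending after {polls} polls")


# Demonstration with canned states instead of real qstat calls.
fake_states = iter([{"71782.c009": "Q"}, {"71782.c009": "R"}, {}])
polls_made = wait_for_job("71782.c009", lambda: next(fake_states), sleep=lambda s: None)
print(polls_made)  # 3
```

Passing `sleep` and `get_states` as arguments keeps the loop testable without access to the cluster.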


In [34]:
!qstat -f 71782

Job Id: 71782.c009
    Job_Name = IDC-DevCloud-Trainer
    Job_Owner = u13339@c009-n001
    job_state = Q
    queue = batch
    server = c009
    Checkpoint = u
    ctime = Sat Apr 21 17:40:03 2018
    Error_Path = c009-n001:/home/u13339/IDC-Colfax-Trainer/IDC-DevCloud-Traine
	r.e71782
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = n
    mtime = Sat Apr 21 17:40:03 2018
    Output_Path = c009-n001:/home/u13339/IDC-Colfax-Trainer/IDC-DevCloud-Train
	er.o71782
    Priority = 0
    qtime = Sat Apr 21 17:40:03 2018
    Rerunable = True
    Resource_List.nodect = 1
    Resource_List.nodes = 1:ppn=2
    Resource_List.walltime = 24:00:00
    Variable_List = PBS_O_QUEUE=batch,PBS_O_HOME=/home/u13339,
	PBS_O_LOGNAME=u13339,
	PBS_O_PATH=/glob/intel-python/python3/bin/:/glob/intel-python/python3
	/bin/:/glob/intel-python/python2/bin/:/glob/development-tools/versions
	/intel-parallel-studio-2018-update2/compilers_and_libraries_2018.2.199
	/lin

# Evaluate your model

Now we will evaluate the model. The job script below follows the same pattern as the previous two, except that it runs Eval.py.

In [30]:
%%writefile IDC-DevCloud-Evaluator
cd $PBS_O_WORKDIR
echo "* Hello world from compute server `hostname` on the A.I. DevCloud!"
echo "* The current directory is ${PWD}."
echo "* Compute server's CPU model and number of logical CPUs:"
lscpu | grep 'Model name\|^CPU(s)'
echo "* Python available to us:"
export PATH=/glob/intel-python/python3/bin:$PATH;
which python
python --version
echo "* This job evaluates the IDC Classifier on the Colfax Cluster"
python Eval.py
sleep 10
echo "*Adios"
# Remember to have an empty line at the end of the file; otherwise the last command will not run

Writing IDC-DevCloud-Evaluator


# Submit the evaluator job script

Execute the following block and then check the output file generated at the end of the program. Due to the way the A.I. DevCloud works, the output you need to view will actually be in the error log.

In [31]:
!qsub IDC-DevCloud-Evaluator
!qstat

71780.c009
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
71615.c009                 ...ub-singleuser u13339          00:12:21 R jupyterhub     
71780.c009                 ...oud-Evaluator u13339                 0 Q batch          


# Download your trained model

When your training job completes, you will need to download the model and continue with the main README tutorial to convert the model into a format suitable for the Intel Movidius and complete the setup of the client and server. 

## CONGRATULATIONS

You have completed the training of an IDC classifier on the Intel AI DevCloud!