<a href="https://colab.research.google.com/github/jessecanada/MAPS/blob/master/MAPS_4_Phenotype_Classification_Azure.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **MAPS Module 4 - Phenotype Classification**
This notebook will guide you through classifying phenotypes with Azure Custom Vision.


## Set up Azure environment

In [None]:
!pip -q install azure-cognitiveservices-vision-customvision

[?25l[K     |█████▎                          | 10kB 13.7MB/s eta 0:00:01[K     |██████████▋                     | 20kB 16.5MB/s eta 0:00:01[K     |███████████████▉                | 30kB 15.8MB/s eta 0:00:01[K     |█████████████████████▏          | 40kB 9.5MB/s eta 0:00:01[K     |██████████████████████████▍     | 51kB 4.4MB/s eta 0:00:01[K     |███████████████████████████████▊| 61kB 5.1MB/s eta 0:00:01[K     |████████████████████████████████| 71kB 3.6MB/s 
[K     |████████████████████████████████| 92kB 4.5MB/s 
[K     |████████████████████████████████| 51kB 7.1MB/s 
[?25h

In [None]:
# data and file processing libraries
import numpy as np
import pandas as pd
import cv2
import matplotlib.pyplot as plt
import os
%matplotlib inline

# Azure related libraries
from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials
from azure.cognitiveservices.vision.customvision.training.models import ImageFileCreateBatch, ImageFileCreateEntry, Region

Set up your Azure trainer and predictor. Follow [this guide](https://docs.microsoft.com/en-us/azure/cognitive-services/custom-vision-service/quickstarts/object-detection?tabs=visual-studio&pivots=programming-language-python) to locate your endpoint, training key, and prediction key.

In [None]:
ENDPOINT = "your-endpoint" # ex: https://westus2.api.cognitive.microsoft.com/
training_key = "your-training-key"
prediction_key = "your-prediction-key"
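Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. As a minimal sketch of one alternative, the values could be read from environment variables. The helper `read_setting` and the variable names `AZURE_CV_ENDPOINT`, `AZURE_CV_TRAINING_KEY`, and `AZURE_CV_PREDICTION_KEY` are hypothetical and not part of the Azure SDK.

```python
import os

def read_setting(name, default=None):
    """Return the environment variable `name`, or `default` if it is unset or empty.

    Raises KeyError when the variable is missing and no default is given.
    """
    value = os.environ.get(name, "").strip()
    if value:
        return value
    if default is not None:
        return default
    raise KeyError(f"Environment variable {name} is not set")

# Hypothetical variable names; the defaults are the same placeholders used above.
ENDPOINT = read_setting("AZURE_CV_ENDPOINT", default="your-endpoint")
training_key = read_setting("AZURE_CV_TRAINING_KEY", default="your-training-key")
prediction_key = read_setting("AZURE_CV_PREDICTION_KEY", default="your-prediction-key")
```

Dropping the `default` arguments makes the cell fail early with a clear message if a key was never configured.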

In [None]:
credentials = ApiKeyCredentials(in_headers={"Training-key": training_key})
trainer = CustomVisionTrainingClient(ENDPOINT, credentials)
prediction_credentials = ApiKeyCredentials(in_headers={"Prediction-key": prediction_key})
predictor = CustomVisionPredictionClient(ENDPOINT, prediction_credentials)

In [None]:
# list your projects
for project in trainer.get_projects():
  print(project.name, project.id)

PTEN_classification 1eae5342-91d5-4f2c-9848-9652c1e13b36
PTEN_obj_detect 852eead8-f80d-4645-9c3d-5ba1fa221df2


In [None]:
# copy the 'id' value of your classification project and paste it below
project = trainer.get_project(project_id="1eae5342-91d5-4f2c-9848-9652c1e13b36")
# if project is loaded successfully you should see it returned
project.id

'1eae5342-91d5-4f2c-9848-9652c1e13b36'

In [None]:
# list the iterations of your classification model
# iterations that are not published are reported as "not published"
for iteration in trainer.get_iterations(project.id):
  if iteration.publish_name is None:
    print(f'{iteration.name}: not published')
  else:
    print(f'{iteration.name} is published as "{iteration.publish_name}"')

Iteration 7 is published as "Iteration7"
Iteration 4: not published
Iteration 2 is published as "Iteration2"
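Only published iterations can be used for prediction, so it can help to collect just their publish names. The sketch below assumes objects with the same `name` and `publish_name` attributes used in the loop above; `published_names` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the example runs without an Azure connection.

```python
from types import SimpleNamespace

def published_names(iterations):
    """Return the publish names of the iterations that have been published."""
    return [it.publish_name for it in iterations if it.publish_name is not None]

# Stand-in objects mirroring the output above; with a live client,
# pass trainer.get_iterations(project.id) instead.
demo_iterations = [
    SimpleNamespace(name="Iteration 7", publish_name="Iteration7"),
    SimpleNamespace(name="Iteration 4", publish_name=None),
    SimpleNamespace(name="Iteration 2", publish_name="Iteration2"),
]
print(published_names(demo_iterations))  # ['Iteration7', 'Iteration2']
```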


In [None]:
# specify the iteration you want to use: enter its publish name (no spaces), not its display name
publish_iteration_name = "Iteration7"

## Get the ROI files ready for classification

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# unzip zip file containing individual ROI images
!unzip -q -d /content/ path-to-ROI-zip

In [None]:
# confirm how many cells are to be analyzed
!ls path-to-folder | wc -l

337
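Note that `ls | wc -l` counts every entry in the folder, whereas the prediction loop in the next section only processes files ending in `.jpg`. A minimal sketch of the same check in Python, restricted to `.jpg` files, is shown here; `count_images` is a hypothetical helper and the temporary folder exists only to make the example self-contained.

```python
import os
import tempfile

def count_images(folder, extension=".jpg"):
    """Count regular files in `folder` whose names end with `extension`."""
    return sum(
        1
        for entry in os.scandir(folder)
        if entry.is_file() and entry.name.endswith(extension)
    )

# Demo on a throwaway folder; in the notebook, pass your ROI folder path instead.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("a.jpg", "b.jpg", "notes.txt"):
        open(os.path.join(demo_dir, name), "wb").close()
    demo_count = count_images(demo_dir)

print(demo_count)  # 2
```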


## Azure classification predictions

In [None]:
wrk_dir = "your-ROI-directory" # replace with your ROI folder path
temp_list = []

for entry in os.scandir(wrk_dir):
  if entry.name.endswith('.jpg'):
    image_ID = entry.name[:-4]
    print(f'image_ID: {image_ID}')

    # open an image and get back the prediction results
    with open(entry.path, mode="rb") as image: # entry.path joins folder and file name; rb: 'read binary' (for images)
      results = predictor.classify_image(project.id, publish_iteration_name, image)
    
      # get prediction results
      tags = [prediction.tag_name for prediction in results.predictions]
      probabilities = [prediction.probability*100 for prediction in results.predictions]
      # make a dictionary of tag:prob pairs
      predictions_dict = dict(zip(tags, probabilities))
      # sort the tags in alphabetical order, append the corresponding prob of the sorted tags
      predictions_list = [predictions_dict[i] for i in sorted(predictions_dict)]
      # add image_ID to the beginning of the list
      predictions_list.insert(0, image_ID)
      # append the sorted list to a list as a compound list
      temp_list.append(predictions_list)
    
      for i in sorted(predictions_dict) : 
        print(f'{i}: {predictions_dict[i]:.2f}%') 
      print()

image_ID: merged_191120100001_B02f220_2
diffused: 61.53%
junk: 0.00%
non_nuclear: 38.44%
nuclear: 0.03%

image_ID: merged_191120100001_B02f73_0
diffused: 9.87%
junk: 0.03%
non_nuclear: 69.43%
nuclear: 20.67%

image_ID: merged_191120100001_B02f163_0
diffused: 0.00%
junk: 0.00%
non_nuclear: 0.00%
nuclear: 99.99%

image_ID: merged_191120100001_B02f113_4
diffused: 63.26%
junk: 0.30%
non_nuclear: 1.80%
nuclear: 34.64%

image_ID: merged_191120100001_B02f169_8
diffused: 7.27%
junk: 2.64%
non_nuclear: 88.47%
nuclear: 1.63%

image_ID: merged_191120100001_B02f87_1
diffused: 10.47%
junk: 0.44%
non_nuclear: 71.31%
nuclear: 17.77%

image_ID: merged_191120100001_B02f42_0
diffused: 23.98%
junk: 0.00%
non_nuclear: 3.94%
nuclear: 72.07%

image_ID: merged_191120100001_B02f43_1
diffused: 16.41%
junk: 0.00%
non_nuclear: 41.74%
nuclear: 41.85%

image_ID: merged_191120100001_B02f67_4
diffused: 51.73%
junk: 11.81%
non_nuclear: 2.34%
nuclear: 34.12%

image_ID: merged_191120100001_B02f241_0
diffused: 22.13%
junk: 0.21%
non_nuclear: 73.98%
nuclear: 3.68%
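The per-image bookkeeping in the loop above (build a tag-to-probability mapping, sort by tag name, prepend the image ID) can be isolated and checked without calling Azure. This is a sketch under stated assumptions: `predictions_to_row` is a hypothetical helper, and the stand-in objects carry made-up probabilities, exposing only the `tag_name` and `probability` attributes that the loop reads.

```python
from types import SimpleNamespace

def predictions_to_row(image_id, predictions):
    """Return [image_id, p1, p2, ...] with probabilities in percent, ordered by tag name."""
    by_tag = {p.tag_name: p.probability * 100 for p in predictions}
    return [image_id] + [by_tag[tag] for tag in sorted(by_tag)]

# Made-up predictions in arbitrary order, for illustration only.
demo_predictions = [
    SimpleNamespace(tag_name="nuclear", probability=0.75),
    SimpleNamespace(tag_name="diffused", probability=0.125),
    SimpleNamespace(tag_name="non_nuclear", probability=0.125),
    SimpleNamespace(tag_name="junk", probability=0.0),
]
row = predictions_to_row("demo_cell_0", demo_predictions)
print(row)  # ['demo_cell_0', 12.5, 0.0, 12.5, 75.0]
```

Sorting by tag name keeps the column order stable across images, regardless of the order in which the service returns the predictions.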

Convert the prediction results into a dataframe.

In [None]:
# column order: image_ID first, then the tags in alphabetical order (same order as each row)
col_names = ['image_ID'] + sorted(predictions_dict)
df_cls = pd.DataFrame(temp_list, columns = col_names)
df_cls.head(10)

|   | image_ID | diffused | junk | non_nuclear | nuclear |
|---|----------|----------|------|-------------|---------|
| 0 | merged_191120100001_B02f220_2 | 61.52738 | 0.002915442 | 38.44033 | 0.029383 |
| 1 | merged_191120100001_B02f73_0 | 9.870518 | 0.02971239 | 69.43301 | 20.666759 |
| 2 | merged_191120100001_B02f163_0 | 0.002624 | 1.583634e-07 | 0.003244 | 99.994135 |
| 3 | merged_191120100001_B02f113_4 | 63.25912 | 0.3011508 | 1.799152 | 34.640583 |
| 4 | merged_191120100001_B02f169_8 | 7.26673 | 2.637421 | 88.465077 | 1.630769 |
| 5 | merged_191120100001_B02f87_1 | 10.472993 | 0.4408319 | 71.31443 | 17.771743 |
| 6 | merged_191120100001_B02f42_0 | 23.981301 | 0.0008397999 | 3.944585 | 72.07327 |
| 7 | merged_191120100001_B02f43_1 | 16.409093 | 0.0003901587 | 41.74235 | 41.848165 |
| 8 | merged_191120100001_B02f67_4 | 51.734036 | 11.80673 | 2.342854 | 34.116375 |
| 9 | merged_191120100001_B02f241_0 | 22.132435 | 0.210284 | 73.97976 | 3.677519 |
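Each row holds one probability per class, so a single predicted phenotype per cell can be derived by taking the column with the highest value. The sketch below is one possible post-processing step, not part of the original workflow; the column names `top_class` and `top_prob` are hypothetical, and the demo rows are rounded copies of the first three rows above.

```python
import pandas as pd

# First three rows of the table above, rounded to two decimals.
demo = pd.DataFrame(
    {
        "image_ID": [
            "merged_191120100001_B02f220_2",
            "merged_191120100001_B02f73_0",
            "merged_191120100001_B02f163_0",
        ],
        "diffused": [61.53, 9.87, 0.00],
        "junk": [0.00, 0.03, 0.00],
        "non_nuclear": [38.44, 69.43, 0.00],
        "nuclear": [0.03, 20.67, 99.99],
    }
)

# Every column except the ID holds a class probability (in percent).
class_cols = [c for c in demo.columns if c != "image_ID"]
demo["top_class"] = demo[class_cols].idxmax(axis=1)  # name of the highest-probability class
demo["top_prob"] = demo[class_cols].max(axis=1)      # that class's probability
print(demo[["image_ID", "top_class", "top_prob"]])
```

Applied to `df_cls`, the same three lines would label every cell; rows with a low `top_prob` (for example the near tie between `non_nuclear` and `nuclear` in row 7) may deserve manual review.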


Save the dataframe to a CSV file.

In [None]:
# change the file name as needed; index=False omits the row index column
df_cls.to_csv('classification_results.csv', index=False)