# Session 7 - More Difficult Classifiers
*Goal: Learn how to classify cells in more difficult cases requiring new measurements and feature selection*

### 1. Preparation 
1. Load your project from yesterday with the CD45 classifier. If you did not finish the classifier, you can download and open "Proj1 - Session 7" from the Backup Projects folder. This has all of the cells, measurements, annotations, and classifiers we made yesterday. 

2. Immediately save a copy of the project. We will delete some cells and the classifier training annotations and want to keep the option of recalling them later.

3. In each image, clear the training data. 
  1. Delete the CD45 and Ignore* training annotations: select the first, hold <kbd>Shift</kbd>, select the last, then press <kbd>Delete</kbd>. 
  2. Delete the CD45- cells, which have no class (called "None" in the class list). In the class list in the Annotations tab, right click on None > Select objects by classification, then press <kbd>Delete</kbd>.

<img src="Images/DeleteTraining.gif">

4. The tumor annotation will be useful later, but for now it is visually distracting. To hide it without deleting it, select the Tumor class in the Annotations tab and press <kbd>Space Bar</kbd>. 

### 2. Training classifiers for T cell subsets
We now have a project with the full tissue region, the tumor region, and the CD45-bright leukocytes segmented. From here we are going to separate the CD45+ cells into subsets. 

1. First, let's add the measurements we will need for today's classifiers. We're eventually going to classify based on CD4, CD8, PD1, and FoxP3. Let's add those measurements to the cells, but not just mean intensity: include StDev, Min/Max, and [Haralick features](https://en.wikipedia.org/wiki/Co-occurrence_matrix). 
  1. `Objects > Select > Select detections > Select cells`
  2. `Analyze > Calculate features > Add intensity features`
  3. Select these options and then click <kbd>Run</kbd> and then <kbd>OK</kbd> in the next popup:
  
  <img src="Images/TCellMeasurements.PNG">
2. You now have ~73 measurements for each cell. Save the file!
3. Create a display setting that shows CD4 and CD8 in contrasting colors. Save it. 
4. Add 2 new classes to your Annotations tab: 'CD4' and 'CD8'. `Right click > Add/remove > Add class`. 
  <img src="Images/AddClass.PNG">
4. If your display is still overwhelming, I recommend you show only cell boundaries. Right click anywhere on the image, then choose `Cells > Cell boundaries only`.
5. [Annotate examples](./Session%206%20-%20Classifying%20Cells%20pt1.ipynb#3.-Finding-Leukocytes---Training-an-ML-Object-Classifier) of CD4, CD8, and double negative cells. 
  - For the negative cells, **use the 'Ignore*' class**
  - Use the brush or [Points tool](https://qupath.readthedocs.io/en/0.5/docs/starting/cell_counting.html#clicking-cells) to annotate. 
  - Particularly focus on cells that neighbor different cell types. These are the most challenging for the classifier and require the most human-annotated example data!
  - Annotations only matter if they are over segmented objects. If a cell was deleted during Preparation (section 1), it is gone now and *cannot* be classified or used for training. 
6. When you have a good number of examples of each class (CD4, CD8, and Ignore*), open `Classify > Object classification > Train object classifier`. Click <kbd>Live update</kbd>. 
7. View the feature weights. 
  1. Turn on weight calculation by clicking <kbd>Edit</kbd> next to the "Random Trees (RTrees)" dropdown. Then check <kbd>Calculate variable importance</kbd> and click <kbd>OK</kbd>.
    <img src="Images/RTEdit.PNG">
    
  2. Show the log
   <img src="Images/ShowLog.PNG">
  3. The second-to-last line in the Log will be the list of feature weights. Click on it to view them in the bottom window. 
   <img src="Images/FeatureWeights.PNG">
   The features at the top of the list (largest numbers) are being weighted most heavily. Those at the bottom are the least useful. We want the classifier to focus on the meaningful features, and not on those that are mostly random, noisy, irrelevant, or biasing. <br><br>
8. Remove any features that are distracting the classifier. In this case, that is all measurements using the PD1, FoxP3, and S100a channels. 
  1. In the "Train object classifier" window, in the Features dropdown, select "Selected measurements" then click <kbd>Select</kbd>
     <img src="Images/SelectFeatures.PNG">
  2. Click <kbd>Select all</kbd>
  3. Type "PD1" in the search bar. Click <kbd>Select None</kbd>
  4. Repeat for FoxP3 and S100a. 
  5. **You MUST clear the search bar** when you're done. After you clear the search bar, click <kbd>Apply</kbd>
      <img src="Images/ChooseFeatures.gif">
9. In the log, there will be a new feature weight list. The relevant features should now be weighted more heavily. This is a good "sanity check" on your classifier.
  - Try removing various features from the classifier to see how it affects the results. Is your accuracy better or worse with the Haralick features? 
10. Iterate until you're mostly satisfied with the CD4 and CD8 calls. Then, save the classifier as "CD4_CD8" 
  - *In a real project, you should train on multiple images.* Today, for time's sake, we will focus on a single image. To review how to include multiple sets of training data, see [here](./Session%206%20-%20Classifying%20Cells%20pt1.ipynb#4.-Adding-measurements-to-cells-in-multiple-images). 
11. Save the file!
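
The feature selection in step 8 amounts to filtering measurement names by channel. The Python sketch below illustrates that logic only; it is not QuPath code and does not call any QuPath API. The measurement names are made-up examples that follow the naming pattern used in this tutorial, and the case-insensitive substring match is an assumption about how the search bar behaves.

```python
# Hypothetical measurement names, modeled on the naming pattern in this tutorial.
measurements = [
    "ROI: 0.33 µm per pixel: CD4: Mean",
    "ROI: 0.33 µm per pixel: CD8: Mean",
    "ROI: 0.33 µm per pixel: PD1: Mean",
    "ROI: 0.33 µm per pixel: FoxP3: Mean",
    "ROI: 0.33 µm per pixel: S100a: Mean",
    "ROI: 0.33 µm per pixel: CD4: Max",
]

# Channels whose measurements distract the CD4/CD8 classifier (from step 8).
EXCLUDED_CHANNELS = ("PD1", "FoxP3", "S100a")


def select_features(names, excluded=EXCLUDED_CHANNELS):
    """Return the names that mention none of the excluded channels.

    Matching is a case-insensitive substring test (an assumption).
    """
    lowered = [channel.lower() for channel in excluded]
    return [
        name for name in names
        if not any(channel in name.lower() for channel in lowered)
    ]


kept = select_features(measurements)
for name in kept:
    print(name)
```

Only the CD4 and CD8 measurements remain in `kept`, which mirrors what the feature list should look like after you click <kbd>Apply</kbd>.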


### 3. Classify on T cell activation state markers
#### PD1
1. Save the file again, just because. Then, duplicate the image, including the data file, and name it "CD4_CD8 training"
2. Reset the image to begin training a new classifier
  1. Delete the training annotations. (This time you cannot use the "Select objects by classification" function, because it would also select the classified cells and you would end up deleting them. Instead, just select the CD4, CD8, and Ignore* annotations in the Annotations tab and press <kbd>Delete</kbd>.) Save the file. 
  2. Close the "Train object classifier" window. 
  3. Turn off the CD4 and CD8 channels in the display settings. Turn on the PD1 channel and set the display settings such that you can see dim cells. 
  4. Create a PD1 class in the Annotations tab. 
3. Repeat the above steps to train a PD1 classifier. Annotate examples of PD1 and Ignore* cells, select features, iterate. 
  - Many cells are dim. YOU must decide what is "positive enough" to count. QuPath can learn a consistent set of rules from your training examples, but it cannot decide for you what is real signal. 
4. Save the classifier as "PD1". 
<br>

#### FoxP3
5. Save the image file. Then duplicate it and name it "PD1 training". 
6. Reset the image as in Step 2 above. 
7. Make a FoxP3 class and turn on the FoxP3 channel. 
8. Create a [single measurement classifier](./Session%206%20-%20Classifying%20Cells%20pt1.ipynb#2.-Finding-and-removing-anuclear-cells---Single-Measurement-Classifier) using the FoxP3 mean intensity ("ROI: 0.33 µm per pixel: FoxP3: Mean")
  1. If you select "FoxP3" in the channel filter, QuPath will automatically fill in the class name (if it exists), classifier name, and suggest the mean intensity measurement.
  <img src="Images/ChannelAutocomplete.gif">
  2. Adjust the threshold until you separate the FoxP3+ cells from the negative cells. 
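
Conceptually, a single measurement classifier is one cutoff applied to one measurement. The Python sketch below shows that idea under stated assumptions: the cell IDs and intensity values are invented for illustration, the function name is hypothetical, and treating a value equal to the threshold as positive is a choice made for this sketch rather than documented QuPath behavior.

```python
# Measurement name quoted in step 8 above.
FOXP3_MEAN = "ROI: 0.33 µm per pixel: FoxP3: Mean"


def classify_single_measurement(cells, measurement, threshold,
                                above_class, below_class=None):
    """Assign above_class to cells at or above the threshold, else below_class.

    Cells that lack the measurement are left unclassified (None).
    """
    calls = {}
    for cell_id, values in cells.items():
        value = values.get(measurement)
        if value is None:
            calls[cell_id] = None
        elif value >= threshold:
            calls[cell_id] = above_class
        else:
            calls[cell_id] = below_class
    return calls


# Hypothetical cells with made-up intensities, for illustration only.
cells = {
    "cell_1": {FOXP3_MEAN: 2.1},
    "cell_2": {FOXP3_MEAN: 14.8},
    "cell_3": {FOXP3_MEAN: 6.0},
}

calls = classify_single_measurement(
    cells, FOXP3_MEAN, threshold=6.0, above_class="FoxP3"
)
print(calls)
```

Raising `threshold` to 10.0 in this sketch would leave only `cell_2` classified as FoxP3, which is the same trade-off you are making when you drag the threshold in the dialog.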


  
We now have 3 separate classifiers: one to identify CD4 or CD8 cells, one to identify PD1+/- cells, and one to identify FoxP3+/- cells. We'll combine these next. 