# MOTHER-DB project

Image segmentation of ovarian follicles, part of https://mother-db.org/

**Program 1, Train Image Generation.ipynb, Version 5_3,  March 7, 2025:**<BR>
Creation of sub-images from a set of annotated histology images and files of annotations.

## About this project
This project uses AI/ML techniques to segment histology images from the ovaries of nonhuman primates. Specifically, this suite of programs attempts to identify the following six follicle types: 
 1. Primordial
 1. Transitional Primordial
 1. Primary
 1. Transitional Primary
 1. Secondary
 1. Multilayer
 
The follicle type definitions are based on the recommendations of the NICHD-Sponsored Ovarian Nomenclature 
Workshop committee for primates.
 1. Yano Maher JC, Zelinski MB, Oktay KH, Duncan FE, Segars JH, Lujan ME, Lou H, Yun B, Hanfling SN, Schwartz LE, 
Laronda MM, Halvorson LM, O'Neill KE, Gomez-Lobo V. Fertil Steril. 2024 Nov 14:S0015-0282(24)02394-X. 
doi: 10.1016/j.fertnstert.2024.11.016. Epub ahead of print. PMID: 39549739. https://pubmed.ncbi.nlm.nih.gov/39549739/  
 
## About this program module

This program takes as input a set of ovarian histology slides along with annotation files and creates sub-images of 
each annotation. In addition, the program creates "augmentations" of the images by rotating, flipping and offsetting 
the images. The output is a set of folders, one for each follicle type, containing the sub-images. 

In addition, this program creates a set of "negative" sub-images, which are random samples of the original image but 
not too near an existing annotation. The minimal allowed distance between a randomly selected negative image 
and an existing image is set to be 1/4 of the sub-image width (50 pixels for sub-images that are 200 pixels wide). 
The number of negatives generated is based on the number of actual annotations, as defined by the parameter 
`negatives_per_annot`. A "blank" image is defined as any sub-image that has a standard deviation of the pixels in 
the sub-image that is less than a threshold. Depending on the amount of non-tissue area in the original slide, this 
process can generate an excessive number of blank images. To counter this, the number of blank negatives is reduced. 
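
As a minimal sketch of the blank-image test described above: the helper names below are hypothetical stand-ins (the notebook's own `ImageMeanSD` function is not shown in this section), and the 0.02 threshold on the sd/mean ratio is taken from the `random_image_generation` code further down.

```python
import numpy as np

def image_mean_sd(sub_image):
    """Return (mean, sd, sd/mean) over all pixel values of a sub-image."""
    pixels = np.asarray(sub_image, dtype=float)
    mean = float(pixels.mean())
    sd = float(pixels.std())
    ratio = sd / mean if mean > 0 else 0.0  # avoid dividing by zero for an all-black image
    return mean, sd, ratio

def is_blank(sub_image, threshold=0.02):
    """Treat a sub-image as blank when its sd/mean ratio is below the threshold."""
    _, _, ratio = image_mean_sd(sub_image)
    return ratio < threshold

# Uniform slide background versus a noisy stand-in for tissue
background = np.full((200, 200, 3), 240, dtype=np.uint8)
rng = np.random.default_rng(0)
tissue = rng.integers(0, 256, size=(200, 200, 3), dtype=np.uint8)
```

With these inputs, `is_blank(background)` is true and `is_blank(tissue)` is false, so only the background sample would be a candidate for thinning.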

The output sub-image filenames include the information necessary to trace a particular sub-image back to the 
original annotation and histology slide. 

## About the data

The data consists of paired image and annotation files. For example, the image file `14736_UN_050a.ome.tif` is paired with 
the annotations file `14736_UN_050a.annotations.txt`. Note that all the image/annotation file pairs are 
in the same directory.
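
The pairing convention can be illustrated with a short sketch. The function name `find_pairs` is hypothetical and not part of the notebook, which instead reads its pairs from `parameters.py`. The sketch assumes only the `.ome.tif` / `.annotations.txt` naming shown above.

```python
import glob
import os

def find_pairs(data_dir):
    """Return (image, annotations) path pairs; images without annotations are skipped."""
    pairs = []
    for image_path in sorted(glob.glob(os.path.join(data_dir, "*.ome.tif"))):
        stem = image_path[: -len(".ome.tif")]
        annotation_path = stem + ".annotations.txt"
        if os.path.exists(annotation_path):
            pairs.append((image_path, annotation_path))
    return pairs
```

A listing like this may be handy for checking that every slide in the data directory has an annotations file before editing `parameters.py`.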

The path to the paired images, along with other information such as image resolution, is described in the `parameters.py` file.
 

## About the MOTHER project and MOTHER-DB

The Multispecies Ovary Tissue Histology Electronic Repository (MOTHER) provides public access to digitized microscopic images of ovary tissues along with information that ensures image integrity and quality. Currently, there is no electronic repository of ovary histology slides that preserves these valuable research collections for future generations. MOTHER is a web-accessible, open resource for scientists, educators, and the public to stimulate collaboration and scientific research. Educators may use the slide images in a range of courses from reproductive biology to teaching computerized image analysis.

Biology is increasingly dependent upon quantitative data analysis, and MOTHER should inspire computational thinking in biology broadly, while developing specific skills in microscopy, computer programming, and data and image analysis.

## License For Use

This work is licensed under CC BY-NC-SA 4.0. To view a copy of this license, 
visit https://creativecommons.org/licenses/by-nc-sa/4.0/


## Funding

MOTHER-DB and this project were funded by:
 * Grant “CIBR Multispecies Ovary Tissue Histology Electronic Repository (MOTHER)” from the National Science Foundation (NSF DBI-2054061, 2021 – 2024). 
 * Indiana University, Faculty Assistance in Data Science (FADS) Project
 * Arizona State University

## Contributors
Many people have contributed to this project:

 * Code development
   * James Sluka, Indiana University
   * Karen Watanabe, Arizona State University
   * Riley Israels, Arizona State University
   * Parth Ravindra Rao, Indiana University 
   * Param Nagda, Indiana University
   * Colette Lund, Arizona State University

 * Training data creation
   * Mary Zelinski, Oregon National Primate Research Center
   * Karen Watanabe, Arizona State University
   * Numerous Arizona State undergraduate and graduate students
 
## Program notes

<div class="alert alert-block alert-danger">
<b>Special Note:</b> <br>
The annotations from QuPath and the MOTHER annotators in the text files give annotation centers in micrometers 
from the origin and **not** in pixels from the origin.
</div>
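
As a small worked example of the conversion implied by this note: the `data` function divides each centroid in micrometers by the slide's resolution (0.690 for the example slides, interpreted here as micrometers per pixel) and truncates to an integer. The helper name below is hypothetical.

```python
def microns_to_pixels(value_um, resolution_um_per_pixel):
    """Convert a coordinate in micrometers to a whole-pixel index (truncating)."""
    return int(value_um / resolution_um_per_pixel)

resolution = 0.690  # micrometers per pixel, as listed for the example slides
col = microns_to_pixels(2643.0, resolution)  # x coordinate -> image column
row = microns_to_pixels(8070.0, resolution)  # y coordinate -> image row
print(col, row)
```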

### An input file 'filename' produces the following set of augmentation image files (16 files per annotation)
For one of the annotations in the original image file `21930_LT_060a.ome`, of `Primordial` type, with center 
(in microns, not pixels) of `x2643_y8070` and sub-image window 200 pixels wide:

<pre style="padding: 0px; margin: 0px;" >

21930_LT_060a.ome_Primordial_x2643_y8070_w200orig_ang0.png           # original image, centroid is 2643,8070 um
21930_LT_060a.ome_Primordial_x2643_y8070_w200orig_ang180.png         # rotated 180 degrees (CCW)
21930_LT_060a.ome_Primordial_x2643_y8070_w200orig_ang270.png         # rotated 270 degrees (CCW)
21930_LT_060a.ome_Primordial_x2643_y8070_w200orig_ang90.png          # rotated  90 degrees (CCW)
21930_LT_060a.ome_Primordial_x2643_y8070_w200orig_horiz_ang0.png     # original image, flipped horizontally
21930_LT_060a.ome_Primordial_x2643_y8070_w200orig_horiz_ang180.png   # ... then rotated 180 degrees (CCW)
21930_LT_060a.ome_Primordial_x2643_y8070_w200orig_horiz_ang270.png
21930_LT_060a.ome_Primordial_x2643_y8070_w200orig_horiz_ang90.png

21930_LT_060a.ome_Primordial_x2643_y8070_w200offs-07-05_ang0.png    # image offset by -7,-5 pixels (randomly selected)
21930_LT_060a.ome_Primordial_x2643_y8070_w200offs-07-05_ang180.png  # ... then rotated 180 (CCW)
21930_LT_060a.ome_Primordial_x2643_y8070_w200offs-07-05_ang270.png
21930_LT_060a.ome_Primordial_x2643_y8070_w200offs-07-05_ang90.png
21930_LT_060a.ome_Primordial_x2643_y8070_w200offs-07-05_horiz_ang0.png    # image offset, then flipped horizontally
21930_LT_060a.ome_Primordial_x2643_y8070_w200offs-07-05_horiz_ang180.png  # ... then rotated 180 (CCW)
21930_LT_060a.ome_Primordial_x2643_y8070_w200offs-07-05_horiz_ang270.png
21930_LT_060a.ome_Primordial_x2643_y8070_w200offs-07-05_horiz_ang90.png

</pre>
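
Because the filename encodes the slide, follicle type, center, window width and augmentation, it can be parsed back into its parts. The sketch below is not part of the notebook. It assumes the naming pattern listed above, the `_flippedRGB` / `_grayscale` suffixes written by the code, and that each offset occupies exactly three characters (as produced by `zfill(3)` for offsets smaller than 100 pixels).

```python
import re

NAME_PATTERN = re.compile(
    r"(?P<slide>.+)_(?P<type>[^_]+)"
    r"_x(?P<x>\d+)_y(?P<y>\d+)_w(?P<width>\d+)"
    r"(?P<tag>orig|offs)"
    r"(?:(?P<dx>[-\d]\d{2})(?P<dy>[-\d]\d{2}))?"   # e.g. "-07-05" or "007005"
    r"(?P<horiz>_horiz)?_ang(?P<angle>\d+)"
    r"(?P<variant>_flippedRGB|_grayscale)?\.png"
)

def parse_subimage_name(filename):
    """Split a sub-image filename into its components (hypothetical helper)."""
    match = NAME_PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"unrecognized sub-image name: {filename}")
    offset = None
    if match["dx"] is not None:
        offset = (int(match["dx"]), int(match["dy"]))
    return {
        "slide": match["slide"],
        "type": match["type"],
        "x": int(match["x"]),
        "y": int(match["y"]),
        "width": int(match["width"]),
        "tag": match["tag"],
        "offset": offset,
        "flipped": match["horiz"] is not None,
        "angle": int(match["angle"]),
        "variant": (match["variant"] or "").lstrip("_") or None,
    }
```

For example, parsing `21930_LT_060a.ome_Primordial_x2643_y8070_w200offs-07-05_horiz_ang180.png` gives the slide `21930_LT_060a.ome`, type `Primordial`, offset `(-7, -5)`, a horizontal flip and an angle of 180.
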
#### In addition, "Negative" files are created with similar naming conventions.
    
The file name contains "Negative" in place of a follicle type. The number of negatives generated is based on the 
number of actual annotations. This is set by the parameter `negatives_per_annot`, which is typically set to 4. Each
Negative image is also augmented in the same way the regular annotations are.

#### Other options for generating augmentations:

##### make_flipped_rgb

The option `make_flipped_rgb`, if set to `True`, generates an additional set of 16 images per annotation in which the 
red and blue color channels of the sub-image are swapped. The resulting filenames end with `_flippedRGB`, for example:

0381_RT_200c.ome_Multilayer_x9508_y6001_w200orig_ang0_flippedRGB.png


##### make_grayscale

The option `make_grayscale`, if set to `True`, generates an additional set of 16 images per annotation in which the 
sub-image has been converted to grayscale. The resulting filenames end with `_grayscale`, for example:
    
30381_RT_200c.ome_Multilayer_x9508_y6001_w200orig_ang0_grayscale.png
    

If both `make_flipped_rgb` and `make_grayscale` options are selected then a total of 48 images are created for 
each annotation. 
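
The image counts quoted above follow from simple multiplication, sketched here with a hypothetical helper. It assumes one random offset per annotation and the three rotation angles listed in `parameters.angles`.

```python
def images_per_annotation(angles=(90, 180, 270), n_offsets=1,
                          make_flipped_rgb=False, make_grayscale=False):
    """Count the sub-images written for a single annotation."""
    centers = 1 + n_offsets                  # original center plus random offsets
    orientations = 2 * (1 + len(angles))     # unflipped and flipped, each at 0 degrees plus the listed angles
    color_variants = 1 + int(make_flipped_rgb) + int(make_grayscale)
    return centers * orientations * color_variants

print(images_per_annotation())                                             # 16
print(images_per_annotation(make_flipped_rgb=True, make_grayscale=True))   # 48
```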

The `parameters.py` file contains the paths to folders containing the histology images and annotations. In addition, this file contains the image resolution for each file.

The `parameters.py` file includes a dictionary, **types**, keyed by the names of the different follicle types. It also 
defines a list named **file_list** that holds one dictionary per slide. Each dictionary gives the path to the image file, 
the path to the annotation file, and the resolution used to convert the x and y coordinates to pixel values.
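
For orientation, a `parameters.py` file might look roughly like the sketch below. The key names (`window_size`, `offset`, `coordinates`, `Image path`, `Annotation path`, `Resolution`) are the ones the notebook code reads. The numeric values for `window_size` and `offset` are illustrative assumptions, and the real file also defines helpers such as `prediction_follicle` that are not shown here.

```python
file_base = "./TrainingData_20250409/"

# One entry per follicle type; values here are illustrative only
types = {
    "Primordial": {"window_size": 200, "offset": 25, "coordinates": []},
    "Primary": {"window_size": 200, "offset": 25, "coordinates": []},
}

# One dictionary per slide: image path, annotation path, resolution
file_list = [
    {
        "Image path": file_base + "14736_UN_050a.ome.tif",
        "Annotation path": file_base + "14736_UN_050a.annotations.txt",
        "Resolution": 0.690,
    },
]

# Rotation angles (degrees) applied in addition to the unrotated image
angles = [90, 180, 270]
```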

### Outputs
This program creates a time-stamped folder (e.g., `Train Images_2025-03-08_23-41-39`). The 
`Train Images_YYYY-MM-DD_hh-mm-ss` folder will contain subfolders for each of the follicle types being processed. 
Within those folders will be the sub-images. This folder is located in the same folder as this code.

In addition, a text file (`work_log.txt`) summarizing the process, a file that contains the path to the project notes
folder (`work_log_filename.txt`), and an HTML version of the final state of this Jupyter notebook are also created.
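
The time-stamped folder name can be produced with `datetime` formatting. This is a sketch with a hypothetical helper name. The notebook's own folder-creation code is not shown in this section and may differ.

```python
import datetime

def work_folder_name(now=None, prefix="Train Images"):
    """Build a folder name of the form 'Train Images_YYYY-MM-DD_hh-mm-ss'."""
    now = now or datetime.datetime.now()
    return f"{prefix}_{now:%Y-%m-%d_%H-%M-%S}"

print(work_folder_name(datetime.datetime(2025, 3, 8, 23, 41, 39)))
```

Using hyphens instead of colons in the time keeps the name valid on Windows file systems.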

### Changes:

<div class="alert alert-block alert-danger">
<b>Special Note:</b> <br>
The sub-window size is defined for each follicle type in the parameters.py file. This should work well for 
single class classifiers ("hotdog-not-a-hotdog"). But for <b>multi-class classifiers it is likely that all 
the sub-window sizes should be the same.</b> This can be done in a project-specific parameters.py file.
</div>

## Imports

In [1]:
import pandas as pd
import numpy as np
import cv2  ###  opencv-python image handling
import os
import glob
import random
import imutils  ### additional image handling tools
import shutil
import time
import datetime
import sys
# Install conda package for ipylab in the current Jupyter kernel
### !conda install --yes --prefix {sys.prefix} -c conda-forge ipylab
from ipylab import JupyterFrontEnd

import parameters  ### parameters.py contains info for this particular MOTHER ML segmentation project

C:\Users\jsluka\OneDrive - Indiana University\Desktop\Work\Watanabe ovary 2021\MOTHER\AIML_code_for_Paper\Program 1
file_base:
 ./TrainingData_20250409/


In [2]:
make_flipped_rgb = False  # False
make_grayscale = False  # False
negatives_per_annot = 4

### Echo some info from the parameters.py file 

In [3]:
print(dir(parameters))
print('\nparameters.angles:\n',parameters.angles)
print('\nparameters.file_list:')  
print("%2s %-50s %-50s %s" % ("#","Image path","Annotation path","Resolution"))
for iii in range(len(parameters.file_list)):
    print("%2i %-50s %-50s   %5.3f" \
          % (iii+1,parameters.file_list[iii]["Image path"],parameters.file_list[iii]["Annotation path"],parameters.file_list[iii]["Resolution"],))
print('\nparameters.types:')    
for k in parameters.types:
    print('%25s  %50s' % (k,parameters.types[k]))

['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'angles', 'create_folder', 'datetime', 'file_base', 'file_list', 'os', 'prediction_follicle', 'types']

parameters.angles:
 [90, 180, 270]

parameters.file_list:
 # Image path                                         Annotation path                                    Resolution
 1 ./TrainingData_20250409/14736_UN_050a.ome.tif      ./TrainingData_20250409/14736_UN_050a.annotations.txt   0.690
 2 ./TrainingData_20250409/16418_UN_140b.ome.tif      ./TrainingData_20250409/16418_UN_140b.annotations.txt   0.690
 3 ./TrainingData_20250409/19006_UN_020a.ome.tif      ./TrainingData_20250409/19006_UN_020a.annotations.txt   0.690
 4 ./TrainingData_20250409/25058_LT_005a.ome.tif      ./TrainingData_20250409/25058_LT_005a.annotations.txt   0.690
 5 ./TrainingData_20250409/25065_LT_010a.ome.tif      ./TrainingData_20250409/25065_LT_010a.annotations.txt   0.690
 6 ./TrainingData_20250409/25081_LT

## Inputs
Enter the follicle type that will be processed (for "hotdog-not-a-hotdog" single classifiers).<br>

__Note 1: This may be overridden by a setting in the parameters.py file. In particular, when doing multi-class classification the list of follicle types is defined in the parameters.py file.__

__Note 2: You can enter any follicle type if you are doing a multiclassifier. But the sub-image `window_size` and `offset` will be based on the follicle type entered here. For the 6 small follicle types enter _"Primordial"_ or just hit enter to accept the default.__

In [4]:
follicle_type = parameters.prediction_follicle()  # this is a function in the parameters.py file
follicle_type2 = follicle_type
if follicle_type == "Multiple":
    follicle_type = "Primary"

Choose a follicle type from:
    dict_keys(['Primordial', 'Transitional Primordial', 'Primary', 'Transitional Primary', 'Secondary', 'Multilayer']) 
or "Multiple"
Enter the type of follicle for the model[default=Multiple]:Primary

prediction_follicle type =  Primary


## Functions
The **offset_images** function shifts the center of a follicle sub-image by x and y pixel offsets. Each offset has a random sign and a random magnitude from 1 to *offset_length* (typically 1 to 25). The function creates *n* such random offsets and appends the new centers to the list of centroids.

In [5]:
def offset_images(n, centroids, img, offset_length):
    """Append n randomly offset copies of centroids[0], a (row, col) tuple, to centroids."""
    i = 0
    while i < n:
        # Each offset is 1..offset_length pixels with a random sign (5_2: choice replaces randrange)
        x = random.choice((-1,1))*random.randint(1,int(offset_length))
        y = random.choice((-1,1))*random.randint(1,int(offset_length))
        # Keep the offset only if the new center is still inside the full image (5_2)
        if 0 < centroids[0][0]+y < img.shape[0] and 0 < centroids[0][1]+x < img.shape[1]:
            centroids.append((centroids[0][0]+y, centroids[0][1]+x))
            i += 1

    return centroids

The **random_image_generation** function picks the center of one random ("Negative") sub-image that is not near any annotated follicle. 
* The center of the random image must be more than *minOffsetExistingAnnot* pixels, in x or in y, from the center of every annotated follicle sub-image. 
  * ONLY the distances to annotations used for this particular classifier need to be checked. 
  * There is no need to check the distance to, say, the Corpus Luteum.
* No image center can be closer than *minOffsetImageEdge* to the edge of the full image.
* Each call returns ONE image center.

Input and output
* *img* is the full image
* *df* is the dataframe containing the coordinates of all the annotations
* *width* is the sub-image width in pixels
* returns the (row, column) coordinates of an acceptable random image for use as a negative, or (0, 0) if no acceptable location is found after about 1000 attempts.

___The minimum distance from the center of the random image to any existing annotation is 50 pixels for a 200-pixel sub-window (the code below uses one quarter of the sub-window width). This might be too close for the larger follicles like antral. Perhaps the minimum distance should be a larger fraction of the follicle's sub-window size. For example, Primordial sub-windows are 100 pixels wide, so a 50 pixel offset is pretty big. But Antral sub-windows are 1800 pixels wide, so 50 pixels is too small. So perhaps instead use a minimum distance that is 50% of the sub-window size for the follicle type?___
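
The suggestion in the note above, scaling the minimum distance with the sub-window size, could be sketched as follows. Both function names and the default `fraction` of 0.5 are assumptions for illustration only. The acceptance test mirrors the square (per-axis) check used by `random_image_generation`.

```python
def min_annotation_distance(window_size, fraction=0.5):
    """Minimum allowed center-to-center distance, as a fraction of the window size."""
    return int(window_size * fraction)

def is_valid_negative_center(row, col, annotation_centers, min_distance):
    """Accept (row, col) unless it lies within min_distance of an annotation in BOTH x and y.

    annotation_centers is an iterable of (x, y) pixel coordinates.
    """
    for x, y in annotation_centers:
        if abs(col - x) <= min_distance and abs(row - y) <= min_distance:
            return False
    return True

print(min_annotation_distance(100))    # 50
print(min_annotation_distance(1800))   # 900
```

With a 50% fraction, a 100-pixel Primordial window keeps the current 50-pixel spacing, while an 1800-pixel Antral window would require 900 pixels.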

In [6]:
def random_image_generation(img, df, width):
    minOffsetImageEdge = int(width / 2) + int(width * 0.1) # was 100 in 5_1, no sub-image center can be closer than this to the edge of the full image
    minOffsetExistingAnnot = int(width / 4) #+ int(width * 0.1) # no sub-image center can be closer than this to the center of an annotation

    total_annots = df[['Centroid X pixels', 'Centroid Y pixels']].dropna().to_numpy()
    
    valid = False
    fails = 0
    
    while not valid:
        random_row = np.random.randint(minOffsetImageEdge, img.shape[0] - minOffsetImageEdge)
        random_col = np.random.randint(minOffsetImageEdge, img.shape[1] - minOffsetImageEdge)

        point_compare = []
        
        for annot in total_annots:
            point_compare.append([not (annot[0] - minOffsetExistingAnnot <= random_col <= annot[0] + minOffsetExistingAnnot), not (annot[1] - minOffsetExistingAnnot <= random_row <= annot[1] + minOffsetExistingAnnot)])

        point_compare = np.array(point_compare)

        valid = point_compare.any(axis = 1).all()
        
        (_, _, ratio) = ImageMeanSD(img[random_row - int(width / 2):random_row + int(width / 2), random_col - int(width / 2):random_col + int(width / 2), :])
        
        #Retry 95% of blank images
        if ratio < 0.02 and np.random.random() > 0.05:
            valid = False
        
        fails += 1
        if fails > 1000:
            # Give up after too many attempts; (0, 0) signals that no location was found
            return 0, 0
    
    return random_row, random_col

The **data** function is the main function, where all the sub-images are generated. It reads the image and annotation file from `file_list` and converts the x and y coordinates from micrometers to pixel values. In addition to the folder for each follicle type, it creates a folder called `Negative` that holds the random images generated by the `random_image_generation` function. For each annotation, the x,y coordinates from the annotation file become the center of a square sub-image (e.g., 150x150 pixels; the actual size is defined in the `parameters.py` file). Offset images are generated by shifting the sub-image by a random number of pixels, as described above. 

Along with the original sub-image and offset images, random images are also generated. The random images are stored in the `Negative` folder and the rest of the images are stored in the folder of the follicle type of the original sub-image. For every image a horizontal flip is made, and each sub-image and its horizontal flip are rotated by 90, 180 and 270 degrees. The number of random images generated for each input annotation is set by `negatives_per_annot`.
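
The flip-and-rotate scheme can be illustrated compactly with NumPy. This is a sketch, not the notebook's implementation, which uses `cv2.flip` and `imutils.rotate`. It assumes square sub-images and rotation angles that are multiples of 90 degrees, where `np.rot90` (counter-clockwise) is expected to give the same result.

```python
import numpy as np

def augment(sub_image, angles=(90, 180, 270)):
    """Return a dict mapping filename suffixes to the 8 flipped/rotated variants."""
    flipped = np.fliplr(sub_image)  # horizontal flip (mirror left-right)
    variants = {}
    for label, base in (("", sub_image), ("_horiz", flipped)):
        variants[f"{label}_ang0"] = base
        for angle in angles:
            # np.rot90 rotates counter-clockwise in 90-degree steps
            variants[f"{label}_ang{angle}"] = np.rot90(base, k=angle // 90)
    return variants

demo = np.arange(12).reshape(2, 2, 3)  # tiny 2x2 "image" with 3 channels
print(sorted(augment(demo)))
```

The dictionary keys match the filename suffixes used for the saved sub-images, such as `_ang90` and `_horiz_ang270`.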

In [7]:
def data(im_path, ann, conv_ratio, width, n=1):
    img = cv2.imread(im_path)
    print("      image dimensions:",img.shape)
    df = pd.read_csv(ann, sep='\t')

    #Grab only follicles in parameters.types from annotations file
    df = df[df['Class'].isin(parameters.types.keys())].reset_index(drop = True)
    
    df = df.drop(['Image','Parent','Class','Num points'],axis=1)
    df['Centroid X pixels'] = df['Centroid X µm'] / conv_ratio
    df['Centroid Y pixels'] = df['Centroid Y µm'] / conv_ratio
    img_name = im_path.split('.tif')[0]
    img_name = os.path.basename(img_name) # needed since we have a path to the original images and annotations
    print("      lines in annotations=",len(df))
    
    if 'Negative' not in os.listdir(Work_Folder):
        path_random = os.path.join(Work_Folder,'Negative')
        os.mkdir(path_random)

    subImageFails = 0
    subImageFailsList = []

    #negative_centers = []
    
    for i in range(len(df)):
        if df.loc[i]['Name'] in parameters.types:  # is the annotation for one of the named follicle types?
            if df.loc[i]['Name'] not in os.listdir(Work_Folder): # create folder for this follicle type if it does not already exist
                path = os.path.join(Work_Folder,df.loc[i]['Name'])
                os.mkdir(path)
                print("         new dir:",path)

            col = int(df.loc[i]['Centroid X pixels'])
            row = int(df.loc[i]['Centroid Y pixels'])
            centroids = [(row,col)]

            path = os.path.join(Work_Folder,df.loc[i]['Name'], img_name)
            path_random = os.path.join(Work_Folder,'Negative', img_name)

            # update the centroids with the random offsets etc.
            centroids = offset_images(1, centroids, img, offset_length = parameters.types[follicle_type]['offset'])

            for j in range(len(centroids)):
                if    (parameters.types[follicle_type]['window_size']//2 < centroids[j][0] < len(img)    - parameters.types[follicle_type]['window_size']//2) \
                  and (parameters.types[follicle_type]['window_size']//2 < centroids[j][1] < len(img[0]) - parameters.types[follicle_type]['window_size']//2):
                    parameters.types[df.loc[i]['Name']]['coordinates'].append(centroids[j])
                    if j == 0:
                        tag = 'orig'
                        subimg = img[centroids[j][0]-int(width/2):centroids[j][0]+int(width/2), centroids[j][1]-int(width/2):centroids[j][1]+int(width/2), :]
                        img_h = cv2.flip(subimg,1)
                        fpath = path + '_' + df.loc[i]['Name'] + '_x' + str(col) + '_y' + str(row) + '_w' + str(width) + tag
                        cv2.imwrite(fpath + '_ang0.png',subimg)
                        cv2.imwrite(fpath + '_horiz_ang0.png',img_h)

                        if make_flipped_rgb:
                            #Flip RGB values and save -RI
                            cv2.imwrite(fpath + '_ang0_flippedRGB.png', cv2.cvtColor(subimg, cv2.COLOR_BGR2RGB))
                            cv2.imwrite(fpath + '_horiz_ang0_flippedRGB.png', cv2.cvtColor(img_h, cv2.COLOR_BGR2RGB))

                        if make_grayscale:
                            #Grayscale images and save -RI
                            cv2.imwrite(fpath + '_ang0_grayscale.png', cv2.cvtColor(cv2.cvtColor(subimg, cv2.COLOR_BGR2GRAY), cv2.COLOR_GRAY2BGR))
                            cv2.imwrite(fpath + '_horiz_ang0_grayscale.png', cv2.cvtColor(cv2.cvtColor(img_h, cv2.COLOR_BGR2GRAY), cv2.COLOR_GRAY2BGR))
                            
                        for k in parameters.angles:
                            cv2.imwrite(fpath + '_ang' + str(k) + '.png',imutils.rotate(subimg,angle=k))
                            cv2.imwrite(fpath + '_horiz_ang' + str(k) + '.png',imutils.rotate(img_h,angle=k))

                            if make_flipped_rgb:
                                #Flip RGB values and save -RI
                                cv2.imwrite(fpath + '_ang' + str(k) + '_flippedRGB.png', cv2.cvtColor(imutils.rotate(subimg,angle=k), cv2.COLOR_BGR2RGB))
                                cv2.imwrite(fpath + '_horiz_ang' + str(k) + '_flippedRGB.png', cv2.cvtColor(imutils.rotate(img_h,angle=k), cv2.COLOR_BGR2RGB))

                            if make_grayscale:
                                #Grayscale images and save -RI
                                cv2.imwrite(fpath + '_ang' + str(k) + '_grayscale.png', cv2.cvtColor(cv2.cvtColor(imutils.rotate(subimg,angle=k), cv2.COLOR_BGR2GRAY), cv2.COLOR_GRAY2BGR))
                                cv2.imwrite(fpath + '_horiz_ang' + str(k) + '_grayscale.png', cv2.cvtColor(cv2.cvtColor(imutils.rotate(img_h,angle=k), cv2.COLOR_BGR2GRAY), cv2.COLOR_GRAY2BGR))
                    
                    else:
                        tag = 'offs'
                        #print('\t\tcent off centroids[j]:',centroids[j])
                        offset_valsStr = str(centroids[0][1]-centroids[j][1]).zfill(3) + str(centroids[0][0]-centroids[j][0]).zfill(3)  # 5_2 the x and y offsets padded with leading zeros
                        #print('\t\toffset_valsStr:',offset_valsStr)
                        subimg = img[centroids[j][0]-int(width/2):centroids[j][0]+int(width/2), centroids[j][1]-int(width/2):centroids[j][1]+int(width/2), :]
                        img_h = cv2.flip(subimg,1)
                        fpath = path + '_' + df.loc[i]['Name'] + '_x' + str(col) + '_y' + str(row) + '_w' + str(width)+ tag
                        cv2.imwrite(fpath + offset_valsStr + '_ang0.png',subimg)
                        cv2.imwrite(fpath + offset_valsStr + '_horiz_ang0.png',img_h)

                        if make_flipped_rgb:
                            #Flip RGB values and save -RI
                            cv2.imwrite(fpath + offset_valsStr + '_ang0_flippedRGB.png', cv2.cvtColor(subimg, cv2.COLOR_BGR2RGB))
                            cv2.imwrite(fpath + offset_valsStr + '_horiz_ang0_flippedRGB.png', cv2.cvtColor(img_h, cv2.COLOR_BGR2RGB))

                        if make_grayscale:
                            #Grayscale images and save -RI
                            cv2.imwrite(fpath + offset_valsStr + '_ang0_grayscale.png', cv2.cvtColor(cv2.cvtColor(subimg, cv2.COLOR_BGR2GRAY), cv2.COLOR_GRAY2BGR))
                            cv2.imwrite(fpath + offset_valsStr + '_horiz_ang0_grayscale.png', cv2.cvtColor(cv2.cvtColor(img_h, cv2.COLOR_BGR2GRAY), cv2.COLOR_GRAY2BGR))
                            
                        for k in parameters.angles:
                            cv2.imwrite(fpath + offset_valsStr + '_ang' + str(k) + '.png',imutils.rotate(subimg,angle=k))
                            cv2.imwrite(fpath + offset_valsStr + '_horiz_ang' + str(k) + '.png',imutils.rotate(img_h,angle=k))

                            if make_flipped_rgb:
                                #Flip RGB values and save -RI
                                cv2.imwrite(fpath + offset_valsStr + '_ang' + str(k) + '_flippedRGB.png', cv2.cvtColor(imutils.rotate(subimg,angle=k), cv2.COLOR_BGR2RGB))
                                cv2.imwrite(fpath + offset_valsStr + '_horiz_ang' + str(k) + '_flippedRGB.png', cv2.cvtColor(imutils.rotate(img_h,angle=k), cv2.COLOR_BGR2RGB))

                            if make_grayscale:
                                #Grayscale images and save -RI
                                cv2.imwrite(fpath + offset_valsStr + '_ang' + str(k) + '_grayscale.png', cv2.cvtColor(cv2.cvtColor(imutils.rotate(subimg,angle=k), cv2.COLOR_BGR2GRAY), cv2.COLOR_GRAY2BGR))
                                cv2.imwrite(fpath + offset_valsStr + '_horiz_ang' + str(k) + '_grayscale.png', cv2.cvtColor(cv2.cvtColor(imutils.rotate(img_h,angle=k), cv2.COLOR_BGR2GRAY), cv2.COLOR_GRAY2BGR))
                    
                else: # annotation is too close to edge of fullimage to make a subwindow    
                    print('\tAnnotation too close to image edge:   ',centroids[j])
                    subImageFails += 1
                    subImageFailsList.append(centroids[j])
                    
                    
            # generate random images ('Negatives')
            for randomCount in range(negatives_per_annot):  # generate this many randoms for each real annotation
                # random_row, random_col = random_image_generation(img, df, n) # version 5_0
                random_row, random_col = random_image_generation(img, df, width) # version 5_1        
                subimg = img[random_row-int(width/2):random_row+int(width/2), random_col-int(width/2):random_col+int(width/2), :]

                #negative_centers.append([random_row, random_col])
                
                # is this an image of just the slide background?
                #(mean,sd,ratio) = ImageMeanSD(subimg) #Attempting to handle in random_image_generation()
                # accept all images with ratio of sd/mean > 0.1, but only 10% with sd/mean <= 0.1 (omitting some blank images)
                #if ratio > 0.01 or np.random.random() <= 0.10:
                if True: #Placeholder to prevent major formatting changes
                    tag = 'orig'  # 5_2
                    img_h = cv2.flip(subimg,1)
                    # version 5_1: changed 'row' and 'col' in the code below to 'random_row' and 'random_col'.  JPS
                    cv2.imwrite(path_random + '_' + 'Negative' + '_x' + str(random_col) + '_y' + str(random_row) + '_w' + str(width)+ tag + '_ang0.png',subimg)
                    cv2.imwrite(path_random + '_' + 'Negative' + '_x' + str(random_col) + '_y' + str(random_row) + '_w' + str(width)+ tag + '_horiz_ang0.png',img_h)
                    
                    if make_flipped_rgb:
                        #Flip RGB values and save -RI
                        cv2.imwrite(path_random + '_' + 'Negative' + '_x' + str(random_col) + '_y' + str(random_row) + '_w' + str(width)+ tag + '_ang0_flippedRGB.png', cv2.cvtColor(subimg, cv2.COLOR_BGR2RGB))
                        cv2.imwrite(path_random + '_' + 'Negative' + '_x' + str(random_col) + '_y' + str(random_row) + '_w' + str(width)+ tag + '_horiz_ang0_flippedRGB.png', cv2.cvtColor(img_h, cv2.COLOR_BGR2RGB))

                    if make_grayscale:
                        #Grayscale images and save -RI
                        cv2.imwrite(path_random + '_' + 'Negative' + '_x' + str(random_col) + '_y' + str(random_row) + '_w' + str(width)+ tag + '_ang0_grayscale.png', cv2.cvtColor(cv2.cvtColor(subimg, cv2.COLOR_BGR2GRAY), cv2.COLOR_GRAY2BGR))
                        cv2.imwrite(path_random + '_' + 'Negative' + '_x' + str(random_col) + '_y' + str(random_row) + '_w' + str(width)+ tag + '_horiz_ang0_grayscale.png', cv2.cvtColor(cv2.cvtColor(img_h, cv2.COLOR_BGR2GRAY), cv2.COLOR_GRAY2BGR))
                        
                    for j in parameters.angles:
                        cv2.imwrite(path_random + '_' + 'Negative' + '_x' + str(random_col) + '_y' + str(random_row) + '_w' + str(width) + tag + '_ang' + str(j) + '.png',imutils.rotate(subimg,angle=j))
                        cv2.imwrite(path_random + '_' + 'Negative' + '_x' + str(random_col) + '_y' + str(random_row) + '_w' + str(width) + tag + '_horiz_ang' + str(j) + '.png',imutils.rotate(img_h,angle=j))

                        if make_flipped_rgb:
                            #Flip RGB values and save -RI
                            cv2.imwrite(path_random + '_' + 'Negative' + '_x' + str(random_col) + '_y' + str(random_row) + '_w' + str(width) + tag + '_ang' + str(j) + '_flippedRGB.png', cv2.cvtColor(imutils.rotate(subimg,angle=j), cv2.COLOR_BGR2RGB))
                            cv2.imwrite(path_random + '_' + 'Negative' + '_x' + str(random_col) + '_y' + str(random_row) + '_w' + str(width) + tag + '_horiz_ang' + str(j) + '_flippedRGB.png', cv2.cvtColor(imutils.rotate(img_h,angle=j), cv2.COLOR_BGR2RGB))

                        if make_grayscale:
                            #Grayscale images and save -RI
                            cv2.imwrite(path_random + '_' + 'Negative' + '_x' + str(random_col) + '_y' + str(random_row) + '_w' + str(width) + tag + '_ang' + str(j) + '_grayscale.png', cv2.cvtColor(cv2.cvtColor(imutils.rotate(subimg,angle=j), cv2.COLOR_BGR2GRAY), cv2.COLOR_GRAY2BGR))
                            cv2.imwrite(path_random + '_' + 'Negative' + '_x' + str(random_col) + '_y' + str(random_row) + '_w' + str(width) + tag + '_horiz_ang' + str(j) + '_grayscale.png', cv2.cvtColor(cv2.cvtColor(imutils.rotate(img_h,angle=j), cv2.COLOR_BGR2GRAY), cv2.COLOR_GRAY2BGR))
                            
                    
                    # 5_2 added in the missing random offsets for the negative images
                    tag = 'offs'  # 5_2
                    # The random offsets for the negatives are generated here; this should
                    # match the code that does random offsets for the regular annotations.
                    offset_length = parameters.types[follicle_type]['offset']
                    x = random.choice((-1,1))*random.randint(1,int(offset_length))  # 5_2: choice is easier to understand than randrange
                    y = random.choice((-1,1))*random.randint(1,int(offset_length))  # 5_2: choice is easier to understand than randrange
                
                    # random_row, random_col are the centroids for the original negative image
                    offset_valsStr = str(x).zfill(3) + str(y).zfill(3)  # 5_2 the x and y offsets padded with leading zeros
                    subimg = img[random_row-int(width/2)-y:random_row+int(width/2)-y, random_col-int(width/2)-x:random_col+int(width/2)-x, :]
                    img_h = cv2.flip(subimg,1)
                    #####################
                    #print('\t\t path_random:',path_random)
                    #print("df.loc[i]['Name'],random_row,random_col,width,offset_valsStr,img.shape:",df.loc[i]['Name'],random_row,random_col,width,offset_valsStr,img.shape)
                    #####################
                    fpath = path_random + '_' + 'Negative' + '_x' + str(random_col) + '_y' + str(random_row) + '_w' + str(width) + tag
                    cv2.imwrite(fpath+ offset_valsStr + '_ang0.png',subimg)
                    cv2.imwrite(fpath+ offset_valsStr + '_horiz_ang0.png',img_h)
                    
                    if make_flipped_rgb:
                        #Flip RGB values and save -RI
                        cv2.imwrite(fpath+ offset_valsStr + '_ang0_flippedRGB.png', cv2.cvtColor(subimg, cv2.COLOR_BGR2RGB))
                        cv2.imwrite(fpath+ offset_valsStr + '_horiz_ang0_flippedRGB.png', cv2.cvtColor(img_h, cv2.COLOR_BGR2RGB))

                    if make_grayscale:
                        #Grayscale images and save -RI
                        cv2.imwrite(fpath+ offset_valsStr + '_ang0_grayscale.png', cv2.cvtColor(cv2.cvtColor(subimg, cv2.COLOR_BGR2GRAY), cv2.COLOR_GRAY2BGR))
                        cv2.imwrite(fpath+ offset_valsStr + '_horiz_ang0_grayscale.png', cv2.cvtColor(cv2.cvtColor(img_h, cv2.COLOR_BGR2GRAY), cv2.COLOR_GRAY2BGR))
                        
                    for j in parameters.angles:
                        cv2.imwrite(fpath + offset_valsStr + '_ang' + str(j) + '.png',imutils.rotate(subimg,angle=j))
                        cv2.imwrite(fpath + offset_valsStr + '_horiz_ang' + str(j) + '.png',imutils.rotate(img_h,angle=j))

                        if make_flipped_rgb:
                            #Flip RGB values and save -RI
                            cv2.imwrite(fpath + offset_valsStr + '_ang' + str(j) + '_flippedRGB.png', cv2.cvtColor(imutils.rotate(subimg,angle=j), cv2.COLOR_BGR2RGB))
                            cv2.imwrite(fpath + offset_valsStr + '_horiz_ang' + str(j) + '_flippedRGB.png', cv2.cvtColor(imutils.rotate(img_h,angle=j), cv2.COLOR_BGR2RGB))

                        if make_grayscale:
                            #Grayscale images and save -RI
                            cv2.imwrite(fpath + offset_valsStr + '_ang' + str(j) + '_grayscale.png', cv2.cvtColor(cv2.cvtColor(imutils.rotate(subimg,angle=j), cv2.COLOR_BGR2GRAY), cv2.COLOR_GRAY2BGR))
                            cv2.imwrite(fpath + offset_valsStr + '_horiz_ang' + str(j) + '_grayscale.png', cv2.cvtColor(cv2.cvtColor(imutils.rotate(img_h,angle=j), cv2.COLOR_BGR2GRAY), cv2.COLOR_GRAY2BGR))
                    
    print('\tTotal subImageFails:',subImageFails,'\n\t',subImageFailsList,'\n')
    
    #negative_centers = np.array(negative_centers)
    #np.save(img_name + "_negatives.npy", negative_centers)
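
The random-offset step for the negatives can be isolated into a few small helpers. This is a minimal sketch, not the notebook's code: the function names `random_offset`, `offset_tag`, `offset_window` and `window_in_bounds` are hypothetical, and the bounds check is an added assumption about what a caller might want, since a negative start index in a NumPy slice counts from the end of the array rather than raising an error.

```python
import random

def random_offset(offset_length, rng=random):
    """Return a nonzero signed offset with magnitude in [1, offset_length]."""
    return rng.choice((-1, 1)) * rng.randint(1, int(offset_length))

def offset_tag(x, y):
    """Zero-pad each offset to three characters, e.g. 7 -> '007', -7 -> '-07'."""
    return str(x).zfill(3) + str(y).zfill(3)

def offset_window(row, col, width, x, y):
    """Return (top, bottom, left, right) of the shifted window.

    Same arithmetic as the slice used for the offset negatives above.
    """
    half = int(width / 2)
    return (row - half - y, row + half - y, col - half - x, col + half - x)

def window_in_bounds(shape, window):
    """True if the window lies entirely inside an image of the given shape."""
    top, bottom, left, right = window
    return 0 <= top and bottom <= shape[0] and 0 <= left and right <= shape[1]

rng = random.Random(0)  # seeded only so the example is repeatable
x = random_offset(20, rng)
y = random_offset(20, rng)
window = offset_window(500, 800, 200, x, y)
print(x, y, offset_tag(x, y), window, window_in_bounds((7403, 6024, 3), window))
```

Keeping the window arithmetic in one place would make it easier to confirm that negatives and regular annotations are offset the same way.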

### Function to calculate the mean and standard deviation of an image

This is used to identify sub-images that are pure slide background.<br>
The nxnx3 image array is collapsed into a vector of length (n*n*3), and we calculate the mean and standard deviation of that vector.<br>
Input: a _subimage_ from the __data__ routine.<br>
Output: mean, standard deviation, and the ratio (standard deviation)/mean<br>
A reasonable cutoff for an all-background image is a ratio <= 0.01

In [8]:
def ImageMeanSD(subimage):
    length = subimage.shape[0]*subimage.shape[1]*subimage.shape[2]
    imgVector = subimage.reshape((length))
    mean = np.mean(imgVector)
    sd = np.std(imgVector)
    ratio = round(sd/mean,4)
    return(mean,sd,ratio)
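
As a rough illustration of how the ratio might be used to flag background-only sub-images, the sketch below wraps the same calculation in a hypothetical `is_blank` helper. The helper name, the guard against a zero mean, and the synthetic test arrays are assumptions made for this example; only the (standard deviation)/mean ratio and the 0.01 cutoff come from the description above.

```python
import numpy as np

def image_mean_sd(subimage):
    """Mean, standard deviation and sd/mean ratio over all pixels and channels."""
    flat = subimage.reshape(-1).astype(np.float64)
    mean = float(np.mean(flat))
    sd = float(np.std(flat))
    # Assumption: treat an all-zero image as blank instead of dividing by zero.
    ratio = round(sd / mean, 4) if mean > 0 else 0.0
    return mean, sd, ratio

def is_blank(subimage, cutoff=0.01):
    """Hypothetical helper: True when the sd/mean ratio is at or below the cutoff."""
    return image_mean_sd(subimage)[2] <= cutoff

# Synthetic stand-ins for real sub-images (not data from the project).
background = np.full((200, 200, 3), 230, dtype=np.uint8)   # uniform slide background
rng = np.random.default_rng(0)
textured = rng.integers(0, 256, size=(200, 200, 3), dtype=np.uint8)

print(is_blank(background), is_blank(textured))
```

A perfectly uniform image has a standard deviation of zero, so its ratio is zero and it falls below any positive cutoff.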

## Some final setup

Create an output folder (`Train Images_YYYY-MM-DD_hh-mm-ss`) to hold the created sub-images.

In [9]:
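# Create the output dir. The delete-then-recreate logic below is disabled
# (wrapped in a string literal) because the folder name is timestamped.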
path = os.getcwd()
#path = os.path.join(path,'Train Images')
Work_Folder = "Train Images_" + datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
print('Writing to the output folder:',Work_Folder)
path = os.path.join(path,Work_Folder)
'''
if not os.path.exists(path):
    os.mkdir(path)
    print("created new output dir:\n",path) 
else:
    #os.rmdir(path)
    shutil.rmtree(path)
    os.mkdir(path)
    print("output dir already exists, deleting then creating an empty dir:\n ",path)
'''
os.mkdir(path)
print(path)

Writing to the output folder: Train Images_2025-04-09_13-08-20
C:\Users\jsluka\OneDrive - Indiana University\Desktop\Work\Watanabe ovary 2021\MOTHER\AIML_code_for_Paper\Program 1\Train Images_2025-04-09_13-08-20


## Do the main processing
Process every file in the file list and every follicle annotation in each file, creating the sub-images.

In [10]:
print('\nThis may take a while, wait for the "Done" message ...\n')

for k, entry in enumerate(parameters.file_list, start=1):
    print("Doing file",k,"of",len(parameters.file_list),
          "\n  ",entry["Image path"],
          "\n  ",entry["Annotation path"],
          "\n   Conv_ratio:",entry["Resolution"])
    data(entry["Image path"], entry["Annotation path"], entry["Resolution"],
         width = parameters.types[follicle_type]['window_size'])
print("\nDone  :)")


This may take a while, wait for the "Done" message ...

Doing file 1 of 18 
   ./TrainingData_20250409/14736_UN_050a.ome.tif 
   ./TrainingData_20250409/14736_UN_050a.annotations.txt 
   Conv_ratio: 0.69
      image dimensions: (7403, 6024, 3)
      lines in annotations= 75
         new dir: Train Images_2025-04-09_13-08-20\Transitional Primordial
         new dir: Train Images_2025-04-09_13-08-20\Primary
         new dir: Train Images_2025-04-09_13-08-20\Multilayer
         new dir: Train Images_2025-04-09_13-08-20\Primordial
         new dir: Train Images_2025-04-09_13-08-20\Transitional Primary
         new dir: Train Images_2025-04-09_13-08-20\Secondary
	Total subImageFails: 0 
	 [] 

Doing file 2 of 18 
   ./TrainingData_20250409/16418_UN_140b.ome.tif 
   ./TrainingData_20250409/16418_UN_140b.annotations.txt 
   Conv_ratio: 0.69
      image dimensions: (9128, 14464, 3)
      lines in annotations= 71
	Total subImageFails: 0 
	 [] 

Doing file 3 of 18 
   ./TrainingData_20250409/19

## Finish up by getting counts by type and creating some auxiliary folders and files

In [11]:
# collect some statistics and save to the log file
print('Work_Folder:\n',path)
Dir_List = glob.glob(path+'/*')

Total_Images = 0
Total_Images_NN = 0
aText = "Follicle type = "+follicle_type2+"\n"
aText += 'Output to:\n'+os.path.join(os.getcwd(),"")+"\n\n"
aText += 'Counts are number of original annotations plus augmentations. \n'
aText += 'To get original number of annotations, omit "Negative" and then divide by 16, or 32 or 48\n'
aText += 'depending on what augmentations are being used.\n\n'

for aDir in Dir_List:
    file_list = glob.glob(os.path.join(aDir,'*.png'))
    Total_Images += len(file_list)
    if os.path.split(aDir)[1] != 'Negative':
        Total_Images_NN += len(file_list)
    aText += ('%6i   %-30s ".%s%s" ' % (len(file_list),os.path.split(aDir)[1],os.sep,aDir))+"\n"
    
aText += '\nTotal images = '+str(Total_Images)+'\n'
aText += 'Total images w/o "Negatives" = '+str(Total_Images_NN)+'\n'
# Note: the divisor 16 assumes no extra augmentations; use 32 or 48 when they are enabled.
aText += 'Total annotations = '+str(Total_Images_NN/16)+'\n'
print(aText)

Work_Folder:
 C:\Users\jsluka\OneDrive - Indiana University\Desktop\Work\Watanabe ovary 2021\MOTHER\AIML_code_for_Paper\Program 1\Train Images_2025-04-09_13-08-20
Follicle type = Primary
Output to:
C:\Users\jsluka\OneDrive - Indiana University\Desktop\Work\Watanabe ovary 2021\MOTHER\AIML_code_for_Paper\Program 1\

Counts are number of original annotations plus augmentations. 
To get original number of annotations, omit "Negative" and then divide by 16, or 32 or 48
depending on what augmentations are being used.

  2752   Multilayer                     ".\C:\Users\jsluka\OneDrive - Indiana University\Desktop\Work\Watanabe ovary 2021\MOTHER\AIML_code_for_Paper\Program 1\Train Images_2025-04-09_13-08-20\Multilayer" 
497536   Negative                       ".\C:\Users\jsluka\OneDrive - Indiana University\Desktop\Work\Watanabe ovary 2021\MOTHER\AIML_code_for_Paper\Program 1\Train Images_2025-04-09_13-08-20\Negative" 
  3760   Primary                        ".\C:\Users\jsluka\OneDrive - Indi
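
The divisors 16, 32 and 48 mentioned in the log can be reproduced from the augmentation steps. The sketch below is an illustration under stated assumptions: that `parameters.angles` holds three nonzero rotation angles (not shown in this section), and that regular annotations are augmented the same way as the negatives in `data`. The function names are hypothetical.

```python
def images_per_annotation(n_angles=3, make_flipped_rgb=False, make_grayscale=False):
    """Number of image files written for one annotation."""
    placements = 2            # original position plus one random offset
    flips = 2                 # as-is plus horizontal flip
    rotations = 1 + n_angles  # angle 0 plus each additional rotation angle
    color_variants = 1 + int(make_flipped_rgb) + int(make_grayscale)
    return placements * flips * rotations * color_variants

def original_annotations(image_count, **augment_flags):
    """Recover the annotation count from a per-folder image count."""
    per_annotation = images_per_annotation(**augment_flags)
    if image_count % per_annotation:
        raise ValueError("image count is not a multiple of the augmentation factor")
    return image_count // per_annotation

print(images_per_annotation())                                            # base factor
print(images_per_annotation(make_flipped_rgb=True))                       # one extra variant
print(images_per_annotation(make_flipped_rgb=True, make_grayscale=True))  # both variants
print(original_annotations(2752))  # the Multilayer count from the log above
```

Using integer division with a remainder check avoids the fractional counts that a plain `/16` can produce when the wrong factor is applied.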

In [12]:
# Creating a work log folder
##file_name = parameters.create_folder(follicle_type)
file_name = parameters.create_folder("Project_")
print('Created Work log folder named: \'',file_name,'\'',sep='')

Created Work log folder named: 'Project__25-04-09__14_15_40'


In [13]:
# Copying the parameters file into the work log folder
shutil.copy('parameters.py', file_name + '/parameters.py')

'Project__25-04-09__14_15_40/parameters.py'

In [14]:
# write the summary statistics and other info into a log file in the work log folder
f = open(os.path.join(file_name,"work_log.txt"),"w+")
print(f.name)
f.write(file_name+"\n")
f.write(aText)
f.close()

Project__25-04-09__14_15_40\work_log.txt


In [15]:
# Creating a text file to store the name of the work log file
f = open("work_log_filename.txt","w+")
print(f.name)
f.write(file_name)
f.close()

work_log_filename.txt


## Convert Notebook to HTML and save

Programmatically save the notebook, convert it to html, rename the .html file with a timestamp.

Make sure the NOTEBOOK_NAME and NOTEBOOK_HTML_NAME are properly defined.

In [16]:
#Needed for the next command to work, for some reason
time.sleep(1)

#Programmatically save the notebook

APP = JupyterFrontEnd() #Needed to save the notebook programmatically later, do not change.
APP.commands.execute("docmanager:save")

#Convert the notebook to html
NOTEBOOK_NAME = "Train Image Generation v5_3.ipynb" #The exact name of this notebook, including the file extension.
print('NOTEBOOK_NAME:',NOTEBOOK_NAME)
!jupyter nbconvert --to html "$NOTEBOOK_NAME"

#Rename the .html file with a timestamp
NOTEBOOK_HTML_NAME = "Train Image Generation v5_3_" + datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S") + ".html"
print('NOTEBOOK_HTML_NAME:',NOTEBOOK_HTML_NAME)
shutil.move(NOTEBOOK_NAME[:-6] + ".html",NOTEBOOK_HTML_NAME)


NOTEBOOK_NAME: Train Image Generation v5_3.ipynb
NOTEBOOK_HTML_NAME: Train Image Generation v5_3_2025-04-09_14-15-52.html


[NbConvertApp] Converting notebook Train Image Generation v5_3.ipynb to html
[NbConvertApp] Writing 732712 bytes to Train Image Generation v5_3.html


'Train Image Generation v5_3_2025-04-09_14-15-52.html'
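
The cell above derives the HTML file name by slicing six characters (`.ipynb`) off the notebook name. A slightly more defensive variant, sketched here with a hypothetical helper `html_names`, uses `os.path.splitext` so the code does not depend on the extension length. It only computes names; saving and converting the notebook are left as they are.

```python
import datetime
import os

def html_names(notebook_name, now=None):
    """Return (nbconvert output name, timestamped name) for a notebook file."""
    now = now or datetime.datetime.now()
    stem, _ext = os.path.splitext(notebook_name)   # drops ".ipynb"
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    return stem + ".html", stem + "_" + stamp + ".html"

plain, stamped = html_names("Train Image Generation v5_3.ipynb",
                            datetime.datetime(2025, 4, 9, 14, 15, 52))
print(plain)
print(stamped)
```

The two returned names could then be passed to `shutil.move`, as in the cell above.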