# Tree Detection with Enshurin Data Using Deep Learning

In this notebook we use several deep learning networks to detect trees of several species in RGB and multispectral mosaics acquired in Enshurin. We consider the following species:

- Beech
- Oak
- Birch
- Larch
- Magnolia

The data was collected by Vladislav Bukin.

## Prerequisites

- Due to the size of the networks, this code will not run on the free tier of Google Colab. You need either paid Colab or a local computer with a GPU. In the following I assume that the code runs on a local computer with a CUDA-capable GPU.
- The computer needs software to run Python and Jupyter notebooks. `Anaconda` is recommended. If you are reading this, you have most likely already solved this part.
- Apart from this, you need to create a virtual environment to run the code in. I recommend creating the environment with `Anaconda` itself and then installing packages using `pip`. At the very least you will need to install:
    - opencv (for general image handling)
    - pytorch (for general DL computations and the following models: fasterRCNN, convnextMaskRCNN, maskRCNN, FCOS, retinanet, SSD)
    - ultralytics (for YOLO)
    - transformers (for DETR)
    - several other smaller libraries required as dependencies.
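
Before going further, it can help to confirm that the packages listed above are importable in the active environment. The following is a minimal sketch, not part of the original notebook: the helper name `check_packages` is hypothetical, and the import names (`cv2`, `torch`, `ultralytics`, `transformers`) are the usual ones for these packages.

```python
import importlib.util

# Import names of the packages listed above (cv2 is the import name of opencv).
REQUIRED = ["cv2", "torch", "ultralytics", "transformers"]

def check_packages(names=REQUIRED):
    """Return a dict mapping each import name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

for name, found in check_packages().items():
    print(f"{name:15s} {'OK' if found else 'MISSING'}")
```

Anything reported as `MISSING` needs to be installed with `pip` inside the environment before the import cell below will run.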

# Getting Started

Let's start by checking whether the notebook currently has access to CUDA, as this determines whether you can run experiments on the GPU:

In [1]:
# All necessary imports, and nothing else
import configparser
import sys
import os
import time
import cv2
import torch
import numpy as np
from pathlib import Path
from itertools import product
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from random import randint

from datasets import TDDataset

from config import read_config
from imageUtils import boxesFound,read_Color_Image,read_Binary_Mask,recoupMasks, sliding_window, boxCoordsToFile
from train import train_YOLO,makeTrainYAML, get_transform, train_pytorchModel,train_DETR, train_DeformableDETR

from dataHandling import computeBBfromLIEnshurin, filterBoxesWindow, filterBoxesWindowNormalized

#from dataHandling import (buildTRVT,buildNewDataTesting,separateTrainTest, 
#                           forPytorchFromYOLO, buildTestingFromSingleFolderSakuma2, 
#                           buildTestingFromSingleFolderSakuma2NOGT,
#                           makeParamDicts, paramsDictToString)
from predict import predict_yolo, predict_pytorch

from experimentsUtils import MODULARDLExperiment



In [2]:
import torch

# checking that CUDA is available
print("IS CUDA AVAILABLE???????????????????????")
print(torch.cuda.is_available())

IS CUDA AVAILABLE???????????????????????
True
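
The result of this check is typically used to pick the device on which models and tensors are placed. A minimal sketch of that pattern follows; it assumes nothing about how the training helpers in this project choose their device, and the helper name `pick_device` is hypothetical.

```python
import importlib.util

def pick_device():
    """Return "cuda" when PyTorch can use a GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch not installed; nothing can run on the GPU
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print("Running on", device)
```

The returned string can be passed to `torch.device(...)` or to `.to(device)` calls.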


# Data preprocessing:

Make sure that the data, in this case the files `Enshurin_2024_08_22_60m_ORTHO_СUT.tif` and `label_image_INT8.png`, are in the same folder. In the following cells we define the path to that folder (relative to the folder that contains the notebook) and call a function to turn the mosaic into a series of tiles.

In [8]:
def visualize_bounding_boxes(
    mosaic, boxes, num_slices=50, slice_size=(1024, 1024), save_to_disk=True
):
    """
    Visualize sliding-window slices of the mosaic with bounding boxes.

    Slices are:
      • extracted in raster-scan order (top→bottom, left→right)
      • always slice_size (unless touching image border)
      • black slices are skipped (not counted toward num_slices)
      • saved images include drawn bounding boxes
      • Matplotlib visualization also shows bounding boxes

    Parameters:
    -----------
    mosaic : np.ndarray (H,W,3)
        The full RGB image.
    boxes : list of (x, y, w, h, category)
    num_slices : int
        Number of slices to *save/visualize* (after skipping black ones)
    slice_size : (H, W)
        Desired slice size
    save_to_disk : bool
        Save slices as PNGs
    """
    import matplotlib.pyplot as plt
    import matplotlib.patches as patches
    import numpy as np
    import cv2
    import matplotlib

    H, W = mosaic.shape[:2]
    sh, sw = slice_size
    saved = 0

    # Category → color
    colors = {
        1: "red",
        2: "blue",
        3: "green",
        4: "yellow",
        5: "cyan",
    }

    # matplotlib figure for visualization
    fig, axes = plt.subplots(1, num_slices, figsize=(4 * num_slices, 4))
    if num_slices == 1:
        axes = [axes]

    ax_index = 0  # which subplot to draw into

    # Slide in raster-scan order
    for y0 in range(0, H, sh):
        for x0 in range(0, W, sw):

            if saved >= num_slices:
                break

            y1 = min(y0 + sh, H)
            x1 = min(x0 + sw, W)

            slice_img = mosaic[y0:y1, x0:x1]

            # Skip black slices
            if np.sum(slice_img) == 0:
                continue

            # Draw slice in matplotlib
            ax = axes[ax_index]
            ax.imshow(slice_img)
            ax.set_title(f"Slice {saved+1}\n({x0}, {y0})")
            ax.axis("off")

            # Copy for saving (so we draw boxes onto the saved image too)
            save_img = slice_img.copy()

            # Draw bounding boxes (both in Matplotlib and saved image)
            for (px, py, bw, bh, category) in boxes:

                # Check intersection with slice
                if (px < x1 and px + bw > x0 and py < y1 and py + bh > y0):

                    rx = int(px - x0)
                    ry = int(py - y0)
                    bw = int(bw)
                    bh = int(bh)

                    # Skip boxes that end before the slice starts (defensive:
                    # the intersection test above already excludes them)
                    if rx + bw < 0 or ry + bh < 0:
                        continue

                    color = colors.get(category, "white")

                    # ----- Matplotlib rectangle -----
                    rect = patches.Rectangle(
                        (rx, ry), bw, bh, linewidth=2, edgecolor=color, facecolor="none"
                    )
                    ax.add_patch(rect)
                    ax.text(
                        rx,
                        ry - 5,
                        str(category),
                        color=color,
                        fontsize=10,
                        fontweight="bold",
                        bbox=dict(
                            boxstyle="round,pad=0.3", facecolor="black", alpha=0.5
                        ),
                    )

                    # ----- DRAW ON SAVED IMAGE (OpenCV) -----
                    # save_img is still in RGB order here (it is converted to
                    # BGR only when written), so the color must be RGB too
                    rgb = matplotlib.colors.to_rgb(color)  # floats 0–1
                    rgb255 = tuple(int(255 * c) for c in rgb)

                    cv2.rectangle(
                        save_img, (rx, ry), (rx + bw, ry + bh), rgb255, 2
                    )
                    cv2.putText(
                        save_img,
                        str(category),
                        (rx, max(ry - 5, 0)),
                        cv2.FONT_HERSHEY_SIMPLEX,
                        0.6,
                        rgb255,
                        2,
                    )

            # ----- SAVE PNG -----
            if save_to_disk:
                outname = f"slice{saved+1}.png"
                cv2.imwrite(outname, cv2.cvtColor(save_img, cv2.COLOR_RGB2BGR))
                print(f"Saved {outname}")

            saved += 1
            ax_index += 1

        if saved >= num_slices:
            break

    plt.tight_layout()
    plt.show()
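
The function above passes every intersecting box to the drawing calls at its full size, so a box that straddles a slice border keeps coordinates outside the slice. If clipped boxes are preferred, a sketch like the following could be used. `clip_box_to_slice` is a hypothetical helper, and it assumes the same `(x, y, w, h)` pixel convention, with `(x, y)` as the top-left corner in mosaic coordinates.

```python
def clip_box_to_slice(px, py, bw, bh, x0, y0, x1, y1):
    """Clip a mosaic-space box (px, py, bw, bh) to the slice [x0, x1) x [y0, y1).

    Returns (rx, ry, rw, rh) in slice-local coordinates, or None when the
    box does not overlap the slice.
    """
    left, top = max(px, x0), max(py, y0)
    right, bottom = min(px + bw, x1), min(py + bh, y1)
    if right <= left or bottom <= top:
        return None
    return (left - x0, top - y0, right - left, bottom - top)

# A box that starts 10 px left of a slice whose origin is (1024, 0)
print(clip_box_to_slice(1014, 50, 40, 30, 1024, 0, 2048, 1024))  # → (0, 50, 30, 30)
```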


In [9]:
def downsample_images(mosaic_file, label_file, scale_factor=0.5):
    """
    Downsample mosaic and label images to reduce their size.
    
    Parameters:
    -----------
    mosaic_file : str
        Path to the mosaic image file
    label_file : str
        Path to the label image file
    scale_factor : float
        Scaling factor (0.5 = half size, 0.25 = quarter size, etc.)
    
    Returns:
    --------
    mosaic_downsampled : numpy.ndarray
        Downsampled mosaic image
    label_downsampled : numpy.ndarray
        Downsampled label image
    """
    # Read images
    mosaic = read_Color_Image(mosaic_file)
    label = cv2.imread(label_file, cv2.IMREAD_UNCHANGED)  # Read as-is to preserve labels
    
    # Calculate new dimensions
    new_width = int(mosaic.shape[1] * scale_factor)
    new_height = int(mosaic.shape[0] * scale_factor)
    new_size = (new_width, new_height)
    
    print(f"Original size: {mosaic.shape[1]}x{mosaic.shape[0]}")
    print(f"New size: {new_width}x{new_height}")
    
    # Downsample mosaic using bilinear interpolation (smooth for RGB images)
    mosaic_downsampled = cv2.resize(mosaic, new_size, interpolation=cv2.INTER_LINEAR)
    
    # Downsample label image using nearest neighbor (preserves discrete label values)
    label_downsampled = cv2.resize(label, new_size, interpolation=cv2.INTER_NEAREST)
    
    return mosaic_downsampled, label_downsampled
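
The choice of nearest-neighbour interpolation for the label image matters because label values are class identifiers, not intensities. The toy example below (NumPy only, not the OpenCV call used above) contrasts picking one source pixel per output pixel with averaging each 2×2 block, which is roughly what a smoothing interpolation does when downsampling by a factor of 2. The label values are made up for illustration.

```python
import numpy as np

# Tiny label image with class ids 1, 3 and 5 (values chosen for illustration)
label = np.array([[1, 3, 3, 5],
                  [1, 3, 3, 5]], dtype=np.uint8)

# Nearest-neighbour style: keep one source pixel per 2x2 block
nearest = label[::2, ::2]

# Averaging style: mean of each 2x2 block
h, w = label.shape
averaged = label.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

print(nearest.tolist())   # → [[1, 3]]
print(averaged.tolist())  # → [[2.0, 4.0]]
```

The averaged result contains the ids 2 and 4, which do not occur in the input and would be read as different classes.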

In [None]:
# Vlad's data

dataFolder = "./Data/Vlad"
labelImageFile = os.path.join(dataFolder,"label_image_INT8.png")
mosaicFile = os.path.join(dataFolder,"Enshurin_2024_08_22_60m_ORTHO_СUT.tif")

# Write the converted images into dataFolder, where the tiling cell further down looks for them
mosaic_half, label_half = downsample_images(mosaicFile, labelImageFile, scale_factor=0.5)
cv2.imwrite(os.path.join(dataFolder, 'mosaic_half.png'), cv2.cvtColor(mosaic_half, cv2.COLOR_RGB2BGR))
cv2.imwrite(os.path.join(dataFolder, 'label_half.png'), label_half)

# scale_factor=1 keeps the original resolution and only converts the files to PNG
mosaic_full, label_full = downsample_images(mosaicFile, labelImageFile, scale_factor=1)
cv2.imwrite(os.path.join(dataFolder, 'mosaicPNG.png'), cv2.cvtColor(mosaic_full, cv2.COLOR_RGB2BGR))
cv2.imwrite(os.path.join(dataFolder, 'labelPNG.png'), label_full)



In [3]:
def prepareData(lIMfile, mosFile, outFolderRoot, trainPerc, doYOLO, slice, verbose = True):
    """
        Given a mosaic and a label image, create bounding boxes for the label image
        then tile mosaic and boxes and store in a folder
        if trainPerc != 0 then divide into train and test folders
    """
    labelIM = cv2.imread(lIMfile,cv2.IMREAD_UNCHANGED)
    # check unique labels read    
    #print(np.unique(labelIM))
    #print(labelIM.shape)

    mosaic = read_Color_Image(mosFile)
    boxes, boxesNorm = computeBBfromLIEnshurin(labelIM)
    # Visualize bounding boxes
    #if verbose:
    #    visualize_bounding_boxes(mosaic, boxes, num_slices=200, slice_size=(1024, 1024))

    # If the train percentage is 0 we write everything into one single folder;
    # otherwise, make one folder for training and one for validation
    singleFolder = (trainPerc == 0)
    
    # create output folders if they do not exist; unless in single-folder mode, create train and validation subfolders
    Path(outFolderRoot).mkdir(parents=True, exist_ok=True)
    if not singleFolder :
        Path(os.path.join(outFolderRoot,"train")).mkdir(parents=True, exist_ok=True)
        Path(os.path.join(outFolderRoot,"validation")).mkdir(parents=True, exist_ok=True)
        if doYOLO:
            Path(os.path.join(outFolderRoot,"train","images")).mkdir(parents=True, exist_ok=True)
            Path(os.path.join(outFolderRoot,"train","masks")).mkdir(parents=True, exist_ok=True)
            Path(os.path.join(outFolderRoot,"train","labels")).mkdir(parents=True, exist_ok=True)
            Path(os.path.join(outFolderRoot,"validation","images")).mkdir(parents=True, exist_ok=True)
            Path(os.path.join(outFolderRoot,"validation","masks")).mkdir(parents=True, exist_ok=True)
            Path(os.path.join(outFolderRoot,"validation","labels")).mkdir(parents=True, exist_ok=True)
    else:    
        if doYOLO:
            Path(os.path.join(outFolderRoot,"images")).mkdir(parents=True, exist_ok=True)
            Path(os.path.join(outFolderRoot,"masks")).mkdir(parents=True, exist_ok=True)
            Path(os.path.join(outFolderRoot,"labels")).mkdir(parents=True, exist_ok=True)

    # add the name of the original image to every tile as a prefix (without extension,
    # whatever its length: .tif, .tiff, .png, ...)
    outputPrefix = os.path.splitext(os.path.basename(mosFile))[0]
    
    # slice the three things and output
    wSize = (slice,slice)
    count = 0
    for (x, y, window) in sliding_window(mosaic, stepSize = int(slice*0.8), windowSize = wSize ):
        # get mask window
        if window.shape[:2] == (slice,slice) :
            labelW = labelIM[y:y + wSize[1], x:x + wSize[0]]
            boxesW = filterBoxesWindow(boxes,y,y + wSize[1], x,x + wSize[0])
            boxesWNorm = filterBoxesWindowNormalized(boxesNorm, y, y+slice, x, x+slice, full_w=mosaic.shape[1], full_h=mosaic.shape[0])

            if verbose: print(boxesW)

            # here we should probably add cleanUpMaskBlackPixels and maybe do it for YOLO too (in buildtrainvalidation?)
            if len(boxesW) > 0:
                # store them both, doing a random draw to see if they go to training or validation
                # (randint is inclusive at both ends, so <= gives exactly trainPerc percent on average)
                outFolder = (outFolderRoot if singleFolder else 
                (os.path.join(outFolderRoot,"train") if randint(1,100) <= trainPerc else os.path.join(outFolderRoot,"validation") ) )
                if verbose: print("writing tile "+str(count)+" (x="+str(x)+", y="+str(y)+") to "+str(outFolder))
                if doYOLO:
                    cv2.imwrite(os.path.join(outFolder, "images",outputPrefix+"x"+str(x)+"y"+str(y)+".png"),window)
                    cv2.imwrite(os.path.join(outFolder,"masks",outputPrefix+"x"+str(x)+"y"+str(y)+"MASK.png"),labelW)
                    boxCoordsToFile(os.path.join(outFolder,"labels",outputPrefix+"x"+str(x)+"y"+str(y)+".txt"),boxesWNorm)
                else:                    
                    cv2.imwrite(os.path.join(outFolder,outputPrefix+"x"+str(x)+"y"+str(y)+".png"),window)
                    cv2.imwrite(os.path.join(outFolder,outputPrefix+"x"+str(x)+"y"+str(y)+"Labels.tif"),labelW)
                    boxCoordsToFile(os.path.join(outFolder,outputPrefix+"x"+str(x)+"y"+str(y)+"Boxes.txt"),boxesW)
                count+=1
            else:
                if verbose: print("no boxes here")
        else:
            if verbose:  print("prepareData: non-full window, ignoring "+str(window.shape))
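
`prepareData` relies on `sliding_window` from `imageUtils`, whose code is not shown in this notebook. As an illustration of the idea only, a generator with the same call shape might look like the sketch below; the real implementation may differ. With `stepSize = int(slice*0.8)`, consecutive windows overlap by about 20%.

```python
import numpy as np

def sliding_window_sketch(image, stepSize, windowSize):
    """Yield (x, y, window) in raster-scan order.

    windowSize is (width, height). Windows at the right and bottom borders
    may be smaller than windowSize; the caller decides whether to keep them.
    """
    win_w, win_h = windowSize
    for y in range(0, image.shape[0], stepSize):
        for x in range(0, image.shape[1], stepSize):
            yield x, y, image[y:y + win_h, x:x + win_w]

# 10x10 dummy image, 5x5 windows, step 4 (80% of the window size)
image = np.zeros((10, 10, 3), dtype=np.uint8)
full = [(x, y) for x, y, w in sliding_window_sketch(image, 4, (5, 5))
        if w.shape[:2] == (5, 5)]
print(full)  # → [(0, 0), (4, 0), (0, 4), (4, 4)]
```

Because `prepareData` discards partial windows, pixels near the right and bottom borders (here, row and column 9) may not end up in any tile.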


In [None]:
dataFolder = "./Data/Vlad"
#labelImageFile = os.path.join(dataFolder,"label_image_INT8.png")
#mosaicFile = os.path.join(dataFolder,"Enshurin_2024_08_22_60m_ORTHO_СUT.tif")
labelImageFile = os.path.join(dataFolder,"label_half.png")
mosaicFile = os.path.join(dataFolder,"mosaic_half.png")
#labelImageFile = os.path.join(dataFolder,"labelPNG.png")
#mosaicFile = os.path.join(dataFolder,"mosaicPNG.png")

outputFolder = os.path.join(dataFolder,"processedData")

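# doYOLO must be passed explicitly; True (images/masks/labels layout) is an
# assumption here, use False for the flat layout
prepareData(labelImageFile, mosaicFile, outputFolder, trainPerc=80, doYOLO=True, slice=2500)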
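
`prepareData` decides per tile, with an unseeded `randint` draw, whether the tile goes to `train` or `validation`, so the split changes from run to run. If a repeatable split is wanted, one option is a dedicated seeded generator. The sketch below is hypothetical: the function name and the seed are not part of the project code.

```python
import random

def assign_split(n_tiles, train_perc, seed=0):
    """Return a list of "train"/"validation" labels, one per tile.

    A private random.Random instance keeps the draw independent of any
    other use of the random module.
    """
    rng = random.Random(seed)
    return ["train" if rng.randint(1, 100) <= train_perc else "validation"
            for _ in range(n_tiles)]

first = assign_split(1000, 80, seed=42)
second = assign_split(1000, 80, seed=42)
print(first == second)                    # same seed, same split
print(first.count("train") / len(first))  # close to 0.8
```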

# Sarangerel's data

First of all, since I am having trouble with the TIFF format, I will convert everything to PNG.

In [3]:
dataFolder = os.path.join(os.getcwd(), "Data", "NART") 

inputTrain = os.path.join(dataFolder,"train")
trainData = os.path.join(dataFolder,"processedTrain")

inputTest = os.path.join(dataFolder,"test")
testData = os.path.join(dataFolder,"processedTest")

sliceSize = 500
trainPercentage = 80

In [9]:
# not needed!
def convert_tif_to_png(folder_path):
    for root, _, files in os.walk(folder_path):
        for file in files:
            if file.lower().endswith(('.tif', '.tiff')):
                tif_path = os.path.join(root, file)
                png_path = os.path.splitext(tif_path)[0] + '.png'
                img = read_Color_Image(tif_path)
                if img is not None:
                    cv2.imwrite(png_path, img)
                    print(f"✓ {file} → {os.path.basename(png_path)}")        

In [None]:
# not needed!
# call the function to convert tif to png
convert_tif_to_png(os.path.join(dataFolder,"test_image"))
convert_tif_to_png(os.path.join(dataFolder,"test_label"))
convert_tif_to_png(os.path.join(dataFolder,"train_image"))
convert_tif_to_png(os.path.join(dataFolder,"train_label"))

In [6]:
def sliceAndBoxNartData(prefix, outputFolder, sliceSize = 500, trainPerc = 0):
    """
        Receive a path prefix with folders prefix_image and
        prefix_mask, and a slice size in pixels.
        Traverse all images in the "image" folder and
        make sure that each has a corresponding mask.
        Slice and box them all into outputFolder.
    """
    for root, _, files in os.walk(prefix+"_image"):
        for imageFile in files:
            labelImageFile = os.path.join(prefix+"_mask",imageFile)
            if os.path.isfile( labelImageFile ):
                prepareData(labelImageFile,os.path.join(prefix+"_image",imageFile),outputFolder, 0, False, sliceSize, verbose = False)
                prepareData(labelImageFile,os.path.join(prefix+"_image",imageFile),outputFolder, trainPerc, True, sliceSize, verbose = False)
        

In [7]:
# convert all of our images to masks, box files and slices
# make one folder for training YOLO
sliceAndBoxNartData(inputTrain,trainData, sliceSize = sliceSize, trainPerc = trainPercentage)
# also make one folder for testing
sliceAndBoxNartData(inputTest,testData, sliceSize = sliceSize)
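
The `labels` folders written for YOLO contain one text file per tile. Assuming the usual Ultralytics detection format (one line per box: class index, then box center x, center y, width and height, all normalized by the tile size), a pixel box could be converted as sketched below. `to_yolo_line` is a hypothetical helper, not the project's `boxCoordsToFile`, and the shift to a 0-based class index is likewise an assumption about how categories are numbered.

```python
def to_yolo_line(x, y, w, h, category, tile_w, tile_h):
    """Convert a tile-local pixel box (top-left x, y, width, height)
    into a YOLO label line with values normalized to [0, 1]."""
    xc = (x + w / 2) / tile_w
    yc = (y + h / 2) / tile_h
    cls = category - 1  # assumption: categories start at 1, YOLO classes at 0
    return f"{cls} {xc:.6f} {yc:.6f} {w / tile_w:.6f} {h / tile_h:.6f}"

# A 100x50 box with top-left corner (200, 100) in a 500x500 tile, category 2
print(to_yolo_line(200, 100, 100, 50, 2, 500, 500))
# → 1 0.500000 0.250000 0.200000 0.100000
```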

# Experiments

In [8]:
# configuration of our experiments
conf = {
    "Prep" : False,
    "Train" : True,
    "ep" : 5,
    "numClasses" : 3,
    "Train_Perc" : trainPercentage,
    "slice": sliceSize,
    "TV_dir" : os.path.join(dataFolder,"processedTrain"),
    "Train_dir" : "train",
    "Valid_dir" : "validation",
    "Pred_dir" : "YoloPredictions",
    "Test_dir" : os.path.join(dataFolder,"processedTest"),
    "Train_res": os.path.join(os.getcwd(), "YOLOResults"),
    "Valid_res": os.path.join(os.getcwd(), "YOLOResults"),
    "outTEXT": "./results.txt"
}
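
Since `MODULARDLExperiment` reads its data from the folders named in `conf`, a quick check that they exist can save a failed run. The sketch below assumes that `Train_dir` and `Valid_dir` are subfolders of `TV_dir`, which matches the layout written by `prepareData`; `missing_dirs` is a hypothetical helper and not part of `experimentsUtils`.

```python
import os

def missing_dirs(conf):
    """Return the data folders referenced by conf that do not exist."""
    candidates = [
        os.path.join(conf["TV_dir"], conf["Train_dir"]),
        os.path.join(conf["TV_dir"], conf["Valid_dir"]),
        conf["Test_dir"],
    ]
    return [d for d in candidates if not os.path.isdir(d)]

# Example with a made-up configuration pointing at folders that do not exist
example_conf = {
    "TV_dir": os.path.join("no_such_root", "processedTrain"),
    "Train_dir": "train",
    "Valid_dir": "validation",
    "Test_dir": os.path.join("no_such_root", "processedTest"),
}
print(missing_dirs(example_conf))
```

An empty list means all three folders are in place.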

In [None]:
doYolo = False
doPytorch = True
doDETR = False

yolo_params = {"scale": 0.3, "mosaic": 0.5} if doYolo else None
pytorch_params = {"modelType": "maskrcnn", "score": 0.25, "nms": 0.5, "predconf": 0.7} if doPytorch else None
detr_params = {"modelType": "DETR", "lr": 5e-6, "batch_size": 8, "predconf": 0.5, 
               "nms_iou": 0.5, "max_detections": 50, "resize": 800} if doDETR else None

# Run experiments
MODULARDLExperiment(conf, yolo_params, pytorch_params, detr_params)

Preparing datasets...

=== Running PyTorch Model Experiment ===
Parameters: {'modelType': 'maskrcnn', 'score': 0.25, 'nms': 0.5, 'predconf': 0.7}
Train dataset length: 529
Testing params {'modelType': 'maskrcnn', 'score': 0.25, 'nms': 0.5, 'predconf': 0.7} with file expmodelTypemaskrcnnscore0.25nms0.5Epochs5.pth
Inside Pytorch training Training Dataset Length 529
train again
get_model_instance_segmentation 4
cls out: 4
mask out: 4
maskrcnn
have the model
Epoch: [0]  [  0/529]  eta: 0:04:29  lr: 0.000014  loss: 3.9697 (3.9697)  loss_classifier: 1.4229 (1.4229)  loss_box_reg: 0.2936 (0.2936)  loss_mask: 1.3924 (1.3924)  loss_objectness: 0.7630 (0.7630)  loss_rpn_box_reg: 0.0977 (0.0977)  time: 0.5097  data: 0.1954  max mem: 9707
Epoch: [0]  [ 10/529]  eta: 0:03:23  lr: 0.000109  loss: 3.9772 (4.1838)  loss_classifier: 1.4125 (1.4066)  loss_box_reg: 0.1368 (0.1819)  loss_mask: 1.5058 (1.5566)  loss_objectness: 0.8426 (0.9743)  loss_rpn_box_reg: 0.0568 (0.0642)  time: 0.3920  data: 0.0179 