# Weed Detection with NART Data using Deep Learning

In this notebook we will use several deep learning networks to detect weeds of several species in RGB mosaics of data acquired in the NART research forest in Mongolia. We will consider the following species:  


The data was collected by Sarangerel Jarantaibaatar, and initial processing was carried out in collaboration with Shiful Islam.

## Prerequisites

- Due to the size of the networks, this code will not run on the free tier of Google Colab. You need either paid Colab or a local computer with a GPU. In what follows I will assume that the code runs on a local computer with a CUDA-capable GPU.
- The computer needs software to run Python and Jupyter notebooks. `Anaconda` is recommended. If you are reading this, you have most likely already solved this part.
- Apart from this, you need to create a virtual environment to run the code in. I recommend creating the environment with `Anaconda` itself and then installing packages using `pip`. At the very least you will need to install:
    - opencv (for general image handling)
    - pytorch (for general DL computations and the following models: fasterRCNN, convnextMaskRCNN, maskRCNN, FCOS, retinanet, SSD)
    - ultralytics (for YOLO)
    - transformers (for DETR)
    - several other smaller libraries that these packages depend on.
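
A quick way to see which of the packages listed above are visible from the current environment is sketched below. The import names are assumptions based on how these packages are usually imported (`cv2` for opencv, `torch` for pytorch); the list is illustrative and not the repository's full dependency list.

```python
import importlib.util

# Import names assumed for the packages listed above (opencv is imported as cv2, pytorch as torch)
REQUIRED = ["cv2", "torch", "ultralytics", "transformers"]

def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages were found.")
```

Anything reported as missing can then be installed with `pip` inside the environment.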

If you are reading this, you most likely already have the code, but in any case you should have downloaded all of it from [this repository](https://github.com/nicill/DLTreeDetection): this notebook along with all the `.py` files. Download them using git, or go to the webpage, click `Code` and then `Download ZIP`.

Once you have the folder with the code and this notebook, copy the data into it. The data is accessible [here](https://www.dropbox.com/scl/fi/b53k3eojxdrf0g0q7q305/NART.zip?rlkey=o8v9fwuqtle8l14w58awjqjmm&st=8lqk3op9&dl=0). Make a `Data` subfolder inside the folder that contains the code and decompress the `NART.zip` file into it. You will end up with a structure like Data->NART->test_image to access the data (there are four folders at the lowest level: `test_image`, `test_mask`, `train_image` and `train_mask`).
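
The folder layout described above can be checked with a few lines of Python. This is a minimal sketch that assumes the notebook is started from the folder that contains the code; `missing_subfolders` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# The four folders expected inside Data/NART after decompressing NART.zip
EXPECTED = ["test_image", "test_mask", "train_image", "train_mask"]

def missing_subfolders(data_root, expected=EXPECTED):
    """Return the expected subfolder names that do not exist under data_root."""
    root = Path(data_root)
    return [name for name in expected if not (root / name).is_dir()]

data_root = Path.cwd() / "Data" / "NART"
missing = missing_subfolders(data_root)
print("Missing folders:", missing if missing else "none")
```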


# Getting Started

Before checking whether the notebook has access to CUDA, which determines whether you can run experiments on the GPU, let's import everything we need:

In [1]:
# not actually needed, debugging purposes
import warnings
warnings.filterwarnings('ignore')

In [2]:
# Necessary imports 
import configparser
import sys
import os
import time
import cv2
import torch
import numpy as np
from pathlib import Path
from itertools import product
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from random import randint

from datasets import TDDataset

from config import read_config
from imageUtils import boxesFound,read_Color_Image,read_Binary_Mask,recoupMasks, sliding_window, boxCoordsToFile
from train import train_YOLO,makeTrainYAML, get_transform, train_pytorchModel,train_DETR

from dataHandling import computeBBfromLIEnshurin, filterBoxesWindow, filterBoxesWindowNormalized

from predict import predict_yolo, predict_pytorch

from experimentsUtils import MODULARDLExperiment

You should be able to run the cell above without errors. If you get any `cannot import` errors, the environment in which you are running your notebook does not have all the necessary packages; install them using pip.

Next, let's make sure that CUDA is accessible. CUDA is the library that manages the GPU and is absolutely crucial to get the code running in a time frame that allows experiments to be run. 

In [3]:
import torch

# checking that CUDA is available
print("IS CUDA AVAILABLE???????????????????????")
print(torch.cuda.is_available())

IS CUDA AVAILABLE???????????????????????
False


You should see the `IS CUDA AVAILABLE???????????????????????` message in the first line and `True` in the second. If the second line prints `False` then you do not have CUDA properly configured. Do that before proceeding.
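
The same check is commonly wrapped in a small helper that picks the device to use. The sketch below is only an illustration; `pick_device` is a hypothetical name and is not how the repository selects its device.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch itself is missing: nothing can run on the GPU (see the prerequisites)
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print("Experiments would run on:", device)
```

If this reports `cpu`, training will still start but will be far slower than on a GPU.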

# Data Preprocessing

Before we can run any experiment we need to process the data into a format that Python and its deep learning library `PyTorch` can use. In particular, here we will divide every image into a series of tiles of a size that we choose. Additionally, we will convert all the image files into PNG format and the label images into text files with lists of bounding box coordinates. The following two cells contain the code that does all of this. The first function processes one image along with its label image, and the second one runs it over all the files in a folder. As two of the DL networks that we use need slightly different data formats, the second function calls the first one twice:

In [3]:
def prepareData(lIMfile, mosFile, outFolderRoot, trainPerc, doYOLO, slice, verbose = True):
    """
        Given a mosaic and a label image, create bounding boxes for the label image
        then tile mosaic and boxes and store in a folder
        if trainPerc != 0 then divide into train and test folders
    """
    labelIM = cv2.imread(lIMfile,cv2.IMREAD_UNCHANGED)
    # check unique labels read    
    #print(np.unique(labelIM))
    #print(labelIM.shape)

    mosaic = read_Color_Image(mosFile)
    boxes, boxesNorm = computeBBfromLIEnshurin(labelIM)
    # Visualize bounding boxes
    #if verbose:
    #    visualize_bounding_boxes(mosaic, boxes, num_slices=200, slice_size=(1024, 1024))

    # If the train percentage is 0 then we are making one single folder 
    # otherwise, make one folder for training and one for validation
    singleFolder = (trainPerc == 0)
    
    # create output folders if they do not exist; if a train percentage is given, create train and validation subfolders
    Path(outFolderRoot).mkdir(parents=True, exist_ok=True)
    if not singleFolder :
        Path(os.path.join(outFolderRoot,"train")).mkdir(parents=True, exist_ok=True)
        Path(os.path.join(outFolderRoot,"validation")).mkdir(parents=True, exist_ok=True)
        if doYOLO:
            Path(os.path.join(outFolderRoot,"train","images")).mkdir(parents=True, exist_ok=True)
            Path(os.path.join(outFolderRoot,"train","masks")).mkdir(parents=True, exist_ok=True)
            Path(os.path.join(outFolderRoot,"train","labels")).mkdir(parents=True, exist_ok=True)
            Path(os.path.join(outFolderRoot,"validation","images")).mkdir(parents=True, exist_ok=True)
            Path(os.path.join(outFolderRoot,"validation","masks")).mkdir(parents=True, exist_ok=True)
            Path(os.path.join(outFolderRoot,"validation","labels")).mkdir(parents=True, exist_ok=True)
    else:    
        if doYOLO:
            Path(os.path.join(outFolderRoot,"images")).mkdir(parents=True, exist_ok=True)
            Path(os.path.join(outFolderRoot,"masks")).mkdir(parents=True, exist_ok=True)
            Path(os.path.join(outFolderRoot,"labels")).mkdir(parents=True, exist_ok=True)

    # add the name of the original image to every tile as a prefix (without extension)
    outputPrefix = os.path.splitext(os.path.basename(mosFile))[0]
    
    # slice the three things and output
    wSize = (slice,slice)
    count = 0
    for (x, y, window) in sliding_window(mosaic, stepSize = int(slice*0.8), windowSize = wSize ):
        # get mask window
        if window.shape[:2] == (slice,slice) :
            labelW = labelIM[y:y + wSize[1], x:x + wSize[0]]
            boxesW = filterBoxesWindow(boxes,y,y + wSize[1], x,x + wSize[0])
            boxesWNorm = filterBoxesWindowNormalized(boxesNorm, y, y+slice, x, x+slice, full_w=mosaic.shape[1], full_h=mosaic.shape[0])

            if verbose: print(boxesW)

            # here we should probably add cleanUpMaskBlackPixels and maybe do it for YOLO too (in buildtrainvalidation?)
            if len(boxesW) > 0:
                # store them both, doing a random draw to decide whether they go to training or validation
                # (randint is inclusive on both ends, so "<=" sends exactly trainPerc percent of draws to training)
                outFolder = (outFolderRoot if singleFolder else 
                (os.path.join(outFolderRoot,"train") if randint(1,100) <= trainPerc else os.path.join(outFolderRoot,"validation") ) )
                if verbose: print("writing tile "+outputPrefix+"x"+str(x)+"y"+str(y)+" to "+str(outFolder))
                if doYOLO:
                    cv2.imwrite(os.path.join(outFolder, "images",outputPrefix+"x"+str(x)+"y"+str(y)+".png"),window)
                    cv2.imwrite(os.path.join(outFolder,"masks",outputPrefix+"x"+str(x)+"y"+str(y)+"MASK.png"),labelW)
                    boxCoordsToFile(os.path.join(outFolder,"labels",outputPrefix+"x"+str(x)+"y"+str(y)+".txt"),boxesWNorm)
                else:                    
                    cv2.imwrite(os.path.join(outFolder,outputPrefix+"x"+str(x)+"y"+str(y)+".png"),window)
                    cv2.imwrite(os.path.join(outFolder,outputPrefix+"x"+str(x)+"y"+str(y)+"Labels.tif"),labelW)
                    boxCoordsToFile(os.path.join(outFolder,outputPrefix+"x"+str(x)+"y"+str(y)+"Boxes.txt"),boxesW)
                count+=1
            else:
                if verbose: print("no boxes here")
        else:
            if verbose:  print("sliceFolder, non full window, ignoring"+str(window.shape))
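
The tiling in `prepareData` moves a square window across the mosaic with a step of 80% of the tile side, so neighbouring tiles overlap by 20%, and windows that do not fit completely are ignored. The sketch below reproduces only that arithmetic; `tile_origins` is a hypothetical helper and makes no claim about how `sliding_window` is implemented in `imageUtils`.

```python
def tile_origins(width, height, tile, step_fraction=0.8):
    """List the (x, y) top-left corners of every full tile of side `tile`."""
    step = int(tile * step_fraction)          # same step as int(slice*0.8) above
    xs = range(0, width - tile + 1, step)     # only positions where a full tile fits
    ys = range(0, height - tile + 1, step)
    return [(x, y) for y in ys for x in xs]

# A 1000 x 1000 mosaic cut into 500-pixel tiles with a 400-pixel step
print(tile_origins(1000, 1000, 500))
# prints: [(0, 0), (400, 0), (0, 400), (400, 400)]
```

Note that in this example the last 100 pixels on the right and at the bottom are not covered by any full tile.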


In [4]:
def sliceAndBoxNartData(prefix, outputFolder, sliceSize = 500, trainPerc = 0):
    """
        receive one folder prefix, split into prefix_image and 
        prefix_mask folders, and one slice size in pixels
        traverse all images in the "image" folder
        make sure that they have a corresponding mask.
        Slice and box them all into an outputFolder
    """
    for root, _, files in os.walk(prefix+"_image"):
        for imageFile in files:
            labelImageFile = os.path.join(prefix+"_mask",imageFile)
            if os.path.isfile( labelImageFile ):
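                # first pass: single folder with tiles, label images and pixel box files (non-YOLO layout)
                prepareData(labelImageFile,os.path.join(prefix+"_image",imageFile),outputFolder, 0, False, sliceSize, verbose = False)
                # second pass: images/masks/labels subfolders with normalized boxes (YOLO layout), split by trainPerc
                prepareData(labelImageFile,os.path.join(prefix+"_image",imageFile),outputFolder, trainPerc, True, sliceSize, verbose = False)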
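
For the YOLO pass the boxes are stored in normalized form. The standard YOLO label format is one line per box with a zero-based class id followed by the box centre and size, all expressed as fractions of the image size. The sketch below shows that conversion for one tile. It assumes pixel boxes of the form `(class, x, y, width, height)` with 1-based classes; `to_yolo_line` is a hypothetical helper, and whether `boxCoordsToFile` writes exactly this format is not confirmed here.

```python
def to_yolo_line(box, tile_size, class_offset=1):
    """Convert a pixel box (cls, x, y, w, h) into a YOLO label line.

    class_offset is an assumption: it shifts 1-based labels to the
    0-based class ids used in YOLO label files.
    """
    cls, x, y, w, h = box
    cx = (x + w / 2) / tile_size   # box centre as a fraction of the tile width
    cy = (y + h / 2) / tile_size   # box centre as a fraction of the tile height
    return f"{cls - class_offset} {cx:.6f} {cy:.6f} {w / tile_size:.6f} {h / tile_size:.6f}"

print(to_yolo_line((1, 449, 0, 32, 25), tile_size=500))
# prints: 0 0.930000 0.025000 0.064000 0.050000
```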
        

To call these two functions we need to define the path to our data. The values in the cell below assume that the folder structure described above is followed. Change them if your path is different.

In [4]:
# Define where our data is, change if necessary, this cell must be run in every execution!

# Define main data folder, supposed to be within the directory that contains the notebook then Data then NART.
# Depending on how you are running the jupyter notebook, you may need to change the definition of this folder
dataFolder = os.path.join(os.getcwd(), "Data", "NART") 
print("Main data Folder is" +str(dataFolder))

# Define input and output Folders for the training and testing data
inputTrain = os.path.join(dataFolder,"train")
trainData = os.path.join(dataFolder,"processedTrain")

inputTest = os.path.join(dataFolder,"test")
testData = os.path.join(dataFolder,"processedTest")

# Most of the parameters of our experiments will be defined later, but in this case, we need to define 
# the number of pixels that our tiles will have on the side (sliceSize)
# and the percentage of tiles in the training/validation set that will be used for training 
sliceSize = 500
trainPercentage = 80

Main data Folder is/home/x/Experiments/DLTreeDetection/Data/NART


We are now ready to translate our data into a format that PyTorch can read:

**You only need to run the following cell once. If you do not change the data, do not run this cell again.** 

In [None]:
# convert all of our images to masks, box files and slices, if you did this before you do not need to run it again!
# make one folder for training YOLO
sliceAndBoxNartData(inputTrain,trainData, sliceSize = sliceSize, trainPerc = trainPercentage)
# also make one folder for testing
sliceAndBoxNartData(inputTest,testData, sliceSize = sliceSize)

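Check that two new folders have been created inside the main data folder with the names indicated in `trainData` and `testData`.

These two new folders should have:
- Lots of files corresponding to the tiles, stored directly inside the folder. Each tile will have an image file (for example `patch_2x0y0.png`), a text file with the information of its bounding boxes (for example `patch_2x0y0Boxes.txt`) and a label image (for example `patch_2x0y0Labels.tif`).
- A second copy of the tiles for YOLO in subfolders called `images`, `labels` and `masks`. In `testData` these three subfolders sit directly inside the folder; in `trainData`, because a training percentage was given, they sit inside two subfolders called `train` and `validation`.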
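
One way to confirm that the tiles were written is to count the files in each output folder by extension. This is a minimal sketch that assumes the default folder names used in this notebook; `count_by_suffix` is a hypothetical helper.

```python
from pathlib import Path

def count_by_suffix(folder):
    """Count files per extension in `folder` and all of its subfolders."""
    counts = {}
    for path in Path(folder).rglob("*"):
        if path.is_file():
            counts[path.suffix] = counts.get(path.suffix, 0) + 1
    return counts

for name in ("processedTrain", "processedTest"):
    folder = Path.cwd() / "Data" / "NART" / name
    print(name, count_by_suffix(folder) if folder.is_dir() else "not found")
```

Seeing `.png`, `.txt` and `.tif` entries for both folders is a good sign that both passes of the preprocessing ran.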

Check that all of this worked without errors and we are now ready to run our...

# Experiments

These experiments are fairly complex and allow training and testing many different networks. To begin with, I focused only on the two networks that are most likely to give the best results: YOLO and MaskRCNN.

Interesting Parameters:

- "Train": Determines whether or not we want to retrain our model. Keep it `True` initially but if you only want to re-test your model, then you can set it to `False`.
- "ep" contains the number of epochs that the model will be trained for. I recommend an initial run with one epoch to test that your model works, and then setting it to whatever value you consider appropriate. So far a value of 200 is commonly used.
- "batchSize" is set to 4. This parameter is crucial for speed: the bigger the better. However, if you make it too big the GPU will run out of memory and the code will crash. I used size 8 on my large computer and every epoch took less than a minute; with size 1 every epoch took about 4 minutes. I recommend testing this extensively before launching long experiments. The final stages of the code use a bit of extra GPU memory, so leave some margin.
- "doYolo" and "doPytorch" control which network(s) we use. I recommend starting with the one-epoch experiment for both YOLO and Pytorch and then setting up individual experiments by setting the other value to `False`. For the moment leave the "doDETR" parameter at `False`. DETR is another network that we can include if we want a larger comparison of models, but the code has not been properly adapted yet and it is not likely to give better results.
- The "numClasses" parameter is set to `3` and should reflect the number of different weed species present in the label data. If the number is correct you never need to change it.
- The "Train_Perc" and "slice" parameters refer to the two parameters defined when we created the training data. If you want to change them, go back there, change them there and make sure to re-run the function that creates the data (also erase the previously processed data to be sure). Do not change them here.
- The "Pred_dir" and "outTEXT" parameters contain the paths to the output that the algorithm creates. The output for the Yolo algorithm is also stored in "Train_res" and "Valid_res". Change them if you prefer. **Make sure to check these out as they contain quite a lot of interesting information.** In particular, Yolo will dump lots of tile prediction files into the root of the predictions folder. MaskRCNN will create one subfolder for each parameter configuration. Inside each subfolder you will find lots of files related to tiles and one more subfolder called "full". Check this "full" folder for the masks of every full image and a "Pretty" image that shows the prediction boxes superimposed on the original image along with the category of each one. At this moment Yolo does not create these "Pretty" images; it is on the TODO list.
- The code produces precision and recall values. **These are detection only**, so they do not measure whether the weed inside the box is well segmented, only whether it is properly found. Also, at this moment they do not check the category of the weed. For example, if a weed is detected correctly (its box is properly placed) but the category is wrong (1 is predicted instead of 2), this is currently counted as a correct prediction. This will be improved in the near future, hopefully.
  
The rest of the parameters can be safely ignored for the moment.
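
To make the "detection only" precision and recall described above concrete, the sketch below matches predicted boxes to ground-truth boxes by overlap alone and ignores the category. It is an illustration under stated assumptions (boxes as `(x, y, width, height)`, greedy matching, an IoU threshold of 0.5) and is not the evaluation code of the repository.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    iw = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def detection_precision_recall(predicted, truth, iou_threshold=0.5):
    """Greedy matching: each ground-truth box can be claimed by one prediction."""
    unmatched = list(truth)
    true_pos = 0
    for p in predicted:
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= iou_threshold:
            unmatched.remove(best)   # this ground-truth box is now found
            true_pos += 1
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall

truth = [(0, 0, 10, 10), (50, 50, 10, 10)]
predicted = [(1, 1, 10, 10), (100, 100, 10, 10)]
print(detection_precision_recall(predicted, truth))
# prints: (0.5, 0.5)
```

In this example one of the two predictions overlaps a real box well enough and one of the two real boxes is found, so both values are 0.5 regardless of the predicted categories.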

In [5]:
# configuration of our experiments
conf = {
    "Train" : True,
    "ep" : 1,
    "batchSize": 4,
    "doYolo" : True,
    "doPytorch" : True,
    "doDETR" : False,
    "numClasses" : 3,
    "Train_Perc" : trainPercentage,
    "slice": sliceSize,
    "Pred_dir" : "Predictions",
    "outTEXT": "results.txt",
    "Train_res": os.path.join(os.getcwd(), "YOLOResults"),
    "Valid_res": os.path.join(os.getcwd(), "YOLOResults"),
    "Prep" : False,
    "TV_dir" : os.path.join(dataFolder,"processedTrain"),
    "Train_dir" : "train",
    "Valid_dir" : "validation",
    "Test_dir" : os.path.join(dataFolder,"processedTest")
}
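
Following the recommendation of a one-epoch trial before longer runs, one convenient pattern is to derive each experiment from a base dictionary instead of editing it by hand. The sketch below uses a shortened stand-in for `conf` with only a few of its keys; `variant` is a hypothetical helper and the value of 200 epochs is just the example mentioned earlier.

```python
# Shortened stand-in for the configuration above (illustrative keys only)
base = {"Train": True, "ep": 1, "doYolo": True, "doPytorch": True, "doDETR": False}

def variant(config, **changes):
    """Return a copy of `config` with some keys overridden; the original is untouched."""
    unknown = set(changes) - set(config)
    if unknown:
        # catches typos such as "doYOLO" instead of "doYolo"
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    return {**config, **changes}

smoke_test = variant(base, ep=1)                       # quick check that everything runs
yolo_only = variant(base, ep=200, doPytorch=False)     # long YOLO run
maskrcnn_only = variant(base, ep=200, doYolo=False)    # long MaskRCNN run
retest = variant(yolo_only, Train=False)               # re-test without retraining

print(yolo_only)
```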

Finally, here is the cell that actually runs the experiments. Let's start by leaving it as it is; later we can complicate things a bit by considering different parameters, other Pytorch models and the DETR model. For now, let's make sure that it all works properly and that it can be run on your system.

In [6]:
yolo_params = {"scale": 0.3, "mosaic": 0.5} if conf["doYolo"] else None
pytorch_params = {"modelType": "maskrcnn", "score": 0.25, "nms": 0.5, "predconf": 0.7} if conf["doPytorch"] else None
detr_params = {"modelType": "DETR", "lr": 5e-6, "batch_size": 8, "predconf": 0.5, 
               "nms_iou": 0.5, "max_detections": 50, "resize": 800} if conf["doDETR"] else None

# Run experiments
MODULARDLExperiment(conf, yolo_params, pytorch_params, detr_params)

Preparing datasets...

=== Running DETR Experiment ===
Parameters: {'modelType': 'DETR', 'lr': 5e-06, 'batch_size': 8, 'predconf': 0.5, 'nms_iou': 0.5, 'max_detections': 50, 'resize': 800}
[DETR] Testing params {'modelType': 'DETR', 'lr': 5e-06, 'batch_size': 8, 'predconf': 0.5, 'nms_iou': 0.5, 'max_detections': 50, 'resize': 800} with file DETR_expmodelTypeDETRlr5e-06batch_size8nms_iou0.5max_detections50resize800Epochs1.pth

=== DATASET DEBUG ===
Dataset length: 529

=== DEBUG Sample 0 ===
Image: patch_19x1200y400.png, Size: (500, 500)
Number of raw boxes: 1
First raw box: (1, 449, 0, 32, 25)
All raw boxes: [(1, 449, 0, 32, 25)]
Sample 0 - image shape: (500, 500, 3)
Sample 0 - num annotations: 1
First annotation: {'bbox': [449.0, 0.0, 32.0, 25.0], 'category_id': 0, 'iscrowd': 0, 'area': 800.0}

[DETR] File: DETR_expmodelTypeDETRlr5e-06batch_size8nms_iou0.5max_detections50resize800Epochs1.pth, Train: True, Device: cpu, Epochs: 1


12/15/2025 15:19:47 - INFO - timm.models._builder -   Loading pretrained weights from Hugging Face hub (timm/resnet50.a1_in1k)
12/15/2025 15:19:47 - INFO - timm.models._hub -   [timm/resnet50.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
12/15/2025 15:19:47 - INFO - timm.models._builder -   Missing keys (fc.weight, fc.bias) discovered while loading pretrained weights. This is expected if model is being adapted.


[DETR] Configuring model with num_labels=3


12/15/2025 15:19:48 - INFO - timm.models._builder -   Loading pretrained weights from Hugging Face hub (timm/resnet50.a1_in1k)
12/15/2025 15:19:48 - INFO - timm.models._hub -   [timm/resnet50.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
12/15/2025 15:19:48 - INFO - timm.models._builder -   Missing keys (fc.weight, fc.bias) discovered while loading pretrained weights. This is expected if model is being adapted.
Some weights of the model checkpoint at facebook/detr-resnet-50 were not used when initializing DetrForObjectDetection: ['model.backbone.conv_encoder.model.layer1.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer2.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer3.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer4.0.downsample.1.num_batches_tracked']
- This IS expected if you are initializing DetrForObjectDetection from th

[DETR] Initialized with pretrained weights (except class head)
[DETR] LR schedule: drop by 0.1x every 5 epochs

=== BATCH 0 DEBUG ===
Number of images: 8
Number of targets: 8
First target keys: dict_keys(['annotations'])
First target annotations: 24 annotations
First annotation: {'bbox': [206.0, 0.0, 155.0, 79.0], 'category_id': 2, 'iscrowd': 0, 'area': 12245.0}


TRAINING DIAGNOSTIC
Config: num_labels=3, batch_size=8
Pixel values: torch.Size([8, 3, 800, 800])
First target: boxes=torch.Size([24, 4]), classes=[0, 1, 2]


=== DEBUG Sample 4 ===
Image: patch_8x400y1600.png, Size: (500, 500)
Number of raw boxes: 7
First raw box: (1, 252, 106, 29, 27)
All raw boxes: [(1, 252, 106, 29, 27), (1, 76, 170, 50, 42), (1, 42, 192, 20, 35), (1, 368, 199, 69, 55), (2, 486, 410, 14, 21), (1, 368, 458, 37, 37), (1, 313, 486, 30, 14)]


KeyboardInterrupt: 