In [1]:
from ultralytics import YOLO
import os

In [8]:
yolo = YOLO('yolov8n.pt')  # load a pretrained model (recommended for training)

In [None]:
yolo.train(data="../data.yaml", epochs=2, imgsz=640, device='mps')

## Train with freezing layers

This can be done with the help of custom callback functions in Ultralytics. See the link for further reference:
https://docs.ultralytics.com/usage/callbacks/#trainer-callbacks

In [14]:
for k, v in yolo.model.named_parameters():
    print(k)

model.0.conv.weight
model.0.bn.weight
model.0.bn.bias
model.1.conv.weight
model.1.bn.weight
model.1.bn.bias
model.2.cv1.conv.weight
model.2.cv1.bn.weight
model.2.cv1.bn.bias
model.2.cv2.conv.weight
model.2.cv2.bn.weight
model.2.cv2.bn.bias
model.2.m.0.cv1.conv.weight
model.2.m.0.cv1.bn.weight
model.2.m.0.cv1.bn.bias
model.2.m.0.cv2.conv.weight
model.2.m.0.cv2.bn.weight
model.2.m.0.cv2.bn.bias
model.3.conv.weight
model.3.bn.weight
model.3.bn.bias
model.4.cv1.conv.weight
model.4.cv1.bn.weight
model.4.cv1.bn.bias
model.4.cv2.conv.weight
model.4.cv2.bn.weight
model.4.cv2.bn.bias
model.4.m.0.cv1.conv.weight
model.4.m.0.cv1.bn.weight
model.4.m.0.cv1.bn.bias
model.4.m.0.cv2.conv.weight
model.4.m.0.cv2.bn.weight
model.4.m.0.cv2.bn.bias
model.4.m.1.cv1.conv.weight
model.4.m.1.cv1.bn.weight
model.4.m.1.cv1.bn.bias
model.4.m.1.cv2.conv.weight
model.4.m.1.cv2.bn.weight
model.4.m.1.cv2.bn.bias
model.5.conv.weight
model.5.bn.weight
model.5.bn.bias
model.6.cv1.conv.weight
model.6.cv1.bn.weight
model.

In [12]:
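def freeze_layer(trainer):
    """Callback: freeze the first `num_freeze` layers when training starts."""
    model = trainer.model
    num_freeze = 10
    print(f"Freezing {num_freeze} layers")
    # Name prefixes of the layers to freeze; the trailing dot keeps 'model.1.' from matching 'model.10.'
    freeze = [f'model.{x}.' for x in range(num_freeze)]
    for k, v in model.named_parameters():
        # Only the selected layers are touched. requires_grad is not reset to True elsewhere,
        # so parameters the trainer may have frozen itself stay frozen.
        if any(x in k for x in freeze):
            print(f'freezing {k}')
            v.requires_grad = False
    print(f"{num_freeze} layers are frozen.")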

In [None]:
yolo.add_callback("on_train_start", freeze_layer)
yolo.train(data="../data.yaml")
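
The name matching that `freeze_layer` relies on can be checked on plain strings, without loading a model. The sketch below is only an illustration: `select_frozen` is a hypothetical helper, and `model.10.conv.weight` is a made-up name included to show why the trailing dot in the prefix matters.

```python
def select_frozen(param_names, num_freeze):
    """Return the parameter names that belong to the first `num_freeze` layers."""
    # Same prefixes as in freeze_layer: 'model.0.', 'model.1.', ...
    prefixes = [f"model.{i}." for i in range(num_freeze)]
    return [name for name in param_names if any(p in name for p in prefixes)]


# A few names in the format printed by named_parameters() above.
names = [
    "model.0.conv.weight",
    "model.1.bn.bias",
    "model.2.cv1.conv.weight",
    "model.10.conv.weight",  # hypothetical name, used only to test the prefix logic
]

frozen = select_frozen(names, num_freeze=2)
print(frozen)  # ['model.0.conv.weight', 'model.1.bn.bias']
```

Without the trailing dot, the prefix `model.1` would also match `model.10.conv.weight`, and more layers would be frozen than intended.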

Freezing layers in YOLOv8 with a custom callback keeps the selected layers fixed during training. It does not necessarily make training faster, however.

Freezing typically aims to retain the knowledge in the pre-trained layers while updating only the unfrozen ones. This is useful when you want to fine-tune a model on a new dataset without losing what it has already learned.

The limited speedup follows from the model architecture. YOLOv8 consists of a backbone, a neck, and detection heads. Every forward pass still runs through all layers, frozen or not, and the backward pass still has to propagate through the unfrozen layers. Freezing only removes the gradient computation and weight updates for the frozen parameters, which may be a small share of the total cost.

The effect of freezing varies with the use case and dataset, so experiment and measure its impact on both training time and model accuracy in your own scenario.

Freezing too many layers can also leave the model with too little capacity to adapt, which hurts accuracy on the new dataset. If freezing the first 10 layers degrades results, try freezing fewer, such as the first 3-5, so that more of the model can still learn from the new data.

### How to do transfer learning the right way

Transfer learning with YOLOv8 can be done by freezing the first few layers of the model. The frozen layers keep the weights pre-trained on a previous dataset (dataset A), while the remaining layers are fine-tuned on a new dataset (dataset B).

Note, however, that the unfrozen layers are trained only on dataset B, so the model learns the patterns specific to dataset B. As a result, it will primarily detect objects from dataset B and may not perform as well on objects from dataset A.

For example, if you load weights trained on dataset A (say, bestA.pt) and freeze the first 5 layers, the fine-tuned model will learn to detect objects specific to dataset B and largely ignore those from dataset A. This behavior is expected, because fine-tuning prioritizes dataset B.

If you want the model to detect objects from both datasets A and B, you would need to train the model on a combined dataset that includes samples from both datasets. This way, the model can learn to detect objects from both datasets simultaneously.
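
When the two datasets are combined, their class indices must not collide. The sketch below shows one possible way to shift the class index of a YOLO-format label line (`class x_center y_center width height`). It assumes that the class sets of A and B are disjoint and that the merged class list puts A's classes first; `remap_label_line` and `num_classes_a` are hypothetical names used only for illustration.

```python
def remap_label_line(line, class_offset):
    """Shift the class index of one YOLO-format label line by `class_offset`."""
    parts = line.split()
    if not parts:
        return line  # keep empty lines unchanged
    # The first field is the class index; the box coordinates stay untouched.
    parts[0] = str(int(parts[0]) + class_offset)
    return " ".join(parts)


num_classes_a = 3  # hypothetical: dataset A has 3 classes (indices 0-2)
line_b = "0 0.5 0.5 0.2 0.3"  # class 0 of dataset B

merged = remap_label_line(line_b, num_classes_a)
print(merged)  # 3 0.5 0.5 0.2 0.3
```

Labels from dataset A keep their indices (offset 0), and every label from dataset B is shifted by the number of classes in A before training on the combined data.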


### Discriminative learning rate

In YOLOv8, lr0 and lrf together control the learning rate schedule. lr0 is the initial learning rate, and lrf is the fraction of lr0 that remains at the last epoch, so the final learning rate is lr0 * lrf. Both default to 0.01, which gives a final learning rate of 0.01 * 0.01 = 0.0001. If changing these values produces no visible change in training behavior, another setting may be overriding them. One likely cause in recent Ultralytics releases is the default optimizer='auto', which chooses the optimizer and initial learning rate itself and ignores lr0; setting the optimizer explicitly avoids this. Other factors, such as the number of epochs or the batch size, can also affect the effective learning rate.

If cos_lr is set to True, the learning rate follows a cosine annealing curve rather than a linear one. This gives a smoother schedule and potentially better results. lr0 and lrf are still used: they define the start and end points of the cosine schedule.
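
The two schedule shapes can be compared numerically. The sketch below is a simplified illustration of the description above, not the Ultralytics implementation: it ignores warmup, and `linear_factor` and `cosine_factor` are hypothetical helper names. Each returns the multiplier applied to lr0 at a given epoch.

```python
import math


def linear_factor(epoch, epochs, lrf):
    """Multiplier that decays linearly from 1.0 (epoch 0) to lrf (last epoch)."""
    return max(1 - epoch / epochs, 0) * (1.0 - lrf) + lrf


def cosine_factor(epoch, epochs, lrf):
    """Multiplier that follows half a cosine wave from 1.0 (epoch 0) to lrf (last epoch)."""
    return (1 - math.cos(epoch * math.pi / epochs)) / 2 * (lrf - 1.0) + 1.0


lr0, lrf, epochs = 0.01, 0.01, 100  # the default values mentioned above; epochs is an example

for epoch in (0, 25, 50, 75, 100):
    lr_linear = lr0 * linear_factor(epoch, epochs, lrf)
    lr_cosine = lr0 * cosine_factor(epoch, epochs, lrf)
    print(f"epoch {epoch:3d}: linear={lr_linear:.6f} cosine={lr_cosine:.6f}")
```

Both schedules start at lr0 and end at lr0 * lrf. The cosine curve stays higher than the linear one in the first half of training and drops below it in the second half.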