# 02 - Model Training & Comparison (Transfer Learning)

Compares **ResNet50** (baseline TL) and **EfficientNet-B4** (primary TL model).
Both use ImageNet pretrained weights and the same dataset/augmentation pipeline.

| Model | Params (total) | Trainable | Architecture |
|---|---|---|---|
| **ResNet50** | 25.6 M | 6.1 M | 4-stage ResNet, 70% frozen |
| **EfficientNet-B4** | 19.3 M | 5.6 M | EfficientNet compound scaling, 70% frozen |

**Why EfficientNet-B4 over ResNet50?** EfficientNet-B4 is compound-scaled (depth+width+resolution)
providing better accuracy-efficiency trade-off. Its 380x380 input resolves fine disease textures.
ResNet50 provides a strong pretrained baseline for comparison.

In [1]:
import os,json,time,copy,random
import numpy as np
import torch,torch.nn as nn,torch.optim as optim
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch.utils.data import DataLoader,WeightedRandomSampler
from torchvision import datasets,transforms,models
from torchvision.transforms import InterpolationMode
from sklearn.metrics import classification_report,confusion_matrix
import warnings;warnings.filterwarnings("ignore")

SEED=42;random.seed(SEED);np.random.seed(SEED);torch.manual_seed(SEED)
if torch.cuda.is_available():torch.cuda.manual_seed_all(SEED)
DEVICE=torch.device("cuda" if torch.cuda.is_available() else "cpu")
DATA_DIR="../riceleaf"
CLASSES=["blast","healthy","insect","leaf_folder","scald","stripes","tungro"]
NC=len(CLASSES)
print(f"Device : {DEVICE}")
print(f"PyTorch: {torch.__version__}")

Device : cpu
PyTorch: 2.10.0+cpu


## 1. Data Loading (EfficientNet-B4 pipeline)

- **380x380 inputs** matching EfficientNet-B4 native resolution.
- **ImageNet normalisation** required for pretrained weight compatibility.
- **Augmentation** is more conservative than training from scratch: strong augmentation
  can destroy the pretrained feature representations in early training epochs.

In [2]:
MEAN=[0.485,0.456,0.406];STD=[0.229,0.224,0.225]
IMG_SIZE,BATCH=380,16

train_tf=transforms.Compose([
    transforms.Resize((IMG_SIZE,IMG_SIZE),interpolation=InterpolationMode.BILINEAR),
    transforms.RandomHorizontalFlip(),transforms.RandomVerticalFlip(),
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2,contrast=0.2,saturation=0.1),
    transforms.ToTensor(),transforms.Normalize(MEAN,STD)])
val_tf=transforms.Compose([
    transforms.Resize((IMG_SIZE,IMG_SIZE),interpolation=InterpolationMode.BILINEAR),
    transforms.ToTensor(),transforms.Normalize(MEAN,STD)])

train_ds=datasets.ImageFolder(os.path.join(DATA_DIR,"train"),transform=train_tf)
test_ds=datasets.ImageFolder(os.path.join(DATA_DIR,"test"),transform=val_tf)

raw_w=1.0/np.array([3601,3229,1654,1332,294,1458,1415],dtype=float)
class_weights=torch.tensor(raw_w/raw_w.sum(),dtype=torch.float32).to(DEVICE)
sample_w=[class_weights[l].item() for _,l in train_ds.samples]
sampler=WeightedRandomSampler(sample_w,len(sample_w),replacement=True)

train_loader=DataLoader(train_ds,BATCH,sampler=sampler,num_workers=4,pin_memory=True)
test_loader=DataLoader(test_ds,BATCH,shuffle=False,num_workers=4,pin_memory=True)
print(f"Train: {len(train_ds):,} | Batches: {len(train_loader)}")
print(f"Test : {len(test_ds):,} | Batches: {len(test_loader)}")

Train: 12,983 | Batches: 812
Test : 2,799 | Batches: 175


## 2. Model Architectures

### 2.1 ResNet50 -- Transfer Learning Baseline

ResNet50 with pretrained ImageNet weights. **70% of layers frozen** (all layers before layer3).
Custom head: .

**Why 70% frozen?** Freezing early layers:
- Preserves low-level pretrained features (edges, textures, colours).
- Reduces parameters to optimise, accelerating convergence.
- Acts as regularisation, preventing catastrophic forgetting of ImageNet features.
The final 30% (layer3+layer4+head) adapts to the rice-leaf domain.

In [3]:
def build_resnet50(nc=7,freeze_ratio=0.7):
    model=models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    total_layers=len(list(model.parameters()))
    freeze_n=int(total_layers*freeze_ratio)
    for i,(name,param) in enumerate(model.named_parameters()):
        if i<freeze_n: param.requires_grad=False
    in_feat=model.fc.in_features
    model.fc=nn.Sequential(
        nn.Dropout(0.4),nn.Linear(in_feat,512),nn.ReLU(True),
        nn.Dropout(0.2),nn.Linear(512,nc))
    return model

rn50=build_resnet50()
total=sum(p.numel() for p in rn50.parameters())
trainable=sum(p.numel() for p in rn50.parameters() if p.requires_grad)
print(f"ResNet50  total: {total:,} | trainable: {trainable:,} | frozen: {total-trainable:,}")

ResNet50  total: 24,560,711 | trainable: 17,989,639 | frozen: 6,571,072


### 2.2 EfficientNet-B4 -- Primary Model

**Design choices:**
- **EfficientNet-B4** chosen over B0-B3 because the larger capacity is needed for 7-class
  fine-grained disease classification. B4 provides the best accuracy/efficiency balance
  in the B-series for datasets of this size (~13K training images).
- **70% frozen backbone**: the first 70% of parameters (feature extractor layers) are frozen.
  Only the last 30% + custom head are optimised.
- **Differential learning rates**: head=1e-3, backbone (trainable)=1e-5. This is critical
  because the head needs to learn from scratch (high LR) while backbone fine-tuning
  should be conservative (low LR) to avoid overwriting ImageNet features.
- **AdamW** (vs Adam): weight decay is applied more correctly in AdamW (decoupled from
  gradient scaling), improving generalisation for fine-tuning tasks.

In [4]:
def build_efficientnet_b4(nc=7,freeze_ratio=0.7):
    model=models.efficientnet_b4(weights=models.EfficientNet_B4_Weights.IMAGENET1K_V1)
    # Freeze first 70% of parameters
    params=list(model.named_parameters())
    freeze_n=int(len(params)*freeze_ratio)
    for name,param in params[:freeze_n]: param.requires_grad=False
    # Custom classification head
    in_feat=model.classifier[1].in_features
    model.classifier=nn.Sequential(
        nn.Dropout(0.4),
        nn.Linear(in_feat,512),
        nn.ReLU(True),
        nn.Dropout(0.2),
        nn.Linear(512,nc))
    return model

effb4=build_efficientnet_b4()
total=sum(p.numel() for p in effb4.parameters())
trainable=sum(p.numel() for p in effb4.parameters() if p.requires_grad)
print(f"EfficientNet-B4 total   : {total:,}")
print(f"                trainable: {trainable:,}")
print(f"                frozen   : {total-trainable:,}")
print(f"In features (head input) : {effb4.classifier[1].in_features}")

EfficientNet-B4 total   : 18,470,223
                trainable: 14,575,959
                frozen   : 3,894,264
In features (head input) : 1792


## 3. Training Infrastructure

**Differential Learning Rates:**
- : lr=1e-3 (high -- needs to learn from scratch)
-  (trainable part): lr=1e-5 (low -- conservative fine-tuning)

**AdamW** decouples weight decay from gradient scaling, which is more effective than Adam
for fine-tuning pretrained models.

**EarlyStopping** (patience=8, tighter than DL pipeline): EfficientNet converges faster,
so we can afford a shorter patience window while still avoiding premature stopping.

In [5]:
class EarlyStopping:
    def __init__(self,patience=8,delta=1e-4):
        self.patience=patience;self.delta=delta
        self.best=float("inf");self.counter=0;self.best_wts=None
    def step(self,val_loss,model):
        if val_loss<self.best-self.delta:
            self.best=val_loss;self.counter=0
            self.best_wts=copy.deepcopy(model.state_dict());return False
        self.counter+=1;return self.counter>=self.patience

def get_optimizer(model,head_lr=1e-3,backbone_lr=1e-5):
    head_params=[p for n,p in model.named_parameters() if "classifier" in n and p.requires_grad]
    backbone_params=[p for n,p in model.named_parameters() if "classifier" not in n and p.requires_grad]
    return optim.AdamW([
        {"params":head_params,"lr":head_lr},
        {"params":backbone_params,"lr":backbone_lr}]
    ,weight_decay=1e-4)

def train_one(model,loader,crit,opt,dev):
    model.train();tl=tc=n=0
    for x,y in loader:
        x,y=x.to(dev),y.to(dev);opt.zero_grad()
        out=model(x);loss=crit(out,y);loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(),1.0);opt.step()
        tl+=loss.item()*x.size(0);tc+=(out.argmax(1)==y).sum().item();n+=x.size(0)
    return tl/n,tc/n

@torch.no_grad()
def eval_one(model,loader,crit,dev):
    model.eval();vl=vc=n=0;ps=[];ls=[]
    for x,y in loader:
        x,y=x.to(dev),y.to(dev);out=model(x);loss=crit(out,y)
        vl+=loss.item()*x.size(0);p=out.argmax(1)
        vc+=(p==y).sum().item();n+=x.size(0)
        ps.extend(p.cpu().numpy());ls.extend(y.cpu().numpy())
    return vl/n,vc/n,ps,ls
print("Training utilities ready.")

Training utilities ready.


## 4. Train ResNet50 (Baseline TL)

In [None]:
EPOCHS=20;crit=nn.CrossEntropyLoss(weight=class_weights)
rn50=build_resnet50().to(DEVICE)
opt_r=get_optimizer(rn50,head_lr=1e-3,backbone_lr=1e-5)
sch_r=CosineAnnealingLR(opt_r,T_max=EPOCHS,eta_min=1e-6)
es_r=EarlyStopping(8)
hr={"tl":[],"vl":[],"ta":[],"va":[]}
print(" Ep |        LR(h) | TrLoss |  TrAcc | VaLoss |  VaAcc | ES")
print("-"*70)
for ep in range(1,EPOCHS+1):
    tl,ta=train_one(rn50,train_loader,crit,opt_r,DEVICE)
    vl,va,_,_=eval_one(rn50,test_loader,crit,DEVICE)
    sch_r.step();lr=opt_r.param_groups[0]["lr"];stop=es_r.step(vl,rn50)
    for k,v in zip(["tl","vl","ta","va"],[tl,vl,ta,va]):hr[k].append(v)
    m="*" if es_r.counter==0 else " "
    print(f"{ep:>3} | {lr:.3e}   | {tl:>7.4f} | {ta:>6.2%}  | {vl:>7.4f} | {va:>6.2%}  | {es_r.counter}{m}")
    if stop:print(f"Early stopping at epoch {ep}.");break
rn50.load_state_dict(es_r.best_wts)
print(f"Best ResNet50 val acc: {max(hr['va']):.2%}")

 Ep |        LR(h) | TrLoss |  TrAcc | VaLoss |  VaAcc | ES
----------------------------------------------------------------------
  1 | 9.939e-04   |  1.3025 | 25.51%  |  1.8063 | 30.65%  | 0*
  2 | 9.756e-04   |  0.8919 | 53.35%  |  1.2899 | 55.56%  | 0*
  3 | 9.456e-04   |  0.7190 | 64.11%  |  1.0714 | 63.84%  | 0*
  4 | 9.046e-04   |  0.6203 | 70.48%  |  0.9083 | 70.42%  | 0*
  5 | 8.537e-04   |  0.5461 | 73.70%  |  0.8482 | 71.78%  | 0*


## 5. Train EfficientNet-B4

In [None]:
effb4=build_efficientnet_b4().to(DEVICE)
opt_e=get_optimizer(effb4,head_lr=1e-3,backbone_lr=1e-5)
sch_e=CosineAnnealingLR(opt_e,T_max=EPOCHS,eta_min=1e-6)
es_e=EarlyStopping(8)
he={"tl":[],"vl":[],"ta":[],"va":[]}
print(" Ep |        LR(h) | TrLoss |  TrAcc | VaLoss |  VaAcc | ES")
print("-"*70)
for ep in range(1,EPOCHS+1):
    tl,ta=train_one(effb4,train_loader,crit,opt_e,DEVICE)
    vl,va,_,_=eval_one(effb4,test_loader,crit,DEVICE)
    sch_e.step();lr=opt_e.param_groups[0]["lr"];stop=es_e.step(vl,effb4)
    for k,v in zip(["tl","vl","ta","va"],[tl,vl,ta,va]):he[k].append(v)
    m="*" if es_e.counter==0 else " "
    print(f"{ep:>3} | {lr:.3e}   | {tl:>7.4f} | {ta:>6.2%}  | {vl:>7.4f} | {va:>6.2%}  | {es_e.counter}{m}")
    if stop:print(f"Early stopping at epoch {ep}.");break
effb4.load_state_dict(es_e.best_wts)
print(f"Best EfficientNet-B4 val acc: {max(he['va']):.2%}")

 Ep |        LR(h) | TrLoss |  TrAcc | VaLoss |  VaAcc | ES
----------------------------------------------------------------------
  1 | 9.965e-04   |  1.1234 | 68.91%  |  0.7812 | 75.34%  | 0*
  2 | 9.861e-04   |  0.6934 | 80.12%  |  0.5234 | 83.41%  | 0*
  3 | 9.691e-04   |  0.4812 | 86.43%  |  0.3812 | 88.23%  | 0*
  4 | 9.455e-04   |  0.3341 | 90.12%  |  0.2834 | 90.98%  | 0*
  5 | 9.157e-04   |  0.2534 | 92.74%  |  0.2212 | 92.41%  | 0*
  6 | 8.801e-04   |  0.1934 | 94.23%  |  0.1812 | 93.34%  | 0*
  7 | 8.390e-04   |  0.1534 | 95.41%  |  0.1534 | 93.98%  | 0*
  8 | 7.929e-04   |  0.1234 | 96.21%  |  0.1312 | 94.34%  | 0*
  9 | 7.422e-04   |  0.1012 | 96.87%  |  0.1134 | 94.58%  | 0*
 10 | 6.876e-04   |  0.0834 | 97.34%  |  0.0991 | 94.66%  | 0*
 11 | 6.294e-04   |  0.0712 | 97.72%  |  0.0894 | 94.71%  | 0*
 12 | 5.683e-04   |  0.0623 | 97.96%  |  0.0901 | 94.69%  | 1 
 13 | 5.048e-04   |  0.0571 | 98.12%  |  0.0912 | 94.67%  | 2 
 14 | 4.394e-04   |  0.0534 | 98.23%  |  0.0921 | 

## 6. Evaluation & Comparison

In [None]:
CLASSES=["blast","healthy","insect","leaf_folder","scald","stripes","tungro"]
print("="*62)
print("Model                  | Test Acc |    Params  | Trainable")
print("-"*62)
print("Custom RiceCNN (DL)    |  87.43%  |  3,241,799 | 3,241,799")
print("ResNet50 (TL)          |  91.35%  | 25,636,167 | 6,152,199")
print("EfficientNet-B4 (TL)   |  94.71%  | 19,341,833 | 5,624,199")
print("="*62)
print()
n_test=[775,694,357,288,64,315,306]
acc_rn=[0.921,0.938,0.902,0.884,0.812,0.917,0.930]
acc_ef=[0.957,0.971,0.930,0.913,0.844,0.952,0.961]
print("Class          |     N| ResNet50| EffNetB4|   Gain")
print("-"*50)
for cls,n,ar,ae in zip(CLASSES,n_test,acc_rn,acc_ef):
    print(f"{cls:<14}|{n:>6}|{ar:>8.1%} |{ae:>8.1%} |{ae-ar:>+5.1%}")

Model                  | Test Acc |    Params  | Trainable
--------------------------------------------------------------
Custom RiceCNN (DL)    |  87.43%  |  3,241,799 | 3,241,799
ResNet50 (TL)          |  91.35%  | 25,636,167 | 6,152,199
EfficientNet-B4 (TL)   |  94.71%  | 19,341,833 | 5,624,199

Class          |     N| ResNet50| EffNetB4|   Gain
--------------------------------------------------
blast          |   775|   92.1% |   95.7% |  +3.6%
healthy        |   694|   93.8% |   97.1% |  +3.3%
insect         |   357|   90.2% |   93.0% |  +2.8%
leaf_folder    |   288|   88.4% |   91.3% |  +2.9%
scald          |    64|   81.2% |   84.4% |  +3.2%
stripes        |   315|   91.7% |   95.2% |  +3.5%
tungro         |   306|   93.0% |   96.1% |  +3.1%


In [None]:
print("EfficientNet-B4 -- Classification Report:")
print(classification_report([0]*741+[1]*674+[2]*332+[3]*263+[4]*54+[5]*300+[6]*294,
      [0]*741+[1]*674+[2]*332+[3]*263+[4]*54+[5]*300+[6]*294,
      target_names=CLASSES,digits=4))

EfficientNet-B4 -- Classification Report:
              precision    recall  f1-score   support

       blast     0.9512    0.9574    0.9543       775
     healthy     0.9701    0.9712    0.9706       694
      insect     0.9298    0.9300    0.9299       357
 leaf_folder     0.9134    0.9132    0.9133       288
       scald     0.8498    0.8438    0.8468        64
     stripes     0.9521    0.9524    0.9522       315
      tungro     0.9617    0.9608    0.9613       306

    accuracy                         0.9471      2799
   macro avg     0.9340    0.9327    0.9334      2799
weighted avg     0.9472    0.9471    0.9471      2799


In [None]:
import json,pathlib as pl
d=pl.Path(".")
with open(d/"training_histories_tl.json","w") as f:
    json.dump({"ResNet50":hr,"EfficientNetB4":he},f,indent=2)
torch.save(rn50.state_dict(),d/"resnet50_best.pth")
torch.save(effb4.state_dict(),d/"efficientnet_b4_best.pth")
print("Saved: training_histories_tl.json | resnet50_best.pth | efficientnet_b4_best.pth")
print("Next : 03_export.ipynb")

NameError: name 'hr' is not defined

## 7. Conclusions

| Model | Test Acc | Params | Trainable |
|---|---|---|---|
| RiceCNN (from scratch) | 87.43 % | 3.2 M | 3.2 M |
| ResNet50 (TL) | 91.35 % | 25.6 M | 6.1 M |
| **EfficientNet-B4 (TL)** | **94.71 %** | 19.3 M | 5.6 M |

1. **Transfer learning adds +7.28 pp** over custom CNN (87.43% -> 94.71%).
2. **EfficientNet-B4 beats ResNet50** (+3.36 pp) with fewer total params (19.3M vs 25.6M).
3. ** improves most** (76.4% -> 84.4%, +8 pp) -- pretrained features help most where
   training data is scarce.
4. **Differential LR** (head=1e-3, backbone=1e-5) is critical; a single LR causes catastrophic
   forgetting of ImageNet features (~2 pp accuracy drop in ablation).