- This was the research topic for my master's thesis. You are welcome to play with the code, but please don't hijack my research. I'll die.
- A guide on 'how to use this code'
- First, download the CARRADA Dataset, the Pascal_VOC Dataset from Kaggle, or the CFAR Dataset, and structure the folders as shown in the file tree below
- Second, set up a virtual environment using Anaconda, e.g.
conda create --name pt3.7 python=3.7
conda create --name pt3.8 python=3.8
- Before installing any packages, remember to enter your conda virtual environment, e.g.
conda activate pt3.7
conda activate pt3.8
- Third, you can manually install all the packages that you need, or you can install with
pip install -r requirements.txt
- Then, copy the code to anywhere you like, and make sure you have changed the file paths in config.py before running the code
- Just click the 'run' button and see the results
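The path settings in config.py typically look something like the sketch below. This is illustrative only: the variable names (DATASET_DIR and friends) are hypothetical, so check the actual config.py for the real ones; the folder names match the dataset file tree shown below.

```python
# Illustrative config sketch - variable names are hypothetical,
# see the repo's actual config.py for the real ones.
import os

# Point this at wherever you placed the dataset (see the file tree below)
DATASET_DIR = r"D:\Datasets\RADA\RD_JPG"

IMG_DIR = os.path.join(DATASET_DIR, "images")
LABEL_DIR = os.path.join(DATASET_DIR, "labels")
LOG_DIR = os.path.join(DATASET_DIR, "training_logs")
```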
- Caveats:
- There is a bunch of dead code, commented-out code, and outdated comments in my program
- I use the albumentations library solely for the purpose of padding
- Dataset file tree

  ```
  D:\Datasets\RADA\RD_JPG>tree D:.
  ├─checks
  ├─images
  ├─imagesc
  ├─imwrite
  ├─labels
  ├─mats
  └─training_logs
      ├─mAP
      ├─test
      │   ├─class_accuracy
      │   ├─no_object_accuracy
      │   └─object_accuracy
      └─train
          ├─class_accuracy
          ├─losses
          ├─mean_loss
          ├─no_object_accuracy
          └─object_accuracy
  ```
- Stable dependencies
  - for Python 3.7

    ```
    python==3.7.13
    numpy==1.19.2
    pytorch==1.7.1
    torchaudio==0.7.2
    torchvision==0.8.2
    pandas==1.2.1
    pillow==8.1.0
    tqdm==4.56.0
    albumentations==0.5.2
    matplotlib==3.3.4
    ```

  - for Python 3.8

    ```
    python==3.8.16
    numpy==1.23.5
    pytorch==1.13.1
    pytorch-cuda==11.7
    torchaudio==0.13.1
    torchvision==0.14.1
    pandas==1.5.2
    pillow==9.3.0
    tqdm==4.64.1
    albumentations==1.3.0
    matplotlib==3.6.2
    ```

  - It's well tested, and the code can be properly executed under these settings
- 2023.05.09
  - The training duration is 22.8276 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 14e-5
  - Under 300 epochs
  - Try to observe the difference between training directly for 200 epochs versus the combination of separate training for 100 epochs each

    ```
    --------------------------------------------------
    The stats of 2023-05-09-1 training:
    --------------------------------------------------
    max mAP: 0.4652663767337799
    mean mAP: 0.41992816428343455
    max training loss: 120.43387603759766
    min training loss: 0.7253801822662354
    max training loss on average: 15.41774600982666
    min training loss on average: 0.9773107906182606
    min training accuracy: 2.3786652088165283
    max training accuracy: 97.95069122314453
    min testing accuracy: 35.47798538208008
    max testing accuracy: 72.39334106445312
    --------------------------------------------------
    ```
- 2023.05.08
  - The training duration is 14.2820 hours with higher WEIGHT_DECAY = 1e-3 and LEARNING_RATE = 15e-5
  - Under 200 epochs and a higher weight decay setting
  - Try to see how the previous best learning rate goes with higher weight decay

    ```
    --------------------------------------------------
    The stats of 2023-05-08-1 training:
    --------------------------------------------------
    max mAP: 0.42102280259132385
    mean mAP: 0.37317809015512465
    max training loss: 174.62034606933594
    min training loss: 1.0440865755081177
    max training loss on average: 17.338274812698366
    min training loss on average: 1.3041423439979554
    min training accuracy: 0.4254150986671448
    max training accuracy: 92.8823013305664
    min testing accuracy: 34.81633758544922
    max testing accuracy: 69.26762390136719
    --------------------------------------------------
    ```
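For context, WEIGHT_DECAY and LEARNING_RATE are the usual optimizer hyperparameters; wiring them into an Adam optimizer looks roughly like this (a sketch with a placeholder model, not the project's training code):

```python
import torch.nn as nn
import torch.optim as optim

LEARNING_RATE = 15e-5
WEIGHT_DECAY = 1e-3  # L2 regularization strength

model = nn.Linear(10, 3)  # placeholder for the YOLOv3 model
optimizer = optim.Adam(
    model.parameters(), lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY
)
print(optimizer.defaults["lr"], optimizer.defaults["weight_decay"])
```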
- Using torchinfo.summary() to get the result
  - The second way to get a model summary in PyTorch besides torchsummary.summary()
  - sample code

    ```python
    import torch
    from torchinfo import summary  # torchinfo.summary()

    # simple test settings
    IMAGE_SIZE = 416   # multiples of 32 are workable with stride [32, 16, 8]
    num_classes = 3
    batch_size = 20    # num_examples
    num_channels = 3   # input image channels

    # initialize a YOLOv3 model (defined elsewhere in this repo)
    model = YOLOv3(num_classes=num_classes)

    # simple test with random inputs of 20 examples, 3 channels, and IMAGE_SIZE-by-IMAGE_SIZE input
    x = torch.randn((batch_size, num_channels, IMAGE_SIZE, IMAGE_SIZE))
    out = model(x)

    # print out the model summary using torchinfo.summary()
    summary(model.cuda(), input_size=(batch_size, num_channels, IMAGE_SIZE, IMAGE_SIZE))
    ```

  - model parameter summary

    ```
    ====================================================================================================
    Total params: 61,534,648
    Trainable params: 61,534,648
    Non-trainable params: 0
    Total mult-adds (G): 653.05
    ====================================================================================================
    Input size (MB): 41.53
    Forward/backward pass size (MB): 12265.99
    Params size (MB): 246.14
    Estimated Total Size (MB): 12553.66
    ====================================================================================================
    ```
- 2023.05.07
  - The training duration is 8.3639 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 14e-5
  - Still 100 epochs
  - Continued the training with the weights of checkpoint-2023-05-02-2.pth.tar, with the same weight decay and learning rate

    ```
    --------------------------------------------------
    The stats of 2023-05-07-1 training:
    --------------------------------------------------
    max mAP: 0.44356226921081543
    mean mAP: 0.42320117354393005
    max training loss: 4.268791675567627
    min training loss: 0.8072612285614014
    max training loss on average: 2.3641905311743416
    min training loss on average: 1.0348780262470245
    min training accuracy: 68.03897857666016
    max training accuracy: 96.88028717041016
    min testing accuracy: 59.59388732910156
    max testing accuracy: 67.21424102783203
    --------------------------------------------------
    ```
- 2023.05.06
  - The training duration is 8.7493 hours with a higher weight decay of WEIGHT_DECAY = 1e-3 and LEARNING_RATE = 14e-5
  - Switching back to 100 epochs
  - Continued the training with the weights of checkpoint-2023-05-02-2.pth.tar (with WEIGHT_DECAY = 1e-4)

    ```
    --------------------------------------------------
    The stats of 2023-05-06-1 training:
    --------------------------------------------------
    max mAP: 0.4469827115535736
    mean mAP: 0.41541612446308135
    max training loss: 7.434675216674805
    min training loss: 0.8201318383216858
    max training loss on average: 5.396891689300537
    min training loss on average: 1.0210446101427078
    min training accuracy: 65.16170501708984
    max training accuracy: 96.99007415771484
    min testing accuracy: 49.76043701171875
    max testing accuracy: 65.93656921386719
    --------------------------------------------------
    ```
- 2023.05.05
  - The training duration is 26.4082 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 15e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-05-1 training:
    --------------------------------------------------
    max mAP: 0.4396911561489105
    mean mAP: 0.4000327423214912
    max training loss: 171.85177612304688
    min training loss: 0.6924741864204407
    max training loss on average: 15.746856501897176
    min training loss on average: 0.9486591788132985
    min training accuracy: 3.29353666305542
    max training accuracy: 97.90494537353516
    min testing accuracy: 34.040611267089844
    max testing accuracy: 74.72051239013672
    --------------------------------------------------
    ```
- 2023.05.04
  - The training duration is 7.4228 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 19e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-04-1 training:
    --------------------------------------------------
    max mAP: 0.44310665130615234
    mean mAP: 0.38241996318101884
    max training loss: 124.89086151123047
    min training loss: 1.0636780261993408
    max training loss on average: 16.811252358754476
    min training loss on average: 1.2993631919225057
    min training accuracy: 2.063034772872925
    max training accuracy: 92.96463775634766
    min testing accuracy: 31.75906753540039
    max testing accuracy: 70.13461303710938
    --------------------------------------------------
    ```

  - The training duration is 8.4733 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 20e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-04-2 training:
    --------------------------------------------------
    max mAP: 0.4227812588214874
    mean mAP: 0.34420192539691924
    max training loss: 138.41775512695312
    min training loss: 1.0614862442016602
    max training loss on average: 15.212103751500448
    min training loss on average: 1.326857070128123
    min training accuracy: 5.338273525238037
    max training accuracy: 93.10186767578125
    min testing accuracy: 31.074607849121094
    max testing accuracy: 69.54141235351562
    --------------------------------------------------
    ```
- 2023.05.03
  - The training duration is 7.1341 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 17e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-03-1 training:
    --------------------------------------------------
    max mAP: 0.4469388425350189
    mean mAP: 0.3841908037662506
    max training loss: 113.7841567993164
    min training loss: 0.963789165019989
    max training loss on average: 15.0015398200353
    min training loss on average: 1.2312769017616907
    min training accuracy: 7.767257213592529
    max training accuracy: 94.10365295410156
    min testing accuracy: 33.51585388183594
    max testing accuracy: 69.63267517089844
    --------------------------------------------------
    ```

  - The training duration is 7.1676 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 18e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-03-2 training:
    --------------------------------------------------
    max mAP: 0.44689252972602844
    mean mAP: 0.3790498897433281
    max training loss: 175.32229614257812
    min training loss: 1.0493061542510986
    max training loss on average: 17.080741675694785
    min training loss on average: 1.291623563369115
    min training accuracy: 7.433328628540039
    max training accuracy: 93.3305892944336
    min testing accuracy: 34.49692153930664
    max testing accuracy: 70.79625701904297
    --------------------------------------------------
    ```
- 2023.05.02
  - The training duration is 6.8200 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 13e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-02-1 training:
    --------------------------------------------------
    max mAP: 0.4374929666519165
    mean mAP: 0.3631990686058998
    max training loss: 134.36065673828125
    min training loss: 0.9601410627365112
    max training loss on average: 18.045101165771484
    min training loss on average: 1.2157120569547017
    min training accuracy: 2.877269983291626
    max training accuracy: 94.9224624633789
    min testing accuracy: 42.20853042602539
    max testing accuracy: 70.84188842773438
    --------------------------------------------------
    ```

  - The training duration is 5.8219 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 14e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-02-2 training:
    --------------------------------------------------
    max mAP: 0.452169269323349
    mean mAP: 0.3887074992060661
    max training loss: 135.23342895507812
    min training loss: 0.9823306798934937
    max training loss on average: 16.633436683019003
    min training loss on average: 1.268118454615275
    min training accuracy: 3.00077748298645
    max training accuracy: 94.62512969970703
    min testing accuracy: 30.800823211669922
    max testing accuracy: 73.10061645507812
    --------------------------------------------------
    ```

  - The training duration is 5.5819 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 15e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-02-3 training:
    --------------------------------------------------
    max mAP: 0.45209288597106934
    mean mAP: 0.3701558232307434
    max training loss: 217.28318786621094
    min training loss: 1.000074863433838
    max training loss on average: 16.713819392522176
    min training loss on average: 1.2200019482771556
    min training accuracy: 5.814006328582764
    max training accuracy: 94.16769409179688
    min testing accuracy: 41.84348678588867
    max testing accuracy: 69.8380126953125
    --------------------------------------------------
    ```

  - The training duration is 8.4758 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 16e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-02-4 training:
    --------------------------------------------------
    max mAP: 0.43429771065711975
    mean mAP: 0.3629924669861794
    max training loss: 178.13705444335938
    min training loss: 0.9736015796661377
    max training loss on average: 16.70783141930898
    min training loss on average: 1.2728607519467672
    min training accuracy: 6.477288246154785
    max training accuracy: 93.60047912597656
    min testing accuracy: 21.12708282470703
    max testing accuracy: 72.98653411865234
    --------------------------------------------------
    ```

  - The comparison between different LEARNING_RATE values under the same WEIGHT_DECAY = 1e-4
- 2023.05.01
  - The training duration is 5.7350 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 10e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-01-1 training:
    --------------------------------------------------
    max mAP: 0.43861937522888184
    mean mAP: 0.3436750084161758
    max training loss: 140.25660705566406
    min training loss: 0.8655197024345398
    max training loss on average: 18.507166700363157
    min training loss on average: 1.1550286275148391
    min training accuracy: 4.963176250457764
    max training accuracy: 94.96363067626953
    min testing accuracy: 36.68720245361328
    max testing accuracy: 68.37782287597656
    --------------------------------------------------
    ```

  - The training duration is 7.0366 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 11e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-01-2 training:
    --------------------------------------------------
    max mAP: 0.449009507894516
    mean mAP: 0.38735678046941757
    max training loss: 102.38961791992188
    min training loss: 0.9561270475387573
    max training loss on average: 17.273788038889567
    min training loss on average: 1.2045013213157654
    min training accuracy: 3.2843875885009766
    max training accuracy: 96.39083099365234
    min testing accuracy: 35.68332290649414
    max testing accuracy: 73.03217315673828
    --------------------------------------------------
    ```

  - The training duration is 7.1689 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 12e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-01-3 training:
    --------------------------------------------------
    max mAP: 0.4372125566005707
    mean mAP: 0.38275537043809893
    max training loss: 125.8486099243164
    min training loss: 0.9757415056228638
    max training loss on average: 17.398162371317547
    min training loss on average: 1.2320519105593364
    min training accuracy: 1.1024198532104492
    max training accuracy: 94.62055206298828
    min testing accuracy: 34.86196517944336
    max testing accuracy: 73.3515853881836
    --------------------------------------------------
    ```
- 2023.04.30
  - The training duration is 7.0542 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 6e-5

    ```
    --------------------------------------------------
    The stats of 2023-04-29-1 training:
    --------------------------------------------------
    max mAP: 0.4267594814300537
    mean mAP: 0.3732090950012207
    max training loss: 70.09312438964844
    min training loss: 0.9483757019042969
    max training loss on average: 20.225014870961505
    min training loss on average: 1.1955717974901199
    min training accuracy: 0.8279584646224976
    max training accuracy: 95.35245513916016
    min testing accuracy: 32.3294563293457
    max testing accuracy: 73.48847961425781
    --------------------------------------------------
    ```

  - The training duration is 7.1015 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 7e-5

    ```
    --------------------------------------------------
    The stats of 2023-04-29-2 training:
    --------------------------------------------------
    max mAP: 0.4309697151184082
    mean mAP: 0.37477160841226576
    max training loss: 105.76203155517578
    min training loss: 0.8929504752159119
    max training loss on average: 20.704750878016153
    min training loss on average: 1.1069866104920705
    min training accuracy: 4.180961608886719
    max training accuracy: 96.9443359375
    min testing accuracy: 37.37166213989258
    max testing accuracy: 77.36710357666016
    --------------------------------------------------
    ```

  - The training duration is 6.7780 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 8e-5

    ```
    --------------------------------------------------
    The stats of 2023-04-30-1 training:
    --------------------------------------------------
    max mAP: 0.4340965747833252
    mean mAP: 0.36167612075805666
    max training loss: 104.89147186279297
    min training loss: 0.9307739734649658
    max training loss on average: 19.40190040588379
    min training loss on average: 1.1852473825216294
    min training accuracy: 1.6238964796066284
    max training accuracy: 95.42564392089844
    min testing accuracy: 30.458589553833008
    max testing accuracy: 71.52635192871094
    --------------------------------------------------
    ```

  - The training duration is 5.5800 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 9e-5

    ```
    --------------------------------------------------
    The stats of 2023-04-30-2 training:
    --------------------------------------------------
    max mAP: 0.43561217188835144
    mean mAP: 0.3712215393781662
    max training loss: 125.58506774902344
    min training loss: 0.9454944729804993
    max training loss on average: 18.3668585618337
    min training loss on average: 1.1890920907258988
    min training accuracy: 2.2048397064208984
    max training accuracy: 96.52349090576172
    min testing accuracy: 30.778005599975586
    max testing accuracy: 72.59867858886719
    --------------------------------------------------
    ```
- How to get a model summary in PyTorch?
  - Using torchsummary to get the result

    ```python
    import torch
    from torchsummary import summary

    # simple test settings
    IMAGE_SIZE = 416
    num_classes = 3
    num_examples = 20  # batch size
    num_channels = 3   # input image channels

    # initialize a YOLOv3 model (defined elsewhere in this repo)
    model = YOLOv3(num_classes=num_classes)

    # simple test with random inputs of 20 examples, 3 channels, and IMAGE_SIZE-by-IMAGE_SIZE input
    x = torch.randn((num_examples, num_channels, IMAGE_SIZE, IMAGE_SIZE))
    out = model(x)

    # print out the model summary using the third-party library 'torchsummary'
    summary(model.cuda(), (num_channels, IMAGE_SIZE, IMAGE_SIZE), batch_size=16)
    ```

  - model parameter summary

    ```
    ================================================================
    Total params: 61,534,504
    Trainable params: 61,534,504
    Non-trainable params: 0
    ----------------------------------------------------------------
    Input size (MB): 31.69
    Forward/backward pass size (MB): 13175.06
    Params size (MB): 234.74
    Estimated Total Size (MB): 13441.48
    ----------------------------------------------------------------
    ```

  - Reference
    - stackoverflow: How do I print the model summary in PyTorch?
    - PyTorch Doc: Is there a similar pytorch function as model.summary() in keras?
- 2023.04.29
  - The comparison between different WEIGHT_DECAY values under the same LEARNING_RATE = 3e-5
    - The loss value for every update
    - The train-object-accuracy for every epoch
    - The test-object-accuracy for every 10 epochs
    - The mAP for every 10 epochs

    ```
    2023-04-27, epoch: 100, duration: 7.1676 hours, WEIGHT_DECAY = 1e-1, LEARNING_RATE = 3e-5, max mAP: 0.3289
    2023-04-26, epoch: 100, duration: 7.7900 hours, WEIGHT_DECAY = 1e-2, LEARNING_RATE = 3e-5, max mAP: 0.3646
    2023-04-25, epoch: 100, duration: 6.2753 hours, WEIGHT_DECAY = 1e-3, LEARNING_RATE = 3e-5, max mAP: 0.3603
    2023-04-22, epoch: 100, duration: 7.2117 hours, WEIGHT_DECAY = 1e-4, LEARNING_RATE = 3e-5, max mAP: 0.3792
    ```
- 2023.04.28
  - The training duration is 7.5511 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 1e-5

    ```
    --------------------------------------------------
    The stats of 2023-04-28 training:
    --------------------------------------------------
    max mAP: 0.2697495222091675
    mean mAP: 0.18186791352927684
    max training loss: 92.10067749023438
    min training loss: 1.1181566715240479
    max training loss on average: 32.7851714070638
    min training loss on average: 1.3748071026802062
    min training accuracy: 2.0996294021606445
    max training accuracy: 92.99666595458984
    min testing accuracy: 19.9406795501709
    max testing accuracy: 64.90988159179688
    --------------------------------------------------
    ```

  - The training duration is 7.2838 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 2e-5

    ```
    --------------------------------------------------
    The stats of 2023-04-27-2 training:
    --------------------------------------------------
    max mAP: 0.3233576714992523
    mean mAP: 0.23364422097802162
    max training loss: 67.91127014160156
    min training loss: 0.9422303438186646
    max training loss on average: 27.790054613749188
    min training loss on average: 1.224434497753779
    min training accuracy: 0.37967154383659363
    max training accuracy: 95.5811767578125
    min testing accuracy: 22.19940757751465
    max testing accuracy: 69.38169860839844
    --------------------------------------------------
    ```

  - The training duration is 7.2117 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 3e-5

    ```
    --------------------------------------------------
    The stats of 2023-04-22 training:
    --------------------------------------------------
    max mAP: 0.37920689582824707
    mean mAP: 0.3020245939493179
    max training loss: 72.82600402832031
    min training loss: 0.8917444944381714
    max training loss on average: 25.31787603378296
    min training loss on average: 1.1737037108341852
    min training accuracy: 0.5489227175712585
    max training accuracy: 96.67901611328125
    min testing accuracy: 28.838693618774414
    max testing accuracy: 70.72781372070312
    --------------------------------------------------
    ```

  - The training duration is 8.1383 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 4e-5

    ```
    --------------------------------------------------
    The stats of 2023-04-28-2 training:
    --------------------------------------------------
    max mAP: 0.3963651657104492
    mean mAP: 0.3341544926166534
    max training loss: 67.77149963378906
    min training loss: 0.9209076166152954
    max training loss on average: 22.19623363494873
    min training loss on average: 1.146754193107287
    min training accuracy: 0.7410457134246826
    max training accuracy: 96.36795806884766
    min testing accuracy: 33.926536560058594
    max testing accuracy: 70.9787826538086
    --------------------------------------------------
    ```

  - The training duration is 7.1785 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 5e-5

    ```
    --------------------------------------------------
    The stats of 2023-04-28-3 training:
    --------------------------------------------------
    max mAP: 0.41434526443481445
    mean mAP: 0.3443592220544815
    max training loss: 65.99470520019531
    min training loss: 0.8718012571334839
    max training loss on average: 20.811571718851724
    min training loss on average: 1.0752028383811314
    min training accuracy: 1.0612506866455078
    max training accuracy: 96.4731674194336
    min testing accuracy: 35.318275451660156
    max testing accuracy: 74.19575500488281
    --------------------------------------------------
    ```
- 2023.04.27
  - Performing a grid search to find the optimal weight decay; all tests share the same settings except for the weight decay parameter
  - The training duration is 7.1676 hours with WEIGHT_DECAY = 1e-1

    ```
    --------------------------------------------------
    The stats of 2023-04-27 training:
    --------------------------------------------------
    max mAP: 0.328988641500473
    mean mAP: 0.26167472153902055
    max training loss: 57.49835968017578
    min training loss: 2.0004570484161377
    max training loss on average: 23.69808032989502
    min training loss on average: 2.260551511446635
    min training accuracy: 1.3860299587249756
    max training accuracy: 78.02479553222656
    min testing accuracy: 25.644535064697266
    max testing accuracy: 53.70750427246094
    --------------------------------------------------
    ```

  - The training duration is 7.7900 hours with WEIGHT_DECAY = 1e-2

    ```
    --------------------------------------------------
    The stats of 2023-04-26 training:
    --------------------------------------------------
    max mAP: 0.36460867524147034
    mean mAP: 0.2820669665932655
    max training loss: 59.46959686279297
    min training loss: 1.341576099395752
    max training loss on average: 24.546935682296752
    min training loss on average: 1.5733012755711873
    min training accuracy: 0.7913635969161987
    max training accuracy: 89.63908386230469
    min testing accuracy: 29.956649780273438
    max testing accuracy: 64.81861114501953
    --------------------------------------------------
    ```

  - The training duration is 6.2753 hours with WEIGHT_DECAY = 1e-3

    ```
    --------------------------------------------------
    The stats of 2023-04-25 training:
    --------------------------------------------------
    max mAP: 0.3603482246398926
    mean mAP: 0.2835115119814873
    max training loss: 61.669921875
    min training loss: 0.9460040330886841
    max training loss on average: 23.978200359344484
    min training loss on average: 1.233974441687266
    min training accuracy: 1.289968490600586
    max training accuracy: 95.745849609375
    min testing accuracy: 23.180469512939453
    max testing accuracy: 69.15354919433594
    --------------------------------------------------
    ```

  - The training duration is 7.2117 hours with WEIGHT_DECAY = 1e-4

    ```
    --------------------------------------------------
    The stats of 2023-04-22 training:
    --------------------------------------------------
    max mAP: 0.37920689582824707
    mean mAP: 0.3020245939493179
    max training loss: 72.82600402832031
    min training loss: 0.8917444944381714
    max training loss on average: 25.31787603378296
    min training loss on average: 1.1737037108341852
    min training accuracy: 0.5489227175712585
    max training accuracy: 96.67901611328125
    min testing accuracy: 28.838693618774414
    max testing accuracy: 70.72781372070312
    --------------------------------------------------
    ```
- 2023.04.26
  - Performing a grid search to find the optimal weight decay; all tests share the same settings except for the weight decay parameter
  - The training duration is 6.2753 hours with WEIGHT_DECAY = 1e-3

    ```
    --------------------------------------------------
    The stats of 2023-04-25 training:
    --------------------------------------------------
    max mAP: 0.3603482246398926
    mean mAP: 0.2835115119814873
    max training loss: 61.669921875
    min training loss: 0.9460040330886841
    max training loss on average: 23.978200359344484
    min training loss on average: 1.233974441687266
    min training accuracy: 1.289968490600586
    max training accuracy: 95.745849609375
    min testing accuracy: 23.180469512939453
    max testing accuracy: 69.15354919433594
    --------------------------------------------------
    ```

  - The training duration is 7.7900 hours with WEIGHT_DECAY = 1e-2

    ```
    --------------------------------------------------
    The stats of 2023-04-26 training:
    --------------------------------------------------
    max mAP: 0.36460867524147034
    mean mAP: 0.2820669665932655
    max training loss: 59.46959686279297
    min training loss: 1.341576099395752
    max training loss on average: 24.546935682296752
    min training loss on average: 1.5733012755711873
    min training accuracy: 0.7913635969161987
    max training accuracy: 89.63908386230469
    min testing accuracy: 29.956649780273438
    max testing accuracy: 64.81861114501953
    --------------------------------------------------
    ```
- 2023.04.24
  - The result of training for 100 epochs, with the k_means() anchors rounded to 3 decimal places
    - The training duration: 7.2117 hours

    ```
    --------------------------------------------------
    The stats of 2023-04-22 training:
    --------------------------------------------------
    max mAP: 0.37920689582824707
    mean mAP: 0.3020245939493179
    max training loss: 72.82600402832031
    min training loss: 0.8917444944381714
    max training loss on average: 25.31787603378296
    min training loss on average: 1.1737037108341852
    min training accuracy: 0.5489227175712585
    max training accuracy: 96.67901611328125
    min testing accuracy: 28.838693618774414
    max testing accuracy: 70.72781372070312
    --------------------------------------------------
    ```

  - The result of training for 300 epochs, with the same anchors as above
    - The training duration: 20.8263 hours

    ```
    --------------------------------------------------
    The stats of 2023-04-23 training:
    --------------------------------------------------
    max mAP: 0.4179251194000244
    mean mAP: 0.3632150818904241
    max training loss: 72.01780700683594
    min training loss: 0.5801995992660522
    max training loss on average: 24.274858560562134
    min training loss on average: 0.7920041881004969
    min training accuracy: 0.45743560791015625
    max training accuracy: 99.13544464111328
    min testing accuracy: 35.75177001953125
    max testing accuracy: 72.34770965576172
    --------------------------------------------------
    ```

  - The figures for the stats
- 2023.04.23
- 2023.04.21
- 2023.04.18
- The third clustering result using custom k_means()

  ```
  Number of clusters: 9
  Average IoU: 0.6639814720619468
  Anchors original:
  (0.42412935323383083, 0.09495491293532338),
  (0.040049518201284794, 0.04793729925053533),
  (0.12121121241202815, 0.02474208253358925),
  (0.21935948581560283, 0.041091810726950354),
  (0.015625, 0.016347497459349592),
  (0.21888516435986158, 0.09671009948096886),
  (0.038657583841463415, 0.008815858422256097),
  (0.125454418344519, 0.07256711409395973),
  (0.058373810467882634, 0.018722739888977002),
  Anchors rounded to 2 decimal places:
  (0.42, 0.09), (0.04, 0.05), (0.12, 0.02),
  (0.22, 0.04), (0.02, 0.02), (0.22, 0.10),
  (0.04, 0.01), (0.13, 0.07), (0.06, 0.02),
  Anchors rounded to 3 decimal places:
  (0.424, 0.095), (0.040, 0.048), (0.121, 0.025),
  (0.219, 0.041), (0.016, 0.016), (0.219, 0.097),
  (0.039, 0.009), (0.125, 0.073), (0.058, 0.019),
  ```
- The comparison of different anchor settings
  - original anchors for general image datasets

    ```
    (0.28, 0.22), (0.38, 0.48), (0.9, 0.78),
    (0.07, 0.15), (0.15, 0.11), (0.14, 0.29),
    (0.02, 0.03), (0.04, 0.07), (0.08, 0.06)
    ```

  - sklearn.cluster.KMeans() result

    ```
    (0.211, 0.098), (0.339, 0.087), (0.495, 0.092),
    (0.158, 0.033), (0.232, 0.043), (0.125, 0.082),
    (0.033, 0.017), (0.065, 0.027), (0.107, 0.024),
    ```

  - sklearn.cluster.MiniBatchKMeans() result

    ```
    (0.329, 0.085), (0.424, 0.096), (0.530, 0.089),
    (0.157, 0.031), (0.232, 0.064), (0.164, 0.094),
    (0.027, 0.016), (0.056, 0.024), (0.105, 0.029),
    ```

  - Custom k_means() result

    ```
    (0.125, 0.073), (0.219, 0.097), (0.424, 0.095),
    (0.040, 0.048), (0.121, 0.025), (0.219, 0.041),
    (0.016, 0.016), (0.039, 0.009), (0.058, 0.019),
    ```

  - training for 1000 epochs with the original anchors

    ```
    max mAP: 0.18192845582962036 (the highest mAP obtained out of 10 tests)
    mean mAP: 0.1663009986281395 (the average mAP obtained out of 10 tests)
    max training loss: 125.03005981445312
    min training loss: 0.6005923748016357
    max training loss on average: 19.55863230228424
    min training loss on average: 0.8333272246519724
    min training accuracy: 2.8318750858306885
    max training accuracy: 98.84278869628906
    min testing accuracy: 33.172786712646484
    max testing accuracy: 70.57997131347656
    ```

  - training for 100 epochs with the sklearn.cluster.KMeans() anchors rounded to 2 decimal places

    ```
    max training loss on average: 17.887332406044006
    min training loss on average: 1.1761843407154082
    min training accuracy: 1.1478031873703003
    max training accuracy: 96.33079528808594
    min testing accuracy: 28.48825454711914
    max testing accuracy: 67.01465606689453
    max mAP: 0.1628512293100357
    mean mAP: 0.1628512293100357 (only tested once)
    ```

  - training for 100 epochs with the sklearn.cluster.KMeans() anchors rounded to 3 decimal places

    ```
    max training loss on average: 18.193040917714438
    min training loss on average: 1.2186308292547863
    min training accuracy: 4.069056510925293
    max training accuracy: 94.63731384277344
    min testing accuracy: 28.80947494506836
    max testing accuracy: 66.93435668945312
    max mAP: 0.17361223697662354
    mean mAP: 0.17361223697662354 (only tested once)
    ```
- The YOLO network seems not able to properly learn this task
- Keep improving the anchor settings
- Plot the comparison between different anchor settings
- Redesign the feature extractor structure
- Change the detection head network
- Apply certain training strategies to our task, e.g. Weight Initialization:
- Random Initialization (current method)
- Xavier Initialization, or Glorot Initialization
- Kaiming Initialization, or He Initialization
- LeCun Initialization
- Ref. Deeplizard Weight Initialization Explained
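For reference, these initialization schemes differ only in the fan terms used for the weight standard deviation. A minimal sketch (the fan values below are illustrative, not taken from this model; in PyTorch the corresponding calls are `torch.nn.init.xavier_normal_` and `torch.nn.init.kaiming_normal_`):

```python
import math

def xavier_std(fan_in, fan_out):
    # Xavier/Glorot: balances activation and gradient variance (tanh/sigmoid nets)
    return math.sqrt(2.0 / (fan_in + fan_out))

def kaiming_std(fan_in):
    # Kaiming/He: compensates for ReLU zeroing roughly half the activations
    return math.sqrt(2.0 / fan_in)

def lecun_std(fan_in):
    # LeCun: preserves unit variance of the inputs
    return math.sqrt(1.0 / fan_in)

# e.g. a 3x3 conv with 32 input channels and 64 output channels
fan_in = 32 * 3 * 3    # 288
fan_out = 64 * 3 * 3   # 576
print(xavier_std(fan_in, fan_out), kaiming_std(fan_in), lecun_std(fan_in))
```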
- Using k-fold cross-validation to ensure that there's no training data selection bias
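The k-fold idea above can be sketched without any dependency (`sklearn.model_selection.KFold` does the same job; `kfold_indices` is a hypothetical helper, and the fold count is illustrative):

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs splitting range(n_samples) into k folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, val_idx

# every frame index lands in exactly one validation fold
splits = list(kfold_indices(7193, 5))  # 7193 frames, as counted above
```

Shuffling the indices before splitting (with a fixed seed) would also remove any ordering bias in the dataset.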
- 2023.04.17
- The code for handcrafted-from-scratch version of
k_means()
which considers IoU in its distance metric
- The first clustering result using
sklearn.cluster.KMeans()
```
Estimator: KMeans(n_clusters=9, verbose=True)
Number of Clusters: 9
Average IoU: 0.6268763251152744
Inertia: 4.175114625246291
Silhouette Score: 0.4465142389008657
Date and Duration: 2023-04-13 / 0.0951 seconds
Anchors:
1: (0.03258875446251471852, 0.01661357100357002681)  5.414155861808978
2: (0.06474560301507539806, 0.02702967964824120467)  17.50052908129688
3: (0.10668965880370681609, 0.02383240311710192738)  25.426709570360032
4: (0.15826612903225806273, 0.03252153592375366803)  51.47057600836014
5: (0.23229679802955666146, 0.04291102216748768350)  99.68093049682716
6: (0.12471330275229357276, 0.08154147553516821745)  101.69306725286172
7: (0.21058315334773208827, 0.09842400107991366998)  207.26436512508812
8: (0.33944144518272417743, 0.08742992109634553644)  296.77338769155074
9: (0.49540441176470573215, 0.09187346813725494332)  455.1452143932022
Anchors original:
(0.03258875446251472, 0.016613571003570027), (0.0647456030150754, 0.027029679648241205), (0.10668965880370682, 0.023832403117101927),
(0.15826612903225806, 0.03252153592375367), (0.23229679802955666, 0.042911022167487683), (0.12471330275229357, 0.08154147553516822),
(0.2105831533477321, 0.09842400107991367), (0.3394414451827242, 0.08742992109634554), (0.49540441176470573, 0.09187346813725494),
Anchors rounded to 2 decimal places:
(0.03, 0.02), (0.06, 0.03), (0.11, 0.02), (0.16, 0.03), (0.23, 0.04), (0.12, 0.08), (0.21, 0.10), (0.34, 0.09), (0.50, 0.09),
Anchors rounded to 3 decimal places:
(0.033, 0.017), (0.065, 0.027), (0.107, 0.024), (0.158, 0.033), (0.232, 0.043), (0.125, 0.082), (0.211, 0.098), (0.339, 0.087), (0.495, 0.092),
```
- The second clustering result using
```
Estimator: MiniBatchKMeans(n_clusters=9, tol=0.0001, verbose=True)
Number of Clusters: 9
Average IoU: 0.6075905487924542
Inertia: 4.375712040766109
Silhouette Score: 0.41462042329969084
Date and Duration: 2023-04-13 / 0.0423 seconds
Anchors:
1: (0.02677950180907319802, 0.01550867137489563008)  4.153144931403392
2: (0.05614595190665907370, 0.02351197887023335348)  13.201024348785062
3: (0.10527306967984934039, 0.02908427495291902171)  30.61790903706541
4: (0.15678998161764706731, 0.03086224724264705413)  48.388911778539104
5: (0.23159116755117511999, 0.06435983699772555855)  149.0516979370658
6: (0.16395052370452040114, 0.09384044239250277641)  153.85189674914707
7: (0.32857417864476384795, 0.08490278490759754770)  278.9686281566692
8: (0.42449951171874988898, 0.09640502929687500000)  409.23887863755215
9: (0.53048469387755103899, 0.08938137755102043558)  474.1545270850689
Anchors original:
(0.026779501809073198, 0.01550867137489563), (0.056145951906659074, 0.023511978870233353), (0.10527306967984934, 0.02908427495291902),
(0.15678998161764707, 0.030862247242647054), (0.23159116755117512, 0.06435983699772556), (0.1639505237045204, 0.09384044239250278),
(0.32857417864476385, 0.08490278490759755), (0.4244995117187499, 0.096405029296875), (0.530484693877551, 0.08938137755102044),
Anchors rounded to 2 decimal places:
(0.03, 0.02), (0.06, 0.02), (0.11, 0.03), (0.16, 0.03), (0.23, 0.06), (0.16, 0.09), (0.33, 0.08), (0.42, 0.10), (0.53, 0.09),
Anchors rounded to 3 decimal places:
(0.027, 0.016), (0.056, 0.024), (0.105, 0.029), (0.157, 0.031), (0.232, 0.064), (0.164, 0.094), (0.329, 0.085), (0.424, 0.096), (0.530, 0.089),
```
- The original anchor for general image dataset
```python
ANCHORS = [
    [(0.28, 0.22), (0.38, 0.48), (0.9, 0.78)],
    [(0.07, 0.15), (0.15, 0.11), (0.14, 0.29)],
    [(0.02, 0.03), (0.04, 0.07), (0.08, 0.06)],
]  # Note these have been rescaled to be between [0, 1]
```
- 2023.04.13
- stackoverflow Custom Python list sorting
```python
from functools import cmp_to_key

cmp_key = cmp_to_key(cmp_function)
mylist.sort(key=cmp_key)
```
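`cmp_function` is not defined in the snippet above; a hypothetical comparator, e.g. one that sorts (w, h) anchor pairs by area, would look like:

```python
from functools import cmp_to_key

def cmp_by_area(a, b):
    # negative if a sorts first, positive if b sorts first, 0 if equal
    area_a, area_b = a[0] * a[1], b[0] * b[1]
    return (area_a > area_b) - (area_a < area_b)

anchors = [(0.21, 0.10), (0.03, 0.02), (0.06, 0.03)]
anchors.sort(key=cmp_to_key(cmp_by_area))
print(anchors)  # smallest-area anchor first
```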
get_anchors2.py
- Finishing the part where I use
sklearn.cluster.KMeans()
andsklearn.cluster.MiniBatchKMeans()
for clustering
- The custom-designed / handcrafted-from-scratch version of
k_means()
is also finished, but it hasn't been well-tested yet
- 2023.04.10
- Need to recompute / regenerate anchors for YOLO Training YOLO? Select Anchor Boxes Like This
- for YOLOv2
AlexeyAB/darknet/scripts/
gen_anchors.py
- The anchor boxes were calculated with a k-means clustering algorithm only
- With
1 - IoU
as a distance metric
- Doing k-means clustering alone is already a good approach
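The "1 - IoU as a distance metric" idea can be sketched in a few lines of numpy, in the spirit of darknet's gen_anchors.py (toy data below, not the real CARRADA boxes; since only widths and heights are clustered, IoU is computed as if all boxes shared a corner):

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (N, 2) box sizes and (K, 2) anchor sizes, ignoring positions."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = (boxes[:, None, 0] * boxes[:, None, 1]
             + anchors[None, :, 0] * anchors[None, :, 1] - inter)
    return inter / union

def kmeans_iou(boxes, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each box to the anchor with the smallest 1 - IoU distance
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = boxes[assign == j].mean(axis=0)
    return anchors[np.argsort(anchors.prod(axis=1))]  # sorted by area

boxes = np.array([[0.03, 0.02], [0.05, 0.03], [0.20, 0.10],
                  [0.25, 0.12], [0.50, 0.09], [0.45, 0.10]])
anchors = kmeans_iou(boxes, k=3)
```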
- for YOLOv5 / YOLOv7
ultralytics/yolov5/utils/
autoanchor.py
- ultralytics YOLOv5 Docs Train Custom Data
- Auto-anchor algorithm
- Step 0. K-means (with simple Euclidean distance) is used to get the initial guess for anchor boxes
  - We can also do it with 1 - IoU as a distance metric
- Step 1. Get bounding box sizes from the train data
- Step 2. Choose a metric to define anchor fitness
  - Ideally, the metric should be connected to the loss function
- Step 3. Do clustering to get an initial guess for anchors
- Step 4. Evolve anchors to improve anchor fitness
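Step 4 can be sketched as a random-mutation loop over an anchor-fitness score. The fitness below mimics the width/height-ratio metric of ultralytics' autoanchor in spirit only; the exact metric and hyperparameters there differ, so treat this as an assumption-laden sketch:

```python
import numpy as np

def anchor_fitness(anchors, wh, thr=4.0):
    # worst-case side ratio between each box and each anchor
    r = wh[:, None, :] / anchors[None, :, :]
    x = np.minimum(r, 1.0 / r).min(axis=2)    # (N, K)
    best = x.max(axis=1)                      # best anchor per box
    return float((best * (best > 1.0 / thr)).mean())

def evolve_anchors(anchors, wh, iters=300, seed=0):
    rng = np.random.default_rng(seed)
    best, best_fit = anchors.copy(), anchor_fitness(anchors, wh)
    for _ in range(iters):
        mutated = best * rng.normal(1.0, 0.05, best.shape).clip(0.5, 1.5)
        fit = anchor_fitness(mutated, wh)
        if fit > best_fit:  # keep a mutation only if it improves fitness
            best, best_fit = mutated, fit
    return best

# toy (w, h) data; real usage would feed the training-set box sizes
wh = np.abs(np.random.default_rng(1).normal(0.2, 0.1, (100, 2))) + 1e-3
init = np.array([[0.05, 0.05], [0.2, 0.2], [0.4, 0.1]])
evolved = evolve_anchors(init, wh)
```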
- Things I'm Googling but haven't finished reading
- Faster RCNN with PyTorch
- PyTorch Docs TORCHVISION OBJECT DETECTION FINETUNING TUTORIAL
- PyTorch Docs MODELS AND PRE-TRAINED WEIGHTS
- PyTorch Source Code
fasterrcnn_resnet50_fpn()
- Zhihu article: FasterRCNN analysis of the official PyTorch FasterRCNN code
- Faster RCNN reproduction
- Kaggle object detection Aquarium Dataset
- Kaggle Pytorch Starter - FasterRCNN Train
- github search for faster-r-cnn
- Kmeans implementation
- scikit-learn Clustering with kmeans
- scikit-learn Clustering performance evaluation
- scikit-learn
sklearn.cluster.KMeans()
- Tech-with-Tim Implementing K Means Clustering
- Sentdex K-Means from Scratch in Python
- Faster RCNN with PyTorch
- 2023.04.09
- Tested positive for COVID; got nothing done over the past 10 days
- 2023.03.28
- The plan is to train the model to a point where we're satisfied with its performance; then we can make the edge-computing modifications to it
- Quick recap:
- The DAROD paper proposes a light architecture for the
Faster R-CNN
object detector on this particular task
- They reach an
mAP@0.5
andmAP@0.3
of55.83
and70.68
- So our goal is to at least get a better mAP than they did
- The DAROD paper propose a light architecture for the
- The current
mAP@50
(for every100
epochs) andmean loss
(for every epoch), for a total of300
epochs of training:

```
max training loss (on average): 20.516442289352415
min training loss (on average): 1.0732185713450113
```
- To further analyze where the problems are, I first extracted some of the data that I think might be helpful
- The file tree structure:
```
D:/Datasets/RADA/RD_JPG/training_logs>tree
D:.
├─mAP
├─test
│  ├─class_accuracy
│  ├─no_object_accuracy
│  └─object_accuracy
└─train
   ├─class_accuracy
   ├─losses
   ├─mean_loss
   ├─no_object_accuracy
   └─object_accuracy
```
- Some other results
- train-class-accuracy vs. test-class-accuracy
- train-no-object-accuracy vs. test-no-object-accuracy
- train-object-accuracy vs. test-object-accuracy
```
min training accuracy: 2.3661680221557617
max training accuracy: 94.16690826416016
min testing accuracy: 46.69877624511719
max testing accuracy: 72.34597778320312
```
- The layers of the model
```
layer 0:  torch.Size([20, 32, 416, 416])
layer 1:  torch.Size([20, 64, 208, 208])
layer 2:  torch.Size([20, 64, 208, 208])
layer 3:  torch.Size([20, 128, 104, 104])
layer 4:  torch.Size([20, 128, 104, 104])
layer 5:  torch.Size([20, 256, 52, 52])
layer 6:  torch.Size([20, 256, 52, 52])
layer 7:  torch.Size([20, 512, 26, 26])
layer 8:  torch.Size([20, 512, 26, 26])
layer 9:  torch.Size([20, 1024, 13, 13])
layer 10: torch.Size([20, 1024, 13, 13])
layer 11: torch.Size([20, 512, 13, 13])
layer 12: torch.Size([20, 1024, 13, 13])
layer 13: torch.Size([20, 1024, 13, 13])
layer 14: torch.Size([20, 512, 13, 13])
layer 16: torch.Size([20, 256, 13, 13])
layer 17: torch.Size([20, 256, 26, 26])
layer 18: torch.Size([20, 256, 26, 26])
layer 19: torch.Size([20, 512, 26, 26])
layer 20: torch.Size([20, 512, 26, 26])
layer 21: torch.Size([20, 256, 26, 26])
layer 23: torch.Size([20, 128, 26, 26])
layer 24: torch.Size([20, 128, 52, 52])
layer 25: torch.Size([20, 128, 52, 52])
layer 26: torch.Size([20, 256, 52, 52])
layer 27: torch.Size([20, 256, 52, 52])
layer 28: torch.Size([20, 128, 52, 52])
```
```python
config = [
    (32, 3, 1),    # (32, 3, 1) is the CBL, CBL = Conv + BN + LeakyReLU
    (64, 3, 2),
    ["B", 1],      # (64, 3, 2) + ["B", 1] is the Res1, Res1 = ZeroPadding + CBL + (CBL + CBL + Add)*1
    (128, 3, 2),
    ["B", 2],      # (128, 3, 2) + ["B", 2] is the Res2, Res2 = ZeroPadding + CBL + (CBL + CBL + Add)*2
    (256, 3, 2),
    ["B", 8],      # (256, 3, 2) + ["B", 8] is the Res8, Res8 = ZeroPadding + CBL + (CBL + CBL + Add)*8
    (512, 3, 2),
    ["B", 8],      # (512, 3, 2) + ["B", 8] is the Res8, Res8 = ZeroPadding + CBL + (CBL + CBL + Add)*8
    (1024, 3, 2),
    ["B", 4],      # (1024, 3, 2) + ["B", 4] is the Res4, Res4 = ZeroPadding + CBL + (CBL + CBL + Add)*4
    # to this point is Darknet-53 which has 52 layers
    (512, 1, 1),
    (1024, 3, 1),
    "S",
    (256, 1, 1),
    "U",
    (256, 1, 1),
    (512, 3, 1),
    "S",
    (128, 1, 1),
    "U",
    (128, 1, 1),
    (256, 3, 1),
    "S",
]
```
- 2023.03.19
- The actual size of each input image is:
- The resizing results are completely different; we could even conclude that they are wrong (though I don't know why). Since we might not need to resize the images anyway, I am ignoring this issue for now
- Some samples of person, cyclist and car:
- I first tried to run
train.py
for100
epochs with the following config settings:
- The resulting
mAP
is0.182485
- 2023.03.16
- It's finally trainable now
- The major mistake I made was misinterpreting the labels, even though I had actually translated them correctly.
- In short, simply switching the
x
andy
coordinates will solve our problems
- This makes me wonder: how did I get it right when replicating
YOLO-CFAR
before? - Since the shape of the feature map is printed as
torch.Size([256, 64, 3])
, it shows the same coordinate system as theRD map
where the origin(0, 0)
is located at the top left corner - But it turns out that's not the case. The model still recognizes the bottom left corner as the origin, which is the same as we usually do.
- In short, simply switching the
- The correct way to translate the labels
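Assuming the usual YOLO label line format (class, x, y, w, h), the coordinate switch described above amounts to:

```python
def swap_xy(label):
    """Swap the x/y (and w/h) axes of one YOLO label tuple: (class, x, y, w, h)."""
    cls, x, y, w, h = label
    return (cls, y, x, h, w)

# a label whose axes were interpreted in the wrong order
print(swap_xy((2, 0.3, 0.6, 0.1, 0.2)))  # (2, 0.6, 0.3, 0.2, 0.1)
```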
- 2023.03.15
- Still not actually trainable
ValueError: Expected x_min for bbox (-0.103515625, 0.306640625, 0.224609375, 0.365234375, 2.0) to be in the range [0.0, 1.0], got -0.103515625.
- The issue stems from my erroneous translation of the labels
- The way we figured this out was by feeding the model labels that were correctly formatted but actually wrong, so we could distinguish whether the issue lay in the content of the labels or in my code implementation
- What I mean by wrong labels is that I used the previously well-tested synthetic radar dataset labels for training
- It is trainable with correct but actually wrong labels
- When testing
PASCAL_VOC
dataset, I actually used padding for the input images, but I forgot that the padding existed. So we can now confirm that my code can only take square inputs
- Remove useless transforms of
YOLOv3-VOC
- we need
LongestMaxSize()
andPadIfNeeded()
to avoidRuntimeError: Trying to resize storage that is not resizable
- we need
Normalize()
to avoidRuntimeError: Input type (torch.cuda.ByteTensor) and weight type (torch.cuda.HalfTensor) should be the same
- we need
ToTensorV2()
to avoidRuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[10, 416, 416, 3] to have 3 channels, but got 416 channels instead
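The two transforms that mattered here can be mimicked with plain numpy, which also shows why skipping ToTensorV2() triggers the "expected input ... to have 3 channels" error: the network expects channels-first (C, H, W) input. A rough stand-in sketch, not the albumentations implementation:

```python
import numpy as np

def pad_to_square(img):
    """Zero-pad an (H, W, C) image so H == W, roughly what PadIfNeeded() is used for."""
    h, w, c = img.shape
    side = max(h, w)
    out = np.zeros((side, side, c), dtype=img.dtype)
    out[:h, :w] = img
    return out

def to_chw(img):
    """(H, W, C) -> (C, H, W), the layout change ToTensorV2() performs."""
    return np.transpose(img, (2, 0, 1))

rd_map = np.ones((256, 64, 3), dtype=np.uint8)  # a 256x64 RD map with 3 channels
x = to_chw(pad_to_square(rd_map))
print(x.shape)  # (3, 256, 256)
```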
- 2023.03.14
- Ref. Albumentations Documentation Full API Reference
- Remove useless transforms of
YOLOv3-VOC
- we need
LongestMaxSize()
andPadIfNeeded()
to avoidRuntimeError: Trying to resize storage that is not resizable
- we need
Normalize()
to avoidRuntimeError: Input type (torch.cuda.ByteTensor) and weight type (torch.cuda.HalfTensor) should be the same
- we need
ToTensorV2()
to avoidRuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[10, 416, 416, 3] to have 3 channels, but got 416 channels instead
- The execution results and error messages of the same code differ between my PC and the lab PC, which is weird and annoying
- 2023.03.10
- Still untrainable
- First, I prepared
3
sizes of square images: 64-by-64, 256-by-256, and 416-by-416
- The way I tested it was by simply changing the input images in the previously successful version of the code, without changing anything else, and seeing how it went
- Even though I resized all the images to a square size, the exact same error persists. Specifically:
RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[16, 64, 64, 3] to have 3 channels, but got 64 channels instead
RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[16, 256, 256, 3] to have 3 channels, but got 256 channels instead
RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[16, 416, 416, 3] to have 3 channels, but got 416 channels instead
- It still doesn't work, but every piece of code is the same, so I speculate that maybe it's because the images are not actually encoded in the
'JPEG'
format.
- So I re-read the dataset, saved the
.mat
files out, and converted the.mat
files into scaled color and grayscale
- Plotting 7193 frames of the CARRADA Dataset in scaled color using MATLAB link
- Then I used the scaled color images to train, still getting errors, but at least now we have a different error message.
ValueError: Expected x_min for bbox (-0.103515625, 0.306640625, 0.224609375, 0.365234375, 2.0) to be in the range [0.0, 1.0], got -0.103515625.
- 2023.03.09
- 2023.03.04
- New lead: the image file format may be the issue
- Regenerate all data in .jpg
- 2023.02.21
- Modified from YOLO-CFAR
```
(pt3.8) D:\Datasets\YOLOv3-PyTorch\YOLOv3-debug1>D:/ProgramData/Anaconda3/envs/pt3.8/python.exe d:/Datasets/YOLOv3-PyTorch/YOLOv3-debug1/train.py
  0%|          | 0/375 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "d:/Datasets/YOLOv3-PyTorch/YOLOv3-debug1/train.py", line 166, in <module>
    main()
  File "d:/Datasets/YOLOv3-PyTorch/YOLOv3-debug1/train.py", line 107, in main
    train_fn(train_loader, model, optimizer, loss_fn, scaler, scaled_anchors)
  File "d:/Datasets/YOLOv3-PyTorch/YOLOv3-debug1/train.py", line 57, in train_fn
    out = model(x)
  File "D:\ProgramData\Anaconda3\envs\pt3.8\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\Datasets\YOLOv3-PyTorch\YOLOv3-debug1\model.py", line 191, in forward
    x = layer(x)
  File "D:\ProgramData\Anaconda3\envs\pt3.8\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\Datasets\YOLOv3-PyTorch\YOLOv3-debug1\model.py", line 110, in forward
    return self.leaky(self.bn(self.conv(x)))  # bn_act()
  File "D:\ProgramData\Anaconda3\envs\pt3.8\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\ProgramData\Anaconda3\envs\pt3.8\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "D:\ProgramData\Anaconda3\envs\pt3.8\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[16, 256, 64, 3] to have 3 channels, but got 256 channels instead
```
- Modified from YOLO-Pascal_VOC
```
(pt3.8) D:\Datasets\YOLOv3-PyTorch\YOLOv3-debug2>D:/ProgramData/Anaconda3/envs/pt3.8/python.exe d:/Datasets/YOLOv3-PyTorch/YOLOv3-debug2/train.py
  0%|          | 0/5999 [00:00<?, ?it/s]
x:  torch.Size([1, 3, 256, 64])
y0: torch.Size([1, 3, 2, 2, 6])
y1: torch.Size([1, 3, 2, 2, 6])
y2: torch.Size([1, 3, 2, 2, 6])
  0%|          | 0/5999 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "d:/Datasets/YOLOv3-PyTorch/YOLOv3-debug2/train.py", line 144, in <module>
    main()
  File "d:/Datasets/YOLOv3-PyTorch/YOLOv3-debug2/train.py", line 91, in main
    train_fn(train_loader, model, optimizer, loss_fn, scaler, scaled_anchors)
  File "d:/Datasets/YOLOv3-PyTorch/YOLOv3-debug2/train.py", line 47, in train_fn
    loss_fn(out[0], y0, scaled_anchors[0])
  File "D:\ProgramData\Anaconda3\envs\pt3.8\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\Datasets\YOLOv3-PyTorch\YOLOv3-debug2\loss.py", line 83, in forward
    no_object_loss = self.bce((predictions[..., 0:1][noobj]), (target[..., 0:1][noobj]),)
IndexError: The shape of the mask [1, 3, 2, 2] at index 2 does not match the shape of the indexed tensor [1, 3, 8, 2, 1] at index 2
```
- 2023.02.20
- 2023.02.18
- The virtual envs are summarized below:
- My PC
(Intel i7-8700 + Nvidia Geforce RTX 2060)
:- env
pt3.7
with CUDA

```
python==3.7.13
numpy==1.19.2
pytorch==1.7.1
torchaudio==0.7.2
torchvision==0.8.2
pandas==1.2.1
pillow==8.1.0
tqdm==4.56.0
matplotlib==3.3.4
albumentations==0.5.2
```
- Lab PC
(Intel i7-12700 + Nvidia Geforce RTX 3060 Ti)
:- env
pt3.7
without CUDA

```
python==3.7.13
numpy==1.21.6
torch==1.13.1
torchvision==0.14.1
pandas==1.3.5
pillow==9.4.0
tqdm==4.64.1
matplotlib==3.5.3
albumentations==1.3.0
```
- env
pt3.8
with CUDA

```
python==3.8.16
numpy==1.23.5
pytorch==1.13.1
pytorch-cuda==11.7
torchaudio==0.13.1
torchvision==0.14.1
pandas==1.5.2
pillow==9.3.0
tqdm==4.64.1
matplotlib==3.6.2
albumentations==1.3.0
```
- An annoying bug in
dataset.py
due to PyTorch version differences
- The code segment that contains the potential bug (on lines
149
and155
) scale_idx = anchor_idx // self.num_anchors_per_scale
works fine on my PC, but on the lab PC it produces the following warning, so I naturally followed the suggestion and changed the syntax to (torch.div()
)UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch.
- After following the suggestion and changing the deprecated usage
//
we have:scale_idx = torch.div(anchor_idx, self.num_anchors_per_scale, rounding_mode='floor')
. This piece of code works fine on lab PC, under both envpt3.7
andpt3.8
, but failed on my PC. - The error only occur on my PC, under env
pt3.7
, but this env is the initial and stable one.

```
Original Traceback (most recent call last):
  File "C:\Users\paulc\.conda\envs\pt3.7\lib\site-packages\torch\utils\data\_utils\worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\paulc\.conda\envs\pt3.7\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\paulc\.conda\envs\pt3.7\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "d:\Datasets\YOLOv3-PyTorch\dataset.py", line 153, in __getitem__
    scale_idx = torch.div(anchor_idx, self.num_anchors_per_scale, rounding_mode='floor')
TypeError: div() got an unexpected keyword argument 'rounding_mode'
```
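One way to make this run on both environments is a small compatibility shim (floordiv_compat is a hypothetical helper; rounding_mode was added to torch.div in PyTorch 1.8, which is why the env with torch 1.7.1 rejects it):

```python
def floordiv_compat(a, b):
    """Floor-divide that works across PyTorch versions (and without torch)."""
    try:
        import torch
        t = torch.as_tensor(a)
        try:
            return torch.div(t, b, rounding_mode='floor')  # PyTorch >= 1.8
        except TypeError:
            return t // b  # older PyTorch: deprecated syntax, but functional
    except ImportError:
        return a // b  # plain Python fallback

print(int(floordiv_compat(7, 3)))  # 2
```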
- 2023.02.10
- Trying newer stable PyTorch and CUDA version for the project
- Python 3.8 + CUDA 11.7
conda create --name pt3.8 python=3.8
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
(Install PyTorch)
- Interesting to know!
- The new dependency is:
```
numpy==1.23.5
matplotlib==3.6.2
pytorch==1.13.1
pytorch-cuda==11.7
torchaudio==0.13.1
torchvision==0.14.1
tqdm==4.64.1
albumentations==1.3.0
pandas==1.5.2
pillow==9.3.0
```
- 2023.02.08
- The
YOLOv3
model is trainable withPascal_VOC
dataset- But it's bind with
Albumentations
/ data augmentations, which means we need to decouple it
- We know that pre-training is good for our task, at least that's what the paper says, so I was trying to solve this issue
- C. Decourt, R. VanRullen, D. Salle and T. Oberlin, "DAROD: A Deep Automotive Radar Object Detector on Range-Doppler maps," 2022 IEEE Intelligent Vehicles Symposium (IV), Aachen, Germany, 2022, pp. 112-118.
- Originally, I wanted to convert the pre-trained weights from darknet format to PyTorch format, but it did not work
- Add two additional functions
load_CNN_weights()
andload_darknet_weights()
inmodel.py
to read the darknet weights
- fun fact, there are in total
62001757
parameters of YOLOv3
- At least, in the future, we can separate our training process if needed
- we can "save checkpoint" for every epoch or every 10, 20 epochs
- but the correctness of doing so is uncertain; what I mean is that, say we have already trained for 100 epochs and achieved a certain level of performance, if we stop and then continue training for another 100 epochs, the performance may drop
- remember to test it with
seed_everything()
and make sure it works
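A sketch of what a typical seed_everything() looks like, with the torch calls guarded so the snippet also runs without PyTorch installed; whether this fully removes nondeterminism (e.g. cuDNN kernels) is exactly what the test above should verify:

```python
import os
import random

import numpy as np

def seed_everything(seed=42):
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass

seed_everything(0)
first = (random.random(), np.random.rand())
seed_everything(0)
second = (random.random(), np.random.rand())
print(first == second)  # True: reseeding reproduces the same draws
```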
- Need to find a newer dependency
- Currently run without CUDA support since there will be PyTorch 2.0 updates soon
- Deprecation of CUDA 11.6 and Python 3.7 Support
- Please note that as of Feb 1, CUDA 11.6 and Python 3.7 are no longer included in the Stable CUDA
- There is a new paper that says their model can learn spatial and temporal relationships between frames by leveraging the characteristics of the FMCW radar signal
- Decourt, Colin, et al. "A recurrent CNN for online object detection on raw radar frames." arXiv preprint arXiv:2212.11172 (2022).
- The comparison between different generations showed that, though newer versions of the model may be more complex, they are not necessarily bigger
- YOLOv3
222
layers,62001757
parameters - YOLOv4
488
layers,64363101
parameters- YOLOv4-CSP
516
layers,52921437
parameters
- YOLOv4-CSP
- YOLOv7
314
layers,36907898
parameters
- YOLOv3
- Future works
- Make sure we can properly run
train.py
with radar dataset - Find a proper way to measure the "communication overhead"
- Test the functionality of
seed_everything()
, check if it works like the way we think - Find a newer stable PyTorch and CUDA version for the project
- Make sure we can properly run
- 2023.02.07
- The code
detect.py
and model_with_weights2.py work fine, but the result may not be what we expected
works fine, but the result may not be the way as we expected - Need to figure out the usability of the converted weights, since there is a huge difference between random weights and the converted weights, maybe it's not complete garbage
- 2023.02.06
- On lab PC, create a new env
pt3.7
through commandconda create --name pt3.7 python=3.7
- to use the env
conda activate pt3.7
- to leave the env
conda deactivate
- the actual env and pkgs are located at
C:\Users\Paul\.conda\envs\pt3.7
, don't know why it is not been stored inD Drive
- to use the env
- Upgrade default conda env
base
through commandconda update -n base -c defaults conda
- It has to be done under
(base) C:\Windows\system32>
- It has to be done under
- Install all the packages through
pip install -r requirements.txt
- content in the requirements file
```
numpy>=1.19.2
matplotlib>=3.3.4
torch>=1.7.1
tqdm>=4.56.0
torchvision>=0.8.2
albumentations>=0.5.2
pandas>=1.2.1
Pillow>=8.1.0
```
- cmd output stored as
D:/Datasets/YOLOv3-PyTorch/logs/installation_logs_0206.txt
- actual dependency, the new requirement is:
```
numpy==1.21.6
matplotlib==3.5.3
torch==1.13.1
tqdm==4.64.1
torchvision==0.14.1
albumentations==1.3.0
pandas==1.3.5
Pillow==9.4.0
```
- Currently run without CUDA support since there will be PyTorch 2.0 updates soon
- Deprecation of CUDA 11.6 and Python 3.7 Support
- Please note that as of Feb 1, CUDA 11.6 and Python 3.7 are no longer included
- Run
model_with_weights2.py
again on the lab PC to generate the weights in PyTorch format
- Wanted to test the training ability using
PASCAL_VOC
dataset
- But first, we have to test the converted weights to check if they actually work
- to do so, maybe we could write a program
detect.py
and test the weights with some inference samples
- if it can predict perfectly, then we may assume it is converted correctly
- Okay, it does not work..., the inference outputs are a bunch of random tags
- 2023.02.05
- first download the YOLOv3 weights from https://pjreddie.com/media/files/yolov3.weights as
yolov3.weights
and put it in the same directory
- then run
model_with_weights2.py
, it will save the weights in PyTorch format. We name the output weights checkpoint-2023-02-05.pth.tar
also in the same directory
- inside the directory
- I overrode most of the files with my previous ones, except for
model_with_weights2.py
- The implementation is based on the following paper
- Redmon, Joseph, and Ali Farhadi. "Yolov3: An incremental improvement." arXiv preprint arXiv:1804.02767 (2018).
```
@article{yolov3,
  title   = {YOLOv3: An Incremental Improvement},
  author  = {Redmon, Joseph and Farhadi, Ali},
  journal = {arXiv},
  year    = {2018}
}
```
- The original code was copied from YOLOv3-PyTorch and for more details please read their Medium post YOLOv3 — Implementation with Training setup from Scratch