- This was the research topic for my master's thesis. You are welcome to play with the code, but please don't hijack my research. I'll die.
- A guide on 'how to use this code'
- First, download the CARRADA Dataset, the Pascal_VOC Dataset from Kaggle, or the CFAR Dataset, and structure the folders as shown in the file tree below
- Second, set up a virtual environment using Anaconda, e.g.
conda create --name pt3.7 python=3.7
conda create --name pt3.8 python=3.8
- Before installing any packages, remember to enter your conda virtual environment, e.g.
conda activate pt3.7
conda activate pt3.8
- Third, you can manually install all the packages that you need, or you can install with
pip install -r requirements.txt
- Then, copy the code to anywhere you like, and make sure you have changed the file paths in config.py before running the code
- Just click the 'run' button and see the results
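The path settings in config.py typically look something like the sketch below. This is illustrative only: the variable names (DATASET_DIR and friends) are hypothetical, so check the actual config.py for the real ones; the folder names match the dataset file tree shown below.

```python
# Illustrative config sketch - variable names are hypothetical,
# see the repo's actual config.py for the real ones.
import os

# Point this at wherever you placed the dataset (see the file tree below)
DATASET_DIR = r"D:\Datasets\RADA\RD_JPG"

IMG_DIR = os.path.join(DATASET_DIR, "images")
LABEL_DIR = os.path.join(DATASET_DIR, "labels")
LOG_DIR = os.path.join(DATASET_DIR, "training_logs")
```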
- Caveats:
- There is a bunch of dead code, commented-out code, and outdated comments in my program
- I use the albumentations library solely for the purpose of padding
- Dataset file tree

  ```
  D:\Datasets\RADA\RD_JPG>tree D:.
  ├─checks
  ├─images
  ├─imagesc
  ├─imwrite
  ├─labels
  ├─mats
  └─training_logs
      ├─mAP
      ├─test
      │   ├─class_accuracy
      │   ├─no_object_accuracy
      │   └─object_accuracy
      └─train
          ├─class_accuracy
          ├─losses
          ├─mean_loss
          ├─no_object_accuracy
          └─object_accuracy
  ```
- Stable dependencies
  - for Python 3.7

    ```
    python==3.7.13
    numpy==1.19.2
    pytorch==1.7.1
    torchaudio==0.7.2
    torchvision==0.8.2
    pandas==1.2.1
    pillow==8.1.0
    tqdm==4.56.0
    albumentations==0.5.2
    matplotlib==3.3.4
    ```

  - for Python 3.8

    ```
    python==3.8.16
    numpy==1.23.5
    pytorch==1.13.1
    pytorch-cuda==11.7
    torchaudio==0.13.1
    torchvision==0.14.1
    pandas==1.5.2
    pillow==9.3.0
    tqdm==4.64.1
    albumentations==1.3.0
    matplotlib==3.6.2
    ```

  - It's well tested, and the code can be properly executed under these settings
- 2023.05.09
  - The training duration is 22.8276 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 14e-5
  - Under 300 epochs
  - Try to observe the difference between training directly for 200 epochs versus the combination of separate training for 100 epochs each

    ```
    --------------------------------------------------
    The stats of 2023-05-09-1 training:
    --------------------------------------------------
    max mAP: 0.4652663767337799
    mean mAP: 0.41992816428343455
    max training loss: 120.43387603759766
    min training loss: 0.7253801822662354
    max training loss on average: 15.41774600982666
    min training loss on average: 0.9773107906182606
    min training accuracy: 2.3786652088165283
    max training accuracy: 97.95069122314453
    min testing accuracy: 35.47798538208008
    max testing accuracy: 72.39334106445312
    --------------------------------------------------
    ```
- 2023.05.08
  - The training duration is 14.2820 hours with higher WEIGHT_DECAY = 1e-3 and LEARNING_RATE = 15e-5
  - Under 200 epochs and a higher weight decay setting
  - Try to see how the previous best learning rate goes with higher weight decay

    ```
    --------------------------------------------------
    The stats of 2023-05-08-1 training:
    --------------------------------------------------
    max mAP: 0.42102280259132385
    mean mAP: 0.37317809015512465
    max training loss: 174.62034606933594
    min training loss: 1.0440865755081177
    max training loss on average: 17.338274812698366
    min training loss on average: 1.3041423439979554
    min training accuracy: 0.4254150986671448
    max training accuracy: 92.8823013305664
    min testing accuracy: 34.81633758544922
    max testing accuracy: 69.26762390136719
    --------------------------------------------------
    ```
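For context, WEIGHT_DECAY and LEARNING_RATE are the usual optimizer hyperparameters; wiring them into an Adam optimizer looks roughly like this (a sketch with a placeholder model, not the project's training code):

```python
import torch.nn as nn
import torch.optim as optim

LEARNING_RATE = 15e-5
WEIGHT_DECAY = 1e-3  # L2 regularization strength

model = nn.Linear(10, 3)  # placeholder for the YOLOv3 model
optimizer = optim.Adam(
    model.parameters(), lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY
)
print(optimizer.defaults["lr"], optimizer.defaults["weight_decay"])
```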
- Using torchinfo.summary() to get the result
  - The second way to get a model summary in PyTorch besides torchsummary.summary()
  - sample code

    ```python
    import torch
    from torchinfo import summary  # torchinfo.summary()

    # simple test settings
    IMAGE_SIZE = 416   # multiples of 32 are workable with stride [32, 16, 8]
    num_classes = 3
    batch_size = 20    # num_examples
    num_channels = 3   # input image channels

    # initialize a YOLOv3 model (defined elsewhere in this repo)
    model = YOLOv3(num_classes=num_classes)

    # simple test with random inputs of 20 examples, 3 channels, and IMAGE_SIZE-by-IMAGE_SIZE input
    x = torch.randn((batch_size, num_channels, IMAGE_SIZE, IMAGE_SIZE))
    out = model(x)

    # print out the model summary using torchinfo.summary()
    summary(model.cuda(), input_size=(batch_size, num_channels, IMAGE_SIZE, IMAGE_SIZE))
    ```

  - model parameter summary

    ```
    ====================================================================================================
    Total params: 61,534,648
    Trainable params: 61,534,648
    Non-trainable params: 0
    Total mult-adds (G): 653.05
    ====================================================================================================
    Input size (MB): 41.53
    Forward/backward pass size (MB): 12265.99
    Params size (MB): 246.14
    Estimated Total Size (MB): 12553.66
    ====================================================================================================
    ```
- 2023.05.07
  - The training duration is 8.3639 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 14e-5
  - Still 100 epochs
  - Continued the training with the weights of checkpoint-2023-05-02-2.pth.tar, with the same weight decay and learning rate

    ```
    --------------------------------------------------
    The stats of 2023-05-07-1 training:
    --------------------------------------------------
    max mAP: 0.44356226921081543
    mean mAP: 0.42320117354393005
    max training loss: 4.268791675567627
    min training loss: 0.8072612285614014
    max training loss on average: 2.3641905311743416
    min training loss on average: 1.0348780262470245
    min training accuracy: 68.03897857666016
    max training accuracy: 96.88028717041016
    min testing accuracy: 59.59388732910156
    max testing accuracy: 67.21424102783203
    --------------------------------------------------
    ```
- 2023.05.06
  - The training duration is 8.7493 hours with a higher weight decay of WEIGHT_DECAY = 1e-3 and LEARNING_RATE = 14e-5
  - Switching back to 100 epochs
  - Continued the training with the weights of checkpoint-2023-05-02-2.pth.tar (with WEIGHT_DECAY = 1e-4)

    ```
    --------------------------------------------------
    The stats of 2023-05-06-1 training:
    --------------------------------------------------
    max mAP: 0.4469827115535736
    mean mAP: 0.41541612446308135
    max training loss: 7.434675216674805
    min training loss: 0.8201318383216858
    max training loss on average: 5.396891689300537
    min training loss on average: 1.0210446101427078
    min training accuracy: 65.16170501708984
    max training accuracy: 96.99007415771484
    min testing accuracy: 49.76043701171875
    max testing accuracy: 65.93656921386719
    --------------------------------------------------
    ```
- 2023.05.05
  - The training duration is 26.4082 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 15e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-05-1 training:
    --------------------------------------------------
    max mAP: 0.4396911561489105
    mean mAP: 0.4000327423214912
    max training loss: 171.85177612304688
    min training loss: 0.6924741864204407
    max training loss on average: 15.746856501897176
    min training loss on average: 0.9486591788132985
    min training accuracy: 3.29353666305542
    max training accuracy: 97.90494537353516
    min testing accuracy: 34.040611267089844
    max testing accuracy: 74.72051239013672
    --------------------------------------------------
    ```
- 2023.05.04
  - The training duration is 7.4228 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 19e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-04-1 training:
    --------------------------------------------------
    max mAP: 0.44310665130615234
    mean mAP: 0.38241996318101884
    max training loss: 124.89086151123047
    min training loss: 1.0636780261993408
    max training loss on average: 16.811252358754476
    min training loss on average: 1.2993631919225057
    min training accuracy: 2.063034772872925
    max training accuracy: 92.96463775634766
    min testing accuracy: 31.75906753540039
    max testing accuracy: 70.13461303710938
    --------------------------------------------------
    ```

  - The training duration is 8.4733 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 20e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-04-2 training:
    --------------------------------------------------
    max mAP: 0.4227812588214874
    mean mAP: 0.34420192539691924
    max training loss: 138.41775512695312
    min training loss: 1.0614862442016602
    max training loss on average: 15.212103751500448
    min training loss on average: 1.326857070128123
    min training accuracy: 5.338273525238037
    max training accuracy: 93.10186767578125
    min testing accuracy: 31.074607849121094
    max testing accuracy: 69.54141235351562
    --------------------------------------------------
    ```
- 2023.05.03
  - The training duration is 7.1341 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 17e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-03-1 training:
    --------------------------------------------------
    max mAP: 0.4469388425350189
    mean mAP: 0.3841908037662506
    max training loss: 113.7841567993164
    min training loss: 0.963789165019989
    max training loss on average: 15.0015398200353
    min training loss on average: 1.2312769017616907
    min training accuracy: 7.767257213592529
    max training accuracy: 94.10365295410156
    min testing accuracy: 33.51585388183594
    max testing accuracy: 69.63267517089844
    --------------------------------------------------
    ```

  - The training duration is 7.1676 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 18e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-03-2 training:
    --------------------------------------------------
    max mAP: 0.44689252972602844
    mean mAP: 0.3790498897433281
    max training loss: 175.32229614257812
    min training loss: 1.0493061542510986
    max training loss on average: 17.080741675694785
    min training loss on average: 1.291623563369115
    min training accuracy: 7.433328628540039
    max training accuracy: 93.3305892944336
    min testing accuracy: 34.49692153930664
    max testing accuracy: 70.79625701904297
    --------------------------------------------------
    ```
- 2023.05.02
  - The training duration is 6.8200 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 13e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-02-1 training:
    --------------------------------------------------
    max mAP: 0.4374929666519165
    mean mAP: 0.3631990686058998
    max training loss: 134.36065673828125
    min training loss: 0.9601410627365112
    max training loss on average: 18.045101165771484
    min training loss on average: 1.2157120569547017
    min training accuracy: 2.877269983291626
    max training accuracy: 94.9224624633789
    min testing accuracy: 42.20853042602539
    max testing accuracy: 70.84188842773438
    --------------------------------------------------
    ```

  - The training duration is 5.8219 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 14e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-02-2 training:
    --------------------------------------------------
    max mAP: 0.452169269323349
    mean mAP: 0.3887074992060661
    max training loss: 135.23342895507812
    min training loss: 0.9823306798934937
    max training loss on average: 16.633436683019003
    min training loss on average: 1.268118454615275
    min training accuracy: 3.00077748298645
    max training accuracy: 94.62512969970703
    min testing accuracy: 30.800823211669922
    max testing accuracy: 73.10061645507812
    --------------------------------------------------
    ```

  - The training duration is 5.5819 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 15e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-02-3 training:
    --------------------------------------------------
    max mAP: 0.45209288597106934
    mean mAP: 0.3701558232307434
    max training loss: 217.28318786621094
    min training loss: 1.000074863433838
    max training loss on average: 16.713819392522176
    min training loss on average: 1.2200019482771556
    min training accuracy: 5.814006328582764
    max training accuracy: 94.16769409179688
    min testing accuracy: 41.84348678588867
    max testing accuracy: 69.8380126953125
    --------------------------------------------------
    ```

  - The training duration is 8.4758 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 16e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-02-4 training:
    --------------------------------------------------
    max mAP: 0.43429771065711975
    mean mAP: 0.3629924669861794
    max training loss: 178.13705444335938
    min training loss: 0.9736015796661377
    max training loss on average: 16.70783141930898
    min training loss on average: 1.2728607519467672
    min training accuracy: 6.477288246154785
    max training accuracy: 93.60047912597656
    min testing accuracy: 21.12708282470703
    max testing accuracy: 72.98653411865234
    --------------------------------------------------
    ```

  - The comparison between different LEARNING_RATE values under the same WEIGHT_DECAY = 1e-4
- 2023.05.01
  - The training duration is 5.7350 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 10e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-01-1 training:
    --------------------------------------------------
    max mAP: 0.43861937522888184
    mean mAP: 0.3436750084161758
    max training loss: 140.25660705566406
    min training loss: 0.8655197024345398
    max training loss on average: 18.507166700363157
    min training loss on average: 1.1550286275148391
    min training accuracy: 4.963176250457764
    max training accuracy: 94.96363067626953
    min testing accuracy: 36.68720245361328
    max testing accuracy: 68.37782287597656
    --------------------------------------------------
    ```

  - The training duration is 7.0366 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 11e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-01-2 training:
    --------------------------------------------------
    max mAP: 0.449009507894516
    mean mAP: 0.38735678046941757
    max training loss: 102.38961791992188
    min training loss: 0.9561270475387573
    max training loss on average: 17.273788038889567
    min training loss on average: 1.2045013213157654
    min training accuracy: 3.2843875885009766
    max training accuracy: 96.39083099365234
    min testing accuracy: 35.68332290649414
    max testing accuracy: 73.03217315673828
    --------------------------------------------------
    ```

  - The training duration is 7.1689 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 12e-5

    ```
    --------------------------------------------------
    The stats of 2023-05-01-3 training:
    --------------------------------------------------
    max mAP: 0.4372125566005707
    mean mAP: 0.38275537043809893
    max training loss: 125.8486099243164
    min training loss: 0.9757415056228638
    max training loss on average: 17.398162371317547
    min training loss on average: 1.2320519105593364
    min training accuracy: 1.1024198532104492
    max training accuracy: 94.62055206298828
    min testing accuracy: 34.86196517944336
    max testing accuracy: 73.3515853881836
    --------------------------------------------------
    ```
- 2023.04.30
  - The training duration is 7.0542 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 6e-5

    ```
    --------------------------------------------------
    The stats of 2023-04-29-1 training:
    --------------------------------------------------
    max mAP: 0.4267594814300537
    mean mAP: 0.3732090950012207
    max training loss: 70.09312438964844
    min training loss: 0.9483757019042969
    max training loss on average: 20.225014870961505
    min training loss on average: 1.1955717974901199
    min training accuracy: 0.8279584646224976
    max training accuracy: 95.35245513916016
    min testing accuracy: 32.3294563293457
    max testing accuracy: 73.48847961425781
    --------------------------------------------------
    ```

  - The training duration is 7.1015 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 7e-5

    ```
    --------------------------------------------------
    The stats of 2023-04-29-2 training:
    --------------------------------------------------
    max mAP: 0.4309697151184082
    mean mAP: 0.37477160841226576
    max training loss: 105.76203155517578
    min training loss: 0.8929504752159119
    max training loss on average: 20.704750878016153
    min training loss on average: 1.1069866104920705
    min training accuracy: 4.180961608886719
    max training accuracy: 96.9443359375
    min testing accuracy: 37.37166213989258
    max testing accuracy: 77.36710357666016
    --------------------------------------------------
    ```

  - The training duration is 6.7780 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 8e-5

    ```
    --------------------------------------------------
    The stats of 2023-04-30-1 training:
    --------------------------------------------------
    max mAP: 0.4340965747833252
    mean mAP: 0.36167612075805666
    max training loss: 104.89147186279297
    min training loss: 0.9307739734649658
    max training loss on average: 19.40190040588379
    min training loss on average: 1.1852473825216294
    min training accuracy: 1.6238964796066284
    max training accuracy: 95.42564392089844
    min testing accuracy: 30.458589553833008
    max testing accuracy: 71.52635192871094
    --------------------------------------------------
    ```

  - The training duration is 5.5800 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 9e-5

    ```
    --------------------------------------------------
    The stats of 2023-04-30-2 training:
    --------------------------------------------------
    max mAP: 0.43561217188835144
    mean mAP: 0.3712215393781662
    max training loss: 125.58506774902344
    min training loss: 0.9454944729804993
    max training loss on average: 18.3668585618337
    min training loss on average: 1.1890920907258988
    min training accuracy: 2.2048397064208984
    max training accuracy: 96.52349090576172
    min testing accuracy: 30.778005599975586
    max testing accuracy: 72.59867858886719
    --------------------------------------------------
    ```
- How to get a model summary in PyTorch?
  - Using torchsummary to get the result

    ```python
    import torch
    from torchsummary import summary

    # simple test settings
    IMAGE_SIZE = 416
    num_classes = 3
    num_examples = 20  # batch size
    num_channels = 3   # input image channels

    # initialize a YOLOv3 model (defined elsewhere in this repo)
    model = YOLOv3(num_classes=num_classes)

    # simple test with random inputs of 20 examples, 3 channels, and IMAGE_SIZE-by-IMAGE_SIZE input
    x = torch.randn((num_examples, num_channels, IMAGE_SIZE, IMAGE_SIZE))
    out = model(x)

    # print out the model summary using the third-party library 'torchsummary'
    summary(model.cuda(), (num_channels, IMAGE_SIZE, IMAGE_SIZE), batch_size=16)
    ```

  - model parameter summary

    ```
    ================================================================
    Total params: 61,534,504
    Trainable params: 61,534,504
    Non-trainable params: 0
    ----------------------------------------------------------------
    Input size (MB): 31.69
    Forward/backward pass size (MB): 13175.06
    Params size (MB): 234.74
    Estimated Total Size (MB): 13441.48
    ----------------------------------------------------------------
    ```

  - Reference
    - stackoverflow: How do I print the model summary in PyTorch?
    - PyTorch Doc: Is there a similar pytorch function as model.summary() in keras?
- 2023.04.29
  - The comparison between different WEIGHT_DECAY values under the same LEARNING_RATE = 3e-5
    - The loss value for every update
    - The train-object-accuracy for every epoch
    - The test-object-accuracy for every 10 epochs
    - The mAP for every 10 epochs

    ```
    2023-04-27, epoch: 100, duration: 7.1676 hours, WEIGHT_DECAY = 1e-1, LEARNING_RATE = 3e-5, max mAP: 0.3289
    2023-04-26, epoch: 100, duration: 7.7900 hours, WEIGHT_DECAY = 1e-2, LEARNING_RATE = 3e-5, max mAP: 0.3646
    2023-04-25, epoch: 100, duration: 6.2753 hours, WEIGHT_DECAY = 1e-3, LEARNING_RATE = 3e-5, max mAP: 0.3603
    2023-04-22, epoch: 100, duration: 7.2117 hours, WEIGHT_DECAY = 1e-4, LEARNING_RATE = 3e-5, max mAP: 0.3792
    ```
- 2023.04.28
  - The training duration is 7.5511 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 1e-5

    ```
    --------------------------------------------------
    The stats of 2023-04-28 training:
    --------------------------------------------------
    max mAP: 0.2697495222091675
    mean mAP: 0.18186791352927684
    max training loss: 92.10067749023438
    min training loss: 1.1181566715240479
    max training loss on average: 32.7851714070638
    min training loss on average: 1.3748071026802062
    min training accuracy: 2.0996294021606445
    max training accuracy: 92.99666595458984
    min testing accuracy: 19.9406795501709
    max testing accuracy: 64.90988159179688
    --------------------------------------------------
    ```

  - The training duration is 7.2838 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 2e-5

    ```
    --------------------------------------------------
    The stats of 2023-04-27-2 training:
    --------------------------------------------------
    max mAP: 0.3233576714992523
    mean mAP: 0.23364422097802162
    max training loss: 67.91127014160156
    min training loss: 0.9422303438186646
    max training loss on average: 27.790054613749188
    min training loss on average: 1.224434497753779
    min training accuracy: 0.37967154383659363
    max training accuracy: 95.5811767578125
    min testing accuracy: 22.19940757751465
    max testing accuracy: 69.38169860839844
    --------------------------------------------------
    ```

  - The training duration is 7.2117 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 3e-5

    ```
    --------------------------------------------------
    The stats of 2023-04-22 training:
    --------------------------------------------------
    max mAP: 0.37920689582824707
    mean mAP: 0.3020245939493179
    max training loss: 72.82600402832031
    min training loss: 0.8917444944381714
    max training loss on average: 25.31787603378296
    min training loss on average: 1.1737037108341852
    min training accuracy: 0.5489227175712585
    max training accuracy: 96.67901611328125
    min testing accuracy: 28.838693618774414
    max testing accuracy: 70.72781372070312
    --------------------------------------------------
    ```

  - The training duration is 8.1383 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 4e-5

    ```
    --------------------------------------------------
    The stats of 2023-04-28-2 training:
    --------------------------------------------------
    max mAP: 0.3963651657104492
    mean mAP: 0.3341544926166534
    max training loss: 67.77149963378906
    min training loss: 0.9209076166152954
    max training loss on average: 22.19623363494873
    min training loss on average: 1.146754193107287
    min training accuracy: 0.7410457134246826
    max training accuracy: 96.36795806884766
    min testing accuracy: 33.926536560058594
    max testing accuracy: 70.9787826538086
    --------------------------------------------------
    ```

  - The training duration is 7.1785 hours with WEIGHT_DECAY = 1e-4 and LEARNING_RATE = 5e-5

    ```
    --------------------------------------------------
    The stats of 2023-04-28-3 training:
    --------------------------------------------------
    max mAP: 0.41434526443481445
    mean mAP: 0.3443592220544815
    max training loss: 65.99470520019531
    min training loss: 0.8718012571334839
    max training loss on average: 20.811571718851724
    min training loss on average: 1.0752028383811314
    min training accuracy: 1.0612506866455078
    max training accuracy: 96.4731674194336
    min testing accuracy: 35.318275451660156
    max testing accuracy: 74.19575500488281
    --------------------------------------------------
    ```
- 2023.04.27
  - Performing a grid search to find the optimal weight decay; all tests share the same settings except for the weight decay parameter
  - The training duration is 7.1676 hours with WEIGHT_DECAY = 1e-1

    ```
    --------------------------------------------------
    The stats of 2023-04-27 training:
    --------------------------------------------------
    max mAP: 0.328988641500473
    mean mAP: 0.26167472153902055
    max training loss: 57.49835968017578
    min training loss: 2.0004570484161377
    max training loss on average: 23.69808032989502
    min training loss on average: 2.260551511446635
    min training accuracy: 1.3860299587249756
    max training accuracy: 78.02479553222656
    min testing accuracy: 25.644535064697266
    max testing accuracy: 53.70750427246094
    --------------------------------------------------
    ```

  - The training duration is 7.7900 hours with WEIGHT_DECAY = 1e-2

    ```
    --------------------------------------------------
    The stats of 2023-04-26 training:
    --------------------------------------------------
    max mAP: 0.36460867524147034
    mean mAP: 0.2820669665932655
    max training loss: 59.46959686279297
    min training loss: 1.341576099395752
    max training loss on average: 24.546935682296752
    min training loss on average: 1.5733012755711873
    min training accuracy: 0.7913635969161987
    max training accuracy: 89.63908386230469
    min testing accuracy: 29.956649780273438
    max testing accuracy: 64.81861114501953
    --------------------------------------------------
    ```

  - The training duration is 6.2753 hours with WEIGHT_DECAY = 1e-3

    ```
    --------------------------------------------------
    The stats of 2023-04-25 training:
    --------------------------------------------------
    max mAP: 0.3603482246398926
    mean mAP: 0.2835115119814873
    max training loss: 61.669921875
    min training loss: 0.9460040330886841
    max training loss on average: 23.978200359344484
    min training loss on average: 1.233974441687266
    min training accuracy: 1.289968490600586
    max training accuracy: 95.745849609375
    min testing accuracy: 23.180469512939453
    max testing accuracy: 69.15354919433594
    --------------------------------------------------
    ```

  - The training duration is 7.2117 hours with WEIGHT_DECAY = 1e-4

    ```
    --------------------------------------------------
    The stats of 2023-04-22 training:
    --------------------------------------------------
    max mAP: 0.37920689582824707
    mean mAP: 0.3020245939493179
    max training loss: 72.82600402832031
    min training loss: 0.8917444944381714
    max training loss on average: 25.31787603378296
    min training loss on average: 1.1737037108341852
    min training accuracy: 0.5489227175712585
    max training accuracy: 96.67901611328125
    min testing accuracy: 28.838693618774414
    max testing accuracy: 70.72781372070312
    --------------------------------------------------
    ```
- 2023.04.26
  - Performing a grid search to find the optimal weight decay; all tests share the same settings except for the weight decay parameter
  - The training duration is 6.2753 hours with WEIGHT_DECAY = 1e-3

    ```
    --------------------------------------------------
    The stats of 2023-04-25 training:
    --------------------------------------------------
    max mAP: 0.3603482246398926
    mean mAP: 0.2835115119814873
    max training loss: 61.669921875
    min training loss: 0.9460040330886841
    max training loss on average: 23.978200359344484
    min training loss on average: 1.233974441687266
    min training accuracy: 1.289968490600586
    max training accuracy: 95.745849609375
    min testing accuracy: 23.180469512939453
    max testing accuracy: 69.15354919433594
    --------------------------------------------------
    ```

  - The training duration is 7.7900 hours with WEIGHT_DECAY = 1e-2

    ```
    --------------------------------------------------
    The stats of 2023-04-26 training:
    --------------------------------------------------
    max mAP: 0.36460867524147034
    mean mAP: 0.2820669665932655
    max training loss: 59.46959686279297
    min training loss: 1.341576099395752
    max training loss on average: 24.546935682296752
    min training loss on average: 1.5733012755711873
    min training accuracy: 0.7913635969161987
    max training accuracy: 89.63908386230469
    min testing accuracy: 29.956649780273438
    max testing accuracy: 64.81861114501953
    --------------------------------------------------
    ```
- 2023.04.24
  - The result of training for 100 epochs, with the k_means() anchors rounded to 3 decimal places
    - The training duration: 7.2117 hours

    ```
    --------------------------------------------------
    The stats of 2023-04-22 training:
    --------------------------------------------------
    max mAP: 0.37920689582824707
    mean mAP: 0.3020245939493179
    max training loss: 72.82600402832031
    min training loss: 0.8917444944381714
    max training loss on average: 25.31787603378296
    min training loss on average: 1.1737037108341852
    min training accuracy: 0.5489227175712585
    max training accuracy: 96.67901611328125
    min testing accuracy: 28.838693618774414
    max testing accuracy: 70.72781372070312
    --------------------------------------------------
    ```

  - The result of training for 300 epochs, with the same anchors as above
    - The training duration: 20.8263 hours

    ```
    --------------------------------------------------
    The stats of 2023-04-23 training:
    --------------------------------------------------
    max mAP: 0.4179251194000244
    mean mAP: 0.3632150818904241
    max training loss: 72.01780700683594
    min training loss: 0.5801995992660522
    max training loss on average: 24.274858560562134
    min training loss on average: 0.7920041881004969
    min training accuracy: 0.45743560791015625
    max training accuracy: 99.13544464111328
    min testing accuracy: 35.75177001953125
    max testing accuracy: 72.34770965576172
    --------------------------------------------------
    ```

  - The figures for the stats
- 2023.04.23
- 2023.04.21
- 2023.04.18
- The third clustering result using custom k_means()

  ```
  Number of clusters: 9
  Average IoU: 0.6639814720619468
  Anchors original:
  (0.42412935323383083, 0.09495491293532338),
  (0.040049518201284794, 0.04793729925053533),
  (0.12121121241202815, 0.02474208253358925),
  (0.21935948581560283, 0.041091810726950354),
  (0.015625, 0.016347497459349592),
  (0.21888516435986158, 0.09671009948096886),
  (0.038657583841463415, 0.008815858422256097),
  (0.125454418344519, 0.07256711409395973),
  (0.058373810467882634, 0.018722739888977002),
  Anchors rounded to 2 decimal places:
  (0.42, 0.09), (0.04, 0.05), (0.12, 0.02),
  (0.22, 0.04), (0.02, 0.02), (0.22, 0.10),
  (0.04, 0.01), (0.13, 0.07), (0.06, 0.02),
  Anchors rounded to 3 decimal places:
  (0.424, 0.095), (0.040, 0.048), (0.121, 0.025),
  (0.219, 0.041), (0.016, 0.016), (0.219, 0.097),
  (0.039, 0.009), (0.125, 0.073), (0.058, 0.019),
  ```
- The comparison of different anchor settings
  - original anchors for general image datasets

    ```
    (0.28, 0.22), (0.38, 0.48), (0.9, 0.78),
    (0.07, 0.15), (0.15, 0.11), (0.14, 0.29),
    (0.02, 0.03), (0.04, 0.07), (0.08, 0.06)
    ```

  - sklearn.cluster.KMeans() result

    ```
    (0.211, 0.098), (0.339, 0.087), (0.495, 0.092),
    (0.158, 0.033), (0.232, 0.043), (0.125, 0.082),
    (0.033, 0.017), (0.065, 0.027), (0.107, 0.024),
    ```

  - sklearn.cluster.MiniBatchKMeans() result

    ```
    (0.329, 0.085), (0.424, 0.096), (0.530, 0.089),
    (0.157, 0.031), (0.232, 0.064), (0.164, 0.094),
    (0.027, 0.016), (0.056, 0.024), (0.105, 0.029),
    ```

  - Custom k_means() result

    ```
    (0.125, 0.073), (0.219, 0.097), (0.424, 0.095),
    (0.040, 0.048), (0.121, 0.025), (0.219, 0.041),
    (0.016, 0.016), (0.039, 0.009), (0.058, 0.019),
    ```

  - training for 1000 epochs with the original anchors

    ```
    max mAP: 0.18192845582962036 (the highest mAP obtained out of 10 tests)
    mean mAP: 0.1663009986281395 (the average mAP obtained out of 10 tests)
    max training loss: 125.03005981445312
    min training loss: 0.6005923748016357
    max training loss on average: 19.55863230228424
    min training loss on average: 0.8333272246519724
    min training accuracy: 2.8318750858306885
    max training accuracy: 98.84278869628906
    min testing accuracy: 33.172786712646484
    max testing accuracy: 70.57997131347656
    ```

  - training for 100 epochs with the sklearn.cluster.KMeans() anchors rounded to 2 decimal places

    ```
    max training loss on average: 17.887332406044006
    min training loss on average: 1.1761843407154082
    min training accuracy: 1.1478031873703003
    max training accuracy: 96.33079528808594
    min testing accuracy: 28.48825454711914
    max testing accuracy: 67.01465606689453
    max mAP: 0.1628512293100357
    mean mAP: 0.1628512293100357 (only tested once)
    ```

  - training for 100 epochs with the sklearn.cluster.KMeans() anchors rounded to 3 decimal places

    ```
    max training loss on average: 18.193040917714438
    min training loss on average: 1.2186308292547863
    min training accuracy: 4.069056510925293
    max training accuracy: 94.63731384277344
    min testing accuracy: 28.80947494506836
    max testing accuracy: 66.93435668945312
    max mAP: 0.17361223697662354
    mean mAP: 0.17361223697662354 (only tested once)
    ```
- The YOLO network seems not able to properly learn this task
- Keep improving the anchor settings
- Plot the comparison between different anchor settings
- Redesign the feature extractor structure
- Change the detection head network
- Apply certain training strategies to our task, e.g. Weight Initialization:
- Random Initialization (current method)
- Xavier Initialization, or Glorot Initialization
- Kaiming Initialization, or He Initialization
- LeCun Initialization
- Ref. Deeplizard Weight Initialization Explained
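For reference, these initialization schemes differ only in the fan terms used for the weight standard deviation. A minimal sketch (the fan values below are illustrative, not taken from this model; in PyTorch the corresponding calls are `torch.nn.init.xavier_normal_` and `torch.nn.init.kaiming_normal_`):

```python
import math

def xavier_std(fan_in, fan_out):
    # Xavier/Glorot: balances activation and gradient variance (tanh/sigmoid nets)
    return math.sqrt(2.0 / (fan_in + fan_out))

def kaiming_std(fan_in):
    # Kaiming/He: compensates for ReLU zeroing roughly half the activations
    return math.sqrt(2.0 / fan_in)

def lecun_std(fan_in):
    # LeCun: preserves unit variance of the inputs
    return math.sqrt(1.0 / fan_in)

# e.g. a 3x3 conv with 32 input channels and 64 output channels
fan_in = 32 * 3 * 3    # 288
fan_out = 64 * 3 * 3   # 576
print(xavier_std(fan_in, fan_out), kaiming_std(fan_in), lecun_std(fan_in))
```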
- Using k-fold cross-validation to ensure that there's no training data selection bias
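The k-fold idea above can be sketched without any dependency (`sklearn.model_selection.KFold` does the same job; `kfold_indices` is a hypothetical helper, and the fold count is illustrative):

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs splitting range(n_samples) into k folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, val_idx

# every frame index lands in exactly one validation fold
splits = list(kfold_indices(7193, 5))  # 7193 frames, as counted above
```

Shuffling the indices before splitting (with a fixed seed) would also remove any ordering bias in the dataset.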
- 2023.04.17
- The code for handcrafted-from-scratch version of
k_means()
which considers IoU in its distance metric
- The first clustering result using
sklearn.cluster.KMeans()
```
Estimator: KMeans(n_clusters=9, verbose=True)
Number of Clusters: 9
Average IoU: 0.6268763251152744
Inertia: 4.175114625246291
Silhouette Score: 0.4465142389008657
Date and Duration: 2023-04-13 / 0.0951 seconds
Anchors:
1: (0.03258875446251471852, 0.01661357100357002681)  5.414155861808978
2: (0.06474560301507539806, 0.02702967964824120467)  17.50052908129688
3: (0.10668965880370681609, 0.02383240311710192738)  25.426709570360032
4: (0.15826612903225806273, 0.03252153592375366803)  51.47057600836014
5: (0.23229679802955666146, 0.04291102216748768350)  99.68093049682716
6: (0.12471330275229357276, 0.08154147553516821745)  101.69306725286172
7: (0.21058315334773208827, 0.09842400107991366998)  207.26436512508812
8: (0.33944144518272417743, 0.08742992109634553644)  296.77338769155074
9: (0.49540441176470573215, 0.09187346813725494332)  455.1452143932022
Anchors original:
(0.03258875446251472, 0.016613571003570027), (0.0647456030150754, 0.027029679648241205), (0.10668965880370682, 0.023832403117101927),
(0.15826612903225806, 0.03252153592375367), (0.23229679802955666, 0.042911022167487683), (0.12471330275229357, 0.08154147553516822),
(0.2105831533477321, 0.09842400107991367), (0.3394414451827242, 0.08742992109634554), (0.49540441176470573, 0.09187346813725494),
Anchors rounded to 2 decimal places:
(0.03, 0.02), (0.06, 0.03), (0.11, 0.02), (0.16, 0.03), (0.23, 0.04), (0.12, 0.08), (0.21, 0.10), (0.34, 0.09), (0.50, 0.09),
Anchors rounded to 3 decimal places:
(0.033, 0.017), (0.065, 0.027), (0.107, 0.024), (0.158, 0.033), (0.232, 0.043), (0.125, 0.082), (0.211, 0.098), (0.339, 0.087), (0.495, 0.092),
```
- The second clustering result using
```
Estimator: MiniBatchKMeans(n_clusters=9, tol=0.0001, verbose=True)
Number of Clusters: 9
Average IoU: 0.6075905487924542
Inertia: 4.375712040766109
Silhouette Score: 0.41462042329969084
Date and Duration: 2023-04-13 / 0.0423 seconds
Anchors:
1: (0.02677950180907319802, 0.01550867137489563008)  4.153144931403392
2: (0.05614595190665907370, 0.02351197887023335348)  13.201024348785062
3: (0.10527306967984934039, 0.02908427495291902171)  30.61790903706541
4: (0.15678998161764706731, 0.03086224724264705413)  48.388911778539104
5: (0.23159116755117511999, 0.06435983699772555855)  149.0516979370658
6: (0.16395052370452040114, 0.09384044239250277641)  153.85189674914707
7: (0.32857417864476384795, 0.08490278490759754770)  278.9686281566692
8: (0.42449951171874988898, 0.09640502929687500000)  409.23887863755215
9: (0.53048469387755103899, 0.08938137755102043558)  474.1545270850689
Anchors original:
(0.026779501809073198, 0.01550867137489563), (0.056145951906659074, 0.023511978870233353), (0.10527306967984934, 0.02908427495291902),
(0.15678998161764707, 0.030862247242647054), (0.23159116755117512, 0.06435983699772556), (0.1639505237045204, 0.09384044239250278),
(0.32857417864476385, 0.08490278490759755), (0.4244995117187499, 0.096405029296875), (0.530484693877551, 0.08938137755102044),
Anchors rounded to 2 decimal places:
(0.03, 0.02), (0.06, 0.02), (0.11, 0.03), (0.16, 0.03), (0.23, 0.06), (0.16, 0.09), (0.33, 0.08), (0.42, 0.10), (0.53, 0.09),
Anchors rounded to 3 decimal places:
(0.027, 0.016), (0.056, 0.024), (0.105, 0.029), (0.157, 0.031), (0.232, 0.064), (0.164, 0.094), (0.329, 0.085), (0.424, 0.096), (0.530, 0.089),
```
- The original anchor for general image dataset
```python
ANCHORS = [
    [(0.28, 0.22), (0.38, 0.48), (0.9, 0.78)],
    [(0.07, 0.15), (0.15, 0.11), (0.14, 0.29)],
    [(0.02, 0.03), (0.04, 0.07), (0.08, 0.06)],
]  # Note these have been rescaled to be between [0, 1]
```
- 2023.04.13
- stackoverflow Custom Python list sorting
```python
from functools import cmp_to_key

cmp_key = cmp_to_key(cmp_function)
mylist.sort(key=cmp_key)
```
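`cmp_function` is not defined in the snippet above; a hypothetical comparator, e.g. one that sorts (w, h) anchor pairs by area, would look like:

```python
from functools import cmp_to_key

def cmp_by_area(a, b):
    # negative if a sorts first, positive if b sorts first, 0 if equal
    area_a, area_b = a[0] * a[1], b[0] * b[1]
    return (area_a > area_b) - (area_a < area_b)

anchors = [(0.21, 0.10), (0.03, 0.02), (0.06, 0.03)]
anchors.sort(key=cmp_to_key(cmp_by_area))
print(anchors)  # smallest-area anchor first
```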
get_anchors2.py
- Finishing the part where I use
sklearn.cluster.KMeans()
andsklearn.cluster.MiniBatchKMeans()
for clustering
- The custom-designed / handcrafted-from-scratch version of
k_means()
is also finished, but it hasn't been well-tested yet
- 2023.04.10
- Need to recompute / regenerate anchors for YOLO Training YOLO? Select Anchor Boxes Like This
- for YOLOv2
AlexeyAB/darknet/scripts/
gen_anchors.py
- The anchor boxes were calculated with a k-means clustering algorithm only
- With
1 - IoU
as a distance metric
- Doing k-means clustering alone is already a good approach
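The "1 - IoU as a distance metric" idea can be sketched in a few lines of numpy, in the spirit of darknet's gen_anchors.py (toy data below, not the real CARRADA boxes; since only widths and heights are clustered, IoU is computed as if all boxes shared a corner):

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (N, 2) box sizes and (K, 2) anchor sizes, ignoring positions."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = (boxes[:, None, 0] * boxes[:, None, 1]
             + anchors[None, :, 0] * anchors[None, :, 1] - inter)
    return inter / union

def kmeans_iou(boxes, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each box to the anchor with the smallest 1 - IoU distance
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = boxes[assign == j].mean(axis=0)
    return anchors[np.argsort(anchors.prod(axis=1))]  # sorted by area

boxes = np.array([[0.03, 0.02], [0.05, 0.03], [0.20, 0.10],
                  [0.25, 0.12], [0.50, 0.09], [0.45, 0.10]])
anchors = kmeans_iou(boxes, k=3)
```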
- for YOLOv5 / YOLOv7
ultralytics/yolov5/utils/
autoanchor.py
- ultralytics YOLOv5 Docs Train Custom Data
- Auto-anchor algorithm
- Step 0. K-means (with simple Euclidean distance) is used to get the initial guess for anchor boxes
  - We can also do it with 1 - IoU as a distance metric
- Step 1. Get bounding box sizes from the train data
- Step 2. Choose a metric to define anchor fitness
  - Ideally, the metric should be connected to the loss function
- Step 3. Do clustering to get an initial guess for anchors
- Step 4. Evolve anchors to improve anchor fitness
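Step 4 can be sketched as a random-mutation loop over an anchor-fitness score. The fitness below mimics the width/height-ratio metric of ultralytics' autoanchor in spirit only; the exact metric and hyperparameters there differ, so treat this as an assumption-laden sketch:

```python
import numpy as np

def anchor_fitness(anchors, wh, thr=4.0):
    # worst-case side ratio between each box and each anchor
    r = wh[:, None, :] / anchors[None, :, :]
    x = np.minimum(r, 1.0 / r).min(axis=2)    # (N, K)
    best = x.max(axis=1)                      # best anchor per box
    return float((best * (best > 1.0 / thr)).mean())

def evolve_anchors(anchors, wh, iters=300, seed=0):
    rng = np.random.default_rng(seed)
    best, best_fit = anchors.copy(), anchor_fitness(anchors, wh)
    for _ in range(iters):
        mutated = best * rng.normal(1.0, 0.05, best.shape).clip(0.5, 1.5)
        fit = anchor_fitness(mutated, wh)
        if fit > best_fit:  # keep a mutation only if it improves fitness
            best, best_fit = mutated, fit
    return best

# toy (w, h) data; real usage would feed the training-set box sizes
wh = np.abs(np.random.default_rng(1).normal(0.2, 0.1, (100, 2))) + 1e-3
init = np.array([[0.05, 0.05], [0.2, 0.2], [0.4, 0.1]])
evolved = evolve_anchors(init, wh)
```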
- Things I'm Googling but haven't finished reading
- Faster RCNN with PyTorch
- PyTorch Docs TORCHVISION OBJECT DETECTION FINETUNING TUTORIAL
- PyTorch Docs MODELS AND PRE-TRAINED WEIGHTS
- PyTorch Source Code
fasterrcnn_resnet50_fpn()
- Zhihu article: FasterRCNN analysis of the official PyTorch FasterRCNN code
- Faster RCNN reproduction
- Kaggle object detection Aquarium Dataset
- Kaggle Pytorch Starter - FasterRCNN Train
- github search for faster-r-cnn
- Kmeans implementation
- scikit-learn Clustering with kmeans
- scikit-learn Clustering performance evaluation
- scikit-learn
sklearn.cluster.KMeans()
- Tech-with-Tim Implementing K Means Clustering
- Sentdex K-Means from Scratch in Python
- Faster RCNN with PyTorch
- 2023.04.09
- Tested positive for COVID; got nothing done over the past 10 days
- 2023.03.28
- The plan is to train the model to a point where we're satisfied with its performance; then we can make the edge-computing modifications to it
- Quick recap:
- The DAROD paper proposes a light architecture for the
Faster R-CNN
object detector on this particular task
- They reach an
mAP@0.5
andmAP@0.3
of55.83
and70.68
- So our goal is to at least get a better mAP than they did
- The DAROD paper propose a light architecture for the
- The current
mAP@50
(for every100
epochs) andmean loss
(for every epoch), for a total of300
epochs of training:

```
max training loss (on average): 20.516442289352415
min training loss (on average): 1.0732185713450113
```
- To further analyze where the problems are, I first extracted some of the data that I think might be helpful
- The file tree structure:
```
D:/Datasets/RADA/RD_JPG/training_logs>tree
D:.
├─mAP
├─test
│  ├─class_accuracy
│  ├─no_object_accuracy
│  └─object_accuracy
└─train
   ├─class_accuracy
   ├─losses
   ├─mean_loss
   ├─no_object_accuracy
   └─object_accuracy
```
- Some other results
- train-class-accuracy vs. test-class-accuracy
- train-no-object-accuracy vs. test-no-object-accuracy
- train-object-accuracy vs. test-object-accuracy
```
min training accuracy: 2.3661680221557617
max training accuracy: 94.16690826416016
min testing accuracy: 46.69877624511719
max testing accuracy: 72.34597778320312
```
- The layers of the model
```
layer 0:  torch.Size([20, 32, 416, 416])
layer 1:  torch.Size([20, 64, 208, 208])
layer 2:  torch.Size([20, 64, 208, 208])
layer 3:  torch.Size([20, 128, 104, 104])
layer 4:  torch.Size([20, 128, 104, 104])
layer 5:  torch.Size([20, 256, 52, 52])
layer 6:  torch.Size([20, 256, 52, 52])
layer 7:  torch.Size([20, 512, 26, 26])
layer 8:  torch.Size([20, 512, 26, 26])
layer 9:  torch.Size([20, 1024, 13, 13])
layer 10: torch.Size([20, 1024, 13, 13])
layer 11: torch.Size([20, 512, 13, 13])
layer 12: torch.Size([20, 1024, 13, 13])
layer 13: torch.Size([20, 1024, 13, 13])
layer 14: torch.Size([20, 512, 13, 13])
layer 16: torch.Size([20, 256, 13, 13])
layer 17: torch.Size([20, 256, 26, 26])
layer 18: torch.Size([20, 256, 26, 26])
layer 19: torch.Size([20, 512, 26, 26])
layer 20: torch.Size([20, 512, 26, 26])
layer 21: torch.Size([20, 256, 26, 26])
layer 23: torch.Size([20, 128, 26, 26])
layer 24: torch.Size([20, 128, 52, 52])
layer 25: torch.Size([20, 128, 52, 52])
layer 26: torch.Size([20, 256, 52, 52])
layer 27: torch.Size([20, 256, 52, 52])
layer 28: torch.Size([20, 128, 52, 52])
```
```python
config = [
    (32, 3, 1),    # (32, 3, 1) is the CBL, CBL = Conv + BN + LeakyReLU
    (64, 3, 2),
    ["B", 1],      # (64, 3, 2) + ["B", 1] is the Res1, Res1 = ZeroPadding + CBL + (CBL + CBL + Add)*1
    (128, 3, 2),
    ["B", 2],      # (128, 3, 2) + ["B", 2] is the Res2, Res2 = ZeroPadding + CBL + (CBL + CBL + Add)*2
    (256, 3, 2),
    ["B", 8],      # (256, 3, 2) + ["B", 8] is the Res8, Res8 = ZeroPadding + CBL + (CBL + CBL + Add)*8
    (512, 3, 2),
    ["B", 8],      # (512, 3, 2) + ["B", 8] is the Res8, Res8 = ZeroPadding + CBL + (CBL + CBL + Add)*8
    (1024, 3, 2),
    ["B", 4],      # (1024, 3, 2) + ["B", 4] is the Res4, Res4 = ZeroPadding + CBL + (CBL + CBL + Add)*4
    # to this point is Darknet-53 which has 52 layers
    (512, 1, 1),
    (1024, 3, 1),
    "S",
    (256, 1, 1),
    "U",
    (256, 1, 1),
    (512, 3, 1),
    "S",
    (128, 1, 1),
    "U",
    (128, 1, 1),
    (256, 3, 1),
    "S",
]
```
- 2023.03.19
- The actual size of each input image is:
- The resizing results are completely different; we could even conclude that they are wrong (though I don't know why). Since we might not need to resize the images anyway, I am ignoring this issue for now
- Some samples of person, cyclist and car:
- I first tried to run
train.py
for100
epochs with the following config settings:
- The resulting
mAP
is0.182485
- 2023.03.16
- It's finally trainable now
- The major mistake I made was misinterpreting the labels, even though I had actually translated them correctly.
- In short, simply switching the
x
andy
coordinates will solve our problems
- This makes me wonder: how did I get it right when replicating
YOLO-CFAR
before? - Since the shape of the feature map is printed as
torch.Size([256, 64, 3])
, it shows the same coordinate system as theRD map
where the origin(0, 0)
is located at the top left corner - But it turns out that's not the case. The model still recognizes the bottom left corner as the origin, which is the same as we usually do.
- In short, simply switching the
- The correct way to translate the labels
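Assuming the usual YOLO label line format (class, x, y, w, h), the coordinate switch described above amounts to:

```python
def swap_xy(label):
    """Swap the x/y (and w/h) axes of one YOLO label tuple: (class, x, y, w, h)."""
    cls, x, y, w, h = label
    return (cls, y, x, h, w)

# a label whose axes were interpreted in the wrong order
print(swap_xy((2, 0.3, 0.6, 0.1, 0.2)))  # (2, 0.6, 0.3, 0.2, 0.1)
```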
- 2023.03.15
- Still not actually trainable
ValueError: Expected x_min for bbox (-0.103515625, 0.306640625, 0.224609375, 0.365234375, 2.0) to be in the range [0.0, 1.0], got -0.103515625.
- The issue stems from my erroneous translation of the labels
- The way we figured this out was by feeding the model labels that were correctly formatted but actually wrong, so we could distinguish whether the issue lay in the content of the labels or in my code implementation
- What I mean by wrong labels is that I used the previously well-tested synthetic radar dataset labels for training
- It is trainable with correct but actually wrong labels
- When testing
PASCAL_VOC
dataset, I actually used padding for the input images, but I forgot that the padding existed. So we can now confirm that my code can only take square inputs
- Remove useless transforms of
YOLOv3-VOC
- we need
LongestMaxSize()
andPadIfNeeded()
to avoidRuntimeError: Trying to resize storage that is not resizable
- we need
Normalize()
to avoidRuntimeError: Input type (torch.cuda.ByteTensor) and weight type (torch.cuda.HalfTensor) should be the same
- we need
ToTensorV2()
to avoidRuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[10, 416, 416, 3] to have 3 channels, but got 416 channels instead
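The two transforms that mattered here can be mimicked with plain numpy, which also shows why skipping ToTensorV2() triggers the "expected input ... to have 3 channels" error: the network expects channels-first (C, H, W) input. A rough stand-in sketch, not the albumentations implementation:

```python
import numpy as np

def pad_to_square(img):
    """Zero-pad an (H, W, C) image so H == W, roughly what PadIfNeeded() is used for."""
    h, w, c = img.shape
    side = max(h, w)
    out = np.zeros((side, side, c), dtype=img.dtype)
    out[:h, :w] = img
    return out

def to_chw(img):
    """(H, W, C) -> (C, H, W), the layout change ToTensorV2() performs."""
    return np.transpose(img, (2, 0, 1))

rd_map = np.ones((256, 64, 3), dtype=np.uint8)  # a 256x64 RD map with 3 channels
x = to_chw(pad_to_square(rd_map))
print(x.shape)  # (3, 256, 256)
```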
- 2023.03.14
- Ref. Albumentations Documentation Full API Reference
- Remove useless transforms of
YOLOv3-VOC
- we need
LongestMaxSize()
andPadIfNeeded()
to avoidRuntimeError: Trying to resize storage that is not resizable
- we need
Normalize()
to avoidRuntimeError: Input type (torch.cuda.ByteTensor) and weight type (torch.cuda.HalfTensor) should be the same
- we need
ToTensorV2()
to avoidRuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[10, 416, 416, 3] to have 3 channels, but got 416 channels instead
- The execution results and error messages of the same code differ between my PC and the lab PC, which is weird and annoying
- 2023.03.10
- Still untrainable
- First, I prepared
3
sizes of square images: 64-by-64, 256-by-256, and 416-by-416
- The way I tested it was by simply changing the input images in the previously successful version of the code, without changing anything else, and seeing how it went
- Even though I resized all the images to a square size, the exact same error persists. Specifically:
RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[16, 64, 64, 3] to have 3 channels, but got 64 channels instead
RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[16, 256, 256, 3] to have 3 channels, but got 256 channels instead
RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[16, 416, 416, 3] to have 3 channels, but got 416 channels instead
- It still doesn't work, but every piece of code is the same, so I speculate that maybe it's because the images are not actually encoded in the
'JPEG'
format.
- So I re-read the dataset, saved the
.mat
files out, and converted the.mat
files into scaled color and grayscale
- Plotting 7193 frames of the CARRADA Dataset in scaled color using MATLAB link
- Then I used the scaled color images to train, still getting errors, but at least now we have a different error message.
ValueError: Expected x_min for bbox (-0.103515625, 0.306640625, 0.224609375, 0.365234375, 2.0) to be in the range [0.0, 1.0], got -0.103515625.
- 2023.03.09
- 2023.03.04
- New lead: the image file format may be the issue
- Regenerate all data in .jpg
- 2023.02.21
- Modified from YOLO-CFAR
```
(pt3.8) D:\Datasets\YOLOv3-PyTorch\YOLOv3-debug1>D:/ProgramData/Anaconda3/envs/pt3.8/python.exe d:/Datasets/YOLOv3-PyTorch/YOLOv3-debug1/train.py
  0%|          | 0/375 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "d:/Datasets/YOLOv3-PyTorch/YOLOv3-debug1/train.py", line 166, in <module>
    main()
  File "d:/Datasets/YOLOv3-PyTorch/YOLOv3-debug1/train.py", line 107, in main
    train_fn(train_loader, model, optimizer, loss_fn, scaler, scaled_anchors)
  File "d:/Datasets/YOLOv3-PyTorch/YOLOv3-debug1/train.py", line 57, in train_fn
    out = model(x)
  File "D:\ProgramData\Anaconda3\envs\pt3.8\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\Datasets\YOLOv3-PyTorch\YOLOv3-debug1\model.py", line 191, in forward
    x = layer(x)
  File "D:\ProgramData\Anaconda3\envs\pt3.8\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\Datasets\YOLOv3-PyTorch\YOLOv3-debug1\model.py", line 110, in forward
    return self.leaky(self.bn(self.conv(x)))  # bn_act()
  File "D:\ProgramData\Anaconda3\envs\pt3.8\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\ProgramData\Anaconda3\envs\pt3.8\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "D:\ProgramData\Anaconda3\envs\pt3.8\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[16, 256, 64, 3] to have 3 channels, but got 256 channels instead
```
- Modified from YOLO-Pascal_VOC
```
(pt3.8) D:\Datasets\YOLOv3-PyTorch\YOLOv3-debug2>D:/ProgramData/Anaconda3/envs/pt3.8/python.exe d:/Datasets/YOLOv3-PyTorch/YOLOv3-debug2/train.py
  0%|          | 0/5999 [00:00<?, ?it/s]
x:  torch.Size([1, 3, 256, 64])
y0: torch.Size([1, 3, 2, 2, 6])
y1: torch.Size([1, 3, 2, 2, 6])
y2: torch.Size([1, 3, 2, 2, 6])
  0%|          | 0/5999 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "d:/Datasets/YOLOv3-PyTorch/YOLOv3-debug2/train.py", line 144, in <module>
    main()
  File "d:/Datasets/YOLOv3-PyTorch/YOLOv3-debug2/train.py", line 91, in main
    train_fn(train_loader, model, optimizer, loss_fn, scaler, scaled_anchors)
  File "d:/Datasets/YOLOv3-PyTorch/YOLOv3-debug2/train.py", line 47, in train_fn
    loss_fn(out[0], y0, scaled_anchors[0])
  File "D:\ProgramData\Anaconda3\envs\pt3.8\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\Datasets\YOLOv3-PyTorch\YOLOv3-debug2\loss.py", line 83, in forward
    no_object_loss = self.bce((predictions[..., 0:1][noobj]), (target[..., 0:1][noobj]),)
IndexError: The shape of the mask [1, 3, 2, 2] at index 2 does not match the shape of the indexed tensor [1, 3, 8, 2, 1] at index 2
```
- 2023.02.20
- 2023.02.18
- The virtual envs are summarized below:
- My PC
(Intel i7-8700 + Nvidia Geforce RTX 2060)
:- env
pt3.7
with CUDA

```
python==3.7.13
numpy==1.19.2
pytorch==1.7.1
torchaudio==0.7.2
torchvision==0.8.2
pandas==1.2.1
pillow==8.1.0
tqdm==4.56.0
matplotlib==3.3.4
albumentations==0.5.2
```
- Lab PC
(Intel i7-12700 + Nvidia Geforce RTX 3060 Ti)
:- env
pt3.7
without CUDA

```
python==3.7.13
numpy==1.21.6
torch==1.13.1
torchvision==0.14.1
pandas==1.3.5
pillow==9.4.0
tqdm==4.64.1
matplotlib==3.5.3
albumentations==1.3.0
```
- env
pt3.8
with CUDA

```
python==3.8.16
numpy==1.23.5
pytorch==1.13.1
pytorch-cuda==11.7
torchaudio==0.13.1
torchvision==0.14.1
pandas==1.5.2
pillow==9.3.0
tqdm==4.64.1
matplotlib==3.6.2
albumentations==1.3.0
```
- An annoying bug in
dataset.py
due to PyTorch version differences
- The code segment that contains the potential bug (on lines
149
and155
) scale_idx = anchor_idx // self.num_anchors_per_scale
works fine on my PC, but on the lab PC it produces the following warning, so I naturally followed the suggestion and changed the syntax to (torch.div()
)UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch.
- After following the suggestion and changing the deprecated usage
//
we have:scale_idx = torch.div(anchor_idx, self.num_anchors_per_scale, rounding_mode='floor')
. This piece of code works fine on lab PC, under both envpt3.7
andpt3.8
, but failed on my PC. - The error only occur on my PC, under env
pt3.7
, but this env is the initial and stable one.

```
Original Traceback (most recent call last):
  File "C:\Users\paulc\.conda\envs\pt3.7\lib\site-packages\torch\utils\data\_utils\worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\paulc\.conda\envs\pt3.7\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\paulc\.conda\envs\pt3.7\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "d:\Datasets\YOLOv3-PyTorch\dataset.py", line 153, in __getitem__
    scale_idx = torch.div(anchor_idx, self.num_anchors_per_scale, rounding_mode='floor')
TypeError: div() got an unexpected keyword argument 'rounding_mode'
```
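One way to make this run on both environments is a small compatibility shim (floordiv_compat is a hypothetical helper; rounding_mode was added to torch.div in PyTorch 1.8, which is why the env with torch 1.7.1 rejects it):

```python
def floordiv_compat(a, b):
    """Floor-divide that works across PyTorch versions (and without torch)."""
    try:
        import torch
        t = torch.as_tensor(a)
        try:
            return torch.div(t, b, rounding_mode='floor')  # PyTorch >= 1.8
        except TypeError:
            return t // b  # older PyTorch: deprecated syntax, but functional
    except ImportError:
        return a // b  # plain Python fallback

print(int(floordiv_compat(7, 3)))  # 2
```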
- 2023.02.10
- Trying newer stable PyTorch and CUDA version for the project
- Python 3.8 + CUDA 11.7
conda create --name pt3.8 python=3.8
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
(Install PyTorch)
- Interesting to know!
- The new dependency is:
```
numpy==1.23.5
matplotlib==3.6.2
pytorch==1.13.1
pytorch-cuda==11.7
torchaudio==0.13.1
torchvision==0.14.1
tqdm==4.64.1
albumentations==1.3.0
pandas==1.5.2
pillow==9.3.0
```
- 2023.02.08
- The
YOLOv3
model is trainable withPascal_VOC
dataset- But it's bind with
Albumentations
/ data augmentations, which means we need to decouple it
- We know that pre-training is good for our task, at least that's what the paper says, so I was trying to solve this issue
- C. Decourt, R. VanRullen, D. Salle and T. Oberlin, "DAROD: A Deep Automotive Radar Object Detector on Range-Doppler maps," 2022 IEEE Intelligent Vehicles Symposium (IV), Aachen, Germany, 2022, pp. 112-118.
- Originally, I wanted to convert the pre-trained weights from darknet format to PyTorch format, but it did not work
- Add two additional functions
load_CNN_weights()
andload_darknet_weights()
inmodel.py
to read the darknet weights
- fun fact, there are in total
62001757
parameters of YOLOv3
- At least, in the future, we can separate our training process if needed
- we can "save checkpoint" for every epoch or every 10, 20 epochs
- but the correctness of doing so is uncertain; what I mean is that, say we have already trained for 100 epochs and achieved a certain level of performance, if we stop and then continue training for another 100 epochs, the performance may drop
- remember to test it with
seed_everything()
and make sure it works
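A sketch of what a typical seed_everything() looks like, with the torch calls guarded so the snippet also runs without PyTorch installed; whether this fully removes nondeterminism (e.g. cuDNN kernels) is exactly what the test above should verify:

```python
import os
import random

import numpy as np

def seed_everything(seed=42):
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass

seed_everything(0)
first = (random.random(), np.random.rand())
seed_everything(0)
second = (random.random(), np.random.rand())
print(first == second)  # True: reseeding reproduces the same draws
```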
- Need to find a newer dependency
- Currently run without CUDA support since there will be PyTorch 2.0 updates soon
- Deprecation of CUDA 11.6 and Python 3.7 Support
- Please note that as of Feb 1, CUDA 11.6 and Python 3.7 are no longer included in the Stable CUDA
- There is a new paper that says their model can learn spatial and temporal relationships between frames by leveraging the characteristics of the FMCW radar signal
- Decourt, Colin, et al. "A recurrent CNN for online object detection on raw radar frames." arXiv preprint arXiv:2212.11172 (2022).
- The comparison between different generations showed that, though newer versions of the model may be more complex, they are not necessarily bigger
- YOLOv3
222
layers,62001757
parameters - YOLOv4
488
layers,64363101
parameters- YOLOv4-CSP
516
layers,52921437
parameters
- YOLOv4-CSP
- YOLOv7
314
layers,36907898
parameters
- YOLOv3
- Future works
- Make sure we can properly run
train.py
with radar dataset - Find a proper way to measure the "communication overhead"
- Test the functionality of
seed_everything()
, check if it works like the way we think - Find a newer stable PyTorch and CUDA version for the project
- Make sure we can properly run
- 2023.02.07
- The code
detect.py
and model_with_weights2.py work fine, but the result may not be what we expected
works fine, but the result may not be the way as we expected - Need to figure out the usability of the converted weights, since there is a huge difference between random weights and the converted weights, maybe it's not complete garbage
- 2023.02.06
- On lab PC, create a new env
pt3.7
through commandconda create --name pt3.7 python=3.7
- to use the env
conda activate pt3.7
- to leave the env
conda deactivate
- the actual env and pkgs are located at
C:\Users\Paul\.conda\envs\pt3.7
, don't know why it is not been stored inD Drive
- to use the env
- Upgrade default conda env
base
through commandconda update -n base -c defaults conda
- It has to be done under
(base) C:\Windows\system32>
- It has to be done under
- Install all the packages through
pip install -r requirements.txt
- content in the requirements file
```
numpy>=1.19.2
matplotlib>=3.3.4
torch>=1.7.1
tqdm>=4.56.0
torchvision>=0.8.2
albumentations>=0.5.2
pandas>=1.2.1
Pillow>=8.1.0
```
- cmd output stored as
D:/Datasets/YOLOv3-PyTorch/logs/installation_logs_0206.txt
- actual dependency, the new requirement is:
```
numpy==1.21.6
matplotlib==3.5.3
torch==1.13.1
tqdm==4.64.1
torchvision==0.14.1
albumentations==1.3.0
pandas==1.3.5
Pillow==9.4.0
```
- Currently run without CUDA support since there will be PyTorch 2.0 updates soon
- Deprecation of CUDA 11.6 and Python 3.7 Support
- Please note that as of Feb 1, CUDA 11.6 and Python 3.7 are no longer included
- Run
model_with_weights2.py
again on the lab PC to generate the weights in PyTorch format
- Wanted to test the training ability using
PASCAL_VOC
dataset
- But first, we have to test the converted weights to check if they actually work
- to do so, maybe we could write a program
detect.py
and test the weights with some inference samples
- if it can predict perfectly, then we may assume it is converted correctly
- Okay, it does not work..., the inference outputs are a bunch of random tags
- 2023.02.05
- first download the YOLOv3 weights from https://pjreddie.com/media/files/yolov3.weights as
yolov3.weights
and put it in the same directory
- then run
model_with_weights2.py
, it will save the weights in PyTorch format. We name the output weights checkpoint-2023-02-05.pth.tar
also in the same directory
- inside the directory
- I overrode most of the files with my previous ones, except for
model_with_weights2.py
- The implementation is based on the following paper
- Redmon, Joseph, and Ali Farhadi. "Yolov3: An incremental improvement." arXiv preprint arXiv:1804.02767 (2018).
```
@article{yolov3,
  title   = {YOLOv3: An Incremental Improvement},
  author  = {Redmon, Joseph and Farhadi, Ali},
  journal = {arXiv},
  year    = {2018}
}
```
- The original code was copied from YOLOv3-PyTorch and for more details please read their Medium post YOLOv3 — Implementation with Training setup from Scratch