Use YoloR with swin transformer as backbone. #71

Closed
farazBhatti opened this issue Sep 1, 2022 · 10 comments

Comments

@farazBhatti

farazBhatti commented Sep 1, 2022

@leondgarse I am trying to run inference using YOLOR with a Swin backbone, but I am getting the following results. What could be the issue?

from keras_cv_attention_models import yolor, swin_transformer_v2

bb = swin_transformer_v2.SwinTransformerV2Small_window16(input_shape=(256, 256, 3), num_classes=1000)
model = yolor.YOLOR(backbone=bb)

from keras_cv_attention_models import test_images
imm = test_images.dog_cat()
preds = model(model.preprocess_input(imm))
bboxs, lables, confidences = model.decode_predictions(preds)[0]

from keras_cv_attention_models.coco import data
data.show_image_with_bboxes(imm, bboxs, lables, confidences)

Resulting output:
[image: download]

@leondgarse
Owner

I think this is the same issue as #70.
You have to train it first: the parts of this combination use their own separately pre-trained weights, so we cannot expect a good result.
I may try training it myself, but I have had very little spare time recently...

@farazBhatti
Author

@leondgarse, which training script should I use to train the above-mentioned combination?

farazBhatti changed the title from "Use YoloR with swin trqansformer as backbone." to "Use YoloR with swin transformer as backbone." on Sep 2, 2022
@leondgarse
Owner

leondgarse commented Sep 3, 2022

This is a command I've just tested. Detailed usage of coco_train_script.py is explained in COCO training and evaluating.

CUDA_VISIBLE_DEVICES='0' ./coco_train_script.py --backbone swin_transformer_v2.SwinTransformerV2Small_window16 \
--det_header yolor.YOLOR --anchors_mode yolor -s yolor_swin

Here is a test result after running only 9 epochs:

from keras_cv_attention_models import yolor, swin_transformer_v2, test_images

bb = swin_transformer_v2.SwinTransformerV2Small_window16(input_shape=(256, 256, 3), pretrained=None, num_classes=0)
model = yolor.YOLOR(backbone=bb, input_shape=(256, 256, 3), rescale_mode='torch')  # Default rescale_mode from coco_train_script.py is "torch"
model.load_weights('checkpoints/yolor_swin_latest.h5')  # Load the trained weights

# Detect
imm = test_images.dog_cat()
preds = model(model.preprocess_input(imm))
bboxs, lables, confidences = model.decode_predictions(preds)[0]

# Show
from keras_cv_attention_models.coco import data
data.show_image_with_bboxes(imm, bboxs, lables, confidences)

[image: yolor_swin]
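The `rescale_mode='torch'` argument above matters because the model should see inputs normalized the same way at inference as during training. The sketch below illustrates the difference between two common conventions on a single pixel. It assumes that "torch" means the usual ImageNet mean/std normalization used by PyTorch-style pipelines; the function names are hypothetical and this is not the library's own preprocessing code.

```python
# ImageNet channel statistics commonly used by PyTorch-style preprocessing.
# Assumption: this is what rescale_mode="torch" refers to.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def torch_style_rescale(pixel_rgb):
    """Map one RGB pixel in [0, 255] to (value / 255 - mean) / std per channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(pixel_rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


def tf_style_rescale(pixel_rgb):
    """Map one RGB pixel in [0, 255] to the range [-1, 1], another common convention."""
    return tuple(value / 127.5 - 1.0 for value in pixel_rgb)


# The same white pixel ends up with different values under the two conventions,
# so weights trained with one convention see shifted inputs under the other.
white = (255, 255, 255)
print("torch-style:", torch_style_rescale(white))
print("tf-style:   ", tf_style_rescale(white))
```

If detections look wrong after loading trained weights, a mismatch between the training and inference rescale mode is one thing worth checking.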

@leondgarse
Owner

Uh, right, pull the latest code first, as I made a small modification for a Swin float16 / float32 issue.

@farazBhatti
Author

farazBhatti commented Sep 3, 2022

OK, sure. I am working on it now. Would it be possible for you to share your trained model?

@leondgarse
Owner

If you really want, this yolor_swin.h5 is a model trained for 10 epochs.

@farazBhatti
Author

@leondgarse, thanks very much. I am currently training this model using Colab; I'll share it once it's done. :)

@hamzakhalil798

@leondgarse

Hey, thank you so much for sharing your work and helping to resolve the queries. I want to train the model you provided above, but I am severely lacking in resources. It would be very generous of you to train the model a little more whenever feasible and share it if possible.
Regards

@leondgarse
Owner

Ya, my training actually finished earlier, using input_shape 512. But it only reaches test AP 0.4204, which is not very satisfying, as the total FLOPs is 63.25G...

CUDA_VISIBLE_DEVICES='0' ./coco_train_script.py \
--backbone swin_transformer_v2.SwinTransformerV2Small_window16 --det_header yolor.YOLOR \
--anchors_mode yolor -i 512 -b 32 -p adamw
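When training from a notebook (for example on Colab), it can be convenient to assemble this command in Python. The following is a minimal sketch: `build_train_command` is a hypothetical helper, the flags are copied verbatim from the command above, and nothing is executed unless the commented `subprocess.run` line is enabled.

```python
import shlex


def build_train_command(backbone, det_header, anchors_mode, extra_args=(),
                        script="./coco_train_script.py"):
    """Assemble the argv list for the training script; nothing is executed here."""
    argv = [script, "--backbone", backbone, "--det_header", det_header,
            "--anchors_mode", anchors_mode]
    # Remaining flags are passed through unchanged, converted to strings.
    argv.extend(str(arg) for arg in extra_args)
    return argv


argv = build_train_command(
    backbone="swin_transformer_v2.SwinTransformerV2Small_window16",
    det_header="yolor.YOLOR",
    anchors_mode="yolor",
    extra_args=["-i", 512, "-b", 32, "-p", "adamw"],  # copied from the command above
)
command_line = shlex.join(argv)
print(command_line)

# To actually launch training (requires the repository checkout and a GPU):
# import os, subprocess
# subprocess.run(argv, env={**os.environ, "CUDA_VISIBLE_DEVICES": "0"}, check=True)
```

Passing an argv list instead of a single shell string avoids quoting problems if a path or name ever contains spaces.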

The model weights are uploaded: YOLOR_SwinTransformerV2Small_window16_512_epoch_89_val_ap_ar_0.4204.h5. Basic usage is the same as the previous one, just with input_shape=512.

from keras_cv_attention_models import yolor, swin_transformer_v2, test_images

bb = swin_transformer_v2.SwinTransformerV2Small_window16(input_shape=(512, 512, 3), pretrained=None, num_classes=0)
model = yolor.YOLOR(backbone=bb, input_shape=(512, 512, 3), rescale_mode='torch')  # Default rescale_mode from coco_train_script.py is "torch"
model.load_weights('YOLOR_SwinTransformerV2Small_window16_512_epoch_89_val_ap_ar_0.4204.h5')  # Load the trained weights

# Detect
imm = test_images.dog_cat()
preds = model(model.preprocess_input(imm))
bboxs, lables, confidences = model.decode_predictions(preds)[0]

# Show
from keras_cv_attention_models.coco import data
data.show_image_with_bboxes(imm, bboxs, lables, confidences)

[image: yolor_swin_512]

Eval

CUDA_VISIBLE_DEVICES='1' ./coco_eval_script.py \
-m checkpoints/YOLOR_SwinTransformerV2Small_window16_512_epoch_89_val_ap_ar_0.4204.h5 \
--nms_method hard --nms_iou_or_sigma 0.65 --nms_max_output_size 300 --nms_topk -1
# Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.426
# Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.618
# Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.456
# Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.216
# Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.473
# Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.608
# Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.336
# Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.526
# Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.562
# Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.316
# Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.628
# Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.778
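For readers unfamiliar with the NMS flags in the eval command, the sketch below shows what "hard" non-maximum suppression with an IoU threshold of 0.65 and a cap of 300 outputs generally means. It is a plain-Python illustration with hypothetical function names, not the implementation used by coco_eval_script.py, and it ignores class labels, score thresholds, and the `--nms_topk` option.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def hard_nms(boxes, scores, iou_threshold=0.65, max_output_size=300):
    """Return indices of kept boxes, highest score first.

    A box is dropped outright ("hard") when it overlaps an already kept,
    higher-scoring box by more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for idx in order:
        if len(keep) >= max_output_size:
            break
        if all(iou(boxes[idx], boxes[kept]) <= iou_threshold for kept in keep):
            keep.append(idx)
    return keep


# Two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = hard_nms(boxes, scores)
print(kept)
```

Here the first two boxes overlap with IoU 81 / 119, roughly 0.68, so the lower-scoring one is suppressed at threshold 0.65 but would survive at 0.7.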

@hamzakhalil798

@leondgarse
Thank you so much!!!
Means a lot.
You've made my day.
