
[RFC] YOLO v5 #48

Open
oke-aditya opened this issue Nov 20, 2020 · 30 comments
Labels: feature A new feature request · Medium Priority Should be addressed in few days · Model to Implement This model should be implemented

Comments

@oke-aditya
Owner

🚀 Feature

Implement YOLO v5 from torch.hub.
This library removes the dataset abstraction and aims to provide a clean, modular interface to models.

Some key points to note:

  1. YOLO achieves SOTA using Mosaic transforms, as @zhiqwang mentioned.
  2. We don't need to define the model/weights/backbones; we can load everything using torch hub (see the sketch below).
  3. We should provide an easy-to-use model definition, a simple train_step, val_step, fit API, and a lightning trainer. Datasets, augmentations and transforms are not needed.

Note that none of the quickvision models can achieve SOTA, the limitations being torchvision's implementations and the absence of custom transforms/datasets. But they are faster, easier, and more flexible to train, which is something torchvision does too.

With this context, we can start adding YOLO v5.
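
As a rough sketch of the torch.hub approach (assuming the ultralytics hubconf still exposes the yolov5s entrypoint, as it did at the time of writing):

```python
import torch

# Fetches the ultralytics/yolov5 repo on first call and loads pretrained
# weights for the small variant; no dataset or transform abstraction needed.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
```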

Dependencies:

  1. Avoid opencv-python at all costs. Opencv is not, like PIL, a library just for image reading; it is huge and has a lot of subdependencies. Keeping the library light will let us use PyTorch Docker containers and infer directly with torchserve.

Evaluation mode:

We don't need methods such as .fuse() or .eval(); we only need to load the model from torch.hub.
Currently we do not have inference scripts for any models, but surely we will in the future (#2). So for now let's focus on training.

@oke-aditya oke-aditya added Medium Priority Should be addressed in few days Model to Implement This model should be implemented feature A new feature request labels Nov 20, 2020
@oke-aditya oke-aditya added this to To do in Release 0.1.1 via automation Nov 20, 2020
@zhiqwang
Contributor

zhiqwang commented Nov 20, 2020

Hi @oke-aditya, I recently refactored ultralytics's yolov5 following the philosophy of torchvision here. The inference part is done and now looks similar to torchvision's retinanet. I plan to use quickvision's API (train_step, val_step, fit, etc.) to implement the training part.

@zhiqwang
Contributor

If we rely entirely on torch.hub to bring ultralytics's model into quickvision, it's difficult to obtain the intermediate features.

@oke-aditya
Owner Author

True, but it significantly reduces the codebase we have to maintain. I would love a custom implementation though; it would need some testing, weights, etc.

@oke-aditya
Owner Author

Just saw your implementation. That's great. Really nice and so complete.

@zhiqwang
Contributor

zhiqwang commented Nov 20, 2020

And the difference between the model weights of my implementation and ultralytics's is just some key names; you can check the weight-conversion script here.

@oke-aditya
Owner Author

oke-aditya commented Nov 20, 2020

That's absolutely fine!
Just one small doubt: will using GeneralizedRCNNTransform deteriorate the results?
P.S. I'm no expert in these models or in implementing them from scratch.

@zhiqwang
Contributor

zhiqwang commented Nov 20, 2020

Just one small doubt: will using GeneralizedRCNNTransform deteriorate the results?

True, ultralytics uses letterbox for this, and it relies on opencv here; we should refactor that part using something in torch so the results stay comparable to ultralytics's.

And this is the only difference between ultralytics's implementation and mine.
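
A rough torch-only sketch of what a letterbox replacement could look like (not ultralytics's exact implementation; assumes a float CHW image in [0, 1]):

```python
import torch
import torch.nn.functional as F

def letterbox(img: torch.Tensor, size: int = 640, fill: float = 114 / 255):
    # Scale the longer side to `size`, keeping the aspect ratio.
    _, h, w = img.shape
    scale = size / max(h, w)
    nh, nw = round(h * scale), round(w * scale)
    img = F.interpolate(img[None], size=(nh, nw), mode="bilinear",
                        align_corners=False)[0]
    # Pad symmetrically to a square: (left, right, top, bottom).
    pad_h, pad_w = size - nh, size - nw
    img = F.pad(img, (pad_w // 2, pad_w - pad_w // 2,
                      pad_h // 2, pad_h - pad_h // 2), value=fill)
    return img, scale
```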

@oke-aditya
Owner Author

oke-aditya commented Nov 20, 2020

I will try once locally using torch.hub without the opencv dependency and try to produce a training script which we can use.

There are a few problems I foresee with using hub, but I want to verify them first. One is the opencv hard bindings; another might be that inference is tied to producing a custom BBox class, which I really don't want in this repo.

@oke-aditya
Owner Author

GeneralizedRCNNTransform actually worked fine when torchvision adopted RetinaNet; there seemed to be no issue at the time. I'm just not sure how flexible it is.

@zhiqwang
Contributor

I will try once locally using torch.hub without the opencv dependency and try to produce a training script which we can use.

Hoping to see your experiment results here.

@oke-aditya
Owner Author

Sure, I will share a Colab so that we can dig into their implementation.
This will also let us compare our own implementation later!

@oke-aditya
Owner Author

Just a doubt. FRCNN and RetinaNet are not differentiable in eval mode.
E.g.

model = FRCNN()
model.eval()
opt = Adam(model.parameters(), lr=1e-3)  # params first, then lr
opt.step()

will not work, because no losses are returned in eval mode.

But it will work for Detr, which is differentiable in both train and eval mode.

I would like to know your thoughts for YOLO.
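
For reference, a minimal runnable sketch of the torchvision behaviour described above (any of the detection models works; fasterrcnn_resnet50_fpn here):

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(pretrained=False, num_classes=2)
images = [torch.rand(3, 300, 400)]
targets = [{"boxes": torch.tensor([[10.0, 10.0, 100.0, 100.0]]),
            "labels": torch.tensor([1])}]

model.train()
loss_dict = model(images, targets)   # losses are computed only in train mode
sum(loss_dict.values()).backward()   # fine: the graph exists

model.eval()
with torch.no_grad():
    detections = model(images)       # [{'boxes': ..., 'labels': ..., 'scores': ...}]
# no losses are returned in eval mode, so there is nothing to backpropagate
```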

@zhiqwang
Contributor

zhiqwang commented Nov 21, 2020

In eval mode, frcnn and retinanet contain the PostProcess module, which shouldn't be learnt; in other words, there are no parameters to learn in PostProcess.

But detr splits PostProcess out here, so its eval output is the same as in train mode, except for things like BatchNorm behaviour.

For yolov5 we can choose either way; no rule of thumb, I think?
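
To illustrate the detr split (loading via torch.hub; pretrained weights download on first call):

```python
import torch

# detr returns raw outputs in train and eval mode alike; PostProcess is a
# separate module, so the eval output stays differentiable.
model = torch.hub.load("facebookresearch/detr", "detr_resnet50", pretrained=True)
model.eval()
outputs = model(torch.rand(1, 3, 800, 800))
print(outputs["pred_logits"].shape)  # raw class logits, still on the graph
print(outputs["pred_boxes"].shape)   # normalized cxcywh boxes
```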

@oke-aditya
Owner Author

oke-aditya commented Nov 21, 2020

No thumb rule here.
For Detr, I separated out the PostProcess as I think it is optional and best left to the user.
That's what Detr too did in their paper and demo.

Best to leave it out: keeping the same interface in train and eval mode offers flexibility to users.
E.g. for CNNs and Detr we have almost identical outputs and loss calculations in val mode.

The idea here is that training and validation are really almost the same thing, with small changes, while inference is where we don't care about loss and do PostProcess etc.
Inference is something one would do in real time, just visualizing the outputs.

Sadly we can't do this for FRCNN and RetinaNet since, as you mentioned, the PostProcess is baked in.

Let me know your thoughts!

@zhiqwang
Contributor

Here is a discussion of the frcnn output in torchvision: pytorch/vision#1775

@oke-aditya
Owner Author

oke-aditya commented Nov 21, 2020

Returning both the loss_dict and the detections is fine too, but torchvision models currently don't do that.
It's actually a good idea, as users can gradually track how the detections improve with respect to the losses.

Sometimes, though, the dataset is huge and the detections are simply numerous, which can make saving them a memory burden.

I think users who want detections should simply run inference rather than a train_step or val_step.

@zhiqwang
Contributor

zhiqwang commented Nov 21, 2020

Yep, the mechanism in detr is more flexible, but it affects the recovery of the image shape in the PostProcess procedure. The fix is minor; I will test it in my yolov5-rt-stack repo.

@oke-aditya
Owner Author

oke-aditya commented Nov 21, 2020

One advantage of doing PostProcess is that it lets us convert from YOLO format to Pascal VOC format, from which we can compute IoU and other metrics via torchvision.

I think if we too do PostProcess internally and are able to compute these metrics, it would be nice.

Detr's PostProcess returns

{'scores': s, 'labels': l, 'boxes': b}

If we stick with this output format for all models in both train_step and val_step, it would be a standard.
Plus it allows us to compute metrics, which we return in both places, train_step and val_step.

Sadly we can't do this for torchvision models, as the PostProcess is coupled with the model definition. Hence the train_step differs: we only get the loss_dict, not the detection outputs.

So we can return

{"loss1": avg_loss, "loss2": avg_loss, ..., "iou": avg_iou, "giou": avg_giou}

as a standard for train_step and val_step.

What I would propose is that we do this PostProcess in train_step and val_step, but not attach it to the model.
The user then gets the best of both: with train_step or val_step everything is handled by us, and a user who wants very specific post-processing can simply instantiate the model and continue on their own.
This is what Detr did, and I think it was a very good call by them.

Thoughts?
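
A hypothetical sketch of that proposal; `postprocess` is a placeholder for whatever converts raw outputs to the {'scores', 'labels', 'boxes'} format, and the losses are elided:

```python
import torch
from torchvision.ops import box_iou

def val_step(model, images, targets, postprocess):
    outputs = model(images)
    detections = postprocess(outputs)  # [{'scores': s, 'labels': l, 'boxes': b}, ...]
    ious = []
    for det, tgt in zip(detections, targets):
        if len(det["boxes"]) and len(tgt["boxes"]):
            # best-matching IoU for every ground-truth box, averaged
            ious.append(box_iou(det["boxes"], tgt["boxes"]).max(dim=0).values.mean())
    avg_iou = torch.stack(ious).mean() if ious else torch.tensor(0.0)
    # a real step would merge the averaged losses into this dict as well
    return {"iou": avg_iou}
```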

@zhiqwang
Contributor

zhiqwang commented Nov 21, 2020

Just a doubt here: compared to torchvision using ImageList, detr adopts NestedTensor. The difference between the two is relatively small; ImageList stores image_sizes at initialization while NestedTensor does not. If we separate out the PostProcess, should we use NestedTensor instead?

@oke-aditya
Owner Author

I think the NestedTensor utils can be re-used from our utils.
ImageList is really torchvision-specific, while NestedTensor is probably going to be better supported and is a slightly better representation.
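
For reference, a simplified version of detr's NestedTensor (the real one also carries a .to() method):

```python
import torch

class NestedTensor:
    """Padded batch plus a mask marking the padded pixels (True = padding).

    torchvision's ImageList instead stores the original image_sizes.
    """
    def __init__(self, tensors: torch.Tensor, mask: torch.Tensor):
        self.tensors = tensors  # [B, C, H, W]
        self.mask = mask        # [B, H, W]

    def decompose(self):
        return self.tensors, self.mask
```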

@zhiqwang
Contributor

zhiqwang commented Nov 23, 2020

Hi @oke-aditya, here is a slightly abstract model graph visualization comparing yolov5 to retinanet; their structure is almost the same now:

[yolov5 model graph]

vs

[retinanet model graph]

Check my notebooks for more details.

@oke-aditya
Owner Author

I was a bit away this weekend; I will definitely have a look.

@oke-aditya
Owner Author

oke-aditya commented Nov 27, 2020

Just a quick update @zhiqwang.
After release 0.1 we will continue on this (from next week). I am super eager to have your YOLO v5 here.
It will definitely be there in 0.2, no doubt 😄

@zhiqwang
Contributor

@oke-aditya Sure, and we can follow the detr approach from release 0.1.

@zhiqwang
Contributor

zhiqwang commented Nov 29, 2020

Hi @oke-aditya, it seems that 0.1 will be released soon. I'm currently working on making yolov5-rt-stack support training.

@oke-aditya
Owner Author

Yes, the release is today 😄 We will come back to this.

@zhiqwang
Contributor

zhiqwang commented Dec 7, 2020

Hi @oke-aditya, there are some bugs in the loss computation in my yolov5 implementation (zhiqwang/yolort#16). Once these are fixed, it should be loadable like DETR or retinanet.

@oke-aditya
Owner Author

Very nice. Until then I will build the layers API, which will make porting YOLO easier.

@oke-aditya oke-aditya added this to To do in Release 0.2.0 Dec 11, 2020
@oke-aditya oke-aditya moved this from To do to Done in Release 0.1.1 Dec 11, 2020
@oke-aditya oke-aditya moved this from Done to Stale in Release 0.1.1 Dec 11, 2020
@zhiqwang
Contributor

zhiqwang commented Jan 7, 2021

Hi @oke-aditya, yolov5 can now be used for training; check out zhiqwang/yolort#25 for the details. There are still some hidden bugs in the master branch; I will add more unit tests soon.

@oke-aditya
Owner Author

Superb. I have been really busy this month; I will get back to work at the end of the month.
