
saeed5959/object-detection-transformer


An End-to-End Object Detection with Vision Transformation

Official PyTorch implementation of "Place of Attention Matters!". The article is available here.

This work was inspired by the Vision Transformer (ViT) and DETR.


GOOD NEWS!
The pretrained model, trained for 56 epochs, is available on Drive.

Model Results


Preprocess

1- Download the COCO dataset ("annotations" and "train2017"), unzip it, and put it in the dataset folder.
2- Run python3 preprocess.py.
3- This creates dataset_file_out.txt.
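The README does not document the format of dataset_file_out.txt, so the following is only a hypothetical sketch of what a COCO preprocessing step of this kind typically does: it reads the standard COCO instances annotations and groups every box by image, producing one text line per image. The helper names (load_coco, coco_to_lines) and the line format "file_name x,y,w,h,category_id ..." are assumptions for illustration, not what preprocess.py actually writes.

```python
import json
from collections import defaultdict


def load_coco(path):
    """Load a COCO instances annotation file (e.g. instances_train2017.json)."""
    with open(path) as f:
        return json.load(f)


def coco_to_lines(coco):
    """Hypothetical: turn a parsed COCO instances dict into one line per image.

    Each line is "<file_name> x,y,w,h,category_id ..." (an ASSUMED format,
    not necessarily the one preprocess.py produces).
    """
    # Map image id -> file name, using the standard COCO "images" list.
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}

    # Group every annotation's box and category under its image id.
    boxes = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO boxes are [x_min, y_min, width, height] in pixels
        boxes[ann["image_id"]].append(f"{x:g},{y:g},{w:g},{h:g},{ann['category_id']}")

    # Images without annotations are skipped; ids are sorted for a stable order.
    return [" ".join([file_names[i]] + boxes[i]) for i in sorted(boxes)]


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real annotation file.
    sample = {
        "images": [{"id": 1, "file_name": "000000000001.jpg"}],
        "annotations": [{"image_id": 1, "bbox": [10.0, 20.0, 30.5, 40.0], "category_id": 18}],
    }
    for line in coco_to_lines(sample):
        print(line)  # prints "000000000001.jpg 10,20,30.5,40,18"
```

With a real download, the input would come from load_coco on the extracted annotations file; the exact path depends on where the archive was unzipped.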

Train or Fine-tune

python3 train.py --train_file_path ./dataset/dataset_file_out.txt --model_path ./m.pth --pretrained ./x_56.pth

Inference

python3 inference.py --img_path ./dataset/train2017/000000580197.jpg --model_path ./x_56.pth --out_path ./out.jpg
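The README does not say how inference.py represents predicted boxes before drawing them into out.jpg. If, as in DETR (one of this work's inspirations), boxes are predicted as normalized (center_x, center_y, width, height) values in [0, 1], they must be converted to pixel corner coordinates before drawing. A minimal sketch under that assumption; cxcywh_to_xyxy is a hypothetical helper, not part of the repository.

```python
def cxcywh_to_xyxy(box, img_w, img_h):
    """Hypothetical helper: normalized (cx, cy, w, h) -> pixel (x0, y0, x1, y1).

    Assumes DETR-style boxes with every value in [0, 1]; this is NOT
    confirmed to be the output format of inference.py.
    """
    cx, cy, w, h = box
    x0 = (cx - w / 2) * img_w
    y0 = (cy - h / 2) * img_h
    x1 = (cx + w / 2) * img_w
    y1 = (cy + h / 2) * img_h
    # Clamp to the image bounds so the rectangle can be drawn safely.
    x0, x1 = max(0.0, x0), min(float(img_w), x1)
    y0, y1 = max(0.0, y0), min(float(img_h), y1)
    return x0, y0, x1, y1


if __name__ == "__main__":
    # A box centred in a 640x480 image, covering half its width and height.
    print(cxcywh_to_xyxy((0.5, 0.5, 0.5, 0.5), 640, 480))  # prints (160.0, 120.0, 480.0, 360.0)
```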

Model Architecture



Place of Attention Matters!


An End-to-End Object Detection with Vision Transformation!

In object detection, the goal is to find the class of each object and a bounding box around it. Most previous works focus on classifying objects without properly considering bounding-box features. We present a new method that uses the relationships between image patches as features for the bounding-box detector. We also combine a convolutional neural network, acting as a local feature detector, with a Transformer network, acting as a long-distance feature detector. This design was inspired by the way Transformers model relationships between patches in an image. Our implementation runs in real time and improves on the accuracy of previous works.
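To make the idea of "relationships between patches as features for the box detector" concrete, here is a toy, shape-level sketch in NumPy. It is not the paper's architecture: the CNN branch is replaced by a random linear projection of each patch, all weights are random rather than learned, and using each patch's row of the self-attention matrix as input to a box head is an illustrative assumption. The function names (local_features, attention_map, box_head) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def local_features(image, patch=8, dim=16):
    """Stand-in for the CNN (local) branch: one embedding per non-overlapping patch.

    A real model would use learned convolutions; here each patch is flattened
    and projected with RANDOM weights, purely to illustrate shapes.
    """
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    patches = (
        image[: gh * patch, : gw * patch]
        .reshape(gh, patch, gw, patch, c)
        .transpose(0, 2, 1, 3, 4)  # -> (grid_row, grid_col, patch_row, patch_col, channel)
        .reshape(gh * gw, patch * patch * c)
    )
    proj = rng.standard_normal((patches.shape[1], dim)) / np.sqrt(patches.shape[1])
    return patches @ proj  # (num_patches, dim)


def attention_map(x):
    """Scaled dot-product self-attention weights between all pairs of patches."""
    d = x.shape[1]
    wq = rng.standard_normal((d, d)) / np.sqrt(d)
    wk = rng.standard_normal((d, d)) / np.sqrt(d)
    scores = (x @ wq) @ (x @ wk).T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)  # each row sums to 1


def box_head(attn):
    """Hypothetical box head: each patch's attention row (its relation to every
    other patch) is the feature; a linear layer + sigmoid gives 4 values in (0, 1)."""
    n = attn.shape[1]
    w = rng.standard_normal((n, 4)) / np.sqrt(n)
    return 1.0 / (1.0 + np.exp(-(attn @ w)))


image = rng.random((64, 64, 3))
feats = local_features(image)  # (64, 16): an 8x8 grid of patch embeddings
attn = attention_map(feats)    # (64, 64): patch-to-patch relationships
boxes = box_head(attn)         # (64, 4): one normalized box guess per patch
print(feats.shape, attn.shape, boxes.shape)  # prints (64, 16) (64, 64) (64, 4)
```

NumPy is used here only to keep the sketch dependency-light; in the PyTorch codebase the same roles would be played by learned modules trained end to end.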



Sample output



Loss