Trouble training custom dataset #169
Comments
How many classes do you have in your custom dataset? If you have N classes, then you should set NUM_CLASSES: N+1 in your yaml config file. For example, for six classes you should set NUM_CLASSES: 7. For 80 classes COCO you should set it to 81. |
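A minimal sketch of that N+1 rule, assuming a COCO-style annotation file with a top-level "categories" list. The helper name `num_classes_for_config` and the class names in the toy data are made up for illustration and are not part of Detectron.

```python
import json


def num_classes_for_config(annotation_json):
    """Return the value to put in NUM_CLASSES: foreground categories + 1 background."""
    categories = annotation_json["categories"]
    # Deduplicate by id in case a category was accidentally listed twice.
    foreground_ids = {cat["id"] for cat in categories}
    return len(foreground_ids) + 1


# Toy annotation content with 4 foreground classes (hypothetical names).
toy = json.loads("""
{"categories": [
  {"id": 1, "name": "class_a"},
  {"id": 2, "name": "class_b"},
  {"id": 3, "name": "class_c"},
  {"id": 4, "name": "class_d"}
]}
""")

print(num_classes_for_config(toy))
```

For a real dataset, replace the toy dictionary with `json.load(open(path))` on your own annotation file and compare the result with the value in your yaml config.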
Thank you 👍 . I have 4 classes so I should set NUM_CLASSES to 5. The error (from what I understood in lib/roi_data/fast_rcnn.py) comes from the fact that _expand_bbox_targets creates an array whose size is defined by the NUM_CLASSES parameter, but when this array is filled in the for loop, it takes the first element of each box as the class index, and the error happens when this class index is greater than the NUM_CLASSES parameter. It is weird that I can get a class index greater than NUM_CLASSES. For the record, here are the lines of code I am talking about (in lib/roi_data/fast_rcnn.py): l.251 l.256 ll.260-270
Error occurs when cls is greater than cfg.MODEL.NUM_CLASSES |
@francoto I have a question: how did you convert your dataset to COCO format? |
@raninbowlalala Hope it will help you |
@francoto Thanks for your help, I converted my dataset to coco format successfully. |
I finally made it:
|
Hi francoto, |
Hello @YanWang2014, |
I'm sorry but I'm still struggling with training on a different number of classes. I have 2 classes in my annotation file so I set the number of classes in my config file to 3. I added some lines in the net.py to prevent the class related layers from loading (after this line):
That way Detectron should not load the weights from these layers and leave them in the dimensions as configured in the .yaml file. I'm happy for any help. |
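The idea of skipping class-dependent layers can be sketched independently of Detectron: keep a pre-trained blob only when its shape matches what the new configuration expects. This is only an illustration with plain NumPy dictionaries. The function `filter_by_shape`, the blob names, and the shapes below are assumptions chosen for the example, not the code that was added to net.py.

```python
import numpy as np


def filter_by_shape(pretrained, expected_shapes):
    """Split pre-trained blobs into those that fit the new model and those that do not.

    pretrained: dict mapping blob name -> np.ndarray
    expected_shapes: dict mapping blob name -> shape expected by the new model
    Blobs without an entry in expected_shapes are kept unchanged.
    """
    kept, skipped = {}, []
    for name, array in pretrained.items():
        if name in expected_shapes and array.shape != tuple(expected_shapes[name]):
            skipped.append(name)  # shape depends on the number of classes
        else:
            kept[name] = array
    return kept, sorted(skipped)


# Illustrative blobs: an 81-class head versus a 3-class configuration.
pretrained = {
    "conv1_w": np.zeros((64, 3, 7, 7), dtype=np.float32),
    "cls_score_w": np.zeros((81, 1024), dtype=np.float32),
    "bbox_pred_w": np.zeros((81 * 4, 1024), dtype=np.float32),
}
expected = {
    "conv1_w": (64, 3, 7, 7),
    "cls_score_w": (3, 1024),
    "bbox_pred_w": (3 * 4, 1024),
}

kept, skipped = filter_by_shape(pretrained, expected)
print(sorted(kept), skipped)
```

Skipped blobs would then have to be initialized from scratch by the training code, which is the part that depends on Detectron itself.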
Hello @mattifrind ! I would say that you should not 'force' detectron to forget about weights. |
@francoto are you using inference to show your PDF results? I was initially doing that, and infer_simple.py uses a dummy dataset, dummy_coco_dataset = dummy_datasets.get_coco_dataset() ..., with the COCO dataset labels. Also, when you get your bounding boxes, do they make sense? I get decent masks, but the bounding boxes are not around these masks. |
Hey @francoto! Thanks for your help. |
Hey @GabriellaP,
to test:
I can't share my results publicly, but my bounding box locations and masks are quite good (I obviously have some errors, but considering my dataset is only ~350 images, I think it's pretty amazing). As I said, though, I still have the COCO dataset labels. I need to check the infer_simple.py file. |
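One way to get custom labels at inference time is to build a dummy dataset with your own class names instead of the COCO ones. The sketch below assumes that the visualization code only reads a `classes` mapping from index to name on the dataset object, which should be checked against your Detectron version. `get_custom_dummy_dataset` and the class names are hypothetical, and `SimpleNamespace` is only a stand-in for whatever container Detectron uses.

```python
from types import SimpleNamespace


def get_custom_dummy_dataset(class_names):
    """Build a minimal dataset object exposing only a 'classes' index -> name mapping."""
    # Index 0 is reserved for the background class.
    classes = ["__background__"] + list(class_names)
    return SimpleNamespace(classes={i: name for i, name in enumerate(classes)})


# Hypothetical class names for a 4-class dataset.
ds = get_custom_dummy_dataset(["class_a", "class_b", "class_c", "class_d"])
print(len(ds.classes), ds.classes[0], ds.classes[4])
```

The mapping has N+1 entries, matching the NUM_CLASSES value used for training.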
Hey @mattifrind, from what I remember, the error
Have you tried to train without changing the code for the weights ? Hope that may help you out, |
Hey, @francoto thanks for your help!
Didn't you have this problem too when you changed the number of classes and used a pre-trained model? With my change, I get the broadcast error. My dataset has no background class and my 2 categories have the indices 1 and 2 (I also tried 0 and 1 with the same effect). |
Hello @mattifrind, I haven't seen this kind of error so I can't really help you on this. |
@mattifrind and @francoto I got that error because I tried with a pre-trained model with 81 classes, so to fix this I just use the ImageNet pretrained model in MODEL_ZOO |
Did you solve the problem? I encountered the same problem. @mattifrind Thanks in advance. |
@ZSSNIKE because I need to get my task done I stopped trying to fix that. It works for me with 81 classes as a workaround. Good luck! |
@mattifrind how do you set 81 classes? I mean, only changing NUM_CLASSES to 81 is not enough, right? Do you also need to convert the annotations to contain 81 categories? |
@chenweisomebody126 yes, the pre-trained models from Detectron have 81 classes and so do the configuration files (.yaml). I wrote a Java program to convert my dataset to the COCO format. After the conversion, the program deletes 2 classes of the original COCO dataset and adds my two. That's how I train. |
@francoto I am getting exactly the same error as yours.
You have written the steps but I can't understand them clearly. Can you please elaborate on the steps you took, i.e.: Thanks in advance |
Hello @vsd550, it has been a while since I posted this and I haven't used Detectron since I got my first results, but I will try to explain.
I hope this makes my steps clear (or clearer) for you. |
I hit error 1. After checking my data, I found that my number of classes was right (150), so in the yaml file the number of classes was set to 151, but I still got error 1. Finally I found that one of the 150 class names was wrong: I had added a ' ' to it. Sorry, my English is poor! |
Hi! I got the same problem as you when I trained on my custom dataset. The box AP is ~0.6, while the mask AP is ~0.5. Did you find the cause of this? Looking forward to your reply! |
I got the same problem. Did you fix it? How? Would you please tell me? Thanks. |
Hello @maiff, (I'm still using an old version from January 2019) Good luck :) |
@francoto Thank you very much, I have solved it |
@francoto which cloud service did you use, or do you have a GPU on your personal computer? |
@vaibhavkumar049 I use my local GPU, which is a GeForce GTX 1080. |
Training Detectron on custom dataset
I'm trying to train Mask R-CNN on my custom dataset to perform a segmentation task on new classes that COCO and ImageNet have never seen.
The config file I used is based on configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml . My dataset contains only 4 classes without background so I set NUM_CLASSES to 5 ( 4 does not work either). When I try to train using the command below:
python2 tools/train_net.py --cfg configs/encov/copy_maskrcnn_R-101-FPN.yaml OUTPUT_DIR /tmp/detectron-output/
ERROR 1:
I get the following error (complete log file is here output.txt)
At:
  /home/encov/Softwares/Detectron/lib/roi_data/fast_rcnn.py(269): _expand_bbox_targets
  /home/encov/Softwares/Detectron/lib/roi_data/fast_rcnn.py(181): _sample_rois
  /home/encov/Softwares/Detectron/lib/roi_data/fast_rcnn.py(112): add_fast_rcnn_blobs
  /home/encov/Softwares/Detectron/lib/ops/collect_and_distribute_fpn_rpn_proposals.py(62): forward
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
  what(): [enforce fail at pybind_state.h:423] . Exception encountered running PythonOp function: ValueError: could not broadcast input array from shape (4) into shape (0)
This error comes from the box-expansion procedure, which expands the bounding box targets and weights to 4 values per class (see roi_data/fast_rcnn.py). It takes the first element of each row, which represents the class, checks that it is not 0 (the background), and copies the values to the columns starting at class index x 4. The error happens because the class index falls outside the range allowed by the NUM_CLASSES parameter, which was used to create the output array.
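The failure can be reproduced outside Detectron with a few lines of NumPy. The function below is a simplified stand-in written from the description above, not a copy of lib/roi_data/fast_rcnn.py; the input rows follow the assumed layout [class, 4 target values].

```python
import numpy as np


def expand_bbox_targets(bbox_target_data, num_classes):
    """Spread each row's 4 target values into the 4 columns reserved for its class."""
    clss = bbox_target_data[:, 0]
    bbox_targets = np.zeros((clss.shape[0], 4 * num_classes), dtype=np.float32)
    bbox_inside_weights = np.zeros_like(bbox_targets)
    # Only foreground rows (class > 0) receive targets.
    for ind in np.where(clss > 0)[0]:
        cls = int(clss[ind])
        start, end = 4 * cls, 4 * cls + 4
        # If cls >= num_classes, this slice is empty and the assignment fails.
        bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
        bbox_inside_weights[ind, start:end] = 1.0
    return bbox_targets, bbox_inside_weights


rois = np.array([[2, 0.1, 0.2, 0.3, 0.4],
                 [5, 0.1, 0.2, 0.3, 0.4]], dtype=np.float32)

# Class 2 fits into an array built for 5 classes (indices 0..4).
expand_bbox_targets(rois[:1], num_classes=5)

# Class 5 does not: the target slice [20:24] lies past the 20 available columns.
try:
    expand_bbox_targets(rois, num_classes=5)
except ValueError as err:
    print("ValueError:", err)
```

In this sketch the error already appears when the class index equals NUM_CLASSES, because valid indices run from 0 to NUM_CLASSES - 1.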
ERROR 2
I tried the same training, except that I set NUM_CLASSES to 81, which is the number of classes used for COCO training (and COCO training works on my set-up, by the way).
The error I described above does not appear, but in the very first iterations some bounding box areas are zero, which causes divisions by zero.
output2.txt
Has anyone experienced the same issue when training Fast R-CNN or Mask R-CNN on a custom dataset?
I really suspect an error in my COCO-like json file because training on the COCO dataset is working correctly.
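A quick sanity check of a COCO-like json file can rule out the two symptoms above: category ids that are not declared in "categories", and boxes with zero width or height. This is a minimal sketch assuming the standard COCO layout, where "bbox" is [x, y, width, height]. The function name `check_coco_annotations` and the toy data are made up for illustration.

```python
def check_coco_annotations(data):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    known_ids = {cat["id"] for cat in data.get("categories", [])}
    for ann in data.get("annotations", []):
        ann_id = ann.get("id")
        # Every annotation must point at a declared category.
        if ann.get("category_id") not in known_ids:
            problems.append(
                "annotation %s: unknown category_id %r" % (ann_id, ann.get("category_id"))
            )
        # COCO boxes are [x, y, width, height]; zero or negative size means zero area.
        x, y, w, h = ann.get("bbox", (0, 0, 0, 0))
        if w <= 0 or h <= 0:
            problems.append("annotation %s: degenerate bbox %r" % (ann_id, [x, y, w, h]))
    return problems


# Toy data: one valid annotation, one with an undeclared class, one with zero width.
toy = {
    "categories": [{"id": 1, "name": "class_a"}, {"id": 2, "name": "class_b"}],
    "annotations": [
        {"id": 10, "category_id": 1, "bbox": [5, 5, 20, 30]},
        {"id": 11, "category_id": 7, "bbox": [5, 5, 20, 30]},
        {"id": 12, "category_id": 2, "bbox": [5, 5, 0, 30]},
    ],
}

for problem in check_coco_annotations(toy):
    print(problem)
```

An empty result does not prove the file is correct, but any reported line is worth fixing before training.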
Thank you for your help,
System information
python --version
output: Python 2.7.12