
Regarding running CutLER on Custom dataset #16

Closed
hetavv opened this issue Mar 18, 2023 · 2 comments

hetavv commented Mar 18, 2023

Hi folks, excellent read and amazing work! I've been trying to run CutLER on my own dataset and have some queries about running the experiment, as well as some clarifications regarding the paper in general. Please let me know if this is not the appropriate medium for these questions; I'll send an email instead. Thanks!

  • When I create a custom dataset as mentioned, I believe I'll need to run the following script to register a COCO-format dataset:
    from detectron2.data.datasets import register_coco_instances
    register_coco_instances("my_dataset", {}, "json_annotation.json", "path/to/image/dir")

Where do I need to run this code snippet from? Can I just create a Jupyter notebook in the CutLER folder and run these snippets? If I do, I also need to provide the annotations file, but I'm trying to use the MaskCut approach discussed in the paper to generate the pseudo ground truth; in that case, how do I pass the .json file to register the dataset?

  • Would it be easier to just use the naming convention of ImageNet, put my domain-related images in that folder, and train as if it were ImageNet, or would that make a difference? That approach sounds easier to me than registering a custom dataset.
  • In the command to run merge_jsons.py, the save path passed is --save-path imagenet_train_fixsize480_tau0.15_N3.json; however, the naming convention of the JSON file generated by running maskcut.py is different. So when running merge_jsons.py, are we supposed to pass imagenet_train_fixsize480_tau0.15_N3.json or the file that was generated by maskcut.py?
  • While doing self-training on the new dataset using the given command

    python train_net.py --num-gpus 8 \
      --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml \
      --test-dataset imagenet_train \
      --eval-only TEST.DETECTIONS_PER_IMAGE 30 \
      MODEL.WEIGHTS output/model_final.pth \ # load previous stage/round checkpoints
      OUTPUT_DIR output/ # path to save model predictions

    could you please explain the parameters a little?
  1. test-dataset: are we supposed to pass the whole training set?
  2. MODEL.WEIGHTS: it is output/model_final.pth; is the output folder supposed to be created inside the cutler folder?
  3. OUTPUT_DIR: is it the same directory where we provide the path to the model weights?

And when we want to get the annotations using the following command
python tools/get_self_training_ann.py \
  --new-pred output/inference/coco_instances_results.json \ # load model predictions
  --prev-ann DETECTRON2_DATASETS/imagenet/annotations/imagenet_train_fixsize480_tau0.15_N3.json \ # path to the old annotation file
  --save-path DETECTRON2_DATASETS/imagenet/annotations/cutler_imagenet1k_train_r1.json \ # path to save a new annotation file
  --threshold 0.7

Here we are passing coco_instances_results.json (the model predictions), but are we supposed to pass something else instead if we are doing custom training on our own dataset? Could you elaborate on what that file is and whether it will be generated when we train?

  • Lastly, let's say that after carrying out a preliminary experiment on N images I want to run the entire Cut and Learn pipeline; what is the best way to go about this? Should I repeat everything in another folder, or will the naming convention of the newly created files handle the different runs?

I have some more theoretical doubts as well; let me know if I should add them to this issue or create a separate one. Thanks, and sorry for the extended and (possibly) trivial questions about semantics.

frank-xwang (Collaborator) commented Mar 26, 2023

Hi, sorry for the late reply. I'll do my best to answer all of your questions, but please let me know if I miss anything.

  1. Registering a COCO-format dataset: Since we're using Detectron2, I recommend checking out the "Use Custom Datasets" tutorial in the Detectron2 documentation for a detailed explanation of how to register custom datasets. You can also follow our approach to registering ImageNet by modifying the "builtin.py" and "builtin_meta.py" files in the "cutler/data/datasets" directory of our GitHub repository. A minimal registration sketch is included after this list.
  2. Would it be easier to just use the naming convention of ImageNet? Yes, it would. But I would still recommend registering a new dataset.
  3. The command to run merge_jsons.py: Yes, you should use the JSON file that was generated by running maskcut.py.
  4. Parameters. 1) The test-dataset should be the entire training set, as we'll be using the model's predictions on the training set as the pseudo-masks for the next stage of self-training. 2) The MODEL.WEIGHTS parameter should point to the checkpoint obtained from the unsupervised model learning stage. 3) OUTPUT_DIR specifies the path where the model predictions will be saved. These predictions will be used as the "ground-truth" for the next stage of self-training. 4) The default name for the model predictions is "coco_instances_results.json", but you can check the files saved under OUTPUT_DIR/inference/ and modify the name accordingly if needed.
  5. Repeat the self-training process multiple times. If you only care about the final results and not the intermediate ones, the easiest approach is to overwrite the results of the previous runs. This means that you should always use the same file name, such as r1.json or r2.json. However, if you want to keep track of the results from each run, you'll need to register the "new" dataset. The images will be the same as before, but the annotations will be updated for each run. For example, you could name the updated datasets "cutler_imagenet1k_train_r3.json" or "cutler_imagenet1k_train_r4.json".
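
To make point 1 concrete, here is a minimal registration sketch using Detectron2's register_coco_instances. The dataset name, annotation path (e.g., the JSON produced by maskcut.py / merge_jsons.py), and image directory below are placeholders you would replace with your own:

    # Minimal sketch (placeholder names/paths): register a custom dataset whose
    # pseudo-annotations were produced by MaskCut, so it can be referenced by name
    # (e.g., via --test-dataset) in the training/testing commands.
    from detectron2.data.datasets import register_coco_instances

    register_coco_instances(
        "my_dataset_train",  # placeholder dataset name
        {},                  # extra metadata; can be left empty
        "path/to/my_dataset_train_maskcut_annotations.json",  # placeholder: MaskCut/merge_jsons.py output
        "path/to/my_dataset/train",                           # placeholder: image root directory
    )

Alternatively, the same information can be added to builtin.py and builtin_meta.py (as in our ImageNet setup) so the dataset is registered as one of the built-in datasets.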

Hope these answers help.
Best,
XuDong
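
P.S. To make points 4 and 5 concrete, one extra round of self-training roughly looks like the following. The round-specific folder and file names (output/round1, output/round2, _r1/_r2) are only illustrative; the flags are the same ones shown in the commands above:

    # Illustrative round 2: run inference with the round-1 checkpoint on the full
    # training set, then convert the predictions into round-2 pseudo ground truth.
    python train_net.py --num-gpus 8 \
      --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml \
      --test-dataset imagenet_train \
      --eval-only TEST.DETECTIONS_PER_IMAGE 30 \
      MODEL.WEIGHTS output/round1/model_final.pth \
      OUTPUT_DIR output/round2/

    python tools/get_self_training_ann.py \
      --new-pred output/round2/inference/coco_instances_results.json \
      --prev-ann DETECTRON2_DATASETS/imagenet/annotations/cutler_imagenet1k_train_r1.json \
      --save-path DETECTRON2_DATASETS/imagenet/annotations/cutler_imagenet1k_train_r2.json \
      --threshold 0.7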

frank-xwang (Collaborator)

Closing it now. Please feel free to reopen it if you have further questions.
