
Regarding running CutLER on Custom dataset #16

Closed
hetavv opened this issue Mar 18, 2023 · 2 comments

hetavv commented Mar 18, 2023

Hi folks, excellent read and amazing work! I've been trying to run CutLER on my own dataset and have some queries about running the experiment, as well as some clarifications regarding the paper in general. Please let me know if this is not the appropriate medium for these questions; I'll send an email instead. Thanks!

  • When I create a custom dataset as mentioned, I believe I'll need to run the following script to register a COCO-format dataset:
    from detectron2.data.datasets import register_coco_instances
    register_coco_instances("my_dataset", {}, "json_annotation.json", "path/to/image/dir")

Where do I need to run this code snippet from? Can I just create a Jupyter notebook in the CutLER folder and run these snippets? If I do, I also need to provide the annotations file, but I'm trying to use the MaskCut approach discussed in the paper to generate the pseudo ground truth; in that case, how do I pass the .json file to register the dataset?

  • Would it be easier to just use the naming convention of ImageNet, put my domain-related images in that folder, and train as if it were ImageNet, or would that make a difference? That approach sounds easier to me than registering a custom dataset.
  • In the command to run merge_jsons.py, the save path passed is --save-path imagenet_train_fixsize480_tau0.15_N3.json; however, the naming convention of the JSON file generated by running maskcut.py is different. So when running merge_jsons.py, are we supposed to pass imagenet_train_fixsize480_tau0.15_N3.json or the file that was generated by maskcut.py?
  • While doing self-training on the new dataset using the given command

    python train_net.py --num-gpus 8 \
      --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml \
      --test-dataset imagenet_train \
      --eval-only TEST.DETECTIONS_PER_IMAGE 30 \
      MODEL.WEIGHTS output/model_final.pth \ # load previous stage/round checkpoints
      OUTPUT_DIR output/ # path to save model predictions

    could you please explain the parameters a little?
  1. test-dataset: are we supposed to pass the whole training set?
  2. MODEL.WEIGHTS: it is output/model_final.pth; is the output folder supposed to be created inside the cutler folder?
  3. OUTPUT_DIR: is it the same directory where we provide the path to the model weights?

And when we want to get the annotations using the following command
python tools/get_self_training_ann.py \
  --new-pred output/inference/coco_instances_results.json \ # load model predictions
  --prev-ann DETECTRON2_DATASETS/imagenet/annotations/imagenet_train_fixsize480_tau0.15_N3.json \ # path to the old annotation file
  --save-path DETECTRON2_DATASETS/imagenet/annotations/cutler_imagenet1k_train_r1.json \ # path to save a new annotation file
  --threshold 0.7

Here we are passing coco_instances_results.json (the model predictions), but are we supposed to pass something else instead if we are doing custom training on our own dataset? Could you elaborate on what that file is and whether it will be generated when we train?

  • Lastly, let's say that after carrying out a preliminary experiment on N images I want to run the entire Cut and Learn pipeline; what is the best way to go about this? Should I repeat everything in another folder, or will the naming convention of the newly created files handle the different runs?

I have some more theoretical doubts as well; let me know if I should add them to this issue or create a separate one. Thanks, and sorry for the extended and (possibly) trivial questions about semantics.

frank-xwang (Collaborator) commented Mar 26, 2023

Hi, sorry for the late reply. I'll do my best to answer all of your questions, but please let me know if I miss anything.

  1. Registering a COCO-format dataset: Since we're using Detectron2, I recommend checking out the "Use Custom Datasets" tutorial in the Detectron2 documentation for a detailed explanation of how to register custom datasets. You can also follow our approach to registering ImageNet by modifying the "builtin.py" and "builtin_meta.py" files in the "cutler/data/datasets" directory of our GitHub repository. A minimal registration sketch is included after this list.
  2. Would it be easier to just use the naming convention of ImageNet? Yes, it would. But I would still recommend registering a new dataset.
  3. The command to run merge_jsons.py: Yes, you should use the JSON file that was generated by running maskcut.py.
  4. Parameters. 1) The test-dataset should be the entire training set, as we'll be using the model's predictions on the training set as the pseudo-masks for the next stage of self-training. 2) The MODEL.WEIGHTS parameter should point to the checkpoint obtained from the unsupervised model learning stage. 3) OUTPUT_DIR specifies the path where the model predictions will be saved. These predictions will be used as the "ground-truth" for the next stage of self-training. 4) The default name for the model predictions is "coco_instances_results.json", but you can check the files saved under OUTPUT_DIR/inference/ and modify the name accordingly if needed.
  5. Repeat the self-training process multiple times. If you only care about the final results and not the intermediate ones, the easiest approach is to overwrite the results of the previous runs. This means that you should always use the same file name, such as r1.json or r2.json. However, if you want to keep track of the results from each run, you'll need to register the "new" dataset. The images will be the same as before, but the annotations will be updated for each run. For example, you could name the updated datasets "cutler_imagenet1k_train_r3.json" or "cutler_imagenet1k_train_r4.json".
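
To make point 1 concrete, here is a minimal registration sketch using Detectron2's register_coco_instances. The dataset name, annotation path (e.g., the JSON produced by maskcut.py / merge_jsons.py), and image directory below are placeholders you would replace with your own:

    # Minimal sketch (placeholder names/paths): register a custom dataset whose
    # pseudo-annotations were produced by MaskCut, so it can be referenced by name
    # (e.g., via --test-dataset) in the training/testing commands.
    from detectron2.data.datasets import register_coco_instances

    register_coco_instances(
        "my_dataset_train",  # placeholder dataset name
        {},                  # extra metadata; can be left empty
        "path/to/my_dataset_train_maskcut_annotations.json",  # placeholder: MaskCut/merge_jsons.py output
        "path/to/my_dataset/train",                           # placeholder: image root directory
    )

Alternatively, the same information can be added to builtin.py and builtin_meta.py (as in our ImageNet setup) so the dataset is registered as one of the built-in datasets.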

Hope these answers help.
Best,
XuDong
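
P.S. To make points 4 and 5 concrete, one extra round of self-training roughly looks like the following. The round-specific folder and file names (output/round1, output/round2, _r1/_r2) are only illustrative; the flags are the same ones shown in the commands above:

    # Illustrative round 2: run inference with the round-1 checkpoint on the full
    # training set, then convert the predictions into round-2 pseudo ground truth.
    python train_net.py --num-gpus 8 \
      --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml \
      --test-dataset imagenet_train \
      --eval-only TEST.DETECTIONS_PER_IMAGE 30 \
      MODEL.WEIGHTS output/round1/model_final.pth \
      OUTPUT_DIR output/round2/

    python tools/get_self_training_ann.py \
      --new-pred output/round2/inference/coco_instances_results.json \
      --prev-ann DETECTRON2_DATASETS/imagenet/annotations/cutler_imagenet1k_train_r1.json \
      --save-path DETECTRON2_DATASETS/imagenet/annotations/cutler_imagenet1k_train_r2.json \
      --threshold 0.7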

frank-xwang (Collaborator)

Closing it now. Please feel free to reopen it if you have further questions.
