Model not loading #74

Open
sparshgarg23 opened this issue Dec 15, 2023 · 0 comments

When training on Colab, I ran into the following issue:

/usr/local/lib/python3.10/dist-packages/torch/distributed/launch.py:181: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects `--local-rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  warnings.warn(
usage: launch.py [-h] [--nnodes NNODES] [--nproc-per-node NPROC_PER_NODE]
                 [--rdzv-backend RDZV_BACKEND] [--rdzv-endpoint RDZV_ENDPOINT] [--rdzv-id RDZV_ID]
                 [--rdzv-conf RDZV_CONF] [--standalone] [--max-restarts MAX_RESTARTS]
                 [--monitor-interval MONITOR_INTERVAL] [--start-method {spawn,fork,forkserver}]
                 [--role ROLE] [-m] [--no-python] [--run-path] [--log-dir LOG_DIR] [-r REDIRECTS]
                 [-t TEE] [--node-rank NODE_RANK] [--master-addr MASTER_ADDR]
                 [--master-port MASTER_PORT] [--local-addr LOCAL_ADDR] [--use-env]
                 training_script ...
launch.py: error: the following arguments are required: training_script, training_script_args
configs/r50_motr_train.sh: 14: --use_env: not found
configs/r50_motr_train.sh: 15: --meta_arch: not found
configs/r50_motr_train.sh: 16: --use_checkpoint: not found
configs/r50_motr_train.sh: 17: --dataset_file: not found
configs/r50_motr_train.sh: 18: --epoch: not found
configs/r50_motr_train.sh: 19: --with_box_refine: not found
configs/r50_motr_train.sh: 20: --lr_drop: not found
configs/r50_motr_train.sh: 21: --lr: not found
configs/r50_motr_train.sh: 22: --lr_backbone: not found
configs/r50_motr_train.sh: 23: --pretrained: not found
configs/r50_motr_train.sh: 24: --output_dir: not found
configs/r50_motr_train.sh: 25: --batch_size: not found
configs/r50_motr_train.sh: 26: --sample_mode: not found
configs/r50_motr_train.sh: 27: --sample_interval: not found
configs/r50_motr_train.sh: 28: --sampler_steps: not found
configs/r50_motr_train.sh: 29: --sampler_lengths: not found
configs/r50_motr_train.sh: 30: --update_query_pos: not found
configs/r50_motr_train.sh: 31: --merger_dropout: not found
configs/r50_motr_train.sh: 32: --dropout: not found
configs/r50_motr_train.sh: 33: --random_drop: not found
configs/r50_motr_train.sh: 34: --fp_ratio: not found
configs/r50_motr_train.sh: 35: --query_interaction_layer: not found
configs/r50_motr_train.sh: 36: --extra_track_attn: not found
configs/r50_motr_train.sh: 37: --data_txt_path_train: not found
configs/r50_motr_train.sh: 38: --data_txt_path_val: not found
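The `not found` messages above indicate that the shell ran each `--option` line as a separate command, which happens when the line before it does not end in a backslash continuation. A minimal sketch of a check for this, assuming the script is meant to be one command split across lines with trailing backslashes; the helper name `find_broken_continuations` is hypothetical and not part of this repository:

```python
from typing import List, Tuple


def find_broken_continuations(script_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) for every option line that the shell
    would treat as a new command because the previous line does not end
    with a backslash."""
    broken = []
    # Split on "\n" only, so a Windows "\r" left at the end of a line
    # (CRLF line endings) also counts as a broken continuation.
    lines = script_text.split("\n")
    for idx in range(1, len(lines)):
        current = lines[idx].strip()
        previous = lines[idx - 1]
        if current.startswith("--") and not previous.endswith("\\"):
            broken.append((idx + 1, current))  # 1-based line numbers
    return broken


if __name__ == "__main__":
    # Hypothetical two-line script where the first backslash is missing.
    sample = "python3 main.py\n    --epoch 200\n"
    for number, line in find_broken_continuations(sample):
        print(f"line {number}: {line}")
```

Running it on the text of `configs/r50_motr_train.sh` should list the same line numbers the shell complained about if a missing backslash, trailing whitespace after a backslash, or CRLF line endings are the cause.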

It seems your code is configured to run in a distributed setting and is not well suited to Colab-based environments. Is that correct?
Also, for the pretrained weights I am currently using the detr_r50 weights (not the iterative box refinement version mentioned in #28).
Would choosing those weights cause a problem? If not, do you have any suggestions on how I can change the code so that it can run on a single GPU? [Note: I understand that training larger models like this on a single GPU is not efficient.]
Thanks
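For reference, the warning in the log suggests reading the local rank from `os.environ['LOCAL_RANK']`. A minimal sketch of doing that with a single-process fallback, so the same entry point could run with or without a launcher. The function names are hypothetical and this is not the repository's own code; it only assumes that `torchrun` exports `LOCAL_RANK` and `WORLD_SIZE`:

```python
import os


def resolve_local_rank(default: int = 0) -> int:
    """Return the local rank set by the launcher, or `default` when the
    script is started directly (e.g. a single Colab GPU)."""
    return int(os.environ.get("LOCAL_RANK", default))


def is_distributed() -> bool:
    """Treat the run as distributed only when the launcher reports more
    than one process in WORLD_SIZE."""
    return int(os.environ.get("WORLD_SIZE", "1")) > 1


if __name__ == "__main__":
    print("local rank:", resolve_local_rank())
    print("distributed:", is_distributed())
```

With this pattern a script can skip process-group setup when `is_distributed()` is false; whether the training code here tolerates that is something the maintainers would need to confirm.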
