Training problem #41

Closed
MohammadVaziriCh opened this issue Jul 13, 2023 · 4 comments

Comments

@MohammadVaziriCh

When I try to train HQ-SAM in Colab by following the instructions, I get this error:

HQ-SAM: error: unrecognized arguments: --local-rank=0
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 1468) of binary: /usr/bin/python3

How can I resolve it?

@ymq2017
Collaborator

ymq2017 commented Jul 18, 2023

Hi, this issue might help you. It looks like a PyTorch version problem. You could use torchrun or install a lower version of torch.
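For context on the version problem: starting with PyTorch 2.0, `torch.distributed.launch` passes `--local-rank=<rank>` (with a hyphen) to the script, whereas older versions passed `--local_rank`. A script whose parser only knows the underscored spelling then fails with the "unrecognized arguments" error above. If you would rather patch the script than change the launcher or the torch version, a minimal sketch is to register both spellings. The `build_parser` helper below is hypothetical and is not taken from the HQ-SAM training code; it assumes the script parses its arguments with `argparse`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser for illustration; the real training script
    # defines many more arguments than this.
    parser = argparse.ArgumentParser(prog="HQ-SAM")
    # Accept both spellings so the script works with launchers from
    # before and after PyTorch 2.0. Both map to args.local_rank.
    parser.add_argument(
        "--local_rank",
        "--local-rank",
        dest="local_rank",
        type=int,
        default=0,
        help="local process rank passed in by the distributed launcher",
    )
    return parser


# Parse an explicit argument list to show that the hyphenated form is accepted.
args = build_parser().parse_args(["--local-rank=0"])
print(args.local_rank)
```

With this change, the same script should accept either form of the flag, so it does not matter which PyTorch version the launcher comes from.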

@vishakhalall

I fixed this issue by installing a PyTorch version that is compatible with my CUDA version. I checked the CUDA version with nvcc --version and then picked the matching PyTorch build from https://pytorch.org/get-started/previous-versions/
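The version check described above can be sketched in Python as follows. This is only an illustration: `parse_cuda_release` is a hypothetical helper, and it assumes that `nvcc --version` prints a line containing `release <major>.<minor>`, which is the usual format. The script also prints the installed torch version and the CUDA version it was built for, if torch is available.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_cuda_release(nvcc_output: str) -> Optional[str]:
    # Hypothetical helper: pull "11.8" out of text such as
    # "Cuda compilation tools, release 11.8, V11.8.89".
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Only call nvcc if it is actually on the PATH.
    if shutil.which("nvcc"):
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True
        )
        print("CUDA toolkit release:", parse_cuda_release(result.stdout))
    else:
        print("nvcc not found on PATH")

    try:
        import torch

        # torch.version.cuda is the CUDA version this torch build targets
        # (None for CPU-only builds).
        print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
    except ImportError:
        print("torch is not installed")
```

Comparing the two printed CUDA versions shows whether the installed torch build matches the toolkit before you choose an entry from the previous-versions page.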

@MohammadVaziriCh
Author

I solved it by launching the training script with torchrun instead of torch.distributed.launch.
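One reason torchrun avoids the error: it does not pass a local-rank command-line argument at all, and instead exposes the rank through the `LOCAL_RANK` environment variable. A script launched this way can read the rank as sketched below. `get_local_rank` is a hypothetical helper name, not something from the HQ-SAM code, and the script name in the comment is an assumption.

```python
import os


def get_local_rank(default: int = 0) -> int:
    # torchrun sets LOCAL_RANK for every worker process it starts.
    # Fall back to `default` when the script is run without a launcher.
    return int(os.environ.get("LOCAL_RANK", default))


# Example launch (script name is an assumption):
#   torchrun --nproc_per_node=1 train.py <training arguments>
print("local rank:", get_local_rank())
```

Reading the rank from the environment keeps the script independent of how the launcher spells its command-line flag.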
