wandb: Waiting for W&B process to finish... (failed -1) #47

Open
jmulsy opened this issue Sep 25, 2023 · 2 comments

@jmulsy

jmulsy commented Sep 25, 2023

Hello, first of all, thank you for your outstanding work.
I have run into a problem and would appreciate your help. When I train the model with W&B logging, in either online or offline mode, the run always ends with:
wandb: Waiting for W&B process to finish... (failed -1)
I don't know what causes this. Could you give me some suggestions?

@heiwang1997
Collaborator

Thanks for your interest in our work. Could you please paste a more complete log of the run?

@jmulsy
Author

jmulsy commented Sep 29, 2023

Thank you for your reply. Here are the details.

First, to use W&B offline, I added this code to train.py:

[Screenshot: the code added to train.py]
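The exact code in the screenshot is not recoverable. For reference, a common way to force offline logging is to set the WANDB_MODE environment variable (the same variable named in the wandb output below) before any run starts; wandb.init(mode="offline") is an equivalent per-call option. A minimal sketch, assuming it is called near the top of train.py before the W&B logger is created; the helper name set_wandb_mode is hypothetical:

```python
import os


def set_wandb_mode(offline: bool) -> str:
    """Hypothetical helper: choose the W&B sync mode via the WANDB_MODE env var.

    It must run before wandb.init() (or before a Lightning WandbLogger starts
    its run); runs that are already started are not affected.
    """
    mode = "offline" if offline else "online"
    os.environ["WANDB_MODE"] = mode
    return mode


if __name__ == "__main__":
    print(set_wandb_mode(offline=True))  # prints "offline"
```

Using the environment variable rather than editing the logger arguments keeps the switch outside the training code, so the same script can run online or offline unchanged.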

When I start training with the following command:

python train.py configs/shapenet/train_3k_noise.yaml

the terminal showed:

09-29 18:58:24 (train.py:72) [INFO] Intelligent GPU selection: 0 
wandb: WARNING `resume` will be ignored since W&B syncing is set to `offline`. Starting a new run with run id 6zddnqd7.
wandb: Tracking run with wandb version 0.15.10
wandb: W&B syncing is set to `offline` in this directory.  
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
Global seed set to 0
/home/lisy/anaconda3/envs/nksr/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:447: LightningDeprecationWarning: Setting `Trainer(gpus=1)` is deprecated in v1.7 and will be removed in v2.0. Please use `Trainer(accelerator='gpu', devices=1)` instead.
  rank_zero_deprecation(
Auto select gpus: [0]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
 >>>> ======= MODEL HYPER-PARAMETERS ======= <<<< 
exec: null
include: null
visualize: false
test_set_shuffle: false
...
...
...
...
...
...
  random_seed: fixed
_shapenet_transforms:
- name: PointcloudNoise
  args:
    stddev: 0.005
- name: SubsamplePointcloud
  args:
    'N': 3000

 >>>> ====================================== <<<< 
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
09-29 18:58:38 (train.py:316) [INFO] 

Wandb Run of nkfw-shapenet/6zddnqd7 (with name noise_3k/0929-big-vehicle) marked to be cleared.

 
wandb: Waiting for W&B process to finish... (failed -1).
wandb: You can sync this run to the cloud by running:
wandb: wandb sync /home/lisy/NKSR/wandb/offline-run-20230929_185825-6zddnqd7
wandb: Find logs at: ./wandb/offline-run-20230929_185825-6zddnqd7/logs
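As the last line above shows, each run writes its own logs directory; the debug files there (typically debug.log and debug-internal.log) may hold more detail than the console output. A minimal sketch for locating the newest run's logs directory, assuming the directory naming seen in these logs ([offline-]run-YYYYMMDD_HHMMSS-<id>) and the default ./wandb location; latest_run_logs is a hypothetical helper:

```python
import re
from pathlib import Path
from typing import Optional

# Matches names like "offline-run-20230929_185825-6zddnqd7" and
# "run-20230929_191209-mud4kkdq" (pattern assumed from the logs in this issue).
RUN_DIR = re.compile(r"^(?:offline-)?run-(\d{8}_\d{6})-(\w+)$")


def latest_run_logs(wandb_dir: str = "wandb") -> Optional[Path]:
    """Return the logs/ directory of the most recent run, or None if none exist."""
    root = Path(wandb_dir)
    if not root.is_dir():
        return None
    runs = []
    for entry in root.iterdir():
        match = RUN_DIR.match(entry.name)
        if entry.is_dir() and match:
            # Key on the timestamp so online and offline runs are ordered together.
            runs.append((match.group(1), entry))
    if not runs:
        return None
    _, newest = max(runs)
    return newest / "logs"


if __name__ == "__main__":
    logs = latest_run_logs()
    if logs is not None:
        for name in ("debug.log", "debug-internal.log"):
            log_file = logs / name
            if log_file.exists():
                # Show the last 20 lines of each debug log.
                print(f"==> {log_file}")
                print("\n".join(log_file.read_text(errors="replace").splitlines()[-20:]))
```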

Then I ran 'wandb sync', and the terminal showed:

Find logs at: /home/lisy/NKSR/wandb/debug-cli.lisy.log
Syncing: https://wandb.ai/lisy0408/nkfw-shapenet/runs/6zddnqd7 ... done.

Finally, I opened the W&B link, and the result was as follows:

[Screenshot: the run page on the W&B website]

When I remove the offline setting, the same failure still occurs:

- name: PointcloudNoise
  args:
    stddev: 0.005
- name: SubsamplePointcloud
  args:
    'N': 3000

 >>>> ====================================== <<<< 
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
09-29 19:12:25 (train.py:316) [INFO] 

Wandb Run of lisy0408/nkfw-shapenet/mud4kkdq (with name noise_3k/0929-grave-angle) marked to be cleared.

 
wandb: Waiting for W&B process to finish... (failed -1). Press Control-C to abort syncing.
wandb: 🚀 View run noise_3k/0929-grave-angle at: https://wandb.ai/lisy0408/nkfw-shapenet/runs/mud4kkdq
wandb: ️⚡ View job at https://wandb.ai/lisy0408/nkfw-shapenet/jobs/QXJ0aWZhY3RDb2xsZWN0aW9uOjEwMDY5MTM5OA==/version_details/v3
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20230929_191209-mud4kkdq/logs
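In both runs the log goes straight from "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]" to the "marked to be cleared" message, with no training progress printed in between. The number in "(failed -1)" appears to be the exit code the run was finished with (wandb.finish accepts an exit_code argument), so W&B is likely reporting that the script ended the run as failed rather than causing the failure itself. A minimal sketch for surfacing the underlying error, assuming you can wrap the training call in train.py; run_and_report and the idea of passing wandb.finish as the finish callback are illustrative assumptions, not NKSR's code:

```python
import traceback
from typing import Callable


def run_and_report(train: Callable[[], None], finish: Callable[..., None]) -> int:
    """Run `train`; on failure print the full traceback, then close the run.

    `finish` is intended to be something like wandb.finish, which accepts an
    exit_code keyword; it is injected here so the sketch runs without W&B.
    """
    try:
        train()
    except Exception:
        # Print the real error so it is not hidden behind "(failed N)".
        traceback.print_exc()
        exit_code = 1
    else:
        exit_code = 0
    finish(exit_code=exit_code)
    return exit_code
```

For example, run_and_report(lambda: trainer.fit(net), wandb.finish), where trainer and net stand for whatever objects train.py already builds. If the script catches the exception elsewhere before printing "marked to be cleared", the wrapper would need to go around that code instead.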
