Queue length modification with the use of DDP #1127

Merged: 7 commits into fepegar:main on Nov 24, 2023
Conversation

@haughty-yeon (Contributor) commented Nov 23, 2023

Modified num_subjects() and iterations_per_epoch().

Fixes #1125.

Description
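
For context, here is a minimal sketch of the intended behavior, assuming the two properties defer to the per-rank subject sampler under DDP (a hypothetical reconstruction; attribute names mirror tio.Queue, but this is not the merged diff):

class QueueSketch:
    """Hypothetical sketch of the two Queue properties this PR touches."""

    def __init__(self, subjects_dataset, samples_per_volume, subject_sampler=None):
        self.subjects_dataset = subjects_dataset
        self.samples_per_volume = samples_per_volume
        self.subject_sampler = subject_sampler  # e.g. a DistributedSampler under DDP

    @property
    def num_subjects(self) -> int:
        # Under DDP, the subject sampler yields only this rank's shard of the
        # subjects, so it, rather than the full dataset, defines the epoch size
        if self.subject_sampler is not None:
            return len(self.subject_sampler)
        return len(self.subjects_dataset)

    @property
    def iterations_per_epoch(self) -> int:
        # Patches produced per rank per epoch
        return self.num_subjects * self.samples_per_volume

With a DistributedSampler over 6 subjects and 3 ranks, len(subject_sampler) is 2, so each rank expects 2 * samples_per_volume iterations: the per-rank queue length shrinks by the world size, as the linked issue requests.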

Checklist

  • I have read the CONTRIBUTING docs and have a developer setup (especially important are pre-commit and pytest)
  • Non-breaking change (would not break existing functionality)
  • Breaking change (would cause existing functionality to change)
  • Tests added or modified to cover the changes
  • Integration tests passed locally by running pytest
  • In-line docstrings updated
  • Documentation updated, tested running make html inside the docs/ folder
  • This pull request is ready to be reviewed

haughty-yeon and others added 2 commits on November 22, 2023 17:03: num_subjects() and iterations_per_epoch() modified.
@fepegar (Owner) commented Nov 24, 2023

This makes total sense. I've added some minor readability changes and tested the new implementation as follows:

import os

import torch
import torch.distributed as dist
import torchio as tio
from loguru import logger


num_subjects = 6
samples_per_volume = 2
max_length = 1000

subjects = []
tensor = torch.ones(1, 16, 16, 16)
# Toy subjects whose voxel intensities encode the subject index
for i in range(num_subjects):
    subject = tio.Subject(
        image=tio.ScalarImage(tensor=i * tensor),
        id=i,
    )
    subjects.append(subject)
dataset = tio.SubjectsDataset(subjects)

# WORLD_SIZE is set by the launcher (torchrun), so this branch runs only under DDP
is_distributed = bool(os.environ.get('WORLD_SIZE'))
if is_distributed:
    dist.init_process_group()
    # Each rank draws a disjoint shard of the subjects
    subject_sampler = torch.utils.data.distributed.DistributedSampler(
        dataset,
        shuffle=False,
    )
    rank = dist.get_rank()
else:
    subject_sampler = None
    rank = 0

patch_sampler = tio.sampler.UniformSampler(patch_size=2)

queue = tio.Queue(
    dataset,
    max_length,
    sampler=patch_sampler,
    samples_per_volume=samples_per_volume,
    num_workers=0,
    shuffle_subjects=False,
    shuffle_patches=False,
    subject_sampler=subject_sampler,  # each rank's queue loads only its own subjects
)

loader = torch.utils.data.DataLoader(
    queue,
    batch_size=1,
    num_workers=0,  # must be 0 when loading from a Queue
    shuffle=False,
    collate_fn=lambda x: x[0],  # unwrap the single patch from its singleton batch
)

for i, patch in enumerate(loader):
    logger.info(f'Rank {rank} | Batch {i} | Subject {patch["id"]}')

Run with

torchrun --nproc_per_node=3 /tmp/ddp.py
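
torchrun sets WORLD_SIZE (here, 3) and RANK in each of the three processes it spawns, so the script takes the distributed branch and initializes the process group.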

Output:

2023-11-23 16:19:14.933 | INFO     | __main__:<module>:57 - Rank 1 | Batch 0 | Subject 1
2023-11-23 16:19:14.933 | INFO     | __main__:<module>:57 - Rank 1 | Batch 1 | Subject 1
2023-11-23 16:19:14.933 | INFO     | __main__:<module>:57 - Rank 1 | Batch 2 | Subject 4
2023-11-23 16:19:14.933 | INFO     | __main__:<module>:57 - Rank 1 | Batch 3 | Subject 4
2023-11-23 16:19:14.935 | INFO     | __main__:<module>:57 - Rank 0 | Batch 0 | Subject 0
2023-11-23 16:19:14.935 | INFO     | __main__:<module>:57 - Rank 0 | Batch 1 | Subject 0
2023-11-23 16:19:14.935 | INFO     | __main__:<module>:57 - Rank 0 | Batch 2 | Subject 3
2023-11-23 16:19:14.935 | INFO     | __main__:<module>:57 - Rank 0 | Batch 3 | Subject 3
2023-11-23 16:19:14.947 | INFO     | __main__:<module>:57 - Rank 2 | Batch 0 | Subject 2
2023-11-23 16:19:14.947 | INFO     | __main__:<module>:57 - Rank 2 | Batch 1 | Subject 2
2023-11-23 16:19:14.947 | INFO     | __main__:<module>:57 - Rank 2 | Batch 2 | Subject 5
2023-11-23 16:19:14.947 | INFO     | __main__:<module>:57 - Rank 2 | Batch 3 | Subject 5
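
Each rank emits 2 × 2 = 4 patches, as expected: 6 subjects sharded across 3 ranks leaves 2 subjects per rank, each sampled twice. With shuffling disabled, the DistributedSampler assigns subjects in strides of the world size, so rank r gets subjects r and r + 3.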

@haughty-yeon reopened this Nov 24, 2023
@fepegar merged commit befd121 into fepegar:main Nov 24, 2023
23 of 24 checks passed
@fepegar (Owner) commented Nov 24, 2023

Thanks for your contribution, @haughty-yeon!

@allcontributors please add @haughty-yeon for bug

@allcontributors (bot) commented

@fepegar I couldn't determine any contributions to add. Did you specify any contributions? Please make sure to use valid contribution names.

I've put up a pull request to add @haughty-yeon! 🎉

Successfully merging this pull request may close these issues: Halve queue length when using DDP (#1125).