Training reproducible with PyTorch but not with PyTorch + PyTorch3D #659

abhi1kumar · 2021-04-27T18:30:30Z

❓ How to ensure reproducibility of training with PyTorch3D

I am trying to make training with PyTorch + PyTorch3D reproducible. When I use only PyTorch, without PyTorch3D, my entire training is reproducible: when I run my training script twice, the errors and the logs match between runs. However, when I add PyTorch3D-based rendering to the training, it is no longer reproducible.

Libraries and their versions:

PyTorch3D 0.4.0
PyTorch 1.5.1
Torchvision 0.6.1
CUDA 10.1

Code used to seed the training:

```python
import os
import random

import numpy as np
import torch


def init_torch(rng_seed, cuda_seed):
    """
    Initializes the seeds for ALL potential randomness, including numpy, random and torch.

    Args:
        rng_seed (int): the shared random seed to use for numpy and random
        cuda_seed (int): the random seed to use for pytorch's torch.cuda.manual_seed_all function
    """
    np.random.seed(rng_seed)
    random.seed(rng_seed)
    # PYTHONHASHSEED only affects Python processes started after this point;
    # hashing in the current interpreter was fixed at startup.
    os.environ['PYTHONHASHSEED'] = str(rng_seed)

    torch.manual_seed(rng_seed)
    torch.cuda.manual_seed(cuda_seed)
    torch.cuda.manual_seed_all(cuda_seed)

    # Use deterministic cuDNN kernels and disable cuDNN autotuning.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

I also checked whether I was missing anything in the PyTorch 1.5.1 reproducibility documentation, but could not find anything else.

The latest PyTorch reproducibility documentation says:

> Furthermore, if you are using CUDA tensors, and your CUDA version is 10.2 or greater, you should set the environment variable CUBLAS_WORKSPACE_CONFIG according to CUDA documentation

Since I am using CUDA 10.1, I assume this problem should not arise.

It would be great if you could explain how to remove the randomness when using PyTorch3D so that the training can be fully reproduced.

bottler · 2021-04-27T19:03:42Z

Which parts of PyTorch3D are you using?

abhi1kumar · 2021-04-27T19:19:16Z

@bottler Thank you for your reply. I am using MeshRasterizer with the following settings.

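```python
import torch
import torch.nn as nn
from pytorch3d.renderer import MeshRasterizer, RasterizationSettings


class MeshRendererWithDepth(nn.Module):
    """Runs only the rasterizer and returns its depth buffer."""

    def __init__(self, rasterizer):
        super().__init__()
        self.rasterizer = rasterizer

    def forward(self, meshes_world, **kwargs) -> torch.Tensor:
        fragments = self.rasterizer(meshes_world, **kwargs)
        # zbuf has shape (N, H, W, faces_per_pixel); empty slots are -1.
        return fragments.zbuf


# raster_image_size, cameras, mesh, R_camera and T_camera are defined elsewhere in the training code.
raster_settings = RasterizationSettings(
    image_size=raster_image_size,
    blur_radius=0,
    faces_per_pixel=2,
    perspective_correct=False,
    cull_backfaces=True,
    max_faces_per_bin=320,
)
renderer = MeshRendererWithDepth(
    rasterizer=MeshRasterizer(
        cameras=cameras,
        raster_settings=raster_settings,
    )
)

depth_maps = renderer(meshes_world=mesh, R=R_camera, T=T_camera)
```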
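One way to confirm that the rasterizer, rather than the rest of the pipeline, causes the run-to-run differences is to call it several times on identical inputs and compare the outputs bitwise. A minimal sketch, assuming the output can be converted to a NumPy array (for a CUDA tensor, via `.detach().cpu().numpy()`); `outputs_identical` is a hypothetical helper, not part of PyTorch3D or PyTorch:

```python
from typing import Callable

import numpy as np


def outputs_identical(fn: Callable[[], np.ndarray], n_runs: int = 5) -> bool:
    """Call `fn` repeatedly on the same inputs and check bitwise equality.

    `fn` takes no arguments and returns a NumPy array, e.g. a lambda that
    runs the renderer on fixed inputs and converts the result to NumPy.
    """
    reference = fn()
    for _ in range(n_runs - 1):
        # array_equal requires identical shapes and identical values.
        if not np.array_equal(reference, fn()):
            return False
    return True
```

For the renderer above, that would be something like `outputs_identical(lambda: renderer(meshes_world=mesh, R=R_camera, T=T_camera).detach().cpu().numpy())`. Because exact ties are rare, a `True` result over a handful of runs does not prove the rasterizer is deterministic.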

bottler · 2021-04-28T21:30:45Z

Separate CUDA threads deal with separate faces. If there are two or more faces which have exactly the same distance to a certain pixel, then the order in which they appear in the output for that pixel is not determined. Further, if the nearest faces_per_pixel faces need to include one but not all of these equally-distant faces, then it is not determined which of them will be included in the output. In many applications, these exact ties should be rare.
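Both effects can be reproduced without a GPU. The sketch below is an illustrative toy model, not PyTorch3D's kernel code: it assumes each pixel keeps the `k` closest faces seen so far and replaces the farthest kept face only when a new face is strictly closer by depth. Each permutation of the input stands in for one possible order in which the CUDA threads deliver faces.

```python
import itertools
from typing import List, Sequence, Tuple

Face = Tuple[float, int]  # (depth z, face index)


def keep_k_nearest_by_depth(arrivals: Sequence[Face], k: int) -> List[Face]:
    """Toy per-pixel selection that compares faces by depth only."""
    kept: List[Face] = []
    for face in arrivals:
        if len(kept) < k:
            kept.append(face)
        else:
            # Position of the farthest face currently kept.
            worst = max(range(k), key=lambda i: kept[i][0])
            # Strict '<' on depth: an equally distant face never displaces a kept one.
            if face[0] < kept[worst][0]:
                kept[worst] = face
    # Sort by depth only; Python's sort is stable, so tied faces keep arrival order.
    return sorted(kept, key=lambda f: f[0])


faces = [(0.2, 5), (0.5, 3), (0.5, 7)]  # faces 3 and 7 are exactly tied
outcomes = {tuple(keep_k_nearest_by_depth(p, k=2)) for p in itertools.permutations(faces)}
print(sorted(outcomes))  # [((0.2, 5), (0.5, 3)), ((0.2, 5), (0.5, 7))]
```

With `k=2`, which of the tied faces survives depends on arrival order. With `k=3` all three faces are kept, but faces 3 and 7 still come out in arrival order, which is the first kind of non-determinism described above.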

It should be possible to change PyTorch3D to remove this non-determinism, e.g. by making a lower-indexed equally-distant face count as "closer".
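Breaking exact ties by face index turns the comparison into a strict total order, so both the set of kept faces and their order stop depending on the order in which faces arrive. A minimal sketch of that comparison rule in a toy per-pixel selection; it illustrates the idea only and is not the actual change to the CUDA code. `closer` and `keep_k_nearest` are hypothetical names:

```python
import itertools
from typing import List, Sequence, Tuple

Face = Tuple[float, int]  # (depth z, face index)


def closer(a: Face, b: Face) -> bool:
    """True if `a` should be treated as closer than `b`.

    Depth is compared first; on an exact tie the lower face index wins,
    so two distinct faces are never considered equal.
    """
    return a[0] < b[0] or (a[0] == b[0] and a[1] < b[1])


def keep_k_nearest(arrivals: Sequence[Face], k: int) -> List[Face]:
    """Toy per-pixel selection of the k closest faces under `closer`."""
    kept: List[Face] = []
    for face in arrivals:
        if len(kept) < k:
            kept.append(face)
            continue
        # Find the farthest kept face under the same total order.
        worst = 0
        for i in range(1, k):
            if closer(kept[worst], kept[i]):
                worst = i
        if closer(face, kept[worst]):
            kept[worst] = face
    # Tuples compare by depth, then by index: the same order as `closer`.
    return sorted(kept)


faces = [(0.2, 5), (0.5, 3), (0.5, 7)]  # faces 3 and 7 are exactly tied
outcomes = {tuple(keep_k_nearest(p, k=2)) for p in itertools.permutations(faces)}
print(outcomes)  # {((0.2, 5), (0.5, 3))}
```

Every arrival order now yields the same result, because the `k` smallest elements under a strict total order are unique.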

abhi1kumar · 2021-04-28T22:13:36Z

@bottler Thank you for your reply.

One option is to set faces_per_pixel=1 instead of 2. However, I am not sure whether I would still get useful gradients.
The other option is to always treat the lower-indexed face as the closest one among all equally distant faces.

Can we make the lower-indexed of several equally distant faces count as closer through a RasterizationSettings option? If so, which option does this? Or do we need to re-compile PyTorch3D and change its internals?

bottler · 2021-04-28T23:12:55Z

> Or do we need to re-compile PyTorch3D and change its internals?

Yes. This would be a code change in a couple of places in /pytorch3d/csrc/rasterize_meshes/rasterize_meshes.cu.

bottler · 2021-04-28T23:16:18Z

You might know more about your specific meshes, but in general, setting faces_per_pixel=1 doesn't solve the problem: if two faces tie for nearest, it is still not determined which one is returned. You may be able to increase faces_per_pixel to more than you need, then sort the rasterization output to resolve ties, and then truncate it to what you need.
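A minimal sketch of this workaround, assuming the rasterizer is run with a larger faces_per_pixel (call it `K_over`) and that its output follows PyTorch3D's `Fragments` convention: `pix_to_face` and `zbuf` of shape `(N, H, W, K_over)`, with empty slots marked by `pix_to_face == -1`. The helper `sort_and_truncate` is hypothetical. It orders each pixel's entries by depth, breaks exact ties by face index, and keeps the first `k`. It is written with NumPy; during training, the permutation could be computed on detached tensors and applied to the differentiable tensors with `torch.gather`, which keeps gradients flowing.

```python
import numpy as np


def sort_and_truncate(pix_to_face: np.ndarray, zbuf: np.ndarray, k: int):
    """Deterministically reorder per-pixel rasterizer output and keep k entries.

    pix_to_face, zbuf: arrays of shape (N, H, W, K_over); empty slots have
    pix_to_face == -1. Returns (pix_to_face, zbuf, order), each truncated to
    k entries along the last axis. `order` can be reused, with an extra
    trailing axis, to permute other per-face outputs such as bary_coords.
    """
    empty = pix_to_face < 0
    # Empty slots get infinite depth so they sort after every real face.
    depth_key = np.where(empty, np.inf, zbuf)
    # np.lexsort sorts by the LAST key first: depth, then face index on ties.
    order = np.lexsort((pix_to_face, depth_key), axis=-1)[..., :k]
    return (
        np.take_along_axis(pix_to_face, order, axis=-1),
        np.take_along_axis(zbuf, order, axis=-1),
        order,
    )
```

This only removes the ambiguity when `K_over` is large enough that every face tied at the cut-off is returned by the rasterizer; if a tie extends past `K_over`, which faces appear can still vary between runs.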

abhi1kumar · 2021-04-29T18:42:04Z

> Or do we need to re-compile PyTorch3D and change its internals?

> Yes. This would be a code change in a couple of places in /pytorch3d/csrc/rasterize_meshes/rasterize_meshes.cu.

Will determinism be added as a PyTorch3D feature in the future? In other words, is reproducibility on your TODO list? In my opinion, deterministic rasterization is an important feature to add to PyTorch3D, since it would make training easy to reproduce.

> You may be able to increase faces_per_pixel to more than you need, and then sort the rasterization output to resolve ties, and then truncate it to what you need.

Could you elaborate on this? A code snippet would be great.

github-actions · 2021-06-23T05:32:01Z

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions · 2021-06-29T05:31:41Z

This issue was closed because it has been stalled for 5 days with no activity.

github-actions · 2021-07-30T05:31:28Z

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions · 2021-08-30T05:31:32Z

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

bottler added the question (Further information is requested) label Apr 27, 2021

nikhilaravi assigned bottler Apr 28, 2021

github-actions bot added the Stale label Jun 23, 2021

github-actions bot closed this as completed Jun 29, 2021

bottler removed the Stale label Jun 29, 2021

bottler reopened this Jun 29, 2021

github-actions bot added the Stale label Jul 30, 2021

bottler added the do-not-reap (Do not delete this pull request or issue due to inactivity.) label and removed the Stale label Jul 30, 2021

github-actions bot added the Stale label Aug 30, 2021

patricklabatut removed the Stale label Aug 30, 2021

facebook-github-bot closed this as completed in 860b742 Sep 23, 2021

nh236 mentioned this issue in Non-deterministic behaviour of MeshRasterizer #1519 (open) Apr 24, 2023

reginehartwig mentioned this issue in Training Reproducibility monniert/unicorn#8 (open) Sep 25, 2023
