
Conversation

ProGamerGov
Contributor

@ProGamerGov ProGamerGov commented Apr 26, 2021

  • Quick fix for the model file path, as it's breaking all the notebooks & tests.

  • It also looks like black was updated and is now reporting an issue with captum/attr/_core/feature_ablation.py & captum/metrics/_core/infidelity.py, so I've resolved that as well. The files in the master branch may be a bit different, but we need to at least keep the tests passing for the other areas of Captum until we merge into master.

  • Added new ImageTensor tests.

  • Added detach() to InputOptimization to fix out-of-memory crashes.

  • Made sure that SkipLayer can handle any additional arguments to its init and forward functions. This makes the layer more useful than a plain copy of nn.Identity (see the sketch after this list).

  • Fixed various bugs.
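A rough sketch of that SkipLayer behavior (the actual Captum implementation may differ): any extra constructor or forward arguments are accepted and ignored, and the input is returned unchanged.

    import torch
    import torch.nn as nn
    from typing import Tuple, Union


    class SkipLayer(nn.Module):
        # Extra init arguments (e.g. inplace=True) are accepted and discarded, so
        # SkipLayer can stand in for layers constructed like nn.ReLU(inplace=True).
        def __init__(self, *args, **kwargs) -> None:
            super().__init__()

        # Extra forward arguments (e.g. an inverse flag) are also ignored.
        def forward(
            self, x: Union[torch.Tensor, Tuple[torch.Tensor, ...]], *args, **kwargs
        ) -> Union[torch.Tensor, Tuple[torch.Tensor, ...]]:
            return x


    layer = SkipLayer(inplace=True)
    assert torch.equal(layer(torch.ones(2), inverse=True), torch.ones(2))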

@NarineK
Contributor

NarineK commented Apr 29, 2021

Thank you for the fixes, @ProGamerGov! It doesn't seem to fail on master. Are those Python files different on the optim-wip branch?

@ProGamerGov
Contributor Author

ProGamerGov commented Apr 29, 2021

@NarineK I think the files are slightly different, but @vivekmig also changed the same files in the most recent PR for the master branch: 5cf38cb. When we merge with the master branch we can update everything, but for the moment updating the files will ensure that the tests keep passing.

@ProGamerGov
Contributor Author

ProGamerGov commented Apr 29, 2021

Looks like the lint_test_py36_pip and lint_test_py37_conda tests are now failing with this error:

    @staticmethod
    def __new__(
        cls: Type["ImageTensor"],
        x: Union[List, np.ndarray, torch.Tensor] = [],
        *args,
        **kwargs,
    ) -> torch.Tensor:
        if isinstance(x, torch.Tensor) and x.is_cuda:
            x.show = MethodType(cls.show, x)
            x.export = MethodType(cls.export, x)
            return x
        else:
>           return super().__new__(cls, x, *args, **kwargs)
E           TypeError: object.__new__(ImageTensor) is not safe, use Tensor.__new__()

I think the error could be a Python bug or something similar, as I hadn't seen it until today.

@ProGamerGov
Contributor Author

I found the source of the TypeError when subclassing torch.Tensor! It's a bug in the nightly PyTorch builds. The lint_test_py36_pip test uses torch-1.9.0.dev20210501+cpu and the lint_test_py37_conda test uses torch==1.9.0.dev20210501, and thus they both fail on tests involving ImageTensor.

I made an issue post on the PyTorch repo here: pytorch/pytorch#57421

@ProGamerGov ProGamerGov changed the title Optim-wip: Fix the Inception5h model's download link Optim-wip: Fix failing tests & model download link May 2, 2021
* Remove `ImageTensor` test skips as the `torch.Tensor`'s `__new__` function has been fixed.
* Add tests for `ImageTensor` functions.
* Removed old `AlphaChannelLoss` code.
@ProGamerGov
Contributor Author

The issue with ImageTensor's __new__ function is now resolved in the latest version of the PyTorch nightly build!

@ProGamerGov
Contributor Author

ProGamerGov commented May 11, 2021

I have discovered a new issue with the code:

Running these two lines of code:

image_param = opt.images.NaturalImage(init=torch.ones(3, 1, 1)).cuda()
image_param = opt.images.NaturalImage(init=torch.ones(3, 1, 1)).cuda()

Results in this error:

RuntimeError                              Traceback (most recent call last)

<ipython-input-3-f944a33c48db> in <module>()
      1 image_param = opt.images.NaturalImage(init=torch.ones(3, 1, 1)).cuda()
----> 2 image_param = opt.images.NaturalImage(init=torch.ones(3, 1, 1)).cuda()

2 frames

/content/captum/captum/optim/_param/image/images.py in __init__(self, size, channels, batch, init, parameterization, squash_func, decorrelation_module, decorrelate_init)
    439                     else init.refine_names("C", "H", "W")
    440                 )
--> 441                 init = self.decorrelate(init, inverse=True).rename(None)
    442             if squash_func is None:
    443 

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

/content/captum/captum/optim/_param/image/transforms.py in forward(self, x, inverse)
    108         flat = x.flatten(("H", "W"), "spatials")
    109         if inverse:
--> 110             correct = torch.inverse(self.transform) @ flat
    111         else:
    112             correct = self.transform @ flat

RuntimeError: Tensor for argument #3 'mat2' is on CPU, but expected it to be on GPU (while checking arguments for addmm)

If I add a line to set the init variable to the device that the color decorrelation transform is on, then I get the following error:

RuntimeError                              Traceback (most recent call last)

<ipython-input-3-f944a33c48db> in <module>()
      1 image_param = opt.images.NaturalImage(init=torch.ones(3, 1, 1)).cuda()
----> 2 image_param = opt.images.NaturalImage(init=torch.ones(3, 1, 1)).cuda()

1 frames

/content/captum/captum/optim/_param/image/images.py in __init__(self, size, channels, batch, init, parameterization, squash_func, decorrelation_module, decorrelate_init)
    452         self.squash_func = squash_func
    453         self.parameterization = parameterization(
--> 454             size=size, channels=channels, batch=batch, init=init
    455         )
    456 

/content/captum/captum/optim/_param/image/images.py in __init__(self, size, channels, batch, init)
    132             fourier_coeffs = random_coeffs / 50
    133         else:
--> 134             fourier_coeffs = self.torch_rfft(init) / spectrum_scale
    135 
    136         self.fourier_coeffs = nn.Parameter(fourier_coeffs)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

We don't have to solve this issue in this PR though, as it doesn't affect any of the current tutorials.

Edit:

I have resolved the issue! Though my fix makes it harder to disable color decorrelation.

* Fix `NaturalImage` device bug.

* Set `decorrelate_init` default to `False`.

* Fix `NaturalImage` size type.
@ProGamerGov
Contributor Author

ProGamerGov commented May 16, 2021

So, currently NaturalImage is setup like this:

class NaturalImage(ImageParameterization):
    def __init__(
        self,
        size: Tuple[int, int] = (224, 224),
        channels: int = 3,
        batch: int = 1,
        init: Optional[torch.Tensor] = None,
        parameterization: ImageParameterization = FFTImage,
        squash_func: Optional[Callable[[torch.Tensor], torch.Tensor]] = None,
        decorrelation_module: Optional[nn.Module] = None,
        decorrelate_init: bool = True,
    ) -> None:
        super().__init__()
        self.decorrelate = decorrelation_module or ToRGB(transform="klt")
        if init is not None:
            assert init.dim() == 3 or init.dim() == 4
            if decorrelate_init:
                assert self.decorrelate is not None
                init = (
                    init.refine_names("B", "C", "H", "W")
                    if init.dim() == 4
                    else init.refine_names("C", "H", "W")
                )
                init = self.decorrelate(init, inverse=True).rename(None)
            if squash_func is None:

                def squash_func(x: torch.Tensor) -> torch.Tensor:
                    return x.clamp(0, 1)

        else:
            if squash_func is None:

                squash_func = torch.sigmoid

        self.squash_func = squash_func
        self.parameterization = parameterization(
            size=size, channels=channels, batch=batch, init=init
        )

    def forward(self) -> torch.Tensor:
        image = self.parameterization()
        if self.decorrelate is not None:
            image = self.decorrelate(image)
        image = image.rename(None)  # TODO: the world is not yet ready
        return ImageTensor(self.squash_func(image))

If you want to disable color decorrelation / recorrelation, then you either need to set NaturalImage's decorrelate variable to None, or you need to set decorrelation_module to an empty pass-through module (both options are sketched below). I think that there may be a more elegant solution to turning the color decorrelation on and off, but I haven't figured anything out yet.

We can resolve this specific issue in a future PR.
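For reference, the two current workarounds look roughly like this (a rough sketch; opt is assumed to be the captum.optim alias used in the notebooks, and opt.models.SkipLayer is the pass-through layer mentioned later in this thread):

    # Option 1: null out the attribute after construction; forward() skips
    # decorrelation when self.decorrelate is None.
    image_param = opt.images.NaturalImage()
    image_param.decorrelate = None

    # Option 2: pass a pass-through module as the decorrelation module.
    image_param = opt.images.NaturalImage(decorrelation_module=opt.models.SkipLayer())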

It now fits with the `Optional` type hint that it was given.
* Fix issue where the final value in a list was not selectable.
* Fix error when lists have a size of 1.
@ProGamerGov
Contributor Author

ProGamerGov commented May 22, 2021

Well I have no idea what happened here:

self = <tests.optim.core.test_optimization.TestInputOptimization testMethod=test_input_optimization>

    def test_input_optimization(self) -> None:
        if torch.__version__ <= "1.2.0":
            raise unittest.SkipTest(
                "Skipping InputOptimization test due to insufficient Torch version."
            )
        model = BasicModel_ConvNet_Optim()
        loss_fn = opt.loss.ChannelActivation(model.layer, 0)
        obj = opt.InputOptimization(model, loss_function=loss_fn)
        n_steps = 5
        history = obj.optimize(opt.optimization.n_steps(n_steps, show_progress=False))
>       self.assertTrue(history[0] > history[-1])
E       AssertionError: ImageTensor(False) is not true

My fix for _rand_select is completely unrelated, so I don't understand why this error showed up.

Edit: I can't reproduce the error at all, so maybe it's an issue with CircleCI?

Second Edit:

I think the test was failing because the previous parameters it used were tuned for the old _rand_select, which never selected the final list value. Specifically, the default random scale list of (1, 0.975, 1.025, 0.95, 1.05) would never have resulted in 1.05 being chosen until I fixed _rand_select (see the sketch below).

I've fixed the issue now!
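For illustration, the kind of off-by-one that produces this behavior looks like the following (a hedged sketch; the actual _rand_select implementation may differ):

    import torch

    def rand_select_buggy(values):
        # randint's upper bound is exclusive, so len(values) - 1 makes the final
        # entry unreachable, and it errors out for single-item lists.
        return values[int(torch.randint(0, len(values) - 1, (1,)))]

    def rand_select_fixed(values):
        # Every entry, including the final one, can now be selected.
        return values[int(torch.randint(0, len(values), (1,)))]

    scale = (1, 0.975, 1.025, 0.95, 1.05)
    print(rand_select_fixed(scale))  # 1.05 is now reachable; the buggy version never returns it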

* Tests showed that using only 5 iterations was no longer sufficient to ensure the final loss values were less than the first loss values.
@ProGamerGov
Contributor Author

ProGamerGov commented May 24, 2021

@NarineK The lines I altered in feature_ablation.py and infidelity.py are now the same as in the master branch, so there won't be any merging conflicts caused by those changes.

@ProGamerGov
Contributor Author

ProGamerGov commented May 24, 2021

I looked into the FFTImage size thing, and there is a clear deviation between Ludwig's first Captum iteration of FFTImage and the Lucid code.

The Captum code creates the image with the shape given by the size variable, and then this same shape is used to create the scale / frequency tensor:

        coeffs_shape = (channels, size[0], size[1] // 2 + 1, 2)
        random_coeffs = torch.randn(
            coeffs_shape
        )  # names=["C", "H_f", "W_f", "complex"]
        self.fourier_coeffs = nn.Parameter(random_coeffs / 50)

        frequencies = FFTImage.rfft2d_freqs(*size)
  • The second-to-last dimension in coeffs_shape is always size[1] // 2 + 1 and never size[1] // 2 + 2, so making rfft2d_freqs always use // 2 + 1 prevents a size mismatch.

https://github.com/pytorch/captum/pull/412/files#diff-d9ef468be9704729ff7c3bd65ad5b115e206b5f418039e18cb333e123428dde5R160-R166

While Lucid derives the shape of the image from the shape of the frequency tensor (which the if statement changes when the width value is odd):

    batch, h, w, ch = shape
    freqs = rfft2d_freqs(h, w)
    init_val_size = (2, batch, ch) + freqs.shape

https://github.com/tensorflow/lucid/blob/master/lucid/optvis/param/spatial.py#L66-L67

Therefore, I think the special behavior for widths that have an odd value is required for the Lucid way of doing things, but not for the new Captum FFTImage. I think that Ludwig may have just forgotten to update rfft2d_freqs to match his changes in the init function.

@NarineK
Contributor

NarineK commented May 25, 2021

#3 'mat2' is

Interesting, according to the code snippet it looks like it worked the first time calling opt.images.NaturalImage(init=torch.ones(3, 1, 1)).cuda() but failed the second time?

The issue was that creating the ToRGB instance in the __init__ function's signature creates a single instance that gets reused as the default for all future constructor calls. It's apparently a well-known trap when using default parameters:

# Example code
import torch
import torch.nn as nn

class Test1(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("test_tensor", torch.ones(5))

    def forward(self, x):
        return x * self.test_tensor

class Test2(nn.Module):
    def __init__(self, x, test1 = Test1()):
        super().__init__()
        self.test1 = test1
        x = self.test1(x)

    def forward(self, x):
        return self.test1(x)

t = Test2(torch.ones(5)).cuda()
t = Test2(torch.ones(5)).cuda() # Fails
t = Test2(torch.ones(5), Test1()).cuda()
t = Test2(torch.ones(5), Test1()).cuda() # Works
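The usual way to avoid this trap is to default the argument to None and construct the module inside __init__ (a sketch reusing Test1 and the imports from the snippet above):

    class Test2Fixed(nn.Module):
        def __init__(self, x, test1=None):
            super().__init__()
            # A fresh Test1 (with fresh buffers) per instance, instead of one
            # shared default instance that an earlier .cuda() call already moved.
            self.test1 = test1 if test1 is not None else Test1()
            x = self.test1(x)

        def forward(self, x):
            return self.test1(x)

    t = Test2Fixed(torch.ones(5)).cuda()
    t = Test2Fixed(torch.ones(5)).cuda()  # Works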

code that Lucid uses

According to the Lucid implementation, it looks like we need to add 2 for odd cases and later remove that one additional pixel. What was the original problem that you saw with the line wadd = 2 if width % 2 == 1 else 1? What was failing?

The original issue was that fourier_coeffs and spectrum_scale would have a size mismatch if the width given to FFTImage, either through the size argument or the init tensor, was an odd number.

This code here would result in the following error:

image = opt.images.FFTImage((512, 405))
out = image()
RuntimeError                              Traceback (most recent call last)

<ipython-input-3-358d0813d735> in <module>()
      1 image = opt.images.FFTImage((512, 405))
----> 2 out = image()

1 frames

/content/captum/captum/optim/_param/image/images.py in forward(self)
    183     def forward(self) -> torch.Tensor:
    184         h, w = self.size
--> 185         scaled_spectrum = self.fourier_coeffs * self.spectrum_scale
    186         output = self.torch_irfft(scaled_spectrum)
    187         return output.refine_names("B", "C", "H", "W")

RuntimeError: The size of tensor a (203) must match the size of tensor b (204) at non-singleton dimension 3

And the same issue occurred when the init tensor size had an odd width:

image = opt.images.FFTImage(init=torch.ones(1, 3, 512, 405))
out = image()
RuntimeError                              Traceback (most recent call last)

<ipython-input-5-48fc824a1f55> in <module>()
----> 1 image = opt.images.FFTImage(init=torch.ones(1, 3, 512, 405))
      2 out = image()

/content/captum/captum/optim/_param/image/images.py in __init__(self, size, channels, batch, init)
    132             fourier_coeffs = random_coeffs / 50
    133         else:
--> 134             fourier_coeffs = self.torch_rfft(init) / spectrum_scale
    135 
    136         self.fourier_coeffs = nn.Parameter(fourier_coeffs)

RuntimeError: The size of tensor a (203) must match the size of tensor b (204) at non-singleton dimension 3

The Lucid code also talks about slicing the image tensor to resolve the issue, but in our case spectrum_scale is the bigger tensor and I'm not sure that we should be slicing it.
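For concreteness, the 203 vs. 204 mismatch comes from the two sizing conventions (a sketch; the second half assumes a Lucid-style rfft2d_freqs that keeps w // 2 + 2 frequency columns for odd widths):

    import torch

    h, w = 512, 405

    # torch.fft.rfftn keeps w // 2 + 1 frequency columns: 405 // 2 + 1 = 203.
    coeffs = torch.view_as_real(torch.fft.rfftn(torch.randn(1, 3, h, w), s=(h, w)))
    print(coeffs.shape)  # torch.Size([1, 3, 512, 203, 2])

    # A Lucid-style rfft2d_freqs keeps w // 2 + 2 columns when w is odd: 204.
    fy = torch.fft.fftfreq(h)[:, None]
    fx = (
        torch.fft.fftfreq(w)[: w // 2 + 2]
        if w % 2 == 1
        else torch.fft.fftfreq(w)[: w // 2 + 1]
    )
    print(torch.sqrt(fx * fx + fy * fy).shape)  # torch.Size([512, 204])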

Thank you for the explanation, @ProGamerGov! Is there a specific reason why we chose to use a buffer instead of an instance field? If we use it as an instance field it shouldn't cause any problems. I meant in ToRGB as well.

# Example code
import torch
import torch.nn as nn

class Test1(nn.Module):
    def __init__(self):
        super().__init__()
        self.test_tensor = torch.ones(5)
        print(self._buffers)

    def forward(self, x):
        return x * self.test_tensor

class Test2(nn.Module):
    def __init__(self, x, test1 = Test1()):
        super().__init__()
        self.test1 = test1
        x = self.test1(x)

    def forward(self, x):
        return self.test1(x)

t = Test2(torch.ones(5)).cuda()
t = Test2(torch.ones(5)).cuda() # Works

@NarineK
Contributor

NarineK commented May 25, 2021

@NarineK The lines I altered in feature_ablation.py and infidelity.py are now the same as in the master branch, so there won't be any merging conflicts caused by those changes.

If they are exactly the same then there won't be merge conflicts, but we will have the same changes in different commits. I think it is fine for now since those are minor changes, but it is better to sync up with master for larger changes.

@ProGamerGov
Contributor Author

ProGamerGov commented May 25, 2021

@NarineK Ludwig's original Captum code had both FFTImage and ToRGB using buffers. I'm not sure why he chose to do it that way, but I can change it if you think that they should be set as self variables?

If they are exactly the same then there won't be merge conflicts, but we will have the same changes in different commits. I think it is fine for now since those are minor changes, but it is better to sync up with master for larger changes.

Yeah, we can sync up with the master branch in a separate PR. I was just trying to avoid it in this PR so that the commit history wasn't filled with all the master commits, like what happened with SK's PR.

@ProGamerGov
Contributor Author

ProGamerGov commented May 25, 2021

Oh, I think I understand why buffers were used now. They make it so that when a NaturalImage instance is placed onto the GPU, the buffers are also placed on the GPU.

This code works when spectrum_scale (FFTImage) and transform (ToRGB) are buffers, but results in a device error when I just define them as self variables:

obj = opt.InputOptimization(
    model,
    opt.loss.ChannelActivation(model.mixed4a, 476),
    input_param=opt.images.NaturalImage((224,224)).to(device),
)
obj.optimize()

obj.input_param().show()
/content/captum/captum/optim/_param/image/images.py in forward(self)
    180 
    181     def forward(self) -> torch.Tensor:
--> 182         scaled_spectrum = self.fourier_coeffs * self.spectrum_scale
    183         output = self.torch_irfft(scaled_spectrum)
    184         return output.refine_names("B", "C", "H", "W")

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

I guess we could override .to() & .cuda() as a possible solution? Or we could override _apply so that both .to() and .cuda() work. Though I think that buffers are the recommended solution to this problem.
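A minimal sketch of the difference (not the actual Captum code): tensors registered as buffers follow .to() / .cuda(), while plain attributes stay where they were created.

    import torch
    import torch.nn as nn

    class WithBuffer(nn.Module):
        def __init__(self) -> None:
            super().__init__()
            self.register_buffer("scale", torch.ones(3))  # moved by .to() / .cuda()

    class WithAttribute(nn.Module):
        def __init__(self) -> None:
            super().__init__()
            self.scale = torch.ones(3)  # plain attribute, ignored by .to() / .cuda()

    if torch.cuda.is_available():
        print(WithBuffer().cuda().scale.device)     # cuda:0
        print(WithAttribute().cuda().scale.device)  # cpu -> device-mismatch errors later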

@NarineK
Contributor

NarineK commented May 25, 2021

I looked into the FFTImage size thing, and there is a clear deviation between Ludwig's first Captum iteration of FFTImage and the Lucid code.

The Captum code creates the image with the shape given by the size variable, and then this same shape is used to create the scale / frequency tensor:

        coeffs_shape = (channels, size[0], size[1] // 2 + 1, 2)
        random_coeffs = torch.randn(
            coeffs_shape
        )  # names=["C", "H_f", "W_f", "complex"]
        self.fourier_coeffs = nn.Parameter(random_coeffs / 50)

        frequencies = FFTImage.rfft2d_freqs(*size)
  • The second-to-last dimension in coeffs_shape is always size[1] // 2 + 1 and never size[1] // 2 + 2, so making rfft2d_freqs always use // 2 + 1 prevents a size mismatch.

https://github.com/pytorch/captum/pull/412/files#diff-d9ef468be9704729ff7c3bd65ad5b115e206b5f418039e18cb333e123428dde5R160-R166

While Lucid derives the shape of the image from the shape of the frequency tensor (which the if statement changes when the width value is odd):

    batch, h, w, ch = shape
    freqs = rfft2d_freqs(h, w)
    init_val_size = (2, batch, ch) + freqs.shape

https://github.com/tensorflow/lucid/blob/master/lucid/optvis/param/spatial.py#L66-L67

Therefore, I think the special behavior for widths that have an odd value is required for the Lucid way of doing things, but not for the new Captum FFTImage. I think that Ludwig may have just forgotten to update rfft2d_freqs to match his changes in the init function.

Thank you for looking deeper into this:
It looks like we are dividing by 2 here:
https://github.com/pytorch/captum/pull/412/files#diff-d9ef468be9704729ff7c3bd65ad5b115e206b5f418039e18cb333e123428dde5R160-R166
and also here:
https://github.com/pytorch/captum/pull/656/files#diff-d1326a272667e088fe9934dd175f0be589edf7594ee01b7463451a5266c56b47R141
Is it necessary to do it in both places?

Oh, I think I understand why buffers were used now. They make it so that when a NaturalImage instance is placed onto the GPU, the buffers are also placed on the GPU.

This code works when spectrum_scale (FFTImage) and transform (ToRGB) are buffers, but results in a device error when I just define them as self variables:

obj = opt.InputOptimization(
    model,
    opt.loss.ChannelActivation(model.mixed4a, 476),
    input_param=opt.images.NaturalImage((224,224)).to(device),
)
obj.optimize()

obj.input_param().show()
/content/captum/captum/optim/_param/image/images.py in forward(self)
    180 
    181     def forward(self) -> torch.Tensor:
--> 182         scaled_spectrum = self.fourier_coeffs * self.spectrum_scale
    183         output = self.torch_irfft(scaled_spectrum)
    184         return output.refine_names("B", "C", "H", "W")

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

I guess we could override .to() & .cuda() as a possible solution? Or we could override _apply so that both .to() and .cuda() work. Though I think that buffers are the recommended solution to this problem.


I think that it is tricky to put a module on the device, because PyTorch puts the actual tensors (parameters) related to that module on the device and not the module per se. For example, we cannot call .device on a module, and this makes it unclear what is on the device and what is not.
https://discuss.pytorch.org/t/how-to-check-if-model-is-on-cuda/180

To be clear, we can explicitly put the transform tensor on the device where the actual input is before we multiply with it.

We can set:
self.transform.to(x.device)

before using it, and use instance fields instead. Would that work?
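A minimal sketch of what that could look like (a toy module, not the actual ToRGB code; note that tensor.to() is not in-place, so the moved tensor has to be used or assigned back):

    import torch
    import torch.nn as nn

    class ToyColorTransform(nn.Module):
        def __init__(self) -> None:
            super().__init__()
            self.transform = torch.eye(3)  # plain instance field instead of a buffer

        def forward(self, x: torch.Tensor, inverse: bool = False) -> torch.Tensor:
            transform = self.transform.to(x.device)  # match the input's device each call
            if inverse:
                transform = torch.inverse(transform)
            b, c, h, w = x.shape
            return (transform @ x.reshape(b, c, h * w)).reshape(b, c, h, w)

    if torch.cuda.is_available():
        x = torch.randn(1, 3, 4, 4).cuda()
        print(ToyColorTransform()(x).device)  # cuda:0, no device-mismatch error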

@NarineK
Contributor

NarineK commented May 25, 2021

I looked into the FFTImage size thing, and there is a clear deviation between Ludwig's first Captum iteration of FFTImage and the Lucid code.

The Captum code creates the image with the shape given by the size variable, and then this same shape is used to create the scale / frequency tensor:

        coeffs_shape = (channels, size[0], size[1] // 2 + 1, 2)
        random_coeffs = torch.randn(
            coeffs_shape
        )  # names=["C", "H_f", "W_f", "complex"]
        self.fourier_coeffs = nn.Parameter(random_coeffs / 50)

        frequencies = FFTImage.rfft2d_freqs(*size)
  • The second-to-last dimension in coeffs_shape is always size[1] // 2 + 1 and never size[1] // 2 + 2, so making rfft2d_freqs always use // 2 + 1 prevents a size mismatch.

https://github.com/pytorch/captum/pull/412/files#diff-d9ef468be9704729ff7c3bd65ad5b115e206b5f418039e18cb333e123428dde5R160-R166

While Lucid derives the shape of the image from the shape of the frequency tensor (which the if statement changes when the width value is odd):

    batch, h, w, ch = shape
    freqs = rfft2d_freqs(h, w)
    init_val_size = (2, batch, ch) + freqs.shape

https://github.com/tensorflow/lucid/blob/master/lucid/optvis/param/spatial.py#L66-L67

Therefore, I think the special behavior for widths that have an odd value is required for the Lucid way of doing things, but not for the new Captum FFTImage. I think that Ludwig may have just forgotten to update rfft2d_freqs to match his changes in the init function.

Thank you for looking into this, @ProGamerGov! As you mentioned, for coeffs_shape Ludwig used 2 + 1 and that is the inconsistency. I don't know exactly why he did it. We can try to ask and understand it better. Either way, it would be good to document those differences in the code so that we don't forget.

In terms of results, not shapes, are we getting different outcomes ?

Also there is a division by 50 magic number (this isn't related to the dimensionality) that I don't see in lucid

https://github.com/pytorch/captum/pull/412/files#diff-d9ef468be9704729ff7c3bd65ad5b115e206b5f418039e18cb333e123428dde5R164).

@ProGamerGov
Contributor Author

ProGamerGov commented May 26, 2021

@NarineK Yeah, moving the tensor to the input device for every forward call would work! But would that incur any sort of performance penalty for constantly moving the tensor to the target device? I tried testing it out, and there didn't seem to be any obvious effect on the time it takes to complete 512 iterations.

Alternatively, we could just include an additional parameter to NaturalImage to disable ToRGB or set the decorrelation_module to SkipLayer.

I've been disabling the color decorrelation like this for experiments:

image = opt.images.NaturalImage(
    parameterization=opt.images.PixelImage,
    init=init_image.cpu(),
    decorrelation_module=opt.models.SkipLayer(), # Disable color decorrelation
    squash_func=lambda x: x, # Disable squash function
).to(device)

Thank you for looking into this, @ProGamerGov ! As you mentioned for coeffs_shape Ludwig used 2 + 1 and that is the inconsistency. I don't know exactly why he did it. We can try to ask and understand it better. Either way, it would be good to document those differences in the code so that we don't forget.

In terms of results, not shapes, are we getting different outcomes ?

Also there is a division by 50 magic number (this isn't related to the dimensionality) that I don't see in lucid

https://github.com/pytorch/captum/pull/412/files#diff-d9ef468be9704729ff7c3bd65ad5b115e206b5f418039e18cb333e123428dde5R164).

Ludwig hasn't been active lately, so we might have to wait a bit to ask him why he made certain design choices. Chris may be able to help, but I think he is busy at the moment as well. Our version of FFTImage produces different results that appear to be superior to Lucid's FFTImage results. We don't divide the output by 4 or normalize the FFT operations, and I think there's a small difference in what is done to the scale variable before it's saved as spectrum_scale.

I think that he intended Captum's FFTImage to be the successor to Lucid's FFTImage, but I did create a Captum version of the old FFTImage a while back and found the results to be less detailed.

@ProGamerGov
Contributor Author

Thank you for looking deeper into this:
It looks like we are dividing by 2 here:
https://github.com/pytorch/captum/pull/412/files#diff-d9ef468be9704729ff7c3bd65ad5b115e206b5f418039e18cb333e123428dde5R160-R166
and also here:
https://github.com/pytorch/captum/pull/656/files#diff-d1326a272667e088fe9934dd175f0be589edf7594ee01b7463451a5266c56b47R141
Is it necessary to do it in both places?

Yes, I think it's necessary, as that way we get the right matching size for 5-dimensional FFT tensors:

import torch
import torch.fft
x = torch.randn(1, 3, 512, 405)
print(torch.view_as_real(torch.fft.rfftn(x, s=(512, 405))).shape)
# outputs torch.Size([1, 3, 512, 203, 2])
# 405 // 2 + 1 = 203

@NarineK
Contributor

NarineK commented May 26, 2021

@NarineK Yeah, moving the tensor to the input device for every forward call would work! But would that incur any sort of performance penalty for constantly moving the tensor to the target device? I tried testing it out, and there didn't seem to be any obvious effect on the time it takes to complete 512 iterations.

Alternatively, we could just include an additional parameter to NaturalImage to disable ToRGB or set the decorrelation_module to SkipLayer.

I've been disabling the color decorrelation like this for experiments:

image = opt.images.NaturalImage(
    parameterization=opt.images.PixelImage,
    init=init_image.cpu(),
    decorrelation_module=opt.models.SkipLayer(), # Disable color decorrelation
    squash_func=lambda x: x, # Disable squash function
).to(device)

Thank you for looking into this, @ProGamerGov ! As you mentioned for coeffs_shape Ludwig used 2 + 1 and that is the inconsistency. I don't know exactly why he did it. We can try to ask and understand it better. Either way, it would be good to document those differences in the code so that we don't forget.
In terms of results, not shapes, are we getting different outcomes ?
Also there is a division by 50 magic number (this isn't related to the dimensionality) that I don't see in lucid
https://github.com/pytorch/captum/pull/412/files#diff-d9ef468be9704729ff7c3bd65ad5b115e206b5f418039e18cb333e123428dde5R164).

Ludwig hasn't been active lately, so we might have to wait a bit to ask him why he made certain design choices. Chris may be able to help, but I think he is busy at the moment as well. Our version of FFTImage produces different results that appear to be superior to Lucid's FFTImage results. We don't divide the output by 4 or normalize the FFT operations, and I think there's a small difference in what is done to the scale variable before it's saved as spectrum_scale.

I think that he intended Captum's FFTImage to be the successor to the Lucid's FFTImage, but I did create a Captum version of the old FFTImage a while back and found the results to be less detailed.

I don't think that there will be performance issues; it is simply linking the device. I don't think that .to(device) is costly. It is most probably a simple linking.

Alternatively, we could also pass device as an optional argument to the constructor. The user would need to specify it explicitly, but we can specifically document that the device is used for the transformation tensor.

I think that users need to understand SkipLayer and the additional logic related to input arguments, whereas passing a device is a much easier concept to grasp.
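A hypothetical illustration of that suggestion (not the actual Captum API; the class and parameter names here are made up): the caller passes a device explicitly, and it is only applied to the decorrelation transform.

    from typing import Optional

    import torch
    import torch.nn as nn

    class ExampleImageParam(nn.Module):
        def __init__(
            self,
            decorrelation_module: Optional[nn.Module] = None,
            device: torch.device = torch.device("cpu"),
        ) -> None:
            super().__init__()
            # The documented contract: device only affects the decorrelation transform.
            self.decorrelate = (decorrelation_module or nn.Identity()).to(device)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.decorrelate(x)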

@ProGamerGov
Contributor Author

ProGamerGov commented May 26, 2021

@NarineK I just realized that there may be a better way that we could solve this with minimal changes to NaturalImage. I think that we could just do self.decorrelate = decorrelation_module.cpu() to ensure that the decorrelation_module always starts on the CPU, and then we can still create the ToRGB instance in the __init__ function signature. The init tensor is already expected to be on the CPU, if it is not set to None. So this might be a more elegant solution?

This way we would keep the .to() and .cuda() behavior of the buffers, and it means that we don't have to change the code in ToRGB. The original intended behavior of the decorrelation_module where it can be set to None to disable color decorrelation would also be preserved.

    def __init__(
        self,
        decorrelation_module: Optional[nn.Module] = ToRGB(transform="klt"),
    ) -> None:
        super().__init__()

        # Place decorrelation_module on cpu if it's not None
        self.decorrelate = (
            decorrelation_module.cpu() if decorrelation_module is not None else None
        )

@ProGamerGov
Contributor Author

@NarineK I've implemented the decorrelation_module.cpu() fix and resolved the Conda test issue. I think that all of the issues raised in this PR have been resolved now, and thus it may be ready for merging?

@NarineK
Contributor

NarineK commented May 27, 2021

@NarineK I've implemented the decorrelation_module.cpu() fix and resolved the Conda test issue. I think that all of the issues raised in this PR have been resolved now, and thus it may be ready for merging?

This means that the tensor coming out of decorrelation_module will always be on the CPU, even if the user explicitly puts it on the GPU in the decorrelation_module; we will override it with the CPU. Since this is not critical we can leave it as is. I haven't reviewed the rest of the PR yet. Let me have a quick look.

            decorrelation_module.cpu() if decorrelation_module is not None else None
        )
        if init is not None:
            assert not init.is_cuda
Contributor

@NarineK NarineK Jun 1, 2021


Enforcing the tensor to be on the CPU without explaining why will make the user think that there is a bug in the code.
We should explicitly put it on the CPU. I assume this is because of decorrelation_module.cpu().

I think that it would be good to print a warning message about why we put decorrelation_module on the CPU and explain it in the documentation.

Perhaps also add a TODO so that we can make the module more flexible and don't have to move it to the CPU device.

Contributor Author

@ProGamerGov ProGamerGov Jun 1, 2021


@NarineK The init tensor had to be put on the CPU because FFTImage expected it to be on the CPU, but I just fixed it so that spectrum_scale is now placed onto the init tensor's device.

In order to avoid any future issues with the decorrelation module being created in the function signature, I added a deepcopy line that only runs when it's a ToRGB instance.

I also removed the device assertion and made ToRGB place the transform on the input tensor's device as well, so that it becomes a no-op when the buffer is placed on the target device.

The device issues should now be resolved I think?

Contributor Author


I removed the deepcopy line as it was redundant.

Contributor

@NarineK NarineK Jun 7, 2021


It looks cleaner now.

Here squash_func is not assigned, right? We could also represent it as a one-line lambda function. I wonder why we use different squash_funcs depending on whether init is provided or not.

if squash_func is None:

    def squash_func(x: torch.Tensor) -> torch.Tensor:
        return x.clamp(0, 1)

Also, do you know what is meant by the comment # TODO: the world is not yet ready at line 460?

Contributor Author

@ProGamerGov ProGamerGov Jun 7, 2021


@NarineK The different squash function for when init tensors are provided was something I found to produce better results with images. Lambda functions were used previously, but SK thought inner functions were better.

The TODO line is because named dimensions are not fully supported by PyTorch yet.

Contributor


Interesting, why are inner functions better in this case?
rename is used in line 440 too. We can leave it, but ideally we should only use it for the PyTorch versions that support it.

"""

def forward(self, x: torch.Tensor) -> torch.Tensor:
def __init__(self, *args, **kwargs) -> None:
Contributor


Why are the args and kwargs passed if they are not used? Are there any real scenarios where we need them? Is this because we replace ReLU with SkipLayer?
I could imagine, for example, that x is a tuple of tensors and we would need to return that tuple.

Contributor Author


I made the change because users may have models using activ(inplace=False) or activ(False) where activ = torch.nn.ReLU. This is the same way that torch.nn.Identity works: https://pytorch.org/docs/stable/generated/torch.nn.Identity.html
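For reference, nn.Identity accepts and discards any constructor arguments, e.g.:

    import torch
    import torch.nn as nn

    identity = nn.Identity(54, unused_argument=False)  # extra args are ignored
    x = torch.randn(2, 3)
    assert torch.equal(identity(x), x)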

I'll add the type hint for tuples of tensors.

Contributor Author

@ProGamerGov ProGamerGov Jun 1, 2021


I've improved the documentation, added the type hints, and also provided a link to the nn.Identity class!

    def __init__(self, *args, **kwargs) -> None:
        super().__init__()

    def forward(self, x: torch.Tensor, *args, **kwargs) -> torch.Tensor:
Contributor


Why are the args and kwargs passed if they are not used? We should document this, because it is unclear why it is needed.

Contributor Author

@ProGamerGov ProGamerGov Jun 1, 2021


I made the change for situations like replacing the decorrelation module, as it specifies an additional argument in the forward pass. I'll add some documentation explaining why I added args and kwargs. Without this change, SkipLayer is exactly the same as torch.nn.Identity.

Contributor


Yes, this will work for ReLU with one input tensor and specific cases like ReLUs, but it will have issues with custom modules that take tuples of tensors, and with modules such as nn.Linear, etc. I think that perhaps it would be good to give it a more specific name, or to describe that this is a skip layer which assumes that the inputs and outputs have the same shape and that only the first tensor is returned as the output.

@NarineK NarineK self-requested a review June 7, 2021 00:24
Contributor

@NarineK NarineK left a comment


LGTM! Thank you for addressing the comments, @ProGamerGov!
I left two minor comments.

@NarineK NarineK merged commit 46e16e4 into meta-pytorch:optim-wip Jun 7, 2021