Rounding to nearest pixel value breaks almost all attacks #44

Closed
huyvnphan opened this issue Feb 12, 2021 · 3 comments

@huyvnphan

Images are usually stored in uint8 format, with values in the range [0, 255].
Hence, when I round the pixel values of an image to the nearest integer, all attacks fail to achieve the desired accuracy.

class ModelWrapper(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model.eval()
        self.mean = torch.tensor([0.4914, 0.4822, 0.4465]).view(1, 3, 1, 1).to(device)
        self.std = torch.tensor([0.2470, 0.2435, 0.2616]).view(1, 3, 1, 1).to(device)
    
    def forward(self, x):
        x = x.clamp(0, 1)
        x = x * 255
        x = torch.round(x)
        x = x / 255
        x = (x - self.mean) / self.std
        x = self.model(x)
        return x

I know that torch.round() doesn't give useful gradients to the adversary, hence the drop in attack accuracy.
So how can I make sure that the inputs to the model correspond to valid integer values in [0, 255] while still achieving high attack accuracy?
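
As a quick illustration of the gradient problem described above, here is a minimal sketch (not part of the original post; the tensor values are arbitrary) showing that autograd returns an all-zero gradient through torch.round(), so a gradient-based attack gets no direction to move in:

```python
import torch

# Arbitrary pixel values in [0, 1]; requires_grad lets us inspect the input gradient.
x = torch.tensor([0.2, 0.5, 0.8], requires_grad=True)

# Same quantization as in the wrapper: scale to [0, 255], round, scale back.
y = torch.round(x * 255) / 255
y.sum().backward()

# The derivative of round() is zero almost everywhere, so every entry of x.grad is 0.
grad_is_zero = bool((x.grad == 0).all())
print(x.grad, grad_is_zero)
```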

@fra31
Owner

fra31 commented Feb 12, 2021

Hi,

the first things I'd try are 1) to run the attack without rounding and round only the final output, and 2) to exclude the rounding from the backward pass so that the gradients are computed normally. In a PGD-like attack you could also round the current iterate after the projection step to ensure that it belongs to the desired image domain; in the end, rounding is just a particular projection.
Also, at least for Linf, Square Attack should already produce valid images if eps is an integer multiple of 1/255.

Let me know if this helps!
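
The PGD variant suggested above might look roughly like the following sketch. It is not code from this repository: `quantize`, `pgd_linf_quantized`, the toy linear model, and the values of `eps`, `step`, and `n_iter` are all hypothetical choices for illustration. The model is called without any rounding in its forward pass, so gradients are computed normally, and each iterate is rounded to the 8-bit grid right after the Linf projection. It assumes the clean image is itself a valid 8-bit image and that eps is an integer multiple of 1/255, in which case the bounds x - eps and x + eps lie on the grid and rounding cannot push the iterate outside the eps-ball.

```python
import torch
import torch.nn.functional as F


def quantize(x):
    """Project onto the set of valid 8-bit images, expressed in [0, 1]."""
    return torch.round(x.clamp(0, 1) * 255) / 255


def pgd_linf_quantized(model, x, y, eps=8 / 255, step=2 / 255, n_iter=10):
    """Linf PGD whose iterates are rounded to the 8-bit grid after projection."""
    x = quantize(x)  # assumption: the clean image is a valid 8-bit image
    x_adv = x.clone()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()                      # ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # Linf projection
            x_adv = quantize(x_adv)                                 # rounding as a projection
    return x_adv.detach()


# Toy usage with a hypothetical linear classifier on 8x8 RGB inputs.
torch.manual_seed(0)
toy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 10))
x_clean = quantize(torch.rand(4, 3, 8, 8))
labels = torch.randint(0, 10, (4,))
x_adv = pgd_linf_quantized(toy_model, x_clean, labels)
```

Because the returned `x_adv` already lies on the 8-bit grid, evaluating it with a model that rounds its input should give the same prediction as evaluating it without rounding.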

@huyvnphan
Author

Hi, thank you for your input. I've implemented a custom round function.
Here is the solution for anyone interested:

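import torch

# Note: `device` is not defined in the snippet as posted; the line below is an assumed definition.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")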

class CustomRound(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, g):
        # Send the gradient g straight through on the backward pass.
        # forward() takes a single input, so a single gradient is returned.
        return g

class ModelWrapper(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model.eval()
        self.mean = torch.tensor([0.4914, 0.4822, 0.4465]).view(1, 3, 1, 1).to(device)
        self.std = torch.tensor([0.2470, 0.2435, 0.2616]).view(1, 3, 1, 1).to(device)
        self.round = CustomRound.apply
    
    def forward(self, x):
        x = x.clamp(0, 1)
        x = x * 255
        x = self.round(x)
        x = x / 255
        x = (x - self.mean) / self.std
        x = self.model(x)
        return x

model = ModelWrapper(resnet18(pretrained=True))
model = model.to(device).eval()
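
To check that a straight-through rounding function of this kind really passes gradients, a small sanity test can help. The sketch below is an illustration rather than part of the posted solution: `RoundSTE` mirrors the custom function above, and `round_ste_detach` is a hypothetical alternative that gets the same effect with `detach()` instead of a custom `autograd.Function`. The input values are arbitrary.

```python
import torch


class RoundSTE(torch.autograd.Function):
    """Round in the forward pass, pass the gradient through unchanged in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output


def round_ste_detach(x):
    """Same idea without a custom Function: the rounding offset is treated as a constant."""
    return x + (torch.round(x) - x).detach()


x = torch.tensor([0.2, 0.5, 0.8], requires_grad=True)

# Variant A: custom autograd Function.
y_a = RoundSTE.apply(x * 255) / 255
y_a.sum().backward()
grad_a = x.grad.clone()

# Variant B: detach trick.
x.grad = None
y_b = round_ste_detach(x * 255) / 255
y_b.sum().backward()
grad_b = x.grad.clone()

# Both variants should give a gradient of approximately 1 per entry
# (the factors 255 and 1/255 cancel), instead of the zeros from plain torch.round().
print(grad_a, grad_b)
```

Both variants use an approximate gradient, so an attack that succeeds with them is still evaluated on the truly rounded input in the forward pass.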

@fra31
Owner

fra31 commented Feb 12, 2021

Thanks for sharing. Did you see any difference in the robustness of the model with rounding in this way?
