
Conversation

@z-a-f commented May 20, 2021

Stack from ghstack:

The basic demo for this particular implementation can be found here:
https://gist.github.com/z-a-f/1d06ae8d5a509d3c9c1596dcb924afe0

Test Plan:

python test/test_ao_sparsity.py

Differential Revision: D28970959

[ghstack-poisoned]
@facebook-github-bot (Contributor) commented May 20, 2021

💊 CI failures summary and remediations

As of commit 9a69b41 (more details on the Dr. CI page and at hud.pytorch.org/pr/58705):


  • 2/2 failures possibly* introduced in this PR
    • 1/2 non-scanned failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build Windows CI (pytorch-win-vs2019-cpu-py3) / test (default, 1, 2, windows.4xlarge) (1/1)

Step: "Run test scripts"

2021-07-01T22:52:39.3847633Z https://pytorch.org/docs/stable/distributed.html#launch-utility for 
2021-07-01T22:52:39.3848201Z further instructions
2021-07-01T22:52:39.3848432Z 
2021-07-01T22:52:39.3848700Z   warnings.warn(
2021-07-01T22:52:39.3849000Z ERROR (0.098s)
2021-07-01T22:52:39.3849258Z *****************************************
2021-07-01T22:52:39.3850112Z Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
2021-07-01T22:52:39.3851080Z *****************************************
2021-07-01T22:52:39.4141613Z 
2021-07-01T22:52:39.4142175Z ======================================================================
2021-07-01T22:52:39.4142956Z ERROR [0.098s]: test_launch_user_script (__main__.TestDistirbutedLaunch)
2021-07-01T22:52:39.4143676Z ----------------------------------------------------------------------
2021-07-01T22:52:39.4144132Z Traceback (most recent call last):
2021-07-01T22:52:39.4145275Z   File "distributed/test_launcher.py", line 49, in test_launch_user_script
2021-07-01T22:52:39.4145779Z     launch.main(args)
2021-07-01T22:52:39.4146542Z   File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\distributed\launch.py", line 187, in main
2021-07-01T22:52:39.4147116Z     launch(args)
2021-07-01T22:52:39.4147844Z   File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\distributed\launch.py", line 173, in launch
2021-07-01T22:52:39.4148430Z     run(args)
2021-07-01T22:52:39.4149108Z   File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\distributed\run.py", line 688, in run
2021-07-01T22:52:39.4149682Z     elastic_launch(


@z-a-f (Author) commented Jun 8, 2021

@zafartahirov has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

import torch
from torch.nn import functional as F

class Linear(torch.nn.Linear):
Contributor:

We should have the masking done by a module that can operate on the weights. This would be a FakeSparseModule, whose forward method masks the weights. Can you rewrite this implementation so that:

  1. We have a base FakeSparseModule class that contains a mask attribute.
  2. The FakeSparseModule is applied to the weight as a parameterization (see the sketch below).
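
For reference, a minimal sketch of how such a mask could be attached to the weight as a parametrization, assuming torch.nn.utils.parametrize (available in recent PyTorch versions). The FakeSparsity class below and the way it stores the mask are illustrative, not this PR's implementation:

```
import torch
from torch import nn
from torch.nn.utils import parametrize

class FakeSparsity(nn.Module):
    """Illustrative parametrization that multiplies the weight by a fixed mask."""
    def __init__(self, mask):
        super().__init__()
        self.register_buffer('mask', mask)

    def forward(self, weight):
        # Called every time `weight` is read on the parametrized module.
        return self.mask * weight

linear = nn.Linear(8, 4)
mask = torch.ones_like(linear.weight)  # all-ones mask == dense behavior
parametrize.register_parametrization(linear, 'weight', FakeSparsity(mask))
# linear.weight is now recomputed as mask * original_weight on every access.
```

With this setup a sparsifier only needs to update linear.parametrizations['weight'][0].mask, which is the same access pattern the tests further down in this PR use.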

Author:

As discussed, the parametrization is going to be part of a separate PR.

@raghuramank100 (Contributor) commented Jun 14, 2021:

Sure, in this PR let's do the following:

  1. Define a fakesparse module (similar to the mulby module):

class FakeSparse(nn.Module):
    def __init__(self, shape, sparsity_config=None):
        super().__init__()
        # Initialize the mask to be all ones
        self.mask = torch.ones(shape)
        self.sparsity_config = sparsity_config

    def forward(self, x):
        assert self.mask.shape == x.shape
        return self.mask * x

If you are doing the reparameterization in a later PR, use this module manually, similar to how it is used in nn.qat.linear:

class Linear(torch.nn.Linear):
    def __init__(self, in_features, out_features, bias=True, device=None, dtype=None) -> None:
        factory_kwargs = {'device': device, 'dtype': dtype}
        super().__init__(in_features, out_features, bias, **factory_kwargs)
        self.fake_sparse = FakeSparse(self.weight.size())

    def forward(self, input):
        return F.linear(input, self.fake_sparse(self.weight), self.bias)
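
A hypothetical usage of the modules suggested above (relying on the FakeSparse and Linear definitions just shown, plus torch; not code from this PR), illustrating that an all-ones mask keeps the layer equivalent to a dense nn.Linear until a sparsifier zeroes out entries:

```
layer = Linear(16, 8)               # the Linear subclass sketched above
x = torch.randn(2, 16)
y_dense = layer(x)                  # mask is all ones, so this matches a dense nn.Linear

layer.fake_sparse.mask[:, ::2] = 0  # e.g. a sparsifier zeroing every other input column
y_sparse = layer(x)                 # the forward pass now uses the masked weight
```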


@classmethod
def from_dense(cls, dense):
    sparse = cls(dense.in_features, dense.out_features, (dense.bias is not None))
Contributor:

Can we handle this instead by initializing the mask attribute to all ones by default? That way from_dense is not needed.

Author:

Can you elaborate on this? We need a transformation function for the prepare.

"""
_version = 1
_FLOAT_MODULE = torch.nn.Linear
_FLOAT_MODULE = (torch.nn.Linear, SparseLinear)
Contributor:

I am assuming that nn.Linear is needed for the PTQ flow and SparseLinear for the QAT flow. Is that correct?

Author:

Correct

Contributor:

Let's remove torch.nn.Linear as a supported option. In both flows, we will need to create a SparseLinearModule. In the post-training sparsity case, one could do:

m = nn.Linear(3,5)
# Sparsifier.prepare would essentially do this:
m_sparse = torch.nn.sparse.linear.from_dense(m)
# m_sparse now has the same weights, but the mask is set to identity
m_sparse.set_mask(mask, pattern) # Function to override the mask and define the sparsity pattern
# Now we can convert only the sparse linear module.

weight = mod.weight
if getattr(mod.qconfig, 'mask', False):
    weight = mod.qconfig.mask * mod.weight
weight = mod.weight * mod.mask
Contributor:

Let us make mask an attribute of the FakeSparse module and do this via reparametrization.

Author:

The reparametrization will be a separate PR, as there are several things to iron out there.


@classmethod
def from_float(cls, mod):
def from_float(cls, mod, row_block_size=1, col_block_size=4):
Contributor:

We should not need the row/col block sizes here. This block just applies the mask; the mask should enforce the required sparsity pattern.

Author:

I am not sure how we could avoid that -- once we call from_float, we convert the float model into the sparse quantized model. In the latter we have to pack the weight, and we need the block shape for that. The sparsifier has this information, but the current model does not.

Contributor:

I see. From the point of view of keeping the API clean, should we do the following:

  1. When we create a SparseLinear module, the fake_sparse module should have all the information needed to specify the mask.
  2. At convert, we rely only on the fake_sparse module (a fuller sketch follows below):

def from_float(cls, mod):
    sparsity_pattern = mod.fake_sparse.sparsity_config.sparsity_pattern  # This is the shape of the sparse blocks
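
For illustration, a self-contained sketch of that idea (the fake_sparse, sparsity_config, and sparsity_pattern names are assumptions carried over from the comment above, not this PR's final API): at convert time, from_float reads the block shape from the attached fake-sparse module instead of taking it as arguments.

```
import torch
from torch import nn

class SparseQuantizedLinearSketch(nn.Linear):
    """Illustrative stand-in for the converted module; not the class in this PR."""

    @classmethod
    def from_float(cls, mod):
        # The block shape comes from the fake_sparse module's config, not from arguments.
        row_block_size, col_block_size = mod.fake_sparse.sparsity_config.sparsity_pattern
        new = cls(mod.in_features, mod.out_features, bias=(mod.bias is not None))
        with torch.no_grad():
            new.weight.copy_(mod.fake_sparse(mod.weight))  # masked weight, ready for packing
            if mod.bias is not None:
                new.bias.copy_(mod.bias)
        # row_block_size / col_block_size would be forwarded to the weight-packing step here.
        return new
```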

Contributor:

Is this being addressed in a later PR?

Author:

This will be addressed by torch.ao.utils.convert -- because it takes the sparse_config, the rows/cols can be passed around by that utility. Currently, it will be an argument here to make sure the tests pass. I am going to keep the current implementation as a stopgap measure; it will be addressed in the later PRs.

@raghuramank100 (Contributor) left a comment:

Please add tests so that functionality is verified.

@z-a-f z-a-f requested review from kazhou and raghuramank100 June 10, 2021 19:18
@z-a-f (Author) commented Jun 11, 2021

@zafartahirov has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@z-a-f (Author) commented Jun 14, 2021

@zafartahirov has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@z-a-f z-a-f requested a review from raghuramank100 June 28, 2021 17:54
self.assertEqual(model_save.seq[1].parametrizations['weight'][0].mask,
                 model_load.seq[1].parametrizations['weight'][0].mask)

def test_jit_trace(self):
Contributor:

Can you also add a test to check if parameterized models are scriptable?

Author:

The parametrizations are not scriptable.

Contributor:

Please mark a TODO to add tests for symbolic tracing also. We need parametrization to compose with these APIs.
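
A sketch of what such a TODO test could look like, assuming torch.fx and the seq-based test model used elsewhere in this PR (this is not a test added here, just the shape of the intended coverage):

```
import torch
import torch.fx

def check_symbolic_trace(model):
    # TODO-style coverage: a parametrized (sparsified) model should compose with
    # torch.fx symbolic tracing and produce the same outputs as eager mode.
    traced = torch.fx.symbolic_trace(model)
    x = torch.randn(2, model.seq[0].in_features)
    assert torch.allclose(traced(x), model(x))
```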

not.
"""
def __init__(self, mask):
    super().__init__()
Contributor:

Should we instead initialize with a shape rather than a mask? The mask could be initialized to torch.ones(shape):

  def __init__(self, shape):
      super().__init__()
      self.register_buffer('mask', torch.ones(shape))

If the user wants to override the mask, they can directly set the attribute.
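
For illustration, a small self-contained example of that pattern (it restates the shape-based __init__ above so it runs on its own); a user who already has a mask simply assigns it over the default all-ones buffer:

```
import torch
from torch import nn

class FakeSparse(nn.Module):
    def __init__(self, shape):
        super().__init__()
        # Default: a dense (all-ones) mask.
        self.register_buffer('mask', torch.ones(shape))

    def forward(self, x):
        return self.mask * x

fs = FakeSparse((4, 8))                     # mask defaults to all ones
fs.mask = (torch.rand(4, 8) > 0.5).float()  # a user-supplied mask replaces the buffer
```

Assigning a tensor to a registered buffer name replaces the buffer, so a pre-existing mask can be supplied this way.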

Author:

What if the user already has a mask?

@raghuramank100 (Contributor) left a comment:

Looks good, a few comments

@z-a-f (Author) commented Jun 29, 2021

@zafartahirov has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@z-a-f (Author) commented Jun 29, 2021

@zafartahirov has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@z-a-f z-a-f requested a review from raghuramank100 July 1, 2021 06:23
@z-a-f (Author) commented Jul 1, 2021

@zafartahirov has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

self.seq[0].weight = nn.Parameter(torch.zeros_like(self.seq[0].weight) + 2.0)
self.seq[1].weight = nn.Parameter(torch.zeros_like(self.seq[1].weight) + 3.0)
if bias:
    self.linear = nn.Parameter(torch.zeros_like(self.linear.bias) + 10.0)
Contributor:

self.seq[0] should be self.seq[1]

@raghuramank100 (Contributor) left a comment:

Looks good; some previous comments were missed, please take a look before landing.
Also, please add TODOs for scripting and symbolic tracing test coverage.

@z-a-f (Author) commented Jul 1, 2021

@zafartahirov has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@z-a-f (Author) commented Jul 1, 2021

@zafartahirov has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

@z-a-f merged this pull request in 80cab10.

@facebook-github-bot facebook-github-bot deleted the gh/z-a-f/105/head branch July 6, 2021 14:17
jasperzhong pushed a commit to jasperzhong/swift that referenced this pull request Nov 25, 2021
ghstack-source-id: cf62000
Pull Request resolved: pytorch/pytorch#58705

jasperzhong pushed a commit to jasperzhong/swift that referenced this pull request Nov 25, 2021
ghstack-source-id: f42b3da
Pull Request resolved: pytorch/pytorch#58705