Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Population Count Op #36380

Open
tom-bird opened this issue Apr 10, 2020 · 3 comments
Open

Population Count Op #36380

tom-bird opened this issue Apr 10, 2020 · 3 comments
Labels
actionable feature A request for a proper, new feature. module: python frontend For issues relating to PyTorch's Python frontend triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@tom-bird
Copy link

tom-bird commented Apr 10, 2020

馃殌 Feature

It would be great to have the bitwise operation 'Population Count' that counts the number of 1 bits to be exposed.

Not sure if this exists in the back end of torch? It is exposed in tensorflow

Motivation

This is a key operation in implementing binarised neural nets, for example in XNOR-Net. Binarised linear and conv layers can be performed quickly using XNOR and population counts. XNOR is already exposed in the pytorch API.

cc @albanD

@Adam-Vandervorst
Copy link

Any update or workaround here?

@Exferro
Copy link

Exferro commented Mar 13, 2024

I will express interest too, since NumPy is about to add np.bitwise_count in its new major release. I benchmarked it a bit, and here's what I've got:

  1. My custom implementation of popcount64c taken from the Wikipedia page on Hamming weight (population count) takes ~800 ms to popcount an array a = np.random.randint(20000, size=(10000, 10000)).
  2. At the same time, a novel np.bitwise_count which just calls the underlying CPU instruction takes ~50 ms, which is a 16x speedup.
  3. To give more context, np.bitwise_and(b, b) takes ~75 ms. Thus, we can see that bitwise_and roughly bounds from above the time required for popcount.

Now, if we proceed to GPU, I have the following:

  1. My custom implementation of popcount64c takes ~30ms.
  2. At the same time, torch.bitwise_and(b, b) takes ~2.5ms.
  3. Thus, I would expect that native PyTorch popcount implementation, which calls just a basic processor instruction, would also take milliseconds, again providing a ~10x speedup.

The code I work on daily is quite specific and there I use popcount a lot.
Thus I would really appreciate if PyTorch had a native popcount :) Thank you!

@albanD albanD added the module: python frontend For issues relating to PyTorch's Python frontend label Apr 15, 2024
@albanD
Copy link
Collaborator

albanD commented Apr 15, 2024

Quick update: I think the fact that numpy added it is a good signal that we want it as well for best compatibility.
We would be happy to accept a PR that adds this new unary op!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
actionable feature A request for a proper, new feature. module: python frontend For issues relating to PyTorch's Python frontend triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

No branches or pull requests

5 participants