N dimensional histogram function in pytorch as in numpy.histogramdd #29209
Comments
Hi @miranov25, I was digging some |
Hello @leofang. Concerning ND data at CERN (particle physics), we have a huge amount of quite dense data. See the examples below.
We have plenty of use cases. Until recently we were working with C++ implementation. Now we are moving implementation to Visualization Examples based on the ND histograms
Reconstruction performance parameterization (5-6D)
Detector - distortion map calibration (5-6 dimensions)
particle identification - 6D
|
Hello @albanD. My collaborators implemented the requested functionality in our RootInteractive repository. If there is interest, we will be happy to make a pull request to PyTorch. Regards |
Thanks for the pointers. |
Not being targeted at the moment, no, @albanD, but we do have torch.searchsorted (https://pytorch.org/docs/master/generated/torch.searchsorted.html?highlight=searchsorted#torch.searchsorted) already. |
We are aware of the new searchsorted; we already use it in our code, which works with the current release (1.6). Concerning the PyTorch/NumPy comparison, we observe a significant difference in performance depending on the number of cores. |
cc @ngimel for the performance. This is really interesting stuff. Give me an opportunity to take a closer look at the provided links. |
Finally had a chance to take a closer look. I think we would accept |
Hello @mruberry and @albanD. Please provide us pointers, so he can start. |
Take a look at #42755, which added the quantile operator, and #42799, which implemented a few simple composite ops, and #43400, which added an alias for an existing PyTorch function. I would start by implementing histogram by
If a custom kernel is needed then let's identify what kind of kernel we'd like to use and we can discuss implementation strategies. Reviewing the histc implementations for CPU and CUDA might be interesting, too:
Moving the histc implementation from TH to ATen may also be a good place to start as it would cover registering a function with ATen and validating the implementation. |
Hello @mruberry. I have implemented an operator for histogram in #44485, with support for non-uniform binning and a density flag that works as in NumPy. I'm not entirely sure about the interface, though: it outputs only one Tensor, the histogram, while NumPy also outputs the array of bin edges. With histogramdd there are more questions about the interface.
|
How significant is the performance gap? PyTorch doesn't typically support array_likes, so I would expect us to only support the first (N, D) array option.
You could start by only supporting a single integer (bins for all dimensions).
You could start by putting range in the signature but not supporting values other than None.
Agreed. You could also not implement this to start.
The return type should match NumPy, so a tuple (H, edges). |
I have comments related to pull request #44485 in context of this issue #29209
|
Closed by #65318 |
@saketh-are Does your implementation handle CUDA tensors? It does not seem to, so I would not close this issue until it does. |
|
🚀 Feature
N-dimensional histogram function in PyTorch, as in NumPy
It would be great to have support for multi-dimensional histograms with functionality similar to numpy.histogramdd.
See discussion within https://discuss.pytorch.org/t/n-dimensional-histogram-function-in-pytorch-cpu-gpu/59888
Motivation
We are working on interactive visualization of multi-dimensional data, usually O(2-7) dimensions (still dense). Currently we are using the numpy implementation for multidimensional histogram filling, slicing, and summing:
H, axis = np.histogramdd(inputArray.values, bins=bins, range=hRange)
y = np.sum(H[hSliceLocal], axis=axis)
We would like to use PyTorch (or Numba) to get a faster GPU implementation.
Pitch
I checked the numpy implementation at https://github.com/numpy/numpy/blob/v1.17.0/numpy/lib/histograms.py#L935-L1113
In the end, the algorithm (step 3) uses 1D histogramming: np.bincount.
However, the numpy implementation assumes non-uniform binning; to find the bins it internally uses np.searchsorted.
Alternatives
If searchsorted turns out to be difficult to implement (I did not find it in PyTorch, only in the external https://github.com/aliutkus/torchsearchsorted), uniform binning will be sufficient.
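To illustrate why uniform binning avoids the need for searchsorted, here is a minimal sketch in NumPy. The function name uniform_histogramdd_sketch and its parameters are hypothetical and chosen only for this example; it assumes the same number of equal-width bins in every dimension. Because the bin index is computed arithmetically, points lying extremely close to a bin edge may land in a different bin than with np.histogramdd due to floating-point rounding.

```python
import numpy as np

def uniform_histogramdd_sketch(sample, bins, ranges):
    """Hypothetical helper: histogram an (N, D) sample with `bins`
    equal-width bins per dimension over `ranges` = [(low, high), ...]."""
    sample = np.asarray(sample, dtype=float)
    n_dims = sample.shape[1]
    low = np.array([r[0] for r in ranges], dtype=float)
    high = np.array([r[1] for r in ranges], dtype=float)

    # Drop points outside the requested range in any dimension.
    inside = np.all((sample >= low) & (sample <= high), axis=1)
    kept = sample[inside]

    # With equal-width bins the bin index is plain arithmetic,
    # so no sorted search over the edges is needed.
    idx = np.floor((kept - low) / (high - low) * bins).astype(np.int64)
    # Points exactly on the upper edge go into the last bin.
    idx = np.minimum(idx, bins - 1)

    # Flatten the D-dimensional bin index, count, and restore the shape.
    shape = (bins,) * n_dims
    flat = np.ravel_multi_index(tuple(idx.T), shape)
    hist = np.bincount(flat, minlength=bins ** n_dims)
    return hist.reshape(shape)
```

Only elementwise arithmetic and a 1D count are used here, which is what would make this variant a plausible first target for a GPU port.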
Additional context
Current numpy Algorithm:
np.searchsorted
np.ravel_multi_index
np.bincount
hist.reshape(nbin)
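The four steps listed above can be sketched as follows. This is a simplified illustration written in NumPy, not the library's actual code: the name histogramdd_sketch is hypothetical, and the sketch assumes explicit, monotonically increasing bin edges for every dimension, with no weights or density normalization.

```python
import numpy as np

def histogramdd_sketch(sample, edges):
    """Hypothetical helper: sample is an (N, D) array, edges is a list of
    D one-dimensional, monotonically increasing arrays of bin edges."""
    sample = np.asarray(sample, dtype=float)
    n_dims = sample.shape[1]
    # Bins per dimension, plus one underflow and one overflow bin.
    nbin = tuple(len(e) + 1 for e in edges)

    # Step 1: np.searchsorted finds the bin of each coordinate.
    # side="right" makes the bins half-open: [edge_i, edge_{i+1}).
    idx = [np.searchsorted(edges[d], sample[:, d], side="right")
           for d in range(n_dims)]
    # Values exactly on the rightmost edge are counted in the last bin.
    for d in range(n_dims):
        on_edge = sample[:, d] == edges[d][-1]
        idx[d][on_edge] -= 1

    # Step 2: np.ravel_multi_index turns D indices into one flat index.
    flat = np.ravel_multi_index(tuple(idx), nbin)

    # Step 3: np.bincount does the actual 1D counting.
    hist = np.bincount(flat, minlength=int(np.prod(nbin)))

    # Step 4: reshape to D dimensions and strip the outlier bins.
    hist = hist.reshape(nbin)
    return hist[(slice(1, -1),) * n_dims]
```

Only step 1 depends on the binning being non-uniform; steps 2 to 4 are the same for uniform bins.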
cc @mruberry @rgommers