Device agnostic compute for polarization #150
A step towards #144.
Also tested with the ROCm build installed via `pip install torch --index-url https://download.pytorch.org/whl/rocm5.4.2`.
Very cool! Sorry I'm slow to review this...thanks @ziw-liu.
Speed comparison:

    import torch

    from waveorder.models import inplane_oriented_thick_pol3d


    def test_apply_inverse_transfer_function(device):
        input_shape = (5, 100, 2048, 2048)
        czyx_data = torch.rand(input_shape, device=device)
        intensity_to_stokes_matrix = (
            inplane_oriented_thick_pol3d.calculate_transfer_function(
                swing=0.1,
                scheme="5-State",
            ).to(device)
        )
        _ = inplane_oriented_thick_pol3d.apply_inverse_transfer_function(
            czyx_data=czyx_data,
            intensity_to_stokes_matrix=intensity_to_stokes_matrix,
        )

AMD EPYC 7302P CPU:

    test_apply_inverse_transfer_function("cpu")
    # 11.6 s ± 20.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

A single NVIDIA A40 GPU is 60x faster (and also faster than a typical camera's framerate):

    test_apply_inverse_transfer_function("cuda")
    # 193 ms ± 25.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
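For reference, the arithmetic behind the 60x figure and the framerate remark can be sketched as below. This uses only the two timings and the input shape quoted above. Reading the shape as (C, Z, Y, X) follows from the name `czyx_data`; treating each 2048 x 2048 plane as one raw camera frame is my assumption, not something the benchmark states.

```python
# Mean time per call, taken from the two benchmark runs above.
cpu_seconds = 11.6   # AMD EPYC 7302P
gpu_seconds = 0.193  # NVIDIA A40

# Benchmark input shape, read as (C, Z, Y, X) = (5, 100, 2048, 2048).
# Assumption: each (Y, X) plane corresponds to one raw camera frame.
n_channels, n_slices = 5, 100

speedup = cpu_seconds / gpu_seconds
slices_per_second = n_slices / gpu_seconds
raw_frames_per_second = n_channels * n_slices / gpu_seconds

print(f"GPU speedup over CPU: {speedup:.1f}x")
print(f"z-slices processed per second: {slices_per_second:.0f}")
print(f"raw frames processed per second: {raw_frames_per_second:.0f}")
```

Under that assumption the GPU path handles a few thousand raw frames per second, which is the sense in which it outpaces a typical camera.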
Fixed the stokes module and its tests to work on both CPU and GPU.
Caveat: the background estimation in `waveorder.models.inplane_oriented_thick_pol3d.apply_inverse_transfer_function` is still NumPy code and does not work with the MPS backend. See #153.
Tested on `cuda` (NVIDIA A40, CUDA 12.2; AMD EPYC 7302P, Linux 4.18.0) and `mps` (Apple M1 Pro, macOS 13.5.1), both with the native PyTorch build from PyPI.
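A minimal sketch of how a caller might pick among the backends tested here and keep all tensors on one device. `pick_device` is a hypothetical helper, not part of waveorder, and the random 4 x 5 matrix is only a stand-in for an intensity-to-Stokes matrix (its shape is illustrative). The sketch deliberately avoids calling waveorder so that it runs with PyTorch alone.

```python
import torch


def pick_device() -> torch.device:
    """Hypothetical helper: prefer CUDA, then MPS, then fall back to CPU."""
    # ROCm builds of PyTorch also report their GPUs through torch.cuda.
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps is available in PyTorch 1.12 and later.
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()

# Small stand-in for (C, Z, Y, X) intensity data, created on the device.
czyx_data = torch.rand((5, 2, 8, 8), device=device)

# Illustrative matrix mapping 5 intensity channels to 4 output channels.
# It is created on the CPU and moved, mirroring the `.to(device)` call above.
matrix = torch.rand((4, 5)).to(device)

# Contract over the channel axis; the result stays on the same device.
stokes = torch.einsum("sc,czyx->szyx", matrix, czyx_data)
print(tuple(stokes.shape), stokes.device.type)
```

Keeping both operands on the same device matters because PyTorch raises an error rather than silently copying when tensors from different devices meet in one operation.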