Device agnostic compute for polarization #150

Merged: 11 commits merged into main from pol-cuda on Jan 6, 2024

@ziw-liu ziw-liu commented Sep 6, 2023

Fixed the stokes module and its tests to work both on CPU and GPU.

Caveat: the background estimation in waveorder.models.inplane_oriented_thick_pol3D.apply_inverse_transfer_function is still NumPy code and does not work with the MPS backend. See #153.

Tested on cuda (NVIDIA A40, CUDA 12.2; AMD EPYC 7302P, Linux 4.18.0) and mps (Apple M1 Pro, macOS 13.5.1), both with the native PyTorch build from PyPI.

tests/test_stokes.py::test_S2I_matrix PASSED                                                                    [  5%]
tests/test_stokes.py::test_I2S_matrix PASSED                                                                    [ 11%]
tests/test_stokes.py::test_s12_to_orientation[cpu] PASSED                                                       [ 17%]
tests/test_stokes.py::test_s12_to_orientation[cuda] PASSED                                                      [ 23%]
tests/test_stokes.py::test_stokes_recon[cpu] PASSED                                                             [ 29%]
tests/test_stokes.py::test_stokes_recon[cuda] PASSED                                                            [ 35%]
tests/test_stokes.py::test_stokes_after_adr_usage PASSED                                                        [ 41%]
tests/test_stokes.py::test_mueller_from_stokes PASSED                                                           [ 47%]
tests/test_stokes.py::test_mmul[cpu] PASSED                                                                     [ 52%]
tests/test_stokes.py::test_mmul[cuda] PASSED                                                                    [ 58%]
tests/test_stokes.py::test_copying[cpu] PASSED                                                                  [ 64%]
tests/test_stokes.py::test_copying[cuda] PASSED                                                                 [ 70%]
tests/test_stokes.py::test_orientation_offset[cpu] PASSED                                                       [ 76%]
tests/test_stokes.py::test_orientation_offset[cuda] PASSED                                                      [ 82%]
tests/models/test_inplane_oriented_thick_pol3D.py::test_calculate_transfer_function PASSED                      [ 88%]
tests/models/test_inplane_oriented_thick_pol3D.py::test_apply_inverse_transfer_function[cpu] PASSED             [ 94%]
tests/models/test_inplane_oriented_thick_pol3D.py::test_apply_inverse_transfer_function[cuda] PASSED            [100%]

tests/test_stokes.py::test_S2I_matrix PASSED                                                                    [  5%]
tests/test_stokes.py::test_I2S_matrix PASSED                                                                    [ 11%]
tests/test_stokes.py::test_s12_to_orientation[cpu] PASSED                                                       [ 17%]
tests/test_stokes.py::test_s12_to_orientation[mps] PASSED                                                       [ 23%]
tests/test_stokes.py::test_stokes_recon[cpu] PASSED                                                             [ 29%]
tests/test_stokes.py::test_stokes_recon[mps] PASSED                                                             [ 35%]
tests/test_stokes.py::test_stokes_after_adr_usage PASSED                                                        [ 41%]
tests/test_stokes.py::test_mueller_from_stokes PASSED                                                           [ 47%]
tests/test_stokes.py::test_mmul[cpu] PASSED                                                                     [ 52%]
tests/test_stokes.py::test_mmul[mps] PASSED                                                                     [ 58%]
tests/test_stokes.py::test_copying[cpu] PASSED                                                                  [ 64%]
tests/test_stokes.py::test_copying[mps] PASSED                                                                  [ 70%]
tests/test_stokes.py::test_orientation_offset[cpu] PASSED                                                       [ 76%]
tests/test_stokes.py::test_orientation_offset[mps] PASSED                                                       [ 82%]
tests/models/test_inplane_oriented_thick_pol3D.py::test_calculate_transfer_function PASSED                      [ 88%]
tests/models/test_inplane_oriented_thick_pol3D.py::test_apply_inverse_transfer_function[cpu] PASSED             [ 94%]
tests/models/test_inplane_oriented_thick_pol3D.py::test_apply_inverse_transfer_function[mps] PASSED             [100%]
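
The `[cpu]`, `[cuda]`, and `[mps]` suffixes in the test IDs above are the kind pytest produces when a test is parametrized over a device name. A minimal sketch of that pattern follows, assuming a hypothetical `available_devices` helper and a toy matmul check; neither is taken from the waveorder test suite.

```python
import torch


def available_devices():
    """List the PyTorch device names that can run on this machine."""
    devices = ["cpu"]
    if torch.cuda.is_available():  # also True for ROCm builds of PyTorch
        devices.append("cuda")
    if torch.backends.mps.is_available():  # Apple-silicon GPUs
        devices.append("mps")
    return devices


def check_matmul_matches_cpu(device):
    """Toy check (hypothetical): a batched matmul on `device` agrees with the CPU."""
    torch.manual_seed(0)
    a = torch.rand(4, 4, 8)
    b = torch.rand(4, 8, 8)
    expected = torch.matmul(a, b)
    result = torch.matmul(a.to(device), b.to(device))
    # Compare on the CPU so the assertion itself is device independent.
    assert torch.allclose(result.cpu(), expected, atol=1e-5)
    return result.device.type


# With pytest, the same check would typically be wrapped as
#   @pytest.mark.parametrize("device", available_devices())
#   def test_matmul(device): ...
# which yields IDs such as test_matmul[cpu] and test_matmul[cuda].
for device in available_devices():
    check_matmul_matches_cpu(device)
```

Building the device list at import time means machines without a GPU simply run the `cpu` case instead of failing or skipping.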

@ziw-liu ziw-liu added enhancement New feature or request GPU Accelerated compute devices labels Sep 6, 2023

ziw-liu commented Sep 19, 2023

A step towards #144.


ziw-liu commented Sep 21, 2023

Also tested the cuda backend with an AMD GPU (RX6800XT, ROCm 5.6.1, Linux 6.5.3), although it probably won't be officially supported by us. This requires a ROCm build of PyTorch to be installed before waveorder:

pip install torch --index-url https://download.pytorch.org/whl/rocm5.4.2
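
ROCm builds of PyTorch expose AMD GPUs through the same `cuda` device name, which is why the cuda code path can be exercised on this hardware. A small sketch for checking which build is installed, using a hypothetical `describe_torch_build` helper (not part of waveorder):

```python
import torch


def describe_torch_build():
    """Summarize which accelerator backend this PyTorch build targets."""
    return {
        "torch": torch.__version__,
        # Set on ROCm builds, None otherwise.
        "hip": getattr(torch.version, "hip", None),
        # Set on CUDA builds, None otherwise.
        "cuda": getattr(torch.version, "cuda", None),
        # True for both NVIDIA (CUDA) and AMD (ROCm) GPUs when one is usable.
        "gpu_available": torch.cuda.is_available(),
    }


print(describe_torch_build())
```

On a working ROCm install, `hip` should be a version string and `gpu_available` should be `True`.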

talonchandler previously approved these changes Jan 6, 2024

@talonchandler talonchandler left a comment


Very cool! Sorry I'm slow to review this...thanks @ziw-liu.


ziw-liu commented Jan 6, 2024

Speed comparison:

import torch
from waveorder.models import inplane_oriented_thick_pol3d


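def test_apply_inverse_transfer_function(device):
    # Random intensity volume in (C, Z, Y, X) order, allocated directly on `device`.
    input_shape = (5, 100, 2048, 2048)
    czyx_data = torch.rand(input_shape, device=device)

    # Compute the intensity-to-Stokes matrix, then move it to the same device as the data.
    intensity_to_stokes_matrix = (
        inplane_oriented_thick_pol3d.calculate_transfer_function(
            swing=0.1,
            scheme="5-State",
        ).to(device)
    )

    # Run the reconstruction; the result is discarded because only the runtime matters here.
    _ = inplane_oriented_thick_pol3d.apply_inverse_transfer_function(
        czyx_data=czyx_data,
        intensity_to_stokes_matrix=intensity_to_stokes_matrix,
    )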

AMD EPYC 7302P CPU:

test_apply_inverse_transfer_function("cpu")
# 11.6 s ± 20.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

A single NVIDIA A40 GPU is about 60x faster (11.6 s vs. 193 ms), which is also faster than a typical camera's frame rate:

test_apply_inverse_transfer_function("cuda")
# 193 ms ± 25.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
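
The arithmetic behind the comparison can be checked from the two timings and the benchmark's input shape. This is a rough sketch: counting each 2048 x 2048 plane as one camera frame is an assumption made here for illustration, not something stated in the thread.

```python
# Timings quoted in this thread (mean per loop).
cpu_seconds = 11.6
gpu_seconds = 193e-3

# Benchmark input shape (C, Z, Y, X).
channels, z_slices, height, width = 5, 100, 2048, 2048

speedup = cpu_seconds / gpu_seconds

# Assumption: each (Y, X) plane counts as one camera frame.
frames = channels * z_slices
gpu_frames_per_second = frames / gpu_seconds
cpu_frames_per_second = frames / cpu_seconds

# torch.rand returns float32 by default, i.e. 4 bytes per element.
input_gigabytes = frames * height * width * 4 / 1e9

print(f"speed-up: {speedup:.1f}x")
print(f"GPU: {gpu_frames_per_second:.0f} frames/s, CPU: {cpu_frames_per_second:.0f} frames/s")
print(f"input size: {input_gigabytes:.2f} GB")
```

Under that assumption the GPU processes roughly 2,600 frames per second against about 43 on the CPU, for an input of about 8.4 GB.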

@ziw-liu ziw-liu merged commit 8f1f65c into main Jan 6, 2024
3 checks passed
@ziw-liu ziw-liu deleted the pol-cuda branch January 6, 2024 01:12