This repository has been archived by the owner on Sep 25, 2023. It is now read-only.

Perf Improvements to SOS Filter #377

Merged
merged 5 commits into from May 25, 2021


mnicely
Contributor

@mnicely mnicely commented May 23, 2021

This PR speeds up the SOS filter by moving the zi arithmetic into registers and removing the shared memory (SMEM) it previously required.
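For readers unfamiliar with the `zi` state mentioned above, here is a minimal pure-Python sketch of a cascaded second-order-section filter in direct form II transposed. It is not the CUDA kernel from this PR; the function name `sosfilt_sketch` is hypothetical, and the coefficient layout `[b0, b1, b2, a0, a1, a2]` with `a0 == 1` is assumed to follow the SciPy convention. The two state values per section live in local scalars (`z0`, `z1`), which is only a loose analogy to keeping them in registers on the GPU.

```python
def sosfilt_sketch(sos, x, zi=None):
    """Filter x through a cascade of biquad sections.

    sos: list of [b0, b1, b2, a0, a1, a2] rows (a0 assumed to be 1).
    zi:  optional list of [z0, z1] initial states, one pair per section.
    Returns (y, zf): filtered samples and the final per-section states.
    """
    if zi is None:
        zi = [[0.0, 0.0] for _ in sos]
    y = list(x)
    zf = []
    for (b0, b1, b2, _a0, a1, a2), (z0, z1) in zip(sos, zi):
        # z0 and z1 are plain local scalars for the whole inner loop,
        # standing in for per-thread registers in a GPU kernel.
        for n, x_n in enumerate(y):
            y_n = b0 * x_n + z0
            z0 = b1 * x_n - a1 * y_n + z1
            z1 = b2 * x_n - a2 * y_n
            y[n] = y_n
        zf.append([z0, z1])
    return y, zf


# Impulse response of a single one-pole section: y[n] = x[n] + 0.5 * y[n-1]
impulse = [1.0, 0.0, 0.0]
one_pole = [[1.0, 0.0, 0.0, 1.0, -0.5, 0.0]]
y, zf = sosfilt_sketch(one_pole, impulse)
print(y)  # [1.0, 0.5, 0.25]
```

The state update is the part the PR description refers to as the zi arithmetic: each output sample reads and rewrites both state values, so where that state is stored matters for throughput.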

Results

OLD V100

--------------- benchmark 'SOSFilt': 12 tests ----------------
Name (time in ms)                               Mean          
--------------------------------------------------------------
test_sosfilt_gpu[float64-1-32768-32]          4.5768 (1.0)    
test_sosfilt_gpu[float64-10-32768-32]         4.5822 (1.00)   
test_sosfilt_gpu[float64-2-32768-32]          4.6098 (1.01)   
test_sosfilt_gpu[float64-1-32768-64]          4.8506 (1.06)   
test_sosfilt_gpu[float64-2-32768-64]          4.8632 (1.06)   
test_sosfilt_gpu[float64-10-32768-64]         4.8746 (1.07)   
test_sosfilt_gpu[float64-1-1048576-32]      156.1625 (34.12)  
test_sosfilt_gpu[float64-2-1048576-32]      156.2263 (34.13)  
test_sosfilt_gpu[float64-10-1048576-32]     156.3793 (34.17)  
test_sosfilt_gpu[float64-1-1048576-64]      166.1297 (36.30)  
test_sosfilt_gpu[float64-2-1048576-64]      166.2324 (36.32)  
test_sosfilt_gpu[float64-10-1048576-64]     166.5826 (36.40)  
--------------------------------------------------------------

NEW V100

--------------- benchmark 'SOSFilt': 12 tests ----------------
Name (time in ms)                               Mean          
--------------------------------------------------------------
test_sosfilt_gpu[float64-1-32768-32]          3.3047 (1.0)    
test_sosfilt_gpu[float64-2-32768-32]          3.3225 (1.01)   
test_sosfilt_gpu[float64-10-32768-32]         3.3245 (1.01)   
test_sosfilt_gpu[float64-1-32768-64]          3.4527 (1.04)   
test_sosfilt_gpu[float64-2-32768-64]          3.4801 (1.05)   
test_sosfilt_gpu[float64-10-32768-64]         3.4838 (1.05)   
test_sosfilt_gpu[float64-1-1048576-32]      117.7112 (35.62)  
test_sosfilt_gpu[float64-2-1048576-32]      117.8699 (35.67)  
test_sosfilt_gpu[float64-10-1048576-32]     118.0036 (35.71)  
test_sosfilt_gpu[float64-1-1048576-64]      122.9908 (37.22)  
test_sosfilt_gpu[float64-10-1048576-64]     123.8355 (37.47)  
test_sosfilt_gpu[float64-2-1048576-64]      124.6956 (37.73)  
--------------------------------------------------------------

OLD TITAN RTX

--------------- benchmark 'SOSFilt': 12 tests ----------------
Name (time in ms)                               Mean          
--------------------------------------------------------------
test_sosfilt_gpu[float64-1-32768-32]          8.5484 (1.0)    
test_sosfilt_gpu[float64-2-32768-32]          8.5649 (1.00)   
test_sosfilt_gpu[float64-10-32768-32]         8.5718 (1.00)   
test_sosfilt_gpu[float64-1-32768-64]          8.8087 (1.03)   
test_sosfilt_gpu[float64-2-32768-64]          8.8140 (1.03)   
test_sosfilt_gpu[float64-10-32768-64]         8.8212 (1.03)   
test_sosfilt_gpu[float64-1-1048576-32]      285.1278 (33.35)  
test_sosfilt_gpu[float64-2-1048576-32]      285.1480 (33.36)  
test_sosfilt_gpu[float64-10-1048576-32]     287.5959 (33.64)  
test_sosfilt_gpu[float64-1-1048576-64]      292.9561 (34.27)  
test_sosfilt_gpu[float64-2-1048576-64]      292.9968 (34.28)  
test_sosfilt_gpu[float64-10-1048576-64]     295.4800 (34.57)  
--------------------------------------------------------------

NEW TITAN RTX

--------------- benchmark 'SOSFilt': 12 tests ----------------
Name (time in ms)                               Mean          
--------------------------------------------------------------
test_sosfilt_gpu[float64-10-32768-32]         5.6839 (1.0)    
test_sosfilt_gpu[float64-2-32768-32]          5.6884 (1.00)   
test_sosfilt_gpu[float64-1-32768-64]          5.7230 (1.01)   
test_sosfilt_gpu[float64-2-32768-64]          5.7306 (1.01)   
test_sosfilt_gpu[float64-10-32768-64]         5.7400 (1.01)   
test_sosfilt_gpu[float64-1-32768-32]          5.7771 (1.02)   
test_sosfilt_gpu[float64-1-1048576-32]      192.6604 (33.90)  
test_sosfilt_gpu[float64-2-1048576-32]      192.6867 (33.90)  
test_sosfilt_gpu[float64-10-1048576-32]     192.9184 (33.94)  
test_sosfilt_gpu[float64-1-1048576-64]      193.9454 (34.12)  
test_sosfilt_gpu[float64-2-1048576-64]      193.9595 (34.12)  
test_sosfilt_gpu[float64-10-1048576-64]     194.1982 (34.17)  
--------------------------------------------------------------
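The tables above list mean times only, so the speedup has to be read off by dividing OLD by NEW for each test. A small sketch of that arithmetic follows; the numbers are copied from the four tables, and the dictionary layout and the `speedups` helper are illustrative, not part of cuSignal.

```python
# (old_ms, new_ms) mean times copied from the OLD/NEW tables above,
# keyed by the pytest parameter ID of test_sosfilt_gpu.
results = {
    "V100": {
        "float64-1-32768-32": (4.5768, 3.3047),
        "float64-2-32768-32": (4.6098, 3.3225),
        "float64-10-32768-32": (4.5822, 3.3245),
        "float64-1-32768-64": (4.8506, 3.4527),
        "float64-2-32768-64": (4.8632, 3.4801),
        "float64-10-32768-64": (4.8746, 3.4838),
        "float64-1-1048576-32": (156.1625, 117.7112),
        "float64-2-1048576-32": (156.2263, 117.8699),
        "float64-10-1048576-32": (156.3793, 118.0036),
        "float64-1-1048576-64": (166.1297, 122.9908),
        "float64-2-1048576-64": (166.2324, 124.6956),
        "float64-10-1048576-64": (166.5826, 123.8355),
    },
    "TITAN RTX": {
        "float64-1-32768-32": (8.5484, 5.7771),
        "float64-2-32768-32": (8.5649, 5.6884),
        "float64-10-32768-32": (8.5718, 5.6839),
        "float64-1-32768-64": (8.8087, 5.7230),
        "float64-2-32768-64": (8.8140, 5.7306),
        "float64-10-32768-64": (8.8212, 5.7400),
        "float64-1-1048576-32": (285.1278, 192.6604),
        "float64-2-1048576-32": (285.1480, 192.6867),
        "float64-10-1048576-32": (287.5959, 192.9184),
        "float64-1-1048576-64": (292.9561, 193.9454),
        "float64-2-1048576-64": (292.9968, 193.9595),
        "float64-10-1048576-64": (295.4800, 194.1982),
    },
}


def speedups(per_test):
    """Return {test_id: old_ms / new_ms} for one GPU."""
    return {name: old / new for name, (old, new) in per_test.items()}


summary = {}
for gpu, per_test in results.items():
    s = speedups(per_test)
    summary[gpu] = (min(s.values()), max(s.values()))
    print(f"{gpu}: {summary[gpu][0]:.2f}x to {summary[gpu][1]:.2f}x")
```

By this calculation the change is worth roughly 1.3x to 1.4x on the V100 and about 1.5x on the TITAN RTX, with little variation across the test parameters.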

@mnicely mnicely added 3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels May 23, 2021
@mnicely mnicely requested a review from awthomp May 23, 2021 12:45
@mnicely mnicely requested a review from a team as a code owner May 23, 2021 12:45
@mnicely mnicely self-assigned this May 23, 2021
@mnicely mnicely added this to PR-WIP in cusignal v21.06 Release via automation May 23, 2021
@mnicely mnicely added 2 - In Progress Currently a work in progress and removed 3 - Ready for Review Ready for review by team labels May 24, 2021
@mnicely mnicely added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels May 24, 2021
@mnicely mnicely changed the title Speedup sos Perf Improvements to SOS Filter May 24, 2021
cusignal v21.06 Release automation moved this from PR-WIP to PR-Reviewer approved May 24, 2021
@awthomp
Member

awthomp commented May 24, 2021

rerun tests

@awthomp
Member

awthomp commented May 25, 2021

rerun tests

@awthomp
Member

awthomp commented May 25, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 5138415 into rapidsai:branch-21.06 May 25, 2021
cusignal v21.06 Release automation moved this from PR-Reviewer approved to Done May 25, 2021
Labels
3 - Ready for Review Ready for review by team cpp improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python

2 participants