@igchor
Contributor

@igchor igchor commented Jul 30, 2024

to allow switching the implementation to v2::ur_event_handle_t. This is similar to how ur_queue_handle_t behaves right now.

@github-actions github-actions bot added the images (UR images), level-zero (L0 adapter specific issues), and command-buffer (Command Buffer feature addition/changes/specification) labels Jul 30, 2024
@github-actions
Contributor

Compute Benchmarks level_zero run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/10176071033

@github-actions
Contributor

Compute Benchmarks level_zero run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/10176071033
Job status: success. Test status: success.

Summary

| Name | Result % |
|---|---|
| This PR | 100.00% |
| baseline | 102.44% |

Benchmark Results

---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title api_overhead_benchmark_sycl, mean execution time per 10 kernels
    todayMarker off
    dateFormat  X
    axisFormat %s

    section SubmitKernel(api=sycl<br>Profiling=0<br>Ioq=0<br>DiscardEvents=0<br>NumKernels=10<br>KernelExecTime=1<br>MeasureCompletion=0)<br>Imm-CmdLists-OFF

        This PR (23.04 μs)   : crit, 0, 23

        baseline (22.705 μs)   :  0, 22

    -   : 0, 0

    -   : 0, 0

    section SubmitKernel(api=sycl<br>Profiling=0<br>Ioq=1<br>DiscardEvents=0<br>NumKernels=10<br>KernelExecTime=1<br>MeasureCompletion=0)<br>Imm-CmdLists-OFF

        This PR (26.196 μs)   : crit, 0, 26

        baseline (23.606 μs)   :  0, 23

    -   : 0, 0

    -   : 0, 0

    section SubmitKernel(api=sycl<br>Profiling=0<br>Ioq=0<br>DiscardEvents=0<br>NumKernels=10<br>KernelExecTime=1<br>MeasureCompletion=0)<br>

        This PR (23.467 μs)   : crit, 0, 23

        baseline (23.62 μs)   :  0, 23

    -   : 0, 0

    -   : 0, 0

    section SubmitKernel(api=sycl<br>Profiling=0<br>Ioq=1<br>DiscardEvents=0<br>NumKernels=10<br>KernelExecTime=1<br>MeasureCompletion=0)<br>

        This PR (25.934 μs)   : crit, 0, 25

        baseline (25.476 μs)   :  0, 25

    -   : 0, 0

    -   : 0, 0

---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench Hashtable
    todayMarker off
    dateFormat  X
    axisFormat %s

    section hashtable<br>Imm-CmdLists-OFF

        This PR (328.870693 M keys/sec)   : crit, 0, 328

        baseline (306.262877 M keys/sec)   :  0, 306

    -   : 0, 0

    -   : 0, 0

    section hashtable<br>

        This PR (331.496686 M keys/sec)   : crit, 0, 331

        baseline (360.15055 M keys/sec)   :  0, 360

    -   : 0, 0

    -   : 0, 0

---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench Bitcracker
    todayMarker off
    dateFormat  X
    axisFormat %s

    section bitcracker<br>Imm-CmdLists-OFF

        This PR (35.6003 s)   : crit, 0, 35

        baseline (39.0378 s)   :  0, 39

    -   : 0, 0

    -   : 0, 0

    section bitcracker<br>

        This PR (35.6791 s)   : crit, 0, 35

        baseline (35.6105 s)   :  0, 35

    -   : 0, 0

    -   : 0, 0

---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench Easywave
    todayMarker off
    dateFormat  X
    axisFormat %s

    section easywave<br>Imm-CmdLists-OFF

        This PR (435 ms)   : crit, 0, 435

        baseline (606.0 ms)   :  0, 606

    -   : 0, 0

    -   : 0, 0

    section easywave<br>

        This PR (247 ms)   : crit, 0, 247

        baseline (241.0 ms)   :  0, 241

    -   : 0, 0

    -   : 0, 0

---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench QuickSilver
    todayMarker off
    dateFormat  X
    axisFormat %s

    section QuickSilver<br>

        This PR (117.45 MMS/CTT)   : crit, 0, 117

        baseline (110.88 MMS/CTT)   :  0, 110

    -   : 0, 0

    -   : 0, 0

---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench Sobel Filter
    todayMarker off
    dateFormat  X
    axisFormat %s

    section sobel_filter<br>Imm-CmdLists-OFF

        This PR (609.857 ms)   : crit, 0, 609

        baseline (609.227 ms)   :  0, 609

    -   : 0, 0

    -   : 0, 0

    section sobel_filter<br>

        This PR (604.705 ms)   : crit, 0, 604

        baseline (548.773 ms)   :  0, 548

    -   : 0, 0

    -   : 0, 0
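
One way to read the charts above is as a relative change of "This PR" against "baseline". The sketch below copies three value pairs from the charts and computes that ratio. The helper name `relative_change` is illustrative and not part of the benchmark scripts, and this does not attempt to reproduce the percentages in the Summary, whose formula is not stated in this comment.

```python
# Values copied from the charts in this comment: (this PR, baseline).
# For the time-based entries lower is better; hashtable reports
# throughput, so higher is better.
results = {
    "SubmitKernel Ioq=1, Imm-CmdLists-OFF (us)": (26.196, 23.606),
    "easywave, Imm-CmdLists-OFF (ms)": (435.0, 606.0),
    "hashtable, immediate command lists on (M keys/sec)": (331.496686, 360.15055),
}


def relative_change(pr: float, baseline: float) -> float:
    """Return (pr - baseline) / baseline; negative means the PR value is smaller."""
    return (pr - baseline) / baseline


changes = {name: relative_change(pr, base) for name, (pr, base) in results.items()}

for name, change in changes.items():
    print(f"{name}: {change:+.1%}")
```

Single runs like these are noisy (the SubmitKernel outputs report a standard deviation of roughly 6 to 9%), so small differences in either direction should probably not be read as a real regression or improvement.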


Details

SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0) Imm-CmdLists-OFF

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),23.040,23.198,9.19%,20.644,660.076,[CPU],[us]
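
The CSV output above can be loaded with a few lines of Python. This is a minimal sketch, not the parser used by the benchmark scripts. Note that the data row carries one more field than the header; treating the surplus field as a unit is an assumption.

```python
import csv
import io

# Header and row copied from the output above. The row has one more field
# than the header: the last two fields look like a type tag and a unit.
raw = """TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),23.040,23.198,9.19%,20.644,660.076,[CPU],[us]
"""

reader = csv.reader(io.StringIO(raw))
header = next(reader)
row = next(reader)

# Pair the named columns, then keep any surplus fields separately
# (interpreting the extra field as a unit is an assumption).
record = dict(zip(header, row))
extra = row[len(header):]

mean_us = float(record["Mean"])
stddev_pct = float(record["StdDev"].rstrip("%"))

print(record["TestCase"].split("(")[0], mean_us, stddev_pct, record["Type"], extra)
```

Keeping the surplus field in `extra` rather than dropping it avoids silently losing the unit if the column layout is what it appears to be.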

SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0) Imm-CmdLists-OFF

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),26.196,26.197,6.89%,24.402,545.454,[CPU],[us]

SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),23.467,23.385,5.80%,22.693,426.678,[CPU],[us]

SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),25.934,26.000,6.14%,22.739,481.314,[CPU],[us]

hashtable Imm-CmdLists-OFF

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.408117 s
328.870693 million keys/second

hashtable

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.404884 s
331.496686 million keys/second

bitcracker Imm-CmdLists-OFF

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00551626 s
bitcracker - total time for whole calculation: 35.6003 s

bitcracker

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00470873 s
bitcracker - total time for whole calculation: 35.6791 s

easywave Imm-CmdLists-OFF

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0

Command:

/home/test-user/bench_workdir/easywave/easyWave_sycl -grid /home/test-user/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/test-user/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.3.29735+27)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

easywave

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1

Command:

/home/test-user/bench_workdir/easywave/easyWave_sycl -grid /home/test-user/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/test-user/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.3.29735+27)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

QuickSilver

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1
QS_DEVICE=GPU

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.348020e-01 6.180640e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.454960e-01 7.508410e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.446820e-01 7.707550e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.820720e-01 8.365420e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.491430e-01 7.984550e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.459840e-01 7.731310e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.416770e-01 7.716150e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.455780e-01 7.932000e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.451280e-01 7.924590e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.462900e-01 7.655020e-01 0.000000e+00

Timer Cumulative Cumulative Cumulative Cumulative Cumulative Cumulative
Name number microSecs microSecs microSecs microSecs Efficiency
of calls min avg max stddev Rating
main 1 1.125e+07 1.125e+07 1.125e+07 0.000e+00 100.00
cycleInit 10 3.581e+06 3.581e+06 3.581e+06 0.000e+00 100.00
cycleTracking 10 7.671e+06 7.671e+06 7.671e+06 0.000e+00 100.00
cycleTracking_Kernel 104 4.945e+06 4.945e+06 4.945e+06 0.000e+00 100.00
cycleTracking_MPI 117 1.951e+05 1.951e+05 1.951e+05 0.000e+00 100.00
cycleTracking_Test_Done 0 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00
cycleFinalize 20 3.950e+02 3.950e+02 3.950e+02 0.000e+00 100.00
Figure Of Merit 117.45 [Num Mega Segments / Cycle Tracking Time]

sobel_filter Imm-CmdLists-OFF

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0
OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.46602 s
sobelfilter - total time for whole calculation: 0.609857 s

sobel_filter

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1
OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.43169 s
sobelfilter - total time for whole calculation: 0.604705 s

Contributor

@pbalcer pbalcer left a comment

Not the biggest fan of having to do this, but I don't see a better option...

@igchor
Contributor Author

igchor commented Jul 31, 2024

Alternative approach (not fully working yet): #1913

@igchor
Contributor Author

igchor commented Aug 5, 2024

Replaced by #1913

@igchor igchor closed this Aug 5, 2024