
[SYCL] refactor #6408

Draft · wants to merge 1 commit into base: master
Conversation

@airMeng (Collaborator) commented Mar 31, 2024

According to #5277 (reply in thread), this PR does the following:

  • separate the dpct-generated headers, for easier future maintenance
  • separate the GEMM-related operators, to prepare for introducing a template-based library, a.k.a. XeTLA (see the sketch below)
    - [ ] let the common backend handle H2D/D2H memcpy, to keep this PR as simple as possible
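
To make the GEMM split concrete: below is a minimal, header-only sketch of the kind of entry point a template-based library such as XeTLA (or any other GEMM backend) could later specialize. The file name, namespace, and signature are illustrative assumptions, not files from this PR.

```cpp
// gemm-sketch.hpp — hypothetical example, not a file from this PR.
// Keeping GEMM behind one small header-only entry point means a XeTLA-backed
// specialization can later replace the naive body without touching the rest
// of the backend.
#pragma once
#include <cstddef>

namespace sycl_refactor_sketch {

// Naive reference GEMM: C = A (m x k) * B (k x n), all row-major.
template <typename T>
void gemm(const T * A, const T * B, T * C,
          std::size_t m, std::size_t n, std::size_t k) {
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            T acc = T(0);
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[i * k + p] * B[p * n + j];
            }
            C[i * n + j] = acc;
        }
    }
}

} // namespace sycl_refactor_sketch
```

The point is only the shape of the split: callers include one small header, and the body can later be swapped for a tuned implementation without touching the rest of the backend.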

@airMeng (Collaborator, Author) commented Mar 31, 2024

@slaren Since we can now put the SYCL-related code under a directory instead of a single file, I might introduce a headers-only library for performance optimization, which would also simplify our effort (my job during work time 😁)

@ggerganov @mingfeima for awareness

Contributor

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3: 504 iterations 🚀

  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=9274.74ms p(90)=26479.05ms fails=0, finish reason: stop=504 truncated=0
  • Prompt processing (pp): avg=241.61tk/s p(90)=732.4tk/s total=200.65tk/s
  • Token generation (tg): avg=102.96tk/s p(90)=278.78tk/s total=129.75tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=sycl-refactor commit=a2e77e60d6d1e208096aae27e24a23ff9821c58b
Time series

[Charts omitted: prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, requests_processing — "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 504 iterations"]

@slaren (Collaborator) commented Mar 31, 2024

@slaren Since we can now put the SYCL-related code under a directory instead of a single file, I might introduce a headers-only library for performance optimization, which would also simplify our effort (my job during work time 😁)

I think that's good; I plan to start using CUTLASS in the CUDA backend as well.

@airMeng changed the title from "[SYCL refactor" to "[SYCL] refactor" on Apr 1, 2024
@NeoZhangJianyu (Collaborator)

It's great to see the new structure.
The current SYCL backend has bugs that affect the IQ3 models, and the UT pass rate has dropped as well.
I'm working on fixing them now.
Is it possible to wait for my fix?

@abhilash1910 (Collaborator)

This is a good refactoring and would be helpful for debugging. I would suggest waiting for some IQ quant PRs and then resuming work on this.

@airMeng (Collaborator, Author) commented Apr 1, 2024

This is a good refactoring and would be helpful for debugging. I would suggest waiting for some IQ quant PRs and then resuming work on this.

It's great to see the new structure. The current SYCL backend has bugs that affect the IQ3 models, and the UT pass rate has dropped as well. I'm working on fixing them now. Is it possible to wait for my fix?

Yes, drop a note when you're finished.

@NeoZhangJianyu
Copy link
Collaborator

@airMeng
All IQ types in this PR are supported/fixed by #6521.
You can continue your work now.

Thank you!

@airMeng marked this pull request as draft on April 25, 2024 14:04
@airMeng marked this pull request as ready for review on April 30, 2024 09:37
@airMeng (Collaborator, Author) commented May 5, 2024

ggml-sycl.cpp — review threads (outdated, resolved)
@NeoZhangJianyu (Collaborator)

  1. The build with FP16 fails; please check and fix it.
  2. Please run ci/run.sh to make sure quality is not reduced.

@NeoZhangJianyu (Collaborator)

Regarding the sub-folder dpct:
I suggest not using a folder for "dpct". Save them as two files in the ggml-sycl folder, like dpct-helper.cpp/hpp (a minimal sketch follows below).

  1. There won't be more files in the dpct part; there is no need to add a subfolder for just 2 files.
  2. The dpct files are updated manually for llama.cpp's requirements.
    Saving them in a dpct folder will make others think they were copied directly from dpct.
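
For reference, a minimal sketch (assumed names and contents, not this PR's actual dpct files) of the kind of hand-maintained helper a dpct-helper.hpp could hold: a thin wrapper around a default in-order SYCL queue, which is the role the dpct-generated device manager plays for the generated code.

```cpp
// dpct-helper.hpp — illustrative sketch only, not the contents of this PR.
#pragma once
#include <sycl/sycl.hpp>
#include <string>

namespace dpct_helper_sketch {

// Process-wide default in-order queue, similar in spirit to what the
// dpct-generated device manager provides to generated code.
inline sycl::queue & default_queue() {
    static sycl::queue q{sycl::default_selector_v,
                         sycl::property::queue::in_order{}};
    return q;
}

// Convenience: name of the device behind the default queue.
inline std::string default_device_name() {
    return default_queue().get_device().get_info<sycl::info::device::name>();
}

} // namespace dpct_helper_sketch
```

Keeping such helpers in one hand-edited header also makes it clear they are maintained for llama.cpp rather than copied verbatim from dpct.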

@mofosyne added labels: "review complexity : high" (generally requires in-depth knowledge of LLMs or GPUs), "enhancement" (new feature or request) — May 10, 2024
github-actions bot added labels: "build" (compilation issues), "ggml" (changes relating to the ggml tensor library for machine learning), "SYCL" (https://en.wikipedia.org/wiki/SYCL - GPU programming language) — May 22, 2024
github-actions bot (Contributor) commented May 22, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 545 iterations 🚀

  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8562.08ms p(95)=21241.45ms fails=, finish reason: stop=483 truncated=62
  • Prompt processing (pp): avg=100.34tk/s p(95)=436.58tk/s
  • Token generation (tg): avg=34.62tk/s p(95)=48.49tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=sycl-refactor commit=50dffa13d8f947a077a03478aaf26dc70bdc7ecd

[Charts omitted: prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, requests_processing — "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 545 iterations"]

@mofosyne (Collaborator)

There were a lot of changes which caused conflicts, and I can't see how to resolve them easily. @airMeng, can you see if there is much that needs to be fixed?

@airMeng (Collaborator, Author) commented May 22, 2024

There were a lot of changes which caused conflicts, and I can't see how to resolve them easily. @airMeng, can you see if there is much that needs to be fixed?

Never mind, it just needs more time.

@mofosyne marked this pull request as draft on May 22, 2024 09:52
separate mmq, mmvq, dmmv from the main files

avoid g_sycl_gpu_mgr null

fix no new line

fix backend no new line

fix fp16 issues

no final newlines

backup

backup

backup

backup

backup
Labels
build (compilation issues) · enhancement (new feature or request) · ggml (changes relating to the ggml tensor library for machine learning) · review complexity : high (generally requires in-depth knowledge of LLMs or GPUs) · SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language)

Projects: none yet
Development: successfully merging this pull request may close these issues — none yet

5 participants