WIP server: bench: init #1

Closed
wants to merge 9 commits into from

phymbert
Owner

No description provided.

@phymbert changed the title from "server: bench: init" to "WIP server: bench: init" on Mar 24, 2024
@phymbert force-pushed the hp/server/bench/workflow branch 6 times, most recently from d56cf62 to 1d24ac8, on March 25, 2024 17:09
@phymbert force-pushed the hp/server/bench/workflow branch 2 times, most recently from ad40723 to 66b7597, on March 25, 2024 17:38

github-actions bot commented Mar 25, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3: 550 iterations 🚀

  • Concurrent users: 8
  • HTTP request : avg=8532.39ms p(90)=22744.31ms passes=550reqs fails=0reqs
  • Prompt processing (pp): avg=234.45tk/s p(90)=692.7tk/s total=210.59tk/s
  • Token generation (tg): avg=104.38tk/s p(90)=285.38tk/s total=141.86tk/s
  • Finish reason : stop=550reqs truncated=0
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=hp/server/bench/workflow commit=337c13b22688f627775c3892613926c54d2da1d1
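The `avg` and `p(90)` figures in the summary above are aggregate statistics over per-request samples. As a rough illustration only, the following Python sketch computes both from a list of samples. The function name `summarize`, the sample values, and the nearest-rank percentile method are all assumptions made for this example; the benchmark tool that produced the report may use a different method (for instance, interpolation between ranks).

```python
def summarize(samples):
    """Return (average, 90th percentile) for a non-empty list of numbers.

    Hypothetical helper: uses the nearest-rank percentile definition,
    which may differ from what the benchmark tooling actually does.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    avg = sum(ordered) / len(ordered)
    # Nearest rank: ceil(0.9 * n), computed with integers to avoid
    # floating-point rounding surprises.
    rank = (9 * len(ordered) + 9) // 10
    p90 = ordered[rank - 1]
    return avg, p90


# Made-up request durations in milliseconds, not taken from the report.
durations_ms = [100.0, 200.0, 300.0, 400.0, 500.0,
                600.0, 700.0, 800.0, 900.0, 1000.0]
avg, p90 = summarize(durations_ms)
print(f"avg={avg:.2f}ms p(90)={p90:.2f}ms")
```

A large gap between `avg` and `p(90)`, as in the HTTP request line of the report, suggests a long tail of slow requests rather than uniformly slow ones.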

prompt_tokens_seconds

---
config:
    xyChart:
        titleFontSize: 12
        width: 900
        height: 600
    themeVariables:
        xyChart:
            titleColor: "#000000"
---
xychart-beta
    title "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 550 iterations"
    y-axis "llamacpp:prompt_tokens_seconds"
    x-axis "llamacpp:prompt_tokens_seconds" 1711438926 --> 1711439562
    line [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 293.25, 293.25, 293.25, 293.25, 293.25, 496.62, 496.62, 496.62, 496.62, 496.62, 529.88, 529.88, 529.88, 529.88, 529.88, 580.58, 580.58, 580.58, 580.58, 580.58, 610.76, 610.76, 610.76, 610.76, 610.76, 612.43, 612.43, 612.43, 612.43, 612.43, 624.14, 624.14, 624.14, 624.14, 624.14, 636.09, 636.09, 636.09, 636.09, 636.09, 653.65, 653.65, 653.65, 653.65, 653.65, 653.91, 653.91, 653.91, 653.91, 653.91, 659.43, 659.43, 659.43, 659.43, 659.43, 681.13, 681.13, 681.13, 681.13, 681.13, 719.61, 719.61, 719.61, 719.61, 719.61, 734.1, 734.1, 734.1, 734.1, 734.1, 728.06, 728.06, 728.06, 728.06, 728.06, 721.28, 721.28, 721.28, 721.28, 721.28, 721.06, 721.06, 721.06, 721.06, 721.06, 731.03, 731.03, 731.03, 731.03, 731.03, 732.75, 732.75, 732.75, 732.75, 732.75, 734.31, 734.31, 734.31, 734.31, 734.31, 733.6, 733.6, 733.6, 733.6, 733.6, 739.21, 739.21, 739.21, 739.21, 739.21, 743.2, 743.2, 743.2, 743.2, 743.2, 762.16, 762.16, 762.16, 762.16, 762.16, 765.73, 765.73, 765.73, 765.73, 765.73, 760.28, 760.28, 760.28, 760.28, 760.28, 758.78, 758.78, 758.78, 758.78, 758.78, 760.59, 760.59, 760.59, 760.59, 760.59, 760.55, 760.55, 760.55, 760.55, 760.55, 759.75, 759.75, 759.75, 759.75, 759.75, 762.93, 762.93, 762.93, 762.93, 762.93, 768.99, 768.99, 768.99, 768.99, 768.99, 771.3, 771.3, 771.3, 771.3, 771.3, 774.59, 774.59, 774.59, 774.59, 774.59, 783.55, 783.55, 783.55, 783.55, 783.55, 783.56, 783.56, 783.56, 783.56, 783.56, 782.78, 782.78, 782.78, 782.78, 782.78, 783.61, 783.61, 783.61, 783.61, 783.61, 786.84, 786.84, 786.84, 786.84, 786.84, 777.85, 777.85, 777.85, 777.85, 777.85, 774.45, 774.45, 774.45, 774.45, 774.45, 768.23, 768.23, 768.23, 768.23, 768.23, 765.2, 765.2, 765.2, 765.2, 765.2, 764.62, 764.62, 764.62, 764.62, 764.62, 761.44, 761.44, 761.44, 761.44, 761.44, 759.56, 759.56, 759.56, 759.56, 759.56, 760.69, 760.69, 760.69, 760.69, 760.69, 765.59, 765.59, 765.59, 765.59, 765.59, 765.69, 765.69, 
765.69, 765.69, 765.69, 770.22, 770.22, 770.22, 770.22, 770.22, 771.6, 771.6, 771.6, 771.6, 771.6, 774.11, 774.11, 774.11, 774.11, 774.11, 772.95, 772.95, 772.95, 772.95, 772.95, 772.85, 772.85, 772.85, 772.85, 772.85, 771.26, 771.26, 771.26, 771.26, 771.26, 773.73, 773.73, 773.73, 773.73, 773.73, 773.61, 773.61, 773.61, 773.61, 773.61, 775.13, 775.13, 775.13, 775.13, 775.13, 775.88, 775.88, 775.88, 775.88, 775.88, 778.28, 778.28, 778.28, 778.28, 778.28, 778.28, 778.28, 778.28, 778.28]
                    

predicted_tokens_seconds

---
config:
    xyChart:
        titleFontSize: 12
        width: 900
        height: 600
    themeVariables:
        xyChart:
            titleColor: "#000000"
---
xychart-beta
    title "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 550 iterations"
    y-axis "llamacpp:predicted_tokens_seconds"
    x-axis "llamacpp:predicted_tokens_seconds" 1711438926 --> 1711439562
    line [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 32.03, 32.03, 32.03, 32.03, 32.03, 18.98, 18.98, 18.98, 18.98, 18.98, 18.39, 18.39, 18.39, 18.39, 18.39, 18.63, 18.63, 18.63, 18.63, 18.63, 18.71, 18.71, 18.71, 18.71, 18.71, 19.28, 19.28, 19.28, 19.28, 19.28, 20.77, 20.77, 20.77, 20.77, 20.77, 21.04, 21.04, 21.04, 21.04, 21.04, 21.19, 21.19, 21.19, 21.19, 21.19, 21.25, 21.25, 21.25, 21.25, 21.25, 21.38, 21.38, 21.38, 21.38, 21.38, 21.45, 21.45, 21.45, 21.45, 21.45, 21.28, 21.28, 21.28, 21.28, 21.28, 21.06, 21.06, 21.06, 21.06, 21.06, 20.77, 20.77, 20.77, 20.77, 20.77, 20.19, 20.19, 20.19, 20.19, 20.19, 20.35, 20.35, 20.35, 20.35, 20.35, 20.5, 20.5, 20.5, 20.5, 20.5, 20.35, 20.35, 20.35, 20.35, 20.35, 20.22, 20.22, 20.22, 20.22, 20.22, 20.14, 20.14, 20.14, 20.14, 20.14, 20.01, 20.01, 20.01, 20.01, 20.01, 19.99, 19.99, 19.99, 19.99, 19.99, 19.93, 19.93, 19.93, 19.93, 19.93, 20.0, 20.0, 20.0, 20.0, 20.0, 20.11, 20.11, 20.11, 20.11, 20.11, 20.06, 20.06, 20.06, 20.06, 20.06, 20.15, 20.15, 20.15, 20.15, 20.15, 20.3, 20.3, 20.3, 20.3, 20.3, 20.37, 20.37, 20.37, 20.37, 20.37, 20.51, 20.51, 20.51, 20.51, 20.51, 20.56, 20.56, 20.56, 20.56, 20.56, 20.48, 20.48, 20.48, 20.48, 20.48, 20.4, 20.4, 20.4, 20.4, 20.4, 20.27, 20.27, 20.27, 20.27, 20.27, 20.24, 20.24, 20.24, 20.24, 20.24, 20.3, 20.3, 20.3, 20.3, 20.3, 20.35, 20.35, 20.35, 20.35, 20.35, 20.39, 20.39, 20.39, 20.39, 20.39, 20.37, 20.37, 20.37, 20.37, 20.37, 20.29, 20.29, 20.29, 20.29, 20.29, 20.17, 20.17, 20.17, 20.17, 20.17, 19.92, 19.92, 19.92, 19.92, 19.92, 19.86, 19.86, 19.86, 19.86, 19.86, 19.65, 19.65, 19.65, 19.65, 19.65, 19.12, 19.12, 19.12, 19.12, 19.12, 18.83, 18.83, 18.83, 18.83, 18.83, 18.85, 18.85, 18.85, 18.85, 18.85, 18.88, 18.88, 18.88, 18.88, 18.88, 18.9, 18.9, 18.9, 18.9, 18.9, 18.97, 18.97, 18.97, 18.97, 18.97, 18.98, 18.98, 18.98, 18.98, 18.98, 18.96, 18.96, 18.96, 18.96, 18.96, 18.96, 18.96, 18.96, 18.96, 18.96, 18.92, 18.92, 18.92, 18.92, 18.92, 18.97, 18.97, 
18.97, 18.97, 18.97, 19.05, 19.05, 19.05, 19.05, 19.05, 19.13, 19.13, 19.13, 19.13, 19.13, 19.2, 19.2, 19.2, 19.2, 19.2, 19.27, 19.27, 19.27, 19.27, 19.27, 19.31, 19.31, 19.31, 19.31]
                    

Details

kv_cache_usage_ratio

---
config:
    xyChart:
        titleFontSize: 12
        width: 900
        height: 600
    themeVariables:
        xyChart:
            titleColor: "#000000"
---
xychart-beta
    title "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 550 iterations"
    y-axis "llamacpp:kv_cache_usage_ratio"
    x-axis "llamacpp:kv_cache_usage_ratio" 1711438926 --> 1711439562
    line [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.17, 0.17, 0.17, 0.17, 0.17, 0.3, 0.3, 0.3, 0.3, 0.3, 0.18, 0.18, 0.18, 0.18, 0.18, 0.25, 0.25, 0.25, 0.25, 0.25, 0.13, 0.13, 0.13, 0.13, 0.13, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.12, 0.12, 0.12, 0.12, 0.12, 0.22, 0.22, 0.22, 0.22, 0.22, 0.29, 0.29, 0.29, 0.29, 0.29, 0.11, 0.11, 0.11, 0.11, 0.11, 0.16, 0.16, 0.16, 0.16, 0.16, 0.1, 0.1, 0.1, 0.1, 0.1, 0.27, 0.27, 0.27, 0.27, 0.27, 0.16, 0.16, 0.16, 0.16, 0.16, 0.15, 0.15, 0.15, 0.15, 0.15, 0.12, 0.12, 0.12, 0.12, 0.12, 0.28, 0.28, 0.28, 0.28, 0.28, 0.26, 0.26, 0.26, 0.26, 0.26, 0.26, 0.26, 0.26, 0.26, 0.26, 0.23, 0.23, 0.23, 0.23, 0.23, 0.16, 0.16, 0.16, 0.16, 0.16, 0.09, 0.09, 0.09, 0.09, 0.09, 0.13, 0.13, 0.13, 0.13, 0.13, 0.07, 0.07, 0.07, 0.07, 0.07, 0.28, 0.28, 0.28, 0.28, 0.28, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.1, 0.1, 0.1, 0.1, 0.1, 0.12, 0.12, 0.12, 0.12, 0.12, 0.18, 0.18, 0.18, 0.18, 0.18, 0.2, 0.2, 0.2, 0.2, 0.2, 0.17, 0.17, 0.17, 0.17, 0.17, 0.13, 0.13, 0.13, 0.13, 0.13, 0.14, 0.14, 0.14, 0.14, 0.14, 0.15, 0.15, 0.15, 0.15, 0.15, 0.17, 0.17, 0.17, 0.17, 0.17, 0.09, 0.09, 0.09, 0.09, 0.09, 0.25, 0.25, 0.25, 0.25, 0.25, 0.34, 0.34, 0.34, 0.34, 0.34, 0.52, 0.52, 0.52, 0.52, 0.52, 0.59, 0.59, 0.59, 0.59, 0.59, 0.57, 0.57, 0.57, 0.57, 0.57, 0.62, 0.62, 0.62, 0.62, 0.62, 0.48, 0.48, 0.48, 0.48, 0.48, 0.39, 0.39, 0.39, 0.39, 0.39, 0.13, 0.13, 0.13, 0.13, 0.13, 0.18, 0.18, 0.18, 0.18, 0.18, 0.14, 0.14, 0.14, 0.14, 0.14, 0.09, 0.09, 0.09, 0.09, 0.09, 0.16, 0.16, 0.16, 0.16, 0.16, 0.29, 0.29, 0.29, 0.29, 0.29, 0.15, 0.15, 0.15, 0.15, 0.15, 0.17, 0.17, 0.17, 0.17, 0.17, 0.07, 0.07, 0.07, 0.07, 0.07, 0.08, 0.08, 0.08, 0.08, 0.08, 0.11, 0.11, 0.11, 0.11, 0.11, 0.12, 0.12, 0.12, 0.12, 0.12, 0.15, 0.15, 0.15, 0.15, 0.15, 0.14, 0.14, 0.14, 0.14, 0.14, 0.24, 0.24, 0.24, 0.24]
                    
requests_processing
---
config:
    xyChart:
        titleFontSize: 12
        width: 900
        height: 600
    themeVariables:
        xyChart:
            titleColor: "#000000"
---
xychart-beta
    title "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 550 iterations"
    y-axis "llamacpp:requests_processing"
    x-axis "llamacpp:requests_processing" 1711438926 --> 1711439562
    line [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 7.0, 7.0, 7.0, 7.0, 7.0, 8.0, 8.0, 8.0, 8.0, 8.0, 6.0, 6.0, 6.0, 6.0, 6.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 6.0, 6.0, 6.0, 6.0, 6.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 7.0, 7.0, 7.0, 7.0, 7.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 7.0, 7.0, 7.0, 7.0, 7.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 7.0, 7.0, 7.0, 7.0, 7.0, 8.0, 8.0, 8.0, 8.0, 8.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 7.0, 7.0, 7.0, 7.0, 7.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 7.0, 7.0, 7.0, 7.0, 7.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 4.0, 4.0, 4.0, 4.0]
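
Each chart source above follows the same Mermaid `xychart-beta` layout: a title, a y-axis label, an x-axis spanning the first and last timestamps, and one `line` series. A minimal Python sketch of how such a block could be generated from sampled metrics is shown here. The function name `render_xychart` and its parameters are hypothetical and are not taken from this pull request; the front-matter `config` section is omitted for brevity.

```python
def render_xychart(title, metric, timestamps, values):
    """Build Mermaid xychart-beta source for one metric time series.

    Hypothetical helper for illustration; mirrors the layout of the
    charts in the bot comment, without the front-matter config.
    """
    if not timestamps or len(timestamps) != len(values):
        raise ValueError("timestamps and values must be non-empty and equal length")
    # Round to two decimals, matching the precision seen in the report.
    series = ", ".join(str(round(v, 2)) for v in values)
    start, end = int(min(timestamps)), int(max(timestamps))
    return "\n".join([
        "xychart-beta",
        f'    title "{title}"',
        f'    y-axis "{metric}"',
        f'    x-axis "{metric}" {start} --> {end}',
        f"    line [{series}]",
    ])


chart = render_xychart(
    "demo chart",                      # illustrative title
    "llamacpp:requests_processing",    # metric name as it appears above
    [1711438926, 1711439562],          # first and last timestamps from the report
    [0.0, 8.0],                        # two made-up samples
)
print(chart)
```

Keeping the title on a single line avoids breaking the quoted string in the generated source.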
                    

Repository owner deleted a comment from github-actions bot Mar 25, 2024
@phymbert force-pushed the hp/server/bench/workflow branch 2 times, most recently from 583e35d to d4bd981, on March 25, 2024 22:12
@phymbert force-pushed the master branch 3 times, most recently from 03fb674 to 561c8b8, on March 31, 2024 14:06
@phymbert closed this on Apr 1, 2024
@phymbert deleted the hp/server/bench/workflow branch on April 3, 2024 14:25