Conversation

@jaime-m-p
Collaborator

Use flags for each Unicode category (\p{N}, \p{L}, \p{Z}, ...) instead of the CODEPOINT_TYPE_* definitions.

This includes helper flags for common regex classes like \s (only this one for now), \d, \w, ...

This simplifies writing custom regexes.

All flags are precomputed in unicode-data.cpp, which is generated by gen-unicode-data.py.

@mofosyne added labels on May 13, 2024: "Review Complexity : Medium" (generally requires more time to grok, but manageable at beginner-to-medium expertise) and "enhancement" (new feature or request)
@github-actions
Contributor

github-actions bot commented May 14, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 568 iterations 🚀

  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8218.58ms p(95)=19284.72ms fails=, finish reason: stop=510 truncated=58
  • Prompt processing (pp): avg=98.81tk/s p(95)=467.94tk/s
  • Token generation (tg): avg=34.6tk/s p(95)=46.94tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=codepoint-flags commit=2642da0ca8883994d20a73bebcd80f6f59b06c69

[Chart: llamacpp:prompt_tokens_seconds — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 568 iterations]
[Chart: llamacpp:predicted_tokens_seconds — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 568 iterations]


[Chart: llamacpp:kv_cache_usage_ratio — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 568 iterations]
[Chart: llamacpp:requests_processing — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 568 iterations]

@ggerganov
Member

Looks like the tokenizer tests are failing on Windows for some reason:

https://github.com/ggerganov/llama.cpp/actions/runs/9096294810/job/25001393493?pr=7245#step:12:2583

@jaime-m-p
Collaborator Author

Looks like the tokenizer tests are failing on Windows for some reason:

https://github.com/ggerganov/llama.cpp/actions/runs/9096294810/job/25001393493?pr=7245#step:12:2583

I can't debug this locally; is it possible to skip all but the failing test?

I reviewed the previous logs, but that test was not executed, so I'm going to start from a clean point and redo all commits until I see the failure.

I also found that compiling the tests with BUILD_SHARED_LIBS ON fails with a missing pthread_create. I will check later, but it seems to be a missing -pthread flag or -lpthread library.
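The usual CMake-side fix for this class of link error is to link against the imported Threads target rather than hard-coding a flag. This is a hedged sketch; the target name below is hypothetical and not the actual llama.cpp CMake target:

```cmake
# Illustrative CMakeLists.txt fragment: prefer -pthread where available
# and link the Threads::Threads imported target, which expands to the
# right flag/library per platform. "test-tokenizer" is a placeholder name.
set(THREADS_PREFER_PTHREAD_FLAG ON)
find_package(Threads REQUIRED)
target_link_libraries(test-tokenizer PRIVATE Threads::Threads)
```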

@jaime-m-p
Collaborator Author

The problem is the stack size limit on Windows.

According to the MSVC /STACK documentation:
For ARM64, x86, and x64 machines, the default stack size is 1 MB.

sizeof(std::array<codepoint_flags, MAX_CODEPOINTS>) ≈ 2 MB.

@jaime-m-p
Collaborator Author

I think I'm done here.

Now I have the base needed to fix the tokenizers.
The brute-force test found failing cases while testing more models (even llama-3's custom regex is failing).

@jaime-m-p jaime-m-p merged commit b43272a into ggml-org:master May 17, 2024
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request May 18, 2024
* Replace CODEPOINT_TYPE_* with codepoint_flags
* Update and bugfix brute force random test
* Deterministic brute force random test
* Unicode normalization NFD
* Get rid of BOM
