Unicode codepoint flags for custom regexs #7245
Conversation
Looks like the tokenizer tests are failing on Windows for some reason: https://github.com/ggerganov/llama.cpp/actions/runs/9096294810/job/25001393493?pr=7245#step:12:2583

I cannot debug this locally; is it possible to skip all but the failing test? I have reviewed the previous logs, but that test was not executed, so I think I'm going to start from a clean point and redo all commits until I see the failure. Also I found that compiling tests with …
The problem is the stack size limit on Windows. According to the MSVC /STACK documentation, the default stack reserve size is 1 MB.
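For reference, one common way to work around an MSVC stack overflow is to raise the reserved stack size at link time with the /STACK option. This is an illustrative sketch only (the file name and size are placeholders, not necessarily what this PR did):

```shell
# Illustrative: raise the reserved stack to 8 MiB when linking with MSVC.
# The default reserve is 1 MiB; the source file name here is a placeholder.
cl /EHsc test-tokenizer.cpp /link /STACK:8388608
```

The same value can be passed to an already-built binary via `editbin /STACK:8388608 test-tokenizer.exe`.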
Force-pushed: afcbcb5 to 6ca6c46
I think I'm done here. Now I have the base to fix the tokenizers.
* Replace CODEPOINT_TYPE_* with codepoint_flags
* Update and bugfix brute force random test
* Deterministic brute force random test
* Unicode normalization NFD
* Get rid of BOM




Use flags for each unicode category (`\p{N}`, `\p{L}`, `\p{Z}`, ...) instead of the `CODEPOINT_TYPE_*` definitions, including helper flags for common regex classes like `\s` (only this one for now), `\d`, `\w`... This simplifies writing custom regexes.

All flags are precomputed in `unicode-data.cpp`, generated by `gen-unicode-data.py`.
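The flags approach can be sketched as a per-codepoint bitfield, where each Unicode general category gets one bit and extra helper bits cover regex shorthands like `\s`. This is a minimal illustration, not the actual `codepoint_flags` definition in llama.cpp (field names and layout are assumptions):

```cpp
#include <cstdint>

// Hypothetical sketch of a per-codepoint flags bitfield. One bit per
// Unicode general category, plus helper bits for regex shorthands.
struct codepoint_flags {
    uint16_t is_undefined   : 1;
    uint16_t is_number      : 1;  // \p{N}
    uint16_t is_letter      : 1;  // \p{L}
    uint16_t is_separator   : 1;  // \p{Z}
    uint16_t is_mark        : 1;  // \p{M}
    uint16_t is_punctuation : 1;  // \p{P}
    uint16_t is_symbol      : 1;  // \p{S}
    uint16_t is_control     : 1;  // \p{C}
    uint16_t is_whitespace  : 1;  // helper for \s
};

// With flags, a regex class check is a cheap bit test instead of
// comparing against a list of CODEPOINT_TYPE_* values.
inline bool is_word_char(codepoint_flags f) {  // rough \w
    return f.is_letter || f.is_number;
}
```

A precomputed table mapping each codepoint (or codepoint range) to such a flags value is then enough to evaluate custom regex classes without a full regex engine.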