SSE UTF16 => latin1 #311

Merged
merged 3 commits into from
Sep 19, 2023

Conversation

Nick-Nuon
Collaborator

@Nick-Nuon Nick-Nuon commented Sep 18, 2023

EDIT: Benchmarks for the latest commits:

convert_utf16_to_latin1+haswell, input size: 864610, iterations: 30000, dataset: /home/leorio/unicode_lipsum/wikipedia_mars/french.utflatin16.txt
   0.345 ins/byte,    0.407 cycle/byte,    7.853 GB/s (1.3 %),     3.198 GHz,    0.848 ins/cycle 
   0.345 ins/char,    0.407 cycle/char,    7.853 Gc/s (1.3 %)     1.00 byte/char 
convert_utf16_to_latin1+icelake, input size: 864610, iterations: 30000, dataset: /home/leorio/unicode_lipsum/wikipedia_mars/french.utflatin16.txt
   0.143 ins/byte,    0.448 cycle/byte,    6.909 GB/s (1.2 %),     3.097 GHz,    0.318 ins/cycle 
   0.143 ins/char,    0.448 cycle/char,    6.909 Gc/s (1.2 %)     1.00 byte/char 
convert_utf16_to_latin1+iconv, input size: 864610, iterations: 30000, dataset: /home/leorio/unicode_lipsum/wikipedia_mars/french.utflatin16.txt
  17.011 ins/byte,    3.024 cycle/byte,    1.056 GB/s (29.1 %),     3.193 GHz,    5.625 ins/cycle 
  34.022 ins/char,    6.048 cycle/char,    0.528 Gc/s (29.1 %)     2.00 byte/char 
WARNING: Measurements are noisy, try increasing iteration count (-I).
convert_utf16_to_latin1+icu, input size: 864610, iterations: 30000, dataset: /home/leorio/unicode_lipsum/wikipedia_mars/french.utflatin16.txt
   3.444 ins/byte,    0.763 cycle/byte,    4.187 GB/s (1.1 %),     3.195 GHz,    4.513 ins/cycle 
   3.444 ins/char,    0.763 cycle/char,    4.187 Gc/s (1.1 %)     1.00 byte/char 
convert_utf16_to_latin1+westmere, input size: 864610, iterations: 30000, dataset: /home/leorio/unicode_lipsum/wikipedia_mars/french.utflatin16.txt
   1.251 ins/byte,    0.269 cycle/byte,   11.889 GB/s (1.3 %),     3.199 GHz,    4.649 ins/cycle 
   1.251 ins/char,    0.269 cycle/char,   11.889 Gc/s (1.3 %)     1.00 byte/char 
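
For readers unfamiliar with the approach being benchmarked, here is a minimal sketch of how a 128-bit UTF-16 to Latin-1 conversion can work: check that no code unit in a block of eight exceeds 0xFF, then narrow the block to bytes with a saturating pack. This is not the kernel merged in this PR. The name `utf16_to_latin1_sketch` and the convention of returning 0 on failure are assumptions made for illustration, and the input is assumed to be native-endian UTF-16.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical sketch, NOT the simdutf implementation.
// Assumes native-endian (little-endian on x86) UTF-16 input.
// Returns the number of Latin-1 bytes written, or 0 if any code unit is
// above U+00FF (so 0 is also what an empty input returns).
std::size_t utf16_to_latin1_sketch(const char16_t* in, std::size_t len,
                                   char* out) {
  std::size_t i = 0;
#if defined(__SSE2__)
  const __m128i high_byte_mask = _mm_set1_epi16(static_cast<short>(0xFF00));
  const __m128i zero = _mm_setzero_si128();
  // Process eight 16-bit code units (16 bytes) per iteration.
  for (; i + 8 <= len; i += 8) {
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + i));
    // A code unit fits in Latin-1 only if its high byte is zero.
    __m128i high = _mm_and_si128(v, high_byte_mask);
    if (_mm_movemask_epi8(_mm_cmpeq_epi16(high, zero)) != 0xFFFF) {
      return 0;  // at least one code unit is not representable
    }
    // All values are in 0..255, so the saturating pack is lossless here.
    __m128i packed = _mm_packus_epi16(v, v);
    // Store the low 8 bytes (the eight converted characters).
    _mm_storel_epi64(reinterpret_cast<__m128i*>(out + i), packed);
  }
#endif
  // Scalar tail (and the whole input when SSE2 is unavailable).
  for (; i < len; ++i) {
    if (in[i] > 0xFF) return 0;
    out[i] = static_cast<char>(in[i]);
  }
  return len;
}
```

A with-errors variant would additionally have to locate the first offending code unit instead of just returning 0, which is one plausible reason it executes more instructions per byte.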


convert_utf16_to_latin1_with_errors+haswell, input size: 864610, iterations: 30000, dataset: /home/leorio/unicode_lipsum/wikipedia_mars/french.utflatin16.txt
   3.251 ins/byte,    0.743 cycle/byte,    4.300 GB/s (1.2 %),     3.195 GHz,    4.375 ins/cycle 
   3.251 ins/char,    0.743 cycle/char,    4.300 Gc/s (1.2 %)     1.00 byte/char 
convert_utf16_to_latin1_with_errors+icelake, input size: 864610, iterations: 30000, dataset: /home/leorio/unicode_lipsum/wikipedia_mars/french.utflatin16.txt
   3.251 ins/byte,    0.738 cycle/byte,    4.326 GB/s (1.1 %),     3.195 GHz,    4.402 ins/cycle 
   3.251 ins/char,    0.738 cycle/char,    4.326 Gc/s (1.1 %)     1.00 byte/char 
convert_utf16_to_latin1_with_errors+westmere, input size: 864610, iterations: 30000, dataset: /home/leorio/unicode_lipsum/wikipedia_mars/french.utflatin16.txt
   5.376 ins/byte,    0.907 cycle/byte,    3.523 GB/s (1.8 %),     3.194 GHz,    5.928 ins/cycle 
   5.376 ins/char,    0.907 cycle/char,    3.523 Gc/s (1.8 %)     1.00 byte/char

@Nick-Nuon Nick-Nuon changed the title from "base commit" to "SSE UTF16 => latin1" Sep 18, 2023
@lemire
Member

lemire commented Sep 18, 2023

Looks good.

@lemire lemire added this to In progress in Latin 1 support Sep 18, 2023
@lemire lemire linked an issue Sep 18, 2023 that may be closed by this pull request
@Nick-Nuon
Collaborator Author

I think this one is good to go

@lemire
Member

lemire commented Sep 19, 2023

Merging.

@lemire lemire merged commit 7761599 into simdutf:master Sep 19, 2023
36 checks passed
@lemire lemire moved this from In progress to Done in Latin 1 support Sep 19, 2023
Development

Successfully merging this pull request may close these issues.

Support Latin 1 <= UTF 16 (SSE)
2 participants