Optimize the matchLenSSE4 asm implementation. #55

Merged
merged 1 commit into from
Oct 16, 2016

@nigeltao
Contributor Author

The same patch would apply to the snappy package, except #54 is the bigger issue.

It's also simpler, in that it doesn't use SSE any more.

name                       old speed      new speed      delta
EncodeDigitsDefault1e4-8   39.1MB/s ± 0%  39.9MB/s ± 0%  +1.89%  (p=0.016 n=4+5)
EncodeDigitsDefault1e5-8   30.9MB/s ± 0%  31.8MB/s ± 0%  +2.92%  (p=0.008 n=5+5)
EncodeDigitsDefault1e6-8   29.7MB/s ± 1%  30.6MB/s ± 0%  +3.04%  (p=0.008 n=5+5)
EncodeDigitsCompress1e4-8  33.3MB/s ± 0%  33.8MB/s ± 0%  +1.49%  (p=0.008 n=5+5)
EncodeDigitsCompress1e5-8  23.7MB/s ± 0%  24.2MB/s ± 1%  +2.23%  (p=0.008 n=5+5)
EncodeDigitsCompress1e6-8  22.4MB/s ± 0%  22.9MB/s ± 1%  +2.05%  (p=0.008 n=5+5)
EncodeTwainDefault1e4-8    37.4MB/s ± 0%  38.1MB/s ± 0%  +1.93%  (p=0.016 n=4+5)
EncodeTwainDefault1e5-8    39.3MB/s ± 1%  40.4MB/s ± 1%  +2.85%  (p=0.008 n=5+5)
EncodeTwainDefault1e6-8    39.3MB/s ± 2%  40.4MB/s ± 2%  +2.62%  (p=0.032 n=5+5)
EncodeTwainCompress1e4-8   29.3MB/s ± 0%  29.5MB/s ± 1%  +0.61%  (p=0.048 n=5+5)
EncodeTwainCompress1e5-8   18.6MB/s ± 1%  18.9MB/s ± 1%  +1.47%  (p=0.016 n=5+5)
EncodeTwainCompress1e6-8   17.0MB/s ± 2%  17.2MB/s ± 1%    ~     (p=0.151 n=5+5)
@klauspost
Owner

Finally got around to testing it thoroughly. No issues found - thanks for the contribution.

@klauspost klauspost merged commit d79e91e into klauspost:master Oct 16, 2016