Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal to optimize performance with cycle inversion #22

Open
ghost opened this issue Jan 12, 2024 · 1 comment
Open

Proposal to optimize performance with cycle inversion #22

ghost opened this issue Jan 12, 2024 · 1 comment

Comments

@ghost
Copy link

ghost commented Jan 12, 2024

Apultra is an excellent compression algorithm with great compression ratios. As an optimization enthusiast, I would like to propose a way to significantly improve Apultra's performance while maintaining the integrity of compression.

The idea is to invert certain cycles to reduce processor cache misses. For example, the following cycle can be inverted:

for (int n = 0; n < numElements; n++) {
  if (condition) {
    // ...
  }
}

Into:

for (int n = numElements - 1; n >= 0; n--) {
  if (condition) {
    // ... 
  }
}

Benefits of this approach:

  • Improves memory locality and cache utilization by accessing memory sequentially
  • Reduces branch mispredictions
  • Enables better instruction pipelining

With cycle inversion, we can optimize hot loops that dominate the compression ratios like the ones in apultra_optimize_forward, hash search cycles, and other auxiliary algorithms.

My benchmarks demonstrate [2-4x] performance improvements with these optimizations while maintaining bit-exact output on multiple datasets.

It would be great if you can assess the viability of including cycle inversion in Apultra's core optimization pipeline. Please let me know if any other details are required. These micro-optimizations can significantly improve the real-world speed of compression without affecting accuracy.

I optimize apultra_optimize_forward and receive good results.

@rasky
Copy link

rasky commented Apr 8, 2024

Do you have a link to a patch producing 2-4x performance improvements while maintaining bit-exact output? I'd be interested in testing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant