Enable compilation/optimisation on powerpc #87

Closed
wants to merge 1 commit into from


kif
Contributor

@kif kif commented Sep 30, 2020

On PowerPC, gcc and clang can automatically translate the SSE2 code to AltiVec.

For arm32 and arm64, both the `-mcpu` and `-march` options are available.
On Intel x86 machines, `-mcpu` does not exist.
On PowerPC, the `-march` option does not exist.
For other architectures (mips, ...), `-mcpu` is more likely to be present.
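
The per-architecture rules above could be encoded in a build script roughly as follows. This is only a minimal sketch, not the code from this PR: the helper name `native_flag` is hypothetical, the `=native` value is an assumption about what the build wants to pass, and the fallback for unlisted architectures simply follows the "more likely to be present" guess in the description.

```python
import platform


def native_flag(machine=None):
    """Pick a host-tuning compiler flag from the machine name (hypothetical helper)."""
    # Default to the machine the script runs on, e.g. "x86_64" or "ppc64le".
    machine = (machine or platform.machine()).lower()
    if machine.startswith(("ppc", "powerpc")):
        # Per the description: PowerPC has no -march option.
        return "-mcpu=native"
    if machine in ("x86_64", "amd64", "i386", "i686"):
        # Per the description: x86 has no -mcpu option.
        return "-march=native"
    # arm32/arm64 accept both options; for anything else (mips, ...)
    # -mcpu is assumed to be the more likely one to exist.
    return "-mcpu=native"


if __name__ == "__main__":
    print(platform.machine(), "->", native_flag())
```

The returned string would typically be appended to the extension's extra compile arguments; whether the compiler on a given platform actually accepts it is best checked by a trial compilation.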

@kif
Contributor Author

kif commented Sep 30, 2020

Here is a simple demo program:

#!/usr/bin/python3
# Time how long it takes to read every frame of the dataset.
import sys
import time

import h5py

with h5py.File(sys.argv[1], "r") as h:
    t0 = time.perf_counter()
    # Iterating over the dataset reads (and decompresses) one frame at a time.
    for i, f in enumerate(h["entry_0000/measurement/data"]):
        npa = f[()]
    t1 = time.perf_counter()
print(f"Time to read {i+1} frames of size {npa.shape}: {t1-t0:.4f}s. {(i+1)/(t1-t0):.2f} fps")

@kif
Contributor Author

kif commented Sep 30, 2020

And here are the results:

~/workspace/bitshuffle$ OMP_NUM_THREADS=1 HDF5_PLUGIN_PATH=./build/lib.linux-ppc64le-3.8/bitshuffle/plugin ./read_speed.py eiger_0000.h5 
Time to read 1100 frames of size (2162, 2068): 21.2342s. 51.80 fps
~/workspace/bitshuffle$ OMP_NUM_THREADS=1 HDF5_PLUGIN_PATH=./build/lib.linux-ppc64le-3.8-ref/bitshuffle/plugin ./read_speed.py eiger_0000.h5 
Time to read 1100 frames of size (2162, 2068): 34.5846s. 31.81 fps
~/workspace/bitshuffle$ OMP_NUM_THREADS=1 HDF5_PLUGIN_PATH=./build/lib.linux-ppc64le-3.8/bitshuffle/plugin ./read_speed.py eiger_0000.h5 
Time to read 1100 frames of size (2162, 2068): 21.2336s. 51.80 fps

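Enabling the "SSE2" code path on an IBM POWER9 cuts the read time by about 39% compared with the `-ref` build (21.23 s versus 34.58 s), which is roughly 1.6 times the frame rate (51.80 fps versus 31.81 fps).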
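
For reference, the gain can be worked out from the timings printed above. The snippet below is just a small sketch of that arithmetic; the function name `speedup` is made up for illustration, and the two numbers are the first `lib.linux-ppc64le-3.8` run and the `-ref` run from the log.

```python
def speedup(t_ref, t_new):
    """Return (throughput ratio, fraction of wall time saved) for two timings."""
    return t_ref / t_new, 1.0 - t_new / t_ref


# Timings in seconds for 1100 frames, copied from the runs above.
ratio, saved = speedup(t_ref=34.5846, t_new=21.2342)
print(f"{ratio:.2f}x the frame rate, {saved:.0%} less read time")
```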

@james-s-willis
Collaborator

Hey @kif, thanks for these changes! Are you able to update your fork? I tried myself but I didn't have the permissions. Then I can merge this PR. Thanks.

@james-s-willis
Collaborator

Merged in #102.
