Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build error on MacOS/ARM 3.8 #115

Closed
Anthchirp opened this issue May 26, 2021 · 3 comments
Closed

build error on MacOS/ARM 3.8 #115

Anthchirp opened this issue May 26, 2021 · 3 comments
Milestone

Comments

@Anthchirp
Copy link
Contributor

I'm trying to build hdf5plugin 2.3.2 on MacOS on ARM64 for conda-forge (conda-forge/hdf5plugin-feedstock#20), and I see the following build error:

arm64-apple-darwin20.0.0-clang -fno-strict-aliasing -Wsign-compare -Wunreachable-code -DNDEBUG -fwrapv -O3 -Wall -Wstrict-prototypes -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O3 -pipe -fdebug-prefix-map=${SRC_DIR}=/usr/local/src/conda/${PKG_NAME}-${PKG_VERSION} -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/hdf5plugin_1622044221898/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol=/usr/local/src/conda-prefix -flto -Wl,-export_dynamic -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -isystem /Users/runner/miniforge3/conda-bld/hdf5plugin_1622044221898/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/include -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/hdf5plugin_1622044221898/work=/usr/local/src/conda/hdf5plugin-2.3.2 -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/hdf5plugin_1622044221898/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol=/usr/local/src/conda-prefix -D_FORTIFY_SOURCE=2 -isystem /Users/runner/miniforge3/conda-bld/hdf5plugin_1622044221898/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/include -mmacosx-version-min=11.0 -DHAVE_LZ4=1 -DHAVE_ZLIB=1 -DHAVE_ZSTD=1 -DH5_USE_18_API -DHAVE_SNAPPY=1 -DSHUFFLE_SSE2_ENABLED=1 -I/Users/runner/miniforge3/conda-bld/hdf5plugin_1622044221898/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/include -Isrc/c-blosc/ -Isrc/c-blosc/blosc -Isrc/c-blosc/internal-complibs/lz4-1.9.1 -Isrc/c-blosc/internal-complibs/zlib-1.2.8 -Isrc/c-blosc/internal-complibs/zstd-1.4.1 -Isrc/c-blosc/internal-complibs/zstd-1.4.1/common -Isrc/hdf5-blosc/src/ -Isrc/c-blosc/internal-complibs/snappy-1.1.1 -I/Users/runner/miniforge3/conda-bld/hdf5plugin_1622044221898/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/include/python3.8 -c src/c-blosc/blosc/shuffle.c -o build/temp.macosx-11.0-arm64-3.8/src/c-blosc/blosc/shuffle.o -msse2
    clang-11: warning: -Wl,-export_dynamic: 'linker' input unused [-Wunused-command-line-argument]
    clang-11: warning: argument unused during compilation: '-msse2' [-Wunused-command-line-argument]
    src/c-blosc/blosc/shuffle.c:163:5: error: invalid output constraint '=b' in asm
        "=b" (cpuInfo[1]),
        ^
    src/c-blosc/blosc/shuffle.c:188:5: error: invalid output constraint '=a' in asm
        "=a" (eax),
        ^
    2 errors generated.
    error: command 'arm64-apple-darwin20.0.0-clang' failed with exit status 1

Is MacOS/ARM supported by hdf5plugin?

@t20100 t20100 added this to the 3.0.0 milestone May 27, 2021
@t20100
Copy link
Member

t20100 commented May 27, 2021

We haven't tested it on macos/ARM since we don't have access to an ARM64 mac (yet), so it is not supported yet, but we would like to support it.

I made PR #116 which should fix the issue by disabling SSE2 and AVX2 on arm64 but I can't test it...

@Anthchirp
Copy link
Contributor Author

I made a successful test build using commit f1768e0 (post-116-merge) in conda-forge/hdf5plugin-feedstock#21, so I assume this ticket can be closed.

Do you know when the next release of hdf5plugin will happen?

@t20100
Copy link
Member

t20100 commented May 27, 2021

Thanks for the testing!

With M1 support fixed we should make a release soon.

closed by PR #116

t20100 added a commit to t20100/hdf5plugin that referenced this issue Oct 14, 2021
REVERT: 2b63814b Tag open source release 1.1.9.
REVERT: 9c1be179 'size' remains unused if none of ZLIB, LZO and LZ4 are available.
REVERT: 78650d12 Add project goals to CONTRIBUTING.md.
REVERT: 5e7c14bd Add stubs for abseil flags.
REVERT: 80a2a10c Remove unused run_microbenchmarks flag.
REVERT: 453942b3 Add absl::GetFlag and absl::SetFlag to uses of flags.
REVERT: ea368c2f Add AppVeyor status badge.
REVERT: d1d1f486 Remove unused include in snappy_benchmark.cc.
REVERT: 4ebd8b2f Split benchmarks and test tools into separate targets.
REVERT: 0793e2ae Merge pull request silx-kit#117 from cmumford:disable-osx-fuzzer
REVERT: ac55f842 Test stub improvements.
REVERT: 6e9ae724 Disable fuzzing on OSX.
REVERT: 402d8881 Fixup for adding the third_party/{benchmark, googletest} submodules. (silx-kit#115)
REVERT: 6badb0a2 Merge pull request silx-kit#114 from cmumford:werror-only-clang
REVERT: bc53daa7 Fixed endif clause.
REVERT: e9a6a084 Matching clang.
REVERT: 955a5dd1 Building with `-Werror` only with clang.
REVERT: 42d1dd7e Fix CHECK_EQ to call ok() instead of CheckSuccess().
REVERT: eaaa0ed0 Fixup for adding the third_party/{benchmark, googletest} submodules. (silx-kit#111)
REVERT: e1e91ee4 Rework file:: stubs.
REVERT: 6aa79cb4 Wrap snappy_unittest in an anonymous namespace and remove static from functions.
REVERT: bae9f9be Fixup for adding the third_party/{benchmark, googletest} submodules. (silx-kit#110)
REVERT: 5f913be0 Fix unused local variable warnings.
REVERT: 549685a5 Remove custom testing and benchmarking code.
REVERT: 11f9a77a Add Travis-CI build status badge to README.md.
REVERT: 49540965 Update Travis CI config.
REVERT: 8995ffab Replace #pragma nounroll with equivalent used elsewhere.
REVERT: d1daa830 Remove inline qualifier from static variables.
REVERT: 3b571656 1) Improve the lookup table data to require less instructions to extract the necessary data. We now store len - offset in a signed int16, this happens to remove masking offset in the calculations and the calculations that need to be done precisely give the flags that we need for testing correctness. 2) Replace offset extraction with a lookup mask. This is less uops and is needed because we need to special case type 3 to always return 0 as to properly trigger the fallback. 3) Unroll the loop twice, this removes some loop-condition checks AND it improves the generated assembly. The loop variables tend to end up in a different register requiring mov's having two consecutive copies allows the elision of the mov's.
REVERT: a9730ed5 Optimize zippy decompression by making IncrementalCopy faster.
REVERT: 56c2c247 Internal change
REVERT: a94be58e Optimize zippy decompression by making IncrementalCopy faster.
REVERT: 01a566f8 Fix opensource version
REVERT: 616b8229 Add LZ4 as a benchmark option. Snappy is starting to look really good compared to LZ4. LZ4 is considered the fastest solution by many on internet. We now see that Snappy is actually becoming very competitive with compression a little faster and decompression slower but certainly not terribly slower.
REVERT: e4a6e97b Extend validate benchmarks over all types and also add a medley for validation.
REVERT: 719bed0a Bug fix. Error on 0 offset copies.
REVERT: 289c8a3c Make zippy decompression branchless
REVERT: 3bfa265a Revert zippy optimization that causes heap buffer overflows.
REVERT: 4d2dc9dc Optimize zippy unzipping by upto >10% by making IncrementalCopy faster.
REVERT: 11e5165b Add a benchmark that decreased the branch prediction memorization by increasing the amount of independent branches executed per benchmark iteration.
REVERT: 6835abd9 Change hash function for Compress.
REVERT: 368b01c8 Merge pull request silx-kit#107 from jsteemann:bug-fix/fix-compile-warning
REVERT: 1ce58af2 Fix the use of op + len when op is nullptr and len is non-zero. See https://reviews.llvm.org/D67122 for some discussion of why this can matter. I don't think this should have any noticeable effect on performance.
REVERT: 0b990db2 Run clang-format
REVERT: cb2b3c7e fix compile warnings due to missing override specifiers
REVERT: 7ffaf77c Replace ARCH_K8 with __x86_64__.
REVERT: 4dd277fe Replace the division with a constant table in IncrementalCopy
REVERT: f16eda34 Correct uninitialized variable.
REVERT: 837f38b3 Revise stubs for ARCH_{K8,PPC,ARM}.
REVERT: e1353b9f Remove ARCH_* guards around Bits::FindLSBSetNonZero64().
REVERT: c98344f6 Fix Clang/GCC compilation warnings.
REVERT: 113cd97a Tighten types on a few for loops.
REVERT: abde3abb Fix Travis CI build.
REVERT: e6506681 Fix accidental double std:: qualifiers.
REVERT: 63620c06 Add some std:: qualifiers to types and functions.
REVERT: 5417da69 Switch from C headers to C++ headers.
REVERT: 251d935d Remove #include <string> from snappy-stubs-public.h.
REVERT: 4f195aee Remove mismatched #endif.
REVERT: 041c6080 Remove platform-dependent code for unaligned loads/stores.
REVERT: 27ff130f Remove platform-dependent code for little-endian loads and stores.
REVERT: a4cdb5d1 Introduce SNAPPY_ATTRIBUTE_ALWAYS_INLINE.
REVERT: 231b8be0 Migrate to standard integral types.
REVERT: 14bef662 Modernize memcpy() and memmove() usage.
REVERT: d674348a Improve zippy with 5-10%.
REVERT: 4dfcad9f assertion failure on darwin_x86_64, have to investigage
REVERT: e1917874 assertion failure on darwin_x86_64, have to investigage
REVERT: 0faf5637 This cl does two things 1) It shaves of a few cycles from the data dependency chain. By using "shrd" instead of a load. 2) The important loop is finding small copies (4-12) which are either "copy 1", or "copy 2" depending if the offset fits <2048. It turns out that this is a branch that is mispredicted often. Due to the long dependency chain the CPU is running with IPC~1 anyway so we can freely add instructions to instead emit copies branchfree. This reduces the branch misspredicts from 15% to 11% (for BM_ZFlat/6 txt1) and from 5.6% to 4% (for BM_ZFlat/10 or pb).
REVERT: 0c7ed08a The result on protobuf benchmark is around 19%. Results vary by their propensity for compression. As the frequency of finding matches influences the amount of branch misspredicts and the amount of hashing.
REVERT: 3c77e014 1) Make the output pointer a local variable such it doesn't need a load add store on it's loop carried dependency chain. 2) Reduce the input pointer loop carried dependency chain from 7 cycles to 4 cycles by using pre-loading. This is a very subtle point. 3) Just brutally copy 64 bytes which removes a difficult to predict branch from the inner most loop. There is enough bandwidth to do so in the intrinsic cycles of the loop. 4) Implement limit pointers that include the slop region. This removes unnecessary instructions from the hot path. 5) It seems the removal of the difficult to predict branch has removed the code sensitivity to alignment, so remove the asm nop's.
REVERT: 9eabb7ba Cut a load from the critical dependency chain of the input pointer by speculating the uncommon case of COPY_4 is not happening.
REVERT: cddd9c08 Improve comments in IncrementalCopy, add an assert.

git-subtree-dir: src/snappy
git-subtree-split: 537f4ad6240e586970fe554614542e9717df7902
t20100 added a commit to t20100/hdf5plugin that referenced this issue Oct 21, 2022
2b63814b Tag open source release 1.1.9.
9c1be179 'size' remains unused if none of ZLIB, LZO and LZ4 are available.
78650d12 Add project goals to CONTRIBUTING.md.
5e7c14bd Add stubs for abseil flags.
80a2a10c Remove unused run_microbenchmarks flag.
453942b3 Add absl::GetFlag and absl::SetFlag to uses of flags.
ea368c2f Add AppVeyor status badge.
d1d1f486 Remove unused include in snappy_benchmark.cc.
4ebd8b2f Split benchmarks and test tools into separate targets.
0793e2ae Merge pull request silx-kit#117 from cmumford:disable-osx-fuzzer
ac55f842 Test stub improvements.
6e9ae724 Disable fuzzing on OSX.
402d8881 Fixup for adding the third_party/{benchmark, googletest} submodules. (silx-kit#115)
6badb0a2 Merge pull request silx-kit#114 from cmumford:werror-only-clang
bc53daa7 Fixed endif clause.
e9a6a084 Matching clang.
955a5dd1 Building with `-Werror` only with clang.
42d1dd7e Fix CHECK_EQ to call ok() instead of CheckSuccess().
eaaa0ed0 Fixup for adding the third_party/{benchmark, googletest} submodules. (silx-kit#111)
e1e91ee4 Rework file:: stubs.
6aa79cb4 Wrap snappy_unittest in an anonymous namespace and remove static from functions.
bae9f9be Fixup for adding the third_party/{benchmark, googletest} submodules. (silx-kit#110)
5f913be0 Fix unused local variable warnings.
549685a5 Remove custom testing and benchmarking code.
11f9a77a Add Travis-CI build status badge to README.md.
49540965 Update Travis CI config.
8995ffab Replace #pragma nounroll with equivalent used elsewhere.
d1daa830 Remove inline qualifier from static variables.
3b571656 1) Improve the lookup table data to require less instructions to extract the necessary data. We now store len - offset in a signed int16, this happens to remove masking offset in the calculations and the calculations that need to be done precisely give the flags that we need for testing correctness. 2) Replace offset extraction with a lookup mask. This is less uops and is needed because we need to special case type 3 to always return 0 as to properly trigger the fallback. 3) Unroll the loop twice, this removes some loop-condition checks AND it improves the generated assembly. The loop variables tend to end up in a different register requiring mov's having two consecutive copies allows the elision of the mov's.
a9730ed5 Optimize zippy decompression by making IncrementalCopy faster.
56c2c247 Internal change
a94be58e Optimize zippy decompression by making IncrementalCopy faster.
01a566f8 Fix opensource version
616b8229 Add LZ4 as a benchmark option. Snappy is starting to look really good compared to LZ4. LZ4 is considered the fastest solution by many on internet. We now see that Snappy is actually becoming very competitive with compression a little faster and decompression slower but certainly not terribly slower.
e4a6e97b Extend validate benchmarks over all types and also add a medley for validation.
719bed0a Bug fix. Error on 0 offset copies.
289c8a3c Make zippy decompression branchless
3bfa265a Revert zippy optimization that causes heap buffer overflows.
4d2dc9dc Optimize zippy unzipping by upto >10% by making IncrementalCopy faster.
11e5165b Add a benchmark that decreased the branch prediction memorization by increasing the amount of independent branches executed per benchmark iteration.
6835abd9 Change hash function for Compress.
368b01c8 Merge pull request silx-kit#107 from jsteemann:bug-fix/fix-compile-warning
1ce58af2 Fix the use of op + len when op is nullptr and len is non-zero. See https://reviews.llvm.org/D67122 for some discussion of why this can matter. I don't think this should have any noticeable effect on performance.
0b990db2 Run clang-format
cb2b3c7e fix compile warnings due to missing override specifiers
7ffaf77c Replace ARCH_K8 with __x86_64__.
4dd277fe Replace the division with a constant table in IncrementalCopy
f16eda34 Correct uninitialized variable.
837f38b3 Revise stubs for ARCH_{K8,PPC,ARM}.
e1353b9f Remove ARCH_* guards around Bits::FindLSBSetNonZero64().
c98344f6 Fix Clang/GCC compilation warnings.
113cd97a Tighten types on a few for loops.
abde3abb Fix Travis CI build.
e6506681 Fix accidental double std:: qualifiers.
63620c06 Add some std:: qualifiers to types and functions.
5417da69 Switch from C headers to C++ headers.
251d935d Remove #include <string> from snappy-stubs-public.h.
4f195aee Remove mismatched #endif.
041c6080 Remove platform-dependent code for unaligned loads/stores.
27ff130f Remove platform-dependent code for little-endian loads and stores.
a4cdb5d1 Introduce SNAPPY_ATTRIBUTE_ALWAYS_INLINE.
231b8be0 Migrate to standard integral types.
14bef662 Modernize memcpy() and memmove() usage.
d674348a Improve zippy with 5-10%.
4dfcad9f assertion failure on darwin_x86_64, have to investigage
e1917874 assertion failure on darwin_x86_64, have to investigage
0faf5637 This cl does two things 1) It shaves of a few cycles from the data dependency chain. By using "shrd" instead of a load. 2) The important loop is finding small copies (4-12) which are either "copy 1", or "copy 2" depending if the offset fits <2048. It turns out that this is a branch that is mispredicted often. Due to the long dependency chain the CPU is running with IPC~1 anyway so we can freely add instructions to instead emit copies branchfree. This reduces the branch misspredicts from 15% to 11% (for BM_ZFlat/6 txt1) and from 5.6% to 4% (for BM_ZFlat/10 or pb).
0c7ed08a The result on protobuf benchmark is around 19%. Results vary by their propensity for compression. As the frequency of finding matches influences the amount of branch misspredicts and the amount of hashing.
3c77e014 1) Make the output pointer a local variable such it doesn't need a load add store on it's loop carried dependency chain. 2) Reduce the input pointer loop carried dependency chain from 7 cycles to 4 cycles by using pre-loading. This is a very subtle point. 3) Just brutally copy 64 bytes which removes a difficult to predict branch from the inner most loop. There is enough bandwidth to do so in the intrinsic cycles of the loop. 4) Implement limit pointers that include the slop region. This removes unnecessary instructions from the hot path. 5) It seems the removal of the difficult to predict branch has removed the code sensitivity to alignment, so remove the asm nop's.
9eabb7ba Cut a load from the critical dependency chain of the input pointer by speculating the uncommon case of COPY_4 is not happening.
cddd9c08 Improve comments in IncrementalCopy, add an assert.

git-subtree-dir: src/snappy
git-subtree-split: 2b63814b15a2aaae54b7943f0cd935892fae628f
t20100 added a commit to t20100/hdf5plugin that referenced this issue Sep 4, 2023
092190c38 Fix spelling errors (silx-kit#128)
b241ca0d0 Fix the CMake multiconfig var and test and tools handling (silx-kit#127)
7b33e4f19 Fix some make output issues (silx-kit#126)
51d221848 Documentation fixes and fixes from testing 1.1.1 (silx-kit#125)
4ae30429c CMake: (fix) HDF5 1.14.0 (silx-kit#121)
e1072ec5b Update Github actions  (silx-kit#118)
b021532ea Fix leak and consistify generic API with ZFP C API (silx-kit#117)
06408bb22 Enable CMake Testing (silx-kit#116)
eb5440043 Fix version info in docs
ad61dfa41 Revert "rework feat-cmake_tests CMake and replace scripts (silx-kit#111)" (silx-kit#115)
03b86413f rework feat-cmake_tests CMake and replace scripts (silx-kit#111)
08116fa9e Update README.md (silx-kit#114)
a10bd7bdb Merge branch 'master' of github.com:LLNL/H5Z-ZFP
1660d94a4 Fix h5repack test (silx-kit#107)
fd01bf1d7 mention generic interface
8d5ec2e74 Update docs (silx-kit#106)
c76f847d9 Added new endianness test (silx-kit#102)
8acf8244b fix reading of byte-swapped input files (silx-kit#95) (silx-kit#101)
62ab4eb6d Minor code clean-up. (silx-kit#103)
60032f640 Bugfix/hdf5 1.14.0 (silx-kit#99)
8fad85d72 CMake: (fix) MPI parallel HDF5 1.14.0 (silx-kit#97)
e81aec114 Don't flag a skipped test as a failed test.  (silx-kit#94)

git-subtree-dir: src/H5Z-ZFP
git-subtree-split: 092190c38d5760b3212ce4d96cced767b90aa189
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants