Unified packet capture interface, take 2 #166

Open
wants to merge 440 commits into
base: master
4529e9b
Added a LGTM config file.
jaycedowell Jun 29, 2021
56ae21f
Merge branch 'disk-readers' into ibverb-support
jaycedowell Jun 30, 2021
6b2154e
Major re-write of Verbs support.
jaycedowell Jul 1, 2021
2d89136
Merge branch 'ibverb-support' of https://github.com/jaycedowell/bifro…
Jul 8, 2021
d463a25
Py3 re-fixes
Jul 13, 2021
55784ee
Remove some divergences from JD's ibverb-support branch
Jul 15, 2021
e39cf99
Make compiling actually work when XGPU isn't used
Jul 15, 2021
2e7fb1c
Don't use "pragma pack"
Jul 15, 2021
29536e7
Merge branch 'lwa352-like-top-py3' into lwa352
Jul 15, 2021
f46f364
Always include xgpu header file
Jul 15, 2021
5767a0e
Bigger packet buffers
Jul 15, 2021
fc4704b
re-speedup IBV (and presumably fix a bug)
Jul 20, 2021
15ef4d2
@realtimeradio found a missing "->next".
jaycedowell Jul 20, 2021
b482898
Use the upper 8 bits in the COR frame_count word to store the channel…
jaycedowell Jul 20, 2021
4347ab9
Missing 'n'.
jaycedowell Jul 20, 2021
109b04b
Merge branch 'disk-readers' into ibverb-support
jaycedowell Jul 20, 2021
35a7b46
Make beamformer GEMM do the right thing
Aug 17, 2021
e2b3b97
Update beamformer calls to match actual data ordering
Aug 17, 2021
787464e
The COR packets 'navg' field has the same units as 'time_tag'.
jaycedowell Aug 17, 2021
9a6e56a
Supply beam coeffs in chan x beam x input [C-ordered] order
Aug 18, 2021
b9bda79
Catch mmap error
Aug 27, 2021
b107bfc
Implement source blanking.
Nov 18, 2021
316871d
Revert "Implement source blanking."
jack-h Jan 9, 2022
8c0699f
Add option to transpose xgpu output
Jan 21, 2022
26f91b7
Added in IB verbs support for sending packets.
jaycedowell Mar 3, 2022
dc4c194
Update the docs.
jaycedowell Mar 3, 2022
8ed17e2
Initial work on RDMA for sending rings between servers.
jaycedowell Mar 3, 2022
c87addf
Merge branch 'ibverb-support' of https://github.com/jaycedowell/bifro…
jaycedowell Mar 22, 2022
d60f94e
Merge branch 'jaycedowell-ibverb-support' into ibverb-support
jaycedowell Mar 22, 2022
2ab86ca
Various fixes.
jaycedowell Mar 22, 2022
e208e37
Added in verbs and RDMA info.
jaycedowell Mar 23, 2022
0ecd7a5
Python3 fixes.
jaycedowell Mar 23, 2022
45b5c6d
MacOS fixes (but more are needed).
jaycedowell Mar 23, 2022
4c98211
More MacOS fixes.
jaycedowell Mar 23, 2022
e576775
Add in Socket.hpp to clear the last few problems on MacOS.
jaycedowell Mar 23, 2022
c326768
Move the RDMA buffer size control into configure and config.h.
jaycedowell Mar 23, 2022
1ae5f6e
Rebuild configure.
jaycedowell Mar 23, 2022
e768e2e
Is this the answer?
jaycedowell Mar 25, 2022
7cc53ec
Removing debugging.
jaycedowell Mar 25, 2022
e45956c
Merge branch 'master' into ibverb-support
jaycedowell Mar 29, 2022
12a6d7a
Fix library names in configure.
jaycedowell Mar 29, 2022
509e782
Add in missing object.
jaycedowell Mar 29, 2022
8585bdd
Use config.h.
jaycedowell Mar 29, 2022
120ca75
Update for libhwloc 2.
jaycedowell Mar 29, 2022
6fa6106
Also catch cudaMemoryTypeUnregistered in bfGetSpace().
jaycedowell Mar 29, 2022
0938ee8
Catch by reference.
jaycedowell Mar 29, 2022
9212dd2
I know it is only 16 bytes long.
jaycedowell Mar 29, 2022
474d870
Merge pull request #2 from jaycedowell/caltech-bifrost-dsp
jack-h Mar 31, 2022
e0ab683
Enable Verbs and RDMA support by default.
jaycedowell Apr 15, 2022
df487ac
Merge branch 'master' into ibverb-support
jaycedowell Jun 10, 2022
2398102
We already require C++11.
jaycedowell Jun 10, 2022
b591d93
Cleanup.
jaycedowell Jun 23, 2022
f512f22
Merge branch 'socket-deinline' into ibverb-support
league Aug 3, 2022
35fcb14
Change the sleep from 1 to 5 ms. Change the multicast address.
jaycedowell Aug 4, 2022
b2aa888
Skip the first two frames?
jaycedowell Aug 4, 2022
e0f83bb
Allow the 32 and 64-bit integer types. Allow the numpy 64-bit floati…
jaycedowell Aug 4, 2022
4ed1217
Use the explicit numpy integer/floating point types.
jaycedowell Aug 4, 2022
726ef67
Don't resize, reallocate.
jaycedowell Aug 4, 2022
df530a5
Update dates.
jaycedowell Aug 4, 2022
8b84db1
Debug.
jaycedowell Aug 4, 2022
db91254
Expand all loops.
jaycedowell Aug 4, 2022
5957cd3
Formatting.
jaycedowell Aug 5, 2022
38f520b
Back to 1 ms.
jaycedowell Aug 5, 2022
a3a35b2
Work on moving ring memory binding control over to hwloc.
jaycedowell Aug 8, 2022
e964ad9
Add hwloc into the software that gets installed.
jaycedowell Aug 8, 2022
180047e
Remove stray reference to BF_NUMA_ENABLED
league Aug 14, 2022
768b3fb
Add test of bifrost.version program
league Aug 15, 2022
0ba58be
Use unittest discover -v
league Aug 15, 2022
139590b
Fix make_dir so it chmods only if new dir created
league Aug 5, 2022
aa2b469
Fix the 'packets sent' stats.
jaycedowell Nov 1, 2022
80d516e
diable -> disable typo (#192)
jack-h Dec 6, 2022
854ffd1
Enable py3.10 in CI and tests
league Dec 6, 2022
2c51bb8
Pin ubuntu-20.04 for python-3.6
league Dec 6, 2022
b0d6740
Add in tests for SSE/AVX/AVX512 support.
jaycedowell Feb 16, 2023
cc15021
Rebuild configure.
jaycedowell Feb 16, 2023
f7e79f4
What using SSE/AVX for CHIPS?
jaycedowell Feb 16, 2023
01808d2
Merge the ethernet+IPv4+UDP headers into a single structure. Reuse t…
jaycedowell Feb 20, 2023
ac40ad4
Typos.
jaycedowell Feb 20, 2023
3dbc82b
Missed one.
jaycedowell Feb 20, 2023
6bf7807
Switch to pointer arithmetic.
jaycedowell Feb 20, 2023
e269e20
Boilerplate update.
jaycedowell Feb 20, 2023
6a02f67
Further cleanup.
jaycedowell Feb 20, 2023
974bb76
Revert.
jaycedowell Feb 20, 2023
1a08f1e
Move bf_comb_udp_hdr here and fix a bug.
jaycedowell Feb 20, 2023
b20462b
Moves and renames.
jaycedowell Feb 20, 2023
647e2b3
Catch by reference.
jaycedowell Feb 20, 2023
689312c
There must have been some kind of non-printing character in there...
jaycedowell Feb 20, 2023
8146012
More SSE.
jaycedowell Feb 20, 2023
51fd2c7
Typo.
jaycedowell Feb 20, 2023
395a263
Merge branch 'master' into ibverb-support
jaycedowell May 19, 2023
4452b71
Fixed AVX512 detection and report on SSE/AVX/AVX512 support.
jaycedowell May 31, 2023
9a628e7
More explicit.
jaycedowell May 31, 2023
bb09b87
Merge branch 'ibverb-support' of https://github.com/ledatelescope/bif…
jaycedowell Jun 1, 2023
8a953cc
Need to actually save files...
jaycedowell Jun 1, 2023
9088671
Revert to the ibverb-support version.
jaycedowell Jun 1, 2023
aa064b1
Focus on packet formats for now.
jaycedowell Jun 1, 2023
4e1d318
Focus on packet formats for now.
jaycedowell Jun 1, 2023
4972230
Move verbs buffer control into configure.
jaycedowell Jun 1, 2023
d07b153
lwa352_vbeam_* -> vbeam_*
jaycedowell Jun 1, 2023
b37bf9b
Remove some debugging.
jaycedowell Jun 1, 2023
9d67edb
Catch here as well.
jaycedowell Jun 1, 2023
bc6add7
Attempt to add a non-AVX version of the snap2 packet processor.
jaycedowell Jun 1, 2023
7432067
Now in formats/base.hpp.
jaycedowell Jun 1, 2023
d9736b4
Revert to the ibverb-support version.
jaycedowell Jun 1, 2023
70bf291
Ugh.
jaycedowell Jun 1, 2023
21210ba
This block seems to be causing problems in CI.
jaycedowell Jun 1, 2023
08e947b
Add in writer support for 8+8-bit complex DRX8.
jaycedowell Jul 18, 2023
e9a9a4b
Missed a couple of declaration changes.
jaycedowell Jul 18, 2023
516b9d0
Add BFpacketwriter_drx8_impl.
jaycedowell Jul 19, 2023
f9223b8
Add in support for DRX8 capture.
jaycedowell Jul 19, 2023
8859ca4
Add missing set_drx8 method.
jaycedowell Jul 19, 2023
f352ab7
Merge branch 'ibverb-support' into lwa352
jaycedowell Jul 20, 2023
02e2e94
Clean up header filler.
jaycedowell Jul 20, 2023
e783a24
Re-enable missing source blanking.
jaycedowell Jul 20, 2023
ae61dd8
Ugh.
jaycedowell Jul 20, 2023
a5cb91d
Use the TBF "unassigned" field to hold the number of stands.
jaycedowell Aug 25, 2023
c8c99b1
Update the TBF format to take the number of stands in after an unders…
jaycedowell Aug 25, 2023
cf4dbc4
Drop the frame size since it is now variable.
jaycedowell Aug 28, 2023
c966503
Add a helpful message for https://github.com/ledatelescope/bifrost/is…
jaycedowell Sep 5, 2023
3d24679
Protect 192.168.1.100 from matching 192.168.1.10.
jaycedowell Sep 14, 2023
00c65ee
Cleanup send flags.
jaycedowell Sep 14, 2023
d294fad
Cleanup send work request linking.
jaycedowell Sep 14, 2023
70e6a0e
Work on splitting Verbs into a send side and a receive side.
jaycedowell Sep 14, 2023
0b8bdcf
Verbs -> VerbsSend
jaycedowell Sep 14, 2023
927b897
Checksum offloading is only for send.
jaycedowell Sep 14, 2023
c4bf808
More Verbs -> VerbsSend.
jaycedowell Sep 14, 2023
89bcba1
A few more things related to the split.
jaycedowell Sep 14, 2023
58e234b
Tweak the AVX/AVX512 tests.
jaycedowell Sep 14, 2023
6f456a8
A few more things.
jaycedowell Sep 14, 2023
26216c4
Error message cleanup now that there is send and receive.
jaycedowell Sep 14, 2023
358de64
Cleanup naming.
jaycedowell Sep 15, 2023
6268a3a
Bad rename.
jaycedowell Sep 15, 2023
6023189
Remove duplicate declarations and add in a place to store a rate limit.
jaycedowell Sep 15, 2023
0ec3452
More work on hardware packet pacing.
jaycedowell Sep 15, 2023
97483f4
Fix and rename.
jaycedowell Sep 15, 2023
955ba45
Inverted logic.
jaycedowell Sep 15, 2023
9e4c054
Bits not bytes.
jaycedowell Sep 15, 2023
35d75be
Fix SSE/AVX/AVX512 support detection.
jaycedowell Sep 18, 2023
b4d0656
Fix the fix.
jaycedowell Sep 18, 2023
74385c8
Better?
jaycedowell Sep 18, 2023
6dd3ac1
Missed one LIBS -> NVCCLIBS.
jaycedowell Sep 18, 2023
9cd4056
Move RateLimiter instantiation frinto PacketWriterMethod to make it w…
jaycedowell Sep 18, 2023
b2ed582
Cleanups to get things to compile.
jaycedowell Sep 18, 2023
5c265ed
Extra _CPU_STATE.
jaycedowell Sep 20, 2023
e7edee8
Move CUDA gencode options into configure.
jaycedowell Sep 25, 2023
ebe95b7
Missing space.
jaycedowell Sep 25, 2023
f5f604b
We still need NVCC_GENCODE for the cuFFT library.
jaycedowell Sep 25, 2023
5d2e720
Ugh, typo.
jaycedowell Sep 25, 2023
9465265
Set a burst size for rate limiting and add a mechanism to wait for th…
jaycedowell Oct 4, 2023
54c6e96
Unused variable.
jaycedowell Oct 4, 2023
6f21a1c
Remove most of the popen() calls.
jaycedowell Oct 6, 2023
9573c02
Bug fix plus try to minimize how much gets reset at each call.
jaycedowell Oct 6, 2023
8124424
Ugh.
jaycedowell Oct 6, 2023
03a4586
Make sure we only accept complete ARP entries.
jaycedowell Oct 11, 2023
c421587
Request completion event notification as soon as the CQ is created.
jaycedowell Oct 11, 2023
4a7e633
Use a more detailed accounting of packets to make sure we don't overw…
jaycedowell Oct 11, 2023
c6e428b
Wrong return type.
jaycedowell Oct 11, 2023
f07c6d0
Use compatible headers.
jaycedowell Oct 11, 2023
88625a6
A couple of fixes to get things working.
jaycedowell Oct 11, 2023
0eb382b
Enable burst sizes for packet rate limiting for all writer methods.
jaycedowell Oct 27, 2023
75c74f6
Fix an off-by-one error from a header mis-match in how packets are ge…
jaycedowell Dec 4, 2023
fae28f7
Merge pull request #3 from jaycedowell/caltech-bifrost-dsp
jaycedowell Dec 8, 2023
84cde94
Merge remote-tracking branch 'upstream/ibverb-support' into lwa352
jaycedowell Dec 8, 2023
b0254e8
I thought this was in configure.ac already.
jaycedowell Dec 8, 2023
b16dc15
Fix a bad merge.
jaycedowell Dec 8, 2023
749f0ba
Give up on packet pacing for now.
jaycedowell Dec 8, 2023
e59a8ca
Make sure everything expects an OS index and use PU instead of CORE.
jaycedowell Dec 11, 2023
451650b
Additional verbs configuration options and packet pacing testing.
jaycedowell Dec 11, 2023
53e4189
Merge remote-tracking branch 'upstream/ibverb-support' into lwa352
jaycedowell Dec 11, 2023
860b522
Reset tmp before trying to find the NUMA node.
jaycedowell Dec 11, 2023
a2aea31
Merge remote-tracking branch 'upstream/ibverb-support' into lwa352
jaycedowell Dec 11, 2023
801ea2c
began simple header
dentalfloss1 Jan 19, 2024
c416496
progress
dentalfloss1 Jan 19, 2024
166c566
progress
dentalfloss1 Jan 24, 2024
5d04501
simple packet working
dentalfloss1 Jan 26, 2024
d043ac5
re-enable optimization
dentalfloss1 Jan 26, 2024
978e4d2
Missed a couple.
jaycedowell Jan 26, 2024
066f356
Also update the packet size so we know what to expect in the future.
jaycedowell Jan 26, 2024
a1cf369
There is so little to the header that we cannot tell if the observing…
jaycedowell Jan 26, 2024
c63f37e
Fixed not copying all the data to ring buffer
dentalfloss1 Feb 8, 2024
e89a650
simplify
dentalfloss1 Feb 8, 2024
c7478b1
Update simple.hpp
dentalfloss1 Feb 16, 2024
0407a30
Merge branch 'master' into ibverb-support
jaycedowell Feb 19, 2024
77ee338
Bad merge.
jaycedowell Feb 19, 2024
10f6437
Getting closer with this update.
jaycedowell Feb 19, 2024
03fa46a
Update configure.
jaycedowell Feb 19, 2024
08508b4
Missed one.
jaycedowell Feb 19, 2024
38264a8
Rebuild.
jaycedowell Feb 19, 2024
b6e6799
I think this is the correct fix.
jaycedowell Feb 19, 2024
4b9dd18
Merge branch 'ibverb-support' of https://github.com/ledatelescope/bif…
jaycedowell Feb 19, 2024
c9275ac
Lower cased names for the 'io' enum as well.
jaycedowell Feb 19, 2024
d1bb73a
Type hints.
jaycedowell Feb 19, 2024
c383f32
Type hints and cleanups.
jaycedowell Feb 19, 2024
7c07ef0
Drop some more Py2 stuff.
jaycedowell Feb 19, 2024
a9640e1
Typo.
jaycedowell Feb 19, 2024
67f9b70
Updates to also test on Py3.12 and clear up some Node.js 16 warnings.
jaycedowell Feb 20, 2024
82939d3
No imp module in Py3.12.
jaycedowell Feb 20, 2024
012aa2f
Drop the GUPPI tests in testbench.
jaycedowell Feb 20, 2024
53f0f2d
RegEx fix.
jaycedowell Feb 20, 2024
962339f
RegEx fix.
jaycedowell Feb 20, 2024
0c8f19b
Another regex fix.
jaycedowell Feb 20, 2024
567297b
not working yet
dentalfloss1 Feb 22, 2024
9c4306f
working disk io test for simple
dentalfloss1 Feb 23, 2024
6570bab
working test_udp
dentalfloss1 Feb 23, 2024
390f047
Merge branch 'ibverb-support' into add-simple-packet
dentalfloss1 Feb 23, 2024
9e5946a
remove debugging print statements
dentalfloss1 Feb 27, 2024
f5e371e
remove more debugging print statements
dentalfloss1 Feb 27, 2024
2906a95
undo line break
dentalfloss1 Feb 27, 2024
a7a5237
correct packet format size for simple
dentalfloss1 Feb 27, 2024
00756b7
Update packet_writer.hpp
dentalfloss1 Feb 27, 2024
801f6be
Update simple.hpp
dentalfloss1 Feb 27, 2024
463529b
Update packet_writer.hpp
dentalfloss1 Feb 27, 2024
d7ebb9b
Update test_disk_io.py
dentalfloss1 Feb 27, 2024
06ccc82
working on udp io test
dentalfloss1 Feb 27, 2024
36cf54e
remove debug file copy
dentalfloss1 Feb 27, 2024
66b0197
test udp io not working
dentalfloss1 Feb 27, 2024
4829990
retry
dentalfloss1 Feb 28, 2024
3fb4194
fix
dentalfloss1 Feb 28, 2024
f081fd1
working version of packet tests
dentalfloss1 Feb 28, 2024
8dce32d
Need be64toh, not 32
dentalfloss1 Mar 6, 2024
e1d950b
and the header filler too
dentalfloss1 Mar 6, 2024
30a03f9
debugging
dentalfloss1 Mar 7, 2024
9aff659
remove debugging
dentalfloss1 Mar 7, 2024
6458835
remove debugging
Mar 8, 2024
fe154b6
Merge branch 'add-simple-packet' of https://github.com/ledatelescope/…
Mar 8, 2024
3b27351
Unlock memory before free.
jaycedowell Mar 9, 2024
8d80917
Missed a couple unlocks. Plus, just use _limiter for the rate limit.
jaycedowell Mar 9, 2024
d97c645
Drop provide a default packet size.
jaycedowell Mar 9, 2024
012c9ee
Check both the upper and lower ends of the packet pacing range.
jaycedowell Mar 9, 2024
cfad6a1
Change assignment.
jaycedowell Mar 9, 2024
e443012
Make a copy of the ring's data (#231).
jaycedowell Mar 15, 2024
351000a
Refactor into a TestCase per packet format.
jaycedowell Mar 16, 2024
dfccdda
Merge branch 'ibverb-support' into add-simple-packet
jaycedowell Mar 16, 2024
edfaca0
Naming cleanup.
jaycedowell Mar 17, 2024
0e6d1d8
Even more cleanups to make the test suite more useful as an example.
jaycedowell Mar 17, 2024
6de4594
Merge branch 'ibverb-support' into add-simple-packet
jaycedowell Mar 17, 2024
03d5073
More with closing().
jaycedowell Mar 17, 2024
53cd504
Formatting.
jaycedowell Mar 17, 2024
46693b3
Formatting.
jaycedowell Mar 17, 2024
96bc19a
The SIMPLE tests seem sensitive to what timetag0 is.
jaycedowell Mar 17, 2024
c22556d
Merge pull request #225 from ledatelescope/add-simple-packet
jaycedowell Mar 19, 2024
4562c9d
Merge remote-tracking branch 'upstream/ibverb-support' into lwa352
jaycedowell Mar 19, 2024
e32a32b
Nice.
jaycedowell Mar 19, 2024
587044a
Don't shrink writer buffers.
jaycedowell Mar 22, 2024
607b20b
Cleanups plus a switch to non-temporal stores for the SSE/AVX branches.
jaycedowell Apr 18, 2024
ce76668
Copy real and complex at the same time.
jaycedowell Apr 18, 2024
1751d05
Fixes.
jaycedowell Apr 18, 2024
0ca5998
Another one.
jaycedowell Apr 18, 2024
9d18f89
Merge pull request #206 from realtimeradio/lwa352
jaycedowell Apr 19, 2024
a360f0b
Formatting.
jaycedowell Apr 19, 2024
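Commit 3d24679 above mentions protecting 192.168.1.100 from matching 192.168.1.10. The code for that fix is not shown on this page, so the following is only a shell sketch of the general pitfall, assuming a lookup against a table laid out like Linux's /proc/net/arp. The sample rows, the function names lookup_naive and lookup_exact, and the reading of flag 0x2 as a complete entry (compare commit 03a4586) are illustrative assumptions, not the PR's implementation.

```shell
#!/bin/sh
# Illustrative table in the column layout of /proc/net/arp (made-up data).
arp_table='IP address       HW type     Flags       HW address            Mask     Device
192.168.1.100    0x1         0x2         aa:bb:cc:dd:ee:01     *        eth0
192.168.1.10     0x1         0x2         aa:bb:cc:dd:ee:02     *        eth0
192.168.1.11     0x1         0x0         00:00:00:00:00:00     *        eth0'

# Naive lookup: fixed-string substring match, so 192.168.1.10 also hits 192.168.1.100.
lookup_naive() {
    printf '%s\n' "$arp_table" | grep -F "$1" | awk '{print $4}'
}

# Exact lookup: compare the whole first field and require the 0x2 flag
# (assumed here to mark a complete entry).
lookup_exact() {
    printf '%s\n' "$arp_table" | awk -v ip="$1" '$1 == ip && $3 == "0x2" {print $4}'
}

echo "naive:"; lookup_naive 192.168.1.10
echo "exact:"; lookup_exact 192.168.1.10
```

The exact variant compares whole fields rather than substrings, which is one way to keep a shorter address from matching a longer one that shares its prefix.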
8 changes: 4 additions & 4 deletions .github/workflows/main.yml
@@ -20,7 +20,7 @@ jobs:
 strategy:
 matrix:
 os: [self-hosted, ubuntu-latest, macos-latest]
-python-version: ['3.8', '3.10']
+python-version: ['3.8', '3.10', '3.12']
 include:
 - os: ubuntu-20.04
 python-version: '3.6'
@@ -39,6 +39,7 @@ jobs:
 exuberant-ctags \
 gfortran \
 git \
+libhwloc-dev \
 libopenblas-dev \
 pkg-config \
 software-properties-common
@@ -50,8 +51,9 @@ jobs:
 ctags-exuberant \
 gawk \
 gnu-sed \
+hwloc \
 pkg-config
-- uses: actions/setup-python@v4.3.0
+- uses: actions/setup-python@v5.0.0
 with:
 python-version: ${{ matrix.python-version }}
 - name: "Software Install - Python"
@@ -110,8 +112,6 @@ jobs:
 coverage run --source=bifrost.ring,bifrost,bifrost.pipeline test_fft.py
 coverage run --source=bifrost.ring,bifrost,bifrost.pipeline your_first_block.py
 python download_breakthrough_listen_data.py -y
-coverage run --source=bifrost.ring,bifrost,bifrost.pipeline test_guppi.py
-coverage run --source=bifrost.ring,bifrost,bifrost.pipeline test_guppi_reader.py
 coverage run --source=bifrost.ring,bifrost,bifrost.pipeline test_fdmt.py ./testdata/pulsars/blc0_guppi_57407_61054_PSR_J1840%2B5640_0004.fil
 coverage xml
 - name: "Upload Coverage"
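To make the effect of the matrix change concrete, here is a small shell sketch that lists the jobs implied by the matrix lines visible in the first hunk. It assumes there are no further include or exclude entries outside the hunk, and the function name expand_matrix is made up for this illustration.

```shell
#!/bin/sh
# expand_matrix: print one line per job implied by the matrix lines shown in the hunk.
# Assumption: no other include/exclude entries exist elsewhere in the workflow.
expand_matrix() {
    for os in self-hosted ubuntu-latest macos-latest; do
        for py in 3.8 3.10 3.12; do
            echo "os=$os python=$py"
        done
    done
    # The include entry names an os that is not in the base list,
    # so it adds a separate job instead of modifying an existing one.
    echo "os=ubuntu-20.04 python=3.6"
}

expand_matrix
```

Under those assumptions, adding '3.12' takes the workflow from seven jobs to ten.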
2 changes: 1 addition & 1 deletion Makefile.in
@@ -21,7 +21,7 @@ libbifrost:
 test:
 #$(MAKE) -C $(SRC_DIR) test
 ifeq ($(HAVE_PYTHON),1)
-cd test && ./download_test_data.sh ; python -m unittest discover
+cd test && ./download_test_data.sh ; python -m unittest discover -v
 endif
 .PHONY: test
 clean:
72 changes: 48 additions & 24 deletions config/cuda.m4
@@ -14,6 +14,11 @@ AC_DEFUN([AX_CHECK_CUDA],
 [enable_cuda=no],
 [enable_cuda=yes])

+NVCCLIBS=""
+ac_compile_save="$ac_compile"
+ac_link_save="$ac_link"
+ac_run_save="$ac_run"
+
 AC_SUBST([HAVE_CUDA], [0])
 AC_SUBST([CUDA_VERSION], [0])
 AC_SUBST([CUDA_HAVE_CXX20], [0])
@@ -38,31 +43,43 @@ AC_DEFUN([AX_CHECK_CUDA],

CXXFLAGS_save="$CXXFLAGS"
LDFLAGS_save="$LDFLAGS"
LIBS_save="$LIBS"
NVCCLIBS_save="$NVCCLIBS"

ac_compile='$NVCC -c $NVCCFLAGS conftest.$ac_ext >&5'
LDFLAGS="-L$CUDA_HOME/lib64 -L$CUDA_HOME/lib"
LIBS="$LIBS -lcuda -lcudart"
NVCCLIBS="$LIBS -lcuda -lcudart"

ac_link='$NVCC -o conftest$ac_exeext $NVCCFLAGS $LDFLAGS $LIBS conftest.$ac_ext >&5'
AC_LINK_IFELSE([
AC_LANG_PROGRAM([[
#include <cuda.h>
#include <cuda_runtime.h>]],
[[cudaMalloc(0, 0);]])],
[CUDA_VERSION=$( ${NVCC} --version | ${GREP} -Po -e "release.*," | cut -d, -f1 | cut -d\ -f2 )
CUDA_MAJOR=$( echo "${CUDA_VERSION}" | cut -d. -f1 )
if test "${CUDA_MAJOR}" -ge 10; then
AC_MSG_RESULT(yes - v$CUDA_VERSION)
else
AC_MSG_RESULT(no - found v$CUDA_VERSION)
fi],
[AC_MSG_RESULT(no - build failure)
AC_SUBST([HAVE_CUDA], [0])])
[],
[AC_SUBST([HAVE_CUDA], [0])])

if test "$HAVE_CUDA" = "1"; then
LDFLAGS="-L$CUDA_HOME/lib64 -L$CUDA_HOME/lib"
NVCCLIBS="$NVCCLIBS -lcuda -lcudart"

ac_link='$NVCC -o conftest$ac_exeext $NVCCFLAGS $LDFLAGS $LIBS $NVCCLIBS conftest.$ac_ext >&5'
AC_LINK_IFELSE([
AC_LANG_PROGRAM([[
#include <cuda.h>
#include <cuda_runtime.h>]],
[[cudaMalloc(0, 0);]])],
[CUDA_VERSION=$( ${NVCC} --version | ${GREP} -Po -e "release.*," | cut -d, -f1 | cut -d\ -f2 )
AC_MSG_RESULT(yes - v$CUDA_VERSION)],
[AC_MSG_RESULT(no)
AC_SUBST([HAVE_CUDA], [0])])
else
AC_MSG_RESULT(no)
AC_SUBST([HAVE_CUDA], [0])
fi

CXXFLAGS="$CXXFLAGS_save"
LDFLAGS="$LDFLAGS_save"
LIBS="$LIBS_save"
NVCCLIBS="$NVCCLIBS_save"
fi

if test "$HAVE_CUDA" = "1"; then
@@ -131,7 +148,7 @@ AC_DEFUN([AX_CHECK_CUDA],
 CXXFLAGS="$CXXFLAGS -DBF_CUDA_ENABLED=1"
 NVCCFLAGS="$NVCCFLAGS -DBF_CUDA_ENABLED=1"
 LDFLAGS="$LDFLAGS -L$CUDA_HOME/lib64 -L$CUDA_HOME/lib"
-LIBS="$LIBS -lcuda -lcudart -lnvrtc -lcublas -lcudadevrt -L. -lcufft_static_pruned -lculibos -lnvToolsExt"
+NVCCLIBS="$NVCCLIBS -lcuda -lcudart -lnvrtc -lcublas -lcudadevrt -L. -lcufft_static_pruned -lculibos -lnvToolsExt"
 fi

 AC_ARG_WITH([gpu_archs],
@@ -150,11 +167,11 @@ AC_DEFUN([AX_CHECK_CUDA],

 CXXFLAGS_save="$CXXFLAGS"
 LDFLAGS_save="$LDFLAGS"
-LIBS_save="$LIBS"
+NVCCLIBS_save="$NVCCLIBS"

 LDFLAGS="-L$CUDA_HOME/lib64 -L$CUDA_HOME/lib"
-LIBS="-lcuda -lcudart"
-ac_run='$NVCC -o conftest$ac_ext $LDFLAGS $LIBS conftest.$ac_ext>&5'
+NVCCLIBS="-lcuda -lcudart"
+ac_run='$NVCC -o conftest$ac_ext $LDFLAGS $LIBS $NVCCLIBS conftest.$ac_ext>&5'
 AC_RUN_IFELSE([
 AC_LANG_PROGRAM([[
 #include <cuda.h>
@@ -204,7 +221,7 @@ AC_DEFUN([AX_CHECK_CUDA],

 CXXFLAGS="$CXXFLAGS_save"
 LDFLAGS="$LDFLAGS_save"
-LIBS="$LIBS_save"
+NVCCLIBS="$NVCCLIBS_save"
 else
 AC_SUBST([GPU_ARCHS], [$with_gpu_archs])
 fi
@@ -234,10 +251,10 @@ AC_DEFUN([AX_CHECK_CUDA],

 CXXFLAGS_save="$CXXFLAGS"
 LDFLAGS_save="$LDFLAGS"
-LIBS_save="$LIBS"
+NVCCLIBS_save="$NVCCLIBS"

 LDFLAGS="-L$CUDA_HOME/lib64 -L$CUDA_HOME/lib"
-LIBS="-lcuda -lcudart"
+NVCCLIBS="-lcuda -lcudart"
 ac_run='$NVCC -o conftest$ac_ext $LDFLAGS $LIBS conftest.$ac_ext>&5'
 AC_RUN_IFELSE([
 AC_LANG_PROGRAM([[
@@ -275,7 +292,7 @@ AC_DEFUN([AX_CHECK_CUDA],

 CXXFLAGS="$CXXFLAGS_save"
 LDFLAGS="$LDFLAGS_save"
-LIBS="$LIBS_save"
+NVCCLIBS="$NVCCLIBS_save"
 else
 AC_SUBST([GPU_SHAREDMEM], [$with_shared_mem])
 fi
@@ -293,11 +310,11 @@ AC_DEFUN([AX_CHECK_CUDA],
 AC_MSG_CHECKING([for thrust pinned allocated support])
 CXXFLAGS_save="$CXXFLAGS"
 LDFLAGS_save="$LDFLAGS"
-LIBS_save="$LIBS"
+NVCCLIBS_save="$NVCCLIBS"

 LDFLAGS="-L$CUDA_HOME/lib64 -L$CUDA_HOME/lib"
-LIBS="-lcuda -lcudart"
-ac_run='$NVCC -o conftest$ac_ext $LDFLAGS $LIBS conftest.$ac_ext>&5'
+NVCCLIBS="-lcuda -lcudart"
+ac_run='$NVCC -o conftest$ac_ext $LDFLAGS $LIBS $NVCCLIBS conftest.$ac_ext>&5'
 AC_RUN_IFELSE([
 AC_LANG_PROGRAM([[
 #include <cuda.h>
@@ -311,6 +328,13 @@ AC_DEFUN([AX_CHECK_CUDA],

 CXXFLAGS="$CXXFLAGS_save"
 LDFLAGS="$LDFLAGS_save"
-LIBS="$LIBS_save"
+NVCCLIBS="$NVCCLIBS_save"
+else
+AC_SUBST([GPU_PASCAL_MANAGEDMEM], [0])
+AC_SUBST([GPU_EXP_PINNED_ALLOC], [1])
 fi
+
+ac_compile="$ac_compile_save"
+ac_link="$ac_link_save"
+ac_run="$ac_run_save"
 ])
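The CUDA_VERSION pipeline in the second cuda.m4 hunk is easier to follow when run outside configure. The sketch below feeds it a hypothetical two-line sample of `nvcc --version` output instead of calling nvcc, and also reproduces the major-version comparison that appears in one variant of that hunk. The function names parse_cuda_version and report_cuda and the sample text are assumptions made for this illustration. The macro calls grep with -Po; plain -o is used here because it is enough for this pattern and does not need PCRE support.

```shell
#!/bin/sh
# Read `nvcc --version` text on stdin and print the release number.
# Same pipeline as the macro, except grep -o replaces grep -Po.
parse_cuda_version() {
    grep -o -e "release.*," | cut -d, -f1 | cut -d' ' -f2
}

# Print result strings in the style of the hunk variant that checks the major version.
report_cuda() {
    version=$(parse_cuda_version)
    major=$(echo "$version" | cut -d. -f1)
    if test "$major" -ge 10; then
        echo "yes - v$version"
    else
        echo "no - found v$version"
    fi
}

# Hypothetical sample; real nvcc output has more lines around the "release X.Y," field.
sample='nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 11.8, V11.8.89'

printf '%s\n' "$sample" | report_cuda
```

The first cut drops everything from the comma onward, and the second keeps the word after "release", so only the dotted version number remains.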
121 changes: 121 additions & 0 deletions config/intrinsics.m4
@@ -0,0 +1,121 @@
#
# SSE
#

AC_DEFUN([AX_CHECK_SSE],
[
AC_PROVIDE([AX_CHECK_SSE])
AC_ARG_ENABLE([sse],
[AS_HELP_STRING([--disable-sse],
[disable SSE support (default=no)])],
[enable_sse=no],
[enable_sse=yes])

AC_SUBST([HAVE_SSE], [0])

if test "$enable_sse" = "yes"; then
AC_MSG_CHECKING([for SSE support via '-msse'])

CXXFLAGS_temp="$CXXFLAGS -msse"

ac_run="$CXX -o conftest$ac_ext $CXXFLAGS_temp conftest.$ac_ext>&5"
AC_RUN_IFELSE([
AC_LANG_PROGRAM([[
#include <xmmintrin.h>]],
[[
__m128 x = _mm_set1_ps(1.0f);
x = _mm_add_ps(x, x);
return _mm_cvtss_f32(x) != 2.0f;]])],
[
CXXFLAGS="$CXXFLAGS -msse"
AC_SUBST([HAVE_SSE], [1])
AC_MSG_RESULT([yes])
], [
AC_MSG_RESULT([no])
])
fi
])



#
# AVX
#

AC_DEFUN([AX_CHECK_AVX],
[
AC_PROVIDE([AX_CHECK_AVX])
AC_ARG_ENABLE([avx],
[AS_HELP_STRING([--disable-avx],
[disable AVX support (default=no)])],
[enable_avx=no],
[enable_avx=yes])

AC_SUBST([HAVE_AVX], [0])

if test "$enable_avx" = "yes"; then
AC_MSG_CHECKING([for AVX support via '-mavx'])

CXXFLAGS_temp="$CXXFLAGS -mavx"
ac_run_save="$ac_run"

ac_run="$CXX -o conftest$ac_ext $CXXFLAGS_temp conftest.$ac_ext>&5"
AC_RUN_IFELSE([
AC_LANG_PROGRAM([[
#include <immintrin.h>]],
[[
__m256d x = _mm256_set1_pd(1.0);
x = _mm256_add_pd(x, x);
return _mm256_cvtsd_f64(x) != 2.0;]])],
[
CXXFLAGS="$CXXFLAGS -mavx"
AC_SUBST([HAVE_AVX], [1])
AC_MSG_RESULT([yes])
], [
AC_MSG_RESULT([no])
])

ac_run="$ac_run_save"
fi
])

#
# AVX512
#

AC_DEFUN([AX_CHECK_AVX512],
[
AC_PROVIDE([AX_CHECK_AVX512])
AC_ARG_ENABLE([avx512],
[AS_HELP_STRING([--disable-avx512],
[disable AVX512 support (default=no)])],
[enable_avx512=no],
[enable_avx512=yes])

AC_SUBST([HAVE_AVX512], [0])

if test "$enable_avx512" = "yes"; then
AC_MSG_CHECKING([for AVX-512 support via '-mavx512f'])

CXXFLAGS_temp="$CXXFLAGS -mavx512f"
ac_run_save="$ac_run"

ac_run="$CXX -o conftest$ac_ext $CXXFLAGS_temp conftest.$ac_ext>&5"
AC_RUN_IFELSE([
AC_LANG_PROGRAM([[
#include <immintrin.h>]],
[[
__m512d x = _mm512_set1_pd(1.0);
x = _mm512_add_pd(x, x);
return _mm512_cvtsd_f64(x) != 2.0;]])],
[
CXXFLAGS="$CXXFLAGS -mavx512f"
AC_SUBST([HAVE_AVX512], [1])
AC_MSG_RESULT([yes])
], [
AC_MSG_RESULT([no])
])

ac_run="$ac_run_save"
fi
])
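The three macros above share one idea: build a tiny program that uses an intrinsic, run it, and treat a zero exit status as support. The shell sketch below tries that idea by hand, outside autoconf. The function name probe_simd, its argument order, and the way the flag is passed straight to the compiler are assumptions made for this illustration, and it makes no claim about how AC_RUN_IFELSE itself builds the test program. The SSE test body is copied from AX_CHECK_SSE.

```shell
#!/bin/sh
# probe_simd CXX FLAG HEADER BODY
# Compile a one-function program with FLAG, run it, and print "yes" or "no".
probe_simd() {
    cxx=$1; flag=$2; header=$3; body=$4
    tmpdir=$(mktemp -d) || return 1
    printf '#include <%s>\nint main() {\n%s\n}\n' "$header" "$body" \
        > "$tmpdir/conftest.cpp"
    # Both the compile and the run must succeed; a missing compiler,
    # a rejected flag, or a non-zero exit all count as "no".
    if $cxx $flag -o "$tmpdir/conftest" "$tmpdir/conftest.cpp" >/dev/null 2>&1 \
        && "$tmpdir/conftest" >/dev/null 2>&1; then
        echo yes
    else
        echo no
    fi
    rm -rf "$tmpdir"
}

# Test body taken from AX_CHECK_SSE; the compiler defaults to c++ if CXX is unset.
probe_simd "${CXX:-c++}" -msse xmmintrin.h \
    '__m128 x = _mm_set1_ps(1.0f); x = _mm_add_ps(x, x); return _mm_cvtss_f32(x) != 2.0f;'
```

Running the binary, rather than only compiling it, matters on build hosts whose compiler accepts a flag such as -mavx512f while the CPU lacks the instructions.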