Trouble Configuring HDF5 #20

Closed
fredt00 opened this issue Jan 11, 2024 · 8 comments

Comments

@fredt00

fredt00 commented Jan 11, 2024

Hi,

I'm having some trouble building the code with HDF5 enabled. I configured with the options:

FC=mpif90 ./configure --enable-simd=sse --enable-mcmodel=large --enable-hdf5 --disable-gpu

And then followed the instructions in the examples to link HDF5 correctly (https://github.com/nbody6ppgpu/Nbody6PPGPU-beijing/tree/stable/examples). However I found that I kept getting errors like this:

custom_output_facility.o: In function `output_merger_':
custom_output_facility.F:(.text+0x3cc4): undefined reference to `__h5g_MOD_h5gcreate_f'
custom_output_facility.F:(.text+0x4766): undefined reference to `__h5f_MOD_h5fflush_f'
custom_output_facility.F:(.text+0x4775): undefined reference to `__h5g_MOD_h5gclose_f'
custom_output_facility.F:(.text+0x48ba): undefined reference to `__h5g_MOD_h5gopen_f'
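
One way to narrow down errors like these is to check whether the library you pass to the linker actually defines the missing symbols. The sketch below is only an illustration: `defines_symbol` is a made-up helper name, and it assumes GNU `nm` and a shared library. The `__h5g_MOD_...` names follow gfortran's module name mangling, so a library built with a different Fortran compiler may export different names.

```shell
#!/bin/sh
# Hypothetical helper: succeed if shared library LIB defines SYMBOL.
# Usage: defines_symbol LIB SYMBOL
defines_symbol() {
    # -D reads the dynamic symbol table, which is still there in stripped libraries.
    # The symbol name is the last field of each nm output line.
    nm -D --defined-only "$1" 2>/dev/null | awk '{ print $NF }' | grep -Fqx -- "$2"
}

# Example (the path is the one used in this thread; adjust it for your system):
#   defines_symbol /usr/lib/x86_64-linux-gnu/hdf5/openmpi/libhdf5_fortran.so \
#       __h5g_MOD_h5gcreate_f && echo "symbol present"
```

If the symbol is present but the link still fails, the library is probably not reaching the linker at all, or it appears in the wrong position on the link line.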

In the end I finally managed to execute:
make clean && make -j

But I had to edit the build/Makefile created by configure to read on line 25:
HDF5_FLAGS = -D H5OUTPUT -I/usr/include/hdf5/openmpi

and on lines 120 and 129:

$(RESULT): $(OBJECTS) $(EXTRAOBJ)
	$(FC) $(FFLAGS) $(LDFLAGS) $(OBJECTS) $(EXTRAOBJ) -lstdc++ -L/usr/lib/x86_64-linux-gnu/hdf5/openmpi -lhdf5_fortran

nb6++dumpb2a: dump_btoa.F
	$(FC) $(FFLAGS) $^ -o nb6++dumpb2a -L/usr/lib/x86_64-linux-gnu/hdf5/openmpi -lhdf5_fortran

This successfully created nbody6++.sse.mpi.hdf5. However, when I run it on an input file that worked with the version I had before I enabled HDF5, I immediately get the error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x15182960b2ed in ???
#1 0x15182960a503 in ???
#2 0x151828a58f0f in ???
#3 0x151827bd62f9 in ???
#4 0x151827bd6691 in ???
#5 0x151827bd66f0 in ???
#6 0x151828200264 in ???
#7 0x1518282212aa in ???
#8 0x151829a18a17 in ???
#9 0x563082f22bcd in ???
#10 0x563082f1ea65 in ???
#11 0x151828a3bc86 in ???
#12 0x563082f1ec09 in ???
#13 0xffffffffffffffff in ???
/mnt/zfsusers/fthompson/nb6_exec: line 4: 28113 Segmentation fault (core dumped) $HOME/Nbody6PPGPU-beijing/build/nbody6++.sse.mpi.hdf5 < $1

Any help with this would be much appreciated!

@scaedufax
Contributor

scaedufax commented Jan 11, 2024

Hi @fredt00,

unfortunately the HDF5 build is a bit buggy and requires a few manual tricks.

The --enable-hdf5 flag is entirely broken as of now, and you should not use it with ./configure.

The key issue seems to be that hdf5_fortran is not being linked correctly. I think you've linked hdf5_fortran in the wrong places.

The overall steps for building Nbody6++GPU with HDF5 support are as follows (details may vary):

  1. Run ./configure without --enable-hdf5. Your other flags should be fine. (You might want to consider --disable-mpi depending on your setup, or for testing purposes, though this is definitely not the issue.) So:
    ./configure --enable-simd=sse --enable-mcmodel=large --disable-gpu
    (Note that omitting --enable-hdf5 should also remove the need to set the Fortran compiler manually with FC=mpif90; configure should pick it automatically.)
  2. Ensure that line 25 in build/Makefile looks something like
    HDF5_FLAGS = -D H5OUTPUT -I/path/to/hdf5_fortran/include -L/path/to/hdf5_fortran/lib -lhdf5_fortran
    make sure to adjust the /path/to/[...] stuff i.e., from what I can see in your case it should be:
    HDF5_FLAGS = -D H5OUTPUT -I/usr/include/hdf5/openmpi -L/usr/lib/x86_64-linux-gnu/hdf5/openmpi -lhdf5_fortran
  3. Ensure that line 30 in build/Makefile contains ${HDF5_FLAGS}, i.e., it looks something like this:
    FFLAGS = -O3 -fPIC -mcmodel=large -fopenmp -I../include $(GPU_FLAGS) $(MPI_FLAGS) ${SIMD_FLAGS} ${OMP_FLAGS} ${HDF5_FLAGS}
  4. (Optionally but for convention) append .hdf5 to RESULT in both build/Makefile and Makefile, for example
    RESULT = nbody6++.sse.mpi.hdf5
  5. build using make clean and make -j

You should not need to touch lines 120 and 129 if you follow these steps. Let me know if it helps! The key lines in the Makefile are 25 (HDF5_FLAGS) and 30 (FFLAGS).
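
Steps 2 and 3 can also be applied by variable name instead of by line number, which may help if your generated Makefile differs slightly. The following is a minimal sketch, not part of the project: `patch_hdf5_flags` is a hypothetical helper, and it assumes the Makefile contains one `HDF5_FLAGS =` line and one `FFLAGS =` line.

```shell
#!/bin/sh
# Hypothetical helper: apply steps 2 and 3 to a Makefile by variable name.
# Usage: patch_hdf5_flags MAKEFILE INCLUDE_DIR LIB_DIR
patch_hdf5_flags() {
    mk=$1 inc=$2 lib=$3
    tmp=$(mktemp) || return 1
    # Step 2: rewrite the HDF5_FLAGS assignment.
    # Step 3: append ${HDF5_FLAGS} to FFLAGS unless it is already referenced there.
    sed -e "s|^HDF5_FLAGS[[:space:]]*=.*|HDF5_FLAGS = -D H5OUTPUT -I$inc -L$lib -lhdf5_fortran|" \
        -e '/^FFLAGS[[:space:]]*=/{/HDF5_FLAGS/!s|$| ${HDF5_FLAGS}|;}' \
        "$mk" > "$tmp" && cat "$tmp" > "$mk"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example, run from the source tree after ./configure:
#   patch_hdf5_flags build/Makefile /usr/include/hdf5/openmpi \
#       /usr/lib/x86_64-linux-gnu/hdf5/openmpi
```

Running the helper twice leaves the file unchanged the second time. Whether flags placed in FFLAGS end up after the object files on the final link line depends on how the Makefile rules are written.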

On a side note: I know that this all works using mpif77, and if I remember correctly, that's what's used most of the time. I don't know about mpif90.
EDIT: just tested mpif90-compiler and it works just fine...

best wishes
Uli

@fredt00
Author

fredt00 commented Jan 12, 2024

Hi Uli,

Thanks for the quick reply!

I followed your steps, however I still get errors when I do make clean && make -j:

custom_output_facility.o: In function `hdf5_close_.part.0':
custom_output_facility.F:(.text+0x31): undefined reference to `__h5f_MOD_h5fclose_f'
custom_output_facility.o: In function `hdf5_write_integer_vector_as_dset_':
custom_output_facility.F:(.text+0x28c): undefined reference to `__h5s_MOD_h5screate_simple_f'
custom_output_facility.F:(.text+0x2d9): undefined reference to `__h5p_MOD_h5pcreate_f'
custom_output_facility.F:(.text+0x2ed): undefined reference to `__h5p_MOD_h5pset_chunk_f'
custom_output_facility.F:(.text+0x32b): undefined reference to `__h5d_MOD_h5dcreate_f'
custom_output_facility.F:(.text+0x3a2): undefined reference to `__h5_gen_MOD_h5dwrite_ikind_4_rank_1'
custom_output_facility.F:(.text+0x3cb): undefined reference to `__h5d_MOD_h5dclose_f'
custom_output_facility.F:(.text+0x433): undefined reference to `__h5s_MOD_h5screate_simple_f'
custom_output_facility.F:(.text+0x49a): undefined reference to `__h5d_MOD_h5dset_extent_f'
custom_output_facility.F:(.text+0x4e8): undefined reference to `__h5s_MOD_h5screate_simple_f'
custom_output_facility.F:(.text+0x505): undefined reference to `__h5d_MOD_h5dget_space_f'
custom_output_facility.F:(.text+0x54b): undefined reference to `__h5s_MOD_h5sselect_hyperslab_f'
custom_output_facility.F:(.text+0x5c9): undefined reference to `__h5_gen_MOD_h5dwrite_ikind_4_rank_1'
custom_output_facility.F:(.text+0x5d9): undefined reference to `__h5s_MOD_h5sclose_f'
custom_output_facility.o: In function `hdf5_write_real_vector_as_dset_':
custom_output_facility.F:(.text+0x750): undefined reference to `__h5s_MOD_h5screate_simple_f'
custom_output_facility.F:(.text+0x79d): undefined reference to `__h5p_MOD_h5pcreate_f'
custom_output_facility.F:(.text+0x7b1): undefined reference to `__h5p_MOD_h5pset_chunk_f'
custom_output_facility.F:(.text+0x802): undefined reference to `__h5d_MOD_h5dcreate_f'
custom_output_facility.F:(.text+0x873): undefined reference to `__h5_gen_MOD_h5dwrite_rkind_4_rank_1'
custom_output_facility.F:(.text+0x895): undefined reference to `__h5d_MOD_h5dclose_f'
custom_output_facility.F:(.text+0x8fb): undefined reference to `__h5s_MOD_h5screate_simple_f'
custom_output_facility.F:(.text+0x972): undefined reference to `__h5d_MOD_h5dset_extent_f'
custom_output_facility.F:(.text+0x9b7): undefined reference to `__h5s_MOD_h5screate_simple_f'
custom_output_facility.F:(.text+0x9d4): undefined reference to `__h5d_MOD_h5dget_space_f'
custom_output_facility.F:(.text+0xa1a): undefined reference to `__h5s_MOD_h5sselect_hyperslab_f'
custom_output_facility.o: In function `hdf5_write_attribute_scalar_real_':
custom_output_facility.F:(.text+0xb3b): undefined reference to `__h5s_MOD_h5screate_f'
custom_output_facility.F:(.text+0xbab): undefined reference to `__h5a_MOD_h5acreate_f'
custom_output_facility.F:(.text+0xc0c): undefined reference to `__h5_gen_MOD_h5awrite_rkind_4_rank_0'
custom_output_facility.F:(.text+0xc33): undefined reference to `__h5a_MOD_h5aclose_f'
custom_output_facility.F:(.text+0xc48): undefined reference to `__h5s_MOD_h5sclose_f'
custom_output_facility.o: In function `hdf5_write_attribute_scalar_integer_':
custom_output_facility.F:(.text+0xccb): undefined reference to `__h5s_MOD_h5screate_f'
custom_output_facility.F:(.text+0xd3b): undefined reference to `__h5a_MOD_h5acreate_f'
custom_output_facility.F:(.text+0xd94): undefined reference to `__h5_gen_MOD_h5awrite_ikind_4_rank_0'
custom_output_facility.F:(.text+0xdbb): undefined reference to `__h5a_MOD_h5aclose_f'
custom_output_facility.F:(.text+0xdd0): undefined reference to `__h5s_MOD_h5sclose_f'
custom_output_facility.o: In function `hdf5_init_':
custom_output_facility.F:(.text+0xea2): undefined reference to `__h5lib_MOD_h5open_f'
custom_output_facility.F:(.text+0xf3d): undefined reference to `__h5f_MOD_h5fcreate_f'
custom_output_facility.o: In function `hdf5_write_attribute_simple_real_':
custom_output_facility.F:(.text+0x10d9): undefined reference to `__h5s_MOD_h5screate_f'
custom_output_facility.F:(.text+0x1130): undefined reference to `__h5a_MOD_h5acreate_f'
custom_output_facility.F:(.text+0x11be): undefined reference to `__h5_gen_MOD_h5awrite_rkind_4_rank_1'
custom_output_facility.F:(.text+0x11cd): undefined reference to `__h5a_MOD_h5aclose_f'
custom_output_facility.F:(.text+0x11e2): undefined reference to `__h5s_MOD_h5sclose_f'
custom_output_facility.o: In function `output_single_':
custom_output_facility.F:(.text+0x1697): undefined reference to `__h5g_MOD_h5gcreate_f'
custom_output_facility.F:(.text+0x18ae): undefined reference to `__h5g_MOD_h5gopen_f'
custom_output_facility.F:(.text+0x21ef): undefined reference to `__h5f_MOD_h5fflush_f'
custom_output_facility.F:(.text+0x21fe): undefined reference to `__h5g_MOD_h5gclose_f'
custom_output_facility.o: In function `output_binary_':
custom_output_facility.F:(.text+0x255d): undefined reference to `__h5g_MOD_h5gcreate_f'
custom_output_facility.F:(.text+0x337a): undefined reference to `__h5f_MOD_h5fflush_f'
custom_output_facility.F:(.text+0x3389): undefined reference to `__h5g_MOD_h5gclose_f'
custom_output_facility.F:(.text+0x34d0): undefined reference to `__h5g_MOD_h5gopen_f'
custom_output_facility.o: In function `output_merger_':
custom_output_facility.F:(.text+0x3cc4): undefined reference to `__h5g_MOD_h5gcreate_f'
custom_output_facility.F:(.text+0x4766): undefined reference to `__h5f_MOD_h5fflush_f'
custom_output_facility.F:(.text+0x4775): undefined reference to `__h5g_MOD_h5gclose_f'
custom_output_facility.F:(.text+0x48ba): undefined reference to `__h5g_MOD_h5gopen_f'
collect2: error: ld returned 1 exit status
Makefile:121: recipe for target 'nbody6++.sse.mpi.hdf5' failed
make[1]: *** [nbody6++.sse.mpi.hdf5] Error 1
make[1]: Leaving directory '/mnt/zfsusers/fthompson/Nbody6PPGPU-beijing/build'
Makefile:15: recipe for target 'nbody6++.sse.mpi.hdf5' failed
make: *** [nbody6++.sse.mpi.hdf5] Error 2

This only goes away if I instead specify the linker commands after the object files in line 120 and 129 like before. But then I still get the segmentation fault on an input file that worked before I tried enabling hdf5.

Best,
Fred

@FrancescoFlammini
Member

Hi Fred,

can you please share your "config.log" file with us?

That way we can see which compilers are being used to build the files. It would also help to update your compilers, just in case (this normally solves several issues).

Also, did you add .hdf5 to the build result (as Uli suggested in point 4), or did you forget to remove "--enable-hdf5" from the ./configure command? If it is the latter, that is the issue.

Cheers,
Francesco

@fredt00
Author

fredt00 commented Jan 12, 2024

Hi Francesco,

Here's my config.log file:
config.log

I'm running this on a shared cluster so I don't have the permissions to update the compilers, but I'll ask if this is possible.

And yes, I removed --enable-hdf5 and added ${HDF5_FLAGS} to Makefile with .hdf5 at the end of the result.

@FrancescoFlammini
Member

Hi Fred,

I checked your config.log and I see that you don't have CUDA installed. CUDA is necessary for MPI computation. If you have it installed, you need to update the path in your ".bashrc" file:

export PATH=$PATH:/usr/local/cuda-7.5/bin
export CUDADIR=/usr/local/cuda-7.5

This example is for CUDA 7.5, but you may have a different version. If you have a non-NVIDIA GPU (such as Intel), you cannot use it for now (we are also working on AMD support, but that is still in progress).

No worries, you can still use the code without it (just pass --disable-mpi to configure). If you still encounter issues, please come back here.

best regards,
Francesco

@fredt00
Author

fredt00 commented Jan 12, 2024

Hi Francesco,

Strangely, I am able to build nbody6++.sse.mpi without any CUDA specified as long as I pass --disable-gpu, and it still runs correctly with multiple cores.

However, even when I specify cuda (version 12.3), it fails to build as soon as I add the above flags to the Makefile for hdf5 with the same 'undefined reference' errors. Then if I remove the references to hdf5, I am able to build nbody6++.sse.mpi.gpu as well.

I tried --disable-mpi as well and was able to build nbody6++.sse. Adding the hdf5 information to the Makefile again generates the same errors.

However, editing the Makefile instead so that lines 25, 30, 120 and 129 read, respectively:

HDF5_FLAGS = -D H5OUTPUT -I/usr/include/hdf5/openmpi

FFLAGS = -I../extra_inc/nompi -O3 -fPIC -mcmodel=large -fopenmp -I../include ${SIMD_FLAGS} $(GPU_FLAGS) ${OMP_FLAGS} ${HDF5_FLAGS}

$(RESULT): $(OBJECTS) $(EXTRAOBJ)
	$(FC) $(FFLAGS) $(LDFLAGS) $(OBJECTS) $(EXTRAOBJ) -lstdc++ -L/usr/lib/x86_64-linux-gnu/hdf5/openmpi -lhdf5_fortran

nb6++dumpb2a: dump_btoa.F
	$(FC) $(FFLAGS) $^ -o nb6++dumpb2a -L/usr/lib/x86_64-linux-gnu/hdf5/openmpi -lhdf5_fortran

successfully built nbody6++.sse.hdf5 and I was able to run a 1000 particle simulation with the correct .h5part output.

I think the key is specifying the HDF5 linker flags after the object files: the library that defines the symbols has to follow the object code that references them. That said, I'm not sure why nbody6++.sse.mpi.hdf5 does not run correctly when I build it...
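
The link-order effect can be reproduced outside the Nbody6++ build with a tiny static library. This is only a sketch under assumptions: it needs `cc` and `ar`, and all file and function names are invented for the demonstration. With GNU ld the "library first" link typically fails, while some other linkers accept either order.

```shell
#!/bin/sh
# Illustrative only: every file and function name here is made up for the demo.
link_order_demo() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        printf 'int answer(void) { return 42; }\n' > answer.c
        printf 'int answer(void);\nint main(void) { return answer() == 42 ? 0 : 1; }\n' > main.c
        cc -c answer.c main.c || exit 1
        ar rcs libanswer.a answer.o || exit 1
        # Library before the object that needs it: a linker that scans its
        # arguments left to right has no pending reference yet and skips it.
        if cc -L. -lanswer main.o -o wrong 2>/dev/null; then
            echo "library first: linked"
        else
            echo "library first: undefined reference"
        fi
        # Library after the object: the reference is already pending.
        cc main.o -L. -lanswer -o right && ./right && echo "library last: ok"
    )
    status=$?
    rm -rf "$dir"
    return $status
}

link_order_demo
```

The same reasoning applies to -lhdf5_fortran: when the flag appears before the object files on the link line, the linker may drop the library before it sees the references in custom_output_facility.o.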

Best,
Fred

@scaedufax
Contributor

scaedufax commented Jan 14, 2024

Hi Fred,

I think I've managed to reproduce errors similar to yours. There seems to be a problem with HDF5 when not using MPI and/or CUDA. It might also be related to older versions...

In short what did the trick for me was using the h5fc wrapper as Fortran compiler.

So basically follow my first answer, add the --disable-mpi flag to the configure script, and make sure to set FC=h5fc in line 14 of build/Makefile after step 1 and before step 5. At least that's how I managed to run the 1k test simulation on an older ubuntu bionic machine.
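
As a rough sketch of the FC change, the helper below edits the compiler assignment by variable name instead of by line number. `set_fortran_compiler` is a hypothetical name, and the sketch assumes build/Makefile sets the compiler on a single line starting with `FC =`. Running `h5fc -show` prints the underlying compiler command without executing it, which is a quick way to see which compiler and HDF5 paths the wrapper would use.

```shell
#!/bin/sh
# Hypothetical helper: point the FC variable of a Makefile at another compiler.
# Usage: set_fortran_compiler MAKEFILE COMPILER
set_fortran_compiler() {
    mk=$1 fc=$2
    tmp=$(mktemp) || return 1
    # Only the FC assignment is rewritten; all other lines are copied unchanged.
    sed "s|^FC[[:space:]]*=.*|FC = $fc|" "$mk" > "$tmp" && cat "$tmp" > "$mk"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example, run from the source tree after ./configure:
#   h5fc -show                                  # print the command h5fc wraps
#   set_fortran_compiler build/Makefile h5fc
#   make clean && make -j
```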

I hope this helps and I'll find some time soon to investigate more closely what is the issue here...

kind regards
Uli

@kaiwu-astro
Member

I am closing this issue because there has been no activity for 6 months. If you have any further questions, feel free to reopen it.
