Improve nanopolish build suite #145

Closed
mmokrejs opened this issue May 22, 2017 · 10 comments

@mmokrejs
Contributor

Hi,
I would like to prepare a package definition file for nanopolish in Gentoo Linux. I see several problems that currently prevent me from doing that.

  1. make calls wget to download files. That should be moved to a download or fetch target and not be called by default.

  2. Ideally a top-level configure would be used, or at least the Makefile should be more configurable, so that it would use my system-wide installed hdf5 library (there is no reason to fetch and compile it for me at all).

  3. The compile process breaks on hdf5 anyway:

  CCLD     libhdf5.la
gcc: error: .libs/H5.o: No such file or directory
gcc: error: .libs/H5checksum.o: No such file or directory
gcc: error: .libs/H5dbg.o: No such file or directory
gcc: error: .libs/H5system.o: No such file or directory
gcc: error: .libs/H5timer.o: No such file or directory
gcc: error: .libs/H5trace.o: No such file or directory
gcc: error: .libs/H5A.o: No such file or directory
gcc: error: .libs/H5Abtree2.o: No such file or directory
gcc: error: .libs/H5Adense.o: No such file or directory
gcc: error: .libs/H5Adeprec.o: No such file or directory
gcc: error: .libs/H5Aint.o: No such file or directory
gcc: error: .libs/H5Atest.o: No such file or directory
gcc: error: .libs/H5AC.o: No such file or directory
gcc: error: .libs/H5B.o: No such file or directory
gcc: error: .libs/H5Bcache.o: No such file or directory
gcc: error: .libs/H5Bdbg.o: No such file or directory
gcc: error: .libs/H5B2.o: No such file or directory
gcc: error: .libs/H5B2cache.o: No such file or directory
gcc: error: .libs/H5B2dbg.o: No such file or directory
gcc: error: .libs/H5B2hdr.o: No such file or directory
gcc: error: .libs/H5B2int.o: No such file or directory
gcc: error: .libs/H5B2stat.o: No such file or directory
gcc: error: .libs/H5B2test.o: No such file or directory
gcc: error: .libs/H5C.o: No such file or directory
gcc: error: .libs/H5CS.o: No such file or directory
gcc: error: .libs/H5D.o: No such file or directory
gcc: error: .libs/H5Dbtree.o: No such file or directory
gcc: error: .libs/H5Dchunk.o: No such file or directory
gcc: error: .libs/H5Dcompact.o: No such file or directory
gcc: error: .libs/H5Dcontig.o: No such file or directory
gcc: error: .libs/H5Ddbg.o: No such file or directory
gcc: error: .libs/H5Ddeprec.o: No such file or directory
gcc: error: .libs/H5Defl.o: No such file or directory
gcc: error: .libs/H5Dfill.o: No such file or directory
gcc: error: .libs/H5Dint.o: No such file or directory
gcc: error: .libs/H5Dio.o: No such file or directory
make[3]: *** [Makefile:939: libhdf5.la] Error 1
make[3]: Leaving directory '/scratch/work/project/bio/nanopolish/hdf5-1.8.14/src'
make[2]: *** [Makefile:850: all] Error 2
make[2]: Leaving directory '/scratch/work/project/bio/nanopolish/hdf5-1.8.14/src'
make[1]: *** [Makefile:586: all-recursive] Error 1
make[1]: Leaving directory '/scratch/work/project/bio/nanopolish/hdf5-1.8.14'
make[1]: Entering directory '/scratch/work/project/bio/nanopolish/hdf5-1.8.14'
Making install in src
make[2]: Entering directory '/scratch/work/project/bio/nanopolish/hdf5-1.8.14/src'
  CCLD     libhdf5.la
gcc: error: .libs/H5.o: No such file or directory
gcc: error: .libs/H5checksum.o: No such file or directory
gcc: error: .libs/H5dbg.o: No such file or directory
gcc: error: .libs/H5system.o: No such file or directory
gcc: error: .libs/H5timer.o: No such file or directory
gcc: error: .libs/H5trace.o: No such file or directory
gcc: error: .libs/H5A.o: No such file or directory
gcc: error: .libs/H5Abtree2.o: No such file or directory
gcc: error: .libs/H5Adense.o: No such file or directory
gcc: error: .libs/H5Adeprec.o: No such file or directory
gcc: error: .libs/H5Aint.o: No such file or directory
gcc: error: .libs/H5Atest.o: No such file or directory
gcc: error: .libs/H5AC.o: No such file or directory
gcc: error: .libs/H5B.o: No such file or directory
gcc: error: .libs/H5Bcache.o: No such file or directory
gcc: error: .libs/H5Bdbg.o: No such file or directory
gcc: error: .libs/H5B2.o: No such file or directory
gcc: error: .libs/H5B2cache.o: No such file or directory
gcc: error: .libs/H5B2dbg.o: No such file or directory
gcc: error: .libs/H5B2hdr.o: No such file or directory
gcc: error: .libs/H5B2int.o: No such file or directory
gcc: error: .libs/H5B2stat.o: No such file or directory
gcc: error: .libs/H5B2test.o: No such file or directory
gcc: error: .libs/H5C.o: No such file or directory
gcc: error: .libs/H5CS.o: No such file or directory
gcc: error: .libs/H5D.o: No such file or directory
gcc: error: .libs/H5Dbtree.o: No such file or directory
gcc: error: .libs/H5Dchunk.o: No such file or directory
gcc: error: .libs/H5Dcompact.o: No such file or directory
gcc: error: .libs/H5Dcontig.o: No such file or directory
gcc: error: .libs/H5Ddbg.o: No such file or directory
gcc: error: .libs/H5Ddeprec.o: No such file or directory
gcc: error: .libs/H5Defl.o: No such file or directory
gcc: error: .libs/H5Dfill.o: No such file or directory
gcc: error: .libs/H5Dint.o: No such file or directory
gcc: error: .libs/H5Dio.o: No such file or directory
make[2]: *** [Makefile:939: libhdf5.la] Error 1
make[2]: Leaving directory '/scratch/work/project/bio/nanopolish/hdf5-1.8.14/src'
make[1]: *** [Makefile:586: install-recursive] Error 1
make[1]: Leaving directory '/scratch/work/project/bio/nanopolish/hdf5-1.8.14'
Makefile:100: .depend: No such file or directory
make: *** [Makefile:68: lib/libhdf5.a] Error 2
$ git log | head -n 1
commit 4d012ca95c28af4cdadd472b0ef702c64b6a2b7b
$
  4. Also, eigen is downloaded by wget from http://eigen.tuxfamily.org/ . Again, this should be disabled by default. I already have dev-cpp/eigen-3.3.3 installed site-wide.

  5. The Makefile should not override the user's CC, CXX, CFLAGS and CXXFLAGS. The important options can be appended to them instead. I propose the following:

CXXFLAGS ?= -g -O3
CXXFLAGS += -std=c++11 -fopenmp
CFLAGS?=-O3
CXX?=g++
CC?=gcc

This way I could use:

CFLAGS="-O2 -pipe -march=native -ftree-vectorize" CXXFLAGS="-O2 -pipe -march=native -ftree-vectorize" make

This is less aggressive than -O3 while making AVX instructions use the respective YMM registers (instead of XMM).

  6. There is no clean target in the Makefile.
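
The effect of the proposed `?=` / `+=` combination on CXXFLAGS can be checked with a throwaway Makefile. This is only a minimal sketch: it assumes GNU make is on the PATH, and the temporary Makefile and the `show_flags` helper are made up for the illustration and are not part of nanopolish.

```shell
#!/bin/sh
# Sketch: how `?=` followed by `+=` treats a CXXFLAGS value from the environment.
# Assumes GNU make; the demo Makefile and show_flags are hypothetical.

demo_mk=$(mktemp)
cat > "$demo_mk" <<'EOF'
CXXFLAGS ?= -g -O3
CXXFLAGS += -std=c++11 -fopenmp
$(info $(CXXFLAGS))
all: ;
EOF

# show_flags VALUE
# Prints the CXXFLAGS that make ends up with. An empty VALUE means
# "CXXFLAGS is not set in the environment at all".
show_flags() {
    if [ -n "$1" ]; then
        CXXFLAGS="$1" make -s -f "$demo_mk" | head -n 1
    else
        (unset CXXFLAGS; make -s -f "$demo_mk" | head -n 1)
    fi
}

show_flags ""            # prints: -g -O3 -std=c++11 -fopenmp
show_flags "-O2 -pipe"   # prints: -O2 -pipe -std=c++11 -fopenmp
```

With this layout the user's optimisation flags replace only the defaults, while the flags the build cannot do without are still appended.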
@jts
Owner

jts commented May 22, 2017

Hi,

Thanks for the feedback, I'd like to improve the build script. A few comments on the issues you raised:

  1. Most package managers will install the default HDF5, which is not threadsafe. This will cause problems for many users so we decided to automatically install a known version of HDF5 compiled with the correct options. I'm open to moving these commands to a download or fetch but I want to keep the automatic download as the default.

  2. If you set HDF5=noinstall (or any string really) it should use your own HDF5 and skip the download step (please tell me if this doesn't work).

  3. Can you provide instructions on how to reproduce this?

  4. Similar to HDF5, this is automatically installed for user convenience. Many users are not able to install their own packages so we try to make nanopolish more accessible by automatically downloading dependencies. I can provide an option to turn this off though.

  5. Agreed, this should be changed. Thanks for the suggestions.

  6. I'll fix this.

Jared

@mmokrejs
Contributor Author

Hi Jared,

  1. I'm open to moving these commands to a download or fetch but I want to keep the automatic download as the default.

OK, please move them but keep them called by default as you wish. It would be nice if an automated check of $PREFIX/usr/lib64/libhdf5.settings could verify whether the system-wide installation supports threads. Mine does not:

$ tail /usr/lib64/libhdf5.settings
Features:
---------
                  Parallel HDF5: yes
             High Level library: yes
                   Threadsafety: no
            Default API Mapping: v18
 With Deprecated Public Symbols: yes
         I/O filters (external): deflate(zlib)
                            MPE: 
                     Direct VFD: no
                        dmalloc: no
Clear file buffers before write: yes
           Using memory checker: no
         Function Stack Tracing: no
      Strict File Format Checks: no
   Optimization Instrumentation: no
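
Such a check could be sketched in shell roughly as follows. This is an assumption-laden illustration, not something nanopolish ships: the function name `hdf5_is_threadsafe` is made up, and the only thing it relies on is the `Threadsafety:` line format shown in the output above.

```shell
#!/bin/sh
# Sketch: decide from libhdf5.settings whether an installed HDF5 is threadsafe.
# hdf5_is_threadsafe is a hypothetical helper name.

# hdf5_is_threadsafe SETTINGS_FILE
# Returns 0 if the file contains a "Threadsafety: yes" line,
# 1 otherwise (including when the file is missing or unreadable).
hdf5_is_threadsafe() {
    [ -r "$1" ] || return 1
    grep -Eq '^[[:space:]]*Threadsafety:[[:space:]]*yes[[:space:]]*$' "$1"
}

# Path taken from the comment above; PREFIX may be empty.
settings="${PREFIX:-}/usr/lib64/libhdf5.settings"
if hdf5_is_threadsafe "$settings"; then
    echo "system HDF5 is threadsafe"
else
    echo "system HDF5 is not threadsafe (or $settings is missing)"
fi
```

A Makefile could use the exit status of such a helper to decide whether to fall back to the bundled HDF5 build.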

Currently on Gentoo we support the following configure options. In my case I opted for mpi without threads support, but that can be changed.

$ emerge -pv hdf5
[ebuild   R    ] sci-libs/hdf5-1.8.18:0/1.8.18::gentoo  USE="cxx hl mpi zlib -debug -examples -fortran -fortran2003 -static-libs -szip -threads" 0 KiB
  2. If you set HDF5=noinstall (or any string really) it ...

OK, works for me.

HDF5=noinstall CFLAGS="-O2 -pipe -march=native -ftree-vectorize" CXXFLAGS="-O2 -pipe -march=native -ftree-vectorize" make

  4. I can provide an option to turn this off though.

Please do, I will test afterwards. Currently, I compiled eigen with:

$ emerge -pv dev-cpp/eigen
[ebuild   R    ] dev-cpp/eigen-3.3.3:3::gentoo  USE="openmp (-altivec) -c++11 -cuda -debug -doc (-neon) {-test}" CPU_FLAGS_X86="avx sse2 sse3 sse4_1 sse4_2 ssse3 -avx2 -f16c -fma3" 0 KiB

@jts
Owner

jts commented May 23, 2017

7834207 adds support for EIGEN=nofetch. Please let me know if this works for you. There is already a clean target in the Makefile so I did not need to add this.

@jts closed this as completed Jun 28, 2017
@mmokrejs
Contributor Author

mmokrejs commented Jul 15, 2017

Hi,
first of all, I do not see a note in README.md about which versions of htslib and fast5 are currently needed/provided. From the git logs I see the last commit is 8ca328a177f27508bf214da11267d85d52d04c83. This is a post-1.2.1 snapshot AFAICT.

The following still does not work for me:

HDF5="noinstall" EIGEN="nofetch" emake compile HTS_LIB=-lhts HTS_INCLUDE=-I"${D}"/include/htslib

as it still used git to fetch the sources and wget to fetch the hdf5 and eigen tarballs. I want to ensure that none of the bundled versions has a chance to sneak into the final binaries, so I want to run

rm -rf hdf5* eigen htslib fast5

before calling make compile as listed above with all args.

I would also like to call rm -rf libs before the linking step, but the Makefile would need to be more fine-grained.

I am not familiar with the internals of hdf5 but do you really require threads support enabled? Unfortunately one cannot enable mpi support concurrently with threads or cxx or even fortran.
Here is what is feasible:

- sci-libs/hdf5-1.8.18::gentoo (Change USE: +threads, this change violates use flag constraints defined by sci-libs/hdf5-1.8.18: 'threads? ( !cxx !mpi !fortran !hl ) fortran2003? ( fortran )')

Basically, if I opt for threads support then all the other features conflict with it and have to be disabled.

@mmokrejs
Contributor Author

mmokrejs commented Jul 15, 2017

Here is the package definition file for Gentoo Linux, not yet pushed out into the official tree for reasons above:

nanopolish-9999.ebuild.txt

@jts
Owner

jts commented Jul 15, 2017

first of all, I do not see in README.md a note what version of htslib and fast5 is currently needed/provided

the commit IDs are visible from the main page of the repo (fast5 @ e6e577c, htslib @ 8ca328a)

The following still does not work for me:
HDF5="noinstall" EIGEN="nofetch" emake compile HTS_LIB=-lhts HTS_INCLUDE=-I"${D}"/include/htslib

You are assigning environment variables, not assigning make variables. The assignments must come after make. Try:

make HDF5="noinstall" EIGEN="nofetch"

I have tested this with GNU make, not emake.
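
The difference between the two forms can be reproduced with a throwaway Makefile. This is a sketch under two assumptions: GNU make is on the PATH, and the Makefile assigns HDF5 unconditionally with `=` (this thread does not show how nanopolish's Makefile actually assigns it). The temporary file and the `from_env` / `from_cmdline` helpers are made up for the illustration.

```shell
#!/bin/sh
# Sketch: environment variable vs. make command-line variable.
# Assumes GNU make and an unconditional assignment in the Makefile;
# the demo Makefile and both helper functions are hypothetical.

prec_mk=$(mktemp)
cat > "$prec_mk" <<'EOF'
HDF5 = install
$(info HDF5=$(HDF5))
all: ;
EOF

# Value passed through the environment: the Makefile's own assignment wins.
from_env()     { HDF5="$1" make -s -f "$prec_mk" | head -n 1; }

# Value passed after `make` on the command line: it overrides the Makefile.
from_cmdline() { make -s -f "$prec_mk" HDF5="$1" | head -n 1; }

from_env noinstall       # prints: HDF5=install
from_cmdline noinstall   # prints: HDF5=noinstall
```

If the Makefile used `?=` instead, the environment value would also be picked up, so which form is required depends on how the variable is assigned.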

I am not familiar with the internals of hdf5 but do you really require threads support enabled?

HDF5 is not thread safe by default and nanopolish currently requires a threadsafe HDF5. This is one reason we download and compile it in the Makefile (the main reason is to make it easier for most users). The other options cxx, fortran, mpi are not needed by nanopolish (we use the C bindings).

Jared

@mmokrejs
Contributor Author

mmokrejs commented Jul 15, 2017

the commit IDs are visible from the main page of the repo (fast5 @ e6e577c, htslib @ 8ca328a)

That still does not answer what release versions are needed. Or do you depend on pre-release snapshots?

You are assigning environment variables, not assigning make variables. The assignments must come after make.

I will have to figure out why threads AND cxx AND mpi are mutually exclusive when hdf5's configure is called.

Thank you for the note on env/args handling by make; I forgot to check how they are processed in the Makefile.

Now, because of the rm -rf hdf5* eigen htslib fast5 call, I get:

>>> Compiling source in /scratch/var/tmp/portage/sci-biology/nanopolish-9999/work/nanopolish-9999 ...
make -j2 HDF5=noinstall EIGEN=nofetch HTS_LIB=-lhts HTS_INCLUDE=-I/scratch/var/tmp/portage/sci-biology/nanopolish-9999/image//include/htslib FAST5_INCLUDE=-I/scratch/var/tmp/portage/sci-biology/nanopolish-9999/image//include/fast5 
rm -f ./.depend
c++ -O2 -pipe -mpclmul -mpopcnt -march=native -ftree-vectorize -std=c++11 -fopenmp  -I/scratch/var/tmp/portage/sci-biology/nanopolish-9999/image//include/htslib -I/scratch/var/tmp/portage/sci-biology/nanopolish-9999/image//include/fast5 -I./src -I./src/hmm -I./src/thirdparty -I./src/common -I./src/alignment -MM  src/nanopolish_getmodel.cpp src/nanopolish_call_methylation.cpp src/training_core.cpp src/nanopolish_squiggle_read.cpp src/nanopolish_consensus.cpp src/nanopolish_methyltrain.cpp src/nanopolish_variant_db.cpp src/nanopolish_haplotype.cpp src/nanopolish_scorereads.cpp src/nanopolish_phase_reads.cpp src/nanopolish_extract.cpp src/nanopolish_poremodel.cpp src/nanopolish_call_variants.cpp src/nanopolish_train_poremodel_from_basecalls.cpp  src/hmm/nanopolish_transition_parameters.cpp src/hmm/nanopolish_duration_model.cpp src/hmm/nanopolish_profile_hmm_r9.cpp src/hmm/nanopolish_pore_model_set.cpp src/hmm/nanopolish_profile_hmm.cpp src/hmm/nanopolish_profile_hmm_r7.cpp    src/common/nanopolish_bam_processor.cpp src/common/logsum.cpp src/common/nanopolish_klcs.cpp src/common/nanopolish_model_names.cpp src/common/nanopolish_alphabet.cpp src/common/nanopolish_fast5_map.cpp src/common/nanopolish_iupac.cpp src/common/nanopolish_common.cpp src/common/nanopolish_variant.cpp src/common/nanopolish_bam_utils.cpp  src/alignment/nanopolish_eventalign.cpp src/alignment/nanopolish_anchor.cpp src/alignment/nanopolish_alignment_db.cpp      src/thirdparty/stdaln.c     > ./.depend;
src/nanopolish_methyltrain.cpp:45:32: fatal error: ../eigen/Eigen/Dense: No such file or directory
 #include "../eigen/Eigen/Dense"
                                ^
compilation terminated.
make: *** [Makefile:107: .depend] Error 1

So I cannot really drop the bundled sources.

@jts
Owner

jts commented Jul 15, 2017

That still does not answer what release versions are needed. Or do you depend on pre-release snapshots?

I don't depend on pre-release snapshots but the git submodule system uses commit IDs so I use them. If you require a defined release I would expect that the next release after those commits would work.

#include "../eigen/Eigen/Dense"

I will fix this.

@jts
Owner

jts commented Jul 15, 2017

6918f65 allows you to set the EIGEN include path using EIGEN_INCLUDE

@mmokrejs
Contributor Author

That has helped, thanks.
