Adding gem3-mapper to bioconda #9

Closed
karl616 opened this issue Jul 16, 2018 · 22 comments

@karl616
Contributor

karl616 commented Jul 16, 2018

Hi,
I find gem3-mapper a very nice tool and would like to have access to it in bioconda. I have set up a recipe that works, but before I push it through I would appreciate some feedback. Keeping it alive would also need a little bit of investment from your side.

This is the recipe:
karl616/bioconda-recipes@c6a0254

I'm quite sure that this works, or at least isn't too far off, but there are a couple of things I would like to improve or confirm before I go ahead.

  1. Bioconda recommends pointing to git releases rather than fixed commits, so I went with the latest release candidate (v3.6). One problem I had here was how CUDA was handled. I know you have already fixed this, so I don't think the problem will persist. As I cannot guarantee that all users have CUDA-compatible hardware, I decided to disable it. That led to a problem in gpu_config.h, which I worked around by including a snippet from HEAD (https://github.com/karl616/bioconda-recipes/blob/c6a02543b1be99bc6b05ed3b4e48e777403ac609/recipes/gem3-mapper/build.sh#L12-L24). Since I don't fully understand gem3-mapper, I would like to ask: is this correct?
  2. The second problem is that submodules aren't included in the release archive. In the Makefile you handle a missing submodule by pulling it from the repository, but the archive isn't a repository, so the build fails. My solution is to create a complete archive (see Archives including submodules #8). The remaining step, for which I don't know a way around manual work, is to create and attach such an archive for each release candidate. For testing purposes I did this on my fork (https://github.com/karl616/gem3-mapper/releases/tag/v3.6). The recipe currently points to it, but it would be nicer to point directly to your repository. This would be your current "investment".
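
One way to produce such a self-contained archive is sketched below. It is only an illustration and not necessarily the approach taken in #8: the helper names `pack_tree` and `make_full_archive` are made up here, and the repository URL in the usage comment is an assumption.

```shell
# Hypothetical helpers (not part of gem3-mapper or bioconda).

# Pack a checked-out source tree into a tarball, leaving out git metadata.
pack_tree() {
  src_dir=$1   # directory holding the checkout, submodules included
  out=$2       # path of the .tar.gz to create
  tar --exclude=.git -czf "$out" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Clone a tag together with its submodules and pack the result.
make_full_archive() {
  repo_url=$1; tag=$2; out=$3
  work=$(mktemp -d) || return 1
  git clone --recursive --branch "$tag" "$repo_url" "$work/gem3-mapper-$tag" &&
    pack_tree "$work/gem3-mapper-$tag" "$out"
  status=$?
  rm -rf "$work"
  return $status
}

# Usage (needs network access; URL and output name are assumptions):
# make_full_archive https://github.com/smarco/gem3-mapper.git v3.6 "$PWD/gem3-mapper-v3.6-full.tar.gz"
```

The resulting tarball can be attached to a release by hand, which is the manual step described in point 2.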

I think the former issue will be solved by the next release candidate and, once #8 is included, the latter takes about 15 minutes per release.

What do you think?

@achacond
Contributor

Hi @karl616,
Thanks for all your support; your feedback is much appreciated.
GEM3 autodetects whether the CUDA SDK is fully installed and checks all the GPU-related requirements. If configure detects that you cannot compile and run GPU code, it generates a CPU-only version.
Point (1) is fully covered natively by the application, so you don't need to do anything specific.
Best,
Alex

@smarco
Owner

smarco commented Jul 16, 2018

Hi @karl616,

What @achacond points out is correct, but not for the v3.6 tagged version. Some patches were pushed afterwards to handle cases where a submodule was missing, the hardware was not compatible, etc.

I've pushed another tag (v3.6.1) with all the latest commits (also it was about time to do this). Can you try the process again, but now against this new tag? It should do the trick.
Then, if you need something else on top of this, don't hesitate and let me know.

Cheers,

@karl616
Contributor Author

karl616 commented Jul 16, 2018

Hi @achacond hi @smarco,
you are welcome. I hope to benefit as well... :)

With regard to CUDA, it might well be that I made a mistake. Starting out I had problems similar to #5. I'm a bit unsure at the moment as I wasn't able to repeat my complications from yesterday. It might have been the missing submodule that played double tricks on me.

I will definitely try the new tag, but that has to wait until tomorrow.

Thanks for the support.

@karl616
Contributor Author

karl616 commented Jul 17, 2018

Hi @smarco,

the new release worked nicely. Thanks!

I made a pull request to bioconda: bioconda/bioconda-recipes#9937

They are asking for references and I'm citing the 2012 paper: https://www.nature.com/articles/nmeth.2221 and your entry on biotools (https://bio.tools/gemmapper). Is this correct? The latter could do with an update though. It is still pointing to sourceforge... :)

@smarco
Owner

smarco commented Jul 19, 2018

Hi @karl616,

Thanks again for the effort to push gem3 into bioconda. Note that this reference https://bio.tools/gemmapper is quite old and we should try to use https://bio.tools/GEM_Mapper if possible.

Thanks,

@karl616
Contributor Author

karl616 commented Jul 19, 2018 via email

@karl616
Contributor Author

karl616 commented Jul 23, 2018

gem3-mapper is now a part of bioconda :)

With the bioconda repository activated it can be installed with:
conda install gem3-mapper
I'll close this issue.
Thanks for the help.

@karl616 karl616 closed this as completed Jul 23, 2018
@karl616
Contributor Author

karl616 commented Jul 23, 2018

Hi @smarco,
I have a problem with the conda-installed version and I suspect it has to do with missing dependencies. Both gem-mapper and gem-indexer crash. This is the error I get from gem-indexer:

2018/7/23 17:48:30 -- [Inspecting MultiFASTA]
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>> GEM.System.Error::Signal raised (no=4) [errno=0,Success]
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

I can only trace it to the general error handling function, but not beyond this. Does it tell you something?
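
The `no=4` in the message is presumably the number of the signal that was raised. Assuming that reading is right, a POSIX shell can translate the number into a name; on Linux, signal 4 is SIGILL (illegal instruction).

```shell
# Translate a signal number into its name (POSIX `kill -l <number>`).
sig_name=$(kill -l 4)
echo "signal 4 is SIG${sig_name}"
```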

Best,
Karl

@karl616 karl616 reopened this Jul 23, 2018
@smarco
Owner

smarco commented Jul 23, 2018 via email

@karl616
Contributor Author

karl616 commented Jul 24, 2018 via email

@karl616
Contributor Author

karl616 commented Jul 24, 2018

It should happen somewhere here, shouldn't it?

  ticker_percentage_reset(&ticker,verbose,"Inspecting MultiFASTA",0,0,true); // Prepare ticket
  uint64_t enc_text_length = 0;
  while (true) {
    // Get line
    line_length = fm_getline(&line_buffer,&line_allocated,input_multifasta_file->file_manager);
    if (line_length == -1) break;
    // Account the line length
    if (line_buffer[0] != FASTA_TAG_BEGIN) {
      enc_text_length += line_length-1;
    } else {
      ++enc_text_length; // Separator
    }
  }
  ++enc_text_length; // Separator
  ticker_finish(&ticker);
  if (line_buffer != NULL) free(line_buffer); // Free
  // Configure RC generation
  if (archive_builder->type!=archive_dna_forward) {
    enc_text_length = 2*enc_text_length; // Add complement length
  }
  // Configure Bisulfite generation
  if (archive_builder->type==archive_dna_bisulfite) {
    enc_text_length = 2*enc_text_length;
  }
  ++enc_text_length; // Add extra separator (Close text)
  // Rewind input MULTIFASTA
  fm_seek(input_multifasta_file->file_manager,0);
  input_multifasta_file->line_no = 0;
  // Log
  tfprintf(gem_log_get_stream(),
      "Inspected text %"PRIu64" characters (index_complement=%s). Requesting %"PRIu64" MB (encoded text)\n",

I have also found a way to reproduce it locally, or at least in a docker image, without having to rely on the bioconda build process...

My current suspicion is that the linking is bad and a wild guess is that it has something to do with libgomp. I'll see if I can get closer
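
To make the arithmetic in the excerpt above concrete, here is a rough re-implementation of the length calculation in awk. It is a sketch, not GEM code: the function name `inspect_length` is made up, the bisulfite doubling is left out, and it assumes that the index includes the reverse complement and that every line ends with a newline. For the two-line test.fa used later in this thread it gives (1 + 16 + 1) * 2 + 1 = 37, which matches the "Inspected text 37 characters" line in the log further down.

```shell
# Hypothetical helper: estimate the encoded text length the way the excerpt does.
inspect_length() {
  awk '
    /^>/ { n += 1; next }        # FASTA header: counted as one separator
         { n += length($0) }     # sequence line: characters without the newline
    END  {
      n += 1                     # separator after the last sequence
      n *= 2                     # reverse complement (assumed to be enabled)
      n += 1                     # extra separator that closes the text
      print n
    }
  ' "$1"
}

# Usage: inspect_length test.fa
```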

@smarco
Owner

smarco commented Jul 24, 2018 via email

@karl616
Contributor Author

karl616 commented Jul 24, 2018

I'm not sure. It only happens when I build gem3-mapper through conda. If I build it on my own computer it works as intended.

Looking at the broken gem-indexer binary with ldd, I get this:

# ldd $(which gem-indexer )
	/lib64/ld-linux-x86-64.so.2 (0x7f1cf2f0e000)
	libpthread.so.0 => /lib64/ld-linux-x86-64.so.2 (0x7f1cf2f0e000)
	libm.so.6 => /lib64/ld-linux-x86-64.so.2 (0x7f1cf2f0e000)
	librt.so.1 => /lib64/ld-linux-x86-64.so.2 (0x7f1cf2f0e000)
	libz.so.1 => /usr/local/bin/../lib/libz.so.1 (0x7f1cf2cf1000)
	libbz2.so.1.0 => /usr/local/bin/../lib/libbz2.so.1.0 (0x7f1cf2ae1000)
	libgomp.so.1 => /usr/local/bin/../lib/libgomp.so.1 (0x7f1cf28be000)
	libc.so.6 => /lib64/ld-linux-x86-64.so.2 (0x7f1cf2f0e000)
	libdl.so.2 => /lib64/ld-linux-x86-64.so.2 (0x7f1cf2f0e000)
Error relocating /usr/local/bin/../lib/libgomp.so.1: pthread_attr_setaffinity_np: symbol not found
Error relocating /usr/local/bin/gem-indexer: backtrace_symbols_fd: symbol not found
Error relocating /usr/local/bin/gem-indexer: backtrace: symbol not found

The first error is why I think libgomp is the problem.
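
When the ldd output is long, a small filter helps to isolate the lines that matter. The sketch below assumes the message formats seen in this comment ("Error relocating", "symbol not found") plus the "not found" that glibc's ldd prints for a missing library; the function name `ldd_problems` is made up.

```shell
# Hypothetical helper: keep only the ldd lines that signal a problem.
# Reads ldd output on stdin and prints the suspicious lines.
ldd_problems() {
  grep -E 'not found|Error relocating'
}

# Usage: ldd "$(command -v gem-indexer)" 2>&1 | ldd_problems
```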

@smarco
Owner

smarco commented Jul 24, 2018

I've tried using the build through conda and it worked in my case:

> gem-indexer -i ../data/chr1.fa -o chr1
2018/7/24 21:33:26 -- [Inspecting MultiFASTA]
2018/7/24 21:33:28 --  100% ... done [2.382 s]
2018/7/24 21:33:28 -- Inspected text 498501247 characters (index_complement=yes). Requesting 475 MB (encoded text)
2018/7/24 21:33:28 -- [Reading MultiFASTA]
2018/7/24 21:33:30 --  100000000 bases parsed
2018/7/24 21:33:31 --  200000000 bases parsed
2018/7/24 21:33:32 -- Total 254235634 bases parsed ...done [3.222 s]
2018/7/24 21:33:32 -- [Generating Text (explicit Reverse-Complement)]
2018/7/24 21:33:32 --  100% ... done [0.535 s]
2018/7/24 21:33:32 -- [Generating BWT Forward-Text]
2018/7/24 21:33:32 -- [Building-BWT::Counting K-mers]
2018/7/24 21:33:34 --  100% ... done [1.457 s]
2018/7/24 21:33:34 -- [Building-BWT::Generating SA-Positions]
2018/7/24 21:33:34 --    2% 

In my case, I'm not missing any library:

> ldd $(which gem-indexer)
	linux-vdso.so.1 =>  (0x00007ffd3d9da000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f946c6ab000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f946c3a2000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f946c19a000)
	libz.so.1 => /home/smarco/miniconda2/envs/ddocent_env/bin/../lib/libz.so.1 (0x00007f946bf7d000)
	libbz2.so.1.0 => /home/smarco/miniconda2/envs/ddocent_env/bin/../lib/libbz2.so.1.0 (0x00007f946bd6d000)
	libgomp.so.1 => /home/smarco/miniconda2/envs/ddocent_env/bin/../lib/libgomp.so.1 (0x00007f946bb4a000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f946b780000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f946c8c8000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f946b57c000)

In any case, the current version of GEM neither relies on nor uses the OpenMP library. It is linked by mistake (it was used in older versions) and the linking remains for no reason. I'll remove it in the next push.

Besides, the part of the code where you are looking is just parsing, so I believe the error is not related to the OMP lib. Can you give me access to the input .fa file and maybe the broken binary?

Thanks,

@karl616
Contributor Author

karl616 commented Jul 25, 2018

It's good it works for you. The last thing I did yesterday was to remove openmp from the dependencies. Here is a copy of the binary:

gem-indexer.gz

And as for sequence, I'm able to replicate it with something like this:

echo -e ">seq\nATATAGGGTATAGATA" > test.fa
gem-indexer -i test.fa -o test

Compiled locally, it behaves as it should, but the bioconda version fails.

@smarco
Owner

smarco commented Jul 25, 2018

I've tried your binary and input.

> ./gem-indexer -i test.fa -o test
2018/7/25 16:51:27 -- [Inspecting MultiFASTA]
2018/7/25 16:51:27 --  100% ... done [0.000 s]
2018/7/25 16:51:27 -- Inspected text 37 characters (index_complement=yes). Requesting 0 MB (encoded text)
2018/7/25 16:51:27 -- [Reading MultiFASTA]
2018/7/25 16:51:27 -- Total 17 bases parsed ...done [0.000 s]
2018/7/25 16:51:27 -- [Generating Text (explicit Reverse-Complement)]
2018/7/25 16:51:27 --  100% ... done [0.000 s]
2018/7/25 16:51:27 -- [Generating BWT Forward-Text]
2018/7/25 16:51:27 -- [Building-BWT::Counting K-mers]
2018/7/25 16:51:28 --  100% ... done [0.179 s]
2018/7/25 16:51:28 -- [Building-BWT::Generating SA-Positions]
2018/7/25 16:51:28 --  100% ... done [0.000 s]
2018/7/25 16:51:28 -- [Building-BWT::Sorting SA]

Can you give me more information about your system specs (both OS and hardware)?

@karl616
Contributor Author

karl616 commented Jul 25, 2018

Yes, it has to be something with my system(s). I have tried it on four systems, three of which fail, and what the failing ones have in common is that they are a bit older. I had a discussion about gemBS in which the -march=native flag was mentioned. These are my test systems:

system1: Intel Xeon E5-2667 with CentOS 6.7 (fail)
system2: Intel Core i5-3570K with Fedora 28 (fail)
system3: Intel Xeon E5-2670 with Debian 7.11 (fail)
system4: Intel Core Skylake (cloud) with CentOS 7.5.1804 (works)

Is that enough info?

I double-checked the checksum of the binary... it is the same on all systems.

If it has to do with the hardware, that explains why it works when I compiled it locally...
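
A binary built with -march=native may use any instruction the build host supports, so it can fail on CPUs that lack one of them. A quick way to compare machines is to look at the feature flags the kernel reports. The sketch below is Linux/x86 only; the helper name `has_cpu_flag` is made up, and the flag names in the loop are just examples, since the thread does not say which instruction was at fault.

```shell
# Hypothetical helper: succeed if the CPU advertises the given feature flag.
# Reads /proc/cpuinfo by default; another file can be passed for testing.
has_cpu_flag() {
  flag=$1
  cpuinfo=${2:-/proc/cpuinfo}
  grep '^flags' "$cpuinfo" 2>/dev/null | head -n 1 | tr ' \t' '\n\n' | grep -qxF "$flag"
}

# Report a few example flags for the current machine.
for f in avx avx2 bmi2; do
  if has_cpu_flag "$f"; then echo "$f: yes"; else echo "$f: no"; fi
done
```

Running this on a working and a failing system shows which extensions differ between them.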

@heathsc
Collaborator

heathsc commented Jul 25, 2018 via email

@karl616
Contributor Author

karl616 commented Jul 25, 2018

This fits well with what I see: strace comes up with a SIGILL.
My initial attempt was to change -march=native to '-march=x86-64 -mtune=generic'.
But should I then also change -Ofast to -O3?

@heathsc
Collaborator

heathsc commented Jul 25, 2018 via email

@karl616
Contributor Author

karl616 commented Jul 25, 2018

Then I change that as well.
This was it, the conda installation now works on my system as well... and I have learned to think of them as old.
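
The flag change discussed above could be applied in a recipe's build script roughly as follows. This is a sketch under assumptions: the thread does not say which file in gem3-mapper holds the flags, so the helper simply filters text from stdin, and both the name `portable_flags` and the example variable `FLAGS_OPT` are made up.

```shell
# Hypothetical helper: rewrite host-specific compiler flags into portable ones.
# Reads build-file text on stdin and writes the rewritten text to stdout.
portable_flags() {
  sed -e 's/-march=native/-march=x86-64 -mtune=generic/g' \
      -e 's/-Ofast/-O3/g'
}

# Example on a made-up flags line:
echo 'FLAGS_OPT=-Ofast -march=native' | portable_flags
# prints: FLAGS_OPT=-O3 -march=x86-64 -mtune=generic
```

Targeting the generic x86-64 baseline trades some speed for a binary that runs on every 64-bit x86 CPU, which is what a shared package needs.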

@karl616
Contributor Author

karl616 commented Jul 27, 2018

OK, gem3-mapper is installed and works on my system. I'll close this issue again.

@karl616 karl616 closed this as completed Jul 27, 2018