@anton-molyboha

It looks like sci-biology/fastqc-0.11.3 is supposed to install some java *.class files but fails to do so.

'emerge sci-biology/fastqc' completes without an error, but when I try running fastqc with any meaningful options, I am getting:
Error: Could not find or load main class uk.ac.babraham.FastQC.FastQCApplication

The ebuild's src_install() only installs the wrapper scripts and docs, and has a comment:

dobin fastqc run_fastqc.bat
dodoc README.txt RELEASE_NOTES.txt

# There is no fastqc.jar.  The output from the compilation is the set of
# .class files (a jar file is just a zip file full of .class files).  All
# you need to copy out is the contents of the bin subdirectory, the rest of
# the download you can discard.
[... snip the discussion of where to get dependencies]

I am guessing that the .class files mentioned in the comment should also be installed somewhere (not really sure where, though). If my guess is true, I will come up with a workaround and post it here.

@anton-molyboha
Author

This version works for me but has some issues:

  1. The binaries directory is hardcoded as /usr/bin
    I see other ebuilds using ${bindir} variable, but I could not find its documentation in devmanual.gentoo.org and don't have a good guess for what exactly it does.
  2. The bundled dependencies are, probably, used in place of the latest versions of the libraries available in portage.
    The comment in the original ebuild suggests ignoring the bundled "sam-1.103.jar" and "cisd-jhdf5.jar" and using sci-biology/picard and sci-libs/jhdf5 instead - which makes sense. This ebuild does not delete the bundled jars and they do get installed.
  3. Everything is installed into /opt/${PN}
    The main executable, fastqc, is a perl script which finds the accompanying java code relative to the executable's location. This means we cannot (easily) separate the executable into a bin directory and put the java code wherever java code belongs.
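
A minimal sketch of the constraint in item 3, assuming a launcher that locates its files relative to its own real path. The real launcher is a Perl script; the stand-in below is a hypothetical shell script in a scratch directory, and it only illustrates why a symlink into a bin directory keeps working while a moved copy would not.

```shell
# Hypothetical layout in a scratch directory; none of these paths come from the ebuild.
work=$(mktemp -d)
mkdir -p "$work/opt/FastQC" "$work/usr/bin"

# Stand-in launcher: prints the directory it really lives in, following symlinks.
cat > "$work/opt/FastQC/fastqc" <<'EOF'
#!/bin/sh
real=$(readlink -f "$0")
dirname "$real"
EOF
chmod 755 "$work/opt/FastQC/fastqc"

# Symlink into a directory that would be on PATH instead of moving the script.
ln -s "$work/opt/FastQC/fastqc" "$work/usr/bin/fastqc"

# Run through sh so the sketch does not depend on the scratch filesystem allowing exec.
resolved=$(sh "$work/usr/bin/fastqc")
expected=$(readlink -f "$work/opt/FastQC")
echo "launcher resolved its home to: $resolved"
```

Called through the symlink, the stand-in still reports the original folder, so anything it loads relative to that folder stays reachable.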

@mmokrejs
Contributor

mmokrejs commented Nov 29, 2017

Hi,
I think I am the one who committed this lousy ebuild file; I am sorry for that. I was hoping I would manage, but my Java knowledge is zero. Anyway, I was in contact with the original author when version 0.11.2 was out in 2015. I received from Simon Andrews a somewhat tested build.xml.txt for Apache Ant, but I later bumped into other issues with jhdf5. I would be happy if somebody managed to get this package functional.

Here are bits of our communication with Simon:

We don't actually have an automated build system for fastqc.  Our
development is all done within the eclipse IDE which does the compilation
internally (you don't need to supply a build file as long as you're not
doing anything clever - which we're not).

Eclipse can actually generate an ant build file, so I've made one and have
attached it to this mail.  Ant isn't a tool I've used so I've not tested
this, but I think other people have used these to build the system
themselves before.

I see there is a build.xml in the current fastqc_v0.11.5_source.zip, so the one attached is probably outdated. Here I attach the build.log.txt file from Simon showing how the compilation went for him.

You can't move the launcher scripts from their location in the folder you
extracted them from.  As per the install instructions, if you want to put
the program into an existing directory in the path you need to create a
symlink from there to the original FastQC folder, don't move any of the
files.

Assuming you have the fastqc.zip file in /opt and you want to put the
program in /usr/bin then the process would be:

unzip fastqc.zip
chmod 755 FastQC/fastqc
ln -s /opt/FastQC/fastqc /usr/bin/fastqc

The program will now be in your path under /usr/bin/fastqc but should work
correctly.
There is no fastqc.jar.  The output from the compilation is the set of
.class files (a jar file is just a zip file full of .class files).  All
you need to copy out is the contents of the bin subdirectory, the rest of
the download you can discard.

For the other dependencies if you wanted to go back to source you'd need
to compile all of the jar files which are bundled with fastqc.

The cisd-jhdf5 library comes from
https://wiki-bsse.ethz.ch/pages/viewpage.action?pageId=26609113 and is
needed if you want to analyse fast5 files (the kind which come from
nanopore sequencers).

The jbzip2 library comes from https://code.google.com/p/jbzip2/ and is
needed for working with bzip compressed files.

The sam library comes from
http://sourceforge.net/projects/picard/files/sam-jdk/.  Note that there is
a newer version of this codebase at https://github.com/samtools/htsjdk but
that FastQC is NOT yet compatible with the updated API (this will probably
happen in a future release).  This library is needed to read SAM/BAM
format files.

> Would you mind if the build.xml created the jar file automagically out of
> the
> compile class files? Please. ;)

FastQC doesn't compile to a jar.  It's designed to run from a controlled
folder structure at the class level.
>> The cisd-jhdf5 library comes from
>> https://wiki-bsse.ethz.ch/pages/viewpage.action?pageId=26609113 and is
>> needed if you want to analyse fast5 files (the kind which come from
>> nanopore sequencers).
>
> OK, Gentoo has only package for http://www.hdfgroup.org/HDF5 so I will
> make
> one for the swiss library.

The library we have is a java API for the generic HDF5 library.  You'll
probably find that compiling the cisd-jhdf5 library will require at least
the headers from the existing gentoo package.
>> The sam library comes from
>> http://sourceforge.net/projects/picard/files/sam-jdk/.  Note that there
>> is
>> a newer version of this codebase at https://github.com/samtools/htsjdk
>> but
>> that FastQC is NOT yet compatible with the updated API (this will
>> probably
>> happen in a future release).  This library is needed to read SAM/BAM
>> format files.
>
> We have package for this already, currently have 1.103 version. So what
> is the last compatible version fastqc will work with?

It should work with anything prior to the shift to the htsjdk codebase.
They broke API compatibility at that point.
If you need to do a complete source compile to package for a distribution
then you'll probably have to put the bundled jars in separate packages and
then make them dependencies since we don't ship the source with FastQC
itself.  All of the bundled jars come from other open source projects and
should be fairly easy to build separately.

If you wanted to have those put separately and have more control over
where the files for FastQC itself goes then you might need to edit the
launcher script to manually specify the locations of the jar files rather
than have the program figure this out for itself.  This will simply
require you to create a suitable CLASSPATH variable to pass to the final
java command so that all of the required jar files can be found.

If it's any help I know that someone did a build for Debian/Ubuntu so you
could see how they did it.  I don't know much about the packaging rules
for Gentoo so I can't give very specific help, but if you can tell me
where the various components need to go I can give you some more specific
edits for the launcher to make things work.
> How does one install the package?
> https://wiki-bsse.ethz.ch/display/JHDF5/Documentation+page is just crap. There is neither a Makefile,
> build.xml nor the maven definition file.
>
> The src/sis-jhdf5-src.zip unpacks between others META-INF/MANIFEST.MF:
> Manifest-Version: 1.0
> Ant-Version: Apache Ant 1.8.2
> Created-By: 1.7.0_25-b15 (Oracle Corporation)
> Version: 14.12.1
> Build-Number: 14.12.1 (r33502,clean)
>
> I never wrote a single line in java so I am not sure if I am willing to
> pursue
> all of this.

I've no idea - I've never tried compiling that from source and, as you
say, the install instructions are non-existent.  I'm afraid you'd need to
get information from the library authors if you want to pursue this.
There's no set of standard things to try - especially for a library like
this which needs to link to native libraries to work.

Thank you for pursuing packaging it for Gentoo.
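
Simon's suggestion about passing an explicit CLASSPATH to the final java command could be sketched roughly as follows. This is an illustration under assumptions, not the actual launcher: the directory layout is hypothetical, `jbzip2.jar` is a placeholder file name, and only the main class name and the two other jar names are taken from this thread.

```shell
# Hypothetical install locations, created in a scratch directory for illustration.
work=$(mktemp -d)
class_dir="$work/opt/fastqc"          # where the compiled .class tree would live
jar_dir="$work/usr/share/fastqc/lib"  # where separately packaged jars would live
mkdir -p "$class_dir" "$jar_dir"
touch "$jar_dir/sam-1.103.jar" "$jar_dir/jbzip2.jar" "$jar_dir/cisd-jhdf5.jar"

# Start with the class directory, then append every jar found.
classpath="$class_dir"
for jar in "$jar_dir"/*.jar; do
    classpath="$classpath:$jar"
done

echo "CLASSPATH=$classpath"
# A launcher built this way would then end with something like:
#   java -cp "$classpath" uk.ac.babraham.FastQC.FastQCApplication "$@"
```

Putting the class directory first keeps FastQC's own classes ahead of anything the jars might also contain.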

@anton-molyboha
Author

Thanks a lot for your reply, and for the comments you left in the ebuild file; they are super helpful! I did work with Java before, and the words you say and quote make sense to me. I will need some help with the ebuild writing, though.

I'm a bit busy at work right now, so as time permits the plan is:

  • find a way to test the two bundled libraries
  • attempt to delete them and see if the package is able to pick up the ones installed as dependencies
  • ask for help with the ebuild-writing questions I will have by that time
  • ping here again with the results.

@TheChymera
Collaborator

@mmokrejs @anton-molyboha are you still interested in maintaining this? We recently fixed up the overlay and fixed 6k+ QA errors, and are looking to improve the quality overall. As I do not work with Java either, I would be willing to help and learn if you would get involved as well, but if the onus falls fully on me, I think we would need to stop providing the package.

@anton-molyboha
Author

I am willing to bring this package into a "works for me" state and update the pull request. What else can I do to help maintain it?

@epsilon-0
Contributor

> I am willing to bring this package into a "works for me" state and update the pull request. What else can I do to help maintain it?

If you could also enable tests for the package and get them working, that would be highly appreciated.

@TheChymera
Collaborator

@anton-molyboha providing your email address in metadata.xml in the maintainer section (like so) would also help a lot, so that if there are issues we have a better account of whom to reach out to. It also signals that somebody is serious about needing/maintaining this, so if there ever is another cleanup effort like the recent one, you'd get notified before we delete anything.

@anton-molyboha
Author

Thanks for the feedback. I think I will be able to get to it at the end of the week, and will update with the progress by Tuesday. Let me know if there is any timeline related to this (if I'm blocking somebody, or if there is a scheduled release that would be nice to hit, etc)

@anton-molyboha
Author

I have updated the Pull Request to a works-for-me state.

Testing done:

  • ebuild fastqc-0.11.3.ebuild install completed successfully
  • I can run fastqc in GUI mode, open an input *.fastq file, and observe an output.
  • I don't have the expertise to judge if the output is correct, but it has all the advertised components.

I suggest that making an automated test be a separate PR.
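
A possible starting point for such an automated test is a smoke run on a tiny input. The sketch below is an assumption about how that could look: the read is made up, the scratch paths are hypothetical, and the run is skipped when `fastqc` is not on PATH. It only checks that the program accepts a .fastq file, not that the report is correct.

```shell
work=$(mktemp -d)
sample="$work/sample.fastq"

# One made-up read: header, bases, separator, and one quality character per base.
printf '%s\n' '@read1' 'GATTACAGATTACA' '+' 'IIIIIIIIIIIIII' > "$sample"

if command -v fastqc >/dev/null 2>&1; then
    mkdir -p "$work/out"
    # -o selects the output directory for the generated report files.
    fastqc -o "$work/out" "$sample"
    ls "$work/out"
else
    echo "fastqc not found on PATH; skipping the run"
fi
```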

@anton-molyboha
Author

From a conversation with epsilonKNOT on #gentoo-science, several suggestions:

  1. update to EAPI 7
  2. update links to https
  3. add keywords "~amd64"
  4. remove empty IUSE
  5. check if ant can be removed as an explicit dependency (it is probably implied by java-ant-2 eclass)
  6. remove the symlink to the executable and instead add its location to PATH. https://github.com/gentoo/sci/blob/master/dev-util/Hermes/files/00hermes can be used as an example for this.

@mmokrejs
Contributor

Hi @anton-molyboha, feel free to take over maintainership of this or even other packages I worked on in the past in Gentoo; I am too busy these days. Good luck!

@anton-molyboha
Author

Thanks, @mmokrejs

@anton-molyboha
Author

A new version of the PR!

  • Accounts for comments from epsilon-0
  • repoman does not complain
  • travis-ci does not complain
  • works-for-me from a clean install into a Prefix

There are still issues regarding missing tests and two libraries being bundled by upstream (instead of us using the Gentoo versions, which would probably be better). I would prefer to look at these issues separately.

@epsilon-0
Contributor

@anton-molyboha
please add ~amd64 keywords to sci-libs/jhdf5 as well (if you would take that package that would be a nice bonus, but no pressure).

RDEPEND="${DEPEND}
RDEPEND="
dev-lang/perl
>=virtual/jre-1.5:*
Contributor

remove the jdk/jre requirement, the eclasses ensure that.

Author

Yes, you are right! Will remove them.

Author

When I remove jdk from the dependencies, I am getting

QA Notice: Package is using java-ant, but doesn't depend on a Java VM
!!! ERROR: Couldn't find a VM dep
!!! ERROR: Couldn't find a VM dep
 * Could not find valid -source/-target values

and the build fails. I guess, an explicit jdk dependency is still necessary.
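
One way the dependency variables could end up, given that QA notice, is sketched below. The perl and JRE atoms appear in the diff under review; the JDK atom is an assumption that simply mirrors the JRE version and may not match what the final ebuild uses.

```shell
# Runtime needs: the Perl launcher and a Java runtime.
RDEPEND="
	dev-lang/perl
	>=virtual/jre-1.5:*
"
# Build-time needs: everything above plus a JDK, so the Java eclasses can find a VM dependency.
# The JDK atom below is an assumption, not copied from the PR.
DEPEND="${RDEPEND}
	>=virtual/jdk-1.5:*
"
```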


src_prepare(){
cp "${FILESDIR}"/build.xml . || die
eapply_user
Contributor

Call default instead, which also includes eapply_user.

Author

Will do, thanks.

doins -r bin
chmod a+x "${ED}/opt/${PN}/bin/fastqc"
# Add the package's bin directory to the PATH.
echo "PATH=\"${EPREFIX}/opt/${PN}/bin\"" > "${T}/00fastqc" || die
Contributor

better to create this file in files/00fastqc instead of doing an echo.
Don't need the ${EPREFIX}.

Author

Without the ${EPREFIX} it would not work in a Prefix environment, would it?

Contributor

@epsilon-0 epsilon-0 Nov 3, 2020

env-update handles the creation of $PATH from env.d files; it manages the ${EPREFIX} too.

Author

From my experiments: env-update is called automatically, but does not manage ${EPREFIX}. Instead, one needs to call hprefixify defined by the prefix eclass. Example: baselayout-2.7.ebuild, line 187
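
To make the question concrete, here is a small sketch of what the env.d file contains with and without a prefix. It is plain shell rather than ebuild code: `write_envd` is a hypothetical helper, `/home/user/gentoo` is a made-up prefix, and the sketch does not call env-update or hprefixify. It only shows the two file contents being discussed.

```shell
# Write an env.d-style file that adds the package's bin directory to PATH.
# Arguments: <prefix> <package-name> <output-file>; all values below are illustrative.
write_envd() {
    printf 'PATH="%s/opt/%s/bin"\n' "$1" "$2" > "$3"
}

work=$(mktemp -d)
write_envd "" fastqc "$work/00fastqc.plain"
write_envd "/home/user/gentoo" fastqc "$work/00fastqc.prefix"

plain=$(cat "$work/00fastqc.plain")
prefixed=$(cat "$work/00fastqc.prefix")
echo "$plain"      # PATH="/opt/fastqc/bin"
echo "$prefixed"   # PATH="/home/user/gentoo/opt/fastqc/bin"
```

A static file shipped in files/ would carry the first form, so something has to rewrite it into the second form when installing into a Prefix.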

dev-lang/perl
>=virtual/jre-1.5:*
"
DEPEND="$RDEPEND
Contributor

Small nit: add {} to make it ${RDEPEND}.

Author

Will do, thanks.


dodoc README.txt RELEASE_NOTES.txt

# There is no fastqc.jar. The output from the compilation is the set of
Contributor

Now I know that this is not your fault but could this message be cleared up? Do we still need it?
If it is actually important and needs action by the user, can you make this into a message that is given to the user in pkg_postinst?
Why do we not have a dependency on these packages sci-libs/jhdf5 and sci-biology/picard? You seem to have dropped sci-libs/jhdf5 in the update?
Given the fact that we don't have tests, we need to manually make sure we are installing the proper dependencies.
Seeing that you haven't made a version bump, possibly upstream has not made any new releases, so it is unlikely that we are now magically compatible with the newer version of the codebase. Is it possible (though unlikely) that upstream has changed to a new url?
I know these are a lot of questions but unfortunately with the lack of tests the only way I can know anything is by voicing it out :(

Author

  • By "this message" you mean the long comment below the highlighted code? Yes, I think it can be cleared up. It was a note for a future maintainer (me) from the previous maintainer (mmokrejs).
  • I have dropped the dependency on sci-libs/jhdf5 and sci-biology/picard because they are bundled with the package itself, so that fastqc is using its own copy of them.
  • Yes, to "manually make sure we are installing the proper dependencies" I install the package into a Prefix environment and check that it runs for a sample input. For jhdf5 and picard, I also see that their .jar files are indeed included as promised, and get installed with the package, and that the main program adds them to the java's CLASSPATH.
  • I believe upstream did make new releases, but I am trying to follow the motto of "make many small changes, rather than one huge one". Small changes are easier to review, and easier to not make mistakes in. I promise I will make a version bump separately.
  • "Voicing it out" works for me! Really appreciate it.

Contributor

This is amazing though, they are using a perl script to invoke a java binary...
Absolutely amazing...

Contributor

@anton-molyboha Indeed, 0.11.9 is out. The HDF5 is necessary to work with Oxford nanopore FAST5 files, IMO.

Author

@mmokrejs Thanks. It's a moonshot, but do you happen, by any chance, to have an example of a FAST5 file I could use for testing?

Author

Or a .sam file...

Contributor

@mmokrejs mmokrejs Nov 4, 2020

Try to fetch some from NCBI SRA. Maybe by default you get only FASTQ format but it should be possible to fetch some using sra-toolkit to obtain the original FAST5-formatted data.

https://www.ncbi.nlm.nih.gov//sra?term=(nanopore)%20NOT%20cluster_dbgap%5BPROP%5D

Here is another FAST5 set: https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?study=SRP166020

Alternatively, here you have plenty demo human data:
https://github.com/nanopore-wgs-consortium/NA12878

Some more background info:
https://simpsonlab.github.io/2017/02/27/packing_fast5/

Author

Haha, to download from NCBI SRA I would have to fix the sra-toolkit package first, which is also broken! I will try the nanopore-wgs-consortium link you gave.

@anton-molyboha
Author

@epsilon-0 Do you want to take a look at the updated PR?

On the other hand, in my testing, .fast5 files don't work (thanks, mmokrejs, for the test data). Two options make sense to me:

  • Merge this PR (if everything else looks good) and argue that having support for .fastq files (they are more commonly used than .fast5 in my understanding) is still better than nothing.
  • Give up on this ebuild and start working on a new one for the latest version.

@epsilon-0
Contributor

Let's merge it, it looks good enough to me (given the horrible state it was in previously).
But do try and make a PR for the updated version.

@epsilon-0
Contributor

Squash all your commits into one commit and add the sign-off to that.
Will merge then 👍
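
For reference, one common way to squash and sign off is a soft reset back to the base commit followed by a single `git commit -s`. The sketch below runs in a throwaway repository; the identity, file name, and commit messages are placeholders, not the ones used in this PR.

```shell
# Throwaway repository with a base commit and two work-in-progress commits.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config user.name "Example Dev"
git -C "$repo" config user.email "dev@example.org"
git -C "$repo" commit -q --allow-empty -m "base"
base=$(git -C "$repo" rev-parse HEAD)

echo one > "$repo/a.txt"
git -C "$repo" add a.txt
git -C "$repo" commit -q -m "wip 1"
echo two >> "$repo/a.txt"
git -C "$repo" commit -q -am "wip 2"

# Move the branch back to the base but keep all changes staged,
# then record them as one commit with a Signed-off-by trailer (-s).
git -C "$repo" reset -q --soft "$base"
git -C "$repo" commit -q -s -m "example: squashed change"

commit_count=$(git -C "$repo" rev-list --count HEAD)
last_message=$(git -C "$repo" log -1 --format=%B)
echo "$last_message"
```

On a real branch the base would be the commit where the branch diverged from master, and the rewritten branch then needs a force push to update the pull request.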

@anton-molyboha
Author

I think Travis CI is failing because of the scikits_learn rename... It started when I rebased my changes onto the latest HEAD, and the dependency errors all mention scikits_learn. I see people are working on it right now, so I guess I'll just wait until it is resolved.

@epsilon-0
Contributor

Yes, the update happened yesterday, we have A LOT of packages depending on scikit-learn.
The change I made for scikit-learn is gonna leave some deep scars if not fixed soon in the depending overlays.

Based on local testing, the program now works for .fastq files, but does
not work for .fast5 files. This is also not the latest version. However,
this is a step forward from "not working at all". The issues will need
to be solved in the future.

Signed-off-by: Anton Molyboha <anton.stay.connected@gmail.com>
@epsilon-0
Contributor

thanks for the fix 😸
