add biom2 datatype #4519
for extracting metadata from biom2/hdf5 datatypes
Thanks @shiltemann!
lib/galaxy/datatypes/binary.py
Outdated
def set_peek(self, dataset, is_multi_byte=False):
    if not dataset.dataset.purged:
        dataset.peek = "Biom2 (HDF5) file"
Maybe we could add a bit of metadata here as well.
good idea
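One way to act on this suggestion, sketched under assumptions: build the peek text from whatever metadata has already been extracted. The helper name `build_biom2_peek` and the plain-dict `metadata` argument are hypothetical and not part of Galaxy's API; only the base label "Biom2 (HDF5) file" and the field names come from the diffs in this PR.

```python
def build_biom2_peek(metadata):
    """Return a multi-line peek string for a biom2 dataset.

    `metadata` is a plain dict standing in for the dataset metadata
    (an assumption for this sketch); missing or None values are skipped.
    """
    lines = ["Biom2 (HDF5) file"]
    for key in ("format", "type", "shape"):
        value = metadata.get(key)
        if value is not None:
            lines.append("%s: %s" % (key, value))
    return "\n".join(lines)
```

With no metadata available, the helper falls back to the original one-line peek, so the existing behaviour is preserved.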
@@ -32,6 +32,7 @@ six==1.10.0
Whoosh==2.7.4
testfixtures==4.10.0
galaxy_sequence_utils==1.0.2
h5py==2.7.0
I think we need a wheel for this on cargo-port. A start might be here: galaxyproject/starforge#145
It seems there was also another PR for this from a while back: galaxyproject/starforge#44. What's the best way forward?
The wheels are built upstream and are available in PyPI so I don't think anything else should be needed here. I'll copy them to wheels.galaxyproject.org so we have them in case they disappear.
Might as well update to 2.7.1 while you're in here.
lib/galaxy/datatypes/binary.py
Outdated
elif 'format' in attributes:  # biom 2.0
    dataset.metadata.format = attributes['format']
    dataset.metadata.type = attributes['type']
    dataset.metadata.shape = str(list(attributes['shape']))
Is this supposed to be a tuple? As written, the brackets will end up in the metadata as well, won't they?
Indeed, but my concern was that anything that isn't a string was not displayed in the GUI when looking at the metadata, and users might quite like to see that information. It sounds like that should probably be fixed elsewhere, though (the same goes for nnz: it is an int but doesn't show up in the interface unless I cast it to a string).
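To make the trade-off concrete, here is a small sketch of the two ways the shape could be stored as a string. The helper `shape_to_display` is hypothetical, and a plain tuple and int stand in for the HDF5 attribute values; this is an illustration, not what the PR ultimately does.

```python
def shape_to_display(shape):
    """Format a (rows, columns) shape as a string without list brackets."""
    return " x ".join(str(int(n)) for n in shape)


shape = (3, 4)  # stand-in for attributes['shape'] (assumed values)
nnz = 7         # stand-in for the integer nnz attribute (assumed value)

with_brackets = str(list(shape))            # the approach in the diff above
without_brackets = shape_to_display(shape)  # bracket-free alternative
nnz_display = str(nnz)                      # cast so the value is shown as text
```

Either string form works around the display issue described above; the bracket-free variant only changes how the value reads to users.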
Thanks @shiltemann! Just one issue: the merge error in data.py that crept in from dev. I am not a datatype expert, but this looks good to me.
lib/galaxy/datatypes/binary.py
Outdated
import tempfile
import zipfile
from json import dumps
import h5py
Technically, for PEP-8 this should be:
import zipfile
from json import dumps

import h5py
import pysam
from bx.seq.twobit import TWOBIT_MAGIC_NUMBER, TWOBIT_MAGIC_NUMBER_SWAP, TWOBIT_MAGIC_SIZE
lib/galaxy/datatypes/data.py
Outdated
@@ -369,7 +369,6 @@ def display_data(self, trans, data, preview=False, filename=None, to_ext=None, *
if os.path.exists(file_path):
    if os.path.isdir(file_path):
        tmp_fh = tempfile.NamedTemporaryFile(delete=False)
        tmp_file_name = tmp_fh.name
This was due to a bad merge elsewhere in dev; the line shouldn't be deleted. Restoring it will fix the failing tests.
ah gotcha, reverted it, thanks :)
This reverts commit 4119caf.
Looks good! Travis osx boxes look hung, but everything else is passing nicely.
Hi all,
This adds the biom2 (HDF5-formatted) metagenomics datatype (http://biom-format.org/documentation/biom_format.html).
It also adds converters to and from the biom1 (JSON-formatted) datatype.