add biom2 datatype #4519

Merged
merged 18 commits into from
Sep 7, 2017

Conversation

shiltemann
Member

Hi all,

this adds the biom2 (hdf5-formatted) metagenomics datatype (http://biom-format.org/documentation/biom_format.html)

also adds converters to and from the biom1 (json-formatted) datatype
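
For illustration, the two BIOM flavours can be told apart from their first bytes: an HDF5 file starts with a fixed 8-byte signature, whereas a biom1 file is JSON text. This is a minimal sketch, not the sniffer or converter code added in this PR. The helper names `looks_like_hdf5` and `_write_temp` are hypothetical, the file contents are made up, and only offset 0 is checked (HDF5 also permits the signature at later offsets when a user block is present).

```python
import os
import tempfile

# 8-byte signature that opens an HDF5 file (per the HDF5 file format specification).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file at `path` starts with the HDF5 signature.

    Hypothetical helper for illustration; only offset 0 is checked.
    """
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


def _write_temp(content):
    """Write `content` (bytes) to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(content)
    return path


# Made-up inputs: a JSON text file versus a file that begins with the HDF5 signature.
json_path = _write_temp(b'{"id": null}')
hdf5_path = _write_temp(HDF5_SIGNATURE + b"\x00" * 8)

is_json_hdf5 = looks_like_hdf5(json_path)
is_hdf5_hdf5 = looks_like_hdf5(hdf5_path)

# Clean up the temporary files.
os.remove(json_path)
os.remove(hdf5_path)
```

A check like this only says which container format a file uses; deciding whether an HDF5 file is actually a BIOM table would still require inspecting its attributes.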

@galaxybot galaxybot added this to the 17.09 milestone Aug 30, 2017
Member

@bgruening bgruening left a comment

Thanks @shiltemann!


def set_peek(self, dataset, is_multi_byte=False):
    if not dataset.dataset.purged:
        dataset.peek = "Biom2 (HDF5) file"
Member

Maybe we can add a bit of metadata here as well.

Member Author

good idea
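
One way the suggestion could look, as a minimal sketch: build the peek text from the file's top-level attributes instead of a fixed string. The helper `build_peek` and the example dict are hypothetical and not Galaxy API. The attribute names `format`, `type`, and `shape` are the ones this PR's metadata code reads, and in Galaxy the values would come from the HDF5 file rather than a literal dict.

```python
def build_peek(attributes):
    """Return a short, human-readable peek string for a biom2 dataset.

    `attributes` is any mapping of top-level attribute names to values;
    missing keys are skipped. Hypothetical helper, not Galaxy API.
    """
    lines = ["Biom2 (HDF5) file"]
    for key in ("format", "type", "shape"):
        if key in attributes:
            lines.append("%s: %s" % (key, attributes[key]))
    return "\n".join(lines)


# Made-up attribute values for illustration.
example_attributes = {"type": "OTU table", "shape": (3, 4)}
peek = build_peek(example_attributes)
```

Keeping the fixed first line means the peek still identifies the datatype when no attributes can be read.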

@@ -32,6 +32,7 @@ six==1.10.0
Whoosh==2.7.4
testfixtures==4.10.0
galaxy_sequence_utils==1.0.2
h5py==2.7.0
Member

I think we need a wheel for this on cargo-port. A start might be here: galaxyproject/starforge#145

Member Author

@shiltemann shiltemann Aug 30, 2017

It seems there was already another PR for this from a while back: galaxyproject/starforge#44. What's the best way forward?

Member

The wheels are built upstream and are available in PyPI so I don't think anything else should be needed here. I'll copy them to wheels.galaxyproject.org so we have them in case they disappear.

Member

Might as well update to 2.7.1 while you're in here.

elif 'format' in attributes: # biom 2.0
    dataset.metadata.format = attributes['format']
    dataset.metadata.type = attributes['type']
    dataset.metadata.shape = str(list(attributes['shape']))
Member

Is this supposed to be a tuple? As written, the brackets will end up in the metadata as well, won't they?

Member Author

Indeed, but my concern was that anything that's not a string was not being displayed in the GUI when looking at the metadata, and users might quite like to see that information. But it sounds like that should probably be fixed elsewhere then (same for nnz: it is an int but doesn't show up in the interface unless I cast it to a string).
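
To make the point about brackets concrete, here is a small sketch comparing the string produced by `str(list(...))` with a bracket-free alternative. The shape value is made up, and `shape_as_text` is a hypothetical helper, not something this PR adds.

```python
def shape_as_text(shape):
    """Join the dimensions of `shape` into text without list brackets."""
    return " x ".join(str(int(dim)) for dim in shape)


shape = (3, 4)  # made-up table dimensions

with_brackets = str(list(shape))         # same cast as in the snippet under review
without_brackets = shape_as_text(shape)  # bracket-free text form
```

For this input the first form is `[3, 4]` and the second is `3 x 4`. Either way the value is stored as a string, which is the workaround described above for non-string metadata not showing up in the interface.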

Member

@natefoo natefoo left a comment

Thanks @shiltemann! Just one issue with the merge error in data.py that crept in from dev. I am not a datatype expert but this looks good to me.

import tempfile
import zipfile
from json import dumps
import h5py
Member

Technically for PEP-8 this should be:

import zipfile
from json import dumps

import h5py
import pysam
from bx.seq.twobit import TWOBIT_MAGIC_NUMBER, TWOBIT_MAGIC_NUMBER_SWAP, TWOBIT_MAGIC_SIZE

@@ -369,7 +369,6 @@ def display_data(self, trans, data, preview=False, filename=None, to_ext=None, *
if os.path.exists(file_path):
    if os.path.isdir(file_path):
        tmp_fh = tempfile.NamedTemporaryFile(delete=False)
        tmp_file_name = tmp_fh.name
Member

This was due to a bad merge elsewhere in dev; the line shouldn't be deleted. Restoring it will fix the failing tests.

Member Author

ah gotcha, reverted it, thanks :)

Member

dannon commented Sep 7, 2017

Looks good! Travis osx boxes look hung, but everything else is passing nicely.

@dannon dannon merged commit a8df872 into galaxyproject:dev Sep 7, 2017

6 participants