small updates to Mothur datatypes #2429

Merged
merged 5 commits into from Jun 3, 2016

6 participants
@shiltemann
Member

commented May 30, 2016

  • adds mothur.tre (subclass) datatype
  • when setting metadata, scan the entire file (previously, for large group files, not all group names were detected)
  • add otunames metadata to mothur.otu datatype

xRef: follow-up of PR #2038 for the mothur tool suite galaxyproject/tools-iuc#671

shiltemann added some commits May 23, 2016

scan entire file when setting metadata
for large group files this would not find all groups

@galaxybot galaxybot added the triage label May 30, 2016

@galaxybot galaxybot added this to the 16.07 milestone May 30, 2016

@@ -142,7 +150,7 @@ def set_meta(self, dataset, overwrite=True, skip=1, max_data_lines=100000, **kwd
         comment_lines = 0
         ncols = 0

-        headers = get_headers(dataset.file_name, sep='\t', count=max_data_lines)
+        headers = get_headers(dataset.file_name, sep='\t', count=-1)

@bgruening

bgruening May 30, 2016

Member

How large are the largest files? Do we really need to look at every line?

@shiltemann

shiltemann May 30, 2016

Author Member

For the group files and a few others at least, yes. These files never get crazy big, but we need to detect all the groups because many of the mothur tools use this metadata to let you select which groups to perform your analysis on.

I'll go over the file once more to see for which datatypes this is really necessary and for which perhaps not so important :)
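
To illustrate why a line cap can miss groups, here is a minimal sketch. It assumes a mothur group file is tab-separated, with the sequence name in the first column and the group name in the second. `read_rows` and `group_names` are hypothetical stand-ins written for this example, not Galaxy's `get_headers`, and treating a negative `count` as "no limit" is an assumption of the sketch.

```python
import io


def read_rows(handle, sep="\t", count=-1):
    """Hypothetical stand-in for a header reader: split each line on `sep`.

    A negative `count` is treated as "no limit" (an assumption of this sketch).
    """
    rows = []
    for idx, line in enumerate(handle):
        if count >= 0 and idx >= count:
            break
        rows.append(line.rstrip("\r\n").split(sep))
    return rows


def group_names(rows):
    """Collect the distinct group names from the second column."""
    return sorted({row[1] for row in rows if len(row) > 1})


# Toy group file: group C only appears after the first four lines.
text = "seq1\tA\nseq2\tA\nseq3\tB\nseq4\tB\nseq5\tC\n"

capped = group_names(read_rows(io.StringIO(text), count=4))
full = group_names(read_rows(io.StringIO(text), count=-1))
print(capped)  # ['A', 'B']
print(full)    # ['A', 'B', 'C']
```

With the cap, group C never reaches the metadata, so a tool that builds its group selector from that metadata could not offer it.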


    def __init__(self, **kwd):
        Text.__init__(self, **kwd)

    def set_meta(self, dataset, overwrite=True, **kwd):
        if dataset.has_data():
            label_names = set()
            otulabel_names = set()

@yhoogstrate

yhoogstrate May 31, 2016

Member

if you set it to list you don't need to cast it later on

@martenson

martenson May 31, 2016

Member

maybe there are duplicate headers added below? (that would be eliminated by set)

@shiltemann

shiltemann Jun 1, 2016

Author Member

indeed
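
The trade-off in this thread can be sketched as follows. The rows are made-up data (the repeated header row is only there to force duplicates), and `otulabel_names` merely mirrors the variable name in the diff. A set drops repeated labels on insertion, at the cost of one conversion to a list at the end.

```python
# Made-up rows; the same OTU labels appear more than once.
rows = [
    ["label", "numOtus", "Otu001", "Otu002"],
    ["label", "numOtus", "Otu001", "Otu002"],
]

# With a list, duplicates accumulate unless every append is guarded.
as_list = []
for row in rows:
    as_list.extend(col for col in row if col.startswith("Otu"))

# With a set, duplicates are dropped on insertion; one cast at the end.
otulabel_names = set()
for row in rows:
    otulabel_names.update(col for col in row if col.startswith("Otu"))
as_sorted_list = sorted(otulabel_names)

print(as_list)         # ['Otu001', 'Otu002', 'Otu001', 'Otu002']
print(as_sorted_list)  # ['Otu001', 'Otu002']
```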

@shiltemann shiltemann force-pushed the shiltemann:mothur_datatypes branch from 151228c to 61589b3 May 31, 2016

@martenson

Member

commented Jun 1, 2016

👍

data_lines += 1
dataset.metadata.comment_lines = 1
dataset.metadata.data_lines = data_lines - 1 if data_lines > 0 else 0
Tabular.set_meta(self, dataset, overwrite=overwrite, **kwd)

@nsoranzo

nsoranzo Jun 1, 2016

Member

super(AlignCheck, self).set_meta(dataset, overwrite=overwrite, **kwd)

@@ -717,19 +722,17 @@ def __init__(self, **kwd):
        self.column_names = ['name', 'total']

    def set_meta(self, dataset, overwrite=True, skip=1, max_data_lines=None, **kwd):
        data_lines = 0
        headers = get_headers(dataset.file_name, sep='\t', count=-1)
        Tabular.set_meta(self, dataset, overwrite=overwrite, **kwd)

@nsoranzo

nsoranzo Jun 1, 2016

Member

super(CountTable, self).set_meta(self, dataset, overwrite=overwrite, **kwd)

@shiltemann

shiltemann Jun 2, 2016

Author Member

@nsoranzo thanks :) However, if I do that I get the following error when I upload my file: TypeError: set_meta() got multiple values for keyword argument 'overwrite'. What is the best way to handle this?

@nsoranzo

nsoranzo Jun 2, 2016

Member

Oops, sorry, self should be removed from the arguments:
super(CountTable, self).set_meta(dataset, overwrite=overwrite, **kwd)
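
For anyone hitting the same TypeError, a minimal reproduction, independent of Galaxy, might look like this. `Base`, `Wrong` and `Right` are hypothetical classes; only the `set_meta` signature is modelled on the diff. Because `super(...).set_meta` is already bound to `self`, passing `self` again shifts every positional argument by one, so `dataset` lands in the `overwrite` slot and then collides with the `overwrite=` keyword.

```python
class Base(object):
    def set_meta(self, dataset, overwrite=True, **kwd):
        return (dataset, overwrite)


class Wrong(Base):
    def set_meta(self, dataset, overwrite=True, **kwd):
        # Bug: super() already binds self, so this extra self is taken as
        # `dataset`, and `dataset` is taken as the positional `overwrite`.
        return super(Wrong, self).set_meta(self, dataset, overwrite=overwrite, **kwd)


class Right(Base):
    def set_meta(self, dataset, overwrite=True, **kwd):
        # Correct: pass only the arguments after self.
        return super(Right, self).set_meta(dataset, overwrite=overwrite, **kwd)


try:
    Wrong().set_meta("data.tsv")
    error = None
except TypeError as exc:
    error = str(exc)

print(error)  # the exact wording differs between Python 2 and Python 3
result = Right().set_meta("data.tsv", overwrite=False)
print(result)  # ('data.tsv', False)
```

The explicit `self` is only needed in the unbound style, `Tabular.set_meta(self, dataset, ...)`, which is why it slipped through when the call was rewritten to use `super`.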

@shiltemann

shiltemann Jun 2, 2016

Author Member

grazie!

@bgruening

Member

commented Jun 3, 2016

👍

@bgruening bgruening merged commit 30fd1d6 into galaxyproject:dev Jun 3, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
@martenson

Member

commented Jun 3, 2016

thanks @shiltemann @bgruening ! 🎉
