small updates to Mothur datatypes #2429

Merged
merged 5 commits into galaxyproject:dev on Jun 3, 2016

shiltemann
Member

  • add the mothur.tre (subclass) datatype
  • when setting metadata, scan the entire file (previously, for large group files, not all group names were detected)
  • add otunames metadata to the mothur.otu datatype

xRef: follow-up of PR #2038 for the mothur tool suite galaxyproject/tools-iuc#671

@galaxybot galaxybot added this to the 16.07 milestone May 30, 2016
@@ -142,7 +150,7 @@ def set_meta(self, dataset, overwrite=True, skip=1, max_data_lines=100000, **kwd
         comment_lines = 0
         ncols = 0

-        headers = get_headers(dataset.file_name, sep='\t', count=max_data_lines)
+        headers = get_headers(dataset.file_name, sep='\t', count=-1)
Member

How large are the largest files? Do we really need to look at every line?

Member Author

For the group files and a few others at least, yes. These files never get crazy big, but we need to detect all the groups, because many of the mothur tools use this metadata to let you select which groups to perform your analysis on.

I'll go over the file once more to see for which datatypes this is really necessary and for which it is perhaps not so important :)
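
To illustrate the point about missed groups, here is a minimal sketch, not the Galaxy code itself. It assumes a group file is tab-separated with the sequence name in the first column and the group name in the second; `collect_groups` and its `max_lines` parameter are hypothetical names used only for this example.

```python
import io


def collect_groups(handle, max_lines=None):
    """Return the sorted group names found in a two-column, tab-separated file.

    max_lines=None scans every line; a positive value stops after that many lines.
    (Hypothetical helper for illustration, not part of Galaxy.)
    """
    groups = set()
    for i, line in enumerate(handle):
        if max_lines is not None and i >= max_lines:
            break
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1]:
            groups.add(fields[1])
    return sorted(groups)


# Toy data: the group "gut" only appears on the last line.
sample = "seq1\tsoil\nseq2\tsoil\nseq3\twater\nseq4\tgut\n"

truncated = collect_groups(io.StringIO(sample), max_lines=3)
full = collect_groups(io.StringIO(sample))
print(truncated)  # ['soil', 'water']
print(full)       # ['gut', 'soil', 'water']
```

With a line limit, any group that first appears past the cutoff never reaches the metadata, so a tool that builds its group selector from that metadata could not offer it.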


    def __init__(self, **kwd):
        Text.__init__(self, **kwd)

    def set_meta(self, dataset, overwrite=True, **kwd):
        if dataset.has_data():
            label_names = set()
            otulabel_names = set()
Member

If you make it a list, you don't need to cast it later on.

Member

Maybe duplicate headers are added below? (Those would be eliminated by the set.)

Member Author

indeed
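
The trade-off discussed here can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `unique_labels` and the toy rows are hypothetical, and sorting the result is a choice made for this example only.

```python
def unique_labels(rows):
    """Collect the first column of each non-empty row, dropping duplicates.

    (Hypothetical helper for illustration, not part of Galaxy.)
    """
    labels = set()
    for row in rows:
        if row:
            labels.add(row[0])
    # A single cast at the end; sorting makes the stored order deterministic.
    return sorted(labels)


# Toy rows: the label "0.03" occurs twice, and one row is empty.
rows = [["0.03", "Otu001"], ["0.03", "Otu002"], ["0.05", "Otu001"], []]
labels = unique_labels(rows)
print(labels)  # ['0.03', '0.05']
```

Appending to a list would avoid the final cast, but a repeated value would then be stored once per occurrence unless each append were guarded by a membership check.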

@martenson
Member

👍

data_lines += 1
dataset.metadata.comment_lines = 1
dataset.metadata.data_lines = data_lines - 1 if data_lines > 0 else 0
Tabular.set_meta(self, dataset, overwrite=overwrite, **kwd)
Member

@nsoranzo nsoranzo Jun 1, 2016

super(AlignCheck, self).set_meta(dataset, overwrite=overwrite, **kwd)
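
A small sketch of what this suggestion changes, using stand-in classes: the `Tabular` and `AlignCheck` below are toy versions written for this example (the real Galaxy classes do much more), and the dict used as `dataset` is likewise an assumption.

```python
class Tabular(object):
    """Toy stand-in for the parent datatype class."""

    def set_meta(self, dataset, overwrite=True, **kwd):
        dataset["columns"] = 2


class AlignCheck(Tabular):
    """Toy stand-in for the subclass under review."""

    def set_meta(self, dataset, overwrite=True, **kwd):
        dataset["data_lines"] = 3
        # Delegate to the next class in the method resolution order
        # instead of naming Tabular explicitly.
        super(AlignCheck, self).set_meta(dataset, overwrite=overwrite, **kwd)


dataset = {}
AlignCheck().set_meta(dataset)
print(sorted(dataset.items()))  # [('columns', 2), ('data_lines', 3)]
```

Both spellings reach the same parent method in this simple hierarchy. The `super()` form keeps working if the base class is later renamed or another class is inserted into the inheritance chain.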

@bgruening
Member

👍

@bgruening bgruening merged commit 30fd1d6 into galaxyproject:dev Jun 3, 2016
@martenson
Member

thanks @shiltemann @bgruening ! 🎉

6 participants