small updates to Mothur datatypes #2429
For large group files, reading only the first `max_data_lines` lines would not find all groups.
@@ -142,7 +150,7 @@ def set_meta(self, dataset, overwrite=True, skip=1, max_data_lines=100000, **kwd
         comment_lines = 0
         ncols = 0

-        headers = get_headers(dataset.file_name, sep='\t', count=max_data_lines)
+        headers = get_headers(dataset.file_name, sep='\t', count=-1)
How large are the largest files? Do we really need to look at every line?
For the group files and a few others, at least, yes. These files never get very large, but we need to detect all the groups, because many of the mothur tools use this metadata to let you select which groups to run your analysis on.
I'll go over the file once more to see which datatypes really need this and which perhaps don't :)
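Why `count=-1` matters: a group is only recorded if at least one of its lines is actually read, so any cap on the number of lines can miss groups that first appear late in the file. A minimal sketch, assuming a two-column, tab-separated `sequence<TAB>group` layout; `collect_groups` is a hypothetical helper for illustration, not Galaxy's `get_headers`:

```python
import io


def collect_groups(handle, max_lines=None):
    """Return the sorted group names found in a group file (hypothetical helper).

    Assumes one tab-separated ``sequence<TAB>group`` record per line.
    ``max_lines=None`` scans the whole file (like ``count=-1``); an integer
    stops early (like ``count=max_data_lines``).
    """
    groups = set()
    for i, line in enumerate(handle):
        if max_lines is not None and i >= max_lines:
            break
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            groups.add(fields[1])
    return sorted(groups)


# Sample data: group B only appears after the third line.
data = "seq1\tA\nseq2\tA\nseq3\tA\nseq4\tB\n"
print(collect_groups(io.StringIO(data), max_lines=3))  # ['A']
print(collect_groups(io.StringIO(data)))               # ['A', 'B']
```

Reading the whole file costs one full pass, which the thread considers acceptable because these files stay small.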
    def __init__(self, **kwd):
        Text.__init__(self, **kwd)

    def set_meta(self, dataset, overwrite=True, **kwd):
        if dataset.has_data():
            label_names = set()
            otulabel_names = set()
If you make it a list, you don't need to cast it later on.
Maybe there are duplicate headers added below? (Those would be eliminated by the set.)
indeed
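The list-vs-set trade-off from the review can be had both ways: a set drops duplicate headers but has no stable order, while a list keeps order but also keeps duplicates. A minimal sketch using `dict.fromkeys`, which preserves insertion order (Python 3.7+) and discards repeats; `unique_labels` is a hypothetical helper and the label values are only sample data:

```python
def unique_labels(labels):
    """Deduplicate labels, keeping the order in which they were first seen.

    dict keys are unique and keep insertion order (Python 3.7+), so this
    removes duplicates like a set while returning a list directly.
    """
    return list(dict.fromkeys(labels))


labels = ["unique", "0.03", "unique", "0.10", "0.03"]
print(sorted(set(labels)))    # ['0.03', '0.10', 'unique']  (order lost)
print(unique_labels(labels))  # ['unique', '0.03', '0.10']  (order kept)
```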
Branch updated from 151228c to 61589b3.
👍
                data_lines += 1
        dataset.metadata.comment_lines = 1
        dataset.metadata.data_lines = data_lines - 1 if data_lines > 0 else 0
        Tabular.set_meta(self, dataset, overwrite=overwrite, **kwd)
super(AlignCheck, self).set_meta(dataset, overwrite=overwrite, **kwd)
👍
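The two changes discussed above, sketched together under stated assumptions: the line count drops the single header line, and the parent call goes through `super()`, which follows the method resolution order instead of hard-coding `Tabular`, so an intermediate base class added later would not be skipped. `Tabular`, `make_dataset`, and the `lines` attribute are hypothetical stand-ins for Galaxy's classes, used only to make the example runnable:

```python
from types import SimpleNamespace


class Tabular:
    """Hypothetical stand-in for Galaxy's Tabular datatype (illustration only)."""

    def set_meta(self, dataset, overwrite=True, **kwd):
        dataset.metadata.base_set_meta_called = True


class AlignCheck(Tabular):
    """Simplified sketch of the metadata logic discussed in the review."""

    def set_meta(self, dataset, overwrite=True, **kwd):
        data_lines = len(dataset.lines)
        # The first line is a header, so it is not counted as data.
        dataset.metadata.comment_lines = 1
        dataset.metadata.data_lines = data_lines - 1 if data_lines > 0 else 0
        # Reviewer's suggestion: call the parent via super() instead of Tabular.
        super(AlignCheck, self).set_meta(dataset, overwrite=overwrite, **kwd)


def make_dataset(lines):
    """Hypothetical dataset object with only the attributes this sketch uses."""
    return SimpleNamespace(lines=lines, metadata=SimpleNamespace())


ds = make_dataset(["name\tvalue", "seq1\t0", "seq2\t2"])  # sample rows
AlignCheck().set_meta(ds)
print(ds.metadata.data_lines)            # 2
print(ds.metadata.base_set_meta_called)  # True
```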
Thanks @shiltemann @bgruening! 🎉
xref: follow-up to PR #2038 for the mothur tool suite (galaxyproject/tools-iuc#671)