-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Golay error decoding for the EMP protocol #97
Conversation
One more note, without the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick pass today, food for thought. I will do a more in-depth review soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lookin' good @wasade, made one more pass. Next time through I want to take a closer look at tests. Otherwise looking good! Thanks! 💯
@@ -57,3 +57,19 @@ class EMPPairedEndCasavaDirFmt(model.DirectoryFormat): | |||
|
|||
barcodes = model.File( | |||
r'Undetermined_S0_L001_I1_001.fastq.gz', format=FastqGzFormat) | |||
|
|||
|
|||
class ErrorCorrectionDetailsFmt(model.TextFileFormat): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about you add the header/type mapping as a class property to this format? Then, you can use it here and here (and in the rest of the transformers).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Third comment on this format - can you make this a TSV? If you did, then you could actually just use the qiime2.Metadata
object for validation:
for column in self.METADATA_COLUMNS.keys():
try:
md.get_column(column)
except ValueError as md_exc:
raise ValidationError(md_exc) from md_exc
As well, your transformers could use qiime2.Metadata
for loading and saving.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call!
I got in the header / type mapping stuff. I was less sure about how to do the changes to support the Metadata
object though, is there a recommended example to base off of by chance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a little bit here in this open PR: qiime2/q2-types#210
Okay, I have just a few minor things to add to this, then it is set:
Thanks! |
This pull request adds in Golay error decoding for the Earth Microbiome Project protocol (solving #72). The code included is essentially a direct copy from QIIME1, which was originally authored by @justin212k, with a few exceptions outlined below:
functools.lru_cache
to avoid decoding unless necessaryThe contained code does more than only decoding, so it is possible to use the included library methods for the purpose of creating new Golay codes. However, since
demux
only presently supports the EMP protocol, it didn't seem necessary at this time to expose any of that additional functionality.As this code was brought from an existing project, some additional detail such as the module docstring was retained.
Finally, this PR does introduce an API change to both
emp-single
andemp-paired
. Specifically, we now record detail about the error correction process. This information is dumped into aDataFrame
and provided as an output. The original intention was for this output to be compatible withqiime metadata tabluate
, however the sample-id column is now unique (as multiple reads within a sample may have errors) so it does not currently work with that visualizer.