-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH/BUG/MAINT: support comments/blank lines and better validation in "per-sample sequence" formats #136
Conversation
…"per-sample sequence" formats All changes in this commit are related to the "per-sample sequence" formats. Several changes: - Comments and blank lines are now supported in "per-sample sequence" manifest formats, including sniffers and transformers. - Duplicate samples are now additionally detected and disallowed in "relative path" manifest formats (previously this validation only happened in the "absolute path" manifest formats). - Error messages reported by transformers are both more precise and accurate. - Refactored sniffer and transformer code to reduce redundancy and perform consistent validation and parsing in all sniffers/transformers. - Added unit tests, increased coverage to 100% for "per-sample sequence" formats, and made existing unit tests and test data more robust. Fixes qiime2#126. Fixes qiime2#129. Fixes qiime2#132. Pair-programmed with @jairideout.
@@ -0,0 +1,4 @@ | |||
sample-id,absolute-filepath,direction | |||
Human-Kneecap,/some/groovy/folder/Human-Kneecap_S1_L001_R1_001.fastq.gz,forward |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎶
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no comments guys, besides the fact that this PR looks great! Thanks so much for jamming on this 🎸! This is nice and clean, and easy to follow --- looking forward to all this centralized/standardized validation (does that make this the validation station 🚋 ?)!
os.path.expandvars(record['absolute-filepath']) | ||
_validate_path(record['absolute-filepath']) | ||
_validate_direction(record['direction']) | ||
raise ValueError('Empty cells are not supported in ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
path = record[expected_header[1]] | ||
if absolute: | ||
if not os.path.isabs(path): | ||
raise ValueError('All paths provided in manifest must be ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
'absolute but found relative path: %s' % path) | ||
else: | ||
if os.path.isabs(path): | ||
raise ValueError('All paths provided in manifest must be ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
path = os.path.join(os.path.dirname(manifest_fh.name), path) | ||
if not os.path.exists(path): | ||
raise FileNotFoundError( | ||
'A path specified in the manifest does not exist ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
def _validate_header(manifest, expected_header): | ||
header = manifest.columns.tolist() | ||
if header != expected_header: | ||
raise ValueError('Expected manifest header %r but ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
'but the following directions were observed: %s' | ||
% ', '.join(directions)) | ||
|
||
'contain only forward or reverse reads, ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -252,7 +236,7 @@ def _validate_paired_end_fastq_manifest_directions(manifest): | |||
elif direction == 'reverse': | |||
reverse_direction_sample_ids.append(sample_id) | |||
else: | |||
raise ValueError('Directions can only be "forward" and ' | |||
raise ValueError('Directions can only be "forward" or ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops! good catch!
for file in files: | ||
filepath = self.get_data_path(file) | ||
for format in self.formats: | ||
with self.assertRaisesRegex(ValueError, format.__name__): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent!
'relative_manifests/jagged-MANIFEST'] | ||
for file in files: | ||
filepath = self.get_data_path(file) | ||
with self.assertRaisesRegex(ValueError, 'FastqManifestFormat'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent!
@@ -13,15 +13,74 @@ | |||
from ..plugin_setup import plugin | |||
|
|||
|
|||
class FastqManifestFormat(model.TextFileFormat): | |||
class _FastqManifestBase(model.TextFileFormat): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet, I like this game plan.
Missing test assets from #136
All changes in this commit are related to the "per-sample sequence" formats.
Several changes:
Fixes #126.
Fixes #129.
Fixes #132.
Pair-programmed with @jairideout .