Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH/BUG/MAINT: support comments/blank lines and better validation in "per-sample sequence" formats #136

Merged
merged 1 commit into from
Jul 24, 2017

Conversation

maxvonhippel
Copy link
Contributor

All changes in this commit are related to the "per-sample sequence" formats.

Several changes:

  • Comments and blank lines are now supported in "per-sample sequence" manifest formats, including sniffers and transformers.
  • Duplicate samples are now additionally detected and disallowed in "relative path" manifest formats (previously this validation only happened in the "absolute path" manifest formats).
  • Error messages reported by transformers are both more precise and accurate.
  • Refactored sniffer and transformer code to reduce redundancy and perform consistent validation and parsing in all sniffers/transformers.
  • Added unit tests, increased coverage to 100% for "per-sample sequence" formats, and made existing unit tests and test data more robust.

Fixes #126.
Fixes #129.
Fixes #132.

Pair-programmed with @jairideout .

…"per-sample sequence" formats

All changes in this commit are related to the "per-sample sequence" formats.

Several changes:

- Comments and blank lines are now supported in "per-sample sequence" manifest formats, including sniffers and transformers.
- Duplicate samples are now additionally detected and disallowed in "relative path" manifest formats (previously this validation only happened in the "absolute path" manifest formats).
- Error messages reported by transformers are both more precise and accurate.
- Refactored sniffer and transformer code to reduce redundancy and perform consistent validation and parsing in all sniffers/transformers.
- Added unit tests, increased coverage to 100% for "per-sample sequence" formats, and made existing unit tests and test data more robust.

Fixes qiime2#126.
Fixes qiime2#129.
Fixes qiime2#132.

Pair-programmed with @jairideout.
@@ -0,0 +1,4 @@
sample-id,absolute-filepath,direction
Human-Kneecap,/some/groovy/folder/Human-Kneecap_S1_L001_R1_001.fastq.gz,forward
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎶

Copy link
Contributor

@thermokarst thermokarst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no comments guys, besides the fact that this PR looks great! Thanks so much for jamming on this 🎸! This is nice and clean, and easy to follow --- looking forward to all this centralized/standardized validation (does that make this the validation station 🚋 ?)!

os.path.expandvars(record['absolute-filepath'])
_validate_path(record['absolute-filepath'])
_validate_direction(record['direction'])
raise ValueError('Empty cells are not supported in '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

path = record[expected_header[1]]
if absolute:
if not os.path.isabs(path):
raise ValueError('All paths provided in manifest must be '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

'absolute but found relative path: %s' % path)
else:
if os.path.isabs(path):
raise ValueError('All paths provided in manifest must be '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

path = os.path.join(os.path.dirname(manifest_fh.name), path)
if not os.path.exists(path):
raise FileNotFoundError(
'A path specified in the manifest does not exist '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

def _validate_header(manifest, expected_header):
header = manifest.columns.tolist()
if header != expected_header:
raise ValueError('Expected manifest header %r but '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

'but the following directions were observed: %s'
% ', '.join(directions))

'contain only forward or reverse reads, '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -252,7 +236,7 @@ def _validate_paired_end_fastq_manifest_directions(manifest):
elif direction == 'reverse':
reverse_direction_sample_ids.append(sample_id)
else:
raise ValueError('Directions can only be "forward" and '
raise ValueError('Directions can only be "forward" or '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops! good catch!

for file in files:
filepath = self.get_data_path(file)
for format in self.formats:
with self.assertRaisesRegex(ValueError, format.__name__):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent!

'relative_manifests/jagged-MANIFEST']
for file in files:
filepath = self.get_data_path(file)
with self.assertRaisesRegex(ValueError, 'FastqManifestFormat'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent!

@@ -13,15 +13,74 @@
from ..plugin_setup import plugin


class FastqManifestFormat(model.TextFileFormat):
class _FastqManifestBase(model.TextFileFormat):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet, I like this game plan.

@thermokarst thermokarst merged commit 407a91f into qiime2:master Jul 24, 2017
ebolyen pushed a commit that referenced this pull request Jul 24, 2017
Missing test assets from #136
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement type:bug Something is wrong.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants