Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix empty output files when no targets are found in inputs #51

Merged
merged 7 commits into from
Jul 13, 2018

Conversation

peterk87
Copy link
Contributor

@peterk87 peterk87 commented Jul 6, 2018

This pull request:

  • Fixes Empty files when no targets are found #49, there's always going to be a subtyping entry for each input
  • Refactors code to be more DRY, modular and, hopefully, more obvious
  • Added unit tests for important utility functions

pkruczkiewicz added 2 commits July 6, 2018 17:03
Signed-off-by: pkruczkiewicz <peter.kruczkiewicz@canada.ca>
Signed-off-by: pkruczkiewicz <peter.kruczkiewicz@canada.ca>
@peterk87 peterk87 changed the title WIP: Fix empty output files when no targets are found in inputs Fix empty output files when no targets are found in inputs Jul 10, 2018
@peterk87 peterk87 requested a review from gcttong July 10, 2018 18:30
scheme_subtype_counts=scheme_subtype_counts,
n_threads=n_threads)
logging.info('Generated %s subtyping results from %s contigs samples', len(contigs_results), len(input_contigs))
subtype_results += contigs_results
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is subtype_results iterable? did you mean subtype_results.append(contigs_results)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a nice thing you can do in Python where you can add 2 lists together with +:

list1 = [1, 2, 3] 
list1 += [4, 5]
assert list1 == [1,2,3,4,5]
#or
list1 = [1, 2, 3] 
list2 = [4, 5]
list3 = list1 + list2
assert list3 == [1,2,3,4,5]
# list1 and list2 are unmodified

Hopefully that makes sense!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh ok I see, that makes sense, thanks!

scheme_subtype_counts=scheme_subtype_counts,
n_threads=n_threads)
logging.info('Generated %s subtyping results from %s contigs samples', len(reads_results), len(input_reads))
subtype_results += reads_results
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from typing import Optional, List, Any, Tuple

import attr
import os
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't "os" a built-in module, shouldn't it be in the first group?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup! Fixed in acf4335

input_genomes, reads = collect_inputs(args)
if len(input_genomes) == 0 and len(reads) == 0:
input_contigs, input_reads = collect_inputs(args)
if len(input_contigs) == 0 and len(input_reads) == 0:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an "or" statement, that if either of these are empty, then it raises the error? Or is it ok if one of them is empty, that's still ok?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're allowing the user to specify a mix of file types. They can specify contigs (FASTA) and reads (FASTQ) in the same analysis. If both lists are empty then no input whatsoever has been specified so we can't do anything except let the user know they haven't specified anything to analyze.

organize imports
@@ -4,10 +4,10 @@

from pandas import DataFrame
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, would it be good to import pandas as pd just like the other files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. Why from pandas import DataFrame where everywhere else it's import pandas as pd? There's no good reason for that to remain the way it is. Fixed in 0a8fe89

Signed-off-by: pkruczkiewicz <peter.kruczkiewicz@canada.ca>
Copy link

@gcttong gcttong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, for the most part, looks good with me, just a few comments!

pkruczkiewicz and others added 2 commits July 13, 2018 10:53
@peterk87
Copy link
Contributor Author

Great! @gcttong If it looks good to you, then can you merge the changes? Thanks for taking a look at this PR.

@gcttong gcttong merged commit 3d97c26 into development Jul 13, 2018
@peterk87 peterk87 mentioned this pull request Jul 31, 2018
@peterk87 peterk87 deleted the fix-49/empty-output-files branch September 28, 2018 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants