Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added progress report to console when running sourmash compute with --singleton #84

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
18 changes: 17 additions & 1 deletion sourmash_lib/__main__.py
Original file line number Diff line number Diff line change
Expand Up @@ -171,6 +171,9 @@ def compute(self, args):
parser.add_argument('--name-from-first', action='store_true')
parser.add_argument('--with-cardinality', action='store_true')
parser.add_argument('--track-abundance', action='store_true')
parser.add_argument('--progress-stepsize', type=int,
default = 10000,
help='number of entries in input after which progress is printed to stderr.')
args = parser.parse_args(args)

if args.input_is_protein and args.dna:
Expand Down Expand Up @@ -294,13 +297,26 @@ def save_siglist(siglist, output_fp, filename=None):
if args.singleton:
siglist = []
for n, record in enumerate(screed.open(filename)):
if n % args.progress_stepsize == 0:
if sys.version_info[0] < 3:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think we should avoid adding too many of these checks to the code. Is it too bad to just not flush the output?

And not for this PR, but we should definitely start using the logging module for this kind of output, instead of printing stuff. Also makes it easier to filter what output we want (INFO for printing progress, for example?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think we should avoid adding too many of these checks to the code. Is it too bad to just not flush the output?

It is a progress report. Not flushing the output kind of defeats the purpose.

And not for this PR, but we should definitely start using the logging module for this kind of output, instead of printing stuff. Also makes it easier to filter what output we want (INFO for printing progress, for example?)

Big yes about using logging (I am trying to match the current code base and not have changes and new features are the same time to the extent possible - I am happy to go both if not creating too much trouble). Note: Flushing is implicitly done by logging with each call, I believe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not call sys.stderr.flush() in both cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would work... although it doesn't make the implicit point that Python 2 should eventually go ;-)

print('%s: %i entries' % (filename, n+1), file=sys.stderr, end='\r')
sys.stderr.flush()
else:
print('%s: %i entries' % (filename, n+1), file=sys.stderr, end='\r', flush=True)

# make estimators for each sequence
Elist = make_estimators()
add_seq(Elist, record.sequence,
args.input_is_protein, args.check_sequence)

siglist += build_siglist(args.email, Elist, filename,
name=record.name)
if sys.version_info[0] < 3:
print('%s: %i entries' % (filename, n+1), file=sys.stderr, end='\r')
sys.stderr.flush()
else:
print('%s: %i entries' % (filename, n+1), file=sys.stderr, end='\r', flush=True)

print('calculated {} signatures for {} sequences in {}'.\
format(len(siglist), n + 1, filename))
else:
Expand All @@ -312,7 +328,7 @@ def save_siglist(siglist, output_fp, filename=None):
file=sys.stderr)
name = None
for n, record in enumerate(screed.open(filename)):
if n % 10000 == 0:
if n % args.progress_stepsize == 0:
if n:
print('...', filename, n, file=sys.stderr)
elif args.name_from_first:
Expand Down