report sketch progress from sketch fromfile #2197

Open
bluegenes opened this issue Aug 11, 2022 · 0 comments

When building large databases (e.g. a new alphabet or ksize for all of GTDB), it would help to have overall progress output from sketch fromfile, since we know the whole list of signatures we'll build ahead of time.

Current output is informative for each file, but not for building the database as a whole.

... reading sequences from genbank/proteomes/GCF_013073765.1_protein.faa.gz
calculated 1 signatures for 4386 sequences in genbank/proteomes/GCF_013073765.1_protein.faa.gz
... reading sequences from genbank/proteomes/GCF_005041175.1_protein.faa.gz
calculated 1 signatures for 4905 sequences in genbank/proteomes/GCF_005041175.1_protein.faa.gz
... reading sequences from genbank/proteomes/GCF_003721835.1_protein.faa.gz
calculated 1 signatures for 4787 sequences in genbank/proteomes/GCF_003721835.1_protein.faa.gz
... reading sequences from genbank/proteomes/GCF_003304895.1_protein.faa.gz
calculated 1 signatures for 4815 sequences in genbank/proteomes/GCF_003304895.1_protein.faa.gz
... reading sequences from genbank/proteomes/GCF_000627885.1_protein.faa.gz
calculated 1 signatures for 4775 sequences in genbank/proteomes/GCF_000627885.1_protein.faa.gz
... reading sequences from genbank/proteomes/GCF_000326325.1_protein.faa.gz
calculated 1 signatures for 4614 sequences in genbank/proteomes/GCF_000326325.1_protein.faa.gz
... reading sequences from genbank/proteomes/GCF_014878905.1_protein.faa.gz
calculated 1 signatures for 4463 sequences in genbank/proteomes/GCF_014878905.1_protein.faa.gz
... reading sequences from genbank/proteomes/GCF_000622655.2_protein.faa.gz
calculated 1 signatures for 5261 sequences in genbank/proteomes/GCF_000622655.2_protein.faa.gz
... reading sequences from genbank/proteomes/GCF_016433285.1_protein.faa.gz

I think it'd be useful to hide this per-file output, or only show it if the user asks for verbosity, and instead report the build percent.

Just to have some code to start from, the merge percent code does a similar thing:

if m % 100 == 0:
    merge_percent = float(n)/len(found_idents) * 100
    notify(f"...merging sigs for {merge_name} ({merge_percent:.1f}% of sigs merged)", end="\r")
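
To make the request concrete, here is a minimal sketch of what an overall progress report could look like, adapted from the merge snippet above. It is only an illustration: `report_build_progress`, its `every` and `verbose` parameters, and the message wording are hypothetical names, not existing sourmash code, and a real change would presumably go through sourmash's own `notify` rather than writing to a stream directly.

```python
import sys


def report_build_progress(n_done, n_total, *, every=100, verbose=False, out=sys.stderr):
    """Report overall build progress (hypothetical helper, not sourmash API).

    Writes a status line every `every` completed items and once more when
    the last item finishes. Returns the message written, or None if nothing
    was written for this call.
    """
    if n_total <= 0:
        return None
    # Stay quiet between checkpoints, but always report the final item.
    if n_done % every != 0 and n_done != n_total:
        return None
    percent = float(n_done) / n_total * 100
    msg = f"... sketched {n_done} of {n_total} signatures ({percent:.1f}% done)"
    # With per-file output hidden, "\r" redraws the status line in place;
    # in verbose mode a newline keeps it from being overwritten.
    end = "\n" if verbose else "\r"
    out.write(msg + end)
    out.flush()
    return msg


if __name__ == "__main__":
    # Stand-in for the list of signatures that fromfile knows up front.
    to_build = [f"genome_{i}" for i in range(250)]
    for n, name in enumerate(to_build, start=1):
        # ... build the sketch for `name` here ...
        report_build_progress(n, len(to_build))
```

Counting from 1 with `enumerate(..., start=1)` keeps the reported percentage at 100% on the final item, and the explicit end-of-list check means the last update is shown even when the total is not a multiple of `every`.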