ETA missing when building KMCP index #36

Closed
ericvdtoorn opened this issue Aug 7, 2023 · 4 comments

Comments

@ericvdtoorn

When building the KMCP index for HumGut (by following the instructions here), the ETA is stuck at 0s, even after several blocks have been completed.

kmcp index -j 32 -I humgut-k21-n10 -O humgut.kmcp -n 1 -f 0.3
13:31:35.908 [INFO] kmcp v0.9.3
13:31:35.909 [INFO]   https://github.com/shenwei356/kmcp
13:31:35.909 [INFO]
13:31:35.909 [INFO] loading .unik file infos from file: humgut-k21-n10/_info.txt
13:31:36.518 [INFO]   306910 cached file infos loaded
13:31:36.585 [INFO]
13:31:36.585 [INFO] -------------------- [main parameters] --------------------
13:31:36.585 [INFO]   number of hashes: 1
13:31:36.585 [INFO]   false positive rate: 0.300000
13:31:36.585 [INFO]   k-mer size(s): 21
13:31:36.585 [INFO]   split seqequence size: 0, overlap: 20
13:31:36.585 [INFO]   block-sizeX-kmers-t: 10.00 M
13:31:36.585 [INFO]   block-sizeX        : 256
13:31:36.585 [INFO]   block-size8-kmers-t: 20.00 M
13:31:36.585 [INFO]   block-size1-kmers-t: 200.00 M
13:31:36.585 [INFO] -------------------- [main parameters] --------------------
13:31:36.585 [INFO]
13:31:36.586 [INFO] building index ...
13:31:36.753 [INFO]
13:31:36.753 [INFO]   block size: 9592
13:31:36.753 [INFO]   number of index files: 32 (may be more)
13:31:36.753 [INFO]
13:31:36.753 [block #001] 1199 / 1199  100 %
13:31:36.753 [block #002] 1199 / 1199  100 %
13:31:36.754 [block #003] 1199 / 1199  100 %
13:32:30.922 [block #004] 1199 / 1199  100 %
13:32:34.941 [block #005] 1199 / 1199  100 %
13:33:33.902 [block #006] 1199 / 1199  100 %
13:33:40.757 [block #007] 1199 / 1199  100 %
13:34:45.743 [block #008] 1199 / 1199  100 %
13:34:54.006 [block #009] 1199 / 1199  100 %
13:35:59.125 [block #010] 1199 / 1199  100 %
13:36:08.695 [block #011] 1060 / 1199 [==========================>---]  88 %
13:37:15.240 [block #012]  847 / 1199 [====================>---------]  71 %
[saved index files]     10 / 32 [==========>-----------------------] ETA: 0s
@ericvdtoorn
Author

Of course, right after I post this, the ETA is suddenly defined. Guess that it only shows up after a sufficient number of files have been processed? (12 in my case)

@shenwei356
Owner

Oh, that's strange. It should be updated right after one index file has been saved.

BTW, k-mer file processing and index writing are asynchronous. That means that while blocks 11 and 12 are being processed, the index file of block 10 might not have finished writing. One possible reason is that the disk (NAS?) is too slow for writing big index files.

You can add --dry-run to check the size of each index file before actually building the index.
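
To illustrate the asynchronous behavior described above, here is a minimal Python sketch. It is not KMCP's code, which is written in Go; the names `eta_seconds` and `run_pipeline`, the timings, and the linear ETA formula are all assumptions made for illustration. It only shows why an ETA based on saved files cannot be computed until the first write finishes, even when several blocks have already been processed.

```python
import queue
import threading
import time


def eta_seconds(saved, total, elapsed):
    """Linear ETA from completed saves; None until the first save finishes.

    Hypothetical formula for illustration, not KMCP's actual progress bar.
    """
    if saved == 0:
        return None
    return elapsed / saved * (total - saved)


def run_pipeline(n_blocks=4, process_time=0.01, write_time=0.05):
    """Process blocks quickly, write them slowly in a background thread.

    Returns a list of (processed, saved, eta) snapshots taken after each
    block is processed, plus the final number of saved blocks.
    """
    pending = queue.Queue()
    lock = threading.Lock()
    state = {"saved": 0}
    start = time.monotonic()
    snapshots = []

    def writer():
        while True:
            block = pending.get()
            if block is None:  # sentinel: no more blocks to write
                break
            time.sleep(write_time)  # stands in for a slow disk write
            with lock:
                state["saved"] += 1

    thread = threading.Thread(target=writer)
    thread.start()

    for i in range(1, n_blocks + 1):
        time.sleep(process_time)  # stands in for k-mer file processing
        pending.put(i)
        with lock:
            saved = state["saved"]
        elapsed = time.monotonic() - start
        snapshots.append((i, saved, eta_seconds(saved, n_blocks, elapsed)))

    pending.put(None)
    thread.join()
    return snapshots, state["saved"]


if __name__ == "__main__":
    snapshots, saved = run_pipeline()
    for processed, done, eta in snapshots:
        print(f"processed={processed} saved={done} eta={eta}")
    print(f"all saved: {saved}")
```

With writes much slower than processing, the early snapshots typically show processed blocks running ahead of saved ones, so the ETA stays undefined for a while. The exact numbers depend on timing and will vary between runs.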

@ericvdtoorn
Author

Could be that writing the blocks just took that long (for the first block to finish writing and produce an ETA)?

@shenwei356
Owner

Yes, that's what I mean. The speed depends on the size of an index file and the disk speed.
