Generate index is slow #1654

Closed
dirkmc opened this issue Aug 30, 2023 · 4 comments

@dirkmc
Contributor

dirkmc commented Aug 30, 2023

On our production miner (filcollins) it takes almost 5 minutes to generate the index for a 32 GiB piece with an average block size of ~400 KiB:

$ boostd lid gen-index baga6ea4seaqes3t2b72dfwdmisyfvogudr33ej7uhv2u2rp7642siiexsy7eqki
Generating index for piece baga6ea4seaqes3t2b72dfwdmisyfvogudr33ej7uhv2u2rp7642siiexsy7eqki
Generated index in 4m42.602963148s

select count(*) from idx.PieceBlockOffsetSize where piececid = 0x0181e203922020496e7a0ff432d86c44b05ab8d41c77b227f43d754d45fff735242097963e4829;

 count
-------
 82066

(1 rows)

32 GiB / 82,066 blocks ≈ 400 KiB per block
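The same estimate can be made with the exact piece length that booster-http reports further down in this thread (34091302912 bytes) and the 82,066 rows returned by the count query. A minimal Go sketch; `averageBlockSize` is a hypothetical helper, and it assumes the whole piece is CAR data (any zero padding at the end would make the true average somewhat smaller):

```go
package main

import "fmt"

// averageBlockSize returns the mean number of bytes per indexed block.
// Hypothetical helper for illustration; it treats every byte of the
// piece as block data, so padding inflates the result.
func averageBlockSize(pieceBytes, blockCount int64) int64 {
	if blockCount <= 0 {
		return 0
	}
	return pieceBytes / blockCount
}

func main() {
	// Length header from the booster-http download of this piece.
	const pieceBytes int64 = 34091302912
	// Row count from: select count(*) from idx.PieceBlockOffsetSize ...
	const blocks int64 = 82066

	avg := averageBlockSize(pieceBytes, blocks)
	fmt.Printf("average block size: %d bytes (~%.0f KiB)\n", avg, float64(avg)/1024)
}
```

That works out to about 415,000 bytes (roughly 406 KiB) per block, consistent with the ~400 KiB figure.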
@dirkmc
Contributor Author

dirkmc commented Aug 30, 2023

The second time it took 53s:

2023-08-30T04:39:48.932-0400	DEBUG	piecedirectory	piecedirectory/piecedirectory.go:281
	build index: get piece deals	{"pieceCid": "baga6ea4seaqes3t2b72dfwdmisyfvogudr33ej7uhv2u2rp7642siiexsy7eqki"}
2023-08-30T04:39:48.936-0400	DEBUG	piecedirectory	piecedirectory/piecedirectory.go:210
	add index: wait for open throttle position	{"pieceCid": "baga6ea4seaqes3t2b72dfwdmisyfvogudr33ej7uhv2u2rp7642siiexsy7eqki", "queued": 0, "queue-limit": 4}
2023-08-30T04:39:48.936-0400	DEBUG	piecedirectory	piecedirectory/piecedirectory.go:233
	add index: get index	{"pieceCid": "baga6ea4seaqes3t2b72dfwdmisyfvogudr33ej7uhv2u2rp7642siiexsy7eqki"}
2023-08-30T04:39:48.956-0400	DEBUG	piecedirectory	piecedirectory/piecedirectory.go:240
	add index: read index	{"pieceCid": "baga6ea4seaqes3t2b72dfwdmisyfvogudr33ej7uhv2u2rp7642siiexsy7eqki"}

2023-08-30T04:40:32.607-0400	DEBUG	piecedirectory	piecedirectory/piecedirectory.go:266
	add index: store index in local index directory	{"pieceCid": "baga6ea4seaqes3t2b72dfwdmisyfvogudr33ej7uhv2u2rp7642siiexsy7eqki"}

2023-08-30T04:40:41.394-0400	DEBUG	piecedirectory	piecedirectory/piecedirectory.go:227
	add index: completed	{"pieceCid": "baga6ea4seaqes3t2b72dfwdmisyfvogudr33ej7uhv2u2rp7642siiexsy7eqki"}

The logs above show that

  • it takes ~44s to stream the data through the CAR block reader (building the index in memory)
  • it takes ~9s to store the index in LID

@willscott
Collaborator

  • is the piece data fully local, or is it coming from a remote reader?
  • is it I/O-bandwidth limited?

@dirkmc
Contributor Author

dirkmc commented Aug 30, 2023

I think the problem is probably I/O. Retrieving the file from booster-http on localhost is also slow, only about 296 MB/s:

wget "http://localhost:7777/piece/baga6ea4seaqes3t2b72dfwdmisyfvogudr33ej7uhv2u2rp7642siiexsy7eqki"
--2023-08-30 04:49:29--  http://localhost:7777/piece/baga6ea4seaqes3t2b72dfwdmisyfvogudr33ej7uhv2u2rp7642siiexsy7eqki
Length: 34091302912 (32G) [application/piece]
Saving to: ‘baga6ea4seaqes3t2b72dfwdmisyfvogudr33ej7uhv2u2rp7642siiexsy7eqki’
...
2023-08-30 04:51:19 (296 MB/s) - ‘baga6ea4seaqes3t2b72dfwdmisyfvogudr33ej7uhv2u2rp7642siiexsy7eqki’ saved [34091302912/34091302912]

Total wall clock time: 1m 50s
Downloaded: 1 files, 32G in 1m 50s (296 MB/s)

@dirkmc
Contributor Author

dirkmc commented Aug 30, 2023

Running the same operation on my local machine, the block reader processes data at about 2 GiB/s, so it looks like the slowness is due to slow I/O on filcollins.

dirkmc closed this as completed Aug 30, 2023