ipfs add stalls on directory with a very large amount of files #7596

Closed
ohsqueezy opened this issue Aug 14, 2020 · 8 comments
Labels: kind/bug (A bug in existing code, including security flaws), need/triage (Needs initial labeling and prioritization)

Comments


ohsqueezy commented Aug 14, 2020

Version information:

go-ipfs version: 0.6.0
Repo version: 10
System version: amd64/linux
Golang version: go1.14.4

Description:

When adding a very large directory of over 3 million files, the add operation stalls around 15% of the way through and seems to hang for a while before eventually continuing. The time estimate starts at around 10 minutes, and after stalling, the estimate increases to about 35 minutes. This happens intermittently throughout the add, and in the end it takes between 2.5 and 3 hours to add everything. The files are each around 700 bytes, and an rsync of the same files takes 10-15 minutes.

I'm running add using --nocopy and --offline, and I have file sharding enabled.

$ ipfs add -r --nocopy --offline data/

I've read this related issue, but I'm not sure whether it's the same problem.

This is a bit separate, but for additional context: after the directory is added, retrieving it over the local gateway sometimes works after a long wait and sometimes dies with a 502 Proxy Error. By comparison, Apache will return the files, although it takes a long time to list them. Retrieving a single file from the directory works normally.

The hash I'm referring to is QmXDZ3KzdW9DnuCHvFpttaZSWokAs42ZBayLSCbPeha7B6 and the metadata file for the set can be viewed at https://ipfs.io/ipfs/QmXDZ3KzdW9DnuCHvFpttaZSWokAs42ZBayLSCbPeha7B6/metadata.json, although the rest of the files are gzipped CSV format.


ohsqueezy changed the title from "ipfs add stalls on very large directory of files" to "ipfs add stalls on directory with a very large amount of files" on Aug 16, 2020
Stebalien (Member) commented Aug 17, 2020 via email

ohsqueezy (Author) commented

Thanks, I actually have sharding and the filestore enabled already:

  "Experimental": {
    "FilestoreEnabled": true,
    "UrlstoreEnabled": false,
    "ShardingEnabled": true,
    "GraphsyncEnabled": false,
    "StrategicProviding": false
  }

If I remember correctly, when I started using sharding, it fixed another issue where the add would die at the end, but I don't think it had an effect on the speed.

Not sure if it's relevant, but this is the kernel I'm using, and the filesystem is ext4:

$ uname -r
4.4.0-185-generic
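(Editor's aside: the Experimental flags shown in the config above can also be toggled from the CLI with ipfs config --json. A minimal sketch, using the flag names from that config; changes typically take effect only after restarting the daemon.)

$ ipfs config --json Experimental.FilestoreEnabled true
$ ipfs config --json Experimental.ShardingEnabled true
$ ipfs config show    # verify the resulting config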

Stebalien (Member) commented

Hm.

  • What datastore? (I assume flatfs)
  • What filesystem? (ext4?)

Once the add starts hanging, could you run https://github.com/ipfs/go-ipfs/blob/master/bin/collect-profiles.sh? This will take a snapshot of IPFS stack traces, a CPU usage sample, etc., and help us figure out where it's stuck.
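(Editor's aside: if the script isn't handy, roughly equivalent data can be collected by hand from the daemon's standard Go pprof endpoints. A sketch, assuming the API listens on the default 127.0.0.1:5001:)

$ curl -o ipfs.stacks 'http://127.0.0.1:5001/debug/pprof/goroutine?debug=2'   # goroutine stack dump
$ curl -o ipfs.cpuprof 'http://127.0.0.1:5001/debug/pprof/profile?seconds=30' # 30s CPU profile
$ curl -o ipfs.heap 'http://127.0.0.1:5001/debug/pprof/heap'                  # heap profile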

ohsqueezy (Author) commented

Yes, the datastore is flatfs and the filesystem is ext4.

I took a snapshot at 9% complete, the first time the add stalled; it was stopped for about a minute.

snapshot: ipfs-profile-mega-ipfs-node-2020-08-20T15:36:55-0400.tar.gz
add progress output: ipfs-add-progress.txt

Took another one around 50%

snapshot: ipfs-profile-mega-ipfs-node-2020-08-20T15:56:15-0400.tar.gz
add progress output: ipfs-add-progress-2.txt

The full add took about 1.5 hours, which is faster than previous adds, but the time estimate displayed 20 minutes while it was running. The files had already been added previously, but that was also the case the last time I added this directory. Right before running, I updated Go and ran ipfs repo gc:

$ go version
go version go1.15 linux/amd64

Stebalien (Member) commented

I'd recommend trying Badger. While the filestore will prevent you from writing blocks for files, you'll still have to write blocks for directory chunks. Given the current sharding algorithm, this will lead to hundreds of thousands of blocks. On flatfs, that equates to hundreds of thousands of synchronously written files.
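(Editor's aside: one way to migrate an existing repo from flatfs to Badger is the ipfs-ds-convert tool, https://github.com/ipfs/ipfs-ds-convert. A sketch of that path; back up the repo first, since conversion rewrites the datastore:)

$ ipfs config profile apply badgerds   # switch the datastore spec in the config
$ ipfs-ds-convert convert              # rewrite existing data into the new datastore

A fresh repo can instead be initialized with Badger directly via ipfs init --profile=badgerds.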

Stebalien (Member) commented

The CPU profile agrees. We're spending a lot of time writing files.

ohsqueezy (Author) commented

OK, thank you for looking into it and checking the CPU profile. I'm on board with converting to badgerds. I may not get to it for a little while, so I'll close this for now and report back later with how it affects the add operation.
