ipfs add stalls on directory with a very large amount of files #7596

Closed
ohsqueezy opened this issue Aug 14, 2020 · 8 comments
Labels: kind/bug (A bug in existing code, including security flaws), need/triage (Needs initial labeling and prioritization)

Comments


ohsqueezy commented Aug 14, 2020

Version information:

go-ipfs version: 0.6.0
Repo version: 10
System version: amd64/linux
Golang version: go1.14.4

Description:

When adding a very large directory of over 3 million files, the add operation stalls around 15% of the way through and seems to hang for a while before eventually continuing. The time estimate starts at around 10 minutes, and after stalling, the estimate increases to about 35 minutes. This happens intermittently throughout the add, and in the end it takes between 2.5 and 3 hours to add everything. The files are each around 700 bytes, and an rsync of the same files takes 10-15 minutes.

I'm running add using --nocopy and --offline, and I have file sharding enabled.

$ ipfs add -r --nocopy --offline data/

I've read this related issue, but I'm not sure whether it's the same problem.

This is a bit separate, but for additional context: after the directory is added, retrieving it over the local gateway sometimes works after a long wait and sometimes dies with a 502 Proxy Error. By comparison, Apache will return the files, although it takes a long time to list them. Retrieving a single file from the directory works normally.

The hash I'm referring to is QmXDZ3KzdW9DnuCHvFpttaZSWokAs42ZBayLSCbPeha7B6 and the metadata file for the set can be viewed at https://ipfs.io/ipfs/QmXDZ3KzdW9DnuCHvFpttaZSWokAs42ZBayLSCbPeha7B6/metadata.json, although the rest of the files are gzipped CSV format.


ohsqueezy changed the title from "ipfs add stalls on very large directory of files" to "ipfs add stalls on directory with a very large amount of files" on Aug 16, 2020
Stebalien (Member) commented Aug 17, 2020 via email

ohsqueezy (Author) commented

Thanks, I actually have sharding and the filestore enabled already:

  "Experimental": {
    "FilestoreEnabled": true,
    "UrlstoreEnabled": false,
    "ShardingEnabled": true,
    "GraphsyncEnabled": false,
    "StrategicProviding": false
  }

If I remember correctly, when I started using sharding, it fixed another issue where the add would die at the end, but I don't think it had an effect on the speed.

Not sure if it's relevant, but this is the kernel I'm using, and the filesystem is ext4:

$ uname -r
4.4.0-185-generic
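(Editor's aside: the Experimental flags shown in the config above can also be toggled from the CLI with ipfs config --json. A minimal sketch, using the flag names from that config; changes typically take effect only after restarting the daemon.)

$ ipfs config --json Experimental.FilestoreEnabled true
$ ipfs config --json Experimental.ShardingEnabled true
$ ipfs config show    # verify the resulting config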

Stebalien (Member) commented

Hm.

  • What datastore? (I assume flatfs)
  • What filesystem? (ext4?)

Once the add starts hanging, could you run https://github.com/ipfs/go-ipfs/blob/master/bin/collect-profiles.sh? This will take a snapshot of IPFS stack traces, a CPU usage sample, etc., and help us figure out where it's stuck.
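(Editor's aside: if the script isn't handy, roughly equivalent data can be collected by hand from the daemon's standard Go pprof endpoints. A sketch, assuming the API listens on the default 127.0.0.1:5001:)

$ curl -o ipfs.stacks 'http://127.0.0.1:5001/debug/pprof/goroutine?debug=2'   # goroutine stack dump
$ curl -o ipfs.cpuprof 'http://127.0.0.1:5001/debug/pprof/profile?seconds=30' # 30s CPU profile
$ curl -o ipfs.heap 'http://127.0.0.1:5001/debug/pprof/heap'                  # heap profile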

ohsqueezy (Author) commented

Yes, the datastore is flatfs and the filesystem is ext4.

I took a snapshot at 9% complete, the first time the add stalled; it was stopped for about a minute.

snapshot: ipfs-profile-mega-ipfs-node-2020-08-20T15:36:55-0400.tar.gz
add progress output: ipfs-add-progress.txt

Took another one around 50%

snapshot: ipfs-profile-mega-ipfs-node-2020-08-20T15:56:15-0400.tar.gz
add progress output: ipfs-add-progress-2.txt

The full add took about 1.5 hours, which is faster than previous adds, but the time estimate displayed 20 minutes while it was running. The files had already been added previously, but that was also the case the last time I added this directory. Right before running, I updated Go and ran ipfs repo gc:

$ go version
go version go1.15 linux/amd64

Stebalien (Member) commented

I'd recommend trying Badger. While the filestore will prevent you from writing blocks for files, you'll still have to write blocks for directory chunks. Given the current sharding algorithm, this will lead to hundreds of thousands of blocks. On flatfs, that equates to hundreds of thousands of synchronously written files.
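(Editor's aside: one way to migrate an existing repo from flatfs to Badger is the ipfs-ds-convert tool, https://github.com/ipfs/ipfs-ds-convert. A sketch of that path; back up the repo first, since conversion rewrites the datastore:)

$ ipfs config profile apply badgerds   # switch the datastore spec in the config
$ ipfs-ds-convert convert              # rewrite existing data into the new datastore

A fresh repo can instead be initialized with Badger directly via ipfs init --profile=badgerds.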

Stebalien (Member) commented

The CPU profile agrees. We're spending a lot of time writing files.

ohsqueezy (Author) commented

OK, thank you for looking into it and checking the CPU profile. I'm on board with converting to badgerds. I may not get to it for a little while, so I'll close this for now and report back later with how it affects the add operation.
