Adding file to repo with many files is slow #3545

Open

vertis opened this issue Dec 29, 2016 · 12 comments
Labels
topic/commands:add · topic/perf

Comments


vertis commented Dec 29, 2016

Version information:

go-ipfs version: 0.4.4-
Repo version: 4
System version: amd64/linux
Golang version: go1.7

Type: Bug

Priority: P1

Description:

Adding a file to an existing repo after many files have been imported is very slow.

  • This is local, without the daemon running (though that doesn't seem to matter).
# existing repo
root@localhost ~# ipfs repo stat
NumObjects 	 189555
RepoSize 	 43563229872
RepoPath 	 /data/ipfs
Version 	 fs-repo@4
root@localhost ~# dd if=/dev/random of=test.bin count=1024
0+1024 records in
113+1 records out
58024 bytes (58 kB) copied, 0.0343963 s, 1.7 MB/s
root@localhost ~# time -p ipfs add test.bin
added QmZikmEEAK923N6n1gpADRjWn7gSn3AyUGdSJXXqJmhk8W test.bin
real 11.49
user 0.16
sys 0.16

# new repo
root@localhost ~# ipfs repo stat
NumObjects 	 83
RepoSize 	 14358417
RepoPath 	 /data/ipfs2
Version 	 fs-repo@4
root@localhost ~# time -p ipfs add test.bin
added QmZikmEEAK923N6n1gpADRjWn7gSn3AyUGdSJXXqJmhk8W test.bin
real 0.26
user 0.01
sys 0.01

I had originally thought it might just be the time spent writing the file out, but that doesn't appear to be the case: even a tiny file incurs the ~10 second penalty.

Is this expected behaviour? Because it effectively rules out using IPFS for anything large like rubygems (which is what I was trying to import).

This isn't a particularly beefy machine, but it doesn't seem to be CPU-bound or IO-bound (I'm no expert on the latter, but zpool iostat isn't showing high values).
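
A quick way to separate a fixed per-add overhead from per-byte cost is a minimal check along these lines (the file names and sizes here are illustrative, not from the run above):

dd if=/dev/urandom of=tiny.bin bs=1 count=1
dd if=/dev/urandom of=big.bin bs=1M count=50
time -p ipfs add tiny.bin
time -p ipfs add big.bin

If both adds take roughly the same wall time against the large repo, the penalty is per-add rather than per-byte.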

Author

vertis commented Dec 29, 2016

Running this locally on my MacBook (to confirm it wasn't something weird with my ZFS drives): it handles about 50k objects and then goes from roaring along at 0.05-0.06 seconds per file to 1-3 seconds per file.

NumObjects 	 53533
RepoSize 	 3234337159
RepoPath 	 /Volumes/file/temp_ipfs
Version 	 fs-repo@4

Edit: corrected the number of objects downwards
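
A sketch of the kind of loop that surfaces this per-file degradation (the file size, iteration count, and output format are illustrative):

for i in $(seq 1 100000); do
  dd if=/dev/urandom of=f.bin bs=64k count=1 2>/dev/null
  echo -n "$i "
  { time -p ipfs add -q f.bin >/dev/null; } 2>&1 | awk '/real/ {print $2}'
done

Each iteration adds a fresh random file, so the repo grows steadily; plotting the printed times against the iteration number should show where the 0.05 s adds turn into multi-second adds.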


ghost commented Dec 29, 2016

I think I've been seeing similar behaviour while adding the cdn.media.ccc.de mirror. I haven't measured it, but the repo was initially nearly empty, then the 3.8TB add got slower and slower over time, and now additional adds are just as slow.


ghost commented Dec 30, 2016

I just started another add, but this time with only-hash set (ipfs add -n -r cdn.media.ccc.de), to narrow this down to the datastore. My hope is that the add's duration stays constant.
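
In other words, something like the following comparison (the directory is the one from the comment above; -n/--only-hash is a standard ipfs add flag):

time -p ipfs add -n -r cdn.media.ccc.de   # only-hash: chunk and hash, no datastore writes
time -p ipfs add -r cdn.media.ccc.de      # full add: writes every block to the datastore

If the only-hash pass stays fast while the full add keeps degrading, the datastore is the bottleneck.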

Author

vertis commented Dec 30, 2016 via email

@ghost added the topic/commands:add and topic/perf labels on Dec 30, 2016

jdgcs commented Dec 30, 2016

Similar issue: slow handling of a large number of files.

./ipfs add became very slow when handling about 45K files (~300 GB); it took 3+ seconds after the progress bar finished.

% ./ipfs repo stat
NumObjects 9996740
RepoSize 381559053765
RepoPath /home/liu/.ipfs
Version fs-repo@4

Author

vertis commented Dec 30, 2016

@jdgcs damn, I'd say it's the same thing... I even looked to see if there was already an issue and couldn't find anything.

Contributor

mateon1 commented Dec 30, 2016

I see that you are both using the 0.4.4 release. Could you try the 0.4.5 prerelease?
Performance has improved in 0.4.5, and I do not get this slowdown on a Windows machine with an NTFS filesystem on an HDD, with a 41 GB repo containing 280318 objects.

$ time ipfs add random.bin
 1024 B / 1024 B [=========================================================================================] 100.00% 0s
added QmUXjD2mkCwCdqoN4cvcToBunKMvv7zG98GWUwer2xzvSH random.bin

real    0m0.350s
user    0m0.000s
sys     0m0.030s

repo stat:

$ ipfs repo stat
NumObjects       280318
RepoSize         41058762331
RepoPath         D:\ipfs
Version          fs-repo@4

Author

vertis commented Dec 30, 2016

@mateon1 Thanks for the reminder. I did actually try 0.4.5-dev yesterday after I reported this. It did still have the problem:

# ipfs version
ipfs version 0.4.5-dev
# dd if=/dev/random of=random.bin count=1024
1024+0 records in
1024+0 records out
524288 bytes transferred in 0.035843 secs (14627292 bytes/sec)

# du -sh random.bin 
512K	random.bin

# time -p ipfs add random.bin
added QmWhjMhVXWmEcUCWisJTEjgcQAXwxfmeUBAYxvNGKi1tUb random.bin
real         3.18
user         0.15
sys          0.12

# ipfs repo stat
NumObjects 	 59950
RepoSize 	 3312096091
RepoPath 	 /Volumes/WorldBuilder/temp_ipfs
Version 	 fs-repo@4

This is now running locally on my MacBook Pro, on a USB3 external HD (HFS+).

Interestingly, the delay comes after the progress bar completes; even the debug messages just seem to sit there.


ghost commented Jan 2, 2017

> I just started another add, but this time with only-hash set (ipfs add -n -r cdn.media.ccc.de), to narrow this down to the datastore. My hope is that the add's duration stays constant.

It took about 9h15m in the end, so it definitely is storage-related.

Contributor

rht commented Feb 1, 2017

I benchmarked this on 0.4.5-rc1.

code to reproduce: https://github.com/rht/sfpi-benchmark/tree/master/add_growing_repo (./benchmark.sh)

The slowdown is more than 2x by the time the repo size hits ~5 GB. I don't have enough storage space to quantify the ~TB scale, though.
The spikes are possibly due to IO flushes (NoSync is used).

@FortisFortuna

I experience this as well.

@Stebalien
Member

@FortisFortuna try using the experimental badger datastore. You can create a node configured to use it by initializing it with --profile=badgerds.

I'm guessing this is due to the default datastore creating tons of tiny files (which filesystems generally don't like). Badgerds doesn't create one file per block so it should perform better (also, it'll use a hell of a lot less space).
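
A minimal sketch of that setup, assuming a go-ipfs build new enough to support init profiles (the repo path here is illustrative):

export IPFS_PATH=/data/ipfs-badger   # fresh repo, separate from the existing flatfs one
ipfs init --profile=badgerds
time -p ipfs add test.bin

Badger keeps many blocks in a small number of large files instead of one file per block, which sidesteps the per-file filesystem overhead described above.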
