Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IPFS startup with Badger Datastore is hitting the systemd-timeout #7273

Open
RubenKelevra opened this issue May 5, 2020 · 4 comments
Open
Labels
exp/intermediate Prior experience is likely helpful help wanted Seeking public contribution on this issue kind/bug A bug in existing code (including security flaws) P2 Medium: Good to have, but can wait until someone steps up status/ready Ready to be worked topic/badger Topic badger topic/daemon + init topic/repo Topic repo

Comments

@RubenKelevra
Copy link
Contributor

Version information:

go-ipfs version: 0.6.0-dev
Repo version: 9
System version: amd64/linux
Golang version: go1.14.2

Commit 9c7cdac

Description:

With the change of the systemd-service file from simple to notify we introduced a one minute 30 seconds timeout to startup the IPFS service.

This seems to not enough to startup ipfs on servers with low IO capacity:

Mai 04 11:32:06 vidar.pacman.store systemd[1]: Stopping InterPlanetary File System (IPFS) daemon...
Mai 04 11:32:06 vidar.pacman.store ipfs[2091]: Received interrupt signal, shutting down...
Mai 04 11:32:06 vidar.pacman.store ipfs[2091]: (Hit ctrl-c again to force-shutdown the daemon.)
Mai 04 11:32:07 vidar.pacman.store systemd[1]: Stopped InterPlanetary File System (IPFS) daemon.
Mai 04 11:32:36 vidar.pacman.store systemd[1]: Starting InterPlanetary File System (IPFS) daemon...
Mai 04 11:32:36 vidar.pacman.store ipfs[3130]: Initializing daemon...
Mai 04 11:32:36 vidar.pacman.store ipfs[3130]: go-ipfs version: 0.6.0-dev
Mai 04 11:32:36 vidar.pacman.store ipfs[3130]: Repo version: 9
Mai 04 11:32:36 vidar.pacman.store ipfs[3130]: System version: amd64/linux
Mai 04 11:32:36 vidar.pacman.store ipfs[3130]: Golang version: go1.14.2
Mai 04 11:34:06 vidar.pacman.store systemd[1]: ipfs@ipfs.service start operation timed out. Terminating.
Mai 04 11:35:36 vidar.pacman.store systemd[1]: ipfs@ipfs.service stop-sigterm timed out. Killing.
Mai 04 11:35:36 vidar.pacman.store systemd[1]: ipfs@ipfs.service: main process exited, code=killed, status=9/KILL
Mai 04 11:35:36 vidar.pacman.store systemd[1]: Failed to start InterPlanetary File System (IPFS) daemon.
Mai 04 11:35:36 vidar.pacman.store systemd[1]: Unit ipfs@ipfs.service entered failed state.
Mai 04 11:35:36 vidar.pacman.store systemd[1]: ipfs@ipfs.service failed.
Mai 04 11:35:36 vidar.pacman.store systemd[1]: ipfs@ipfs.service holdoff time over, scheduling restart.
Mai 04 11:35:36 vidar.pacman.store systemd[1]: Stopped InterPlanetary File System (IPFS) daemon.
Mai 04 11:35:36 vidar.pacman.store systemd[1]: Starting InterPlanetary File System (IPFS) daemon...
Mai 04 11:35:36 vidar.pacman.store ipfs[3196]: Initializing daemon...
Mai 04 11:35:36 vidar.pacman.store ipfs[3196]: go-ipfs version: 0.6.0-dev
Mai 04 11:35:36 vidar.pacman.store ipfs[3196]: Repo version: 9
Mai 04 11:35:36 vidar.pacman.store ipfs[3196]: System version: amd64/linux
Mai 04 11:35:36 vidar.pacman.store ipfs[3196]: Golang version: go1.14.2
Mai 04 11:37:06 vidar.pacman.store systemd[1]: ipfs@ipfs.service start operation timed out. Terminating.
Mai 04 11:38:37 vidar.pacman.store systemd[1]: ipfs@ipfs.service stop-sigterm timed out. Killing.
Mai 04 11:38:37 vidar.pacman.store systemd[1]: ipfs@ipfs.service: main process exited, code=killed, status=9/KILL
Mai 04 11:38:37 vidar.pacman.store systemd[1]: Failed to start InterPlanetary File System (IPFS) daemon.
Mai 04 11:38:37 vidar.pacman.store systemd[1]: Unit ipfs@ipfs.service entered failed state.
Mai 04 11:38:37 vidar.pacman.store systemd[1]: ipfs@ipfs.service failed.
Mai 04 11:38:37 vidar.pacman.store systemd[1]: ipfs@ipfs.service holdoff time over, scheduling restart.
Mai 04 11:38:37 vidar.pacman.store systemd[1]: Stopped InterPlanetary File System (IPFS) daemon.
Mai 04 11:38:37 vidar.pacman.store systemd[1]: Starting InterPlanetary File System (IPFS) daemon...
Mai 04 11:38:37 vidar.pacman.store ipfs[3210]: Initializing daemon...
Mai 04 11:38:37 vidar.pacman.store ipfs[3210]: go-ipfs version: 0.6.0-dev
Mai 04 11:38:37 vidar.pacman.store ipfs[3210]: Repo version: 9
Mai 04 11:38:37 vidar.pacman.store ipfs[3210]: System version: amd64/linux
Mai 04 11:38:37 vidar.pacman.store ipfs[3210]: Golang version: go1.14.2
Mai 04 11:40:07 vidar.pacman.store systemd[1]: ipfs@ipfs.service start operation timed out. Terminating.
Mai 04 11:41:37 vidar.pacman.store systemd[1]: ipfs@ipfs.service stop-sigterm timed out. Killing.
Mai 04 11:41:37 vidar.pacman.store systemd[1]: ipfs@ipfs.service: main process exited, code=killed, status=9/KILL
Mai 04 11:41:37 vidar.pacman.store systemd[1]: Failed to start InterPlanetary File System (IPFS) daemon.
Mai 04 11:41:37 vidar.pacman.store systemd[1]: Unit ipfs@ipfs.service entered failed state.
Mai 04 11:41:37 vidar.pacman.store systemd[1]: ipfs@ipfs.service failed.
Mai 04 11:41:38 vidar.pacman.store systemd[1]: ipfs@ipfs.service holdoff time over, scheduling restart.
Mai 04 11:41:38 vidar.pacman.store systemd[1]: Stopped InterPlanetary File System (IPFS) daemon.
Mai 04 11:41:38 vidar.pacman.store systemd[1]: Starting InterPlanetary File System (IPFS) daemon...
Mai 04 11:41:38 vidar.pacman.store ipfs[3222]: Initializing daemon...
Mai 04 11:41:38 vidar.pacman.store ipfs[3222]: go-ipfs version: 0.6.0-dev
Mai 04 11:41:38 vidar.pacman.store ipfs[3222]: Repo version: 9
Mai 04 11:41:38 vidar.pacman.store ipfs[3222]: System version: amd64/linux
Mai 04 11:41:38 vidar.pacman.store ipfs[3222]: Golang version: go1.14.2
Mai 04 11:43:08 vidar.pacman.store systemd[1]: ipfs@ipfs.service start operation timed out. Terminating.

This VM is has one virtual core of a Intel Xeon CPU E5-2609 v3 @ 1.90GHz, and 2 GB of memory and no other background tasks. OS is CentOS7.

The repo stats:

[ipfs@vidar ~]$ ipfs repo stat --human
NumObjects: 677400
RepoSize:   154 GB
StorageMax: 1.0 TB
RepoPath:   /home/ipfs/.ipfs
Version:    fs-repo@9

I tried to measure the latency, and it isn't really looking bad:

[ipfs@vidar badgerds]$ time dd if=000000.vlog of=/dev/zero bs=4K count=1
1+0 records in
1+0 records out
4096 bytes (4,1 kB) copied, 0,0225068 s, 182 kB/s

real    0m0.085s
user    0m0.003s
sys     0m0.000s
[ipfs@vidar badgerds]$ time dd if=000001.vlog of=/dev/zero bs=4K count=1
1+0 records in
1+0 records out
4096 bytes (4,1 kB) copied, 0,0204767 s, 200 kB/s

real    0m0.023s
user    0m0.000s
sys     0m0.002s
[ipfs@vidar badgerds]$ time dd if=000002.vlog of=/dev/zero bs=4K count=1
1+0 records in
1+0 records out
4096 bytes (4,1 kB) copied, 0,0491106 s, 83,4 kB/s

real    0m0.051s
user    0m0.001s
sys     0m0.001s
[ipfs@vidar badgerds]$ time dd if=000003.vlog of=/dev/zero bs=4K count=1
1+0 records in
1+0 records out
4096 bytes (4,1 kB) copied, 0,00929099 s, 441 kB/s

real    0m0.012s
user    0m0.001s
sys     0m0.001s

This ticket was created from comments following this discussion:

https://github.com/ipfs/go-ipfs/pull/7265/files#discussion_r418645508

@RubenKelevra RubenKelevra added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization labels May 5, 2020
@Stebalien
Copy link
Member

  • Latency: 10-80ms
  • IOPS: 200.

@Stebalien
Copy link
Member

  • Could you run ipfs repo stat?
  • Can you reproduce this issue with a smaller repo?

I have a machine with 200IOPS for 4kbyte chunks and a 10ms read latency. It starts in 3 seconds and stops in < 2 seconds.

Repo stat:

NumObjects: 586055
RepoSize:   28211263791
StorageMax: 10000000000
RepoPath:   /var/db/ipfs-go/.ipfs
Version:    fs-repo@9

(i.e., ~28GiB of data).

@RubenKelevra
Copy link
Contributor Author

  • Could you run ipfs repo stat?

Well, the repo stats were already included.

  • Can you reproduce this issue with a smaller repo?

I created a new badgerds and just fetched QmdB8kVBeWvLKyZrvxAAzrVfkLZC3zqcu6o7twLAqUcC67 (a bunch of images).

The machine was idle for hours after that.

I initiated a shutdown, and IPFS started to do disk reading, while nothing was written for minutes (my timeout is raised to 15 minutes).

I created a bug report for this, as well - not sure if this is related or not: #7283

And yes, the startup is extremely slow after this hard shutdown.

I killed the process with SIGABRT to get a stack trace of this as well:

stacktrace_startup_2020_05_06.txt

The repo stat of ipfs with the one pin:

time ipfs repo stat 
NumObjects: 148051
RepoSize:   38152146926
StorageMax: 1000000000000
RepoPath:   /home/ipfs/.ipfs
Version:    fs-repo@9

real    0m0.890s
user    0m0.022s
sys     0m0.041s

And the startup log - first timed out, and then started within one second:

ipfs_log_2020_05_06.txt

@Stebalien Stebalien added exp/intermediate Prior experience is likely helpful help wanted Seeking public contribution on this issue P2 Medium: Good to have, but can wait until someone steps up status/ready Ready to be worked topic/badger Topic badger topic/daemon + init topic/repo Topic repo labels May 22, 2020
@Stebalien
Copy link
Member

The "fix" for this issue is the same as the fix for #7269. When opening the datastore, we should repeatedly extend the startup timeout.

@Stebalien Stebalien removed the need/triage Needs initial labeling and prioritization label May 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exp/intermediate Prior experience is likely helpful help wanted Seeking public contribution on this issue kind/bug A bug in existing code (including security flaws) P2 Medium: Good to have, but can wait until someone steps up status/ready Ready to be worked topic/badger Topic badger topic/daemon + init topic/repo Topic repo
Projects
None yet
Development

No branches or pull requests

2 participants