Skip to content

Dgraph do not respond to any requests when it logs " LOG Compact" when I perform online mutation or liveload #4960

@JimWen

Description

@JimWen

What version of Dgraph are you using?

Dgraph version : v1.2.1
Dgraph SHA-256 : 3f18ff84570b2944f4d75f6f508d55d902715c7ca2310799cc2991064eb046f8
Commit SHA-1 : ddcda92
Commit timestamp : 2020-02-06 15:31:05 -0800
Branch : HEAD
Go version : go1.13.5

Have you tried reproducing the issue with the latest release?

With v1.2.1

What is the hardware spec (RAM, OS)?

128G mem & 1.8T SSD

Linux version 3.10.0-1062.9.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Fri Dec 6 15:49:49 UTC 2019

Steps to reproduce the issue (command/config used to run Dgraph).

3 nodes cluster

zero rebalance_interval is set here to disable auto rebalance
nohup $dgpath zero --my=$local_ip:$port -o $offset --idx $index --replicas $replica --config $cfpath --cwd zero$index --rebalance_interval 12000h > ./log/zero$index.log 2>&1 &

alpha
nohup $dgpath alpha --snapshot_after=1000 --abort_older_than 15m --my=$local_ip:$port -o $offset --zero=$zero_1:5080 --config $cfpath --cwd alpha$index > ./log/alpha$index.log 2>&1 &

mutation through api or liveload with about 10 billion nquards
comand line is like :

./bin/dgraph live -b 1000 -c 20 -f ./data/2020-03-01 --format rdf -a "172.21.46.35:9080,172.21.46.39:9080,172.21.46.40:9080"

Expected behaviour and actual result.

The server works fine at first, and then it is in a suspended state for a long time , then it resumes, and then suspends like this...
The log is as followings:
I0318 12:14:37.877077 272392 log.go:34] Rolling up Sent 4905402 keys
I0318 12:14:37.877101 272392 draft.go:1102] Rolled up 4905396 keys. Done
I0318 12:14:37.877110 272392 draft.go:445] List rollup at Ts 10799123: OK.
I0318 12:14:37.878363 272392 predicate_move.go:195] Was instructed to delete tablet: dgraph.password
I0318 12:14:37.878618 272392 index.go:1083] Dropping predicate: [dgraph.password]
I0318 12:14:37.878632 272392 log.go:34] Writes flushed. Stopping compactions now...
E0318 12:15:07.114528 272392 log.go:32] unable to process discardstats with error: failed to push discard stats to write channel: Writes are blocked, possibly due to DropAll or Close
I0318 12:15:07.293204 272392 log.go:34] LOG Compact 1->2, del 9 tables, add 9 tables, took 41.192207806s
I0318 12:15:07.293262 272392 log.go:34] Compaction for level: 1 DONE
I0318 12:15:07.293311 272392 log.go:34] Got compaction priority: {level:0 score:1 dropPrefix:[]}
I0318 12:15:07.293339 272392 log.go:34] Running for level: 0
I0318 12:15:24.745589 272392 log.go:34] LOG Compact 0->1, del 9 tables, add 4 tables, took 17.45222122s
I0318 12:15:24.745642 272392 log.go:34] Compaction for level: 0 DONE
I0318 12:15:24.802984 272392 log.go:34] Got compaction priority: {level:0 score:1.74 dropPrefix:[0 0 15 100 103 114 97 112 104 46 112 97 115 115 119 111 114 100]}
I0318 12:15:24.803061 272392 log.go:34] Running for level: 0
I0318 12:15:39.769220 272392 draft.go:1353] Skipping snapshot at index: 10672779. Insufficient discard entries: 1. MinPendingStartTs: 10825054
I0318 12:15:41.545892 272392 log.go:34] LOG Compact 0->1, del 6 tables, add 4 tables, took 16.742809109s
I0318 12:15:41.545935 272392 log.go:34] Compaction for level: 0 DONE
I0318 12:15:43.236355 272392 log.go:34] LOG Compact 1->1, del 1 tables, add 1 tables, took 1.690368584s
I0318 12:15:46.814592 272392 log.go:34] LOG Compact 2->2, del 1 tables, add 1 tables, took 3.578099591s
I0318 12:15:51.135498 272392 log.go:34] LOG Compact 3->3, del 1 tables, add 1 tables, took 4.320559547s
I0318 12:15:51.135522 272392 log.go:34] DropPrefix done
I0318 12:15:51.135540 272392 log.go:34] Resuming writes

It seems to be an old problem, but it is still there, and it totally affetct the online business and performace.

Metadata

Metadata

Assignees

No one assigned

    Labels

    kind/questionSomething requiring a response.status/needs-attentionThis issue needs more eyes on it, more investigation might be required before accepting/rejecting it

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions