
Bulkload OOM when loading big dataset #5574

@JimWen

Description


What version of Dgraph are you using?

Dgraph version : v20.03.3
Dgraph SHA-256 : 08424035910be6b6720570427948bab8352a0b5a6d59a0d20c3ec5ed29533121
Commit SHA-1 : fa3c191
Commit timestamp : 2020-06-02 16:47:25 -0700
Branch : HEAD
Go version : go1.14.1

Have you tried reproducing the issue with the latest release?

Yes.

What is the hardware spec (RAM, OS)?

128 GB RAM & 1.8 TB SSD

Linux version 3.10.0-1062.9.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Fri Dec 6 15:49:49 UTC 2019

Steps to reproduce the issue (command/config used to run Dgraph).

Bulk load a big dataset.

Expected behaviour and actual result.

The log is as follows:

[14:50:58+0800] REDUCE 02h56m55s 28.96% edge_count:2.924G edge_speed:879.5k/sec plist_count:1.837G plist_speed:552.4k/sec. Num Encoding: 0
[14:50:59+0800] REDUCE 02h56m56s 28.96% edge_count:2.924G edge_speed:879.3k/sec plist_count:1.837G plist_speed:552.3k/sec. Num Encoding: 0
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x195a44b, 0x16)
        /usr/local/go/src/runtime/panic.go:1114 +0x72
runtime.sysMap(0xddd4000000, 0xc8000000, 0x2834d78)
        /usr/local/go/src/runtime/mem_linux.go:169 +0xc5
runtime.(*mheap).sysAlloc(0x281f840, 0xc5800000, 0x281f848, 0x62b4a)
        /usr/local/go/src/runtime/malloc.go:715 +0x1cd
runtime.(*mheap).grow(0x281f840, 0x62b4a, 0x0)
        /usr/local/go/src/runtime/mheap.go:1286 +0x11c
runtime.(*mheap).allocSpan(0x281f840, 0x62b4a, 0x0, 0x2834d88, 0xfffffffffffffade)
        /usr/local/go/src/runtime/mheap.go:1124 +0x6a0
runtime.(*mheap).alloc.func1()
        /usr/local/go/src/runtime/mheap.go:871 +0x64
runtime.(*mheap).alloc(0x281f840, 0x62b4a, 0xc000000100, 0xc6a8164180)
        /usr/local/go/src/runtime/mheap.go:865 +0x81
runtime.largeAlloc(0xc5694000, 0xc6bd610001, 0x4)
        /usr/local/go/src/runtime/malloc.go:1152 +0x92
runtime.mallocgc.func1()
        /usr/local/go/src/runtime/malloc.go:1047 +0x46
runtime.systemstack(0x0)
        /usr/local/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
        /usr/local/go/src/runtime/proc.go:1041

goroutine 6534 [running]:
runtime.systemstack_switch()
        /usr/local/go/src/runtime/asm_amd64.s:330 fp=0xc0006e8bf8 sp=0xc0006e8bf0 pc=0x9b16b0
runtime.mallocgc(0xc5694000, 0x172afa0, 0xc6a8bdfe01, 0x9ad6f0)
        /usr/local/go/src/runtime/malloc.go:1046 +0x895 fp=0xc0006e8c98 sp=0xc0006e8bf8 pc=0x957fb5
runtime.growslice(0x172afa0, 0xdc9429a000, 0x68abeb8, 0x69492aa, 0x697152d, 0xae1a8, 0xc6555, 0xae1a8)
        /usr/local/go/src/runtime/slice.go:181 +0x1e2 fp=0xc0006e8d00 sp=0xc0006e8c98 pc=0x998c52
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).reduce(0xcbc1d060f0, 0xc6e13a8000, 0x4ede, 0x5000, 0xc6e0b50000, 0x7e3, 0x900, 0xc6ba306040)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:364 +0x48f fp=0xc0006e8e50 sp=0xc0006e8d00 pc=0x156394f
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).run.func1(0xc00039e700, 0xc6a54e4000, 0x3, 0x64, 0xcbc1d060f0, 0x0, 0xd6a0c87400)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:91 +0x53e fp=0xc0006e8fa8 sp=0xc0006e8e50 pc=0x1568d4e
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc0006e8fb0 sp=0xc0006e8fa8 pc=0x9b37c1
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).run
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:67 +0x240

goroutine 1 [select, 55 minutes]:
github.com/dgraph-io/badger/v2/y.(*Throttle).Do(0xc00039e700, 0x19b72c8, 0xc00039e700)
        /go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/y/y.go:267 +0xb9
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).run(0xcbc1d060f0, 0xcbc1d060f0, 0x19456b7)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:64 +0x273
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*loader).reduceStage(0xc0007401b0)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/loader.go:299 +0x8e
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.run()
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/run.go:241 +0xd99
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.init.0.func1(0xc00018e780, 0xc00001e0b0, 0x0, 0xb)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/run.go:49 +0x62
github.com/spf13/cobra.(*Command).execute(0xc00018e780, 0xc00001e000, 0xb, 0xb, 0xc00018e780, 0xc00001e000)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830 +0x29d
github.com/spf13/cobra.(*Command).ExecuteC(0x2616600, 0x0, 0x0, 0x0)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914 +0x2fb
github.com/spf13/cobra.(*Command).Execute(...)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
github.com/dgraph-io/dgraph/dgraph/cmd.Execute()
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/root.go:70 +0x80
main.main()
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/main.go:78 +0xb0

It seems to be caused by loading all of a predicate's keys into memory during the reduce phase.
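
To make that suspicion concrete, here is a minimal, self-contained sketch of the pattern the trace points at: runtime.growslice firing inside (*reducer).reduce (dgraph/cmd/bulk/reduce.go:364) suggests entries for a predicate are appended into a single in-memory slice before being encoded, so memory scales with the total key count rather than a bounded batch size. All names below (mapEntry, reduceAll, encode) are illustrative, not Dgraph's actual identifiers.

package main

import "fmt"

// mapEntry stands in for one key/value pair produced by the map phase.
type mapEntry struct {
	key, value []byte
}

// reduceAll mimics the suspected pattern: every entry for a predicate is
// appended into one slice before encoding, so the slice keeps growing with
// the number of keys. For billions of edges this eventually trips the
// allocator, matching the "runtime: out of memory" trace above.
func reduceAll(next func() (mapEntry, bool), encode func([]mapEntry)) {
	var batch []mapEntry
	for e, ok := next(); ok; e, ok = next() {
		batch = append(batch, e) // unbounded growth for huge predicates
	}
	encode(batch) // runs only after everything is resident in memory
}

func main() {
	// Tiny driver: three fake entries stand in for billions of edges.
	i := 0
	next := func() (mapEntry, bool) {
		if i >= 3 {
			return mapEntry{}, false
		}
		i++
		return mapEntry{key: []byte{byte(i)}}, true
	}
	reduceAll(next, func(b []mapEntry) { fmt.Println("encoding", len(b), "entries") })
}

If this diagnosis is right, flushing the batch to the encoder in fixed-size chunks instead of once per predicate would bound the resident set during the reduce phase.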
