
CubDB crashing when under heavy load #21

Closed

sezaru opened this issue Jul 2, 2020 · 5 comments


sezaru commented Jul 2, 2020

Hello, in my system there is a step where I backfill some data, which triggers a heavy load of writes to CubDB in sequence.

Whenever I run this, it doesn't take long for CubDB to crash with one of the following errors:

21:59:21.270 [error] #PID<0.20761.3> 
↳ Task ChartDataStore started from #PID<0.20761.3> terminating
** (ArgumentError) CubDB.Btree.Diff can only be created from Btree sharing the same store
    (cubdb 1.0.0-rc.3) lib/cubdb/btree/diff.ex:23: CubDB.Btree.Diff.new/2
    (cubdb 1.0.0-rc.3) lib/cubdb/catch_up.ex:30: CubDB.CatchUp.catch_up/3
    (cubdb 1.0.0-rc.3) lib/cubdb/catch_up.ex:25: CubDB.CatchUp.run/4
    (elixir 1.10.3) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2
    (stdlib 3.13) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Function: &CubDB.CatchUp.run/4
    Args: [#PID<0.1085.0>, %CubDB.Btree{capacity: 32, dirt: 4983, root: {:b, [{#Reference<0.1680950116.2897215489.63666>, 34686206}, {#Reference<0.1680950116.2897215490.52850>, 34724602}, {#Reference<0.1680950116.2897215491.69273>, 34732122}, {#Reference<0.1680950116.2897215492.66431>, 34647029}, {#Reference<0.1680950116.2897215493.49555>, 34580948}, {#Reference<0.1680950116.2897215494.53090>, 34739581}, {#Reference<0.1680950116.2897215495.57143>, 34480389}, {#Reference<0.1680950116.2897215496.51278>, 34594663}, {#Reference<0.1680950116.2897215497.66685>, 34747154}, {#Reference<0.1680950116.2897215498.50918>, 34754668}, {#Reference<0.1680950116.2897215499.47614>, 23960951}, {#Reference<0.1680950116.2897215499.111489>, 34762331}]}, root_loc: 34763693, size: 5083, store: %CubDB.Store.File{file_path: ".cubdb/alert/chart_data/1.compact", pid: #PID<0.11381.0>}}, %CubDB.Btree{capacity: 32, dirt: 5083, root: {:b, [{#Reference<0.1680950116.2897215489.63666>, 37433148}, {#Reference<0.1680950116.2897215490.57710>, 37466355}, {#Reference<0.1680950116.2897215491.71489>, 37490202}, {#Reference<0.1680950116.2897215492.68894>, 37382964}, {#Reference<0.1680950116.2897215493.52301>, 37341219}, {#Reference<0.1680950116.2897215494.55098>, 37449686}, {#Reference<0.1680950116.2897215495.59217>, 37194153}, {#Reference<0.1680950116.2897215496.52447>, 37316715}, {#Reference<0.1680950116.2897215497.67787>, 37474377}, {#Reference<0.1680950116.2897215498.51040>, 37457638}, {#Reference<0.1680950116.2897215499.50688>, 26328362}, {#Reference<0.1680950116.2897215499.114166>, 37482154}]}, root_loc: 37491270, size: 5083, store: %CubDB.Store.File{file_path: ".cubdb/alert/chart_data/0.cub", pid: #PID<0.1086.0>}}, %CubDB.Btree{capacity: 32, dirt: 4957, root: {:b, [{#Reference<0.1680950116.2897215489.63666>, 34770944}, {#Reference<0.1680950116.2897215490.57532>, 34708580}, {#Reference<0.1680950116.2897215491.71430>, 34738840}, {#Reference<0.1680950116.2897215492.88501>, 34661574}, 
{#Reference<0.1680950116.2897215493.91491>, 34595318}, {#Reference<0.1680950116.2897215494.76776>, 34723961}, {#Reference<0.1680950116.2897215495.55915>, 34478481}, {#Reference<0.1680950116.2897215496.51183>, 34762857}, {#Reference<0.1680950116.2897215497.68116>, 34746498}, {#Reference<0.1680950116.2897215498.53319>, 34731222}, {#Reference<0.1680950116.2897215499.50095>, 24128183}, {#Reference<0.1680950116.2897215499.113642>, 34753867}]}, root_loc: 34772139, size: 5085, store: %CubDB.Store.File{file_path: ".cubdb/alert/chart_data/2.cub", pid: #PID<0.20758.3>}}]
 
21:59:21.270 [error] #PID<0.20761.3> :proc_lib "crash_report/4" "proc_lib.erl" 525 
↳ Process #PID<0.20761.3> terminating
** (ArgumentError) CubDB.Btree.Diff can only be created from Btree sharing the same store
    (cubdb 1.0.0-rc.3) lib/cubdb/btree/diff.ex:23: CubDB.Btree.Diff.new/2
    (cubdb 1.0.0-rc.3) lib/cubdb/catch_up.ex:30: CubDB.CatchUp.catch_up/3
    (cubdb 1.0.0-rc.3) lib/cubdb/catch_up.ex:25: CubDB.CatchUp.run/4
    (elixir 1.10.3) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2
    (stdlib 3.13) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Initial Call: CubDB.CatchUp.run/4
Ancestors: [#PID<0.1088.0>, ChartDataStore, Alert.Supervisor, #PID<0.1060.0>]
Message Queue Length: 0
Messages: []
Links: [#PID<0.1088.0>]
Dictionary: ["$callers": [#PID<0.1085.0>]]
Trapping Exits: false
Status: :running
Heap Size: 6772
Stack Size: 28
Reductions: 15611
22:09:55.544 [error] #PID<0.6342.2> 
↳ Task ChartDataStore started from #PID<0.6342.2> terminating
** (stop) exited in: GenServer.call(#PID<0.16818.1>, {:get, #Function<2.122651076/1 in CubDB.Store.CubDB.Store.File.get_node/2>}, 5000)
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (elixir 1.10.3) lib/gen_server.ex:1023: GenServer.call/3
    (cubdb 1.0.0-rc.3) lib/cubdb/store/file.ex:90: CubDB.Store.CubDB.Store.File.get_node/2
    (cubdb 1.0.0-rc.3) lib/cubdb/btree.ex:548: anonymous fn/2 in Enumerable.CubDB.Btree.get_children/2
    (elixir 1.10.3) lib/enum.ex:1396: Enum."-map/2-lists^map/1-0-"/2
    (elixir 1.10.3) lib/enum.ex:1396: Enum."-map/2-lists^map/1-0-"/2
    (cubdb 1.0.0-rc.3) lib/cubdb/btree.ex:547: Enumerable.CubDB.Btree.get_children/2
    (cubdb 1.0.0-rc.3) lib/cubdb/btree/enumerable.ex:57: CubDB.Btree.Enumerable.next/3
    (cubdb 1.0.0-rc.3) lib/cubdb/btree/enumerable.ex:43: CubDB.Btree.Enumerable.do_reduce/5
    (elixir 1.10.3) lib/enum.ex:3383: Enum.reduce/3
    (cubdb 1.0.0-rc.3) lib/cubdb/btree.ex:88: CubDB.Btree.load/3
    (cubdb 1.0.0-rc.3) lib/cubdb/compactor.ex:24: CubDB.Compactor.run/3
    (elixir 1.10.3) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2
    (stdlib 3.13) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Function: &CubDB.Compactor.run/3
    Args: [#PID<0.1072.0>, %CubDB.Btree{capacity: 32, dirt: 1218, root: {:b, [{#Reference<0.97821014.1288437761.154545>, 19692398}, {#Reference<0.97821014.1288437762.185281>, 19542772}, {#Reference<0.97821014.1288437763.195732>, 19660766}, {#Reference<0.97821014.1288437765.160023>, 19630266}, {#Reference<0.97821014.1288437767.159770>, 19668938}, {#Reference<0.97821014.1288437768.195912>, 19685024}, {#Reference<0.97821014.1288437769.192515>, 19700492}]}, root_loc: 19701266, size: 3468, store: %CubDB.Store.File{file_path: ".cubdb/alert/chart_data/8.cub", pid: #PID<0.16818.1>}}, %CubDB.Store.File{file_path: ".cubdb/alert/chart_data/A.compact", pid: #PID<0.6341.2>}]

22:09:55.549 [error] #PID<0.1072.0> :gen_server "error_info/7" "gen_server.erl" 934 
↳ GenServer ChartDataStore terminating
** (MatchError) no match of right hand side value: {:error, :enoent}
    (cubdb 1.0.0-rc.3) lib/cubdb.ex:1111: CubDB.finalize_compaction/1
    (cubdb 1.0.0-rc.3) lib/cubdb.ex:970: CubDB.handle_info/2
    (stdlib 3.13) gen_server.erl:680: :gen_server.try_dispatch/4
    (stdlib 3.13) gen_server.erl:756: :gen_server.handle_msg/6
    (stdlib 3.13) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Last message: {:catch_up, %CubDB.Btree{capacity: 32, dirt: 98, root: {:b, [{#Reference<0.97821014.1288437761.154545>, 16516069}, {#Reference<0.97821014.1288437762.167599>, 16020197}, {#Reference<0.97821014.1288437763.196716>, 16101484}, {#Reference<0.97821014.1288437765.165635>, 16523000}, {#Reference<0.97821014.1288437766.200918>, 16544685}, {#Reference<0.97821014.1288437768.165598>, 16551612}]}, root_loc: 16552890, size: 3474, store: %CubDB.Store.File{file_path: ".cubdb/alert/chart_data/9.compact", pid: #PID<0.4407.2>}}, %CubDB.Btree{capacity: 32, dirt: 1224, root: {:b, [{#Reference<0.97821014.1288437761.154545>, 19715988}, {#Reference<0.97821014.1288437762.185281>, 19542772}, {#Reference<0.97821014.1288437763.195732>, 19660766}, {#Reference<0.97821014.1288437765.160023>, 19730970}, {#Reference<0.97821014.1288437767.159770>, 19746863}, {#Reference<0.97821014.1288437768.195912>, 19723994}, {#Reference<0.97821014.1288437769.192515>, 19739410}]}, root_loc: 19747804, size: 3474, store: %CubDB.Store.File{file_path: ".cubdb/alert/chart_data/8.cub", pid: #PID<0.16818.1>}}}
State: %CubDB.State{auto_compact: {100, 0.25}, auto_file_sync: true, btree: %CubDB.Btree{capacity: 32, dirt: 1224, root: {:b, [{#Reference<0.97821014.1288437761.154545>, 19715988}, {#Reference<0.97821014.1288437762.185281>, 19542772}, {#Reference<0.97821014.1288437763.195732>, 19660766}, {#Reference<0.97821014.1288437765.160023>, 19730970}, {#Reference<0.97821014.1288437767.159770>, 19746863}, {#Reference<0.97821014.1288437768.195912>, 19723994}, {#Reference<0.97821014.1288437769.192515>, 19739410}]}, root_loc: 19747804, size: 3474, store: %CubDB.Store.File{file_path: ".cubdb/alert/chart_data/8.cub", pid: #PID<0.16818.1>}}, catch_up: #PID<0.6483.2>, clean_up: #PID<0.1074.0>, clean_up_pending: false, compactor: #PID<0.6342.2>, data_dir: ".cubdb/alert/chart_data", readers: %{}, subs: [], task_supervisor: #PID<0.1075.0>}

I haven't tested it much, but disabling compaction seems to fix it.


lucaong commented Jul 25, 2020

Hi @sezaru ,
Thanks for reporting this. Compaction does indeed seem related to your issue. One question: which version of CubDB are you using? I recently added a couple of improvements to compaction, in particular to avoid crashing the main process when compaction crashes.

It does make sense to stop automatic compaction when backfilling data and trigger it manually when done (or re-enable auto-compaction afterwards). That said, if compaction crashes the main process, it should definitely be considered a bug.


sezaru commented Jul 26, 2020

I'm using the latest one, 1.0.0-rc.3.


lucaong commented Jul 26, 2020

Ok thanks, I’ll have a look into it. At a glance it looks like the problem is due to two compaction operations running concurrently, but I need to investigate a bit.


lucaong commented Jul 29, 2020

Hi @sezaru ,
Thanks again for reporting this. I found the issue (a subtle race condition during compaction), and #22 should fix it; I just merged it and created a new release (1.0.0-rc.4). It would be awesome if you could upgrade and let me know whether you still experience the issue after this fix.

After the fix, compaction should not crash in the ways you observed, even during heavy write load. That said, if you can predict when you will have heavy write load (for example during a manual data fill), it's more efficient to temporarily disable auto-compaction, and run a manual compaction after the writes instead.
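The workaround described above (pause auto-compaction during a predictable heavy write load, then compact once manually) could be sketched roughly as follows. This is an illustrative sketch, not code from the issue: the `backfill/2` function and the `records` shape are made up for the example, and it assumes the `CubDB.set_auto_compact/2`, `CubDB.put/3`, and `CubDB.compact/1` functions of the 1.0.0 release candidates. The `{100, 0.25}` auto-compaction setting matches the state shown in the crash log above.

```elixir
# Sketch: backfill a large batch of data without auto-compaction kicking in,
# then compact once at the end. `db` is a running CubDB process.
def backfill(db, records) do
  # Temporarily disable automatic compaction during the heavy write load
  :ok = CubDB.set_auto_compact(db, false)

  # Sequential heavy writes, as in the backfill step from the report
  Enum.each(records, fn {key, value} ->
    :ok = CubDB.put(db, key, value)
  end)

  # Compact once manually, then restore the previous auto-compaction setting
  :ok = CubDB.compact(db)
  CubDB.set_auto_compact(db, {100, 0.25})
end
```

This avoids repeatedly starting compactions that the ongoing writes would immediately render stale, at the cost of temporarily larger data files until the final manual compaction runs.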


lucaong commented Jul 30, 2020

I am closing this issue, as I am confident that #22 fixed it. Feel free to comment if you find that the issue still occurs with the new release, and I will reopen it if necessary.

@lucaong lucaong closed this as completed Jul 30, 2020