[Bug] Aggregator pod crashes intermittently #58

tabossert · 2024-02-20T21:43:32Z

Kubecost Helm Chart Version

2.0.2

Kubernetes Version

1.27

Kubernetes Platform

AKS

Description

The Kubecost pod restarts intermittently because of an error in the aggregator pod, as shown in the logs below.

We have tuned resources as much as possible, so it doesn't seem to be related to OOM or disk slowness.

Steps to reproduce

Leave kubecost running in cluster, wait to see when it restarts

Expected behavior

The pod should not restart.

Impact

Our scripts to pull data out fail when this happens

Screenshots

No response

Logs

aggregator goroutine 1144450 [chan send]:
aggregator runtime.gopark(0x1?, 0xc06b7ec400?, 0x50?, 0x1?, 0x41?)
aggregator     /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc052fd7d80 sp=0xc052fd7d60 pc=0x651cee
aggregator runtime.chansend(0xc007491b60, 0xc052fd7f10, 0x1, 0xc01aa55f20?)
aggregator     /usr/local/go/src/runtime/chan.go:259 +0x3a5 fp=0xc052fd7df0 sp=0xc052fd7d80 pc=0x61c445
aggregator runtime.chansend1(0x4e38c19?, 0x16?)
aggregator     /usr/local/go/src/runtime/chan.go:145 +0x17 fp=0xc052fd7e20 sp=0xc052fd7df0 pc=0x61c097
aggregator github.com/kubecost/kubecost-cost-model/pkg/duckdb/allocation/db.(*AllocationDBQueryService).QueryAllocations.func1.1(0xc049aa7b80, {0xc049aa1600, 0xc, 0x10}, {0x80cab80, 0x
aggregator     /app/kubecost-cost-model/pkg/duckdb/allocation/db/allocationqueryservice.go:1351 +0x2ee fp=0xc052fd7f58 sp=0xc052fd7e20 pc=0x2d9af6e
aggregator github.com/kubecost/kubecost-cost-model/pkg/duckdb/allocation/db.(*AllocationDBQueryService).QueryAllocations.func1.5()
aggregator     /app/kubecost-cost-model/pkg/duckdb/allocation/db/allocationqueryservice.go:1380 +0x91 fp=0xc052fd7fe0 sp=0xc052fd7f58 pc=0x2d9ac31
aggregator runtime.goexit()
aggregator     /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc052fd7fe8 sp=0xc052fd7fe0 pc=0x684b81
aggregator created by github.com/kubecost/kubecost-cost-model/pkg/duckdb/allocation/db.(*AllocationDBQueryService).QueryAllocations.func1 in goroutine 796328
aggregator     /app/kubecost-cost-model/pkg/duckdb/allocation/db/allocationqueryservice.go:1338 +0x377

Slack discussion

No response

Troubleshooting

I have read and followed the issue guidelines and this is a bug impacting only the Helm chart.
I have searched other issues in this repository and mine is not recorded.

AjayTripathy · 2024-02-20T23:51:58Z

cc @cliffcolvin can you take a look here? Has this been fixed in the upcoming 2.1 rc's?

chipzoller · 2024-02-21T11:29:43Z

Transferred.

cliffcolvin · 2024-02-21T16:46:22Z

We're taking a look right now.

michaelmdresser · 2024-02-21T17:33:24Z

@tabossert do you have any further log context from this crash? About 5 lines after and 15-20 lines preceding would help me here.

tabossert · 2024-02-21T17:38:22Z

`goroutine 2513173 [runnable]:
runtime.cgocall(0x3225fb0, 0xc0330cb230)
/usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc0330cb208 sp=0xc0330cb1d0 pc=0x61adcb
github.com/marcboeker/go-duckdb._Cfunc_duckdb_execute_pending(0x7f641502d250, 0xc073047f80)
_cgo_gotypes.go:1180 +0x4b fp=0xc0330cb230 sp=0xc0330cb208 pc=0x2d51f4b
github.com/marcboeker/go-duckdb.(*stmt).execute.func7(0x0?, 0x80cab01?)
/go/pkg/mod/github.com/marcboeker/go-duckdb@v1.5.5/statement.go:225 +0x65 fp=0xc0330cb270 sp=0xc0330cb230 pc=0x2d5d085
github.com/marcboeker/go-duckdb.(*stmt).execute(0xc05877ffb0, {0x5b5dcd8, 0xc088c8d830}, {0x80cab80?, 0x8?, 0x7f64e8088060?})
/go/pkg/mod/github.com/marcboeker/go-duckdb@v1.5.5/statement.go:225 +0x248 fp=0xc0330cb320 sp=0xc0330cb270 pc=0x2d5cca8
github.com/marcboeker/go-duckdb.(*stmt).QueryContext(0xc05877ffb0, {0x5b5dcd8?, 0xc088c8d830?}, {0x80cab80?, 0x0?, 0x176?})
/go/pkg/mod/github.com/marcboeker/go-duckdb@v1.5.5/statement.go:175 +0x34 fp=0xc0330cb398 sp=0xc0330cb320 pc=0x2d5c994
github.com/marcboeker/go-duckdb.(*conn).QueryContext(0xc059b77860, {0x5b5dcd8, 0xc088c8d830}, {0xc084129000, 0x18a}, {0x80cab80, 0x0, 0x0})
/go/pkg/mod/github.com/marcboeker/go-duckdb@v1.5.5/connection.go:96 +0x30a fp=0xc0330cb468 sp=0xc0330cb398 pc=0x2d53cca
database/sql.ctxDriverQuery({0x5b5dcd8?, 0xc088c8d830?}, {0x7f64ec70f130?, 0xc059b77860?}, {0x0?, 0x0?}, {0xc084129000?, 0xc084129000?}, {0x80cab80, 0x0, ...})
/usr/local/go/src/database/sql/ctxutil.go:48 +0xd7 fp=0xc0330cb4f0 sp=0xc0330cb468 pc=0x16eb8d7
database/sql.(*DB).queryDC.func1()
/usr/local/go/src/database/sql/sql.go:1748 +0x165 fp=0xc0330cb5b0 sp=0xc0330cb4f0 pc=0x16f3b65
database/sql.withLock({0x5b44ec8, 0xc0723075f0}, 0xc0330cb708)
/usr/local/go/src/database/sql/sql.go:3502 +0x82 fp=0xc0330cb5f0 sp=0xc0330cb5b0 pc=0x16fb6c2
database/sql.(*DB).queryDC(0x1?, {0x5b5dcd8?, 0xc088c8d830}, {0x0, 0x0}, 0xc0723075f0, 0xc05589dd50, {0xc084129000, 0x18a}, {0x0, ...})
/usr/local/go/src/database/sql/sql.go:1743 +0x209 fp=0xc0330cb798 sp=0xc0330cb5f0 pc=0x16f34e9
database/sql.(*DB).query(0x0?, {0x5b5dcd8, 0xc088c8d830}, {0xc084129000, 0x18a}, {0x0, 0x0, 0x0}, 0x80?)
/usr/local/go/src/database/sql/sql.go:1726 +0xfc fp=0xc0330cb818 sp=0xc0330cb798 pc=0x16f325c
database/sql.(*DB).QueryContext.func1(0x80?)
/usr/local/go/src/database/sql/sql.go:1704 +0x4f fp=0xc0330cb880 sp=0xc0330cb818 pc=0x16f304f
database/sql.(*DB).retry(0x62bdc8?, 0xc0330cb8f0)
/usr/local/go/src/database/sql/sql.go:1538 +0x42 fp=0xc0330cb8c8 sp=0xc0330cb880 pc=0x16f1842
database/sql.(*DB).QueryContext(0x0?, {0x5b5dcd8?, 0xc088c8d830?}, {0xc084129000?, 0x0?}, {0x0?, 0x5b5dcd8?, 0xc088c8d830?})
/usr/local/go/src/database/sql/sql.go:1703 +0xc5 fp=0xc0330cb958 sp=0xc0330cb8c8 pc=0x16f2f65
github.com/uptrace/bun.(*SelectQuery).Rows(0xc084a30000, {0x5b5dcd8, 0xc088c8d830})
/go/pkg/mod/github.com/uptrace/bun@v1.1.16/query_select.go:818 +0x1a8 fp=0xc0330cba18 sp=0xc0330cb958 pc=0x2ce6668
github.com/kubecost/kubecost-cost-model/pkg/duckdb/internal/db.GetLabelsAnnotations({0x5b2a9a0, 0xc000d7cd40}, {0xc0502a0f80, 0x7, 0x8}, {0x80cab80, 0x0, 0x0}, {0xc055890f40, 0x1d}, ...)
/app/kubecost-cost-model/pkg/duckdb/internal/db/common.go:124 +0x7db fp=0xc0330cbe20 sp=0xc0330cba18 pc=0x2d69fbb
github.com/kubecost/kubecost-cost-model/pkg/duckdb/allocation/db.(*AllocationDBQueryService).QueryAllocations.func1.1(0xc022a89ce0, {0xc02c037000, 0xf, 0x10}, {0x80cab80, 0x0, 0x0}, {0xc0502a0f80, 0x7, 0x8}, ...)
/app/kubecost-cost-model/pkg/duckdb/allocation/db/allocationqueryservice.go:1358 +0x4bd fp=0xc0330cbf58 sp=0xc0330cbe20 pc=0x2d9b13d
github.com/kubecost/kubecost-cost-model/pkg/duckdb/allocation/db.(*AllocationDBQueryService).QueryAllocations.func1.5()
/app/kubecost-cost-model/pkg/duckdb/allocation/db/allocationqueryservice.go:1380 +0x91 fp=0xc0330cbfe0 sp=0xc0330cbf58 pc=0x2d9ac31
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0330cbfe8 sp=0xc0330cbfe0 pc=0x684b81
created by github.com/kubecost/kubecost-cost-model/pkg/duckdb/allocation/db.(*AllocationDBQueryService).QueryAllocations.func1 in goroutine 2165198
/app/kubecost-cost-model/pkg/duckdb/allocation/db/allocationqueryservice.go:1338 +0x377

goroutine 2513177 [sync.Mutex.Lock]:
runtime.gopark(0x2d51d7f?, 0x32253b0?, 0x98?, 0x96?, 0xc021ef9698?)
/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc021ef9668 sp=0xc021ef9648 pc=0x651cee
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:404
runtime.semacquire1(0xc004bda6a4, 0x0?, 0x3, 0x1, 0x60?)
/usr/local/go/src/runtime/sema.go:160 +0x218 fp=0xc021ef96d0 sp=0xc021ef9668 pc=0x6631b8
sync.runtime_SemacquireMutex(0xc021ef9748?, 0xd1?, 0x18?)
/usr/local/go/src/runtime/sema.go:77 +0x25 fp=0xc021ef9708 sp=0xc021ef96d0 pc=0x680b25
sync.(*Mutex).lockSlow(0xc004bda6a0)
/usr/local/go/src/sync/mutex.go:171 +0x15d fp=0xc021ef9758 sp=0xc021ef9708 pc=0x68fd1d
sync.(*Mutex).Lock(...)
/usr/local/go/src/sync/mutex.go:90
database/sql.(*driverConn).finalClose(0xc0512c0ab0)
/usr/local/go/src/database/sql/sql.go:648 +0x133 fp=0xc021ef9800 sp=0xc021ef9758 pc=0x16ed4d3
database/sql.finalCloser.finalClose-fm()
:1 +0x25 fp=0xc021ef9818 sp=0xc021ef9800 pc=0x16fcb45
database/sql.(*driverConn).Close(0xc0512c0ab0)
/usr/local/go/src/database/sql/sql.go:623 +0x146 fp=0xc021ef9860 sp=0xc021ef9818 pc=0x16ed366
database/sql.(*DB).putConn(0xc004bda680, 0xc0512c0ab0, {0x0, 0x0}, 0x0?)
/usr/local/go/src/database/sql/sql.go:1484 +0x2d6 fp=0xc021ef98d0 sp=0xc021ef9860 pc=0x16f1476
database/sql.(*driverConn).releaseConn(...)
/usr/local/go/src/database/sql/sql.go:527
database/sql.(*driverConn).releaseConn-fm({0x0?, 0x0?})
:1 +0x3e fp=0xc021ef9908 sp=0xc021ef98d0 pc=0x16fca5e
database/sql.(*Rows).close(0xc03900f950, {0x0, 0x0})
/usr/local/go/src/database/sql/sql.go:3396 +0x1c7 fp=0xc021ef9998 sp=0xc021ef9908 pc=0x16fae27
database/sql.(*Rows).Close(0x5b38d00?)
/usr/local/go/src/database/sql/sql.go:3367 +0x26 fp=0xc021ef99c8 sp=0xc021ef9998 pc=0x16fac46
database/sql.(*Rows).Next(0xc03900f950)
/usr/local/go/src/database/sql/sql.go:2997 +0x96 fp=0xc021ef9a18 sp=0xc021ef99c8 pc=0x16f91b6
github.com/kubecost/kubecost-cost-model/pkg/duckdb/internal/db.GetLabelsAnnotations({0x5b2a9a0, 0xc000d7cd40}, {0xc0502a1780, 0x7, 0x8}, {0x80cab80, 0x0, 0x0}, {0xc02d237a40, 0x1d}, ...)
/app/kubecost-cost-model/pkg/duckdb/internal/db/common.go:136 +0x989 fp=0xc021ef9e20 sp=0xc021ef9a18 pc=0x2d6a169
github.com/kubecost/kubecost-cost-model/pkg/duckdb/allocation/db.(*AllocationDBQueryService).QueryAllocations.func1.1(0xc022a942c0, {0xc02c037400, 0xf, 0x10}, {0x80cab80, 0x0, 0x0}, {0xc0502a1780, 0x7, 0x8}, ...)
/app/kubecost-cost-model/pkg/duckdb/allocation/db/allocationqueryservice.go:1358 +0x4bd fp=0xc021ef9f58 sp=0xc021ef9e20 pc=0x2d9b13d
github.com/kubecost/kubecost-cost-model/pkg/duckdb/allocation/db.(*AllocationDBQueryService).QueryAllocations.func1.5()
/app/kubecost-cost-model/pkg/duckdb/allocation/db/allocationqueryservice.go:1380 +0x91 fp=0xc021ef9fe0 sp=0xc021ef9f58 pc=0x2d9ac31
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc021ef9fe8 sp=0xc021ef9fe0 pc=0x684b81
created by github.com/kubecost/kubecost-cost-model/pkg/duckdb/allocation/db.(*AllocationDBQueryService).QueryAllocations.func1 in goroutine 2165198
/app/kubecost-cost-model/pkg/duckdb/allocation/db/allocationqueryservice.go:1338 +0x377

goroutine 2513180 [runnable]:
runtime.cgocall(0x3225990, 0xc0353cd848)
/usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc0353cd820 sp=0xc0353cd7e8 pc=0x61adcb
github.com/marcboeker/go-duckdb._Cfunc_duckdb_result_get_chunk({0x2, 0x0, 0x0, 0x0, 0x0, 0x7f64d4db9a60}, 0x0)
_cgo_gotypes.go:1465 +0x4c fp=0xc0353cd848 sp=0xc0353cd820 pc=0x2d52c0c
github.com/marcboeker/go-duckdb.(*rows).Next.func2(0x666c69?)
/go/pkg/mod/github.com/marcboeker/go-duckdb@v1.5.5/rows.go:63 +0x87 fp=0xc0353cd8d0 sp=0xc0353cd848 pc=0x2d56547
github.com/marcboeker/go-duckdb.(*rows).Next(0xc074e44f00, {0xc087be4b20, 0x2, 0x4c3a640?})
/go/pkg/mod/github.com/marcboeker/go-duckdb@v1.5.5/rows.go:63 +0x79 fp=0xc0353cd900 sp=0xc0353cd8d0 pc=0x2d562b9
database/sql.(*Rows).nextLocked(0xc07e6745a0)
/usr/local/go/src/database/sql/sql.go:3019 +0x107 fp=0xc0353cd960 sp=0xc0353cd900 pc=0x16f9367
database/sql.(*Rows).Next.func1()
/usr/local/go/src/database/sql/sql.go:2994 +0x29 fp=0xc0353cd988 sp=0xc0353cd960 pc=0x16f9229
database/sql.withLock({0x5b38d00, 0xc07e6745d8}, 0xc0353cd9e8)
/usr/local/go/src/database/sql/sql.go:3502 +0x82 fp=0xc0353cd9c8 sp=0xc0353cd988 pc=0x16fb6c2
database/sql.(*Rows).Next(0xc07e6745a0)
/usr/local/go/src/database/sql/sql.go:2993 +0x85 fp=0xc0353cda18 sp=0xc0353cd9c8 pc=0x16f91a5
github.com/kubecost/kubecost-cost-model/pkg/duckdb/internal/db.GetLabelsAnnotations({0x5b2a9a0, 0xc000d7cd40}, {0xc0502a1c00, 0x7, 0x8}, {0x80cab80, 0x0, 0x0}, {0xc056c9ff00, 0x1d}, ...)
/app/kubecost-cost-model/pkg/duckdb/internal/db/common.go:136 +0x989 fp=0xc0353cde20 sp=0xc0353cda18 pc=0x2d6a169
github.com/kubecost/kubecost-cost-model/pkg/duckdb/allocation/db.(*AllocationDBQueryService).QueryAllocations.func1.1(0xc022a946e0, {0xc02c037700, 0xf, 0x10}, {0x80cab80, 0x0, 0x0}, {0xc0502a1c00, 0x7, 0x8}, ...)
/app/kubecost-cost-model/pkg/duckdb/allocation/db/allocationqueryservice.go:1358 +0x4bd fp=0xc0353cdf58 sp=0xc0353cde20 pc=0x2d9b13d
github.com/kubecost/kubecost-cost-model/pkg/duckdb/allocation/db.(*AllocationDBQueryService).QueryAllocations.func1.5()
/app/kubecost-cost-model/pkg/duckdb/allocation/db/allocationqueryservice.go:1380 +0x91 fp=0xc0353cdfe0 sp=0xc0353cdf58 pc=0x2d9ac31
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0353cdfe8 sp=0xc0353cdfe0 pc=0x684b81
created by github.com/kubecost/kubecost-cost-model/pkg/duckdb/allocation/db.(*AllocationDBQueryService).QueryAllocations.func1 in goroutine 2165198
/app/kubecost-cost-model/pkg/duckdb/allocation/db/allocationqueryservice.go:1338 +0x377

rax 0x0
rbx 0x7f64f498a640
rcx 0x7f653bdba9fc
rdx 0x6
rdi 0x1
rsi 0x7
rbp 0x7
rsp 0x7f64f4987710
r8 0x7f64f49877e0
r9 0x7fffffff
r10 0x8
r11 0x246
r12 0x6
r13 0x16
r14 0xc0000069c0
r15 0x4
rip 0x7f653bdba9fc
rflags 0x246
cs 0x33
fs 0x0
gs 0x0`

michaelmdresser · 2024-02-21T17:42:52Z

@tabossert That's helpful, thank you for the quick response. I'm looking for the first instance of the goroutine ... string in the logs, the 15-20 lines preceding it, and the stack trace attached to that specific goroutine. On a panic like this, Go prints the stack trace of every goroutine, but the offending goroutine's trace is printed first, which is why I'm asking for that, plus the log context that led us to that trace.
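For anyone collecting the same information, here is a minimal Go sketch of one way to pull the first goroutine header, the lines preceding it, and the trace attached to it out of a saved log. The function name firstTraceWithContext and the choice to end the trace at the first blank line are assumptions made for illustration; logs copied from a terminal UI with border characters may need cleaning first.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strings"
)

// goroutineHeader matches lines such as "goroutine 27 [running]:".
var goroutineHeader = regexp.MustCompile(`goroutine \d+ \[[^\]]*\]:`)

// firstTraceWithContext returns up to `before` lines preceding the first
// goroutine header, the header itself, and the lines that follow it up to
// (not including) the next blank line. It returns nil if no header is found.
func firstTraceWithContext(lines []string, before int) []string {
	for i, line := range lines {
		if !goroutineHeader.MatchString(line) {
			continue
		}
		start := i - before
		if start < 0 {
			start = 0
		}
		end := i + 1
		for end < len(lines) && strings.TrimSpace(lines[end]) != "" {
			end++
		}
		return lines[start:end]
	}
	return nil
}

func main() {
	// Read the log from stdin, for example piped from `kubectl logs --previous`.
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // allow long trace lines
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	for _, l := range firstTraceWithContext(lines, 20) {
		fmt.Println(l)
	}
}
```

The value 20 passed for `before` mirrors the 15-20 lines of preceding context requested above.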

If you'd like, I can make it easier on you -- you can share the log file with me privately via email: michael@kubecost.com

michaelmdresser · 2024-02-21T17:53:57Z

To clarify: I need more log context to understand what's going wrong here. Please either share a full log file or share the requested first trace + surrounding context I mentioned above.

tabossert · 2024-02-21T18:03:36Z

Email sent with full log @michaelmdresser

michaelmdresser · 2024-02-21T18:30:31Z

Thank you @tabossert. I have a pretty strong theory about what's going wrong here -- there are a few different resolution paths if this is what I think it is.

If you are willing to try a pre-production release, please upgrade to Kubecost v2.1.0-rc.6 or v2.1.0 when it is released, which is imminent. I am fairly certain that you are experiencing an issue which has been fixed in v2.1.

Otherwise, if you would like to stay on v2.0.2:

If this crash happens once per day, within a few hours of UTC midnight, disable the Forecasting Pod using the Helm value forecasting.enabled=false
If this crash coincides with the run of your "scripts to pull data out" and if those scripts make a call to /model/allocation, please set the query parameter includeAggregatedMetadata=false. Also, if these queries have no aggregate parameter (or a high-cardinality one like aggregate=pod), I recommend using the limit and offset query parameters to paginate the response, e.g. limit=100&offset=0 -> limit=100&offset=100 -> limit=100&offset=200.

tabossert · 2024-02-21T19:51:23Z

Thanks, we will try those workarounds until v2.1.0 is released. Thanks for the quick response!

tabossert · 2024-02-22T01:18:11Z

I tried upgrading to 2.1.0-rc6, but it didn't seem to load the data, so I'm not sure if I missed something. I went back to 2.0.2, but now it gives me this error:

2024-02-22T01:16:58.356391917Z ERR error doing initial open of DB: error opening db at path /var/configs/waterfowl/duckdb/v0_9_2/kubecost.duckdb.write: migrating up: no migration found for version 20240212233831: read down for version 20240212233831 migrations: file does not exist
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x16ee895]

goroutine 27 [running]:
database/sql.(*DB).Close(0x0)
    /usr/local/go/src/database/sql/sql.go:877 +0x35
github.com/kubecost/kubecost-cost-model/pkg/duckdb/write.startIngestor(0xc0009ccba0, 0xc000f0f4b0?)
    /app/kubecost-cost-model/pkg/duckdb/write/writer.go:234 +0x28
github.com/kubecost/kubecost-cost-model/pkg/duckdb/write.NewWriter.func5({0x47a00a0?, 0xc0001feba0?}, 0x1?)
    /app/kubecost-cost-model/pkg/duckdb/write/writer.go:125 +0x1b
github.com/looplab/fsm.(*FSM).enterStateCallbacks(0xc000f12000, {0x5b5dd10, 0xc0000da5f0}, 0xc0001feba0?)
    /go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:470 +0x82
github.com/looplab/fsm.(*FSM).Event.(*FSM).Event.func2.func3()
    /go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:363 +0x150
github.com/looplab/fsm.transitionerStruct.transition(...)
    /go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:422
github.com/looplab/fsm.(*FSM).doTransition(...)
    /go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:407
github.com/looplab/fsm.(*FSM).Event(0xc000f12000, {0x5b5d8e8, 0x80cab80}, {0x4e18562, 0xd}, {0x0, 0x0, 0x0})
    /go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:390 +0x884
github.com/kubecost/kubecost-cost-model/pkg/duckdb/write.NewWriter(0xc0012b70a0, {0xc00128de40, 0x3a}, {0xc00128df40, 0x39})
    /app/kubecost-cost-model/pkg/duckdb/write/writer.go:180 +0x6ef
github.com/kubecost/kubecost-cost-model/pkg/duckdb/orchestrator.createWriter(0xc0012b7040)
    /app/kubecost-cost-model/pkg/duckdb/orchestrator/orchestrator.go:398 +0x33
github.com/kubecost/kubecost-cost-model/pkg/duckdb/orchestrator.NewOrchestrator.func7({0x47a00a0?, 0xc0013ba3f0?}, 0xc000f08000)
    /app/kubecost-cost-model/pkg/duckdb/orchestrator/orchestrator.go:212 +0x25
github.com/looplab/fsm.(*FSM).enterStateCallbacks(0xc0013bc500, {0x5b5dd10, 0xc0000da500}, 0xc0013ba3f0?)
    /go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:470 +0x82
github.com/looplab/fsm.(*FSM).Event.(*FSM).Event.func2.func3()
    /go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:363 +0x150
github.com/looplab/fsm.transitionerStruct.transition(...)
    /go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:422
github.com/looplab/fsm.(*FSM).doTransition(...)
    /go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:407
github.com/looplab/fsm.(*FSM).Event(0xc0013bc500, {0x5b5d8e8, 0x80cab80}, {0x4e4a44e, 0x1b}, {0x0, 0x0, 0x0})
    /go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:390 +0x884
github.com/kubecost/kubecost-cost-model/pkg/duckdb/orchestrator.NewOrchestrator.func6.1()
    /app/kubecost-cost-model/pkg/duckdb/orchestrator/orchestrator.go:204 +0x3e
created by github.com/kubecost/kubecost-cost-model/pkg/duckdb/orchestrator.NewOrchestrator.func6 in goroutine 1
    /app/kubecost-cost-model/pkg/duckdb/orchestrator/orchestrator.go:203 +0x505

tabossert · 2024-02-22T01:50:58Z

Actually, I just upgraded to 2.1.0, which was just released, and it seems to be loading. I will report back on whether the crashes stop.

michaelmdresser · 2024-02-22T17:35:51Z

Thanks for the update and sorry for the confusion about the back-and-forth upgrade. Please let us know if you run into trouble with 2.1.0.

tabossert · 2024-03-18T18:08:31Z

Issue seems to be resolved, thanks!

wiktor2200 · 2024-04-22T09:39:32Z

Hello everyone! @michaelmdresser I'm experiencing a similar issue on a GKE cluster with version 2.2.2, so it seems to be back.
It was working and then suddenly stopped.
Here's the full Go trace:

INF Starting Kubecost Aggregator version kcm-c630c42588_core-c3cb2218df_oc-088f891d8e (c630c425)                                                                                
INF NAMESPACE: kubecost                                                                                                                                                         
ERR error doing initial open of DB: error opening db at path /var/configs/waterfowl/duckdb/v0_9_2/kubecost.duckdb.write: setting up migrations: opening '/var/configs/waterfowl/d
panic: runtime error: invalid memory address or nil pointer dereference                                                                                                                                        
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x17774f5]                                                                                                                                       
                                                                                                                                                                                                               
goroutine 22 [running]:                                                                                                                                                                                        
database/sql.(*DB).Close(0x0)                                                                                                                                                                                  
    /usr/local/go/src/database/sql/sql.go:910 +0x35                                                                                                                                                            
github.com/kubecost/kubecost-cost-model/pkg/duckdb/write.startIngestor(0xc001f93d40, 0xc000afe060)                                                                                                             
    /app/kubecost-cost-model/pkg/duckdb/write/writer.go:342 +0x28                                                                                                                                              
github.com/kubecost/kubecost-cost-model/pkg/duckdb/write.NewWriter.func5({0x461e2c0?, 0xc0014a0a20?}, 0xc0016d1208?)                                                                                           
    /app/kubecost-cost-model/pkg/duckdb/write/writer.go:188 +0x1b                                                                                                                                              
github.com/looplab/fsm.(*FSM).enterStateCallbacks(0xc0014a7c00, {0x63c1568, 0xc003e16190}, 0xc000be00e0)                                                                                                       
    /root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:470 +0x82                                                                                                                                            
github.com/looplab/fsm.(*FSM).Event.(*FSM).Event.func2.func3()                                                                                                                                                 
    /root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:363 +0x150                                                                                                                                           
github.com/looplab/fsm.transitionerStruct.transition(...)                                                                                                                                                      
    /root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:422                                                                                                                                                  
github.com/looplab/fsm.(*FSM).doTransition(...)                                                                                                                                                                
    /root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:407                                                                                                                                                  
github.com/looplab/fsm.(*FSM).Event(0xc0014a7c00, {0x63c10f8, 0x8763380}, {0x4c3e1c7, 0x15}, {0x0, 0x0, 0x0})                                                                                                  
    /root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:390 +0x80a                                                                                                                                           
github.com/kubecost/kubecost-cost-model/pkg/duckdb/write.NewWriter.func7({0x461e2c0?, 0xc0014a0a20?}, 0xc000be0070)                                                                                            
    /app/kubecost-cost-model/pkg/duckdb/write/writer.go:202 +0x11a                                                                                                                                             
github.com/looplab/fsm.(*FSM).enterStateCallbacks(0xc0014a7c00, {0x63c1568, 0xc003e160a0}, 0xc000be0070)                                                                                                       
    /root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:470 +0x82                                                                                                                                            
github.com/looplab/fsm.(*FSM).Event.(*FSM).Event.func2.func3()                                                                                                                                                 
    /root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:363 +0x150                                                                                                                                           
github.com/looplab/fsm.transitionerStruct.transition(...)                                                                                                                                                      
    /root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:422                                                                                                                                                  
github.com/looplab/fsm.(*FSM).doTransition(...)                                                                                                                                                                
    /root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:407                                                                                                                                                  
github.com/looplab/fsm.(*FSM).Event(0xc0014a7c00, {0x63c10f8, 0x8763380}, {0x4c22c8b, 0xd}, {0x0, 0x0, 0x0})                                                                                                   
    /root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:390 +0x80a                                                                                                                                           
github.com/kubecost/kubecost-cost-model/pkg/duckdb/write.NewWriter(0xc000afe060, {0xc003aa81c0, 0x3a}, {0xc003aa82c0, 0x39})                                                                                   
    /app/kubecost-cost-model/pkg/duckdb/write/writer.go:258 +0x7be                                                                                                                                             
github.com/kubecost/kubecost-cost-model/pkg/duckdb/orchestrator.createWriter(0xc000afe000)                                                                                                                     
    /app/kubecost-cost-model/pkg/duckdb/orchestrator/orchestrator.go:400 +0x33                                                                                                                                 
github.com/kubecost/kubecost-cost-model/pkg/duckdb/orchestrator.NewOrchestrator.func7({0x461e2c0?, 0xc003a8c510?}, 0xc000adf420)                                                    
    /app/kubecost-cost-model/pkg/duckdb/orchestrator/orchestrator.go:213 +0x25                                                                                                      
github.com/looplab/fsm.(*FSM).enterStateCallbacks(0xc003a98d80, {0x63c1568, 0xc0016c8050}, 0xc000adf420)                                                                            
    /root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:470 +0x82                                                                                                                 
github.com/looplab/fsm.(*FSM).Event.(*FSM).Event.func2.func3()                                                                                                                      
    /root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:363 +0x150                                                                                                                
github.com/looplab/fsm.transitionerStruct.transition(...)                                                                                                                           
    /root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:422                                                                                                                       
github.com/looplab/fsm.(*FSM).doTransition(...)                                                                                                                                     
    /root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:407                                                                                                                       
github.com/looplab/fsm.(*FSM).Event(0xc003a98d80, {0x63c10f8, 0x8763380}, {0x4c563b8, 0x1b}, {0x0, 0x0, 0x0})                                                                       
    /root/go/pkg/mod/github.com/looplab/fsm@v1.0.1/fsm.go:390 +0x80a                                                                                                                
github.com/kubecost/kubecost-cost-model/pkg/duckdb/orchestrator.NewOrchestrator.func6.1()                                                                                           
    /app/kubecost-cost-model/pkg/duckdb/orchestrator/orchestrator.go:205 +0x3e                                                                                                      
created by github.com/kubecost/kubecost-cost-model/pkg/duckdb/orchestrator.NewOrchestrator.func6 in goroutine 1                                                                     
    /app/kubecost-cost-model/pkg/duckdb/orchestrator/orchestrator.go:204 +0x4e8

wiktor2200 · 2024-04-22T11:40:16Z

I resolved the issue above by following this message: #72

tabossert added the bug and needs-triage labels Feb 20, 2024

chipzoller transferred this issue from kubecost/cost-analyzer-helm-chart Feb 21, 2024

tabossert closed this as completed Mar 18, 2024

DerekTBrown mentioned this issue Jun 3, 2024

[Bug] Duckdb corruption after 2.2.5 upgrade #103

Closed

2 tasks
