
GCB BigQuery notifier is crashing #6580

Closed
devjgm opened this issue May 15, 2021 · 10 comments
Labels
type: cleanup An internal cleanup or hygiene concern.

Comments

@devjgm
Contributor

devjgm commented May 15, 2021

[screenshot]

The BigQuery notifier job (https://github.com/googleapis/google-cloud-cpp/blob/main/ci/cloudbuild/notifiers/bigquery/README.md) is crashing on startup and not working. This is a community notifier that we run to write GCB notifications to a BigQuery table. It was working originally, but now it's failing. I'm not sure why, and since we didn't write it (it's written in Go), I can't debug it quickly.

I'm disabling this notifier for the time being until we can get it working again. It may be easier to just rewrite this thing using the C++ functions framework.

@devjgm devjgm added the type: cleanup An internal cleanup or hygiene concern. label May 15, 2021
@coryan
Member

coryan commented May 15, 2021

s/not crashing/crashing/?

@devjgm devjgm changed the title GCB BigQuery notifier is not crashing GCB BigQuery notifier is crashing May 15, 2021
@devjgm
Contributor Author

devjgm commented May 15, 2021

Yes, fixed the typo.

@devjgm
Contributor Author

devjgm commented May 19, 2021

I restarted everything and it seems to be working now. Not sure why.

@devjgm devjgm closed this as completed May 19, 2021
@devjgm
Contributor Author

devjgm commented May 21, 2021

This is still happening (or, happening again):

[screenshot]

I don't know what's going on with this notifier or why. I suspect the ultimate answer is that we should stop using it and write our own notifier in C++, following the pattern we used for our logs and alerts notifiers.

To be clear, I think what we probably want is:

  • Stop using the current BQ notifier (not ours; it's written in Go)
  • Create a new BQ table with the schema we want. That schema likely matches the view we created named cloudbuild.google_cloud_cpp_denorm.
  • Write a C++ notifier to subscribe to the GCB build notifications. It will run in Cloud Run like our other notifiers.
  • It should write each notification to a new row in the table. Since we don't have a C++ BigQuery client library, we may need to do this with a JSON POST to the BigQuery REST API. I assume that's possible.
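A minimal sketch of what the JSON POST in the last bullet could look like, written in Go to match the notifier code quoted in this thread. It assumes the BigQuery REST method tabledata.insertAll; the project, dataset, and table names, the row fields, and the helper names insertAllURL and buildInsertAllBody are hypothetical placeholders, and authentication is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// insertAllRow is one row in a tabledata.insertAll request. InsertID lets
// BigQuery de-duplicate retried inserts on a best-effort basis.
type insertAllRow struct {
	InsertID string         `json:"insertId,omitempty"`
	JSON     map[string]any `json:"json"`
}

// insertAllRequest mirrors the TableDataInsertAllRequest body.
type insertAllRequest struct {
	Kind string         `json:"kind"`
	Rows []insertAllRow `json:"rows"`
}

// insertAllURL returns the REST endpoint used to stream rows into a table.
func insertAllURL(project, dataset, table string) string {
	return fmt.Sprintf(
		"https://bigquery.googleapis.com/bigquery/v2/projects/%s/datasets/%s/tables/%s/insertAll",
		project, dataset, table)
}

// buildInsertAllBody serializes one build notification as a single row,
// using the (hypothetical) build ID as the insert ID.
func buildInsertAllBody(buildID string, row map[string]any) ([]byte, error) {
	req := insertAllRequest{
		Kind: "bigquery#tableDataInsertAllRequest",
		Rows: []insertAllRow{{InsertID: buildID, JSON: row}},
	}
	return json.Marshal(req)
}

func main() {
	url := insertAllURL("my-project", "cloudbuild", "builds")
	body, err := buildInsertAllBody("example-build-id", map[string]any{
		"id":     "example-build-id",
		"status": "FAILURE",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(url)
	fmt.Println(string(body))
}
```

The request itself is language-agnostic: a C++ service could send the same body with any HTTP client, adding an OAuth2 access token in the Authorization header.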

@devjgm
Contributor Author

devjgm commented May 23, 2021

I found the following golang stack trace in the logs:


2021/05/21 19:16:54 http: panic serving 169.254.8.129:39528: runtime error: invalid memory address or nil pointer dereference
goroutine 49 [running]:
net/http.(*conn).serve.func1(0xc0003c0e60)
	/usr/local/go/src/net/http/server.go:1824 +0x153
panic(0xd78aa0, 0x1592d30)
	/usr/local/go/src/runtime/panic.go:971 +0x499
main.(*bqNotifier).SendNotification(0xc0003d2d50, 0xff2320, 0xc0001cc240, 0xc000231500, 0x2928, 0x3000)
	/go-src/bigquery/main.go:238 +0x846
github.com/GoogleCloudPlatform/cloud-build-notifiers/lib/notifiers.newReceiver.func1(0xff09c0, 0xc000337260, 0xc000460200)
	/go-src/lib/notifiers/notifiers.go:448 +0x938
net/http.HandlerFunc.ServeHTTP(0xc0005e0ea0, 0xff09c0, 0xc000337260, 0xc000460200)
	/usr/local/go/src/net/http/server.go:2069 +0x44
net/http.(*ServeMux).ServeHTTP(0x15af920, 0xff09c0, 0xc000337260, 0xc000460200)
	/usr/local/go/src/net/http/server.go:2448 +0x1ad
net/http.serverHandler.ServeHTTP(0xc000337180, 0xff09c0, 0xc000337260, 0xc000460200)
	/usr/local/go/src/net/http/server.go:2887 +0xa3
net/http.(*conn).serve(0xc0003c0e60, 0xff23c8, 0xc0001cc180)
	/usr/local/go/src/net/http/server.go:1952 +0x8cd
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:3013 +0x39b 

@coryan
Member

coryan commented May 25, 2021

FWIW, the script is deploying the "latest" version of the docker image:

https://github.com/googleapis/google-cloud-cpp/blob/main/ci/cloudbuild/notifiers/bigquery/deploy.sh#L38

maybe try a previous version?

gcloud artifacts docker tags list us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery
Listing items under project gcb-release, location us-east1, repository cloud-build-notifiers.

TAG                IMAGE                                                               DIGEST
bigquery-0-latest  us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:d6365e7e3ce443c7032297a80bfd64e1079d46ce06a4e9a895db0ae534fca666
bigquery-0.0.3     us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:104e0d8d671b6d2abf23d9968dd4754a727081f6ff82ac8db62e6984cd003fc7
bigquery-0.0.4     us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:69f09f5eb6784b0b801265155e9d12b3ce5a9778bf63874e8d1ccae14d2efde6
bigquery-0.1.0     us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:254371128cb351b56673a3a27ac710bda6253cccf16c559c5dac8b8c287fc5b2
bigquery-0.2.0     us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:fdb016c973b3c902faf7dcb569317ee1969a2df955096cb946f4a698bccf947f
bigquery-0.3.0     us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:535c1de970059cb5db135a1eb9bd30f2242b5d336ba686c4837c47cb379bb068
bigquery-0.3.1     us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:f64d4675bb1963b7b37925ffe95b9da8dad441c1011086ce98ee13d040c23844
bigquery-0.3.2     us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:61115d6951eee6c19e8e9fad5bb5a076ca401a6bd6e0161ff927b52dba3f99a6
bigquery-0.3.3     us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:c75d05a5382b27804964c31e4d3e7f22458bbecd2b78682bc52714c95a807b93
bigquery-0.4.0     us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:85b5040a4b0d63e77814da52adf8615db2d40ebf096567667f2a63c464f77dfb
bigquery-0.5.0     us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:d6365e7e3ce443c7032297a80bfd64e1079d46ce06a4e9a895db0ae534fca666
bigquery-1-latest  us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:5bb719e16a16a2619342060033371b08fe3f07d326c6629111214c00eef8a071
bigquery-1.0.0     us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:7b146cca57cb8e141435bd2a5dafe98e4dc680b8c365cb9a857727d127c05c26
bigquery-1.0.1     us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:259b278a8e38adcc13e5158423d098fe87d0ad89c50d4e282dab04a29921a86f
bigquery-1.1.0     us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:33b39974d853a0df18a9cf81141a02d281b9b5bda186528c4ecb54e2c70cbe82
bigquery-1.2.0     us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:baa0436ae3ffc447613c8a6c055e23dc9e3c8716ad0a9ea22fc44dc34873aded
bigquery-1.3.0     us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:5296b1a8f34c08de2039f85f3317e9bb9adef9937b7125a65a67b3bd717f4340
bigquery-1.4.0     us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:2ae684b5d60cab65b6dddb332af121c5c7548d5031cc342494b80bfdf7fe414a
bigquery-1.4.1     us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:4f5196348ada28a1608e4b854b57f346c0508055bac55008fb124b8f94987643
bigquery-1.5.0     us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:c391b33f5a52e7d6975ff465d6cfd4f10853828b3062cda6b4637195bbe2abe9
bigquery-1.6.0     us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:5bb719e16a16a2619342060033371b08fe3f07d326c6629111214c00eef8a071
latest             us-east1-docker.pkg.dev/gcb-release/cloud-build-notifiers/bigquery  sha256:5bb719e16a16a2619342060033371b08fe3f07d326c6629111214c00eef8a071

@devjgm
Contributor Author

devjgm commented May 26, 2021

fyi, the GCB folks are looking into the crash of this BQ notifier. The crash is from this line:
https://github.com/GoogleCloudPlatform/cloud-build-notifiers/blob/b77fd4368030af5b735e1bdd296fc8af63bbd6a9/bigquery/main.go#L238

		startTime, err := parsePBTime(step.Timing.StartTime)

The current guess is that not all build steps have timing info, so adding some nil checks there may fix this crash. I don't know the first thing about Go, but that sounds likely to me.

@devjgm
Contributor Author

devjgm commented May 27, 2021

Good news: @prabenzom submitted a few fixes to the BigQuery notifier and fixed the panic crashes. Yay! Thank you very much!!!

Bad news: The BQ notifier is still dropping a lot of updates. The reason seems to be that notifications for non-SUCCESS builds are often missing step.Timing information, which causes the insert into BQ to fail:

"E0527 00:54:23.418135       1 notifiers.go:449] failed to run SendNotification: Error inserting row into BQ: 1 row insertion failed (insertion of row [insertID: "YuH2meB46KYBgLxZNid82J1zGHU"; insertIndex: 0] failed with error: {Location: "steps[5].endtime"; Message: "Invalid datetime string \"0000-00-00 00:00:00\""; Reason: "invalid"})"

So it looks like our BQ data is missing info for all or most failed builds.

@prabenzom suggested inserting some default timestamp if none is present, which seems fine to me (if it results in the BQ inserts succeeding). Not sure if/when they'll have time to do this.

@devjgm
Contributor Author

devjgm commented May 27, 2021

The new issue filed for the BQ notifier dropping failed statuses is GoogleCloudPlatform/cloud-build-notifiers#115

@devjgm
Contributor Author

devjgm commented May 27, 2021

@prabenzom fixed the issue in GoogleCloudPlatform/cloud-build-notifiers@f246eff#diff-243269b0805de328ce233361b6a6e94d971e1d18e7727433c832fd42e4c19f28 Yay!!

Our logs now look clean, and the 2500 unacknowledged Pub/Sub messages are now being processed:

[screenshot]

So I think this issue is fixed, and I'm hopeful that our BQ data and dashboards will now be accurate.

@devjgm devjgm closed this as completed May 27, 2021