
Added Google Cloud Pubsub support for data records #209

Merged (26 commits) on Feb 6, 2019

Conversation

@vic3lord (Contributor) commented Jan 30, 2019

Description

Adds Google Cloud Pub/Sub support for data records; currently just a minimal implementation.

Motivation and Context

We are using Google Cloud Pub/Sub and needed something native for data records.

How Has This Been Tested?

Ran make with all tasks; all tests passed and the flagr server started.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@codecov-io commented Jan 30, 2019

Codecov Report

Merging #209 into master will decrease coverage by 1.12%.
The diff coverage is 50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #209      +/-   ##
==========================================
- Coverage   86.73%   85.62%   -1.12%     
==========================================
  Files          23       24       +1     
  Lines        1342     1384      +42     
==========================================
+ Hits         1164     1185      +21     
- Misses        127      144      +17     
- Partials       51       55       +4
Impacted Files Coverage Δ
pkg/handler/data_recorder.go 100% <100%> (ø) ⬆️
pkg/handler/data_recorder_pubsub.go 47.5% <47.5%> (ø)

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4cc964b...e9328ef.

@crberube (Contributor):

Awesome. We are also looking to use Pub/Sub as our data pipeline.

client, err := pubsub.NewClient(
context.Background(),
config.Config.RecorderPubsubProjectID,
option.WithServiceAccountFile(config.Config.RecorderPubsubKeyFile),
Contributor:

This method is deprecated; it is recommended to use WithCredentialsFile instead. I'm just starting to look into this to check their equivalency, but they seem to operate on the same idea.

As this is an option, does setting it to an empty string affect the use of Application Default Credentials? Would want to make sure that is not the case.

Contributor Author:

I'm sorry, you are right! This option is deprecated; I just reused a chunk of code I have been using for the past two years in many Go services here. I am pushing a fix now (thanks, I'm also fixing my own services).

Regarding the empty string: yes, it will work with default credentials. As a matter of fact, this is exactly how we develop on our machines, using our own accounts with proper IAM.

}
}

type pubsubEvalResult struct {
Collaborator:

Related to #203, I think it would be great if pubsub also followed the same pattern. Even better if we can fix #203 and have a single frame struct for data records.

Contributor Author:

I totally agree. I looked at the code and tried to follow your style just to make sure everything will pass and be OK on your end, but sure, I think there's a lot of room to improve some of these things.

I will take a look at those relevant issues and will update ASAP (it's nighttime over here).

Collaborator:

thanks! I think you can use

type pubsubMessageFrame struct {
	Payload   string `json:"payload"`
	Encrypted bool   `json:"encrypted"`
}

Consolidating 2 structs is the same amount of work as 3. We can fix #203 later.

Also, make sure the final payload is a JSON struct that looks like:

{
  "payload": "<json marshal of EvalResult>",
  "encrypted": false
}

Collaborator:

and test coverage of this file :)

Contributor Author:

@zhouzhuojie regarding the pubsubMessageFrame, I added it on my end, but it looks weird to marshal into a JSON string because pubsub accepts []byte anyway, so the conversion happens for no reason. I'm pushing the code to share this with you; LMK what you think.

Collaborator:

I knew the JSON string payload would look weird at first glance. The reasons I want to standardize the log format are:

  • Extensibility. If people want end-to-end encryption, compression, or other meta features related to the message frame, they can do it regardless of the choice of recorder type, by adding more fields to the message frame struct. Payload as a string fits this decision.
  • Portability. The data analytics pipeline requires no changes if one wants to switch between providers.

@crberube (Contributor):

Need to check further to see if this is a larger-scale thing, but it seems that if the pubsub connection fails then an EvaluationResult fails to be returned. @zhouzhuojie would this be expected behavior? It seems to me that we'd want to return a result anyway, given that the data recording should be an async process.

@zhouzhuojie (Collaborator):

Need to check further to see if this is a larger-scale thing, but it seems that if the pubsub connection fails then an EvaluationResult fails to be returned. @zhouzhuojie would this be expected behavior? It seems to me that we'd want to return a result anyway, given that the data recording should be an async process.

Why do you think "if the pubsub connection fails then an EvaluationResult fails to be returned"? The connection won't affect online evaluation. The AsyncRecord function should just log errors if there's any failure, and it shouldn't panic or exit either.

@crberube (Contributor):

Here is what I'm seeing:

  1. Start up Flagr (default SQLite, with data recording enabled; in this case pubsub). PubSub is not running, to simulate a connection issue.
  2. Issue request:
curl --header "Content-Type: application/json" --request POST --data '{"flagId":1}' localhost:18000/api/v1/evaluation
curl: (52) Empty reply from server

Server logs:

INFO[0248] started handling request                      method=POST remote="172.17.0.1:34236" request=/api/v1/evaluation
{"FlagEvalResult":{"evalContext":{"entityID":"randomly_generated_544474078","flagID":1},"evalDebugLog":{"segmentDebugLogs":[]},"flagID":1,"flagKey":"kmmcd1nsd6ze56chh","flagSnapshotID":9,"segmentID":1,"timestamp":"2019-01-31T19:57:38Z","variantAttachment":null,"variantID":null,"variantKey":null}}
ERRO[0308] error pushing to pubsub                       id= pubsub_error="context deadline exceeded"
INFO[0308] completed handling request                    measure#flagr.latency=60002253200 method=POST remote="172.17.0.1:34236" request=/api/v1/evaluation status=200 text_status=OK took=1m0.0022532s

I'd expect the result to be returned immediately, but it's never returned at all. In the OpenAPI client for Go, this ends up setting the error value of the PostEvaluationResult method.

It looks like it's hanging on the pubsub.NewClient part of the code, but again, still digging.

@zhouzhuojie (Collaborator):

@crberube @vic3lord

I see. NewPubsubRecorder should do logrus.Fatal instead of just logrus.Error when we get an error from pubsub.NewClient.

option.WithCredentialsFile(config.Config.RecorderPubsubKeyFile),
)
if err != nil {
logrus.WithField("pubsub_error", err).Error("error getting pubsub client")
Collaborator:

let's Fatal here instead of Error


I think the issue is caused by the Get method from pubsub, which is a blocking operation (see the pubsub docs). We should pass a context with a deadline or timeout to the Get method instead of just the background context.

Contributor:

Can confirm this is the issue. When I disable verbose logging, everything works as expected.

res := p.topic.Publish(ctx, &pubsub.Message{Data: payload})
if config.Config.RecorderPubsubVerbose {
ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
@crberube (Contributor) commented Feb 1, 2019:

What do you think about combining this with running the Get within a new goroutine?

Tried this and it seems to work well:

if config.Config.RecorderPubsubVerbose {
		go func() {
			ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
			defer cancel()
			id, err := res.Get(ctx)
			if err != nil {
				logrus.WithFields(logrus.Fields{"pubsub_error": err, "id": id}).Error("error pushing to pubsub")
			}
		}()
	}

Collaborator:

+1 for another goroutine. Also, 5s can be moved into env config.

Contributor Author:

Agreed, I added both

@vic3lord (Contributor Author) commented Feb 3, 2019

I need some help regarding the tests. I added test coverage for pubsub, but now data_recorder_test.go fails. I don't really understand how it passed before, because it used the same NewPubsubRecorder() function when calling GetDataRecorder().

I can see that we use all data recorders with their production functions, without mocking, meaning the connection should fail on all of them, unless I am missing something that runs as a mock server on CI.

@crberube (Contributor) commented Feb 4, 2019

Still figuring out the best way to handle this, but I know what is wrong currently:

What is failing is data_recorder_pubsub_test. The data_recorder_test.go file still passes because we are stubbing out the NewPubsubRecorder function. For the former...

When the following code is called:

	client, err := pubsub.NewClient(
		context.Background(),
		config.Config.RecorderPubsubProjectID,
		option.WithCredentialsFile(config.Config.RecorderPubsubKeyFile),
	)

the library uses the following rules to create a new client:

GCP client libraries use a strategy called Application Default Credentials (ADC) to find your application's credentials. When your code uses a client library, the strategy checks for your credentials in the following order:

  1. First, ADC checks whether the environment variable GOOGLE_APPLICATION_CREDENTIALS is set. If it is, ADC uses the service account file that the variable points to.
  2. If the environment variable isn't set, ADC uses the default service account that Compute Engine, Kubernetes Engine, App Engine, and Cloud Functions provide, for applications that run on those services.
  3. If ADC can't use either of the above credentials, an error occurs.

On our machines, step 2 is happening, likely because we have run a gcloud auth command in the past. On the CircleCI containers, this has never been run and so we go to step 3 (error).

NewClient signature is as follows:

func NewClient(ctx context.Context, projectID string, opts ...option.ClientOption) (c *Client, err error) {

and the return value in case of an error is this:

return nil, fmt.Errorf("pubsub: %v", err)

so when the rest of the code executes:

	if err != nil {
		// TODO: use Fatal again after fixing the test expecting to not panic.
		// logrus.WithField("pubsub_error", err).Fatal("error getting pubsub client")
		logrus.WithField("pubsub_error", err).Error("error getting pubsub client")
	}

	return &pubsubRecorder{
		producer: client,
		topic:    client.Topic(config.Config.RecorderPubsubTopicName),
		enabled:  config.Config.RecorderEnabled,
	}

client is a nil pointer, and you get a nil pointer dereference.

So that's the issue.

Having the Fatal call will prevent us from dereferencing a nil pointer, which is good. It is also the same way we handle error cases in the kafka client, so it would follow the standards there.

So to fix the issue in the test, we need to either:

  a) allow the client to retrieve credentials,
  b) mock the client somehow, or
  c) run the pubsub emulator.

(a) is not great because you would need to provide some sort of real credentials.
(c) is pretty heavy-handed for our current use case, since all we are testing is that we can create a client.
So it sounds like (b) is our best bet. It appears that the library has a way to mock a client fairly easily; I'm playing around with it now to see if I can get something figured out.

RecorderPubsubTopicName string `env:"FLAGR_RECORDER_PUBSUB_TOPIC_NAME" envDefault:"flagr-records"`
RecorderPubsubKeyFile string `env:"FLAGR_RECORDER_PUBSUB_KEYFILE" envDefault:""`
RecorderPubsubVerbose bool `env:"FLAGR_RECORDER_PUBSUB_VERBOSE" envDefault:"false"`
RecorderPubsubVerboseCancel time.Duration `env:"FLAGR_RECORDER_PUBSUB_VERBOSE_CANCEL" envDefault:"5s"`
Collaborator:

How about RecorderPubsubVerboseCancelTimeout? Otherwise, it's not clear to me that cancel can be a time.Duration.

@crberube (Contributor) commented Feb 4, 2019

@vic3lord I've created a PR on your fork which fixes the above issues.

Christopher Berube and others added 2 commits February 4, 2019 21:26
@crberube (Contributor) commented Feb 5, 2019

This looks good to me pending my question about handling potential marshal errors.

@crberube (Contributor) commented Feb 5, 2019

Cool. @zhouzhuojie how are you feeling about everything at this point?

"github.com/checkr/flagr/pkg/config"
"github.com/checkr/flagr/swagger_gen/models"
"google.golang.org/api/option"

Collaborator:

nit: no need for a new line here

)

type pubsubRecorder struct {
enabled bool
Collaborator:

is this enabled necessary here? eval.go checks it here https://github.com/checkr/flagr/blob/master/pkg/handler/eval.go#L178

Contributor:

oh yeah... this could probably be removed from the kafka and kinesis versions as they do the check as well

Collaborator:

didn't realize that it was also used in kafka and kinesis, we can probably clean them up in other PRs

@zhouzhuojie (Collaborator) left a comment:

LGTM, I think I left 1 or 2 nit comments, nothing major.

Also, I think you can document how to authenticate with pubsub, for example setting GOOGLE_APPLICATION_CREDENTIALS.

@vic3lord (Contributor Author) commented Feb 6, 2019
vic3lord commented Feb 6, 2019

Thank you so much for helping, I fixed the little nits.

Where would it be best to document the pubsub addition and its auth docs?

@zhouzhuojie (Collaborator):

Thank you so much for helping, I fixed the little nits.

Where would it be best to document the pubsub addition and its auth docs?

I think you can put it after the Kinesis section in https://github.com/checkr/flagr/blob/master/docs/flagr_env.md

@zhouzhuojie zhouzhuojie merged commit 9610385 into openflagr:master Feb 6, 2019
@vic3lord vic3lord deleted the pubsub branch February 7, 2019 07:28