TF serving support request logging #1229

lluunn · 2018-07-18T01:03:00Z

For #1000

/cc @jlewi

This PR uses fluentd as a sidecar container to export request logs.
BigQuery is the first supported backend; other backends should be easy to add using fluentd plugins.

lluunn · 2018-07-18T01:05:28Z

/hold
Still need to add the md doc.

jlewi · 2018-07-18T17:33:08Z

Please update the PR description to describe the approach being taken in this PR to solve the request logging problem.

jlewi · 2018-07-18T17:34:40Z

components/k8s-model-server/request-logger/logging_worker.py

+    time.sleep(10)
+    if not os.path.isfile(target_file):
+      continue
+    print('found the file\n')


Use logging not print.
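
A minimal sketch of the same polling step using the logging module instead of print. The function name wait_for_file and the max_polls parameter are hypothetical and not part of the PR; the surrounding loop in logging_worker.py is not shown in the diff, so its shape here is an assumption. Unlike the original hunk, this version checks for the file before sleeping.

```python
import logging
import os
import time

logger = logging.getLogger(__name__)


def wait_for_file(target_file, poll_seconds=10, max_polls=None):
    """Poll until target_file exists.

    Returns True once the file is found, or False if max_polls is set and
    that many checks have passed without finding it.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        if os.path.isfile(target_file):
            # logger.info replaces the print call flagged in the review.
            logger.info("Found the file: %s", target_file)
            return True
        polls += 1
        time.sleep(poll_seconds)
    logger.warning("Gave up waiting for %s after %d polls", target_file, polls)
    return False
```

With logging, the sidecar's verbosity can be controlled from one place, for example by calling logging.basicConfig(level=logging.INFO) at startup.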

jlewi · 2018-07-18T17:34:57Z

components/k8s-model-server/request-logger/requirements.txt

@@ -0,0 +1 @@
+google-cloud-bigquery


Why do you need bigquery?

jlewi · 2018-07-18T17:38:17Z

components/k8s-model-server/request-logger/logging_worker.py

+import os
+import time
+
+from google.cloud import bigquery


Why do we need to write our own logging agent as opposed to using fluentd?
For example, https://github.com/kaizenplatform/fluent-plugin-bigquery

jlewi · 2018-07-18T17:42:21Z

components/k8s-model-server/http-proxy/server.py

@@ -19,10 +19,12 @@
 from itertools import repeat
 import base64
 import logging
-import grpc
+import random


If we do it in the http server proxy we won't get this for gRPC requests; right?

jlewi · 2018-07-18T17:44:07Z

Are you emitting the logs in json format?
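
For context, a sketch of what emitting JSON logs could look like: one JSON object per line, which fluentd's json parser can consume. The function name and the field names (time, request, response) are assumptions for illustration and are not taken from the PR.

```python
import json
import time


def format_request_log(request_body, response_body, timestamp=None):
    """Serialize one request/response pair as a single-line JSON record."""
    record = {
        # Field names are hypothetical; they would need to match the
        # BigQuery table schema configured for the fluentd output.
        "time": time.time() if timestamp is None else timestamp,
        "request": request_body,
        "response": response_body,
    }
    # json.dumps escapes embedded newlines, so the record stays on one line.
    return json.dumps(record, sort_keys=True)
```

Appending each returned string plus a newline to the log file gives the agent a stream it can parse line by line.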

lluunn · 2018-07-24T18:14:25Z

Using fluentd now, PTAL
@jlewi

lluunn · 2018-07-24T19:37:52Z

/hold cancel

It's working now

lluunn · 2018-07-24T21:11:02Z

/retest

jlewi

Reviewable status: 0 of 5 files reviewed, 10 unresolved discussions (waiting on @jlewi and @lluunn)

components/k8s-model-server/request-logging.md, line 3 at r3 (raw file):

# Request logging for TF Serving

It currently supports streaming to BigQuery.

Is this still true even though we are using fluentd?

I'd suggest putting this into the existing dev doc on serving that you have. User facing documentation can be added to the website.

components/k8s-model-server/request-logging.md, line 16 at r3 (raw file):

Modify bigquery dataset and schema in fluent.conf.

How does fluent.conf get passed to the agent?

components/k8s-model-server/fluentd-logger/Dockerfile, line 1 at r3 (raw file):

FROM fluent/fluentd:v1.2-debian

Can you add a comment? Are we building our own fluentd agent because we want to use BigQuery?

kubeflow/examples/fluent.conf, line 1 at r3 (raw file):

<source>

I think this belongs in the serving package not the examples package.

kubeflow/examples/fluent.conf, line 16 at r3 (raw file):

  auth_method application_default

  # change these!

Is this comment valid?

How would a user actually supply this? Would it be via a ConfigMap?

kubeflow/examples/fluent.conf, line 17 at r3 (raw file):


  # change these!
  project TBD

Can we make the configmap containing conf a prototype and make these parameters?
You can just use ksonnet to build the string and then add in the variables e.g.

local config = ".....
project " + params.project

ksonnet supports raw multiline strings. I forget what the syntax is.

Another approach would be to use environment variables in the conf
https://docs.fluentd.org/v0.12/articles/faq#how-can-i-use-environment-variables-to-configure-parameters-dynamically?

And then specify those environment variables on the pod.
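
The string-building approach above can be sketched in Python as follows (the PR itself would do this in ksonnet). This is an illustration under stated assumptions: the function name is hypothetical, the table parameter and the match pattern are placeholders, and the @type value depends on the installed fluent-plugin-bigquery version. Only auth_method, project, and the dataset setting are mentioned in this PR.

```python
def render_fluent_conf(project, dataset, table):
    """Build a fluentd output section from user-supplied parameters."""
    for name, value in (("project", project), ("dataset", dataset), ("table", table)):
        # Reject empty or whitespace-containing values so a bad parameter
        # cannot silently produce a malformed config line.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError("invalid value for %s: %r" % (name, value))
    lines = [
        "<match **>",
        # Assumption: the plugin type name varies by plugin version.
        "  @type bigquery",
        "  auth_method application_default",
        "  project " + project,
        "  dataset " + dataset,
        "  table " + table,
        "</match>",
    ]
    return "\n".join(lines) + "\n"
```

The returned string would then become the data of the ConfigMap mounted into the fluentd sidecar, so users set parameters instead of editing the conf by hand.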

kubeflow/examples/prototypes/tf-serving-with-request-log.jsonnet, line 18 at r3 (raw file):


// Change this!
local gcpSecretName = "TBD";

Why is this TBD? Why not make it a parameter that defaults to "user-gcp-sa", which is the default secret we set up?

jlewi · 2018-07-24T23:11:16Z

This is great.

Can you update the PR description to explain the design choices; i.e. the fact that we are using fluentd and BigQuery?

lluunn · 2018-07-25T00:06:49Z

Some of the comments are about how we present the prototype. There are two options:

1. Use the tf-serving package: set parameters, and add volumes/env if XXX.
2. Use the examples package: ks generate copies the whole jsonnet, and users modify it directly.

I think the current libsonnet is already messy (filed #1264), so I chose option 2 here.

WDYT?

lluunn

Reviewable status: 0 of 5 files reviewed, 10 unresolved discussions (waiting on @lluunn and @jlewi)

components/k8s-model-server/request-logging.md, line 3 at r3 (raw file):

Previously, jlewi (Jeremy Lewi) wrote…

Is this still true even though we are using fluentd?

I'd suggest putting this into the existing dev doc on serving that you have. User facing documentation can be added to the website.

Yes, I have only installed the BigQuery plugin for now.
I will add this info to the website under Guides: TF Serving.

components/k8s-model-server/request-logging.md, line 16 at r3 (raw file):

Previously, jlewi (Jeremy Lewi) wrote…

How does fluent.conf get passed to the agent?

It sits alongside the jsonnet prototype, which imports it as a ConfigMap.

components/k8s-model-server/fluentd-logger/Dockerfile, line 1 at r3 (raw file):

Previously, jlewi (Jeremy Lewi) wrote…

Can you add a comment? Are we building our own fluentd agent because we want to use BigQuery?

Yes. https://github.com/fluent/fluentd-docker-image#3-customize-dockerfile-to-install-plugins-optional

kubeflow/examples/fluent.conf, line 16 at r3 (raw file):

Previously, jlewi (Jeremy Lewi) wrote…

Is this comment valid?

How would a user actually supply this? Would it be via a ConfigMap?

jsonnet imports this as a configmap

kubeflow/examples/prototypes/tf-serving-with-request-log.jsonnet, line 18 at r3 (raw file):

Previously, jlewi (Jeremy Lewi) wrote…

Why is this TBD? Why not make it a parameter that defaults to "user-gcp-sa" which is the default secret we set up.

Done.

components/k8s-model-server/request-logger/logging_worker.py, line 27 at r1 (raw file):

obsolete

components/k8s-model-server/request-logger/logging_worker.py, line 50 at r1 (raw file):

Previously, jlewi (Jeremy Lewi) wrote…

Use logging not print.

obsolete

components/k8s-model-server/request-logger/requirements.txt, line 1 at r1 (raw file):

Previously, jlewi (Jeremy Lewi) wrote…

Why do you need bigquery?

Done.

jlewi

Reviewable status: 0 of 5 files reviewed, 6 unresolved discussions (waiting on @jlewi and @lluunn)

components/k8s-model-server/fluentd-logger/Dockerfile, line 3 at r4 (raw file):

FROM fluent/fluentd:v1.2-debian

# Fluentd image with plugin installed.

Nit: move the comment to the top of the file.

kubeflow/examples/fluent.conf, line 1 at r3 (raw file):

Previously, jlewi (Jeremy Lewi) wrote…

I think this belongs in the serving package not the examples package.

Can we move this to serving package please?

jlewi · 2018-07-25T04:50:16Z

1. use tf-serving package, setting parameters, adding volumes/env if XXX
2. use examples package, ks generate copies the whole jsonnet, and let users modify directly.

The prototype belongs in the tf-serving package regardless of how you generate it; i.e. you can still copy the whole jsonnet.

Telling a user to change certain lines in a text file is much less convenient than running

ks param set fluent-conf project myproject

Since these parameters are required, we should expose them through ks param set commands rather than requiring users to modify the conf.

lluunn

Reviewable status: 0 of 4 files reviewed, 5 unresolved discussions (waiting on @jlewi)

kubeflow/examples/fluent.conf, line 1 at r3 (raw file):

Previously, jlewi (Jeremy Lewi) wrote…

Can we move this to serving package please?

Done.

kubeflow/examples/fluent.conf, line 16 at r3 (raw file):

Previously, lluunn (Lun-Kai Hsu) wrote…

jsonnet imports this as a configmap

Done.

kubeflow/examples/fluent.conf, line 17 at r3 (raw file):

Previously, jlewi (Jeremy Lewi) wrote…

Can we make the configmap containing conf a prototype and make these parameters?
You can just use ksonnet to build the string and then add in the variables e.g.

local config = ".....
project " + params.project

ksonnet supports raw multiline strings. I forget what the syntax is.

Another approach would be to use environment variables in the conf
https://docs.fluentd.org/v0.12/articles/faq#how-can-i-use-environment-variables-to-configure-parameters-dynamically?

And then specify those environment variables on the pod.

Done.

lluunn · 2018-07-25T18:49:07Z

Removed the conf file and moved the config into the prototype. project etc. can now be set via ks param set.
PTAL, thanks!

lluunn · 2018-07-25T19:48:00Z

test failure: #1266

jlewi · 2018-07-27T05:02:52Z

kubeflow/tf-serving/prototypes/tf-serving-with-request-log.jsonnet

+          },
+          // Http proxy
+          {
+            name: "mnist-http-proxy",


Remove mnist from the name.

jlewi · 2018-07-27T05:04:50Z

kubeflow/tf-serving/prototypes/tf-serving-with-request-log.jsonnet

+              },
+            ],
+          },
+          // Logging container.


Let's file an issue to create an admission controller to inject the sidecar.

jlewi · 2018-07-27T05:05:10Z

kubeflow/tf-serving/prototypes/tf-serving-with-request-log.jsonnet

+          },
+          // Logging container.
+          {
+            name: "mnist-logging",


Don't use mnist in the name.

jlewi · 2018-07-27T05:09:27Z

Sorry for the slow reply; I was at GCP Next.

Minor comment about replacing the name "mnist" in a couple of places.

The duplication of code with the other templates is unfortunate. I think the best option is probably to create an admission controller that can inject the relevant information and then just make that its own prototype. Let's file a follow-on issue.

lluunn · 2018-07-27T18:26:13Z

Thanks for the review, PTAL

lluunn · 2018-07-27T18:28:50Z

/retest

lluunn · 2018-07-27T19:02:10Z

/retest

jlewi · 2018-07-29T00:26:09Z

/lgtm
/approve

k8s-ci-robot · 2018-07-29T00:26:13Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlewi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [jlewi]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

* tf serving request logging
* use fluentd
* add md
* fix link
* fix js lint
* comment
* make config in prototype
* fix js
* address comments

tf serving request logging

8dd3315

k8s-ci-robot requested a review from jlewi July 18, 2018 01:03

k8s-ci-robot added the size/L label Jul 18, 2018

k8s-ci-robot added the do-not-merge/hold label Jul 18, 2018

lluunn changed the title ~~TF serving support request logging WIP~~ TF serving support request logging Jul 18, 2018

jlewi reviewed Jul 18, 2018

jlewi reviewed Jul 18, 2018

lluunn added 3 commits July 24, 2018 11:08

use fluentd

e5688a8

add md

da19439

fix link

08b73cd

fix js lint

a6123b0

k8s-ci-robot removed the do-not-merge/hold label Jul 24, 2018

lluunn mentioned this pull request Jul 24, 2018

TF Serving GPU test failing #1262

Closed

jlewi suggested changes Jul 24, 2018

lluunn commented Jul 25, 2018

comment

8686588

jlewi suggested changes Jul 25, 2018

jlewi mentioned this pull request Jul 25, 2018

Make TF serving component more readable and extendable #1264

Closed

make config in prototype

797e679

lluunn commented Jul 25, 2018

fix js

ad6fa8d

Merge branch 'master' into reqlog1

6b8b34b

jlewi reviewed Jul 27, 2018

address comments

d837edc

lluunn mentioned this pull request Jul 27, 2018

[Test flake] unable to recognize XX: no matches for kind Workflow #1278

Closed

jlewi approved these changes Jul 29, 2018

k8s-ci-robot assigned jlewi Jul 29, 2018

k8s-ci-robot added the lgtm label Jul 29, 2018

k8s-ci-robot added the approved label Jul 29, 2018

k8s-ci-robot merged commit 19ca6d4 into kubeflow:master Jul 29, 2018

snyk-bot mentioned this pull request Jul 17, 2020

[Snyk] Fix for 1 vulnerabilities ajesse11x/kubeflow#29

Open

saffaalvi pushed a commit to StatCan/kubeflow that referenced this pull request Feb 11, 2021

TF serving support request logging (kubeflow#1229)

687fdc9

* tf serving request logging
* use fluentd
* add md
* fix link
* fix js lint
* comment
* make config in prototype
* fix js
* address comments
