Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Stackdriver Logging e2e tests using PubSub #45255

Conversation

crassirostris
Copy link

@crassirostris crassirostris commented May 3, 2017

Makes tests faster and mitigates Stackdriver Logging quotas issue.

Fixes #47069

@crassirostris crassirostris added area/logging release-note-none Denotes a PR that doesn't merit a release note. labels May 3, 2017
@crassirostris crassirostris added this to the v1.7 milestone May 3, 2017
@crassirostris crassirostris requested a review from piosz May 3, 2017 00:44
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 3, 2017
@k8s-reviewable
Copy link

This change is Reviewable

@k8s-github-robot k8s-github-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 3, 2017
@crassirostris crassirostris force-pushed the sd-logging-e2e-performance-refactoring branch 6 times, most recently from 2e9736f to 6d4d077 Compare May 3, 2017 22:53
@crassirostris
Copy link
Author

crassirostris commented May 3, 2017

@fejta GCE e2e suite is failing, because of the following error:

Failed to init GCL logs provider
Expected error:
    <*errors.errorString | 0xc4208119c0>: {
        s: "failed to create Stackdriver Logging sink: googleapi: Error 403: The caller does not have permission, forbidden",
    }
    failed to create Stackdriver Logging sink: googleapi: Error 403: The caller does not have permission, forbidden
not to have occurred

Can we enable Logs Configuration Writer role for the account used in testing?

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 4, 2017
@crassirostris crassirostris force-pushed the sd-logging-e2e-performance-refactoring branch from 6d4d077 to 5254b89 Compare May 4, 2017 15:23
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 4, 2017
@crassirostris
Copy link
Author

@fejta @ixdy Friendly ping

Can we enable Logs Configuration Writer role for the account used in testing?

Is it possible at least?

@ixdy
Copy link
Member

ixdy commented May 5, 2017

@crassirostris which projects need this? all of them?

@fejta
Copy link
Contributor

fejta commented May 5, 2017

@crassirostris We have an internal mechanism for managing IAM grants on our GCP projects. Sending you a link internally this this.

Also fyi please /assign @fejta if you want my attention to make sure it shows up on my pr dashboard! :)

@crassirostris
Copy link
Author

@ixdy @fejta OK, thanks a lot for your help!

@crassirostris
Copy link
Author

@k8s-bot gce etcd3 e2e test this

@crassirostris crassirostris force-pushed the sd-logging-e2e-performance-refactoring branch from 5254b89 to fb44c88 Compare May 10, 2017 14:18
@crassirostris
Copy link
Author

@piosz Tests are passing, please take a look

@piosz
Copy link
Member

piosz commented May 15, 2017

As discussed offline you need some syncrhonization around LogEntryCache because it's accessed from multiple threads.

@crassirostris crassirostris force-pushed the sd-logging-e2e-performance-refactoring branch 3 times, most recently from ab51f4a to f707437 Compare May 18, 2017 18:59
@crassirostris crassirostris force-pushed the sd-logging-e2e-performance-refactoring branch from f707437 to 030ce2d Compare May 19, 2017 13:35
@crassirostris
Copy link
Author

WTF this comes from?

--- Godeps/Godeps.json	2017-05-19 06:38:19.220992837 -0700
+++ /tmp/gopath.43GVDI/src/k8s.io/kubernetes/Godeps/Godeps.json	2017-05-19 07:08:03.276127731 -0700
@@ -2701,10 +2701,6 @@
 			"Rev": "e3824ed33c72bf7e81da0286772c34b987520914"
 		},
 		{
-			"ImportPath": "google.golang.org/api/compute/v0.alpha",
-			"Rev": "e3824ed33c72bf7e81da0286772c34b987520914"
-		},
-		{
 			"ImportPath": "google.golang.org/api/compute/v0.beta",
 			"Rev": "e3824ed33c72bf7e81da0286772c34b987520914"
 		},

@piosz
Copy link
Member

piosz commented May 23, 2017

From HEAD:

$ grep -Rn "v0.alpha" .
./vendor/BUILD:350:        "//vendor/google.golang.org/api/compute/v0.alpha:all-srcs",

So it seems the mentioned lib is not used at all. Did you try to remove it?

@crassirostris
Copy link
Author

@piosz I mean ofc I can, but wtf, why other people didn't encounter that before?

@crassirostris crassirostris force-pushed the sd-logging-e2e-performance-refactoring branch 2 times, most recently from ac7c121 to a9adbff Compare May 23, 2017 11:53
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented May 23, 2017

@crassirostris: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
Jenkins verification 030ce2daf38bfbb5185e93d996ffe7bf9540c63d link @k8s-bot verify test this

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@grodrigues3
Copy link
Contributor

@crassirostris All PRs going into 1.7 require an associated issue. Can you please file one if it doesn't exist already and reference in this PR body?

@crassirostris
Copy link
Author

@grodrigues3 Sure, thanks for reminding about this!

@marun marun added the sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. label Jun 8, 2017
@crassirostris crassirostris force-pushed the sd-logging-e2e-performance-refactoring branch from a9adbff to 018b85e Compare June 12, 2017 06:15
Copy link
Member

@piosz piosz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Does it work out of the box on Jenkins or do we need some service account for PubSub? I think the easiest way is to verify this is to merge it. Please check the Jenkins after the merge.

nodes := framework.GetReadySchedulableNodesOrDie(f.ClientSet).Items
nodeCount := len(nodes)
podCount := 30 * nodeCount
loggingDuration := 10 * time.Minute
linesPerSecond := 1000 * nodeCount
linesPerPod := linesPerSecond * int(loggingDuration.Seconds()) / podCount
ingestionTimeout := 60 * time.Minute
ingestionTimeout := 20 * time.Minute
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this enough?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be, since we start ingesting log entries as soon as they start flowing. From the manual runs that's more than enough, but I don't know of any guarantees/SLOs here (docs say log entries should appear "right away". I suggest leaving this as it is and then applying a fix if that's not enough, though I think timing out would indicate a problem with fluentd not being performant enough, rather than test being too restrictive on time.

@@ -56,9 +60,6 @@ var _ = framework.KubeDescribe("Cluster level logging using GCL [Feature:Stackdr
defer f.PodClient().Delete(podName, &meta_v1.DeleteOptions{})
}

By("Waiting for pods to succeed")
time.Sleep(loggingDuration)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this :D

}

framework.Logf("Waiting for log sink to become operational")
time.Sleep(1 * time.Minute)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a todo here to find a better way of verifying whether the sink is working.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's a better way, since any solution wouldn't be able to reduce this delay to less than 10 sec, in reality probably more. Which IMO isn't worth additional complexity.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But OTOH I agree that we shouldn't rely on sinks becoming operational faster, than in 1 minute

@spiffxp
Copy link
Member

spiffxp commented Jun 16, 2017

/retest

@crassirostris
Copy link
Author

@piosz

Does it work out of the box on Jenkins or do we need some service account for PubSub? I think the easiest way is to verify this is to merge it. Please check the Jenkins after the merge.

I've done it already

@crassirostris crassirostris force-pushed the sd-logging-e2e-performance-refactoring branch from 018b85e to 3cada4c Compare June 19, 2017 08:29
@piosz
Copy link
Member

piosz commented Jun 19, 2017

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 19, 2017
@dchen1107
Copy link
Member

/lgtm

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crassirostris, dchen1107, piosz

Associated issue: 47069

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 19, 2017
@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 012ffed into kubernetes:master Jun 19, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/logging cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet