Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Part 2 of rkt support #1154

Merged
merged 3 commits into from
Mar 31, 2016
Merged

Part 2 of rkt support #1154

merged 3 commits into from
Mar 31, 2016

Conversation

sjpotter
Copy link
Collaborator

The actual handler

Since I originally wrote it and splitting it in 2 an ignore metrics map was created. I think I captured it, but not 100% sure.

The design of this is to capture 2 of the cgroups as a rkt pod and a rkt container pair.

i.e. under the cgroup fs we'd have

machine-slice//system.slice/

in terms of raw containers this will ignore any system.slice directory under the rkt pod and will not be tracked by cadvisor by either. I think this is correct, but feedback would be appreciated.

I'm also not convinced by the efforts of splitting the fs path, I was reccomended to do this over a regex but again feedback would be appreciated.

@sjpotter
Copy link
Collaborator Author

@k8s-bot
Copy link
Collaborator

k8s-bot commented Mar 10, 2016

Jenkins GCE e2e

Build/test passed for commit 5838ee2.

@vishh
Copy link
Contributor

vishh commented Mar 10, 2016

The raw driver has been simple so far - it handles any cgroups that cannot be handled by other handlers. Any specific reason to change that?

@sjpotter
Copy link
Collaborator Author

It still basically does that. The idea is that "system.slice" cgroup itself isn't a "real" cgroup.

i.e. given a cgroup path

/sys/fs/cgroup/cpuset//machine.slice/machine-rkt\\x2d3c642e7b\\x2d7e44\\x2d42a1\\x2dba5f\\x2d2da88df3909c.scope/system.slice/alpine-sh.service

I want to consider the cgroups under machine.slice to be the "pod" (i.e. it will be the only one to report network stats as they are global to the pod) and the cgroups under system.slice to be the "containers". And I want those containers to be the direct subcontainers of the pod.

So, yea, I could say that the rkt handler doesn't accept or handle "system.slice" but it wouldn't have any parent or child from a raw container so would just disappear, seems better to say explicitly it is ignored.


// Watches the specified directory and all subdirectories. Returns whether the path was
// already being watched and an error (if any).
func (self *rktContainerHandler) watchDirectory(dir string, containerName string) (bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the rkt handler watching for new containers? I'd expect to have a global watcher which then hands off new containers to one of the handlers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is the same exact code as the raw container, I didn't know if it was recursive or not. If it is and it just works because of the raw container support, that's great.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not needed here. IIUC, this logic is needed in the raw container handler mainly to watch all the sub-containers of /.
This PR exposes one more abstraction issue. The raw handler must be made specialized, or the watching of subcontainers must be moved out of the handler interface into a something else, which the raw handler can implement.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, removed and tested and you're right, works just fine.

@vishh
Copy link
Contributor

vishh commented Mar 14, 2016

Ping @sjpotter..

@sjpotter
Copy link
Collaborator Author

making progress, will update later today hopefully

@k8s-bot
Copy link
Collaborator

k8s-bot commented Mar 15, 2016

Jenkins GCE e2e

Build/test failed for commit aa78191.

@k8s-bot
Copy link
Collaborator

k8s-bot commented Mar 15, 2016

Jenkins GCE e2e

Build/test failed for commit 12b5d62.

@k8s-bot
Copy link
Collaborator

k8s-bot commented Mar 15, 2016

Jenkins GCE e2e

Build/test passed for commit 12b5d62.


resp, err := client.GetInfo(context.Background(), &rktapi.GetInfoRequest{})
if err != nil {
return "", fmt.Errorf("couldn't GetInfo from rkt api servie: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

api service

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

)

func Client() (rktapi.PublicAPIClient, error) {
once.Do(func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we want to retry? What if the api service were to be restarting while cAdvisor is being started?
Not for this PR - This is currently an issue with docker client as well, and it needs to be fixed there too.

@vishh
Copy link
Contributor

vishh commented Mar 23, 2016

Completed another review pass. Ping me once you address all the comments @sjpotter.
The main issue with the PR is the lack of resiliency to rkt apiservice failures. We can address that in a subsequent PR though...
How are we planning to add test coverage for the rkt integration?

@k8s-bot
Copy link
Collaborator

k8s-bot commented Mar 24, 2016

Jenkins GCE e2e

Build/test failed for commit ee52fdf.

@vishh
Copy link
Contributor

vishh commented Mar 24, 2016

LGTM. I didn't get a chance to test this PR yet. @sjpotter: What is the plan for testing rkt integration?

@sjpotter
Copy link
Collaborator Author

@vishh need to figure that out, I'm not 100% sure how to do that. In looking at the docker test cases, there isn't much there to inspire me

i can probably test individual functions (such as what I use to parse the cgroups), but unsure how to test every part of the e2e process.

In practice maybe can do integration tests like you have alrady (but aren't running)

@vishh
Copy link
Contributor

vishh commented Mar 24, 2016

We can add a few nodes to the e2e setup that run rkt and have the e2e framework run either docker or rkt tests based on the environment.
The other option is for you to setup an independent e2e and post the status on github.

@vishh
Copy link
Contributor

vishh commented Mar 25, 2016

@k8s-bot test this

@k8s-bot
Copy link
Collaborator

k8s-bot commented Mar 25, 2016

Jenkins GCE e2e

Build/test failed for commit ee52fdf.

@sjpotter
Copy link
Collaborator Author

I've changed this PR a bit to use new rkt api service support to lookup rkt pods by cgroup path but it has to wait till 1.3 release this week (to be vendored properly). This is necessary for it to be reliable in the presence of random slice definitions and pod cgroup names.

@sjpotter
Copy link
Collaborator Author

@k8s-bot test this

@k8s-bot
Copy link
Collaborator

k8s-bot commented Mar 30, 2016

Jenkins GCE e2e

Build/test failed for commit ee52fdf.

@k8s-bot
Copy link
Collaborator

k8s-bot commented Mar 30, 2016

Jenkins GCE e2e

Build/test passed for commit ee52fdf.

@vishh
Copy link
Contributor

vishh commented Mar 31, 2016

Tests are passing. Merging this PR. We can continue iterating on it.

@vishh vishh merged commit 5d7c71a into google:master Mar 31, 2016
@sjpotter sjpotter deleted the rkt branch May 5, 2016 13:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants