Add cloud ntp addresses #8312

Merged
merged 4 commits into from Mar 14, 2020


@ghost ghost commented Jan 11, 2020

Hi

This is a work-in-progress PR that adds the cloud-specific NTP server to the NTP config. This will prevent clock drift, which could otherwise result in downtime. AWS, for example, will stop you from talking to their APIs once your clock is out by five minutes.

It would be great to get some feedback on whether you feel this is the best approach, and whether you would prefer more parameters in the function I created rather than predetermining the values inside the function with switch statements.

I used switch statements to avoid repeating the same logic when determining the NTP install type.

In another PR I would like to offer custom NTP servers that a user could pass in through the cluster spec. This might be handy when hosting Kubernetes on-prem with your own NTP servers.

@mikesplain hope you don't mind me tagging you on this; we already sort of discussed it on Slack :)

Simon

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 11, 2020
@k8s-ci-robot
Contributor

Hi @simonmacklin. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 11, 2020
@joshbranham
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 11, 2020
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 11, 2020
@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 12, 2020
@ghost
Author

ghost commented Jan 12, 2020

I have made some changes requested by @johngmyers. I have tested it with the default image and all seems to be working fine. I just would like to run some more tests and double check the code before this is considered to be merged. The regex strings are not perfect either so I need to work on those.

I am really short on time at the moment so I will probably not get a chance to sit and concentrate on this until Tuesday.

Thanks

Simon

@johngmyers
Member

/hold
per previous comment

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 13, 2020
@rifelpet
Member

@simonmacklin you may need to squash your commits and rebase in order to fix the CLA issue. It looks like this is the offending commit.

added replace method

added cloud ips

updated the func params

removed whitespace at gce address

removed sample ntp.conf

removed whitespace from gce ntp address

created const var ntp type

added a period at the end of the func comment and used the const vars on the case statement.  Will finish sometime this weekend

unexported func and const type

trying to fix git email config issue

changed func param
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jan 13, 2020
@ghost
Author

ghost commented Jan 13, 2020

Thanks @rifelpet I think I have fixed it. I formatted my mac and forgot to set my email in the git config. Sorry about that

Simon

@johngmyers
Member

This doesn't have support for an "in house" cloud provider, so that case isn't relevant.

@ghost
Author

ghost commented Jan 15, 2020

@johngmyers

I stand corrected. I thought kops supported self-hosted OpenStack and vSphere, so I assume it only supports the cloud offerings.

Thanks

@johngmyers
Member

johngmyers commented Jan 15, 2020

Kops does support those to some extent, but this added code only acts when the provider is AWS or GCE.

@oded-dd

oded-dd commented Jan 24, 2020

I am not sure this supports NTP on CoreOS, as it defaults to the timesyncd NTP service.

We handle it with a kops hook:

```
amazon_time_sync_service_ip_address = "169.254.169.123"

Type=oneshot
RemainAfterExit=no
ExecStart=/bin/sh -c "printf '[Time]\nNTP=${amazon_time_sync_service_ip_address}\n' > /etc/systemd/timesyncd.conf"
ExecStartPost=/bin/systemctl restart systemd-timesyncd.service
```

Hope it can be supported as part of this PR
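
As a rough illustration of what the hook above writes, the timesyncd configuration could also be rendered in Go. This is only a sketch: `timesyncdConf` is a hypothetical helper, not something that exists in kops, and it assumes a single NTP server is wanted.

```go
package main

import "fmt"

// timesyncdConf is a hypothetical helper that renders a minimal
// /etc/systemd/timesyncd.conf pointing at a single NTP server.
func timesyncdConf(server string) string {
	return fmt.Sprintf("[Time]\nNTP=%s\n", server)
}

func main() {
	// 169.254.169.123 is the Amazon Time Sync Service address from the hook.
	fmt.Print(timesyncdConf("169.254.169.123"))
}
```

Writing the file is left out on purpose; the hook restarts systemd-timesyncd.service afterwards so the new server is picked up.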

@ReillyTevera
Contributor

I've been of the opinion that it would be better to use timesyncd instead of chrony or ntpd on all of the supported platforms in fact.

@johngmyers
Member

timesyncd would indeed seem to be better. Still, this PR reasonably accomplishes what it is intending to do. A competing or revised PR would also be welcome.

@ghost
Author

ghost commented Mar 2, 2020

I agree, but this PR was only intended to add the missing cloud NTP servers. I will be more than happy to create another PR to switch over to timesyncd another time.

case "aws":
	ntpIP = "169.254.169.123"
case "gce":
	ntpIP = "time.google.com"
Member

Nit: I think this should probably be metadata.google.internal or metadata.google, based on https://cloud.google.com/compute/docs/instances/managing-instances#configure_ntp_for_your_instances

Author

@ghost ghost Mar 14, 2020

I think you are right. I got the information from this doc, https://developers.google.com/time/faq, but it looks like time.google.com is public and anyone can use it, even outside of GCP.

I can't resolve metadata.google.internal from my laptop, but I assume it will resolve when running on GCP.

Shall I update?
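
To make the suggestion concrete, the provider-to-address selection could be sketched as below. `ntpAddressFor` is a hypothetical name, and the sketch assumes the GCE value is changed to metadata.google.internal as proposed in the review comment; it is not the code that was merged.

```go
package main

import "fmt"

// ntpAddressFor is a hypothetical helper. It returns the provider-local NTP
// address for the two providers discussed here and false for anything else,
// so callers can leave the distribution defaults untouched.
func ntpAddressFor(provider string) (string, bool) {
	switch provider {
	case "aws":
		return "169.254.169.123", true
	case "gce":
		// Assumed value, following the review suggestion above.
		return "metadata.google.internal", true
	default:
		return "", false
	}
}

func main() {
	for _, p := range []string{"aws", "gce", "openstack"} {
		addr, ok := ntpAddressFor(p)
		fmt.Println(p, addr, ok)
	}
}
```

Returning a boolean alongside the address keeps the "no cloud-specific server" case explicit instead of relying on an empty string.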

// updateNtpIP takes a ip and a ntpDaemon and will comment out
// the default server or pool values and append the correct cloud
// ip to the ntp config file.
func updateNtpIP(ip string, daemon ntpDaemon) ([]byte, error) {
Member

We may do better to write the full ntp configuration file, but this is the less intrusive change so let's run with it!
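
For illustration, the approach described in the function comment above (comment out the default server or pool lines, then append the cloud address) might look roughly like this. It is only a sketch: `commentOutAndAppend` is a hypothetical name, it works on bytes already in memory rather than reading /etc/ntp.conf, and the `prefer iburst` options are an assumption, not taken from the PR.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// commentOutAndAppend is a hypothetical helper, not the PR's updateNtpIP.
// It comments out every active "server" or "pool" line in conf and
// appends a single "server <addr>" line for the cloud time source.
func commentOutAndAppend(conf []byte, addr string) []byte {
	var out bytes.Buffer
	scanner := bufio.NewScanner(bytes.NewReader(conf))
	for scanner.Scan() {
		line := scanner.Text()
		fields := strings.Fields(line)
		// Only active directives are touched; lines that are already
		// comments start with "#" and fall through unchanged.
		if len(fields) > 0 && (fields[0] == "server" || fields[0] == "pool") {
			out.WriteString("# " + line + "\n")
			continue
		}
		out.WriteString(line + "\n")
	}
	out.WriteString("server " + addr + " prefer iburst\n")
	return out.Bytes()
}

func main() {
	conf := []byte("driftfile /var/lib/ntp/ntp.drift\npool 0.debian.pool.ntp.org iburst\n")
	fmt.Print(string(commentOutAndAppend(conf, "169.254.169.123")))
}
```

Matching on the first whitespace-separated field avoids the regular expressions mentioned earlier in the thread, at the cost of ignoring any unusual formatting.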

@justinsb
Member

Thanks @simonmacklin

/approve
/lgtm

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb, simonmacklin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 14, 2020
@johngmyers
Member

/retest

@k8s-ci-robot k8s-ci-robot merged commit e6803d0 into kubernetes:master Mar 14, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Mar 14, 2020
@rifelpet
Member

It looks like this is causing issues on the kuberouter e2e job which uses Ubuntu 18.04.
nodeup is failing on all nodes:

Mar 14 23:15:07.576966 ip-172-20-54-66 nodeup[1634]: W0314 23:15:07.576951 1634 main.go:138] got error running nodeup (will retry in 30s): error building loader: open /etc/ntp.conf: no such file or directory

@rifelpet
Member

Some research shows that Ubuntu 18.04 uses chronyd rather than ntpd so we'll probably need to update the logic to be more specific than just debian family vs rhel family.

@johngmyers
Member

Looks like Ubuntu uses timesyncd instead.

@ghost
Author

ghost commented Mar 15, 2020

When I wrote the code back in Jan the e2e tests all passed. Has Ubuntu 18.04 recently been added to the e2e tests?

@hakman
Member

hakman commented Mar 15, 2020

@simonmacklin the e2e tests in PR are run only on Debian 9. There are more extensive e2e tests that run periodically on master:
https://testgrid.k8s.io/sig-cluster-lifecycle-kops#Summary

@ghost
Author

ghost commented Mar 15, 2020

@hakman

Right that makes sense.

Do you think this could just be a race condition, or maybe the wrong config path for Ubuntu? I understand that, as @johngmyers said, the default is timesyncd, but nodeup installs ntpd anyway: https://github.com/kubernetes/kops/blob/master/nodeup/pkg/model/ntp.go#L71

So I am wondering whether nodeup is trying to write the config before it installs the package.

I will have a closer look when I get home tonight

Simon

@hakman
Member

hakman commented Mar 15, 2020

@simonmacklin I think the problem is the updateNtpIP function:

bytes, err := updateNtpIP(ntpIP, ntpd)
if err != nil {
	return err
}
c.AddTask(&nodetasks.File{
	Path:     "/etc/ntp.conf",
	Contents: fi.NewBytesResource(bytes),
	Type:     nodetasks.FileType_File,
	Mode:     s("0644"),
})

This runs before all the tasks are created and run. The config file will not exist at all at that time if the package is not installed.

IMHO, the correct solution would be to write the full config files as we do everywhere else in nodeup and as @justinsb suggested: #8312 (comment).

@ghost
Author

ghost commented Mar 15, 2020

Wouldn't the line

c.AddTask(&nodetasks.Package{Name: "ntp"})

install ntp first, before running the code under it?

@hakman
Member

hakman commented Mar 15, 2020

Wouldn't line install ntp first before running the code under it?

AddTask just creates and queues a task to install the package. Ordering and the actual action is done later.
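
The difference between queuing a task and running it can be sketched with a toy model. Everything here is hypothetical (`task`, `builder`, `addTask`, `runAll`, and the map standing in for the filesystem); in particular, this sketch runs tasks in insertion order, which is a simplification of whatever ordering nodeup really applies.

```go
package main

import (
	"errors"
	"fmt"
)

// task is a hypothetical stand-in for a nodeup task: it only describes work.
type task struct {
	name string
	run  func(state map[string]bool) error
}

// builder collects tasks; addTask never executes anything.
type builder struct{ tasks []task }

func (b *builder) addTask(t task) { b.tasks = append(b.tasks, t) }

// runAll executes the queued tasks in order against a simulated filesystem.
func (b *builder) runAll(state map[string]bool) error {
	for _, t := range b.tasks {
		if err := t.run(state); err != nil {
			return fmt.Errorf("%s: %w", t.name, err)
		}
	}
	return nil
}

var errNotExist = errors.New("no such file or directory")

// readConf simulates opening /etc/ntp.conf.
func readConf(state map[string]bool) error {
	if !state["/etc/ntp.conf"] {
		return errNotExist
	}
	return nil
}

func main() {
	state := map[string]bool{} // fresh node: package not installed, no config file
	b := &builder{}
	b.addTask(task{name: "package/ntp", run: func(s map[string]bool) error {
		s["/etc/ntp.conf"] = true // installing the package creates the file
		return nil
	}})

	// Build phase: the package task is queued but has not run yet,
	// so reading the file here fails.
	fmt.Println("read during build:", readConf(state))

	// Run phase: a queued read happens after the install task has run.
	b.addTask(task{name: "read-conf", run: readConf})
	fmt.Println("read during run:", b.runAll(state))
}
```

The read that happens while tasks are still being queued fails, which mirrors the `open /etc/ntp.conf: no such file or directory` error reported above; the same read succeeds once it is itself a task that runs after the package install.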

@ghost
Author

ghost commented Mar 15, 2020

Right OK thanks @hakman

hakman pushed a commit to hakman/kops that referenced this pull request Mar 17, 2020
k8s-ci-robot added a commit that referenced this pull request Mar 18, 2020
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

9 participants