Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-13738 enforce additional ntp sources added into chrony #5295

Merged

Conversation

filanov
Copy link
Contributor

@filanov filanov commented Jun 15, 2023

Chrony configuration applied into machine config manifest based on the reply from the agent, meaning that when user add additional ntp source first the service will ask the agent to add it then return the reply to the service it will be stored in the DB and used to create the machine config chrony configuration

Because NTP is a soft validation, in ZTP flow there are cases that the installation will start before the agent return a reply to the service, it will mean that those sources will not be applied, to avoid that when building the manifests in addition to taking the agent reply the additional ntp sources will be added directly from the DB

The downside of this solution is that if the source is incorrect it will appear in /etc/chrony.conf but not in chronyc sources and it will not be resolved as the regular reply from the agent

It will not harm the cluster installation or affect the cluster after the installation if the source is incorrect

Example i added harta,barta in additional ntp sources
They didn't return from the agent so without this code they will not be added to the installation event if there is no race
After installation

cat /etc/chrony.conf 
...
server harta iburst
server barta iburst

chronyc sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
^? ntp.wdc2.us.leaseweb.net      0   9     0     -     +0ns[   +0ns] +/-    0ns
^? 152.70.159.102                0   9     0     -     +0ns[   +0ns] +/-    0ns
^? time.cloudflare.com           0   9     0     -     +0ns[   +0ns] +/-    0ns
^? haka.ruselabs.com             0   9     0     -     +0ns[   +0ns] +/-    0ns
^? ip229.ip-51-81-226.us         0   9     0     -     +0ns[   +0ns] +/-    0ns
^? 108.175.15.67                 0   9     0     -     +0ns[   +0ns] +/-    0ns
^? connected.by.freedominte>     0   9     0     -     +0ns[   +0ns] +/-    0ns
^? 168.61.215.74                 0   9     0     -     +0ns[   +0ns] +/-    0ns
^? tick.srs1.ntfo.org            0   9     0     -     +0ns[   +0ns] +/-    0ns
^? pacific.latt.net              0   9     0     -     +0ns[   +0ns] +/-    0ns
^? shaka.ruselabs.com            0   9     0     -     +0ns[   +0ns] +/-    0ns
^? 198.30.92.2                   0   9     0     -     +0ns[   +0ns] +/-    0ns
^? pi3.rellim.com                0   9     0     -     +0ns[   +0ns] +/-    0ns
^? srcf-ntp.stanford.edu         0   9     0     -     +0ns[   +0ns] +/-    0ns
^? time.cloudflare.com           0   9     0     -     +0ns[   +0ns] +/-    0ns
^* ntp1.ntp-001.prod.iad2.d>     2   8   377   121   +162us[ +215us] +/-   27ms

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

@filanov
Copy link
Contributor Author

filanov commented Jun 15, 2023

/assign @carbonin
/assign @avishayt

@openshift-ci openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 15, 2023
@openshift-ci
Copy link

openshift-ci bot commented Jun 15, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: filanov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 15, 2023
@filanov
Copy link
Contributor Author

filanov commented Jun 15, 2023

Not 100% sure about this solution because the user will be able to add invalid ntp sources, @carbonin @avishayt what do you think?

I didn't manage to find any other solution

@codecov
Copy link

codecov bot commented Jun 15, 2023

Codecov Report

Merging #5295 (b07c77b) into master (682809b) will increase coverage by 1.06%.
The diff coverage is 85.39%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5295      +/-   ##
==========================================
+ Coverage   67.50%   68.57%   +1.06%     
==========================================
  Files         221      221              
  Lines       33044    34508    +1464     
==========================================
+ Hits        22307    23664    +1357     
- Misses       8726     8754      +28     
- Partials     2011     2090      +79     
Impacted Files Coverage Δ
internal/common/conversions.go 5.76% <0.00%> (-1.38%) ⬇️
internal/network/manifests_generator.go 75.74% <69.23%> (-0.82%) ⬇️
...nsion/v1beta1/agentclusterinstall_mutating_hook.go 58.25% <95.00%> (+2.93%) ⬆️
...ion/v1beta1/agentclusterinstall_validation_hook.go 87.40% <96.00%> (+2.10%) ⬆️

... and 3 files with indirect coverage changes

Chrony configuration applied into machine config manifest based on the
reply from the agent, meaning that when user add additional ntp source
first the service will ask the agent to add it then return the reply to
the service it will be stored in the DB and used to create the machine
config chrony configuration

Because NTP is a soft validation, in ZTP flow there are cases that the
installation will start before the agent return a reply to the service,
it will mean that those sources will not be applied, to avoid that when
building the manifests in addition to taking the agent reply the
additional ntp sources will be added directly from the DB

The downside of this solution is that if the source is incorrect it
will appear in /etc/chrony.conf but not in `chronyc sources` and it will
not be resolved as the regular reply from the agent

It will not harm the cluster installation or affect the cluster after
the installation if the source is incorrect
@carbonin
Copy link
Member

Not 100% sure about this solution because the user will be able to add invalid ntp sources

If there was no race and we did whatever we do to add the sources to the host during discovery, would the user have seen an error?

Or put another way, are we doing anything to attempt to check the validity of these sources currently?

@filanov
Copy link
Contributor Author

filanov commented Jun 15, 2023

Not 100% sure about this solution because the user will be able to add invalid ntp sources

If there was no race and we did whatever we do to add the sources to the host during discovery, would the user have seen an error?

Or put another way, are we doing anything to attempt to check the validity of these sources currently?

so what we are doing right now is what i described in the commit message

Chrony configuration applied into machine config manifest based on the reply from the agent, meaning that when user add additional ntp source first the service will ask the agent to add it then return the reply to the service it will be stored in the DB and used to create the machine config chrony configuration

Meaning that invalid sources will not be added, but on the other had the user requested something so it will make sense to add them even if right now they are not available.

in addition we had a lot of confusion from the users related to the resolution that the agent is doing, we are being asked i added X and i don't see it in the chrony and with this change they will.

@carbonin
Copy link
Member

I'm fine with this.

I think it's less confusing to add the ntp sources the user asks for even if they may be incorrect as long as it doesn't negatively impact the cluster.

They can always remove or change the sources using the machineconfig once the cluster is installed.

@avishayt
Copy link
Contributor

If the user asked for an NTP source, we should add it. It might currently be offline or unreachable, but later on, it will be (for example, if the cluster is relocated).

It looks like this PR doesn't fix the actual race though - is that in a different PR?

@avishayt
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 20, 2023
@filanov
Copy link
Contributor Author

filanov commented Jun 20, 2023

If the user asked for an NTP source, we should add it. It might currently be offline or unreachable, but later on, it will be (for example, if the cluster is relocated).

It looks like this PR doesn't fix the actual race though - is that in a different PR?

The way to resolve the race is to add additional validation that will block the installation, but it will conflict with another validation that is not required so it will create a strange UX for the user, in ACM we don't have defaults so with another validation we will require the user to add additional NTP sources.

With the proposed solution we will copy exactly what the user requested.

@openshift-ci
Copy link

openshift-ci bot commented Jun 20, 2023

@filanov: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit cab4a19 into openshift:master Jun 20, 2023
14 checks passed
@filanov filanov deleted the additional-ntp-sources branch June 20, 2023 11:26
@filanov
Copy link
Contributor Author

filanov commented Jun 20, 2023

/cherry-pick release-ocm-2.8

@openshift-cherrypick-robot

@filanov: new pull request created: #5308

In response to this:

/cherry-pick release-ocm-2.8

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

danielerez pushed a commit to danielerez/assisted-service that referenced this pull request Oct 15, 2023
…hift#5295)

Chrony configuration applied into machine config manifest based on the
reply from the agent, meaning that when user add additional ntp source
first the service will ask the agent to add it then return the reply to
the service it will be stored in the DB and used to create the machine
config chrony configuration

Because NTP is a soft validation, in ZTP flow there are cases that the
installation will start before the agent return a reply to the service,
it will mean that those sources will not be applied, to avoid that when
building the manifests in addition to taking the agent reply the
additional ntp sources will be added directly from the DB

The downside of this solution is that if the source is incorrect it
will appear in /etc/chrony.conf but not in `chronyc sources` and it will
not be resolved as the regular reply from the agent

It will not harm the cluster installation or affect the cluster after
the installation if the source is incorrect
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants