Skip to content

Several pages still use removed services#11482

Merged
mburke5678 merged 1 commit intoopenshift:masterfrom
mburke5678:BZ-1614095
Oct 12, 2018
Merged

Several pages still use removed services#11482
mburke5678 merged 1 commit intoopenshift:masterfrom
mburke5678:BZ-1614095

Conversation

@mburke5678
Copy link
Copy Markdown
Contributor

@mburke5678 mburke5678 commented Aug 9, 2018

@openshift-docs-preview-bot
Copy link
Copy Markdown

The preview will be availble shortly at:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be separated 2 lines.

master-restart api
master-restart controllers

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be separated 2 lines.

master-restart api
master-restart controllers

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's no longer "openvswitch service". We can simply say "restart OpenShift SDN."

@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 10, 2018
@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 19, 2018
@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 20, 2018
@openshift-ci-robot openshift-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 20, 2018
@mburke5678
Copy link
Copy Markdown
Contributor Author

@nekop Can you please take another look? I have made further changes, such as the API health check, replaced stopping and disabling service.,

@mburke5678 mburke5678 force-pushed the BZ-1614095 branch 2 times, most recently from 17ed8ed to 523cf4e Compare August 23, 2018 18:53
@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 23, 2018
@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 24, 2018
Copy link
Copy Markdown
Contributor

@nekop nekop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is half-done review, but the important thing so far is:

  • If we move files /etc/origin/node/pods/, we need to restore it
  • atomic-openshift-node.service exists in 3.10

I'll add more comments later.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The static pod files are master services, not the node we want to reboot
  • If we move files, we need to restore the files after that
  • We don't need to reboot node services nor del-br br0 if we reboot the node at step 3

To fix, simply replace the steps with node reboot.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The atomic-openshift-node.service is still used in 3.10, no need to replace it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The atomic-openshift-node.service is still used in 3.10, no need to replace it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to restore files before reboot and to restore files we need to stop atomic-openshift-node.service (otherwise it starts again).

# mkdir -p /etc/origin/node/pods-stopped
# mv /etc/origin/node/pods/* /etc/origin/node/pods-stopped/
# systemctl stop atomic-openshift-node
# mv /etc/origin/node/pods-stopped/* /etc/origin/node/pods/
# reboot

We probably need more steps to see if etcd is properly stopped. I'll send a query to openshift-sme.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we just need stop etcd service here, so I think should move etcd.yaml instead of all files in /etc/origin/node/pods/ folder unless you want stop master-api and controllers too.
so it might be:
mv /etc/origin/node/pods/etcd.yaml /etc/origin/node/pods-stopped/
@xingxingxia @geliu2016 could you help confirm?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lihongan yes, AGREE.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are three types of pods: api, controllers, and etcd in the 'kube-system' project:
If here focus API service, then you can say view the master-api pods in the *kube-system* project and use the command with a label to get master-api pod only as below:
oc get pod -n kube-system -l openshift.io/component=api

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The master-controllers and master-etcd pod in the output can be removed if using the command above with the label.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might not be true. The master-api.example.com looks like the pod name and you cannot curl it. You should get the hostname or IP of the node which the pod landed then curl the host. for example:

$ oc get pod -n kube-system -o wide
NAME                                               READY     STATUS    RESTARTS   AGE       IP            NODE
master-api-qe-anli310master-etcd-zone1-1           1/1       Running   0          7h        10.240.0.16   qe-anli310master-etcd-zone1-1
$ curl -k https://qe-anli310master-etcd-zone1-1/healthz
ok

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 28, 2018
@mburke5678
Copy link
Copy Markdown
Contributor Author

mburke5678 commented Aug 28, 2018

@nekop @lihongan Thank you very much for your comments!! Very helpful.Can you please review the changes and make sure I addressed all of your concerns?

Is there a way to do journalctl -u atomic-openshift-master-controllers.service?

@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 29, 2018
Copy link
Copy Markdown
Contributor

@xingxingxia xingxingxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lihongan Sorry, due to engaged on other tasks, didn't reply you timely in daylights.
The PR is a bit long, cross different files (and different subteams Networking, Master, Pod, maybe. A suggestion: if possible, it is better to have separate PRs for different subteams review). I'm from Master QE, just did a quick review to part of the PR today. Thank you!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Syntax issue: viewing from https://github.com/mburke5678/openshift-docs/blob/BZ-1614095/admin_guide/ipsec.adoc , it has an unnecessary ending +

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The context here is about features in node and about node service atomic-openshift-node, this service has no change in 3.10. It is not about master-restart controllers which is about controllers service in master.
Thus above change is not needed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xingxingxia I am told that in 3.10+, changes to nodes no longer require a restart of the node. Changes made to the node config map are automatically made by the sync pods.

Release notes: https://docs.openshift.com/container-platform/3.10/release_notes/ocp_3_10_release_notes.html#ocp-310-new-node-configuration-process

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

systemctl stop docker atomic-openshift-node stops both docker and atomic-openshift-node. Why do systemctl stop docker again?
Either one line:
systemctl stop docker atomic-openshift-node
Or two lines:
systemctl stop atomic-openshift-node
systemctl stop docker

Please check other similar places in the PR

Copy link
Copy Markdown
Contributor

@xingxingxia xingxingxia Aug 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

node service means systemd service atomic-openshift-node. If wanting to restart the node service, just systemctl restart atomic-openshift-node instead of reboot the whole host. Thus pls remove lines 799~804

Please check other similar places in the PR

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to previous

Copy link
Copy Markdown
Contributor

@xingxingxia xingxingxia Aug 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lihongan I'm not sure either whether above steps for etcd are correct. Waiting for etcd QE owner @geliu2016 (but he is on leave and may be back next week) . The changes about etcd in following files need his review too.

Copy link
Copy Markdown
Contributor

@xingxingxia xingxingxia Aug 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, will this PR's change take effect for all versions https://docs.openshift.com/container-platform/3.9, 3.10 and 3.7 ... etc? If yes, need take care of versions <=3.9. Because removing systemd services etcd.service/atomic-openshift-master-api/atomic-openshift-master-controllers with static pods /etc/origin/node/pods as replacement only begins since 3.10

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xingxingxia Thank you. This PR is for 3.10=>

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hello @mburke5678, regarding to 'reboot', I tried to reboot etcd host and 'systemctl start atomic-openshift-node', both works(the etcd pod status is running), so my question is reboot etcd host is mandatory or run 'systemctl start atomic-openshift-node' is also ok.

as my experience, sometimes, in some special vm(virtual machine) env, reboot host operation will result in the host can't be started normally, even be terminated. so i have some concern about reboot operation.

Copy link
Copy Markdown
Contributor Author

@mburke5678 mburke5678 Sep 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geliu2016 I removed the reboot command so that the user need to only restate the node. Do you know whom we can ask to be certain?
Thank you for pointing this out.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mburke5678 , for etcd issue, you may try to ask to be cretain with jlegasse@redhat.com

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geliu2016 Joe Legasse: "As far as etcd is concerned, no reboot is necessary of the host machine..."

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The node name is myserver.com so please remove /healthz from it.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other fixes here looks good, thanks.

@mburke5678
Copy link
Copy Markdown
Contributor Author

@nekop @xingxingxia @lihongan @geliu2016 Are we OK to send this PR to documentation peer review?

1 similar comment
@mburke5678
Copy link
Copy Markdown
Contributor Author

@nekop @xingxingxia @lihongan @geliu2016 Are we OK to send this PR to documentation peer review?

@geliu2016
Copy link
Copy Markdown

geliu2016 commented Sep 11, 2018 via email

@mburke5678
Copy link
Copy Markdown
Contributor Author

@geliu2016 Did I address your comment? Was it #11482 (comment)?

Copy link
Copy Markdown

@geliu2016 geliu2016 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geliu2016 Did I address your comment? Was it #11482 (comment)?

@mburke5678 , I noticed that 'reboot' be removed at line 26, is this 'reboot'(line 37) necessary? thx

admin_guide/topics/proc_removing-failed-etcd-member.adoc https://github.com/openshift/openshift-docs/pull/11482/files#diff-13578813adedc157a9999343aa416b14 line 37

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mburke5678 , I noticed that 'reboot' be removed at line 26, is this 'reboot'(line 37) necessary? thx

Copy link
Copy Markdown
Contributor Author

@mburke5678 mburke5678 Sep 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geliu2016 I apologize. I thought I had removed the reboot. I have removed the command now.
Commit: 45165857d06f12c2d6d185a6078804ee9656aa50

@mburke5678
Copy link
Copy Markdown
Contributor Author

@mfojtik Can you help with an ectd question? In the "Creating a single-node etcd cluster" procedure, does the user need a reboot after stopping etcd in Step 1 of the following?
https://github.com/openshift/openshift-docs/blob/45165857d06f12c2d6d185a6078804ee9656aa50/admin_guide/topics/proc_etcd-quorum-single-node-v2-v3.adoc

@mburke5678
Copy link
Copy Markdown
Contributor Author

@geliu2016 I finally found someone to verify that the reboot here https://github.com/openshift/openshift-docs/pull/11482/files#diff-13578813adedc157a9999343aa416b14 is not needed and is removed, per mfojtik .

@csheremeta
Copy link
Copy Markdown

I also opened this https://bugzilla.redhat.com/show_bug.cgi?id=1626735 sorry for any dupes!

@mburke5678
Copy link
Copy Markdown
Contributor Author

@csheremeta PTAL

@mburke5678
Copy link
Copy Markdown
Contributor Author

@geliu2016 @lihongan PTAL One change from a customer issue and we can merge.
ca1eb2f

@geliu2016
Copy link
Copy Markdown

@mburke5678, LGTM, thx

@mburke5678
Copy link
Copy Markdown
Contributor Author

@openshift/team-documentation Find and replace changes. PTAL to see if there is anything I did wrong.

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 11, 2018
@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 12, 2018
@mburke5678 mburke5678 merged commit ec23535 into openshift:master Oct 12, 2018
@mburke5678
Copy link
Copy Markdown
Contributor Author

/cherrypick enterprise-3.10

@openshift-cherrypick-robot
Copy link
Copy Markdown

@mburke5678: #11482 failed to apply on top of branch "enterprise-3.10":

.git/rebase-apply/patch:104: trailing whitespace.
$ systemctl start docker atomic-openshift-node 
.git/rebase-apply/patch:144: trailing whitespace.
To configure the amount of allocatable resources, edit the appropriate xref:../admin_guide/manage_nodes.adoc#modifying-nodes[node configuration map] 
.git/rebase-apply/patch:196: trailing whitespace.
. On the etcd node that you did not remove from the cluster, stop all etcd services by 
.git/rebase-apply/patch:217: trailing whitespace.
. Stop the etcd service on the failed etcd member by 
.git/rebase-apply/patch:335: trailing whitespace.
# systemctl stop docker 
warning: squelched 10 whitespace errors
warning: 15 lines add whitespace errors.
Using index info to reconstruct a base tree...
M	admin_guide/disabling_features.adoc
M	admin_guide/manage_nodes.adoc
M	admin_guide/out_of_resource_handling.adoc
M	architecture/infrastructure_components/kubernetes_infrastructure.adoc
A	contributing_to_docs/doc_guidelines.adoc
M	day_two_guide/topics/increasing_docker_storage.adoc
M	day_two_guide/topics/proc_bringing-openshift-online.adoc
M	day_two_guide/topics/proc_restoring-etcd.adoc
M	install/host_preparation.adoc
M	install/prerequisites.adoc
M	install_config/configuring_aws.adoc
M	install_config/configuring_sdn.adoc
M	install_config/http_proxies.adoc
M	install_config/master_node_configuration.adoc
M	security/monitoring.adoc
M	upgrading/downgrade.adoc
Falling back to patching base and 3-way merge...
Auto-merging upgrading/downgrade.adoc
CONFLICT (content): Merge conflict in upgrading/downgrade.adoc
Auto-merging security/monitoring.adoc
CONFLICT (content): Merge conflict in security/monitoring.adoc
Auto-merging install_config/master_node_configuration.adoc
CONFLICT (content): Merge conflict in install_config/master_node_configuration.adoc
Auto-merging install_config/http_proxies.adoc
Auto-merging install_config/configuring_sdn.adoc
Auto-merging install_config/configuring_aws.adoc
Auto-merging install/prerequisites.adoc
Auto-merging install/host_preparation.adoc
CONFLICT (content): Merge conflict in install/host_preparation.adoc
Auto-merging day_two_guide/topics/proc_restoring-etcd.adoc
Auto-merging day_two_guide/topics/proc_bringing-openshift-online.adoc
CONFLICT (content): Merge conflict in day_two_guide/topics/proc_bringing-openshift-online.adoc
Auto-merging day_two_guide/topics/increasing_docker_storage.adoc
CONFLICT (modify/delete): contributing_to_docs/doc_guidelines.adoc deleted in HEAD and modified in Several pages still use removed services. Version Several pages still use removed services of contributing_to_docs/doc_guidelines.adoc left in tree.
Auto-merging architecture/infrastructure_components/kubernetes_infrastructure.adoc
Auto-merging admin_guide/out_of_resource_handling.adoc
Auto-merging admin_guide/manage_nodes.adoc
Auto-merging admin_guide/disabling_features.adoc
error: Failed to merge in the changes.
Patch failed at 0001 Several pages still use removed services

Details

In response to this:

/cherrypick enterprise-3.10

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

branch/enterprise-3.10 branch/enterprise-3.11 size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants