Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add node level chaos scenarios for bastion node #68

Merged
merged 1 commit into from
Feb 16, 2021

Conversation

pravin-dsilva
Copy link
Contributor

Scenario: node level test to shut down and start the bastion node. Upon node startup, the list of services provided in the scenario will be checked if they are running or not.

Signed-off-by: Pravin Dsilva pravin.d-silva@ibm.com

@rht-perf-ci
Copy link

Can one of the admins verify this patch?

@@ -25,7 +26,7 @@ Following node chaos scenarios are supported:
A google service account is required to give proper authentication to GCP for node actions. See [here](https://cloud.google.com/docs/authentication/getting-started) for how to create a service account.

**NOTE**: A user with 'resourcemanager.projects.setIamPolicy' permission is required to grant project-level permissions to the service account.

r
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to remove this extra line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've fixed it along with a few other minor changes.

@pravin-dsilva pravin-dsilva force-pushed the bastion_scenario branch 2 times, most recently from ed73d0a to 81c0df7 Compare February 4, 2021 17:42
time.sleep(sleeper)
i += sleeper
logging.info("Trying to ssh to instance: %s" % (node))
connection = ssh.connect(node, username='root', key_filename='/root/.ssh/id_rsa', timeout=800, banner_timeout=400)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would we be able to add this ssh key parameter as a parameter in the node scenario yaml file

@pravin-dsilva pravin-dsilva force-pushed the bastion_scenario branch 3 times, most recently from 5037e88 to 5b971a0 Compare February 5, 2021 14:21
run_kraken.py Outdated
@@ -49,6 +49,8 @@ def inject_node_scenario(action, node_scenario, node_scenario_object):
node_name = node_scenario.get("node_name", "")
label_selector = node_scenario.get("label_selector", "")
timeout = node_scenario.get("timeout", 120)
service = node_scenario.get("service", "")
ssh_private_key = node_scenario.get("ssh_private_key", "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe lets add "~/.ssh/id_rsa" as the default value for the ssh_private_key parameter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added it.

@paigerube14
Copy link
Collaborator

I’m not super familiar with helper nodes, would the helper node name and IP address be returned from oc get nodes using a specific label_selector in the config? If so, I think we could reuse the stop start node scenario code but still add on the service check at the end for this specific action. AKA no need for helper_node_start_scenario and helper_node_stop_scenario in openstack_node_sceanrios. Thoughts?

Also, not sure if this is all from code you have added but can you please take a look at the spacing issues outlined using tox.

To run:
pip install tox (if not already installed; might want to add this to the requirements)

Run “tox” and fix all spacing/import issues

@pravin-dsilva
Copy link
Contributor Author

I’m not super familiar with helper nodes, would the helper node name and IP address be returned from oc get nodes using a specific label_selector in the config? If so, I think we could reuse the stop start node scenario code but still add on the service check at the end for this specific action. AKA no need for helper_node_start_scenario and helper_node_stop_scenario in openstack_node_sceanrios. Thoughts?

Yes, we intended to re-use the code but since the helper node is an external entity not part of the openshift cluster, there is no way to get the ip or hostname from oc get nodes.

Also, not sure if this is all from code you have added but can you please take a look at the spacing issues outlined using tox.

To run:
pip install tox (if not already installed; might want to add this to the requirements)

Run “tox” and fix all spacing/import issues

Sure, I have created another commit in this PR fixing all the indentation issues which are not part of my code. Thanks

node_name: # node on which scenario has to be injected
label_selector: node-role.kubernetes.io/worker # when node_name is not specified, a node with matching label_selector is selected for node chaos scenario injection
instance_kill_count: 1 # number of times to inject each scenario under actions
timeout: 120 # duration to wait for completion of node scenario injection
cloud_type: aws # cloud type on which Kubernetes/OpenShift runs
helper_node_ip: # ip address of the helper node
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we able to move these next couple of new parameters to the example part of docs/node_scenario.md? Might be less confusing since those items aren't going to be used in the actions that are run be default

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, moved them to the docs.

@paigerube14
Copy link
Collaborator

Is there a plan to add this type of scenario to all could types? If not, it would be nice to give a better error when trying to use aws with the stop_start_helper_node_scenario for example. Instead of getting this ugly error message:
AttributeError: 'aws_node_scenarios' object has no attribute 'helper_node_stop_scenario'

Other than that LGTM

@pravin-dsilva pravin-dsilva force-pushed the bastion_scenario branch 3 times, most recently from 5a9acf1 to 152a1c1 Compare February 15, 2021 18:02
@paigerube14
Copy link
Collaborator

LGTM

@mffiedler
Copy link
Collaborator

@pravin-dsilva please squash the commits and we'll do a final review. Thanks for working with us on this.

Signed-off-by: Pravin Dsilva <pravin.d-silva@ibm.com>
@pravin-dsilva
Copy link
Contributor Author

Thanks, please take a look.

@mffiedler mffiedler self-requested a review February 16, 2021 20:30
Copy link
Collaborator

@mffiedler mffiedler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mffiedler mffiedler merged commit bfe9448 into krkn-chaos:master Feb 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants