-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add node level chaos scenarios for bastion node #68
Conversation
Can one of the admins verify this patch? |
3454d20
to
3852918
Compare
docs/node_scenarios.md
Outdated
@@ -25,7 +26,7 @@ Following node chaos scenarios are supported: | |||
A google service account is required to give proper authentication to GCP for node actions. See [here](https://cloud.google.com/docs/authentication/getting-started) for how to create a service account. | |||
|
|||
**NOTE**: A user with 'resourcemanager.projects.setIamPolicy' permission is required to grant project-level permissions to the service account. | |||
|
|||
r |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to remove this extra line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've fixed it along with a few other minor changes.
ed73d0a
to
81c0df7
Compare
time.sleep(sleeper) | ||
i += sleeper | ||
logging.info("Trying to ssh to instance: %s" % (node)) | ||
connection = ssh.connect(node, username='root', key_filename='/root/.ssh/id_rsa', timeout=800, banner_timeout=400) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would we be able to add this ssh key parameter as a parameter in the node scenario yaml file
5037e88
to
5b971a0
Compare
run_kraken.py
Outdated
@@ -49,6 +49,8 @@ def inject_node_scenario(action, node_scenario, node_scenario_object): | |||
node_name = node_scenario.get("node_name", "") | |||
label_selector = node_scenario.get("label_selector", "") | |||
timeout = node_scenario.get("timeout", 120) | |||
service = node_scenario.get("service", "") | |||
ssh_private_key = node_scenario.get("ssh_private_key", "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe lets add "~/.ssh/id_rsa" as the default value for the ssh_private_key parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, added it.
5b971a0
to
0c730ca
Compare
I’m not super familiar with helper nodes, would the helper node name and IP address be returned from oc get nodes using a specific label_selector in the config? If so, I think we could reuse the stop start node scenario code but still add on the service check at the end for this specific action. AKA no need for helper_node_start_scenario and helper_node_stop_scenario in openstack_node_sceanrios. Thoughts? Also, not sure if this is all from code you have added but can you please take a look at the spacing issues outlined using tox. To run: Run “tox” and fix all spacing/import issues |
0c730ca
to
cf7703c
Compare
Yes, we intended to re-use the code but since the helper node is an external entity not part of the openshift cluster, there is no way to get the ip or hostname from oc get nodes.
Sure, I have created another commit in this PR fixing all the indentation issues which are not part of my code. Thanks |
scenarios/node_scenarios_example.yml
Outdated
node_name: # node on which scenario has to be injected | ||
label_selector: node-role.kubernetes.io/worker # when node_name is not specified, a node with matching label_selector is selected for node chaos scenario injection | ||
instance_kill_count: 1 # number of times to inject each scenario under actions | ||
timeout: 120 # duration to wait for completion of node scenario injection | ||
cloud_type: aws # cloud type on which Kubernetes/OpenShift runs | ||
helper_node_ip: # ip address of the helper node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we able to move these next couple of new parameters to the example part of docs/node_scenario.md? Might be less confusing since those items aren't going to be used in the actions that are run be default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, moved them to the docs.
Is there a plan to add this type of scenario to all could types? If not, it would be nice to give a better error when trying to use aws with the stop_start_helper_node_scenario for example. Instead of getting this ugly error message: Other than that LGTM |
5a9acf1
to
152a1c1
Compare
LGTM |
152a1c1
to
0149dd1
Compare
@pravin-dsilva please squash the commits and we'll do a final review. Thanks for working with us on this. |
Signed-off-by: Pravin Dsilva <pravin.d-silva@ibm.com>
0149dd1
to
918b5fb
Compare
Thanks, please take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Scenario: node level test to shut down and start the bastion node. Upon node startup, the list of services provided in the scenario will be checked if they are running or not.
Signed-off-by: Pravin Dsilva pravin.d-silva@ibm.com