openshift · openshift-merge-robot · Feb 1, 2021 · Jan 28, 2021
diff --git a/README.md b/README.md
@@ -13,3 +13,8 @@ a series of tests to diagnose the functionality in the cluster works ok. In case
 would dig in the script and provide further checks to point at potential root causes to resolve the issue.
 Therefore, the pattern for these scripts is to find issues and *why* those issues exist. Gathering logs is not
 in the scope of these scripts and are better suited on the openshift/must-gather image.
+
+## Documentation
+
+* Please see the [contributor docs](https://github.com/openshift/network-tools/blob/master/docs/contributor.md) for more information regarding the scope of this repository and how to contribute.
+* Users can go to the [user docs](https://github.com/openshift/network-tools/blob/master/docs/user.md) for information on how to use the tools and scripts shipped by this image.
diff --git a/docs/contributor.md b/docs/contributor.md
@@ -0,0 +1,66 @@
+# Contributor Documentation 
+
+Contributing to OpenShift Network-Tools will enable us to provide a better way to debug OpenShift cluster networking. Contributions of any kind, such as improving existing scripts, fixing bugs, or adding new scripts, are valuable and most welcome. Please open an issue on GitHub for this repository if you find any bugs. RH employees can also reach out to us on #forum-sdn.
+
+## Scope 
+
+This repository aims at providing debugging tools for:
+
+- Checking overall cluster connectivity
+- Features developed as a part of OpenShift Networking. Currently supported areas include:
+    - OpenShiftSDN
+    - OVNKubernetes
+    - SRIOV
+- Capturing packet dumps and traces from interfaces of nodes/pods/resources in the cluster in real time
+- Gathering networking metrics/logs from the cluster when the API is unreachable (where must-gather will not help) by running the scripts locally
+- Spinning up a privileged hostnetwork pod that has the basic networking command-line tools installed
+- Running commands in a pod's network namespace
+
+**NOTE:** All the scripts can be executed using the following command:
+
+```
+    oc adm network-tools
+```
+This is a [work in progress](https://github.com/openshift/oc/pull/709). See the user documentation for more information on how to run existing scripts against an OpenShift cluster.
+
+## Repository Structure
+
+- All scripts live in `network-tools/debug-scripts`.
+- When the image is built, the scripts are copied into `/usr/bin/`.
+- All docs live in `network-tools/docs`.
+- All dependencies should be vendored and they live in `network-tools/vendor`.
+- The main dockerfile used for the official release image build is `Dockerfile`.
+- The dockerfile for development purposes is based on Fedora and is called `Dockerfile.fedora`.
+
+## Adding Scripts
+
+Please follow the guidelines below when adding a new script to the network-tools repo. They exist to maintain a minimum level of uniformity.
+
+- The name of the script should reflect what it intends to do, e.g. `ovn_pod_to_pod_connectivity` intends to check the connectivity between pods on an OVNKubernetes-k8s cluster.
+- The name of the script should start with the name of the high-level component that it is a part of, e.g. all scripts testing openshift-sdn should start with the prefix `sdn_`.
+- Functions that are reused in more than one script should be added to the `common` script.
+- If the script is intended to be a part of the default collection of scripts to be run, it should be invoked from `network-tools`.
+- A brief `help` method explaining the usage and options should be added, and it should be invocable with the `-h` option, e.g. `ovn_pod_to_pod_connectivity -h`.
+- The script should follow a basic structure similar to existing scripts in the repository, e.g. a `main` function and a meaningful sub-function that starts with the prefix `do_$file_name`.
+- The script should by default create the necessary resources to do the test if the user has not passed any arguments.
+- Each script should work standalone and, when invoked in the default mode, remain compatible with the rest of the scripts.
+- Each message printed should fall under one of the `INFO`, `SUCCESS`, or `FAILURE` categories.
+- Test the functionality of the script with `oc adm network-tools --`. Make sure the script does not break the build and is well tested.
+- Add documentation regarding what the script does to the user docs.
+- Even though this image can be accessed only by privileged users/administrators, avoid security vulnerabilities.
+- Use discretion when commands need to be run from a network namespace. The first preference is to use the `oc debug node/xx` command. If there are too many commands to be run, create a hostNetwork pod.
+- If a script assumes direct SSH access to the nodes in the cluster, this should be explicitly stated in the help function, and such access must be used only under exceptional circumstances, such as when the API is down and new pods cannot be created.
+- All resources created for testing must use the `openshift-network-tools` image.
+- All resource names should start with the prefix `network-tools-*`.
+
+## Reporting Bugs
+
+- Open an [issue](https://github.com/openshift/network-tools/issues/new) against the repository specifying the details of the bug.
+
+## Fixing Bugs
+
+- Open a PR with the fix against the repository indicating the issue.
+
+## Missing Tools
+
+- If any networking CLI tools or packages need to be shipped to enhance the debugging experience, open a PR adding them to the `Dockerfile`'s install packages with adequate justification.
diff --git a/docs/user.md b/docs/user.md
@@ -0,0 +1,40 @@
+# User Documentation 
+
+As an end user of OpenShift, you can use network-tools to debug and manage cluster networking directly through the CLI.
+# What is OpenShift network-tools?
+
+OpenShift network-tools contains a set of:
+
+- cluster network debugging scripts written in bash
+- frequently used network CLI tools like ping, netcat, tcpdump, strace etc.
+
+to debug the networking state of a cluster in real time. This tool is currently supported (and has been tested) only from OCP 4.8 onwards.
+
+**NOTE:** The only officially supported way of running the scripts and tools is through the `oc adm network-tools` utility. In certain exceptional situations, such as when the api-server is down or networking is not up in the cluster, we might have to access the nodes in the cluster directly to figure out the root cause. In other situations, we may have to access the network namespace of a pod to run specific commands. Some of the scripts in this repository are written with such debugging situations in mind. Although the image is restricted to administrators and privileged users, care must be taken when running such scripts locally on the cluster. Scripts must be double-checked to ensure they do exactly what they are intended to do.
+
+# How is this different from must-gather?
+
+While OpenShift must-gather focuses on gathering logs from each container in the cluster, network-tools focuses on collecting relevant information obtained by running a specific command or script during a specific window of (real) time in the cluster.
+
+In certain situations the container logs might be difficult to parse, and sometimes they may not contain all the relevant information, such as ovs/ovn packet traces and packet dumps, needed to debug tricky networking bugs and packet loss. In addition to facilitating the collection of information necessary for debugging networking, network-tools also allows users to run sample connectivity tests between existing nodes/pods/services. In the future we hope to add debugging scripts for features such as network policies, egress IPs and egress routers, which can help explain the path taken by a packet from a source to a destination.
+
+# Example Scenarios
+
+The following scenarios are part of the motivation behind network-tools. Note that some of them are still a work in progress.
+
+- I want to do a quick connectivity check between podA and serviceB.
+- I want to check the status of the nodeports on nodeA.
+- I want to check which service ports are free for me to use, or how many free podIPs are left in the hostsubnet of nodeA.
+- I want to check the status of all the backing pods of serviceA.
+- I want to test if all the ports on podA are in listen state as they are expected to be.
+- I want to capture packets on interface X of nodeA.
+- I want to run commands like 'tcpdump -i bond0' or 'conntrack -L' or 'sysctl -A' and filter out the gathered data in a useful way on all the master nodes.
+- I want to run an ovn/ovs-packet trace between podA and podB.
+- I want to dump ovs/ovn flows and conntrack's state of connections on the SDN (OVN) pod running on nodeA.
+- I want to check if packets over the overlay network are encrypted using IPSec.
+- I want to pull network interface information when APIs are unresponsive (must-gather might not help since a new pod cannot be spawned) by running the scripts locally.
+- I want to check if the egress firewall blocks traffic of typeY.
+
+# Invoking Scripts
+
+TODO: Will update this section once we have the oc client integration patch merged.
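
The "Adding Scripts" guidelines in `docs/contributor.md` (a `help` function behind `-h`, a `main` function, a `do_$file_name` sub-function, and messages tagged `INFO`, `SUCCESS` or `FAILURE`) could be followed by a skeleton along these lines. This is only a sketch under stated assumptions: the script name `ovn_example_check`, the `log` and `run_check` helpers, and the default resource names are hypothetical and are not taken from the repository.

```shell
#!/bin/bash
# Hypothetical skeleton for a script named `ovn_example_check`.
# All names below are illustrative assumptions, not existing repository code.

# Print a message under one of the INFO, SUCCESS or FAILURE categories.
# A helper like this, if reused by several scripts, would belong in `common`.
log() {
    level="$1"
    shift
    echo "${level}: $*"
}

# Usage text, shown when the script is invoked with -h.
help() {
    echo "Usage: ovn_example_check [-h] [src-pod dst-pod]"
    echo "With no arguments, the script is expected to create its own test resources."
}

# Placeholder for the real check; an actual script would run `oc` commands here.
run_check() {
    return 0
}

# Sub-function named after the file, as the guidelines suggest.
do_ovn_example_check() {
    # Default resource names carry the `network-tools-` prefix.
    src="${1:-network-tools-example-src}"
    dst="${2:-network-tools-example-dst}"
    log INFO "checking connectivity from ${src} to ${dst}"
    if run_check "${src}" "${dst}"; then
        log SUCCESS "${src} can reach ${dst}"
    else
        log FAILURE "${src} cannot reach ${dst}"
        return 1
    fi
}

main() {
    if [ "${1:-}" = "-h" ]; then
        help
        return 0
    fi
    do_ovn_example_check "$@"
}

main "$@"
```

Keeping the check inside `do_ovn_example_check` and the argument handling inside `main` leaves the script usable on its own while still letting a wrapper call it in a default mode without arguments.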