Snooping on the Kubernetes OpenAPI communications
Our initial goal is to provide a useful indicator as to which Kubernetes APIs are used the most and don't yet have conformance tests. This is specifically to ensure we are testing APIs that are relevant, rather than just hitting all the endpoints. Our K8s API consumer/user journeys are an important model to drive this prioritization, in addition to providing insight into how the APIs are being used.
Our current rough output is the most promising visualization of how API groups, endpoints, and verbs are used today. The APISnoop visualization presents in very clear terms how poor our current coverage actually is. Our high-level graph shows that our stable APIs are mostly untested (the grey sections of the outer ring).
We also export the data to a CSV / Google Sheet that clearly shows the most popular untested API endpoints.
Highlighting the above untested stable core APIs:
Our secondary goal is a Parallel Certification Program, using the same machinery as the Certified Kubernetes Provider program, to certify a set of apps that utilize the Kubernetes API. For example, Istio, Skaffold, and Draft require K8s 1.9; if you have 1.9, those tools will run. If they utilize only v1/stable APIs, they are guaranteed to run on at least the next K8s release.
In order to identify target applications to test, we refer to a Kubernetes API Consumer as a KAPIC.
We inspect the advanced audit logs to describe which APIs are called during KAPIC operations. For our initial data run we installed a small set of KAPIC helm charts and observed which API groups are called. We used the D3 library to create sunburst partition graphs, centered on stable/beta/alpha, with partitions for API groups, then API calls/verbs.
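Each line of an advanced audit log is a JSON Event, and the fields that matter for this mapping are the verb, the request URI, and the caller. As a sketch (assuming jq is installed and you have an audit.log on hand), you can inspect those fields in the most recent entry:
# Pull out the fields APISnoop aggregates from each audit Event:
# the verb, the endpoint hit (requestURI), and who called it.
tail -n1 audit.log | jq '{verb, requestURI, username: .user.username}'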
The initial raw data is available within our v0.0.1-audit-logs release and a simple interactive demo is running at http://apisnoop.cncf.io
- June 12th, 2018 - SIG Node - APISnoop initial mapping of endpoints to e2e tests
- May 23rd, 2018 - Conformance WG - APISnoop: easing contribution and driving pod api utilization Recording
- May 10th, 2018 - SIG Architecture - APISnoop Introduction & Recording
- May 4th, 2018 - KubeCon Copenhagen - Deep Dive for Conformance WG & Recording
kubeadm supports advanced audit logging in 1.10 and later. Here are two examples using that approach:
The packet/kubicorn walkthrough from @deitch only needs some minor changes.
We created a ./bootstrap/ folder with updated scripts to enable audit-logging.
git clone https://github.com/cncf/apisnoop.git
cd apisnoop
export PACKET_APITOKEN=FOOBARBAZZ
export PACKET_PROJECT=YOUR-PROJECT
# use ./bootstrap/packet_k8s_ubuntu_16.04_*.sh
export KUBICORN_FORCE_LOCAL_BOOTSTRAP=1
kubicorn create apisnoop --profile packet
# ensure clusterAPI.spec.providerConfig: project.name is set correctly
sed -ie "s:kubicorn-apisnoop:${PACKET_PROJECT}:" _state/apisnoop/cluster.yaml
kubicorn apply apisnoop
Some interesting parts of the kubicorn logs:
# using our advanced audit-log setup
[ℹ] Parsing bootstrap script from filesystem [bootstrap/packet_k8s_ubuntu_16.04_master.sh]
[✔] Created Device [apisnoop.master-0]
Verbose bootstrap logging is available in addition to the audit.log:
ssh root@$MASTER_NODE tail -f /var/log/cloud-init-output.log
ssh root@$MASTER_NODE tail -f /var/log/audit/audit.log
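The entries in that audit.log are shaped by the audit policy the bootstrap scripts install. A minimal metadata-level policy looks roughly like this (a sketch; see ./bootstrap/ for the policy actually used):
# Log who called which endpoint with which verb,
# but not the request/response bodies.
cat <<EOF > audit-policy.yaml
apiVersion: audit.k8s.io/v1beta1
kind: Policy
rules:
  - level: Metadata
EOF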
We modified GoogleCloudPlatform/terraform-google-gce to use the AdvancedAuditing / Audit feature gates available in kubernetes/kubeadm. Thanks @danisla!
To use it, create a Terraform config using the above module. Be sure to set your project and region.
export CLUSTERNAME=foo
cat <<EOF > my-auditable-cluster.tf
# my-auditable-cluster.tf
provider "google" {
project = "ii-coop"
region = "us-central1"
}
module "k8s" {
source = "github.com/ii/terraform-google-k8s-gce?ref=audit-logging"
name = "${CLUSTERNAME}"
k8s_version = "1.10.2"
}
EOF
We can monitor our cloud-init progress on the master, then collect the audit logs directly from the apiserver (easier if we only have one master).
terraform init
terraform apply
MASTER_NODE=$(gcloud compute instances list | grep ${CLUSTERNAME}.\*master | awk '{print $1}')
gcloud compute ssh $MASTER_NODE --command "sudo tail -f /var/log/cloud-init-output.log /var/log/cloud-init.log"
# master node is up when you see: service "kubernetes-dashboard" created
These clusters do not have a public API endpoint; port forwarding is required.
# get a working kubeconfig, you may need to portforward
gcloud compute ssh $MASTER_NODE -- -L 6443:127.0.0.1:6443
Copy the admin kubeconfig from the apiserver and set it to use the local portforward.
export KUBECONFIG=$PWD/kubeconfig
gcloud compute ssh $MASTER_NODE \
--command "sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl config view --flatten" \
> $KUBECONFIG
kubectl config set clusters.kubernetes.server https://127.0.0.1:6443
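A quick sanity check that the tunnel and kubeconfig work (a sketch; run it in another terminal while the ssh tunnel is open):
# add --insecure-skip-tls-verify=true if the apiserver
# certificate does not include a 127.0.0.1 SAN
kubectl get nodes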
Tail the audit log on the apiserver to see every API request.
gcloud compute ssh $MASTER_NODE --command "sudo tail -f /var/log/audit/audit.log"
Tail the audit.log via ssh and redirect the output locally.
ssh $MASTER_NODE tail -f /var/log/audit/audit.log | tee MYKAPIC-audit.log
# ^C when finished auditing
Creating a namespace and service accounts for your KAPIC to use will make filtering easier.
kubectl create ns $KAPP
helm install $REPO/$KAPP \
--name $KAPP \
--namespace $KAPP \
--set serviceAccount.name=$KAPP \
--set serviceAccount.create=true \
--set rbac.create=true
At this point, drive the KAPIC with helm test or its own e2e suite.
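For example, if the chart ships test hooks (helm v2 syntax; the --cleanup flag removes the test pods afterwards):
# exercise the release so its API calls land in the audit log
helm test $KAPP --cleanup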
Audit logs record everything happening on the API server. We hope to make filtering for a particular KAPIC easier in the future; until then, gron may prove useful.
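For example, since the helm install above gave the KAPIC its own service account, a jq filter along these lines can isolate its requests (a sketch; field names follow the audit Event schema):
# keep only events issued by the KAPIC's service account
jq -c 'select(.user.username == "system:serviceaccount:'$KAPP':'$KAPP'")' \
  MYKAPIC-audit.log > MYKAPIC-filtered.log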
Once you have the audit logs for the app, you can turn them into an interactive graph of the endpoints and methods that were requested by the app.
Some setup is required:
cd dev/audit-log-review
pip install -r requirements.txt
To load the audit log into the database:
python logreview.py load-audit <audit log path> <app name>
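For example, to load the log captured earlier under a hypothetical app name:
python logreview.py load-audit ./MYKAPIC-audit.log mykapic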
Now that the log is in the database, let's start the webserver and have a look:
python logreview.py start-server
Go to http://localhost:9090 in a web browser and click Apps, then the app name. You will get a graph that looks similar to this:
To see the coverage graph from the Kubernetes e2e tests (obtained from Sonobuoy or run manually), load the logs using the name e2e:
python logreview.py load-audit <audit log path> e2e
Now start the webserver:
python logreview.py start-server
Then go to http://localhost:9090 in a web browser and click e2e; you will get a graph that looks similar to this:
If you want to export data as CSV files:
python logreview.py export-data <exporter name> <output csv path> <app name>
where exporter name can be one of the following (an example invocation follows the list):
- app-usage-categories: breakdown of API categories an app is using
- app-usage-summary: summary of alpha / beta / stable API usage
- app-usage-endpoints: a list of endpoints and methods the app connects to
- coverage-spreadsheet: combines conformance google sheets data with endpoint hit counts
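For example, to dump the endpoint list for the app loaded earlier (paths and app name are placeholders):
python logreview.py export-data app-usage-endpoints ./mykapic-endpoints.csv mykapic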
You can easily preview the CSV in a terminal with:
cat <output csv path> | column -t -s,
Example output