[DO NOT EVER MERGE] Reproducing soft lockup v1 #38731

Closed
wants to merge 2 commits
Changes from 1 commit
2 changes: 1 addition & 1 deletion cluster/gce/config-default.sh
@@ -166,7 +166,7 @@ HAIRPIN_MODE="${HAIRPIN_MODE:-promiscuous-bridge}" # promiscuous-bridge, hairpin
E2E_STORAGE_TEST_ENVIRONMENT=${KUBE_E2E_STORAGE_TEST_ENVIRONMENT:-false}

# Evict pods whenever compute resource availability on the nodes gets below a threshold.
-EVICTION_HARD="${EVICTION_HARD:-memory.available<250Mi,nodefs.available<10%,nodefs.inodesFree<5%}"
+EVICTION_HARD="${EVICTION_HARD:-memory.available<30%,nodefs.available<10%,nodefs.inodesFree<5%}"

# Optional: custom scheduling algorithm
SCHEDULING_ALGORITHM_PROVIDER="${SCHEDULING_ALGORITHM_PROVIDER:-}"
10 changes: 10 additions & 0 deletions delete_pods.sh
@@ -0,0 +1,10 @@
#!/bin/bash
# Repeatedly delete all pods so the deployment controller keeps recreating them.
echo "Starting to delete pods every 5 seconds"
while true
do
  echo "_______________________________"
  kubectl get pods
  kubectl get nodes
  kubectl delete pods --all
  sleep 5
done
11 changes: 11 additions & 0 deletions first_repro.sh
@@ -0,0 +1,11 @@
#!/bin/bash
# Use these commands to reproduce the soft lockup.
make clean
make quick-release
./cluster/kube-up.sh
kubectl create -f ./repro_first_trial.yaml
sleep 10
./delete_pods.sh
# Once you have the node names, collect the serial port output to see when the soft lockup occurs:
# (on gcp) gcloud beta compute instances tail-serial-port-output kubernetes-minion-group-[YOUR-XXXX-HERE] --zone=us-central1-b &> node_[YOUR-XXXX-HERE]_serial_out.txt &
# until cat node_[YOUR-XXXX-HERE]_serial_out.txt | grep "soft lockup"; do sleep 10; done
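
The two commented-out commands above watch a single node. As a follow-up (not part of this PR's files), here is a minimal sketch that tails the serial port of every minion node and waits for the first "soft lockup" message; it assumes gcloud is authenticated for the cluster's project and that the default node-name prefix and zone used above are unchanged:

#!/bin/bash
# Sketch only: watch every kubernetes-minion-group node for a soft lockup.
ZONE="us-central1-b"   # assumption: same zone as in the command above
for node in $(gcloud compute instances list \
    --filter="name~kubernetes-minion-group" --format="value(name)"); do
  gcloud beta compute instances tail-serial-port-output "${node}" \
    --zone="${ZONE}" &> "node_${node}_serial_out.txt" &
done
# Stop waiting as soon as any node's serial log contains the lockup message.
until grep -l "soft lockup" node_*_serial_out.txt; do sleep 10; done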
2 changes: 1 addition & 1 deletion pkg/apis/componentconfig/v1alpha1/defaults.go
@@ -378,7 +378,7 @@ func SetDefaults_KubeletConfiguration(obj *KubeletConfiguration) {
obj.EvictionPressureTransitionPeriod = metav1.Duration{Duration: 5 * time.Minute}
}
if obj.ExperimentalKernelMemcgNotification == nil {
-obj.ExperimentalKernelMemcgNotification = boolVar(false)
+obj.ExperimentalKernelMemcgNotification = boolVar(true)
}
if obj.SystemReserved == nil {
obj.SystemReserved = make(map[string]string)
15 changes: 15 additions & 0 deletions repro_first_trial.yaml
@@ -0,0 +1,15 @@
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: containervm-deployment
spec:
  replicas: 11
Contributor:
the default namespace has a limitrange that sets cpu=100m, which you can work around by setting your own limit of cpu=10m; that'd give you more pods per node and (probably) a faster repro (see the sketch after the manifest below)

  template:
    metadata:
      labels:
        app: busythebox
    spec:
      containers:
      - name: busythebox
        image: gcr.io/google_containers/stress:v1
        args: ["-mem-alloc-size", "100Mi", "-mem-alloc-sleep", "1s", "-mem-total", "500Mi"]
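
Following the Contributor comment above, a minimal sketch (not part of this diff) of how the container spec could carry its own cpu limit of 10m to work around the default namespace's limitrange; the resources block is the only addition to the manifest:

      containers:
      - name: busythebox
        image: gcr.io/google_containers/stress:v1
        args: ["-mem-alloc-size", "100Mi", "-mem-alloc-sleep", "1s", "-mem-total", "500Mi"]
        resources:
          limits:
            cpu: 10m   # reviewer's suggested value; overrides the namespace limitrange default of 100m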