[Internal]: Add critical first steps to ECK troubleshooting workflow

## Description

Add a "First Steps" section at the beginning of the troubleshooting workflow to prevent common misdiagnoses and reduce investigation time.

**What:** We are adding a new section that emphasizes three critical checks users should perform before detailed troubleshooting:
1. Collect eck-diagnostics immediately (Kubernetes events expire after about 1 hour by default)
2. Check Kubernetes security policies and permissions (most common blocker)
3. Verify pod status before investigating application errors (causality awareness)

**Why:** Many ECK deployment issues are caused by Kubernetes admission layer blocks (security policies, quotas, admission webhooks) rather than application configuration. Without checking the Kubernetes layer first, users spend days investigating symptoms (operator errors, authentication failures) instead of the root cause (pods never created).

**Details users need to know:**
- eck-diagnostics captures critical namespace events that reveal pod creation failures
- `UP-TO-DATE: 0` in the `kubectl get deployment` output indicates that Kubernetes is blocking pod creation (not an application failure)
- Operator errors (401, 503, connection refused) often occur because pods don't exist
- Security policy violations appear in events.json, not pod logs
- Events expire quickly - collect diagnostics early
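
To illustrate how event text points to the blocking layer, here is a minimal shell sketch. The function name `classify_event_message` and the category labels are hypothetical, chosen only for illustration. The matched substrings are the ones listed in the symptom table of the proposed content, and the sample message is hand-written rather than taken from a real cluster.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of kubectl or eck-diagnostics):
# maps the message of a FailedCreate event to a likely cause.
classify_event_message() {
  local message="$1"
  case "$message" in
    *"violates PodSecurity"*) echo "security-policy" ;;
    *"exceeded quota"*)       echo "resource-quota" ;;
    *)                        echo "other" ;;
  esac
}

# Illustrative, hand-written event message
classify_event_message 'pods "example" is forbidden: violates PodSecurity "restricted:latest"'
```

In practice, the message would come from the `message` field of an entry in `events.json` or from the `kubectl get events` output.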

---

## Proposed Content

**Section Title:** First Steps: Critical Checks Before Detailed Investigation

**Location:** Add as first major section after page introduction, before existing troubleshooting steps

**Content:**

````markdown
## First Steps: Critical Checks Before Detailed Investigation

Perform these checks first to catch common issues and avoid unnecessary investigation:

### Step 1: Collect eck-diagnostics

Collect diagnostics immediately for any ECK deployment issue. Kubernetes retains events for only about 1 hour by default, so evidence of pod creation failures disappears quickly.

```bash
# Download from https://github.com/elastic/eck-diagnostics/releases/latest
./eck-diagnostics -o <operator-namespace> -r <resource-namespace>

# Check for pod creation failures
unzip -p eck-diagnostics-*.zip <namespace>/events.json | \
  jq '.items[] | select(.reason=="FailedCreate")'
```

**When to collect:**
- Deployments show `READY: 0/1` or `UP-TO-DATE: 0`
- Reports of "no pods deployed"
- Any new ECK deployment issue

### Step 2: Check Kubernetes Security Policies

Most "no pods created" issues stem from Kubernetes security policies blocking admission.

```bash
# Check namespace Pod Security labels
kubectl get namespace <namespace> -o yaml | grep pod-security

# Check for FailedCreate events
kubectl get events -n <namespace> | grep FailedCreate

# Check deployment status
kubectl get deployment -n <namespace>
```

**Common patterns:**

| Symptom | Likely Cause | Action |
|---------|--------------|--------|
| `UP-TO-DATE: 0` | Kubernetes blocking pod creation | Check events for `FailedCreate` |
| "violates PodSecurity" in events | Security policy violation | See the Kubernetes troubleshooting page |
| "exceeded quota" in events | Resource quota limit | Run `kubectl describe quota -n <namespace>` |

### Step 3: Verify Pod Status First

Operator errors are often a symptom of pods that were never created.

```bash
kubectl get pods -n <namespace>
```

**Decision point:**
- **No pods (`UP-TO-DATE: 0`)?** → Kubernetes-layer issue (check events, security policies)
- **Pods exist but failing?** → Application-layer issue (check pod logs)

**Important:** Don't investigate operator errors (401, 503) before verifying that pods exist.
````
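
The decision point in Step 3 could also be expressed as a small shell helper, sketched here under stated assumptions. The function name `triage_layer` and its output strings are hypothetical. The helper only distinguishes "no pods" from "some pods", so it is a rough first filter and not a full diagnosis.

```bash
#!/usr/bin/env bash
# Hypothetical helper: takes the number of pods found in the namespace
# and prints which layer to investigate first.
triage_layer() {
  local pod_count="$1"
  if [ "$pod_count" -eq 0 ]; then
    echo "kubernetes-layer: check events and security policies"
  else
    echo "application-layer: check pod logs"
  fi
}

# Usage against a live cluster (requires kubectl, shown as a comment only):
#   pod_count=$(kubectl get pods -n <namespace> --no-headers 2>/dev/null | wc -l | tr -d ' ')
#   triage_layer "$pod_count"
```

Keeping the decision in a function that takes a plain number keeps it separate from the cluster query, so it can be checked without cluster access.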

---

## Rationale

**Problem:** Users often investigate application-layer errors (authentication, connectivity) for days without first checking if pods were ever created. Kubernetes security policies silently block pod creation at the admission layer.

**Impact:** This addition provides a clear entry point that catches Kubernetes-layer issues immediately, reducing multi-day investigations to hours.

**Placement:** Placing the section at the top of the troubleshooting workflow ensures that all users see these critical checks first.


### Resources

**Target Page:**  
https://www.elastic.co/docs/troubleshoot/deployments/cloud-on-k8s/troubleshooting-methods


### Which documentation set does this change impact?

Elastic On-Prem only

### Feature differences

N/A

### What release is this request related to?

9.1

### Serverless release

N/A

### Collaboration model

The documentation team

### Point of contact.

**Main contact:** @eedugon 

**Stakeholders:** @damianpfister 

