# Multi-Dimensional Arrays with NumPy

## Context
As an SRE, you often deal with large streams of numerical data: CPU utilization percentages, network throughput in bytes, or memory consumption across hundreds of containers. Plain Python lists store every number as a separate object, which makes them slow and memory-hungry at this scale.

## Objectives
- Understand NumPy's core object: the `ndarray` (N-dimensional array).
- Learn how to perform vectorized operations on arrays without slow `for` loops.
- Apply basic statistical functions to uncover anomalies in synthetic infrastructure metrics.

## Expected Outcome
- The ability to quickly analyze thousands of datapoints, calculate baselines, and detect spikes using vectorized array operations instead of hand-written loops.

In [None]:
import numpy as np

### 1. Creating Arrays (Simulating Metrics)
An `ndarray` allows you to store and manipulate large datasets efficiently. Let's create some arrays simulating CPU load percentages across different servers.

In [None]:
# 1D Array: CPU load recorded every minute for 1 server
cpu_server_1 = np.array([45, 48, 52, 50, 47])
print("1D Array (Server 1):", cpu_server_1)

# 2D Array (Matrix): CPU load for 3 separate servers over 4 minutes
cpu_cluster = np.array([
    [45, 48, 52, 50], # Server A
    [20, 22, 21, 23], # Server B
    [85, 88, 90, 89]  # Server C (running hot)
])
print("\n2D Array (Cluster Metrics):\n", cpu_cluster)

### 2. Inspecting Array Properties
When you load a dataset from a log file, the first step is understanding its dimensions.

In [None]:
print("Shape of cpu_cluster:", cpu_cluster.shape)  # Prints (3, 4): 3 servers x 4 minutes
print("Number of dimensions:", cpu_cluster.ndim)
print("Total data points:", cpu_cluster.size)
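
The same inspection applies to data read from a file. Below is a minimal sketch, assuming a comma-separated log with one row per server and one column per minute; the `raw_log` contents are a hypothetical stand-in for a real file, held in memory with `io.StringIO`.

```python
import io
import numpy as np

# Hypothetical log excerpt (an assumption for illustration):
# one row per server, one column per minute
raw_log = io.StringIO(
    "45,48,52,50\n"
    "20,22,21,23\n"
    "85,88,90,89\n"
)

# np.loadtxt parses delimited text into an ndarray (float64 by default)
loaded = np.loadtxt(raw_log, delimiter=",")

print("Shape:", loaded.shape)            # rows x columns
print("Dtype:", loaded.dtype)            # element type
print("Bytes in memory:", loaded.nbytes)  # size * bytes per element
```

Checking `dtype` and `nbytes` alongside `shape` helps confirm that the file was parsed as numbers and shows how much memory the array occupies.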

### 3. Generating Data
Often you need to generate placeholder metrics or baseline limits.

In [None]:
# Create an array representing a 0% load baseline for 5 servers
zeros_array = np.zeros(5)
print("Zero Baseline:\n", zeros_array)

# Create an array representing a max capacity limit (100%)
max_capacity = np.full((3, 4), 100)
print("\nMax Capacity Matrix:\n", max_capacity)

# Generate a sequence of timestamps in seconds: 0, 10, ..., 50 (the stop value 60 is excluded)
timestamps = np.arange(0, 60, 10)
print("\nTimestamps:", timestamps)
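
For synthetic metrics that look more like real load, a seeded random generator can be used. This is a minimal sketch, assuming CPU load is roughly normally distributed around 50% with a standard deviation of 5; the distribution, the seed, and the names `synthetic_cpu` and `sample_times` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

# Seeded generator so the synthetic data is reproducible (seed value is arbitrary)
rng = np.random.default_rng(seed=42)

# Assumed model: 3 servers, 60 one-minute samples, load ~ Normal(50, 5)
synthetic_cpu = rng.normal(loc=50, scale=5, size=(3, 60))

# Keep values inside the valid percentage range
synthetic_cpu = np.clip(synthetic_cpu, 0, 100)

# np.linspace includes the endpoint by default, unlike np.arange
sample_times = np.linspace(0, 60, num=7)

print("Synthetic shape:", synthetic_cpu.shape)
print("Sample times:", sample_times)
```

`np.linspace` is convenient when the last timestamp must be part of the sequence, since `np.arange` stops before its `stop` argument.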

### 4. Indexing and Slicing (Isolating Servers)
If Server C is throwing alerts, we need to extract only its data from the matrix.

In [None]:
# Accessing specific elements (indices are zero-based: row 1 is Server B, column 1 is Minute 2)
print("Load for Server B at Minute 2:", cpu_cluster[1, 1])

# Slicing entire rows/columns
print("All metrics for Server C:", cpu_cluster[2, :])
print("Metrics across all servers at Minute 3:", cpu_cluster[:, 2])
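
Slices can also select a sub-block of the matrix, and basic slices return views that share memory with the original array. The sketch below rebuilds `cpu_cluster` so it runs on its own; the variable names `last_two_minutes`, `server_c_view`, and `server_c_copy` are illustrative.

```python
import numpy as np

cpu_cluster = np.array([
    [45, 48, 52, 50],  # Server A
    [20, 22, 21, 23],  # Server B
    [85, 88, 90, 89],  # Server C
])

# Negative indices count from the end: the last two minutes for every server
last_two_minutes = cpu_cluster[:, -2:]
print("Last two minutes:\n", last_two_minutes)

# A basic slice is a view: it shares memory with cpu_cluster
server_c_view = cpu_cluster[2, :]
print("View shares memory:", np.shares_memory(server_c_view, cpu_cluster))

# .copy() gives independent data that is safe to modify
server_c_copy = cpu_cluster[2, :].copy()
server_c_copy[0] = 0
print("Original value is untouched:", cpu_cluster[2, 0])
```

Taking an explicit copy is a reasonable habit before modifying extracted data, because writing into a view would also change the source matrix.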

### 5. Element-wise Operations & Broadcasting
NumPy's greatest strength is performing mathematical operations on entire arrays at once, without `for` loops. This is called *vectorization*. *Broadcasting* extends it to operands of different shapes, such as an array and a single scalar.

In [None]:
ram_used_gb = np.array([16, 32, 8])
ram_total_gb = np.array([64, 64, 16])

# Element-wise calculation to find utilization percentage
utilization = (ram_used_gb / ram_total_gb) * 100
print("RAM Utilization %:", utilization)

# Broadcasting: Subtracting a scalar from an array
# Imagine a new software update reduces RAM usage by 2GB across all servers
new_ram_used = ram_used_gb - 2
print("\nNew RAM Usage:", new_ram_used)
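
Broadcasting also works between a 2D matrix and a 1D array, as long as the shapes are compatible. The sketch below subtracts a per-server baseline from every minute of the cluster matrix; the `baselines` values are hypothetical and chosen only for illustration.

```python
import numpy as np

cpu_cluster = np.array([
    [45, 48, 52, 50],  # Server A
    [20, 22, 21, 23],  # Server B
    [85, 88, 90, 89],  # Server C
])

# Hypothetical expected load per server (an assumption for this example)
baselines = np.array([50, 20, 85])

# baselines has shape (3,); adding an axis makes it (3, 1),
# which broadcasts across the 4 columns of the (3, 4) matrix
deviation = cpu_cluster - baselines[:, np.newaxis]

print("Deviation from baseline:\n", deviation)
```

Without `np.newaxis`, the shapes `(3, 4)` and `(3,)` do not line up and NumPy raises a `ValueError`, because broadcasting compares dimensions starting from the last axis.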

### 6. Statistical Operations (Finding Anomalies)
Using built-in stats functions allows you to immediately identify outliers in your cluster.

In [None]:
print("Matrix:\n", cpu_cluster)

# Global stats
print("\nGlobal Max CPU Load:", np.max(cpu_cluster))
print("Global Mean CPU Load:", np.mean(cpu_cluster))

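# Axis-specific stats: axis=0 aggregates down the rows (one result per column, i.e. per minute);
# axis=1 aggregates across the columns (one result per row, i.e. per server)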
# Calculate the mean load for each individual server
server_means = np.mean(cpu_cluster, axis=1)
print("\nMean Load per Server:", server_means)

# Calculate the 99th percentile across all data (crucial for SLAs)
p99 = np.percentile(cpu_cluster, 99)
print("99th Percentile CPU Load:", p99)
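
One simple way to turn these statistics into spike detection is a z-score: how many standard deviations each sample lies from the mean. This is a minimal sketch, not a production detector; the `cpu_series` data and the threshold of 2.0 are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical per-minute CPU load for one server, with one obvious spike
cpu_series = np.array([45, 48, 52, 50, 47, 95, 49, 46])

baseline = np.mean(cpu_series)  # average load
spread = np.std(cpu_series)     # population standard deviation

# z-score: distance from the baseline in units of standard deviation
z_scores = (cpu_series - baseline) / spread

# Assumed threshold: flag samples more than 2 standard deviations away
threshold = 2.0
spike_indices = np.flatnonzero(np.abs(z_scores) > threshold)

print("Baseline:", baseline)
print("Spike at minute index:", spike_indices)
print("Spike values:", cpu_series[spike_indices])
```

A large spike inflates both the mean and the standard deviation, so on real data a median-based baseline may be more robust; the z-score is used here only because it follows directly from `np.mean` and `np.std`.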

### 7. Boolean Masks (Filtering Alerts)
We can use arrays of booleans to filter raw data and retrieve only the values that breach our thresholds.

In [None]:
# Find all instances where CPU load exceeded 80%
alert_mask = cpu_cluster > 80
print("Alert Mask:\n", alert_mask)

# Use the mask to extract the values that breach the threshold (returned as a flat 1D array)
dangerous_values = cpu_cluster[alert_mask]
print("\nValues exceeding 80%:", dangerous_values)
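
Because masked extraction flattens the result, it loses track of which server produced each value. The sketch below shows one way to recover that, reducing the mask along an axis and listing the breach positions; the `server_names` array is a hypothetical label set added for illustration.

```python
import numpy as np

cpu_cluster = np.array([
    [45, 48, 52, 50],  # Server A
    [20, 22, 21, 23],  # Server B
    [85, 88, 90, 89],  # Server C
])
server_names = np.array(["A", "B", "C"])  # hypothetical labels

alert_mask = cpu_cluster > 80

# True for each server (row) with at least one breach
servers_alerting = np.any(alert_mask, axis=1)

# Number of breaching samples per server (True counts as 1)
breach_counts = alert_mask.sum(axis=1)

# (row, column) index pairs of every breach
breach_positions = np.argwhere(alert_mask)

print("Alerting servers:", server_names[servers_alerting])
print("Breaches per server:", breach_counts)
print("Breach positions (server, minute index):\n", breach_positions)
```

The same boolean array works for both jobs: indexing the data pulls out the values, while reducing it with `np.any` or `sum` summarizes where the breaches occurred.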