<h3>🛠️ Advanced NumPy: Aggregations & Broadcasting</h3>


Problem 1: The Multi-Dimensional Sensor Grid
Scenario: You have a 3D array representing temperature readings from a grid of sensors over time.

Structure: (4 days, 5 sensors per row, 2 rows of sensors).

Data: data = np.random.randint(20, 45, (4, 5, 2))

Task:

Calculate the daily average temperature for the entire grid.

Find the highest temperature recorded for each specific sensor across the 4-day period.

Determine the variance of temperatures for each row of sensors, averaged over the 4 days.

In [20]:
data

array([[[38, 37],
        [26, 36],
        [30, 22],
        [43, 24],
        [20, 22]],

       [[37, 42],
        [36, 42],
        [40, 34],
        [21, 35],
        [21, 26]],

       [[26, 26],
        [28, 41],
        [26, 26],
        [43, 28],
        [39, 21]],

       [[30, 32],
        [26, 41],
        [44, 38],
        [29, 25],
        [23, 23]]])

In [6]:
import numpy as np
data = np.random.randint(20, 45, (4, 5, 2))

# 1. Daily Average (Collapse sensor dimensions)
daily_avg = data.mean(axis=(1, 2))

# 2. Max per sensor across time (Collapse day dimension)
sensor_max = data.max(axis=0)

# 3. Variance per row: variance across the 5 sensors (axis 1), then mean over the days (axis 0); axis 2 (the row) is kept
row_variance = data.var(axis=1).mean(axis=0)

In [10]:
daily_avg = data.mean(axis=(1, 2))
data.shape

(4, 5, 2)

In [19]:
print(data.var(axis=1))
print(data.var(axis=1).mean(axis= 0))


[[67.84 46.56]
 [68.4  35.36]
 [51.44 45.04]
 [52.24 49.36]]
[59.98 44.08]


In [16]:
data.mean(axis = (0,2))

array([33.5  , 34.5  , 32.5  , 31.   , 24.375])
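
The three aggregations in Problem 1 differ only in which axes they collapse, so checking the result shapes is a quick way to confirm each one. The following is a minimal sketch, assuming a seeded generator (`np.random.default_rng(0)`, which is not part of the original notebook) so the run is reproducible; the values will differ from the array printed above, but the shapes will not.

```python
import numpy as np

# Assumption: a seeded Generator stands in for np.random.randint for reproducibility.
rng = np.random.default_rng(0)
data = rng.integers(20, 45, size=(4, 5, 2))  # (days, sensors per row, rows)

daily_avg = data.mean(axis=(1, 2))            # collapse sensors and rows: one value per day
sensor_max = data.max(axis=0)                 # collapse days: one value per sensor position
row_variance = data.var(axis=1).mean(axis=0)  # variance within each row, then mean over days

print(daily_avg.shape)     # (4,)
print(sensor_max.shape)    # (5, 2)
print(row_variance.shape)  # (2,)
```

Whatever axis is named in `axis=` disappears from the result, which is why `data.mean(axis=(0, 2))` above returns five values, one per sensor position along axis 1.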

Problem 2: Global Sales Normalization
Scenario: You have a 2D array sales of shape (10, 3), where rows are different products and columns are sales in 3 different regions.

Create this array with random integers between 100 and 1000.

Normalize the data: Subtract the mean of each product from its sales across all regions, and then divide by the standard deviation of that specific product.

Verify that the new mean for each product row is approximately 0 and the standard deviation is 1. Note: This must be done entirely through broadcasting.

In [1]:
import numpy as np
sales = np.random.randint(100, 1000, (10, 3))

# Calculate stats per product (row)
row_means = sales.mean(axis=1, keepdims=True)
row_stds = sales.std(axis=1, keepdims=True)

# Broadcast subtraction and division
normalized_sales = (sales - row_means) / row_stds

print(normalized_sales.mean(axis=1)) # Should be approximately 0 (floating-point round-off)
print(normalized_sales.std(axis=1))  # Should be approximately 1

[ 0.00000000e+00  1.48029737e-16  7.40148683e-17 -7.40148683e-17
  7.40148683e-17  0.00000000e+00  2.40548322e-16  1.48029737e-16
 -3.33066907e-16  0.00000000e+00]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]


In [21]:
sales

array([[190, 295, 406],
       [307, 606, 910],
       [392, 102, 815],
       [297, 881, 881],
       [185, 208, 874],
       [656, 650, 124],
       [865, 458, 716],
       [789, 921, 197],
       [424, 303, 430],
       [658, 609, 236]])

In [28]:
sales.mean(axis=1)

array([297.        , 607.66666667, 436.33333333, 686.33333333,
       422.33333333, 476.66666667, 679.66666667, 635.66666667,
       385.66666667, 501.        ])

In [24]:
sales.mean(axis=1, keepdims =True).shape

(10, 1)
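
The `(10, 1)` shape is what lets the per-product statistics broadcast against the `(10, 3)` sales array. A minimal sketch of the difference, using only the first four rows of the array printed above to keep it short (the names `flat_means`, `col_means` and `broadcast_failed` are illustrative, not from the notebook):

```python
import numpy as np

# First four rows of the sales array shown above: shape (4, 3).
sales = np.array([[190, 295, 406],
                  [307, 606, 910],
                  [392, 102, 815],
                  [297, 881, 881]])

flat_means = sales.mean(axis=1)                # shape (4,)
col_means = sales.mean(axis=1, keepdims=True)  # shape (4, 1)

# (4, 3) - (4, 1) broadcasts: the length-1 axis is stretched across the 3 regions.
centered = sales - col_means

# (4, 3) - (4,) does not: the trailing dimensions 3 and 4 are incompatible.
try:
    sales - flat_means
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

print(centered.mean(axis=1))  # approximately 0 for every row
print(broadcast_failed)       # True
```

Without `keepdims=True`, the same effect can be had by reshaping afterwards, for example `flat_means[:, np.newaxis]`.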

Problem 3: The Geometric Transformation
Scenario: You are given a set of 2D coordinates representing the corners of a square.

coords = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])

Use broadcasting to shift the square 5 units to the right and 3 units up.

Use a ufunc to scale the coordinates by a factor of 2.5 relative to the origin.

Calculate the Euclidean distance of each transformed coordinate from the origin $(0,0)$ without using a loop.

In [32]:
import numpy as np
coords = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])

# 1. Shift
shifted = coords + np.array([5, 3])

# 2. Scale
scaled = shifted * 2.5

# 3. Euclidean Distance: sqrt(x^2 + y^2)
distances = np.sqrt(np.sum(np.square(scaled), axis=1))

In [34]:
distances

array([14.57737974, 16.00781059, 18.02775638, 16.77050983])

In [30]:
squareCoords = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
squareCoords[:,0] += 5
squareCoords[:,1] += 3

In [31]:
squareCoords

array([[5, 3],
       [5, 4],
       [6, 4],
       [6, 3]])
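
The in-place column shift above works because the result stays integer, but the same pattern cannot be carried through to the scaling step. The sketch below shows why, then finishes the problem with an explicit ufunc call and `np.hypot` as one possible alternative to the `sqrt`/`sum`/`square` chain. The names `square` and `inplace_scale_ok` are illustrative only.

```python
import numpy as np

square = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])  # integer dtype
square += np.array([5, 3])  # in-place shift is fine: the result is still integer

# In-place scaling by 2.5 would have to store floats in an integer array,
# which NumPy refuses under its default casting rule.
try:
    square *= 2.5
    inplace_scale_ok = True
except TypeError:
    inplace_scale_ok = False

scaled = np.multiply(square, 2.5)                 # explicit ufunc; returns a new float array
distances = np.hypot(scaled[:, 0], scaled[:, 1])  # sqrt(x**2 + y**2) for each point

print(inplace_scale_ok)
print(distances)
```

The distances agree with the values computed in the solution cell; `shifted * 2.5` in that cell dispatches to the same `np.multiply` ufunc.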

Problem 4: Conditional Quality Control
Scenario: A factory produces metal rods. You have an array of lengths (in cm):

rods = np.array([10.1, 9.8, 10.5, 10.0, 9.9, 10.2, 11.0, 9.0, 10.1, 10.0])

Calculate the mean and standard deviation of the batch.

A rod is considered "Defective" if it is more than 2 standard deviations away from the mean.

Use aggregations and boolean logic to find the percentage of defective rods in this batch.

In [3]:
import numpy as np
rods = np.array([10.1, 9.8, 10.5, 10.0, 9.9, 10.2, 11.0, 9.0, 10.1, 10.0])

mu, sigma = rods.mean(), rods.std()
# Boolean mask for outliers
defective = np.abs(rods - mu) > (2 * sigma)

percentage = (np.sum(defective) / rods.size) * 100

In [36]:
defective = np.abs(rods - mu) > (2 * sigma)
defective.mean()

0.1
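
The `0.1` above is a fraction, since the mean of a boolean mask is the share of `True` entries; multiplying by 100 gives the percentage the task asks for. A short sketch that also lists which rods are flagged (the name `flagged` is illustrative):

```python
import numpy as np

rods = np.array([10.1, 9.8, 10.5, 10.0, 9.9, 10.2, 11.0, 9.0, 10.1, 10.0])

mu, sigma = rods.mean(), rods.std()  # population std (ddof=0), NumPy's default
defective = np.abs(rods - mu) > 2 * sigma

flagged = rods[defective]            # the actual lengths that fail the check
percentage = defective.mean() * 100  # mean of a boolean mask = fraction of True values

print(flagged)
print(percentage)
```

Only the 9.0 cm rod is flagged. The 11.0 cm rod sits about 0.94 cm from the mean of 10.06, just inside the 2-sigma threshold of roughly 0.96 cm.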

Problem 5: The Weighted GPA Calculator
Scenario: A student has taken 5 courses with different credit weights.

grades = np.array([85, 92, 78, 88, 95])

weights = np.array([3, 4, 3, 2, 5])

Calculate the weighted average of the grades using NumPy ufuncs.

If the student receives a "bonus" of 5 points on every grade where they scored below 90, update the grades array.

Re-calculate the new weighted average after the bonus.

In [44]:
import numpy as np
grades = np.array([85, 92, 78, 88, 95])
weights = np.array([3, 4, 3, 2, 5])

# 1. Weighted Average: (Values * Weights).sum() / Weights.sum()
weighted_avg = np.sum(grades * weights) / weights.sum()

# 2. Conditional Bonus
grades[grades < 90] += 5

# 3. New Weighted Average
new_avg = np.sum(grades * weights) / weights.sum()

In [40]:
import numpy as np
grades = np.array([85, 92, 78, 88, 95])
weights = np.array([3, 4, 3, 2, 5])

In [43]:
grades[grades < 90]

array([85, 78, 88])

In [46]:
print(weighted_avg)
print(new_avg)

88.70588235294117
91.05882352941177


In [47]:
np.average(grades, weights=weights)

91.05882352941177
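
`np.average` returns the post-bonus value here because `grades[grades < 90] += 5` modified `grades` in place. If the original grades need to survive, or the cell might be re-run, one option is a non-mutating version built on `np.where`. This is a sketch of that alternative, not the notebook's method, and `boosted`, `before` and `after` are illustrative names:

```python
import numpy as np

grades = np.array([85, 92, 78, 88, 95])
weights = np.array([3, 4, 3, 2, 5])

# np.where builds a new array, so the original grades stay untouched
# and re-running this code cannot apply the bonus twice.
boosted = np.where(grades < 90, grades + 5, grades)

before = np.average(grades, weights=weights)
after = np.average(boosted, weights=weights)

print(boosted)
print(before, after)
```

Both averages match the values printed by the solution cell, while `grades` keeps its original contents.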