# Task 1: Generate a reproducible sensor matrix
Create a 2D NumPy array named readings with shape (360, 4) to represent six hours of minute-level readings from four sensors. Use a fixed seed and generate values that roughly follow a normal distribution around 50 with some variation. After creating the array, print its shape, dtype, and the first three rows to confirm the structure.

In [1]:
import numpy as np

rng = np.random.default_rng(seed=0)

readings = rng.normal(50, 10, size=(360, 4))

In [2]:
readings

array([[51.25730221, 48.67895137, 56.4042265 , 51.04900117],
       [44.64330627, 53.61595055, 63.04000045, 59.47080963],
       [42.96264764, 37.34578529, 43.76725537, 50.41325979],
       ...,
       [43.66139802, 52.96379038, 47.42410516, 36.32827186],
       [49.73843213, 65.95702441, 70.02532317, 44.3295854 ],
       [59.05274604, 51.32529455, 52.8388663 , 62.67371213]],
      shape=(360, 4))

In [3]:
print(readings.shape)

(360, 4)


In [4]:
print(readings.dtype)

float64


In [5]:
print(readings[:3])

[[51.25730221 48.67895137 56.4042265  51.04900117]
 [44.64330627 53.61595055 63.04000045 59.47080963]
 [42.96264764 37.34578529 43.76725537 50.41325979]]


# Task 2: Apply vectorized transformations
Create a new array scaled_readings that rescales the original readings to a 0–1 range using vectorized operations. Do not use loops. Then create a second array centered_readings by subtracting the column means from the original readings. Verify both arrays have the same shape as the original.

In [6]:
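# Global extremes across all sensors and time steps (scalars)
readings_min = readings.min()
readings_max = readings.max()

# Vectorized min-max scaling: every element is mapped into [0, 1]
scaled_readings = (readings - readings_min) / (readings_max - readings_min)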

In [7]:
# column_means has shape (4,), so it broadcasts across all 360 rows
column_means = readings.mean(axis=0)
centered_readings = readings - column_means

In [8]:
print(f"Original shape:  {readings.shape}")
print(f"Scaled shape:    {scaled_readings.shape}")
print(f"Centered shape:  {centered_readings.shape}")

Original shape:  (360, 4)
Scaled shape:    (360, 4)
Centered shape:  (360, 4)
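
The scaling cell above uses the global minimum and maximum, so all four sensors share a single scale. If each sensor should instead span 0 to 1 on its own, a per-column variant is one option. The following is a minimal sketch under that assumption; the names col_min, col_max, and scaled_per_sensor are illustrative and do not appear in the notebook.

```python
import numpy as np

# Recreate the readings so this sketch runs on its own
rng = np.random.default_rng(seed=0)
readings = rng.normal(50, 10, size=(360, 4))

# Per-sensor extremes: axis=0 collapses the time dimension, leaving shape (4,)
col_min = readings.min(axis=0)
col_max = readings.max(axis=0)

# Broadcasting applies each column's own min and range to that column
scaled_per_sensor = (readings - col_min) / (col_max - col_min)

print(scaled_per_sensor.shape)
print(scaled_per_sensor.min(axis=0))  # each column bottoms out at 0
print(scaled_per_sensor.max(axis=0))  # each column tops out at 1
```

Which variant is appropriate depends on whether readings need to stay comparable across sensors (global scaling) or only within a sensor (per-column scaling).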


# Task 3: Filter outliers with boolean masks
Define outliers as any reading above 80 or below 20. Build a boolean mask that identifies outliers across the full matrix. Use the mask to compute two values: the count of outlier readings and the percentage of total readings that are outliers. Then create a cleaned array cleaned_readings where outliers are replaced with np.nan. Confirm that the number of np.nan values matches your outlier count.

In [9]:
outlier_mask = (readings > 80) | (readings < 20)

outlier_count = np.sum(outlier_mask)
total_elements = readings.size
outlier_percentage = (outlier_count / total_elements) * 100

# Copy first so the original readings stay untouched
cleaned_readings = readings.copy()
cleaned_readings[outlier_mask] = np.nan

nan_count = np.isnan(cleaned_readings).sum()

print(f"Outlier Count: {outlier_count}")
print(f"Outlier Percentage: {outlier_percentage:.2f}%")
print(f"Number of NaNs in cleaned array: {nan_count}")
print(f"Verification Match: {outlier_count == nan_count}")

Outlier Count: 5
Outlier Percentage: 0.35%
Number of NaNs in cleaned array: 5
Verification Match: True
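
Once outliers are replaced with np.nan, ordinary aggregations such as mean propagate the NaN into their result. The sketch below shows one way to summarize the cleaned array anyway, using np.nanmean. It is an illustration rather than part of the task; plain_means and nan_aware_means are hypothetical names.

```python
import numpy as np

# Recreate the readings and the cleaned array so this sketch runs on its own
rng = np.random.default_rng(seed=0)
readings = rng.normal(50, 10, size=(360, 4))

outlier_mask = (readings > 80) | (readings < 20)
cleaned_readings = readings.copy()
cleaned_readings[outlier_mask] = np.nan

# A plain mean returns NaN for any column that contains at least one NaN
plain_means = cleaned_readings.mean(axis=0)

# np.nanmean ignores NaN entries when averaging
nan_aware_means = np.nanmean(cleaned_readings, axis=0)

print(f"Plain means:     {plain_means}")
print(f"NaN-aware means: {np.round(nan_aware_means, 2)}")
```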


# Task 4: Compute summaries by sensor and over time
Compute the mean and standard deviation for each sensor using axis-based aggregation. Store the results in two 1D arrays named sensor_means and sensor_stds. Then compute the average reading at each time step across sensors and store it in a 1D array named time_means. Validate shapes: the sensor summaries should have length 4, and the time summary should have length 360.

In [10]:
sensor_means = readings.mean(axis=0)
sensor_stds = readings.std(axis=0)

time_means = readings.mean(axis=1)

print(f"Sensor Means shape: {sensor_means.shape}")
print(f"Sensor Stds shape:  {sensor_stds.shape}")
print(f"Time Means shape:   {time_means.shape}")

Sensor Means shape: (4,)
Sensor Stds shape:  (4,)
Time Means shape:   (360,)
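
Note that ndarray.std defaults to ddof=0, the population standard deviation. If the 360 rows are treated as a sample, ddof=1 may be the more appropriate choice. The sketch below compares the two under that assumption; population_stds and sample_stds are illustrative names.

```python
import numpy as np

# Recreate the readings so this sketch runs on its own
rng = np.random.default_rng(seed=0)
readings = rng.normal(50, 10, size=(360, 4))
n = readings.shape[0]

# Default ddof=0 divides by n (population standard deviation)
population_stds = readings.std(axis=0)

# ddof=1 divides by n - 1 (sample standard deviation)
sample_stds = readings.std(axis=0, ddof=1)

# The two differ only by the factor sqrt(n / (n - 1))
print(np.allclose(sample_stds, population_stds * np.sqrt(n / (n - 1))))
```

With 360 rows the difference is small, but it is worth stating in a report which convention was used.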


# Task 5: Build a validation report (Part 1)
Create a short report dictionary with keys total_readings, outlier_count, outlier_percent, sensor_means, and sensor_stds. Convert this report into a readable string and print it in the notebook. Include at least one simple validation, such as confirming that total_readings equals readings.size.

In [11]:
report = {
    "total_readings": readings.size,
    "outlier_count": outlier_count,
    "outlier_percent": outlier_percentage,
    "sensor_means": sensor_means,
    "sensor_stds": sensor_stds
}
is_valid = report["total_readings"] == readings.size

In [12]:
report_string = f"""
--- SENSOR DATA VALIDATION REPORT ---
Total Readings:     {report['total_readings']}
Validation Status:  {'PASS' if is_valid else 'FAIL'}

Outlier Count:      {report['outlier_count']}
Outlier Percentage: {report['outlier_percent']:.2f}%

Sensor Means:       {np.round(report['sensor_means'], 2)}
Sensor Std Devs:    {np.round(report['sensor_stds'], 2)}
-------------------------------------
"""

print(report_string)


--- SENSOR DATA VALIDATION REPORT ---
Total Readings:     1440
Validation Status:  PASS

Outlier Count:      5
Outlier Percentage: 0.35%

Sensor Means:       [49.84 49.3  49.37 51.  ]
Sensor Std Devs:    [ 9.75  9.59 10.   10.19]
-------------------------------------



# Task 6: Create a reproducible simulation matrix
Initialize a random number generator with a fixed seed. Create a 2D array named sim with shape (1000, 6) representing 1000 trials for 6 scenarios. Use a normal distribution with mean 0 and standard deviation 1. After creating the array, print its shape, dtype, and the first three rows to confirm the structure.

In [13]:
rng = np.random.default_rng(seed=0)

sim = rng.normal(loc=0, scale=1, size=(1000, 6))

print(f"Shape: {sim.shape}")
print(f"Data Type: {sim.dtype}")
print(sim[:3])

Shape: (1000, 6)
Data Type: float64
[[ 0.12573022 -0.13210486  0.64042265  0.10490012 -0.53566937  0.36159505]
 [ 1.30400005  0.94708096 -0.70373524 -1.26542147 -0.62327446  0.04132598]
 [-2.32503077 -0.21879166 -1.24591095 -0.73226735 -0.54425898 -0.31630016]]


# Task 7: Apply broadcasting for scenario adjustments
Create a 1D array named scenario_shift of length 6 containing small offsets such as [-0.3, -0.1, 0.0, 0.1, 0.2, 0.4]. Use broadcasting to add these offsets to sim and store the result in adjusted_sim. Verify that adjusted_sim has the same shape as sim and that the mean of each column changes in the direction of the offsets.

In [14]:
scenario_shift = np.array([-0.3, -0.1, 0.0, 0.1, 0.2, 0.4])
adjusted_sim = sim + scenario_shift

shape_match = adjusted_sim.shape == sim.shape

original_means = sim.mean(axis=0)
adjusted_means = adjusted_sim.mean(axis=0)
mean_diffs = adjusted_means - original_means

print(f"Shape match: {shape_match} ({adjusted_sim.shape})")
print(f"Shift vector:    {scenario_shift}")
print(f"Actual mean change: {np.round(mean_diffs, 2)}")

Shape match: True ((1000, 6))
Shift vector:    [-0.3 -0.1  0.   0.1  0.2  0.4]
Actual mean change: [-0.3 -0.1  0.   0.1  0.2  0.4]
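
The addition works because NumPy aligns shapes from the trailing dimension: (6,) lines up with the last axis of (1000, 6). The following sketch checks a few shape combinations with np.broadcast_shapes (available in NumPy 1.20 and later). The per-trial shapes (1000,) and (1000, 1) are hypothetical and included only for contrast.

```python
import numpy as np

sim_shape = (1000, 6)

# A length-6 vector matches the trailing axis, so it broadcasts over rows
per_scenario = np.broadcast_shapes(sim_shape, (6,))
print(per_scenario)  # (1000, 6)

# A length-1000 vector does not match the trailing axis of size 6
incompatible = False
try:
    np.broadcast_shapes(sim_shape, (1000,))
except ValueError as exc:
    incompatible = True
    print(f"Not broadcastable: {exc}")

# Reshaped to a column, the same 1000 values broadcast over columns instead
per_trial = np.broadcast_shapes(sim_shape, (1000, 1))
print(per_trial)  # (1000, 6)
```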


# Task 8: Rank scenarios with sorting and partitioning
Compute the mean outcome per scenario from adjusted_sim and store it in a 1D array named scenario_means. Use np.argsort to get the ranking of scenarios from lowest to highest. Then use np.partition to identify the top two scenario means without fully sorting the array. Confirm that the top two values from partition match the two largest values from a full sort.

In [15]:
scenario_means = adjusted_sim.mean(axis=0)
rank_indices = np.argsort(scenario_means)

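# partition moves the two largest values into the last two slots,
# but does not guarantee their relative order
top_two_partition = np.partition(scenario_means, -2)[-2:]
full_sort_top_two = np.sort(scenario_means)[-2:]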

print(f"Scenario Means: {np.round(scenario_means, 3)}")
print(f"Ranking (Indices): {rank_indices}")
print(f"Top 2 (Partition): {np.sort(top_two_partition)}")
print(f"Top 2 (Full Sort): {full_sort_top_two}")
print(f"Verification Match: {np.allclose(np.sort(top_two_partition), full_sort_top_two)}")

Scenario Means: [-0.336 -0.05   0.019  0.064  0.195  0.4  ]
Ranking (Indices): [0 1 2 3 4 5]
Top 2 (Partition): [0.19477725 0.39966337]
Top 2 (Full Sort): [0.19477725 0.39966337]
Verification Match: True
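
np.partition returns values only. When the positions of the top scenarios are also needed without a full sort, np.argpartition is one option. The sketch below uses a small made-up array (example_means is hypothetical, not the notebook's scenario_means) to keep the result easy to check by eye.

```python
import numpy as np

# Hypothetical scenario means, chosen only for illustration
example_means = np.array([0.10, -0.40, 0.35, 0.00, 0.25, -0.10])

# Indices of the two largest values; their order is not guaranteed
top_two_idx = np.argpartition(example_means, -2)[-2:]

# Sort just those two indices by value (ascending) for a stable display
top_two_idx = top_two_idx[np.argsort(example_means[top_two_idx])]

print(top_two_idx)                 # [4 2]
print(example_means[top_two_idx])  # [0.25 0.35]
```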


# Task 9: Controlled randomness and reproducibility
Generate a second simulation matrix sim_2 using the same seed and confirm that it matches sim exactly. Then generate a third matrix sim_3 with a different seed and confirm that it differs. Include a short check such as comparing equality counts or using np.allclose to verify the difference.

In [16]:
rng_same = np.random.default_rng(seed=0)
sim_2 = rng_same.normal(loc=0, scale=1, size=(1000, 6))

rng_diff = np.random.default_rng(seed=16)
sim_3 = rng_diff.normal(loc=0, scale=1, size=(1000, 6))

match_1_2 = np.array_equal(sim, sim_2)
match_1_3 = np.array_equal(sim, sim_3)

equal_elements_3 = np.sum(sim == sim_3)

print(f"Sim and Sim2 match: {match_1_2}")
print(f"Sim and Sim3 match: {match_1_3}")
print(f"Equality counts: {equal_elements_3}")

Sim and Sim2 match: True
Sim and Sim3 match: False
Equality counts: 0
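
The match between sim and sim_2 depends on creating a fresh generator with the same seed. Drawing a second time from an existing generator continues from its advanced state and yields different values. A minimal sketch of that distinction follows; rng_a, rng_b, and the draw names are illustrative.

```python
import numpy as np

# Two consecutive draws from one generator: the state advances in between
rng_a = np.random.default_rng(seed=0)
first_draw = rng_a.normal(size=(1000, 6))
second_draw = rng_a.normal(size=(1000, 6))

# A fresh generator with the same seed restarts the sequence
rng_b = np.random.default_rng(seed=0)
fresh_draw = rng_b.normal(size=(1000, 6))

print(np.array_equal(first_draw, fresh_draw))   # True
print(np.array_equal(first_draw, second_draw))  # False
```

In a notebook this matters when cells are re-run out of order: re-executing only the sampling line without re-creating the generator will not reproduce the original matrix.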


# Task 10: Build a validation report (Part 2)
Create a small report dictionary with keys shape, scenario_means, top_two_indices, and top_two_values. Convert this report into a readable string and print it. Include at least one validation statement, such as confirming that scenario_means has length 6 and that the top_two_indices list has length 2.

In [18]:
top_two_values = np.sort(top_two_partition).tolist()
top_two_indices = rank_indices[-2:].tolist()

report_sim = {
    "shape": adjusted_sim.shape,
    "scenario_means": scenario_means.tolist(),
    "top_two_indices": top_two_indices,
    "top_two_values": top_two_values
}

valid_means_len = len(report_sim["scenario_means"]) == 6
valid_top_len = len(report_sim["top_two_indices"]) == 2

final_report = f"""
Matrix Shape:       {report_sim['shape']}
Scenarios Count:    {len(report_sim['scenario_means'])}

VALIDATION
- Scenario Means Length (6):  {'PASS' if valid_means_len else 'FAIL'}
- Top Two Indices Length (2): {'PASS' if valid_top_len else 'FAIL'}

RESULTS
Scenario Means:
{np.array2string(np.array(report_sim['scenario_means']), precision=3, separator=', ')}

Top Performing Scenarios:
- Indices: {report_sim['top_two_indices']}
- Values:  {np.round(report_sim['top_two_values'], 3).tolist()}
------------------------------------
"""

print(final_report)


Matrix Shape:       (1000, 6)
Scenarios Count:    6

VALIDATION
- Scenario Means Length (6):  PASS
- Top Two Indices Length (2): PASS

RESULTS
Scenario Means:
[-0.336, -0.05 ,  0.019,  0.064,  0.195,  0.4  ]

Top Performing Scenarios:
- Indices: [4, 5]
- Values:  [0.195, 0.4]
------------------------------------

