# **Task 1: Generate a reproducible sensor matrix**
Create a 2D NumPy array named readings with shape (360, 4) to represent six hours of minute-level readings from four sensors. Use a fixed seed and generate values that roughly follow a normal distribution around 50 with some variation. After creating the array, print its shape, dtype, and the first three rows to confirm the structure.

In [1]:
import numpy as np

In [2]:
random_state = 42  # note: not used below; the generators are seeded explicitly with seed=0

In [3]:
rng = np.random.default_rng(seed=0)
readings = rng.normal(50, 10, size=(360, 4))
# 50 = mean, 10 = standard deviation (how widely the readings spread around the mean)

In [4]:
readings

array([[51.25730221, 48.67895137, 56.4042265 , 51.04900117],
       [44.64330627, 53.61595055, 63.04000045, 59.47080963],
       [42.96264764, 37.34578529, 43.76725537, 50.41325979],
       ...,
       [43.66139802, 52.96379038, 47.42410516, 36.32827186],
       [49.73843213, 65.95702441, 70.02532317, 44.3295854 ],
       [59.05274604, 51.32529455, 52.8388663 , 62.67371213]],
      shape=(360, 4))

In [5]:
readings.shape

(360, 4)

In [6]:
readings.dtype

dtype('float64')

In [7]:
readings[0:3]

array([[51.25730221, 48.67895137, 56.4042265 , 51.04900117],
       [44.64330627, 53.61595055, 63.04000045, 59.47080963],
       [42.96264764, 37.34578529, 43.76725537, 50.41325979]])
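
The task asks to *print* the shape, dtype, and first rows, while the cells above rely on notebook display. A compact sketch of the whole task in one cell could look like this. It assumes the same seed and parameters as above; `readings_again` is an illustrative name that is not part of the original notebook.

```python
import numpy as np

# Same parameters as the notebook: seed 0, mean 50, standard deviation 10.
rng = np.random.default_rng(seed=0)
readings = rng.normal(50, 10, size=(360, 4))

# Print explicitly instead of relying on the cell's display of the last expression.
print("shape:", readings.shape)
print("dtype:", readings.dtype)
print("first three rows:\n", readings[:3])

# Re-creating a generator with the same seed reproduces the identical matrix.
readings_again = np.random.default_rng(seed=0).normal(50, 10, size=(360, 4))
print("reproducible:", np.array_equal(readings, readings_again))
```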

-----------------

# **Task 2: Apply vectorized transformations**
Create a new array scaled_readings that rescales the original readings to a 0–1 range using vectorized operations. Do not use loops. Then create a second array centered_readings by subtracting the column means from the original readings. Verify both arrays have the same shape as the original.

In [8]:
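scaled_readings = (readings - readings.min()) / (readings.max() - readings.min())
# Global min-max scaling: one min and one max taken over the whole matrix.
# The task does not say whether to scale globally or per sensor; per-sensor scaling would use readings.min(axis=0) and readings.max(axis=0).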

In [9]:
scaled_readings

array([[0.57787323, 0.54085699, 0.65176534, 0.57488274],
       [0.48291902, 0.61173529, 0.7470322 , 0.69579091],
       [0.45879055, 0.37815174, 0.47034194, 0.56575568],
       ...,
       [0.4688222 , 0.60237252, 0.52284171, 0.36354375],
       [0.55606748, 0.78891062, 0.84731738, 0.47841507],
       [0.68978896, 0.57884936, 0.60057904, 0.74177356]], shape=(360, 4))

In [10]:
centered_readings = readings - readings.mean(axis=0)
centered_readings

array([[  1.42172772,  -0.62277474,   7.03122873,   0.04413789],
       [ -5.19226822,   4.31422444,  13.66700268,   8.46594635],
       [ -6.87292684, -11.95594082,  -5.6057424 ,  -0.59160349],
       ...,
       [ -6.17417646,   3.66206427,  -1.94889261, -14.67659142],
       [ -0.09714235,  16.6552983 ,  20.6523254 ,  -6.67527788],
       [  9.21717155,   2.02356844,   3.46586852,  11.66884885]],
      shape=(360, 4))
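
The task also asks to verify that both arrays keep the original shape, which the cells above do not show. A minimal sketch of that check follows, together with the per-sensor scaling variant mentioned in the comment above. `col_min`, `col_max`, and `scaled_per_sensor` are hypothetical names introduced only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
readings = rng.normal(50, 10, size=(360, 4))

# Global scaling (as in the notebook): one min and one max for the whole matrix.
scaled_readings = (readings - readings.min()) / (readings.max() - readings.min())

# Per-sensor alternative: min and max per column, broadcast across the 360 rows.
col_min = readings.min(axis=0)   # shape (4,)
col_max = readings.max(axis=0)   # shape (4,)
scaled_per_sensor = (readings - col_min) / (col_max - col_min)

centered_readings = readings - readings.mean(axis=0)

# Verify that every derived array has the same shape as the original.
for name, arr in [("scaled", scaled_readings),
                  ("per-sensor", scaled_per_sensor),
                  ("centered", centered_readings)]:
    print(name, arr.shape, arr.shape == readings.shape)

# Sanity checks: scaled values span [0, 1]; each centered column has mean ~0.
print(scaled_readings.min(), scaled_readings.max())
print(np.allclose(centered_readings.mean(axis=0), 0))
```

Global scaling preserves the relative levels between sensors, while per-sensor scaling stretches each column to its own 0 to 1 range.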

---------------------

# **Task 3: Filter outliers with boolean masks**
Define outliers as any reading above 80 or below 20. Build a boolean mask that identifies outliers across the full matrix. Use the mask to compute two values: the count of outlier readings and the percentage of total readings that are outliers. Then create a cleaned array cleaned_readings where outliers are replaced with np.nan. Confirm that the number of np.nan values matches your outlier count.

In [11]:
outlier_mask = (readings < 20) | (readings > 80)
# | is element-wise OR; the parentheses are required because | binds more tightly than < and >

In [12]:
outlier_count = np.sum(outlier_mask)
# True counts as 1 and False as 0, so summing the mask counts the outliers

In [13]:
total_readings = readings.size
total_readings

1440

In [14]:
perc = (outlier_count/total_readings)*100
perc

np.float64(0.3472222222222222)

In [15]:
cleaned_readings = readings.copy()

In [16]:
cleaned_readings[outlier_mask] = np.nan
cleaned_readings

array([[51.25730221, 48.67895137, 56.4042265 , 51.04900117],
       [44.64330627, 53.61595055, 63.04000045, 59.47080963],
       [42.96264764, 37.34578529, 43.76725537, 50.41325979],
       ...,
       [43.66139802, 52.96379038, 47.42410516, 36.32827186],
       [49.73843213, 65.95702441, 70.02532317, 44.3295854 ],
       [59.05274604, 51.32529455, 52.8388663 , 62.67371213]],
      shape=(360, 4))

In [17]:
nan_count = np.sum(np.isnan(cleaned_readings))
nan_count

np.int64(5)

In [18]:
nan_count == outlier_count

np.True_
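
As an alternative to copying the array and then assigning through the mask, `np.where` can build the cleaned array in one step and leaves `readings` untouched. This is a sketch of that swapped-in approach, not the notebook's own method; it also shows that the mean of a boolean mask is the fraction of `True` entries.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
readings = rng.normal(50, 10, size=(360, 4))

outlier_mask = (readings < 20) | (readings > 80)
outlier_count = np.count_nonzero(outlier_mask)   # same count as np.sum(outlier_mask)
outlier_percent = outlier_mask.mean() * 100      # mean of a boolean array = fraction of True values

# np.where picks np.nan where the mask is True and the original value elsewhere.
cleaned_readings = np.where(outlier_mask, np.nan, readings)

nan_count = np.count_nonzero(np.isnan(cleaned_readings))
print(outlier_count, round(outlier_percent, 4), nan_count == outlier_count)
```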

-------------------------

# **Task 4: Compute summaries by sensor and over time**

Compute the mean and standard deviation for each sensor using axis-based aggregation. Store the results in two 1D arrays named sensor_means and sensor_stds. Then compute the average reading at each time step across sensors and store it in a 1D array named time_means. Validate shapes: the sensor summaries should have length 4, and the time summary should have length 360.

In [19]:
sensor_means = np.mean(readings, axis=0)
sensor_stds = np.std(readings, axis=0)
# axis=0 aggregates down the rows (over time), giving one value per sensor: shape (4,)

In [20]:
sensor_means
sensor_stds
# Jupyter displays only the last expression in a cell, so only sensor_stds appears below

array([ 9.74752295,  9.59309985,  9.99833748, 10.18720053])

In [21]:
time_means = np.mean(readings, axis=1)
time_means

array([51.84737031, 55.19251673, 43.62223702, 38.69499815, 51.48396192,
       52.31061051, 48.33064463, 46.35918885, 52.37913102, 53.73491093,
       55.95511267, 58.42202954, 55.66450103, 49.3969535 , 48.2002147 ,
       49.09304998, 57.43557737, 47.91389477, 52.21908731, 55.81242291,
       45.2165387 , 53.07113989, 55.55380558, 50.28242329, 42.08185206,
       50.63434108, 48.75149492, 44.15966821, 50.15553764, 60.38918114,
       50.70121666, 44.82787821, 44.06765527, 56.77263627, 53.53952569,
       53.79343985, 44.87487721, 50.17785104, 42.85715833, 41.52665048,
       48.62326263, 50.91555341, 59.19858662, 54.37149436, 43.45017523,
       47.91352701, 52.48843585, 41.56275979, 55.99088318, 45.6136064 ,
       44.6181174 , 48.00114108, 58.38058005, 50.40850923, 54.07502626,
       44.83858342, 54.27624352, 40.87468358, 50.89319008, 32.54754539,
       53.2921451 , 60.32576753, 46.54344939, 54.50539008, 52.52653208,
       43.57706674, 49.91413126, 53.88289789, 42.20976257, 48.56
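
The task's shape validation is not shown above. A minimal sketch with `assert` statements follows. It also shows, as an optional extension, that summarizing the NaN-cleaned array from Task 3 needs the NaN-aware reductions (`np.nanmean`, `np.nanstd`), because the plain ones return `nan` for any column that contains a `nan`. Note that `np.std` uses `ddof=0` (population standard deviation) by default.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
readings = rng.normal(50, 10, size=(360, 4))

sensor_means = readings.mean(axis=0)   # one value per sensor (column)
sensor_stds = readings.std(axis=0)     # population std (ddof=0)
time_means = readings.mean(axis=1)     # one value per time step (row)

# Shape validation required by the task.
assert sensor_means.shape == (4,) and sensor_stds.shape == (4,)
assert time_means.shape == (360,)
print("shapes OK:", sensor_means.shape, sensor_stds.shape, time_means.shape)

# Optional: summaries of the NaN-cleaned array skip the replaced outliers.
cleaned = np.where((readings < 20) | (readings > 80), np.nan, readings)
print(np.nanmean(cleaned, axis=0))
print(np.nanstd(cleaned, axis=0))
```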

# **Task 5: Build a validation report (Part 1)**
Create a short report dictionary with keys total_readings, outlier_count, outlier_percent, sensor_means, and sensor_stds. Convert this report into a readable string and print it in the notebook. Include at least one simple validation, such as confirming that total_readings equals readings.size.

In [22]:
report = {
    "total_readings": readings.size,
    "outlier_count": outlier_count,
    "outlier_percent": perc,
    "sensor_means": sensor_means,
    "sensor_stds": sensor_stds
}

In [23]:
report

{'total_readings': 1440,
 'outlier_count': np.int64(5),
 'outlier_percent': np.float64(0.3472222222222222),
 'sensor_means': array([49.83557449, 49.30172611, 49.37299777, 51.00486328]),
 'sensor_stds': array([ 9.74752295,  9.59309985,  9.99833748, 10.18720053])}

In [24]:
# Validation
assert report["total_readings"] == readings.size, "Total readings do not match array size!"
print("Validation passed: total_readings equals readings.size")

# Convert to readable string
report_str = (
    f"Total readings: {report['total_readings']}\n"
    f"Outlier count: {report['outlier_count']}\n"
    f"Outlier percent: {report['outlier_percent']:.2f}%\n"
    f"Sensor means: {report['sensor_means']}\n"
    f"Sensor stds: {report['sensor_stds']}"
)

print("\n--- Validation Report ---")
print(report_str)


Validation passed: total_readings equals readings.size

--- Validation Report ---
Total readings: 1440
Outlier count: 5
Outlier percent: 0.35%
Sensor means: [49.83557449 49.30172611 49.37299777 51.00486328]
Sensor stds: [ 9.74752295  9.59309985  9.99833748 10.18720053]


In [25]:
total_readings == readings.size

True
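
Another way to turn the report into a readable string is `json.dumps` with indentation. Because `json` cannot serialize NumPy arrays or `np.int64` values directly, this sketch converts them with `.tolist()` first. The `plain` dictionary is a hypothetical intermediate name, and this is one possible approach rather than the notebook's method.

```python
import json
import numpy as np

rng = np.random.default_rng(seed=0)
readings = rng.normal(50, 10, size=(360, 4))
outlier_mask = (readings < 20) | (readings > 80)

report = {
    "total_readings": readings.size,
    "outlier_count": np.sum(outlier_mask),          # np.int64
    "outlier_percent": outlier_mask.mean() * 100,   # np.float64
    "sensor_means": readings.mean(axis=0),          # ndarray
    "sensor_stds": readings.std(axis=0),            # ndarray
}

# .tolist() turns arrays and NumPy scalars into plain Python numbers for json.
plain = {k: (v.tolist() if hasattr(v, "tolist") else v) for k, v in report.items()}
report_str = json.dumps(plain, indent=2)
print(report_str)

# Simple validations.
assert report["total_readings"] == readings.size
assert len(report["sensor_means"]) == readings.shape[1]
```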

----------------------


# **Part 2: Simulation, Broadcasting, and Randomness**

# **Task 6: Create a reproducible simulation matrix**
Initialize a random number generator with a fixed seed. Create a 2D array named sim with shape (1000, 6) representing 1000 trials for 6 scenarios. Use a normal distribution with mean 0 and standard deviation 1. After creating the array, print its shape, dtype, and the first three rows to confirm the structure.

In [26]:
import random
random.seed(42)  # seeds only Python's built-in random module; it has no effect on NumPy's default_rng below

In [27]:
rng = np.random.default_rng(seed=0)
# seed=0 means the generator starts from a known, fixed state, so the results are reproducible
sim = rng.normal(0, 1, size=(1000, 6))
# 0 = mean
# 1 = standard deviation (how widely values spread around the mean)

In [28]:
sim

array([[ 0.12573022, -0.13210486,  0.64042265,  0.10490012, -0.53566937,
         0.36159505],
       [ 1.30400005,  0.94708096, -0.70373524, -1.26542147, -0.62327446,
         0.04132598],
       [-2.32503077, -0.21879166, -1.24591095, -0.73226735, -0.54425898,
        -0.31630016],
       ...,
       [ 0.94943846,  1.02922814,  0.31886802,  1.05773509,  0.26838522,
         0.35055328],
       [-0.12294856,  2.00252255,  1.63391972, -0.49129481,  0.87095129,
         0.24026025],
       [-0.22681171,  0.74016406,  0.18054669,  0.15984538, -0.2263336 ,
        -0.09355873]], shape=(1000, 6))

In [29]:
sim.shape

(1000, 6)

In [30]:
sim.dtype

dtype('float64')

In [31]:
sim[0:3]

array([[ 0.12573022, -0.13210486,  0.64042265,  0.10490012, -0.53566937,
         0.36159505],
       [ 1.30400005,  0.94708096, -0.70373524, -1.26542147, -0.62327446,
         0.04132598],
       [-2.32503077, -0.21879166, -1.24591095, -0.73226735, -0.54425898,
        -0.31630016]])
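
As with Task 1, the structure can be printed explicitly in one cell. This sketch adds a loose sanity check (an addition not requested by the task) that each scenario's sample mean is near 0 and its standard deviation near 1, as expected for 1000 standard-normal draws.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
sim = rng.normal(0, 1, size=(1000, 6))

print("shape:", sim.shape)
print("dtype:", sim.dtype)
print("first three rows:\n", sim[:3])

# Loose check: per-scenario means near 0 and standard deviations near 1.
print("column means:", sim.mean(axis=0).round(3))
print("column stds: ", sim.std(axis=0).round(3))
```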

--------------

# **Task 7: Apply broadcasting for scenario adjustments**
Create a 1D array named scenario_shift of length 6 containing small offsets such as [-0.3, -0.1, 0.0, 0.1, 0.2, 0.4]. Use broadcasting to add these offsets to sim and store the result in adjusted_sim. Verify that adjusted_sim has the same shape as sim and that the mean of each column changes in the direction of the offsets.

In [32]:
scenario_shift = np.array([-0.3, -0.1, 0.0, 0.1, 0.2, 0.4])
# a 1D array; its shape is (6,)

In [33]:
adjusted_sim = sim + scenario_shift
adjusted_sim
# its shape is (1000, 6)

array([[-0.17426978, -0.23210486,  0.64042265,  0.20490012, -0.33566937,
         0.76159505],
       [ 1.00400005,  0.84708096, -0.70373524, -1.16542147, -0.42327446,
         0.44132598],
       [-2.62503077, -0.31879166, -1.24591095, -0.63226735, -0.34425898,
         0.08369984],
       ...,
       [ 0.64943846,  0.92922814,  0.31886802,  1.15773509,  0.46838522,
         0.75055328],
       [-0.42294856,  1.90252255,  1.63391972, -0.39129481,  1.07095129,
         0.64026025],
       [-0.52681171,  0.64016406,  0.18054669,  0.25984538, -0.0263336 ,
         0.30644127]], shape=(1000, 6))

In [34]:
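# Broadcasting works because shapes are compared from the trailing dimension:
# sim:             (1000, 6)
# scenario_shift:        (6,)
# The trailing sizes match (6 == 6), so scenario_shift is added to every one of the 1000 rows.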
In [35]:
adjusted_sim.shape == sim.shape

True

In [36]:
original_means = sim.mean(axis=0)
adjusted_means = adjusted_sim.mean(axis=0)
# Compare the column means before and after the shift

In [37]:
original_means

array([-0.03621381,  0.05030703,  0.01921678, -0.0361872 , -0.00522275,
       -0.00033663])

In [38]:
adjusted_means

array([-0.33621381, -0.04969297,  0.01921678,  0.0638128 ,  0.19477725,
        0.39966337])
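
The task asks to confirm that each column mean moves in the direction of its offset, which the outputs above show but do not check. Adding a constant to a column shifts its mean by exactly that constant (up to floating-point rounding), so a sketch of the check can compare both the size and the sign of the change. `mean_change` is a hypothetical name for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
sim = rng.normal(0, 1, size=(1000, 6))
scenario_shift = np.array([-0.3, -0.1, 0.0, 0.1, 0.2, 0.4])   # shape (6,)

adjusted_sim = sim + scenario_shift    # (1000, 6) + (6,): broadcast over rows

mean_change = adjusted_sim.mean(axis=0) - sim.mean(axis=0)
print(mean_change.round(6))

# Size check: the change equals the offsets within floating-point tolerance.
print("matches offsets:", np.allclose(mean_change, scenario_shift))

# Direction check: rounding removes tiny float noise before taking the sign,
# so the unshifted column (offset 0.0) compares as 0.
print("same direction:",
      np.array_equal(np.sign(np.round(mean_change, 12)), np.sign(scenario_shift)))
```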

# **Task 8: Rank scenarios with sorting and partitioning**
Compute the mean outcome per scenario from adjusted_sim and store it in a 1D array named scenario_means. Use np.argsort to get the ranking of scenarios from lowest to highest. Then use np.partition to identify the top two scenario means without fully sorting the array. Confirm that the top two values from partition match the two largest values from a full sort.

In [39]:
scenario_means = adjusted_sim.mean(axis=0)
scenario_means

# axis=0 removes the row axis → keeps columns
# axis=1 removes the column axis → keeps rows

array([-0.33621381, -0.04969297,  0.01921678,  0.0638128 ,  0.19477725,
        0.39966337])

In [40]:
rank_indices = np.argsort(scenario_means)
rank_indices

array([0, 1, 2, 3, 4, 5])

In [41]:
top_two_indices = np.argpartition(scenario_means, -2)[-2:]
top_two_values = scenario_means[top_two_indices]

In [42]:
top_two_indices

array([4, 5])

In [43]:
top_two_values 

array([0.19477725, 0.39966337])

In [44]:
# partition → values
# argpartition → indices
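
The cells above use `np.argpartition`, but the task also asks to use `np.partition` and to confirm the result against a full sort. A minimal sketch of that comparison follows; `top_two_partition` and `top_two_sorted` are hypothetical names. Both sides are sorted before comparing so the check does not depend on the order `np.partition` leaves the two values in.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
sim = rng.normal(0, 1, size=(1000, 6))
scenario_shift = np.array([-0.3, -0.1, 0.0, 0.1, 0.2, 0.4])
scenario_means = (sim + scenario_shift).mean(axis=0)

# kth=-2 puts the second-largest value at index -2 and anything larger after it,
# without fully sorting the array.
top_two_partition = np.partition(scenario_means, -2)[-2:]

# Full sort for comparison: the last two entries are the two largest values.
top_two_sorted = np.sort(scenario_means)[-2:]

print(top_two_partition, top_two_sorted)
print("match:", np.array_equal(np.sort(top_two_partition), top_two_sorted))
```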

# **Task 9: Controlled randomness and reproducibility**
Generate a second simulation matrix sim_2 using the same seed and confirm that it matches sim exactly. Then generate a third matrix sim_3 with a different seed and confirm that it differs. Include a short check such as comparing equality counts or using np.allclose to verify the difference.


In [45]:
rng_2 = np.random.default_rng(seed=0)
sim_2 = rng_2.normal(0,1,size=(1000,6))

In [46]:
rng_3 = np.random.default_rng(seed=20)
sim_3 = rng_3.normal(0,1,size=(1000,6))

In [47]:
print(np.array_equal(sim, sim_2))  # exact, element-by-element equality

True


In [48]:
print(np.allclose(sim,sim_3))

False
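
The task also mentions equality counts. A short sketch that counts how many of the 6000 entries coincide is below. `np.array_equal` tests exact equality, while `np.allclose` only tests equality within a tolerance, so the former is the stricter check for "matches exactly".

```python
import numpy as np

sim = np.random.default_rng(seed=0).normal(0, 1, size=(1000, 6))
sim_2 = np.random.default_rng(seed=0).normal(0, 1, size=(1000, 6))
sim_3 = np.random.default_rng(seed=20).normal(0, 1, size=(1000, 6))

print("sim == sim_2 exactly:", np.array_equal(sim, sim_2))

# Equality counts: how many entries are identical between the matrices.
print("equal entries sim vs sim_2:", np.count_nonzero(sim == sim_2), "of", sim.size)
print("equal entries sim vs sim_3:", np.count_nonzero(sim == sim_3), "of", sim.size)
print("sim close to sim_3:", np.allclose(sim, sim_3))
```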


# **Task 10: Build a validation report (Part 2)**
Create a small report dictionary with keys shape, scenario_means, top_two_indices, and top_two_values. Convert this report into a readable string and print it. Include at least one validation statement, such as confirming that scenario_means has length 6 and that the top_two_indices list has length 2.

In [49]:
report = {
    "shape": adjusted_sim.shape,
    "scenario_means": scenario_means,
    "top_two_indices": top_two_indices.tolist(),
    "top_two_values": top_two_values
}

# readable output
report_str = (
    f"Simulation shape: {report['shape']}\n"
    f"Scenario means: {report['scenario_means']}\n"
    f"Top two indices: {report['top_two_indices']}\n"
    f"Top two values: {report['top_two_values']}\n"
    f"Validation checks:\n"
    f"- scenario_means length == 6 → {len(report['scenario_means']) == 6}\n"
    f"- top_two_indices length == 2 → {len(report['top_two_indices']) == 2}"
)

print(report_str)


Simulation shape: (1000, 6)
Scenario means: [-0.33621381 -0.04969297  0.01921678  0.0638128   0.19477725  0.39966337]
Top two indices: [4, 5]
Top two values: [0.19477725 0.39966337]
Validation checks:
- scenario_means length == 6 → True
- top_two_indices length == 2 → True
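
The report above prints its validation results as `True`/`False`, which is easy to overlook. A sketch using `assert` instead stops the notebook with a message when a check fails; the third check (that the top-two values really are the two largest means) is an addition beyond what the task requires.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
adjusted_sim = rng.normal(0, 1, size=(1000, 6)) + np.array([-0.3, -0.1, 0.0, 0.1, 0.2, 0.4])
scenario_means = adjusted_sim.mean(axis=0)
top_two_indices = np.argpartition(scenario_means, -2)[-2:]

report = {
    "shape": adjusted_sim.shape,
    "scenario_means": scenario_means,
    "top_two_indices": top_two_indices.tolist(),
    "top_two_values": scenario_means[top_two_indices],
}

assert len(report["scenario_means"]) == 6, "expected one mean per scenario"
assert len(report["top_two_indices"]) == 2, "expected exactly two top scenarios"
# Order-insensitive check that the selected values are the two largest means.
assert np.array_equal(np.sort(report["top_two_values"]), np.sort(scenario_means)[-2:])
print("All Part 2 validations passed")
```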
