### **Module 9: The Scientific Computing Stack**
### **Part 1: NumPy (Numerical Python)**

#### **1. Introduce the Concept: The Bedrock of Data Science**

**What is NumPy?**
NumPy stands for **Numerical Python**. It is the most fundamental package for scientific computing in Python. Almost every data science and machine learning library, including Pandas and Scikit-learn, is built on top of it.

**Why does it exist? (The Problem with Python Lists)**
We already have Python lists, so why do we need a whole new library? Because for large-scale numerical operations, Python lists are **very slow** and **functionally limited**.

1.  **Performance:** A Python list is a general-purpose container. It can hold anything: an integer, a string, a function, an object. This flexibility comes at a large performance cost: each element is a separate Python object that the list only points to, and the interpreter has to check the type of every single element whenever it performs an operation.
2.  **Functionality:** Python lists don't behave like mathematical vectors. If you have a list of numbers and want to multiply every number by 2, you can't just do `my_list * 2`.

    ```python
    python_list = [1, 2, 3]
    print(python_list * 2) # Output: [1, 2, 3, 1, 2, 3] (Concatenation, not math)
    ```

**The Solution: The NumPy Array**
NumPy's core feature is a powerful N-dimensional array object called the `ndarray`.

*   **It's Fast:** A NumPy array is a grid of values of the **same data type**. Because every element is the same type (e.g., all 64-bit integers or all 64-bit floats), NumPy can use highly optimized, pre-compiled C code to perform mathematical operations on the entire array at once, without any Python type-checking loops. This can be 10x to 100x faster than using a Python list.
*   **It's Functional:** It supports "element-wise" operations. If you multiply a NumPy array by 2, it does exactly what you expect mathematically: it multiplies every single element by 2.
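To check the speed claim on your own machine, here is a minimal timing sketch using the standard-library `timeit` module. The array size and the repeat count are arbitrary choices for illustration, and the actual speedup depends on your hardware and NumPy build, so treat the printed ratio as indicative only.

```python
import timeit

import numpy as np

# Arbitrary problem size chosen for illustration
n = 1_000_000
python_list = list(range(n))
numpy_array = np.arange(n)

# Double every element: a Python-level loop vs. one vectorized operation
list_result = [x * 2 for x in python_list]
array_result = numpy_array * 2

# Both approaches must produce the same numbers
assert list_result == array_result.tolist()

# Time each approach over 10 runs
list_time = timeit.timeit(lambda: [x * 2 for x in python_list], number=10)
array_time = timeit.timeit(lambda: numpy_array * 2, number=10)

print(f"List comprehension: {list_time:.3f} s")
print(f"NumPy vectorized:   {array_time:.3f} s")
print(f"Speedup: roughly {list_time / array_time:.0f}x")
```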

**Analogy: A Grocery Bag vs. a Carton of Eggs**
*   A **Python list** is like a **grocery bag**. It can hold an apple, a bottle of milk, and a box of cereal. To do anything, you have to look at each item individually.
*   A **NumPy array** is like a **carton of eggs**. You know every single item is an egg. You can perform one action on the whole carton (like "move to the fridge") incredibly fast and efficiently.



#### **2. Provide Simple Examples**

By convention, NumPy is always imported with the alias `np`. You will see this in virtually every data science script or notebook.

##### **Example 1: Creating a NumPy Array**

The most common way to create a NumPy array is by passing a Python list to the `np.array()` function.

```python
import numpy as np

# A standard Python list
python_list = [1, 2, 3, 4, 5]

# Create a NumPy array from the Python list
numpy_array = np.array(python_list)

# Let's print them and see the difference
print(f"This is a Python list: {python_list}")
print(f"This is a NumPy array: {numpy_array}")

# Let's check their types
print(f"\nType of python_list: {type(python_list)}")
print(f"Type of numpy_array: {type(numpy_array)}")
```

**Output:**
```
This is a Python list: [1, 2, 3, 4, 5]
This is a NumPy array: [1 2 3 4 5]

Type of python_list: <class 'list'>
Type of numpy_array: <class 'numpy.ndarray'>
```
*Notice the subtle difference in printing: NumPy arrays are displayed without commas between elements.*
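Because an `ndarray` holds a single data type, `np.array()` picks one type that can represent every element you pass in. The sketch below shows this upcasting. The exact width of the string dtype (for example `<U32`) is an implementation detail, so the comment only relies on the dtype's `kind` code.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed_numbers = np.array([1, 2.5, 3])   # one float forces every element to float
mixed_types = np.array([1, "a", 3.0])   # a string forces every element to string

print(ints.dtype)              # an integer type, e.g. int64
print(mixed_numbers)           # [1.  2.5 3. ]
print(mixed_numbers.dtype)     # float64
print(mixed_types)             # ['1' 'a' '3.0']
print(mixed_types.dtype.kind)  # 'U' means Unicode string
```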

##### **Example 2: The Power of Element-wise Operations**

Now, let's see the main advantage. We'll perform a simple mathematical operation on both the list and the array.

```python
import numpy as np

python_list = [1, 2, 3, 4, 5]
numpy_array = np.array(python_list)

# --- Multiplying by 2 ---

# In Python, we need a loop (list comprehension)
doubled_list = [item * 2 for item in python_list]
print(f"Python list doubled: {doubled_list}")

# In NumPy, the syntax is clean and mathematical
doubled_array = numpy_array * 2
print(f"NumPy array doubled:  {doubled_array}")

# --- Adding 10 ---

# In Python, another loop
added_list = [item + 10 for item in python_list]
print(f"\nPython list + 10: {added_list}")

# In NumPy, it's just as easy
added_array = numpy_array + 10
print(f"NumPy array + 10:  {added_array}")
```

**Output:**
```
Python list doubled: [2, 4, 6, 8, 10]
NumPy array doubled:  [ 2  4  6  8 10]

Python list + 10: [11, 12, 13, 14, 15]
NumPy array + 10:  [11 12 13 14 15]
```

**Key Takeaway:**
With NumPy, you can write mathematical operations directly on the array, and NumPy applies that operation to every single element. This is called **vectorization**. It's not just cleaner to write; as we discussed, it's also vastly faster for large arrays because it avoids slow Python loops. All the common math operators (`+`, `-`, `*`, `/`, `**` for power, etc.) work this way.
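The same element-wise rule applies to the other operators, and also when both operands are arrays of the same shape: NumPy pairs up the elements position by position. A short sketch (the arrays `a` and `b` are arbitrary examples):

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

print(a - 1)    # [0 1 2 3]
print(a / 2)    # [0.5 1.  1.5 2. ]  (true division always returns floats)
print(a ** 2)   # [ 1  4  9 16]
print(a + b)    # [11 22 33 44]  (element 0 with element 0, and so on)
print(a * b)    # [ 10  40  90 160]
```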

---


### **Offer a Task**

**Goal:** Perform a simple data transformation task using NumPy to see the benefits over standard Python lists. Imagine you have a list of temperatures in Celsius and you need to convert them to Fahrenheit.

The formula for converting Celsius to Fahrenheit is: **F = C * 1.8 + 32**

**Instructions:**

1.  **Import NumPy** with its standard alias.
2.  You are given the following Python list of temperatures in Celsius:
    ```python
    celsius_temperatures = [0, 10, 20, 30, 40, 50]
    ```
3.  **Create a NumPy array** from this list.
4.  Using a **single line of code**, apply the conversion formula to the NumPy array to get the temperatures in Fahrenheit.
5.  Print the original Celsius temperatures (the NumPy array).
6.  Print the newly calculated Fahrenheit temperatures (the new NumPy array).

**For comparison (and to appreciate NumPy), think about how you would have to do this with only a Python list:** you would need to create an empty list and then loop through the `celsius_temperatures`, calculate the new value for each temperature, and append it to your new list. With NumPy, this becomes one simple, readable line.
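For reference, the list-only approach described above might look like the following sketch; the name `fahrenheit_temperatures` is illustrative.

```python
celsius_temperatures = [0, 10, 20, 30, 40, 50]

# Pure-Python approach: start with an empty list, loop, compute, append
fahrenheit_temperatures = []
for c in celsius_temperatures:
    fahrenheit_temperatures.append(c * 1.8 + 32)

print(fahrenheit_temperatures)
```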


```python
import numpy as np

celsius_temperatures = [0, 10, 20, 30, 40, 50]
cel_temp_arr = np.array(celsius_temperatures)

# One vectorized line applies F = C * 1.8 + 32 to every element
far_temp_arr = cel_temp_arr * 1.8 + 32

print(f"The temperature in Celsius is {cel_temp_arr}")
print(f"The temperature in Fahrenheit is {far_temp_arr}")
```

**Output:**
```
The temperature in Celsius is [ 0 10 20 30 40 50]
The temperature in Fahrenheit is [ 32.  50.  68.  86. 104. 122.]
```


## **Array Attributes**

```python
import numpy as np

array_2d = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16]
])
print(f"Our 2D array:\n{array_2d}")

# shape: the length of each dimension, as a tuple (rows, columns)
print(f"\nShape of the array is: {array_2d.shape}")

# size: the total number of elements in the array
print(f"Size of the array is: {array_2d.size}")

# ndim: the number of dimensions (axes)
print(f"Number of dimensions of the array is: {array_2d.ndim}")

# dtype: the data type of the elements
# (int64 on most platforms; NumPy versions before 2.0 default to int32 on Windows)
print(f"Data type of the elements in the array is: {array_2d.dtype}")
```

**Output:**
```
Our 2D array:
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]

Shape of the array is: (4, 4)
Size of the array is: 16
Number of dimensions of the array is: 2
Data type of the elements in the array is: int64
```
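The attributes are related: `ndim` is the length of the `shape` tuple, and `size` is the product of its entries. A quick check with a 3D array; the shape `(2, 3, 4)` is an arbitrary example, and `np.zeros` (covered in the next section) simply creates an array of that shape.

```python
import math

import numpy as np

array_3d = np.zeros((2, 3, 4))

print(array_3d.shape)   # (2, 3, 4)
print(array_3d.ndim)    # 3
print(array_3d.size)    # 24
print(array_3d.dtype)   # float64 (the default for np.zeros)

# ndim counts the axes; size multiplies the axis lengths together
assert array_3d.ndim == len(array_3d.shape)
assert array_3d.size == math.prod(array_3d.shape)
```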


## **Array Creation and Statistical Methods**

```python
import numpy as np

# 1. An array of zeros
# Useful for initializing an array that you will fill in later.
# These functions take the array's shape as a tuple. The default dtype is float64,
# which is why the output shows "0." rather than "0".
zeros_array = np.zeros((2, 3))
print("---Array of zeros---")
print(f"{zeros_array}\n")

# 2. An array of ones
ones_array = np.ones((2, 3))
print("---Array of ones---")
print(f"{ones_array}\n")

# 3. An array with a range of numbers
# np.arange(start, stop, step): 'stop' is exclusive, just like Python's range()
range_array = np.arange(0, 15)
print("---Array of a range of numbers---")
print(f"{range_array}\n")

# 4. .reshape(shape): returns the same data arranged in a new shape (given as a tuple).
# Note: the new shape must hold exactly as many elements as the array has
# (15 = 3 x 5 and 12 = 4 x 3 below).
reshaped_array1 = range_array.reshape((3, 5))
reshaped_array2 = np.arange(12).reshape((4, 3))
print(f"---Reshaped array1---\n{reshaped_array1}")
print(f"---Reshaped array2---\n{reshaped_array2}")
```

**Output:**
```
---Array of zeros---
[[0. 0. 0.]
 [0. 0. 0.]]

---Array of ones---
[[1. 1. 1.]
 [1. 1. 1.]]

---Array of a range of numbers---
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]

---Reshaped array1---
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
---Reshaped array2---
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
```
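Two details about `.reshape()` are worth seeing in action: a shape whose element count does not match raises a `ValueError`, and passing `-1` for one dimension lets NumPy infer that dimension from the others. The flag name `reshape_failed` is just for illustration.

```python
import numpy as np

range_array = np.arange(0, 15)  # 15 elements

# 15 elements cannot fill a 4 x 4 (16-element) shape
reshape_failed = False
try:
    range_array.reshape((4, 4))
except ValueError as err:
    reshape_failed = True
    print(f"ValueError: {err}")

# -1 means "work this dimension out for me": 15 / 5 = 3 rows
inferred = range_array.reshape((-1, 5))
print(inferred.shape)     # (3, 5)

# reshape returns a new array object; the original keeps its shape
print(range_array.shape)  # (15,)
```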


```python
import numpy as np

data = np.array([15, 20, 25, 30, 35, 40])
print(f"Our data: {data}\n\n")

# --- Common Statistical Methods ---
# 1. .sum(): sum of all elements
print(f"Sum of all elements: {data.sum()}\n")

# 2. .mean(): mean (average) of all elements
print(f"Average of all elements: {data.mean()}\n")

# 3. .max(): the largest element
print(f"Max value: {data.max()}\n")

# 4. .min(): the smallest element
print(f"Min value: {data.min()}\n")

# 5. .std(): the standard deviation (a measure of data spread).
# By default this is the population standard deviation (ddof=0).
print(f"Standard Deviation: {data.std()}")
```

**Output:**
```
Our data: [15 20 25 30 35 40]


Sum of all elements: 165

Average of all elements: 27.5

Max value: 40

Min value: 15

Standard Deviation: 8.539125638299666
```
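The default `.std()` divides by the number of elements N (`ddof=0`, the population standard deviation). If your data is a sample and you want the sample standard deviation, which divides by N - 1, pass `ddof=1`. This sketch recomputes both by hand to show where 8.539... comes from:

```python
import numpy as np

data = np.array([15, 20, 25, 30, 35, 40])

# Manual calculation: squared deviations from the mean, averaged, then square-rooted
deviations = data - data.mean()          # -12.5, -7.5, -2.5, 2.5, 7.5, 12.5
squared_sum = (deviations ** 2).sum()    # 437.5
population_std = np.sqrt(squared_sum / data.size)    # divide by N
sample_std = np.sqrt(squared_sum / (data.size - 1))  # divide by N - 1

print(population_std, data.std())        # both about 8.539
print(sample_std, data.std(ddof=1))      # both about 9.354
```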


### **Offer a Task**

**Goal:** You will create a NumPy array representing sales data for 4 weeks (rows) and 5 days (columns). You will then use array attributes and methods to calculate some key business metrics.

**Instructions:**

1.  **Import NumPy** with its standard alias.
2.  Create a 2D NumPy array named `sales_data` with a shape of `(4, 5)`. This array should contain the numbers from 0 to 19, representing the number of sales each day.
    *   Use the `np.arange()` and `.reshape()` methods you just learned to do this in a single line.
3.  After creating the array, **print the following information** about it:
    *   The entire `sales_data` array.
    *   Its shape (using the `.shape` attribute).
    *   Its data type (using the `.dtype` attribute).
4.  Next, calculate and print the following **business metrics**:
    *   The total number of sales across all weeks (the sum of all elements).
    *   The average number of sales per day (the mean of all elements).
    *   The highest number of sales on any single day (the maximum value).
    *   The lowest number of sales on any single day (the minimum value).

**Example Output Structure:**

```
Sales Data (4 weeks x 5 days):
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]

--- Array Information ---
Shape: (4, 5)
Data Type: int64

--- Business Metrics ---
Total Sales: XXX
Average Daily Sales: X.X
Highest Sales Day: XX
Lowest Sales Day: X
```
*(Your numbers will be different based on the `arange` values)*


```python
import numpy as np

# Note: uses the values 20-39 rather than the 0-19 from the instructions
sales_data = np.arange(20, 40).reshape((4, 5))
print(f"Sales data (4 weeks x 5 days):\n\n{sales_data}\n\n")
print("---Array Information---\n")
print(f"Shape of the data: {sales_data.shape}")
print(f"Datatype of the data: {sales_data.dtype}\n\n")

print("---Business Metrics---\n")
print(f"Total number of sales: {sales_data.sum()}")
print(f"Average number of sales per day: {sales_data.mean()}")
print(f"Highest number of sales on a single day: {sales_data.max()}")
print(f"Lowest number of sales on a single day: {sales_data.min()}")
```

**Output:**
```
Sales data (4 weeks x 5 days):

[[20 21 22 23 24]
 [25 26 27 28 29]
 [30 31 32 33 34]
 [35 36 37 38 39]]


---Array Information---

Shape of the data: (4, 5)
Datatype of the data: int64


---Business Metrics---

Total number of sales: 590
Average number of sales per day: 29.5
Highest number of sales on a single day: 39
Lowest number of sales on a single day: 20
```


## **Operations Along Axes**

This leads us to a crucial and incredibly powerful feature of NumPy. Right now, all your calculations (`.sum()`, `.mean()`, etc.) are for the *entire* array.

But what if you wanted to answer more specific questions?
*   "What were the **total sales for each week**?" (i.e., the sum of each row)
*   "What was the **average sale for each day of the week**?" (i.e., the mean of each column)

To do this, we tell NumPy to perform the calculation along a specific **axis**.

In a 2D array:
*   **`axis=0`** refers to the **vertical** dimension (down the rows).
*   **`axis=1`** refers to the **horizontal** dimension (across the columns).

**Analogy: Squashing the Array**
*   `axis=0`: Imagine squashing the array **downwards**, collapsing all the rows into a single summary row.
*   `axis=1`: Imagine squashing the array **sideways**, collapsing all the columns into a single summary column.

Let's see it with your `sales_data` array:

```python
import numpy as np

sales_data = np.arange(20,40).reshape((4,5))
print("Original Sales Data:\n", sales_data)

# --- Calculating weekly totals (summing across the columns) ---
# For each row, we sum the values in its columns.
weekly_totals = sales_data.sum(axis=1)
print("\nTotal sales per week (axis=1):", weekly_totals)

# --- Calculating average daily sales (averaging down the rows) ---
# For each column, we average the values in its rows.
daily_averages = sales_data.mean(axis=0)
print("Average sales per day of the week (axis=0):", daily_averages)
```

**Output of the Example:**

```
Original Sales Data:
 [[20 21 22 23 24]
 [25 26 27 28 29]
 [30 31 32 33 34]
 [35 36 37 38 39]]

Total sales per week (axis=1): [110 135 160 185]
Average sales per day of the week (axis=0): [27.5 28.5 29.5 30.5 31.5]
```

For example, 110 = 20 + 21 + 22 + 23 + 24 (the sum of the first row), and 27.5 = (20 + 25 + 30 + 35) / 4 (the mean of the first column).

This concept of axes is one of the most important ideas in NumPy and Pandas. It allows you to move from whole-dataset analysis to group-by-group analysis.
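One way to convince yourself which axis is which is to rebuild the results by hand: `axis=0` adds the rows together element by element, and `axis=1` totals each row on its own. This sketch checks both against NumPy; the `manual_*` names are illustrative.

```python
import numpy as np

sales_data = np.arange(20, 40).reshape((4, 5))

# axis=0: collapse the rows -> row 0 + row 1 + row 2 + row 3, column by column
manual_axis0 = sales_data[0] + sales_data[1] + sales_data[2] + sales_data[3]

# axis=1: collapse the columns -> one total per row
manual_axis1 = np.array([row.sum() for row in sales_data])

print(manual_axis0)  # [110 114 118 122 126]
print(manual_axis1)  # [110 135 160 185]

assert np.array_equal(manual_axis0, sales_data.sum(axis=0))
assert np.array_equal(manual_axis1, sales_data.sum(axis=1))
```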


### Doubt: But I was expecting the total sales per week to be displayed vertically, with 110 and then 135 below it and so on, like a 4x1 matrix, because you said we were squashing it sideways.

That is an absolutely brilliant observation, and you have pinpointed a subtle but very important detail about how NumPy returns these results.

You are 100% correct in your mental model. When we calculate `sales_data.sum(axis=1)`, we are conceptually "squashing" this:

```
[[20 21 22 23 24]   --> squash --> [110]
 [25 26 27 28 29]   --> squash --> [135]
 [30 31 32 33 34]   --> squash --> [160]
 [35 36 37 38 39]]  --> squash --> [185]
```

So your expectation of getting a result with a shape of `(4, 1)`—a 2D column vector—is perfectly logical.

**So, why does NumPy return a 1D array of `[110 135 160 185]` with a shape of `(4,)`?**

The reason is **convenience and convention**. By default, when an aggregation operation (like `.sum()`, `.mean()`) causes a dimension to be "collapsed" entirely, NumPy **removes that dimension** from the output array's shape.

*   Original shape: `(4, 5)`
*   Operation: `sum(axis=1)` -> collapses the 2nd dimension (the columns).
*   Resulting shape: `(4,)` -> The 2nd dimension is just dropped.

### **How to Get the Shape You Expected**

You can force NumPy to keep the "squashed" dimension by using the `keepdims=True` argument. This is extremely useful in more advanced broadcasting and linear algebra operations where maintaining the number of dimensions is critical.

Let's see the difference:

```python
import numpy as np

sales_data = np.arange(20,40).reshape((4,5))

# --- The default behavior (keepdims=False) ---
default_sum = sales_data.sum(axis=1)
print("--- Default Behavior ---")
print(default_sum)
print(f"Shape: {default_sum.shape}") # -> (4,) a 1D array

# --- The behavior you expected (keepdims=True) ---
kept_dims_sum = sales_data.sum(axis=1, keepdims=True)
print("\n--- With keepdims=True ---")
print(kept_dims_sum)
print(f"Shape: {kept_dims_sum.shape}") # -> (4, 1) a 2D column array!
```

**Output:**
```
--- Default Behavior ---
[110 135 160 185]
Shape: (4,)

--- With keepdims=True ---
[[110]
 [135]
 [160]
 [185]]
Shape: (4, 1)
```
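Here is one concrete case where the kept dimension matters: turning each week's sales into fractions of that week's total. Dividing the `(4, 5)` array by a `(4, 1)` column broadcasts row by row, while dividing by the `(4,)` result fails, because NumPy aligns shapes from the right and the trailing sizes 5 and 4 do not match. The variable names are illustrative.

```python
import numpy as np

sales_data = np.arange(20, 40).reshape((4, 5))

# (4, 5) / (4, 1): each row is divided by its own weekly total
weekly_totals_col = sales_data.sum(axis=1, keepdims=True)
weekly_share = sales_data / weekly_totals_col
print(weekly_share.sum(axis=1))  # each row of shares sums to 1 (up to rounding)

# (4, 5) / (4,): trailing dimensions 5 and 4 do not match
weekly_totals_flat = sales_data.sum(axis=1)
try:
    sales_data / weekly_totals_flat
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print(f"ValueError: {err}")
```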

**Conclusion:**
Your intuition was perfect. The default behavior is a practical shortcut, but the `keepdims=True` argument reveals the "truer" mathematical nature of the operation.

Thank you for asking that. It's a sign of a very deep level of understanding, and it's a nuance that many people miss.

### **Offer a Task (Advanced)**

**Scenario:** You are a data analyst for a small online school. You've been given a NumPy array representing the test scores for 5 students on 4 different exams. Your job is to perform a detailed analysis to find the average score for each student, the average score for each exam, and identify the top student and the hardest exam.

**The Data:**

Create a 2D NumPy array named `student_scores` with a shape of `(5, 4)` (5 students, 4 exams). The data should be the numbers from 70 to 89, inclusive.

```python
# (5 students, 4 exams)
# Student 0: [70, 71, 72, 73]
# Student 1: [74, 75, 76, 77]
# ... and so on
```

**Instructions:**

1.  **Create and Display the Data:**
    *   Import NumPy.
    *   Create the `student_scores` array using `np.arange()` and `.reshape()`.
    *   Print the `student_scores` array with a clear title.

2.  **Student Performance Analysis:**
    *   Calculate the average score for **each student** across all their exams. This will be your `student_averages`.
    *   Find the **highest student average**. Use the `.max()` method on your `student_averages` array.
    *   Print these results clearly.

3.  **Exam Difficulty Analysis:**
    *   Calculate the average score for **each exam** across all students. This will be your `exam_averages`.
    *   Find the **lowest exam average**. This indicates the "hardest" exam. Use the `.min()` method on your `exam_averages` array.
    *   Print these results clearly.

4.  **The `keepdims` Challenge:**
    *   Recalculate the `student_averages`, but this time, name the result `student_averages_col` and use the `keepdims=True` argument.
    *   Print this new `student_averages_col` array.
    *   Print the **shape** of the original `student_averages` array and the new `student_averages_col` array so you can see the difference (`(5,)` vs `(5, 1)`).

This task requires you to think carefully about which axis (`0` or `1`) corresponds to students and which corresponds to exams, and then apply the correct aggregation methods to answer the specific questions. Good luck!

```python
import numpy as np

# Note: random scores (no seed), not the arange values from the task, so each run differs
student_scores = np.random.randint(70, 101, (5, 4))
print("---Student Scores--- (5 students x 4 exams)\n")
print(f"{student_scores}\n\n")

# axis=1 collapses the exam columns -> one average per student (row)
student_averages = student_scores.mean(axis=1)
highest_student_average = student_averages.max()
print("---Student Performance Analysis---\n")
print(f"Average scores of students:\n{student_averages}\n")
print(f"Highest average score: {highest_student_average}\n\n")

# axis=0 collapses the student rows -> one average per exam (column)
exam_averages = student_scores.mean(axis=0)
lowest_exam_average = exam_averages.min()
print("---Exam Difficulty Analysis---\n")
print(f"Average scores in every exam:\n{exam_averages}\n")
print(f"Most difficult exam according to lowest average score: {lowest_exam_average}\n\n")

print("---keepdims challenge---\n")
student_averages_col = student_scores.mean(axis=1, keepdims=True)
print(f"New student_averages:\n{student_averages_col}")

print(f"Shape of original student_averages: {student_averages.shape}")
print(f"Shape of new student_averages: {student_averages_col.shape}")
```

**Output:**
```
---Student Scores--- (5 students x 4 exams)

[[75 78 99 85]
 [86 73 71 71]
 [90 83 85 70]
 [97 73 75 83]
 [93 76 96 81]]


---Student Performance Analysis---

Average scores of students:
[84.25 75.25 82.   82.   86.5 ]

Highest average score: 86.5


---Exam Difficulty Analysis---

Average scores in every exam:
[88.2 76.6 85.2 78. ]

Most difficult exam according to lowest average score: 76.6


---keepdims challenge---

New student_averages:
[[84.25]
 [75.25]
 [82.  ]
 [82.  ]
 [86.5 ]]
Shape of original student_averages: (5,)
Shape of new student_averages: (5, 1)
```
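`.max()` and `.min()` return the best and worst *values*, but the scenario also asks *which* student and exam they belong to. NumPy's `.argmax()` and `.argmin()` return the index of that value instead. This sketch uses the scores printed above, typed in by hand because the original cell used unseeded random numbers; `top_student` and `hardest_exam` are illustrative names.

```python
import numpy as np

# The scores from the run above, copied in so the result is reproducible
student_scores = np.array([
    [75, 78, 99, 85],
    [86, 73, 71, 71],
    [90, 83, 85, 70],
    [97, 73, 75, 83],
    [93, 76, 96, 81],
])

student_averages = student_scores.mean(axis=1)  # one value per student (row)
exam_averages = student_scores.mean(axis=0)     # one value per exam (column)

top_student = student_averages.argmax()   # index of the highest student average
hardest_exam = exam_averages.argmin()     # index of the lowest exam average

print(f"Top student: Student {top_student} ({student_averages[top_student]})")
print(f"Hardest exam: Exam {hardest_exam} ({exam_averages[hardest_exam]})")
```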


## **NumPy Indexing and Slicing**

This is how we select and retrieve data from a NumPy array. It's similar to Python lists but far more powerful because it works in multiple dimensions.

#### **1. Basic Indexing (Selecting a Single Element)**

For a 2D array, you use the syntax `array[row, column]`. Remember that indexing is zero-based.

```python
import numpy as np

# Create a sample 3x4 array
data = np.arange(10, 22).reshape((3, 4))

print("Our Data:\n", data)
# [[10 11 12 13]
#  [14 15 16 17]
#  [18 19 20 21]]

# Get the element at row 0, column 2
element_0_2 = data[0, 2]
print(f"\nElement at [0, 2]: {element_0_2}") # Expected: 12

# Get the element at row 2, column 3 (bottom right)
element_2_3 = data[2, 3]
print(f"Element at [2, 3]: {element_2_3}") # Expected: 21
```
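Two related shortcuts, sketched on the same `data` array: negative indices count from the end (as with Python lists), and giving only a row index returns that entire row as a 1D array.

```python
import numpy as np

data = np.arange(10, 22).reshape((3, 4))

# -1 means "last": last row, last column
print(data[-1, -1])   # 21

# Only a row index: the whole row comes back as a 1D array
print(data[1])        # [14 15 16 17]
print(data[1].shape)  # (4,)
```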

#### **2. Slicing (Selecting a Sub-array)**

Slicing uses the colon `:` operator, just like with Python lists. The syntax is `start:stop:step`. Remember that `stop` is exclusive.

You can use slicing on each dimension.

```python
import numpy as np

# The same 'data' array from above
data = np.arange(10, 22).reshape((3, 4))

# --- Slicing Rows ---
# Get the first two rows (row 0 and row 1)
first_two_rows = data[0:2, :] # or just data[:2, :]
# The ':' by itself for the column means "select ALL columns"
print("\n--- First two rows ---\n", first_two_rows)
# [[10 11 12 13]
#  [14 15 16 17]]

# --- Slicing Columns ---
# Get the first two columns (column 0 and column 1)
first_two_cols = data[:, 0:2] # or just data[:, :2]
# The ':' for the row means "select ALL rows"
print("\n--- First two columns ---\n", first_two_cols)
# [[10 11]
#  [14 15]
#  [18 19]]

# --- Slicing a "Block" ---
# Get a 2x2 block from the top right
# Rows 0 to 1, Columns 2 to 3
top_right_block = data[0:2, 2:4]
print("\n--- Top right 2x2 block ---\n", top_right_block)
# [[12 13]
#  [16 17]]
```

**Key Takeaway:** The comma separates your selections for each dimension. The colon `:` is your tool for selecting ranges within a dimension.
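One behaviour that differs from Python lists: a basic NumPy slice is a *view* onto the original array's data, not a copy, so changing the slice changes the original. If you need an independent piece, call `.copy()`. A small sketch (the names `block` and `safe_block` are illustrative):

```python
import numpy as np

data = np.arange(10, 22).reshape((3, 4))

# A basic slice shares memory with 'data'
block = data[0:2, 2:4]
block[0, 0] = 999
print(data[0, 2])   # 999: the original array changed too

# .copy() creates independent data
safe_block = data[0:2, 2:4].copy()
safe_block[0, 0] = -1
print(data[0, 2])   # still 999: the original is untouched
```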

---

### **Boolean Indexing (Conditional Filtering)**

This is arguably the most powerful and useful feature of NumPy for data analysis. It allows you to select elements from an array based on a condition.

The process has two steps:
1.  Create a "boolean mask" by applying a condition to the array.
2.  Use that mask to index the original array.

Let's see it in action.

```python
import numpy as np

scores = np.array([
    [85, 92, 71, 99],
    [78, 100, 81, 74]
])
print("Original scores:\n", scores)

# --- Step 1: Create the Boolean Mask ---
# Find all scores greater than 90
is_greater_than_90 = scores > 90

print("\nStep 1: The Boolean Mask (scores > 90):")
print(is_greater_than_90)
# This returns a new array of the SAME SHAPE as the original,
# but with True/False values.

# --- Step 2: Use the Mask to Select Elements ---
# This selects ONLY the elements from the original array
# where the mask is True.
high_scores = scores[is_greater_than_90] # or more commonly, scores[scores > 90]

print("\nStep 2: The actual high scores:")
print(high_scores)
# This returns a 1D array of the selected values.
```

**Output:**
```
Original scores:
 [[ 85  92  71  99]
 [ 78 100  81  74]]

Step 1: The Boolean Mask (scores > 90):
[[False  True False  True]
 [False  True False False]]

Step 2: The actual high scores:
[ 92  99 100]
```
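Masks can be combined with `&` (and), `|` (or), and `~` (not). Each comparison needs its own parentheses, because these operators bind more tightly than `>` and `<`; Python's `and`/`or` keywords do not work element-wise on arrays. A sketch with the same `scores` (the thresholds are arbitrary examples):

```python
import numpy as np

scores = np.array([
    [85, 92, 71, 99],
    [78, 100, 81, 74]
])

# Scores in the 80s: both conditions must hold
in_the_80s = scores[(scores >= 80) & (scores < 90)]
print(in_the_80s)   # [85 81]

# Scores that are very high OR below 75
extremes = scores[(scores > 95) | (scores < 75)]
print(extremes)     # [ 71  99 100  74]

# ~ flips a mask: everything that is NOT above 90
not_high = scores[~(scores > 90)]
print(not_high)     # [85 71 78 81 74]
```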

You can also use boolean indexing to **modify** values. For example, let's cap all scores at 100 (in case of data entry errors).

```python
# Let's say we have some bad data
scores_with_errors = np.array([88, 95, 105, 72, 110])
print("\nOriginal scores with errors:", scores_with_errors)

# Find scores > 100 and set them to 100
scores_with_errors[scores_with_errors > 100] = 100

print("Corrected scores:", scores_with_errors)
```

**Output:**
```
Original scores with errors: [ 88  95 105  72 110]
Corrected scores: [ 88  95 100  72 100]
```
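Assigning through a mask modifies the array in place. If you would rather keep the original and get a corrected copy, `np.where(condition, value_if_true, value_if_false)` and `np.clip(array, lower, upper)` both return new arrays. A sketch using the same example values:

```python
import numpy as np

scores_with_errors = np.array([88, 95, 105, 72, 110])

# np.where picks 100 where the condition holds, otherwise the original value
capped_where = np.where(scores_with_errors > 100, 100, scores_with_errors)

# np.clip limits every value to the range [0, 100]
capped_clip = np.clip(scores_with_errors, 0, 100)

print(capped_where)         # [ 88  95 100  72 100]
print(capped_clip)          # [ 88  95 100  72 100]
print(scores_with_errors)   # [ 88  95 105  72 110] (unchanged)
```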

---

This was a dense but incredibly important lesson. Do the concepts of slicing with `[row, col]` and filtering with boolean masks make sense?

### **3. Final NumPy Challenge**

**Scenario:** You are analyzing sensor data from a manufacturing plant. The data represents temperature readings from 4 different machines over a period of 6 hours. Some readings are faulty (e.g., negative values). Your job is to process this data: select specific parts of it, filter out the faulty readings, and calculate key statistics on the valid data.

**The Data:**

Create a 2D NumPy array named `sensor_data` with a shape of `(4, 6)` (4 machines, 6 hours). The data should be random integers from -10 to 90, inclusive (note that the upper bound of `np.random.randint()` is exclusive).

**Instructions:**

1.  **Create and Display Data:**
    *   Import NumPy.
    *   Create the `sensor_data` array using `np.random.randint()`.
    *   Print the original `sensor_data` array, labeling it "Original Sensor Data".

2.  **Indexing and Slicing:**
    *   Select and print the temperature reading for **Machine 2** (row index 2) at **Hour 4** (column index 4).
    *   Select and print all the readings for the **first 3 hours** (columns 0, 1, 2) for **all machines**.
    *   Select and print all the readings for just **Machine 1 and Machine 3** (row indices 1 and 3). *(Hint: You can use a list of indices like `data[[1, 3], :]`)*

3.  **Boolean Filtering and Data Cleaning:**
    *   Create a "boolean mask" to identify all **valid** temperature readings. A valid reading is defined as being greater than or equal to 0.
    *   Use this mask to create a new 1D array called `valid_readings` that contains only the non-negative temperatures.
    *   Print this `valid_readings` array.

4.  **Final Analysis:**
    *   Using only the `valid_readings` array, calculate and print the following:
        *   The number of valid readings. *(Hint: the `.size` attribute)*
        *   The average of all valid temperatures.
        *   The maximum valid temperature recorded.

5.  **Bonus Challenge (Modification):**
    *   Go back to the original `sensor_data` array.
    *   Using a boolean mask, find all the faulty negative readings and **replace them with 0**.
    *   Print the `sensor_data` array again, labeling it "Corrected Sensor Data", to show that the negative values have been fixed.

This task combines everything: array creation, multi-dimensional indexing, slicing, conditional filtering, and statistical analysis. Good luck!

```python
import numpy as np

# randint's upper bound is exclusive, so 91 gives readings from -10 to 90 (no seed, so each run differs)
sensor_data = np.random.randint(-10, 91, (4, 6))
print(f"Original Sensor Data:\n{sensor_data}\n")
print(f"Reading for Machine 2 at Hour 4: {sensor_data[2, 4]}\n")
print(f"All readings for the first 3 hours:\n{sensor_data[:, :3]}\n")
print(f"All readings for Machine 1 and Machine 3:\n{sensor_data[[1, 3], :]}\n")

# Boolean mask: True wherever the reading is valid (>= 0)
boolean_mask_data = sensor_data >= 0
valid_readings = sensor_data[boolean_mask_data]
print(f"Valid Readings Array:\n{valid_readings}\n")
print(f"Number of valid readings: {valid_readings.size}\n")
print(f"Average of all valid temperatures: {valid_readings.mean()}\n")
print(f"Max valid temperature recorded: {valid_readings.max()}\n")

# Bonus: replace the faulty negative readings with 0, in place
sensor_data[sensor_data < 0] = 0
print(f"Corrected Sensor Data:\n{sensor_data}")
```

**Output:**
```
Original Sensor Data:
[[ -1  20  74  82  15  34]
 [ 63  13  18  -9 -10  33]
 [ 84  35  72  51  56  -1]
 [ 57  44  51  61  22  79]]

Reading for Machine 2 at Hour 4: 56

All readings for the first 3 hours:
[[-1 20 74]
 [63 13 18]
 [84 35 72]
 [57 44 51]]

All readings for Machine 1 and Machine 3:
[[ 63  13  18  -9 -10  33]
 [ 57  44  51  61  22  79]]

Valid Readings Array:
[20 74 82 15 34 63 13 18 33 84 35 72 51 56 57 44 51 61 22 79]

Number of valid readings: 20

Average of all valid temperatures: 48.2

Max valid temperature recorded: 84

Corrected Sensor Data:
[[ 0 20 74 82 15 34]
 [63 13 18  0  0 33]
 [84 35 72 51 56  0]
 [57 44 51 61 22 79]]
```
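A related trick: since `True` counts as 1 and `False` as 0, summing a boolean mask counts how many elements meet the condition, and summing along an axis counts per row or column. The sketch below uses the original readings printed above, typed in by hand because the cell used unseeded random numbers, to count the faulty values before they were replaced.

```python
import numpy as np

# The original readings from the run above
sensor_data = np.array([
    [ -1, 20, 74, 82,  15, 34],
    [ 63, 13, 18, -9, -10, 33],
    [ 84, 35, 72, 51,  56, -1],
    [ 57, 44, 51, 61,  22, 79],
])

faulty_mask = sensor_data < 0
print(faulty_mask.sum())              # 4 faulty readings in total
print(np.count_nonzero(faulty_mask))  # same count: 4
print(faulty_mask.sum(axis=1))        # per machine: [1 2 1 0]
```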
