# NumPy

**NumPy** (Numerical Python) is a fundamental library for scientific computing in Python. It offers a powerful N-dimensional array object (`ndarray`), tools for integrating C/C++ code, and routines for linear algebra, Fourier transforms, and random number generation.

### Why NumPy?

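1.  **Performance**: NumPy array operations are implemented in C and run over whole arrays at once, making them much faster than looping over Python lists, especially for large amounts of data.
2.  **Memory efficiency**: NumPy arrays store elements of a single data type in one contiguous block, so they take up less memory than Python lists, which hold references to separate Python objects.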
3.  **Math functionalities**: It provides a wide range of mathematical functions to operate on arrays efficiently (e.g., trigonometric, statistical, linear algebra).
4.  **A base for other libraries**: Many other scientific and machine learning libraries (such as Pandas, SciPy, Scikit-learn) rely on NumPy for their data structures and operations.
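
The memory point can be checked with a rough measurement. The sketch below is only an estimate: the names `py_list`, `list_bytes`, and `array_bytes` are illustrative, the array is assumed to use 8-byte integers, and the list figure depends on the Python build because `sys.getsizeof` reports per-object sizes.

```python
import sys

import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)  # 8 bytes per element, stored contiguously

# A list stores references; each int is a separate Python object with its own overhead.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# nbytes counts only the array's data buffer (n elements * 8 bytes).
array_bytes = arr.nbytes

print(f"List (container + int objects): {list_bytes} bytes")
print(f"NumPy array data buffer:        {array_bytes} bytes")
```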

In [None]:
import numpy as np

# Create a NumPy array from a Python list
list_data = [1, 2, 3, 4, 5]
numpy_array = np.array(list_data)
print(f"Array NumPy: {numpy_array}")
print(f"Type: {type(numpy_array)}")
print(f"Shape: {numpy_array.shape}")

# Vectorized operations (very efficient)
array_sum = numpy_array + 10
print(f"Array after adding 10: {array_sum}")

array_prod = numpy_array * 2
print(f"Array after multiplying by 2: {array_prod}")

# Create a two-dimensional array (matrix)
matrix_data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
numpy_matrix = np.array(matrix_data)
print(f"\nNumPy matrix:\n{numpy_matrix}")
print(f"Matrix shape: {numpy_matrix.shape}")

# Calculate the sum of all the elements of the matrix
print(f"Sum of all the elements of the matrix: {numpy_matrix.sum()}")

# Access the elements
print(f"Element in (0, 0): {numpy_matrix[0, 0]}")
print(f"First row: {numpy_matrix[0, :]}")

Array NumPy: [1 2 3 4 5]
Type: <class 'numpy.ndarray'>
Shape: (5,)
Array after adding 10: [11 12 13 14 15]
Array after multiplying by 2: [ 2  4  6  8 10]

NumPy matrix:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Matrix shape: (3, 3)
Sum of all the elements of the matrix: 45
Element in (0, 0): 1
First row: [1 2 3]


In [None]:
# Benchmark: vectorized NumPy multiplication vs. a Python list comprehension
my_arr = np.arange(1000000)
my_list = list(range(1000000))
%timeit my_arr2 = my_arr * 2
%timeit my_list2 = [x * 2 for x in my_list]


1.18 ms ± 36.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
74.2 ms ± 19.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
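
`%timeit` is an IPython magic, so the cell above only runs inside a notebook or IPython shell. As a rough equivalent for a plain Python script, the sketch below uses the standard-library `timeit` module. The repeat count of 5 is an arbitrary choice, and the absolute timings will differ from machine to machine.

```python
import timeit

import numpy as np

n = 1_000_000
my_arr = np.arange(n)
my_list = list(range(n))

repeats = 5  # arbitrary; more repeats give steadier averages

# timeit.timeit returns the total time for `number` calls, so divide to get a per-call average.
numpy_time = timeit.timeit(lambda: my_arr * 2, number=repeats) / repeats
list_time = timeit.timeit(lambda: [x * 2 for x in my_list], number=repeats) / repeats

print(f"NumPy array * 2:    {numpy_time * 1e3:.2f} ms per call")
print(f"List comprehension: {list_time * 1e3:.2f} ms per call")
```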


In [None]:
# NumPy array creation functions

print("1. np.array(): From a Python list or tuple")
list_data = [1, 2, 3, 4, 5]
array_from_list = np.array(list_data)
print(f"  np.array({list_data}) -> {array_from_list}\n")

print("2a. np.zeros(): Create an array filled with zeros")
zerosimple_array = np.zeros(3, dtype=int) # dtype=int gives integer zeros; the default dtype is float64
print(f"  np.zeros(3) ->\n{zerosimple_array}\n")

print("2b. np.zeros(): Create an array filled with zeros")
zeros_array = np.zeros((3, 4)) # 3 rows, 4 columns
print(f"  np.zeros((3, 4)) ->\n{zeros_array}\n")

print("3. np.ones(): Create an array filled with ones")
ones_array = np.ones((2, 3))
print(f"  np.ones((2, 3)) ->\n{ones_array}\n")

print("4. np.empty(): Create an array without initializing entries (fastest)")
empty_array = np.empty((2, 2))
print(f"  np.empty((2, 2)) ->\n{empty_array}\n") # Contains arbitrary values

print("5. np.arange(): Create an array with a range of values")
arange_array = np.arange(0, 10, 2) # start, stop (exclusive), step
print(f"  np.arange(0, 10, 2) -> {arange_array}\n")

print("6. np.linspace(): Create an array with evenly spaced numbers over a specified interval")
linspace_array = np.linspace(0, 1, 5) # start, stop, number of elements
print(f"  np.linspace(0, 1, 5) -> {linspace_array}\n")

print("7. np.full(): Create a new array of given shape and type, filled with fill_value")
full_array = np.full((2, 2), 7)
print(f"  np.full((2, 2), 7) ->\n{full_array}\n")

print("8. np.eye(): Create an identity matrix")
identity_matrix = np.eye(3)
print(f"  np.eye(3) ->\n{identity_matrix}\n")

print("9. np.random.rand(): Create an array of the given shape and populate it with random samples from a uniform distribution over [0, 1).")
random_uniform = np.random.rand(2, 2)
print(f"  np.random.rand(2, 2) ->\n{random_uniform}\n")

print("10. np.random.randint(): Return random integers from low (inclusive) to high (exclusive).")
random_integers = np.random.randint(0, 10, size=(2, 3))
print(f"  np.random.randint(0, 10, size=(2, 3)) ->\n{random_integers}\n")

1. np.array(): From a Python list or tuple
  np.array([1, 2, 3, 4, 5]) -> [1 2 3 4 5]

2a. np.zeros(): Create an array filled with zeros
  np.zeros(3) ->
[0 0 0]

2b. np.zeros(): Create an array filled with zeros
  np.zeros((3, 4)) ->
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]

3. np.ones(): Create an array filled with ones
  np.ones((2, 3)) ->
[[1. 1. 1.]
 [1. 1. 1.]]

4. np.empty(): Create an array without initializing entries (fastest)
  np.empty((2, 2)) ->
[[0.89846528 0.52459216]
 [0.13814873 0.41636812]]

5. np.arange(): Create an array with a range of values
  np.arange(0, 10, 2) -> [0 2 4 6 8]

6. np.linspace(): Create an array with evenly spaced numbers over a specified interval
  np.linspace(0, 1, 5) -> [0.   0.25 0.5  0.75 1.  ]

7. np.full(): Create a new array of given shape and type, filled with fill_value
  np.full((2, 2), 7) ->
[[7 7]
 [7 7]]

8. np.eye(): Create an identity matrix
  np.eye(3) ->
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

9. np.random.rand(): Create an ar
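
Two details of the creation functions above are easy to miss: the default data type and whether the end point is included. A small sketch with illustrative variable names, using only the functions already shown:

```python
import numpy as np

# np.zeros (like np.ones and np.empty) produces float64 unless a dtype is given.
default_zeros = np.zeros(3)
int_zeros = np.zeros(3, dtype=int)
print(default_zeros.dtype, int_zeros.dtype)

# np.full infers its dtype from the fill value.
full_ints = np.full((2, 2), 7)
full_floats = np.full((2, 2), 7.0)
print(full_ints.dtype, full_floats.dtype)

# np.arange excludes the stop value; np.linspace includes it by default.
stepped = np.arange(0, 10, 2)
spaced = np.linspace(0, 10, 6)
print(stepped)
print(spaced)
```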

### Boolean Indexing in NumPy

Boolean indexing (also known as boolean masking) selects the elements of a NumPy array that satisfy a condition. You pass a boolean array (typically of the same shape as the array you want to index) to the array's indexing operation. Wherever the boolean array is `True`, the corresponding element of the original array is selected; wherever it is `False`, the element is skipped. With a mask of the same shape, the result is a new one-dimensional array holding the selected elements.

This is extremely useful for filtering data, performing conditional operations, and data manipulation.
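
Several conditions can be combined into one mask with the element-wise operators `&` (and), `|` (or), and `~` (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons, and Python's `and`/`or` keywords do not work on arrays. A minimal sketch, with `values` and the thresholds chosen only for illustration:

```python
import numpy as np

values = np.array([10, 25, 30, 45, 50, 65, 70, 85, 90, 100])

# Elements between 30 and 70, inclusive
in_range = (values >= 30) & (values <= 70)
print(values[in_range])

# Elements below 20 or above 80
extremes = (values < 20) | (values > 80)
print(values[extremes])

# ~ inverts a mask: everything outside the 30..70 range
print(values[~in_range])
```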

In [6]:
# Create a sample NumPy array
data = np.array([10, 25, 30, 45, 50, 65, 70, 85, 90, 100])
print(f"Original array: {data}")

# 1. Create a boolean array based on a condition
#    Let's find all elements greater than 50
bool_mask = data > 50
print(f"\nBoolean mask (data > 50): {bool_mask}")
print(type(bool_mask))

# 2. Use the boolean mask for indexing
filtered_data = data[bool_mask]
print(f"Filtered data (elements > 50): {filtered_data}")

# You can also combine steps directly:
filtered_data_direct = data[data % 2 == 0] # Filter for even numbers
print(f"\nFiltered data (even numbers): {filtered_data_direct}")

# Boolean indexing can also be used to modify elements
modified_data = data.copy() # Work on a copy to preserve original
modified_data[modified_data < 30] = 0 # Set elements less than 30 to 0
print(f"\nArray after modifying elements < 30 to 0: {modified_data}")

# For 2D arrays (matrices)
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"\nOriginal matrix:\n{matrix}")

# Select elements greater than 5 in the matrix
matrix_filtered = matrix[matrix > 5]
print(f"Matrix elements > 5: {matrix_filtered}")

Original array: [ 10  25  30  45  50  65  70  85  90 100]

Boolean mask (data > 50): [False False False False False  True  True  True  True  True]
<class 'numpy.ndarray'>
Filtered data (elements > 50): [ 65  70  85  90 100]

Filtered data (even numbers): [ 10  30  50  70  90 100]

Array after modifying elements < 30 to 0: [  0   0  30  45  50  65  70  85  90 100]

Original matrix:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Matrix elements > 5: [6 7 8 9]


### Pseudorandom Number Generation in NumPy

NumPy's `numpy.random` module is used to generate pseudorandom numbers. These numbers are "pseudorandom" because they are generated by a deterministic algorithm, but they appear random for most practical purposes. The sequence of numbers is determined by an initial 'seed'. If you use the same seed, you'll get the same sequence of numbers.

Key concepts:

*   **Pseudorandomness**: Numbers appear random but are reproducible if the seed is fixed.
*   **Seed**: Initializes the random number generator. Useful for reproducible research and debugging.
*   **Distributions**: NumPy supports various probability distributions (uniform, normal, binomial, etc.).

Since NumPy 1.17, the recommended way to generate random numbers is the `Generator` class, created with `np.random.default_rng()`. It offers a more modern and flexible approach than the older `np.random.seed()` and the direct `np.random` functions.
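
The distributions listed under the key concepts are available as methods on a `Generator`. The sketch below draws uniform and binomial samples. The seed `0`, the sample size, and the distribution parameters are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the run is repeatable

# Uniform floats in the half-open interval [-1.0, 1.0)
uniform_samples = rng.uniform(low=-1.0, high=1.0, size=1000)

# Number of successes in 10 trials with success probability 0.5, repeated 1000 times
binomial_samples = rng.binomial(n=10, p=0.5, size=1000)

print(f"Uniform range:  {uniform_samples.min():.3f} to {uniform_samples.max():.3f}")
print(f"Binomial range: {binomial_samples.min()} to {binomial_samples.max()}")
print(f"Binomial mean (should be near n * p = 5): {binomial_samples.mean():.2f}")
```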

In [None]:
print("1. Basic Random Number Generation (old way - convenient for quick scripts)")
# Generate a single random float between 0.0 and 1.0
random_float = np.random.rand()
print(f"  Single random float: {random_float}")

# Generate a 1D array of 5 random floats
random_array_1d = np.random.rand(5)
print(f"  1D array of 5 random floats: {random_array_1d}")

# Generate a 2x3 array of random floats
random_array_2d = np.random.rand(2, 3)
print(f"  2x3 array of random floats:\n{random_array_2d}\n")

print("2. Random Integers")
# Generate a single random integer between 0 (inclusive) and 10 (exclusive)
random_int = np.random.randint(0, 10)
print(f"  Single random integer (0-9): {random_int}")

# Generate a 2x2 array of random integers between 1 (inclusive) and 100 (exclusive)
random_ints_array = np.random.randint(1, 100, size=(2, 2))
print(f"  2x2 array of random integers (1-99):\n{random_ints_array}\n")

print("3. Sampling from a Normal Distribution (Gaussian)")
# Mean (loc) = 0, Standard Deviation (scale) = 1, size = 4 random numbers
normal_dist_samples = np.random.normal(loc=0, scale=1, size=4)
print(f"  Samples from a standard normal distribution: {normal_dist_samples}\n")

print("4. Reproducibility with a Seed (old way)")
np.random.seed(42) # Set the seed
print(f"  First set of random numbers with seed 42: {np.random.rand(3)}")
np.random.seed(42) # Set the same seed again
print(f"  Second set of random numbers with same seed 42: {np.random.rand(3)}\n")

print("5. Modern approach with Generator (recommended for new code)")
# Create a default random number generator
rng = np.random.default_rng() # rng is a random number generator object
print(f"  Random numbers using default_rng: {rng.random(3)}")

# Create a random number generator with a specific seed for reproducibility
rng_seeded = np.random.default_rng(123)
print(f"  First set of random numbers with seeded Generator (123): {rng_seeded.random(3)}")

rng_seeded_again = np.random.default_rng(123)
print(f"  Second set of random numbers with same seeded Generator (123): {rng_seeded_again.random(3)}")

# Generate integers with the Generator
print(f"  Random integers using Generator (1-10): {rng.integers(1, 11, size=5)}")

# Generate samples from a normal distribution with the Generator
print(f"  Normal distribution samples using Generator: {rng.normal(loc=10, scale=2, size=5)}")

1. Basic Random Number Generation (old way - convenient for quick scripts)
  Single random float: 0.5986584841970366
  1D array of 5 random floats: [0.15601864 0.15599452 0.05808361 0.86617615 0.60111501]
  2x3 array of random floats:
[[0.70807258 0.02058449 0.96990985]
 [0.83244264 0.21233911 0.18182497]]

2. Random Integers
  Single random integer (0-9): 4
  2x2 array of random integers (1-99):
[[33 76]
 [58 22]]

3. Sampling from a Normal Distribution (Gaussian)
  Samples from a standard normal distribution: [-2.43910582  0.60344123 -0.25104397 -0.16386712]

4. Reproducibility with a Seed (old way)
  First set of random numbers with seed 42: [0.37454012 0.95071431 0.73199394]
  Second set of random numbers with same seed 42: [0.37454012 0.95071431 0.73199394]

5. Modern approach with Generator (recommended for new code)
  Random numbers using default_rng: [0.20148109 0.08039398 0.60489697]
  First set of random numbers with seeded Generator (123): [0.68235186 0.05382102 0.22035987]


### Exercise 1: Array Creation, Slicing, and Basic Statistics

**Task:**
1. Create a 4x5 NumPy array named `data_array` containing random integers between 1 and 100 (inclusive).
2. Extract the first two rows and all columns from `data_array` and store it in a new array called `sub_array`.
3. Calculate first the mean of all values and then the mean of each column in `data_array` (see the `axis` parameter).
4. Find the maximum value in the entire `data_array`.

**Expected Output:**
- `data_array` (e.g., `[[..],[..],[..],[..]]`)
- `sub_array` (e.g., `[[..],[..]]`)
- Mean of all values (a single float)
- Mean of each column (e.g., `[mean_col1, mean_col2, ...]`)
- Maximum value (e.g., `98`)
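
As a hint for the `axis` parameter: on a 2D array, `axis=0` aggregates down the rows and yields one value per column, while `axis=1` aggregates across the columns and yields one value per row. A small sketch on a fixed array (`grid` is just an illustrative name):

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

overall = grid.mean()          # no axis: mean of all six values
per_column = grid.mean(axis=0) # collapses the rows, one mean per column
per_row = grid.mean(axis=1)    # collapses the columns, one mean per row

print(overall)     # 3.5
print(per_column)  # [2.5 3.5 4.5]
print(per_row)     # [2. 5.]
```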

In [None]:
# Your code for Exercise 1 here
import numpy as np

# 1. Create a 4x5 NumPy array with random integers between 1 and 100
np.random.seed(42) # for reproducibility
data_array = np.random.randint(1, 101, size=(4, 5))
print(f"Data Array:\n{data_array}")

# 2. Extract the first two rows and all columns
sub_array = data_array[0:2, :]
print(f"\nSub Array (first two rows):\n{sub_array}")

# 3. Calculate the mean of all values, then the mean of each column
overall_mean = data_array.mean()
print(f"\nMean of all values: {overall_mean}")
column_means = data_array.mean(axis=0) # axis=0 collapses the rows, giving one mean per column
print(f"\nMean of each column: {column_means}")

# 4. Find the maximum value in the entire array
max_value = data_array.max()
print(f"\nMaximum value in the array: {max_value}")

Data Array:
[[ 52  93  15  72  61]
 [ 21  83  87  75  75]
 [ 88 100  24   3  22]
 [ 53   2  88  30  38]]

Sub Array (first two rows):
[[52 93 15 72 61]
 [21 83 87 75 75]]

Mean of all values: 54.1

Mean of each column: [53.5 69.5 53.5 45.  49. ]

Maximum value in the array: 100


### Exercise 2: Boolean Indexing and Conditional Modification

**Task:**
1. Create a 1D NumPy array named `float_array` of 20 random floats between 0 and 1.
2. Identify and print all elements in `float_array` that are greater than 0.7.
3. Replace all elements in `float_array` that are less than 0.3 with the value 0.
4. Count and print how many elements in the modified `float_array` are now 0.

**Expected Output:**
- `float_array` (initial and modified)
- Elements greater than 0.7
- Count of zeros in the modified array.
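
Two building blocks are useful here: `np.where` builds a new array from a condition without touching the original, and `np.count_nonzero` counts the `True` entries of a mask. A minimal sketch on a small fixed array (the values and names are illustrative, not part of the exercise):

```python
import numpy as np

samples = np.array([0.1, 0.5, 0.25, 0.9, 0.0])

below = samples < 0.3  # boolean mask: [True, False, True, False, True]

# np.where(condition, a, b) picks a where the condition holds and b elsewhere.
# It returns a new array, so `samples` itself is left unchanged.
replaced = np.where(below, 0.0, samples)

# Each True counts as one non-zero entry.
n_below = np.count_nonzero(below)

print(replaced)  # [0.  0.5 0.  0.9 0. ]
print(n_below)   # 3
```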

In [3]:
# Your code for Exercise 2 here
import numpy as np

# 1. Create a 1D NumPy array of 20 random floats between 0 and 1
np.random.seed(42) # for reproducibility
float_array = np.random.rand(20)
print(f"Original Float Array:\n{float_array}")

# 2. Identify and print all elements greater than 0.7
elements_gt_0_7 = float_array[float_array > 0.7]
print(f"\nElements greater than 0.7: {elements_gt_0_7}")

# 3. Replace all elements less than 0.3 with 0
float_array[float_array < 0.3] = 0
print(f"\nModified Float Array (elements < 0.3 replaced with 0):\n{float_array}")

# 4. Count how many elements are now 0
count_zeros = np.sum(float_array == 0)
print(f"\nNumber of zeros in the modified array: {count_zeros}")

Original Float Array:
[0.37454012 0.95071431 0.73199394 0.59865848 0.15601864 0.15599452
 0.05808361 0.86617615 0.60111501 0.70807258 0.02058449 0.96990985
 0.83244264 0.21233911 0.18182497 0.18340451 0.30424224 0.52475643
 0.43194502 0.29122914]

Elements greater than 0.7: [0.95071431 0.73199394 0.86617615 0.70807258 0.96990985 0.83244264]

Modified Float Array (elements < 0.3 replaced with 0):
[0.37454012 0.95071431 0.73199394 0.59865848 0.         0.
 0.         0.86617615 0.60111501 0.70807258 0.         0.96990985
 0.83244264 0.         0.         0.         0.30424224 0.52475643
 0.43194502 0.        ]

Number of zeros in the modified array: 8


### Exercise 3: Array Reshaping and Matrix Operations

**Task:**
1. Create a 1D NumPy array named `sequence_array` containing integers from 1 to 12 (inclusive).
2. Reshape `sequence_array` into a 3x4 matrix named `matrix_A`.
3. Create another 4x2 matrix named `matrix_B` containing random integers between 1 and 5 (inclusive).
4. Perform matrix multiplication between `matrix_A` and `matrix_B`, storing the result in `matrix_C`.
5. Calculate the transpose of `matrix_C`.

**Expected Output:**
- `sequence_array`
- `matrix_A`
- `matrix_B`
- `matrix_C` (result of A @ B)
- Transpose of `matrix_C`
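
The shape rules behind this exercise: `reshape` must keep the total number of elements, and `A @ B` requires the number of columns of `A` to match the number of rows of `B`, giving a result with the rows of `A` and the columns of `B`. A small sketch with illustrative arrays, separate from the exercise data:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)  # 6 elements arranged as 2 rows x 3 columns
b = np.ones((3, 4), dtype=int)  # 3 rows x 4 columns

c = a @ b                       # (2, 3) @ (3, 4) -> (2, 4)
print(c.shape)
print(c)

# -1 lets NumPy infer one dimension from the element count.
inferred = np.arange(6).reshape(-1, 2)
print(inferred.shape)           # (3, 2)

# Reversing the operands breaks the rule: (3, 4) @ (2, 3) has inner sizes 4 and 2.
try:
    b @ a
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print(f"Shape mismatch: {exc}")
```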

In [4]:
# Your code for Exercise 3 here
import numpy as np

# 1. Create a 1D NumPy array containing integers from 1 to 12
sequence_array = np.arange(1, 13)
print(f"Sequence Array: {sequence_array}")

# 2. Reshape into a 3x4 matrix
matrix_A = sequence_array.reshape(3, 4)
print(f"\nMatrix A (3x4):\n{matrix_A}")

# 3. Create another 4x2 matrix of random integers between 1 and 5
np.random.seed(42) # for reproducibility
matrix_B = np.random.randint(1, 6, size=(4, 2))
print(f"\nMatrix B (4x2):\n{matrix_B}")

# 4. Perform matrix multiplication (A @ B)
matrix_C = matrix_A @ matrix_B
print(f"\nMatrix C (A @ B - 3x2):\n{matrix_C}")

# 5. Calculate the transpose of matrix_C
transpose_C = matrix_C.T
print(f"\nTranspose of Matrix C:\n{transpose_C}")

Sequence Array: [ 1  2  3  4  5  6  7  8  9 10 11 12]

Matrix A (3x4):
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Matrix B (4x2):
[[4 5]
 [3 5]
 [5 2]
 [3 3]]

Matrix C (A @ B - 3x2):
[[ 37  33]
 [ 97  93]
 [157 153]]

Transpose of Matrix C:
[[ 37  97 157]
 [ 33  93 153]]
