# Assignment 2: NumPy
- **you will learn:** how to create and manipulate NumPy arrays, perform vectorized computations, and use basic NumPy functions for data analysis
- **task:**  See section 2.9 below
- **deadline:** 20.10.2025
- [NumPy documentation](https://numpy.org/doc/stable/)
- 📝 **Reminder:** Sync your GitHub repository with the main course repository, update your project in PyCharm, and after completing the assignment, commit and push your changes back to GitHub.
---

## 2.0 PEP 8 and Code Commenting

### What is PEP 8?
**PEP 8** is the official **style guide** for Python code.
It defines conventions that make your code **clean, consistent, and easy to read**.
While not mandatory, following PEP 8 is considered a sign of **professional and readable coding**.

### Some Important PEP 8 Rules
- ✅ **Line length:** keep lines **under 79 characters**.
- ✅ **Spacing:**
  - add spaces around operators (`a + b`, not `a+b`)
  - add a space **after** commas, not before (`[1, 2, 3]`, not `[1 ,2 ,3]`)
- ✅ **Variable and function names:** use lowercase with underscores (`calculate_mean`, not `CalculateMean`).
- ✅ **Class names:** use `CamelCase` (`DataProcessor`).
- ✅ **Imports:** at the top of the file, one per line.
- ✅ **Blank lines:** use two blank lines between functions.

 **Official guide:** [PEP 8 – Style Guide for Python Code](https://peps.python.org/pep-0008/)

### Comments and Docstrings
Comments explain **what** your code does and **why**.
Every function should include a **docstring** — a text enclosed in triple quotes `""" ... """` that briefly describes the function’s purpose, parameters, and return value.

#### Example:

```python
def calculate_mean(values):
    """
    Compute the arithmetic mean of a list of numbers.

    Parameters
    ----------
    values : list of float
        Input numbers.

    Returns
    -------
    float
        The arithmetic mean of the input values.
    """
    if not values:
        return 0.0
    return sum(values) / len(values)



---
## 2.1 What is a NumPy Array?

- In computer programming, an **array** is a structure for storing and retrieving data.
- They are the foundation for **data science, machine learning, and scientific computing** in Python.
- We often visualize an array as a **grid in space**, with each cell storing one element of data.
- Arrays can be **1D (vectors), 2D (matrices), or higher-dimensional (tensors)**.

Most NumPy arrays have some rules:

1. **Homogeneous type:** All elements must be of the same data type.
2. **Fixed size:** Once created, the total size cannot change.
3. **Rectangular shape:** All rows (in 2D arrays) must have the same number of columns — no jagged arrays.

When these conditions are met, NumPy can exploit them to make arrays:

- **Faster** (optimized C loops under the hood)
- **More memory efficient** (contiguous memory storage)
- **More convenient to use** (vectorized operations without explicit loops)

In [218]:
import numpy
import numpy as np
from sqlalchemy.sql.functions import percentile_cont

print(np.__version__)

2.3.3


In [219]:
# Create 1D array (vector)
vector = np.array([10, 20, 30, 40, 50])
print("1D array (vector):", vector)
print("Shape:", vector.shape, "Dtype:", vector.dtype)

# Create 2D array (matrix)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
print("2D array (matrix):\n", matrix)
print("Shape:", matrix.shape, "Dtype:", matrix.dtype)

# Vectorized operation: multiply all elements by 2 or square them
matrix2 = 2 * matrix
matrix_sq = matrix ** 2
print("Matrix after multiplying by 2:\n", matrix2)
print("Matrix after squaring by 2:\n", matrix_sq)

1D array (vector): [10 20 30 40 50]
Shape: (5,) Dtype: int64
2D array (matrix):
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
Shape: (3, 3) Dtype: int64
Matrix after multiplying by 2:
 [[ 2  4  6]
 [ 8 10 12]
 [14 16 18]]
Matrix after squaring by 2:
 [[ 1  4  9]
 [16 25 36]
 [49 64 81]]


---
## 2.2 Constructing arrays

There are several mechanisms for creating arrays. Among others:

1. **Conversion from other Python structures**
   Arrays can be created directly from existing **lists or tuples** using `np.array()`.
   This is the most common and straightforward way to build an array from existing data.

In [220]:
# Conversion from Python structures
a = np.array([1, 2, 3, 4, 5])
b= np.array(((1,0),(0,1)))
c = np.array([([1,2],[2,1]), ([3,1],[1,3])])
print("From list:\n", a)
print("From tuples of tuples:\n", b)
print("From list of tuples or lists:\n", c)

From list:
 [1 2 3 4 5]
From tuples of tuples:
 [[1 0]
 [0 1]]
From list of tuples or lists:
 [[[1 2]
  [2 1]]

 [[3 1]
  [1 3]]]



2. **NumPy array creation functions**
   NumPy provides a set of **built-in constructors** such as `np.zeros`, `np.ones`, `np.arange`, and `np.linspace`
   to generate arrays of a specific shape or with evenly spaced values.

In [221]:
# np.empty(shape, dtype)
# Creates a new array *without initializing* its entries (values are arbitrary).
arr_empty = np.empty((2,3), dtype="int32")
print("np.empty:\n", arr_empty)
print("np.empty type:", arr_empty.dtype)

np.empty:
 [[ -927712936  1072938614 -1580547965]
 [ 1075230277  1628651599 -1071500349]]
np.empty type: int32


In [222]:
# np.identity(n)
# Shortcut for creating a square identity matrix (ones on the main diagonal).
arr_identity = np.identity(4)
print("np.identity:\n", arr_identity)

np.identity:
 [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]


In [223]:
# np.eye(N, M, k)
# Creates a 2D array with ones on the main (or k-th) diagonal, zeros elsewhere.
arr_eye = np.eye(4,4,-1)
print("np.eye:\n", arr_eye)

np.eye:
 [[0. 0. 0. 0.]
 [1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]]


In [224]:
# np.ones(shape, dtype)
# Creates an array of given shape filled with ones.
arr_ones = np.ones((2, 4))
print("np.ones:\n", arr_ones)

np.ones:
 [[1. 1. 1. 1.]
 [1. 1. 1. 1.]]


In [225]:
# np.zeros(shape, dtype)
# Creates an array filled with zeros.
arr_zeros = np.zeros((3, 3))
print("np.zeros:\n", arr_zeros)

np.zeros:
 [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


In [226]:
# np.full(shape, fill_value)
# Creates an array filled with a specified constant value.
arr_full = np.full((2, 3), fill_value=7)
print("np.full:\n", arr_full)

np.full:
 [[7 7 7]
 [7 7 7]]


In [227]:
# np.empty_like(prototype)
# Creates an uninitialized array with the *same shape and dtype* as another array.
prototype = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)
arr_zeros_like = np.zeros_like(prototype)
print("np.empty_like:\n", arr_zeros_like)

# Similarly with np.empty_like(a), np.zeros_like(a), np.full_like(a, fill_value)

np.empty_like:
 [[0. 0. 0.]
 [0. 0. 0.]]


In [228]:
# np.arange([start,] stop[, step,][, dtype])
# Returns evenly spaced values within a given interval.
# Similar to Python's range(), but returns a NumPy array.
arr_arange = np.arange(0, 10, 2, dtype=float)
print("np.arange:\n", arr_arange)

np.arange:
 [0. 2. 4. 6. 8.]


In [229]:
# np.linspace(start, stop[, num, endpoint])
# Returns evenly spaced numbers over a specified interval.
# Unlike arange, it lets you specify the number of samples.
arr_linspace = np.linspace(0, 1, num=5)
print("np.linspace:\n", arr_linspace)

np.linspace:
 [0.   0.25 0.5  0.75 1.  ]


In [230]:
# np.diag(v[, k])
# Construct a diagonal matrix from a 1D array, or extract a diagonal from a 2D array.
v = np.array([1, 2, 3])
arr_diag = np.diag(v)
print("np.diag (construct from 1D):\n", arr_diag)

np.diag (construct from 1D):
 [[1 0 0]
 [0 2 0]
 [0 0 3]]


In [231]:
# np.tril(m[, k])
# Return the lower triangle of an array (elements above the k-th diagonal are zeroed).
m = np.arange(1, 10).reshape(3, 3)
arr_tril = np.tril(m)
print("np.tril (lower triangle):\n", arr_tril)

# similarly with np.triu(m[, k])

np.tril (lower triangle):
 [[1 0 0]
 [4 5 0]
 [7 8 9]]



3. **Replicating, joining, or mutating existing arrays**
   Arrays can be **copied, concatenated, reshaped, or repeated** to create new ones.
   For example, you can use `np.tile`, `np.concatenate`, or `reshape` for this purpose.

In [232]:
# np.reshape(a, newshape)
# Changes the shape of an array without changing its data.
a = np.arange(6)
print("a:\n", a)
reshaped = np.reshape(a, (3, 2))
print("np.reshape:\n", reshaped)

a:
 [0 1 2 3 4 5]
np.reshape:
 [[0 1]
 [2 3]
 [4 5]]


In [233]:
# a.flatten()
# Flatten a multi-dimensional array into 1D.
a2 = np.array([[1, 2], [3, 4]])
print("a.flatten:\n", a2.flatten())

a.flatten:
 [1 2 3 4]


In [234]:
# np.transpose(a) or a.T
# Swaps axes, e.g., turns rows into columns.
print("np.transpose:\n", np.transpose(a2))

np.transpose:
 [[1 3]
 [2 4]]


In [235]:
# np.swapaxes(a, axis1, axis2)
# Swaps any two axes in a multi-dimensional array.
a3 = np.arange(8).reshape(2, 2, 2)
print("a3:\n", a3)
print("np.swapaxes:\n", np.swapaxes(a3, 0, 2))

a3:
 [[[0 1]
  [2 3]]

 [[4 5]
  [6 7]]]
np.swapaxes:
 [[[0 4]
  [2 6]]

 [[1 5]
  [3 7]]]


In [236]:
# np.moveaxis(a, source, destination)
# Moves a given axis to a new position.
a4 = np.zeros((2, 3, 4))
print("np.moveaxis shape:", np.moveaxis(a4, 0, -1).shape)

np.moveaxis shape: (3, 4, 2)


In [237]:
# np.squeeze(a)
# Removes axes of length 1.
a5 = np.zeros((1, 3, 1))
print("a5:\n", a5)
print("np.squeeze shape:", np.squeeze(a5).shape)
print("a5 squeezed:\n", np.squeeze(a5))

a5:
 [[[0.]
  [0.]
  [0.]]]
np.squeeze shape: (3,)
a5 squeezed:
 [0. 0. 0.]


In [238]:
# np.expand_dims(a, axis)
# Adds a new dimension (axis) to the array.
a6 = np.array([1, 2, 3])
print("np.expand_dims shape:", np.expand_dims(a6, axis=0).shape)

np.expand_dims shape: (1, 3)


In [239]:
# np.concatenate((a1, a2, ...), axis)
# Joins arrays along an existing axis.
a7 = np.ones((2, 2))
b7 = np.zeros((2, 2))
print("np.concatenate:\n", np.concatenate((a7, b7), axis=1))

np.concatenate:
 [[1. 1. 0. 0.]
 [1. 1. 0. 0.]]


In [240]:
# np.stack((a1, a2, ...), axis)
# Stacks arrays along a new axis.
a8 = np.array([1, 2])
b8 = np.array([3, 4])
print("np.stack:\n", np.stack((a8, b8), axis=0))

np.stack:
 [[1 2]
 [3 4]]


In [241]:
# np.split(a, sections, axis)
# Splits an array into multiple subarrays.
x = np.arange(9)
print("np.split:\n", np.split(x, 3))

np.split:
 [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]


In [242]:
# np.copy(a)
# Creates a deep copy of the array (independent of the original).
a11 = np.array([1, 2, 3])
b11 = a11.copy()
b11[0] = 99
print("Original:", a11, " | Copy:", b11)

Original: [1 2 3]  | Copy: [99  2  3]


In [243]:
# np.astype(dtype)
# Converts array elements to a new type.
a12 = np.array([1, 2, 3])
print("astype to float:\n", a12.astype(float))

astype to float:
 [1. 2. 3.]


In [244]:
# np.clip(a, min, max)
# Limits values to a given range.
a13 = np.array([-1, 0, 2, 5])
print("np.clip:\n", np.clip(a13, 0, 3))

np.clip:
 [0 0 2 3]


In [245]:
# np.where(condition, x, y)
# Selects elements based on a condition.
a14 = np.array([1, 2, 3])
print("np.where (a > 1 -> 100):\n", np.where(a14 > 1, 100, a14))

np.where (a > 1 -> 100):
 [  1 100 100]



4. **Creating Arrays from Other Libraries**

Many Python libraries — such as **SciPy**, **Pandas**, and **OpenCV** — use NumPy `ndarray` objects as a **common format for data exchange**.
These libraries can **create**, **manipulate**, and **interoperate with** NumPy arrays directly.


---
## 2.3 Indexing arrays

Note that Python indexes (unlike for example R) start from 0, not from 1.

### Basic Indexing

In [246]:
# Create a 2D array for demonstration
x = np.arange(1, 13).reshape(3, 4)
print("Array x:\n", x)

# 1. Single element indexing
print("Single element x[1, 2]:", x[1, 2])
print("Same by chained indexing x[1][2]:", x[1][2])

Array x:
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
Single element x[1, 2]: 7
Same by chained indexing x[1][2]: 7


In [247]:
# 2. If fewer indices than dimensions, returns a subarray (view)
print("x[0] returns first row (view):", x[0])

x[0] returns first row (view): [1 2 3 4]


In [248]:
# 3. Slicing (rows, columns)
print("x[0:2, 1:4] → rows 0 and 1, columns 1 to 3:\n", x[0:2, 1:4])

x[0:2, 1:4] → rows 0 and 1, columns 1 to 3:
 [[2 3 4]
 [6 7 8]]


In [249]:
# 4. Striding with step (convention start:stop:step)
print("x[:, ::2] → all rows, every second column:\n", x[:, ::2])

x[:, ::2] → all rows, every second column:
 [[ 1  3]
 [ 5  7]
 [ 9 11]]


In [250]:
# 5. Using negative indices
print("x[-1, -2]:", x[-1, -2])  # last row, second-last column

x[-1, -2]: 11


In [251]:
# 6. Ellipsis (`...`) and `newaxis` (alias None)
# Ellipsis expands to as many ":" as needed
print("x[..., 2] → same as x[:, 2]:", x[..., 2])

# newaxis introduces a new dimension
y = x[:, 1]  # shape (3,)
y2 = x[:, 1, np.newaxis]  # shape (3,1)
print("y shape:", y.shape, "   y2 shape:", y2.shape)

x[..., 2] → same as x[:, 2]: [ 3  7 11]
y shape: (3,)    y2 shape: (3, 1)


- All slicing operations produce views, not copies — they refer to the same underlying data.
- Because of this, modifying a slice will affect the original array.
- When using integer indexing (not slicing), you reduce a dimension.
- : means “select all elements along this axis”.
- ... is a convenient placeholder to fill in missing : for remaining axes.

### Advanced Indexing

In [252]:
x = np.arange(1, 13).reshape(3, 4)
print("Array x:\n", x)

# 1. Integer array indexing
row_idx = [0, 2]
col_idx = [1, 3]
# Select elements (0,1) and (2,3)
print("x[row_idx, col_idx]:", x[row_idx, col_idx])

# Equivalent as
print("The same as:", np.array([x[0,1],x[2,3]]))

Array x:
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
x[row_idx, col_idx]: [ 2 12]
The same as: [ 2 12]


In [253]:
# 2. Broadcasting integer indices
# If you supply fewer index arrays or scalars, they broadcast
print("x[row_idx, 2]:", x[row_idx, 2])
print("The same as:", x[[0,2], [2,2]])

x[row_idx, 2]: [ 3 11]
The same as: [ 3 11]


In [254]:
# 3. Boolean masking (Boolean indexing)
mask = x % 2 == 0  # True for even numbers
print("Boolean mask:\n", mask)
print("x[mask] → all even elements:", x[mask])

# Example modification using Boolean mask
x2 = x.copy()
x2[x2 % 2 == 1] = -1
print("x2 with odd elements replaced by –1:\n", x2)

Boolean mask:
 [[False  True False  True]
 [False  True False  True]
 [False  True False  True]]
x[mask] → all even elements: [ 2  4  6  8 10 12]
x2 with odd elements replaced by –1:
 [[-1  2 -1  4]
 [-1  6 -1  8]
 [-1 10 -1 12]]


In [255]:
# 4. Combining basic and advanced indexing
# e.g. select rows 0 and 2, but columns 1:3
print("x[[0, 2], 1:3]:\n", x[[0, 2], 1:3])

x[[0, 2], 1:3]:
 [[ 2  3]
 [10 11]]


---
## 2.4 Array attributes

- Every NumPy array is a Python object of class `numpy.ndarray`.
- Besides storing the actual data, it also stores various attributes that describe its structure and memory layout.

In [256]:
# Let's create a simple 2D array
a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

print("Type of numpy array object:", type(a))

Type of numpy array object: <class 'numpy.ndarray'>


In [257]:
print("Number of dimensions (ndim):", a.ndim)
print("Shape (rows, columns):", a.shape)
print("Total number of elements (size):", a.size)
print("Data type (dtype):", a.dtype)
print("Size of one element in bytes (itemsize):", a.itemsize)
print("Total size in bytes (nbytes):", a.nbytes)
print("Transposed array (T):\n", a.T)

Number of dimensions (ndim): 2
Shape (rows, columns): (3, 4)
Total number of elements (size): 12
Data type (dtype): int64
Size of one element in bytes (itemsize): 8
Total size in bytes (nbytes): 96
Transposed array (T):
 [[ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]
 [ 4  8 12]]


Example explanation:
- ndim   → tells how many axes (dimensions) the array has
- shape  → gives the length of each dimension as a tuple
- size   → total count of elements = product of shape entries
- dtype  → data type of the elements (e.g. int32, float64)
- itemsize → bytes per element, depends on dtype
- nbytes   → total memory used by the array
- T        → shorthand for the transposed view (rows <-> columns)


---
## 2.5 Array methods

- A NumPy ndarray provides many built-in methods that operate on the array or return information about it. Most of these methods return a new array or a computed value derived from the data.

In [258]:
# Create a 2D array
a = np.array([[1, 2, 3], [4, 5, 6]])

# Reshape the array to 3 rows and 2 columns
reshaped = a.reshape(3, 2)
print("Reshaped array:\n", reshaped)

# Flatten the array to 1D
flattened = a.flatten()
print("\nFlattened array:", flattened)

Reshaped array:
 [[1 2]
 [3 4]
 [5 6]]

Flattened array: [1 2 3 4 5 6]


In [259]:
# --- Max and Min ---
print("Max element:", a.max())                  # ndarray.max()
print("Index of max (flattened):", a.argmax())  # ndarray.argmax()
print("Min element:", a.min())                  # ndarray.min()
print("Index of min (flattened):", a.argmin())  # ndarray.argmin()

Max element: 6
Index of max (flattened): 5
Min element: 1
Index of min (flattened): 0


In [260]:
# --- Rounding ---
arr_float = np.array([[1.234, 2.567], [3.891, 4.456]])
rounded = arr_float.round(1)             # Round to 1 decimal
print("Rounded array:\n", rounded)

Rounded array:
 [[1.2 2.6]
 [3.9 4.5]]


In [261]:
# --- Trace ---
print("Trace (sum of diagonal):", a.trace())  # Sum along main diagonal

Trace (sum of diagonal): 6


In [262]:
# --- Sum, Cumsum, Mean ---
print("Sum of all elements:", a.sum())
print("Cumulative sum along rows:\n", a.cumsum(axis=1))
print("Mean along columns:", a.mean(axis=0))

Sum of all elements: 21
Cumulative sum along rows:
 [[ 1  3  6]
 [ 4  9 15]]
Mean along columns: [2.5 3.5 4.5]


In [263]:
# --- Variance and Standard Deviation ---
print("Variance of all elements:", a.var())
print("Standard deviation:", a.std())

Variance of all elements: 2.9166666666666665
Standard deviation: 1.707825127659933


In [264]:
# --- Logical checks ---
print("All elements > 0?", (a > 0).all())
print("Any element > 5?", (a > 5).any())

All elements > 0? True
Any element > 5? True


---
## 2.6 Arithmetic and linear algebra


In [265]:
# Create example arrays
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print("Array a:\n", a)
print("Array b:\n", b)


Array a:
 [[1 2]
 [3 4]]
Array b:
 [[5 6]
 [7 8]]


In [266]:
# Addition (elementwise)
print("Addition (a + b):\n", a + b)

# Subtraction (elementwise)
print("Subtraction (a - b):\n", a - b)

# Multiplication (elementwise)
print("Elementwise multiplication (a * b):\n", a * b)

# Division (elementwise)
print("Elementwise division (a / b):\n", a / b)

# Exponentiation (elementwise)
print("Elementwise power (a ** 2):\n", a ** 2)

# Modulo (elementwise)
print("Elementwise modulo (b % a):\n", b % a)

Addition (a + b):
 [[ 6  8]
 [10 12]]
Subtraction (a - b):
 [[-4 -4]
 [-4 -4]]
Elementwise multiplication (a * b):
 [[ 5 12]
 [21 32]]
Elementwise division (a / b):
 [[0.2        0.33333333]
 [0.42857143 0.5       ]]
Elementwise power (a ** 2):
 [[ 1  4]
 [ 9 16]]
Elementwise modulo (b % a):
 [[0 0]
 [1 0]]


In [267]:
# Matrix multiplication
matmul = a @ b  # or np.matmul(a, b)
print("Matrix multiplication (a @ b):\n", matmul)

# Dot product
dot = np.dot(a[0], b[0])
print("Dot product (np.dot(a[0], b[0])):\n", dot)

# Transpose
print("Transpose of a (a.T):\n", a.T)

# Determinant
det = np.linalg.det(a)
print("Determinant of a:", det)

# Inverse
inv = np.linalg.inv(a)
print("Inverse of a:\n", inv)

# Eigenvalues and eigenvectors
eigvals, eigvecs = np.linalg.eig(a)
print("Eigenvalues of a:", eigvals)
print("Eigenvectors of a:\n", eigvecs)

# Norms
norm_a0 = np.linalg.norm(a[0])
print("Frobenius norm of a[0]:", norm_a0)

Matrix multiplication (a @ b):
 [[19 22]
 [43 50]]
Dot product (np.dot(a[0], b[0])):
 17
Transpose of a (a.T):
 [[1 3]
 [2 4]]
Determinant of a: -2.0000000000000004
Inverse of a:
 [[-2.   1. ]
 [ 1.5 -0.5]]
Eigenvalues of a: [-0.37228132  5.37228132]
Eigenvectors of a:
 [[-0.82456484 -0.41597356]
 [ 0.56576746 -0.90937671]]
Frobenius norm of a[0]: 2.23606797749979


---
## 2.7 Miscellaneous

**NumPy** provides optimized **mathematical functions** that work directly on entire arrays (`np.array` objects). These functions are implemented in **compiled C code**, which makes them **very fast**. They automatically apply the operation **elementwise** to all elements in the array — this is called **vectorization**.

In [268]:
# These are elementwise functions that operate efficiently on ndarrays.

# Create an example array
a = np.array([0, np.pi/4, np.pi/2, np.pi])
print("Array a:\n", a)

# Elementwise trigonometric functions
print("sin(a):", np.sin(a))

Array a:
 [0.         0.78539816 1.57079633 3.14159265]
sin(a): [0.00000000e+00 7.07106781e-01 1.00000000e+00 1.22464680e-16]


In [269]:
b = np.array([1, 2, 3, 4])
print("Array b:\n", b)

# Exponential
print("Exponential (e^b):", np.exp(b))

# Logarithms
print("Natural log (ln(b)):", np.log(b))
print("Log base 10:", np.log10(b))

# Power
print("b cubed:", np.power(b, 3))
c = np.array([1.234, 5.678, -9.1011])

# Round to nearest integer
print("Rounded:", np.round(c))

# Floor and ceiling
print("Floor:", np.floor(c))
print("Ceil:", np.ceil(c))

# Absolute values
print("Absolute values:", np.abs(c))
d = np.array([1, 2, 3, 4, 5])

# Sum/product
print("Sum:", np.sum(d))
print("Product:", np.prod(d))

Array b:
 [1 2 3 4]
Exponential (e^b): [ 2.71828183  7.3890561  20.08553692 54.59815003]
Natural log (ln(b)): [0.         0.69314718 1.09861229 1.38629436]
Log base 10: [0.         0.30103    0.47712125 0.60205999]
b cubed: [ 1  8 27 64]
Rounded: [ 1.  6. -9.]
Floor: [  1.   5. -10.]
Ceil: [ 2.  6. -9.]
Absolute values: [1.234  5.678  9.1011]
Sum: 15
Product: 120


---
## 2.8 Python lists vs NumPy arrays

Python lists are flexible but **slow and memory-inefficient** for numerical computations.
NumPy arrays (`ndarray`) store data in **contiguous memory** and support **vectorized operations**, making them much faster and smaller in memory footprint.

Let's compare both in terms of **execution speed** and **memory usage**.

In [270]:
import time, sys

# Create a large list and a NumPy array
n = 1_000_000
py_list = list(range(n))
np_array = np.arange(n)

# Compare memory usage of both objects
list_mem = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
array_mem = np_array.nbytes
print(f"Python list memory: {list_mem / 1e6:.2f} MB")
print(f"NumPy array memory: {array_mem / 1e6:.2f} MB")
print(f"Memory ratio (list / array): {list_mem / array_mem:.1f}×")

# Compute 2x Python list
list_start = time.time()
list_result = [x * 2 for x in py_list]
list_end = time.time()
print(f"Python list time: {list_end - list_start:.5f} s")

# Compute 2x NumPy array
array_start = time.time()
array_result = np_array * 2
array_end = time.time()
print(f"NumPy array time: {array_end - array_start:.5f} s")
print(f"Execution speed ratio (list / array): {(list_end - list_start) / (array_end - array_start):.1f}×")

Python list memory: 36.00 MB
NumPy array memory: 8.00 MB
Memory ratio (list / array): 4.5×
Python list time: 0.16065 s
NumPy array time: 0.00272 s
Execution speed ratio (list / array): 59.1×


**Observation:**

- NumPy operations are much faster because they run in optimized C loops rather than Python loops.
- NumPy arrays use far less memory because all elements share the same data type and they are stored in an efficient way

➡️ This demonstrates the two biggest advantages of NumPy:
1. **Vectorization** (no explicit loops)
2. **Efficient memory representation**

---
## 2.9  🏠 Homework: NumPy Arrays in Data Science

### Task Overview
In this assignment, you will practice working with **NumPy arrays** and **mathematical functions** to perform a mini data analysis. You will simulate a small part of a **data preprocessing pipeline** — a common step in data science when dealing with multivariate datasets.

### Your Task

1. **Generate synthetic data:**
   - Create a NumPy array `data` of shape **(100, 10)** — representing 100 samples and 10 features.
   - The values should be drawn from a **normal distribution** with mean = 50 and standard deviation = 10 using
      `np.random.normal(loc=50, scale=10, size=(100, 10))`.
   - Print the shape, data type, and the **first 5 rows** of the array.

2. **Data cleaning:**
   - Replace all values **smaller than 20** or **larger than 80** with `np.nan` (treat them as outliers).
   - Print how many `np.nan` values are now in the array.

3. **Handle missing values:**
   - Compute the **mean of each column** ignoring missing values (`np.nanmean`).
   - Replace all `np.nan` values in each column with that column’s mean.

4. **Data transformation:**
   - **Standardize each column** so that it has mean 0 and standard deviation 1.
   - Create a new array where:
     - all positive standardized values are replaced with their **square roots**,
     - negative values remain unchanged.
   - For the first 5 rows, also compute the **exponential (`np.exp`)** of all standardized values and print the result.

5. **Array indexing and logical operations:**
   - Compute the **75th percentile** for each column.
   - Create a Boolean mask that marks all values above the 75th percentile.
   - Print how many such “high” values there are in total.
   - Replace all values **below the 25th percentile** (computed column-wise) with the 25th percentile value (a simple form of *winsorization*).

6. **Descriptive statistics:**
   - Compute and print for the final cleaned dataset:
     - column-wise **mean**, **median**, **variance**, and **standard deviation**,
     - the **overall mean** of the entire array,
     - and the **minimum and maximum** values per column.

### ✍️ Hints
- Use functions such as `np.mean`, `np.std`, `np.nanmean`, `np.isnan`, `np.where`, `np.percentile`, `np.sqrt`, and `np.exp`.
- Remember to specify the `axis` argument when computing column-wise statistics (`axis=0`).
- Use **vectorized operations** — avoid `for` loops.
- Include **comments or docstrings** to make your code clear and readable.


---
## Your solution:

1)

In [271]:
data = np.random.normal(loc=50, scale=10, size=(100, 10))
print(data.shape)
print(data.dtype)
print(data[0:5, ...]) # prvnich 5 radku a vsechny sloupce

(100, 10)
float64
[[36.29543977 34.97541486 44.59006661 45.42461237 49.4605977  41.80652476
  53.26233069 42.795038   55.17836837 61.21586288]
 [39.02082205 38.6043903  56.72851612 32.85453846 49.98491235 43.69684724
  56.83938524 38.65578626 37.90439216 60.94913249]
 [43.81692859 60.52234852 35.14760106 51.35284737 55.58064634 50.21253952
  41.57756415 50.72006807 56.90530001 53.22422593]
 [45.62711093 58.32632988 48.28874019 59.25509057 44.73875543 45.68835642
  42.66056728 48.46166308 49.74603672 33.86300946]
 [32.10388166 35.06038246 66.07647355 45.16723698 40.85421321 55.46441102
  40.12744414 38.90745709 70.39448823 42.60369761]]


2)

In [272]:
data[data<20] = np.nan
print(data[np.isnan(data)].size)

1


3)

In [273]:
col_means = np.nanmean(data, axis=0) # prumery podle sloupce
indexes = np.where(np.isnan(data)) # pozice NaN
data[indexes] = col_means[indexes[1]] # Podle sloupce vyberu prumer v poradi v jakem jsou NaN a nahradim

4)

In [274]:
col_means_matrix = np.repeat(col_means, 100).reshape(10, 100).T # zreplikuje prumery, pretvori do 10x100 matice a transponuji, aby v col_means_matrix byl vzdy prumer daneho sloupce
data -= col_means_matrix # prvni cast standardizace odectenim prumeru

In [275]:
col_std = np.nanstd(data, axis=0) # smerodatne odchylky podle sloupce
col_std_matrix = np.repeat(col_std, 100).reshape(10, 100).T # smerodatne odchylky podle sloupce v kazdem poli
data /= col_std_matrix # druha cast standardizace vydelenim smer. odch.

In [276]:
new = np.copy(data)
new[new > 0] = new[new > 0]**2
print(np.exp(new[0:5, :]))

[[ 0.28949951  0.21687293  0.5985988   0.61654688  1.00631521  0.39333177
   1.4467266   0.46092012  1.3377434   3.65680725]
 [ 0.37214182  0.32868482  1.42692636  0.15355894  1.01658162  0.48910454
   2.43369104  0.29803669  0.30864925  3.45470538]
 [ 0.57893864  7.07119027  0.25254164  1.03000732  1.52596195  1.00129483
   0.61394513  1.00363698  1.65756887  1.15739126]
 [ 0.68402182  3.72679328  0.83936042  2.98535212  0.69696952  0.61533338
   0.67956097  0.83724673  1.00000001  0.23454681]
 [ 0.19675046  0.21899452  8.2017018   0.59924633  0.48515628  1.50896949
   0.53589788  0.30604322 66.88135134  0.5364147 ]]


5)

In [277]:
percentiles_75 = np.percentile(new, 75, axis=0) # 75. percentily po sloupcích
percentiles_75_matrix = np.repeat(percentiles_75, 100).reshape(10, 100).T # matice 75. percentilu na porovnani
percentiles_75_mask = new > percentiles_75_matrix # maska z porovnani matic po prvcich
print(new[percentiles_75_mask].size) # pocet vetsich prvku

250


In [278]:
percentiles_25 = np.percentile(new, 25, axis=0) # 25. percentily po sloupcích
percentiles_25_matrix = np.repeat(percentiles_25, 100).reshape(10, 100).T # matice 25. percentilu na porovnani
under_25_indexes = np.where(data < percentiles_25_matrix) # pozice mensich nez percentily
new[under_25_indexes] = percentiles_25[under_25_indexes[1]]  # vymena mensich za percentily (winsorization)

6)

In [279]:
print("Prumery:", np.mean(new, axis=0))
print("Mediany:", np.median(new, axis=0))
print("Rozptyly:", np.var(new, axis=0))
print("Smerodatne odchylky:", np.std(new, axis=0))

print("Celkovy prumer:", np.mean(new))
print("Maxima:", np.max(new, axis=0))
print("Minima:", np.min(new, axis=0))

Prumery: [0.23897883 0.24453393 0.24515912 0.32000726 0.25671795 0.23823437
 0.27938456 0.27702881 0.24555874 0.29037292]
Mediany: [ 0.00380076  0.00029581  0.00282808  0.00385935 -0.00028395 -0.0793069
 -0.17088556  0.00596677  0.00080908 -0.04605412]
Rozptyly: [1.29131059 1.26427716 1.55252953 2.47879992 1.78595474 1.19958318
 1.52494114 1.87013327 1.88646213 1.82786133]
Smerodatne odchylky: [1.13635848 1.1244008  1.24600543 1.5744205  1.33639617 1.09525485
 1.23488507 1.36752816 1.37348539 1.35198422]
Celkovy prumer: 0.2635976498325613
Maxima: [ 4.83542951  4.53061751  6.64390981 10.63318871  6.43100613  4.01816156
  5.8009349   8.22922322  8.23593749  7.3019693 ]
Minima: [-0.68548761 -0.66970405 -0.69994278 -0.51969804 -0.72467732 -0.67189596
 -0.62313681 -0.63358417 -0.75944617 -0.62110982]
