# 7. Data Preprocessing with NumPy

- [3. NumPy Fundamentals](#3-numpy-fundamentals)
- [5. Generating Data with NumPy](#5-generating-data-with-numpy)

## 0. Import Libraries and Set NumPy Print Options

In [14]:
import numpy as np
np.__version__
# np.set_printoptions(suppress=True, linewidth=100, precision=2)
# See also np.get_printoptions() and the temporary form: with np.printoptions(...):

'1.26.4'
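A minimal sketch of the temporary form mentioned in the comment above: np.printoptions is a context manager, so the options apply only inside the with-block and the previous settings come back afterwards. The array and the variable names (values, before, inside, after) are illustrative.

```python
import numpy as np

values = np.array([1 / 3, 2 / 3, 1e-8])
before = np.get_printoptions()["precision"]

print(values)                      # current (default) formatting

# Options apply only inside the with-block
with np.printoptions(precision=2, suppress=True):
    print(values)                  # two decimals, tiny values shown as 0 instead of 1e-08
    inside = np.get_printoptions()["precision"]

after = np.get_printoptions()["precision"]   # restored automatically
print(before, inside, after)
```

Using the context manager instead of np.set_printoptions avoids changing the formatting for the rest of the notebook.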

## 1. Introduction to NumPy

#### 1/2. NumPy arrays support element-wise operations, which make computations more efficient.
#### 2/2. NumPy is suitable for data analysis because it is stable and fast: its core routines are implemented in low-level languages such as C. <> True <>

In [15]:
# NumPy arrays allow for element-wise operations, leading to more efficient computations.

display(A := np.arange(1,7).reshape(2,3))
np.power(A, 2)

array([[1, 2, 3],
       [4, 5, 6]])

array([[ 1,  4,  9],
       [16, 25, 36]], dtype=int32)
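To make "element-wise" concrete, the sketch below computes the same square once with explicit Python loops and once with vectorized NumPy calls; the variable names are illustrative. Both give identical results, but the vectorized forms run the loop inside NumPy's compiled code.

```python
import numpy as np

A = np.arange(1, 7).reshape(2, 3)

# Element-wise square with explicit Python loops (slow for large arrays)
squared_loop = np.empty_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        squared_loop[i, j] = A[i, j] ** 2

# Vectorized equivalents: no Python-level loop
squared_vec = np.power(A, 2)
squared_op = A ** 2

print(np.array_equal(squared_loop, squared_vec))   # True
print(np.array_equal(squared_vec, squared_op))     # True
```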

#### 1/2. A 0-dimensional array is equivalent to a scalar value, a 1-dimensional array can be thought of as a vector, and a 2-dimensional array is similar to a matrix in linear algebra.
#### 2/2.
- A scalar is a single data point; in NumPy it can be stored as a 0D array.
- A vector is a sequence of values, represented as a 1D array.
- A 2D array in NumPy is equivalent to a matrix, consisting of rows and columns.

In [16]:
display(array_0D := np.array(5))
print(array_0D.shape, ' - ', array_0D.ndim)

display(array_1D := np.array([5,4,3]))
print(array_1D.shape, ' - ', array_1D.ndim)

display(array_2D := np.array([[5,4,3], [9,8,7]]))
print(array_2D.shape, ' - ', array_2D.ndim)

array(5)

()  -  0


array([5, 4, 3])

(3,)  -  1


array([[5, 4, 3],
       [9, 8, 7]])

(2, 3)  -  2
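Two follow-ups to the cell above, sketched with illustrative names: a 0D array can be turned back into a plain Python scalar with .item(), and adding one more level of nesting simply gives a 3D array with one more axis.

```python
import numpy as np

array_0D = np.array(5)

# .item() extracts the value as a plain Python scalar
value = array_0D.item()
print(type(value))                     # <class 'int'>

# Indexing with an empty tuple returns a NumPy scalar instead
np_scalar = array_0D[()]
print(type(np_scalar))                 # a NumPy integer type (platform dependent)

# A 3D array adds another axis: 2 blocks of 2 rows x 3 columns
array_3D = np.zeros((2, 2, 3))
print(array_3D.shape, array_3D.ndim)   # (2, 2, 3) 3
```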


#### 1/1. In Jupyter, press Shift + Tab after typing a function name to view its documentation.

## 2. Why Do We Use NumPy?

#### 1/1. Why was NumPy created? There was a need for a single, unified array type: NumPy replaced the two competing earlier packages, Numeric and Numarray.

#### 1/2. With the + operator, Python lists are concatenated, while NumPy arrays are added element-wise.
#### 2/2. NumPy arrays have a .shape attribute, while Python lists do not. <> True <>

In [17]:
# Lists vs np.ndarray
display(lst := [[1,2,3],[4,5,6]])
display(arr := np.array(lst))

display(lst_plus_lst := lst + lst)
display(arr + arr)                  # same as np.add(arr, arr)

arr.shape
# lst.shape       # AttributeError: 'list' object has no attribute 'shape'

[[1, 2, 3], [4, 5, 6]]

array([[1, 2, 3],
       [4, 5, 6]])

[[1, 2, 3], [4, 5, 6], [1, 2, 3], [4, 5, 6]]

array([[ 2,  4,  6],
       [ 8, 10, 12]])

(2, 3)
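For comparison, getting element-wise addition from plain lists takes explicit nested comprehensions, which is exactly the bookkeeping that arrays remove. A minimal sketch with illustrative names:

```python
import numpy as np

lst = [[1, 2, 3], [4, 5, 6]]

# Element-wise addition with plain lists needs nested comprehensions
lst_sum = [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(lst, lst)]

# With arrays, + is already element-wise
arr_sum = np.array(lst) + np.array(lst)

print(lst_sum)                       # [[2, 4, 6], [8, 10, 12]]
print(arr_sum.tolist() == lst_sum)   # True
```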

## 3. NumPy Fundamentals

#### 1/4. Element Indexing

In [18]:
array_a = np.array([[1, 2, 3], [4, 5, 6]])
array_a[1, 2]                              # row 1, column 2 -> 6 (a NumPy integer scalar)

6

#### 2/4. Negative indices in NumPy will result in an IndexOutOfBounds error. <> FALSE <> Negative indices count from the end (-1 is the last element).
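A short sketch of negative indexing on the array_a used above: -1 refers to the last row or column, and only indices beyond the array's size (in either direction) raise an IndexError. The flag name raised is illustrative.

```python
import numpy as np

array_a = np.array([[1, 2, 3], [4, 5, 6]])

# Negative indices count from the end: -1 is the last row/column
print(array_a[-1, -1])   # 6
print(array_a[-2, 0])    # 1

# Only indices outside [-n, n-1] raise an IndexError
raised = False
try:
    array_a[-3, 0]       # axis 0 has only 2 rows
except IndexError as err:
    raised = True
    print("IndexError:", err)
```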

#### 3/4. In NumPy, indexing starts at 0 for both rows and columns.
- To retrieve an entire column of a 2D array, use a colon (:) in place of the row index.
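A minimal sketch of column retrieval with a colon, reusing the array_a example; col and col_2d are illustrative names. Note that an integer column index gives a 1D result, while a slice keeps the 2D shape.

```python
import numpy as np

array_a = np.array([[1, 2, 3], [4, 5, 6]])

# A colon in the row position selects every row; 1 picks the second column
col = array_a[:, 1]
print(col)                # [2 5]
print(col.shape)          # (2,)  -> the result is 1D

# Slicing the column position (1:2) keeps the 2D shape instead
col_2d = array_a[:, 1:2]
print(col_2d.shape)       # (2, 1)
```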

#### 4/4. NumPy uses zero-based indexing.
- NumPy allows negative indices.
- The shape of an array affects the number of indices you need to provide.
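To illustrate the last two points, the sketch below uses a hypothetical 3D array named cube: one index per axis returns a single element, and supplying fewer indices returns a sub-array with the remaining axes.

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)   # 3 axes -> up to 3 indices

print(cube[1, 2, 3])      # 23: one index per axis gives a single element
print(cube[1].shape)      # (3, 4): one index drops the first axis
print(cube[1, 2].shape)   # (4,): two indices leave a 1D row
print(cube[-1, -1, -1])   # 23: negative indices work on every axis
```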

In [19]:
# Some jm-practice in indexing
display(jm := np.arange(20).reshape(4,5))
display(jm[[0,3],[4,1]])
print(jm[0,4], ' - ',jm[3,1])

display(jm[:3,3:0:-1])      # rows 0-2, columns 3, 2, 1 (reversed order)
display(jm[:3,1:4])         # same values, columns 1, 2, 3 in original order

# ixs = np.argwhere(jm % 2 != 0)
# rows, cols = ixs[:,0], ixs[:,1]
# jm[rows, cols]

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

array([ 4, 16])

4  -  16


array([[ 3,  2,  1],
       [ 8,  7,  6],
       [13, 12, 11]])

array([[ 1,  2,  3],
       [ 6,  7,  8],
       [11, 12, 13]])

#### 1/2. dtype: numeric types, e.g. np.int32, np.float64; non-numeric types, e.g. np.str_, np.bool_ (the aliases np.str and np.bool were removed in NumPy 1.24).
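#### 2/2. If an array is created without an explicit dtype, NumPy infers one that can hold all the values. It does not pick the smallest possible type: Python integers get the platform's default integer (int64 on most Linux/macOS builds, int32 on Windows with NumPy 1.x) and Python floats get float64.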
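A sketch of dtype inference with illustrative array names. The exact integer width depends on the platform, so the check below only confirms it is not the smallest type; to get a small type you have to ask for it explicitly.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1, 2, 3.5])
flags = np.array([True, False])
words = np.array(["a", "bc"])

# Python ints map to the platform default integer, not the smallest type
print(ints.dtype)      # int64 on most Linux/macOS builds (int32 on Windows with NumPy 1.x)
print(floats.dtype)    # float64: one float upcasts the whole array
print(flags.dtype)     # bool
print(words.dtype)     # <U2: fixed-width Unicode string, elements are np.str_

# Request a smaller type explicitly if memory matters
small = np.array([1, 2, 3], dtype=np.int8)
print(small.dtype, small.itemsize)   # int8 1
```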

#### 1/3. Element-wise multiplication between two NumPy arrays always requires the arrays to have the same dimensions. <> FALSE <> Arrays with different but compatible shapes are broadcast.
#### 2/3. Type casting (dtype)
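A minimal broadcasting sketch with illustrative names (M, row, col): shapes are compared from the trailing dimension, and each pair must be equal or contain a 1. Incompatible shapes still raise a ValueError.

```python
import numpy as np

M = np.arange(1, 7).reshape(2, 3)   # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)
col = np.array([[1], [2]])          # shape (2, 1)

# Shapes differ but are compatible: trailing dims match or are 1
print(M * row)    # row is stretched across both rows
print(M * col)    # col is stretched across all three columns

# Incompatible shapes still fail
broadcast_failed = False
try:
    M * np.array([1, 2])             # (2, 3) vs (2,): trailing 3 != 2
except ValueError as err:
    broadcast_failed = True
    print("ValueError:", err)
```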
#### 3/3. In NumPy, typecasting is performed only on the output of an operation (e.g. np.add()), not the inputs themselves. <> FALSE <>
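A sketch showing why the statement above is false: passing dtype= to a ufunc such as np.add makes the computation run in that type, so the integer inputs are converted before the addition. Casting the inputs yourself with astype gives the same result. Names a, b and sum_f are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# dtype= selects the computation type; the integer inputs are cast to float64
sum_f = np.add(a, b, dtype=np.float64)
print(sum_f, sum_f.dtype)    # [5. 7. 9.] float64

# Casting the inputs explicitly gives the same result
manual = np.add(a.astype(np.float64), b.astype(np.float64))
print(np.array_equal(sum_f, manual))   # True
```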

## 4. Working with Arrays

- 3/8. In stepwise slicing, a step of 0 raises a ValueError because it would mean no movement through the array.
- 4/8. Conditional (boolean-mask) slicing in NumPy returns a one-dimensional array, even if the original array is multi-dimensional.
- 6/8. Elements that are either greater than 4 or even: C[(C > 4) | (C % 2 == 0)]
- 8/8. The NumPy squeeze() function removes all dimensions of size 1 from an array.
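The four points above can be checked together in one short sketch. C and D are hypothetical example arrays; the parentheses around each condition are required because | binds more tightly than the comparisons.

```python
import numpy as np

C = np.arange(1, 13).reshape(3, 4)

# 3/8: a step of 0 is rejected
step0_failed = False
try:
    C[::0]
except ValueError as err:
    step0_failed = True
    print("ValueError:", err)

# 4/8: a boolean mask over a 2D array returns a flat 1D result
mask_result = C[C > 8]
print(mask_result, mask_result.ndim)   # the values 9..12, ndim 1

# 6/8: elements greater than 4 OR even
either = C[(C > 4) | (C % 2 == 0)]
print(either)

# 8/8: squeeze drops every axis of length 1
D = np.zeros((1, 3, 1))
print(np.squeeze(D).shape)             # (3,)
```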

## 5. Generating Data with NumPy

## 6. Importing and Saving Data with Numpy

## 7. Statistics with NumPy

## 8. Data Manipulation with NumPy