In [3]:
import numpy as np

# Numpy Array Basics for Data Science

>These notes focus on concepts rather than on individual methods and functions. Methods are explained where needed, but the main goal is to understand why NumPy is useful in data science projects.

To create objects with different numbers of dimensions, such as scalars, vectors, matrices, tensors, or arrays in general, we can use the function
- `np.array([element])`

**There are 6 general mechanisms for creating arrays**:

- Conversion from other Python structures (e.g., lists and tuples).
- Intrinsic NumPy functions for array creation (e.g., arange, ones, zeros, etc.)
- Replicating, joining, or mutating existing arrays
- Reading arrays from disk, either from standard or custom formats
- Creating arrays from raw bytes using strings or buffers.
- Using special library functions (e.g. random)

You can use these methods to create ndarrays or structured arrays.
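
As a rough illustration of the mechanisms listed above, the sketch below builds one small array per mechanism. The variable names are made up for this example, and the in-memory text buffer only stands in for a real file on disk.

```python
import io
import numpy as np

# 1. Conversion from other Python structures
from_list = np.array([1, 2, 3])
from_tuple = np.array((1.5, 2.5))

# 2. Intrinsic NumPy creation functions
zeros = np.zeros((2, 3))
ones = np.ones(4)

# 3. Replicating or joining existing arrays
tiled = np.tile(from_list, 2)                          # [1 2 3 1 2 3]
joined = np.concatenate([from_list, from_list * 10])   # [1 2 3 10 20 30]

# 4. Reading from disk (an in-memory text buffer stands in for a file here)
fake_file = io.StringIO("1 2 3\n4 5 6")
from_disk = np.loadtxt(fake_file)                      # 2x3 array of floats

# 5. Raw bytes through a buffer
from_bytes = np.frombuffer(bytes([1, 2, 3, 4]), dtype=np.uint8)  # [1 2 3 4]

# 6. Special library functions (random numbers with a fixed seed)
rng = np.random.default_rng(seed=0)
samples = rng.random(3)

for name, arr in [('from_list', from_list), ('zeros', zeros), ('tiled', tiled),
                  ('joined', joined), ('from_disk', from_disk),
                  ('from_bytes', from_bytes), ('samples', samples)]:
    print(f'{name}: shape={arr.shape}, dtype={arr.dtype}')
```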

## Types of arrays according to their dimensions

### Scalars

Scalars are elements with dimension 0.

In [4]:
escalar = np.array(42)
print(f'Type: {type(escalar)}, Value: {escalar}, Dimension: {escalar.ndim}')

Type: <class 'numpy.ndarray'>, Value: 42, Dimension: 0


### Vectors

Vectors are objects with dimension 1, similar to a flat Python list.

In [5]:
vector = np.array([23, 28, 27, 23, 22, 21, 20])
print(f'Type: {type(vector)}, Value: {vector}, Dimension: {vector.ndim}')

Type: <class 'numpy.ndarray'>, Value: [23 28 27 23 22 21 20], Dimension: 1


### Matrices

Matrices are objects with dimension 2.

In [6]:
matrix = np.array([[1,2,3], [4,5,6], [7,8,9]])
print(f'Matrix: \n\n{matrix}, Dimension: {matrix.ndim}')

Matrix: 

[[1 2 3]
 [4 5 6]
 [7 8 9]], Dimension: 2


### Tensors

Tensors are mathematical objects with general dimensions; for example, a scalar is a tensor with dimension 0. In data science, a tensor is generally an object with dimension 3 or greater.

In [7]:
tensor = np.array([[[255,255,255], [0,0,0], [255,255,255]], [[255,255,255], [0,0,0], [255,255,255]], [[255,255,255], [0,0,0], [255,255,255]]])
print(f'Tensor: \n\n{tensor}, Dimension: {tensor.ndim}')

Tensor: 

[[[255 255 255]
  [  0   0   0]
  [255 255 255]]

 [[255 255 255]
  [  0   0   0]
  [255 255 255]]

 [[255 255 255]
  [  0   0   0]
  [255 255 255]]], Dimension: 3
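
A common place where 3-dimensional arrays appear is image data. The sketch below assumes a (height, width, channels) layout for a tiny RGB image, which is an illustrative convention and not something fixed by NumPy. It shows how `ndim`, `shape`, and `size` relate to each other.

```python
import numpy as np

# Hypothetical 2x4 image with 3 colour channels, all pixels black
image = np.zeros((2, 4, 3), dtype=np.uint8)

# Paint the pixel at row 0, column 1 white
image[0, 1] = [255, 255, 255]

print(f'Dimension: {image.ndim}')   # number of axes
print(f'Shape: {image.shape}')      # length of each axis
print(f'Size: {image.size}')        # total number of elements

# Selecting one channel drops the last axis and leaves a matrix
red_channel = image[..., 0]
print(f'Red channel shape: {red_channel.shape}')
```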


## Special Matrices and Arrays in Numpy

### `np.arange(n)`

In [8]:
arr_arange = np.arange(10)
print(arr_arange)

[0 1 2 3 4 5 6 7 8 9]
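
`np.arange` also accepts a start, a stop, and a step, much like Python's `range`. The short sketch below shows those forms, plus `np.linspace` as an alternative when the number of points matters more than the step.

```python
import numpy as np

evens = np.arange(2, 10, 2)        # start=2, stop=10 (excluded), step=2
countdown = np.arange(10, 0, -3)   # a negative step counts down
grid = np.linspace(0, 1, 5)        # 5 evenly spaced points, stop included

print(evens)      # [2 4 6 8]
print(countdown)  # [10  7  4  1]
print(grid)       # [0.   0.25 0.5  0.75 1.  ]
```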


### `np.eye(n)`

In [9]:
matrix_eye = np.eye(4)
print(matrix_eye)

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
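
The matrix returned by `np.eye(n)` is the identity matrix: multiplying any compatible matrix by it leaves that matrix unchanged. The sketch below checks this and also shows the optional `k` argument, which shifts the diagonal of ones.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
identity = np.eye(3)

# Matrix product with the identity returns the same values
product = identity @ m
print(np.array_equal(product, m))   # True

# k=1 places the ones on the first diagonal above the main one
shifted = np.eye(3, k=1)
print(shifted)
```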


### `np.diag(<list_in_diag>)`

In [10]:
matrix_diag = np.diag([1,1,1,1])
matrix_diag

array([[1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]])
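
`np.diag` works in two directions: given a 1-dimensional input it builds a diagonal matrix, and given a 2-dimensional input it extracts a diagonal. A minimal sketch:

```python
import numpy as np

# 1-D input: build a matrix with these values on the main diagonal
built = np.diag([1, 2, 3])
print(built)

# 2-D input: extract the main diagonal
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
main_diagonal = np.diag(matrix)
print(main_diagonal)    # [1 5 9]

# k=1 selects the diagonal just above the main one
upper_diagonal = np.diag(matrix, k=1)
print(upper_diagonal)   # [2 6]
```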

### `np.random.random((<rows>, <cols>))`

In [11]:
# Generate a 3x2 matrix of random floats in the half-open interval [0, 1)
matrix_rand = np.random.random((3,2))
matrix_rand

array([[0.16579601, 0.38448524],
       [0.50433786, 0.55292126],
       [0.25164809, 0.62002328]])

In [12]:
# Generate a 3x2 matrix of random integers from 5 (inclusive) to 10 (exclusive)
matrix_randint = np.random.randint(5,10,(3,2))
matrix_randint

array([[5, 8],
       [8, 6],
       [7, 6]], dtype=int32)
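
The random values above change on every run. When reproducible results are needed, one option is NumPy's `Generator` interface created with `np.random.default_rng` and a fixed seed. The seed value 42 below is arbitrary and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Floats in [0, 1), same idea as np.random.random((3, 2))
floats = rng.random((3, 2))

# Integers from 5 (inclusive) to 10 (exclusive), same idea as np.random.randint
integers = rng.integers(5, 10, size=(3, 2))

# A second generator with the same seed reproduces the same sequence
rng_again = np.random.default_rng(seed=42)
floats_again = rng_again.random((3, 2))

print(floats)
print(integers)
print(np.array_equal(floats, floats_again))   # True
```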

## Basic arithmetic with NumPy arrays

In [13]:
v = np.array([1,2,3])
w = np.array([10,20,30])

In [14]:
print('Basic Arithmetic')
print(f'Addition: {v + w}')
print(f'Subtraction: {v - w}')
print(f'Product (element wise): {v * w}')
print(f'Division (element wise): {v / w}')
print(f'Square root: {np.sqrt(v)}')
print(f'exponent = {np.exp(v)}')
print(f'logarithm = {np.log(v)}')

print('\nBasic Statistics')
print(f'mean = {np.mean(v)}')
print(f'median = {np.median(v)}')
print(f'variance = {np.var(v)}')
print(f'standard deviation = {np.std(v)}')

Basic Arithmetic
Addition: [11 22 33]
Subtraction: [ -9 -18 -27]
Product (element wise): [10 40 90]
Division (element wise): [0.1 0.1 0.1]
Square root: [1.         1.41421356 1.73205081]
exponent = [ 2.71828183  7.3890561  20.08553692]
logarithm = [0.         0.69314718 1.09861229]

Basic Statistics
mean = 2.0
median = 2.0
variance = 0.6666666666666666
standard deviation = 0.816496580927726
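
Two details of the cell above are easy to miss. First, `*` multiplies element by element, while the dot product uses `@` (or `np.dot`). Second, `np.var` and `np.std` compute the population versions by default; passing `ddof=1` gives the sample versions. The sketch below reuses `v` and `w` with the same values.

```python
import numpy as np

v = np.array([1, 2, 3])
w = np.array([10, 20, 30])

# Element-wise product versus dot product
elementwise = v * w   # [10 40 90]
dot = v @ w           # 1*10 + 2*20 + 3*30 = 140

# Population (default, ddof=0) versus sample (ddof=1) statistics
population_var = np.var(v)          # divides by n
sample_var = np.var(v, ddof=1)      # divides by n - 1
sample_std = np.std(v, ddof=1)

print(f'Element wise: {elementwise}, dot product: {dot}')
print(f'Population variance: {population_var}, sample variance: {sample_var}')
print(f'Sample standard deviation: {sample_std}')
```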


## Numpy arrays performance

Arrays in NumPy are used because they provide an efficient and flexible structure for handling large volumes of numerical data. Here are some of the reasons:

- Efficiency: For numerical data, NumPy arrays are usually faster and more memory-efficient than Python lists. An array stores elements of a single type in a compact buffer, and NumPy's core operations are implemented in C.
- Vectorized Operations: NumPy allows you to perform mathematical and logical operations on entire arrays without the need for explicit loops. This is known as vectorization and results in cleaner and faster code.
- Extended Functionality: NumPy provides a wide range of mathematical and statistical functions that can be applied directly to arrays. In addition, it facilitates data manipulation, such as reshaping, slicing, and indexing arrays.
- Compatibility: NumPy is the foundation for many other scientific and data analysis packages in Python, such as SciPy, Pandas, and scikit-learn. Using NumPy makes it easy to integrate and use these libraries.
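
The efficiency and vectorization points above can be checked with a small timing experiment. This is only a sketch: the array size and repetition count are arbitrary choices, and the measured times depend on the machine, so no specific numbers are claimed here.

```python
import timeit
import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

def square_with_loop():
    # Explicit Python-level loop over a list
    return [x * x for x in py_list]

def square_vectorized():
    # One vectorized operation over the whole array
    return np_arr * np_arr

loop_time = timeit.timeit(square_with_loop, number=20)
vector_time = timeit.timeit(square_vectorized, number=20)

print(f'List comprehension: {loop_time:.4f} s')
print(f'Vectorized NumPy:   {vector_time:.4f} s')

# The array keeps its data in one typed buffer of fixed-size elements
print(f'Array buffer: {np_arr.nbytes} bytes ({np_arr.itemsize} bytes per element)')
```

Both functions produce the same values; only the way the work is carried out differs.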

In [15]:
a = np.array([[1,2,3],[4,5,6]])
a

array([[1, 2, 3],
       [4, 5, 6]])

In [17]:
print(f'Dimension: {a.ndim}')
print(f'Shape: {a.shape}')
print(f'Data type: {a.dtype}')

Dimension: 2
Shape: (2, 3)
Data type: int64


### `dtype`

`dtype` is a NumPy attribute that reports the data type of an array's elements. The same name is used as an argument when creating an array to specify the data type we want.

In [19]:
b = np.array(3, dtype=np.uint8)
print(f'Value: {b}, dtype: {b.dtype}')

Value: 3, dtype: uint8
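
The choice of `dtype` determines how many bytes each element uses and which values it can hold. The sketch below compares the memory used by a few dtypes and shows that arithmetic on `uint8` arrays wraps around past 255. The variable names are illustrative.

```python
import numpy as np

# Bytes used by 1000 elements for a few dtypes
sizes = {}
for dt in (np.uint8, np.int32, np.float64):
    arr = np.zeros(1000, dtype=dt)
    sizes[arr.dtype.name] = arr.nbytes
    print(f'{arr.dtype.name}: {arr.itemsize} byte(s) per element, {arr.nbytes} bytes total')

# uint8 holds values from 0 to 255, so array arithmetic wraps modulo 256
pixels = np.array([250, 255], dtype=np.uint8)
wrapped = pixels + np.array([10, 1], dtype=np.uint8)
print(wrapped)   # [4 0]
```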


Convert an array to doubles

In [25]:
d_array = np.array(a, dtype='d')
print(f'Value: \n{d_array}, \n\ndtype: {d_array.dtype}')

Value: 
[[1. 2. 3.]
 [4. 5. 6.]], 

dtype: float64


Cast to another `dtype` using `astype()`

In [29]:
c = a.astype(np.float32)
print(f'Value: \n{c}, dtype: {c.dtype}')

Value: 
[[1. 2. 3.]
 [4. 5. 6.]], dtype: float32
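
Two properties of `astype()` are worth keeping in mind: it returns a new array instead of modifying the original, and converting floats to integers discards the fractional part (truncation toward zero) instead of rounding. A minimal sketch:

```python
import numpy as np

floats = np.array([1.7, -1.7, 2.0])
ints = floats.astype(np.int64)
print(ints)     # [ 1 -1  2]

# astype returns a copy, so changing it leaves the original untouched
ints[0] = 99
print(floats)   # [ 1.7 -1.7  2. ]
```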
