<!--TABLE OF CONTENTS-->
Contents:
- [Numpy Basics](#Numpy-Basics)
  - [Highlights](#Highlights)
    - [ndarray](#ndarray)
      - [Shape and dtype](#Shape-and-dtype)
      - [Array creation](#Array-creation)
      - [Data Types](#Data-Types)
        - [Character Codes for Data Types](#Character-Codes-for-Data-Types)
      - [Arithmetic](#Arithmetic)
      - [Boolean Indexing](#Boolean-Indexing)
      - [Fancy Indexing](#Fancy-Indexing)

# Numpy Basics 1

## Highlights

- ndarray: efficient multidimensional array providing fast array-oriented arithmetic operations and flexible broadcasting capabilities.
- Mathematical operations applied to entire arrays without having to write loops.
- Tools for reading/writing array data to disk and working with memory-mapped files.

In [3]:
import numpy as np

# a little helper
def print_n(*args):
    print('\n')
    print(args)
    
# Fast demo
my_arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

# Check execution time
%timeit my_arr2 = my_arr * 2  # vectorized: one multiplication over the whole array
%timeit my_list2 = [x * 2 for x in my_list]  # pure Python: one multiplication per loop iteration


2.26 ms ± 121 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
87.2 ms ± 419 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


### ndarray

An `ndarray` (short for "N-dimensional array") is NumPy's core data structure for efficiently storing and manipulating large arrays and matrices of numerical data. Its key features are:

1. **N-Dimensional**: `ndarray` can have any number of dimensions (e.g., 1D, 2D, 3D, etc.), making it flexible for various types of data representations.

2. **Homogeneous Data Type**: All elements in an `ndarray` are of the same data type, ensuring efficient memory usage and performance.

3. **Shape and Size**: The shape of an `ndarray` is a tuple that specifies the size of the array along each dimension. The size is the total number of elements in the array.

4. **Indexing and Slicing**: NumPy provides powerful indexing and slicing capabilities, allowing for efficient access and manipulation of array subsets.

5. **Broadcasting**: NumPy supports broadcasting, a mechanism that lets arithmetic operate on arrays of different but compatible shapes (for example, adding a 1D row to every row of a 2D array) without writing explicit loops.

6. **Vectorized Operations**: Arithmetic operations and mathematical functions in NumPy are optimized to work on entire arrays, providing significant performance improvements over standard Python loops.

7. **Memory Efficiency**: `ndarray` is designed for efficient memory use, often consuming less memory than equivalent Python lists due to its fixed data type and contiguous memory layout.

8. **Integration with Other Libraries**: NumPy's `ndarray` serves as the foundation for many other scientific computing libraries in Python, such as SciPy, Pandas, and scikit-learn.
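
As a small illustration of the broadcasting and memory points above, the sketch below adds a 1D row to every row of a 2D array and then inspects how many bytes an array occupies. The variable names are made up for this example, and `sys.getsizeof` counts only the list's own pointer storage, so the comparison with a list is a rough indication rather than a benchmark.

```python
import sys
import numpy as np

# Broadcasting: a (2, 3) array plus a (3,) array.
# The 1D row is applied to each row of the 2D array.
grid = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])
shifted = grid + row
print(shifted)

# Homogeneous storage: every element occupies the same number of bytes.
values = np.arange(1000, dtype=np.int64)
print(values.itemsize)  # bytes per element
print(values.nbytes)    # itemsize * number of elements

# A list stores pointers to separate Python int objects;
# getsizeof reports the pointer storage only, not the ints themselves.
as_list = values.tolist()
print(sys.getsizeof(as_list))
```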
    

In [4]:
data = np.array([[1.5,-0.1,3], [0,-3,6.5]])
print_n(data)
print_n(data*10)
print_n(data+data)



(array([[ 1.5, -0.1,  3. ],
       [ 0. , -3. ,  6.5]]),)


(array([[ 15.,  -1.,  30.],
       [  0., -30.,  65.]]),)


(array([[ 3. , -0.2,  6. ],
       [ 0. , -6. , 13. ]]),)


#### Shape and dtype

In [5]:
print_n(data.shape)
print_n(data.dtype)



((2, 3),)


(dtype('float64'),)


#### Array creation

In [6]:
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
print_n(arr1)



(array([6. , 7.5, 8. , 0. , 1. ]),)


In [7]:
data2 = [[1,3,5,7],[2,4,6,8]]
arr2 = np.array(data2)

# number of dimensions, inferred from the nested lists
print_n(arr2.ndim)
# shape
print_n(arr2.shape)



(2,)


((2, 4),)


In [8]:
print_n(np.zeros(10))
print_n(np.zeros((3,4)))



(array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]),)


(array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]]),)


In [9]:
print_n(np.ones(5))
print_n(np.ones((5,5)))



(array([1., 1., 1., 1., 1.]),)


(array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]]),)


In [10]:
# np.empty allocates memory without initializing it, so the values are arbitrary (like malloc in C)
print_n(np.empty(8))
print_n(np.empty((2,5)))



(array([5.33503207e-310, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
       0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000]),)


(array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]]),)


Some important array creation functions:

   - `numpy.array(object)`: Converts a list, tuple, or other array-like objects into a NumPy array.
   - `numpy.asarray(object)`: Similar to `array`, but does not copy the data if the input is already a NumPy array.

   - `numpy.zeros(shape)`: Creates an array filled with zeros. `shape` can be a tuple specifying the dimensions.
   - `numpy.ones(shape)`: Creates an array filled with ones.
   - `numpy.full(shape, fill_value)`: Creates an array filled with a specified value.

   - `numpy.eye(N)`: Creates a 2D identity matrix with ones on the diagonal and zeros elsewhere.
   - `numpy.identity(n)`: Creates an identity matrix of size `n x n`.
   - `numpy.diag(v)`: Creates a 2D array with the elements of `v` on the diagonal and zeros elsewhere.

   - `numpy.arange(start, stop, step)`: Creates an array with evenly spaced values within a given range.
   - `numpy.linspace(start, stop, num)`: Creates an array with `num` evenly spaced values between `start` and `stop`.
   - `numpy.logspace(start, stop, num)`: Creates an array with `num` logarithmically spaced values between `10^start` and `10^stop`.

   - `numpy.random.rand(d0, d1, ..., dn)`: Creates an array of the given shape and populates it with random samples from a uniform distribution over [0, 1).
   - `numpy.random.randn(d0, d1, ..., dn)`: Creates an array of the given shape and populates it with random samples from a standard normal distribution.
   - `numpy.random.randint(low, high, size)`: Creates an array of random integers from `low` (inclusive) to `high` (exclusive).

   - `numpy.empty(shape)`: Creates an array without initializing the entries.
   - `numpy.empty_like(prototype)`: Creates an array with the same shape and type as a given array, without initializing the entries.

   - `numpy.zeros_like(prototype)`: Creates an array of zeros with the same shape and type as a given array.
   - `numpy.ones_like(prototype)`: Creates an array of ones with the same shape and type as a given array.
   - `numpy.full_like(prototype, fill_value)`: Creates an array with the same shape and type as a given array, filled with a specified value.
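
A few of the functions listed above in action. This is a minimal sketch with arbitrary values chosen for illustration; the variable names are not from the original notebook.

```python
import numpy as np

evens = np.arange(0, 10, 2)        # start, stop (exclusive), step
points = np.linspace(0.0, 1.0, 5)  # 5 values, both endpoints included
filled = np.full((2, 3), 7)        # 2x3 array where every entry is 7
identity = np.eye(3)               # 3x3 identity matrix
diagonal = np.diag([1, 2, 3])      # the given values placed on the diagonal

template = np.array([[1.5, 2.5], [3.5, 4.5]])
zeros = np.zeros_like(template)    # same shape and dtype as template

draws = np.random.randint(0, 10, size=5)  # integers in [0, 10)
uniform = np.random.rand(2, 3)            # uniform samples in [0, 1)

for name, value in [("evens", evens), ("points", points), ("filled", filled),
                    ("identity", identity), ("diagonal", diagonal),
                    ("zeros", zeros), ("draws", draws), ("uniform", uniform)]:
    print(name, value.shape, value.dtype)
```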

#### Data Types

In [11]:
arr1 = np.array([1,2,3], dtype=np.float64)
print_n(arr1.dtype)
arr2 = np.array([1,2,3], dtype=np.int32)
print_n(arr2.dtype)



(dtype('float64'),)


(dtype('int32'),)


NumPy provides a wide range of data types to accommodate different kinds of numerical and non-numerical data. Here are the main NumPy data types:

- `bool_`: Boolean (True or False) stored as a byte.

- `int_`: Default integer type (equivalent to `int64` or `int32` depending on the platform).
- `int8`: 8-bit (1 byte) signed integer.
- `int16`: 16-bit (2 bytes) signed integer.
- `int32`: 32-bit (4 bytes) signed integer.
- `int64`: 64-bit (8 bytes) signed integer.
- `uint8`: 8-bit (1 byte) unsigned integer.
- `uint16`: 16-bit (2 bytes) unsigned integer.
- `uint32`: 32-bit (4 bytes) unsigned integer.
- `uint64`: 64-bit (8 bytes) unsigned integer.

- `float_`: Alias for `float64`, the default floating-point type (the alias was removed in NumPy 2.0; use `float64`).
- `float16`: Half precision floating-point (16 bits).
- `float32`: Single precision floating-point (32 bits).
- `float64`: Double precision floating-point (64 bits).

- `complex_`: Alias for `complex128`, the default complex type (the alias was removed in NumPy 2.0; use `complex128`).
- `complex64`: Complex number represented by two 32-bit floats (real and imaginary parts).
- `complex128`: Complex number represented by two 64-bit floats (real and imaginary parts).

- `str_`: Fixed-length Unicode string type.
- `unicode_`: Alias for `str_` (the alias was removed in NumPy 2.0).

- `bytes_`: Fixed-length byte type.

- `object_`: Python object type.

- `datetime64`: Date and time with various units (e.g., `ns`, `us`, `ms`, `s`, `m`, `h`, `D`, `W`, `M`, `Y`).
- `timedelta64`: Differences between two `datetime64` objects with various units.

- `void`: Raw, untyped bytes of a fixed size; the base type for structured (record) arrays.

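- `int`: The Python built-in `int`; as a `dtype` argument it maps to `int_` (platform-dependent).
- `float`: The Python built-in `float`; as a `dtype` argument it maps to `float64`.
- `complex`: The Python built-in `complex`; as a `dtype` argument it maps to `complex128`.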

##### Character Codes for Data Types
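Every dtype also reports a one-character kind code through `dtype.kind`. These letters identify the general kind of data; when passed to `np.dtype` as type strings, some of them mean something else (for example, `np.dtype('b')` is `int8`, and the boolean type string is `'?'`):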
- `'b'`: boolean.
- `'i'`: (signed) integer.
- `'u'`: unsigned integer.
- `'f'`: floating-point.
- `'c'`: complex floating-point.
- `'m'`: timedelta.
- `'M'`: datetime.
- `'O'`: object.
- `'S'`: (byte-)string.
- `'U'`: Unicode.
- `'V'`: void.

These data types allow NumPy to efficiently handle a wide variety of numerical and other types of data, providing flexibility and performance for scientific computing.
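
To make the type list more concrete, the sketch below prints the `kind` code and item size of a few dtypes, then shows that fixed-width integers wrap around on overflow. The selection of dtypes and the variable names are illustrative choices, not part of the original notebook.

```python
import numpy as np

names = ["bool", "int8", "uint16", "float32", "complex128", "datetime64[D]"]

# dtype.kind is a one-character category; dtype.itemsize is the size in bytes
kinds = {}
for name in names:
    dt = np.dtype(name)
    kinds[name] = dt.kind
    print(f"{name:15s} kind={dt.kind!r} itemsize={dt.itemsize}")

# String arrays get a fixed-length Unicode dtype (kind 'U')
text_kind = np.array(["a", "bc"]).dtype.kind
print(text_kind)

# Fixed-width integers wrap around silently in array arithmetic:
# 127 is the largest int8 value, so adding 1 yields -128
top = np.array([127], dtype=np.int8)
one = np.array([1], dtype=np.int8)
wrapped = top + one
print(wrapped, wrapped.dtype)
```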

#### Casting

In [12]:
arr = np.array([1,2,3,4,5])
print_n(arr.dtype)
float_arr = arr.astype(np.float64)
print_n(float_arr.dtype)



(dtype('int64'),)


(dtype('float64'),)
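
Casting in the other direction loses information, which is worth seeing once. The sketch below, using made-up values, converts floats to integers (the fractional part is discarded) and parses numeric strings into floats. By default `astype` returns a new array and leaves the source untouched.

```python
import numpy as np

floats = np.array([3.7, -1.2, 0.5, 10.9])
ints = floats.astype(np.int32)  # fractional part is dropped (truncation toward zero)
print(ints)

# Strings that represent numbers can be parsed by casting to a float dtype
numeric_strings = np.array(["1.25", "-9.6", "42"])
parsed = numeric_strings.astype(np.float64)
print(parsed)

# The source array keeps its original dtype and values
print(floats.dtype, ints.dtype)
```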


#### Arithmetic

In [13]:
# Vectorizations are batch operations over arrays without the need to write loops
arr = np.array([[1,3,5],[2,4,6]])
print_n(arr)
print_n(arr*arr)
print_n(arr-arr)
print_n(arr+arr)
print_n(arr/2.5)
print_n(1/arr)
print_n(arr**3)
arr2 = np.array([[10,12,14],[11,13,15]])
print_n(arr2 > arr)




(array([[1, 3, 5],
       [2, 4, 6]]),)


(array([[ 1,  9, 25],
       [ 4, 16, 36]]),)


(array([[0, 0, 0],
       [0, 0, 0]]),)


(array([[ 2,  6, 10],
       [ 4,  8, 12]]),)


(array([[0.4, 1.2, 2. ],
       [0.8, 1.6, 2.4]]),)


(array([[1.        , 0.33333333, 0.2       ],
       [0.5       , 0.25      , 0.16666667]]),)


(array([[  1,  27, 125],
       [  8,  64, 216]]),)


(array([[ True,  True,  True],
       [ True,  True,  True]]),)
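
One detail the outputs above hint at is how the result dtype follows the operation. The sketch below, reusing the same small array, shows that true division always produces floats while floor division keeps integers, and that NumPy's math functions are applied element by element. Variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 3, 5], [2, 4, 6]])

true_div = arr / 2    # true division: result is float64
floor_div = arr // 2  # floor division: result stays an integer dtype
print(true_div.dtype, floor_div.dtype)

# Universal functions such as np.sqrt work element-wise on the whole array
roots = np.sqrt(arr)
print(roots)

# Comparisons are element-wise too and produce a boolean array
is_even = (arr % 2) == 0
print(is_even)
```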


#### Basic Indexing and Slicing

In [15]:
arr = np.arange(10)
print_n(arr)
print_n(arr[3])
print_n(arr[:5])
print_n(arr[2:7])
# Assigning a scalar to a slice broadcasts it to every selected element
arr[2:7] = 99
print_n(arr)

# Unlike Python list slices, array slices are views: writing to arr_slice also changes arr
arr_slice = arr[2:4]
print_n(arr_slice)
arr_slice[:] = 77
print_n(arr_slice)



(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),)


(3,)


(array([0, 1, 2, 3, 4]),)


(array([2, 3, 4, 5, 6]),)


(array([ 0,  1, 99, 99, 99, 99, 99,  7,  8,  9]),)


(array([99, 99]),)


(array([77, 77]),)
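
The cell above modifies a slice but does not print the parent array afterwards. The sketch below makes the view behaviour explicit and contrasts it with an explicit copy; `np.shares_memory` is used only to confirm which arrays share a buffer. The names `view` and `independent` are illustrative.

```python
import numpy as np

arr = np.arange(10)

view = arr[2:5]  # basic slicing returns a view on the same buffer
view[:] = -1     # writes through to arr
print(arr)

independent = arr[2:5].copy()  # an explicit copy owns its own data
independent[:] = 100           # arr is not affected
print(arr)

print(np.shares_memory(arr, view))         # view shares arr's buffer
print(np.shares_memory(arr, independent))  # the copy does not
```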


In [16]:
arr2d = np.array([[1,3,5],[2,4,6]])
print_n(arr2d[1])
# These are equivalent
print_n(arr2d[0][1])
print_n(arr2d[0,1])

new_arr = arr2d.copy()
print_n(new_arr)
# or slice it
new_arr = arr2d[0].copy()
print_n(new_arr)

new_arr = arr2d[0,1].copy()
print_n(new_arr)



(array([2, 4, 6]),)


(3,)


(3,)


(array([[1, 3, 5],
       [2, 4, 6]]),)


(array([1, 3, 5]),)


(3,)


In [17]:
arr = np.array([[1,3,5],[2,4,6],[9,9,9]])
print_n(arr[2])
print_n(arr[1:])
# or a combination
print_n(arr[:1, 2:])




(array([9, 9, 9]),)


(array([[2, 4, 6],
       [9, 9, 9]]),)


(array([[5]]),)


![Slicing](img/slicing.jpeg)

#### Boolean Indexing

In [18]:
names = np.array(['Alice', 'Bob', 'Carl', 'Dora', 'Alice', 'Bob'])
age_weight = np.array([[20,80],[15,55],[90,60],[35,120],[5,30],[1,8]])
print_n(names, age_weight)

print_n(names == 'Bob')
# Use the boolean mask to select the age and weight rows for every 'Bob'
print_n(age_weight[names == 'Bob'])

print_n(age_weight[names != 'Bob'])
# same thing
print_n(age_weight[~(names == 'Bob')])



(array(['Alice', 'Bob', 'Carl', 'Dora', 'Alice', 'Bob'], dtype='<U5'), array([[ 20,  80],
       [ 15,  55],
       [ 90,  60],
       [ 35, 120],
       [  5,  30],
       [  1,   8]]))


(array([False,  True, False, False, False,  True]),)


(array([[15, 55],
       [ 1,  8]]),)


(array([[ 20,  80],
       [ 90,  60],
       [ 35, 120],
       [  5,  30]]),)


(array([[ 20,  80],
       [ 90,  60],
       [ 35, 120],
       [  5,  30]]),)


In [19]:
mask = (names=='Bob') | (names=='Alice')
print_n(age_weight[mask])

# No name is both 'Bob' and 'Carl', so this mask selects nothing
mask = (names=='Bob') & (names=='Carl')
print_n(age_weight[mask])




(array([[20, 80],
       [15, 55],
       [ 5, 30],
       [ 1,  8]]),)


(array([], shape=(0, 2), dtype=int64),)
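
Boolean masks can also be used on the left-hand side of an assignment. The sketch below, with made-up data, clips negative values in place, shows that selecting with a mask returns a copy rather than a view, and shows why the Python keywords `and`/`or` cannot replace `&`/`|` for arrays.

```python
import numpy as np

scores = np.array([[4, -7], [-2, 9], [0, -5]])

# Assignment through a boolean mask modifies the array in place
scores[scores < 0] = 0
print(scores)

# Selecting with a mask returns a copy, so edits do not reach the original
row_mask = np.array([True, False, True])
selected = scores[row_mask]
selected[:] = 99
print(scores)

# `and` / `or` need a single truth value, which a multi-element array lacks
raised = False
try:
    combined = (scores > 0) and (scores < 5)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```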


#### Fancy Indexing

In [20]:
arr = np.zeros((8,4))

for i in range(8):
    arr[i] = i

print_n(arr)

# Select whole rows, in the order given, by passing a list of row indices
print_n(arr[[2,4,6]])



(array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]]),)


(array([[2., 2., 2., 2.],
       [4., 4., 4., 4.],
       [6., 6., 6., 6.]]),)
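
Fancy indexing accepts more than a single list of row indices. The sketch below, on an illustrative 8x4 array, shows negative indices, element-wise pairing of two index lists, and `np.ix_` for selecting a rectangular block. Unlike basic slicing, the result is always a copy.

```python
import numpy as np

arr = np.arange(32).reshape(8, 4)  # row i holds 4*i .. 4*i + 3

# Negative indices count from the end: rows 7 and 5, in that order
from_end = arr[[-1, -3]]
print(from_end)

# Two index lists are paired element-wise: (1, 0), (5, 3), (7, 1)
pairs = arr[[1, 5, 7], [0, 3, 1]]
print(pairs)

# np.ix_ selects every combination of the given rows and columns
block = arr[np.ix_([1, 5, 7], [0, 3])]
print(block)

# Fancy indexing copies the data, so the original is unchanged
picked = arr[[2, 4]]
picked[:] = 0
print(arr[2])
```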
