### 2.0: Intro to NumPy

NumPy provides an efficient interface to store and operate on dense data buffers. NumPy arrays are similar to Python's built-in ``list`` type, but they provide much more efficient storage and data operations as the arrays grow larger. NumPy arrays form the core of nearly the entire ecosystem of data science tools in Python!

In [2]:
import numpy
numpy.__version__

'1.24.4'

In [3]:
import numpy as np


### 2.1: Understanding Data Types in Python

Python is dynamically typed: types are inferred at runtime rather than declared.

Variables are more than just their value; they also carry information about the type of the value.

A single integer actually contains four pieces:
- A reference count that helps Python silently handle memory allocation and deallocation
- An encoding of the variable's type
- The size of the data members
- The actual integer value that we expect the variable to represent

Because every item carries its own type information, we can create heterogeneous lists:

In [1]:
L3 = [True, "2", 3.0, 4]
[type(item) for item in L3]

[bool, str, float, int]

However, doing this comes at a cost: to allow these flexible types, each item in the list must contain its own type info, reference count, and other info. It is much more efficient to store data in fixed-type arrays.
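To make the storage cost concrete, here is a rough sketch that compares the memory used by a Python list of integers with a fixed-type NumPy array holding the same values. The names `py_list` and `np_arr` are illustrative, and the list figure depends on the Python version and platform, so treat it as an estimate rather than an exact measurement.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores pointers to full Python objects; each int object carries
# its own type info and reference count in addition to the value.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# A fixed-type array stores only the raw 8-byte values in one buffer.
array_bytes = np_arr.nbytes

print("list: ", list_bytes, "bytes (platform dependent)")
print("array:", array_bytes, "bytes")
```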

#### Fixed-type arrays

There are a few different ways to store data in efficient, fixed-type data buffers. The built-in `array` module can be used to create dense arrays of a uniform type.

In [2]:
import array
L = list(range(10))
A = array.array('i', L)
A

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

The `ndarray` object from the NumPy package is a much more powerful option. While `array` provides efficient storage, NumPy adds efficient operations on that data.

In [2]:
import numpy as np

np.array([1, 4, 2, 5, 3])

array([1, 4, 2, 5, 3])
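One way to see the difference between storage and operations is to apply the same operator to both kinds of array. The following is a small sketch with illustrative names (`A`, `N`): the built-in `array` treats `*` as sequence repetition, as a list does, while an `ndarray` applies it elementwise.

```python
import array
import numpy as np

A = array.array('i', range(5))
N = np.array(range(5))

# For array.array, * is sequence repetition, just as for lists.
repeated = A * 2

# For ndarray, * is elementwise arithmetic.
doubled = N * 2

print(repeated)
print(doubled)
```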

Unlike lists, NumPy arrays are constrained to contain elements of a single type. If the types do not match, NumPy will upcast where possible.
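A short sketch of upcasting, using two illustrative arrays: mixing integers with a float yields a floating-point array, and adding a complex value promotes everything to complex.

```python
import numpy as np

# Integers are upcast to floating point.
mixed_real = np.array([3.14, 4, 2, 3])

# Integers and floats are upcast to complex.
mixed_complex = np.array([1, 2.0, 3 + 0j])

print(mixed_real.dtype)     # float64
print(mixed_complex.dtype)  # complex128
```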

We can explicitly set the data type of the array using `dtype`:

In [3]:
np.array([1, 2, 3, 4], dtype = 'float32')

array([1., 2., 3., 4.], dtype=float32)

#### Creating arrays from scratch

In [4]:
np.zeros(10, dtype = int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [5]:
np.ones((3,5), dtype = float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [6]:
np.full((3,5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [7]:
# Create an array filled with a linear sequence
# start, stop (exclusive), step
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [14]:
# Create an array of 5 values evenly spaced between 2 numbers (0 and 1 here)
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
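The two sequence constructors treat the end value differently, which is easy to overlook. As a minimal sketch (variable names are illustrative): `np.arange` excludes the stop value, `np.linspace` includes it by default, and passing `endpoint=False` to `np.linspace` excludes it.

```python
import numpy as np

# arange takes a step size and leaves out the stop value.
half_open = np.arange(0, 1, 0.25)

# linspace takes a number of points and includes the stop value.
closed = np.linspace(0, 1, 5)

# endpoint=False drops the stop value, matching the arange result here.
open_ended = np.linspace(0, 1, 4, endpoint=False)

print(half_open)
print(closed)
print(open_ended)
```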

In [15]:
# Create a 3x3 array of uniformly distributed random values between 0 and 1
np.random.random((3,3))

array([[0.54048865, 0.0197953 , 0.52745082],
       [0.99245737, 0.27141964, 0.61896793],
       [0.26124452, 0.61825824, 0.91093759]])

In [16]:
# Create a 5x5 identity matrix
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

### NumPy Basics

Let's start by defining three random arrays that are 1D, 2D, and 3D respectively, using NumPy's random number generator with a fixed seed so the results are reproducible:

In [20]:
# Seed the generator so the results are reproducible
np.random.seed(0)

x1 = np.random.randint(10, size = 6) # One-dimensional array
x2 = np.random.randint(10, size = (3,4)) # Two-dimensional array
x3 = np.random.randint(10, size = (3, 4, 5)) # Three-dimensional array
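As an aside, `np.random.seed` configures NumPy's legacy global generator. Newer NumPy versions (1.17 and later) also offer `np.random.default_rng`, which returns a standalone generator object. The sketch below uses illustrative names (`rng`, `y1`, `y2`, `y3`) and draws from a different random stream, so its values will not match the `x1`, `x2`, `x3` arrays used in the rest of these notes.

```python
import numpy as np

# A standalone generator with its own seed (no global state involved).
rng = np.random.default_rng(seed=0)

y1 = rng.integers(10, size=6)          # One-dimensional array
y2 = rng.integers(10, size=(3, 4))     # Two-dimensional array
y3 = rng.integers(10, size=(3, 4, 5))  # Three-dimensional array

print(y1.shape, y2.shape, y3.shape)
```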

Each array has:
- `ndim`: the number of dimensions
- `shape`: the size of each dimension
- `size`: the total number of elements in the array

In [21]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60


We can also inspect `itemsize`, which gives the size (in bytes) of each array element, and `nbytes`, which gives the total size (in bytes) of the array:

In [22]:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

itemsize: 8 bytes
nbytes: 480 bytes
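In general, `nbytes` equals `itemsize` times `size`. The default integer type can vary by platform, so this small sketch fixes the dtype explicitly (the arrays `a` and `b` are illustrative) to show how the element type drives the total.

```python
import numpy as np

a = np.zeros((3, 4, 5), dtype=np.int64)  # 8 bytes per element
b = a.astype(np.int32)                   # 4 bytes per element

print(a.itemsize, a.size, a.nbytes)  # 8 60 480
print(b.itemsize, b.size, b.nbytes)  # 4 60 240
```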


#### Array indexing

In [23]:
x1

array([5, 0, 3, 3, 7, 9])

In [28]:
x1[0]
x1[-1]
x2[2,0]
x2[2,-1]

# Values can be modified using index notation
x2[0,0] = 12
x2

array([[12,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])

#### Accessing subarrays

In [29]:
x = np.arange(10)

# Gets first 5 elements
x[:5]

# Prints elements 4 through 6
x[4:7]

# Prints every other element
x[::2]

# Prints every other element starting at 1
x[1::2]

# Reverses the array
x[::-1]

# Reversed every other from index 5
x[5::-2]

array([5, 3, 1])
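A notebook cell displays only its last expression, so here is the same set of slices with each result printed. This is a plain restatement of the `x[start:stop:step]` patterns above, not new behavior.

```python
import numpy as np

x = np.arange(10)

print(x[:5])     # [0 1 2 3 4]
print(x[4:7])    # [4 5 6]
print(x[::2])    # [0 2 4 6 8]
print(x[1::2])   # [1 3 5 7 9]
print(x[::-1])   # [9 8 7 6 5 4 3 2 1 0]
print(x[5::-2])  # [5 3 1]
```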

#### Multi-dimensional arrays

Slicing multi-dimensional arrays works the same way as for 1D arrays, with one slice per dimension, separated by commas.

In [30]:
# Gets first two rows and first three columns
x2[:2, :3]

# Gets all rows, every other column
x2[:3, ::2]

# Dimensions can be reversed
x2[::-1, ::-1]

array([[ 7,  7,  6,  1],
       [ 8,  8,  6,  7],
       [ 4,  2,  5, 12]])
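A common special case is pulling out a single row or column by combining an index with an empty slice (`:`). The sketch below uses an illustrative array `grid2d` with the same values as `x2` above.

```python
import numpy as np

grid2d = np.array([[12, 5, 2, 4],
                   [7, 6, 8, 8],
                   [1, 6, 7, 7]])

first_col = grid2d[:, 0]  # all rows, column 0
first_row = grid2d[0, :]  # row 0, all columns
same_row = grid2d[0]      # the trailing slice can be omitted for rows

print(first_col)  # [12  7  1]
print(first_row)  # [12  5  2  4]
```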

**Array slices return views rather than copies of the array**. Let's consider an example:

In [31]:
x2_sub = x2[:2, :2]
print(x2_sub)

[[12  5]
 [ 7  6]]


In [32]:
x2_sub[0, 0] = 99
print(x2_sub)
print(x2)

[[99  5]
 [ 7  6]]
[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


This default behavior is useful! It means that when we work with large datasets, we can access and process pieces of them without needing to copy the underlying data.

#### Creating copies of arrays

In [33]:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

# Note that updating the copy will not change the original

[[99  5]
 [ 7  6]]
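To check whether a subarray is a view or a copy, one option is `np.shares_memory`. The sketch below uses illustrative names (`base`, `view`, `copied`) and shows that writes to a view reach the original array while writes to a copy do not.

```python
import numpy as np

base = np.arange(12).reshape(3, 4)
view = base[:2, :2]            # slice: a view onto base's data
copied = base[:2, :2].copy()   # explicit copy: independent data

print(np.shares_memory(base, view))    # True
print(np.shares_memory(base, copied))  # False

view[0, 1] = 42     # visible through base
copied[0, 0] = -1   # not visible through base

print(base[0, :2])  # [ 0 42]
```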


#### Reshaping arrays

We can use the `reshape` method. The size of the initial array must match the size of the reshaped array.

In [34]:
grid = np.arange(1, 10).reshape((3,3))
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
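Two details about the size rule are worth a quick sketch: one dimension can be given as `-1` so that NumPy infers it from the total size, and a mismatched size raises a `ValueError`. The names `flat`, `inferred`, and `mismatch_error` are illustrative.

```python
import numpy as np

flat = np.arange(12)

# -1 asks NumPy to infer this dimension: 12 elements / 3 rows = 4 columns.
inferred = flat.reshape(3, -1)
print(inferred.shape)  # (3, 4)

# 10 elements cannot fill a 3x3 grid, so reshape raises ValueError.
try:
    np.arange(10).reshape((3, 3))
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(type(mismatch_error).__name__)  # ValueError
```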


Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column matrix. This can be done with the `newaxis` keyword within a slice operator.

In [35]:
x = np.array([1, 2, 3])

# row vector via reshape
x.reshape((1,3))

# column vector via newaxis
x[:, np.newaxis]

array([[1],
       [2],
       [3]])
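The position of `np.newaxis` decides whether the result is a row or a column. A minimal sketch comparing the shapes (the names `row` and `col` are illustrative):

```python
import numpy as np

x = np.array([1, 2, 3])

row = x[np.newaxis, :]  # new axis first: shape (1, 3)
col = x[:, np.newaxis]  # new axis last:  shape (3, 1)

print(row.shape)  # (1, 3)
print(col.shape)  # (3, 1)

# Each form matches the equivalent reshape call.
print(np.array_equal(row, x.reshape((1, 3))))  # True
print(np.array_equal(col, x.reshape((3, 1))))  # True
```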

#### Concatenation and splitting

In [36]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

array([1, 2, 3, 3, 2, 1])

In [38]:
# Can concatenate more than two arrays at once
z = [99, 99, 99]
print(np.concatenate([x, y, z]))

# Or on 2D arrays
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# concatenate along the first axis
np.concatenate([grid, grid])

[ 1  2  3  3  2  1 99 99 99]


array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [39]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

For working with arrays of mixed dimensions, it can be clearer to use the `np.vstack` (vertical stack) and `np.hstack` (horizontal stack) functions:

In [40]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [41]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])

array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])
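To see why the stack functions are convenient for mixed dimensions, consider what `np.concatenate` needs for the same result. In this sketch (names `stacked`, `manual`, and `concat_error` are illustrative), concatenating a 1D array with a 2D array directly raises a `ValueError`, so the 1D array must first be promoted to a row; `np.vstack` handles that promotion for us.

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vstack treats the 1D array as a single row.
stacked = np.vstack([x, grid])

# concatenate requires matching dimensions, so promote x by hand.
manual = np.concatenate([x[np.newaxis, :], grid], axis=0)
print(np.array_equal(stacked, manual))  # True

# Without the promotion, the dimension mismatch is an error.
try:
    np.concatenate([x, grid])
    concat_error = None
except ValueError as exc:
    concat_error = exc
print(type(concat_error).__name__)  # ValueError
```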

#### Splitting arrays
Splitting can be implemented with the functions `np.split`, `np.hsplit`, and `np.vsplit`. For each of these, we can pass a list of indices giving the split points:

In [42]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

[1 2 3] [99 99] [3 2 1]


Notice that N split points lead to N+1 subarrays. The other split functions act similarly:

In [43]:
grid = np.arange(16).reshape((4, 4))
grid

upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]


In [44]:
left, right = np.hsplit(grid, [2])
print(left)
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]
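Besides a list of split points, `np.split` also accepts an integer number of equal sections. The sketch below (illustrative names throughout) shows both forms, plus `np.array_split`, which allows sections of unequal length when the array does not divide evenly.

```python
import numpy as np

data = np.arange(10)

# Two split points give three subarrays.
head, middle, tail = np.split(data, [3, 5])
print(head, middle, tail)  # [0 1 2] [3 4] [5 6 7 8 9]

# An integer asks for that many equal sections.
thirds = np.split(np.arange(9), 3)
print([part.tolist() for part in thirds])  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# 10 elements do not divide into 3 equal parts; array_split permits that.
uneven = np.array_split(data, 3)
print([len(part) for part in uneven])  # [4, 3, 3]
```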
