# The Basics of NumPy Arrays

In [None]:
import numpy
numpy.__version__

In [None]:
import numpy as np   # import NumPy under the alias np

Data manipulation in Python is nearly synonymous with NumPy array manipulation. This section covers the following basic operations:

- *Array creation*: Building arrays, whose elements must all share a single data type
- *Attributes of arrays*: Determining the size, shape, memory consumption, and data types of arrays
- *Indexing of arrays*: Getting and setting the value of individual array elements
- *Slicing of arrays*: Getting and setting smaller subarrays within a larger array
- *Reshaping of arrays*: Changing the shape of a given array
- *Joining and splitting of arrays*: Combining multiple arrays into one, and splitting one array into many
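The single-type constraint mentioned in the first bullet can be seen directly. The following is a minimal sketch; the variable names `mixed` and `as_float32` are illustrative only and do not appear elsewhere in this notebook.

```python
import numpy as np

# Mixing integers and a float: NumPy upcasts every element to one common type
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)        # float64

# The element type can also be requested explicitly at creation time
as_float32 = np.array([1, 2, 3], dtype=np.float32)
print(as_float32.dtype)   # float32
```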

## NumPy Array Creation

In [None]:
import numpy as np
a = np.array([2,3,4])
a

In [None]:
print(a)

In [None]:
a.dtype

In [None]:
b = np.array([1.2, 3.5, 5.1])
b.dtype

In [None]:
np.zeros( (3,4) )

To create sequences of numbers, NumPy provides ``np.arange``, a function analogous to Python's built-in ``range`` that returns an array instead of a list.

In [None]:
np.arange(10, 30, 5)    # (start, stop, step); stop is excluded
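As a small side-by-side sketch, the same arguments can be passed to the built-in `range` and to `np.arange`. The names `steps` and `fractions` are illustrative. One difference worth noting is that `np.arange` also accepts non-integer steps, which `range` does not.

```python
import numpy as np

steps = np.arange(10, 30, 5)
print(steps.tolist())            # [10, 15, 20, 25]
print(list(range(10, 30, 5)))    # [10, 15, 20, 25]

# Unlike range, np.arange accepts a fractional step
fractions = np.arange(0, 1, 0.25)
print(fractions.tolist())        # [0.0, 0.25, 0.5, 0.75]
```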

In [None]:
c = np.arange(6)
print(c)

In [None]:
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))

In [None]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

In [None]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))
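The three calls above use NumPy's legacy `np.random` functions. As an alternative sketch, the same kinds of arrays can be drawn from a seeded `Generator` object created with `np.random.default_rng`. The seed value `0` and the names `rng`, `uniform`, `integers`, and `normal` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded generator for reproducibility

uniform = rng.random((3, 3))            # uniform values in [0, 1)
integers = rng.integers(0, 10, (3, 3))  # integers in [0, 10)
normal = rng.normal(0, 1, (3, 3))       # mean 0, standard deviation 1

print(uniform.shape, integers.shape, normal.shape)   # (3, 3) (3, 3) (3, 3)
```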


## NumPy Array Attributes

In [None]:
import numpy as np
np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

In [None]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

In [None]:
print("dtype:", x3.dtype)

In [None]:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")
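The three size-related attributes are tied together: `nbytes` is the product of `size` and `itemsize`. The sketch below uses a hypothetical array `arr` with an explicit `int64` dtype so that the numbers do not depend on the platform's default integer width.

```python
import numpy as np

# Explicit dtype so the byte counts are the same on every platform
arr = np.zeros((3, 4, 5), dtype=np.int64)

print(arr.size)      # 60 elements (3 * 4 * 5)
print(arr.itemsize)  # 8 bytes per int64 element
print(arr.nbytes)    # 480 bytes in total

assert arr.nbytes == arr.size * arr.itemsize
```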

## Array Indexing: Accessing Single Elements

In [None]:
x1

In [None]:
x1[0]

In [None]:
x1[4]    

In [None]:
x1[-1]

In [None]:
x1[-2]

In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices:

In [None]:
x2

In [None]:
x2[0, 0]

In [None]:
x2[2, 0]

In [None]:
x2[2, -1]

Values can also be modified using any of the index notations above:

In [None]:
x2[0, 0] = 12
x2

In [None]:
x1[0] = 3.14  # this will be truncated!
x1
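The truncation happens because a NumPy array has a fixed type, so a floating-point value written into an integer array is silently cut down to an integer. A minimal sketch of the behavior and of one way around it, using the illustrative names `ints` and `floats`:

```python
import numpy as np

ints = np.array([5, 0, 3], dtype=np.int64)
ints[0] = 3.14                 # silently truncated to fit the integer dtype
print(ints.tolist())           # [3, 0, 3]

floats = ints.astype(float)    # astype returns a converted copy
floats[0] = 3.14
print(floats[0])               # 3.14
print(ints[0])                 # 3, the integer array is untouched
```

Converting with `astype` before assigning is only one option; creating the array with a floating-point dtype in the first place works as well.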

## Array Slicing: Accessing Subarrays

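To access a slice of an array ``x``, use ``x[start:stop:step]``. Any value that is omitted takes its default: ``start=0``, ``stop=`` the size of the dimension, and ``step=1``.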

### One-dimensional subarrays

In [None]:
import numpy as np
x = np.arange(10)
x

In [None]:
x[:5]  # first five elements

In [None]:
x[5:]  # elements from index 5 onward

In [None]:
x[4:7]  # middle sub-array

In [None]:
x[::2]  # every other element, starting at index 0

In [None]:
x[1::2]  # every other element, starting at index 1
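The same ``start:stop:step`` syntax also accepts a negative step, in which case the array is traversed backwards and the defaults for ``start`` and ``stop`` are swapped. A short sketch, using an illustrative array named `seq`:

```python
import numpy as np

seq = np.arange(10)

print(seq[::-1].tolist())    # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(seq[5::-2].tolist())   # [5, 3, 1]
```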

### Multi-dimensional subarrays

Multi-dimensional slices work in the same way, with multiple slices separated by commas.

In [None]:
x2

In [None]:
x2[:2, :3]  # two rows, three columns

In [None]:
x2[:3, ::2]  # all rows, every other column

#### Accessing array rows and columns

In [None]:
print(x2[:, 0])  # first column of x2

In [None]:
print(x2[0, :])  # first row of x2

In the case of row access, the empty slice can be omitted for a more compact syntax:

In [None]:
print(x2[0])  # equivalent to x2[0, :]

### Subarrays as no-copy views

In [None]:
print(x2)

Let's extract a $2 \times 2$ subarray from this:

In [None]:
x2_sub = x2[:2, :2]
print(x2_sub)

In [None]:
x2_sub[0, 0] = 99    # modify this subarray
print(x2_sub)

In [None]:
print(x2)    # original array is changed!

### Creating copies of arrays

To copy the data of an array or subarray explicitly, use the ``copy()`` method:

In [None]:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

In [None]:
x2_sub_copy[0, 0] = 42  # modify this array
print(x2_sub_copy)

In [None]:
print(x2)     # Is the original array the same or modified?
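One way to check whether two arrays refer to the same underlying data is `np.shares_memory`. The sketch below contrasts a slice with an explicit copy; the names `base`, `view`, and `duplicate` are illustrative and separate from the arrays used above.

```python
import numpy as np

base = np.arange(12).reshape((3, 4))

view = base[:2, :2]               # slicing returns a view
duplicate = base[:2, :2].copy()   # copy() allocates new memory

print(np.shares_memory(base, view))        # True
print(np.shares_memory(base, duplicate))   # False

duplicate[0, 0] = 42
print(base[0, 0])   # 0, the original is unchanged

view[0, 0] = 99
print(base[0, 0])   # 99, the change shows through the view
```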

## Reshaping of Arrays

In [None]:
x4 = np.arange(1,10)
x4

In [None]:
grid = x4.reshape((3, 3))
print(grid)

In [None]:
x5 = np.array([[1, 2, 3],
               [4, 5, 6]])
x5.reshape((3, 2))    # reshape the 2x3 array into a 3x2 array

In [None]:
x5.reshape((6, 1)) # column vector via reshape
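A few related reshaping idioms may be useful here. In the sketch below, `-1` asks NumPy to infer one dimension from the total size, and `np.newaxis` inside an index adds a dimension of length one. The array name `flat` is illustrative. The last part shows that the new shape must hold exactly as many elements as the original.

```python
import numpy as np

flat = np.arange(1, 7)                 # six elements

print(flat.reshape((2, -1)).shape)     # (2, 3): -1 is inferred as 3
print(flat[np.newaxis, :].shape)       # (1, 6): row vector
print(flat[:, np.newaxis].shape)       # (6, 1): column vector

# The total size must match: 6 elements cannot fill a 4x2 grid
try:
    flat.reshape((4, 2))
except ValueError as err:
    print("reshape failed:", err)
```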

## Array Concatenation and Splitting

### Array concatenation
Arrays are joined primarily with the routines ``np.concatenate``, ``np.hstack``, and ``np.vstack``.

In [None]:
# np.concatenate
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

In [None]:
z = [99, 99, 99]    
print(np.concatenate([x, y, z]))    # add more arrays

In [None]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

In [None]:
# concatenate along the first axis
np.concatenate([grid, grid])

In [None]:
# concatenate along the second axis (axis=1, since axes are zero-indexed)
np.concatenate([grid, grid], axis=1)

For working with arrays of mixed dimensions, it can be clearer to use the ``np.vstack`` (vertical stack) and ``np.hstack`` (horizontal stack) functions:

In [None]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])

In [None]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])
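To see why the stacking helpers are convenient for mixed dimensions, the sketch below reproduces `np.vstack` with `np.concatenate` by first promoting the one-dimensional array to two dimensions. Passing the mixed-dimension inputs to `np.concatenate` directly raises an error. The names `row`, `block`, `stacked`, and `manual` are illustrative.

```python
import numpy as np

row = np.array([1, 2, 3])
block = np.array([[9, 8, 7],
                  [6, 5, 4]])

stacked = np.vstack([row, block])
# Equivalent: promote the 1-D row to shape (1, 3), then join along axis 0
manual = np.concatenate([row[np.newaxis, :], block], axis=0)
print(np.array_equal(stacked, manual))   # True

# np.concatenate rejects inputs with different numbers of dimensions
try:
    np.concatenate([row, block])
except ValueError as err:
    print("concatenate failed:", err)
```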

### Splitting of arrays
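The opposite of concatenation is splitting, which is done with ``np.split``, ``np.hsplit``, and ``np.vsplit``. Each takes a list of indices giving the split points.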

In [None]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

Notice that *N* split points lead to *N + 1* subarrays.
The related functions ``np.hsplit`` and ``np.vsplit`` are similar:

In [None]:
grid = np.arange(16).reshape((4, 4))
grid

In [None]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

In [None]:
left, right = np.hsplit(grid, [3])
print(left)
print(right)
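As a closing sketch, the relationship between the splitting functions can be checked directly: two split points give three pieces, and `np.hsplit` on a two-dimensional array matches `np.split` with `axis=1`. The names `data`, `pieces`, `table`, `first`, and `rest` are illustrative.

```python
import numpy as np

data = np.arange(8)
pieces = np.split(data, [3, 5])       # two split points
print(len(pieces))                    # 3
print([p.tolist() for p in pieces])   # [[0, 1, 2], [3, 4], [5, 6, 7]]

table = np.arange(16).reshape((4, 4))
first, rest = np.hsplit(table, [3])
first_alt, rest_alt = np.split(table, [3], axis=1)
print(np.array_equal(first, first_alt))   # True
print(np.array_equal(rest, rest_alt))     # True
```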