*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*

# Introduction to NumPy

Datasets can come from a wide range of sources and in a wide range of formats, including collections of documents, collections of images, collections of sound clips, collections of numerical measurements, or nearly anything else.
Despite this apparent heterogeneity, it will help us to think of all data fundamentally as arrays of numbers.

For example, images–particularly digital images–can be thought of as simply two-dimensional arrays of numbers representing pixel brightness across the area.
Sound clips can be thought of as one-dimensional arrays of intensity versus time.
Text can be converted in various ways into numerical representations, perhaps binary digits representing the frequency of certain words or pairs of words.
No matter what the data are, the first step in making them analyzable will be to transform them into arrays of numbers.
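
As a small illustration of this idea, the sketch below builds a toy array for each kind of data mentioned. All values and names here (`image`, `sound`, `vocabulary`, `word_counts`) are made up for illustration and do not come from any real dataset.

```python
import numpy as np

# A tiny 2x3 grayscale "image": each entry is a pixel brightness (0-255)
image = np.array([[0, 128, 255],
                  [64, 192, 32]], dtype=np.uint8)

# A short "sound clip": intensity sampled at eight successive times
sound = np.sin(np.linspace(0, 2 * np.pi, 8))

# A toy text representation: counts of three hypothetical vocabulary words
vocabulary = ["data", "array", "numpy"]
text = "numpy array data array numpy numpy"
word_counts = np.array([text.split().count(word) for word in vocabulary])

print(image.shape)   # (2, 3)
print(sound.shape)   # (8,)
print(word_counts)   # [1 2 3]
```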

NumPy (short for *Numerical Python*) provides an efficient interface to store and operate on dense data buffers.
In some ways, NumPy arrays are like Python's built-in ``list`` type, but NumPy arrays provide much more efficient storage and data operations as the arrays grow larger in size.
NumPy arrays form the core of nearly the entire ecosystem of data science tools in Python, so time spent learning to use NumPy effectively will be valuable no matter what aspect of data science interests you.
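
To make the contrast with lists concrete, here is a minimal sketch using a small made-up list (`values`). It only shows the difference in behavior; it does not measure performance.

```python
import numpy as np

values = [0, 1, 2, 3]
arr = np.array(values)

# Multiplying a list repeats it; multiplying an array scales every element
print(2 * values)  # [0, 1, 2, 3, 0, 1, 2, 3]
print(2 * arr)     # [0 2 4 6]

# Every element of an array has the same dtype, so the total storage
# is simply (bytes per element) * (number of elements)
print(arr.itemsize * arr.size == arr.nbytes)  # True
```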

If you followed the advice outlined in the Preface and installed the Anaconda stack, you already have NumPy installed and ready to go.
If you're more the do-it-yourself type, you can go to http://www.numpy.org/ and follow the installation instructions found there.
Once you do, you can import NumPy and double-check the version:

In [1]:
import numpy
numpy.__version__

'2.0.2'

For the pieces of the package discussed here, I'd recommend NumPy version 1.8 or later.
By convention, you'll find that most people in the SciPy/PyData world will import NumPy using ``np`` as an alias:

In [2]:
import numpy as np

# The Basics of NumPy Arrays

## NumPy Array Attributes

First let's discuss some useful array attributes.
We'll start by defining random arrays of one, two, and three dimensions.
We'll use NumPy's random number generator, which we will *seed* with a set value in order to ensure that the same random arrays are generated each time this code is run:

In [3]:
import numpy as np
rng = np.random.default_rng(seed=1701)  # use a fixed seed for reproducibility

x1 = rng.integers(10, size=6)  # One-dimensional array
x2 = rng.integers(10, size=(3, 4))  # Two-dimensional array
x3 = rng.integers(10, size=(3, 4, 5))  # Three-dimensional array

display(x1, x2, x3)

array([9, 4, 0, 3, 8, 6])

array([[3, 1, 3, 7],
       [4, 0, 2, 3],
       [0, 0, 6, 9]])

array([[[4, 3, 5, 5, 0],
        [8, 3, 5, 2, 2],
        [1, 8, 8, 5, 3],
        [0, 0, 8, 5, 8]],

       [[5, 1, 6, 2, 3],
        [1, 2, 5, 6, 2],
        [5, 2, 7, 9, 3],
        [5, 6, 0, 2, 0]],

       [[2, 9, 4, 3, 9],
        [9, 2, 2, 4, 0],
        [0, 3, 0, 0, 2],
        [3, 2, 7, 4, 7]]])

Each array has attributes including ``ndim`` (the number of dimensions), ``shape`` (the size of each dimension), ``size`` (the total size of the array), and ``dtype`` (the type of each element):

In [4]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)
print("dtype:   ", x3.dtype)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60
dtype:    int64


For more discussion of `dtype`, see [Understanding Data Types in Python](02.01-Understanding-Data-Types.ipynb).
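
Two further attributes, `itemsize` and `nbytes`, relate the `dtype` to memory use. The sketch below uses a hypothetical stand-in array (`arr`) with the same shape as `x3` and an explicitly chosen `int64` dtype, so the numbers do not depend on the platform default.

```python
import numpy as np

# Hypothetical stand-in for x3, with the dtype fixed explicitly
arr = np.zeros((3, 4, 5), dtype=np.int64)

print("itemsize:", arr.itemsize, "bytes")  # 8 bytes per int64 element
print("nbytes:  ", arr.nbytes, "bytes")    # 60 elements * 8 bytes = 480
```

In general, `nbytes` equals `itemsize` times `size`.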

## Array Indexing: Accessing Single Elements

If you are familiar with Python's standard list indexing, indexing in NumPy will feel quite familiar.
In a one-dimensional array, the $i^{th}$ value (counting from zero) can be accessed by specifying the desired index in square brackets, just as with Python lists:

In [None]:
x1

In [None]:
x1[0]

In [None]:
x1[4]

To index from the end of the array, you can use negative indices:

In [None]:
x1[-1]

In [None]:
x1[-2]

In a multi-dimensional array, items can be accessed using a comma-separated `(row, column)` tuple:

In [None]:
x2

In [None]:
x2[0, 0]

In [None]:
x2[2, 0]

In [None]:
x2[2, -1]
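
The same pattern extends to more than two dimensions, with one index per axis. A minimal sketch, using a made-up array (`cube`) whose entries are simply 0 through 23:

```python
import numpy as np

cube = np.arange(24).reshape((2, 3, 4))  # hypothetical example array

print(cube[0, 0, 0])    # 0
print(cube[1, 2, 3])    # 23
print(cube[-1, 0, -1])  # 15
```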

Values can also be modified using any of the above index notation:

In [None]:
x2[0, 0] = 12
x2

Keep in mind that, unlike Python lists, NumPy arrays have a fixed type.
This means, for example, that if you attempt to insert a floating-point value into an integer array, the value will be silently truncated. Don't be caught unaware by this behavior!

In [None]:
x1[0] = 3.14159  # this will be truncated!
x1
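
The following sketch, on a small made-up array (`ints`), shows that the fractional part is dropped rather than rounded, and that converting to a floating-point array first (here with `astype`) is one way to keep the full value.

```python
import numpy as np

ints = np.array([1, 2, 3])
ints[0] = 3.99    # stored as 3: the fractional part is dropped, not rounded
ints[1] = -2.7    # stored as -2: truncation is toward zero
print(ints)       # [ 3 -2  3]

# Converting to a floating-point dtype first preserves the value
floats = ints.astype(float)
floats[0] = 3.14159
print(floats[0])  # 3.14159
```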

## Array Slicing: Accessing Subarrays

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the *slice* notation, marked by the colon (``:``) character.
The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array ``x``, use this:
``` python
x[start:stop:step]
```
If any of these are unspecified, they default to the values ``start=0``, ``stop=``*``size of dimension``*, ``step=1``.
We'll take a look at accessing sub-arrays in one dimension and in multiple dimensions.
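
As a quick check of these defaults, the sketch below uses a hypothetical array `x` holding 0 through 9 and compares an empty slice with one that spells out every default.

```python
import numpy as np

x = np.arange(10)  # hypothetical example array: 0 through 9

# Explicit start, stop, and step
print(x[2:8:3])  # [2 5]

# Omitting all three parts is the same as writing out the defaults
print(np.array_equal(x[:], x[0:x.size:1]))  # True
```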

### One-dimensional subarrays

In [None]:
x1

In [None]:
x1[:3]  # first three elements

In [None]:
x1[3:]  # elements from index 3 onward

In [None]:
x1[1:4]  # middle subarray: indices 1 through 3

In [None]:
x1[::2]  # every other element

In [None]:
x1[1::2]  # every other element, starting at index 1

A potentially confusing case is when the ``step`` value is negative.
In this case, the defaults for ``start`` and ``stop`` are swapped.
This becomes a convenient way to reverse an array:

In [None]:
x1[::-1]  # all elements, reversed

In [None]:
x1[4::-2]  # every other element, reversed, starting at index 4
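
To see the swapped defaults at work, here is a minimal sketch on a made-up array `x` holding 0 through 9. With a negative step, an omitted ``start`` means the last element and an omitted ``stop`` means "run past the first element".

```python
import numpy as np

x = np.arange(10)  # hypothetical example array: 0 through 9

# With a negative step, the default start is the last element
print(np.array_equal(x[::-1], x[-1::-1]))  # True

print(x[5::-2])  # [5 3 1]
print(x[:2:-1])  # [9 8 7 6 5 4 3]
```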

<font face="verdana" style="font-size:30px" color="red">Your turn</font>

Get the subarray of `x1` containing the elements at indices 2, 3, and 4.

### Multi-dimensional subarrays

Multi-dimensional slices work in the same way, with multiple slices separated by commas.
For example:

In [None]:
x2

In [None]:
x2[:2, :3]  # first two rows & three columns

In [None]:
x2[:3, ::2]  # three rows, every other column

In [None]:
x2[::-1, ::-1]  # all rows & columns, reversed

#### Accessing array rows and columns

One commonly needed routine is accessing single rows or columns of an array.
This can be done by combining indexing and slicing, using an empty slice marked by a single colon (``:``):

In [None]:
x2[:, 0]  # first column of x2

In [None]:
x2[0, :]  # first row of x2

In the case of row access, the empty slice can be omitted for a more compact syntax:

In [None]:
x2[0]  # equivalent to x2[0, :]
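
One detail worth checking when extracting rows and columns is the shape of the result: an integer index drops its axis, while a slice keeps it. A minimal sketch, using a made-up array (`grid`) rather than `x2`:

```python
import numpy as np

grid = np.arange(12).reshape((3, 4))  # hypothetical example array

first_column = grid[:, 0]    # integer index: the column axis is dropped
print(first_column)          # [0 4 8]
print(first_column.shape)    # (3,)

column_2d = grid[:, :1]      # length-one slice: the result stays two-dimensional
print(column_2d.shape)       # (3, 1)
```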

## Reshaping of Arrays

Another useful type of operation is reshaping of arrays, which can be done with the ``reshape`` method.
For example, if you want to put the numbers 1 through 9 in a $3 \times 3$ grid, you can do the following:

In [None]:
grid = np.arange(1, 10).reshape(3, 3)
print(grid)

Note that for this to work, the size of the initial array must match the size of the reshaped array, and in most cases the ``reshape`` method will return a no-copy view of the initial array.
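
Both points can be checked directly. The sketch below uses `np.shares_memory` to confirm that, in this simple case, the reshaped array is a view of the original, and then shows the error raised when the sizes do not match. The names `flat` and `grid` are chosen here for illustration.

```python
import numpy as np

flat = np.arange(1, 10)
grid = flat.reshape(3, 3)

# The reshaped array shares its data buffer with the original
print(np.shares_memory(flat, grid))  # True

# So a change made through the view is visible in the original
grid[0, 0] = 100
print(flat[0])  # 100

# Nine elements cannot fill a 2x5 grid, so this raises ValueError
try:
    flat.reshape(2, 5)
except ValueError as err:
    print("reshape failed:", err)
```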

A common reshaping operation is converting a one-dimensional array into a two-dimensional row or column matrix:

In [None]:
x = np.array([1, 2, 3])
x.reshape((1, 3))  # row vector via reshape

In [None]:
x.reshape((3, 1))  # column vector via reshape

A convenient shorthand for this is to use `np.newaxis` within a slicing syntax:

In [None]:
x[np.newaxis, :]  # row vector via newaxis
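
The column-vector counterpart places `np.newaxis` in the second position instead. A minimal sketch, comparing it against the equivalent `reshape` call:

```python
import numpy as np

x = np.array([1, 2, 3])

column = x[:, np.newaxis]  # column vector via newaxis
print(column.shape)        # (3, 1)

# Same result as the reshape-based version
print(np.array_equal(column, x.reshape((3, 1))))  # True
```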

<font face="verdana" style="font-size:30px" color="red">Your turn</font>

Create a NumPy array with `np.arange(10, 35)`, reshape it to `(5, 5)`, and print it.

## Array Concatenation and Splitting

All of the preceding routines worked on single arrays. NumPy also provides tools to combine multiple arrays into one and, conversely, to split a single array into multiple arrays.

### Concatenation of arrays

Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines ``np.concatenate``, ``np.vstack``, and ``np.hstack``.
``np.concatenate`` takes a tuple or list of arrays as its first argument, as we can see here:

In [None]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

You can also concatenate more than two arrays at once:

In [None]:
z = np.array([99, 99, 99])
print(np.concatenate([x, y, z]))

It can also be used for two-dimensional arrays:

In [None]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

In [None]:
# concatenate along the first axis
np.concatenate([grid, grid])

In [None]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)

For working with arrays of mixed dimensions, it can be clearer to use the ``np.vstack`` (vertical stack) and ``np.hstack`` (horizontal stack) functions:

In [None]:
# vertically stack the arrays
np.vstack([x, grid])

In [None]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])
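
To illustrate why the stacking functions help with mixed dimensions, the sketch below (with made-up arrays `row` and `table`) shows that `np.concatenate` rejects a one-dimensional array paired with a two-dimensional one, while `np.vstack` accepts the same inputs.

```python
import numpy as np

row = np.array([1, 2, 3])       # one-dimensional
table = np.array([[9, 8, 7],
                  [6, 5, 4]])   # two-dimensional

# concatenate requires all inputs to have the same number of dimensions
try:
    np.concatenate([row, table])
except ValueError as err:
    print("concatenate failed:", err)

# vstack treats the 1D array as a single row before stacking
stacked = np.vstack([row, table])
print(stacked.shape)  # (3, 3)
```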

Similarly, for higher-dimensional arrays, ``np.dstack`` will stack arrays along the third axis.
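
A minimal sketch of `np.dstack`, using two made-up two-dimensional arrays (`a` and `b`); each pair of corresponding entries ends up side by side along the new third axis.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = 10 * a

stacked = np.dstack([a, b])
print(stacked.shape)  # (2, 3, 2)
print(stacked[0, 0])  # [ 1 10]
```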

### Splitting of arrays

The opposite of concatenation is splitting, which is implemented by the functions ``np.split``, ``np.hsplit``, and ``np.vsplit``.  For each of these, we can pass a list of indices giving the split points:

In [5]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

[1 2 3] [99 99] [3 2 1]
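
The same call works with any number of split points, and `np.split` also accepts a single integer in place of the list, in which case it divides the array into that many equal pieces. A minimal sketch on made-up data (`parts` and `thirds` are names chosen here):

```python
import numpy as np

x = np.arange(10)

# Three split points produce four pieces
parts = np.split(x, [2, 5, 7])
print(len(parts))                   # 4
print([p.tolist() for p in parts])  # [[0, 1], [2, 3, 4], [5, 6], [7, 8, 9]]

# An integer asks for that many equal-sized pieces
thirds = np.split(np.arange(9), 3)
print([t.tolist() for t in thirds])  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

Note that the integer form raises an error if the array cannot be divided evenly.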


Notice that *N* split points lead to *N + 1* subarrays.
The related functions ``np.hsplit`` and ``np.vsplit`` are similar:

In [6]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [7]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]


In [8]:
left, right = np.hsplit(grid, [2])
print(left)
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]


Similarly, for higher-dimensional arrays, ``np.dsplit`` will split arrays along the third axis.
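
A minimal sketch of `np.dsplit` on a made-up three-dimensional array (`cube`), splitting it into two halves along the third axis:

```python
import numpy as np

cube = np.arange(16).reshape((2, 2, 4))  # hypothetical example array

front, back = np.dsplit(cube, [2])
print(front.shape, back.shape)  # (2, 2, 2) (2, 2, 2)
print(front[0, 0], back[0, 0])  # [0 1] [2 3]
```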

In [9]:
grid = np.arange(16).reshape((4, 4))
upper, lower = np.vsplit(grid, [2])
ul, ur = np.hsplit(upper, [2])
print(ul)
print(ur)

[[0 1]
 [4 5]]
[[2 3]
 [6 7]]


<font face="verdana" style="font-size:30px" color="red">Your turn</font>

Given `grid = np.arange(16).reshape((4, 4))`  
Split it into four 2x2 arrays and print them.