# Chapter 2: Introduction to NumPy

This notebook summarizes code snippets from the Python Data Science Handbook by Jake VanderPlas.

It collects the snippets that seemed most important to me, along with some notes, and can be used as a quick reference guide adapted from the book.

In [1]:
import numpy as np

### Creating arrays from Python Lists

In [2]:
np.array([1, 4, 2, 5, 3])

array([1, 4, 2, 5, 3])

Changing data type

In [3]:
np.array([1, 2, 3, 4], dtype='float32')

array([1., 2., 3., 4.], dtype=float32)

Multidimensional arrays with nested lists

In [4]:
np.array([range(i, i+3) for i in [2, 4, 6]])

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

### Creating Arrays from Scratch

In [5]:
# Create length-10 integer array filled with zeros
np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [6]:
# Create a 3x5 floating-point array filled with 1s
np.ones((3, 5), dtype=float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [7]:
# Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [8]:
# Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [9]:
# Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [10]:
# Create a 3x3 array of uniformly distributed 
# random values between 0 and 1
np.random.random((3, 3))

array([[0.09758546, 0.35464725, 0.95727393],
       [0.34667301, 0.38192069, 0.49297534],
       [0.40822541, 0.44318036, 0.81091251]])

In [11]:
# Create a 3x3 array of random integers in interval [0, 10)
np.random.randint(0, 10, (3,3))

array([[6, 6, 9],
       [5, 0, 8],
       [5, 6, 6]])

In [12]:
# Create a 3x3 identity matrix
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

### NumPy Standard Data Types

| Data type | Description |
|---|---|
| `bool_` | Boolean (True or False) stored as a byte |
| `int_` | Default integer type (same as C `long`; normally either `int64` or `int32`) |
| `intc` | Identical to C `int` (normally `int32` or `int64`) |
| `intp` | Integer used for indexing (same as C `ssize_t`; normally either `int32` or `int64`) |
| `int8` | Byte (-128 to 127) |
| `int16` | Integer (-32768 to 32767) |
| `int32` | Integer (-2147483648 to 2147483647) |
| `int64` | Integer (-9223372036854775808 to 9223372036854775807) |
| `uint8` | Unsigned integer (0 to 255) |
| `uint16` | Unsigned integer (0 to 65535) |
| `uint32` | Unsigned integer (0 to 4294967295) |
| `uint64` | Unsigned integer (0 to 18446744073709551615) |
| `float_` | Shorthand for `float64` |
| `float16` | Half-precision float: sign bit, 5 bits exponent, 10 bits mantissa |
| `float32` | Single-precision float: sign bit, 8 bits exponent, 23 bits mantissa |
| `float64` | Double-precision float: sign bit, 11 bits exponent, 52 bits mantissa |
| `complex_` | Shorthand for `complex128` |
| `complex64` | Complex number, represented by two 32-bit floats |
| `complex128` | Complex number, represented by two 64-bit floats |
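
As a quick check of the table above, the following sketch shows that a dtype can be passed either as a string or as the corresponding NumPy object, and uses `np.iinfo` and `np.finfo` to inspect the limits of a type. The variable names `a`, `b`, and `info` are only illustrative.

```python
import numpy as np

# A dtype can be given as a string or as the matching NumPy object.
a = np.zeros(4, dtype='int16')
b = np.zeros(4, dtype=np.int16)
print(a.dtype == b.dtype)          # True

# np.iinfo reports the representable range of an integer type.
info = np.iinfo(np.int16)
print(info.min, info.max)          # -32768 32767

# np.finfo does the same job for floating-point types.
print(np.finfo(np.float32).bits)   # 32
```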

## The Basics of NumPy Arrays

### NumPy Array Attributes

Setup

In [13]:
np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

**ndim, shape, size**

In [14]:
print("x3 ndim: ", x3.ndim)
print("x3 shape: ", x3.shape)
print("x3 size: ", x3.size)

x3 ndim:  3
x3 shape:  (3, 4, 5)
x3 size:  60


**dtype**

In [15]:
print("dtype:", x3.dtype)

dtype: int64


**itemsize, nbytes**

In [16]:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

itemsize: 8 bytes
nbytes: 480 bytes


### Array Indexing, Slicing

In [17]:
x1[0]  # first item

5

In [18]:
x1[-1]  # last item

9

In [19]:
x2[0, 0]  # first row, first column

3

In [20]:
x2[2, -1]  # third row, last column

7

In [21]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [22]:
x[:5]  # first five elements

array([0, 1, 2, 3, 4])

In [23]:
x[5:]  # elements from index 5 onward

array([5, 6, 7, 8, 9])

In [24]:
x[1::2]  # every other element, starting at index 1

array([1, 3, 5, 7, 9])

In [25]:
x[::-1]  # all elements reversed

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [26]:
x[5::-2]  # every other element, going backward from index 5

array([5, 3, 1])

**Multidimensional Array Slicing**

In [27]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [28]:
x2[:2, :3]  # two rows, three columns

array([[3, 5, 2],
       [7, 6, 8]])

In [29]:
print(x2[:, 0]) # first column

[3 7 1]


In [30]:
print(x2[0, :])  # first row
print(x2[0])  # also first row

[3 5 2 4]
[3 5 2 4]


**Subarrays as No-Copy Views**

Array slices return views rather than copies of the data, so a section of a larger array can be modified without copying it.

In [31]:
print(x2)

[[3 5 2 4]
 [7 6 8 8]
 [1 6 7 7]]


In [32]:
x2_sub = x2[:2, :2]
print(x2_sub)

[[3 5]
 [7 6]]


In [33]:
x2_sub[0, 0] = 99
print(x2_sub)

[[99  5]
 [ 7  6]]


In [34]:
print(x2)

[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


**Creating copies of arrays**

In [35]:
x2_sub_copy = x2[:2, :2].copy()
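
To make the difference between a view and a copy concrete, here is a minimal sketch on a fresh array (the names `base`, `view`, and `sub_copy` are illustrative, not from the book). It uses `np.shares_memory` to check whether two arrays are backed by the same buffer.

```python
import numpy as np

base = np.arange(12).reshape(3, 4)

view = base[:2, :2]              # slice: shares memory with base
sub_copy = base[:2, :2].copy()   # explicit copy: owns its data

sub_copy[0, 0] = -1              # leaves base untouched
view[0, 1] = 99                  # writes through to base

print(np.shares_memory(base, view))      # True
print(np.shares_memory(base, sub_copy))  # False
print(base[0, 0], base[0, 1])            # 0 99
```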

### Reshaping Arrays

In [36]:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


**Create a row vector**

In [37]:
x = np.array([1, 2, 3])
x.reshape(1, 3)

array([[1, 2, 3]])

In [38]:
x[np.newaxis, :]

array([[1, 2, 3]])

**Create a column vector**

In [39]:
x.reshape((3, 1))

array([[1],
       [2],
       [3]])

In [40]:
x[:, np.newaxis]

array([[1],
       [2],
       [3]])

## Array Concatenation and Splitting

**Horizontal**

In [41]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

array([1, 2, 3, 3, 2, 1])

In [42]:
z = [99, 99, 99]
print(np.concatenate([x, y, z]))

[ 1  2  3  3  2  1 99 99 99]


In [43]:
grid = np.array([[1, 2, 3], [4, 5, 6]])
np.concatenate([grid, grid], axis=1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

In [44]:
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])
y = np.array([[99],
              [99]])

np.hstack([grid, y])

array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])

**Vertical**

In [45]:
np.concatenate([grid, grid])  # default axis=0 stacks the rows

array([[9, 8, 7],
       [6, 5, 4],
       [9, 8, 7],
       [6, 5, 4]])

In [46]:
x = np.array([1, 2, 3])

np.vstack([x, grid])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

### Splitting Arrays

In [47]:
x = [1, 2, 4, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])  # given list of indices for split points
print(x1, x2, x3)

[1 2 4] [99 99] [3 2 1]
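
For two-dimensional arrays, `np.vsplit` and `np.hsplit` work the same way along rows and columns. A short sketch, using an illustrative 4x4 `grid` that is separate from the arrays above:

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))

upper, lower = np.vsplit(grid, [2])   # split the rows at index 2
left, right = np.hsplit(grid, [2])    # split the columns at index 2

print(upper.shape, lower.shape)       # (2, 4) (2, 4)
print(left.shape, right.shape)        # (4, 2) (4, 2)
```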


## Computation on NumPy Arrays: Universal Functions

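***Universal functions*** (ufuncs) are functions that implement ***vectorized*** operations: they apply an operation to every element of an array at once.

***Vectorized*** operations push the loop into NumPy's compiled code, which is typically much faster than looping over the elements in Python.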
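
As a rough illustration of what "vectorized" means, the sketch below computes reciprocals twice: once with an explicit Python loop and once with a single array expression. The helper `reciprocals_loop` is a hypothetical name written for this comparison; no timings are shown, only that the two results agree.

```python
import numpy as np

def reciprocals_loop(values):
    """Compute 1/v for each element with an explicit Python loop."""
    out = np.empty(len(values))
    for i in range(len(values)):
        out[i] = 1.0 / values[i]
    return out

rng = np.random.RandomState(0)
values = rng.randint(1, 10, size=5)   # integers in [1, 10), so no division by zero

looped = reciprocals_loop(values)
vectorized = 1.0 / values             # the division ufunc handles every element

print(np.allclose(looped, vectorized))  # True
```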

### Array Arithmetic

In [48]:
x = np.arange(4)
print("x     =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2)
print("-x     =", -x)
print("x ** 2 =", x ** 2)
print("x % 2  =", x % 2)

x     = [0 1 2 3]
x + 5 = [5 6 7 8]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [0.  0.5 1.  1.5]
x // 2 = [0 0 1 1]
-x     = [ 0 -1 -2 -3]
x ** 2 = [0 1 4 9]
x % 2  = [0 1 0 1]


**Absolute Value**

In [49]:
x = np.array([-2, -1, 0, 1, 2])
np.abs(x)

array([2, 1, 0, 1, 2])

In [50]:
x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
np.abs(x)  # Returns magnitude

array([5., 5., 2., 1.])

### Trigonometric Functions

In [51]:
theta = np.linspace(0, np.pi, 3)

In [52]:
print("theta      =", theta)
print("sin(theta) =", np.sin(theta))
print("cos(theta) =", np.cos(theta))
print("tan(theta) =", np.tan(theta))

x = [-1, 0, 1]
print("x          =", x)
print("arcsin(x)  =", np.arcsin(x))
print("arccos(x)  =", np.arccos(x))
print("arctan(x)  =", np.arctan(x))

theta      = [0.         1.57079633 3.14159265]
sin(theta) = [0.0000000e+00 1.0000000e+00 1.2246468e-16]
cos(theta) = [ 1.000000e+00  6.123234e-17 -1.000000e+00]
tan(theta) = [ 0.00000000e+00  1.63312394e+16 -1.22464680e-16]
x          = [-1, 0, 1]
arcsin(x)  = [-1.57079633  0.          1.57079633]
arccos(x)  = [3.14159265 1.57079633 0.        ]
arctan(x)  = [-0.78539816  0.          0.78539816]


### Exponents and Logarithms

In [53]:
x = [1, 2, 3]
print("x      =", x)
print("e^x    =", np.exp(x))
print("2^x    =", np.exp2(x))
print("3^x    =", np.power(3, x))
x = [1, 2, 4, 10]
print("x        =", x)
print("ln(x)    =", np.log(x))
print("log2(x)  =", np.log2(x))
print("log10(x) =", np.log10(x))

x      = [1, 2, 3]
e^x    = [ 2.71828183  7.3890561  20.08553692]
2^x    = [2. 4. 8.]
3^x    = [ 3  9 27]
x        = [1, 2, 4, 10]
ln(x)    = [0.         0.69314718 1.38629436 2.30258509]
log2(x)  = [0.         1.         2.         3.32192809]
log10(x) = [0.         0.30103    0.60205999 1.        ]


### Specialized Ufuncs

In [54]:
from scipy import special

In [55]:
# Gamma functions (generalized factorials) and related functions
x = [1, 5, 10]
print("gamma(x)     =", special.gamma(x))
print("ln|gamma(x)| =", special.gammaln(x))
print("beta(x, 2)   =", special.beta(x, 2))
# Error function (integral of Gaussian),
# its complement, and its inverse
x = np.array([0, 0.3, 0.7, 1.0])
print("erf(x)    =", special.erf(x))
print("erfc(x)   =", special.erfc(x))
print("erfinv(x) =", special.erfinv(x))

gamma(x)     = [1.0000e+00 2.4000e+01 3.6288e+05]
ln|gamma(x)| = [ 0.          3.17805383 12.80182748]
beta(x, 2)   = [0.5        0.03333333 0.00909091]
erf(x)    = [0.         0.32862676 0.67780119 0.84270079]
erfc(x)   = [1.         0.67137324 0.32219881 0.15729921]
erfinv(x) = [0.         0.27246271 0.73286908        inf]


### Advanced Ufunc features

**Specifying Output**

In [56]:
x = np.arange(5)
y = np.empty(5)
np.multiply(x, 10, out=y)
print(y)

[ 0. 10. 20. 30. 40.]


**Aggregates**

In [57]:
x = np.arange(1, 6)
print(np.add.reduce(x))  # reduce 1..5 to a single value by summing
print(np.multiply.reduce(x))  # reduce 1..5 to a single value by multiplying
# Store intermediate results with accumulate
np.add.accumulate(x)

15
120


array([ 1,  3,  6, 10, 15])
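
For these particular cases NumPy also provides dedicated functions. The sketch below checks that `reduce` and `accumulate` on `np.add` and `np.multiply` give the same results as `np.sum`, `np.prod`, `np.cumsum`, and `np.cumprod` for a small input.

```python
import numpy as np

x = np.arange(1, 6)

print(np.add.reduce(x) == np.sum(x))                             # True
print(np.multiply.reduce(x) == np.prod(x))                       # True
print(np.array_equal(np.add.accumulate(x), np.cumsum(x)))        # True
print(np.array_equal(np.multiply.accumulate(x), np.cumprod(x)))  # True
```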

## Aggregations: Min, Max, and Everything in Between

In [58]:
import numpy as np

**sum**

In [59]:
L = np.random.random(100)
np.sum(L)

52.12818058833702

**min, max**

In [60]:
big_array = np.random.rand(1000000)
np.min(big_array), np.max(big_array)

(1.4057692298008462e-06, 0.9999994392723005)

In [61]:
print(big_array.min(), big_array.max(), big_array.sum())

1.4057692298008462e-06 0.9999994392723005 500202.5348847683


In [62]:
M = np.random.random((3, 4))
print(M)
print(M.sum())
print(M.min(axis=0))  # min of each column
print(M.max(axis=1))  # max of each row

[[0.50063048 0.07383653 0.49018646 0.72521956]
 [0.84926562 0.10226215 0.99559424 0.59250301]
 [0.53509    0.88518089 0.25518136 0.13130483]]
6.1362551272647154
[0.50063048 0.07383653 0.25518136 0.13130483]
[0.72521956 0.99559424 0.88518089]


## Computation on Arrays: Broadcasting

**Broadcasting** is a set of rules for **applying binary ufuncs on arrays of different sizes**.

**Rules of Broadcasting**

- Rule 1: If the two arrays differ in their **number of dimensions**, the shape of the one with fewer dimensions is **padded** with ones on its left side.
- Rule 2: If the shapes of the two arrays do not match in some dimension, **the array with size 1 in that dimension is stretched to match the other shape**.
- Rule 3: If the sizes disagree in any dimension and neither is equal to 1, **an error is raised**.
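
The three rules can be traced on small arrays. This is a minimal sketch with illustrative names (`M`, `N`, `a`, `col`, `raised`); the comments follow the shapes through each rule.

```python
import numpy as np

M = np.ones((2, 3))
a = np.arange(3)

# Rule 1 pads a's shape (3,) to (1, 3); rule 2 stretches it to (2, 3).
print((M + a).shape)      # (2, 3)

# Both operands are stretched: (3, 1) and (3,) -> (1, 3) give (3, 3).
col = np.arange(3).reshape((3, 1))
print((col + a).shape)    # (3, 3)

# Rule 3: (3, 2) and (3,) -> (1, 3); the trailing sizes 2 and 3 disagree
# and neither is 1, so NumPy raises a ValueError.
N = np.ones((3, 2))
raised = False
try:
    N + a
except ValueError as err:
    raised = True
    print("broadcast failed:", err)
```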

## Comparisons, Masks, and Boolean Logic

In [63]:
x = np.array([1, 2, 3, 4, 5])

In [64]:
x < 3

array([ True,  True, False, False, False])

In [65]:
x > 3

array([False, False, False,  True,  True])

In [66]:
(2 * x) == (x ** 2)

array([False,  True, False, False, False])

In [67]:
rng = np.random.RandomState(0)
x = rng.randint(10, size=(3, 4))
x

array([[5, 0, 3, 3],
       [7, 9, 3, 5],
       [2, 4, 7, 6]])

In [68]:
x < 6

array([[ True,  True,  True,  True],
       [False, False,  True,  True],
       [ True,  True, False, False]])

**Counting entries**

In [69]:
np.sum(x < 6)

8

In [70]:
# how many values less than 6 in each row?
np.sum(x < 6, axis=1)

array([4, 2, 2])

In [71]:
# are there any values greater than 8?
np.any(x > 8)

True

In [72]:
# are all values equal to 6?
np.all(x == 6)

False

In [73]:
np.sum((x > 0.5) & (x < 1))

0

| Operator | Equivalent ufunc |
|---|---|
| `&` | `np.bitwise_and` |
| `\|` | `np.bitwise_or` |
| `^` | `np.bitwise_xor` |
| `~` | `np.bitwise_not` |


**Using masks**

In [74]:
x[x < 5]

array([0, 3, 3, 3, 2, 4])

`and` and `or` perform a single Boolean evaluation on an **entire** object, whereas `&` and `|` perform multiple Boolean evaluations on the content (the individual bits or bytes) of an object. For Boolean NumPy arrays, the latter is nearly always the desired operation.
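
A small sketch of the difference, using an illustrative array `x` and flag `raised`: the element-wise `&` yields one Boolean per entry, while `and` asks for the truth value of a whole array and fails.

```python
import numpy as np

x = np.arange(10)

# Element-wise: one Boolean per entry, usable as a mask.
mask = (x > 2) & (x < 8)
print(x[mask])            # [3 4 5 6 7]

# `and` needs a single truth value for each array, which is ambiguous
# for arrays with more than one element, so NumPy raises a ValueError.
raised = False
try:
    (x > 2) and (x < 8)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```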

**Fancy indexing**

In [75]:
X = np.arange(12).reshape((3, 4))
X

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [76]:
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
X[row, col]

array([ 2,  5, 11])
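
Index arrays in fancy indexing follow the broadcasting rules. As a sketch, turning `row` into a column vector pairs every row index with every column index, so the result is two-dimensional (`result` is an illustrative name):

```python
import numpy as np

X = np.arange(12).reshape((3, 4))
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])

# row[:, np.newaxis] has shape (3, 1) and col has shape (3,).
# The two index arrays broadcast to (3, 3), so the result is 3x3.
result = X[row[:, np.newaxis], col]
print(result)
# [[ 2  1  3]
#  [ 6  5  7]
#  [10  9 11]]
```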

## Sorting Arrays

**np.sort** and **np.argsort**

**np.sort** uses the ***quicksort algorithm*** by default, although mergesort and heapsort are also available. Its typical cost is O(N log N).

In [77]:
x = np.array([2, 1, 4, 3, 5])
np.sort(x)

array([1, 2, 3, 4, 5])

In [78]:
x.sort()  # sorts the array in place
print(x)

[1 2 3 4 5]


**np.argsort** returns the ***indices*** of the sorted elements

In [79]:
x = np.array([2, 1, 4, 3, 5])
i = np.argsort(x)
print(i)

[1 0 3 2 4]
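
The indices from `np.argsort` can be fed back into the array through fancy indexing to build the sorted array. A minimal sketch with the same input:

```python
import numpy as np

x = np.array([2, 1, 4, 3, 5])
i = np.argsort(x)

# Indexing with the argsort result reproduces the sorted array.
print(x[i])   # [1 2 3 4 5]
```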


**Sorting along rows or columns**

In [80]:
rand = np.random.RandomState(42)
X = rand.randint(0, 10, (4, 6))

In [81]:
X

array([[6, 3, 7, 4, 6, 9],
       [2, 6, 7, 4, 3, 7],
       [7, 2, 5, 4, 1, 7],
       [5, 1, 4, 0, 9, 5]])

In [82]:
# sort each column
np.sort(X, axis=0)

array([[2, 1, 4, 0, 1, 5],
       [5, 2, 5, 4, 3, 7],
       [6, 3, 7, 4, 6, 7],
       [7, 6, 7, 4, 9, 9]])

In [83]:
# sort each row
np.sort(X, axis=1)

array([[3, 4, 6, 6, 7, 9],
       [2, 3, 4, 6, 7, 7],
       [1, 2, 4, 5, 7, 7],
       [0, 1, 4, 5, 5, 9]])

**Partial Sorts: Partitioning**

In [84]:
x = np.array([7, 2, 3, 1, 6, 5, 4])
np.partition(x, 3)

array([2, 1, 3, 4, 6, 5, 7])
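
`np.partition(x, k)` places the element that a full sort would put at index `k` into that position, with all smaller values to its left and all larger or equal values to its right; the order within each side is not guaranteed. The sketch below checks these properties rather than relying on a specific ordering, and also shows `np.argpartition`, which returns indices instead of values. The names `k`, `part`, and `idx` are illustrative.

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])
k = 3
part = np.partition(x, k)

# The element at index k matches the fully sorted array at index k.
print(part[k] == np.sort(x)[k])      # True

# The first k entries are the k smallest values, in arbitrary order.
print(np.sort(part[:k]))             # [1 2 3]

# np.argpartition gives the indices that would partition the array.
idx = np.argpartition(x, k)
print(np.sort(x[idx[:k]]))           # [1 2 3]
```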