<b>Key Feature:</b> The N-dimensional array object, or ndarray, is a fast, flexible container for large data sets in Python.
<br>
Arrays let you perform mathematical operations on whole blocks of data using syntax similar to the equivalent operations between scalar elements.

In [1]:
import numpy as np

In [2]:
data = np.array([[0.9526, -0.246, -0.8856], [0.5639, 0.2379, 0.9104]])

In [3]:
data

array([[ 0.9526, -0.246 , -0.8856],
       [ 0.5639,  0.2379,  0.9104]])

In [4]:
data * 10

array([[ 9.526, -2.46 , -8.856],
       [ 5.639,  2.379,  9.104]])

In [5]:
data + data

array([[ 1.9052, -0.492 , -1.7712],
       [ 1.1278,  0.4758,  1.8208]])

In [6]:
data.shape

(2, 3)

In [7]:
data.dtype

dtype('float64')

### Creating ndarrays

In [8]:
data1 = [6, 7.5, 8, 0, 1]

In [9]:
arr1 = np.array(data1)
arr1

array([6. , 7.5, 8. , 0. , 1. ])

In [10]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [11]:
arr2.ndim

2

In [12]:
arr2.shape

(2, 4)

In [13]:
# Checking data type
arr1.dtype

dtype('float64')

In [14]:
arr2.dtype

dtype('int64')

In [15]:
# Functions for creating new arrays
# zeros and ones create arrays of 0s and 1s with a given length or shape.
# empty creates an array without initializing its values to any particular value.
# To create a higher-dimensional array with these functions, pass a tuple for the shape.
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [16]:
np.zeros((3, 6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [17]:
# It's not safe to assume that np.empty will return an array of all zeros.
# In many cases it will return uninitialized garbage values.
np.empty((2, 3, 2))

array([[[4.64114667e-310, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000]],

       [[0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000]]])

In [18]:
# arange is an array-valued version of the built-in Python range function
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
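
The creation functions described above can be sketched together as follows. This is a minimal illustration rather than part of the original notebook; the variable names `ones_1d`, `ones_2d`, and `steps` are made up for the example.

```python
import numpy as np

# An integer argument gives a one-dimensional array of that length
ones_1d = np.ones(4)

# A tuple argument gives the shape of a higher-dimensional array
ones_2d = np.ones((2, 3))

# arange accepts start, stop (exclusive), and step, like Python's range
steps = np.arange(2, 11, 2)

print(ones_1d)
print(ones_2d.shape)
print(steps)
```

As with `np.zeros`, the default dtype of `np.ones` is `float64`, while `np.arange` with integer arguments produces an integer array.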

### Data Types for ndarray

In [19]:
arr1 = np.array([1, 2, 3], dtype=np.float64)

In [20]:
arr2 = np.array([1, 2, 3], dtype=np.int32)

In [21]:
arr1.dtype

dtype('float64')

In [22]:
arr2.dtype

dtype('int32')

In [23]:
# Explicitly convert or cast an array from one dtype to another using ndarray's astype method
arr = np.array([1, 2, 3, 4, 5])
arr.dtype

dtype('int64')

In [24]:
float_arr = arr.astype(np.float64)
float_arr.dtype

dtype('float64')

In [25]:
# When casting floating-point numbers to an integer dtype, the decimal part is truncated
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
arr

array([ 3.7, -1.2, -2.6,  0.5, 12.9, 10.1])

In [26]:
arr.astype(np.int32)

array([ 3, -1, -2,  0, 12, 10], dtype=int32)
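
Because truncation drops the fractional part rather than rounding, negative values move toward zero. The sketch below contrasts plain casting with rounding first; the sample values and the names `truncated` and `rounded` are assumptions chosen for illustration.

```python
import numpy as np

arr = np.array([3.7, -1.2, -2.6, 12.9])

# astype returns a new array; the fractional part is simply dropped
truncated = arr.astype(np.int32)

# Rounding before the cast gives the nearest integer instead
rounded = np.round(arr).astype(np.int32)

print(truncated)
print(rounded)
print(arr.dtype)  # the source array keeps its original dtype
```

Note that `astype` always creates a new array, so `arr` itself is left unchanged.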

### Operations between Arrays and Scalars
Arrays are important because they let you express batch operations on data without writing any for loops. This is usually called <b>vectorization</b>. Any arithmetic operation between equal-size arrays applies the operation elementwise.

In [27]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])

In [28]:
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [29]:
arr * arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [30]:
arr - arr

array([[0., 0., 0.],
       [0., 0., 0.]])

In [31]:
# Arithmetic operations with scalars propagate the value to each element
1/arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [32]:
arr ** 0.5

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])
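
To make the "no for loops" point concrete, the following sketch writes out the explicit loops that a single vectorized expression replaces. The names `looped` and `vectorized` are illustrative only.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

# Explicit loops: visit every element by its row and column index
looped = np.empty_like(arr)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        looped[i, j] = arr[i, j] * arr[i, j]

# Vectorized form: one expression applied elementwise
vectorized = arr * arr

print(np.array_equal(looped, vectorized))
```

Both forms produce the same values; the vectorized form is shorter and leaves the looping to NumPy's compiled code.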

### Basic Indexing and Slicing

In [33]:
arr = np.arange(10)

In [34]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [35]:
arr[5]

5

In [36]:
arr[5:8]

array([5, 6, 7])

In [37]:
arr[5:8] = 12

In [38]:
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

In [39]:
# An important first distinction from lists is that array slices are views on the original array.
# This means that the data is not copied, and any modification to the view will be reflected in the source array.
arr_slice = arr[5:8]

In [40]:
arr_slice[1] = 12345

In [41]:
arr

array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,
           9])

In [42]:
arr_slice[:] = 74

In [43]:
arr

array([ 0,  1,  2,  3,  4, 74, 74, 74,  8,  9])

In [44]:
# As NumPy has been designed with large data use cases in mind, we could imagine performance and memory problems if NumPy
# insisted on copying data left and right.

# If you want a copy of a slice of an ndarray instead of a view, you need to copy the array explicitly; for example, arr[5:8].copy()
# Indexing with slices - higher-dimensional objects give more options for slicing, as we can slice one or more axes and also
# mix integers with slices, for example in a 2D array.
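
A small sketch of slicing a 2D array and of taking an explicit copy is given here. The array `arr2d` and the other variable names are assumptions introduced for the example.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

first_two_rows = arr2d[:2]       # slice along axis 0 only
upper_right = arr2d[:2, 1:]      # slice both axes at once
second_row_head = arr2d[1, :2]   # integer index mixed with a slice

# An explicit copy owns its data, so writing to it leaves arr2d alone
independent = arr2d[:2, 1:].copy()
independent[:] = 0

print(upper_right)
print(second_row_head)
print(arr2d[0, 1])  # still the original value
```

`np.shares_memory(arr2d, upper_right)` can be used to check that the plain slice is a view, while the copied array does not share memory with `arr2d`.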

### Boolean Indexing

In [45]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

In [46]:
data = np.random.rand(7, 4)

In [47]:
names

array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')

In [48]:
data

array([[0.78515805, 0.23851195, 0.28671942, 0.33153828],
       [0.60223825, 0.38500549, 0.88215559, 0.4110205 ],
       [0.64069642, 0.00566884, 0.2897101 , 0.66412913],
       [0.58504441, 0.471254  , 0.43848925, 0.43566508],
       [0.72332333, 0.43479632, 0.18400094, 0.11356831],
       [0.6261523 , 0.37200681, 0.9930318 , 0.11660001],
       [0.25062008, 0.31171759, 0.21624857, 0.86517631]])

In [49]:
# Suppose each name corresponds to a row in data array.
names == "Bob" # Boolean Array

array([ True, False, False,  True, False, False, False])

In [50]:
# This boolean array can be passed when indexing the array
data[names == "Bob"]

array([[0.78515805, 0.23851195, 0.28671942, 0.33153828],
       [0.58504441, 0.471254  , 0.43848925, 0.43566508]])

In [51]:
data[names == "Bob", 2:]

array([[0.28671942, 0.33153828],
       [0.43848925, 0.43566508]])

In [52]:
data[names == "Bob", 3]

array([0.33153828, 0.43566508])

In [53]:
# Select everything except Bob
data[names != "Bob"]

array([[0.60223825, 0.38500549, 0.88215559, 0.4110205 ],
       [0.64069642, 0.00566884, 0.2897101 , 0.66412913],
       [0.72332333, 0.43479632, 0.18400094, 0.11356831],
       [0.6261523 , 0.37200681, 0.9930318 , 0.11660001],
       [0.25062008, 0.31171759, 0.21624857, 0.86517631]])

In [54]:
data[data < 0.6] = 0

In [55]:
data

array([[0.78515805, 0.        , 0.        , 0.        ],
       [0.60223825, 0.        , 0.88215559, 0.        ],
       [0.64069642, 0.        , 0.        , 0.66412913],
       [0.        , 0.        , 0.        , 0.        ],
       [0.72332333, 0.        , 0.        , 0.        ],
       [0.6261523 , 0.        , 0.9930318 , 0.        ],
       [0.        , 0.        , 0.        , 0.86517631]])

In [56]:
data[names != 'Joe'] = 7

In [57]:
data

array([[7.        , 7.        , 7.        , 7.        ],
       [0.60223825, 0.        , 0.88215559, 0.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [0.6261523 , 0.        , 0.9930318 , 0.        ],
       [0.        , 0.        , 0.        , 0.86517631]])
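
Boolean conditions can also be combined before indexing. The sketch below uses the elementwise operators `|` (or), `&` (and), and `~` (not); the Python keywords `and` and `or` do not work with boolean arrays. The array `values` is a hypothetical stand-in for the rows of `data`.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
values = np.arange(7)  # one value per name, for illustration only

# Parentheses are required because | binds tighter than ==
mask = (names == 'Bob') | (names == 'Will')

print(values[mask])    # entries for Bob or Will
print(values[~mask])   # everything else
```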

### Fancy Indexing
Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays.

In [60]:
arr = np.empty((8, 4))

In [61]:
for i in range(8):
    arr[i] = i

In [62]:
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

In [63]:
# To select a subset of the rows in a particular order, pass a list or ndarray of integers specifying the desired order
arr[[4, 3, 0, 6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

In [64]:
arr[[-3, -5, -7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

In [65]:
arr = np.arange(32).reshape((8, 4))
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [66]:
arr[[1, 5, 7, 2], [0, 3, 1, 2]]

array([ 4, 23, 29, 10])
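
Passing two index lists pairs them up element by element, which is why the result above is one-dimensional. The following sketch spells out that pairing by hand; `rows`, `cols`, `paired`, and `by_hand` are illustrative names.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

# Fancy indexing with two lists selects (1, 0), (5, 3), (7, 1), (2, 2)
paired = arr[rows, cols]

# The same selection written as an explicit loop over index pairs
by_hand = np.array([arr[r, c] for r, c in zip(rows, cols)])

print(paired)
print(np.array_equal(paired, by_hand))
```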

In [67]:
arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])

In [68]:
arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])
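
Unlike basic slicing, fancy indexing copies the selected data into a new array. The sketch below checks this with `np.shares_memory`; the names `picked` and `sliced` are assumptions made for the example.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

picked = arr[[1, 5]]   # fancy indexing: a new array with copied data
picked[:] = -1         # does not change arr

sliced = arr[1:3]      # basic slice: a view on arr

print(arr[1])                          # row 1 is unchanged
print(np.shares_memory(arr, picked))   # False
print(np.shares_memory(arr, sliced))   # True
```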

### Transposing Arrays and Swapping Axes
Transposing is a special form of reshaping which similarly returns a view on the underlying data without copying anything.

In [71]:
arr = np.arange(15).reshape((3, 5))

In [72]:
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [73]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

In [74]:
# Computing the inner matrix product X(transpose)X using np.dot
arr = np.random.randn(6, 3)

In [75]:
np.dot(arr.T, arr)

array([[ 8.04387872,  0.41910464, -2.61607741],
       [ 0.41910464,  3.42119284,  1.2185665 ],
       [-2.61607741,  1.2185665 ,  7.49873471]])
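
As a quick check on the product above, the sketch below confirms the shape and symmetry of the result and shows the equivalent `@` operator. It uses a seeded generator instead of `np.random.randn` so the run is reproducible; that choice, and the names `X` and `gram`, are assumptions for illustration.

```python
import numpy as np

# Seeded generator, used here only to make the example repeatable
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))

gram = np.dot(X.T, X)

print(gram.shape)                    # (3, 3)
print(np.allclose(gram, gram.T))     # the product is symmetric
print(np.allclose(gram, X.T @ X))    # @ gives the same matrix product
```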

In [76]:
# For higher dimensional arrays, transpose will accept a tuple of axis numbers to permute the axes
arr = np.arange(16).reshape((2, 2, 4))

In [77]:
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [78]:
arr.transpose((1, 0, 2))

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])

In [79]:
# Simple transposing with .T is just a special case of swapping axes.
# ndarray has the method swapaxes, which takes a pair of axis numbers
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [82]:
arr.swapaxes(1, 2)

array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])

In [83]:
# swapaxes similarly returns a view on the data without making a copy
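
The view behaviour can be verified directly. In this sketch, writing through the swapped array changes the original, and swapping axes 0 and 1 matches the earlier `transpose((1, 0, 2))` call; the name `swapped` is illustrative.

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))
swapped = arr.swapaxes(1, 2)

print(swapped.shape)                    # (2, 4, 2)
print(np.shares_memory(arr, swapped))   # True: no data was copied

# Element [0, 3, 1] of the view is element [0, 1, 3] of the original
swapped[0, 3, 1] = -1
print(arr[0, 1, 3])

# Swapping axes 0 and 1 is the same permutation as transpose((1, 0, 2))
print(np.array_equal(arr.swapaxes(0, 1), arr.transpose((1, 0, 2))))
```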

### Universal Functions: Fast Element-wise Array Functions
A universal function, or <i>ufunc</i>, is a function that performs elementwise operations on data in ndarrays. We can think of ufuncs as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.

In [85]:
arr = np.arange(10)

In [86]:
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [87]:
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

In [88]:
# These are referred to as unary ufuncs. Others, such as add or maximum, take 2 arrays (thus, binary ufuncs) and return
# a single array as the result
x = np.random.randn(8)
y = np.random.randn(8)

In [89]:
x

array([ 0.11092223,  0.66877042, -0.78308211, -0.71082465,  0.26261375,
       -0.2434131 ,  0.26513614, -0.25561995])

In [90]:
y

array([ 0.94676706,  0.85664119, -0.6047244 , -0.50110749, -0.69999072,
        0.27061047,  0.93973583,  0.44315982])

In [91]:
np.maximum(x, y) # element-wise maximum

array([ 0.94676706,  0.85664119, -0.6047244 , -0.50110749,  0.26261375,
        0.27061047,  0.93973583,  0.44315982])
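
Binary ufuncs are easier to follow with small fixed inputs. The sketch below uses hand-picked values (an assumption, not the random arrays above) to show `np.add` and `np.maximum` side by side.

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])
y = np.array([0.5, 4.0, -1.0])

total = np.add(x, y)         # same result as x + y
larger = np.maximum(x, y)    # elementwise maximum of the two arrays

print(total)
print(larger)
```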

In [93]:
# While not common, a ufunc can return multiple arrays.
# modf is one example, a vectorized version of the built-in Python divmod:
# it returns the fractional and integral parts of a floating-point array
arr = np.random.randn(7)*5

In [94]:
arr

array([-6.94986117, -0.80394622, -0.84927347, -2.44304359, -0.24068028,
        1.76290832,  1.76718512])

In [95]:
np.modf(arr)

(array([-0.94986117, -0.80394622, -0.84927347, -0.44304359, -0.24068028,
         0.76290832,  0.76718512]), array([-6., -0., -0., -2., -0.,  1.,  1.]))
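
Since `np.modf` returns a tuple of two arrays, the result is usually unpacked. A minimal sketch with hand-picked values follows; the names `frac` and `whole` are illustrative.

```python
import numpy as np

arr = np.array([-6.25, -0.5, 1.75])

# Unpack the fractional and integral parts into separate arrays
frac, whole = np.modf(arr)

print(frac)
print(whole)

# Adding the two parts back together recovers the input
print(np.allclose(frac + whole, arr))
```

Both parts carry the sign of the input, which is why the integral part of a small negative number is displayed as `-0.`.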