In [1]:
import warnings
warnings.filterwarnings("ignore")

## NumPy

In [1]:
import numpy as np # import numpy and give it the short form ('nickname') np

<b>From Wikipedia:
    
NumPy (pronounced /ˈnʌmpaɪ/ (NUM-py) or sometimes /ˈnʌmpi/ (NUM-pee)) is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The ancestor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors.</b>

<a href = "https://en.wikipedia.org/wiki/NumPy">Full article here</a>

Some important **NumPy** features:
<ul>
<li>ndarray: fast and space-efficient multidimensional array with vectorized arithmetic and sophisticated broadcasting</li>
<li>Standard vectorized math</li>
<li>Reading / writing arrays to disk</li>
<li>Memory-mapped file access</li>
<li>Linear algebra, random number generation, Fourier transforms</li>
<li>Integration with C, C++ and Fortran code</li>
</ul>

**Creating NumPy arrays**

The function *array()* is commonly used to create numpy ndarrays on the fly from other Python sequence-like objects such as tuples and lists.

In [3]:
np.array(range(0,3))

array([0, 1, 2])

In [4]:
np.array((1, 2, 3)) # from a tuple

array([1, 2, 3])

In [5]:
np.array([1, 2, 3]) # from a list

array([1, 2, 3])

In [6]:
type(np.array(range(3)))

numpy.ndarray

Nested lists result in multidimensional arrays:

In [7]:
import random
nestedList = [[random.uniform(0, 9) for x in range(3)] for y in range(4)] #[[],[],[],[]]
nestedList

[[5.269979073791461, 5.385705030541615, 0.17032114941085819],
 [7.059393366358506, 0.009742070447256501, 4.801986993047455],
 [8.656758761545362, 1.6430355623772401, 5.717187685110451],
 [8.174831848248267, 7.485346709675396, 1.9052472675614194]]

In [8]:
type(nestedList)

list

In [9]:
myArray = np.array(nestedList)
myArray

array([[5.26997907, 5.38570503, 0.17032115],
       [7.05939337, 0.00974207, 4.80198699],
       [8.65675876, 1.64303556, 5.71718769],
       [8.17483185, 7.48534671, 1.90524727]])

**Important attributes of arrays**

In [10]:
print(myArray.ndim)  # Number of dimensions
print(myArray.shape) # Shape of the ndarray
print(myArray.dtype) # Data type contained in the array

2
(4, 3)
float64


#### Other functions to create arrays

**arange** - the NumPy counterpart of the built-in range function, except that it returns a one-dimensional array instead of a range object:

In [11]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
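Like `range`, `arange` also accepts a start, a stop (exclusive) and a step; unlike `range`, the step may be fractional. A minimal sketch, with illustrative variable names that are not part of the original notebook:

```python
import numpy as np

evens = np.arange(2, 11, 2)    # start=2, stop=11 (exclusive), step=2
halves = np.arange(0, 2, 0.5)  # a fractional step is allowed, unlike range()

print(evens.tolist())   # [2, 4, 6, 8, 10]
print(halves.tolist())  # [0.0, 0.5, 1.0, 1.5]
```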

**ones**, **zeros**, **ones_like**, **zeros_like** - create arrays filled with ones or zeros, either with a given shape or with the same shape as an existing array:

In [12]:
np.ones(3)

array([1., 1., 1.])

In [13]:
np.ones((3,4))

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [14]:
np.zeros(4)

array([0., 0., 0., 0.])

In [15]:
np.zeros((4,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
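The `_like` variants are not shown in the cells above. As a small sketch (the `template` array is made up for illustration), they copy the shape and dtype of an existing array:

```python
import numpy as np

template = np.array([[1, 2, 3], [4, 5, 6]])  # illustrative 2x3 integer array

ones_copy = np.ones_like(template)    # same shape and dtype as template, filled with 1
zeros_copy = np.zeros_like(template)  # same shape and dtype as template, filled with 0

print(ones_copy.shape)                    # (2, 3)
print(ones_copy.dtype == template.dtype)  # True
print(zeros_copy.tolist())                # [[0, 0, 0], [0, 0, 0]]
```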

We can explicitly specify the data type with which an array should be created:

In [16]:
newArray = np.array(nestedList, dtype = int) # the np.int alias was removed in NumPy 1.24; use the built-in int
newArray

array([[5, 5, 0],
       [7, 0, 4],
       [8, 1, 5],
       [8, 7, 1]])

In [17]:
newArray = np.array(nestedList, dtype = float) # the np.float alias was removed in NumPy 1.24; use the built-in float
newArray

array([[5.26997907, 5.38570503, 0.17032115],
       [7.05939337, 0.00974207, 4.80198699],
       [8.65675876, 1.64303556, 5.71718769],
       [8.17483185, 7.48534671, 1.90524727]])

In [18]:
newArray = np.array(nestedList, dtype = bool) # only values equal to 0 become False
newArray

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

In [19]:
newArray = np.array(nestedList, dtype = str) # Unicode strings (the np.unicode alias is no longer available)
newArray

array([['5.269979073791461', '5.385705030541615', '0.17032114941085819'],
       ['7.059393366358506', '0.009742070447256501', '4.801986993047455'],
       ['8.656758761545362', '1.6430355623772401', '5.717187685110451'],
       ['8.174831848248267', '7.485346709675396', '1.9052472675614194']],
      dtype='<U20')

In [20]:
newArray = np.array(nestedList, dtype = object) # arbitrary Python objects
newArray

array([[5.269979073791461, 5.385705030541615, 0.17032114941085819],
       [7.059393366358506, 0.009742070447256501, 4.801986993047455],
       [8.656758761545362, 1.6430355623772401, 5.717187685110451],
       [8.174831848248267, 7.485346709675396, 1.9052472675614194]],
      dtype=object)
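An existing array can also be converted with the `astype` method. The sketch below, using made-up values, illustrates two points that are easy to miss in the outputs above: conversion to an integer type drops the fractional part rather than rounding, and conversion to bool maps only exact zeros to False.

```python
import numpy as np

values = np.array([1.9, -1.9, 0.0, 0.4])  # illustrative values

as_int = values.astype(int)            # fractional part is dropped (truncation toward zero)
as_bool = values.astype(bool)          # only exact zeros become False
rounded = np.rint(values).astype(int)  # round to nearest first, if rounding is wanted

print(as_int.tolist())   # [1, -1, 0, 0]
print(as_bool.tolist())  # [True, True, False, True]
print(rounded.tolist())  # [2, -2, 0, 0]
```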

## Vectorized math

One of the most powerful and useful features of NumPy, especially for data science, is the ability to vectorize mathematical operations - that is, to apply an operation to every element of a vector or array in a single expression, with the looping done in compiled code rather than in a Python *for* loop.
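The contrast with an explicit loop can be sketched as follows. The array and variable names are illustrative; both versions compute the same result, but the vectorized form is shorter and, on large arrays, typically much faster.

```python
import numpy as np

data = np.arange(5)  # [0 1 2 3 4]

# Explicit Python loop: one element at a time
looped = np.empty_like(data)
for i in range(len(data)):
    looped[i] = data[i] * 2 + 1

# Vectorized: the same arithmetic applied to the whole array in one expression
vectorized = data * 2 + 1

print(vectorized.tolist())                 # [1, 3, 5, 7, 9]
print(np.array_equal(looped, vectorized))  # True
```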

In [21]:
myArray

array([[5.26997907, 5.38570503, 0.17032115],
       [7.05939337, 0.00974207, 4.80198699],
       [8.65675876, 1.64303556, 5.71718769],
       [8.17483185, 7.48534671, 1.90524727]])

In [22]:
%%time
myArray + 3

Wall time: 997 µs


array([[ 8.26997907,  8.38570503,  3.17032115],
       [10.05939337,  3.00974207,  7.80198699],
       [11.65675876,  4.64303556,  8.71718769],
       [11.17483185, 10.48534671,  4.90524727]])

In [23]:
myArray2 = np.array([[random.uniform(0, 3) for x in range(3)] for y in range(4)]) 
# creating a nested list and then a 2D array in one step
myArray2

array([[0.11228746, 2.95299933, 1.04103642],
       [1.53750629, 0.9321694 , 2.44887823],
       [1.93475747, 2.63074209, 2.15830447],
       [0.73203975, 2.00925295, 0.25322387]])

In [24]:
myArray + myArray2

array([[ 5.38226653,  8.33870436,  1.21135757],
       [ 8.59689966,  0.94191147,  7.25086522],
       [10.59151624,  4.27377765,  7.87549215],
       [ 8.9068716 ,  9.49459966,  2.15847114]])

In [25]:
newArray = myArray.astype(int) # np.int was removed in NumPy 1.24; fractional parts are truncated
newArray

array([[5, 5, 0],
       [7, 0, 4],
       [8, 1, 5],
       [8, 7, 1]])

In [26]:
newArray * 3

array([[15, 15,  0],
       [21,  0, 12],
       [24,  3, 15],
       [24, 21,  3]])

In [27]:
newArray * [1,2,3] 
# Elementwise multiplication on rows
# That is, in each row, the first column is multiplied by 1, the second by 2 and the third by 3

array([[ 5, 10,  0],
       [ 7,  0, 12],
       [ 8,  2, 15],
       [ 8, 14,  3]])

In [28]:
x = np.array([[1], [2], [3], [4]])
x.shape

(4, 1)

In [29]:
newArray * [[1], [2], [3], [4]] # Each row is multiplied by its own factor: row 1 by 1, row 2 by 2, and so on

array([[ 5,  5,  0],
       [14,  0,  8],
       [24,  3, 15],
       [32, 28,  4]])
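Both multiplications above rely on broadcasting: NumPy compares shapes from the trailing axis backwards and stretches any axis of size 1 (or a missing leading axis) to match. A small sketch with a made-up `grid` array shows which shapes are compatible with a (4, 3) array and which are not:

```python
import numpy as np

grid = np.arange(12).reshape(4, 3)        # shape (4, 3)
per_column = np.array([1, 2, 3])          # shape (3,)   -> repeated for every row
per_row = np.array([[1], [2], [3], [4]])  # shape (4, 1) -> repeated for every column

print((grid * per_column).shape)  # (4, 3)
print((grid * per_row).shape)     # (4, 3)

# A vector of shape (4,) does not line up with the trailing axis of size 3
try:
    grid * np.array([1, 2, 3, 4])
except ValueError as err:
    print("cannot broadcast:", err)
```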

These kinds of NumPy operations make it easy and efficient to work with matrices and perform matrix algebra.
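One point worth keeping in mind for matrix algebra: the `*` operator multiplies element by element, while the matrix product is written with `@` (equivalently `np.matmul`). A minimal sketch with two small, made-up matrices:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])

elementwise = a * b     # multiplies matching entries
matrix_product = a @ b  # matrix multiplication, same as np.matmul(a, b)

print(elementwise.tolist())     # [[0, 2], [3, 0]]
print(matrix_product.tolist())  # [[2, 1], [4, 3]]
print(a.T.tolist())             # transpose: [[1, 3], [2, 4]]
```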

In [30]:
np.sqrt(newArray)

array([[2.23606798, 2.23606798, 0.        ],
       [2.64575131, 0.        , 2.        ],
       [2.82842712, 1.        , 2.23606798],
       [2.82842712, 2.64575131, 1.        ]])

In [31]:
np.log(newArray + 1) # add 1 because log(0) evaluates to -inf (with a RuntimeWarning, not an exception)

array([[1.79175947, 1.79175947, 0.        ],
       [2.07944154, 0.        , 1.60943791],
       [2.19722458, 0.69314718, 1.79175947],
       [2.19722458, 2.07944154, 0.69314718]])
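For the common "log of one plus x" pattern, NumPy also provides `np.log1p`, which computes the same quantity directly and is more accurate for values close to zero. A brief sketch with made-up counts:

```python
import numpy as np

counts = np.array([0, 1, 9])  # illustrative non-negative values

shifted = np.log(counts + 1)  # the approach used in the cell above
direct = np.log1p(counts)     # log(1 + x) computed in one call

print(np.allclose(shifted, direct))  # True
print(direct[0])                     # 0.0, since log(1) = 0
```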

These vectorized functions return an array of the appropriate shape - the same shape as the input array in the examples above.

There are also operations that aggregate an array. Applied to the whole array they return a single scalar; with the *axis* argument they aggregate along one dimension and return a smaller array.

In [32]:
np.sum(newArray,axis = 0)

array([28, 13, 10])

In [33]:
np.max(newArray)

8

In [34]:
np.mean(newArray)

4.25
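The effect of the `axis` argument can be sketched on a fixed copy of the integer values shown above (the name `table` is illustrative): `axis=0` collapses the rows and gives one value per column, `axis=1` collapses the columns and gives one value per row, and omitting `axis` aggregates over every element.

```python
import numpy as np

table = np.array([[5, 5, 0],
                  [7, 0, 4],
                  [8, 1, 5],
                  [8, 7, 1]])  # same values as the newArray output above

column_totals = table.sum(axis=0)  # one total per column
row_totals = table.sum(axis=1)     # one total per row
grand_total = table.sum()          # a single scalar over all elements

print(column_totals.tolist())  # [28, 13, 10]
print(row_totals.tolist())     # [10, 11, 14, 16]
print(grand_total)             # 51
```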