# NumPy Basics

NumPy (Numerical Python) is an open source library for the Python programming language that supports scientific computing and data analysis. It typically outperforms equivalent native Python code because much of NumPy is written in C and its operations need fewer explicit Python loops. At the center of the package is the NumPy array, together with the mathematical functions and tools that operate on it. A NumPy array, or ndarray, is a homogeneous, multidimensional collection of elements that describes a block of computer memory. Converting information into numerical representations is fundamental to data science, and this data structure allows numerical arrays to be stored and manipulated efficiently.

More Information on NumPy: https://docs.scipy.org/doc/numpy-1.13.0/user/whatisnumpy.html


## Efficiency of NumPy Arrays

Adding Vectors (one-dimensional arrays)
- Vector a = Squares of integers from 0 to n
- Vector b = Cubes of integers from 0 to n

In [380]:
from __future__ import print_function 
import sys 
from datetime import datetime 
import numpy as np

# Adding vectors using pure Python
def sumWithPython(vectorSize):   
    a = list(range(vectorSize))  
    b = list(range(vectorSize))  
    c = []
    for i in list(range(len(a))):       
        a[i] = i ** 2       
        b[i] = i ** 3       
        c.append(a[i] + b[i])
    return c 

# Adding vectors using NumPy
def sumWithNumpy(vectorSize):   
    a = np.arange(vectorSize) ** 2   
    b = np.arange(vectorSize) ** 3   
    c = a + b
    return c

size = 1000000

startTime = datetime.now()
c = sumWithPython(size)
# total_seconds() covers the whole interval; .microseconds alone drops any full seconds
pythonDelta = datetime.now() - startTime
print("Pure Python elapsed time:", str(int(pythonDelta.total_seconds() * 1e6)), " microseconds")

startTime = datetime.now()
c = sumWithNumpy(size)
numpyDelta = datetime.now() - startTime
print("NumPy elapsed time:", str(int(numpyDelta.total_seconds() * 1e6)), " microseconds")

Pure Python elapsed time: 126693  microseconds
NumPy elapsed time: 29035  microseconds
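
A single `datetime` measurement is noisy, so the comparison above can also be sketched with the standard library's `timeit` module. This is a minimal sketch, not the notebook's original method: the function names, the vector size, and the repeat counts are assumptions chosen for illustration, and `int64` is requested explicitly so the cubes cannot overflow on platforms where the default integer is 32 bits wide.

```python
import timeit
import numpy as np

def sum_with_python(n):
    # Build squares + cubes element by element in a list comprehension
    return [i ** 2 + i ** 3 for i in range(n)]

def sum_with_numpy(n):
    # int64 is requested explicitly so large cubes do not overflow a 32-bit default
    i = np.arange(n, dtype=np.int64)
    return i ** 2 + i ** 3

n = 100000  # illustrative size, smaller than the one used above

# Both approaches should produce the same values
assert sum_with_python(n) == sum_with_numpy(n).tolist()

# Best of 5 repeats, 3 calls per repeat, reported as seconds per call
python_time = min(timeit.repeat(lambda: sum_with_python(n), number=3, repeat=5)) / 3
numpy_time = min(timeit.repeat(lambda: sum_with_numpy(n), number=3, repeat=5)) / 3

print("Pure Python: {:.6f} s per call".format(python_time))
print("NumPy:       {:.6f} s per call".format(numpy_time))
```

Taking the minimum over several repeats reduces the influence of other processes on the measurement; the absolute numbers will still vary by machine.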


## Array creation, data types, and other attributes of ndarray

In [381]:
# Import NumPy
import numpy as np

# Display NumPy's built-in documentation
# np?

# Create array
array = np.arange(10)

# Display its data type
array.dtype

dtype('int32')

In [382]:
# Display array
array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [383]:
# Display shape of array (one-dimensional)
array.shape

(10,)

In [384]:
# Create a two by five two-dimensional array
twoDimArray = np.arange(10).reshape((2, 5))
twoDimArray

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [385]:
# 2x5 shape
twoDimArray.shape

(2, 5)

In [386]:
# Another way to create a multi-dimensional array
tda = np.array([np.arange(5), np.arange(5)]) 
print(tda)

# Display the shape of the array
tda.shape

[[0 1 2 3 4]
 [0 1 2 3 4]]


(2, 5)

The array() function will create an array from the inputs so long as they are array-like objects.
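
As a small illustration of what "array-like" can mean here, the sketch below passes a list, a tuple, a nested list, and a `range` to `np.array()`. The variable names are made up for this example.

```python
import numpy as np

from_list = np.array([1, 2, 3])            # flat list -> 1D array
from_tuple = np.array((1.5, 2.5))          # tuple of floats -> float array
from_nested = np.array([[1, 2], [3, 4]])   # nested lists -> 2D array
from_range = np.array(range(4))            # range object -> 1D array
mixed = np.array([1, 2.5])                 # ints and floats are upcast to float

print(from_list.shape, from_tuple.dtype, from_nested.shape)
print(from_range, mixed.dtype)
```

Because an ndarray holds a single data type, mixed inputs are converted to a common type that can represent every element.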

In [387]:
# View array() function documentation
# np.array?

In [388]:
# Create array from Python list
np.array([1, 2, 3, 4, 5])

array([1, 2, 3, 4, 5])

In [389]:
# Specify data type upon creation
x = np.array([1, 2, 3, 4, 5], dtype='float32')
print(x)
x.dtype

[ 1.  2.  3.  4.  5.]


dtype('float32')

In [390]:
# Creates an array that starts at 0 and stops before 100 in steps of 10
np.arange(0, 100, 10)

array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

In [391]:
# Creates an array of ten values evenly spaced between 0 and 1
np.linspace(0, 1, 10)

array([ 0.        ,  0.11111111,  0.22222222,  0.33333333,  0.44444444,
        0.55555556,  0.66666667,  0.77777778,  0.88888889,  1.        ])
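
The two functions above treat the stop value differently, which the following sketch makes explicit. The variable names are illustrative only.

```python
import numpy as np

stepped = np.arange(0, 100, 10)                       # stop value 100 is excluded
spaced = np.linspace(0, 100, 11)                      # stop value 100 is included by default
open_ended = np.linspace(0, 100, 10, endpoint=False)  # exclude the stop value explicitly

print(stepped[-1], spaced[-1], open_ended[-1])
```

`arange` is driven by a step size, while `linspace` is driven by the number of samples, so `linspace` is usually the safer choice when the count of points matters more than the spacing.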

In [392]:
# Create a 5x5 array of random integers from 0 through 4 (the upper bound 5 is excluded)
np.random.randint(0, 5, (5, 5))

array([[2, 0, 0, 0, 3],
       [0, 0, 4, 1, 4],
       [0, 3, 1, 1, 0],
       [2, 3, 3, 3, 3],
       [2, 3, 3, 2, 4]])

In [393]:
# Create a 10x5 array of normally distributed random values with a mean of 0 and standard deviation of 1
np.random.normal(0, 1, (10, 5))

array([[-0.24223262, -0.31397153,  0.14018923,  1.45420716,  1.6100088 ],
       [-2.22612152,  0.84831692,  1.2127477 ,  1.5084316 , -0.04974136],
       [-2.3314682 , -0.4729337 ,  0.10465026,  0.82188065, -0.01629058],
       [ 0.28062347, -1.08507035, -0.06518354, -1.44698734, -0.06743733],
       [ 0.31697924, -3.42661041,  0.38523789,  0.32498188,  2.07778511],
       [-1.58386318,  0.38659417, -0.6280606 ,  0.2529269 ,  1.73970342],
       [ 0.49348275,  0.04949257,  0.23911953, -0.95140148,  1.1820982 ],
       [-0.01310707,  0.64446461, -1.74262902, -0.0414586 , -0.60577051],
       [ 0.34964548,  0.6377795 ,  0.6558535 ,  0.47145196, -0.76435363],
       [-0.60822116,  1.31943058, -1.63112898,  1.94236583, -1.28377368]])

In [394]:
# Creating a three-dimensional array of two 5x5 arrays with random integer values from 0 through 9
threeDimArray = np.random.randint(10, size=(2, 5, 5)) 
threeDimArray

array([[[3, 1, 6, 0, 5],
        [8, 5, 6, 4, 6],
        [4, 6, 4, 2, 5],
        [7, 6, 4, 6, 0],
        [4, 9, 2, 1, 6]],

       [[8, 5, 5, 2, 5],
        [7, 4, 9, 9, 7],
        [8, 3, 2, 2, 2],
        [9, 2, 2, 6, 6],
        [6, 3, 5, 7, 0]]])
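
The random arrays above change on every run. One way to make such examples reproducible is a seeded generator; the sketch below assumes a NumPy version that provides `np.random.default_rng` (1.17 or later, which is newer than the documentation linked in this notebook), and the seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

ints_a = rng_a.integers(0, 5, size=(5, 5))   # upper bound is exclusive: values 0 through 4
ints_b = rng_b.integers(0, 5, size=(5, 5))
normal = rng_a.normal(0, 1, size=(10, 5))    # mean 0, standard deviation 1

print(np.array_equal(ints_a, ints_b))
print(normal.shape)
```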

In [395]:
print("Array Attributes\n")
print("Number of dimensions:", threeDimArray.ndim)
print("Size of each dimension:", threeDimArray.shape)
print("Total size of array:", threeDimArray.size)
print("Size of each array element:", threeDimArray.itemsize, "bytes")
print("Total size (in bytes) of the array:", threeDimArray.nbytes, "bytes")

Array Attributes

Number of dimensions: 3
Size of each dimension: (2, 5, 5)
Total size of array: 50
Size of each array element: 4 bytes
Total size (in bytes) of the array: 200 bytes
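
The attributes printed above are related: the total byte count is the number of elements multiplied by the size of one element. A minimal sketch, with the data type fixed to `int32` so the numbers match the output above regardless of platform:

```python
import numpy as np

arr = np.zeros((2, 5, 5), dtype=np.int32)

# size is the product of the shape; nbytes is size times itemsize
print(arr.ndim, arr.shape, arr.size, arr.itemsize, arr.nbytes)

# The same shape stored as int64 takes twice the memory
wide = arr.astype(np.int64)
print(wide.itemsize, wide.nbytes)
```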


In [396]:
# Reshaping array
print(threeDimArray.reshape(5, 10))

[[3 1 6 0 5 8 5 6 4 6]
 [4 6 4 2 5 7 6 4 6 0]
 [4 9 2 1 6 8 5 5 2 5]
 [7 4 9 9 7 8 3 2 2 2]
 [9 2 2 6 6 6 3 5 7 0]]


More Information on array creation: https://docs.scipy.org/doc/numpy-1.13.0/user/basics.creation.html

## Standard NumPy Data Types

| Type          | Description |
|---------------|-------------|
| ``bool_``     | Boolean (True or False) stored as a byte |
| ``int_``      | Default integer type (same as C ``long``; normally either ``int64`` or ``int32``)| 
| ``intc``      | Identical to C ``int`` (normally ``int32`` or ``int64``)| 
| ``intp``      | Integer used for indexing (same as C ``ssize_t``; normally either ``int32`` or ``int64``)| 
| ``int8``      | Byte (-128 to 127)| 
| ``int16``     | Integer (-32768 to 32767)|
| ``int32``     | Integer (-2147483648 to 2147483647)|
| ``int64``     | Integer (-9223372036854775808 to 9223372036854775807)| 
| ``uint8``     | Unsigned integer (0 to 255)| 
| ``uint16``    | Unsigned integer (0 to 65535)| 
| ``uint32``    | Unsigned integer (0 to 4294967295)| 
| ``uint64``    | Unsigned integer (0 to 18446744073709551615)| 
| ``float_``    | Shorthand for ``float64``.| 
| ``float16``   | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa| 
| ``float32``   | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa| 
| ``float64``   | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa| 
| ``complex_``  | Shorthand for ``complex128``.| 
| ``complex64`` | Complex number, represented by two 32-bit floats| 
| ``complex128``| Complex number, represented by two 64-bit floats| 

Reference: https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.01-Understanding-Data-Types.ipynb
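
The ranges and bit layouts in the table can be checked from within NumPy using `np.iinfo` and `np.finfo`. This is a small sketch; the selection of types is arbitrary.

```python
import numpy as np

# Integer ranges reported by NumPy itself
for int_type in (np.int8, np.int16, np.int32, np.uint8, np.uint16):
    info = np.iinfo(int_type)
    print(int_type.__name__, info.min, info.max)

# Floating point layout: total bits, exponent bits, mantissa bits
for float_type in (np.float16, np.float32, np.float64):
    info = np.finfo(float_type)
    print(float_type.__name__, info.bits, info.nexp, info.nmant)
```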

## Selecting Elements

In [397]:
# 1D Array
x = np.array([1, 2, 3, 4])

# Display index one
print(x[1])

2


In [398]:
# 2D Array
x = np.array([[1, 2], 
              [3, 4]])

# Display element at certain indices
print(x[0, 0])
print(x[0, 1])
print(x[1, 0])
print(x[1, 1])

1
2
3
4


In [399]:
# 3D Array
x = np.random.randint(10, size=(3, 2, 2)) 

print(x, "\n")

# Display the first index of the second element in the 3rd array
print(x[2][1][0])

print(x
      [1] # 2nd array
      [0] # First element
      [1]) # Second index

[[[0 6]
  [3 7]]

 [[5 3]
  [2 0]]

 [[7 9]
  [6 7]]] 

6
3
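
Chained indexing such as `x[2][1][0]` and a single multi-index such as `x[2, 1, 0]` select the same element. The sketch below uses a deterministic array (an assumption, in place of the random one above) so the results are predictable, and also shows negative indices.

```python
import numpy as np

x = np.arange(12).reshape(3, 2, 2)

chained = x[2][1][0]     # index one axis at a time
single = x[2, 1, 0]      # index all axes in one step
last = x[-1, -1, -1]     # negative indices count from the end of each axis

print(chained, single, last)
```

The single multi-index form avoids creating intermediate arrays and is the more common idiom.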


## Subarrays

In [400]:
# 1D Array
x = np.arange(10)
print(x, "\n")

print("First two elements:\n", x[:2], "\n")

print("Elements after index 4 including index 4:\n", x[4:], "\n")

print("Elements between indices 2 and 4 but not including 4:\n", x[2:4], "\n")

print("Elements from index 2 to 7 by twos:\n", x[2:7:2], "\n")

print("Reverse array:\n", x[::-1], "\n")

[0 1 2 3 4 5 6 7 8 9] 

First two elements:
 [0 1] 

Elements after index 4 including index 4:
 [4 5 6 7 8 9] 

Elements between indices 2 and 4 but not including 4:
 [2 3] 

Elements from index 2 to 7 by twos:
 [2 4 6] 

Reverse array:
 [9 8 7 6 5 4 3 2 1 0] 



In [401]:
# 2D Array
x = np.random.randint(10, size=(4, 6))
print(x, "\n")

print("Two rows, four columns:\n", x[:2, :4], "\n")

print("First row:\n", x[0, :], "\n")

print("Third column:\n", x[:, 2], "\n")

print("Reverse array:\n", x[::-1, ::-1], "\n")

[[0 4 2 3 2 1]
 [5 8 0 1 4 6]
 [2 7 9 4 2 6]
 [6 4 8 7 1 6]] 

Two rows, four columns:
 [[0 4 2 3]
 [5 8 0 1]] 

First row:
 [0 4 2 3 2 1] 

Third column:
 [2 0 9 8] 

Reverse array:
 [[6 1 7 8 4 6]
 [6 2 4 9 7 2]
 [6 4 1 0 8 5]
 [1 2 3 2 4 0]] 



Changing a value in a subarray will influence the parent array.

In [402]:
print("Two rows, two columns sub array from x:\n", x[:2, :2], "\n")
xSub = x[:2, :2]

# Change element
xSub[0, 1] = 11
print(x)

Two rows, two columns sub array from x:
 [[0 4]
 [5 8]] 

[[ 0 11  2  3  2  1]
 [ 5  8  0  1  4  6]
 [ 2  7  9  4  2  6]
 [ 6  4  8  7  1  6]]


In [403]:
# 3D Array
x = np.arange(30).reshape(2, 3, 5) 
print(x, "\n")

print("Select every second element from subarray x[0][0]:\n", x[0, 0, ::2], "\n")

print("Third column values from all 2D arrays (matrices):\n", x[..., 2], "\n")

print("Second row values from all 2D arrays (matrices):\n", x[:, 1], "\n")

print("First matrix, last column:\n", x[0, :, -1], "\n")

print("Last matrix, first column reversed:\n", x[-1, ::-1, 0], "\n")

[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]
  [10 11 12 13 14]]

 [[15 16 17 18 19]
  [20 21 22 23 24]
  [25 26 27 28 29]]] 

Select every second element from subarray x[0][0]:
 [0 2 4] 

Third column values from all 2D arrays (matrices):
 [[ 2  7 12]
 [17 22 27]] 

Second row values from all 2D arrays (matrices):
 [[ 5  6  7  8  9]
 [20 21 22 23 24]] 

First matrix, last column:
 [ 4  9 14] 

Last matrix, first column reversed:
 [25 20 15] 



Copying an array

In [404]:
xSubCopy = x[:2, :2, :3].copy()
print(xSubCopy, "\n")

# Change element
xSubCopy[0][0][0] = 11
print(xSubCopy, "\n")

# No influence on original
print(x, "\n")

[[[ 0  1  2]
  [ 5  6  7]]

 [[15 16 17]
  [20 21 22]]] 

[[[11  1  2]
  [ 5  6  7]]

 [[15 16 17]
  [20 21 22]]] 

[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]
  [10 11 12 13 14]]

 [[15 16 17 18 19]
  [20 21 22 23 24]
  [25 26 27 28 29]]] 
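
The difference between a slice and a copy can be checked directly with `np.shares_memory`. The following is a minimal sketch with illustrative variable names: a basic slice is a view onto the parent's memory, while `.copy()` allocates new memory.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

view = x[:2, :2]              # basic slicing returns a view
independent = x[:2, :2].copy()  # copy() allocates separate memory

print(np.shares_memory(x, view))         # True
print(np.shares_memory(x, independent))  # False

view[0, 0] = 99          # writes through to x
independent[0, 1] = -1   # leaves x untouched

print(x[0, 0], x[0, 1])
```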



## Array Reshaping

In [405]:
threeDimArray = np.random.randint(10, size=(4, 4, 4)) 
print(threeDimArray, "\n")

# Reshaping array
print(threeDimArray.reshape(8, 8), "\n")

[[[9 6 6 0]
  [3 6 3 0]
  [1 6 4 9]
  [7 9 9 6]]

 [[5 1 3 6]
  [5 8 0 8]
  [5 8 2 3]
  [3 6 0 1]]

 [[2 5 0 5]
  [2 9 5 8]
  [9 8 9 4]
  [2 1 8 5]]

 [[5 2 7 8]
  [0 6 1 2]
  [8 3 8 8]
  [0 0 1 0]]] 

[[9 6 6 0 3 6 3 0]
 [1 6 4 9 7 9 9 6]
 [5 1 3 6 5 8 0 8]
 [5 8 2 3 3 6 0 1]
 [2 5 0 5 2 9 5 8]
 [9 8 9 4 2 1 8 5]
 [5 2 7 8 0 6 1 2]
 [8 3 8 8 0 0 1 0]] 



In [406]:
# Other ways to reshape
# These modify the array they operate on in place
x = np.random.randint(10, size=(2, 4, 4)) 
print(x, "\n")

x.shape = (4, 8)
print("Shape:\n", x, "\n")

y = np.random.randint(10, size=(2, 3, 2)) 
print(y, "\n")

y.resize((2, 6))
print("Resize:\n", y)

[[[7 5 8 7]
  [4 3 5 4]
  [5 9 4 9]
  [6 8 5 5]]

 [[1 0 2 6]
  [4 1 0 0]
  [1 2 9 4]
  [2 6 7 0]]] 

Shape:
 [[7 5 8 7 4 3 5 4]
 [5 9 4 9 6 8 5 5]
 [1 0 2 6 4 1 0 0]
 [1 2 9 4 2 6 7 0]] 

[[[9 0]
  [4 0]
  [1 6]]

 [[3 2]
  [2 6]
  [0 3]]] 

Resize:
 [[9 0 4 0 1 6]
 [3 2 2 6 0 3]]
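
In contrast to the in-place approaches above, `reshape` returns a new array object and leaves the original's shape unchanged. The sketch below also shows two details not covered above: passing `-1` lets NumPy infer one dimension, and a target shape with a different element count raises `ValueError`. The variable names are illustrative.

```python
import numpy as np

x = np.arange(24)

# -1 asks NumPy to infer the remaining dimension (24 / 4 = 6)
r = x.reshape(4, -1)
print(r.shape, x.shape)   # x itself still has shape (24,)

# The total number of elements must stay the same
mismatch_raised = False
try:
    x.reshape(5, 5)
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```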


Flattening a multidimensional NumPy array results in a one-dimensional array.

In [407]:
x = np.random.randint(10, size=(3, 3, 3)) 
print(x)

[[[3 9 1]
  [6 2 9]
  [9 5 8]]

 [[5 5 2]
  [4 1 6]
  [3 4 6]]

 [[9 0 7]
  [6 3 8]
  [4 0 3]]]


In [408]:
print(np.ravel(x))

[3 9 1 6 2 9 9 5 8 5 5 2 4 1 6 3 4 6 9 0 7 6 3 8 4 0 3]


The flatten method produces the same result as ravel, but it always returns a new copy in memory, so changes to the result never affect the original array. ravel, by contrast, returns a view of the original data whenever possible.

In [409]:
x = np.random.randint(10, size=(2, 2, 2)) 
print(x.flatten())

[9 4 8 0 4 6 4 4]
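
The memory behavior of `ravel` and `flatten` can be sketched as follows. A deterministic, contiguous array is assumed here so that `ravel` is able to return a view; the transposed case shows that `ravel` falls back to a copy when the data is not laid out contiguously.

```python
import numpy as np

x = np.arange(8).reshape(2, 2, 2)

r = x.ravel()     # view on contiguous data
f = x.flatten()   # always a copy

r[0] = 100        # changes x as well
f[1] = -1         # does not change x

print(x[0, 0, 0], x[0, 0, 1])
print(np.shares_memory(x, r), np.shares_memory(x, f))

# A transposed array is not contiguous in C order, so ravel has to copy
print(np.shares_memory(x, x.T.ravel()))
```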


## Stacking

Horizontal stacking of arrays

In [410]:
x = np.random.randint(10, size=(3, 3)) 
y = x - 1

# Stacks arrays horizontally 
z = np.hstack((x, y))
print(x, "\n")
print(y, "\n")
print(z)

# concatenate((x, y), axis=1) does the same thing

[[1 1 9]
 [0 9 5]
 [7 1 0]] 

[[ 0  0  8]
 [-1  8  4]
 [ 6  0 -1]] 

[[ 1  1  9  0  0  8]
 [ 0  9  5 -1  8  4]
 [ 7  1  0  6  0 -1]]


Vertical stacking of arrays

In [411]:
x = np.random.randint(10, size=(3, 3)) 
y = x - 1

# Stacks arrays vertically 
z = np.vstack((x, y))
print(x, "\n")
print(y, "\n")
print(z)

# concatenate((x, y), axis=0) does the same thing

[[8 0 8]
 [5 9 3]
 [8 8 5]] 

[[ 7 -1  7]
 [ 4  8  2]
 [ 7  7  4]] 

[[ 8  0  8]
 [ 5  9  3]
 [ 8  8  5]
 [ 7 -1  7]
 [ 4  8  2]
 [ 7  7  4]]


Depth stacking of arrays

In [412]:
x = np.random.randint(10, size=(3, 3)) 
y = x - 1

# Stacks arrays depth-wise along the third axis 
z = np.dstack((x, y))
print(x, "\n")
print(y, "\n")
print(z)

[[7 4 5]
 [1 0 0]
 [1 8 7]] 

[[ 6  3  4]
 [ 0 -1 -1]
 [ 0  7  6]] 

[[[ 7  6]
  [ 4  3]
  [ 5  4]]

 [[ 1  0]
  [ 0 -1]
  [ 0 -1]]

 [[ 1  0]
  [ 8  7]
  [ 7  6]]]


Column stacking of arrays

In [413]:
x = np.random.randint(10, size=(10)) 
y = x + 1

# Stacks one-dimensional arrays as columns to create a two-dimensional array 
z = np.column_stack((x, y))
print(x, "\n")
print(y, "\n")
print(z)

# Two-dimensional arrays are stacked in the same way that hstack stacks them

[8 2 9 1 6 7 8 3 0 1] 

[ 9  3 10  2  7  8  9  4  1  2] 

[[ 8  9]
 [ 2  3]
 [ 9 10]
 [ 1  2]
 [ 6  7]
 [ 7  8]
 [ 8  9]
 [ 3  4]
 [ 0  1]
 [ 1  2]]


In [414]:
x = np.random.randint(10, size=(10)) 
y = x + 1

# Stacks one-dimensional arrays as rows to create a two-dimensional array 
z = np.row_stack((x, y))
print(x, "\n")
print(y, "\n")
print(z)

# row_stack is an alias of vstack (newer NumPy releases deprecate it in favor of np.vstack),
# so two-dimensional arrays are stacked in the same way that vstack stacks them

[9 9 8 5 2 6 1 7 2 0] 

[10 10  9  6  3  7  2  8  3  1] 

[[ 9  9  8  5  2  6  1  7  2  0]
 [10 10  9  6  3  7  2  8  3  1]]
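
The equivalences mentioned in the comments of this section can be verified with `np.array_equal`. This is a small sketch on deterministic arrays (an assumption, in place of the random ones above); the comparison of `dstack` with `np.stack(..., axis=2)` is an addition that holds for two-dimensional inputs.

```python
import numpy as np

x = np.arange(9).reshape(3, 3)
y = x - 1

# hstack and vstack correspond to concatenate along axis 1 and axis 0
h_same = np.array_equal(np.hstack((x, y)), np.concatenate((x, y), axis=1))
v_same = np.array_equal(np.vstack((x, y)), np.concatenate((x, y), axis=0))

# For 2D inputs, dstack matches stacking along a new third axis
d = np.dstack((x, y))
d_same = np.array_equal(d, np.stack((x, y), axis=2))

# 1D inputs: column_stack makes them columns, vstack makes them rows
a = np.arange(4)
b = a + 1
cols = np.column_stack((a, b))
rows = np.vstack((a, b))

print(h_same, v_same, d_same, d.shape, cols.shape, rows.shape)
```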


## Splitting

Horizontal splitting

In [415]:
x = np.random.randint(10, size=(3, 4)) 
print(x, "\n")

# Splits the array along its horizontal axis into n pieces of the same size and shape
print(np.hsplit(x, 2))

[[1 7 8 7]
 [6 0 9 5]
 [4 7 2 2]] 

[array([[1, 7],
       [6, 0],
       [4, 7]]), array([[8, 7],
       [9, 5],
       [2, 2]])]


Vertical splitting

In [416]:
x = np.random.randint(10, size=(3, 6)) 
print(x, "\n")

# Splits the array along its vertical axis into n pieces of the same size and shape
print(np.vsplit(x, 3))

[[7 6 4 8 3 2]
 [4 1 1 0 4 8]
 [2 6 8 1 5 5]] 

[array([[7, 6, 4, 8, 3, 2]]), array([[4, 1, 1, 0, 4, 8]]), array([[2, 6, 8, 1, 5, 5]])]


Depth-wise splitting

In [417]:
x = np.random.randint(10, size=(2, 3, 4)) 
print(x, "\n")

# Splits the array depth-wise along the third axis into n pieces of the same size and shape
print(np.dsplit(x, 4))

[[[1 2 5 3]
  [1 7 6 8]
  [9 6 9 5]]

 [[3 0 8 5]
  [3 6 2 6]
  [6 3 0 4]]] 

[array([[[1],
        [1],
        [9]],

       [[3],
        [3],
        [6]]]), array([[[2],
        [7],
        [6]],

       [[0],
        [6],
        [3]]]), array([[[5],
        [6],
        [9]],

       [[8],
        [2],
        [0]]]), array([[[3],
        [8],
        [5]],

       [[5],
        [6],
        [4]]])]
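
The split functions above require an equal division. The sketch below shows the general `np.split` form, splitting at explicit indices, and `np.array_split`, which tolerates uneven divisions. The array and variable names are illustrative.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# hsplit(x, 2) is split along axis 1
left, right = np.hsplit(x, 2)
same = np.split(x, 2, axis=1)

# A list of indices splits before columns 1 and 3: widths 1, 2, 1
parts = np.hsplit(x, [1, 3])

# Four columns cannot be divided into three equal pieces
unequal_raised = False
try:
    np.hsplit(x, 3)
except ValueError:
    unequal_raised = True

# array_split allows uneven pieces instead of raising
uneven = np.array_split(x, 3, axis=1)

print([p.shape for p in parts])
print([u.shape for u in uneven])
```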


## Converting Arrays

In [418]:
x = np.random.randint(10, size=(5)) 
print(x)

# Convert to single precision float
f = x.astype(np.float32)
print(f)

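# Convert to string (the np.str alias was removed from newer NumPy releases; use the built-in str)
s = x.astype(str)
print(s)

# Convert to complex number (likewise, use the built-in complex instead of np.complex)
c = x.astype(complex)
print(c)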

[9 2 3 8 6]
[ 9.  2.  3.  8.  6.]
['9' '2' '3' '8' '6']
[ 9.+0.j  2.+0.j  3.+0.j  8.+0.j  6.+0.j]
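
Two properties of `astype` are worth keeping in mind: it returns a new array rather than modifying the original, and converting floats to integers discards the fractional part instead of rounding. A minimal sketch with made-up values:

```python
import numpy as np

f = np.array([1.7, -1.7, 2.0])

# Float to integer conversion truncates toward zero
i = f.astype(np.int64)
print(i)

# The original array is left unchanged
print(f)

# A round trip through strings recovers the integer values
s = np.array([1, 2, 3]).astype(str)
back = s.astype(np.int64)
print(s, back)
```

When rounding is wanted, applying `np.round` before the conversion is one option.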
