Think of data as arrays of numbers: almost any information (sound, text, pictures) can be converted into arrays of numbers.

NumPy forms the basis of almost all data science packages and libraries used in Python.

In [78]:
import numpy as np

### Understanding data types in Python

Python is a dynamically typed language, so data types do not need to be declared explicitly.

As a consequence, a Python variable holds not only the data stored in it but also information about the type of that data.

The standard Python implementation, CPython, is written in C, so every Python object is a C structure under the hood.

A simple integer object in Python contains the following pieces of information:

* ob_refcnt, which is a reference count that allows Python to handle memory allocation and deallocation
* ob_type, which contains information on the type of variable
* ob_size, which specifies the size of the data members
* ob_digit, which contains the actual values of the data stored in the integer object. 

When we store many such Python objects in a list, we need to be aware of the additional information (apart from the data values) that each object carries.
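
As a rough illustration of this overhead, the sketch below compares the memory reported for a single Python integer with the 8 bytes that one element of a 64-bit integer NumPy array occupies. The number returned by `sys.getsizeof` depends on the Python version and platform, so treat it as indicative only.

```python
import sys
import numpy as np

# Size of one small Python int object, including its header fields.
int_object_bytes = sys.getsizeof(1)

# Size of one element in a fixed-type NumPy array of 64-bit integers.
array_element_bytes = np.zeros(1, dtype=np.int64).itemsize

print("Python int object:", int_object_bytes, "bytes")
print("int64 array element:", array_element_bytes, "bytes")
```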

Let's create a list of integers.

In [79]:
L = list(range(10))
L

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [80]:
type(L[0])

int

Let's now create a list of strings.

In [81]:
L2 = [str(c) for c in L]
L2

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [82]:
type(L2[0])

str

Let's create a heterogeneous list.

In [83]:
L3 = [True, "2", 3.0, 4]
[type(item) for item in L3]

[bool, str, float, int]

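This flexibility comes at a cost: each item carries its own type information. When all elements are of the same type, it is more efficient to store them in a fixed-type array.

The built-in **array** module can be used to create fixed-type arrays. The type code `'i'` used here indicates that the contents are integers.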

In [84]:
import array

In [85]:
L = list(range(10))
A = array.array('i', L)
A

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
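
Because the array is fixed-type, it refuses values that do not match its type code. A minimal sketch (the variable names are illustrative):

```python
import array

A = array.array('i', range(10))

# Appending another integer works as expected.
A.append(10)

# Appending a float is rejected, because 'i' only allows integers.
try:
    A.append(1.5)
    rejected = False
except TypeError:
    rejected = True

print(len(A), rejected)
```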

A better way to create fixed-type arrays is to use the **ndarray** object from the NumPy package.

In [86]:
# create an integer array from a list
np.array([1, 2, 3, 4])

array([1, 2, 3, 4])
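
Unlike a Python list, a NumPy array holds a single type. As a sketch of what that implies: if the input mixes types, NumPy upcasts where it can (integers become floats here), and the `dtype` keyword sets the element type explicitly.

```python
import numpy as np

# Mixed ints and floats are upcast to floating point.
mixed = np.array([3.14, 4, 2, 3])
print(mixed.dtype)  # float64

# The dtype keyword sets the element type explicitly.
as_float32 = np.array([1, 2, 3, 4], dtype='float32')
print(as_float32.dtype)  # float32
```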

We can use built-in NumPy functions to create different fixed-type arrays.

In [87]:
# Create a length-10 integer array filled with zeros
np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [88]:
# Create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [89]:
# Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [90]:
# Create an array filled with a linear sequence
# Starting at 0, stopping before 20, stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [91]:
# Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
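
One difference between the last two functions is easy to miss: `np.arange` excludes its stop value, while `np.linspace` includes it by default. A short sketch:

```python
import numpy as np

stepped = np.arange(0, 20, 2)    # stop value 20 is excluded
spaced = np.linspace(0, 20, 11)  # stop value 20 is included

print(stepped[-1])  # 18
print(spaced[-1])   # 20.0
```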

In [92]:
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))

array([[0.65279032, 0.63505887, 0.99529957],
       [0.58185033, 0.41436859, 0.4746975 ],
       [0.6235101 , 0.33800761, 0.67475232]])

In [93]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))

array([[ 1.0657892 , -0.69993739,  0.14407911],
       [ 0.3985421 ,  0.02686925,  1.05583713],
       [-0.07318342, -0.66572066, -0.04411241]])

In [94]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

array([[7, 2, 9],
       [2, 3, 3],
       [2, 3, 4]])
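
The random calls above use NumPy's legacy global random state. As an alternative sketch, assuming NumPy 1.17 or later, the same kinds of arrays can be drawn from a `Generator` object created with `np.random.default_rng`; seeding it makes the results repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator

uniform = rng.random((3, 3))            # uniform values in [0, 1)
normal = rng.normal(0, 1, (3, 3))       # mean 0, standard deviation 1
integers = rng.integers(0, 10, (3, 3))  # integers in [0, 10)

# A second generator with the same seed reproduces the same draws.
rng_again = np.random.default_rng(0)
same = np.array_equal(uniform, rng_again.random((3, 3)))
print(same)  # True
```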

In [95]:
# Create a 3x3 identity matrix
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [96]:
# Create an uninitialized array of three floats (the default dtype)
# The values will be whatever happens to already exist at that memory location
np.empty(3)

array([1., 1., 1.])

### Numpy array attributes

In Python, data manipulation is nearly synonymous with NumPy array manipulation.

In [97]:
np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)  # one-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # three-dimensional array

Each array has the attributes **ndim** (the number of dimensions), **shape** (the size of each dimension), and **size** (the total number of elements).

In [98]:
print("x3 ndim: ", x3.ndim)
print("x3 shape: ", x3.shape)
print("x3 size: ", x3.size)

x3 ndim:  3
x3 shape:  (3, 4, 5)
x3 size:  60


The **dtype** attribute tells us the data type of the array.

In [99]:
print("x3 dtype: ", x3.dtype)

x3 dtype:  int64


**itemsize** gives the size of each array element in bytes, while **nbytes** gives the total size of the array in bytes. In general, **nbytes** equals **itemsize** times **size**.

In [100]:
print("x3 itemsize: ", x3.itemsize)
print("x3 nbytes: ", x3.nbytes) 
print("x3 itemsize * size: ", x3.itemsize * x3.size)

x3 itemsize:  8
x3 nbytes:  480
x3 itemsize * size:  480
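
The same relationship holds for other dtypes. As a sketch, a 16-bit integer array of the same shape needs a quarter of the 480 bytes shown above:

```python
import numpy as np

small = np.zeros((3, 4, 5), dtype=np.int16)

print(small.itemsize)  # 2 bytes per element
print(small.nbytes)    # 120 bytes in total (2 * 60)
```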


### Array indexing: accessing single elements

In a one-dimensional array, the ith value (counting from zero) can be accessed by specifying its index in square brackets.

In [101]:
x1

array([5, 0, 3, 3, 7, 9])

In [102]:
x1[0]

5

To index from the end of the array, we can use negative indices.

In [103]:
x1[-1]

9

In [104]:
x1[-2]

7

In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices.

In [105]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [106]:
x2[0,0]

3

In [107]:
x2[0,1]

5

In [108]:
x2[2,-1]

7

In [109]:
x2[-1,1]

6

You can also modify the value at a specific position in an array using the same notation.

In [110]:
x2[0,0] = 12

In [111]:
x2

array([[12,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])
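
One caveat follows from NumPy arrays having a fixed type: assigning a floating-point value into an integer array silently truncates it. A minimal sketch with an illustrative array:

```python
import numpy as np

a = np.array([5, 0, 3, 3, 7, 9])

a[0] = 3.14159  # the fractional part is dropped without a warning
print(a)        # [3 0 3 3 7 9]
```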

### Array slicing: accessing subarrays

The **:** character is used to slice arrays. NumPy follows the standard Python slice syntax, x[start:stop:step]. If any of these values is omitted, it defaults to start = 0, stop = the size of the dimension, and step = 1.

#### One-dimensional subarrays

In [112]:
x = np.arange(10)

In [113]:
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [114]:
x[:5] # first five elements

array([0, 1, 2, 3, 4])

In [115]:
x[5:] # elements from index 5 onward

array([5, 6, 7, 8, 9])

In [116]:
x[4:7] # middle sub array

array([4, 5, 6])

In [117]:
x[::2] # every other element

array([0, 2, 4, 6, 8])

In [118]:
x[1::2] # every other element, starting at index 1

array([1, 3, 5, 7, 9])
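
A negative step reverses the direction of the slice, and the defaults for start and stop are swapped accordingly. A short sketch:

```python
import numpy as np

x = np.arange(10)

print(x[::-1])   # all elements, reversed: [9 8 7 6 5 4 3 2 1 0]
print(x[5::-2])  # every other element, going backward from index 5: [5 3 1]
```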

#### Multi-dimensional subarrays

In [119]:
x2

array([[12,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])

In [120]:
x2[:2, :3] #two rows, three columns

array([[12,  5,  2],
       [ 7,  6,  8]])

In [121]:
x2[:3, ::2] #all rows, every other column

array([[12,  2],
       [ 7,  8],
       [ 1,  7]])

In [122]:
x2[::-1, ::-1] #reverse rows and columns

array([[ 7,  7,  6,  1],
       [ 8,  8,  6,  7],
       [ 4,  2,  5, 12]])

#### Accessing array rows and columns

In [123]:
print(x2[:, 0]) #first column of x2

[12  7  1]


In [124]:
print(x2[0, :]) # first row of x2

[12  5  2  4]
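
When accessing a row, the trailing slice can be omitted for a more compact syntax. A sketch with an illustrative array `m`:

```python
import numpy as np

m = np.array([[12, 5, 2, 4],
              [7, 6, 8, 8],
              [1, 6, 7, 7]])

first_row = m[0]  # equivalent to m[0, :]
print(first_row)  # [12  5  2  4]
```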


### Subarrays as no-copy views

Slicing a NumPy array returns a view of the data rather than a copy. This means that changes made to a slice of a very large array also affect the values in the larger array.

In [125]:
print(x2)

[[12  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


In [126]:
x2_sub = x2[:2, :2]

In [127]:
x2_sub

array([[12,  5],
       [ 7,  6]])

In [128]:
x2_sub[0, 0] = 99

In [129]:
print(x2_sub)

[[99  5]
 [ 7  6]]


In [130]:
print(x2)

[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


### Creating copies of arrays

We can explicitly copy the data of an array or subarray with the copy() method. Modifying the copy leaves the original array untouched:

In [131]:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

[[99  5]
 [ 7  6]]


In [132]:
x2_sub_copy[0, 0] = 42
print(x2_sub_copy)

[[42  5]
 [ 7  6]]


In [133]:
print(x2)

[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]
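
One way to check whether an array is a view or an independent copy is `np.shares_memory`, which reports whether two arrays overlap in memory. A sketch, using an illustrative array rather than `x2`:

```python
import numpy as np

a = np.arange(12).reshape((3, 4))

view = a[:2, :2]                # slicing returns a view
independent = a[:2, :2].copy()  # .copy() allocates new memory

print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, independent))  # False
```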


### Reshaping of arrays

In [134]:
grid = np.arange(1,10).reshape((3, 3))
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
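
For `reshape` to work, the size of the initial array must match the size of the reshaped array. A sketch of that constraint; passing `-1` for one dimension asks NumPy to infer it from the others:

```python
import numpy as np

values = np.arange(1, 10)  # nine elements

inferred = values.reshape((3, -1))  # -1 is inferred as 3
print(inferred.shape)  # (3, 3)

# Nine elements cannot fill a 2x5 grid, so this raises ValueError.
try:
    values.reshape((2, 5))
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)  # True
```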


A common reshaping pattern is to convert a one-dimensional array into a two-dimensional row or column matrix.

In [135]:
x = np.array([1,2,3])

# row vector via reshape
x.reshape((1,3))

array([[1, 2, 3]])

In [136]:
#row vector via newaxis
x[np.newaxis, :]

array([[1, 2, 3]])

In [137]:
#column vector via reshape
x.reshape((3, 1))

array([[1],
       [2],
       [3]])

In [138]:
# column vector via newaxis
x[:, np.newaxis]

array([[1],
       [2],
       [3]])
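
Comparing shapes makes the effect of each form explicit. A short sketch:

```python
import numpy as np

x = np.array([1, 2, 3])

print(x.shape)                 # (3,)
print(x[np.newaxis, :].shape)  # (1, 3)
print(x[:, np.newaxis].shape)  # (3, 1)
```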

### Array concatenation and splitting

In [139]:
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
np.concatenate([x,y])

array([1, 2, 3, 4, 5, 6])

In [140]:
z = [99, 99, 99]
np.concatenate([x, y, z])

array([ 1,  2,  3,  4,  5,  6, 99, 99, 99])

Array concatenation can also be used for two-dimensional arrays.

In [141]:
grid = np.array([[1, 2, 3], 
               [4, 5, 6]])

In [142]:
grid

array([[1, 2, 3],
       [4, 5, 6]])

In [143]:
#concatenate along the first axis
np.concatenate([grid, grid])

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [144]:
# concatenate along the second axis (axes are zero-indexed, so axis=1)
np.concatenate([grid, grid], axis = 1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

When working with arrays of mixed dimensions, it is clearer to use np.vstack and np.hstack.

In [145]:
x = np.array([1, 2, 3])

In [146]:
grid = np.array([[9,8,7], [6,5,4]])

In [147]:
#vertically stack the arrays
np.vstack([x, grid])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [148]:
#horizontally stack the arrays
y = np.array([[99], [99]])
np.hstack([grid, y])

array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])
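
To see why the stacking functions help with mixed dimensions, the sketch below shows that `np.concatenate` rejects a one-dimensional and a two-dimensional input, while `np.vstack` accepts them:

```python
import numpy as np

x = np.array([1, 2, 3])        # one-dimensional
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])   # two-dimensional

# np.concatenate requires the same number of dimensions.
try:
    np.concatenate([x, grid])
    failed = False
except ValueError:
    failed = True
print(failed)  # True

# np.vstack treats x as a single row of shape (1, 3), so this works.
print(np.vstack([x, grid]).shape)  # (3, 3)
```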

#### Splitting of arrays

In [149]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

[1 2 3] [99 99] [3 2 1]
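
The second argument to `np.split` is a list of indices giving the split points, so N split points produce N + 1 subarrays. A sketch with illustrative data; passing an integer instead asks for that many equal parts:

```python
import numpy as np

data = np.arange(10)

# Three split points produce four subarrays.
pieces = np.split(data, [2, 5, 8])
print([p.tolist() for p in pieces])
# [[0, 1], [2, 3, 4], [5, 6, 7], [8, 9]]

# An integer instead of a list asks for that many equal parts.
thirds = np.split(np.arange(9), 3)
print([p.tolist() for p in thirds])
# [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```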


In [150]:
grid = np.arange(16).reshape((4,4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [151]:
upper, lower = np.vsplit(grid, [2])

In [152]:
upper, lower

(array([[0, 1, 2, 3],
        [4, 5, 6, 7]]),
 array([[ 8,  9, 10, 11],
        [12, 13, 14, 15]]))

In [153]:
left, right = np.hsplit(grid, [2])

In [154]:
left, right

(array([[ 0,  1],
        [ 4,  5],
        [ 8,  9],
        [12, 13]]),
 array([[ 2,  3],
        [ 6,  7],
        [10, 11],
        [14, 15]]))