# Getting started with ndarray

**ndarrays** are time- and space-efficient multidimensional arrays at the core of numpy. As with the data structures in Week 2, let's get started by creating ndarrays using the numpy package.

## How to create Rank 1 numpy arrays:

In [2]:
# Rank 1 numpy arrays are single dimensional arrays aka a vector
import numpy as np

an_array = np.array([3, 33, 333]) 
# create a rank 1 array

print(type(an_array))
# The type of an ndarray is <class 'numpy.ndarray'>

<class 'numpy.ndarray'>


In [3]:
# Test the shape of the array that we just created
# it should have just one dimension
print(an_array.shape)

(3,)


In [4]:
# because this is a rank 1 array, we need only one index to access each element
# just like you do with a list
print(an_array[0], an_array[1], an_array[2])

3 33 333


In [5]:
an_array[0] = 888
# np arrays are mutable. You can change elements, but any new value
# is cast to the array's dtype, since all elements share one type

print(an_array)

[888  33 333]


## How to create a Rank 2 numpy array:

A rank 2 **ndarray** is one with two dimensions. Notice the format below of [[row], [row]]. Two-dimensional arrays are great for representing matrices, which are often useful in data science.

In [9]:
another = np.array([[11, 12, 13], [21, 22, 23]])
# Create a rank 2 array

print(another)
# print the array

print('The shape is 2 rows and 3 columns', another.shape)
# rows x columns

print('Accessing elements [0,0], [0,1] and [1,0] of the ndarray:',
     another[0, 0], another[0, 1], another[1, 0])
# when asking for items, it is row first then column

[[11 12 13]
 [21 22 23]]
The shape is 2 rows and 3 columns (2, 3)
Accessing elements [0,0], [0,1] and [1,0] of the ndarray: 11 12 21


## There are many ways to create numpy arrays:

Here we create a number of arrays with different shapes and different pre-filled values. Numpy has a number of built-in functions which help us quickly and easily create multidimensional arrays.

In [10]:
import numpy as np

# create a 2x2 array of zeros
zero_example = np.zeros([2,2])
print(zero_example)

[[0. 0.]
 [0. 0.]]


In [12]:
# create a 2x2 array filled with 9.0
full_example = np.full((2,2), 9.0)
print(full_example)

[[9. 9.]
 [9. 9.]]


In [13]:
# create a 2x2 matrix with the diagonal 1s and the others 0s
eye_example = np.eye(2,2)
print(eye_example)

[[1. 0.]
 [0. 1.]]


In [16]:
# create an array of all ones
ones_example = np.ones([1,2])
print(ones_example)

# notice that this creates a 1 x 2 matrix. This is different from a
# rank 1 array in that it is still rank 2, and this can
# be seen by checking its shape:
print(ones_example.shape)

# in order to index this, we need two indices:
print(ones_example[0, 1])

[[1. 1.]]
(1, 2)
1.0


In [18]:
# create an array of random floats between 0 and 1
random_example = np.random.random([2,2])
print(random_example)
# useful for algorithms that need a random state to get started

[[0.07121807 0.45808815]
 [0.43513192 0.96319841]]


# Array Indexing

## Slice Indexing:

Similar to the use of slice indexing with lists and strings, we can use slice indexing to pull out sub-regions of ndarrays

In [20]:
# Rank 2 array of shape (3, 4)
an_array = np.array([[11,12,13,14],[21,22,23,24],[31,32,33,34]])
print(an_array)

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]


Use array slicing to get a 2 x 2 subarray consisting of the first 2 rows and the columns at index 1 and 2

In [22]:
a_slice = an_array[:2, 1:3]
# first slice is the rows, second is columns
# grabs all rows from 0-1 and then all cols from 1-2
# a_slice points to the same elements in memory as an_array
# a_slice is still part of an_array!!!

print(a_slice)

# a_slice has its own indices, which are different from an_array
print(a_slice[0,0])

[[12 13]
 [22 23]]
12


When you modify a slice, you actually modify the underlying array

In [23]:
print('Before:', an_array[0,1])
# inspects the element at 0, 1

a_slice[0,0] = 1000
# a_slice[0,0] is the same piece of data as an_array[0,1]
print('After:', an_array[0,1])

Before: 12
After: 1000


In [27]:
# to create a copy, you need to make a new array
copy_slice = np.array(an_array[:2,1:3])
print(copy_slice)
print()

copy_slice[0,0] = 11
print('Slice Copy:')
print(copy_slice)
print()

print('Slice of an_array:')
print(a_slice)

[[1000   13]
 [  22   23]]

Slice Copy:
[[11 13]
 [22 23]]

Slice of an_array:
[[1000   13]
 [  22   23]]
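
The view-versus-copy behaviour above can also be checked directly. The sketch below is an illustration rather than part of the original notebook: it relies on `np.shares_memory` and the `.copy()` method, and the names `base`, `view` and `independent` are made up for the example.

```python
import numpy as np

base = np.array([[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]])

view = base[:2, 1:3]                 # basic slicing returns a view
independent = base[:2, 1:3].copy()   # .copy() allocates new memory

print(np.shares_memory(base, view))         # True
print(np.shares_memory(base, independent))  # False

view[0, 0] = 1000         # writes through to base[0, 1]
independent[0, 0] = -1    # leaves base untouched
print(base[0, 1])         # 1000
```

Calling `.copy()` on the slice has the same effect as wrapping it in `np.array(...)`, and it states the intent more plainly.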


## Use both integer indexing and slice indexing

We can use combinations of integer indexing and slice indexing to create arrays of different shapes.

In [6]:
# Create a rank 2 array of shape (3, 4)
an_array = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34]])
print(an_array)

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]


In [32]:
# Using both integer indexing and slicing generates an array of lower rank
row_rank1 = an_array[1, :]
# rank 1 view
# takes row 1 and all of its cols

print(row_rank1, row_rank1.shape)
# Notice only a single []
print()

print(row_rank1[1])
# only need 1 number to reference

[21 22 23 24] (4,)

22


In [33]:
# Slicing with : instead of an integer index generates an array
# of the same rank as an_array
row_rank2 = an_array[1:2, :]
# rank 2 view
# takes row 1 and all its cols, but makes a rank 2 array

print(row_rank2, row_rank2.shape)
# Notice the nested [ [] ]
print()

print(row_rank2[0, 1])
# Now need 2 values to reference the array

[[21 22 23 24]] (1, 4)

22


In [34]:
# We can do the same thing for cols of an array:
col_rank1 = an_array[:, 1]
col_rank2 = an_array[:, 1:2]

print(col_rank1, col_rank1.shape) # rank1
# Notice that this is horizontal
print()
print(col_rank2, col_rank2.shape) # rank2
# While this is vertical

[12 22 32] (3,)

[[12]
 [22]
 [32]] (3, 1)


## Array indexing for changing elements:

Sometimes it's useful to use an array of indexes to access or change elements.

In [8]:
# Create a new array 4 x 3:
an_array = np.array([[11,12,13], [21,22,23],[31,32,33],[41,42,43]])
print('Original Array:')
print(an_array)

Original Array:
[[11 12 13]
 [21 22 23]
 [31 32 33]
 [41 42 43]]


In [9]:
# Create an array of indices
import numpy as np
col_indicies = np.array([0,1,2,0])
print('\nCol indices picked: ', col_indicies)

row_indicies = np.arange(4)
print('\nRow indices picked: ', row_indicies)


Col indices picked:  [0 1 2 0]

Row indices picked:  [0 1 2 3]


In [10]:
# Examine the pairings of row_indicies and col_indicies:
for row, col in zip(row_indicies,col_indicies):
    print(row, ', ', col)

0 ,  0
1 ,  1
2 ,  2
3 ,  0


In [ ]:
# Select one element from each row
print('Values in the array at those indices: ', an_array[row_indicies, col_indicies])

In [14]:
# Change one element from each row using the indicies selected
an_array[row_indicies, col_indicies] += 100000
print('\nChanged Array: ')
print(an_array)


Changed Array: 
[[100011     12     13]
 [    21 100022     23]
 [    31     32 100033]
 [100041     42     43]]
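
One caveat with `+=` and index arrays, sketched here as an aside: when the same position appears more than once, the update is applied only once for that position. `np.add.at` performs an unbuffered update that counts every occurrence. The names `positions`, `buffered` and `unbuffered` are hypothetical and not part of the notebook.

```python
import numpy as np

positions = np.array([0, 0, 2])   # index 0 appears twice

buffered = np.zeros(3, dtype=int)
buffered[positions] += 1          # index 0 is incremented only once
print(buffered)                   # [1 0 1]

unbuffered = np.zeros(3, dtype=int)
np.add.at(unbuffered, positions, 1)   # every occurrence is counted
print(unbuffered)                     # [2 0 1]
```

In the cell above each (row, col) pair is distinct, so `+=` behaves as expected there.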


# Boolean Indexing

## Boolean indexing for selecting and changing elements:

In [30]:
# Create a 3 x 2 array
an_array = np.array([[11,12],[21,22],[31,32]])
print(an_array)           

[[11 12]
 [21 22]
 [31 32]]


In [23]:
# Create a filter: a boolean array that is True wherever an element
# meets the condition (note that the name shadows Python's built-in filter)

filter = (an_array > 15)
filter

array([[False, False],
       [ True,  True],
       [ True,  True]])

Notice that the filter is an ndarray of the same shape as an_array. It holds True for each element of an_array that is greater than 15 and False for each element that is 15 or less.

In [24]:
# We can now select just those elements which meet that criteria
print(an_array[filter])

[21 22 31 32]


In [25]:
# For short, we could have just used the approach below
# without the need for the filter

an_array[an_array > 15]

array([21, 22, 31, 32])

In [31]:
# Another example:
an_array[(an_array > 20) & (an_array < 30)]

array([21, 22])

What is particularly useful is that we can change elements in the array by applying a similar logical filter. Let's add 100 to all the even values.

In [32]:
an_array[an_array % 2 == 0] += 100
print(an_array)

[[ 11 112]
 [ 21 122]
 [ 31 132]]
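
Conditions can also be combined with `|` (or) and negated with `~` (not), in the same way `&` was used above. The following is a small sketch on a fresh copy of the 3 x 2 array; the names `values`, `small_or_large` and `not_even` are illustrative only. Each comparison needs its own parentheses because `&`, `|` and `~` bind more tightly than `<` and `>`.

```python
import numpy as np

values = np.array([[11, 12], [21, 22], [31, 32]])

small_or_large = (values < 12) | (values > 31)   # element-wise OR
print(values[small_or_large])                     # [11 32]

not_even = ~(values % 2 == 0)                     # element-wise NOT
print(values[not_even])                           # [11 21 31]
```

Python's `and`, `or` and `not` keywords do not work element-wise on arrays, which is why the symbol operators are used here.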


# Datatypes and Array Operations

## Datatypes:

In [33]:
ex1 = np.array([11,12]) 
# NumPy infers the datatype (the default integer type is platform
# dependent: int32 here, int64 on many other systems)
print(ex1.dtype)

int32


In [36]:
ex2 = np.array([11.0, 12.0])
# NumPy infers the datatype
print(ex2.dtype)

float64


In [37]:
ex3 = np.array([11, 21], dtype=np.int64)
# You can also tell NumPy the data type
print(ex3.dtype)

int64


In [39]:
# You can use this to force floats into integers;
# the fractional part is truncated (dropped), not rounded
ex4 = np.array([11.1, 12.7], dtype=np.int64)
print(ex4.dtype)
print()
print(ex4)
# Note the loss of information

int64

[11 12]


In [40]:
# you can use this to force integers into floats,
# which is useful if you anticipate the values may change
# to floats later
ex5 = np.array([11, 12], dtype=np.float64)
print()
print(ex5)


[11. 12.]
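
For an array that already exists, the usual way to change its datatype is the `astype` method, which returns a converted copy. The sketch below is an addition to the notebook with made-up names (`floats`, `truncated`, `floored`, `rounded`). It includes a negative value to show that converting to an integer type truncates toward zero, which differs from `np.floor`.

```python
import numpy as np

floats = np.array([11.1, 12.7, -1.5])

truncated = floats.astype(np.int64)            # drops the fractional part
floored = np.floor(floats).astype(np.int64)    # rounds down first
rounded = np.rint(floats).astype(np.int64)     # rounds to nearest, ties to even

print(truncated)  # [11 12 -1]
print(floored)    # [11 12 -2]
print(rounded)    # [11 13 -2]
```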


## Arithmetic Array Operations:

In [42]:
x = np.array([[111, 112],[121, 122]], dtype=int)
y = np.array([[211.1, 212.1],[221.1, 222.1]], dtype=np.float64)

print(x)
print()
print(y)
# results of operations on x and y will be upcast to floats to avoid loss of information

[[111 112]
 [121 122]]

[[211.1 212.1]
 [221.1 222.1]]


In [43]:
# add
print(x + y) # The plus sign works
print()
print(np.add(x,y)) # So does the numpy function 'add'

[[322.1 324.1]
 [342.1 344.1]]

[[322.1 324.1]
 [342.1 344.1]]


In [44]:
# subtract
print(x - y)
print()
print(np.subtract(x, y))

[[-100.1 -100.1]
 [-100.1 -100.1]]

[[-100.1 -100.1]
 [-100.1 -100.1]]


In [45]:
# multiply
print(x * y)
print()
print(np.multiply(x,y))

[[23432.1 23755.2]
 [26753.1 27096.2]]

[[23432.1 23755.2]
 [26753.1 27096.2]]


In [46]:
# divide
print(x / y)
print()
print(np.divide(x, y))

[[0.52581715 0.52805281]
 [0.54726368 0.54930212]]

[[0.52581715 0.52805281]
 [0.54726368 0.54930212]]


In [47]:
# square root
print(np.sqrt(x))

[[10.53565375 10.58300524]
 [11.         11.04536102]]


In [48]:
# exponent (e ** x)
print(np.exp(x))

[[1.60948707e+48 4.37503945e+48]
 [3.54513118e+52 9.63666567e+52]]


# Statistical Methods, Sorting, and Set Operations

## Basic Statistical Operations:

In [50]:
# Setup a random 2 x 4 matrix
arr = np.random.randn(2,4)
print(arr)

[[-0.64826944 -3.01907787 -1.18576472 -1.13144928]
 [-1.43983578  0.81222068  0.84044162 -0.14493854]]


In [52]:
# Compute the mean for all elements
print(arr.mean())

-0.739584165787858


In [54]:
# Compute the mean by row:
# axis = 1 gives the means of each row
print(arr.mean(axis = 1))
# Returns a 2 element array with the mean for each row

[-1.49614033  0.01697199]


In [56]:
# Compute the mean by col:
# axis = 0 gives the means of each col
print(arr.mean(axis = 0))
# Returns a 4 element array with the mean for each col

[-1.04405261 -1.10342859 -0.17266155 -0.63819391]


In [57]:
# Compute the sum of all elements:
print(arr.sum())

-5.916673326302864


In [58]:
# Compute the median of each row
print(np.median(arr, axis = 1))

[-1.158607    0.33364107]
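
The same `axis` argument works for other reductions too. The sketch below uses a small fixed array (a hypothetical `data`, chosen so the results are easy to verify by eye) rather than the random `arr` above.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [5.0, 6.0, 7.0, 8.0]])

row_min = data.min(axis=1)          # smallest value in each row
col_max = data.max(axis=0)          # largest value in each column
row_argmax = data.argmax(axis=1)    # column index of each row's maximum
overall_std = data.std()            # population standard deviation of all elements

# keepdims=True keeps the reduced axis with length 1, which is handy for broadcasting
row_means = data.mean(axis=1, keepdims=True)

print(row_min, col_max, row_argmax)
print(overall_std)
print(row_means.shape)              # (2, 1)
```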


## Sorting:

In [62]:
# Create a 10 element array of randoms
unsorted = np.random.randn(10)
print(unsorted)

[ 1.90076528 -1.30340982 -1.01955268  1.13056997 -1.02305702  0.53233449
  0.69972563 -0.73770095 -0.8996074   0.56090031]


In [63]:
# Create copy and sort
# (named sorted_arr to avoid shadowing Python's built-in sorted)
sorted_arr = np.array(unsorted)
sorted_arr.sort()

print(sorted_arr)

[-1.30340982 -1.02305702 -1.01955268 -0.8996074  -0.73770095  0.53233449
  0.56090031  0.69972563  1.13056997  1.90076528]


In [64]:
# In place sorting
unsorted.sort()

print(unsorted)

[-1.30340982 -1.02305702 -1.01955268 -0.8996074  -0.73770095  0.53233449
  0.56090031  0.69972563  1.13056997  1.90076528]
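
Two related tools are worth knowing alongside the in-place `sort` method: `np.sort`, which returns a sorted copy, and `np.argsort`, which returns the indices that would sort the array. The example below is a sketch with an illustrative `scores` array that is not part of the notebook.

```python
import numpy as np

scores = np.array([0.7, -1.3, 1.9, 0.5])

ascending = np.sort(scores)        # sorted copy; scores itself is unchanged
order = np.argsort(scores)         # indices that would sort scores
descending = scores[order[::-1]]   # reverse the order for largest-first

print(ascending)
print(order)        # [1 3 0 2]
print(descending)
```

`argsort` is useful when a second array has to be reordered in step with the first, for example labels that belong to each score.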


## Find unique elements

In [67]:
array = np.array([2,1,4,2,1,4,2])

print(np.unique(array))

[1 2 4]
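
`np.unique` can also report how often each value occurs through its `return_counts` argument. A minimal sketch on the same data (the names `values` and `counts` are chosen for this example):

```python
import numpy as np

array = np.array([2, 1, 4, 2, 1, 4, 2])

values, counts = np.unique(array, return_counts=True)
print(values)   # [1 2 4]
print(counts)   # [2 3 2]
```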


## Set Operations with np.array data type

In [73]:
s1 = np.array(['desk', 'lamp', 'chair'])
s2 = np.array(['lamp', 'bulb', 'chair'])
print(s1, s2)

['desk' 'lamp' 'chair'] ['lamp' 'bulb' 'chair']


In [74]:
# Print all elements that both have in common
print(np.intersect1d(s1, s2))

['chair' 'lamp']


In [78]:
# Print all unique elements across both arrays
print(np.union1d(s1, s2))

['bulb' 'chair' 'desk' 'lamp']


In [79]:
# Print all elements that are in s1 but not in s2
print(np.setdiff1d(s1, s2))

['desk']


In [80]:
# Test, for each element of s1, whether it is also in s2
print(np.isin(s1, s2))
# returns a boolean array (np.isin supersedes the older np.in1d)

[False  True  True]


# Broadcasting

In [83]:
# Broadcasting solves the problem of adding arrays of
# different shapes. NumPy compares the shapes to work
# out how to line the arrays up. If the number of cols
# is the same, it will add the values of
# one array into each row of the other:

a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
b = np.array([0,1,0,2])

print(a + b)

# The array b is added to each row of a
# b retains its original shape.
# Memory and computationally efficient!

[[ 1  3  3  6]
 [ 5  7  7 10]
 [ 9 11 11 14]]


#### Broadcasting rules:

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when:

1. They are equal, or
2. One of them is 1
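
The two rules above can be tried on shapes alone. The sketch below is an addition to the notebook. It assumes NumPy 1.20 or newer for `np.broadcast_shapes`, and it shows an incompatible pair raising a `ValueError`.

```python
import numpy as np

# Trailing dimensions are compared first: 3 matches 3
print(np.broadcast_shapes((4, 3), (3,)))     # (4, 3)

# A dimension of 1 stretches to match the other array
print(np.broadcast_shapes((4, 3), (4, 1)))   # (4, 3)

# (4, 3) and (4,) are incompatible: the trailing dimensions are 3 and 4
try:
    np.zeros((4, 3)) + np.zeros(4)
except ValueError as err:
    print('Cannot broadcast:', err)
```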

In [85]:
start = np.zeros([4,3])
print(start)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


In [86]:
# Create a rank 1 ndarray with 3 values
add_rows = np.array([1, 0, 2])
print(add_rows)

[1 0 2]


In [87]:
y = start + add_rows
print(y)

[[1. 0. 2.]
 [1. 0. 2.]
 [1. 0. 2.]
 [1. 0. 2.]]


In [89]:
# Create an ndarray which is 4 x 1 to broadcast
# across columns
add_cols = np.array([[0,1,2,3]])
add_cols = add_cols.T
# Note the T
# adding .T transposes the array (flips rows to cols)

print(add_cols)

[[0]
 [1]
 [2]
 [3]]


In [90]:
# Add the values of add_cols to start
y = start + add_cols
print(y)

[[0. 0. 0.]
 [1. 1. 1.]
 [2. 2. 2.]
 [3. 3. 3.]]


In [91]:
# Scalars also work (array with 1 value)
add_scalar = np.array([1])
print(start + add_scalar)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


# Read or Write to Disk:

## Binary Format:

In [1]:
import numpy as np

x = np.array([23.23, 24.24])

np.save('an_array', x)

In [3]:
np.load('an_array.npy')

array([23.23, 24.24])

## Text Format:

In [4]:
np.savetxt('array.txt', X=x, delimiter=',')

In [5]:
np.loadtxt('array.txt', delimiter=',')

array([23.23, 24.24])
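
When several arrays belong together, `np.savez` stores them in a single `.npz` archive. The sketch below is not from the original notebook: the file name `arrays.npz` and the array names `features` and `labels` are hypothetical, and a temporary directory is used so that nothing is left on disk.

```python
import os
import tempfile

import numpy as np

features = np.array([23.23, 24.24])
labels = np.array([0, 1])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'arrays.npz')
    np.savez(path, features=features, labels=labels)   # keyword names become keys

    with np.load(path) as archive:     # the file is closed when the block ends
        loaded_features = archive['features']
        loaded_labels = archive['labels']

print(loaded_features)
print(loaded_labels)
```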

# Additional Common ndarray Operations

## Dot product on Matrices and Inner Product on Vectors:

In [6]:
# Determine the dot product of two matrices
x2d = np.array([[1,1],[1,1]])
y2d = np.array([[2,2],[2,2]])

print(x2d.dot(y2d))
print()
print(np.dot(x2d, y2d))

[[4 4]
 [4 4]]

[[4 4]
 [4 4]]


In [8]:
# determine the inner product of two vectors
a1d = np.array([9,9])
b1d = np.array([10,10])

print(a1d.dot(b1d))
print()
print(np.dot(a1d, b1d))

180

180


In [10]:
# dot product of a matrix and a vector
print(x2d.dot(a1d))
print()
print(np.dot(x2d, a1d))

[18 18]

[18 18]
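
The `@` operator is a shorter spelling for these matrix products. The sketch below redefines the same arrays so that it runs on its own, and it contrasts `@` with `*`, which multiplies element by element.

```python
import numpy as np

x2d = np.array([[1, 1], [1, 1]])
y2d = np.array([[2, 2], [2, 2]])
a1d = np.array([9, 9])

matrix_product = x2d @ y2d            # same result as x2d.dot(y2d)
matrix_vector = x2d @ a1d             # same result as x2d.dot(a1d)
via_function = np.matmul(x2d, y2d)    # function form of @
elementwise = x2d * y2d               # element-wise, not a matrix product

print(matrix_product)
print(matrix_vector)
print(elementwise)
```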


## Sum:

In [13]:
# sum elements in the array
ex1 = np.array([[11,12],[21,22]])
print(np.sum(ex1))
# add all members

66


In [14]:
print(np.sum(ex1, axis=0))
# columnwise sum

[32 34]


In [15]:
print(np.sum(ex1, axis=1))
# row-wise sum

[23 43]


## Element-wise Functions:

For example, let's compare the values of two arrays to get the element-wise maximum.

In [16]:
# random array
x = np.random.randn(8)
print(x)

[ 0.64945469 -1.03073752 -0.82403424  0.44532143 -0.51470461  0.83841522
 -0.98892712  1.10781844]


In [17]:
# another random array
y = np.random.randn(8)
y

array([ 0.88021624,  1.60156392,  0.93879189,  0.73323873, -0.53696313,
       -1.44024482, -1.11127715,  0.89989964])

In [18]:
# returns element-wise maximum between two arrays
np.maximum(x,y)

array([ 0.88021624,  1.60156392,  0.93879189,  0.73323873, -0.51470461,
        0.83841522, -0.98892712,  1.10781844])

## Reshaping array:

In [19]:
# grab values from 0 through 19 in an array
arr = np.arange(20)
print(arr)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


In [20]:
# reshape to be a 4x5 matrix
arr.reshape(4,5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
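
Two details of `reshape` are easy to miss: one dimension can be given as `-1` so that NumPy infers it, and the result is a view of the original data whenever that is possible. A brief sketch follows (the names `grid` and `flat` are illustrative).

```python
import numpy as np

arr = np.arange(20)

grid = arr.reshape(4, -1)      # -1 lets NumPy infer 5 columns
print(grid.shape)              # (4, 5)

flat = grid.reshape(-1)        # back to one dimension
print(flat.shape)              # (20,)

grid[0, 0] = 99                # reshape returned a view here, so arr changes too
print(arr[0])                  # 99
```

Note that `arr.reshape(4, 5)` does not change the shape of `arr` itself; assign the result to a name to keep it.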

## Transpose:

In [23]:
# transpose
ex1 = np.array([[11,12],[21,22]])
print(ex1)
print()
ex1.T

[[11 12]
 [21 22]]



array([[11, 21],
       [12, 22]])

## Indexing using where():

In [24]:
x_1 = np.array([1,2,3,4,5])

y_1 = np.array([11,22,33,44,55])

filter = np.array([True, False, True, False, True])

In [26]:
out = np.where(filter, x_1, y_1)
print(out)
# True returns the element in array x_1
# False returns the element in array y_1

[ 1 22  3 44  5]


In [27]:
mat = np.random.rand(5,5)
mat

array([[0.52620617, 0.44870136, 0.49004603, 0.98532434, 0.69798437],
       [0.19550955, 0.17151165, 0.57192763, 0.82150448, 0.98155857],
       [0.42182315, 0.29602088, 0.60874818, 0.32142036, 0.61543726],
       [0.84075782, 0.7366635 , 0.17348172, 0.43538225, 0.30421343],
       [0.60174064, 0.42391088, 0.60719817, 0.45403508, 0.51931616]])

In [29]:
np.where(mat > 0.5, 1000, -1)
# Changes elements. In this case, if the element is greater than
# 0.5, make it 1000, else make it -1

array([[1000,   -1,   -1, 1000, 1000],
       [  -1,   -1, 1000, 1000, 1000],
       [  -1,   -1, 1000,   -1, 1000],
       [1000, 1000,   -1,   -1,   -1],
       [1000,   -1, 1000,   -1, 1000]])
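
Called with only a condition, `np.where` returns the positions where the condition is True instead of choosing between two sets of values. The sketch below uses a small fixed array (a hypothetical `readings`) so the result is predictable.

```python
import numpy as np

readings = np.array([0.2, 0.9, 0.4, 0.7])

# The result is a tuple with one index array per dimension
(high_positions,) = np.where(readings > 0.5)

print(high_positions)             # [1 3]
print(readings[high_positions])   # [0.9 0.7]
```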

## "any" or "all" conditionals:

In [30]:
arr_bools = np.array([True, False, True, True, False])

In [33]:
arr_bools.any()
# Returns true if any elements are True

True

In [34]:
arr_bools.all()
# Returns true if all elements are True

False

## Random Number Generation:

In [41]:
Y = np.random.normal(size = (1,5))
print(Y)

[[ 0.49366691 -0.06783093 -0.73371514  1.02321188 -0.4600587 ]]


In [44]:
Z = np.random.randint(low=2, high=50, size=4)
print(Z)
# prints random integers with conditions, in this case:
# The lowest a number can be is 2
# The highest it can be is 49 (high=50 is exclusive)
# the size of the array is 4 (so it is a 1d array)
# can also make 2d arrays

[20 45 39 49]


In [45]:
np.random.permutation(Z)
# return a new ordering of elements in Z

array([39, 49, 45, 20])

In [46]:
np.random.uniform(size=4)
# uniform distribution

array([0.87849235, 0.02838803, 0.76842001, 0.92136497])

In [47]:
np.random.normal(size=4)
# normal distribution

array([ 0.33028651, -0.8078987 , -0.12719788,  0.70045856])
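
The calls above draw from NumPy's global random state, so the numbers differ on every run. For reproducible results, one option is the `Generator` interface created by `np.random.default_rng`. This sketch is an addition to the notebook, and the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # the seed value is an arbitrary choice

normal_draws = rng.normal(size=(1, 5))
int_draws = rng.integers(low=2, high=50, size=4)   # high is exclusive
shuffled = rng.permutation(int_draws)

# A second generator with the same seed reproduces the same draws
rng_again = np.random.default_rng(seed=42)
repeat_draws = rng_again.normal(size=(1, 5))

print(np.array_equal(normal_draws, repeat_draws))   # True
```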

## Merging data sets:

In [48]:
K = np.random.randint(low=2, high=50, size=(2,2))
print(K)

print()

M = np.random.randint(low=2, high=50, size=(2,2))
print(M)

[[29 31]
 [41  3]]

[[ 9  7]
 [ 3 12]]


In [49]:
np.vstack((K,M))

array([[29, 31],
       [41,  3],
       [ 9,  7],
       [ 3, 12]])

In [50]:
np.hstack((K,M))

array([[29, 31,  9,  7],
       [41,  3,  3, 12]])

In [51]:
np.concatenate([K, M], axis = 0)

array([[29, 31],
       [41,  3],
       [ 9,  7],
       [ 3, 12]])

In [52]:
np.concatenate([K, M.T], axis = 1)

array([[29, 31,  9,  3],
       [41,  3,  7, 12]])
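
`vstack`, `hstack` and `concatenate` all join arrays along an axis that already exists. When the inputs should instead be kept apart along a new axis, `np.stack` is an option. The sketch below fixes the values of `K` and `M` to the ones printed above so that it runs on its own.

```python
import numpy as np

K = np.array([[29, 31], [41, 3]])
M = np.array([[9, 7], [3, 12]])

stacked = np.stack([K, M])         # new leading axis: shape (2, 2, 2)
joined = np.concatenate([K, M])    # existing axis 0 grows: shape (4, 2)

print(stacked.shape)
print(joined.shape)
print(np.array_equal(stacked[1], M))   # True
```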