# Numpy Tutorial

Numpy is a computational library for Python that is optimized for operations on multi-dimensional arrays. In this notebook we will use numpy to work with 1-d arrays (often called vectors) and 2-d arrays (often called matrices).

For the full user guide and reference for numpy, see: http://docs.scipy.org/doc/numpy/

In [2]:
import numpy as np # importing this way allows us to refer to numpy as np

# Creating Numpy Arrays

New arrays can be made in several ways. We can take an existing list and convert it to a numpy array:

In [3]:
mylist = [1., 2., 3., 4.]
mynparray = np.array(mylist)
mynparray

array([1., 2., 3., 4.])
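As a small illustrative sketch (the names `nested_list` and `nested_array` are made up for this example), a nested list converts to a 2D array in the same way, and the `shape` and `dtype` attributes report what was created:

```python
import numpy as np

# A nested list becomes a 2D array: each inner list is one row.
nested_list = [[1., 2., 3.], [4., 5., 6.]]
nested_array = np.array(nested_list)

print(nested_array.shape)  # (number of rows, number of columns)
print(nested_array.dtype)  # element type inferred from the list contents
```

Checking `shape` right after a conversion is a cheap way to catch a list that was nested differently than intended.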

You can initialize an array (of any dimension) of all ones or all zeroes with the ones() and zeros() functions:

In [4]:
one_vector = np.ones(4)
print(one_vector) # using print removes the array() portion

[1. 1. 1. 1.]


In [5]:
one2Darray = np.ones((2, 4)) # a 2D array with 2 "rows" and 4 "columns"

print(one2Darray)

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]


In [6]:
zero_vector = np.zeros(4)
print(zero_vector)

[0. 0. 0. 0.]
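The same pattern extends to other shapes and fill values. The sketch below is illustrative only; the variable names are hypothetical, and `np.full` is a related numpy function that the text above does not cover:

```python
import numpy as np

# The shape of a 2D array is passed as a tuple, just as with np.ones.
zero2Darray = np.zeros((3, 2))

# np.full fills the array with any constant, not only 0 or 1.
seven_vector = np.full(4, 7.0)

# dtype overrides the default floating-point element type.
int_ones = np.ones(4, dtype=int)

print(zero2Darray)
print(seven_vector)
print(int_ones)
```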


You can also create an array with np.empty(), which allocates the memory without initializing it, so the array holds arbitrary leftover values until you overwrite them. This is the fastest way to create a fixed-size numpy array, but you must make sure to assign every element before reading it.

In [7]:
empty_vector = np.empty(5)
print(empty_vector)

[2.30540121e-316 6.91664528e-310 0.00000000e+000 0.00000000e+000
 4.67616093e+180]
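A minimal sketch of the intended usage pattern, assuming we want to store the first five squares (the name `squares` is made up for this example): allocate once, then overwrite every slot before reading any of them.

```python
import numpy as np

squares = np.empty(5)  # contents are arbitrary at this point

# Assign every element so that no uninitialized value survives.
for i in range(len(squares)):
    squares[i] = i ** 2

print(squares)
```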


# Accessing array elements

Accessing an array is straightforward. For vectors you access an element by putting its index inside square brackets. Recall that indices in Python start at 0.

In [8]:
mynparray[2]

3.0

2D arrays are accessed similarly by referring to the row and column index separated by a comma:

In [9]:
my_matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(my_matrix)


[[1 2 3]
 [4 5 6]]


In [10]:
print(my_matrix[1, 2])

6


Ranges of indices can be selected using ':', for example:

In [11]:
print(my_matrix[0:2, 0]) # recall 0:2 = [0, 1]

[1 4]


In [12]:
print(my_matrix[0, 0:3])

[1 2 3]
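Slices also accept omitted bounds, negative indices and a step. The following sketch rebuilds the same 2x3 matrix so that it runs on its own; the result names are hypothetical:

```python
import numpy as np

my_matrix = np.array([[1, 2, 3], [4, 5, 6]])

first_column = my_matrix[:, 0]   # ':' alone means "every row"
last_row = my_matrix[-1, :]      # negative indices count from the end
every_other = my_matrix[0, ::2]  # start:stop:step, here every 2nd column of row 0

print(first_column)
print(last_row)
print(every_other)
```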


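You can also pass a list (or array) of indices. An index may appear more than once, in which case the element is repeated in the result.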

In [13]:
fib_indices = np.array([1, 1, 2, 3])
random_vector = np.random.random(10) # 10 random numbers between 0 and 1
print(random_vector)

[0.09312789 0.37581263 0.9450987  0.82271885 0.43878259 0.17657312
 0.24116478 0.88085685 0.2267618  0.05566104]


In [14]:
print(random_vector[fib_indices])

[0.37581263 0.37581263 0.9450987  0.82271885]


You can also use an array of True/False values (a boolean mask) to select elements:

In [15]:
my_vector = np.array([1, 2, 3, 4])
select_index = np.array([True, False, True, False])
print(my_vector[select_index])

[1 3]
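In practice the mask is usually computed from the data rather than typed by hand. A short sketch, assuming we want the elements greater than 2 (the names `is_large` and `large_values` are made up for this example):

```python
import numpy as np

my_vector = np.array([1, 2, 3, 4])

is_large = my_vector > 2            # elementwise comparison yields a boolean array
large_values = my_vector[is_large]  # keep only the positions marked True

print(is_large)
print(large_values)
```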


For 2D arrays you can select specific rows and specific columns. Passing ':' selects all rows or all columns.

In [16]:
select_cols = np.array([True, False, True]) # 1st and 3rd column
select_rows = np.array([False, True]) # 2nd row

In [17]:
print(my_matrix[select_rows, :]) # just 2nd row but all columns

[[4 5 6]]


In [18]:
print(my_matrix[:, select_cols]) # all rows and just the 1st and 3rd column

[[1 3]
 [4 6]]
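To apply a row mask and a column mask together, one option is to chain the two selections; another is `np.ix_`, which builds an index for the sub-block. This is a sketch of both, with the matrix and masks redefined so that it runs on its own. Passing both masks inside a single pair of brackets is deliberately avoided here, because numpy then pairs the indices up instead of selecting a block.

```python
import numpy as np

my_matrix = np.array([[1, 2, 3], [4, 5, 6]])
select_cols = np.array([True, False, True])  # 1st and 3rd column
select_rows = np.array([False, True])        # 2nd row

# Option 1: select the rows first, then the columns of that result.
chained = my_matrix[select_rows, :][:, select_cols]

# Option 2: np.ix_ combines the two masks into a block selection.
with_ix = my_matrix[np.ix_(select_rows, select_cols)]

print(chained)
print(with_ix)
```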


# Operations on Arrays

You can use the operators '\*', '\*\*', '/', '+' and '-' on numpy arrays, and they operate elementwise.

In [19]:
my_array = np.array([1., 2., 3., 4.])
print(my_array*my_array)


[ 1.  4.  9. 16.]


In [20]:
print(my_array**2)

[ 1.  4.  9. 16.]


In [21]:
print(my_array - np.ones(4))

[0. 1. 2. 3.]

In [None]:
print(my_array + np.ones(4))

In [None]:
print(my_array / 3)

In [None]:
print(my_array / np.array([2., 3., 4., 5.])) # = [1.0/2.0, 2.0/3.0, 3.0/4.0, 4.0/5.0]
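Elementwise behavior is one of the main differences from plain Python lists. The sketch below contrasts the two for the '\*' operator; the names `my_list`, `repeated_list` and `doubled_array` are made up for this example:

```python
import numpy as np

my_list = [1., 2., 3., 4.]
my_array = np.array(my_list)

repeated_list = my_list * 2   # a list is repeated: 8 elements
doubled_array = my_array * 2  # an array is scaled: each element times 2

print(repeated_list)
print(doubled_array)
```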

You can compute the sum with np.sum() and the average with np.average(). The average is the sum divided by the number of elements, as the last cell below confirms.

In [None]:
print(np.sum(my_array))

In [None]:
print(np.average(my_array))

In [None]:
print(np.sum(my_array)/len(my_array))
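For 2D arrays, both functions accept an `axis` argument that controls which dimension is collapsed. A brief sketch on a small matrix chosen for this example:

```python
import numpy as np

my_matrix = np.array([[1., 2., 3.], [4., 5., 6.]])

total = np.sum(my_matrix)                  # no axis: sum over every element
column_sums = np.sum(my_matrix, axis=0)    # collapse the rows: one sum per column
row_means = np.average(my_matrix, axis=1)  # collapse the columns: one mean per row

print(total)
print(column_sums)
print(row_means)
```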

# The dot product

An important mathematical operation in linear algebra is the dot product. 

When we compute the dot product between two vectors we are simply multiplying them elementwise and adding them up. In numpy you can do this with np.dot()

In [23]:
array1 = np.array([1., 2., 3., 4.])
array2 = np.array([2., 3., 4., 5.])
print(np.dot(array1, array2)) # np.dot takes the two vectors as separate arguments

40.0


In [28]:
print(np.sum(array1*array2))

40.0
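To make the definition concrete, here is a sketch that computes the same dot product with an explicit loop and with the `@` operator, which Python provides for this kind of product. The names `loop_total` and `operator_total` are made up for this example:

```python
import numpy as np

array1 = np.array([1., 2., 3., 4.])
array2 = np.array([2., 3., 4., 5.])

# Multiply matching elements and accumulate the products.
loop_total = 0.0
for a, b in zip(array1, array2):
    loop_total += a * b

# For two 1-d arrays, '@' gives the same result as np.dot.
operator_total = array1 @ array2

print(loop_total)
print(operator_total)
```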


Recall that the Euclidean length (or magnitude) of a vector is the square root of the sum of the squares of its components. This is just the square root of the dot product of the vector with itself:

In [29]:
array1_mag = np.sqrt(np.dot(array1, array1))
array1_mag

5.477225575051661

In [30]:
np.sqrt(np.sum(array1*array1))

5.477225575051661
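numpy also ships a ready-made function for this, `np.linalg.norm`, which by default returns the Euclidean length of a vector. A short sketch comparing it with the manual computation:

```python
import numpy as np

array1 = np.array([1., 2., 3., 4.])

manual_mag = np.sqrt(np.dot(array1, array1))  # square root of the self dot product
library_mag = np.linalg.norm(array1)          # default norm of a vector is Euclidean

print(manual_mag)
print(library_mag)
```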

We can also use the dot product with a 2D array (or matrix). When a vector has the same number of elements as the matrix has columns, you can right-multiply the matrix by the vector to get another vector with as many elements as the matrix has rows. For example, this is how you compute predicted values from a matrix of features and an array of weights.

In [32]:
my_features = np.array([[1., 2.], [3., 4.], [5., 6.], [7., 8.]])
my_features

array([[1., 2.],
       [3., 4.],
       [5., 6.],
       [7., 8.]])

In [33]:
my_weights = np.array([0.4, 0.5])
my_weights

array([0.4, 0.5])

In [34]:
my_predictions = np.dot(my_features, my_weights) # note that the weights are on the right
my_predictions # which has 4 elements since my_features has 4 rows

array([1.4, 3.2, 5. , 6.8])
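Each prediction is simply the dot product of one row of features with the weights. The sketch below spells that out row by row and compares it with the single np.dot call; `manual_predictions` is a name made up for this example:

```python
import numpy as np

my_features = np.array([[1., 2.], [3., 4.], [5., 6.], [7., 8.]])
my_weights = np.array([0.4, 0.5])

# One dot product per row of the feature matrix.
manual_predictions = np.array([np.dot(row, my_weights) for row in my_features])

# The matrix-vector product does all rows at once.
my_predictions = np.dot(my_features, my_weights)

print(manual_predictions)
print(my_predictions)
```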

Similarly, if you have a vector with the same number of elements as the matrix has *rows*, you can left-multiply the matrix by the vector.

In [39]:
my_matrix = my_features
my_array = np.array([0.3, 0.4, 0.5, 0.6])

In [40]:
np.dot(my_array, my_matrix) # which has 2 elements because my_matrix has 2 columns

array([ 8.2, 10. ])
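Left-multiplying by a vector gives the same numbers as right-multiplying the transposed matrix by that vector. A sketch that checks this, with the arrays redefined so that it runs on its own:

```python
import numpy as np

my_matrix = np.array([[1., 2.], [3., 4.], [5., 6.], [7., 8.]])
my_array = np.array([0.3, 0.4, 0.5, 0.6])

left_product = np.dot(my_array, my_matrix)          # vector on the left
transposed_product = np.dot(my_matrix.T, my_array)  # transposed matrix, vector on the right

print(left_product)
print(transposed_product)
```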

# Multiplying Matrices

If we have two 2D arrays (matrices) matrix_1 and matrix_2 where the number of columns of matrix_1 is the same as the number of rows of matrix_2 then we can use np.dot() to perform matrix multiplication.

In [41]:
matrix_1 = np.array([[1., 2., 3.],[4., 5., 6.]])
matrix_1

array([[1., 2., 3.],
       [4., 5., 6.]])

In [53]:
matrix_2 = np.array([[1., 2., 2.], [3., 4., 5.], [2., 3., 4.]])
matrix_2

array([[1., 2., 2.],
       [3., 4., 5.],
       [2., 3., 4.]])

In [54]:
np.dot(matrix_1, matrix_2)

array([[13., 19., 24.],
       [31., 46., 57.]])
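The shape rule can be checked directly. In this sketch a (2, 3) matrix times a (3, 3) matrix gives a (2, 3) result, while multiplying two (2, 3) matrices fails because the inner dimensions differ. The `@` operator is used as an equivalent of np.dot for 2D arrays, and `mismatch_error` is a name made up for this example:

```python
import numpy as np

matrix_1 = np.array([[1., 2., 3.], [4., 5., 6.]])
matrix_2 = np.array([[1., 2., 2.], [3., 4., 5.], [2., 3., 4.]])

product = matrix_1 @ matrix_2  # same result as np.dot(matrix_1, matrix_2)
print(matrix_1.shape, matrix_2.shape, product.shape)

# Columns of the left matrix (3) must match rows of the right matrix (2 here).
mismatch_error = None
try:
    np.dot(matrix_1, matrix_1)
except ValueError as err:
    mismatch_error = err

print(mismatch_error)
```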

In [51]:
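import turicreate as tc
data = tc.SFrame('home_data.sframe') # load the housing data into an SFrame
print(data['price', 'sqft_living'].head()) # preview the first 10 rows of the two columns
data['price', 'sqft_living'].to_numpy() # convert the two columns to a 2D numpy array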

+-----------+-------------+
|   price   | sqft_living |
+-----------+-------------+
|  221900.0 |    1180.0   |
|  538000.0 |    2570.0   |
|  180000.0 |    770.0    |
|  604000.0 |    1960.0   |
|  510000.0 |    1680.0   |
| 1225000.0 |    5420.0   |
|  257500.0 |    1715.0   |
|  291850.0 |    1060.0   |
|  229500.0 |    1780.0   |
|  323000.0 |    1890.0   |
+-----------+-------------+
[10 rows x 2 columns]



array([[221900.,   1180.],
       [538000.,   2570.],
       [180000.,    770.],
       ...,
       [402101.,   1020.],
       [400000.,   1600.],
       [325000.,   1020.]])