# ![](https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png) Arrays and Array Operations

Week 1, Lecture 4.2

### LEARNING OBJECTIVES
*After this lesson, you will be able to:*
- Explain why using arrays can be preferable to Python container types
- Explain the difference between a vector, a matrix and an array
- Find the dimensionality of an array, reshape it, and transform it
- Find the data type of a numpy array and describe numpy's data type behaviors
- Add and subtract arrays
- Explain how the dot product is obtained
- Use numpy to obtain the dot product

## Recap

This week we've discussed some of the basic Python data types including "container types".


- **What are some of those types we discussed?**


- **What were some of the benefits of using container types?**

In [1]:
my_list = ['a', 'b', 'c', 1, 2, 3, ('monkey', 'cat')]

for item in my_list:
    print item

a
b
c
1
2
3
('monkey', 'cat')


## The Downside of data type flexibility

While being able to hold any data type is useful, it also means Python **must evaluate the type of each item as it is processed**.


## Enter the numpy array


Numpy arrays are of one type and one type only. There is no need to evaluate each element at runtime.

[From the numpy documentation](http://docs.scipy.org/doc/numpy/user/whatisnumpy.html):
    
> At the core of the NumPy package, is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations being performed in compiled code for performance. There are several important differences between NumPy arrays and the standard Python sequences:
>
> - NumPy arrays have a **fixed size at creation**, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will create a new array and delete the original.
> - The elements in a NumPy array are **all required to be of the same data type**, and thus will be the same size in memory. The exception: one can have arrays of (Python, including NumPy) objects, thereby allowing for arrays of different sized elements.
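
The points quoted above can be checked directly. The following is a minimal sketch, not taken from the numpy documentation; the variable names are made up for illustration, and `print()` is called with a single argument so the code runs under Python 3 as well as the Python 2 used in the cells of this lesson.

```python
import numpy as np

# Homogeneous type: the ints are upcast so every element is a float64.
arr = np.array([1, 2, 3.5])
print(arr.dtype)      # float64

# Fixed size: np.append does not grow `arr`; it returns a new, longer array.
longer = np.append(arr, 4.0)
print(arr.shape)      # (3,)
print(longer.shape)   # (4,)

# The exception from the quote: an object array can hold mixed Python objects.
obj_arr = np.array([1, 'two', 3.0], dtype=object)
print(obj_arr.dtype)  # object
```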

In [15]:
import numpy as np

mixed_type = [1, 2, 3, 4., 5., 6., 7., 8., 9.]
same_type = np.array(mixed_type)
print same_type

[ 1.  2.  3.  4.  5.  6.  7.  8.  9.]


## Exercise:

1. Create a Python list containing 10 or more elements composed of both ints and floats
2. Save that list using the variable name 'mixed_type'
3. Now create a numpy array using ```np.array()``` with that same list and save it as 'same_type'
4. Run the following cells to compare the time differences

In [12]:
import numpy as np

## enter work below:

In [27]:
mixed_type = [1,2,3,4.,5.,6.,7.,8.,9.]
same_type = np.array(mixed_type)
same_type

array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

In [30]:
%%timeit

mixed_type * 10000000

1 loop, best of 3: 666 ms per loop


In [31]:
%%timeit

same_type * 10000000

The slowest run took 16.36 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.17 µs per loop


## Time difference


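666 milliseconds is equal to 666,000 microseconds.

That means the list cell took roughly **570,000 times longer** to run than the array cell (666,000 µs / 1.17 µs).

Note that the two cells do not do the same work: `mixed_type * 10000000` repeats the list 10,000,000 times, building a 90-million-element list, while `same_type * 10000000` multiplies each of the 9 array elements by 10,000,000. Much of the gap comes from that difference in work, so the ratio overstates the speed advantage of arrays.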
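
To see what `*` does on each type, and to time a like-for-like element-wise multiplication, a sketch along these lines may help. The 100,000-element input (`big_list`, `big_array`) and the repeat count are assumptions chosen for illustration; the timings depend on the machine and on the array length, so none are shown here.

```python
import timeit
import numpy as np

mixed_type = [1, 2, 3, 4., 5., 6., 7., 8., 9.]
same_type = np.array(mixed_type)

# On a list, * repeats the list: 9 elements become 27, none are multiplied.
repeated = mixed_type * 3
print(len(repeated))   # 27

# On an array, * multiplies each element.
print(same_type * 3)

# A like-for-like timing on a longer (hypothetical) input of 100,000 floats.
big_list = [float(i) for i in range(100000)]
big_array = np.array(big_list)

list_time = timeit.timeit(lambda: [x * 3 for x in big_list], number=20)
array_time = timeit.timeit(lambda: big_array * 3, number=20)
print(list_time)    # seconds for 20 runs of the pure Python loop
print(array_time)   # seconds for 20 runs of the numpy multiplication
```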

## Terminology & Dimensionality

![](http://i.imgur.com/3gdNc12.jpg)
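
In numpy terms, the usual convention is that a vector is a 1-dimensional array, a matrix is a 2-dimensional array, and "array" is the general word for any number of dimensions. The sketch below illustrates that convention with made-up variable names; it is not a transcription of the image above.

```python
import numpy as np

scalar = np.array(7)                        # 0 dimensions: a single value
vector = np.array([0, 1, 2])                # 1 dimension
matrix = np.array([[0, 1, 2], [3, 4, 5]])   # 2 dimensions: rows and columns
cube = np.arange(24).reshape(2, 3, 4)       # 3 dimensions: a general n-d array

for arr in (scalar, vector, matrix, cube):
    print(arr.ndim)    # 0, then 1, then 2, then 3
    print(arr.shape)   # (), then (3,), then (2, 3), then (2, 3, 4)
```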

## Create an array

In [32]:
my_vector = np.array([0, 1, 2, 3, 4, 5])
my_matrix = np.array([[0, 1, 2, 3, 4, 5],[5, 4, 3, 2, 1, 0]])

In [33]:
my_vector

array([0, 1, 2, 3, 4, 5])

In [34]:
my_matrix

array([[0, 1, 2, 3, 4, 5],
       [5, 4, 3, 2, 1, 0]])

## Dimensions and Shapes

### Vector

In [43]:
my_vector

array([0, 1, 2, 3, 4, 5])

In [44]:
my_vector.ndim

1

In [45]:
my_vector.shape

(6,)

In [47]:
reshaped_array = my_vector.reshape(2,3)
reshaped_array

array([[0, 1, 2],
       [3, 4, 5]])

In [48]:
reshaped_array.shape

(2, 3)

### Matrix

In [49]:
my_matrix

array([[0, 1, 2, 3, 4, 5],
       [5, 4, 3, 2, 1, 0]])

In [50]:
my_matrix.ndim

2

In [51]:
my_matrix.shape

(2, 6)

In [52]:
new_matrix = my_matrix.reshape(1,12)

In [53]:
new_matrix

array([[0, 1, 2, 3, 4, 5, 5, 4, 3, 2, 1, 0]])

## Reshaping

In [70]:
my_vector

array([0, 1, 2, 3, 4, 5])

In [71]:
my_vector.shape

(6,)

In [72]:
reshaped_array = my_vector.reshape(2,3)
reshaped_array

array([[0, 1, 2],
       [3, 4, 5]])

In [78]:
reshaped_2 = reshaped_array.reshape(3,2)
reshaped_2

array([[0, 1],
       [2, 3],
       [4, 5]])

## Check: 

- What dimensionality will our reshaped array have?
- What do we call an array with this dimensionality?
- What will the shape be?

2

## Reshaping the matrix

In [60]:
my_matrix

array([[0, 1, 2, 3, 4, 5],
       [5, 4, 3, 2, 1, 0]])

In [61]:
my_matrix.reshape(1, 12)

array([[0, 1, 2, 3, 4, 5, 5, 4, 3, 2, 1, 0]])

## And another way...

In [62]:
my_matrix

array([[0, 1, 2, 3, 4, 5],
       [5, 4, 3, 2, 1, 0]])

In [63]:
my_matrix.reshape(12, 1)

array([[0],
       [1],
       [2],
       [3],
       [4],
       [5],
       [5],
       [4],
       [3],
       [2],
       [1],
       [0]])

## And another...

In [64]:
my_matrix

array([[0, 1, 2, 3, 4, 5],
       [5, 4, 3, 2, 1, 0]])

In [65]:
my_matrix.reshape(3, 4)

array([[0, 1, 2, 3],
       [4, 5, 5, 4],
       [3, 2, 1, 0]])
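
All of the reshapes above work because the new shape still holds exactly 12 elements. As a small sketch of that rule (the name `inferred` is made up for illustration), numpy can infer one dimension when it is given as `-1`, and it raises a `ValueError` when the sizes do not match:

```python
import numpy as np

my_matrix = np.array([[0, 1, 2, 3, 4, 5], [5, 4, 3, 2, 1, 0]])

# -1 asks numpy to infer that dimension: 12 elements / 4 columns = 3 rows.
inferred = my_matrix.reshape(-1, 4)
print(inferred.shape)   # (3, 4)

# A shape whose product is not 12 is rejected.
try:
    my_matrix.reshape(5, 2)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print(err)
```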

## Transposition of arrays

In [79]:
my_matrix

array([[0, 1, 2, 3, 4, 5],
       [5, 4, 3, 2, 1, 0]])

In [80]:
my_matrix.T

array([[0, 5],
       [1, 4],
       [2, 3],
       [3, 2],
       [4, 1],
       [5, 0]])

In [81]:
my_matrix

array([[0, 1, 2, 3, 4, 5],
       [5, 4, 3, 2, 1, 0]])

## Another transposition

In [82]:
my_matrix.reshape(3,4)

array([[0, 1, 2, 3],
       [4, 5, 5, 4],
       [3, 2, 1, 0]])

In [91]:
my_matrix

array([[0, 1, 2, 3, 4, 5],
       [5, 4, 3, 2, 1, 0]])

In [92]:
my_matrix.reshape(3,4).T

array([[0, 4, 3],
       [1, 5, 2],
       [2, 5, 1],
       [3, 4, 0]])

In [93]:
my_matrix

array([[0, 1, 2, 3, 4, 5],
       [5, 4, 3, 2, 1, 0]])

In [95]:
new_matrix = my_matrix.reshape(3,4)
new_matrix

array([[0, 1, 2, 3],
       [4, 5, 5, 4],
       [3, 2, 1, 0]])

In [99]:
new_matrix.T

array([[0, 4, 3],
       [1, 5, 2],
       [2, 5, 1],
       [3, 4, 0]])

In [100]:
new_matrix.T.T

array([[0, 1, 2, 3],
       [4, 5, 5, 4],
       [3, 2, 1, 0]])

In [101]:
new_matrix.T.T.T

array([[0, 4, 3],
       [1, 5, 2],
       [2, 5, 1],
       [3, 4, 0]])
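
Transposing and reshaping can produce the same shape without producing the same array. The sketch below, using illustrative variable names, compares `my_matrix.T` with `my_matrix.reshape(6, 2)`: the transpose turns rows into columns, while the reshape refills the new shape row by row.

```python
import numpy as np

my_matrix = np.array([[0, 1, 2, 3, 4, 5], [5, 4, 3, 2, 1, 0]])

transposed = my_matrix.T             # rows become columns
reshaped = my_matrix.reshape(6, 2)   # same shape, elements refilled row by row

print(transposed.shape)   # (6, 2)
print(reshaped.shape)     # (6, 2)
print(np.array_equal(transposed, reshaped))   # False

# Transposing twice returns the original layout.
print(np.array_equal(my_matrix.T.T, my_matrix))   # True
```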

## Exercise

- Create a 16 element int vector in numpy
- Use .ndim to check the dimensionality
- Use .shape to check the shape
- Use .reshape to change the number of rows and columns
- Use .T to transpose the data 
- Notice whether .reshape and .T modify the array in place or return a new array (i.e., must you save the result to a new variable to retain what you see in the notebook output?)

In [114]:
int_vector = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16])
int_vector

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16])

In [115]:
int_vector.ndim

1

In [116]:
int_vector.shape

(16,)

In [118]:
new_vector = int_vector.reshape(8,2)
new_vector

array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10],
       [11, 12],
       [13, 14],
       [15, 16]])

In [120]:
new_vector.T

array([[ 1,  3,  5,  7,  9, 11, 13, 15],
       [ 2,  4,  6,  8, 10, 12, 14, 16]])
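
On the last point of the exercise: neither `.reshape` nor `.T` changes the original array's shape, but for a simple contiguous array like this one the result is a view onto the same data. A minimal sketch, assuming a 16-element vector built with `np.arange` (the name `vec` is made up for illustration):

```python
import numpy as np

vec = np.arange(1, 17)   # the ints 1..16

reshaped = vec.reshape(4, 4)
print(vec.shape)        # (16,)  the original is unchanged
print(reshaped.shape)   # (4, 4)

# The new array is a view: it shares the original's data buffer...
print(np.shares_memory(vec, reshaped))     # True
print(np.shares_memory(vec, reshaped.T))   # True

# ...so writing through the view also changes the original.
reshaped[0, 0] = 100
print(vec[0])   # 100
```

So the result must be saved to a variable to keep the new shape, but that variable is not an independent copy of the data.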

## Data types of an array

### What if we add a float?

In [130]:
with_float = np.array([1, 2, 3, 4, 5.0])

In [131]:
type(with_float)

numpy.ndarray

In [132]:
with_float.dtype

dtype('float64')

### What if we add a string?

In [134]:
with_str = np.array([1, 2, 3, 4, '5'])

In [135]:
with_str.dtype

dtype('S21')

## What if we go nuts?

In [125]:
nuts = np.array(['1', 2, 3.0, 'four', 10/2])

In [126]:
nuts.dtype

dtype('S4')

## Check: 

- Why is it important to check the data type?
- Under what conditions might this be a problem?
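
One situation where the data type matters is shown in the sketch below: a single string in the input turns every element into a string, so the values must be converted back before any arithmetic. The names `as_ints` and `as_floats` are made up for illustration; the exact string dtype differs between Python 2 (`S`) and Python 3 (`U`), so only its kind is printed.

```python
import numpy as np

with_str = np.array([1, 2, 3, 4, '5'])

# Every element is now a string ('S' kind on Python 2, 'U' on Python 3).
print(with_str.dtype.kind)
print(with_str.tolist())   # ['1', '2', '3', '4', '5']

# Convert back to numbers explicitly before doing arithmetic.
as_ints = with_str.astype(int)
print(as_ints.sum())       # 15

# Passing dtype up front makes the intended type explicit.
as_floats = np.array([1, 2, 3], dtype=np.float64)
print(as_floats.dtype)     # float64
```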

## Let's do some math


We're now going to look at how to add and subtract arrays, as well as how to find the dot product of arrays.

### Adding arrays

Arrays are added element-wise: the ith element of one array is added to the ith element of the other.

In [136]:
a = np.array([0, 1, 2, 3, 4, 5])
b = np.array([6, 5, 4, 3, 2, 1])

In [137]:
np.add(a, b)

array([6, 6, 6, 6, 6, 6])

In [138]:
c = np.add(a,b)
c

array([6, 6, 6, 6, 6, 6])

### Subtract arrays

Arrays are subtracted element-wise: the ith element of the second array is subtracted from the ith element of the first.

In [139]:
a = np.array([0, 1, 2, 3, 4, 5])
b = np.array([6, 5, 4, 3, 2, 1])

In [140]:
np.subtract(a, b)

array([-6, -4, -2,  0,  2,  4])

## Check:

- Will np.subtract(a, b) be the same as np.subtract(b, a)?

In [141]:
a = np.array([0, 1, 2, 3, 4, 5])
b = np.array([6, 5, 4, 3, 2, 1])

In [142]:
np.subtract(a, b)

array([-6, -4, -2,  0,  2,  4])

In [143]:
np.subtract(b, a)

array([ 6,  4,  2,  0, -2, -4])
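
The `+` and `-` operators on arrays do the same element-wise work as `np.add` and `np.subtract`. A short sketch using the same `a` and `b` as above:

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5])
b = np.array([6, 5, 4, 3, 2, 1])

print(a + b)   # [6 6 6 6 6 6]
print(a - b)   # [-6 -4 -2  0  2  4]

print(np.array_equal(a + b, np.add(a, b)))        # True
print(np.array_equal(a - b, np.subtract(a, b)))   # True

# Swapping the operands of a subtraction flips the sign of every element.
print(np.array_equal(a - b, -(b - a)))            # True
```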

## Multiply arrays (dot product)

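The dot product is also known as the scalar product. This is an appropriate name, since the result is a scalar (a single value) rather than a vector. It is obtained by multiplying the two vectors element by element and summing the products: for `d` and `e` below, 1*3 + 2*2 + 3*1 = 10.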

In [147]:
d = np.array([1, 2, 3])
e = np.array([3, 2, 1])

In [148]:
np.dot(d, e)

10

In [149]:
np.dot(e, d)

10
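
The two steps behind `np.dot` for 1-dimensional arrays can be written out separately. In this sketch (the name `products` is made up for illustration), note that `d * e` on its own is element-wise multiplication, not the dot product:

```python
import numpy as np

d = np.array([1, 2, 3])
e = np.array([3, 2, 1])

products = d * e        # element-wise: [1*3, 2*2, 3*1]
print(products)         # [3 4 3]
print(products.sum())   # 10
print(np.dot(d, e))     # 10
```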

### Check:

- What about different sized vectors - will those work?

In [150]:
c = np.array([1, 2, 3, 4, 5, 9])
d = np.array([5, 4, 3, 2, 1])

### Add

In [151]:
np.add(c, d)

ValueError: operands could not be broadcast together with shapes (6,) (5,) 

### Subtract

In [152]:
np.subtract(c, d)

ValueError: operands could not be broadcast together with shapes (6,) (5,) 

### Dot product

In [153]:
np.dot(c, d)

ValueError: shapes (6,) and (5,) not aligned: 6 (dim 0) != 5 (dim 0)
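
One way to avoid these errors is to compare shapes before operating. The sketch below uses a hypothetical helper, `safe_add`, which is not part of numpy. The last line shows the case the "broadcast" wording in the error refers to: a single value can be combined with an array of any length.

```python
import numpy as np

c = np.array([1, 2, 3, 4, 5, 9])
d = np.array([5, 4, 3, 2, 1])

def safe_add(x, y):
    """Hypothetical helper: add two arrays only if their shapes match."""
    if x.shape != y.shape:
        return None
    return np.add(x, y)

print(safe_add(c, d))       # None: shapes (6,) and (5,) differ
print(safe_add(c[:5], d))   # [6 6 6 6 6]

# A scalar is applied to every element, so no shape error is raised.
print(c + 10)               # [11 12 13 14 15 19]
```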

## Exercise

**Using the following lists, calculate the dot product by hand at your desk. Then do the same thing without numpy, using iteration. Finally, use numpy. Check that all three results are equal.**

**Bonus: Write a single-line list comprehension to calculate it.**

list_one = [1,3,5,7,9]<br>
list_two = [2,4,6,8,10]

## Solutions

In [173]:
list_one = [1,3,5,7,9]
list_two = [2,4,6,8,10]

In [None]:
# by hand: the element-wise products, then their sum
list_products = [2, 12, 30, 56, 90]
sum_products = 190

In [157]:
array_one = np.array(list_one)
array_two = np.array(list_two)
np.dot(array_one, array_two)

190

In [196]:
list_zip = zip(list_one,list_two)
new_list = []
total = 0
for a, b in list_zip:
    new_list.append(a*b)
for x in new_list:
    total = total + x
print new_list
print total

[2, 12, 30, 56, 90]
190


In [211]:
def dot_prod(a,b):
    total = 0
    for i in range(len(a)):
        mini_prod = a[i] * b[i]
        total += mini_prod
    return total


In [216]:
sum([x*y for x,y in zip(list_one, list_two)])

190
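
The exercise asks for a check that all approaches agree. A minimal sketch of that check follows; it repeats the `dot_prod` function from the solution so that the block runs on its own, and the `by_*` names are made up for illustration.

```python
import numpy as np

list_one = [1, 3, 5, 7, 9]
list_two = [2, 4, 6, 8, 10]

def dot_prod(a, b):
    # Sum the products of the elements at matching positions.
    total = 0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total

by_hand = 2 + 12 + 30 + 56 + 90
by_loop = dot_prod(list_one, list_two)
by_comprehension = sum([x * y for x, y in zip(list_one, list_two)])
by_numpy = np.dot(np.array(list_one), np.array(list_two))

print(by_hand == by_loop == by_comprehension == by_numpy)   # True
```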

## Conclusion

In this lecture, we've learned:

- What arrays are and why they are beneficial
- How to create them, check their type, dimensionality, and shape
- How to reshape them
- How to add, subtract, and multiply them (dot product)

There is a lab that follows in the repo, to let you explore more on this and how to work with numpy.