<p style="text-align:center">
PSY 394U <b>Python Coding for Psychological Sciences</b>, Fall 2017

<img src="https://www.python.org/static/community_logos/python-logo-master-v3-TM.png" alt="Python logo" width="200">
</p>

<h1 style="text-align:center"> NumPy & arrays </h1>

<h4 style="text-align:center"> November 2 - 7, 2017 </h4>
<hr style="height:5px;border:none" />
<p>

# 1. Creating an array
<hr style="height:1px;border:none" />

An array is a data type provided by **`NumPy`**. It is similar to a list, but more
versatile, and much better suited to numerical and scientific data.

In [2]:
import numpy as np
a = np.array([1, 2, 3, 4, 5])
a

array([1, 2, 3, 4, 5])

Note that when we import **`numpy`**, we give it the short alias **`np`**, so that we don't have to
type `numpy` every time we call a function from the `numpy` module. An array can be
converted to a list, and a list can be converted to an array.
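As a small aside, arrays also have a **`tolist()`** method. The sketch below (variable names are only for illustration) contrasts it with `list()`: `list(a)` keeps NumPy's own number types for the items, whereas `tolist()` converts them to plain Python numbers and also handles 2D arrays by producing nested lists.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

as_list = list(a)       # items are NumPy integer scalars
as_list2 = a.tolist()   # items are plain Python ints

print(type(as_list[0]))   # a NumPy integer type (e.g., numpy.int64; platform dependent)
print(type(as_list2[0]))  # <class 'int'>

# For a 2D array, tolist() returns a list of lists (one inner list per row)
nested = np.array([[1, 2], [3, 4]]).tolist()
print(nested)             # [[1, 2], [3, 4]]
```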

In [3]:
list(a)

[1, 2, 3, 4, 5]

In [4]:
b = [10, 9, 8, 7, 6]
np.array(b)

array([10,  9,  8,  7,  6])

An array can also be two-dimensional. For example,

In [5]:
c = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
c

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

You can examine the shape of an array using the **`ndim`** attribute (the number of
dimensions), the **`shape`** attribute (the size along each dimension), and the
**`size`** attribute (the total number of elements). These are attributes rather than
methods, so they are written without parentheses.

In [6]:
c.ndim

2

In [7]:
c.shape

(4, 3)

In [8]:
c.size

12

This tells us that the array `c` is two-dimensional, with 4 rows and 3 columns, and
has 12 elements.
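The same attributes work for a 1D array. The short sketch below shows that the shape of a 1D array is a one-element tuple such as `(5,)`, and that `size` is simply the product of the entries in `shape`.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(a.ndim)   # 1
print(a.shape)  # (5,): a tuple with a single entry
print(a.size)   # 5

c = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
rows, cols = c.shape          # the shape tuple can be unpacked
print(rows * cols == c.size)  # True: size is the product of the shape
```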

In practice, you usually do not enter elements one by one. Here are some useful
functions for creating arrays. First, the **`arange()`** function works like the built-in
`range()` function (the stop value is excluded), except that it returns an array and can
produce a sequence of non-integers.

In [9]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [10]:
np.arange(0,1.875,0.125)

array([ 0.   ,  0.125,  0.25 ,  0.375,  0.5  ,  0.625,  0.75 ,  0.875,
        1.   ,  1.125,  1.25 ,  1.375,  1.5  ,  1.625,  1.75 ])

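Another useful function is **`linspace()`**. `np.linspace(start, stop, num)` returns `num`
evenly spaced values from `start` to `stop`, including both endpoints, so the interval is
divided into `num - 1` segments of equal width. For example,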

In [11]:
np.linspace(0,1,5)

array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ])

In [12]:
np.linspace(0,1,6)

array([ 0. ,  0.2,  0.4,  0.6,  0.8,  1. ])

In [13]:
np.linspace(0,1,7)

array([ 0.        ,  0.16666667,  0.33333333,  0.5       ,  0.66666667,
        0.83333333,  1.        ])
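Because `arange()` counts by a step while `linspace()` counts by the number of points, the two can describe the same sequence. The sketch below reproduces the earlier `arange()` output with `linspace()`. It also shows why `linspace()` is often the safer choice for non-integer steps: with a step such as 0.1, floating-point rounding can make `arange()` return one more element than expected.

```python
import numpy as np

# The same 15 values as np.arange(0, 1.875, 0.125), specified by count instead of step
a = np.arange(0, 1.875, 0.125)
b = np.linspace(0, 1.75, 15)
print(np.allclose(a, b))  # True

# With a step of 0.1, rounding error changes the number of elements
c = np.arange(1, 1.3, 0.1)
print(len(c))  # 4: a value very close to the "excluded" stop value 1.3 is included
```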

You can also create a 2D array of ones or zeros with the **`ones()`** and **`zeros()`**
functions, respectively.

In [14]:
np.ones((3,3))

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In [15]:
np.ones([4,2])

array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])

In [16]:
np.zeros((4,5))

array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

Note that the shape of a 2D array must be passed to these functions as a single *tuple*
(e.g., `(3, 3)`) or *list* (e.g., `[3, 3]`).
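A few related details, shown in the sketch below: a single integer creates a 1D array, the optional `dtype` parameter sets the element type (the default is float, which is why the outputs above show `1.` and `0.`), and **`np.full()`** fills an array with any value. Writing the shape as two separate numbers does not work, because the second positional argument of `zeros()` is interpreted as the data type.

```python
import numpy as np

z1 = np.zeros(4)                   # a single integer gives a 1D array of length 4
z2 = np.zeros((2, 3), dtype=int)   # dtype sets the element type (integers here)
f = np.full((2, 2), 7.5)           # a 2x2 array with every element equal to 7.5

try:
    np.zeros(2, 3)                 # wrong: 3 is taken as the dtype, not a dimension
except TypeError as err:
    print("TypeError:", err)
```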

You can also create an identity matrix ***`I`*** using the **`eye()`** function. 

In [3]:
np.eye(4)

array([[ 1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.]])

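Finally, **`np.random.rand()`** creates arrays of random numbers drawn uniformly from the interval [0, 1). Unlike `ones()` and `zeros()`, it takes the size of each dimension as separate arguments. Your values will differ every time you run it.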

In [4]:
np.random.rand(3)

array([ 0.63117383,  0.39305781,  0.86594975])

In [5]:
np.random.rand(3,3)

array([[ 0.83886992,  0.23149087,  0.60715952],
       [ 0.10434583,  0.7738115 ,  0.92432445],
       [ 0.75259197,  0.53184338,  0.62917152]])
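If you need the same "random" numbers every time (for example, to make an analysis reproducible), you can seed the generator first. The sketch below uses `np.random.seed()`; the seed value 42 is arbitrary. It also shows, as an alternative, the generator object returned by `np.random.default_rng()`, which is available in NumPy 1.17 and later.

```python
import numpy as np

np.random.seed(42)            # fix the starting state of the random number generator
first = np.random.rand(3)

np.random.seed(42)            # reset to the same state
second = np.random.rand(3)

print(np.array_equal(first, second))  # True: identical numbers

# Alternative (NumPy 1.17+): a Generator object with its own seed
rng = np.random.default_rng(42)
u = rng.random((3, 3))        # uniform numbers in [0, 1), like np.random.rand(3, 3)
```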

Unlike a list, an array can only hold elements of a single data type (e.g., integers, floats, or strings). If you mix types, NumPy converts all elements to a common type, as the following examples show.

In [6]:
np.array([10,20])

array([10, 20])

In [7]:
np.array([10,20,0.5])

array([ 10. ,  20. ,   0.5])

In [8]:
np.array([10,20,0.5,'Cat'])

array(['10', '20', '0.5', 'Cat'], 
      dtype='<U32')
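You can check which common type NumPy chose with the **`dtype`** attribute, and convert explicitly with the **`astype()`** method, which returns a new array. The sketch below repeats the three arrays above and inspects `dtype.kind` (`'i'` for integer, `'f'` for float, `'U'` for Unicode string).

```python
import numpy as np

ints = np.array([10, 20])
mixed = np.array([10, 20, 0.5])
words = np.array([10, 20, 0.5, 'Cat'])

print(ints.dtype.kind)   # 'i': integers
print(mixed.dtype.kind)  # 'f': the integers were converted to floats
print(words.dtype.kind)  # 'U': every element was converted to a string

# Explicit conversion: float to int truncates toward zero
print(mixed.astype(int))  # [10 20  0]
```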

# 2. Shape manipulation
<hr style="height:1px;border:none" />

You can change the shape of an array with the **`reshape()`** method. Here, an array of 15 numbers is reshaped into two different 2D arrays. The new shape must contain the same number of elements as the original.

In [9]:
a = np.arange(15)
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [10]:
b = a.reshape(3,5)
b

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [11]:
c = a.reshape(5,3)
c

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

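In the resulting 2D array, elements are filled row by row. You can use the **`flatten()`** or **`ravel()`** method to convert a 2D array back to a 1D array. The difference is that `flatten()` always returns a copy, whereas `ravel()` returns a view of the original data when possible.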

In [12]:
b.flatten()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [13]:
c.ravel()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
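Two practical details about reshaping are sketched below. First, one dimension in `reshape()` can be given as `-1`, and NumPy infers it from the total number of elements. Second, because `ravel()` returns a view when it can, changing its result also changes the original array, whereas changing the result of `flatten()` does not. A shape that does not match the number of elements raises a `ValueError`.

```python
import numpy as np

a = np.arange(15)
b = a.reshape(3, -1)     # -1 means "infer this dimension": 15 / 3 = 5
print(b.shape)           # (3, 5)

flat_copy = b.flatten()  # always an independent copy
flat_view = b.ravel()    # a view of b's data (b is stored contiguously here)

flat_copy[0] = 100       # b is unchanged
flat_view[1] = 200       # b[0, 1] also becomes 200
print(b[0, 0], b[0, 1])  # 0 200

try:
    a.reshape(4, 4)      # 16 positions cannot hold 15 elements
except ValueError as err:
    print("ValueError:", err)
```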

# 3. Stacking arrays together
<hr style="height:1px;border:none" />

You can combine arrays to create a larger array. The **`vstack()`** function stacks arrays vertically (one on top of another).

In [14]:
a = np.arange(10).reshape(2,5)
a

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [15]:
b = np.arange(15).reshape(3,5)
b

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [16]:
np.vstack((a,b))

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

Or you can stack horizontally with the **`hstack()`** function.

In [17]:
c = np.arange(8).reshape(2,4)
c

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [18]:
np.hstack((a,c))

array([[0, 1, 2, 3, 4, 0, 1, 2, 3],
       [5, 6, 7, 8, 9, 4, 5, 6, 7]])

The arrays must have the same number of columns for `vstack()`, and the same number of rows for `hstack()`. Notice that both `vstack()` and `hstack()` take a single tuple (or list) of arrays as their input.
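For 2D arrays, `vstack()` and `hstack()` behave like the more general **`np.concatenate()`** function along `axis=0` and `axis=1`, respectively. The sketch below reuses the arrays `a`, `b`, and `c` from the cells above and also shows the `ValueError` raised when the column counts do not match.

```python
import numpy as np

a = np.arange(10).reshape(2, 5)
b = np.arange(15).reshape(3, 5)
c = np.arange(8).reshape(2, 4)

v = np.vstack((a, b))   # shape (5, 5)
h = np.hstack((a, c))   # shape (2, 9)
print(np.array_equal(v, np.concatenate((a, b), axis=0)))  # True
print(np.array_equal(h, np.concatenate((a, c), axis=1)))  # True

# a has 5 columns but c has 4, so they cannot be stacked vertically
try:
    np.vstack((a, c))
except ValueError as err:
    print("ValueError:", err)
```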

Two 1D arrays can be stacked by `vstack()` and `hstack()` as well.

In [19]:
x = np.arange(5)
x

array([0, 1, 2, 3, 4])

In [20]:
y = np.random.rand(5)
y

array([ 0.11656209,  0.05408744,  0.00639533,  0.41080986,  0.02826659])

In [21]:
np.vstack((x,y))

array([[ 0.        ,  1.        ,  2.        ,  3.        ,  4.        ],
       [ 0.11656209,  0.05408744,  0.00639533,  0.41080986,  0.02826659]])

In [22]:
np.hstack((x,y))

array([ 0.        ,  1.        ,  2.        ,  3.        ,  4.        ,
        0.11656209,  0.05408744,  0.00639533,  0.41080986,  0.02826659])

The `vstack()` function produces a 2D array, whereas the `hstack()` function concatenates the arrays into a longer 1D array. You can transpose the stacked array with the **`T`** attribute (or, equivalently, the **`transpose()`** method).

In [23]:
z = np.vstack((x,y))
z

array([[ 0.        ,  1.        ,  2.        ,  3.        ,  4.        ],
       [ 0.11656209,  0.05408744,  0.00639533,  0.41080986,  0.02826659]])

In [24]:
z.T

array([[ 0.        ,  0.11656209],
       [ 1.        ,  0.05408744],
       [ 2.        ,  0.00639533],
       [ 3.        ,  0.41080986],
       [ 4.        ,  0.02826659]])
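If what you want is the 1D arrays side by side as columns, **`np.column_stack()`** gives the same result as stacking vertically and then transposing. The sketch below checks this and also confirms that `transpose()` matches `T`.

```python
import numpy as np

x = np.arange(5)
y = np.random.rand(5)

z = np.vstack((x, y))           # shape (2, 5): x and y are rows
cols = np.column_stack((x, y))  # shape (5, 2): x and y are columns

print(z.shape, z.T.shape, cols.shape)      # (2, 5) (5, 2) (5, 2)
print(np.array_equal(cols, z.T))           # True
print(np.array_equal(z.transpose(), z.T))  # True: transpose() is the method form
```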

You can transpose any 2D array with the `T` attribute. For a 1D array, however, `T` has no effect: it returns the same 1D array.

In [25]:
x = np.arange(5)
x

array([0, 1, 2, 3, 4])

In [26]:
x.T

array([0, 1, 2, 3, 4])

To turn a 1D array into a column, you first have to convert it to a 2D array.

In [27]:
np.array([x])

array([[0, 1, 2, 3, 4]])

Notice that the result is enclosed in double square brackets `[[ ]]`. This indicates a 2D array (here with 1 row and 5 columns), so it can be transposed into a column.

In [28]:
np.array([x]).shape

(1, 5)

In [29]:
np.array([x]).T

array([[0],
       [1],
       [2],
       [3],
       [4]])
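Another common way to get the same column is `reshape()` with `-1`, which lets NumPy infer the number of rows. The sketch below compares it with the `np.array([x]).T` approach shown above; `x.reshape(1, -1)` likewise gives the single-row version.

```python
import numpy as np

x = np.arange(5)

col1 = np.array([x]).T     # wrap in a list to make it 2D, then transpose
col2 = x.reshape(-1, 1)    # -1: infer the number of rows; 1 column
row = x.reshape(1, -1)     # 1 row; infer the number of columns

print(col1.shape, col2.shape, row.shape)  # (5, 1) (5, 1) (1, 5)
print(np.array_equal(col1, col2))         # True
```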

### Exercise
1. You have the following arrays:
```python
u = [1.17, 1.82, 5.79, 6.29, 8.56]
v = [0.86, 3.14, 3.45, 5.88, 8.52]
w = [-1.58, 1.47, 2.77, 5.99, 7.80]
x = [0.73, 0.43, 3.16, 5.96, 7.45]
```
  1. Stack `u` and `w` vertically, call it `y`.
  2. Stack `v` and `x` vertically, call it `z`.
  3. Stack `y` and `z` horizontally.
  4. Stack `u`, `v`, `w`, and `x` vertically and transpose the resulting array.
2. **Stacking arrays, 1**. Create a 2D array of size $4\times3$ with ones as its elements. Create an identity matrix of size $3\times3$. Stack the two arrays vertically.
3. **Stacking arrays, 2**. You have a 1D array
```python
v = np.array([-33, 44, 35])
```
You want to stack this vector onto a $3\times3$ identity matrix so that the resulting array is
```python
array([[  1.,   0.,   0., -33.],
       [  0.,   1.,   0.,  44.],
       [  0.,   0.,   1.,  35.]])
```
How can you do this?

# 4. Basic operations
<hr style="height:1px;border:none" />

Unlike lists, arrays support element-wise arithmetic. (With lists, `+` joins two lists together rather than adding their elements.) For example,

In [30]:
a = np.ones((3,3))
a

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In [31]:
b = np.arange(9).reshape(3,3)
b

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [32]:
a + b

array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.]])

In [33]:
b - a

array([[-1.,  0.,  1.],
       [ 2.,  3.,  4.],
       [ 5.,  6.,  7.]])

An operation between arrays of the same shape is performed element by element. You can also perform an operation between an array and a scalar (i.e., a single number). In that case, the operation is applied between each element of the array and the scalar.

In [34]:
c = np.arange(6)
c

array([0, 1, 2, 3, 4, 5])

In [35]:
c+10

array([10, 11, 12, 13, 14, 15])

In [36]:
c*10

array([ 0, 10, 20, 30, 40, 50])

In [37]:
c**2

array([ 0,  1,  4,  9, 16, 25])

In [38]:
c/5

array([ 0. ,  0.2,  0.4,  0.6,  0.8,  1. ])
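To make the contrast with lists concrete, the sketch below applies the same operators to lists and to arrays built from them (the variable names are only for illustration). For lists, `+` concatenates and `*` repeats; for arrays, both work element by element. Note that `*` between two arrays is also element-wise, not matrix multiplication.

```python
import numpy as np

list_a = [1, 2, 3]
list_b = [10, 20, 30]
print(list_a + list_b)  # [1, 2, 3, 10, 20, 30]: lists are concatenated
print(list_a * 2)       # [1, 2, 3, 1, 2, 3]: the list is repeated

arr_a = np.array(list_a)
arr_b = np.array(list_b)
print(arr_a + arr_b)    # [11 22 33]: element-wise addition
print(arr_a * 2)        # [2 4 6]: each element times the scalar
print(arr_a * arr_b)    # [10 40 90]: element-wise product
```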

There are also some useful methods, such as **`sum`**, **`min`**, and **`max`** (sum, minimum, and maximum, respectively).

In [39]:
d = np.random.rand(5)
d

array([ 0.45101362,  0.34768716,  0.38284549,  0.44798698,  0.84837493])

In [40]:
d.sum()

2.4779081840114441

In [41]:
d.min()

0.3476871581071288

In [42]:
d.max()

0.84837492889347499
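Arrays also have **`mean()`** and **`std()`** methods, which the exercises below use. The sketch uses a small made-up set of scores so the results are easy to check by hand. By default `std()` divides by *n* (the population standard deviation); passing `ddof=1` divides by *n* − 1, which gives the sample standard deviation commonly reported in psychology.

```python
import numpy as np

scores = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical example data

print(scores.mean())       # 5.0
print(scores.std())        # 2.0: population SD (divides by n = 8)
print(scores.std(ddof=1))  # about 2.138: sample SD (divides by n - 1 = 7)
```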

For 2D or higher-dimensional arrays, these methods return a single number for the entire array by default. In other words, the **`min`** method returns the smallest element of the entire array. You can also calculate the **`sum`**, **`min`**, and **`max`** for each row or column by specifying the `axis` parameter.

In [43]:
a = np.random.rand(4,3)
a

array([[ 0.14884635,  0.02183561,  0.25646553],
       [ 0.23785761,  0.90276123,  0.26046412],
       [ 0.30789054,  0.91337381,  0.73480458],
       [ 0.94598519,  0.96340125,  0.52560488]])

In [44]:
a.sum(axis=0)

array([ 1.64057969,  2.8013719 ,  1.7773391 ])

In [45]:
a.sum(axis=1)

array([ 0.4271475 ,  1.40108296,  1.95606892,  2.43499131])

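For a 2D array, **`axis=0`** operates down the rows, collapsing them and returning one value per column (3 values above), whereas **`axis=1`** operates across the columns, returning one value per row (4 values above).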

In [46]:
a.max(axis=1)

array([ 0.25646553,  0.90276123,  0.91337381,  0.96340125])

In [47]:
a.min(axis=0)

array([ 0.14884635,  0.02183561,  0.25646553])
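Because random numbers are hard to check by eye, here is the same idea on a small array with known values (a hypothetical example). With 2 rows and 3 columns, `axis=0` returns 3 values (one per column) and `axis=1` returns 2 values (one per row).

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(data.sum(axis=0))  # [5 7 9]: one value per column
print(data.sum(axis=1))  # [ 6 15]: one value per row
print(data.max(axis=0))  # [4 5 6]
print(data.min(axis=1))  # [1 4]
```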

Finally, NumPy provides basic math functions such as **`exp()`**, **`sin()`**, and **`sqrt()`**. When you call one of these functions on an array, it is applied to each element, and the results are returned as an array of the same shape.

In [50]:
angle = np.linspace(0,1,5) * np.pi
angle

array([ 0.        ,  0.78539816,  1.57079633,  2.35619449,  3.14159265])

In [51]:
np.sin(angle)

array([  0.00000000e+00,   7.07106781e-01,   1.00000000e+00,
         7.07106781e-01,   1.22464680e-16])

In [52]:
b = np.arange(6)
b

array([0, 1, 2, 3, 4, 5])

In [53]:
np.sqrt(b)

array([ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ,
        2.23606798])
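Notice in the `sin()` output above that sin(π) came out as about `1.2e-16` rather than exactly 0. This is floating-point rounding, not an error. The sketch below shows that comparing such results with `==` can fail, while **`np.isclose()`** / **`np.allclose()`** compare within a small tolerance. It also checks that `np.log()` undoes `np.exp()` element by element.

```python
import numpy as np

angle = np.linspace(0, 1, 5) * np.pi
s = np.sin(angle)

print(s[-1] == 0)            # False: sin(pi) is about 1.2e-16 in floating point
print(np.isclose(s[-1], 0))  # True: equal within a small tolerance

values = np.array([0.0, 1.0, 2.0])
print(np.allclose(np.log(np.exp(values)), values))  # True: log undoes exp
```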

### Exercise
1. You have an array
```python
m = np.array([[8, 3, 4], [1, 5, 9], [6, 7, 2]])
```
Calculate the `sum`, `min`, and `max` across the rows and across the columns.
2. **Temperature conversion**. You have an array of temperatures in Fahrenheit
```python
tempF = np.arange(0,101,10)
```
Convert the temperatures to Celsius by the formula
```python
C = (F-32) * 5/9
```
3. **Random sample**. The function **`np.random.randn()`** produces random numbers following a Gaussian (i.e., normal) distribution with mean 0 and standard deviation 1. Create a $1000 \times 20$ 2D array of Gaussian random numbers. Then calculate the mean and standard deviation (with the **`mean()`** and **`std()`** methods, respectively) across rows. The means should be close to 0 and the standard deviations close to 1. 
> ***Hint***: *`np.random.randn(X,Y)` produces a 2D array of random numbers with X rows and Y columns.*
4. **Random samples, combined**. Generate another $1000 \times 20$ 2D array of Gaussian random numbers. Add the newly created array to the random number array generated in Question 3. Calculate the mean and standard deviation of the resulting matrix. In theory, the mean should be close to 0 and the standard deviation close to $\sqrt{2} \approx 1.41$. 

# 5. Linear algebra – matrix-like operations
<hr style="height:1px;border:none" />

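The multiplication described above is element-wise multiplication. As you may remember from your linear algebra class, two matrices can also be multiplied by following the matrix multiplication rule (rows of the first matrix times columns of the second). You can perform such matrix multiplication in NumPy with the **`np.dot()`** function. In the next cells, `A` is a $4\times4$ matrix defined in a cell that is not shown here, and `V` is a $4\times1$ column vector.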

In [55]:
V = np.array([[-0.61997528, -0.39101962, -0.49047357, -0.47134912]]).T
V

array([[-0.61997528],
       [-0.39101962],
       [-0.49047357],
       [-0.47134912]])

In [56]:
AV = np.dot(A,V)
AV

array([[-13.0902939 ],
       [ -8.25607392],
       [-10.35596648],
       [ -9.9521686 ]])

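The vector `V` is an *eigenvector* of the matrix `A`: the product `AV` equals `V` multiplied by its corresponding eigenvalue, approximately `21.114`. Dividing `AV` by `V` element by element recovers this eigenvalue in every entry.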

In [57]:
AV / V

array([[ 21.1142191 ],
       [ 21.11421908],
       [ 21.11421922],
       [ 21.11421912]])
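Eigenvalues and eigenvectors can also be computed directly with **`np.linalg.eig()`**, which returns the eigenvalues and a matrix whose *columns* are the corresponding eigenvectors. Since the matrix `A` above is not reproduced here, the sketch uses a hypothetical $2\times2$ symmetric matrix `A2` (eigenvalues 1 and 3) and verifies $A_2 v = \lambda v$ for each pair. The order in which eigenvalues are returned is not guaranteed.

```python
import numpy as np

# Hypothetical example matrix (not the A used above)
A2 = np.array([[2.0, 1.0],
               [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A2)

# The rows of eigenvectors.T are the columns of eigenvectors, i.e., the eigenvectors
for lam, v in zip(eigenvalues, eigenvectors.T):
    Av = np.dot(A2, v)
    print(lam, np.allclose(Av, lam * v))  # True for each eigenvalue/eigenvector pair
```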

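For a square matrix that is invertible (non-singular), the inverse can be calculated with the **`np.linalg.inv()`** function. Multiplying a matrix by its inverse gives the identity matrix. Note that `B` below is filled with random integers, so your values will differ.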

In [58]:
B = np.random.randint(0,10,9).reshape(3,3)
B

array([[1, 4, 3],
       [4, 6, 4],
       [1, 6, 3]])

In [59]:
invB = np.linalg.inv(B)
invB

array([[-0.375,  0.375, -0.125],
       [-0.5  ,  0.   ,  0.5  ],
       [ 1.125, -0.125, -0.625]])

In [61]:
np.dot(B, invB)

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
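To summarize the difference between the two kinds of multiplication, the sketch below uses two small hypothetical matrices `P` and `Q`. `*` multiplies element by element, while `np.dot()` (or the `@` operator, available in Python 3.5 and later) performs matrix multiplication. Because a product with an inverse is computed in floating point, it may contain tiny rounding errors, so `np.allclose()` is a safer check than `==`. A singular matrix has no inverse, and `np.linalg.inv()` raises a `LinAlgError`.

```python
import numpy as np

P = np.array([[1, 2], [3, 4]])
Q = np.array([[0, 1], [1, 0]])

elementwise = P * Q      # [[0 2] [3 0]]
product = np.dot(P, Q)   # [[2 1] [4 3]]: matrix multiplication
product2 = P @ Q         # the same matrix product

P_inv = np.linalg.inv(P)
print(np.allclose(P @ P_inv, np.eye(2)))  # True

singular = np.array([[1, 2], [2, 4]])     # second row is twice the first
try:
    np.linalg.inv(singular)
except np.linalg.LinAlgError as err:
    print("LinAlgError:", err)
```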

# 6. Copy method
<hr style="height:1px;border:none" />

Recall that when you assign an existing list to a new variable (e.g., `b = a`), both names refer to the same list. The same is true for arrays: plain assignment does not copy the data. For example,

In [63]:
a = np.arange(15)
b = a
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [64]:
b

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [65]:
a.shape = (3,5)
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [66]:
b

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

Because `a` and `b` refer to the same array, changing the shape of `a` also changes the shape of `b`. To avoid this, create an independent copy of the original array with the **`copy`** method.

In [67]:
c = a.copy()
c

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [68]:
a.shape = (5,3)
a

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [69]:
c

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
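The same sharing applies to element values, not just to the shape. The sketch below (names chosen only for illustration) shows that a change made through `b` appears in `a`, while a change to the copy `c` does not. It also shows a subtler case: `reshape()` returns a new array object that usually shares the original data (a *view*), which you can check with **`np.shares_memory()`**.

```python
import numpy as np

a = np.arange(5)
b = a             # no copy: b is a second name for the same array
c = a.copy()      # an independent copy of the data

b[0] = 100        # changing an element through b also changes a
c[1] = -1         # changing the copy leaves a untouched

print(a)          # [100   1   2   3   4]
print(b is a)     # True
print(c is a)     # False

# reshape() gives a different array object that shares a's data (a view)
v = a.reshape(1, 5)
print(v is a, np.shares_memory(v, a))  # False True
```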