<p style="text-align:center">
PSY 341K <b>Python Coding for Psychological Sciences</b>, Fall 2018

<img src="https://www.python.org/static/community_logos/python-logo-master-v3-TM.png" alt="Python logo" width="200">
</p>

<h1 style="text-align:center"> NumPy </h1>

<h4 style="text-align:center"> November 1 - 6, 2018 </h4>
<hr style="height:5px;border:none" />
<p>

# 1. Creating an array
<hr style="height:1px;border:none" />

An array is a data type available in **`NumPy`**. It is similar to a list, but much more
versatile than a list, and especially useful for scientific data.

In [1]:
import numpy as np
a = np.array([1, 2, 3, 4, 5])
a

array([1, 2, 3, 4, 5])

Note that when we import **`numpy`**, we assign a name **`np`**, so that we don't have to
type `numpy` every time we call a function in the `numpy` module. An array can be
converted to a list, or vice versa.

In [3]:
list(a)

[1, 2, 3, 4, 5]

In [4]:
b = [10, 9, 8, 7, 6]
np.array(b)

array([10,  9,  8,  7,  6])

An array can be two dimensional. For example,

In [5]:
c = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
c

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

You can examine the shape of an array using the **`ndim`** attribute (the number of
dimensions), the **`shape`** attribute (the size in each dimension), and the
**`size`** attribute (the total number of elements).

In [6]:
c.ndim

2

In [7]:
c.shape

(4, 3)

In [8]:
c.size

12

This tells us that the array `c` is two-dimensional, with 4 rows and 3 columns, and
has 12 elements.

In practice, you usually do not enter elements one by one. Here are some useful
functions. First, the **`arange()`** function works just like the `range()` function.
The difference is that it returns an array and can produce a sequence of non-integers.

In [9]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [10]:
np.arange(0,1.875,0.125)

array([ 0.   ,  0.125,  0.25 ,  0.375,  0.5  ,  0.625,  0.75 ,  0.875,
        1.   ,  1.125,  1.25 ,  1.375,  1.5  ,  1.625,  1.75 ])

Another useful function is the **`linspace()`** function. This function returns a given
number of evenly spaced points over an interval, including both end points, so the
interval is split into segments of equal width. For example,

In [11]:
np.linspace(0,1,5)

array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ])

In [12]:
np.linspace(0,1,6)

array([ 0. ,  0.2,  0.4,  0.6,  0.8,  1. ])

In [13]:
np.linspace(0,1,7)

array([ 0.        ,  0.16666667,  0.33333333,  0.5       ,  0.66666667,
        0.83333333,  1.        ])
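The relationship between the number of points and the number of segments can be checked directly. The following is a small sketch (the variable names `points` and `step` are only illustrative): asking `linspace()` for 5 points gives 4 segments, and the optional `retstep=True` argument also returns the segment width.

```python
import numpy as np

# 5 points between 0 and 1 (both ends included) make 4 equal segments
points, step = np.linspace(0, 1, 5, retstep=True)
print(points)        # the 5 evenly spaced values
print(step)          # width of each segment: 0.25

# arange() takes a step size and excludes the stop value,
# whereas linspace() takes a number of points and includes it
print(np.arange(0, 1, 0.25))
```

As a rule of thumb, `arange()` is convenient when you know the step size, and `linspace()` when you know how many points you need.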

You can also create a 2D array of ones or zeros with the **`ones()`** and **`zeros()`**
functions, respectively.

In [14]:
np.ones((3,3))

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In [15]:
np.ones([4,2])

array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])

In [16]:
np.zeros((4,5))

array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

Note that the input for these functions has to be a *tuple* (e.g., (3, 3)) or a *list*
(e.g., [3, 3]).

You can also create an identity matrix ***`I`*** using the **`eye()`** function. 

In [3]:
np.eye(4)

array([[ 1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.]])

And finally, some random numbers.

In [4]:
np.random.rand(3)

array([ 0.63117383,  0.39305781,  0.86594975])

In [5]:
np.random.rand(3,3)

array([[ 0.83886992,  0.23149087,  0.60715952],
       [ 0.10434583,  0.7738115 ,  0.92432445],
       [ 0.75259197,  0.53184338,  0.62917152]])

Unlike a list, an array can only hold elements of the same data type (e.g., integers, floats, strings, etc.).

In [6]:
np.array([10,20])

array([10, 20])

In [7]:
np.array([10,20,0.5])

array([ 10. ,  20. ,   0.5])

In [8]:
np.array([10,20,0.5,'Cat'])

array(['10', '20', '0.5', 'Cat'], 
      dtype='<U32')
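You can check which data type an array ended up with by looking at its `dtype` attribute. The sketch below (with illustrative variable names) also uses the `astype()` method, which returns a converted copy of an array.

```python
import numpy as np

ints = np.array([10, 20])
mixed = np.array([10, 20, 0.5])

print(ints.dtype)    # an integer type (exact name depends on the platform)
print(mixed.dtype)   # float64: the integers were converted to floats

# astype() returns a new array converted to the requested type
as_float = ints.astype(float)
print(as_float)
```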

# 2. Shape manipulation
<hr style="height:1px;border:none" />

You can change the shape of an array with the **`reshape()`** method. Here, an array of 15 numbers is reshaped into different 2D arrays.

In [9]:
a = np.arange(15)
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [10]:
b = a.reshape(3,5)
b

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [11]:
c = a.reshape(5,3)
c

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In the resulting 2D array, elements are filled row by row. You can use the **`flatten()`** or **`ravel()`** method to convert a 2D array to a 1D array.

In [12]:
b.flatten()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [13]:
c.ravel()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
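Two details about these methods are worth a quick sketch. First, `reshape()` accepts `-1` for one dimension, and NumPy works out that size for you. Second, `flatten()` always returns an independent copy, whereas `ravel()` returns a view of the same data when it can. The variable names here are only illustrative.

```python
import numpy as np

a = np.arange(15)
b = a.reshape(3, -1)     # -1 lets NumPy work out the number of columns (5)
print(b.shape)           # (3, 5)

flat_copy = b.flatten()  # always a new, independent array
flat_copy[0] = 99
print(b[0, 0])           # still 0: b is unaffected by changes to the copy

# ravel() avoids copying when possible, so it shares memory with b here
print(np.shares_memory(b, b.ravel()))    # True
print(np.shares_memory(b, flat_copy))    # False
```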

# 3. Stacking arrays together
<hr style="height:1px;border:none" />

You can stack arrays together to create a larger array. You can concatenate arrays vertically with the **`vstack()`** function.

In [14]:
a = np.arange(10).reshape(2,5)
a

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [15]:
b = np.arange(15).reshape(3,5)
b

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [16]:
np.vstack((a,b))

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

Or you can stack horizontally with the **`hstack()`** function.

In [17]:
c = np.arange(8).reshape(2,4)
c

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [18]:
np.hstack((a,c))

array([[0, 1, 2, 3, 4, 0, 1, 2, 3],
       [5, 6, 7, 8, 9, 4, 5, 6, 7]])

Needless to say, the number of columns has to be the same for the `vstack()` function, and the number of rows has to be the same for the `hstack()` function. Notice that both the `vstack()` and `hstack()` functions take a tuple or a list as the input.
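To see what happens when the sizes do not match, here is a short sketch using the same `a` and `c` as in the cells above. Stacking them horizontally works, but stacking them vertically raises a `ValueError`.

```python
import numpy as np

a = np.arange(10).reshape(2, 5)   # 2 rows, 5 columns
c = np.arange(8).reshape(2, 4)    # 2 rows, 4 columns

print(np.hstack((a, c)).shape)    # rows match, so this works: (2, 9)

try:
    np.vstack((a, c))             # column counts differ (5 vs 4)
except ValueError as err:
    print("vstack failed:", err)
```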

Two 1D arrays can be stacked by `vstack()` and `hstack()` as well.

In [19]:
x = np.arange(5)
x

array([0, 1, 2, 3, 4])

In [20]:
y = np.random.rand(5)
y

array([ 0.11656209,  0.05408744,  0.00639533,  0.41080986,  0.02826659])

In [21]:
np.vstack((x,y))

array([[ 0.        ,  1.        ,  2.        ,  3.        ,  4.        ],
       [ 0.11656209,  0.05408744,  0.00639533,  0.41080986,  0.02826659]])

In [22]:
np.hstack((x,y))

array([ 0.        ,  1.        ,  2.        ,  3.        ,  4.        ,
        0.11656209,  0.05408744,  0.00639533,  0.41080986,  0.02826659])

The `vstack()` function results in a 2D array, whereas the `hstack()` function concatenates the arrays into a longer 1D array. You can transpose the stacked array with the **`T`** attribute (or the `transpose()` method).

In [23]:
z = np.vstack((x,y))
z

array([[ 0.        ,  1.        ,  2.        ,  3.        ,  4.        ],
       [ 0.11656209,  0.05408744,  0.00639533,  0.41080986,  0.02826659]])

In [24]:
z.T

array([[ 0.        ,  0.11656209],
       [ 1.        ,  0.05408744],
       [ 2.        ,  0.00639533],
       [ 3.        ,  0.41080986],
       [ 4.        ,  0.02826659]])

You can transpose any 2D array with the `T` attribute. However, transposing a 1D array has no effect: the result is the same 1D array.

In [25]:
x = np.arange(5)
x

array([0, 1, 2, 3, 4])

In [26]:
x.T

array([0, 1, 2, 3, 4])

To transpose a 1D array, you have to convert it to a 2D array. 

In [27]:
np.array([x])

array([[0, 1, 2, 3, 4]])

Notice that it is enclosed in double square brackets `[[ ]]`. This means this array is a 2D array, thus it can be transposed.

In [28]:
np.array([x]).shape

(1, 5)

In [29]:
np.array([x]).T

array([[0],
       [1],
       [2],
       [3],
       [4]])

### Exercise
1. You have the following arrays:
```python
u = [1.17, 1.82, 5.79, 6.29, 8.56]
v = [0.86, 3.14, 3.45, 5.88, 8.52]
w = [-1.58, 1.47, 2.77, 5.99, 7.80]
x = [0.73, 0.43, 3.16, 5.96, 7.45]
```
  1. Stack `u` and `w` vertically, call it `y`.
  2. Stack `v` and `x` vertically, call it `z`.
  3. Stack `y` and `z` horizontally.
  4. Stack `u`, `v`, `w`, and `x` vertically and transpose the resulting array.
2. Create a 2D array of size 4x3 with ones as its elements. Create an identity matrix of size 3x3. Stack the two arrays vertically.
3. You have a 1D array
```python
v = np.array([-33, 44, 35])
```
You want to stack this vector to a 3x3 identity matrix so that the resulting array is
```python
array([[  1.,   0.,   0., -33.],
       [  0.,   1.,   0.,  44.],
       [  0.,   0.,   1.,  35.]])
```
How can you do this?

# 4. Basic operations
<hr style="height:1px;border:none" />

Unlike lists, arrays can be used in mathematical operations. For example,

In [30]:
a = np.ones((3,3))
a

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In [31]:
b = np.arange(9).reshape(3,3)
b

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [32]:
a + b

array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.]])

In [33]:
b - a

array([[-1.,  0.,  1.],
       [ 2.,  3.,  4.],
       [ 5.,  6.,  7.]])

An operation involving arrays is performed on an element-by-element basis. You can also perform an operation between an array and a scalar (i.e., a single number). In such a case, each element in the array is used in the operation with the scalar.

In [34]:
c = np.arange(6)
c

array([0, 1, 2, 3, 4, 5])

In [35]:
c+10

array([10, 11, 12, 13, 14, 15])

In [36]:
c*10

array([ 0, 10, 20, 30, 40, 50])

In [37]:
c**2

array([ 0,  1,  4,  9, 16, 25])

In [38]:
c/5

array([ 0. ,  0.2,  0.4,  0.6,  0.8,  1. ])
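It may help to compare this with what the same operators do to a list. In this small sketch (with illustrative variable names), `*` and `+` repeat and concatenate a list, but perform arithmetic on an array.

```python
import numpy as np

numbers = [0, 1, 2]
print(numbers * 2)            # list repetition: [0, 1, 2, 0, 1, 2]
print(numbers + [10])         # list concatenation: [0, 1, 2, 10]

arr = np.array(numbers)
print(arr * 2)                # element-wise multiplication: [0 2 4]
print(arr + 10)               # element-wise addition: [10 11 12]
```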

There are also some useful methods, such as **`sum`**, **`min`**, and **`max`** (sum, minimum, and maximum, respectively).

In [39]:
d = np.random.rand(5)
d

array([ 0.45101362,  0.34768716,  0.38284549,  0.44798698,  0.84837493])

In [40]:
d.sum()

2.4779081840114441

In [41]:
d.min()

0.3476871581071288

In [42]:
d.max()

0.84837492889347499

For 2D or higher dimension arrays, these methods return a single number for the entire array. In other words, the **`min`** method returns the smallest element of the entire array. You can also calculate the **`sum`**, **`min`**, and **`max`** for each row or column by specifying the axis parameter. 

In [43]:
a = np.random.rand(4,3)
a

array([[ 0.14884635,  0.02183561,  0.25646553],
       [ 0.23785761,  0.90276123,  0.26046412],
       [ 0.30789054,  0.91337381,  0.73480458],
       [ 0.94598519,  0.96340125,  0.52560488]])

In [44]:
a.sum(axis=0)

array([ 1.64057969,  2.8013719 ,  1.7773391 ])

In [45]:
a.sum(axis=1)

array([ 0.4271475 ,  1.40108296,  1.95606892,  2.43499131])

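In the case of a 2D array, **`axis=0`** runs along the rows (down each column), so `a.sum(axis=0)` returns one value per column. Likewise, **`axis=1`** runs along the columns (across each row), so `a.sum(axis=1)` returns one value per row.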

In [46]:
a.max(axis=1)

array([ 0.25646553,  0.90276123,  0.91337381,  0.96340125])

In [47]:
a.min(axis=0)

array([ 0.14884635,  0.02183561,  0.25646553])
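Because the random numbers above are hard to add up in your head, here is a sketch with a small integer array (the name `small` is only illustrative), where the effect of the `axis` parameter can be verified by hand.

```python
import numpy as np

small = np.array([[1, 2, 3],
                  [4, 5, 6]])      # 2 rows, 3 columns

col_sums = small.sum(axis=0)       # collapses the rows: one sum per column
row_sums = small.sum(axis=1)       # collapses the columns: one sum per row

print(col_sums)                    # [5 7 9]
print(row_sums)                    # [ 6 15]
print(small.sum())                 # 21, the sum of the entire array
```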

Finally, there are some basic math functions in NumPy, such as **`exp()`**, **`sin()`**, and **`sqrt()`**. You can use these functions on an array. Each element is used in the calculation of a function. 

In [50]:
angle = np.linspace(0,1,5) * np.pi
angle

array([ 0.        ,  0.78539816,  1.57079633,  2.35619449,  3.14159265])

In [51]:
np.sin(angle)

array([  0.00000000e+00,   7.07106781e-01,   1.00000000e+00,
         7.07106781e-01,   1.22464680e-16])

In [52]:
b = np.arange(6)
b

array([0, 1, 2, 3, 4, 5])

In [53]:
np.sqrt(b)

array([ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ,
        2.23606798])
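Notice that the sine of π came out as `1.22464680e-16` rather than exactly 0. This is ordinary floating-point rounding error. The sketch below shows one way to deal with it, assuming you want to compare or display such values: `np.isclose()` tests for equality within a small tolerance, and `np.round()` rounds for display.

```python
import numpy as np

angle = np.linspace(0, 1, 5) * np.pi
s = np.sin(angle)

print(s[-1] == 0)                 # False: tiny rounding error remains
print(np.isclose(s[-1], 0))       # True: equal within floating-point tolerance
print(np.round(s, 3))             # rounded for display
```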

### Exercise
1. You have an array
```python
m = np.array([[8, 3, 4], [1, 5, 9], [6, 7, 2]])
```
Calculate the `sum`, `min`, and `max` across the rows and across the columns.
2. **Temperature conversion**. You have an array of temperatures in Celsius
```python
tempC = np.arange(-40,41,10)
```
Convert the temperatures to Fahrenheit by the formula
```python
F = C * 1.8 + 32
```
3. **Random sample**. The function **`np.random.randn()`** produces random numbers following a Gaussian distribution (i.e., normal distribution). Create a 2D array of $1000 \times 20$ of Gaussian random numbers. Then calculate the mean and standard deviation (with **`mean()`** and **`std()`** methods, respectively) across rows. In theory, the mean should be 0 and the standard deviation should be 1. 

   ***Hint***: *`np.random.randn(X,Y)` produces a 2D array of random numbers with X rows and Y columns.*

# 5. Linear algebra – matrix-like operations
<hr style="height:1px;border:none" />

The multiplication operation described above is element-wise multiplication. If you remember from your linear algebra class, you can multiply two matrices by following the matrix multiplication rule. You can perform such matrix multiplication in NumPy as well, with the **`np.dot()`** function.

In [4]:
U = np.random.randn(1,4)
U

array([[ 0.52428626,  0.40943156, -2.78561054, -1.00963066]])

In [5]:
V = np.ones([1,4])
V

array([[1., 1., 1., 1.]])

In [6]:
W = np.arange(0,5)
W

array([0, 1, 2, 3, 4])

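In [7]:
A = np.arange(0,16).reshape(4,4)
A

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [56]:
AV = np.dot(A,V.T)
AV

array([[ 6.],
       [22.],
       [38.],
       [54.]])

Here `V` is a 1x4 row vector, so it is transposed into a 4x1 column vector before it is multiplied by the 4x4 matrix `A`; the inner dimensions of the two arrays have to match. If `V.T` were an *eigenvector* of `A`, the product `AV` would be `V.T` times a single number (its eigenvalue), and dividing `AV` by `V.T` element by element would give the same value in every row.

In [57]:
AV / V.T

array([[ 6.],
       [22.],
       [38.],
       [54.]])

The rows differ, so this `V` is not an eigenvector of `A`.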
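For a case where the eigenvector relation can be verified by hand, here is a small sketch. The 2x2 matrix `M` and the vector `v` are hypothetical examples chosen for this illustration, not data from these notes. The sketch also calls `np.linalg.eig()`, which computes all eigenvalues and eigenvectors of a square matrix.

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # hypothetical example matrix
v = np.array([[1.0],
              [1.0]])               # candidate eigenvector (column vector)

Mv = np.dot(M, v)
print(Mv / v)                       # every row gives 3.0, the eigenvalue

# np.linalg.eig returns all eigenvalues and (column) eigenvectors
values, vectors = np.linalg.eig(M)
print(np.sort(values))              # approximately [1. 3.]
```

Dividing `Mv` by `v` only works as a check when no element of `v` is zero.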

For a square matrix, its inverse can be calculated by the **`np.linalg.inv()`** function. 

In [58]:
B = np.random.randint(0,10,9).reshape(3,3)
B

array([[1, 4, 3],
       [4, 6, 4],
       [1, 6, 3]])

In [59]:
invB = np.linalg.inv(B)
invB

array([[-0.375,  0.375, -0.125],
       [-0.5  ,  0.   ,  0.5  ],
       [ 1.125, -0.125, -0.625]])

In [61]:
np.dot(B, invB)

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
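Two practical notes, sketched below with the same matrix `B` as above (typed in by hand here, since `np.random.randint()` would give different numbers each time). First, the `@` operator is a shorthand for matrix multiplication of 2D arrays. Second, a product such as `B @ invB` is only equal to the identity matrix up to rounding error, so it is safer to check it with `np.allclose()` than with `==`.

```python
import numpy as np

B = np.array([[1, 4, 3],
              [4, 6, 4],
              [1, 6, 3]])
invB = np.linalg.inv(B)

product = B @ invB                              # same as np.dot(B, invB)
print(np.allclose(product, np.eye(3)))          # True, up to rounding error
print(np.allclose(product, np.dot(B, invB)))    # True
```

Keep in mind that a randomly generated matrix may occasionally have no inverse, in which case `np.linalg.inv()` raises an error.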

# 6. Copy method
<hr style="height:1px;border:none" />

Remember that when you assign an existing list to a new name, both names refer to the same data? The same is true for arrays. For example,

In [63]:
a = np.arange(15)
b = a
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [64]:
b

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [65]:
a.shape = (3,5)
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [66]:
b

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

If you modify the shape of the array `a`, then the shape of array `b` is also altered. To overcome this problem, you can create a copy of the original array by using the **`copy`** method.

In [67]:
c = a.copy()
c

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [68]:
a.shape = (5,3)
a

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [69]:
c

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
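The same thing happens when you change an element rather than the shape. In this sketch, the names `alias` and `clone` are only illustrative.

```python
import numpy as np

a = np.arange(5)
alias = a            # a second name for the same array
clone = a.copy()     # an independent array with the same values

a[0] = 100
print(alias[0])      # 100: alias sees the change
print(clone[0])      # 0: the copy does not
print(alias is a, clone is a)    # True False
```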

# 7. Indexing and slicing
<hr style="height:1px;border:none" />

You can use an index or a slice to access an element or elements of an array.

In [2]:
import numpy as np
a = np.arange(10)**2
a

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

In [3]:
a[5]

25

In [4]:
a[7:]

array([49, 64, 81])

In [5]:
a[:-3]

array([ 0,  1,  4,  9, 16, 25, 36])

You can also replace the existing elements with new elements using an index or
a slice.

In [6]:
a[0] = -1
a

array([-1,  1,  4,  9, 16, 25, 36, 49, 64, 81])

In [7]:
a[::2] = 100
a

array([100,   1, 100,   9, 100,  25, 100,  49, 100,  81])

In other words, every other element is replaced by 100.

Unlike with a list, you can also use a list of indices to access several elements of an array at once.

In [11]:
a[[1,3,7,8]]

array([  1,   9,  49, 100])

In [12]:
a[[1,1,1,9]]

array([ 1,  1,  1, 81])

In [13]:
a[[7,6,8,1]]

array([ 49, 100, 100,   1])

For a 2D array, the first index corresponds to the row, and the second index
corresponds to the column. If you omit the second index, then the entire row is
returned. Both row and column indices can be a number (i.e., index) or a slice.

In [14]:
b = np.arange(15).reshape(3,5)
b

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [15]:
b[1]

array([5, 6, 7, 8, 9])

In [16]:
b[2,3]

13

In [17]:
b[0,:3]

array([0, 1, 2])

In [18]:
b[1:,3:]

array([[ 8,  9],
       [13, 14]])

In [19]:
b[:,2]

array([ 2,  7, 12])

In [20]:
b[:,::2]

array([[ 0,  2,  4],
       [ 5,  7,  9],
       [10, 12, 14]])

And, you can use a list for row indices or column indices as well.

In [21]:
b[1:,[1,1,3,3,2]]

array([[ 6,  6,  8,  8,  7],
       [11, 11, 13, 13, 12]])

In [22]:
b[:,[1,1,3,3,2]]

array([[ 1,  1,  3,  3,  2],
       [ 6,  6,  8,  8,  7],
       [11, 11, 13, 13, 12]])

In [23]:
b[1:,[2,4,2,4]]

array([[ 7,  9,  7,  9],
       [12, 14, 12, 14]])

In [24]:
b[[1,1,0],2:]

array([[7, 8, 9],
       [7, 8, 9],
       [2, 3, 4]])

In [25]:
b[[0,1,0,2,1],[1,1,3,3,2]]

array([ 1,  6,  3, 13,  7])

Note that, if you choose to use a list for both row and column indices, the lists of
indices have to have the same length. Otherwise you will get an error message.
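The reason is that the two lists are paired up element by element, as the following sketch illustrates. It also shows one possible way (indexing in two steps) to select every combination of certain rows and columns instead, and the error you get from lists of different lengths.

```python
import numpy as np

b = np.arange(15).reshape(3, 5)

# Two index lists are paired up element by element: (0, 1) and (2, 3)
print(b[[0, 2], [1, 3]])              # [ 1 13]

# To get every combination of rows 0, 2 and columns 1, 3, index in two steps
block = b[[0, 2]][:, [1, 3]]
print(block)

# Lists of different lengths cannot be paired
try:
    b[[0, 1, 2], [1, 3]]
except IndexError as err:
    print("IndexError:", err)
```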

You can substitute elements using slices or indices.

In [26]:
b[1:,2:] = -1
b

array([[ 0,  1,  2,  3,  4],
       [ 5,  6, -1, -1, -1],
       [10, 11, -1, -1, -1]])

You can apply the same principles of indexing and slicing to higher-dimension
arrays as well.

In [27]:
c = np.arange(60).reshape(3,4,5)
c

array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39]],

       [[40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])

In [28]:
c[:,:,0]

array([[ 0,  5, 10, 15],
       [20, 25, 30, 35],
       [40, 45, 50, 55]])

In [29]:
c[:2,:2,4]

array([[ 4,  9],
       [24, 29]])

### Exercise
1. You have an array
```python
array([[ 0, 5, 10, 15, 20],
       [25, 30, 35, 40, 45],
       [50, 55, 60, 65, 70],
       [75, 80, 85, 90, 95]])
```
generated by
```python
a = np.arange(0,100,5).reshape(4,5)
```
From this array, using indices and slices, create the following arrays.
  1. 
  ```python
  array([[15, 10],
         [40, 35],
         [65, 60],
         [90, 85]])
  ```
  2.
  ```python
  array([[ 0, 5, 10, 15, 20],
         [ 0, 5, 10, 15, 20],
         [ 0, 5, 10, 15, 20]])
  ```
  3.
  ```python
  array([[ 0, 10, 20],
         [50, 60, 70],
         [ 0, 10, 20],
         [50, 60, 70],
         [ 0, 10, 20],
         [50, 60, 70]])
  ```
2. **Patterned array**. Create an array of this pattern.
```python
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  2.,  2.,  2.,  2.],
       [ 1.,  2.,  3.,  3.,  3.],
       [ 1.,  2.,  3.,  4.,  4.],
       [ 1.,  2.,  3.,  4.,  5.]])
```

   ***Hint:*** *First create a 5x5 array of ones with `np.ones`. Then replace appropriate elements with 2, using row and column slices, i.e.,*
   ```python
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  2.,  2.,  2.,  2.],
       [ 1.,  2.,  2.,  2.,  2.],
       [ 1.,  2.,  2.,  2.,  2.],
       [ 1.,  2.,  2.,  2.,  2.]])
```
   *Repeat the process for 3, 4, and 5.*

# 8. Array in a `for` loop
<hr style="height:1px;border:none" />

You can use an array in a `for` loop, just like you use a list for a `for` loop. If you
have a 1D array, you can use the array just like using a list.

`<ForLoopExample.py>`

In [30]:
import numpy as np

a = np.arange(10)

for i in a:
    print(i)

0
1
2
3
4
5
6
7
8
9


This prints the numbers from 0 to 9. With a 2D
array, each row is iterated in a `for` loop. For example,

In [31]:
b = np.arange(15).reshape(5,3)

for row in b:
    print(row)

[0 1 2]
[3 4 5]
[6 7 8]
[ 9 10 11]
[12 13 14]


To use individual elements, rather than a row at a time, there are two different
approaches.

In [32]:
for row in b:
    for element in row:
        print(element)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14


This approach uses nested `for` loops, the outer loop for rows, and the inner loop
for elements in each row.

In [33]:
for element in b.flat:
    print(element)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14


This approach uses the **`flat`** attribute, which produces an *iterator* (a series
of items to be iterated in a `for` loop) that goes over each element in the array
as if it were a 1D array.
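If you also need to know where each element sits in the array, one option is the `np.ndenumerate()` function, which yields the index together with the value. This is a minimal sketch with a small example array.

```python
import numpy as np

b = np.arange(6).reshape(2, 3)

# b.flat yields values only; np.ndenumerate also yields each (row, column) index
for (row, col), value in np.ndenumerate(b):
    print(row, col, value)
```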

# 9. Broadcasting
<hr style="height:1px;border:none" />

When two arrays are used in a mathematical operation, both arrays should normally
have the same shape. However, one of them can consist of just a single column or
a single row (for example, a 1D array).

In [35]:
a = np.arange(10).reshape(2,5)
a

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [36]:
b = np.array([[10,5]]).T
b

array([[10],
       [ 5]])

In [37]:
a*b

array([[ 0, 10, 20, 30, 40],
       [25, 30, 35, 40, 45]])

Notice that **`a`** and **`b`** have different shapes (2x5 and 2x1). However, `b` has the same number
of rows as `a`. So, conceptually, `b` is repeated 5 times in the column direction to
create an array of the same size as `a`.

In [38]:
bFull=np.array([[10]*5,[5]*5])
bFull

array([[10, 10, 10, 10, 10],
       [ 5,  5,  5,  5,  5]])

In [39]:
a*bFull

array([[ 0, 10, 20, 30, 40],
       [25, 30, 35, 40, 45]])

The missing portion of the array is filled by repeating a 1D vector. This is known
as ***broadcasting***. Broadcasting can occur in the column direction (as you just
saw), or in the row direction as well.

In [39]:
c = np.arange(0,10,2)
c

array([0, 2, 4, 6, 8])

In [40]:
a*c

array([[ 0,  2,  8, 18, 32],
       [ 0, 12, 28, 48, 72]])
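The sketch below summarizes both cases by looking at the shapes involved. It also uses `np.broadcast_to()`, which shows the repeated array that NumPy conceptually works with, and demonstrates that shapes which do not line up produce an error. The names `col` and `row` are only illustrative.

```python
import numpy as np

a = np.arange(10).reshape(2, 5)     # shape (2, 5)
col = np.array([[10], [5]])         # shape (2, 1): stretched across columns
row = np.arange(0, 10, 2)           # shape (5,):   stretched across rows

print((a * col).shape)              # (2, 5)
print((a * row).shape)              # (2, 5)

# np.broadcast_to shows the "stretched" array that is used conceptually
print(np.broadcast_to(col, a.shape))

# Shapes that do not line up raise an error
try:
    a * np.arange(3)                # shape (3,) vs. (2, 5)
except ValueError as err:
    print("ValueError:", err)
```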

### Exercise
1. **Broadcasting**. Say, you have a 6x8 array of ones generated by `np.ones`:
```python
array([[1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1.]])
```
Then with broadcasting, modify this array to:
```python
array([[1., 0., 2., 0., 3., 0., 4., 0.],
       [1., 0., 2., 0., 3., 0., 4., 0.],
       [1., 0., 2., 0., 3., 0., 4., 0.],
       [1., 0., 2., 0., 3., 0., 4., 0.],
       [1., 0., 2., 0., 3., 0., 4., 0.],
       [1., 0., 2., 0., 3., 0., 4., 0.]])
```
Then further modify this array to:
```python
array([[1., 0., 2., 0., 3., 0., 4., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 2., 0., 3., 0., 4., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 2., 0., 3., 0., 4., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.]])
```

# 10. Finding elements satisfying conditions
<hr style="height:1px;border:none" />

With an array, it is easy to generate a sub-array satisfying certain conditions. For
example, let's say you have the following data.

`<ExampleData.py>`

In [49]:
import numpy as np

subjID = np.array(['sub001']*3 + ['sub005']*4 + ['sub010']*3)
RT = np.array([ 98,  96,  86,  90,  95,  80, 117,  90, 114, 113])
score = np.array([0, 0, 1, 0, 1, 0, 0, 0, 1, 0])

print('ID\tRT\tScore')
for i,iID in enumerate(subjID):
    print(iID, RT[i], score[i], sep='\t')

ID	RT	Score
sub001	98	0
sub001	96	0
sub001	86	1
sub005	90	0
sub005	95	1
sub005	80	0
sub005	117	0
sub010	90	0
sub010	114	1
sub010	113	0


Then you can create sub-arrays based on a particular subject ID.

In [50]:
subjID[subjID=='sub005']

array(['sub005', 'sub005', 'sub005', 'sub005'], dtype='<U6')

In [51]:
RT[subjID=='sub005']

array([ 90,  95,  80, 117])

In [52]:
score[subjID=='sub005']

array([0, 1, 0, 0])

Or, you can create sub-arrays with `RT`>100ms.

In [53]:
RT[RT>100]

array([117, 114, 113])

In [54]:
subjID[RT>100]

array(['sub005', 'sub010', 'sub010'], dtype='<U6')

In [55]:
score[RT>100]

array([0, 1, 0])

Or, you can create sub-arrays corresponding to score = 1.

In [56]:
score[score==1]

array([1, 1, 1])

In [57]:
subjID[score==1]

array(['sub001', 'sub005', 'sub010'], dtype='<U6')

In [58]:
RT[score==1]

array([ 86,  95, 114])

You can use this technique on 2D arrays as well.

In [59]:
dMat = np.array([[ 94, 116, 104,  97,  99],
                 [ 92, 107,  92, 103, 104],
                 [115, 112,  81,  90,  90],
                 [ 94, 100,  90,  92, 114]])

dMat[dMat>95]

array([116, 104,  97,  99, 107, 103, 104, 115, 112, 100, 114])

In [62]:
dMat[dMat>95].size

11

Notice that it returns a 1D array.
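What makes this work is that a condition such as `RT>100` is itself an array of `True` and `False` values, which is then used to pick out elements. The sketch below, using the same example data as above, prints such an array and shows how two conditions might be combined with `&` (and) or `|` (or). The name `mask` is only illustrative.

```python
import numpy as np

subjID = np.array(['sub001']*3 + ['sub005']*4 + ['sub010']*3)
RT = np.array([ 98,  96,  86,  90,  95,  80, 117,  90, 114, 113])
score = np.array([0, 0, 1, 0, 1, 0, 0, 0, 1, 0])

mask = RT > 100
print(mask)                 # True where the condition holds
print(mask.sum())           # True counts as 1, so this counts matches: 3

# Combine conditions with & (and) or | (or); each needs its own parentheses
print(RT[(subjID == 'sub005') & (RT < 100)])    # [90 95 80]
print(RT[(RT < 90) | (RT > 115)])               # [ 86  80 117]
```

The parentheses around each condition are required, because `&` and `|` are evaluated before comparisons such as `<` and `==`.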

### Exercise
You have two arrays of the same size.

`<RTData.py>`

In [63]:
import numpy as np

RTMat = np.array([[111, 100,  86, 120,  91],
                  [ 92,  83, 105, 103, 112],
                  [117, 121, 124, 111, 110],
                  [111,  86, 113,  88, 105]])
scoreMat = np.array([[1, 0, 0, 0, 0],
                     [0, 1, 1, 0, 1],
                     [0, 1, 1, 0, 0],
                     [0, 0, 1, 1, 0]])


where **`RTMat`** corresponds to the response times (RT) in ms and **`scoreMat`**
corresponds to the scores from the same experiment.

1. **Low or high RT**. Count the number of observations with RT<90. Similarly, count the number of observations with RT>115. (***Hint:*** *You can use the `size` attribute of an array*.) Moreover, calculate the mean score for observations with RT<90, as well as the mean score for observations with RT>115.

# 11. Saving and loading arrays
<hr style="height:1px;border:none" />

You may recall that reading and writing files in Python requires some effort on your
part. If you are only interested in saving and loading array data, then you can use
some useful functions from `NumPy`.

### Saving arrays to a file
Recall the data we used earlier.

`<SaveData.py>`

In [63]:
import numpy as np

subjID = np.array(['sub001']*3 + ['sub005']*4 + ['sub010']*3)
RT = np.array([ 98,  96,  86,  90,  95,  80, 117,  90, 114, 113])
score = np.array([0, 0, 1, 0, 1, 0, 0, 0, 1, 0])

np.savez('ArrayData.npz', subjID=subjID, RT=RT, score=score)

In this data set, we want to save 3 arrays, `subjID`, `RT`, and `score`. We can do
this by
```python
np.savez('ArrayData.npz', subjID=subjID, RT=RT, score=score)
```
In the **`savez`** function, the first argument is the name of the file to be written to. It
has the **`.npz`** extension, indicating that this is a `NumPy` archive where multiple
arrays are saved together. You need to specify the names of the arrays to be
saved. You have to say, for example, `subjID=subjID`, to indicate that you want
to save an array named `subjID` and that it will be referred to as `subjID` when the file
is read.

### Loading arrays from a file
Now, let's try reading the arrays you saved to a file.

`<LoadData.py>`

In [65]:
import numpy as np

infile = np.load('ArrayData.npz')

subjID = infile['subjID']
RT = infile['RT']
score = infile['score']

The **`load`** function opens the file and returns an `npz` file object, here called `infile`. You can use a name other than `infile`, if you would like. To access arrays in the file, you access `infile` with the appropriate array names; each array is read from the file at that point. When you no longer need the file, you can close it with `infile.close()`.

If you simply want to check the content of `infile`, then you can use the **`keys()`** method.

In [69]:
list(infile.keys())

['subjID', 'RT', 'score']

It returns the names of the variables stored in `infile`. If you would like to see the complete content of `infile` (including the variable names and values), then you can use the **`items()`** method.

In [66]:
list(infile.items())

[('subjID', array(['sub001', 'sub001', 'sub001', 'sub005', 'sub005', 'sub005',
         'sub005', 'sub010', 'sub010', 'sub010'], 
        dtype='<U6')),
 ('RT', array([ 98,  96,  86,  90,  95,  80, 117,  90, 114, 113])),
 ('score', array([0, 0, 1, 0, 1, 0, 0, 0, 1, 0]))]
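To convince yourself that what you load is what you saved, you can do a quick round trip. The following is a sketch under a few assumptions: it uses short made-up arrays, a hypothetical file name `Demo.npz`, and a temporary folder so that no file is left behind. It also opens the file in a `with` statement, which closes the file automatically at the end of the block.

```python
import os
import tempfile
import numpy as np

RT = np.array([98, 96, 86])
score = np.array([0, 0, 1])

# Write to a temporary folder so the sketch does not leave files behind
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'Demo.npz')      # hypothetical file name
    np.savez(path, RT=RT, score=score)

    # The with statement closes the file when the block ends
    with np.load(path) as infile:
        names = list(infile.keys())
        RT_loaded = infile['RT']
        score_loaded = infile['score']

print(names)
print(np.array_equal(RT, RT_loaded), np.array_equal(score, score_loaded))
```

The `np.array_equal()` function returns `True` only when two arrays have the same shape and the same elements, which makes it a convenient check here.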

### Exercise
You have these arrays:

`<RTData.py>`

In [71]:
import numpy as np

RTMat = np.array([[111, 100,  86, 120,  91],
                  [ 92,  83, 105, 103, 112],
                  [117, 121, 124, 111, 110],
                  [111,  86, 113,  88, 105]])
scoreMat = np.array([[1, 0, 0, 0, 0],
                     [0, 1, 1, 0, 1],
                     [0, 1, 1, 0, 0],
                     [0, 0, 1, 1, 0]])

1. Save these arrays to a file using the **`np.savez`** function.
2. Load these arrays from the file using the **`np.load`** function. Check whether the data have been loaded correctly.