# 2.0 NumPy



[NumPy](http://www.numpy.org/) is short for _numerical Python_. It provides functions that are especially useful when you work with large arrays and matrices of numeric data, such as matrix multiplication.

The array class (`ndarray`) is the foundation of NumPy. NumPy arrays are like Python lists, except that every element of an array must be of the same type, such as int or float. As a result, arrays provide much more efficient storage and data operations, especially as they grow larger. In other respects, NumPy arrays are very similar to Python's built-in list type.
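
As a small sketch of the same-type rule (the values below are made up for illustration), a Python list keeps each element's own type, while `np.array` converts all elements to one common type:

```python
import numpy as np

mixed_list = [1, 2.5, 3]            # a Python list can mix int and float
mixed_array = np.array(mixed_list)  # NumPy converts every element to one dtype

print([type(x).__name__ for x in mixed_list])  # ['int', 'float', 'int']
print(mixed_array.dtype)                       # float64

int_array = np.array([1, 2, 3])
int_array[0] = 9.7   # a float stored in an integer array is truncated to 9
print(int_array[0])  # 9
```

The single shared type is what lets NumPy store the data in one compact block of memory.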

### Create array from lists:

In [2]:
import numpy as np # similar to library() in R

my_list = [[1,2,3,4,5],[6,7,8,9,10]]

array = np.array(my_list)

print(array, type(array))

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]] <class 'numpy.ndarray'>


In [23]:
print(np.zeros((3,4))) # default output is float
print(np.ones((5,2)))
print(np.zeros((3,3), dtype=int)) # change to int

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[[1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]]
[[0 0 0]
 [0 0 0]
 [0 0 0]]


In [4]:
print(np.arange(10000))

[   0    1    2 ... 9997 9998 9999]


In [5]:
np.random.random((3, 3))

array([[0.82396768, 0.27970997, 0.40683874],
       [0.80577538, 0.53455264, 0.5669847 ],
       [0.11771515, 0.15585651, 0.88625163]])

### Exercise:
Create a 3x3 array of normally distributed random values with mean 0 and standard deviation 1

In [26]:
mean = 0
std = 1

print(np.random.normal(loc=mean, scale=std, size=(3, 3))) # keyword arguments can be passed in any order


[[-0.919022    1.52786561 -1.2372882 ]
 [ 2.06978568  1.27877935 -0.73210131]
 [ 1.48596831  0.41345349 -1.68177482]]


### Vectorization

In [27]:
my_list = [1,2,3,4,5]

my_list + my_list # + concatenates two lists

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

### try:
np.array(my_list) + np.array(my_list)

In [28]:
np.array(my_list) + np.array(my_list) # element-wise addition instead of concatenation

array([ 2,  4,  6,  8, 10])

In [9]:
print([x*2 for x in my_list])

[2, 4, 6, 8, 10]
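
The contrast between the list comprehension and the array expression can be sketched side by side. This is an illustrative example that reuses `my_list` from the cells above; the other variable names are made up here:

```python
import numpy as np

my_list = [1, 2, 3, 4, 5]
my_array = np.array(my_list)

doubled_list = [x * 2 for x in my_list]  # explicit Python loop over the list
doubled_array = my_array * 2             # vectorized: applied to every element at once

print(doubled_list)   # [2, 4, 6, 8, 10]
print(doubled_array)  # [ 2  4  6  8 10]
print(my_list * 2)    # list repetition: [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
```

Note that `*` on a list repeats the list, while `*` on an array multiplies each element.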


In [10]:
np.random.random((3, 3))

array([[0.34218053, 0.53363063, 0.24424857],
       [0.70241995, 0.70966853, 0.05307202],
       [0.35284488, 0.07046519, 0.23331493]])

### Indexing


In [11]:
rand_num = np.random.random((3, 3))

rand_num

array([[0.38501068, 0.54831922, 0.35722987],
       [0.34839281, 0.33225265, 0.60492038],
       [0.97907124, 0.34304265, 0.84129883]])

In [12]:
rand_num[:,0:1]

array([[0.38501068],
       [0.34839281],
       [0.97907124]])

In [13]:
rand_num[:,0:3]

array([[0.38501068, 0.54831922, 0.35722987],
       [0.34839281, 0.33225265, 0.60492038],
       [0.97907124, 0.34304265, 0.84129883]])
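
The slices above keep the result two-dimensional. A short sketch on a small fixed array (chosen here so the output is predictable) shows how a slice differs from an integer index:

```python
import numpy as np

grid = np.arange(9).reshape(3, 3)  # rows: [0 1 2], [3 4 5], [6 7 8]

first_col_2d = grid[:, 0:1]  # a slice keeps the column axis: shape (3, 1)
first_col_1d = grid[:, 0]    # an integer index drops it: shape (3,)
last_row = grid[-1, :]       # negative indices count from the end

print(first_col_2d.shape)  # (3, 1)
print(first_col_1d)        # [0 3 6]
print(last_row)            # [6 7 8]
```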

In [14]:
mask = (0.2 < rand_num) & (rand_num < 0.7) # & combines the two conditions element-wise
rand_num[mask]

array([0.38501068, 0.54831922, 0.35722987, 0.34839281, 0.33225265,
       0.60492038, 0.34304265])

In [29]:
np.where(mask) # row indices and column indices where mask is True

(array([0, 0, 0, 1, 1, 1, 2], dtype=int64),
 array([0, 1, 2, 0, 1, 2, 1], dtype=int64))
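
To make the mask and `np.where` results easier to follow, here is a minimal sketch on a small hand-written array (the values are illustrative, not taken from the cells above):

```python
import numpy as np

values = np.array([[0.1, 0.5, 0.9],
                   [0.3, 0.8, 0.6]])

# Boolean array of the same shape: True where 0.2 < value < 0.7
mask = (values > 0.2) & (values < 0.7)

selected = values[mask]      # 1-D array of the elements where mask is True
rows, cols = np.where(mask)  # row and column index of each True entry

print(selected)    # [0.5 0.3 0.6]
print(rows, cols)  # [0 1 1] [1 0 2]
```

Reading `rows` and `cols` pairwise gives the positions (0, 1), (1, 0) and (1, 2).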

In [16]:
rand_num*rand_num # element-wise product, not matrix multiplication

array([[0.14823322, 0.30065397, 0.12761318],
       [0.12137755, 0.11039182, 0.36592866],
       [0.9585805 , 0.11767826, 0.70778372]])

### Matrix multiplication

In [17]:
np.dot(rand_num,rand_num)

array([[0.68901719, 0.51583435, 0.76976385],
       [0.84214953, 0.50893579, 0.83436152],
       [1.32015796, 0.93942179, 1.2650507 ]])
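
The difference between `*` and `np.dot` is easier to check by hand on small integer matrices. The sketch below uses two made-up 2x2 arrays and also shows the `@` operator, which computes the same matrix product as `np.dot` for 2-D arrays:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[0, 1],
              [1, 0]])

elementwise = a * b       # multiplies matching entries
product = np.dot(a, b)    # matrix product (rows of a times columns of b)
product_at = a @ b        # same matrix product, written with the @ operator

print(elementwise.tolist())  # [[0, 2], [3, 0]]
print(product.tolist())      # [[2, 1], [4, 3]]
print(np.array_equal(product, product_at))  # True
```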

### Array Concatenation and Splitting

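`np.concatenate`: join arrays along an existing axis (for example, `axis=1`)

`np.split`: split an array into sub-arrays

`np.hstack`: stack arrays horizontally (column-wise)

`np.vstack`: stack arrays vertically (row-wise)

`np.dstack`: stack arrays along the third axis

`np.floor`: round each element down (an element-wise function, not a concatenation or splitting function)

`np.hsplit`: split an array horizontally (column-wise)

`np.vsplit`: split an array vertically (row-wise)

`np.dsplit`: split an array along the third axis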

In [18]:
a = np.arange(5)
print(a)
print()

np.hstack((a,a))

[0 1 2 3 4]



array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])
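
For two-dimensional arrays the stacking direction matters. The following is a small sketch, with illustrative variable names, of stacking along rows and columns and then splitting the result back:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)  # rows: [0 1 2], [3 4 5]

stacked_rows = np.vstack((a, a))               # stack vertically: shape (4, 3)
stacked_cols = np.concatenate((a, a), axis=1)  # join along columns: shape (2, 6)
left, right = np.hsplit(stacked_cols, 2)       # split back into two (2, 3) halves

print(stacked_rows.shape)  # (4, 3)
print(stacked_cols.shape)  # (2, 6)
print(np.array_equal(left, a), np.array_equal(right, a))  # True True
```

Here `np.concatenate` with `axis=1` gives the same result as `np.hstack((a, a))`.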

## **Exercises:**


1. Create a 3x3 matrix with values ranging from 0 to 8
2. Create a 10x10 array with random values and find the minimum and maximum values
3. Create an 8x8 matrix and fill it with a checkerboard pattern
4. Create a random vector of size 10 and replace the maximum value by 0
5. Create a $4 \times 4$ identity matrix.
6. Generate the 2D array
7. Generate a random $4 \times 4 \times 4$ array of normally distributed (Gaussian) numbers.
8. Generate `n` evenly spaced intervals between 0. and 1.
9. Create a vector and then reverse the vector (first element becomes last)


Please feel free to discuss with all of us or refer to Prof Google. 

In [40]:
# Q1
print(np.arange(9).reshape(3,3)) # the values 0 to 8, arranged in 3 rows of 3

[[0 1 2]
 [3 4 5]
 [6 7 8]]


In [48]:
# Q2
arr_q2 = np.random.random(size=(10,10))
print(arr_q2)
print("Max: ", np.max(arr_q2))
print("Min: ", np.min(arr_q2))

[[0.10237807 0.89781423 0.8038247  0.56562217 0.85207539 0.30626854
  0.76155203 0.2610437  0.52257398 0.12578415]
 [0.69157113 0.14025948 0.6938098  0.05462377 0.97894926 0.07218591
  0.90111648 0.24738612 0.40417875 0.12450764]
 [0.61209069 0.99845081 0.55087253 0.82432879 0.74228662 0.21090556
  0.11956486 0.64841877 0.33180578 0.61263802]
 [0.91727831 0.01413317 0.68327732 0.96719022 0.38266941 0.1856443
  0.55478242 0.33725077 0.96077978 0.22589015]
 [0.9446921  0.64119141 0.33263738 0.56112194 0.64490385 0.67366294
  0.16846538 0.28585089 0.13887698 0.17130864]
 [0.26815239 0.8594457  0.5787721  0.97051221 0.87632551 0.18778017
  0.08331895 0.9126203  0.52481337 0.60229948]
 [0.95603344 0.13700458 0.05193936 0.2060613  0.61326252 0.78977059
  0.82350368 0.68093388 0.17181994 0.71835954]
 [0.6845406  0.01688494 0.56611338 0.45334052 0.63832033 0.10914807
  0.75732309 0.80705481 0.39142089 0.8810772 ]
 [0.16642043 0.70953178 0.32288412 0.90075453 0.51012352 0.88421087
  0.17621862 

In [47]:
# Q3
board = np.zeros((8,8), dtype=int)
board[1::2, ::2] = 1 # odd rows, even columns
board[::2, 1::2] = 1 # even rows, odd columns
print(board)

[[0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]]


In [70]:
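# Q4
# The exercise asks for a vector of size 10; this solution uses a 10x10 array, and the same approach works for a 1-D vector
arr_q4 = np.random.randint(low=1, high=1000, size=(10,10))
print(arr_q4)
max_value = np.max(arr_q4) # avoid shadowing the built-in max()
print("Max: ", max_value)
print(np.where(arr_q4==max_value, 0, arr_q4))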

[[ 77  87 161  57 549 948 754 566 485  44]
 [783 793  92  25 641 417 141 908 269 217]
 [815 976 353 381 328 996 246 550  15  98]
 [721 850  16 973 589 940 275 214 454 492]
 [878 140 409 274 753 247 520  84 275 818]
 [425 348 713 376 844 634 911 610 855 430]
 [793 391 502 867 352 116 264  70 206 946]
 [145 832 675 329 886 927 546 731  13 762]
 [695 678 525 981 352 723 367 877 757 397]
 [769 692 161 516 250  78 146 612 848 659]]
Max:  996
[[ 77  87 161  57 549 948 754 566 485  44]
 [783 793  92  25 641 417 141 908 269 217]
 [815 976 353 381 328   0 246 550  15  98]
 [721 850  16 973 589 940 275 214 454 492]
 [878 140 409 274 753 247 520  84 275 818]
 [425 348 713 376 844 634 911 610 855 430]
 [793 391 502 867 352 116 264  70 206 946]
 [145 832 675 329 886 927 546 731  13 762]
 [695 678 525 981 352 723 367 877 757 397]
 [769 692 161 516 250  78 146 612 848 659]]


In [94]:
# Q5
print(np.identity(n=4, dtype=int))

[[1 0 0 0]
 [0 1 0 0]
 [0 0 1 0]
 [0 0 0 1]]


In [77]:
# Q6
print(np.random.random(size=(2,2)))

[[0.30880981 0.85988466]
 [0.74904547 0.80790285]]


In [78]:
# Q7
print(np.random.normal(loc=0, scale=1, size=(4,4,4)))

[[[ 0.40329173 -0.51080233  0.86939476 -1.41817245]
  [-0.52184478  1.19310187 -1.31989437 -0.12147186]
  [ 0.62985406 -0.53029496 -0.40837981  0.17323516]
  [-0.08256632 -1.24442244 -0.03301453  1.90105465]]

 [[-0.51329099  0.37065265 -1.17584876 -1.01623065]
  [-0.46473857 -0.79047735 -2.39296892 -2.37681064]
  [ 1.94995276 -1.59686183 -1.66180944  0.99074864]
  [ 0.24983314  0.73695951  0.02663739 -0.32591877]]

 [[ 0.93129637  0.98192982 -1.15049425  1.16162096]
  [-1.59215339 -0.64431273 -0.0795374  -1.37704034]
  [ 1.04525576 -1.00067911 -0.50455034 -0.47222308]
  [ 0.0312661  -0.2700914   0.82969498  0.54328113]]

 [[ 0.25945139  1.85951177 -1.23137599  0.1890464 ]
  [-1.36827805  0.23651364 -1.74814699 -1.87399625]
  [-0.88359685  0.05419581 -1.65359725  0.76874014]
  [-0.35983244 -1.3152595   0.7232967   0.0030838 ]]]


In [87]:
# Q8
print(np.linspace(start=0, stop=1, num=10)) # num=10 gives 10 points, i.e. 9 intervals; use num=n+1 for n intervals

[0.         0.11111111 0.22222222 0.33333333 0.44444444 0.55555556
 0.66666667 0.77777778 0.88888889 1.        ]


In [93]:
# Q9
arr_q9 = np.arange(start=1, stop=10) # np.arange already returns an array
print(arr_q9)
print(np.flip(arr_q9))

[1 2 3 4 5 6 7 8 9]
[9 8 7 6 5 4 3 2 1]


### Data aggregation functions

NumPy provides many other aggregation functions, but we won't discuss them in detail here.
Most aggregates also have a ``NaN``-safe counterpart that computes the result while ignoring missing values, which are marked by the special IEEE floating-point ``NaN`` value.
The following table lists useful aggregation functions available in NumPy:

|Function Name      |   NaN-safe Version  | Description                                   |
|-------------------|---------------------|-----------------------------------------------|
| ``np.sum``        | ``np.nansum``       | Compute sum of elements                       |
| ``np.prod``       | ``np.nanprod``      | Compute product of elements                   |
| ``np.mean``       | ``np.nanmean``      | Compute mean of elements                      |
| ``np.std``        | ``np.nanstd``       | Compute standard deviation                    |
| ``np.var``        | ``np.nanvar``       | Compute variance                              |
| ``np.min``        | ``np.nanmin``       | Find minimum value                            |
| ``np.max``        | ``np.nanmax``       | Find maximum value                            |
| ``np.argmin``     | ``np.nanargmin``    | Find index of minimum value                   |
| ``np.argmax``     | ``np.nanargmax``    | Find index of maximum value                   |
| ``np.median``     | ``np.nanmedian``    | Compute median of elements                    |
| ``np.percentile`` | ``np.nanpercentile``| Compute rank-based statistics of elements     |
| ``np.any``        | N/A                 | Evaluate whether any elements are true        |
| ``np.all``        | N/A                 | Evaluate whether all elements are true        |

Source: Python Data Science Handbook
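
As a brief sketch of how the two columns of the table differ, the example below applies a regular aggregate and its ``NaN``-safe counterpart to a small made-up array containing one missing value:

```python
import numpy as np

readings = np.array([1.0, 2.0, np.nan, 4.0])  # one missing value

plain_sum = np.sum(readings)        # NaN propagates through the regular aggregate
safe_sum = np.nansum(readings)      # NaN is ignored: 1 + 2 + 4
safe_mean = np.nanmean(readings)    # mean of the three valid values
safe_argmax = np.nanargmax(readings)  # index of the largest valid value

print(plain_sum)    # nan
print(safe_sum)     # 7.0
print(safe_mean)    # about 2.33
print(safe_argmax)  # 3
```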

In [19]:
m = np.random.rand(3,3)
m

array([[0.22732079, 0.03440752, 0.92920289],
       [0.44612727, 0.96384581, 0.20754438],
       [0.90518019, 0.02208322, 0.85485957]])

In [20]:
print(m.mean())
print(np.mean(m))

0.5100635153695601
0.5100635153695601
