
The first thing we want to do is import numpy.

In [None]:
import numpy as np
import pandas as pd

Let us first define a Python list containing the ages of 6 people.

In [None]:
ages_list = [10, 5, 8, 32, 65, 43]
print(ages_list)


[10, 5, 8, 32, 65, 43]


There are 3 main ways to instantiate a NumPy ndarray object. One of these is to use `np.array(<collection>)`.

In [None]:
ages = np.array(ages_list)
print(type(ages))
print(ages)


<class 'numpy.ndarray'>
[10  5  8 32 65 43]


In [None]:
print(ages)
print("Size:\t" , ages.size)
print("Shape:\t", ages.shape)




[10  5  8 32 65 43]
Size:	 6
Shape:	 (6,)


In [None]:
zeroArr = np.zeros(5)
print(zeroArr)

[0. 0. 0. 0. 0.]
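The text does not spell out which three ways it has in mind. One plausible reading, offered here only as an assumption, is: converting an existing collection, asking for a pre-filled array, and generating a numeric sequence. A minimal sketch of those three routes:

```python
import numpy as np

# Route 1 (from the text): convert an existing Python collection
from_list = np.array([10, 5, 8])

# Route 2 (assumed): ask for a pre-filled array of a given shape
filled = np.zeros((2, 3))

# Route 3 (assumed): generate a numeric sequence
sequence = np.arange(3)

print(from_list.shape, filled.shape, sequence.shape)
```

The variable names are illustrative and do not appear elsewhere in the notebook.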


### Multi-dim

Now let us define a new list containing the weights of these 6 people.

In [None]:
weight_list = [32, 18, 26, 60, 55, 65]

Now, we define an ndarray containing all of this information, and again print the size and shape of the array.

In [None]:
people = np.array([ages_list, weight_list])

print("People:\t" , people)
print("Size:\t" , people.size)
print("Shape:\t", people.shape)

People:	 [[10  5  8 32 65 43]
 [32 18 26 60 55 65]]
Size:	 12
Shape:	 (2, 6)


In [None]:
people = people.reshape(12,1)
print("People:\t" , people)
print("Size:\t" , people.size)
print("Shape:\t", people.shape)

People:	 [[10]
 [ 5]
 [ 8]
 [32]
 [65]
 [43]
 [32]
 [18]
 [26]
 [60]
 [55]
 [65]]
Size:	 12
Shape:	 (12, 1)


###### Note: The new shape must be the same "size" as the old shape
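To make the note concrete, here is a small sketch of what "same size" means: the product of the new dimensions must equal the number of elements. The array `values` is a stand-in, not one of the notebook's variables.

```python
import numpy as np

values = np.arange(12)          # 12 elements

grid = values.reshape(3, 4)     # 3 * 4 == 12, allowed
auto = values.reshape(-1, 6)    # -1 lets NumPy infer the missing dimension (2)

message = None
try:
    values.reshape(5, 3)        # 5 * 3 == 15, not allowed
except ValueError as err:
    message = str(err)

print(grid.shape, auto.shape)
print(message)
```

Passing `-1` for one dimension is convenient when only the other dimension matters.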

### Exercise

* Generate a 1D numpy array with the values [7, 9, 65, 33, 85, 99]

* Generate a matrix (2D numpy array) of the values:

\begin{align}
  \mathbf{A} =
  \begin{pmatrix}
    1 & 2 & 4 \\
    2 & 3 & 0 \\
    0 & 5 & 1
  \end{pmatrix}
\end{align}

* Change the dimensions of this array to another permitted shape

In [None]:
# Generate a 1D numpy array with the values [7, 9, 65, 33, 85, 99]
arr1 = np.array([7, 9, 65, 33, 85, 99])
print(arr1)
print(arr1.shape)
print()

# Generate a matrix (2D numpy array) of the values of A
A = np.array([[1, 2, 4], [2, 3, 0], [0, 5, 1]])
print(A)
print()

# Change the dimensions of this array to another permitted shape
print(A.reshape(1, 9))  # 1 * 9 == 9 elements, same as 3 * 3
print()


[ 7  9 65 33 85 99]
(6,)

[[1 2 4]
 [2 3 0]
 [0 5 1]]

[[1 2 4 2 3 0 0 5 1]]



## Array Generation

Instead of defining an array manually, we can ask numpy to do it for us.

The `np.arange()` function creates a range of numbers with a user-defined step between each. The stop value itself is excluded.

In [None]:
five_times_table = np.arange(0, 55, 5)
five_times_table

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

The `np.linspace()` function produces evenly spaced values: you specify the start, the end, and how many values to return. Unlike `np.arange()`, the end value is included by default.

In [None]:
five_spaced = np.linspace(0,50,11)
print(five_spaced)

[ 0.  5. 10. 15. 20. 25. 30. 35. 40. 45. 50.]
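The difference between the two generators is easy to miss, so here is a short side-by-side sketch. The names `by_step`, `by_count` and `half_open` are illustrative only.

```python
import numpy as np

by_step = np.arange(0, 50, 5)                        # stop value 50 is excluded
by_count = np.linspace(0, 50, 11)                    # stop value 50 is included by default
half_open = np.linspace(0, 50, 10, endpoint=False)   # same values as by_step, as floats

print(by_step)
print(by_count)
print(half_open)
```

`np.arange()` is usually the better fit when the step matters, and `np.linspace()` when the number of points matters.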


The `np.repeat()` function repeats the value you pass a specified number of times.

In [None]:
twoArr = np.repeat(2, 10)
print(twoArr)

[2 2 2 2 2 2 2 2 2 2]


The `np.eye()` function creates an identity matrix for us.

In [None]:
identity_matrix = np.eye(6)
print(identity_matrix)

[[1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 1.]]


# Operations

There are many, many operations which we can perform on arrays. Below, we demonstrate a few.

What is happening in each line?

In [None]:
five_times_table

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

In [None]:
print("1:", 2 * five_times_table)
print("2:", 10 + five_times_table)
print("3:", five_times_table - 1)
print("4:", five_times_table/5)
print("5:", five_times_table **2)
print("6:", five_times_table < 20)

1: [  0  10  20  30  40  50  60  70  80  90 100]
2: [10 15 20 25 30 35 40 45 50 55 60]
3: [-1  4  9 14 19 24 29 34 39 44 49]
4: [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
5: [   0   25  100  225  400  625  900 1225 1600 2025 2500]
6: [ True  True  True  True False False False False False False False]
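Each of the lines above applies the operation to every element at once. The comparison in line 6 is especially useful because its boolean result can be used as a filter. A minimal sketch, using a local copy named `table`:

```python
import numpy as np

table = np.arange(0, 55, 5)

mask = table < 20          # elementwise comparison gives a boolean array
small = table[mask]        # boolean indexing keeps entries where mask is True
count = mask.sum()         # True counts as 1, so this counts the matches

print(small)
print(count)
```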


### Speed Test

If we compare how quickly NumPy performs these operations with the equivalent core Python code, we notice a substantial difference.

In [None]:
fives_list = list(range(0,5001,5))
fives_list

In [None]:
five_times_table_lge = np.arange(0,5001,5)
five_times_table_lge

array([   0,    5,   10, ..., 4990, 4995, 5000])

In [None]:
%timeit five_times_table_lge + 5

The slowest run took 18.20 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 1.69 µs per loop


In [None]:
%timeit [e + 5 for e in fives_list]

10000 loops, best of 5: 50.3 µs per loop
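The `%timeit` magic only works inside IPython or Jupyter. Outside a notebook, a rough equivalent can be sketched with the standard-library `timeit` module. The timings depend on the machine, so none are quoted here.

```python
import timeit
import numpy as np

values_list = list(range(0, 5001, 5))
values_arr = np.arange(0, 5001, 5)

# Both approaches add 5 to every element and should agree on the result
list_result = [e + 5 for e in values_list]
arr_result = values_arr + 5

# Time 1000 repetitions of each approach
list_time = timeit.timeit(lambda: [e + 5 for e in values_list], number=1000)
arr_time = timeit.timeit(lambda: values_arr + 5, number=1000)

print(f"list comprehension: {list_time:.4f} s for 1000 runs")
print(f"numpy addition:     {arr_time:.4f} s for 1000 runs")
```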


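Boolean string operations can also be performed on ndarrays. Note that `np.isin()` and Python's `in` operator compare whole elements, so neither finds the letter "e" inside the words; only the list comprehension checks each word for the substring.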

In [None]:
words = np.array(["ten", "nine", "eight", "seven", "six"])

print(np.isin(words, 'e'))

print("e" in words)
["e" in word for word in words]

[False False False False False]
False


[True, True, True, True, False]
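A vectorised substring test is also possible. The sketch below uses `np.char.find`, which returns the position of the first match in each element or -1 when there is none. It is one option among several, not necessarily the approach the notebook intends.

```python
import numpy as np

words = np.array(["ten", "nine", "eight", "seven", "six"])

# np.char.find returns the index of the first match, or -1 if there is none
contains_e = np.char.find(words, "e") >= 0
print(contains_e)

# np.isin compares whole elements, so it matches complete words only
whole_word = np.isin(words, ["nine", "six"])
print(whole_word)
```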

# Transpose

In [None]:
people.shape = (2, 6)
print(people, "\n")
print(people.T)

[[10  5  8 32 65 43]
 [32 18 26 60 55 65]] 

[[10 32]
 [ 5 18]
 [ 8 26]
 [32 60]
 [65 55]
 [43 65]]
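Transposing swaps the axes, so a `(rows, columns)` shape becomes `(columns, rows)`. A small sketch on a reduced copy of the data also shows that `.T` returns a view of the same memory rather than a copy:

```python
import numpy as np

people = np.array([[10, 5, 8], [32, 18, 26]])
flipped = people.T

print(people.shape, flipped.shape)   # the two axes are swapped

# .T returns a view, so a write through it is visible in the original
flipped[0, 1] = 99
print(people[1, 0])
```

If an independent array is needed, `people.T.copy()` avoids the shared memory.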


# Data Types

As previously mentioned, ndarrays can only have one data type. We can read it through the `.dtype` attribute, or request a specific one with the `dtype` argument when the array is created.

In [None]:
people.dtype

dtype('int64')

What is the data type of the below ndarray?

In [None]:
ages_with_strings = np.array([10, 5, 8, '32', '65', '43'])
ages_with_strings

array(['10', '5', '8', '32', '65', '43'], dtype='<U21')

What is the dtype of this array?

In [None]:
ages_with_strings = np.array([10, 5, 8, '32', '65', '43'], dtype='int32')
ages_with_strings

array([10,  5,  8, 32, 65, 43], dtype=int32)

What do you think has happened here?

In [None]:
ages_with_strings = np.array([10, 5, 8, '32', '65', '43'])
print(ages_with_strings)

['10' '5' '8' '32' '65' '43']


In [None]:
ages_with_strings.dtype = 'int32'
print(ages_with_strings)

[49 48  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 53  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 56  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 51 50  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0 54 53  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0 52 51  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0]


In [None]:
ages_with_strings.size

126

In [None]:
ages_with_strings.size/21

6.0

In [None]:
np.array([10, 5, 8, '32', '65', '43']).size

6

The correct way to change the data type of the ndarray is to use the `.astype()` method, demonstrated below.

In [None]:
ages_with_strings = np.array([10, 5, 8, '32', '65', '43'])
print(ages_with_strings)
print(ages_with_strings.astype('int32'))

['10' '5' '8' '32' '65' '43']
[10  5  8 32 65 43]
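The 126 values seen earlier follow from how the strings are stored: each `<U21` element occupies 21 four-byte code points, so 6 elements reinterpreted as `int32` become 6 × 21 = 126 integers. The sketch below contrasts reinterpreting bytes with converting values, using `.view()` as the explicit way to reinterpret. The array `codes` is illustrative.

```python
import numpy as np

codes = np.array([1, 2, 3], dtype="int32")

# .view reinterprets the same bytes under a new dtype (no conversion)
as_bytes = codes.view("uint8")

# .astype converts each value and returns a new array
as_float = codes.astype("float64")

print(as_bytes.size)   # 3 elements * 4 bytes each
print(as_float)
```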


### Exercise

* #### Create an array of string numbers, but use dtype to make it an array of floats.
* #### Transpose the matrix, printing the new size and shape.
* #### Use the .astype() method to convert the array to boolean.

In [None]:
#Create an array of string numbers, but use dtype to make it an array of floats.
#arrstr = np.array(["1","2","3","4","5","6","7","8","9"],dtype='int64')
arrstr = np.array(["1","2","3","4","5","6","7","8","9","10","11","12"])
print(arrstr)
print()


print("------float64---------")
print(arrstr.astype('float64'))
print(arrstr.shape)
print()



print("------reshape(4,3)---------")
arrnew = arrstr.astype('float64').reshape(4,3)
print(arrnew)
print()



print("------transpose---------")
print()
arrnew.T  # .T returns the transpose, here with shape (3, 4)






['1' '2' '3' '4' '5' '6' '7' '8' '9' '10' '11' '12']

------float64---------
[ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12.]
(12,)

------reshape(4,3)---------
[[ 1.  2.  3.]
 [ 4.  5.  6.]
 [ 7.  8.  9.]
 [10. 11. 12.]]

------transpose---------



array([[ 1.,  4.,  7., 10.],
       [ 2.,  5.,  8., 11.],
       [ 3.,  6.,  9., 12.]])

## Array Slicing Operations

As before, we can use square brackets and indices to access individual values, and the colon operator to slice the array.

In [None]:
five_times_table

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

In [None]:
five_times_table[0]

0

In [None]:
five_times_table[-1]

50

In [None]:
five_times_table[:4]

array([ 0,  5, 10, 15])

In [None]:
five_times_table[4:]

array([20, 25, 30, 35, 40, 45, 50])

We can also slice an n-dimensional ndarray by specifying a slice for each axis, separated by commas.

In [None]:
print(people)
people[:3, :3]

[[10  5  8 32 65 43]
 [32 18 26 60 55 65]]


array([[10,  5,  8],
       [32, 18, 26]])
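Note that the row slice `:3` asked for three rows but only two exist. Slices are clamped to the array bounds rather than raising an error. A few more two-axis slices, as a sketch on a local copy of the data:

```python
import numpy as np

people = np.array([[10, 5, 8, 32, 65, 43],
                   [32, 18, 26, 60, 55, 65]])

first_row = people[0, :]        # all columns of row 0
last_two_cols = people[:, -2:]  # all rows, last two columns
clamped = people[:3, :3]        # only 2 rows exist, so the row slice is clamped

print(first_row)
print(last_two_cols)
print(clamped.shape)
```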

### Exercise

* Create a numpy array with 50 zeros
* Create a np array of 2 repeated 20 times
* Create a numpy array from 0 to 2 $\pi$ in steps of 0.1

For one of the arrays generated:
* Get the first five values
* Get the last 3 values
* Get the 4th value to the 7th value

We can reverse an array by using `np.flip()` or by slicing with a negative step, as in `[::-1]`.

In [None]:
reverse_five_times_table = np.flip(five_times_table)
reverse_five_times_table

array([50, 45, 40, 35, 30, 25, 20, 15, 10,  5,  0])

In [None]:
reverse_five_times_table = five_times_table[-1::-1]
print(reverse_five_times_table)
five_times_table

[50 45 40 35 30 25 20 15 10  5  0]


array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

We can also use a slice step (`start::step`) to select every n-th element of the original array.

In [None]:
five_times_table[0::3] #Every 3rd element starting from 0

array([ 0, 15, 30, 45])
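The general slice form is `start:stop:step`, where any part may be omitted. A short sketch of a few combinations, using a local array named `table`:

```python
import numpy as np

table = np.arange(0, 55, 5)

reversed_all = table[::-1]      # whole array, reversed
odd_indices = table[1:8:2]      # indices 1, 3, 5, 7
backwards_2 = table[-1::-2]     # every 2nd element, from the last one backwards

print(reversed_all)
print(odd_indices)
print(backwards_2)
```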

In [None]:
# Create a numpy array with 50 zeros
arrazero = np.zeros(50)

# Create a np array of 2 repeated 20 times
arrrep2 = np.repeat(2, 20)
arrrep3 = np.repeat(3, 20)

# Build four consecutive integer ranges and zip them into rows of 4 values
arrrep4, arrrep5, arrrep6, arrrep7 = np.arange(-20, 0), np.arange(0, 20), np.arange(20, 40), np.arange(40, 60)

tuples = list(zip(arrrep4, arrrep5, arrrep6, arrrep7))
print(tuples)

ls1 = np.array(tuples)
print(ls1)

# Create a numpy array from 0 to 2π in steps of 0.1
arrpi = np.arange(0, 2 * np.pi, 0.1)
print(arrpi)
print()

# Get the first five values
print()
print("#####Get the first five values")
print(ls1[:5])

# Get the last 3 values
print()
print("#####Get the last 3 values")
print(ls1[-3:])

# Get the 4th value to the 7th value (indices 3 to 6)
print()
print("#####Get the 4th value to the 7th value")
print(ls1[3:7])


[(-20, 0, 20, 40), (-19, 1, 21, 41), (-18, 2, 22, 42), (-17, 3, 23, 43), (-16, 4, 24, 44), (-15, 5, 25, 45), (-14, 6, 26, 46), (-13, 7, 27, 47), (-12, 8, 28, 48), (-11, 9, 29, 49), (-10, 10, 30, 50), (-9, 11, 31, 51), (-8, 12, 32, 52), (-7, 13, 33, 53), (-6, 14, 34, 54), (-5, 15, 35, 55), (-4, 16, 36, 56), (-3, 17, 37, 57), (-2, 18, 38, 58), (-1, 19, 39, 59)]
[[-20   0  20  40]
 [-19   1  21  41]
 [-18   2  22  42]
 [-17   3  23  43]
 [-16   4  24  44]
 [-15   5  25  45]
 [-14   6  26  46]
 [-13   7  27  47]
 [-12   8  28  48]
 [-11   9  29  49]
 [-10  10  30  50]
 [ -9  11  31  51]
 [ -8  12  32  52]
 [ -7  13  33  53]
 [ -6  14  34  54]
 [ -5  15  35  55]
 [ -4  16  36  56]
 [ -3  17  37  57]
 [ -2  18  38  58]
 [ -1  19  39  59]]
[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.  1.1 1.2 1.3 1.4 1.5 1.6 1.7
 1.8 1.9 2.  2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.  3.1 3.2 3.3 3.4 3.5
 3.6 3.7 3.8 3.9 4.  4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.  5.1 5.2 5.3
 5.4 5.5 5.6 5.7 5.8 5.9 6.  6.1 6.2]


### Exercise
Take one of the arrays you defined and
* #### Reverse it
* #### Only keep every 4th element.
* #### Get every 2nd element, starting from the last and moving backwards.

In [169]:
# Reverse it
list2 = ls1[:5]
print(list2)
print()
print(list2[-1::-1])  # negative step: reverses the order of the rows only
print()
print(np.flip(list2)) # np.flip with no axis: reverses both rows and columns


# Only keep every 4th element, starting from index 0
print()
print(list2[0::4])

# Get every 2nd element, starting from the last and moving backwards
print()
print(list2[-1::-2])


[[-20   0  20  40]
 [-19   1  21  41]
 [-18   2  22  42]
 [-17   3  23  43]
 [-16   4  24  44]]

[[-16   4  24  44]
 [-17   3  23  43]
 [-18   2  22  42]
 [-19   1  21  41]
 [-20   0  20  40]]

[[ 44  24   4 -16]
 [ 43  23   3 -17]
 [ 42  22   2 -18]
 [ 41  21   1 -19]
 [ 40  20   0 -20]]

[[-20   0  20  40]
 [-16   4  24  44]]

[[-16   4  24  44]
 [-18   2  22  42]
 [-20   0  20  40]]


# Stats

In [170]:
np.array([1.65432, 5.98765]).round(2)

array([1.65, 5.99])

In [171]:
nums = np.arange(0, 4, 0.2555)

### Exercise

* Compute min, max, sum, mean, median, variance, and standard deviation of the above array, all to 2 decimal places.

In [None]:
print("min = ", np.min(nums).round(2))
print("max = ", np.max(nums).round(2))
print("sum = ", np.sum(nums).round(2))
print("mean = ", np.mean(nums).round(2))
print("median = ", np.median(nums).round(2))
print("var = ", np.var(nums).round(2))
print("std = ", np.std(nums).round(2))
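One detail worth knowing when comparing results across libraries: `np.var()` and `np.std()` divide by n by default (`ddof=0`, the population formulas), whereas pandas' `describe()` and `Series.std()` divide by n - 1 (`ddof=1`, the sample formulas). A minimal sketch with an illustrative array of the integers 0 to 19:

```python
import numpy as np

column = np.arange(0, 20)                 # the integers 0..19

population_std = np.std(column)           # ddof=0: divide by n
sample_std = np.std(column, ddof=1)       # ddof=1: divide by n - 1

print(round(float(population_std), 5))
print(round(float(sample_std), 5))
```

Passing `ddof=1` to NumPy reproduces the standard deviation that pandas reports for the same values.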

In [193]:
# Compute min, max, sum, mean, median, variance, and standard deviation of the above array, all to 2 decimal places.

df1 = pd.DataFrame(ls1, columns=["radius", "area", "height", "width"])
print(df1.describe())

# Flatten the 2D array into a plain Python list of all 80 values
newlist = ls1.flatten().tolist()

print()
print(newlist)

# The _val suffix avoids shadowing the built-ins min, max and sum
min_val = np.min(newlist).round(2)
max_val = np.max(newlist).round(2)
sum_val = np.sum(newlist).round(2)
mean_val = np.mean(newlist).round(2)
median_val = np.median(newlist).round(2)
var_val = np.var(newlist).round(2)
std_val = np.std(newlist).round(2)


print("\tmin\tmax\tsum\tmean\tmedian\tvar\tstd")
print(f"\t{min_val}\t{max_val}\t{sum_val}\t{mean_val}\t{median_val}\t{var_val}\t{std_val}")


         radius      area    height     width
count  20.00000  20.00000  20.00000  20.00000
mean  -10.50000   9.50000  29.50000  49.50000
std     5.91608   5.91608   5.91608   5.91608
min   -20.00000   0.00000  20.00000  40.00000
25%   -15.25000   4.75000  24.75000  44.75000
50%   -10.50000   9.50000  29.50000  49.50000
75%    -5.75000  14.25000  34.25000  54.25000
max    -1.00000  19.00000  39.00000  59.00000

[-20, 0, 20, 40, -19, 1, 21, 41, -18, 2, 22, 42, -17, 3, 23, 43, -16, 4, 24, 44, -15, 5, 25, 45, -14, 6, 26, 46, -13, 7, 27, 47, -12, 8, 28, 48, -11, 9, 29, 49, -10, 10, 30, 50, -9, 11, 31, 51, -8, 12, 32, 52, -7, 13, 33, 53, -6, 14, 34, 54, -5, 15, 35, 55, -4, 16, 36, 56, -3, 17, 37, 57, -2, 18, 38, 58, -1, 19, 39, 59]
	min	max	sum	mean	median	var	std
	-20	59	1560	19.5	19.5	533.25	23.09


## Random

With `np.random`, we can draw random samples from several distributions, for example to create synthetic training data.

The code below simulates 10 tosses of a fair coin.

In [None]:
flip = np.random.choice([0,1], 10)
flip

In [None]:
np.random.rand(10,20,9)

We can produce 1000 data points from a normal distribution by using `np.random.normal()`.

In [None]:
mu, sigma = 0, 0.1 # mean and standard deviation
s = np.random.normal(mu, sigma, 1000)
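A quick sanity check is to compare the sample mean and standard deviation with the requested `mu` and `sigma`. The sketch below uses a seeded `np.random.default_rng()` generator so that the draw is reproducible. The seed value and the name `rng` are arbitrary choices, not part of the notebook.

```python
import numpy as np

mu, sigma = 0, 0.1                     # mean and standard deviation
rng = np.random.default_rng(seed=0)    # seeded generator for reproducibility

sample = rng.normal(mu, sigma, 1000)

# With 1000 points, both estimates should land close to the requested values
print(sample.mean())
print(sample.std())
```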

### Exercise
* Simulate a six-sided die using numpy.random.choice(), and generate a list of the values you would obtain from 10 throws.
* Simulate a two-sided coin toss that is NOT fair: heads is twice as likely as tails.
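One possible approach to these two tasks, sketched with a seeded generator: the `p` argument of `choice` sets the probability of each outcome, so a coin with heads twice as likely as tails uses `p=[2/3, 1/3]`. The labels `"H"` and `"T"` and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Ten throws of a fair six-sided die
throws = rng.choice([1, 2, 3, 4, 5, 6], size=10)
print(throws)

# Unfair coin: heads ("H") is twice as likely as tails ("T")
tosses = rng.choice(["H", "T"], size=10_000, p=[2/3, 1/3])
heads_fraction = np.mean(tosses == "H")
print(heads_fraction)   # expected to be close to 2/3
```

The legacy `np.random.choice()` function accepts the same `size` and `p` arguments.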
