***
## 3.2 Numpy - Data Types, Functions, and Random Module

***
### Python3.1 Numpy Introduction
### Python3.2 Numpy DataTypes, Functions, and Random Module
### Python3.3 Numpy Iterating Over Arrays
### Python3.4 Numpy Manipulating Arrays
### Python3.5 Numpy Operations
### Python3.6 Numpy File Input and Output and Data Processing
### Python3.7 Numpy-Sort, Argsort, Nonzero, and Extract Functions
### Python3.8 Numpy BreakoutGroupExercises
### Python3.8 Numpy BreakoutGroupExercises - Solutions
***

***
## Table of Contents:

### Section 1. Data Types
#### 1) Use `np.array()` with the dtype keyword argument to explicitly define the type of the array data
#### 2) Use the **`.dtype`** property of an ndarray to see the data type of an array

### Section 2. Data Type Casting

Since Numpy arrays are statically typed, the type of an array does not change once it is created. But we can explicitly cast an array of one type to another using the **`astype()`** method (see also the similar `asarray` function).

### Section 3. Numpy Built-in Functions and Capabilities

#### Numpy arrays
Numpy arrays are the main way we will use Numpy throughout the course. **In this course we mostly work with two kinds of Numpy arrays**: `vectors` and `matrices`. Vectors are 1-dimensional arrays and matrices are 2-dimensional ones, although a Numpy array can have any number of dimensions.

#### 1) `np.array()` Function: Creates Numpy Arrays

#### 2) Copying Arrays
- h = a.view() Create a view of the array with the same data
- np.copy(a) Create a copy of the array
- h = a.copy() Create a deep copy of the array

#### 3) Sorting Arrays
- a.sort() Sort an array
- c.sort(axis=0) Sort the elements of an array's axis

#### 4) Using Array-Generating Functions to Create Initial Placeholders

##### a) `arange()` Function: Returns evenly spaced values within a given interval - Only includes the lower end
##### b) `zeros()` and `ones()` Functions: Generate arrays of zeros or ones
##### c) `linspace()` Function: Returns evenly spaced numbers over a specified interval - both end points ARE included
##### d) `diag()` Function: Creates a diagonal matrix
##### e) `eye()` Function: Creates an identity matrix
##### f)  `full()` Function: Create a constant array
##### g) `empty()` Function: Create an empty array

### Section 4. Numpy Random Module: a suite of functions based on pseudorandom number generation to create random number arrays

#### 1) `rand()` Function: Creates an array of the given shape and populates it with random samples from a uniform distribution over `[0, 1)`.
#### 2) `randn()` Function: Returns a sample (or samples) from the "standard normal" distribution. 
#### 3) `randint()` Function: Returns random integers from `low` (inclusive) to `high` (exclusive).


***

## Section 1. Data Types

NumPy supports a much greater variety of numerical types than Python does. The following list shows the different scalar data types defined in NumPy.

- bool_: Boolean (`True or False`) stored as a byte

- int_: Default integer type (same as C long; normally either int64 or int32)

- intc: Identical to C int (normally int32 or int64)

- intp: Integer used for indexing (same as C ssize_t; normally either int32 or int64)

- int8: Byte (-128 to 127)

- int16: Integer (-32768 to 32767)

- int32: Integer (-2147483648 to 2147483647)

- int64: Integer (-9223372036854775808 to 9223372036854775807)

- uint8: Unsigned integer (0 to 255)

- uint16: Unsigned integer (0 to 65535)

- uint32: Unsigned integer (0 to 4294967295)

- uint64: Unsigned integer (0 to 18446744073709551615)

- float_: Shorthand for float64 (this alias was removed in NumPy 2.0; use float64 instead)

- float16: Half precision float: sign bit, 5 bits exponent, 10 bits mantissa

- float32: Single precision float: sign bit, 8 bits exponent, 23 bits mantissa

- float64: Double precision float: sign bit, 11 bits exponent, 52 bits mantissa

- complex_: Shorthand for complex128 (this alias was removed in NumPy 2.0; use complex128 instead)

- complex64: Complex number, represented by two 32-bit floats (real and imaginary components)

- complex128: Complex number, represented by two 64-bit floats (real and imaginary components)

Note: **Common data types** that can be used with dtype are: **int, float, complex, bool, object,** etc.
We can also explicitly define the bit size of the data types, for example: `int64`, `int16`, `float128`, `complex128`. Note that `float128` is only available on some platforms.

Note: Each integer type can store a different range of values:

- int16: -32,768 to +32,767

- int32: -2,147,483,648 to +2,147,483,647

- int64: -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807
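These limits do not need to be memorized. As a small sketch (the loop and the variable names are only illustrative), `np.iinfo` reports the smallest and largest value that an integer type can store:

```python
import numpy as np

# Print the smallest and largest value each integer type can store
for int_type in (np.int8, np.int16, np.int32, np.int64, np.uint8):
    limits = np.iinfo(int_type)
    print(int_type.__name__, limits.min, limits.max)
```

For floating-point types, `np.finfo` plays the same role.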

In [89]:
import numpy as np
M = np.array([[1, 2], [3, 4]])
M

array([[1, 2],
       [3, 4]])

#### Note: A value assigned to an element is converted to the data type of the array. We get an error if the value cannot be converted:

In [90]:
M

array([[1, 2],
       [3, 4]])

In [91]:
M[0,0] = "hello"

ValueError: invalid literal for int() with base 10: 'hello'

In [92]:
M[1,0] = "5"

In [93]:
M

array([[1, 2],
       [5, 4]])

In [94]:
M[0,0] = 0
M

array([[0, 2],
       [5, 4]])
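The cells above can be summarized in one place. The following is a minimal sketch (the values `7.9` and `"hello"` are chosen only for illustration): the data type of the array stays fixed, values that can be converted are converted silently, and values that cannot be converted raise a `ValueError`.

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])   # integer array

M[0, 0] = 7.9    # a float is truncated to the integer 7
M[1, 0] = "5"    # a numeric string is converted to the integer 5

try:
    M[0, 1] = "hello"   # cannot be converted to an integer
except ValueError as err:
    print("Assignment failed:", err)

print(M)
print(M.dtype)   # still an integer type
```

Note that the float is truncated without any warning, so it is worth checking the `dtype` of an array before assigning into it.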

### 1) Use `np.array()` with the dtype keyword argument to explicitly define the type of the array data

In [95]:
M = np.array([[1, 2], [3, 4]], dtype=complex)
M

array([[1.+0.j, 2.+0.j],
       [3.+0.j, 4.+0.j]])

In [96]:
M = np.array([[0, 2], [3, 4]], dtype=bool)
M

array([[False,  True],
       [ True,  True]])

### 2) Use the **`.dtype`** property of an ndarray to see the data type of an array
You can also grab the data type of the object in the array:

In [97]:
M.dtype

dtype('bool')

## Section 2. Data Type Casting

Since Numpy arrays are statically typed, the type of an array does not change once it is created. But we can explicitly cast an array of one type to another using the **`astype()`** method (see also the similar `asarray` function). By default, `astype()` creates a new array of the new type and leaves the original array unchanged:

In [98]:
M2 = M.astype(float)
M2

array([[0., 1.],
       [1., 1.]])

In [99]:
M2.dtype

dtype('float64')

In [100]:
M3 = M.astype(bool)
M3

array([[False,  True],
       [ True,  True]])

In [101]:
M2.astype('str')  # the result is not assigned to a name, so M2 itself is unchanged
M2

array([[0., 1.],
       [1., 1.]])
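Casting can lose information. As a short sketch (the array `x` is made up for illustration), converting floats to integers with `astype()` drops the fractional part instead of rounding, and the original array keeps its type:

```python
import numpy as np

x = np.array([1.7, -2.3, 3.9])

y = x.astype(int)      # new integer array; fractions are dropped (truncation toward zero)
print(y)               # [ 1 -2  3]
print(x.dtype)         # float64: x itself is unchanged

z = np.round(x).astype(int)   # round first if rounding is what we want
print(z)               # [ 2 -2  4]
```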

## Section 3. Numpy Built-in Functions and Capabilities

We will focus on some of the most important aspects of Numpy: `vectors`, `arrays`, `matrices`, and `numbers` generation. Let's start by discussing arrays.

#### Numpy arrays
Numpy arrays are the main way we will use Numpy throughout the course. **In this course we mostly work with two kinds of Numpy arrays**: 

`vectors` (1-dimensional arrays) and `matrices` (2-dimensional arrays). A Numpy array can, however, have any number of dimensions.

### 1) `np.array()` Function: Creates NumPy arrays
From a Python list, we can create an array by directly converting a list or list of lists:

#### a) Cast a list to np.array as vector

In [102]:
my_list = [10,20,30]
my_list

[10, 20, 30]

In [103]:
np.array(my_list)

array([10, 20, 30])

#### b) Cast a list of lists to np.array as matrix

In [104]:
my_matrix = [[10,20,30],[40,50,60],[70,80,90]]
my_matrix

[[10, 20, 30], [40, 50, 60], [70, 80, 90]]

In [105]:
np.array(my_matrix)

array([[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]])
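Lists can be nested more than two levels deep. A minimal sketch (the array names are illustrative) that uses `ndim` to check the number of dimensions and `shape` to check the size along each dimension:

```python
import numpy as np

vector = np.array([10, 20, 30])
matrix = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
cube = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])   # list of lists of lists

for name, arr in [("vector", vector), ("matrix", matrix), ("cube", cube)]:
    print(name, arr.ndim, arr.shape)
```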

### 2) Copying Arrays

In [106]:
a = np.array([1,2,3])
b = np.array([(1.5,2,3), (4,5,6)], dtype = float)
c = np.array([[(1.5,2,3), (4,5,6)], [(3,2,1), (4,5,6)]], dtype = float)

In [107]:
h = a.view() # Create a view of the array with the same data
h

array([1, 2, 3])

In [108]:
p=np.copy(a) # Create a copy of the array
p

array([1, 2, 3])

In [109]:
p[2]=0
p

array([1, 2, 0])

In [110]:
a

array([1, 2, 3])

In [111]:
h = a.copy() # Create a deep copy of the array
h

array([1, 2, 3])

In [112]:
h[1]=3
h

array([1, 3, 3])
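The difference between a view and a copy only shows up when we modify one of them. As a small sketch (the names `v` and `c` are illustrative), a change made through a view is visible in the original array, while a change made to a copy is not:

```python
import numpy as np

a = np.array([1, 2, 3])
v = a.view()    # new array object that shares the same data as a
c = a.copy()    # independent array with its own data

v[0] = 99       # also changes a
c[1] = -1       # does not change a

print(a)                         # [99  2  3]
print(np.shares_memory(a, v))    # True
print(np.shares_memory(a, c))    # False
```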

### 3) Sorting Arrays
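- syntax: `np.sort(a, axis=-1, kind=None, order=None)`
  - Use the `order` keyword to specify a field to use when sorting a structured array
- The function `np.sort(a)` returns a sorted copy of an array, while the method `a.sort()` sorts the array in place and returns `None`.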


In [113]:
np.info(np.sort)

 sort(a, axis=-1, kind=None, order=None)

Return a sorted copy of an array.

Parameters
----------
a : array_like
    Array to be sorted.
axis : int or None, optional
    Axis along which to sort. If None, the array is flattened before
    sorting. The default is -1, which sorts along the last axis.
kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, optional
    Sorting algorithm. The default is 'quicksort'. Note that both 'stable'
    and 'mergesort' use timsort or radix sort under the covers and, in general,
    the actual implementation will vary with data type. The 'mergesort' option
    is retained for backwards compatibility.

    .. versionchanged:: 1.15.0.
       The 'stable' option was added.

order : str or list of str, optional
    When `a` is an array with fields defined, this argument specifies
    which fields to compare first, second, etc.  A single field can
    be specified as a string, and not all fields need be specified,
    but unspecified fields will still b

In [114]:
a.sort() # Sort an array
a

array([1, 2, 3])

In [115]:
b

array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])

In [116]:
b.sort(axis=1) # Sort in place along axis 1 (within each row); b is already sorted, so it does not change
b

array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])
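The arrays sorted above were already in order, so the effect of `axis` is hard to see. Here is a short sketch with an unsorted array (the values are made up for illustration) that also contrasts `np.sort()`, which returns a copy, with the `.sort()` method, which works in place:

```python
import numpy as np

m = np.array([[3, 9, 2],
              [8, 1, 7]])

by_row = np.sort(m, axis=1)   # sort the values within each row
by_col = np.sort(m, axis=0)   # sort the values within each column

print(by_row)   # [[2 3 9], [1 7 8]]
print(by_col)   # [[3 1 2], [8 9 7]]
print(m)        # unchanged, because np.sort returns a copy

returned = m.sort()   # in place, along the last axis by default
print(returned)       # None
print(m)              # m is now sorted within each row
```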

### 4) Using Array-Generating Functions to Create Initial Placeholders

For larger arrays it is impractical to initialize the data manually, using explicit Python lists. Instead we can use one of the many built-in functions in `Numpy` that generate arrays of different forms. Some of the more common are:

#### a) `arange(start, stop, step)` Function: Returns evenly spaced values within a given interval - **includes `start` but excludes `stop`**.

In [117]:
np.arange(0,7.1)

array([0., 1., 2., 3., 4., 5., 6., 7.])

In [118]:
# create a range
x = np.arange(0, 103, 3) # arguments: start, stop, step
x

array([  0,   3,   6,   9,  12,  15,  18,  21,  24,  27,  30,  33,  36,
        39,  42,  45,  48,  51,  54,  57,  60,  63,  66,  69,  72,  75,
        78,  81,  84,  87,  90,  93,  96,  99, 102])

In [119]:
x = np.arange(-1, 1, 0.1)
x

array([-1.00000000e+00, -9.00000000e-01, -8.00000000e-01, -7.00000000e-01,
       -6.00000000e-01, -5.00000000e-01, -4.00000000e-01, -3.00000000e-01,
       -2.00000000e-01, -1.00000000e-01, -2.22044605e-16,  1.00000000e-01,
        2.00000000e-01,  3.00000000e-01,  4.00000000e-01,  5.00000000e-01,
        6.00000000e-01,  7.00000000e-01,  8.00000000e-01,  9.00000000e-01])
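The value `-2.22044605e-16` in the output above appears where we would expect `0`: with a non-integer step, the values produced by `arange()` are affected by floating-point rounding error. Two common workarounds, shown here as a sketch and not as the only options, are to build the range from integers and scale it afterwards, or to use `linspace()` with `endpoint=False`:

```python
import numpy as np

# Option 1: integer range, then scale
x_scaled = np.arange(-10, 10) / 10

# Option 2: ask for 20 points and leave out the end point
x_lin = np.linspace(-1, 1, 20, endpoint=False)

print(x_scaled[10])    # 0.0
print(len(x_scaled), len(x_lin))
```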

In [120]:
# Students practice 
y = np.arange(-100,101,100)
y

array([-100,    0,  100])

#### b) `zeros()` and `ones()` Functions: Generate arrays of zeros or ones

In [121]:
import numpy as np
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [122]:
np.zeros((4,6))  # pass a tuple for two dimensional array

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [123]:
# Reshape
m = np.array([[1,2,3],[4,5,6]])
m

array([[1, 2, 3],
       [4, 5, 6]])

In [124]:
m.shape

(2, 3)

In [125]:
m.shape[0]

2

In [126]:
m.shape[1]

3

In [127]:
n = np.arange(0,30,3)
n

array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27])

In [128]:
n=n.reshape(2,5)
n

array([[ 0,  3,  6,  9, 12],
       [15, 18, 21, 24, 27]])

In [129]:
np.ones(10)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [130]:
np.ones((3,5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

#### c) `linspace(start, stop, num)` Function: 

Returns `num` evenly spaced numbers over a specified interval - both end points ARE included

In [131]:
o=np.linspace(0,4,9)
o

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. ])

In [135]:
o.resize(3,3)
o

array([[0. , 0.5, 1. ],
       [1.5, 2. , 2.5],
       [3. , 3.5, 4. ]])

In [136]:
o.reshape(3,3)

array([[0. , 0.5, 1. ],
       [1.5, 2. , 2.5],
       [3. , 3.5, 4. ]])
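`resize()` and `reshape()` look similar in the cells above but behave differently. A minimal sketch (the names `r`, `o2`, and `result` are illustrative): `reshape()` returns a new array object and leaves the original shape alone, while `resize()` changes the array itself and returns `None`.

```python
import numpy as np

o = np.linspace(0, 4, 9)
r = o.reshape(3, 3)         # returns a 3x3 array
print(o.shape)              # (9,): o keeps its original shape
print(r.shape)              # (3, 3)

o2 = np.linspace(0, 4, 9)
result = o2.resize(3, 3)    # changes o2 in place
print(result)               # None
print(o2.shape)             # (3, 3)
```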

In [137]:
np.linspace(0, 10, 10)

array([ 0.        ,  1.11111111,  2.22222222,  3.33333333,  4.44444444,
        5.55555556,  6.66666667,  7.77777778,  8.88888889, 10.        ])

In [138]:
# Exercise:
np.linspace(-5,10,5)

array([-5.  , -1.25,  2.5 ,  6.25, 10.  ])

#### d) `diag()` Function: Creates a diagonal matrix

In [139]:
# a diagonal matrix
np.diag([1,2,3,4,5])

array([[1, 0, 0, 0, 0],
       [0, 2, 0, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 0, 4, 0],
       [0, 0, 0, 0, 5]])

In [140]:
# diagonal with offset from the main diagonal
np.diag([1,2,3], k=1) 

array([[0, 1, 0, 0],
       [0, 0, 2, 0],
       [0, 0, 0, 3],
       [0, 0, 0, 0]])

In [141]:
np.diag([1,2,3], k=-2) 

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 2, 0, 0, 0],
       [0, 0, 3, 0, 0]])

#### e) `eye()` Function: Creates an identity matrix

In [142]:
np.eye(8)

array([[1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1.]])

#### f)  `full()` Function: Create a constant array

In [143]:
e = np.full((2,2),7)
e

array([[7, 7],
       [7, 7]])

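#### g) `empty()` Function: Create an uninitialized array

The values in the new array are arbitrary (whatever happened to be in memory), so they differ from run to run. Fill the array before using it.

In [144]:
np.empty((3,2))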

array([[1.5, 2. ],
       [3. , 4. ],
       [5. , 6. ]])
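A typical use of `empty()` is to allocate a placeholder and then overwrite every element. A small sketch (the array name `squares` is illustrative):

```python
import numpy as np

squares = np.empty(5)       # contents are arbitrary at this point

for i in range(5):
    squares[i] = i ** 2     # overwrite every element

print(squares)              # all five values are now defined
```

If the array should start from a known value instead, `zeros()`, `ones()`, or `full()` are the safer choice.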

In [145]:
np.vstack([p,2*p])

array([[1, 2, 0],
       [2, 4, 0]])

In [146]:
np.hstack([p,2*p])

array([1, 2, 0, 2, 4, 0])

In [147]:
p.T

array([1, 2, 0])

In [148]:
p.T.shape

(3,)
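As the shape `(3,)` shows, transposing a 1-dimensional array has no effect, because there is only one axis. To get an explicit row or column, the array needs a second axis first. A minimal sketch (the names `col` and `row` are illustrative):

```python
import numpy as np

p = np.array([1, 2, 0])

col = p.reshape(-1, 1)     # column vector, shape (3, 1)
row = p[np.newaxis, :]     # row vector, shape (1, 3)

print(p.T.shape)     # (3,)
print(col.shape)     # (3, 1)
print(row.T.shape)   # (3, 1): transposing works once there are two axes
```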

In [149]:
p.dtype

dtype('int32')

In [150]:
p.astype('f')

array([1., 2., 0.], dtype=float32)

In [151]:
a=np.array([1,2,3,4,5,6])

In [152]:
a.max()

6

In [153]:
a.min()

1

In [154]:
a.sum()

21

In [155]:
a.mean()

3.5

In [156]:
a.std()

1.707825127659933

In [157]:
# return index location of min or max
a.argmax()

5

In [158]:
a.argmin()

0

## Section 4. Numpy Random Module: Creates random number arrays

### 1) `rand()` Function: 
Creates an array of the given shape and populates it with random samples from a uniform distribution over `[0, 1)`

In [159]:
np.random.seed(3)

In [160]:
from numpy import random
np.random.seed(123)
# uniform random numbers in [0, 1): 0 is included, 1 is excluded
np.random.rand(6,5)  # two dimensional array

array([[0.69646919, 0.28613933, 0.22685145, 0.55131477, 0.71946897],
       [0.42310646, 0.9807642 , 0.68482974, 0.4809319 , 0.39211752],
       [0.34317802, 0.72904971, 0.43857224, 0.0596779 , 0.39804426],
       [0.73799541, 0.18249173, 0.17545176, 0.53155137, 0.53182759],
       [0.63440096, 0.84943179, 0.72445532, 0.61102351, 0.72244338],
       [0.32295891, 0.36178866, 0.22826323, 0.29371405, 0.63097612]])

In [161]:
np.random.rand(4)   # one dimensional array from uniform distribution

array([0.09210494, 0.43370117, 0.43086276, 0.4936851 ])

In [162]:
np.random.rand(6,10)

array([[0.42583029, 0.31226122, 0.42635131, 0.89338916, 0.94416002,
        0.50183668, 0.62395295, 0.1156184 , 0.31728548, 0.41482621],
       [0.86630916, 0.25045537, 0.48303426, 0.98555979, 0.51948512,
        0.61289453, 0.12062867, 0.8263408 , 0.60306013, 0.54506801],
       [0.34276383, 0.30412079, 0.41702221, 0.68130077, 0.87545684,
        0.51042234, 0.66931378, 0.58593655, 0.6249035 , 0.67468905],
       [0.84234244, 0.08319499, 0.76368284, 0.24366637, 0.19422296,
        0.57245696, 0.09571252, 0.88532683, 0.62724897, 0.72341636],
       [0.01612921, 0.59443188, 0.55678519, 0.15895964, 0.15307052,
        0.69552953, 0.31876643, 0.6919703 , 0.55438325, 0.38895057],
       [0.92513249, 0.84167   , 0.35739757, 0.04359146, 0.30476807,
        0.39818568, 0.70495883, 0.99535848, 0.35591487, 0.76254781]])
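The call to `np.random.seed()` is what makes these numbers reproducible. As a short sketch (the names `first` and `second` are illustrative), resetting the seed to the same value gives exactly the same samples again, and all samples fall in `[0, 1)`:

```python
import numpy as np

np.random.seed(123)
first = np.random.rand(6, 5)

np.random.seed(123)          # reset to the same seed
second = np.random.rand(6, 5)

print(np.array_equal(first, second))   # True
print(first.min() >= 0, first.max() < 1)
```

Without a seed, each run of the notebook produces different numbers.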

### 2) `randn()` Function: 
Returns a sample (or samples) from the "standard normal" distribution (mean 0, standard deviation 1), unlike `rand()`, which samples from a uniform distribution:

In [163]:
# standard normal distributed random numbers
np.random.randn(3)

array([ 1.66095249,  0.80730819, -0.31475815])

In [164]:
np.random.randn(5,5)

array([[-1.0859024 , -0.73246199, -1.21252313,  2.08711336,  0.16444123],
       [ 1.15020554, -1.26735205,  0.18103513,  1.17786194, -0.33501076],
       [ 1.03111446, -1.08456791, -1.36347154,  0.37940061, -0.37917643],
       [ 0.64205469, -1.97788793,  0.71226464,  2.59830393, -0.02462598],
       [ 0.03414213,  0.17954948, -1.86197571,  0.42614664, -1.60540974]])
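Samples from a normal distribution with a different mean and standard deviation can be obtained by shifting and scaling the output of `randn()`. A minimal sketch, assuming a target mean `mu` of 10 and standard deviation `sigma` of 2 (both values are made up for illustration):

```python
import numpy as np

np.random.seed(0)
mu, sigma = 10.0, 2.0

# randn() has mean 0 and standard deviation 1, so scale by sigma and shift by mu
samples = mu + sigma * np.random.randn(100_000)

print(samples.mean())   # close to 10
print(samples.std())    # close to 2
```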

### 3) `randint(low, high, size)` Function: 
Returns `size` random integers from `low` (inclusive) to `high` (exclusive). If `size` is omitted, a single integer is returned.

In [165]:
from numpy.random import randint
np.random.seed(119)
randint(1,100)  # Return random integers from `low` (inclusive) to `high` (exclusive).

67

In [166]:
np.random.seed(112)
randint(1,101,100)

array([44, 91, 66, 70, 45, 40, 41, 21, 42, 43, 49, 30, 69, 64,  7, 30, 49,
       52,  6, 29,  9,  4, 27, 75, 75, 38, 19, 53, 79, 49, 62, 43, 49, 97,
       41, 99, 60, 78, 15, 79, 25, 67,  2, 96,  4, 31, 53, 35, 36, 88, 33,
       78, 40, 35, 56, 74, 75,  8,  7, 44, 26, 75, 22, 27, 99, 87, 14, 18,
       90, 35, 53, 54, 53,  9, 40, 53, 40, 97, 44, 46,  6, 75, 64, 37, 76,
       28, 93, 45, 89, 35, 15, 72, 77, 98, 98, 80, 84, 80, 27, 10])
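Because `high` is exclusive, it has to be one more than the largest value we want. A small sketch that simulates rolls of a six-sided die (the name `rolls` and the choice of 1000 rolls are illustrative):

```python
import numpy as np

np.random.seed(112)

# Values 1 to 6: low=1 is included, high=7 is excluded
rolls = np.random.randint(1, 7, size=1000)

print(rolls.min(), rolls.max())   # never below 1, never above 6
print(rolls[:10])
```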

In [167]:
# Example
x=np.arange(0,36)
x.reshape(6,6)  # returns a reshaped array; x itself stays 1-dimensional

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

In [168]:
x

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35])

In [169]:
x[::]

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35])

In [170]:
x[::5]

array([ 0,  5, 10, 15, 20, 25, 30, 35])

In [171]:
twentyfive_vals=np.random.randn(5,5)
twentyfive_vals

array([[ 2.03189605, -1.96204765,  1.91169115, -0.0354508 ,  1.43942482],
       [-0.46391558, -1.21449972,  0.14787248, -0.88348886, -0.07795269],
       [-1.86134428,  1.54278481,  0.37828452,  0.06062351, -2.12861767],
       [-0.08531725,  1.02274225, -1.219129  ,  0.77151108, -0.52415858],
       [-0.41272449, -0.97052835,  0.58905416,  0.69673544, -2.4535435 ]])

In [172]:
twentyfive_vals > 0.1

array([[ True, False,  True, False,  True],
       [False, False,  True, False, False],
       [False,  True,  True, False, False],
       [False,  True, False,  True, False],
       [False, False,  True,  True, False]])

In [173]:
cond1=twentyfive_vals[twentyfive_vals > 0.1]
cond1

array([2.03189605, 1.91169115, 1.43942482, 0.14787248, 1.54278481,
       0.37828452, 1.02274225, 0.77151108, 0.58905416, 0.69673544])

In [174]:
cond2=cond1[cond1<0.5]
cond2

array([0.14787248, 0.37828452])

In [175]:
twentyfive_vals[(twentyfive_vals > 0.1) and (twentyfive_vals < 0.5)]

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [176]:
twentyfive_vals[(twentyfive_vals > 0.1) & (twentyfive_vals < 0.5)]

array([0.14787248, 0.37828452])

In [177]:
twentyfive_vals[(twentyfive_vals > 0.1) or (twentyfive_vals < 0.5)]

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [178]:
twentyfive_vals[(twentyfive_vals > 0.1) | (twentyfive_vals < 0.5)]

array([ 2.03189605, -1.96204765,  1.91169115, -0.0354508 ,  1.43942482,
       -0.46391558, -1.21449972,  0.14787248, -0.88348886, -0.07795269,
       -1.86134428,  1.54278481,  0.37828452,  0.06062351, -2.12861767,
       -0.08531725,  1.02274225, -1.219129  ,  0.77151108, -0.52415858,
       -0.41272449, -0.97052835,  0.58905416,  0.69673544, -2.4535435 ])
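Python's `and` and `or` try to reduce each array to a single `True` or `False`, which is ambiguous for an array with more than one element, so they raise the `ValueError` shown above. The operators `&` and `|` compare element by element instead. The parentheses around each condition are required because `&` and `|` bind more tightly than `>` and `<`. A minimal sketch with a small, fixed array (the values are made up for illustration):

```python
import numpy as np

vals = np.array([-1.2, 0.15, 0.38, 0.6, 2.0])

both = vals[(vals > 0.1) & (vals < 0.5)]      # element-wise AND
either = vals[(vals > 0.1) | (vals < 0.5)]    # element-wise OR

# np.logical_and is an equivalent, more explicit spelling of &
same = vals[np.logical_and(vals > 0.1, vals < 0.5)]

print(both)     # [0.15 0.38]
print(either)   # every value is > 0.1 or < 0.5, so all five are kept
```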

In [179]:
# Exercise: from the above array, find all values greater than 0.5 and less than 0.7


## Further reading

- http://numpy.scipy.org
- http://scipy.org/Tentative_NumPy_Tutorial
- http://scipy.org/NumPy_for_Matlab_Users - A Numpy guide for MATLAB users.

#### Note: The course materials are developed mainly based on personal experience and contributions from the Python learning community
Reference books:
- Learning Python, 5th Edition by Mark Lutz
- Python Data Science Handbook by Jake VanderPlas
- Python for Data Analysis by Wes McKinney

Copyright ©2023 Mei Najim. All rights reserved. 