<img src="https://www.mines.edu/webcentral/wp-content/uploads/sites/267/2019/02/horizontallightbackground.jpg" width="100%"> 
### CSCI250 Python Computing: Building a Sensor System
<hr style="height:5px" width="100%" align="left">

# Introduction to `numpy`

# Objective
* introduce the `numpy` library
* motivate its use for sensing and data analysis

<img src="https://numpy.org/images/logos/numpy.svg" width="40%" align="left">

# Resources
* [numpy.org](http://www.numpy.org)
* [`numpy` user guide](https://docs.scipy.org/doc/numpy/user)
* [`numpy` reference](https://docs.scipy.org/doc/numpy/reference)

# Definition
`numpy` is a Python package designed for **efficient computing** on **multidimensional arrays** of identical numeric items. 
* Python is attractive because of its **simplicity**.
* `numpy` adds storage and execution **efficiency**.


`numpy` comes with many functions optimized for numeric calculations.

`numpy` is conventionally imported with the **alias** `np`. 

In [1]:
import numpy as np
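
As a minimal sketch of what these optimized functions look like in use, the cell below applies two reductions and one element-wise conversion to a short array. The temperature readings are made-up values for illustration only.

```python
import numpy as np

# hypothetical temperature readings (deg C), invented for this example
temps_c = np.array([20.5, 21.0, 19.5, 22.0])

# reductions operate on the whole array without an explicit loop
print('mean =', temps_c.mean())
print('max  =', temps_c.max())

# element-wise arithmetic: convert every reading to deg F at once
temps_f = temps_c * 9 / 5 + 32
print('deg F =', temps_f)
```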

# Python arrays

A Python `list` can be used to represent an array of numbers.

The `list` representation is inefficient:
* **storage**: Python lists require more memory than `numpy` arrays.
* **speed**: creating and processing Python lists is slower than for `numpy` arrays.

<img src="https://www.dropbox.com/s/u628vjn2uc5h3ua/notebook.png?raw=1" width="10%" align="right">

See the [arrays1D notebook](s_NpArray1D.ipynb) for info on 1D `numpy` arrays.

See the [arrays2D notebook](s_NpArray2D.ipynb) for info on 2D `numpy` arrays.

# `ndarray`

`numpy` arrays are represented by the `ndarray` type.

* all elements are of the same type and size
* the access to array elements is more efficient
* the memory occupied by the array is smaller
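
The properties listed above can be inspected directly on any array. The sketch below prints the element type, the size of one element, and the size of the data buffer. The default integer type depends on the platform, so the printed values may differ between machines.

```python
import numpy as np

b = np.arange(10)

print('dtype    =', b.dtype)      # common type of all elements
print('itemsize =', b.itemsize)   # bytes per element
print('nbytes   =', b.nbytes)     # bytes used by the data buffer
```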

In [2]:
n = 1000000

# python array (a list)
a = [i for i in range(n)]

print('type of a=',type(a))

type of a= <class 'list'>


In [3]:
# numpy array
b = np.arange(n)

print('type of b=',type(b))

type of b= <class 'numpy.ndarray'>


# `ndarray` size
`numpy` arrays occupy less space in memory than `list` arrays.

* `ndarray`: homogeneous 
    * specialized for numeric operations
* `list`: heterogeneous
    * can handle arbitrary elements (`float`, `str`, etc.)
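
The difference between homogeneous and heterogeneous storage can be seen with a short experiment. In this sketch, a `list` keeps every element's own type, while `np.array` converts mixed input to a single common type. The example values are arbitrary.

```python
import numpy as np

mixed_list = [1, 2.5, 'three']          # a list keeps each element's own type
print([type(v).__name__ for v in mixed_list])

ints_and_floats = np.array([1, 2.5])    # numpy promotes to one common type
print(ints_and_floats.dtype)

with_text = np.array([1, 'three'])      # mixing in text yields a string array
print(with_text.dtype.kind)             # 'U' marks a unicode string dtype
```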

In [4]:
import sys

# python array
print('size of a =',sys.getsizeof(a))

size of a = 4224364


In [5]:
# numpy array
print('size of b =',sys.getsizeof(b))

size of b = 4000048
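
Note that `sys.getsizeof` applied to a `list` counts the list object and its references, but not the integer objects the references point to, so the comparison above likely understates the difference. The sketch below adds the element sizes for a smaller array. The sum is only an approximation, since CPython may share some small integer objects between references.

```python
import sys
import numpy as np

n = 100000
a = [i for i in range(n)]
b = np.arange(n)

list_refs  = sys.getsizeof(a)                       # list object + references
list_items = sum(sys.getsizeof(v) for v in a)       # the int objects themselves
print('list references =', list_refs)
print('list elements   =', list_items)
print('numpy data      =', b.nbytes)
```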


# `ndarray` access
`numpy` arrays take less time to create than `list` arrays. The cells below time the creation of each.

In [6]:
import time

In [16]:
tick = time.time()                           # start clock
a = [i for i in range(n)]
tock = time.time()                           #  stop clock
dtList = (tock-tick)*1e6                     # time difference 
print('elapsed time =',int(dtList),'(us)')   # micro seconds

elapsed time = 184421 (us)


In [17]:
tick = time.time()                          # start clock
b = np.arange(n)
tock = time.time()                          #  stop clock
dtNumpy = (tock-tick)*1e6                   # time difference 
print('elapsed time =',int(dtNumpy),'(us)') # micro seconds

elapsed time = 14054 (us)


In [18]:
# time ratio
int(dtList/dtNumpy)

13

We can also evaluate the creation times using **magic commands** such as `%%timeit`.

In [19]:
%%timeit
a = [i for i in range(n)]

134 ms ± 515 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [20]:
%%timeit
b = np.arange(n)

10.9 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
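
Outside a notebook, where magic commands are not available, a similar measurement can be sketched with `time.perf_counter`, which is intended for timing short intervals. The helper `elapsed_us` below is a hypothetical name introduced for this example. It repeats the call a few times and keeps the shortest run to reduce noise.

```python
import time
import numpy as np

def elapsed_us(func, repeats=5):
    """Return the shortest run time of func() in microseconds (hypothetical helper)."""
    best = float('inf')
    for _ in range(repeats):
        tick = time.perf_counter()        # start clock
        func()
        tock = time.perf_counter()        #  stop clock
        best = min(best, (tock - tick) * 1e6)
    return best

n = 100000
dtList  = elapsed_us(lambda: [i for i in range(n)])
dtNumpy = elapsed_us(lambda: np.arange(n))
print('list  =', int(dtList), '(us)')
print('numpy =', int(dtNumpy), '(us)')
```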


# execution time
`numpy` arrays facilitate faster processing than `list` arrays.

In [26]:
tick = time.time()
sa = [i**2 for i in a]
tock = time.time()
dtList = (tock-tick)*1e6
print('elapsed time =',int(dtList),'(us)')

elapsed time = 659034 (us)


In [27]:
tick = time.time()
sb = b**2
tock = time.time()
dtNumpy = (tock-tick)*1e6
print('elapsed time =',int(dtNumpy),'(us)')

elapsed time = 12971 (us)


In [28]:
# time ratio
int(dtList/dtNumpy)

50

We can also evaluate the execution times using **magic commands**.

In [29]:
%%timeit
sa = [i**2 for i in a]

569 ms ± 2.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [30]:
%%timeit
sb = b**2

11.2 ms ± 10.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


**`numpy` calculations are much more efficient than `list` calculations**
***
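
The speed comparison is only meaningful if both approaches produce the same values. As a quick check on a smaller array, the sketch below squares the elements both ways and compares the results. The `dtype=np.int64` argument is an addition made here so that the squares do not overflow on platforms with a 32-bit default integer.

```python
import numpy as np

n = 1000
a = [i for i in range(n)]                 # python array (a list)
b = np.arange(n, dtype=np.int64)          # numpy array

sa = [i**2 for i in a]                    # explicit loop over elements
sb = b**2                                 # one vectorized operation

print(np.array_equal(sa, sb))
```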

<img src="https://www.dropbox.com/s/u628vjn2uc5h3ua/notebook.png?raw=1" width="10%" align="right">

See the [vectorize1D notebook](s_NpVectorize1D.ipynb) for info on fast 1D computing.

See the [vectorize2D notebook](s_NpVectorize2D.ipynb) for info on fast 2D computing.

<img src="http://www.dropbox.com/s/fcucolyuzdjl80k/todo.jpg?raw=1" width="10%" align="right">

Check how storage and access time depend on the array size. 

Evaluate 
* access time and 
* processing time 

ratios for array sizes from $n=10^1$ to $n=10^9$.

In [49]:
import numpy as np
import time

m = 7                                        # largest exponent
for i in range(m + 1):                       # n = 10**0 ... 10**m
    tick = time.time()                       # start clock
    a = [k for k in range(10**i)]            # python array (a list)
    tock = time.time()                       #  stop clock
    dtList = (tock-tick)*1e6                 # time difference
    print('10**',i,'calculation, elapsed time =', int(dtList), '(µs)')

10** 0 calculation, elapsed time = 267265 (µs)
10** 1 calculation, elapsed time = 19 (µs)
10** 2 calculation, elapsed time = 17 (µs)
10** 3 calculation, elapsed time = 83 (µs)
10** 4 calculation, elapsed time = 684 (µs)
10** 5 calculation, elapsed time = 7982 (µs)
10** 6 calculation, elapsed time = 111794 (µs)
10** 7 calculation, elapsed time = 1138523 (µs)


In [53]:
import numpy as np
import time

m = 8                                        # largest exponent
for i in range(m + 1):                       # n = 10**0 ... 10**m
    tick = time.time()                       # start clock
    b = np.arange(10**i)                     # numpy array
    tock = time.time()                       #  stop clock
    dtNumpy = (tock-tick)*1e6                # time difference
    print('10**',i,'calculation, elapsed time =', int(dtNumpy), '(µs)')

10** 0 calculation, elapsed time = 89669 (µs)
10** 1 calculation, elapsed time = 34 (µs)
10** 2 calculation, elapsed time = 10 (µs)
10** 3 calculation, elapsed time = 15 (µs)
10** 4 calculation, elapsed time = 420 (µs)
10** 5 calculation, elapsed time = 146 (µs)
10** 6 calculation, elapsed time = 10042 (µs)
10** 7 calculation, elapsed time = 101119 (µs)
10** 8 calculation, elapsed time = 1017870 (µs)
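
The two cells above time the `list` and the `numpy` array separately. To obtain the ratios directly, both measurements can be placed in one loop, as in the sketch below. The names `ratios` and `p` are introduced here for illustration, and the loop stops at $n=10^6$ to keep the run short. Larger exponents need considerably more memory and time.

```python
import time
import numpy as np

ratios = {}
for p in range(1, 7):                       # n = 10**1 ... 10**6
    n = 10**p

    tick = time.perf_counter()
    a = [k for k in range(n)]
    dtList = time.perf_counter() - tick

    tick = time.perf_counter()
    b = np.arange(n)
    dtNumpy = time.perf_counter() - tick

    # guard against a zero reading from the clock
    ratios[p] = dtList / max(dtNumpy, 1e-9)
    print('10**', p, 'time ratio =', round(ratios[p], 1))
```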


# `numpy` data types

`numpy` supports additional data types relative to Python.

They are useful for analysis of data provided by physical systems.

# `numpy` `int`
The `int` type can be represented by different numbers of bits

* `int8`
* `int16`
* `int32`
* `int64`

which correspond to integers of different ranges. 

*** 
Choose the appropriate `int` type that 
* covers the range needed by data representation, 
* while minimizing data storage.
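
One way to apply this rule is to test the required range against the limits reported by `np.iinfo`, starting with the narrowest type. The function `smallest_int_type` below is a hypothetical helper written for this example, not part of `numpy`, and it considers signed types only.

```python
import numpy as np

def smallest_int_type(lo, hi):
    """Return the narrowest signed numpy int type covering [lo, hi] (hypothetical helper)."""
    for t in (np.int8, np.int16, np.int32, np.int64):
        info = np.iinfo(t)
        if info.min <= lo and hi <= info.max:
            return t
    raise ValueError('range does not fit in a 64-bit signed integer')

print(smallest_int_type(0, 100))
print(smallest_int_type(0, 999999))
```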

In [54]:
print(np.iinfo(np.int8))
print(np.iinfo(np.int16))
print(np.iinfo(np.int32))
print(np.iinfo(np.int64))

Machine parameters for int8
---------------------------------------------------------------
min = -128
max = 127
---------------------------------------------------------------

Machine parameters for int16
---------------------------------------------------------------
min = -32768
max = 32767
---------------------------------------------------------------

Machine parameters for int32
---------------------------------------------------------------
min = -2147483648
max = 2147483647
---------------------------------------------------------------

Machine parameters for int64
---------------------------------------------------------------
min = -9223372036854775808
max = 9223372036854775807
---------------------------------------------------------------



In [56]:
i = 1

i08 = np.int8(i)
print('size of i08 =',sys.getsizeof(i08))

i16 = np.int16(i)
print('size of i16 =',sys.getsizeof(i16))

i32 = np.int32(i)
print('size of i32 =',sys.getsizeof(i32))

i64 = np.int64(i)
print('size of i64 =',sys.getsizeof(i64))

size of i08 = 13
size of i16 = 14
size of i32 = 16
size of i64 = 24


In [57]:
xmin = 0
xmax = 1e6
dx = 1

x08 = np.arange(xmin,xmax,dx, dtype=np.int8)
print('size of x08 =',sys.getsizeof(x08))

x16 = np.arange(xmin,xmax,dx, dtype=np.int16)
print('size of x16 =',sys.getsizeof(x16))

x32 = np.arange(xmin,xmax,dx, dtype=np.int32)
print('size of x32 =',sys.getsizeof(x32))

x64 = np.arange(xmin,xmax,dx, dtype=np.int64)
print('size of x64 =',sys.getsizeof(x64))

size of x08 = 1000048
size of x16 = 2000048
size of x32 = 4000048
size of x64 = 8000048


In [58]:
print( np.min(x08), np.max(x08) ) # this type is insufficient
print( np.min(x16), np.max(x16) ) # this type is insufficient
print( np.min(x32), np.max(x32) )
print( np.min(x64), np.max(x64) )

-128 127
-32768 32767
0 999999
0 999999
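
The extreme values printed for the 8-bit and 16-bit arrays are a sign of wrap-around: values outside the range of the type are not rejected, they are folded back into the range. The sketch below shows the effect on a few hand-picked values by converting a 64-bit array to `int8` with `astype`.

```python
import numpy as np

values = np.array([126, 127, 128, 129, 255, 256], dtype=np.int64)
wrapped = values.astype(np.int8)          # keeps only the lowest 8 bits

print(values)
print(wrapped)
```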


<img src="http://www.dropbox.com/s/fcucolyuzdjl80k/todo.jpg?raw=1" width="10%" align="right">

Use `np.iinfo?` to explore the `numpy` `int` types.

In [59]:
np.iinfo?

# `numpy` `float`
The `float` type can be represented by different numbers of bits
* `float16`
* `float32`
* `float64`

which correspond to floating point numbers of different ranges and precisions. 

*** 
Choose the appropriate `float` type that 
* covers the range and precision needed by data representation, 
* while minimizing data storage.
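
Range is not the only consideration for floating point numbers: each type also has a finite precision, reported by `np.finfo` as `eps`. As a small sketch, the loop below checks whether adding the arbitrary value `1e-4` to `1.0` changes the result for each type.

```python
import numpy as np

small = 1e-4
for t in (np.float16, np.float32, np.float64):
    changed = t(1.0) + t(small) != t(1.0)     # does adding 1e-4 change 1.0?
    print(t.__name__, 'eps =', np.finfo(t).eps, 'changed =', changed)
```

An increment smaller than `eps` relative to `1.0` can be lost entirely, which is what happens for `float16` here.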

In [60]:
print(np.finfo(np.float16))
print(np.finfo(np.float32))
print(np.finfo(np.float64))

Machine parameters for float16
---------------------------------------------------------------
precision =   3   resolution = 1.00040e-03
machep =    -10   eps =        9.76562e-04
negep =     -11   epsneg =     4.88281e-04
minexp =    -14   tiny =       6.10352e-05
maxexp =     16   max =        6.55040e+04
nexp =        5   min =        -max
---------------------------------------------------------------

Machine parameters for float32
---------------------------------------------------------------
precision =   6   resolution = 1.0000000e-06
machep =    -23   eps =        1.1920929e-07
negep =     -24   epsneg =     5.9604645e-08
minexp =   -126   tiny =       1.1754944e-38
maxexp =    128   max =        3.4028235e+38
nexp =        8   min =        -max
---------------------------------------------------------------

Machine parameters for float64
---------------------------------------------------------------
precision =  15   resolution = 1.0000000000000001e-15
machep =    -52   e

In [62]:
f = 1.0

f16 = np.float16(f)
print('size of f16 =',sys.getsizeof(f16))

f32 = np.float32(f)
print('size of f32 =',sys.getsizeof(f32))

f64 = np.float64(f)
print('size of f64 =',sys.getsizeof(f64))

size of f16 = 14
size of f32 = 16
size of f64 = 24


In [63]:
xmin = 0.0
xmax = 1.0
dx = 1e-6

x16 = np.arange(xmin,xmax,dx, dtype=np.float16)
print('size of x16 =',sys.getsizeof(x16))

x32 = np.arange(xmin,xmax,dx, dtype=np.float32)
print('size of x32 =',sys.getsizeof(x32))

x64 = np.arange(xmin,xmax,dx, dtype=np.float64)
print('size of x64 =',sys.getsizeof(x64))

size of x16 = 2000048
size of x32 = 4000048
size of x64 = 8000048


In [64]:
print( np.min(x16), np.max(x16) ) # this type is not sufficient
print( np.min(x32), np.max(x32) )
print( np.min(x64), np.max(x64) )

0.0 1.014
0.0 0.999999
0.0 0.999999
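
One plausible contributor to the `1.014` maximum of the `float16` array is that the step `1e-6` cannot be stored accurately in 16 bits. The sketch below rounds the step to each type and reports the relative error. How `np.arange` accumulates the step internally is not examined here.

```python
import numpy as np

dx = 1e-6
for t in (np.float16, np.float32, np.float64):
    stored = float(t(dx))                      # dx after rounding to type t
    rel_err = abs(stored - dx) / dx
    print(t.__name__, 'stored dx =', stored, 'relative error =', rel_err)
```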


<img src="http://www.dropbox.com/s/fcucolyuzdjl80k/todo.jpg?raw=1" width="10%" align="right">

Use `np.finfo?` to explore the `numpy` `float` types.

In [None]:
np.finfo?