<img src="https://www.mines.edu/webcentral/wp-content/uploads/sites/267/2019/02/horizontallightbackground.jpg" width="100%"> 
### CSCI250 Python Computing: Building a Sensor System
<hr style="height:5px" width="100%" align="left">

# Introduction to `numpy`

# Objective
* introduce the `numpy` library
* motivate its use for sensing and data analysis

<img src="https://numpy.org/images/logos/numpy.svg" width="40%" align="left">

# Resources
* [numpy.org](http://www.numpy.org)
* [`numpy` user guide](https://docs.scipy.org/doc/numpy/user)
* [`numpy` reference](https://docs.scipy.org/doc/numpy/reference)

# Definition
`numpy` is a Python package designed for **efficient computing** on **multidimensional arrays** of identical numeric items. 
* Python is attractive because of its **simplicity**.
* `numpy` adds storage and execution **efficiency**.


`numpy` comes with many functions optimized for numeric calculations.

`numpy` is conventionally imported with the **alias** `np`. 

In [None]:
import numpy as np

# Python arrays

A Python `list` can be used to represent an array of numbers. 

The `list` representation is inefficient:
* **storage**: Python lists require more memory than `numpy` arrays.
* **speed**: access to Python lists is slower than for `numpy` arrays.

<img src="https://www.dropbox.com/s/u628vjn2uc5h3ua/notebook.png?raw=1" width="10%" align="right">

See the [arrays1D notebook](s_NpArray1D.ipynb) for info on 1D `numpy` arrays.

See the [arrays2D notebook](s_NpArray2D.ipynb) for info on 2D `numpy` arrays.

# `ndarray`

`numpy` arrays are represented by the `ndarray` type.

* all elements are of the same type and size
* the access to array elements is more efficient
* the memory occupied by the array is smaller
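
The properties listed above can be inspected directly on an array. The following is a minimal sketch, assuming a small `int32` array chosen only for illustration; it uses the standard `ndarray` attributes `dtype`, `itemsize`, `size` and `nbytes`.

```python
import numpy as np

# a small array with an explicitly chosen element type (illustrative choice)
b = np.arange(10, dtype=np.int32)

print('dtype    =', b.dtype)      # element type shared by all items
print('itemsize =', b.itemsize)   # bytes per element
print('nbytes   =', b.nbytes)     # bytes used by the data buffer

# the data buffer size is the element count times the element size
assert b.nbytes == b.size * b.itemsize
```

Because every element has the same size, the location of any element can be computed from its index, which is part of why access is efficient.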

In [None]:
n = 1000000

# python array (a list)
a = [i for i in range(n)]

print('type of a=',type(a))

In [None]:
# numpy array
b = np.arange(n)

print('type of b=',type(b))

# `ndarray` size
`numpy` arrays occupy less space in memory than `list` arrays.

* `ndarray`: homogeneous 
    * specialized for numeric operations
* `list`: heterogeneous
    * can handle arbitrary elements (`float`, `str`, etc.)

In [None]:
import sys

# python array
print('size of a =',sys.getsizeof(a))

In [None]:
# numpy array
print('size of b =',sys.getsizeof(b))
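
Note that `sys.getsizeof` applied to a `list` counts only the list object and its references, not the number objects those references point to. A rough sketch of a fuller comparison is shown here, assuming a smaller `n` than the one used in the cells to keep it quick; the total is approximate because Python shares some small integer objects.

```python
import sys
import numpy as np

n = 100_000  # assumed size, smaller than the notebook's n

a = list(range(n))
b = np.arange(n, dtype=np.int64)

# container only: the list stores references, not the numbers themselves
list_container = sys.getsizeof(a)

# add the int objects the references point to (approximate upper estimate)
list_total = list_container + sum(sys.getsizeof(i) for i in a)

# raw data buffer of the ndarray: n elements of 8 bytes each
array_data = b.nbytes

print('list container  =', list_container, 'bytes')
print('list + elements =', list_total, 'bytes')
print('ndarray data    =', array_data, 'bytes')
```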

# `ndarray` access
`numpy` arrays are faster to create and access than `list` arrays. The cells below time the creation of each array.

In [None]:
import time

In [None]:
tick = time.perf_counter()                   # start clock (high-resolution timer)
a = [i for i in range(n)]
tock = time.perf_counter()                   #  stop clock
dtList = (tock-tick)*1e6                     # elapsed time in microseconds
print('elapsed time =',int(dtList),'(us)')   # micro seconds

In [None]:
tick = time.perf_counter()                  # start clock (high-resolution timer)
b = np.arange(n)
tock = time.perf_counter()                  #  stop clock
dtNumpy = (tock-tick)*1e6                   # elapsed time in microseconds
print('elapsed time =',int(dtNumpy),'(us)') # micro seconds

In [None]:
# time ratio
int(dtList/dtNumpy)

We can also evaluate the creation times using **magic commands**.

In [None]:
%%timeit
a = [i for i in range(n)]

In [None]:
%%timeit
b = np.arange(n)

# execution time
`numpy` arrays facilitate faster processing than `list` arrays.

In [None]:
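tick = time.perf_counter()                   # start clock
sa = [i**2 for i in a]                       # square every list element in a loop
tock = time.perf_counter()                   #  stop clock
dtList = (tock-tick)*1e6                     # elapsed time in microseconds
print('elapsed time =',int(dtList),'(us)')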

In [None]:
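tick = time.perf_counter()                   # start clock
sb = b**2                                    # square the whole array at once
tock = time.perf_counter()                   #  stop clock
dtNumpy = (tock-tick)*1e6                    # elapsed time in microseconds
print('elapsed time =',int(dtNumpy),'(us)')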

In [None]:
# time ratio
int(dtList/dtNumpy)

We can also evaluate the processing times using **magic commands**.

In [None]:
%%timeit
sa = [i**2 for i in a]

In [None]:
%%timeit
sb = b**2
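
The `%%timeit` magic is only available in IPython and Jupyter. Outside a notebook, a similar measurement can be sketched with the standard-library `timeit` module. The array size and the repeat counts below are assumptions chosen to keep the run short.

```python
import timeit
import numpy as np

n = 100_000                        # assumed size, smaller than the notebook's n
a = list(range(n))
b = np.arange(n, dtype=np.int64)   # int64 so that the squares do not overflow

runs = 5  # executions per timing

# repeat each timing a few times and keep the least disturbed one
t_list = min(timeit.repeat(lambda: [i**2 for i in a], number=runs, repeat=3)) / runs
t_numpy = min(timeit.repeat(lambda: b**2, number=runs, repeat=3)) / runs

print('list  :', int(t_list * 1e6), '(us)')
print('numpy :', int(t_numpy * 1e6), '(us)')
print('ratio :', t_list / t_numpy)
```

Taking the minimum over several repeats reduces the influence of other processes running on the machine.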

**`numpy` calculations are much more efficient than `list` calculations**
***

<img src="https://www.dropbox.com/s/u628vjn2uc5h3ua/notebook.png?raw=1" width="10%" align="right">

See the [vectorize1D notebook](s_NpVectorize1D.ipynb) for info on fast 1D computing.

See the [vectorize2D notebook](s_NpVectorize2D.ipynb) for info on fast 2D computing.

<img src="http://www.dropbox.com/s/fcucolyuzdjl80k/todo.jpg?raw=1" width="10%" align="right">

Check how storage and access time depend on the array size. 

Evaluate 
* access time and 
* processing time 

ratios for array sizes from $n=10^1$ to $n=10^9$.

# `numpy` data types

`numpy` supports additional data types relative to Python.

They are useful for analysis of data provided by physical systems.

# `numpy` `int`
The `int` type can be represented by different numbers of bits

* `int8`
* `int16`
* `int32`
* `int64`

which cover different ranges of integer values. 

*** 
Choose the appropriate `int` type that 
* covers the range needed by data representation, 
* while minimizing data storage.
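
One way to apply this rule is to compare the required range with the limits reported by `np.iinfo`. The sketch below is illustrative only: the helper `smallest_int_type` is a hypothetical name, not part of `numpy`, and the 10-bit sensor range is an assumed example.

```python
import numpy as np

def smallest_int_type(lo, hi):
    """Return the narrowest signed numpy int type covering [lo, hi].

    Hypothetical helper written for this example.
    """
    for t in (np.int8, np.int16, np.int32, np.int64):
        info = np.iinfo(t)
        if info.min <= lo and hi <= info.max:
            return t
    raise ValueError('range does not fit in int64')

# assumed example: a 10-bit sensor reading takes values from 0 to 1023
t = smallest_int_type(0, 1023)
print(t.__name__)
```

For non-negative data, the unsigned types (`uint8`, `uint16`, ...) are another option, since they spend no bits on the sign.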

In [None]:
print(np.iinfo(np.int8))
print(np.iinfo(np.int16))
print(np.iinfo(np.int32))
print(np.iinfo(np.int64))

In [None]:
i = 1

i08 = np.int8(i)
print('size of i08 =',sys.getsizeof(i08))

i16 = np.int16(i)
print('size of i16 =',sys.getsizeof(i16))

i32 = np.int32(i)
print('size of i32 =',sys.getsizeof(i32))

i64 = np.int64(i)
print('size of i64 =',sys.getsizeof(i64))

In [None]:
xmin = 0
xmax = 1e6
dx = 1

x08 = np.arange(xmin,xmax,dx, dtype=np.int8)
print('size of x08 =',sys.getsizeof(x08))

x16 = np.arange(xmin,xmax,dx, dtype=np.int16)
print('size of x16 =',sys.getsizeof(x16))

x32 = np.arange(xmin,xmax,dx, dtype=np.int32)
print('size of x32 =',sys.getsizeof(x32))

x64 = np.arange(xmin,xmax,dx, dtype=np.int64)
print('size of x64 =',sys.getsizeof(x64))

In [None]:
print( np.min(x08), np.max(x08) ) # this type is insufficient
print( np.min(x16), np.max(x16) ) # this type is insufficient
print( np.min(x32), np.max(x32) )
print( np.min(x64), np.max(x64) )
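
The narrow types fail here because integer array values that exceed the type range wrap around silently instead of raising an error. A minimal sketch of this behavior, using small arrays chosen for illustration:

```python
import numpy as np

# 127 is the largest value an int8 can hold
x = np.array([126, 127], dtype=np.int8)
y = x + np.int8(1)                       # 127 + 1 wraps around to -128
print(y)

# converting an out-of-range value to int8 keeps only the lowest 8 bits
z = np.array([1000]).astype(np.int8)
print(z)
```

No warning is issued for these array operations, so it is worth checking the data range against `np.iinfo` before choosing a narrow type.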

<img src="http://www.dropbox.com/s/fcucolyuzdjl80k/todo.jpg?raw=1" width="10%" align="right">

Use `np.iinfo?` to explore the `numpy` `int` types.

In [None]:
np.iinfo?

# `numpy` `float`
The `float` type can be represented by different numbers of bits
* `float16`
* `float32`
* `float64`

which correspond to floating point numbers of different ranges and precisions. 

*** 
Choose the appropriate `float` type that 
* covers the range and precision needed by data representation, 
* while minimizing data storage.

In [None]:
print(np.finfo(np.float16))
print(np.finfo(np.float32))
print(np.finfo(np.float64))

In [None]:
f = 1.0

f16 = np.float16(f)
print('size of f16 =',sys.getsizeof(f16))

f32 = np.float32(f)
print('size of f32 =',sys.getsizeof(f32))

f64 = np.float64(f)
print('size of f64 =',sys.getsizeof(f64))

In [None]:
xmin = 0.0
xmax = 1.0
dx = 1e-6

x16 = np.arange(xmin,xmax,dx, dtype=np.float16)
print('size of x16 =',sys.getsizeof(x16))

x32 = np.arange(xmin,xmax,dx, dtype=np.float32)
print('size of x32 =',sys.getsizeof(x32))

x64 = np.arange(xmin,xmax,dx, dtype=np.float64)
print('size of x64 =',sys.getsizeof(x64))

In [None]:
print( np.min(x16), np.max(x16) ) # this type is not sufficient
print( np.min(x32), np.max(x32) )
print( np.min(x64), np.max(x64) )
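
For floating point types the limitation is precision as much as range: `float16` cannot resolve steps as small as the `dx` used above for values near 1. The sketch below illustrates this with the machine epsilon reported by `np.finfo`; the increment `1e-4` is an assumed value chosen to fall below the `float16` resolution.

```python
import numpy as np

# eps is the gap between 1.0 and the next larger representable number
for t in (np.float16, np.float32, np.float64):
    info = np.finfo(t)
    print(t.__name__, 'eps =', info.eps, 'max =', info.max)

# an increment smaller than eps is lost when added to 1.0
one16 = np.float16(1.0) + np.float16(1e-4)
one32 = np.float32(1.0) + np.float32(1e-4)

print('float16: 1 + 1e-4 == 1 ?', one16 == 1.0)
print('float32: 1 + 1e-4 == 1 ?', one32 == 1.0)
```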

<img src="http://www.dropbox.com/s/fcucolyuzdjl80k/todo.jpg?raw=1" width="10%" align="right">

Use `np.finfo?` to explore the `numpy` `float` types.

In [None]:
np.finfo?