<div class="alert block alert-info alert">

# <center> Scientific Programming in Python

## <center>Karl N. Kirschner<br>Bonn-Rhein-Sieg University of Applied Sciences<br>Sankt Augustin, Germany

# <center> NumPy: NUMerical PYthon

<b>Source</b>: https://numpy.org/doc/stable/
<br><br>


NumPy is the foundation for
- Pandas
- Matplotlib
- Scikit-learn
- PyTorch


- Excels at large <b>arrays</b> of data (i.e., VERY efficient)
    - RAM usage, and thus
    - Speed


- Array: an n-dimensional array (i.e., NumPy's name: `ndarray`):
    - a collection of values that has 1 or more dimensions
    - 1D array --> vector
    - 2D array --> matrix
    - nD array --> tensor


- All array data must be of the same type (i.e., homogeneous)


- Can perform computations on entire arrays without the need of loops


- Contains some nice mathematical functions/tools (e.g., data extrapolation) - will be covered in the SciPy lecture


- Does not come by default with Python - must be installed

<hr style="border:2px solid gray"></hr>

<b>Comparisons to a regular list</b>:
1. Both are a container for items/elements
2. NumPy allows faster access to items/elements (and thus faster mathematics), but
3. Lists are faster for inserting new and removing existing items/elements

<hr style="border:2px solid gray"></hr>

#### Key Concept for NumPy
1. Each element in an array must be the same type (e.g., floats)
    - allows for efficient usage of RAM
    - NumPy always knows what the content of the array is


2. <b>Vectorizing operations</b><br>
    "This practice of replacing explicit loops with array expressions is commonly referred to as vectorization."
    - source: https://www.oreilly.com/library/view/python-for-data/9781449323592/ch04.html
    - apply a math operation to an entire ndarray with one expression (i.e., the loop over the items runs inside NumPy's compiled code rather than in Python)


3. Integrates with C, C++ and Fortran to improve performance
    - In this sense, NumPy is an intermediary between these low-level libraries and Python


4. The raw array data is put into a contiguous (and fixed) block of RAM
    - good at allocating space in RAM for storing the ndarrays
    

<b>More Information</b> for what is happening "under-the-hood": https://numpy.org/doc/stable/dev/internals.html#numpy-internals
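
The first two key concepts can be illustrated with a minimal sketch. The array values and variable names below are illustrative assumptions (they are not part of the lecture's data); the point is only that the array has a single data type and that one array expression replaces an explicit loop.

```python
import numpy as np

values = np.array([1.5, 2.0, 3.5])     # every element is stored as float64

# Explicit loop: one Python-level multiplication per element
looped = np.array([value * 2 for value in values])

# Vectorized: a single array expression, no explicit Python loop
vectorized = values * 2

print(values.dtype)                    # one data type for the whole array
print(values.itemsize, values.nbytes)  # bytes per element, bytes in total
print(np.array_equal(looped, vectorized))
```

Because every element has the same size, the total memory used by the raw data is simply `itemsize` times the number of elements.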

<hr style="border:2px solid gray"></hr>

<b>Citing NumPy</b>: (https://numpy.org/citing-numpy/)

Harris, C.R., Millman, K.J., van der Walt, S.J. et al. Array programming with NumPy. Nature 585, 357–362 (2020).

@Article{         harris2020array,  
 title         = {Array programming with {NumPy}},  
 author        = {Charles R. Harris and K. Jarrod Millman and St{\'{e}}fan J.
                 van der Walt and Ralf Gommers and Pauli Virtanen and David
                 Cournapeau and Eric Wieser and Julian Taylor and Sebastian
                 Berg and Nathaniel J. Smith and Robert Kern and Matti Picus
                 and Stephan Hoyer and Marten H. van Kerkwijk and Matthew
                 Brett and Allan Haldane and Jaime Fern{\'{a}}ndez del
                 R{\'{i}}o and Mark Wiebe and Pearu Peterson and Pierre
                 G{\'{e}}rard-Marchant and Kevin Sheppard and Tyler Reddy and
                 Warren Weckesser and Hameer Abbasi and Christoph Gohlke and
                 Travis E. Oliphant},  
 year          = {2020},  
 month         = sep,  
 journal       = {Nature},  
 volume        = {585},  
 number        = {7825},  
 pages         = {357--362},  
 doi           = {10.1038/s41586-020-2649-2},  
 publisher     = {Springer Science and Business Media {LLC}},  
 url           = {https://doi.org/10.1038/s41586-020-2649-2}  
}

In [None]:
## For extra information given within the lectures

from IPython.display import HTML


def set_code_background(color: str):
    ''' Set the background color for code cells.

        Source: psychemedia via https://stackoverflow.com/questions/49429585/
                how-to-change-the-background-color-of-a-single-cell-in-a-jupyter-notebook-jupy

        To match Jupyter's dev class colors:
            "alert alert-block alert-warning" = #fcf8e3

        Args:
            color: HTML color, rgba, hex
    '''

    script = ("var cell = this.closest('.code_cell');"
              "var editor = cell.querySelector('.input_area');"
              f"editor.style.background='{color}';"
              "this.parentNode.removeChild(this)")
    display(HTML(f'<img src onerror="{script}">'))


set_code_background(color='#fcf8e3')

<hr style="border:2px solid gray"></hr>

In [None]:
import numpy as np
import pandas as pd
import timeit

print(np.__version__)
print(pd.__version__)

#%matplotlib inline

## N-dimensional array object (i.e., ndarray)

Let's create two objects:
1. a regular list
2. a NumPy array via <b>A</b>rray <b>RANGE</b> (`arange`: https://numpy.org/doc/stable/reference/generated/numpy.arange.html)


Then we can demonstrate which is faster using the `timeit` library.

- timeit (to time code for performance): https://docs.python.org/3/library/timeit.html

In [None]:
my_list = list(range(100000))

print(type(my_list))
my_list

In [None]:
my_array = np.arange(100000)
my_array

Now, let's multiply containers by 2, and do that math 10000 times.

We need two "callable" functions to use with the `timeit` library.

<b>Disclaimer</b> - I am not properly formulating the functions below (e.g., including typing, context, etc.) in order to simplify the content for learning purposes.

In [None]:
def list_multiply(test_list):
    # Note: `test_list*2` would repeat the list instead of doubling each item
    return [item*2 for item in test_list]


def numpy_multiply(test_array):
    return test_array*2
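
It is worth checking what `* 2` means for each container type, since the operator behaves differently for lists and arrays. The small three-item containers below are illustrative assumptions, chosen only to keep the output readable.

```python
import numpy as np

small_list = [1, 2, 3]
small_array = np.array(small_list)

repeated = small_list * 2                          # list: repetition
doubled_list = [item * 2 for item in small_list]   # list: element-wise needs a loop
doubled_array = small_array * 2                    # array: element-wise

print(repeated)       # [1, 2, 3, 1, 2, 3]
print(doubled_list)   # [2, 4, 6]
print(doubled_array)  # [2 4 6]
```

A fair timing comparison therefore needs the list version to loop over its items, which is exactly the work that NumPy moves into compiled code.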

#### timeit
https://docs.python.org/3/library/timeit.html

A very good library for testing code performance.

Multiply containers by 2, and do that math 10000 times.

For `timeit`, we can call our function through the use of a `lambda` (i.e., anonymous) function.

In [None]:
timeit.timeit(lambda:list_multiply(my_list), number=10000)

In [None]:
timeit.timeit(lambda:numpy_multiply(my_array), number=10000)

<b>Result</b> - the use of NumPy arrays is significantly faster than that for lists.

<div class="alert alert-block alert-warning">
<hr style="border:1.5px dashed gray"></hr>

## Extra Information

### 1. `lambda` functions
- anonymous functions for quick tasks
- https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions

### 2. `time`
- an alternative timing library
- https://docs.python.org/3/library/time.html

### 3. `_` - underscore

Notice how the underscore symbol is used below. There are several instances where one can use `_` in Python (see https://www.datacamp.com/tutorial/role-underscore-python), but in the example below it is employed

1. to represent a variable that is not used further.

In [None]:
import time


## Regular List
start_time = time.process_time()

for _ in range(10000):
    list_multiply(my_list)

stop_time = time.process_time()

print(f"List timing: {stop_time - start_time:0.1f} seconds")


## NumPy Array
start_time = time.process_time()

for _ in range(10000):
    numpy_multiply(my_array)

stop_time = time.process_time()

print(f"NumPy timing: {stop_time - start_time:0.1f} seconds")

<div class="alert alert-block alert-warning">
<hr style="border:1.5px dashed gray"></hr>

<hr style="border:2px solid gray"></hr>

## Creating NumPy Arrays from Scratch

In the following we will create several arrays that we can use throughout this lecture.

### Conversion from lists
Let's create 2 data lists with 5 data points each

In [None]:
list_1 = [6, 1, 6, 7, 9]
list_2 = [3, 5, 4, 2, 8]

Now create two arrays, with <b>shapes</b> of <b>(5,)</b>

In [None]:
array_1 = np.array(list_1)
array_2 = np.array(list_2)

display(array_1)
display(array_2)

Create a new <b>nested list</b>, and then <b>convert</b> the nested lists to a <b>NumPy array</b> with a <b>shape</b> of <b>(2, 5)</b>:

In [None]:
list_3 = [list_1, list_2]
list_3

In [None]:
array_3 = np.array(list_3)
array_3

Keep `array_3` in mind - we will use it later in the lecture.

### Array shapes and dimensions

#### 1D shape

Recall that we created `array_1` via:

`list_1 = [6, 1, 6, 7, 9]`

`array_1 = np.array(list_1)`

Since we have it as a NumPy array, we can get its shape:

In [None]:
display(array_1)

array_1.shape

<b>Note</b>: this would change if you added <b>double brackets</b> (i.e., a nested list) to the above declaration.

In [None]:
example = [[6, 1, 6, 7, 9]]

test = np.array(example)

display(test)

test.shape

#### nD shape

Use `array_3` as an example:

In [None]:
array_3

In [None]:
array_3.shape

In [None]:
array_3.ndim
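
The relationship between `shape`, `ndim` and the number of items can be summarized in a short sketch. The arrays are rebuilt here so that the block is self-contained; the names `vector`, `row_matrix` and `matrix` are illustrative assumptions.

```python
import numpy as np

vector = np.array([6, 1, 6, 7, 9])                     # single brackets
row_matrix = np.array([[6, 1, 6, 7, 9]])               # double brackets
matrix = np.array([[6, 1, 6, 7, 9], [3, 5, 4, 2, 8]])  # two nested lists

for name, array in [("vector", vector),
                    ("row_matrix", row_matrix),
                    ("matrix", matrix)]:
    # ndim is the length of the shape tuple; size is the product of its entries
    print(f"{name:10s} shape={array.shape} ndim={array.ndim} size={array.size}")
```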

#### Data types

Describes the <b>type</b> of the <b>items</b> that are <b>within</b> an array.

- https://numpy.org/doc/stable/reference/arrays.dtypes.html
- https://numpy.org/doc/stable/user/basics.types.html

In [None]:
array_3.dtype

Reminder of using `type` to figure out what the object is that you are dealing with:

In [None]:
type(array_3)
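
Since all items must share one data type, NumPy picks a type that can hold every value given to it. The following sketch (with illustrative variable names) shows this inference, and how the `dtype` argument of `np.array` can be used to set the type explicitly.

```python
import numpy as np

integers = np.array([1, 2, 3])                  # all integers
mixed = np.array([1, 2.5, 3])                   # one float upcasts every item
forced = np.array([1, 2, 3], dtype=np.float64)  # type requested explicitly

print(integers.dtype)   # an integer type (its size depends on the platform)
print(mixed.dtype)      # float64
print(forced.dtype)     # float64
print(mixed)
```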

<hr style="border:2px solid gray"></hr>

## More on creating new arrays

#### An array that contains the same number - `np.full`

Create an array with a shape of <b>(3, 5)</b>, and <b>fill</b> it with an approximate <b>pi</b> value (e.g., 3.14):
- <font color='dodgerblue'>1 ndarray</font>, containing
- <font color='dodgerblue'>3 lists</font>, with
- each containing <font color='dodgerblue'>5 ca. pi</font> values


- `np.full`: https://numpy.org/devdocs/reference/generated/numpy.full.html#numpy.full


In [None]:
np.full((3, 5), 3.14)

#### An array of integers - `np.arange`

Create an array with a shape of <b>(31,)</b> that runs from <b>-10 to 50</b> (inclusive) using a <b>stepping size of 2</b>

(similar to built-in `range` function)

- `np.arange`: https://numpy.org/doc/stable/reference/generated/numpy.arange.html

In [None]:
np.arange(-10, 52, 2)
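
A quick check of the resulting length and end points can help here, because the stop value (52) is excluded, just as it is for the built-in `range`. The variable name `stepped` is an illustrative assumption.

```python
import numpy as np

stepped = np.arange(-10, 52, 2)

print(stepped.shape)             # (31,)
print(stepped[0], stepped[-1])   # -10 50

# Same values as the built-in range, but stored as an ndarray
print(np.array_equal(stepped, np.array(range(-10, 52, 2))))
```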

#### An array of floats - `np.linspace`

Create an array with a shape of <b>(10,)</b> that contains <b>10 evenly spaced values</b> between <b>-1 and 1</b>

- `np.linspace`: https://numpy.org/devdocs/reference/generated/numpy.linspace.html

In [None]:
np.linspace(-1, 1, num=10)
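
Unlike `np.arange`, `np.linspace` is given the number of points rather than the step size, and it includes both end points by default. As a small sketch, the optional `retstep=True` argument also returns the spacing that was used (the names `points` and `step` are illustrative).

```python
import numpy as np

points, step = np.linspace(-1, 1, num=10, retstep=True)

print(points.shape)            # (10,)
print(points[0], points[-1])   # both end points are part of the array
print(step)                    # spacing between neighbours, i.e., 2/9
```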

<div class="alert alert-block alert-warning">
<hr style="border:1.5px dashed gray"></hr>

## Extra Information

### 1. Creating an array of nested lists containing floats

Source: Fırat Korkmaz, "Is there a multi-dimensional version of arange/linspace in numpy?" https://stackoverflow.com/questions/32208359/is-there-a-multi-dimensional-version-of-arange-linspace-in-numpy/55675355#55675355. Accessed on Nov. 15, 2022.
<br><br>


Create an array with 15 nested lists containing 2 items each (i.e., an array with shape (15,2))
- the first (i.e. `(1.0, 2.0)`) and last (i.e. `(10.0, 20.0)`) tuple specifies the values for the first and last list (indexed sequentially), with the indexes of the middle lists filled with values that range between the respective values (i.e. from <b>1.0--10.0</b> and from <b>2.0--20.0</b>).

In [None]:
np.linspace((1.0, 2.0), (10.0, 20.0), num=15)

<div class="alert alert-block alert-warning">

For further illustration...

Create an array with 15 nested lists containing 3 items each (i.e., an array with shape (15,3)):

Notice for each nested list:
- index 0 ranges from 1.0--10.0
- index 1 ranges from 1.5--15.0
- index 2 ranges from 2.0--20.0

In [None]:
np.linspace((1.0, 1.5, 2.0), (10.0, 15.0, 20.0), num=15)

<div class="alert alert-block alert-warning">
<hr style="border:1.5px dashed gray"></hr>

#### An array of random numbers - `np.random.random_sample`

Create an array with random values drawn from a continuous uniform distribution between 0 and 1 (0 included, 1 excluded)
- random.random_sample function: https://numpy.org/doc/stable/reference/random/generated/numpy.random.random_sample.html#numpy.random.random_sample

An array with a shape of <b>(3,)</b>:

In [None]:
np.random.random_sample(3)

An array with a shape of <b>(3, 4)</b>:

In [None]:
np.random.random_sample((3, 4))
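
When random numbers need to be reproducible (e.g., for testing), one option is NumPy's `Generator` interface created with `np.random.default_rng`. The sketch below is an alternative to the function used above, not a replacement prescribed by the lecture; the seed value 42 is an arbitrary assumption.

```python
import numpy as np

# Two generators created with the same seed produce the same numbers
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

sample_a = rng_a.random((3, 4))   # uniform values in [0, 1), shape (3, 4)
sample_b = rng_b.random((3, 4))

print(sample_a.shape)
print(np.array_equal(sample_a, sample_b))
```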

<hr style="border:2px solid gray"></hr>

## Accessing arrays

#### One dimensional array
Let's look at the <b>(5,)</b> `array_1` from above

In [None]:
array_1

Accessing the fourth item position (i.e., at an index of 3)

In [None]:
array_1[3]

#### A multidimensional array

Now look at a 2D array (i.e., the <b>(2, 5)</b> array_3 from above)

In [None]:
array_3

Access the first sublist from the 2D array

i.e., array([<font color="dodgerblue"><b>[6,  1,  6,  7,  9]</b></font>, [3, 5, 4, 2, 8]])

In [None]:
array_3[0]

Access the second sublist, fourth item (i.e., sublist index 1 and then item index 3)

i.e., array([[6,  1,  6,  7,  9], [3, 5, 4, <font color="dodgerblue"><b>2</b></font>,  8]])

In [None]:
array_3[1, 3]
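
A few more indexing patterns for a 2D array are sketched below. `array_3` is rebuilt so that the block runs on its own; the column and negative-index accesses go slightly beyond the examples above and are included only as illustrations.

```python
import numpy as np

array_3 = np.array([[6, 1, 6, 7, 9], [3, 5, 4, 2, 8]])

print(array_3[1, 3])     # 2 - row index 1, item index 3
print(array_3[1][3])     # 2 - same item via chained (list-like) indexing
print(array_3[:, 3])     # [7 2] - item index 3 from every row (a column)
print(array_3[-1, -1])   # 8 - negative indexes count from the end
```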

#### Slicing an array

Demo using `array_3` and slicing via
- [0:1]
- [1:2]
- [0:2]
- [0:6]

Slice to obtain the first nested array (same values as `array_3[0]`, but kept as a 2D array with a shape of (1, 5)):

In [None]:
array_3[0:1]

Slice to obtain the second nested array

In [None]:
array_3[1:2]

Slice to obtain the entire array

In [None]:
array_3[0:2]

<b>Notice</b> that we can specify upper numbers that go beyond the array, without giving an error:

In [None]:
array_3[0:6]

But a better way would be to give just the array, since it <b>removes possible confusion</b> over the unclear slicing (i.e., <b>concise coding</b>):

In [None]:
array_3
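
The difference between indexing and slicing shows up in the resulting shapes. The sketch below rebuilds `array_3` and compares them; the column slice in the last line is an added illustration.

```python
import numpy as np

array_3 = np.array([[6, 1, 6, 7, 9], [3, 5, 4, 2, 8]])

print(array_3[0].shape)     # (5,)   - indexing drops a dimension
print(array_3[0:1].shape)   # (1, 5) - slicing keeps the array 2D
print(array_3[0:6].shape)   # (2, 5) - an oversized upper bound is clipped

# Slices can be given per axis: all rows, item indexes 1 to 3
print(array_3[:, 1:4])
```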

<hr style="border:2px solid gray"></hr>

## Filter (search) for elements

- NumPy arrays <font color='dodgerblue'>are not</font> indexed like a list, so the more typical filtering/searching approaches (e.g., list comprehension) are <b>not well suited</b> to them, especially for nD arrays
- `numpy.where` is used instead
    - https://numpy.org/doc/stable/reference/generated/numpy.where.html

<b>Reminder</b> - filtering a <b>regular list</b>:

In [None]:
list_4 = [-6,  1,  6,  7,  9, -5,  0,  2,  4,  3]

[number for number in list_4 if number < 0]

<div class="alert alert-block alert-warning">
<hr style="border:1.5px dashed gray"></hr>

## Extra Information

### 1. list comprehension
- List comprehensions are a Pythonic way to create a list using an iterative structure on one line
- This approach is more concise, but sometimes it loses readability
- https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions

The above "list comprehension" code to filter `list_4` replaces the following more traditional code structure:

In [None]:
set_code_background(color='#fcf8e3')

filtered_list = []

for number in list_4:
    if number < 0:
        filtered_list.append(number)

filtered_list

<hr style="border:1.5px dashed gray"></hr>

Now, how would we <b>filter</b> the `array_3` NumPy array for values less than 6?

- <b>filter</b> using `np.where`: https://numpy.org/doc/stable/reference/generated/numpy.where.html

In [None]:
array_3

In [None]:
filtered_items = np.where(array_3 < 6)

array_3[filtered_items]

<b>Notice</b> that we obtained a 1 dimensional array:

In [None]:
array_3[filtered_items].ndim
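
To see why the result is one dimensional, it helps to look at what `np.where` returns: a tuple with one index array per axis. The sketch below rebuilds `array_3`, unpacks that tuple, and shows the equivalent boolean mask as well as the three-argument form of `np.where`, which keeps the original shape. The names `row_idx`, `col_idx`, `mask` and `kept_shape` are illustrative assumptions.

```python
import numpy as np

array_3 = np.array([[6, 1, 6, 7, 9], [3, 5, 4, 2, 8]])

# One index array per axis: (row indexes, column indexes)
row_idx, col_idx = np.where(array_3 < 6)
print(row_idx)   # [0 1 1 1 1]
print(col_idx)   # [1 0 1 2 3]

# A boolean mask selects the same items
mask = array_3 < 6
print(array_3[mask])   # [1 3 5 4 2]

# Three-argument form: keep matching items, replace the others with 0
kept_shape = np.where(array_3 < 6, array_3, 0)
print(kept_shape)
```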

<hr style="border:2px solid gray"></hr>

### Flatten a multidimensional array & conversion to a list

Collapse an nD array into 1D:

- `ndarray.flatten()`: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flatten.html

In [None]:
array_3.flatten()

In [None]:
array_3.flatten().ndim

Convert the results to a list:

- `ndarray.tolist()`: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.tolist.html?highlight=tolist

1D NumPy array to a list:

In [None]:
array_1

In [None]:
array_1.tolist()

nD NumPy array to a list:

In [None]:
array_3.flatten().tolist()
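
Flattening before the conversion is a choice, not a requirement. As a short sketch (with `array_3` rebuilt and illustrative variable names), calling `tolist()` directly on the 2D array gives back a nested list instead.

```python
import numpy as np

array_3 = np.array([[6, 1, 6, 7, 9], [3, 5, 4, 2, 8]])

nested = array_3.tolist()            # keeps the nesting
flat = array_3.flatten().tolist()    # one flat list

print(nested)   # [[6, 1, 6, 7, 9], [3, 5, 4, 2, 8]]
print(flat)     # [6, 1, 6, 7, 9, 3, 5, 4, 2, 8]

# The items are converted to regular Python numbers
print(type(flat[0]))
```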

<hr style="border:2px solid gray"></hr>

## Joining arrays

#### Multiple arrays with the <font color='dodgerblue'>same dimensions</font>

In [None]:
array_1

In [None]:
array_2

<b>Concatenate</b>: https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html
- Pass a `list` of `np.array`
- Specify which axis you want to combine along

<br>

- `axis=0`:
    - Multiple <b>nD arrays</b>, along their first axis (i.e., <font color='dodgerblue'><b>axis=0</b></font>)
    - conceptually, this is like <font color='dodgerblue'> <b>adding more rows</b> </font>

In [None]:
array_long = np.concatenate([array_1, array_2], axis=0)

display(array_long)
display(array_long.shape)

We can also present a NumPy array in a more aesthetically pleasing way.

Use <b>Pandas</b> to print out the table in a form that is more readable for humans (e.g., a natural scientist)

In [None]:
pd.DataFrame(array_long)

How about a nested array?

In [None]:
array_3

In [None]:
array_big = np.concatenate([array_3, array_3, array_3], axis=0)

display(array_big.shape)

pd.DataFrame(array_big)

<b>Notice</b>: The <b>individual</b> nested <b>arrays</b> are still being <b>added</b> as <i><b>rows</b></i> (due to axis=0). 

- `axis=1`:
    - Multiple <b>nD arrays</b>, along their second axis (i.e., <font color='dodgerblue'><b>axis=1</b></font>) - conceptually, this is like <font color='dodgerblue'><b>adding more columns</b></font>

In [None]:
array_long = np.concatenate([array_3, array_3, array_3], axis=1)

display(array_long.shape)

pd.DataFrame(array_long)
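
The effect of the `axis` argument is easiest to follow through the shapes. The sketch below rebuilds `array_3` and compares both axes; the names `more_rows` and `more_columns` are illustrative assumptions.

```python
import numpy as np

array_3 = np.array([[6, 1, 6, 7, 9], [3, 5, 4, 2, 8]])   # shape (2, 5)

more_rows = np.concatenate([array_3, array_3, array_3], axis=0)
more_columns = np.concatenate([array_3, array_3, array_3], axis=1)

print(more_rows.shape)      # (6, 5)  - the first axis grew
print(more_columns.shape)   # (2, 15) - the second axis grew
```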

#### Multiple arrays with <font color='dodgerblue'>inconsistent (i.e., <i><b>mixed</b></i>) dimensions</font>
- <font color='dodgerblue'>must pay attention to the dimensions</font>

<br>

- vertical stacked
- horizontal stacked

##### Vertical stacked
- nD arrays must be (x, <font color='dodgerblue'><b>N</b></font>) and (y, <font color='dodgerblue'><b>N</b></font>) where <font color='dodgerblue'>N is the same value</font>

Below we will combine `pi_array` (shape: (2, <b>5</b>)) with `array_big` (shape: (6, <b>5</b>)).

In [None]:
pi_array = np.full((2, 5), 3.14)
pi_array

In [None]:
array_big

Now stack them:

In [None]:
array_vstack = np.vstack([pi_array, array_big])
array_vstack

In [None]:
array_vstack.shape

Now logically, we can also do this with our array_1 (shape <b>(5,)</b>)

In [None]:
display(array_1)
array_1.shape

In [None]:
array_vstack = np.vstack([array_1, pi_array])
array_vstack

In [None]:
array_vstack.shape

<font color='red'>When would this not work?</font>

Demo when the arrays (i.e., <b>(x, N)</b> and <b>(y, N)</b>) have <b><font color='red'>different N</font> values</b>

In [None]:
array_4 = np.array([[99, 99, 99, 99]])
array_4

In [None]:
display(pi_array.shape)
display(array_4.shape)

In [None]:
np.vstack([pi_array, array_4])
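
The mismatch raises a `ValueError`. If the rest of a script should keep running, one possible approach (a sketch, not the only way) is to catch that error; the flag `stack_failed` is a hypothetical name introduced for this illustration.

```python
import numpy as np

pi_array = np.full((2, 5), 3.14)          # shape (2, 5)
array_4 = np.array([[99, 99, 99, 99]])    # shape (1, 4)

stack_failed = False

try:
    np.vstack([pi_array, array_4])
except ValueError as error:
    # The second dimensions (5 and 4) do not match
    stack_failed = True
    print(f"Stacking failed: {error}")

print(stack_failed)
```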

##### Horizontal Stacked
- nD arrays must be (<font color='dodgerblue'><b>M</b></font>, x) and (<font color='dodgerblue'><b>M</b></font>, y) where <font color='dodgerblue'>M is the same value</font>

Using our examples, we need a new array that has a shape of (<b>2</b>, x) since `pi_array` is (<b>2</b>, y)

In [None]:
array_5 = np.array([[99], [99]])
array_5

In [None]:
pi_array

In [None]:
display(pi_array.shape)
display(array_5.shape)

In [None]:
array_hstack = np.hstack([array_5, pi_array])
array_hstack

In [None]:
array_hstack.shape

<font color='red'>When would this not work?</font>

Demo when the arrays (i.e., <b>(M, x)</b> and <b>(M, y)</b>) have <b><font color='red'>different M</font> values</b>

In [None]:
array_big

In [None]:
print(array_4.shape)
print(array_big.shape)

In [None]:
array_hstack = np.hstack([array_4, array_big])
array_hstack

<hr style="border:2px solid gray"></hr>

## Math with ndarrays

https://numpy.org/doc/stable/reference/routines.math.html


- `np.add` and `np.subtract`
- `np.multiply` and `np.divide`
- `np.power`
- `np.negative` (multiply by -1)


#### Math performed on a single array

In [None]:
array_3

#### Method 1: NumPy's function

In [None]:
np.add(array_3, 5)

#### Method 2: using Python3's built-in `+` operator

In [None]:
array_3 + 5

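<b>Note</b>
- Using the NumPy function vs. the `+` operator doesn't matter <b>too much</b> in this example: for NumPy arrays, the operator is handled by the same NumPy routine (`np.add`), and we are performing the action on a small array.

- Nevertheless, when speed is important you should keep the work inside NumPy (i.e., array expressions and NumPy functions rather than explicit Python loops).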

In [None]:
repeat = 2000000 # 2 million

In [None]:
timeit.timeit(lambda:np.add(array_3, 5), number=repeat) # NumPy

In [None]:
timeit.timeit(lambda:array_3 + 5, number=repeat) # Built-in
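
Whatever the timings show on a given machine, the two spellings give identical results. A minimal check, with `array_3` rebuilt and illustrative variable names:

```python
import numpy as np

array_3 = np.array([[6, 1, 6, 7, 9], [3, 5, 4, 2, 8]])

via_function = np.add(array_3, 5)
via_operator = array_3 + 5

print(via_function)
print(np.array_equal(via_function, via_operator))   # True
```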

### Math operations between arrays
- math operations between <b>equal sized arrays</b> are done <b>element-wise</b>

1. Add and subtract

In [None]:
np.add(pi_array, pi_array)

In [None]:
np.subtract(pi_array, pi_array)

2. Multiplication, Division and Power

In [None]:
np.multiply(pi_array, pi_array)

In [None]:
np.divide(1, pi_array) ## i.e., 1/pi_array

In [None]:
np.power(pi_array, 3) ## i.e., item^3
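
Element-wise behaviour is easier to see when the two arrays hold different values. The sketch below reuses the values of `array_1` and `array_2` from earlier in the lecture; each result item is computed from the items at the same position.

```python
import numpy as np

array_1 = np.array([6, 1, 6, 7, 9])
array_2 = np.array([3, 5, 4, 2, 8])

print(np.add(array_1, array_2))        # [ 9  6 10  9 17]
print(np.subtract(array_1, array_2))   # [ 3 -4  2  5  1]
print(np.multiply(array_1, array_2))   # [18  5 24 14 72]
print(np.divide(array_1, array_2))     # floats: 6/3, 1/5, 6/4, 7/2, 9/8
```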

### Absolute values

1. Use a NumPy function

In [None]:
large_array = np.full((100, 5), -3.14)
large_array

In [None]:
np.absolute(large_array)

In [None]:
timeit.timeit(lambda: np.absolute(large_array), number=repeat*2)

2. Python3's built-in function

In [None]:
timeit.timeit(lambda:abs(large_array), number=repeat*2)

Note that sometimes, depending on the shape of the array, the built-in `abs` function has the edge over NumPy (at least on my local machine).

### Trigonometric
- np.sin()
- np.cos()
- np.arcsin()
- etc.

Trigonometry on a single input value

In [None]:
np.sin(-6)

Trigonometry on a NumPy array

In [None]:
np.sin(pi_array)
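
Note that NumPy's trigonometric functions expect angles in radians. For angles given in degrees, one option is to convert them first with `np.deg2rad`; the angle values below are illustrative assumptions.

```python
import numpy as np

angles_deg = np.array([0, 30, 90, 180])
angles_rad = np.deg2rad(angles_deg)   # degrees -> radians

sines = np.sin(angles_rad)

# Rounded for display; sin(180 degrees) is only approximately zero in floating point
print(np.round(sines, 3))
```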

### Exponents and logarithms
- `np.exp2`: https://numpy.org/doc/stable/reference/generated/numpy.exp2.html
- `np.power`: https://numpy.org/doc/stable/reference/generated/numpy.power.html
- `np.exp`: https://numpy.org/doc/stable/reference/generated/numpy.exp.html

In [None]:
x = np.array([2, 3, 4])

Now let's raise 2 to the power of x:

i.e., $2^2$ and $2^3$ and $2^4$

In [None]:
np.exp2(x)

In [None]:
example_sqr = np.exp2(x)        # i.e., 2^2, 2^3 and 2^4
example_pow10 = np.power(10, x) # i.e., 10^2, 10^3 and 10^4
example_exp = np.exp(x)         # i.e., e^2, e^3 and e^4

print(f'x     = {x}') # Note: print() shows arrays without commas (str vs. repr)
print()
print(f'2^x   = {example_sqr}')
print(f'10^x  = {example_pow10}')
print(f'e^x   = {example_exp}')

# To illustrate the presence of a comma
print()
example_sqr

<font color='dodgerblue'>Recall: you <b>reverse</b> the <b>exponential</b> calculations using <b>log</b> functions.</font>

Taking the above exponential output and operate on them using log functions:
- `np.log2`: https://numpy.org/doc/stable/reference/generated/numpy.log2.html
- `np.log10`: https://numpy.org/doc/stable/reference/generated/numpy.log10.html
- `np.log`: https://numpy.org/doc/stable/reference/generated/numpy.log.html

In [None]:
print(f'log2(x)  = {np.log2(example_sqr)}')
print(f'log10(x) = {np.log10(example_pow10)}')
print(f'ln(x)    = {np.log(example_exp)}')
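
The round trip can also be verified programmatically. Since floating-point results are rarely exact, the sketch below compares with `np.allclose` (a tolerance-based comparison) rather than with `==`.

```python
import numpy as np

x = np.array([2, 3, 4])

# Each log function should undo the matching exponential
print(np.allclose(np.log2(np.exp2(x)), x))           # True
print(np.allclose(np.log10(np.power(10, x)), x))     # True
print(np.allclose(np.log(np.exp(x)), x))             # True
```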

### Booleans

In [None]:
array_3

Write a boolean expression:

In [None]:
array_3 == 6

Use that expression to <b>filter</b> an array:

In [None]:
array_3[array_3 == 6]
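
Boolean expressions can be combined with `&` (and) and `|` (or); each condition needs its own parentheses. The sketch below rebuilds `array_3`, and the variable names are illustrative assumptions. Summing a boolean array counts its `True` items.

```python
import numpy as np

array_3 = np.array([[6, 1, 6, 7, 9], [3, 5, 4, 2, 8]])

between = array_3[(array_3 > 2) & (array_3 < 7)]    # both conditions hold
either = array_3[(array_3 == 6) | (array_3 == 8)]   # at least one holds
count_six = (array_3 == 6).sum()                    # True counts as 1

print(between)     # [6 6 3 5 4]
print(either)      # [6 6 8]
print(count_six)   # 2
```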

<div class="alert alert-block alert-warning">
<hr style="border:1.5px dashed gray"></hr>

## Extra Information

### A more complex, practical example

<b>Temperature conversion</b>: Celsius to Fahrenheit

Data set: Average temperature in Bonn throughout the calendar year (i.e., January ---> December)

In [None]:
data_celsius = [2.0, 2.8, 5.7, 9.3, 13.3, 16.5, 18.1, 17.6, 14.9, 10.5, 6.1, 3.2]

array_celsius = np.array(data_celsius)

array_celsius

In [None]:
array_fahrenheit = array_celsius*(9/5) + 32

array_fahrenheit
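
A simple way to check such a conversion is to invert it and confirm that the original values come back. The sketch below repeats the data so that it runs on its own; `array_recovered` is an illustrative name.

```python
import numpy as np

data_celsius = [2.0, 2.8, 5.7, 9.3, 13.3, 16.5, 18.1, 17.6, 14.9, 10.5, 6.1, 3.2]
array_celsius = np.array(data_celsius)

array_fahrenheit = array_celsius * (9 / 5) + 32

# Inverse formula: Fahrenheit back to Celsius
array_recovered = (array_fahrenheit - 32) * (5 / 9)

print(np.allclose(array_recovered, array_celsius))   # True
```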

<div class="alert alert-block alert-warning">

<font color='dodgerblue'>Visualize</font> the resulting NumPy array using <font color='dodgerblue'>Pandas'</font> built-in function:
- convert `np.array` to `pd.DataFrame`
- visualization is often important for understanding data better

In [None]:
df_celsius = pd.DataFrame(array_celsius, columns=['Celsius'])
df_fahrenheit = pd.DataFrame(array_fahrenheit, columns=['Fahrenheit'])

df_temperature = pd.concat([df_celsius, df_fahrenheit], axis=1)

df_temperature

In [None]:
temperature_plot = df_temperature.plot(kind='line', fontsize=16, legend=True)

temperature_plot.set_title('Celsius vs. Fahrenheit Temperature', fontsize=16)
temperature_plot.set_xlabel('Calendar Month', fontsize=16)
temperature_plot.set_ylabel('Temperature', fontsize=16)
temperature_plot.legend(loc='upper left')

temperature_plot

<div class="alert alert-block alert-warning">

<hr style="border:1.5px dashed gray"></hr>

## NumPy statistics

### NumPy's random number generators

- <b>generators</b> that use different <b>distributions</b> (e.g., geometric, normal, binomial)
    - https://numpy.org/doc/stable/reference/random/generator.html


Two examples will be given as demonstrations: Geometric and Normal.

<br>

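##### <font color='dodgerblue'>1. Geometric distribution</font>
- the distribution of the <b>number of attempts</b> needed to obtain the <b>first success</b>, when every attempt has the same success probability $p$
- its probabilities, $P(k) = (1-p)^{k-1}\,p$ for $k = 1, 2, 3, \ldots$, form a geometric sequence: a sequence of <b>numbers</b> that follow a <b>common</b> (consistent) <b>ratio</b> (here, $1-p$)

- https://numpy.org/doc/stable/reference/random/generated/numpy.random.geometric.html

<b>Examples</b> of geometric sequences
- 2, 4, 8, 16 ...  (common ratio of <b>2</b>)
- 2, 6, 18, 54 ... (common ratio of <b>3</b>)
- 20, 10, 5, 2.5 ... (common ratio of <b>1/2</b>)

For repeated attempts at something that has <b>2 outcomes (success/failure)</b>, the number of attempts needed until the first success follows a <b>geometric distribution</b>.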

Example systems
- flipping of a coin (head/tails)
- basketball free throws (make it/miss it)


<b>Generate</b> <b>10 random samples</b> from a <b>geometric distribution</b> with a success probability of <b>60%</b>. Each sample is the number of attempts (e.g., people met) needed until the first success (e.g., meeting someone with brown eyes):

In [None]:
my_size = 10

random_geom = np.random.geometric(0.60, size=my_size)
random_geom

What fraction of the samples succeeded on the first attempt (i.e., items within the array that are equal to 1)?

In [None]:
(random_geom == 1).sum() / my_size

<b>Note</b>: the probability will go closer to 60% as the size increases - try setting my_size to 1000, 10000, 100000.
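
The convergence can be sketched with a loop over increasing sample sizes. This version uses a seeded `np.random.default_rng` generator so that the numbers are reproducible; the seed value 0 and the variable names are arbitrary assumptions, and the generator interface is an alternative to the `np.random.geometric` call used above.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
p_success = 0.60

for size in (10, 1_000, 100_000):
    samples = rng.geometric(p_success, size=size)

    # Fraction of samples whose first attempt was already a success
    fraction_first_try = (samples == 1).mean()

    # The average number of attempts should approach 1/p
    mean_attempts = samples.mean()

    print(f"size={size:>7}: fraction={fraction_first_try:.3f}, "
          f"mean attempts={mean_attempts:.3f}")
```

For large sizes the fraction should approach 0.60 and the mean number of attempts should approach 1/0.60.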

##### <font color='dodgerblue'>2. Normal distribution</font>  (a.k.a. Gaussian)
- a distribution that is <b>symmetric about a mean</b>


- https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html

Create a <b>(2500, 3)</b> array (e.g., 2500 data points collected by 3 sensors) that contains <b>Gaussian distributed</b> random values (e.g., temperatures):

- <b>mean=10.0</b> and
- <b>standard deviation=0.1</b>

In [None]:
random_data = np.random.normal(10.0, 0.1, (2500, 3))

random_data_df = pd.DataFrame(random_data, columns=['Sensor 1', 'Sensor 2', 'Sensor 3'])

random_data_df

Visualize (i.e., plot) the random data array for clarity:

In [None]:
my_fontsize = 14

random_data_df.plot(kind='line',
                    xlabel="Data Point", ylabel="Temperature (Celsius)",
                    fontsize=my_fontsize, legend=True)

Look at its distribution (via Kernel Density Estimate plot):

In [None]:
kde_graph = random_data_df.plot(kind='kde', fontsize=my_fontsize, legend=True)

kde_graph.set_xlabel("Temperature (Celsius)", fontsize=my_fontsize)
kde_graph.set_ylabel("Occurrence", fontsize=my_fontsize)

Let's also prove to ourselves that our mean is close to 10 and the standard deviation is close to 0.1.

In [None]:
np.mean(random_data)

In [None]:
np.std(random_data)
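
The two calls above pool all 7500 values together. To get one value per sensor (i.e., per column), the `axis` argument can be used. The sketch below generates its own data with a seeded generator; the seed value 1 and the variable names are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
random_data = rng.normal(loc=10.0, scale=0.1, size=(2500, 3))

# axis=0 collapses the rows, leaving one value per column (sensor)
sensor_means = random_data.mean(axis=0)
sensor_stds = random_data.std(axis=0)

print(sensor_means.shape)   # (3,)
print(sensor_means)
print(sensor_stds)
```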

Remember, in science the mean and standard deviations are often presented together in the following way:

mean $\pm$ standard deviation

e.g., for the values targeted above: $ 10.0 \pm 0.1 $

<div class="alert alert-block alert-warning">
<hr style="border:1.5px dashed gray"></hr>

## Extra Information

### 1. Details concerning standard deviation and variance

Why can't I reproduce results using spreadsheets or Matlab?

In [None]:
data = [1, 2, 4, 5, 8]

<div class="alert alert-block alert-warning">

#### variance
- LibreOffice's spreadsheet gives a variance of 7.5 for `=VAR(1,2,4,5,8)`
- I believe Matlab also gives 7.5

Using the `statistics` library:

In [None]:
import statistics

In [None]:
statistics.variance(data)

<div class="alert alert-block alert-warning">

These above results are actually termed 'the sample variance.'

However, if you use NumPy by simply typing:

In [None]:
np.var(data)

<div class="alert alert-block alert-warning">

In this case there is a "hidden" parameter called `ddof` ("Delta Degrees of Freedom")
- the divisor used in the calculation is `N - ddof` (`ddof=0` by default)

https://numpy.org/doc/1.18/reference/generated/numpy.var.html?highlight=variance

- population: "ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables"
- sample: "ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population"

The same is true for standard deviation.

Population variance:

In [None]:
np.var(data, ddof=0)

<div class="alert alert-block alert-warning">

Sample variance (always larger than the population variance):

In [None]:
np.var(data, ddof=1)

<div class="alert alert-block alert-warning">

Standard deviation demo

<b>LibreOffice</b> gives <b>2.7386127875</b> for `=STDEV(1,2,4,5,8)`

And the <b>Statistics library</b> gives:

In [None]:
statistics.stdev(data)

<div class="alert alert-block alert-warning">

NumPy's sample standard deviation

In [None]:
np.std(data, ddof=1)

<div class="alert alert-block alert-warning">

NumPy's population standard deviation

In [None]:
np.std(data, ddof=0)
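
To make the role of `ddof` concrete, the variance and standard deviation can be computed by hand. The sketch below uses only the divisor rule `N - ddof`; the intermediate variable names are illustrative assumptions.

```python
import math

import numpy as np

data = [1, 2, 4, 5, 8]
n = len(data)

mean = sum(data) / n                                          # 4.0
squared_deviations = sum((value - mean) ** 2 for value in data)  # 30.0

population_variance = squared_deviations / n        # ddof=0 -> 6.0
sample_variance = squared_deviations / (n - 1)      # ddof=1 -> 7.5

print(population_variance, np.var(data, ddof=0))
print(sample_variance, np.var(data, ddof=1))
print(math.sqrt(sample_variance), np.std(data, ddof=1))
```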

<hr style="border:1.5px dashed gray"></hr>

### And finally, some weirdness

In [None]:
import statistics

The following should provide a <b>mean value</b> of <b>1.0</b>

(i.e., <b>sum</b> of the numbers is <b>4</b> and then <b>divide</b> by <b>4</b>)

In [None]:
numbers_list = [1e30, 1.0, 3.0, -1e30]

statistics.mean(numbers_list)

In [None]:
np.mean(numbers_list)

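This comes from the limited precision of the data type (`float64` carries roughly 16 significant digits, so adding 1.0 or 3.0 to 1e30 leaves it unchanged):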

In [None]:
np.array(numbers_list).dtype

https://numpy.org/doc/stable/reference/generated/numpy.mean.html

`numpy.mean(a, axis=None, dtype=None, out=None, keepdims=<no value>, *, where=<no value>)`

`dtype data-type, optional`
- <b>Type to use in computing the mean</b>.
    - For <b>integer</b> inputs, the <b>default</b> is <b>float64</b>
    - For <b>floating point</b> inputs, it is the same as the <b>input dtype</b>.


In [None]:
np.mean(numbers_list, dtype=np.float64)

In [None]:
np.mean(numbers_list, dtype=np.int8)
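
The loss of the small values can be shown directly. As a sketch, `math.fsum` from the standard library is used below as one way to sum floats without this rounding loss; it is offered as an illustration, not as the lecture's recommended fix.

```python
import math

import numpy as np

numbers_list = [1e30, 1.0, 3.0, -1e30]

# Adding a small number to a huge one does not change the huge one
absorbed = (1e30 + 1.0 == 1e30)
print(absorbed)   # True

# math.fsum tracks the lost digits, so the small values survive
accurate_mean = math.fsum(numbers_list) / len(numbers_list)
print(accurate_mean)   # 1.0

# For comparison, NumPy's result with regular float64 summation
print(np.mean(numbers_list))
```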

<b>Take home message</b>: you should always take a look at NumPy's manual to make sure you are doing what you think you are doing

- keep an eye out for default settings

<hr style="border:2px solid gray"></hr>

Additional resource to further learn and test your knowledge: https://github.com/rougier/numpy-100

<hr style="border:2px solid gray"></hr>