<a href="https://colab.research.google.com/github/lsuhpchelp/loniscworkshop2023/blob/main/day2/IntermediatePython.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Day 2: Intermediate Python
## 5th LBRN-LONI Scientific Computing Bootcamp
May 26 2023

# **Outline**


1.   Intro to NumPy
2.   Basic Usage
3.   Shape Manipulation
4.   Faster with NumPy: Advanced Array Operations
5.   Examples

<br><br>

---
# **0. Quick Review: Lists & Tuples**
---


## 1) Lists

In [None]:
# Create a list

myList = [1, 4, 9, 16]
print(myList)
print(type(myList))

In [None]:
# Lists can be inhomogeneous

myList = [1, 2e10, "hello", True]
print(myList)
print(type(myList))
#print([type(ele) for ele in myList])

In [None]:
# Access / modify elements in a list

myList = [1, 4, 9, 16]

# Accessing
print(myList[2])

# Modification
myList[2] = myList[0] * -1
print(myList)

In [None]:
# Indexing & slicing

myList = list(range(10))
print(myList)
#print(myList[1:5])
#print(myList[1:])
#print(myList[:5])
#print(myList[1:5:2])
#print(myList[-5:-1])
#print(myList[:])

Note: We will see some similarities and differences between lists and NumPy arrays. We will also see why you would want to use NumPy arrays instead of lists for numerical calculations.

## 2) Tuples

In [None]:
# Create a tuple

myTuple = (1, "two", 3, "four", 5, "six")  # Can be inhomogeneous
print(myTuple)

In [None]:
# Create a tuple: "()" can be omitted:

myTuple = 1, "two", 3, "four", 5, "six"
print(myTuple)

In [None]:
# Tuple with one element: must include comma

myTuple = (50,)    # Equivalent: myTuple = 50,   (but (50) is just the integer 50)
print(myTuple)

In [None]:
# Indexing and slicing (similar to lists)

myTuple = (1, "two", 3, "four", 5, "six")
print(myTuple)

# Indexing
#print(myTuple[1])
#print(myTuple[-2])

# Slicing
#print(myTuple[1:5])
#print(myTuple[1:])
#print(myTuple[:5])
#print(myTuple[1:5:2])
#print(myTuple[-5:-1])
#print(myTuple[:])

In [None]:
# Primary difference b/w Tuple & List:
#    Tuple is IMMUTABLE!

myTuple = (1, "two", 3, "four", 5, "six")
myTuple[1] = 2     # Will fail!

In [None]:
# Typical usage: Unpacking

myList = ["apple", "banana", "citrus", "dragonfruit"]

# Return the first variable and the length of the list
firstVar, lengthList = (myList[0], len(myList))
print("First variable is:\t", firstVar)
print("Length of the list is:\t", lengthList)

Note: In today's training, tuples are mostly used to ***return*** and ***unpack*** multiple variables.
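
The return-and-unpack pattern can be sketched with a small function. The function name `first_and_length` and the variable names are made up for this illustration:

```python
def first_and_length(items):
    """Return the first element and the number of elements as a tuple."""
    return items[0], len(items)   # The comma builds the tuple; "()" is optional


fruits = ["apple", "banana", "citrus", "dragonfruit"]

result = first_and_length(fruits)                # Keep the tuple as a whole
first_item, n_items = first_and_length(fruits)   # Or unpack it directly

print(result)
print(first_item, n_items)
```

Many NumPy functions return several values in the same way, so unpacking on the left-hand side is worth getting used to.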

<br><br>

---
# **1. Intro to NumPy**
---

## 1) What is NumPy

- NumPy (Numerical Python) is the fundamental package for scientific computing in Python.
- It is a Python library that provides a multidimensional array object and various derived objects (such as masked arrays and matrices).
- It also provides an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
- In short, the NumPy package provides basic routines for manipulating large arrays and matrices of numeric data.
- See https://numpy.org/

## 2) Why NumPy?

Arrays and matrices? Didn't we have ***List*** in Python already?





*   NumPy is much faster (We will see later in today's training)
*   NumPy provides easier and more varied ways to access array elements
*   A **LOT** (I mean, a **LOT**) of Python modules are built on NumPy (e.g., SciPy, pandas, TensorFlow, PyTorch, ...)

<br><br>

---
# **2. Basic Usage**
---

## 1) Import NumPy


In [None]:
import numpy as np
print(np.__version__)

## 2) Create a NumPy Array

In [None]:
# Basic usage: Create from a list
# https://numpy.org/doc/stable/reference/generated/numpy.array.html

myList = [1, 4, 9, 16]
print("myList = ", myList)
print("Type is: ", type(myList))

#myAry = np.array(myList)
#print("\nmyAry = ", myAry)
#print("Type is: ", type(myAry))

In [None]:
# Create a multi-dimensional array (from a multi-dimensional list)

myAry = np.array([[0, 1, 2, 3],
                  [4, 5, 6, 7],
                  [8, 9, 10, 11]])
print("myAry = ", myAry)
print("Type is: ", type(myAry))

In [None]:
# NumPy arrays are homogeneous, i.e., ONE defined data type! (Unlike lists!)

# Return the dtype (data type) of array:
myAry = np.array([1, 4, 9, 16])
#myAry = np.array([1., 4., 9., 16.])
#myAry = np.array(["1", "4", "9", "16"])
print("myAry = ", myAry)
print("Data type is: ", myAry.dtype)

# What if I try to create an inhomogeneous array?



> *   Does not need to check data type -> Fast!
> *   It is possible to create inhomogeneous array (***structured array***), but only for rare cases (like talking to structured data in C). Will not discuss today.
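
As a minimal sketch of what happens when mixed types are passed to `np.array()`: NumPy does not keep the individual types but converts every element to one common dtype. The variable names below are chosen only for this illustration:

```python
import numpy as np

mixed_numbers = np.array([1, 2.5, 3])          # int + float  -> everything becomes float
mixed_strings = np.array([1, 2.5, "hello"])    # number + str -> everything becomes a string

print(mixed_numbers, mixed_numbers.dtype)
print(mixed_strings, mixed_strings.dtype)
```

The second array can no longer be used for arithmetic, which is a good reason to check `dtype` after loading or building data.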



In [None]:
# Specify array type:

# Method 1: Create an array with specific dtype
# Works for most other array creation methods!! Will see later
myAry = np.array([1, 4, 9, 16], dtype="float")
print("myAry = ", myAry)
print("Data type is: ", myAry.dtype)

# Method 2: Convert the dtype of an existing array
# https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html#numpy.ndarray.astype
myAry2 = myAry.astype("int")
print("\nmyAry2 = ", myAry2)
print("Data type is: ", myAry2.dtype)

##### List of dtypes:

> *   Integer: `int`, `int8`, `int16`, `int32`, `int64`
> *   Unsigned integer: `uint`, `uint8`, `uint16`, `uint32`, `uint64`
> *   Float number: `float`, `float16`, `float32`, `float64`, `float128` (platform-dependent)
> *   Complex number: `complex64`, `complex128`, `complex256` (platform-dependent)
> *   Boolean: `bool`
> *   String: `str`, `<U[LEN]` (e.g., `<U16`)

Not an exhaustive list. For more options and aliases, see: https://numpy.org/doc/stable/user/basics.types.html
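
The number in a dtype name is its size in bits, so the choice of dtype directly sets the memory footprint. A small sketch (the list of dtypes tried here is arbitrary):

```python
import numpy as np

values = [1, 4, 9, 16]

# itemsize = bytes per element, nbytes = bytes for the whole array
for name in ["int8", "int32", "float64", "complex128"]:
    ary = np.array(values, dtype=name)
    print(name, "itemsize =", ary.itemsize, "nbytes =", ary.nbytes)

small = np.array(values, dtype="int8")
large = np.array(values, dtype="complex128")
```

Smaller dtypes save memory but hold a smaller range of values, so pick them deliberately.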

## 3) More Ways to Create a NumPy Array

#### a) 1-D array:

In [None]:
# Increment: np.arange()   (*Similar to range())
# https://numpy.org/doc/stable/reference/generated/numpy.arange.html

myAry = np.arange(10)
#myAry = np.arange(10, dtype="complex")
#myAry = np.arange(-0.5, 0.5, 0.1)
print("myAry = ", myAry)

In [None]:
# Linear / equal spacing: np.linspace()
# https://numpy.org/doc/stable/reference/generated/numpy.linspace.html

myAry = np.linspace(0, 10, 5)
#myAry = np.linspace(0, 10, 5, endpoint=False)
#myAry = np.linspace(0, 10, 5, dtype="int")
#myAry, step = np.linspace(0, 10, 5, retstep=True)
print("myAry = ", myAry)

In [None]:
# Equal spacing, but in log scale: np.logspace()
# https://numpy.org/doc/stable/reference/generated/numpy.logspace.html

myAry = np.logspace(0, 1, 5)
#myAry = np.logspace(0, 1, 5, base=2)
print("myAry = ", myAry)

# Plot it (Don't worry if you do not know matplotlib yet)
#import matplotlib.pyplot as plt
#plt.plot(np.logspace(0, 1, 5),'r-s')
#plt.plot(np.linspace(0, 1, 5),'b-s')
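
To see how `np.logspace()` relates to `np.linspace()`, here is a minimal sketch: the points are equally spaced in the exponent, and the base (10 by default) is raised to those exponents. The variable names are chosen for this illustration:

```python
import numpy as np

log_ary = np.logspace(0, 1, 5)          # From 10**0 to 10**1, 5 points
exponents = np.linspace(0, 1, 5)        # The equally spaced exponents

print(log_ary)
print(10 ** exponents)
print(np.allclose(log_ary, 10 ** exponents))
```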

#### b) 2-D array:

In [None]:
# Identity matrix: np.identity()
# https://numpy.org/doc/stable/reference/generated/numpy.identity.html

myAry = np.identity(5)
#myAry = np.identity(5, dtype="int")
print("myAry = \n", myAry)

In [None]:
# Identity matrix: np.eye() 
# https://numpy.org/doc/stable/reference/generated/numpy.eye.html

myAry = np.eye(5)
#myAry = np.eye(5, dtype="int")
#myAry = np.eye(5, 7)
#myAry = np.eye(5, k=1)
print("myAry = \n", myAry)

In [None]:
# Diagonal matrix: np.diag() 
# https://numpy.org/doc/stable/reference/generated/numpy.diag.html

myAry = np.diag([1, 4, 9, 16, 25])
#myAry = np.diag([1, 4, 9, 16, 25], k=-1)
print("myAry = \n", myAry)

#### c) General Array:

In [None]:
# Create array with given dimension
# np.zeros(): Filled with zeros  https://numpy.org/doc/stable/reference/generated/numpy.zeros.html
# np.ones():  Filled with ones  https://numpy.org/doc/stable/reference/generated/numpy.ones.html
# np.full():  With given initial value  https://numpy.org/doc/stable/reference/generated/numpy.full.html
# np.empty(): No initial value https://numpy.org/doc/stable/reference/generated/numpy.empty.html

shape = (2,3,4)  # A 3-D array

myAry = np.zeros(shape)
#myAry = np.zeros(shape, dtype="int")

#myAry = np.ones(shape)
#myAry = np.ones(shape, dtype="int")

#myAry = np.full(shape, np.pi)
#myAry = np.full(shape, np.pi, dtype="int")

#myAry = np.empty(shape)
#myAry = np.empty(shape, dtype="int")

print("myAry = \n", myAry)

# This may look trivial, but think about how you would build a 3-D structure filled with zeros using nested lists.

#### d) Load Data from Text File

In [None]:
# Load an external data file (local or from remote server): np.loadtxt()
# https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html

txtfile = 'https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/data.txt'
myAry = np.loadtxt(txtfile, 
                    skiprows=16, 
                    usecols=(0, 1, 2), 
                    comments="#")
print("myAry = \n", myAry)
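
`np.loadtxt()` also accepts a file-like object, which makes it easy to try the options without a network connection. The sketch below uses made-up text in place of a real file; the workshop's `data.txt` may be laid out differently:

```python
import io
import numpy as np

# Hypothetical file content: a comment line, then four columns of numbers
sample_text = """# x  y  z  label_id
0.0 1.0 2.0 7
0.5 1.5 2.5 8
1.0 2.0 3.0 9
"""

# Read only the first three columns; lines starting with "#" are skipped
data = np.loadtxt(io.StringIO(sample_text), usecols=(0, 1, 2), comments="#")
print(data)
print(data.shape)
```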

> * Keep all of the above creation methods in mind. You will need them later in the practice!

## 4) Basic Operations
(Just a quick glimpse. Will see details later)

In [None]:
# Remember how lists handle those operators?

myList1 = [2, 3, 4, 5]
myList2 = [2, 6, 24, 120]

print("myList1 + myList2 = \t", myList1 + myList2)  # List add
print("    myList1 * 3 = \t", myList1 * 3)            # List multiplication

In [None]:
# How does NumPy handle them?

myAry1 = np.array([2, 3, 4, 5])
myAry2 = np.array([2, 6, 24, 120])

print("myAry1 + myAry2 = \t", myAry1 + myAry2)
print("    myAry1 * 3 = \t", myAry1 * 3)

In [None]:
# Broadcast of basic operators: A quick glimpse

myAry1 = np.array([2, 3, 4, 5])
myAry2 = np.array([2, 6, 24, 120])

# Case 1: Array & array -> Same shape, element-by-element
print("myAry2 / myAry1 = \t", myAry2 / myAry1)
print("myAry2 ** myAry1 = \t", myAry2 ** myAry1)

# Case 2: Array & scalar -> Broadcast scalar to each element
print("\nmyAry1 % 2 = \t", myAry1 % 2)
print("myAry2 - 10 = \t", myAry2 - 10)

## 5) Basic Indexing

In [None]:
# Access individual elements
# https://numpy.org/doc/stable/user/basics.indexing.html

myAry = np.arange(5)
print("myAry = \t", myAry)
print("myAry[0] = \t", myAry[0])
print("myAry[-1] = \t", myAry[-1])

In [None]:
# Modify individual elements

myAry = np.arange(5)
print("myAry = \t", myAry)

myAry[0] = 10
#myAry[1] *= 100
#myAry[2] = 34.5789     # What if I give a float number to an integer array?
print("\nmyAry now is: \t", myAry)

In [None]:
# Multi-dimensional array

myAry = np.array([[1, 4, 9, 10],
                  [0, 8, 7, 3],
                  [5, 6, 2, 11]], dtype="float")
print("myAry = \n", myAry)

print("\nmyAry[2,1] = ", myAry[2,1])

myAry[0,3] *= -1
print("\nmyAry now is: \n", myAry)

In [None]:
# A common mistake! To copy or not to copy?

myAry1 = np.arange(10)
myAry2 = myAry1
print("myAry2 = \t", myAry2)

myAry2[5] = -5
print("myAry2 now is \t", myAry2)

#print("myAry1 now is \t", myAry1)       # <-  What is the result of this?
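
A minimal sketch of the difference between a second name for the same array and a real copy made with `ndarray.copy()`. The variable names are chosen for this illustration:

```python
import numpy as np

original = np.arange(10)
alias = original            # Same object, just a second name
clone = original.copy()     # Independent copy with its own data

alias[5] = -5               # Also changes "original"
clone[0] = 99               # Does not change "original"

print(original)
print(alias is original, clone is original)
```

Use `.copy()` whenever you need to keep the original values around.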

## 6) Slicing

In [None]:
# Basic slicing (similar to lists)
# https://numpy.org/doc/stable/user/basics.indexing.html

myAry = np.arange(10)
print("myAry = \t", myAry)

print("myAry[2:7] = \t", myAry[2:7])      # From index 2 to 7 (endpoint not included)
print("myAry[2:] = \t", myAry[2:])        # From index 2 to the end
print("myAry[:7] = \t", myAry[:7])        # From the beginning to index 7
print("myAry[2:7:2] = \t", myAry[2:7:2])  # From index 2 to 7, step length 2
print("myAry[-5:-2] = \t", myAry[-5:-2])  # From index -5 (5th to last) to -2
print("myAry[:] = \t", myAry[:])          # The entire array

In [None]:
# Multi-dimensional slicing

myAry = np.array([[0, 1, 2, 3],
                  [4, 5, 6, 7],
                  [8, 9, 10, 11],
                  [12, 13, 14, 15]])
print("myAry = \n", myAry)

print("\nmyAry[1, 1:-1] = \n", myAry[1, 1:-1])    # Fix the first index, slicing the second
print("\nmyAry[1::2, 2] = \n", myAry[1::2, 2])    # Fix the second index, slicing the first
print("\nmyAry[1,:] = \n", myAry[1,:])            # Fix first index, return the entire second index
print("\nmyAry[:,2] = \n", myAry[:,2])            # Fix second index, return the entire first index
print("\nmyAry[:, ::3] = \n", myAry[:, ::3])      # Return the entire first index, slicing the second
print("\nmyAry[:,:] = \n", myAry[:,:])            # Return the entire array

In [None]:
# For multi-dimensional arrays, indices are matched to axes starting from the *LEFT*. 
# Any missing trailing index is treated as ":" (the entire axis).

myAry = np.array([[[0, 1, 2, 3],
                  [4, 5, 6, 7]],
                  [[8, 9, 10, 11],
                  [12, 13, 14, 15]]])     # 3-D

print("myAry = \n", myAry)
print("\nmyAry.shape = ", myAry.shape)
print("\nmyAry[1] = \n", myAry[1])        # Equivalent to myAry[1,:,:]
print("\nmyAry[1,0] = \n", myAry[1,0])    # Equivalent to myAry[1,0,:]

In [None]:
# Sliced arrays can be modified too

myAry = np.arange(10)
print("myAry = \t", myAry)

myAry[2:7] **= 2
print("myAry now is: \t", myAry)

In [None]:
# CAUTION! Sliced arrays are references too!

myAry1 = np.arange(10)
print("myAry1 = \t", myAry1)

myAry2 = myAry1[2:7]
myAry2 **= 2
print("myAry2 = \t", myAry2)

#print("myAry1 now is: \t", myAry1)       # <-  What is the result of this?
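
A slice is a view on the same data, which can be checked with `np.shares_memory()`. The sketch below contrasts a view with an explicit copy; the variable names are chosen for this illustration:

```python
import numpy as np

base = np.arange(10)
view = base[2:7]             # A view: shares data with "base"
safe = base[2:7].copy()      # An independent copy

view **= 2                   # Modifies base[2:7] as well
safe += 100                  # Leaves "base" untouched

print(base)
print(np.shares_memory(base, view), np.shares_memory(base, safe))
```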

<br><br>

---
# **3. Shape Manipulation**
---

Why change shape? Is that even important?

*   Neural network training (picture <-> vector)
> ![nn](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/3_nn.png)
*   Wave function propagation (multi-dimensional array <-> vector)
> ![wavefunction](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/3_wavefunction.gif)
[Li and Thumm, Phys. Rev. A **101**, 013411](https://doi.org/10.1103/PhysRevA.101.013411)
*   ...

## 1) Preparation: Understanding ***Axis***

![axis](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/3_1_axis.png)

## 2) Get Shape Properties

In [None]:
# Three most frequently used attributes
# ndarray.shape : Return a tuple of array dimensions https://numpy.org/doc/stable/reference/generated/numpy.ndarray.shape.html
# ndarray.size : Return total number of elements in the array https://numpy.org/doc/stable/reference/generated/numpy.ndarray.size.html
# ndarray.ndim : Return number of array dimensions https://numpy.org/doc/stable/reference/generated/numpy.ndarray.ndim.html

myAry = np.array([[0, 1, 2, 3],
                  [4, 5, 6, 7],
                  [8, 9, 10, 11]])

print("myAry.shape = \t", myAry.shape)
print("myAry.size = \t", myAry.size)
print("myAry.ndim = \t", myAry.ndim)

## 3) Reshape an Array

In [None]:
# Reshape (size must match)

myAry = np.arange(12)
print("myAry = \n", myAry)

# Method 1: Change ndarray.shape attribute directly.
# This is *in-place*.
myAry.shape = (3,4)               
print("\nmyAry now is: \n", myAry)

# Method 2: Use ndarray.reshape() method. 
# https://numpy.org/doc/stable/reference/generated/numpy.ndarray.reshape.html 
# This *returns* the result 
myAry = myAry.reshape(2,3,2)
print("\nmyAry now is: \n", myAry)

# What if size does not match?
#myAry.shape = (3,5)

# Useful as a safety check: a size mismatch raises an error instead of silently changing the data

In [None]:
# I am lazy: Let NumPy figure out the rest

myAry = np.arange(12)
print("myAry = \n", myAry)

print("\nmyAry now is: \n", myAry.reshape(2, -1))
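
A short sketch of how `-1` is resolved: NumPy divides the total size by the product of the other dimensions, and raises an error if the division is not exact. Only one dimension may be `-1`:

```python
import numpy as np

ary = np.arange(12)

a = ary.reshape(2, -1)       # 12 / 2 = 6      -> shape (2, 6)
b = ary.reshape(-1, 4)       # 12 / 4 = 3      -> shape (3, 4)
c = ary.reshape(2, -1, 3)    # 12 / (2*3) = 2  -> shape (2, 2, 3)
print(a.shape, b.shape, c.shape)

try:
    ary.reshape(5, -1)       # 12 is not divisible by 5
except ValueError as err:
    print("ValueError:", err)
```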

In [None]:
# Resize (size can mismatch)
# https://numpy.org/doc/stable/reference/generated/numpy.ndarray.resize.html

myAry = np.arange(12)
print("myAry =\n", myAry)

# This is *in-place* (unlike ndarray.reshape())
myAry.resize(3,5)                     # Larger size -> Will pad with zeros
print("\nmyAry now is: \n", myAry)
myAry.resize(2,5)                     # Smaller size -> Will truncate
print("\nmyAry now is: \n", myAry)

# Useful if padding / truncating is intended

In [None]:
# Other useful shape manipulation

myAry = np.arange(12).reshape(3,4)
print("myAry =\n", myAry)

# Transpose (This *returns* the result)
# https://numpy.org/doc/stable/reference/generated/numpy.ndarray.transpose.html
# https://numpy.org/doc/stable/reference/generated/numpy.ndarray.T.html  
myAry = myAry.transpose()
#myAry = myAry.T
print("\nmyAry now is: \n", myAry)

# Flatten (This *returns* the result)
# https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flatten.html
myAry = myAry.flatten()
print("\nmyAry now is: \n", myAry)

<br><br>

---
# **4. Faster with NumPy: Advanced Array Operations**
---

Example: Add 5 to each element of an integer array of size 100,000

In [None]:
#@markdown * How to do that with lists?

myList1 = list(range(100000))
myList2 = [val+5 for val in myList1]
print("Result: \n", myList2)

In [None]:
#@markdown * How to do that with NumPy arrays?

myAry1 = np.arange(100000)
myAry2 = myAry1 + 5
print("Result: \n", myAry2)

In [None]:
#@markdown - Which one is faster?

# Use the magic command: %timeit  (works in IPython & Jupyter notebooks)
# Don't worry if you do not know it

myList1 = list(range(100000))
tList = %timeit -o myList2 = [val+5 for val in myList1]

myAry1 = np.arange(100000)
tAry = %timeit -o myAry2 = myAry1 + 5

speedup = tList.average / tAry.average
print("Speedup = %.1f times" % (speedup))
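
Outside IPython or Jupyter, `%timeit` is not available. A rough equivalent can be sketched with the standard-library `timeit` module; the repetition count of 20 is an arbitrary choice, and the measured speedup will vary from machine to machine:

```python
import timeit
import numpy as np

n = 100000
my_list = list(range(n))
my_ary = np.arange(n)

# Total time for 20 repetitions of each approach
t_list = timeit.timeit(lambda: [val + 5 for val in my_list], number=20)
t_ary = timeit.timeit(lambda: my_ary + 5, number=20)

print("List comprehension: %.4f s" % t_list)
print("NumPy array:        %.4f s" % t_ary)
print("Speedup = %.1f times" % (t_list / t_ary))

# Both approaches must produce the same numbers
same_result = [val + 5 for val in my_list] == (my_ary + 5).tolist()
print("Same result:", same_result)
```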

## ***Rule of Thumb***: 

**Avoid using loops, as you would avoid fire!!!**

↑↑↑ ( Take-home message ) ↑↑↑ 

Why is it faster with NumPy?

- Avoid type check overhead
- Vectorization (simplified)
 - Instead of running a Python-level loop one element at a time, NumPy applies the operation to the whole array inside a compiled loop over contiguous memory, which can also use the CPU's SIMD instructions. (Most element-wise operations still run on a single core.)
- Many of the built-in functions are implemented in compiled C code.
 - They can be much faster than the code on the Python level

## 1) Universal Functions (ufunc)

Available ufuncs:
* Arithmetic operators 
 * `+`, `-`, `*`, `/`
 * `**` (power), `//` (integer division), `%` (modulus)
* Bitwise operators
 * `&` (AND), `|` (OR), `^` (XOR), `~` (NOT)
 * `<<` (left shift), `>>` (right shift)
* Assignment operators
 * w/ arithmetic: `+=`, `-=`, `*=`, `/=`, `**=`, `//=`, `%=`
 * w/ bitwise: `&=`, `^=`, `|=`, `<<=`, `>>=`
* Comparison operators 
 * `>`, `<`, `>=`, `<=`, `==`, `!=`
* Mathematical functions
 * `np.sin()`, `np.cos()`, `np.tan()` ...
 * `np.exp()`, `np.log()`, `np.log10()` ...
 * `np.abs()`, `np.sqrt()` ...
* Special Functions
 * `scipy.special.*`
* . . . and many, many more.
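
Each operator in the list is backed by a named ufunc object, so the operator form and the function form give the same result. A minimal sketch, with arrays chosen only for this illustration:

```python
import numpy as np

a = np.arange(5)
b = np.arange(5, 10)

# The named functions are instances of np.ufunc
print(isinstance(np.add, np.ufunc))

# Operator form and function form are equivalent
same_add = np.array_equal(a + b, np.add(a, b))
same_pow = np.array_equal(a ** 2, np.power(a, 2))
same_cmp = np.array_equal(a >= 2, np.greater_equal(a, 2))
print(same_add, same_pow, same_cmp)
```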

### a) Examples

In [None]:
# Arithmetic operators

a = np.arange(10)
b = np.arange(10, 20)
print("  a \t = ", a)
print("  b \t = ", b)

print("a + b \t = ", a + b)     # array & array
print("2 ** a \t = ", 2 ** a)   # array & scalar

In [None]:
# Bitwise operators

a = np.array([1, 2, 4, 8, 16])
b = np.arange(5)
print("  a \t = ", a)
print("  b \t = ", b)

print("a | b \t = ", a | b)     # array & array
print("a << 1 \t = ", a << 1)   # array & scalar
print("  ~a \t = ", ~a)         # Not

In [None]:
# Assignment operators

a = np.arange(10)
b = np.arange(10, 20)
print("  a \t = ", a)
print("  b \t = ", b)

a += b        # array & array
b %= 2        # array & scalar

print("New a \t = ", a)
print("New b \t = ", b)

In [None]:
# Comparison operators

a = np.arange(10)
b = np.arange(10, 0, -1)
print("  a \t = ", a)
print("  b \t = ", b)

print("a >= b \t : ", a >= b)   # array & array
print("a == 5 \t : ", a == 5)   # array & scalar

In [None]:
# Trigonometric & exponential functions

x = np.linspace(0, 2*np.pi, 100)

y = np.sin(x)           # Pass the entire array as argument
print("  y = \n", y)

# Let's plot it!
import matplotlib.pyplot as plt
plt.plot(x,y)
plt.show()

In [None]:
# Special functions: Spherical Bessel Function of the 1st kind j_n
# https://docs.scipy.org/doc/scipy/reference/special.html

x = np.linspace(0, 20, 100)

import scipy.special    # The submodule must be imported explicitly in older SciPy versions
y = scipy.special.spherical_jn(0, x)  # Pass the entire array as argument
print("  y = \n", y)

# Let's plot it!
import matplotlib.pyplot as plt
plt.plot(x,y)
plt.show()

### *) A Practice with a Practical Problem (Part 1)

![4_1_problem_pt1](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_1_problem_pt1.png)

**!!CAUTION!!**

* Use ufunc for efficiency
* Do **NOT** use a loop even if you are tempted to!!

You may be tempted to do something like this:

(This is slow!)

```
r = ...

psi_r = np.empty(1000, dtype="float")
for i in range(1000):
  psi_r[i] = r[i]**2 * np.exp(-r[i]/3)
```



Instead, remember `*`, `**`, `np.exp()`, `np.sin()`, `np.cos()` are all ufuncs!

In [None]:
# Hint: If you wrote more than one line for each "FIX ME", you are doing something wrong

import numpy as np

# Define coordinates
# Hint: Use np.linspace()
r = # FIX ME
theta = # FIX ME
phi = # FIX ME

# Calculate function on each dimension
# Hint: Use ufunc np.exp(), np.sin(), np.cos(); imaginary unit is "1j"
psi_r = # FIX ME
psi_theta = # FIX ME
psi_phi = # FIX ME
print("psi_r.shape = \t\t", psi_r.shape)
print("psi_theta.shape = \t", psi_theta.shape)
print("psi_phi.shape = \t", psi_phi.shape)

In [None]:
#@markdown Show solution

import numpy as np

# Define coordinates
r = np.linspace(0, 40, 1000)
theta = np.linspace(0, np.pi, 300)
phi = np.linspace(0, 2*np.pi, 600)

# Calculate function on each dimension
psi_r = r**2 * np.exp(-r/3)
psi_theta = np.sin(theta) * np.cos(theta)
psi_phi = np.exp(1j*phi)
print("psi_r.shape = \t\t", psi_r.shape)
print("psi_theta.shape = \t", psi_theta.shape)
print("psi_phi.shape = \t", psi_phi.shape)

## 2) Broadcasting

### a) What is Broadcasting?
**Definition**: Describes the way NumPy treats arrays with ***different shapes*** during arithmetic operations.

(See: https://numpy.org/doc/stable/user/basics.broadcasting.html)

In [None]:
# We saw this before (array & scalar):

a = np.arange(10)
print("  a \t = ", a)
print("a + 2 \t = ", a + 2)

# Makes sense.

In [None]:
# But some cases are much less obvious...

a = np.arange(12).reshape(2,2,3,1)
b = np.arange(12).reshape(3,4)

print("a = \n", a)
print("")
print("b = \n", b)
print("")

print("a + b = \n", a + b)
print("Shape is: \n", (a+b).shape)

# How is NumPy even able to add these two?

### b) Basic Rules

How does broadcasting work in general? Use the above example:

![4_2_broadcasting_def1](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_2_broadcasting_def1.png)

![4_2_broadcasting_def2](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_2_broadcasting_def2.png)

![4_2_broadcasting_def3](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_2_broadcasting_def3.png)
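
As a minimal sketch of the rule: shapes are aligned from the right, a missing dimension counts as 1, and two dimensions are compatible when they are equal or one of them is 1. `np.broadcast_shapes()` (available in NumPy 1.20 and later) applies this rule to shapes without building any arrays:

```python
import numpy as np

shape_4d = np.broadcast_shapes((2, 2, 3, 1), (3, 4))   # The example above
shape_row = np.broadcast_shapes((3, 3), (3,))
shape_col = np.broadcast_shapes((3, 3), (3, 1))
print(shape_4d, shape_row, shape_col)

try:
    np.broadcast_shapes((3, 2), (3,))                  # 2 vs 3: not compatible
except ValueError as err:
    print("Not compatible:", err)
```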

### c) More Examples

#### ![4_2_broadcasting_eg1](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_2_broadcasting_eg1.png)

![4_2_broadcasting_eg1_sol1](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_2_broadcasting_eg1_sol1.png)

![4_2_broadcasting_eg1_sol2](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_2_broadcasting_eg1_sol2.png)

![4_2_broadcasting_eg1_sol3](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_2_broadcasting_eg1_sol3.png)

In [None]:
a = np.ones((3,3))
b = np.arange(3)

print("a = \n", a)
print("")
print("b = \n", b)
print("")
print("a + b = \n", a + b)

#### ![4_2_broadcasting_eg2](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_2_broadcasting_eg2.png)

![4_2_broadcasting_eg2_sol1](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_2_broadcasting_eg2_sol1.png)

![4_2_broadcasting_eg2_sol2](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_2_broadcasting_eg2_sol2.png)

![4_2_broadcasting_eg2_sol3](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_2_broadcasting_eg2_sol3.png)

In [None]:
a = np.ones((3,3))
b = np.arange(3).reshape(3,1)

print("a = \n", a)
print("")
print("b = \n", b)
print("")
print("a + b = \n", a + b)

#### ![4_2_broadcasting_eg3](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_2_broadcasting_eg3.png)

![4_2_broadcasting_eg3_sol1](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_2_broadcasting_eg3_sol1.png)

![4_2_broadcasting_eg3_sol2](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_2_broadcasting_eg3_sol2.png)

In [None]:
a = np.ones((3,2))
b = np.arange(3)

print("a = \n", a)
print("")
print("b = \n", b)
print("")
print("a + b = \n", a + b)     # Will fail! Shapes (3,2) and (3,) are not compatible
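
One way to make this case work, if the intent is to add `b[i]` to every element of row `i`, is to reshape `b` into a column. This is a sketch under that assumption; other intents would call for a different reshape:

```python
import numpy as np

a = np.ones((3, 2))
b = np.arange(3)

try:
    a + b                        # (3, 2) vs (3,): last dimensions 2 and 3 differ
except ValueError as err:
    print("ValueError:", err)

fixed = a + b.reshape(3, 1)      # (3, 2) + (3, 1) -> (3, 2)
print(fixed)
```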

### *) A Practice with a Practical Problem (Part 2)

![4_2_problem_pt2](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_2_problem_pt2.png)

**!!CAUTION!!**

* Use broadcasting for efficiency
* Do **NOT** use a loop even if you are tempted to!!

You may be tempted to do this:

(This is horribly slow!)

```
psi_r = ...
psi_theta = ...
psi_phi = ...

psi = np.empty((1000, 300, 600), dtype="complex")
for i in range(1000):
  for j in range(300):
    for k in range(600):
      psi[i,j,k] = psi_r[i] * psi_theta[j] * psi_phi[k]
```



Instead, remember how broadcasting works:

![4_2_problem_pt2_sol1](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_2_problem_pt2_sol1.png)

![4_2_problem_pt2_sol2](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_2_problem_pt2_sol2.png)

![4_2_problem_pt2_sol3](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_2_problem_pt2_sol3.png)

Finish the code below as a practice:

In [None]:
# Hint: If you used more than one line for each "FIX ME", you are doing something wrong

import numpy as np

# Define coordinates
# Hint: Use np.linspace()
r = # FIX ME
theta = # FIX ME
phi = # FIX ME

# Calculate function on each dimension
# Hint: Use ufunc np.exp(), np.sin(), np.cos(); imaginary unit is "1j"
psi_r = # FIX ME
psi_theta = # FIX ME
psi_phi = # FIX ME
print("psi_r.shape = \t\t", psi_r.shape)
print("psi_theta.shape = \t", psi_theta.shape)
print("psi_phi.shape = \t", psi_phi.shape)

# Multiply three 1-D arrays to make a 3-D array:
# Hint: Use reshape() and broadcasting
psi = # FIX ME
print("psi.shape = \t", psi.shape)

In [None]:
#@markdown Show solution

import numpy as np

# Define coordinates
r = np.linspace(0, 40, 1000)
theta = np.linspace(0, np.pi, 300)
phi = np.linspace(0, 2*np.pi, 600)

# Calculate function on each dimension
psi_r = r**2 * np.exp(-r/3)
psi_theta = np.sin(theta) * np.cos(theta)
psi_phi = np.exp(1j*phi)
print("psi_r.shape = \t\t", psi_r.shape)
print("psi_theta.shape = \t", psi_theta.shape)
print("psi_phi.shape = \t", psi_phi.shape)

# Multiply three 1-D arrays to make a 3-D array:
psi = psi_r.reshape(1000,1,1) * psi_theta.reshape(300,1) * psi_phi
print("psi.shape = \t", psi.shape)

In [None]:
#@markdown Wanna see how horribly slow looping is? You do not even need %timeit!
for i in range(1000):
  for j in range(300):
    for k in range(600):
      psi[i,j,k] = psi_r[i] * psi_theta[j] * psi_phi[k]

# Took me > 11 min on Google Colab!
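
To convince yourself that broadcasting and the triple loop compute the same thing, both can be run on a much smaller grid. The grid sizes below are hypothetical and chosen only to keep the loop fast; `np.newaxis` is used here as an alternative to `reshape()` for adding length-1 axes:

```python
import numpy as np

n_r, n_theta, n_phi = 20, 6, 12          # Small grid, for checking only
r = np.linspace(0, 40, n_r)
theta = np.linspace(0, np.pi, n_theta)
phi = np.linspace(0, 2*np.pi, n_phi)

psi_r = r**2 * np.exp(-r/3)
psi_theta = np.sin(theta) * np.cos(theta)
psi_phi = np.exp(1j*phi)

# Broadcasting version: shapes (20,1,1) * (6,1) * (12,) -> (20,6,12)
psi_fast = psi_r[:, np.newaxis, np.newaxis] * psi_theta[:, np.newaxis] * psi_phi

# Loop version, for comparison
psi_loop = np.empty((n_r, n_theta, n_phi), dtype="complex")
for i in range(n_r):
    for j in range(n_theta):
        for k in range(n_phi):
            psi_loop[i, j, k] = psi_r[i] * psi_theta[j] * psi_phi[k]

print(psi_fast.shape)
print(np.allclose(psi_fast, psi_loop))
```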

## 3) Aggregation Functions

* Aggregations are functions which summarize the values in an array (e.g. min, max, sum, mean, etc.)
* NumPy aggregations are much faster than Python built-in functions
* Available functions:
  * Sort, search and counting
    * `min()`, `max()`, `argmin()`, `argmax()`, ...
    * `sum()`, `prod()` (calculate product instead of sum), ...
    * `sort()`, ...
  * Statistics
    * `mean()` (arithmetic average), `average()` (weighted average), `median()` (median value), ...
    * `var()` (variance), `std()` (standard deviation, square root of variance), ...
  * . . . and many, many more.
* The specific syntax differs from function to function. Here we just give you a glimpse of how they work. For details, see https://numpy.org/doc/stable/reference/routines.html

### a) Examples

In [None]:
# Basic usage. (E.g.: sum())

myAry = np.arange(9).reshape(3,3)
print("myAry = \n", myAry)

# Two ways to use (works for most of them):
print("\nnp.sum(myAry) = ", np.sum(myAry))  # As a function
print("\nmyAry.sum() = ", myAry.sum())      # As a method

In [None]:
# Sometimes, function and method forms work slightly differently (E.g., sort())

myAry = np.random.random(10)  # Create an array of random values
print("myAry = \n", myAry)

# As function: *Returns* the result
print("\nnp.sort(myAry) = \n", np.sort(myAry))

# As method: *in-place*
myAry.sort()
print("\nmyAry.sort() = \n", myAry)

In [None]:
# Specifying axes

myAry = np.arange(9).reshape(3,3)
print("myAry = \n", myAry)

print("\nmyAry.sum() = \n", myAry.sum())              # Sum all
print("\nmyAry.sum(axis=0) = \n", myAry.sum(axis=0))  # Sum axis 0 (rows), keep axis 1 (columns)
print("\nmyAry.sum(axis=1) = \n", myAry.sum(axis=1))  # Sum axis 1 (columns), keep axis 0 (rows)
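
The `axis` argument names the axis that is collapsed. A small sketch that also shows the `keepdims=True` option, which keeps the collapsed axis with length 1 so the result broadcasts against the original array:

```python
import numpy as np

ary = np.arange(9).reshape(3, 3)

col_sums = ary.sum(axis=0)                      # Collapse axis 0 -> one value per column
row_sums = ary.sum(axis=1)                      # Collapse axis 1 -> one value per row
row_sums_2d = ary.sum(axis=1, keepdims=True)    # Same values, but shape (3, 1)

print(col_sums, col_sums.shape)
print(row_sums, row_sums.shape)
print(row_sums_2d.shape)

# Divide each row by its own sum: (3, 3) / (3, 1) broadcasts row by row
row_fractions = ary / row_sums_2d
print(row_fractions.sum(axis=1))
```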

In [None]:
# Return results or arguments (indices)

myAry = np.random.random(10)  # Create an array of random values
print("myAry = \n", myAry)

print("\nmyAry.max() = \t", myAry.max())        # Return the maximum value
print("\nmyAry.argmax() = \t", myAry.argmax())  # Returns the argument (index) that contains the maximum value

In [None]:
# Let's try some statistics

myAry = np.arange(10).reshape(2,5)
print("myAry = \n", myAry)

# Arithmetic average
print("\nArithmetic average: \n", myAry.mean())
print("\nArithmetic average along axis 1 (one value per row): \n", myAry.mean(axis=1))

# Weighted average
# np.average() has no method form (there is no ndarray.average())
weightAry = np.random.random(myAry.shape)   # A random weight array of the same shape
weightAry /= weightAry.sum()                # Normalize weight (should add up to 1)
print("\nWeighted average: \n", np.average(myAry, weights=weightAry))

# Standard deviation
print("\nStandard deviation: \n", myAry.std())
print("\nStandard deviation along axis 1 (one value per row): \n", myAry.std(axis=1))
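
What `np.average()` computes with `weights` can be sketched by hand: the sum of value times weight, divided by the sum of the weights. The values and weights below are made up for this illustration, and the weights are deliberately not normalized:

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])
weights = np.array([4.0, 3.0, 2.0, 1.0])    # Hypothetical weights, sum is 10

by_function = np.average(values, weights=weights)
by_hand = (values * weights).sum() / weights.sum()

print(by_function, by_hand)
```

Because of the division by the sum of the weights, normalizing them beforehand does not change the result.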

### *) A Practice with a Practical Problem (Part 3)

#### ![4_3_problem_pt3](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_3_problem_pt3.png)

**!!CAUTION!!**

* Use aggregation functions for efficiency
* Do **NOT** use a loop even if you are tempted to!!

You may be tempted to do something like this:

(This is slow!)

```
P_r_max = 0
r_index = 0
for i in range(1000):
  # Track the maximum probability and its index by checking one element at a time
```



Instead, use aggregation functions!

In [None]:
# Hint: If you wrote more than one line for each "FIX ME", you are doing something wrong

import numpy as np

# Define coordinates
# Hint: Use np.linspace()
r = # FIX ME

# Calculate function on each dimension
# Hint: Use ufunc np.exp()
psi_r = # FIX ME

# Calculate probability distribution
# For simplicity, first calculate without dividing by the normalization factor, then do the normalization
# Hint: Use np.sum(), np.abs() and other ufuncs
P = # FIX ME
P /= # FIX ME

# Find the most probable radius (Your result should be close to 9)
# Hint: Use aggregation function argmax()
r_max = # FIX ME

print("3rd Bohr radius is:", r_max)

In [None]:
#@markdown Show solution

import numpy as np

# Define coordinates
r = np.linspace(0, 40, 1000)

# Calculate function on each dimension
psi_r = r**2 * np.exp(-r/3)

# Calculate probability distribution
# For simplicity, first calculate without dividing by the normalization factor, then do the normalization
P = r**2 * np.abs(psi_r)**2 
P /= P.sum()

# Find the most probable radius. (Your result should be close to 9)
r_max = r[P.argmax()]

print("3rd Bohr radius is:", r_max)
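
A quick analytic check of the expected value, assuming the radial function used in the solution, $\psi(r) = r^2 e^{-r/3}$, and ignoring the normalization constant:

$$
P(r) \propto r^2 \, |\psi(r)|^2 = r^6 e^{-2r/3}, \qquad
\frac{dP}{dr} \propto \left(6 r^5 - \tfrac{2}{3} r^6\right) e^{-2r/3} = 0
\;\Rightarrow\; r = 9
$$

The numerical answer `r[P.argmax()]` can only land on a grid point, so a small deviation from 9 is expected.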

## 4) Advanced Indexing

We saw basic indexing and slicing before. But there is more...

https://numpy.org/doc/stable/user/basics.indexing.html#advanced-indexing

### a) Integer Array Indexing

Basic indexing works like this:
```
myAry = np.array(...)
print(myAry[1])
```

NumPy also allows an **integer array** in the place of an integer as index. 

Returned: an array with the **same shape** as the index array

In [None]:
# Use integer array in the place of integer

myAry = np.arange(10, 0, -1)
print("myAry = ", myAry)

# Integer as index
print("\nScalar in, scalar out:\n", myAry[5])

# 1-D array as index
indAry = np.array([1, 3, -2])
print("\n1-D array in, 1-D array out:\n", myAry[indAry])

# 2-D array as index
indAry = np.array(
    [[4,3],
     [2,-1]])
print("\n2-D array in, 2-D array out:\n", myAry[indAry])

# TIP: I advise against using lists instead of NumPy arrays in integer array indexing. 
#      The behavior is inconsistent because of some deprecated syntax.

In [None]:
# You can use integer array indexing on multi-dimensional array

myAry = np.arange(25, 0, -1).reshape(5,5)
print("myAry = \n", myAry)

# Integers as indices
print("\nScalar in, scalar out:\n", myAry[3,4])

# 1-D arrays as indices
xAry_1 = np.array([1, 3, -2])
yAry_1 = np.array([0, -1, 1])
print("\n1-D array in, 1-D array out:\n", myAry[xAry_1, yAry_1])

# 2-D arrays as indices
xAry_2 = np.array(
    [[4,3],
     [2,-1]])
yAry_2 = np.array(
    [[-3,1],
     [-2,0]])
print("\n2-D array in, 2-D array out:\n", myAry[xAry_2, yAry_2])

# But! The shapes of the index arrays for different axes have to be **COMPATIBLE**
#print(myAry[xAry_1, yAry_2])

# Think again, what does "compatible" here mean?
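
One way to answer the question above: "compatible" means the index arrays can be broadcast to a common shape, by the same broadcasting rules used for arithmetic. The sketch below is a minimal illustration; the names `rows_1d`, `cols_2d`, `rows` and `cols` are made up for this example.

```python
import numpy as np

myAry = np.arange(25, 0, -1).reshape(5, 5)

# Shapes (3,) and (2, 2) cannot be broadcast together, so indexing fails
rows_1d = np.array([1, 3, -2])
cols_2d = np.array([[-3, 1], [-2, 0]])
try:
    myAry[rows_1d, cols_2d]
    failed = False
except IndexError as err:
    failed = True
    print("IndexError:", err)

# Shapes (3, 1) and (2,) broadcast to (3, 2):
# each row index is paired with each column index
rows = np.array([1, 3, -2]).reshape(3, 1)
cols = np.array([0, -1])
result = myAry[rows, cols]
print(result.shape)
print(result)
```

The second case works even though the two index arrays have different shapes, because a length-1 axis can be stretched to match the other array.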

In [None]:
# Advanced Indexing can be combined with basic indexing and slicing

myAry = np.arange(25, 0, -1).reshape(5,5)
print("myAry = \n", myAry)

# Integer array & basic indexing
indAry = np.array(
    [[4,3],
     [2,-1]])
print("\nInteger array indexing & basic indexing:\n", myAry[indAry, 1])

# Integer array & slicing
indAry = np.array([0, -1, 1])
print("\nInteger array indexing & slicing:\n", myAry[indAry, :-1:2])

In [None]:
# Do you understand the difference between these two?

myAry = np.arange(25, 0, -1).reshape(5,5)
print("myAry = \n", myAry)

# Scenario 1:
indAry = (1, 3)
print("\nScenario 1:\n", myAry[indAry])   

# Scenario 2:
indAry = np.array([1, 3])
print("\nScenario 2:\n", myAry[indAry])

# "myAry[(1,3)]":
#     * Basic indexing
#     * Equivalent to "myAry[1,3]"
# "myAry[np.array([1,3])]":
#     * Combination of advanced indexing & slicing
#     * Equivalent to "myAry[np.array([1,3]), :]"

### b) Boolean Array Indexing (Masking)

If you have used `R` or the `pandas` module before, this section should look very familiar to you.

(This is also known as "masking". However, `NumPy` has a dedicated submodule `numpy.ma` for masked arrays, so the technique discussed here is officially called "Boolean Array Indexing".)

In [None]:
# Basic usage

# Remember we said comparison operators are ufuncs too?
a = np.arange(5)
b = np.arange(5, 0, -1)
mask = a > b
print(" a = \t\t", a)
print(" b = \t\t", b)
print("a>b = \t\t", mask)

# Did you know this boolean array can be used as an index too? See the results:
print("a[mask] = \t", a[mask])

# Basic rules:
#   A True or False index:
#     True:   Included
#     False:  Excluded

In [None]:
# Also works for multi-dimensional arrays

# 1-D boolean array as index
# Will work on each axis
myAry = np.arange(25, 0, -1).reshape(5,5)
mask = np.array([True, False, False, True, False])
print(" myAry = \n", myAry)
print("\n mask = \n", mask)
print("\n myAry[mask] = \n", myAry[mask])             # On rows
print("\n myAry[:,mask] = \n", myAry[:,mask])         # On columns
print("\n myAry[mask,mask] = \n", myAry[mask,mask])   # On both: True positions are paired, giving elements (0,0) and (3,3)

# n-D boolean array as index
# Returns the selected elements as a flattened 1-D array (the selection can have an irregular shape, so there is no better option, is there?)
a = np.arange(25, 0, -1).reshape(5,5)
b = np.arange(25).reshape(5,5)-8
mask = a > b
print("\n a = \n", a)
print("\n b = \n", b)
print("\na>b = \n", a>b)
print("\na[mask] = \n", a[mask])
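
A note on `myAry[mask, mask]` from the cell above: boolean masks on two axes are paired position by position, like integer array indexing, so the result is not the block of selected rows and columns. If the block is what you want, one option is `np.ix_`. A minimal sketch (the names `paired` and `block` are illustrative):

```python
import numpy as np

myAry = np.arange(25, 0, -1).reshape(5, 5)
mask = np.array([True, False, False, True, False])

# Masks on two axes are paired element by element: positions (0,0) and (3,3)
paired = myAry[mask, mask]
print(paired)

# np.ix_ builds broadcastable index arrays, selecting rows 0,3 crossed with columns 0,3
block = myAry[np.ix_(mask, mask)]
print(block)
```

`paired` is 1-D with two elements, while `block` is the 2x2 sub-array.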

In [None]:
# Advanced Indexing can be combined with basic indexing and slicing

myAry = np.arange(25, 0, -1).reshape(5,5)
print("myAry = \n", myAry)

# Boolean array & basic indexing
indAry = np.array([True, False, False, True, False])
print("\nBoolean array indexing & basic indexing:\n", myAry[indAry, 1])

# Boolean array & slicing
indAry = np.array([True, True, False, False, True])
print("\nBoolean array indexing & slicing:\n", myAry[indAry, :-1:2])

In [None]:
# A typical usage: Exclude all NaNs (Not-a-Number)

# Find the mean value of all numbers in this array:
myAry = np.array([[1., 2.], [np.nan, 3.], [np.nan, np.nan]])
print("myAry = \n", myAry)

# Not what I want
print("\n<Not what I want>")
print("Mean: ", myAry.mean())                     

# Is what I want
print("\n<Is what I want>")
print("\nIs it a number?:\n", ~np.isnan(myAry))
print("\nmyAry[~np.isnan(myAry)] = ", myAry[~np.isnan(myAry)])
print("Mean: ", myAry[~np.isnan(myAry)].mean())
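
For this particular task NumPy also provides NaN-aware aggregation functions such as `np.nanmean()`. The sketch below compares it with the boolean array indexing used above; the variable names are illustrative.

```python
import numpy as np

myAry = np.array([[1., 2.], [np.nan, 3.], [np.nan, np.nan]])

# Boolean array indexing: keep only the entries that are not NaN
mean_masked = myAry[~np.isnan(myAry)].mean()

# NaN-aware aggregation: ignores NaN entries internally
mean_nan_aware = np.nanmean(myAry)

print(mean_masked, mean_nan_aware)   # 2.0 2.0
```

Boolean array indexing remains the more general tool, since the mask can encode any condition, not only "is not NaN".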

In [None]:
# Another typical usage: Plotting with mask

import matplotlib.pyplot as plt

# Plot y = sin(x)
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x)
plt.plot(x, y)
plt.show()

# Cut the negative
plt.plot(x[y>=0], y[y>=0])
plt.show()

### *) A Practice with a Practical Problem (Part 4)

![4_4_problem_pt4](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_4_problem_pt4.png)

![4_4_problem_pt4_FWHM](https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/4_4_problem_pt4_FWHM.png)

**!!CAUTION!!**

* Use boolean array indexing for efficiency
* Do **NOT** use a loop even if you are tempted to!!

In [None]:
#@markdown (Code to generate above FWHM figure)

import matplotlib.pyplot as plt

r = np.linspace(0, 40, 1000)
psi_r = r**2 * np.exp(-r/3)
P_r = r**2 * psi_r**2 / np.sum(r**2 * psi_r**2)
mask = P_r > P_r.max()/2    # Named "mask" to avoid shadowing the built-in filter()
r_filtered = r[mask]
P_filtered = P_r[mask]

plt.plot(r, P_r)
plt.hlines(P_r.max(), r[0], r[-1], 'k', ':')
plt.hlines(P_r.max()/2, r[0], r[-1], 'k', ':')
plt.hlines(P_r.max()/2, r_filtered[0], r_filtered[-1], 'k', lw=3)
plt.vlines(r_filtered[0], 0, P_filtered[0], 'k', ':')
plt.vlines(r_filtered[-1], 0, P_filtered[-1], 'k', ':')
plt.plot(r_filtered, P_filtered)
plt.annotate('P(r)', xy=(40, 0.0001), xytext=(40, 0.0001), ha='right')
plt.annotate('Max P(r)', xy=(40, 0.0041), xytext=(40, 0.0041), ha='right')
plt.annotate('Half max P(r)', xy=(40, 0.0019), xytext=(40, 0.0019), ha='right')
plt.annotate('FWHM', xy=(9.5, 0.0019), xytext=(9.5, 0.0019), ha='center')
plt.xlabel("r")
plt.ylabel("P")
plt.show()

You may be tempted to do something like this:

(This is slow!)

```
P = ...
for i in range(1000):
  if P[i] > ...:
    ...
```



Instead, use boolean array (and aggregation functions)!
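
To make the contrast concrete without giving away the exercise, here is a sketch on toy data. The bell-shaped `values` array and the `threshold` are hypothetical and are not the `P` of the problem.

```python
import numpy as np

# Toy data: 1000 samples of a bell-shaped curve
x = np.linspace(-5, 5, 1000)
values = np.exp(-x**2)
threshold = 0.5

# Loop version: visits every element in Python
count_loop = 0
for i in range(values.size):
    if values[i] > threshold:
        count_loop += 1

# Boolean array version: one comparison ufunc plus one aggregation
above = values > threshold
count_mask = above.sum()      # Each True counts as 1
x_above = x[above]            # Coordinates where the condition holds

print(count_loop, count_mask)
print(x_above.min(), x_above.max())
```

Both versions give the same count, but the boolean array version does the work inside NumPy rather than in a Python loop.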

In [None]:
# Hint: If you wrote more than one line for each "FIX ME", you are doing something wrong

import numpy as np

# Define coordinates
# Hint: Use np.linspace()
r = # FIX ME

# Calculate function on each dimension
# Hint: Use ufunc np.exp()
psi_r = # FIX ME

# Calculate probability distribution
# Hint: Use aggregation function np.sum(), np.abs() and ufuncs
P = # FIX ME

# Filter r array to only include the part where P is greater than half of maximum P
# Hint: Use boolean array indexing and aggregation function max()
r_filtered = # FIX ME

# Calculate FWHM
FWHM = # FIX ME

print("FWHM is:", FWHM)

In [None]:
#@markdown Show solution

import numpy as np

# Define coordinates
r = np.linspace(0, 40, 1000)

# Calculate function on each dimension
psi_r = r**2 * np.exp(-r/3)

# Calculate probability distribution
P = r**2 * psi_r**2 / np.sum(r**2 * np.abs(psi_r)**2)

# Filter r array to only include the part where P_r is greater than half of maximum P_r
# Hint: Use boolean array indexing and aggregation function max()
r_filtered = r[P > P.max()/2]

# Calculate FWHM
FWHM = r_filtered[-1] - r_filtered[0]

print("FWHM is:", FWHM)

<br><br>

---
# **5. Examples**
---

## 1) Calculate Derivative of Function

In [None]:
import numpy as np

# Define coordinates (and return dx)
x, dx = np.linspace(0, 4*np.pi, 200, retstep=True)

# Get y
y = np.sin(x)

# Calculate derivative (using difference to approximate derivative)
dy_dx = (y[1:] - y[:-1]) / dx     # y[1]-y[0], y[2]-y[1], ..., y[-1]-y[-2]

print("dy_dx.shape = ", dy_dx.shape)
print("\ndy_dx = \n", dy_dx)

In [None]:
# Plot and compare against the analytical results

import matplotlib.pyplot as plt

plt.plot(x[:-1], dy_dx, 'x', label="Calculated")
plt.plot(x, np.cos(x), '-', label="Analytical")
plt.legend()
plt.show()
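
The slice-based difference above can also be written with NumPy's built-in helpers. This sketch compares the slice version with `np.diff()` and with `np.gradient()`, which uses central differences at interior points and returns an array of the same length as the input.

```python
import numpy as np

x, dx = np.linspace(0, 4*np.pi, 200, retstep=True)
y = np.sin(x)

# Forward difference written with slices, as above
dy_dx_slices = (y[1:] - y[:-1]) / dx

# np.diff computes the same neighbor differences
dy_dx_diff = np.diff(y) / dx

# np.gradient: central differences in the interior, same length as y
dy_dx_grad = np.gradient(y, dx)

print(np.allclose(dy_dx_slices, dy_dx_diff))   # True
print(dy_dx_diff.shape, dy_dx_grad.shape)      # (199,) (200,)

# Compare the maximum error against the analytical derivative cos(x)
err_forward = np.abs(dy_dx_diff - np.cos(x[:-1])).max()
err_central = np.abs(dy_dx_grad[1:-1] - np.cos(x[1:-1])).max()
print(err_forward, err_central)
```

On this grid the central difference is noticeably closer to `cos(x)` than the forward difference.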

## 2) Color Editing of an RGB Image

In [None]:
# Import an RGB image

# Do not worry about these modules if you are not familiar with them.
import numpy as np 
import imageio.v3 as iio
import matplotlib.pyplot as plt

# Read an image
img = iio.imread('https://github.com/lsuhpchelp/loniscworkshop2023/raw/main/day2/images/5_2_racoon.png')

# Plot the image
plt.imshow(img)
plt.show()

# What's in img?
print("\nThe type of img is:", type(img))
print("\nThe shape of img is:", img.shape)
#print("\nThe value of img is:\n", img)

In [None]:
# Show only the red channel
#
# * For an RGB image:
#     - Red channel:    [:, :, 0]
#     - Green channel:  [:, :, 1]
#     - Blue channel:   [:, :, 2]
# * To keep only the red channel, set all others to 0

# Make a copy of original image
img_red = img.copy()

# Set green and blue channel to 0
img_red[:, :, 1:] = 0

# Plot
plt.imshow(img_red)
plt.show()

In [None]:
# Change color image to gray
#
# * Gray color is when all 3 channels have the same value
# * Change the value of each channel at each pixel to the mean of all 3 channels

# Create image of the same shape
img_gray = np.empty_like(img, dtype="uint8")

# Calculate the average value of all 3 channels at each pixel
img_mean = img.mean(axis=-1)                    # What is the shape of this?
#print(img_mean.shape)

# Give this averaged image to all 3 channels
img_gray[:,:,:] = img_mean.reshape(512,512,1)   # What is this? Why reshape?

# Plot
plt.imshow(img_gray)
plt.show()
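
As a hint for the "Why reshape?" question, the sketch below repeats the same steps on a hypothetical 2x2 RGB image that is small enough to check by hand. The pixel values are made up for illustration.

```python
import numpy as np

# Hypothetical 2x2 RGB image
img = np.array(
    [[[10, 20, 30], [0, 0, 255]],
     [[255, 255, 255], [90, 60, 30]]], dtype="uint8")

# Mean over the channel axis drops that axis: (2, 2, 3) -> (2, 2)
img_mean = img.mean(axis=-1)
print(img_mean.shape)

# Shape (2, 2) does not broadcast to (2, 2, 3), but (2, 2, 1) does
img_gray = np.empty_like(img)
img_gray[:, :, :] = img_mean[:, :, np.newaxis]   # Same as img_mean.reshape(2, 2, 1)

# keepdims=True keeps the length-1 channel axis, so no reshape is needed
img_gray_2 = np.broadcast_to(
    img.mean(axis=-1, keepdims=True), img.shape).astype("uint8")

print(img_gray)
```

The length-1 trailing axis is what lets one averaged value be copied into all 3 channels of each pixel.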

## 3) Scipy Example: Linear Regression

In [None]:
# Calculate the linear regression of these data points:

import numpy as np
import matplotlib.pyplot as plt

# Data points
x = np.array([1, 2, 5, 7, 10, 15])
y = np.array([2, 6, 7, 9, 14, 19]) 

# Plot the data points
plt.plot(x, y, "o")
plt.show()

In [None]:
# Calculate linear regression

from scipy.stats import linregress

# Calculate
slope, intercept, rvalue, pvalue, std_err = linregress(x, y)

# Plot the results
plt.plot(x, y, 'o')                       # Data points
plt.plot(x, slope * x + intercept, '-')   # Results of linear regression
plt.show()
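
To see what the fit computes, the slope and intercept of a least-squares line can be reproduced with plain NumPy. This is a sketch using the textbook closed-form formula, cross-checked against `np.polyfit()`; it is not a description of how `linregress` is implemented internally.

```python
import numpy as np

x = np.array([1, 2, 5, 7, 10, 15])
y = np.array([2, 6, 7, 9, 14, 19])

# Closed-form least squares: slope = sum(dx*dy) / sum(dx**2)
x_dev = x - x.mean()
y_dev = y - y.mean()
slope = np.sum(x_dev * y_dev) / np.sum(x_dev**2)
intercept = y.mean() - slope * x.mean()

# np.polyfit with degree 1 fits the same straight line
slope_fit, intercept_fit = np.polyfit(x, y, 1)

print(slope, intercept)
print(np.allclose([slope, intercept], [slope_fit, intercept_fit]))   # True
```

Note that the whole calculation uses only ufuncs and aggregation functions, with no explicit loop.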