# Tutorial 2: NUMPY
### Dr. Daugherity, PHYS 453 - Spring 2022

Nearly all data scientists use numpy for fast, efficient arrays, and many powerful scientific Python libraries are built on top of it.  I highly recommend the [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/index.html) as a great introduction to numpy.

**REFERENCES**
* https://jakevdp.github.io/PythonDataScienceHandbook/index.html
* https://numpy.org/devdocs/user/quickstart.html

In [None]:
import numpy as np
np.__version__

'1.19.5'

## Creating Arrays
These arrays are **NOT** dynamically sized.  The size is fixed when the array is created, so you need to create an array at its full size before filling it in!
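
As a minimal sketch of what fixed size means in practice (the variable names here are illustrative, not from the course): `np.append` does not grow an array in place; it builds and returns a new one. The usual pattern is to allocate the full array first and then assign into it.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)       # returns a NEW array; a itself is unchanged
print(a.size, b.size)

# Allocate at full size first, then fill by index
filled = np.zeros(5)
filled[0] = 1.5
filled[-1] = 2.5
print(filled)
```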

In [None]:
np.array([1, 4, 2, 5, 3])

array([1, 4, 2, 5, 3])

In [None]:
np.array([(1.5,2,3), (4,5,6)])

array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])

In [None]:
np.zeros(10)  # my most common method

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [None]:
np.ones((3, 5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [None]:
np.arange(0, 20, 2)  # like the range command (start, stop, step); stop is excluded

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [None]:
np.linspace(0, 1, 5)  # my second most common method

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
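
The two range-style constructors above treat the endpoint differently, which is easy to trip over. A short side-by-side sketch (the names `by_step` and `by_count` are just for illustration):

```python
import numpy as np

by_step = np.arange(0, 1, 0.25)    # you choose the step; stop is excluded
by_count = np.linspace(0, 1, 5)    # you choose the number of points; stop is included
print(by_step)
print(by_count)
```

If you know how many points you want, `linspace` is usually the less error-prone choice.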

In [None]:
x = np.array([(1.5,2,3), (4,5,6)])
print("x ndim: ", x.ndim)
print("x shape:", x.shape)
print("x size: ", x.size)

x ndim:  2
x shape: (2, 3)
x size:  6



## Indexing Arrays
Three fun tricks:
1.  we can use [start:stop:step]
1.  negative numbers read from the end
1.  a single colon means _everything_


In [None]:
x = np.linspace(0,8,9)  # (start, stop, num)
x

array([0., 1., 2., 3., 4., 5., 6., 7., 8.])

In [None]:
x[:5]  # first five elements

array([0., 1., 2., 3., 4.])

In [None]:
x[5:]  # elements from index 5 onward

array([5., 6., 7., 8.])

In [None]:
x[::2]  # every other element

array([0., 2., 4., 6., 8.])

In [None]:
x[-1] # last element

8.0

In [None]:
x[-2] # second-to-last element

7.0

In [None]:
x[::-1] # all elements, reversed

array([8., 7., 6., 5., 4., 3., 2., 1., 0.])

In [None]:
# Fun in 2D
y = x.reshape((3,3))
y

array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])

In [None]:
y[:, 0]  # first column 

array([0., 3., 6.])

In [None]:
y[0, :]  # first row 

array([0., 1., 2.])

In [None]:
y[0]  # equivalent to y[0, :]

array([0., 1., 2.])
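
The three tricks combine freely in 2D, one slice per axis. A small sketch using the same 3x3 layout as above (the variable names on the left are illustrative):

```python
import numpy as np

y = np.arange(9.0).reshape((3, 3))

last_row = y[-1]          # negative index on the first axis
last_col = y[:, -1]       # everything on axis 0, last entry on axis 1
top_right = y[:2, 1:]     # first two rows, columns 1 onward
corners = y[::2, ::2]     # every other row and every other column

print(last_row)
print(last_col)
print(top_right)
print(corners)
```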


## Looping over Arrays

**AVOID looping over arrays whenever possible!**  

Here's why:

In [None]:
# make a simple function to find a reciprocal
def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

In [None]:
big_array = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)

1 loop, best of 5: 2.24 s per loop


In [None]:
%timeit (1.0 / big_array)

1000 loops, best of 5: 1.71 ms per loop


The *vectorized* version which acts on the array as a whole runs **OVER 1000 TIMES FASTER** (2.24 s versus 1.71 ms in the timings above).  Try to operate on the array as a whole instead of looping through elements.  There are many helpful tools and functions for doing this.
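
A few of those whole-array tools, as a hedged sampler rather than a complete list (the array `values` and the result names are made up for illustration):

```python
import numpy as np

values = np.array([1.0, 2.0, 4.0, 8.0])

recip = 1.0 / values                          # elementwise arithmetic
roots = np.sqrt(values)                       # elementwise math function
total = values.sum()                          # aggregate over the whole array
big = values[values > 2]                      # boolean mask selects matching elements
capped = np.where(values > 2, 2.0, values)    # elementwise if/else

print(recip, roots, total, big, capped)
```

Each line replaces what would otherwise be an explicit `for` loop over the elements.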


But sometimes, you gotta do things the hard way...

In [None]:
# Get values from an array
x = np.linspace(0,1,5)
for val in x:
    print(val)

0.0
0.25
0.5
0.75
1.0


In [None]:
# Get indices the Bad way
x = np.linspace(0,1,5)
for i in range(len(x)):
    print(f"{i}:\t{x[i]}")

0:	0.0
1:	0.25
2:	0.5
3:	0.75
4:	1.0


In [None]:
# Use enumerate instead!
for i in enumerate(x):
    print(i)

(0, 0.0)
(1, 0.25)
(2, 0.5)
(3, 0.75)
(4, 1.0)


In [None]:
for i,val in enumerate(x):
    print(f"{i}:\t{val}")

0:	0.0
1:	0.25
2:	0.5
3:	0.75
4:	1.0


In [None]:
# Finally, a fun trick for multiple arrays
x = np.linspace(0,1,5)
y = x**2

# Bad way
for i in range(len(x)):
    print(f"{x[i]}\t{y[i]}")

0.0	0.0
0.25	0.0625
0.5	0.25
0.75	0.5625
1.0	1.0


In [None]:
# Good way - USE ZIP!
for xv, yv in zip(x,y):
    print(f"{xv}\t{yv}")

0.0	0.0
0.25	0.0625
0.5	0.25
0.75	0.5625
1.0	1.0
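
If you need the index as well as the values from both arrays, `enumerate` and `zip` can be nested. A minimal sketch (the `rows` list is only there to collect the printed lines):

```python
import numpy as np

x = np.linspace(0, 1, 5)
y = x**2

rows = []
for i, (xv, yv) in enumerate(zip(x, y)):   # note the parentheses around (xv, yv)
    rows.append(f"{i}:\t{xv}\t{yv}")
    print(rows[-1])
```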
