
# Tutorial 2: NUMPY
### Dr. Daugherity, PHYS 453 - Spring 2024

Nearly every data scientist uses numpy for fast, efficient arrays, and tons of powerful libraries are built on top of it.  I highly recommend the [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/index.html) as a great introduction to numpy.

**REFERENCES**
* https://jakevdp.github.io/PythonDataScienceHandbook/index.html
* https://numpy.org/devdocs/user/quickstart.html

In [1]:
import numpy as np
np.__version__

'1.23.5'

## Creating Arrays
These arrays are **NOT** dynamically sized.  You need to create them at their full size before using them!
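As a minimal sketch of what fixed size means in practice (the variable names `squares` and `grown` are illustrative, not from the original notebook): an array has no in-place `append` like a Python list does. `np.append` builds a new, larger array instead, so the usual pattern is to allocate the full size up front and then fill it in.

```python
import numpy as np

n = 5
squares = np.zeros(n)              # allocate all n slots up front
for i in range(n):
    squares[i] = i**2              # fill existing slots; the size never changes

grown = np.append(squares, 25.0)   # builds a NEW array; squares is untouched
print(squares.size, grown.size)
```

If you find yourself calling `np.append` inside a loop, that is usually a sign the array should have been created at its final size first.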

In [2]:
np.array([1, 4, 2, 5, 3])

array([1, 4, 2, 5, 3])

In [3]:
np.array([(1.5,2,3), (4,5,6)])

array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])

In [4]:
np.zeros(10)  # my most common method

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [5]:
np.ones((3, 5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [6]:
np.arange(0, 20, 2)  # like the range command (start, stop, step)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [7]:
np.linspace(0, 1, 5)  # my second most common method

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [8]:
x = np.array([(1.5,2,3), (4,5,6)])
print("x ndim: ", x.ndim)
print("x shape:", x.shape)
print("x size: ", x.size)

x ndim:  2
x shape: (2, 3)
x size:  6


## Warning About Copies
Setting `b = a` does **NOT** copy the array contents.  Instead, a and b now refer to the same place in memory, so changing something in a also changes it in b.

If this isn't what you want, use

```
b = a.copy()
```
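A quick way to check which names share data is `np.shares_memory`. The sketch below (the names `alias`, `view`, and `indep` are illustrative) also shows that a basic slice behaves like an alias rather than a copy, which is an easy way to get bitten by the same problem.

```python
import numpy as np

a = np.arange(5)
alias = a            # same array object, no data copied
view = a[1:4]        # basic slices are views onto the same memory
indep = a.copy()     # separate memory

a[2] = 99            # shows up in alias and view, but not in indep
print(alias[2], view[1], indep[2])
print(np.shares_memory(a, view), np.shares_memory(a, indep))
```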






In [9]:
x = np.arange(5)
y = x
print(x,y)

[0 1 2 3 4] [0 1 2 3 4]


In [10]:
x[0] = 99  # change in x also changes y
print(x,y)

[99  1  2  3  4] [99  1  2  3  4]


In [11]:
x = np.arange(5)
y = x.copy()  # and now it doesn't
x[0] = -123412
print(x,y)

[-123412       1       2       3       4] [0 1 2 3 4]



## Indexing Arrays
Three fun tricks:
1.  we can use [start:stop:step]
1.  negative numbers read from the end
1.  a single colon means _everything_
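Here is a small sketch that combines the three tricks on one array (the array and the particular slices are just an illustration). Note that `stop` is exclusive, the same as in Python's `range`.

```python
import numpy as np

x = np.arange(10)    # [0 1 2 3 4 5 6 7 8 9]
print(x[1:8:3])      # start=1, stop=8 (exclusive), step=3
print(x[-3:])        # last three elements
print(x[:-1])        # everything except the last element
print(x[:])          # a bare colon selects everything
```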


In [12]:
x = np.linspace(0,8,9)  # (start, stop, num)
x

array([0., 1., 2., 3., 4., 5., 6., 7., 8.])

In [13]:
x[:5]  # first five elements

array([0., 1., 2., 3., 4.])

In [14]:
x[5:]  # elements from index 5 onward

array([5., 6., 7., 8.])

In [15]:
x[::2]  # every other element

array([0., 2., 4., 6., 8.])

In [16]:
x[-1] # last element

8.0

In [17]:
x[-2] # second-to-last element

7.0

In [18]:
x[::-1] # all elements, reversed

array([8., 7., 6., 5., 4., 3., 2., 1., 0.])

In [19]:
# Fun in 2D
y = x.reshape((3,3))
y

array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])

In [20]:
y[:, 0]  # first column

array([0., 3., 6.])

In [21]:
y[0, :]  # first row

array([0., 1., 2.])

In [22]:
y[0]  # equivalent to y[0, :]

array([0., 1., 2.])


## Looping over Arrays

**AVOID looping over arrays whenever possible!**  

Here's why:

In [23]:
# make a simple function to find a reciprocal
def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

In [24]:
big_array = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)

2.1 s ± 446 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [25]:
%timeit (1.0 / big_array)

1.96 ms ± 203 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


The *vectorized* version, which acts on the array as a whole, runs roughly **1000 TIMES FASTER** (2.1 s vs. 1.96 ms in the timings above).  Try to operate on the array as a whole instead of looping through elements.  There are many helpful tools and functions for doing this.
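A few of those tools, as a hedged sketch (the array `values` and the threshold of 2 are made up for illustration). Each line replaces what would otherwise be an explicit `for` loop:

```python
import numpy as np

values = np.array([1.0, 2.0, 4.0, 5.0])

recip = 1.0 / values                          # elementwise arithmetic, no loop
roots = np.sqrt(values)                       # ufuncs apply to every element
total = values.sum()                          # aggregations reduce the whole array
big = values[values > 2]                      # boolean mask selects matching elements
clipped = np.where(values > 2, 2.0, values)   # elementwise if/else

print(recip, roots, total, big, clipped)
```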


But sometimes, you gotta do things the hard way...

In [26]:
# Get values from an array
x = np.linspace(0,1,5)
for val in x:
    print(val)

0.0
0.25
0.5
0.75
1.0


In [27]:
# Get indices the Bad way
x = np.linspace(0,1,5)
for i in range(len(x)):
    print(f"{i}:\t{x[i]}")

0:	0.0
1:	0.25
2:	0.5
3:	0.75
4:	1.0


In [28]:
# Use enumerate instead!
for i in enumerate(x):
    print(i)

(0, 0.0)
(1, 0.25)
(2, 0.5)
(3, 0.75)
(4, 1.0)


In [29]:
for i,val in enumerate(x):
    print(f"{i}:\t{val}")

0:	0.0
1:	0.25
2:	0.5
3:	0.75
4:	1.0


In [30]:
# Finally, a fun trick for multiple arrays
x = np.linspace(0,1,5)
y = x**2

# Bad way
for i in range(len(x)):
    print(f"{x[i]}\t{y[i]}")

0.0	0.0
0.25	0.0625
0.5	0.25
0.75	0.5625
1.0	1.0


In [31]:
# Good way - USE ZIP!
for xv, yv in zip(x,y):
    print(f"{xv}\t{yv}")

0.0	0.0
0.25	0.0625
0.5	0.25
0.75	0.5625
1.0	1.0


## Array Shape
Pay attention to the shape (dimensions) of the array.
Use `reshape` to change it.

A common issue is the difference between a 1D array and a 2D array with a single row.  Learn to notice which one you have and how to reshape between them.
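To see why the 1D vs. 2D distinction matters, here is a small sketch (the names `flat`, `row`, and `col` are illustrative). The same three numbers behave differently under transpose and arithmetic depending on their shape:

```python
import numpy as np

flat = np.arange(3)            # shape (3,)
row = flat.reshape((1, 3))     # shape (1, 3)
col = flat.reshape((3, 1))     # shape (3, 1)

print(flat.T.shape)            # transposing a 1D array changes nothing
print(row.T.shape)             # transposing a 2D row gives a column
print((flat + col).shape)      # broadcasting (3,) with (3, 1) gives (3, 3)
```

When a result comes out with a surprising shape, printing `.shape` on each input is usually the fastest way to find the culprit.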


In [34]:
a = np.zeros(5)
b = np.zeros((1,5))

In [35]:
a

array([0., 0., 0., 0., 0.])

In [36]:
b

array([[0., 0., 0., 0., 0.]])

In [37]:
a.shape

(5,)

In [38]:
b.shape

(1, 5)

In [39]:
x = np.array([1,2,3,4])

In [40]:
x.shape

(4,)

In [41]:
x.reshape((4,1))

array([[1],
       [2],
       [3],
       [4]])

In [42]:
x.reshape((1,4))

array([[1, 2, 3, 4]])

You can use -1 as a wildcard: numpy works out that dimension from the total number of elements.
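A short sketch of how the -1 gets filled in, and what happens when it can't be (the 12-element array is just an example). Only one dimension may be -1, and the known dimensions must divide the total size evenly:

```python
import numpy as np

x = np.arange(12)
print(x.reshape(3, -1).shape)    # 12 / 3 = 4 columns are inferred
print(x.reshape(-1, 6).shape)    # 12 / 6 = 2 rows are inferred

try:
    x.reshape(5, -1)             # 12 is not divisible by 5
except ValueError as err:
    print("reshape failed:", err)
```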

In [43]:
x.reshape( (1,-1))

array([[1, 2, 3, 4]])

In [44]:
x.reshape( (-1,2))

array([[1, 2],
       [3, 4]])

In [45]:
a.shape

(5,)

In [46]:
b.shape

(1, 5)

In [47]:
a.reshape((1,-1))

array([[0., 0., 0., 0., 0.]])

In [48]:
b2 = b.reshape(-1)

In [49]:
np.reshape?

In [50]:
x = np.arange(8).reshape(4,-1)
x

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7]])