# Scientific Data and NumPy

Scientific data usually consists of large sets of numbers. These values are often multi-dimensional and come with uncertainties and errors that must be taken into account.

NumPy, short for Numerical Python, is a Python library that provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions that operate on these arrays. It is the fundamental package for scientific computing with Python.
Its central feature is the ndarray, a fast and flexible multi-dimensional array object on which much of the scientific Python stack is built.
An example of a 1D ndarray is as follows:


``` python
import numpy as np

my_list = [0, 1, 2, 3, 4]
my_array = np.array(my_list) # this is casting a list to an ndarray

print(f'list: {my_list}')
print(f'array: {my_array}')
print(f'list[0]: {my_list[0]}')
print(f'array[0]: {my_array[0]}')
```

```
list: [0, 1, 2, 3, 4]
array: [0 1 2 3 4]
list[0]: 0
array[0]: 0
```


While there doesn't appear to be much difference between a list and a 1D ndarray, the ndarray is much more powerful and flexible. For example, you can perform mathematical operations on the entire array at once, and you can easily reshape the array into a 2D or 3D array.
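
As a small illustration of both points, the sketch below (with values chosen arbitrarily) doubles every element in one expression and then reshapes the same 12 values into a 2D and a 3D array.

``` python
import numpy as np

a = np.arange(12)          # 1D array: [0 1 2 ... 11]

# Whole-array arithmetic: no explicit loop is needed
doubled = a * 2

# Reshape the same 12 elements into 2D (3 rows x 4 columns) and 3D (2 x 2 x 3)
grid = a.reshape(3, 4)
cube = a.reshape(2, 2, 3)

print(doubled)
print(grid)
print(grid.shape, cube.shape)   # (3, 4) (2, 2, 3)
```

The product of the new dimensions must equal the number of elements (12 here); otherwise reshape raises a ValueError.
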
Indexing and slicing arrays look very similar to indexing and slicing lists. For example, to get the first 3 elements of an array, you could use the following code:

``` python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(a[:3])
```

With a 2D NumPy array, you can access a specific element by giving its row and column index separated by a comma, such as array[0, 1] for the element in the first row and second column. You can also use slicing to access entire rows or columns: array[0, :] returns the entire first row, and array[:, 1] returns the entire second column.

``` python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(a[0, 1])
print(a[0, :])
print(a[:, 1])
```
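
Slices can also cover both axes at once to pull out a rectangular sub-block, and a step value works per axis just as it does for lists. The sketch below reuses the same 3x3 array; the variable names block and corners are only illustrative.

``` python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Rows 0-1 and columns 1-2 (stop indices are exclusive, as with lists)
block = a[0:2, 1:3]
print(block)        # [[2 3]
                    #  [5 6]]

# Every second row and every second column: the four corners
corners = a[::2, ::2]
print(corners)      # [[1 3]
                    #  [7 9]]

# Negative indices count from the end, so this is the last row, last column
print(a[-1, -1])    # 9
```
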

When data is stored in a list, operations usually have to visit each element individually, typically with a for loop. For large data sets this is slow, because every iteration runs as interpreted Python code. NumPy instead supports vectorized computation, which replaces Python loops: the loop still happens, but inside pre-compiled C code, which is much faster. This is possible because an ndarray has a fixed size and a single fixed data type, so its elements can be stored in one contiguous block of memory. That layout allows fast access to individual elements and fast mathematical operations on the entire array.
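
The fixed data type and contiguous memory layout can be inspected directly. In this sketch (values chosen only for illustration), mixing integers and a float in the input list forces every element to one floating-point dtype, and the array's flags report whether its data sits in a single contiguous block.

``` python
import numpy as np

# A Python list can hold objects of different types
mixed = [1, 2, 3.5]

# An ndarray stores one dtype for all elements, so the ints are upcast to float
arr = np.array(mixed)
print(arr)                          # [1.  2.  3.5]
print(arr.dtype)                    # float64
print(arr.itemsize)                 # 8 (bytes per element)
print(arr.nbytes)                   # 24 = 3 elements * 8 bytes
print(arr.flags['C_CONTIGUOUS'])    # True: one contiguous block of memory
```
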

``` python
import time as t
import numpy as np

long_list = list(range(1000000))
long_array = np.array(long_list)

t0 = t.time()

# Approach 1: explicit Python for loop
maximum = 0
for i in long_list:
    if i > maximum:
        maximum = i

t1 = t.time()
# Approach 2: Python's built-in max()
max(long_list)

t2 = t.time()
# Approach 3: NumPy's vectorized np.max()
np.max(long_array)

t3 = t.time()

print(f'Time to find max of list (for-loop): {t1 - t0}')
print(f'Time to find max of list (max()): {t2 - t1}')
print(f'Time to find max of array (np.max()): {t3 - t2}')
```

```
Time to find max of list (for-loop): 0.13540410995483398
Time to find max of list (max()): 0.007051229476928711
Time to find max of array (np.max()): 0.0
```

The 0.0 reported for np.max() does not mean the call took no time; it finished within the resolution of the clock behind time.time() on this machine.
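
For more reliable measurements, a minimal sketch using Python's standard timeit module is shown below, assuming the same one-million-element data. timeit repeats each call and uses time.perf_counter, a high-resolution clock, by default. The helper loop_max is a hypothetical name introduced here; no timings are listed because they depend on the machine.

``` python
import timeit
import numpy as np

long_list = list(range(1_000_000))
long_array = np.array(long_list)

def loop_max(values):
    """Find the maximum with an explicit Python loop (hypothetical helper)."""
    maximum = values[0]        # start from a real element, so negatives work too
    for v in values:
        if v > maximum:
            maximum = v
    return maximum

n = 10  # number of calls per measurement

# Average seconds per call for each approach
t_loop = timeit.timeit(lambda: loop_max(long_list), number=n) / n
t_builtin = timeit.timeit(lambda: max(long_list), number=n) / n
t_numpy = timeit.timeit(lambda: np.max(long_array), number=n) / n

print(f'for-loop:  {t_loop:.6f} s per call')
print(f'max():     {t_builtin:.6f} s per call')
print(f'np.max():  {t_numpy:.6f} s per call')
```

Starting the loop from values[0] rather than 0 keeps the helper correct for lists that contain only negative numbers.
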


The general approach of applying an operation to all entries in an array, rather than looping over individual elements, is called vectorization. Many mathematical operations can be applied to an ndarray, or between two ndarrays, and act on each element. For example, if we have an array a, we can add 1 to each element in the array with the following code:

``` python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
a = a + 1
print(a)
```

This adds 1 to each element in the array, and the printed result is [2 3 4 5 6]. For large arrays, this is much faster than adding 1 to each element in a Python for loop.
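
To make the comparison concrete, here is a minimal sketch that computes the same result both ways: once with an explicit loop over a copy of the array, and once with the vectorized expression. The variable names are illustrative.

``` python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# Loop version: visit and update one element at a time in Python
looped = a.copy()
for i in range(len(looped)):
    looped[i] = looped[i] + 1

# Vectorized version: one expression; the loop runs inside NumPy's compiled code
vectorized = a + 1

print(looped)                               # [2 3 4 5 6]
print(vectorized)                           # [2 3 4 5 6]
print(np.array_equal(looped, vectorized))   # True
```
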


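The same works with the other arithmetic operators, both between an array and a single number and element by element between two arrays of the same shape:

``` python
import numpy as np

a1 = np.array([1, 2, 3, 4])
a2 = np.array([5, 6, 7, 8])

print(a1 + 2)
print(a1 * 2)
print(a1 **2, '\n')

print(a1 + a2)
print(a1 * a2)
print(a1 ** a2)
```

```
[3 4 5 6]
[2 4 6 8]
[ 1  4  9 16]

[ 6  8 10 12]
[ 5 12 21 32]
[    1    64  2187 65536]
```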


Some useful attributes and methods of ndarrays include:
- ndarray.shape: returns the dimensions of the array
- ndarray.size: returns the number of elements in the array
- ndarray.dtype: returns the type of the elements in the array
- ndarray.ndim: returns the number of dimensions of the array
- ndarray.T: returns the transpose of the array
- ndarray.reshape: returns a new array with the same data but a different shape
- ndarray.flatten: returns a 1D array with all the elements of the original array
- ndarray.min: returns the minimum value in the array
- ndarray.max: returns the maximum value in the array
- ndarray.mean: returns the mean of the array
- ndarray.sum: returns the sum of the array
- ndarray.std: returns the standard deviation of the array
- ndarray.var: returns the variance of the array
- ndarray.argmin: returns the index of the minimum value in the array
- ndarray.argmax: returns the index of the maximum value in the array
- ndarray.argsort: returns the indices that would sort the array

Together, these cover most everyday tasks of inspecting, reshaping, and summarizing ndarrays, which is especially useful when working with large data sets.
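
A short sketch exercising several of the items listed above on a small 2x3 array (values chosen arbitrarily). It also shows the axis argument, which most reductions such as sum, min, max, and mean accept to work along one dimension instead of over the whole array.

``` python
import numpy as np

data = np.array([[4, 9, 1],
                 [7, 2, 6]])

print(data.shape)     # (2, 3)
print(data.size)      # 6
print(data.ndim)      # 2
print(data.dtype)     # platform-dependent integer type, commonly int64
print(data.T.shape)   # (3, 2)

flat = data.flatten()
print(flat)                 # [4 9 1 7 2 6]
print(flat.reshape(3, 2))   # same values, new 3x2 shape

print(data.min(), data.max())        # 1 9
print(data.sum(), data.mean())       # 29 and 29/6, about 4.83
print(data.std(), data.var())
print(flat.argmin(), flat.argmax())  # 2 1
print(flat.argsort())                # [2 4 0 5 3 1]

# axis=0 reduces down the columns, axis=1 across the rows
print(data.sum(axis=0))   # [11 11  7]
print(data.max(axis=1))   # [9 7]
```
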


The simplest way to picture an ndarray is as the n-dimensional generalization of a matrix from linear algebra. Nested Python lists can only mimic the structure of a matrix, whereas ndarrays can be used as matrices directly: you can add them with +, multiply them as matrices with the @ operator (the * operator multiplies element by element), and invert them with np.linalg.inv. Because these operations run in compiled code, they stay fast even for large matrices, which is one of the reasons NumPy is so popular in the scientific computing community.
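
A minimal sketch of these matrix operations, using two small matrices chosen arbitrarily. It contrasts element-wise * with the matrix product @, checks an inverse, and solves a linear system with np.linalg.solve.

``` python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[1.0, 0.0],
              [2.0, 1.0]])

print(A + B)     # matrix addition (element by element)
print(A * B)     # element-wise product, NOT matrix multiplication
print(A @ B)     # matrix product, same as np.matmul(A, B)

A_inv = np.linalg.inv(A)
print(A_inv @ A)   # identity matrix, up to floating-point rounding

# Solving A x = b directly is usually preferred over computing inv(A) @ b
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print(x)
print(np.allclose(A @ x, b))   # True
```

np.linalg.solve is generally both faster and more accurate than forming the inverse explicitly, so the inverse is mainly useful when you need the matrix itself.
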