# Scientific Python

Base Python is great and has been around for more than 30 years; it first appeared in 1991. Much of Python's power comes from community-built modules (also called libraries or packages) that sit on top of the core language and provide ready-made functions, classes, and methods, so you don't have to write everything yourself. What makes Python powerful for scientific applications are the many libraries written to work with scientific (and other) datasets. Jake VanderPlas developed a great [graphic](https://fabienmaussion.info/acinn_python_workshop/figures/scipy_ecosystem.png) in 2015, reproduced below, that illustrates the structure of the scientific Python libraries. At the base is the language itself, then the first-tier libraries built directly on the language, then packages that build on both the language and the first-tier libraries, and so on.

![scientific python](https://fabienmaussion.info/acinn_python_workshop/figures/scipy_ecosystem.png)

## Arrays in Python using the Numpy Library

Numpy is a key library in meteorology. It handles array manipulation well (and quickly). Arrays work similarly to those in other languages such as C++ and Fortran. In Python, they often feel like lists, but the differences can be both advantageous and frustrating. This notebook introduces the numpy module and many of its useful functions while continuing to develop your Python skills.

Numpy Source: https://numpy.org/doc/stable/user/index.html  
Numpy Quickstart: https://numpy.org/doc/stable/user/quickstart.html

## Importing Libraries/Modules
To use a library (or module, or package) in your program, you first need to tell Python to bring it in. This is done with an `import` statement. After `import numpy`, Numpy functions are called as `numpy.function_name()`. This prefix is useful because it keeps similarly named functions from different modules (e.g., a maximum function) from colliding. Because commonly used modules are typed so often, it is customary to give them a short handle (alias) so we can abbreviate them in the code. The standard handle for numpy is "np", so we write `np.function_name()`.
Other libraries you will meet later have conventional handles too: xarray is usually abbreviated "xr", Matplotlib's pyplot "plt", and statsmodels "sm".

An example of an import statement for numpy is below. 

```python
import numpy as np
```

If you ran the cell above, it may look like nothing happened, but a lot actually did. Python loaded the numpy module and bound it to the name `np`, so all of its functions are now available as `np.<name>`, sitting in the background ready for you to get to work on some data!
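To see why the `np.` prefix matters, the short sketch below calls Python's built-in `max` and NumPy's `np.max` side by side; they are separate functions that can coexist because they live in different namespaces. The list `values` is made up for illustration.

```python
import numpy as np

values = [3, 9, 4]   # hypothetical sample data

# Python's built-in max() and NumPy's np.max() are different functions;
# the np. prefix tells Python which one you mean.
print(max(values))     # built-in function
print(np.max(values))  # NumPy function

# The alias "np" is just a second name bound to the same module object.
import numpy
print(np is numpy)     # True
```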

## Creating Arrays
The ndarray object is a general array object that works for 1D, 2D, and higher-dimensional arrays. The most common are 1D (e.g., a time series), 2D (a table), 3D (a cube), and 4D (a series of cubes, often changing over time). In Numpy, each dimension of an array is called an axis.

How can we make a Numpy array? We write `np.array()` and put our list in the parentheses. Try it here:

```python
dist = [10, 20, 30, 40, 50]
```


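One of the list/array differences mentioned earlier shows up as soon as you do arithmetic. In the sketch below (the values in `heights` are made up), the same `+` operator concatenates two lists but adds two arrays element by element.

```python
import numpy as np

heights = [1.5, 2.0, 3.5]        # a plain Python list (hypothetical values)
heights_arr = np.array(heights)  # the same values as a NumPy array

print(type(heights_arr))  # <class 'numpy.ndarray'>

# The + operator means different things for lists and arrays:
print(heights + heights)          # list: concatenation, 6 elements
print(heights_arr + heights_arr)  # array: element-by-element addition
```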
Now make a 2D array from a nested list:

```python
list2d = [[41.5, -88.1],
          [42, -101.5],
          [55, -91.3],
          [22.5, -142.2]]
```


There are other ways to create arrays when you don't have specific numbers to put in them yet but know the shape and size you want. Try `np.zeros()`, `np.ones()`, and `np.empty()` and see what you get. Note that `np.empty()` only allocates memory, so its values are whatever happened to be there and should not be relied on.

You can also use `np.arange()`, which works similarly to the `range()` function in base Python but returns an array.

You can also look up and try using `.reshape()` on an array to reorganize it.
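Here is a minimal sketch of those creation functions and of `.reshape()`; the shapes chosen are arbitrary examples.

```python
import numpy as np

zeros = np.zeros((2, 3))     # 2 rows x 3 columns, every value 0.0
ones = np.ones(4)            # 1D array of four 1.0 values
scratch = np.empty((2, 2))   # allocated but NOT initialized: contents are arbitrary
steps = np.arange(0, 12, 2)  # like range(0, 12, 2): 0, 2, 4, 6, 8, 10

# reshape reorganizes the same 12 values (0..11) into 3 rows of 4
grid = np.arange(12).reshape(3, 4)

print(zeros)
print(ones)
print(steps)
print(grid)
```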

## Numpy Object
Unlike a list, all the elements in an array must be the same type. This is part of what makes Numpy fast: because every element is, say, a float or an int, Numpy can apply a mathematical operation to the whole array in fast compiled code instead of handling one Python object at a time. With lists, Python must process the elements one at a time. So math on large arrays saves a lot of time compared to math on large lists.

Common data types (dtypes):
* `np.float64` - floating point number
* `np.int64` - integer number
* `np.complex128` - complex number
* `np.bool_` - boolean (True/False)

With your arrays above, use `.dtype` (for data type) and see what you get. Change the array elements to see how that changes the dtype.
```python
d = np.array([1, 2, 3])
e = d.dtype
```
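You can change the type of an array with a simple method call, `.astype()`, which returns a new array and leaves the original unchanged.

```python
e = d.astype('float64')  # e is a float64 copy of d; d keeps its original dtype
```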
Note that `.astype('float')` also converts values to `np.float64`. Similarly, `.astype('int')` gives `np.int64` on most 64-bit Linux and macOS systems (on Windows with NumPy versions before 2.0 it gives `np.int32`).
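The sketch below shows two things worth knowing about dtypes: Numpy picks a single dtype for the whole array when you create it (one float makes every element a float), and converting floats to integers with `.astype()` truncates toward zero rather than rounding. The sample values are made up.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])  # one float forces every element to float

print(ints.dtype)   # an integer dtype (int64 on most 64-bit systems)
print(mixed.dtype)  # float64

# astype returns a NEW array; float -> int conversion truncates toward zero
as_int = np.array([2.7, -2.7]).astype('int64')
print(as_int)       # [ 2 -2]
```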

## Inspecting Arrays with Numpy

While seeing the data printed out is great, it's hard to tell from the printout how the data is actually stored. Luckily for us, numpy has tools we can use to investigate arrays. One of the most helpful is an attribute attached to every numpy array called [`shape`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html).

REF: DeCaria Chapter 7

Use `.shape`, `.ndim`, and `.size` to see what they reveal.

```python
# Print the shape attribute of one of your arrays
```


The `shape` attribute gives a tuple that describes the size of the array along each axis. Reading from the outermost set of `[ ]` inward, the numbers tell you how many elements are at each level of nesting. For a 2D array you get two numbers, which we usually think of as rows and columns, `(rows, columns)`. Correspondingly, when you index a 2D array, the first index selects the row and the second selects the column.
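As a quick illustration with a small made-up 2D array, `.shape` reports rows and columns, `.ndim` counts the axes, and `.size` counts all elements.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])  # hypothetical 2-row, 3-column table

print(table.shape)  # (2, 3): 2 rows, 3 columns
print(table.ndim)   # 2: number of axes
print(table.size)   # 6: total number of elements (2 * 3)
```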

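If you want to change the shape of an array, you can. Test this by using `.reshape`, `.resize`, `.flat`, and `.ravel` on a larger array such as the one below. Note that `.reshape` and `.ravel` return a reshaped result without changing the original array, `.resize` changes the array in place, and `.flat` is an iterator that steps through the elements as if the array were 1D.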

```python
big_array = np.array([[3., 7., 3., 4.],
                      [1., 4., 2., 2.],
                      [7., 2., 4., 9.]])
```
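A minimal sketch of `.reshape`, `.ravel`, and `.flat` on `big_array` follows. `.resize` is left out because it modifies the array in place and can refuse to run if other names refer to the same array; the `-1` in `reshape` asks Numpy to work out that dimension from the total size.

```python
import numpy as np

big_array = np.array([[3., 7., 3., 4.],
                      [1., 4., 2., 2.],
                      [7., 2., 4., 9.]])  # shape (3, 4): 12 values

# reshape: the same 12 values arranged as 6 rows x 2 columns
tall = big_array.reshape(6, 2)

# -1 lets numpy infer that dimension: 12 values / 2 rows -> 6 columns
wide = big_array.reshape(2, -1)

# ravel: flatten to 1D, reading the rows in order
flat_values = big_array.ravel()

# flat: an iterator that walks the elements in row order
total = 0.0
for value in big_array.flat:
    total += value

print(tall.shape, wide.shape, flat_values.shape)  # (6, 2) (2, 6) (12,)
print(big_array.shape)  # still (3, 4): the original keeps its shape
```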

## Slicing arrays
But we don't have to print the whole array at once. We can slice and dice the data any way we want by indexing the array with square brackets and telling Python which element(s) we want out of it.

### NOTE: Numpy and Python use zero-based indexing
Indices count from zero: an array with 10 elements has indices 0 through 9 [0,1,2,3,4,5,6,7,8,9] and **_not_** 1 through 10 [1,2,3,4,5,6,7,8,9,10]. You can think of the index as the offset from the start of the array.

So:
* 1st element -> index 0
* 2nd element -> index 1
* 3rd element -> index 2
* ...
* nth element -> index n-1

REF: DeCaria Section 7.7
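Here is a short sketch of single-element indexing with made-up data: zero-based positions, negative indices counting back from the end, one index per axis for a 2D array, and assignment through an index.

```python
import numpy as np

pressures = np.array([1012, 1008, 1003, 998, 1001])  # hypothetical values

print(pressures[0])   # 1st element -> index 0 -> 1012
print(pressures[4])   # 5th (last) element -> index 4 -> 1001
print(pressures[-1])  # negative indices count back from the end -> 1001

# For a 2D array, give one index per axis: [row, column]
grid = np.array([[10, 20, 30],
                 [40, 50, 60]])
print(grid[1, 2])     # row index 1, column index 2 -> 60

# An index on the left of = changes that element in place
pressures[3] = 999
```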

```python
temps = np.array([80, 74, 72, 71, 69, 62, 58, 55, 61, 62, 64, 63])

# Given the temps array, print the temperature that is 62.
```


```python
# Update the temperature at the 6th element of the temps array to be 60
```


```python
locations = [[41.5, -88.1],
             [42, -101.5],
             [55, -91.3],
             [22.5, -142.2]]
city_locations = np.array(locations)

# Given the 2D array, where each row contains a lat/lon location of a city, print the longitude of the third city in the array.
```


## Array range operation
We can also select a range of values instead of the whole array or a single element. Note Python's behavior here, which we already saw with the `range` function in a previous lecture: the start index is included, but the stop index is excluded. To summarize:

* `[0:2]` is 0, 1 -> mathematically we would write the set of numbers as [0,2)
* `[0:3]` is 0, 1, 2 -> mathematically we would write the set of numbers as [0,3)
* `[1:4]` is 1, 2, 3 -> mathematically we would write the set of numbers as [1,4)

Slices can be applied along one axis or several (up to as many axes as the array has), e.g., `a[0:2, 1:3]` for a 2D array.
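A minimal sketch of slicing on a 1D array and along both axes of a 2D array; `hours` and `grid` are illustrative arrays built with `np.arange`.

```python
import numpy as np

hours = np.arange(10)  # 0, 1, ..., 9
print(hours[0:3])      # [0 1 2]  (stop index 3 is excluded)
print(hours[7:])       # [7 8 9]  (omitted stop means "to the end")
print(hours[:2])       # [0 1]    (omitted start means "from 0")

grid = np.arange(12).reshape(3, 4)
# grid holds
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
print(grid[0:2, 1:3])  # rows 0-1, columns 1-2: the values 1, 2 and 5, 6
print(grid[:, 0])      # every row, column 0: [0 4 8]
```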

```python
# Print the first five values from the temps array
```


```python
# Print the lat/lon locations for the second and third cities from the city_locations array
```


```python
# Advanced: Create a 2D array.
# The first axis (axis 0) should represent the years 1950-1999.
# The second axis (axis 1) should contain the months of the year, 1-12.
# Now slice the array so it only contains June, July, and August for 1960-1980. Print the result.
```


## Mathematical Operations with Arrays

The power of Numpy arrays is their ability to do "vectorized" mathematical operations. Vectorization means that we don't have to loop over each element separately to compute new values. Instead, Numpy does the looping implicitly, in fast compiled code, and appears to operate on all elements simultaneously. For example:

Traditional program looping:
```python
%%timeit
n = 1000000
d = 1000*np.random.random(n) - 500
diff = np.zeros(n-1)

for i in range(0, len(diff)):
    diff[i] = d[i+1] - d[i]
```
```
300 ms ± 10.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
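Numpy vectorized calculation (implicit looping). Variables created inside a `%%timeit` cell are not kept afterward, so the setup code on the `%%timeit` line recreates `n`, `d`, and `diff`; that setup runs before the timing and is not itself timed:
```python
%%timeit n = 1000000; d = 1000*np.random.random(n) - 500; diff = np.zeros(n-1)
diff[0:n-1] = d[1:n] - d[0:n-1]
```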
```
3.14 ms ± 35.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
In this run, the implicit looping with Numpy was about 100 times faster (3.14 ms vs. 300 ms) than the explicit Python loop!
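The `%%timeit` magic only works in IPython/Jupyter. As a sketch for a plain Python script, the standard-library `timeit` module can time both versions, and `np.allclose` confirms they compute the same differences; `np.diff` is NumPy's built-in function for this neighbor-difference calculation. The helper names `loop_diff` and `vector_diff` are hypothetical, `n` is smaller than above so it runs quickly, and the timings you see will depend on your machine.

```python
import timeit
import numpy as np

n = 100_000
rng = np.random.default_rng(0)  # seeded generator so runs are repeatable
d = 1000 * rng.random(n) - 500  # n values between -500 and 500

def loop_diff(d):
    """Differences between neighbors, computed one element at a time."""
    diff = np.zeros(len(d) - 1)
    for i in range(len(diff)):
        diff[i] = d[i + 1] - d[i]
    return diff

def vector_diff(d):
    """The same calculation using whole-array slices."""
    return d[1:] - d[:-1]

# All three approaches give the same answer
assert np.allclose(loop_diff(d), vector_diff(d))
assert np.allclose(vector_diff(d), np.diff(d))

loop_t = timeit.timeit(lambda: loop_diff(d), number=1)
vec_t = timeit.timeit(lambda: vector_diff(d), number=1)
print(f"loop: {loop_t:.4f} s, vectorized: {vec_t:.6f} s")
```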


All of the common mathematical operations from Python work on Numpy arrays. The operators for addition, subtraction, multiplication, division, exponentiation, etc. are all the same.

```python
# Exponent example: raise each element to the second power
x = np.arange(-5, 6)  # integers -5 through 5
y = x**2
print(y)
```
Output:
```
[25 16  9  4  1  0  1  4  9 16 25]
```
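As a small sketch with made-up arrays: an operation with a scalar is applied to every element, and an operation between two arrays of the same shape pairs up their elements position by position.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

print(a + 5)   # scalar applied to every element: [6. 7. 8.]
print(a * b)   # element-by-element product: [10. 40. 90.]
print(b / a)   # [10. 10. 10.]
print(b - a)   # [ 9. 18. 27.]
print(a ** 2)  # [1. 4. 9.]
```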

```python
# Add 10 to the temps array and print array to screen
```


```python
# Divide the temps array by 2 and print array to screen
```


```python
# Multiply the 3rd element of the temps array by 1.25 and print array to screen
```


```python
# Subtract 12.2 from the temps array and print array to screen
```


```python
# Add one degree of latitude to the city_locations array and print array to screen
```


### Comparison operators

Try comparing your array to various thresholds. For example, to find all elements less than zero, type `a < 0`; the result is a boolean array with `True` wherever the condition holds. Test other comparison operators (`>`, `<=`, `>=`, `==`, `!=`) on your arrays.
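A comparison produces a boolean array, and that boolean array can itself be used as an index (often called a mask) to pull out only the matching elements. The sketch below uses made-up anomaly values.

```python
import numpy as np

anomalies = np.array([-1.2, 0.4, -0.3, 2.1, 0.0])  # hypothetical values

below_zero = anomalies < 0    # one True/False per element
print(below_zero)             # [ True False  True False False]

# Using the boolean array as an index keeps only the True positions
print(anomalies[below_zero])  # [-1.2 -0.3]

# True counts as 1, so sum() counts how many elements matched
print(below_zero.sum())       # 2

# Combine conditions with & (and) or | (or); the parentheses are required
print(anomalies[(anomalies > -1) & (anomalies < 1)])
```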

## Exercise
Create a Numpy array from a list of values representing three exam grades: 78, 83, and 92. After you have created the array, add a five-point curve to the first grade and a two-point curve to the second grade, then calculate the average exam grade after the curves are applied.