Originally created for an NAGT webinar and hosted in the repo [here](https://github.com/pycogss/pycogss-intro-to-python). Adapted for 437.

# Introduction to `numpy`

<b>Computation learning objectives:</b>
- Understand importing packages like `numpy` when you require more than built-in Python functions

<b>Geoscience learning objectives:</b>
- Recognize useful `numpy` functions for geoscience applications
- Identify situations in which data may be missing (e.g. instruments, digital elevation models, etc.)

<b>Previous skills leveraged:</b>
- Setting [variables](https://www.learnpython.org/en/Variables_and_Types), especially numbers
- [Operators](https://www.w3schools.com/python/python_operators.asp) (though `numpy` has its own operators)

<b>Real-world context:</b>
- Finding statistics of a dataset (e.g. maximum value of a time series)
- Performing transformations on time series or gridded data (e.g. unit conversion, multiplying two datasets together)


<b>Tips for success:</b>
- Always start with `import numpy as np` and use `np.function_name()` to access `numpy` tools.
- To practice on your own, try creating arrays with `np.array()`, `np.arange()`, and `np.linspace()` to get comfortable with how `numpy` handles data, and then try some operations. 
- Use functions like `np.nanmax()` instead of `np.max()` when your data includes NaN values.

## Previous module review

Try:
1. setting two variables equal to two different values, and then

2. adding those two variables together and

3. printing the result with a `print` statement

In [None]:
# Your code here

# Numpy

[`Numpy`](https://numpy.org/doc/stable/) is a fundamental library in Python used for numerical computing. It provides powerful tools for working with arrays and matrices, along with a vast collection of mathematical functions to operate on these arrays efficiently. Numpy is widely used in scientific computing. A beginner's guide is [here](https://numpy.org/doc/stable/user/absolute_beginners.html)!

We must <b>import</b> `numpy` because it is a [package that is installed on top of Python.](https://docs.python.org/3/tutorial/modules.html)

In [None]:
import numpy

You can then perform functions with the package by using the name of the package followed by a `.`. For example, you can use [`numpy.linspace()`](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html) like this:

In [None]:
array = numpy.linspace(0,6,13)
print(array)

This creates a numpy `array`, which behaves much like a Python list but lets you do math on all of its values at once. Check its type:

In [None]:
print(type(array))
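To see what "easier to compute on" can look like in practice, here is a small sketch that multiplies a plain Python list and a numpy array by the same number. The variable names and values (`depths_list`, `depths_array`) are made up for this illustration.

```python
import numpy as np

depths_list = [1.0, 2.0, 3.0]          # hypothetical values in a plain Python list
depths_array = np.array(depths_list)   # the same values as a numpy array

# Multiplying a list by an integer repeats the list
repeated = depths_list * 2
print(repeated)

# Multiplying an array by a number scales every element
doubled = depths_array * 2
print(doubled)

# Arrays of the same shape combine element by element
total = depths_array + doubled
print(total)
```

The list version gets longer, while the array version keeps its shape and changes its values, which is usually what you want when working with data.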

You may see people shortening the name of `numpy` to `np`, which can be achieved by renaming it when you import it. I'm using the [`np.arange()`](https://numpy.org/doc/stable/reference/generated/numpy.arange.html) function here. 

In [None]:
import numpy as np
another_array = np.arange(0.,6.,1) # Here the numbers are floats now!
print(another_array)
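The two array-building functions used so far differ in a way that is easy to miss, so here is a quick side-by-side sketch. It reuses the same arguments as the cells above; the names `stepped` and `spaced` are just for this example.

```python
import numpy as np

# np.arange(start, stop, step): you choose the step size, and `stop` is left out
stepped = np.arange(0., 6., 1)
print(stepped)

# np.linspace(start, stop, num): you choose how many points, and `stop` is included by default
spaced = np.linspace(0, 6, 13)
print(spaced)

# Compare how many values each call produced
print(len(stepped), len(spaced))
```

A rough rule of thumb: reach for `np.arange()` when you care about the spacing and `np.linspace()` when you care about the number of points.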

So when would you use `numpy`? Any time you want to do calculations on data, there's probably a `numpy` function for that (or one in `scipy` or `scikit-learn`, but you'll worry about those later).

Here is a pretend 2-dimensional digital elevation model (it's very small). And maybe we are [missing some data](https://numpy.org/doc/stable/user/misc.html) in it (the sensor failed momentarily?):

In [None]:
elevation_data = np.array([[100, 200, 150],
                           [300, np.nan, 180],
                           [220, 210, 190]])

elevation_data

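This won't work. Python's built-in `max()` loops over the rows of a 2-dimensional array and tries to compare whole rows with each other, which raises a `ValueError` (and it has no idea how to handle the missing data either):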

In [None]:
max(elevation_data)

You also can't use the plain `np.max()` function, because it returns the missing value (`nan`) instead of an elevation:

In [None]:
np.max(elevation_data)

Instead, use [`np.nanmax()`](https://numpy.org/doc/stable/reference/generated/numpy.nanmax.html) to find the largest non-NaN elevation value:

In [None]:
np.nanmax(elevation_data)
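`np.nanmax()` has several siblings that skip missing values in the same way. Here is a short sketch using the same pretend elevation grid; the variable names (`missing`, `n_missing`, `lowest`, `mean_elev`) are just for this example.

```python
import numpy as np

elevation_data = np.array([[100, 200, 150],
                           [300, np.nan, 180],
                           [220, 210, 190]])

# np.isnan() returns True wherever a value is missing
missing = np.isnan(elevation_data)
print(missing)

# Count the missing cells (True counts as 1 when summed)
n_missing = int(missing.sum())
print(n_missing)

# Minimum and mean that ignore the missing cell
lowest = np.nanmin(elevation_data)
mean_elev = np.nanmean(elevation_data)
print(lowest, mean_elev)
```

Checking how many values are missing before computing statistics is a good habit, since a result based on only a few valid cells may not mean much.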

Little things like that make `numpy` a powerful tool to cut through messy data to get the answer you want without doing anything by hand. 

# Next steps

You don't have to turn this notebook in, only the second notebook (`numpy_matplotlib.ipynb`).

# For the capstone

You will likely need to perform some "cleaning" of any data you find (converting to certain units, replacing or revising missing data). You might also create new derived datasets based on raw data (for example, calculating discharge from a time series of water level). You may also find yourself with gridded data like a digital elevation model that you will want to analyze. All these tasks will likely involve calls to `numpy`!
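As a rough sketch of what those cleaning steps might look like, the cell below flags missing values, converts units, and builds a derived dataset. Everything specific here is an assumption made up for illustration: the water levels, the `-9999` "no data" flag, and especially the power-law rating curve and its coefficients `a` and `b`, which do not come from any real gauge.

```python
import numpy as np

# Hypothetical water levels in feet; -9999 is an assumed "no data" flag
stage_ft = np.array([2.0, 2.5, -9999.0, 3.0])

# Replace the flag with NaN so numpy's nan-aware functions can skip it
stage_ft = np.where(stage_ft == -9999.0, np.nan, stage_ft)

# Unit conversion: 1 foot = 0.3048 meters
stage_m = stage_ft * 0.3048

# Derived dataset: a made-up power-law rating curve, Q = a * h**b
a, b = 5.0, 1.5   # illustrative coefficients only
discharge = a * stage_m ** b

# The missing value carries through as NaN, so use the nan-aware maximum
print(np.nanmax(discharge))
```

Your own data will have its own units, missing-data flags, and relationships, so treat this as a pattern to adapt rather than a recipe.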