Originally created for an NAGT webinar and hosted in the repo [here](https://github.com/pycogss/pycogss-intro-to-python). Adapted for 437. 

# Introduction to Python for Geoscience

## What is Python?
[Python](https://docs.python.org/3/faq/general.html) is a programming language, which helps you deliver commands to a computer. Python is a "high-level" language, which means that it looks more like human language than computer `beeps` and `boops`, which is good for us scientists. It is a very popular language across many use cases and fields, and over the past decade it has been increasingly adopted by research scientists who write packages (more on that soon) to do their work. (R is another language that scientists like to use, but its packages have historically focused on statistics.) Python is also free to download and use, in contrast to Matlab, which is perhaps the other language that geoscientists may have encountered. It is indeed named after Monty Python's Flying Circus. 

## What is a Jupyter Notebook?

A Jupyter Notebook is an interactive computing environment that allows you to create and share documents containing live code, equations, visualizations, and explanatory text.

A Jupyter Notebook consists of cells. There are two main types of cells:

<b>Code</b> cells are used to write and execute code. You can write Python code in these cells and run it by pressing Shift + Enter or clicking the "Run" button in the toolbar.

<b>Markdown</b> cells are used to write text, formatted using Markdown syntax. You can write explanations, documentation, or even LaTeX equations in these cells.

In [None]:
# This is a Python cell
# I have "commented out" these lines which means the computer ignores them
# With the pound symbol in front
# If you hit Shift + Enter this cell will "execute"

This is a markdown cell. If you hit Shift + Enter, it will render as formatted text.

### An important note about executing cells in notebooks!

When working with Jupyter Notebooks, the <b>order in which cells are executed matters</b>. This is a common pitfall for novice notebook users. Variables and their values are stored in memory and can be accessed and modified across different cells. If cells are not executed in order, the variable state might be different from what you expect, leading to errors or unexpected behavior. Code cells may depend on variables or functions defined in previous cells. If you skip executing certain cells or execute them out of order, you may encounter errors due to missing dependencies. 

My personal strategy for checking that everything is going to plan is to "restart" the notebook, which clears all the stored variables, and then run the whole thing from start to finish. I do this often while building a notebook to make sure I haven't accidentally removed an important variable, etc. 
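
To make the point about execution order concrete, here is a minimal sketch that imitates two notebook cells inside one block. The variable name `sample_count` is made up for illustration; the point is only that re-running the second "cell" on its own keeps changing the stored value.

```python
# "Cell 1": define a variable (hypothetical name, for illustration only)
sample_count = 10

# "Cell 2": update the variable using its stored value
sample_count = sample_count + 5
print(sample_count)  # 15 after running cell 1, then cell 2 once

# Running "cell 2" a second time WITHOUT re-running cell 1
# adds to the value already in memory
sample_count = sample_count + 5
print(sample_count)  # 20
```

Restarting the notebook and running everything from the top would bring `sample_count` back to a predictable value.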


## How do I use this notebook?

If you are reading this notebook on GitHub or another place where the notebook is just posted, you'll have to import the notebook into an environment that is running Python and the `notebook` package. This might be a local installation (via [miniconda](https://docs.anaconda.com/free/miniconda/index.html) or [Anaconda](https://docs.anaconda.com/free/navigator/index.html)), a [JupyterHub](https://jupyter.org/hub) you've been granted access to running in the cloud, or [Binder](https://mybinder.org/), [Google Colab](https://colab.research.google.com/) or something similar. 

Depending on where you have sent your notebook to be run you may have to choose a [kernel](https://docs.jupyter.org/en/latest/install/kernels.html) that has the `IPython` package installed - basically, you need to run your notebook in a Python environment that has the necessary notebook packages installed. Many notebook platforms do this automatically, but if you are running on a local install or in VS Code, you will have to choose a kernel. 

If you are using this notebook (you can execute the cells) then you are already somewhere where Python is installed - congrats!

# Let's do some Python!

Here we're going to look at a few examples of some of the fundamental concepts in Python:
- **Variables**: Used to store data.
- **Data Types**: Different types of data that can be stored in variables.
- **Operators**: Symbols used to perform operations on variables and values.
- **Control Flow**: Structures that control the flow of execution in a program, like loops and conditionals.

## Variables

Variables are used to store data values. In Python, you can assign a value to a variable using the `=` operator. The variable name is on the left side, and the value you want to assign is on the right side.

Let's create a variable to store the value of the acceleration due to gravity (`g`) in meters per second squared (m/s^2):

In [None]:
# Assigning a value to the variable g
g = 9.81  # m/s^2 (acceleration due to gravity)

In [None]:
# Print the value of g
print(g)

## Data types

Python supports various [data types](https://python-reference.readthedocs.io/en/latest/basic_data_types.html), including:

- **Numeric Types**: Integers, floating-point numbers, and complex numbers.
- **Sequence Types**: Lists, tuples, and range objects.
- **Mapping Type**: Dictionaries.
- **Boolean Type**: `True` or `False`.
- **None Type**: Represents the absence of a value.

Your data will dictate which data type(s) make the most sense, and without getting too deep into the computer science of it, some functions just won't work if you have the wrong data type. 

For example:
- [integers](https://docs.python.org/3/library/functions.html#int) (`int`): Used for discrete quantities like counts of samples or observations, such as the number of trees in a forest plot.
- [floating-point numbers](https://docs.python.org/3/library/functions.html#float) (`float`): Represent continuous measurements such as temperature, precipitation amounts, or chemical concentrations.
- [lists](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists) (`list`): Useful for storing ordered collections of data points, like time series of temperature readings or lists of soil properties at different depths. Lists can be [indexed](https://www.geeksforgeeks.org/python-list-index/) to give you only the values in the list you want for your application. 
- [dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) (`dict`): Used for holding metadata attributes (e.g., instrument specifications, geographic coordinates) associated with environmental measurements. They are also used for organizing and accessing information, such as looking up the properties of a data point by its name.
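
As a small illustration of a function "not working" with the wrong data type, here is a sketch that assumes a thickness value was read in as text. The variable names are made up for this example.

```python
# A number stored as text (a string), e.g. as it might come from a text file
thickness_text = "200"
# The same number stored as an integer
thickness_m = 200

# type() tells you which data type a variable holds
print(type(thickness_text))
print(type(thickness_m))

# Adding a number to a string raises a TypeError
try:
    total = thickness_text + 50
except TypeError as err:
    print("TypeError:", err)

# Converting the string to an integer first makes the addition work
total = int(thickness_text) + 50
print(total)  # 250
```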

In [None]:
# Here is a list of floats
temperatures = [20.5, 21.3, 22.1, 23.0, 24.5]

temperatures

In [None]:
# We can index a list by putting the position of the value we want in brackets
# Note Python uses zero indexing (index 0 is the first element, so index 1 is the second),
# which is different from Matlab
temperatures[1]
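
Indexing can do a bit more than pick out one element. The sketch below reuses the same `temperatures` list and shows zero-based indexing, negative indexing, and slicing; the names `first`, `last`, and `middle` are just for illustration.

```python
temperatures = [20.5, 21.3, 22.1, 23.0, 24.5]

# Index 0 is the first element
first = temperatures[0]
print(first)  # 20.5

# Negative indices count from the end of the list
last = temperatures[-1]
print(last)  # 24.5

# A slice [start:stop] includes start but stops BEFORE stop
middle = temperatures[1:4]
print(middle)  # [21.3, 22.1, 23.0]
```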

In [None]:
# Here is a dictionary of string and integer data
unit = {'unit name': 'Tuscarora', 'thickness (m)': 200, 'age': 'Silurian'}

unit
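
Values in a dictionary are looked up by their key rather than by position. Here is a short sketch using the same `unit` dictionary; the `'lithology'` entry and the `'porosity'` key are added purely for illustration.

```python
unit = {'unit name': 'Tuscarora', 'thickness (m)': 200, 'age': 'Silurian'}

# Look up a value by its key
age = unit['age']
print(age)  # Silurian

# Add a new key-value pair (illustrative value, not from the original data)
unit['lithology'] = 'sandstone'

# .get() returns a fallback instead of raising an error if the key is missing
porosity = unit.get('porosity', 'not recorded')
print(porosity)  # not recorded

# List all the keys currently in the dictionary
print(list(unit.keys()))
```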

## Operators

Python supports various operators for performing operations on variables and values. Some common operators include:

- **Arithmetic Operators**: `+`, `-`, `*`, `/`, `//` (floor division), `%` (modulus), `**` (exponentiation).
- **Comparison Operators**: `==` (equal), `!=` (not equal), `<`, `>`, `<=`, `>=`.
- **Logical Operators**: `and`, `or`, `not`.
- **Assignment Operators**: `=`, `+=`, `-=` etc.

In [None]:
# Arithmetic operators
a = 10
b = 3

In [None]:
# Addition
print(a + b)  # Output: 13

In [None]:
# Subtraction
print(a - b)  # Output: 7
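
The other operators in the list above work the same way. This is a quick sketch with the same values of `a` and `b`; the result variable names are made up for this example.

```python
a = 10
b = 3

# More arithmetic operators
product = a * b      # multiplication: 30
quotient = a / b     # true division: always gives a float (about 3.33)
floor_div = a // b   # floor division: 3
remainder = a % b    # modulus (remainder): 1
power = a ** b       # exponentiation: 1000

# Comparison operators return a Boolean (True or False)
is_equal = (a == b)    # False
is_greater = (a > b)   # True

# Logical operators combine Booleans
both_positive = (a > 0) and (b > 0)  # True

# Assignment operators update a variable in place
a += 1  # same as a = a + 1, so a is now 11

print(product, floor_div, remainder, power)
print(is_equal, is_greater, both_positive, a)
```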

We will see below that you can do all sorts of operations with the `numpy` package. 

## Control Flow

Control flow statements allow you to control the flow of execution in a program. Common control flow constructs include:

- **Conditional Statements**: `if`, `elif`, `else`.
- **Loops**: `for` loops, `while` loops.
- **Break and Continue**: Used within loops to alter their behavior.

These are useful if you are iteratively solving some equation or iterating through a list of data.

In [None]:
x = 10

# Here are your conditional statements
if x > 5:
    print("x is greater than 5")
else:
    print("x is less than or equal to 5")

# And here is a for loop
for i in range(5):
    print(i)  
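
The cell above covers `if`/`else` and a `for` loop. The sketch below fills in the other constructs from the list (`elif`, `while`, `break`, and `continue`), using a made-up list of temperatures and arbitrary thresholds chosen only for illustration.

```python
temperatures = [20.5, 21.3, 22.1, 23.0, 24.5]

# if / elif / else inside a for loop: label each reading
labels = []
for t in temperatures:
    if t < 21:
        labels.append("cool")
    elif t < 23:
        labels.append("mild")
    else:
        labels.append("warm")
print(labels)  # ['cool', 'mild', 'mild', 'warm', 'warm']

# while loop: keep halving a value until it is no longer above 10
value = 100.0
steps = 0
while value > 10:
    value = value / 2
    steps += 1
print(value, steps)  # 6.25 4

# continue skips to the next iteration; break exits the loop entirely
first_above_22 = None
for t in temperatures:
    if t <= 22:
        continue          # skip readings at or below 22
    first_above_22 = t
    break                 # stop at the first reading above 22
print(first_above_22)  # 22.1
```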

# Numpy

[`Numpy`](https://numpy.org/doc/stable/) is a fundamental library in Python used for numerical computing. It provides powerful tools for working with arrays and matrices, along with a vast collection of mathematical functions to operate on these arrays efficiently. Numpy is widely used in scientific computing. A beginner's guide is [here](https://numpy.org/doc/stable/user/absolute_beginners.html)!

We must <b>import</b> `numpy` because it is a [package that is installed on top of Python.](https://docs.python.org/3/tutorial/modules.html)

In [None]:
import numpy

You can then perform functions with the package by using the name of the package followed by a `.`. For example, you can use [`numpy.linspace()`](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html) like this:

In [None]:
array = numpy.linspace(0,6,13)
print(array)

This creates a numpy `array`, which acts like a list but can be computed on more easily than Python lists. Check it out:

In [None]:
print(type(array))
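
To see what "computed on more easily" means in practice, here is a small sketch that converts temperatures from Celsius to Fahrenheit. The values and variable names are made up for illustration.

```python
import numpy

temps_c_list = [20.5, 21.3, 22.1]       # a plain Python list
temps_c = numpy.array(temps_c_list)     # the same values as a numpy array

# Arithmetic on an array is applied to every element at once
temps_f = temps_c * 9 / 5 + 32
print(temps_f)

# The same operator on a list does something different:
# multiplying a list by 2 repeats the list rather than doubling the values
doubled_list = temps_c_list * 2
print(doubled_list)  # [20.5, 21.3, 22.1, 20.5, 21.3, 22.1]
```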

You may see people shortening the name of `numpy` to `np`, which can be achieved by renaming it when you import it. I'm using the [`np.arange()`](https://numpy.org/doc/stable/reference/generated/numpy.arange.html) function here. 

In [None]:
import numpy as np
another_array = np.arange(0.,6.,1) # Here the numbers are floats now!
print(another_array)
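
The two functions used so far build arrays in slightly different ways, which is easy to mix up. Here is a side-by-side sketch with the same arguments as above; the variable names are just for illustration.

```python
import numpy as np

# linspace(start, stop, num): num evenly spaced values, INCLUDING stop
evenly_spaced = np.linspace(0, 6, 13)
print(len(evenly_spaced))   # 13
print(evenly_spaced[-1])    # 6.0

# arange(start, stop, step): values spaced by step, stopping BEFORE stop
stepped = np.arange(0., 6., 1)
print(len(stepped))         # 6
print(stepped[-1])          # 5.0
```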

So when would you use `numpy`? Any time you want to compute data, there's probably a `numpy` function for that! (or in `scipy` or in `scikit-learn`, but you'll worry about those later...). 

Here is a pretend 2-dimensional digital elevation model (it's very small). And maybe we are [missing some data](https://numpy.org/doc/stable/user/misc.html) in it (the sensor failed momentarily?):

In [None]:
elevation_data = np.array([[100, 200, 150],
                           [300, np.nan, 180],
                           [220, 210, 190]])

elevation_data

This won't work, because Python's built-in `max()` doesn't know how to handle a 2-dimensional array: it tries to compare whole rows to each other and raises a `ValueError`:

In [None]:
max(elevation_data)

You also can't just use the plain `np.max()` function, or else it'll spit out the missing value (`nan`):

In [None]:
np.max(elevation_data)

Instead, you have to use [`np.nanmax()`](https://numpy.org/doc/stable/reference/generated/numpy.nanmax.html) to find the largest non-NaN elevation value:

In [None]:
np.nanmax(elevation_data)
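
`np.nanmax()` has several NaN-aware relatives. As a sketch on the same pretend elevation grid, here are a few of them, plus `np.isnan()` for finding where the gaps are. The result variable names are made up for this example.

```python
import numpy as np

elevation_data = np.array([[100, 200, 150],
                           [300, np.nan, 180],
                           [220, 210, 190]])

# np.isnan() marks missing cells as True; summing the Trues counts them
missing = np.isnan(elevation_data)
n_missing = missing.sum()
print(n_missing)  # 1

# NaN-aware minimum and mean ignore the missing cell
lowest = np.nanmin(elevation_data)
mean_elev = np.nanmean(elevation_data)
print(lowest)     # 100.0
print(mean_elev)  # 193.75

# axis=1 computes the maximum along each row
row_max = np.nanmax(elevation_data, axis=1)
print(row_max)
```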

Little things like that make `numpy` a powerful tool to cut through messy data to get the answer you want without doing anything by hand. 

# Next steps

You don't have to turn this notebook in, only the second notebook (`numpy_matplotlib.ipynb`).