# MIT-WHOI JP 2021 Summer Math Review: Introduction to Python
## *Created by Megan Gillen, Mason Rogers, and Rachel Kahn*
##### *Last Modified: 7-13-21*

*This notebook utilizes resources from the CSDMS ESPIn 2021 Summer Course:* 

Mark Piper, Benjamin Campforts, Irina Overeem, Nicole Gasparini, and Leilani Arthurs, 2020. Earth Surface Processes Institute (ESPIn) Course Material (Version v1.0). Zenodo. http://doi.org/10.5281/zenodo.4000979.

*You can access the ESPIn course material & helpful tutorials through their [GitHub repository](https://github.com/csdms/espin)*. 

# Welcome to Python!
### This is a Jupyter Notebook that will walk us through the basics of Python, data science applications, and relevant libraries.

Python is an open source, object-oriented programming language with useful applications in data science, numerical modeling, GIS, and developing open source software.

*If you're interested in how I'm able to make these blocks of text in between live coding blocks, check out these cool resources on Markdown syntax:*

https://www.markdownguide.org/cheat-sheet/ *(simplified cheat sheet with basic syntax)*

https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet *(extended cheat sheet with even more info!)*

## Python Basics

We'll start class today with some generic skills/syntax to get everyone up to speed on all things Python.

Python is an [object-oriented programming (OOP) language](https://www.educative.io/blog/object-oriented-programming), which means that the things we work with (numbers, text, lists, and so on) are *objects* that bundle data together with the operations that act on that data. Let's begin with the classic `Hello World!` example:

In [None]:
# first, let's generate a print statement that welcomes us to the world of Python!
print('Hello World!')

We can also assign & print variables of different [data types](https://www.w3schools.com/python/python_datatypes.asp):

In [None]:
# assign our string to a variable
greet = 'Hello World!'

# assign a numeric value to a variable
x = 1930

# print both our variables
print(greet)
print(x)

# we can also combine strings & variables in a print statement with commas
print('WHOI was founded in:',x)

You can also determine the data type of a specific variable with the `type` function. This is especially helpful with numeric values, since Python treats whole numbers (`int`) and decimal numbers (`float`) as different types.

In [None]:
print(type(greet))
print(type(x))

You can also store text typed in by the user with the `input` function. Note that `input` always returns a string, even if the user types a number:

In [None]:
name = input('What is your name?')
# the + sign concatenates strings
print('Hello, ' + name + '!')

Python can also perform basic calculator operations:

In [None]:
# assign numeric values to variables
a = 2
b = 4
c = a+b

print(c)

The basic Python operators are:

- `+`: addition
- `-`: subtraction
- `*`: multiplication
- `/`: division
- `//`: floor division (divides, then rounds down to a whole number)
- `**`: exponentiation (raises a number to a power)
- `%`: modulus (aka the remainder in division)
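
A quick way to see how these operators differ is to apply each one to the same pair of numbers. This is a small sketch; the values 7 and 2 are arbitrary choices for illustration:

```python
# apply each basic operator to the same two numbers
a = 7
b = 2

print(a + b)    # addition -> 9
print(a - b)    # subtraction -> 5
print(a * b)    # multiplication -> 14
print(a / b)    # division always returns a float -> 3.5
print(4 / 2)    # even when the result is whole -> 2.0
print(a // b)   # floor division rounds down to a whole number -> 3
print(a ** b)   # exponentiation (7 squared) -> 49
print(a % b)    # modulus, the remainder of 7 / 2 -> 1

# floor division rounds toward negative infinity, not toward zero
print(-a // b)  # -> -4
```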

You can find more information about Python operators & variable assignment [here](https://docs.python.org/3/tutorial/introduction.html).

### Attributes, Methods, and Functions

The variables we have created previously are also called **objects** and are the building blocks for our code. They have inherent ***attributes***, or properties about themselves we can access and use to inform our code. They also have ***methods***, which are functions attached to the object that we can use to manipulate, extract, and/or alter the data stored in the object.

Both attributes and methods are accessed with *dot* notation. An **attribute** is written `object.attribute`, with no parentheses, and returns a stored property: for example, a numpy array's dimensions are stored in `my_array.shape`. A **method** is called with parentheses, `object.method()`: for example, if we had a numpy array of elevations, `elevations.max()` would return the maximum value.

Python also has standalone **functions** that take the object as an input argument, written `function(object)`. For example, if we want to know the length of a string, we can type `len(my_string)` and that will return the number of characters in the string.
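
The three access patterns can be compared side by side. This sketch assumes numpy is installed (it is imported later in this notebook); the `elevations` values are made up for illustration:

```python
import numpy as np

my_string = 'oceanography'

# built-in function: the object goes inside the parentheses
print(len(my_string))        # -> 12

# method: called on the object with dot notation and parentheses
print(my_string.upper())     # -> OCEANOGRAPHY

# hypothetical elevation data (in meters) stored in a numpy array
elevations = np.array([12, 305, 87, 150])

# attribute: dot notation, no parentheses
print(elevations.shape)      # -> (4,)

# method: returns the largest value in the array
print(elevations.max())      # -> 305

# a plain Python list has no .max() method; use the built-in max() function instead
elevation_list = [12, 305, 87, 150]
print(max(elevation_list))   # -> 305
```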

### String Operations
As strings are objects, they have specific attributes and methods associated with their properties. 

Some useful string methods include `strip()` (removes specific characters from the beginning & end of a string; by default, whitespace), `split()` (divides a string into a list of pieces at a given separator), and `replace()` (replaces every occurrence of one substring with another):

In [None]:
# create a string with all the department names:
depts = '#AOPE; BIO; MC&G; MG&G; PO#'
print(depts)

# determine the length of the string using len()
print(len(depts))

# remove the '#' characters from the beginning & end of the string
depts = depts.strip('#')
print(depts)

# change the semicolons to commas
depts = depts.replace(';',',')
print(depts)

# split the string at each comma + space
depts = depts.split(', ')
# note that this method returns a list with all the new strings!
print(depts)

String methods have many applications beyond just manipulating words. They can help create and read in file names, build input arguments for functions, and process data into a usable format. String methods are tools that allow us to extract the information we need and organize data for analysis & visualization. To learn about more string methods, click [here](https://www.w3schools.com/python/python_ref_string.asp).
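
For example, string methods can pull information out of a file name or build a new one. This is a sketch; the `instrument_station_date.csv` naming scheme is a hypothetical example, not a standard:

```python
# a hypothetical data file name following an instrument_station_date pattern
filename = 'ctd_station05_2021-07-13.csv'

# remove the file extension, then split the name at each underscore
name_parts = filename.replace('.csv', '').split('_')
print(name_parts)            # -> ['ctd', 'station05', '2021-07-13']

# unpack the three pieces of the list into separate variables
instrument, station, date = name_parts

# split the date string into year, month, and day (these are still strings)
year, month, day = date.split('-')
print(year, month, day)      # -> 2021 07 13

# build a new file name for processed output by joining strings with '_'
out_name = '_'.join([instrument, station, 'processed']) + '.csv'
print(out_name)              # -> ctd_station05_processed.csv
```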

If you ever get confused about what a specific method or function does, you can use the `help()` function to access its documentation:

In [None]:
help(type)

### Data Structures

There are several basic data structures built into Python, and plenty more that can be accessed with specific libraries. First, we'll look at **lists**, which store an ordered collection of items that can be of different data types:

In [None]:
# generate a list of mixed variables
list1 = [1930, 260, 'WHOI']
print(list1)

# we can also index specific variables within a list using [] notation
# *NOTE* python indexing starts at 0!
print(list1[2])

# lists are mutable, so we can change existing items by assigning to their index
list1[2] = 'MIT-WHOI'
print(list1)

Another sequence type is the **tuple**, which uses () notation. Tuples are indexed the same way as lists, except that once a tuple is created you *cannot change* its items (tuples are *immutable*).

In [None]:
# generate a tuple of mixed variables
tuple1 = (1930,260,'MIT-WHOI')

# this next line should generate an error! (TypeError: 'tuple' object does not support item assignment)
tuple1[2] = 'fish'

**Dictionaries** are another useful data type that use the `{}` notation. They store *key-value* pairs: each value is stored under a key, and you access the value by placing its key in brackets, e.g. `params['sediment discharge']`.

In [None]:
# create a dictionary of model parameters
params = {'sea water depth': 5, 'sediment discharge': 1000, 'island width': 300}

# print our sediment discharge value
print(params['sediment discharge'])

# we can change an existing value by assigning a new value to its key
params['sea water depth'] = 10
print(params)
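
A few other dictionary operations come up often when handling model parameters: adding a new key, checking whether a key exists, and looping over all key-value pairs. This sketch reuses the same example parameters; the `'grain size'` and `'wave height'` keys are hypothetical:

```python
params = {'sea water depth': 5, 'sediment discharge': 1000, 'island width': 300}

# assigning to a key that doesn't exist yet adds a new key-value pair
params['grain size'] = 0.25

# check whether a key is present before using it
print('island width' in params)      # -> True
print('wave height' in params)       # -> False

# .get() returns a default value instead of raising a KeyError for a missing key
print(params.get('wave height', 0))  # -> 0

# .items() lets a for loop visit each key and its value
for key, value in params.items():
    print(key, '=', value)
```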

### Indexing

Python uses bracket `[]` notation to index items from a string or list. An ***extremely*** important thing to remember is that indexing in Python ***starts at 0!!***. So if you want to access the first item in a list, the syntax would be: `my_list[0]`.

We'll start with strings:

In [None]:
# assign a string with your first name to a variable
first_name = 'megan'

# slice the first 3 letters of this variable (indices 0, 1, 2; the stop index 3 is excluded)
print(first_name[0:3])

# negative numbers count from the end of the string: [-2:] gives the last 2 letters
print(first_name[-2:])
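
Slices follow the pattern `[start:stop:step]`: the `start` index is included, the `stop` index is excluded, and any part can be left blank. A short sketch using the same example name:

```python
first_name = 'megan'

print(first_name[0])      # first character -> m
print(first_name[-1])     # last character -> n
print(first_name[1:4])    # indices 1, 2, 3 (stop index 4 is excluded) -> ega
print(first_name[:2])     # a blank start means "from the beginning" -> me
print(first_name[::2])    # every 2nd character -> mgn
print(first_name[::-1])   # a step of -1 reverses the string -> nagem
```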

We can use the same indexing approach with lists:

In [None]:
# assign a list of odd numbers
odd_num = [1,3,5,7]

# index the 2nd list item
print(odd_num[1])

# if we want to change an item in the list, we index the position and assign a new object
odd_num[-1] = 11
print(odd_num)

# we can also add items to the end of a list with .append()
odd_num.append(13)
print(odd_num)

# and remove items with del
del odd_num[-1]
print(odd_num)
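
Lists have several other methods for adding and removing items. A brief sketch of `insert()`, `remove()`, and `pop()` on a fresh list of odd numbers:

```python
odd_num = [1, 3, 5, 7]

# insert(index, item) places a new item before the given index
odd_num.insert(0, -1)
print(odd_num)            # -> [-1, 1, 3, 5, 7]

# remove(item) deletes the first occurrence of a value (not a position)
odd_num.remove(3)
print(odd_num)            # -> [-1, 1, 5, 7]

# pop() removes the last item and returns it, so it can be stored
last = odd_num.pop()
print(last)               # -> 7
print(odd_num)            # -> [-1, 1, 5]

# len() reports how many items are left
print(len(odd_num))       # -> 3
```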

### Conditionals & Loops

`for` **loops** iterate through a sequence of items (e.g. a list, a string, or a range of numbers) and perform tasks at each step. Let's practice by printing all items in a list:

In [None]:
# create a list with all the world's oceans
oceans = ['Arctic', 'Antarctic', 'Atlantic', 'Indian', 'Pacific']

# using a for loop, print every item in the list
for item in oceans:
    print(item)
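
Loops are often paired with `range()`, which generates a sequence of integers (stop value excluded), or `enumerate()`, which provides each item's index alongside the item. A sketch using the same list of oceans:

```python
oceans = ['Arctic', 'Antarctic', 'Atlantic', 'Indian', 'Pacific']

# range(start, stop) counts from start up to, but not including, stop
for i in range(0, 3):
    print(i)              # prints 0, then 1, then 2

# enumerate() returns (index, item) pairs
for index, ocean in enumerate(oceans):
    print(index, ocean)

# loops can also build up a result step by step
name_lengths = []
for ocean in oceans:
    name_lengths.append(len(ocean))
print(name_lengths)       # -> [6, 9, 8, 6, 7]
```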

`if`/`elif`/`else` **conditional statements** set criteria (conditions) that must be `True` for a specific block of code to run.

In [None]:
# generate an input statement asking for a random integer (int() converts the input string to an integer)
num = int(input('Input a random integer:'))

# print different statements based on numeric value
if num > 0:
    print(num,'is positive!')
elif num==0: ## '==' tests whether two values are equal (a single '=' assigns a value)
    print(num,'is zero!')
else:
    print(num,'is negative!')

You can also add multiple criteria to conditional statements. We use `and` when we need all criteria to be met, and `or` when at least one criterion needs to be met.

In [None]:
# generate an input statement asking for a random integer
num = int(input('Input a random integer:'))

print('Is your number between 1 and 10?')
if (num >= 1) and (num <= 10):
    print('Yes!')
else:
    print('No!')
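
The cell above only uses `and`. Below is a sketch of `or`, plus Python's chained comparisons (`1 <= num <= 10`), which read like math notation. A hypothetical list of test numbers stands in for `input()` so the checks run on several values at once:

```python
# a few hypothetical test numbers, used in place of input() so the cell runs on its own
test_numbers = [5, 10, 250, -3]
labels = []

for num in test_numbers:
    # chained comparison: same as (num >= 1) and (num <= 10)
    if 1 <= num <= 10:
        label = 'between 1 and 10'
    # 'or' is True when at least one of the conditions is True
    elif (num < -100) or (num > 100):
        label = 'more than 100 away from zero'
    else:
        label = 'neither'
    labels.append(label)
    print(num, 'is', label)
```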

### Libraries

One of the best features of Python is its **libraries**: open-source packages that you can import into your code to use their built-in functions. There are thousands out there, ranging from simple mathematical operations all the way to complex plotting & machine learning algorithms. Below I've outlined a couple of libraries I've found useful when working with Python. Many of these are foundational libraries that are widely used:

#### Click on the Library Name for a link to their website

__[numpy](https://numpy.org/):__ basic arithmetic operations all the way to complex mathematical functions

__[matplotlib](https://matplotlib.org/):__ basic plotting library *(import as matplotlib.pyplot)*

__[pandas](https://pandas.pydata.org/):__ super useful library for data processing and organization!! one of the main reasons to use python

__[seaborn](https://seaborn.pydata.org/#):__  fancier plotting library (aka prettier plots)

__[sklearn](https://scikit-learn.org/stable/):__ machine learning library (installed as `scikit-learn`, imported as `sklearn`) with example data sets and useful algorithms

__[cmocean](https://matplotlib.org/cmocean/):__ colormaps designed for use with matplotlib, specifically for commonly measured oceanographic parameters

Other useful libraries that may be of interest: https://wiki.python.org/moin/UsefulModules
(fair warning: I have only used the ones I've outlined explicitly above, so I cannot speak to the utility of everything on this list!)

--------------

In order to use libraries in Python, we have to import them into the script. Typically this is done at the very beginning of files, which you will see in the applied examples notebooks.


In [None]:
## Installing Libraries
## If you have never used Python or these libraries before, run this code block by first removing the # in front of each line of code, then pressing shift-enter
## (random is part of Python's standard library, so it does not need to be installed)

#import sys
#!conda install --yes --prefix {sys.prefix} numpy
#!conda install --yes --prefix {sys.prefix} matplotlib
#!conda install --yes --prefix {sys.prefix} pandas
#!conda install --yes --prefix {sys.prefix} scikit-learn
#!conda install --yes --prefix {sys.prefix} -c conda-forge cmocean

## if you would rather install packages from the terminal/shell, you can find instructions from anaconda here: https://docs.anaconda.com/anaconda/user-guide/tasks/install-packages/

In [None]:
## Import Libraries
# Run this code block to activate the libraries in this notebook

import numpy as np #the "as" portion of the code allows you to rename a library (so you can type less when using it a lot!)
import matplotlib.pyplot as plt #I am accessing a specific portion of a library using the dot syntax
import pandas as pd
import random
from sklearn.linear_model import LinearRegression # we can also import specific classes or functions from parts of packages, in this case the LinearRegression class from sklearn
import cmocean

Let's explore the `cmocean` library, which contains useful colormaps for specific oceanographic parameters. We use the `help()` command to examine the cmocean documentation.

In [None]:
help(cmocean)

Let's look at what colormaps are in this library!

In [None]:
cmocean.plots.plot_gallery()

While `cmocean` contains a small number of functions, many libraries contain hundreds. I highly encourage you to read through their documentation if you're looking to accomplish a specific task. You can also always Google questions! I often search "how do I do this oddly specific thing in Python" and get useful answers. I've also outlined some resources at the end of this notebook that I find helpful for using Python (whether brand new or quite experienced!) and that may help with figuring out how to do specific things.

There is one library I want to explore before diving into specific applications with our applied examples. `numpy` contains basic mathematical operations and the **array** datatype that is useful in modeling applications. This is one of the most foundational libraries in Python, as many other libraries are based on functions found in `numpy`.

### Numpy

First, we'll explore basic mathematical operations.

In [None]:
# access pi from the numpy library
print(np.pi)

# calculate the mean and standard deviation of a list of values
scores = [98,72,85,53]
print(np.mean(scores))
print(np.std(scores))
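
One detail worth knowing: `np.std()` computes the *population* standard deviation by default (dividing by N). For the *sample* standard deviation (dividing by N - 1), pass `ddof=1`. A sketch with the same scores:

```python
import numpy as np

scores = [98, 72, 85, 53]

print(np.mean(scores))          # -> 77.0
print(np.std(scores))           # population standard deviation (divides by N)
print(np.std(scores, ddof=1))   # sample standard deviation (divides by N - 1)

# the population value computed step by step, to show what np.std() does by default
deviations = np.array(scores) - np.mean(scores)
manual_std = np.sqrt(np.sum(deviations ** 2) / len(scores))
print(manual_std)               # matches np.std(scores)
```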

The **array** datatype from the `numpy` library is incredibly useful for storing numeric data, whether imported or created within Python. Numpy arrays are analogous to matrices, one of the main data structures in ***MATLAB***, and come with a series of [accompanying functions](https://numpy.org/doc/stable/reference/routines.array-creation.html) in the numpy library.

In [None]:
# create a 1D numpy array
data1D = np.array([1,2,3])
print(data1D)

# to create a multidimensional array, put each row's values in its own [] inside np.array() (this example is a 2D array with 3 rows and 3 columns)
data3D = np.array([[1,2,3], [4,5,6], [7,8,9]])
print(data3D)
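
Note that despite its name, `data3D` above is a **2D** array (3 rows x 3 columns). The `.shape` and `.ndim` attributes report an array's dimensions, and `.reshape()` rearranges the same values into a new shape. A short sketch:

```python
import numpy as np

data1D = np.array([1, 2, 3])
data3D = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(data1D.shape, data1D.ndim)     # -> (3,) 1
print(data3D.shape, data3D.ndim)     # -> (3, 3) 2

# index a 2D array with [row, column]; indexing still starts at 0
print(data3D[1, 2])                  # row 1, column 2 -> 6

# reshape the 9 values into a 1 x 9 row, or into a true 3D array of shape (1, 3, 3)
print(data3D.reshape(1, 9))
print(data3D.reshape(1, 3, 3).ndim)  # -> 3
```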

You can also create empty, zeros, and ones arrays based on the dimensions of another array:

In [None]:
# create an array of ones with the same shape as the array from the cell block above
ones_data3D = np.ones_like(data3D)
print(ones_data3D)

# create a zeros array based on the same array
zeros_data3D = np.zeros_like(data3D)
print(zeros_data3D)
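
The `_like` functions copy the shape (and data type) of an existing array. To create an array from scratch instead, pass the shape as a tuple to `np.zeros()`, `np.ones()`, or `np.full()`. A sketch; the grid size and fill value are arbitrary:

```python
import numpy as np

data3D = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# np.empty_like() allocates an array with the same shape without setting its values,
# so its contents are arbitrary leftovers from memory (only the shape is guaranteed)
empty_data3D = np.empty_like(data3D)
print(empty_data3D.shape)          # -> (3, 3)

# to build an array from scratch, pass the shape as a tuple (rows, columns)
grid = np.zeros((2, 4))            # e.g. pre-allocate space for model output
print(grid)

print(np.ones(3))                  # a 1D array of three ones
print(np.full((2, 2), 7.5))        # a 2 x 2 array filled with 7.5

# np.zeros() returns floats by default, while np.zeros_like() copies the
# integer data type of data3D
print(grid.dtype, np.zeros_like(data3D).dtype)
```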

There are two main ways to create arrays of equally spaced values: `np.arange()` creates values separated by a given step size, while `np.linspace()` returns a specified number of equally spaced values over an interval.

For `np.arange()` (as well as Python's built-in `range()` and slicing), the starting value is **inclusive**, meaning Python/numpy will include this value in the array. The ending value is **exclusive**, so Python/numpy does not include this value in the array. That means if you want integers from 0 to 100, the input arguments would be: `(0,101)`. An exception to this rule is `np.linspace()`, whose stopping value *is* included by default (this can be turned off with `endpoint=False`).

In [None]:
# create an array of values from 0:50 spaced by 5
print(np.arange(0,51,5))

# create an array of 3 equally spaced values from 0:18
print(np.linspace(0,18,3))
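
The difference in how the two functions treat the stop value is easiest to see side by side. A sketch, including the `endpoint=False` option of `np.linspace()`:

```python
import numpy as np

# arange: the stop value is excluded, so 51 is used to include 50
print(np.arange(0, 51, 5))                     # 0, 5, 10, ..., 50 (11 values)

# linspace: both the start (0) and stop (18) values are included by default
print(np.linspace(0, 18, 3))                   # 0, 9, 18

# linspace with endpoint=False leaves out the stop value
print(np.linspace(0, 18, 3, endpoint=False))   # 0, 6, 12

# for non-integer steps, linspace is usually safer than arange, because
# floating-point rounding can make arange's last value hard to predict
print(np.linspace(0, 1, 11))                   # 0.0, 0.1, ..., 1.0 (11 values)
```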

Basic statistical operations can be performed on arrays using dot notation, e.g. `data3D.mean()`. Note that you do not have to put `np.` in front of these functions, as they are methods that belong to the array object itself!

In [None]:
# find the mean of the 3x3 array
print(data3D.mean())

# find the max of the 3x3 array
print(data3D.max())

# find the min of the 3x3 array
print(data3D.min())
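
These methods also accept an `axis` argument to summarize along columns or rows instead of over the whole array. A sketch using the same 3 x 3 array:

```python
import numpy as np

data3D = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# no axis: one value for the whole array
print(data3D.mean())           # -> 5.0

# axis=0 collapses the rows, giving one value per column
print(data3D.mean(axis=0))     # column means: 4, 5, 6

# axis=1 collapses the columns, giving one value per row
print(data3D.max(axis=1))      # row maxima: 3, 6, 9

# other methods such as sum() work the same way
print(data3D.sum(axis=0))      # column sums: 12, 15, 18
```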

## Applied Exercises: develop & practice your skills!

Let's practice our Python skills & apply some libraries with these exercises!

[Example #1: Numerical Modeling Applications with Python - Diffusion of Random Particles](exercise1_diffusion_modeling.ipynb)

[Example #2: Exploring Data Science with Python - Analyzing Sea Level Trends](exercise2_slr_data.ipynb)

### This concludes the MIT-WHOI Summer 2021 JP Math Review, Intro to Python class! 
#### Below I've listed some more resources that may help in your respective Python journeys. Please don't hesitate to also contact me via Slack or [e-mail](mailto:mgillen@mit.edu) if you have any other questions or want to chat about all things Python!

### More Beginner's Python Resources:

[Python WikiPage Beginner's Guide](https://wiki.python.org/moin/BeginnersGuide/Programmers)

[Python for Non-programmers](https://wiki.python.org/moin/BeginnersGuide/NonProgrammers)

[Translating into Python from other languages](https://wiki.python.org/moin/MovingToPythonFromOtherLanguages)
- [Translating between R & Python](https://towardsdatascience.com/essential-guide-to-translating-between-python-and-r-7cb18b786e5d)
- [Numpy for MATLAB Users](http://mathesaurus.sourceforge.net/matlab-numpy.html)

[Google's Intro Python Course](https://developers.google.com/edu/python/)

[LearnPython.org](https://www.learnpython.org/)

[Large list of Free Python Resources!](https://hakin9.org/list-of-free-python-resources/)

### Other Useful Python Resources:

[Stack Overflow](https://stackoverflow.com/): likely has an answer to almost any programming question you can think of

[Anaconda](https://docs.anaconda.com/anaconda/): information on installation and uses of Anaconda, a helpful integrated Python tool

[GitHub](https://github.com/): data/code repository - use it as a way to keep records of changes to your codes (i.e. version control) and a way for collaborators to work on the same scripts & projects
    
- Very useful [GitHub workflow lesson](https://github.com/csdms/espin/blob/main/lessons/git/index.md) from CSDMS!

[OceanPython](https://oceanpython.org/): Python for Oceanographers & Marine Scientists! Cool (& probably relevant) oceanographic coding examples in Python.

Hannah Mark, a former JP student, created great notes from the 2017 version of this class about Python & general programming best practices. If you would like access to these, [contact me](mailto:mgillen@mit.edu) and I will pass along her materials!

#### Cheat Sheets:
[Beginner's Python](https://ehmatthes.github.io/pcc_2e/cheat_sheets/cheat_sheets/)

[Matplotlib](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Matplotlib_Cheat_Sheet.pdf)

[Numpy](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf)

[Pandas](http://datacamp-community-prod.s3.amazonaws.com/f04456d7-8e61-482f-9cc9-da6f7f25fc9b)

[Data Wrangling with Pandas](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)

[GitHub Commands](https://education.github.com/git-cheat-sheet-education.pdf)