# Geochemistry 461: Python Introduction

## Why Learn Python Programming in Geochemistry?

Any geochemistry study requires some form of the following workflow:

1. Get data (simulation, experiment, field collection)
2. Manipulate and process data, and test them against theoretical or conceptual models
3. Visualize results, both quickly for understanding and as high-quality figures for reports or publications

[Python](https://docs.python.org/3.7/) provides a powerful environment for this kind of work because:

- **Batteries included** A rich collection of existing building blocks: classic numerical methods, plotting, and data processing tools. We don’t want to re-program the plotting of a curve, a Fourier transform, or a fitting algorithm. Don’t reinvent the wheel!
- **Easy to learn** Most scientists are not paid as programmers, nor have they been trained as programmers. They need to be able to draw a curve, smooth a signal, or do a Fourier transform in a few minutes.
- **Easy communication** To keep code alive within a lab or a company, it should be as readable as a book by collaborators, students, or maybe customers. Python syntax is simple, avoiding strange symbols or lengthy routine specifications that would divert the reader from the mathematical or scientific understanding of the code.
- **Efficient code** Python numerical modules are computationally efficient. Still, very fast code is useless if too much time is spent writing it. Python aims for both quick development and quick execution.
- **Universal** Python is a language used for many different problems. Learning Python avoids learning new software for each new problem.
- **Free software** Python is released under an open-source license: it can be used and distributed free of charge, even for building commercial software.
- **Multi-platform** Python is available for all major operating systems: Windows, Linux/Unix, and macOS.

## Python libraries

Python is modular and uses libraries to accomplish specific tasks or objectives. The most important libraries, which we'll load in almost every coding session, are:

### Numpy

[NumPy](https://docs.scipy.org/doc/numpy/user/whatisnumpy.html) is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

### Pandas

[Pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/overview.html) is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python.

### Matplotlib

[Matplotlib](https://matplotlib.org/) is a Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib tries to make easy things easy and hard things possible. You can generate plots, histograms, power spectra, bar charts, error charts, scatterplots, etc., with just a few lines of code.

### Importing libraries

At the start of each notebook or script, you have to import the Python packages you need. This is done with the `import` statement. Each library is given a shorthand abbreviation to shorten the code and make it more readable. Anything within the imported library can then be used by typing the abbreviation, followed by a period. For example, NumPy defines the constant pi, which is available after the import by typing `np.pi`.

* **Execute the code block below by clicking within the cell and pressing Shift+Return**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
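
As a quick check that the imports worked, the sketch below uses the `np` abbreviation to reach a NumPy constant and a NumPy function. The circle radius is an arbitrary illustrative value, not something from the course data.

```python
import numpy as np

# np.pi is a constant (no parentheses); np.sqrt is a function (called with parentheses)
radius = 2.0  # arbitrary example value
area = np.pi * radius**2

print(np.pi)
print(area)
print(np.sqrt(area / np.pi))  # recovers the radius
```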

## Getting help

Direct links to each of the important libraries can be found under the Help dropdown menu at the top of the page. In addition, you can get help for any function by typing a `?` before or after the function name. For example:
* **Execute the code block below to get help for the numpy sin function**

In [None]:
np.sin?
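
The `?` syntax is specific to IPython and Jupyter. In a plain Python script, similar help text is available through the built-in `help` function or the function's `__doc__` attribute. A minimal sketch:

```python
import numpy as np

# the documentation text is stored on the function itself
doc = np.sin.__doc__

# show only the first few lines; help(np.sin) would print the full text
print("\n".join(doc.splitlines()[:5]))
```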

## Basic math with Python and Numpy

* **Execute each of the following cells to see the results of the operation**

In [None]:
# Addition (or Subtraction)

2.0 + 4.5

In [None]:
# Multiplication (or Division)

2 * np.pi

In [None]:
# Raising to a power

3.0**2

In [None]:
# Square root

np.sqrt(9.0)

In [None]:
# Exponentials

np.exp(1)
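
The cells above mention subtraction and division only in their comments. The sketch below spells those out, along with the natural logarithm as the inverse of `np.exp`. The numbers are arbitrary.

```python
import numpy as np

difference = 4.5 - 2.0         # subtraction
quotient = 7 / 2               # true division always returns a float
floor_quotient = 7 // 2        # floor division rounds down to a whole number
remainder = 7 % 2              # remainder after floor division
roundtrip = np.log(np.exp(1))  # np.log is the natural log, so it undoes np.exp

print(difference, quotient, floor_quotient, remainder, roundtrip)
```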

## Solving equations and writing functions

We can combine operations to evaluate more complex expressions. Consider the value of the expression $x^{3} - \log(x)$ at $x = 4.1$, where $\log$ is the natural logarithm (`np.log` in NumPy).

In [None]:
x = 4.1
x**3 - np.log(x)

We can also express this equation as a new function, which we can call with different values throughout our code.

In [None]:
def f(x):
    """Return x**3 - log(x), where log is the natural logarithm."""
    return x**3 - np.log(x)

In [None]:
f?

In [None]:
f(4.1)

In [None]:
f(3)
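
Because `f` is built from NumPy operations, it also accepts an array and evaluates every element at once, which is what the plotting section relies on. A small sketch, redefining `f` so the block runs on its own:

```python
import numpy as np

def f(x):
    """Return x**3 - log(x), where log is the natural logarithm."""
    return x**3 - np.log(x)

# evaluate the function at several x values in one call
x_values = np.array([1.0, 3.0, 4.1])
y_values = f(x_values)

print(y_values)
```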

## Basic plotting with Matplotlib

The Matplotlib [Usage Guide](https://matplotlib.org/tutorials/index.html) and [Pyplot Tutorial](https://matplotlib.org/tutorials/introductory/pyplot.html#sphx-glr-tutorials-introductory-pyplot-py), linked under the Help dropdown menu, are the best starting points.

* **The first block of code below creates a range of x values; the second plots the value of y calculated from those x values using our function above**

In [None]:
x = np.linspace(0.1, 2, 100)
x

In [None]:
# create a variable x, with 100 points evenly spaced between 0.1 and 2
x = np.linspace(0.1, 2, 100)

# calculate y using our function created above
y = f(x)

# plot x vs y
plt.plot(x, y)

# label the x and y axes and give the plot a title
plt.xlabel('x');
plt.ylabel('y = x**3 - log(x)');
plt.title('Our first plot');
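
For reports or publications, a figure usually needs to be written to a file as well as displayed. One way to do that is `savefig`; the file name `first_plot.png` and the 300 dpi resolution below are illustrative choices, not course requirements.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.1, 2, 100)
y = x**3 - np.log(x)  # same expression as the function f above

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel('x')
ax.set_ylabel('y = x**3 - log(x)')
ax.set_title('Our first plot')

# write the figure to disk; the format is inferred from the file extension
out_path = Path('first_plot.png')  # hypothetical file name
fig.savefig(out_path, dpi=300, bbox_inches='tight')
```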

* **You can also combine multiple plots into one using the `plt.subplot` command**

In [None]:
x = np.linspace(0.0, 5.0, 75)

y1 = np.cos(2 * np.pi * x) * np.exp(-x)
y2 = np.cos(2 * np.pi * x)

plt.subplot(2, 1, 1)
plt.plot(x, y1, 'o-')
plt.title('Two plots in one graphic!');
plt.ylabel('y1');

plt.subplot(2, 1, 2)
plt.plot(x, y2, '.-')
plt.xlabel('time (s)');
plt.ylabel('y2');
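
The same two-panel figure can also be built with `plt.subplots`, which returns the figure and an array of axes to draw on. This is an alternative to the `plt.subplot` calls above, sketched here with a shared x axis:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 5.0, 75)
y1 = np.cos(2 * np.pi * x) * np.exp(-x)
y2 = np.cos(2 * np.pi * x)

# two rows, one column; sharex keeps the x axes of both panels aligned
fig, axes = plt.subplots(2, 1, sharex=True)

axes[0].plot(x, y1, 'o-')
axes[0].set_title('Two plots in one graphic!')
axes[0].set_ylabel('y1')

axes[1].plot(x, y2, '.-')
axes[1].set_xlabel('time (s)')
axes[1].set_ylabel('y2')
```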

## Importing and summarizing data with Pandas

The Pandas library can import data from a wide variety of formats. Unlike a NumPy array, which normally holds values of a single data type (text or numeric), a Pandas dataframe can hold columns of different types, as is common in geological datasets.

### Import data with pd.read_csv

Data should be saved from spreadsheet software in `.csv` (comma-separated values) format. The data can then be imported into a Pandas dataframe with the [read_csv](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html) command `pd.read_csv('/path_to_filename/filename.csv')`.

You can then view the first few rows of a data frame `df` with the command `df.head()` or the last few rows with `df.tail()` to confirm the data imported correctly.

* **Execute the code block below to import the file morb-bulk.csv and see the first few rows of data**

In [None]:
df = pd.read_csv('../data/morb-bulk.csv')
df.head()
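
To see what `read_csv` does without needing a file on disk, the sketch below reads CSV text from an in-memory string. The sample names and oxide values are made up for illustration and are not taken from `morb-bulk.csv`; the `Sample` column is hypothetical, and only the column names `Method`, `FeOt`, and `MgO` mirror those used in this notebook.

```python
import io

import pandas as pd

# made-up CSV text; values are illustrative only
csv_text = """Sample,Method,MgO,FeOt
A1,EPMA,7.5,9.0
A2,EPMA,8.0,10.0
B1,XRF,7.0,9.5
B2,XRF,8.5,10.5
"""

# read_csv accepts a file path or any file-like object
df_demo = pd.read_csv(io.StringIO(csv_text))

print(df_demo.head(2))  # first two rows
print(df_demo.dtypes)   # text and numeric columns coexist
```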

### Get quick summary statistics of the data

Once imported, the `describe` command quickly summarizes basic statistics of each numeric column.

* **Execute the code block below to generate a summary of the data**

In [None]:
df.describe()
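
`describe` returns a dataframe itself, so individual statistics can be pulled out of it by row label. A small sketch on a made-up table (the values are illustrative, not from `morb-bulk.csv`):

```python
import pandas as pd

# made-up values for illustration only
df_demo = pd.DataFrame({
    'MgO': [7.5, 8.0, 7.0, 8.5],
    'FeOt': [9.0, 10.0, 9.5, 10.5],
})

summary = df_demo.describe()

# rows of the summary are labeled count, mean, std, min, 25%, 50%, 75%, max
print(summary.loc[['mean', 'std']])
print(df_demo['MgO'].mean())  # a single statistic can also be computed directly
```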

### Group data by a category

We commonly want to compare different classes of data (geologic units, watersheds, analytical methods). The `groupby` command groups rows that share a common value in a column, typically a text column. You can then summarize values by group.

* **Execute the code block below to compare whole rock FeOt values measured by EPMA and XRF**

In [None]:
grp = df.groupby('Method')
grp.FeOt.describe()
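
The grouped object supports other summaries besides `describe`. The sketch below computes the per-method mean and sample count on a made-up table; the values are illustrative and not taken from `morb-bulk.csv`.

```python
import pandas as pd

# made-up values for illustration only
df_demo = pd.DataFrame({
    'Method': ['EPMA', 'EPMA', 'XRF', 'XRF'],
    'FeOt': [9.0, 10.0, 9.5, 10.5],
})

grp_demo = df_demo.groupby('Method')

# bracket selection is equivalent to grp_demo.FeOt
mean_feot = grp_demo['FeOt'].mean()
counts = grp_demo.size()

print(mean_feot)
print(counts)
```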

## Plotting data stored in pandas dataframes

Plotting from Pandas is similar to plotting values as above. To plot any column of a dataframe `df`, pass it to the plot command as `df.column_name`.

* **Execute the code below to plot dataframe values of FeOt vs MgO**

In [None]:
# plot total FeO vs MgO for all of the samples
plt.plot(df.FeOt, df.MgO, 'ko')

# label the axes
plt.xlabel('FeOt');
plt.ylabel('MgO');
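
Columns can also be selected with brackets, `df['column_name']`, which is required when a column name contains spaces or clashes with a dataframe method. A sketch on a made-up table; the column `SiO2 wt%` and all values are hypothetical:

```python
import matplotlib.pyplot as plt
import pandas as pd

# made-up values for illustration only
df_demo = pd.DataFrame({
    'SiO2 wt%': [49.5, 50.2, 50.8, 51.1],  # hypothetical column with a space in its name
    'MgO': [8.5, 8.0, 7.5, 7.0],
})

fig, ax = plt.subplots()

# df_demo.SiO2 wt% would be a syntax error, so use brackets
ax.plot(df_demo['SiO2 wt%'], df_demo['MgO'], 'ko')
ax.set_xlabel('SiO2 wt%')
ax.set_ylabel('MgO')
```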

### Group samples and plot by group

Using a `for` loop, we can plot groups of data with different colors or symbols.

* **Execute the code below to plot the same data as above, but grouped by analytical method**

In [None]:
# plot each group as its own series, labeled by analytical method
for name, group in grp:
    plt.plot(group.FeOt, group.MgO, 'o', label = name)

# label the axes and provide a legend
plt.xlabel('FeOt');
plt.ylabel('MgO');
plt.legend(title='Method');