<img src="NB_images\portada.png" style="width:750px" align="center">

<h1><center>Introduction to Python for Geosciences</center></h1>

<h1><center>Session 3 - Text files, plots and multiplots. Widgets</center></h1>

<h2><center>Theory and demonstrations</center></h2>

<h3>Course created by</h3>  

Manuel David Soto

<a  id="toc"></a>

<h3>Table of contents</h3>

* [1 Jupyter extensions](#ext)
    * [1.1 Some important extensions](#some)
    * [1.2 Variable inspector](#inspect)

* [2 Reading and writing files](#rwfiles)
    * [2.1 Reading text files](#text)
        * [2.1.1 Reading single column text files](#rsingle)
        * [2.1.2 Reading multi column text files with missing values](#rmulti)
    * [2.2 Writing text files](#wtext)
        * [2.2.1 Writing simple text reports](#wreport)
        * [2.2.2 Writing single or multi column text files](#wcolumn)

* [3 Plots](#plots)
    * [3.1 Single plots](#splots)
        * [3.1.1 Point-line plots](#line)
        * [3.1.2 Scatter plots](#scat)
        * [3.1.3 Bar and pie plots](#bar)
    * [3.2 Multiplots by subplot](#mplotsub)
    * [3.3 Multiplots by loops](#mplotloop)
   
* [4 Interactive Notebook](#inter)
    * [4.1 Widgets](#wid)

<a  id="ext"></a>

# 1 Jupyter extensions

There is a way to give superpowers to your Jupyter Notebook and bring it closer to other powerful IDEs. By installing the Jupyter extensions, your Notebook gets 66 new features. Let's see some of them:

<a  id="some"></a>

## 1.1 Some important extensions

Some important extensions that you can activate as needed:

* Ruler: Enables the Ruler CodeMirror feature
* ExecuteTime: Display when each cell has been executed and how long it took
* highlighter: Provides several toolbar buttons for highlighting a selected text within a markdown cell.
* **spellchecker**: Adds a CodeMirror overlay mode for Typo.js spellchecking
* Autopep8: Use kernel-specific code to reformat/prettify the contents of code cells. 
* **Scratchpad**: Adds a scratchpad cell to Jupyter notebook.

Some extensions, such as Autopep8, must be installed before they can be used.

<a  id="inspect"></a>

## 1.2 Variable inspector

In addition to the extensions in bold above, the `Variable Inspector` is perhaps the most important extension because it allows you to easily see the characteristics and values of all your available variables. 

In [None]:
# Importing random, a module of the Python Standard Library

import random 

In [None]:
a = random.randint(1, 10)
b = random.randint(1, 10)

c = a + b

<div class="alert alert-block alert-warning"> <font size="6"> &#9757;&#127997;</font> <b> Question:</b> How do you know the value of a, b, and c without printing or executing them?
</div>

In [None]:
# Previously we used who

%who

In [None]:
a

The `Variable Inspector` extension, whose icon is shown below, allows you to inspect the status of your variables:

<img src="NB_images\variable_inspector.png" style="width:35px" align="left">

<a  id="rwfiles"></a>

<h1>2 Reading and writing files </h1>

In geosciences we constantly need to read and write files that contain information we want to process or deliver. Some common formats (from simpler to more complex) are:

* Text (columnar data separated by spaces, commas...)
* LAS (just pre-organized text files recognized by certain programs)
* Excel files (mainly arranged by columns)
* Image files (bmp, jpg, tiff,...)
* PDF (with text, images and/or tables)
* SEGY

As is common in Python, there are different ways to input or output information, depending on the format of the files and on the library or module you use. In this and the following session we are going to explore simple ways to get information (text files and images) in and out of Python.

<a  id="text"></a>

<h2>2.1 Reading text files </h2>

Text files are the simplest files used in any computer (regardless of the operating system) to save information in a format familiar to us. Some specific uses of text files are:

* Simple documents like notes or letters
* Data in columnar shape
* Keepers of parameters or environment variables for programs or operating systems
* Scripts or programs in different programming languages
* Communication between machines, programs and persons
* Databases

Characters in text files were initially coded for electronic communication (teleprinters) based on different standards that in 1963 became the **ASCII** (American Standard Code for Information Interchange) code. It soon became clear that ASCII, which only has the English letters, digits, and some mathematical and special symbols (128 characters), was insufficient for other written languages. ASCII therefore evolved into different codes until we arrived at the current **UTF-8** (Unicode Transformation Format, 8-bit), which can encode 1,112,064 characters (code points), enough for all human written languages and disciplines. UTF-8 is the dominant encoding on the World Wide Web.

More information on character encoding at: https://www.w3schools.com/charsets/
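As a small illustration of the difference between ASCII and UTF-8, the sketch below encodes a plain English string and a string with an accented character. The sample strings are assumptions chosen only for the example; `str.encode()` and `ord()` are standard Python.

```python
# ASCII characters occupy one byte in UTF-8; other characters need more
plain = "Zn"
accented = "Potosí"  # sample text, chosen only for illustration

print(ord("Z"))                  # code point of 'Z' (inside the ASCII range 0-127)
print(plain.encode("utf-8"))     # same bytes as in ASCII
print(accented.encode("utf-8"))  # the accented 'í' takes two bytes

# Encoding the accented text as ASCII fails
try:
    accented.encode("ascii")
except UnicodeEncodeError as err:
    print("Not representable in ASCII:", err.reason)
```

This is why it can matter to know the encoding of a text file before reading it, especially when it contains names with accents or other non-English characters.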

The `zn.dat` file is a very simple text file with just one column that records "a series of 118 assays for Zn (weight % of zinc) made at two meter intervals along a single sphalerite quartz vein [see image below] in the **Pulacayo Mine** in Chile [Bolivia, in fact]." (Middleton, 2000). Zn is used as a tracer element for Au and Ag because it is more mobile and easier to analyze, so if Zn goes up, Au and Ag should also go up.

Data from: https://books.google.es/books/about/Data_Analysis_in_the_Earth_Sciences_Usin.html?id=mNsSAQAAIAAJ&redir_esc=y
<img src="NB_images\pulacayo_mine_bolivia.png" style="width:700px">

Pulacayo Mine, southern Bolivia. Source: https://www.mindat.org/loc-332.html


<img src="NB_images\vein_nalunaq_mine_australia.png" style="width:700px">

Quartz vein in an underground mine in southern Australia. Source:

https://www.mining-technology.com/deals-analysis/fosterville-south-acquires-three-gold-projects-from-ecr-minerals/

<a  id="rsingle"></a>

<h3> 2.1.1 Reading single column text files </h3>

To load `zn.dat` we are going to use the `genfromtxt()` function of the Numpy library, which creates an array of floating-point numbers.
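In case the data file is not at hand, the following sketch reproduces the idea with a small in-memory text. The numbers are made up (they are not taken from the Pulacayo data) and the variable names are hypothetical; the sketch relies only on the fact that `np.genfromtxt()` also accepts a file-like object.

```python
import io
import numpy as np

# Hypothetical single-column content with two header lines
sample_text = """Zn assays (made-up values)
weight %
17.7
17.8
9.5
"""

# genfromtxt accepts a file name or a file-like object such as StringIO
demo = np.genfromtxt(io.StringIO(sample_text), skip_header=2)

print(demo)        # 1-D array of floats
print(demo.dtype)  # float64
print(demo.shape)  # (3,)
```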

In [None]:
# Importing the Numpy library

import numpy as np

In [None]:
# Data loading and assignment. The Input folder is in the working directory

zn = np.genfromtxt('Input/zn.dat') # Again this is a relative path

# zn = np.genfromtxt(r'C:\Users\Manuel David Soto\Documents\Python\Cursos\Geociencias\Session_3\Input\zn.dat') # Absolute path (raw string, so the backslashes are kept as-is)

print(zn, '\n')
print(type(zn))

You can use some Unix-style commands, such as `pwd` and `ls`, inside your NB. IPython provides them as magic commands:

In [None]:
pwd

In [None]:
ls
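The same information can be obtained with plain Python, which may be useful where the shell-style commands are not available. A minimal sketch using the standard `pathlib` module (the variable names are arbitrary):

```python
from pathlib import Path

# Current working directory (equivalent of pwd)
cwd = Path.cwd()
print(cwd)

# Names of the files and folders it contains (equivalent of ls)
names = sorted(item.name for item in cwd.iterdir())
print(names)
```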

In [None]:
# Skipping the header of a better documented file

zn = np.genfromtxt('Input/zn_w_info.dat', skip_header = 2)

print(zn)
print()
print(type(zn))

In [None]:
# Look for specific points in the zn array and verify the type

print(zn[0])
print(zn[59])
print(zn[-1])
print()
print(type(zn[117]))

In [None]:
# Some basic parameters of the zn array

print('samples =',len(zn))
print('min =', min(zn))
print('max =', max(zn))

zn_range = max(zn) - min(zn)

print('range =', zn_range)

<a  id="rmulti"></a>

<h3> 2.1.2 Reading multi column text files with missing values </h3>

Having a file with just one column is very rare; normally we have files with several numerical columns, sometimes even with text. When you load a multi column file, an array is created with the same number of columns, and each column can later be assigned to an independent variable. Importantly, in order to operate with these variables, they must have the same dimensions (same number of rows).

A common situation is to have several **missing values**, places where the measurements were not possible. These missing values are usually indicated with special numbers such as -999.000 or -999.2500, or just with empty spaces. When loading a delimited file, Numpy's `genfromtxt()` automatically recognizes empty fields as missing values and replaces them with ***nan***, not a number.
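Empty fields become nan on loading, but special numbers such as -999.25 are read as ordinary values unless they are handled explicitly. The sketch below shows one possible way to deal with both cases; the small CSV text is made up for illustration and is not part of the course data.

```python
import io
import numpy as np

# Made-up two-column data: one empty field and one -999.25 sentinel
sample_csv = """au,ag
0.5,1.2
,3.4
0.7,-999.25
"""

data = np.genfromtxt(io.StringIO(sample_csv), delimiter=",", skip_header=1)
print(data)  # the empty field is already nan

# Replace the sentinel value by nan so that it is not treated as a measurement
data[data == -999.25] = np.nan
print(data)
print("missing values:", np.isnan(data).sum())
```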

We are going to load the *rampi.csv* file, a five-column file that comes from a study of gold and associated elements (in ppm) in quartz veins in the **Rampi block prospect**, Indonesia (see image below from Google Earth):

<img src="NB_images\rampi_indonesia.png"  style="width:700px">

Data and information on the area in:

https://www.researchgate.net/publication/321018113_Occurrences_and_Characteristics_of_Gold_Mineralization_in_Rampi_Block_Prospect_North_Luwu_Regency_South_Sulawesi_Province_Indonesia/figures?lo=1


In [None]:
# Loading the multi column file

rampi = np.genfromtxt('Input/rampi.csv', skip_header = 1, delimiter=',')

print(rampi, '\n')
print(type(rampi))

In [None]:
# Variable assignment

au = rampi[:,0]
ag = rampi[:,1]
cu = rampi[:,2]
zn2 = rampi[:,3]

# We use zn2 to avoid overwriting the zn from the Bolivian data

pb = rampi[:,4]

print('Silver (Ag):\n', ag, '\n')
print(type(ag))

In [None]:
# Some basic operations return nan because some arrays have missing values (nan)

print('samples =',len(rampi))
print('min =',np.min(ag))
print('max =',np.max(ag))

ag_range = np.max(ag) - np.min(ag)
print('range =', ag_range)

In [None]:
# Arrays with nan require special functions that ignore the missing values

print('samples =',len(ag))

print('min =',np.nanmin(ag))
print('max =',np.nanmax(ag))

ag_range = np.nanmax(ag) - np.nanmin(ag)
print('range =', ag_range)
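Note that `len()` counts every row, including those holding nan. If the number of valid measurements is needed, one option is to build a mask with `np.isnan()`. A minimal sketch with made-up values (the `ag_demo` array is hypothetical, not the Rampi data):

```python
import numpy as np

# Made-up concentrations with two missing values
ag_demo = np.array([1.2, np.nan, 3.4, 0.8, np.nan])

valid = ~np.isnan(ag_demo)  # True where a value exists

print("rows    =", len(ag_demo))  # counts the nan entries too
print("valid   =", np.count_nonzero(valid))
print("mean    =", np.nanmean(ag_demo))
print("min/max =", np.nanmin(ag_demo), np.nanmax(ag_demo))
```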

<a  id="wtext"></a>

<h2> 2.2 Writing text files </h2>

It is common to write files in order to save simple texts and/or the value of a variable. In a pure Python way (no extra libraries used), let's start by writing very simple text files, then another (report type) which incorporates the value of actual variables, and finally a columnar type similar to those we just loaded.

In [None]:
#  A very simple text file

# Open the file to write

file = open("Output/simple.txt", "w")

# Write

file.write("Today is 13/07/2023\n")
file.write("\n")
file.write("This is a text file. \n")
file.write("\n")
file.write("In fact, a very simple one. \n")
file.write("\n")
file.write("End of the simple file.")

# Close the file

file.close()

# Look at the Output folder in your working directory
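An alternative that is often preferred is the `with` statement, which closes the file automatically even if an error occurs while writing. The sketch below assumes a temporary folder in place of the Output folder, so it can be run anywhere; the file name is hypothetical.

```python
import tempfile
from pathlib import Path

# A temporary folder stands in for the Output folder (assumption for this sketch)
out_dir = Path(tempfile.mkdtemp())
out_path = out_dir / "simple_with.txt"

# The file is closed automatically at the end of the with block
with open(out_path, "w") as file:
    file.write("This is a text file.\n")
    file.write("In fact, a very simple one.\n")

# Read it back to check the content
with open(out_path, "r") as file:
    content = file.read()

print(content)
```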

In [None]:
# Writing elements in a single-column text file

elementsRampi = open('Output/elements_rampi.txt', 'w')

for i in range(5):
    name = input('Enter the element: ')
    elementsRampi.write(name + '\n')

elementsRampi.close()

In [None]:
# Reading elements in a single-column text file and printing them

elementsRampi = open('Output/elements_rampi.txt', 'r')

for element in elementsRampi:
    print(element.strip())  # strip() removes the newline at the end of each line

elementsRampi.close()

<a  id="wreport"></a>

<h3> 2.2.1 Writing a simple text report </h3>

Now let's write a simple text report which incorporates the value of actual variables calculated with the Bolivian data (*zn.dat*). For this purpose we are going to use the `str()` function, which converts the values of the variables into strings that can be included in the text file:

In [None]:
# Text file with actual variables

file = open("Output/zn_basic_stat.txt","w")

file.write("Very simple Statistical parameters. \n")
file.write("Weight % of Zinc in a quartz vein. \n")
file.write("Pulacayo mine, Bolivia. \n")
file.write(" \n")
file.write("Minimum value          ="+str(np.min(zn))+"\n")
file.write("Maximum value          ="+str(np.max(zn))+" \n")
file.write("Range                  ="+str(zn_range)+" \n")
file.write("Mean                   ="+str(np.mean(zn))+" \n")
file.write(" \n")

file.close()

<a  id="wcolumn"></a>

<h3> 2.2.2 Writing single or multi column text files </h3>

After manipulating your data you may need to export your results to other users or programs. The previous file is more suitable for a single or a few variables. For large variables with many rows it is more convenient to export them in single or multi column text files. Contrary to the previous examples, here it is important to take care of the format in which those variables are written. Maybe you need just four decimals or scientific notation. Here are two references about number and string formatting:

https://pyformat.info/

https://mkaz.blog/code/python-string-format-cookbook/
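A few common number formats can be tried directly on a pair of values. The numbers below are made up for illustration; the `%` style is the one accepted by the `fmt` argument of `np.savetxt()`, while f-strings are handy when writing reports by hand.

```python
value = 20.24312   # made-up numbers for illustration
large = 12345.678

print("%1.4f" % value)    # four decimals
print("%.2e" % large)     # scientific notation
print(f"{value:.2f}")     # f-string with two decimals
print(f"{value:10.2f}|")  # right-aligned in a field 10 characters wide
```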

In [None]:
# Ratios calculation of same-size variables

cuau_rat = cu/au
znau_rat = zn2/au
pbau_rat = pb/au

cuau_rat
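Element-wise operations such as these ratios only work when the arrays have compatible shapes, and a nan in either array propagates to the result. A minimal sketch with made-up values (the `_demo` names are hypothetical):

```python
import numpy as np

cu_demo = np.array([10.0, 20.0, np.nan])
au_demo = np.array([2.0, 4.0, 5.0])

ratio_demo = cu_demo / au_demo  # element by element; nan stays nan
print(ratio_demo)

# Arrays of different length cannot be combined
short_demo = np.array([1.0, 2.0])
try:
    cu_demo / short_demo
except ValueError as err:
    print("Shape mismatch:", err)
```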

In [None]:
# Writing a variable to a single column file

np.savetxt('Output/cuau_rat.txt', cuau_rat, header='cuau ratio',fmt='%1.4f')

# fmt='%1.4f' writes a float with this format: 20.2431

In [None]:
# Writing a multi column text file

# Creating the empty container (array) for the variables

ratios = np.zeros((len(rampi), 3))
print('ratios is a:', type(ratios))
ratios

In [None]:
# Filling the container with variables of the same dimension

ratios[:,0] = cuau_rat
ratios[:,1] = znau_rat
ratios[:,2] = pbau_rat

ratios

In [None]:
# Writing the whole container to a multi column text file

np.savetxt('Output/ratios.txt', ratios, delimiter=',', header='cuau,znau,pbau', fmt='%1.2f')

# fmt='%1.2f' writes a float with this format: 20.24
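As a possible shortcut, the container can also be built in one step with `np.column_stack()`, which joins 1-D arrays of equal length as columns. The sketch below uses made-up values and writes to an in-memory buffer instead of a file, only to make the resulting text visible; all `_demo` names are hypothetical.

```python
import io
import numpy as np

# Made-up ratio columns of equal length
cuau_demo = np.array([5.0, 2.5, 8.0])
znau_demo = np.array([1.0, 0.5, 4.0])
pbau_demo = np.array([3.0, 1.5, 2.0])

# Stack the 1-D arrays as the columns of a 2-D array
ratios_demo = np.column_stack((cuau_demo, znau_demo, pbau_demo))
print(ratios_demo.shape)

# Write to an in-memory buffer, only to inspect the result
buffer = io.StringIO()
np.savetxt(buffer, ratios_demo, delimiter=",", header="cuau,znau,pbau", fmt="%1.2f")
print(buffer.getvalue())
```

Note that `np.savetxt()` writes the header as a comment line starting with `# `, so that the file can later be read back without skipping it by hand.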

<a  id="plots"></a>

<h1> 3 Plots </h1>

We are graphical animals: it is easier for us to draw conclusions about a data set from a bad graphic than from a sophisticated table. Python helps us with this through its plotting libraries. Let's see some basic plots such as:

* Simple point or line plot
* Scatter plot
* Bar plot
* Multi plots

**Matplotlib** is the library we are going to use for everything related to plots. A huge variety of examples, with their respective code, can be seen at:

https://matplotlib.org/3.2.1/gallery/index.html

The references on the Matplotlib site tend to be complicated for beginners; they are more suitable for computer scientists because they involve complex programming topics such as objects, sequences, attributes, ... Look at this example of a sine plot and compare it with the one at the end of the previous session:

https://matplotlib.org/examples/pylab_examples/pythonic_matplotlib.html

Don't be afraid of picking whatever you need from these explanations and making your own version.

The coding methodology (paradigm) presented on the Matplotlib web site is called **object-oriented programming**; it is powerful but more complicated. Here we are going to keep things simple and work with a simpler methodology called **procedural or imperative programming**, in which calculations or functions are executed basically one after the other.
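To make the difference concrete, the sketch below draws the same sine curve in both styles. It is only an illustration with made-up data; the rest of this notebook keeps using the procedural (pyplot) style.

```python
import numpy as np
import matplotlib.pyplot as plt

x_demo = np.linspace(0, 2 * np.pi, 100)
y_demo = np.sin(x_demo)

# Procedural (pyplot) style: functions act on the current figure
plt.figure(figsize=(6, 3))
plt.plot(x_demo, y_demo)
plt.title("Procedural style")

# Object-oriented style: methods are called on explicit figure and axes objects
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x_demo, y_demo)
ax.set_title("Object-oriented style")

plt.show()
```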

<a  id="splots"></a>

<h2> 3.1 Single plots </h2>

By single plots we mean plots that are alone or isolated in the Jupyter output or in a file.

<a  id="line"></a>

<h3> 3.1.1 Point-line plots </h3>

For this demonstration we are going to use the same data as in the previous section, the data from Pulacayo Mine (*zn.dat*) and from the Rampi prospect block (*rampi.csv*). The `import` and data loading are repeated so you can run the notebook from any point.

In [None]:
# Importing libraries

import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Loading the Pulacayo data, single zinc concentration (weight %)

zn = np.genfromtxt('Input/zn.dat')
print('Length of the zn file: ', len(zn))

In [None]:
# A simple point-line plot of the original data. Just in the same file order, no x is involved.

plt.plot(zn)

plt.plot(zn, '.')

In [None]:
# Let's plot concentration of zinc against its real location along the vein, each 2 m

# This will be a cross plot (xy plot) with axis labels.

x = range(0, 2*len(zn), 2)

# Remember the syntax is: range(start, stop, step)

plt.figure(figsize=(8, 5))

plt.plot(x, zn)

plt.xlabel("Distance (m)")
plt.ylabel("Weight %")

plt.title("Zn along quartz vein")

plt.grid()

plt.show()
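The positions used for the x axis follow from the 2 m sampling interval. A short check of how `range()` builds them, using only five samples to keep the output readable (the `_demo` name and `n_samples` are hypothetical):

```python
n_samples = 5  # a short example; the zn array has 118 samples
x_demo = range(0, 2 * n_samples, 2)

print(list(x_demo))              # positions in meters: 0, 2, 4, 6, 8
print(len(x_demo) == n_samples)  # one position per sample
```

The stop value is not included, which is why `2 * n_samples` gives exactly one position per sample.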

In [None]:
# Let's show the mean value in the previous plot

y = np.mean(zn) # y is a single value

plt.figure(figsize=(8,5))
plt.plot(x, zn)

# axhline draws a horizontal line at the value y

plt.axhline(y=y, ls="-", color='red', label="mean")

plt.legend()

plt.xlabel("Distance (m)")
plt.ylabel("Weight %")
plt.title("Zn along quartz vein")
plt.grid()

plt.show()

In [None]:
# Fill between curves

plt.figure(figsize=(8,5))
plt.plot(x, zn)
plt.axhline(y=y, ls="-", color='black', label="mean")

plt.fill_between(x, zn, y, where=(zn >= y), facecolor='b', interpolate=True, label="over mean")
plt.fill_between(x, zn, y, where=(zn <= y), facecolor='r', interpolate=True, label="under mean")

plt.xlabel("Distance (m)")
plt.ylabel("Weight %")
plt.title("Zn along quartz vein")
plt.legend()
plt.grid()
plt.show()

In [None]:
# Loading the Rampi data with five elements concentration (ppm), and variables definition

rampi = np.genfromtxt('Input/rampi.csv', skip_header = 1, delimiter=',')

au = rampi[:,0]
ag = rampi[:,1]
cu = rampi[:,2]
zn2 = rampi[:,3]
pb = rampi[:,4]

In [None]:
# Multi point-line plot of the original data, just in the same file order, no x is involved, with grid.

plt.figure(figsize=(8,5))

plt.plot(au, 'y', label="Au")
plt.plot(ag, 'c', label="Ag")
plt.plot(cu, 'r', label="Cu")
plt.plot(zn2,'m', label="Zn")
plt.plot(pb, 'b', label="Pb")

plt.title("Rampi elements")
plt.xlabel("File order")
plt.ylabel("Concentration(ppm)")
plt.legend()
plt.grid()
plt.show()

<a  id="scat"></a>

<h3> 3.1.2 Scatter plots </h3>

Scatter or cross plots are probably the most used plots in geosciences because they give you an idea about the relation between two variables.

In [None]:
# Cross plot or scatter plot

plt.figure(figsize=(8,5))

plt.plot(au,ag,'.')

# plt.scatter(au,ag)

plt.title("Au vs. Ag")
plt.xlabel("Au (ppm)")
plt.ylabel("Ag (ppm)")
plt.grid(True)

# The range of the axes can be modified with

# plt.axis([0, 1, 0, 5])   # [xmin, xmax, ymin, ymax]
# plt.xlim(0, 5)
# plt.ylim(0, 1)

plt.show()

In [None]:
# Cross plot in log scale

plt.figure(figsize=(8,5))

plt.loglog(au,ag, 'o')

# plt.semilogx(au,ag,'.')

# For a log scale on just one axis, use semilogx or semilogy

plt.title("Au vs. Ag")
plt.xlabel("Log Au (ppm)")
plt.ylabel("Log Ag (ppm)")
plt.grid(True)
plt.show()

<a  id="mplotsub"></a>

<h2> 3.2 Multiplots by subplot </h2>

With Matplotlib's **subplot** function it is possible to gather several single plots in a composition of plots, arranged on a rectangular grid of rows and columns. The arguments (2,2,1) in the following command:

`plt.subplot(2,2,1)`

mean the first plot in an arrangement of two rows and two columns of plots, something like this:

<img src="NB_images\multiplot_example.png"  style="width:700px">
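The third argument is a running index that starts at 1 and advances along each row, from left to right. The following sketch prints the grid position that corresponds to each index for a 2 x 3 arrangement; the variable names are arbitrary and no plot is drawn.

```python
n_rows, n_cols = 2, 3  # grid used only for illustration

positions = []

# Subplot indices start at 1 and run along each row, from left to right
for index in range(1, n_rows * n_cols + 1):
    row = (index - 1) // n_cols + 1
    col = (index - 1) % n_cols + 1
    positions.append((row, col))
    print(f"plt.subplot({n_rows},{n_cols},{index}) -> row {row}, column {col}")
```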

In [None]:
# Building and saving a 2 x 3 multiplot with the Rampi data

plt.figure(figsize=(16,10))

plt.subplot(2,3,1)

# Point-line plot
plt.plot(au, 'y', label="Au")
plt.plot(ag, 'c', label="Ag")
plt.plot(cu, 'r', label="Cu")
plt.plot(zn2,'m', label="Zn")
plt.plot(pb, 'b', label="Pb")
plt.title("Elements")
plt.xlabel("File order")
plt.ylabel("ppm")
plt.legend()
plt.grid(True)

plt.subplot(2,3,2)

# Cross plot
plt.loglog(au,ag,'.')
plt.title("Au vs. Ag")
plt.xlabel("Log Au (ppm)")
plt.ylabel("Log Ag (ppm)")
plt.grid(True)

plt.subplot(2,3,3)

# Cross plot
plt.loglog(au,cu,'.')
plt.title("Au vs. Cu")
plt.xlabel("Log Au (ppm)")
plt.ylabel("Log Cu (ppm)")
plt.ylim(1,200)
plt.grid(True)

plt.subplot(2,3,4)

# Cross plot
plt.loglog(au,zn2,'.')
plt.title("Au vs. Zn")
plt.xlabel("Log Au (ppm)")
plt.ylabel("Log Zn (ppm)")
plt.ylim(1,200)
plt.grid(True)

plt.subplot(2,3,5)

# Cross plot
plt.loglog(au,pb,'.')
plt.title("Au vs. Pb")
plt.xlabel("Log Au (ppm)")
plt.ylabel("Log Pb (ppm)")
plt.ylim(1,200)
plt.grid(True)

# Avoid the overlap between labels

plt.tight_layout()

# Saving the multiplot to file

plt.savefig("Output/multiplot.pdf")
plt.savefig("Output/multiplot.png")
plt.show()

<a  id="mplotloop"></a>

<h2> 3.3 Multiplots by loop </h2>

Multiplots can be built with a loop, keeping the code shorter and more efficient. Let's see an example with the Rampi data: 

In [None]:
# Elements, colors, and position in the vein according to the rampi array

elements = ['Au', 'Ag', 'Cu', 'Zn', 'Pb']
colors = ['y', 'c', 'r','m','b']
x = range(0, 2*rampi.shape[0], 2)

# Loop according to the number of columns in the rampi array

for i in range(rampi.shape[1]):
    
    plt.figure(figsize=(16,4))
    
    # Line plot of the element
    plt.subplot(1,2,1)
    plt.plot(x, rampi[:,i], color=colors[i], alpha=0.5)
    plt.xlim(0, 65)
    plt.xlabel('Position in the vein(m)')
    plt.ylabel('Concentration(ppm)')
    plt.title('Plot of '+elements[i])
    plt.grid(True)
    
    # Histogram of the element
    plt.subplot(1,2,2)
    plt.hist(rampi[:,i], color=colors[i], alpha=0.5)
    plt.ylabel('Frequency')
    plt.xlabel('Concentration(ppm)')
    plt.title('Histogram of '+elements[i])
    plt.grid(True)
    plt.savefig("Output/plot_"+str(elements[i])+".png")  
     
# plt.tight_layout()
# plt.savefig("Output/multiplot_loop.png")      
plt.show()        

In summary we have:

| Multiplot by | Number of plots | Number of effective lines of code |
| --- | --- | --- |
| Pure subplot (section 3.2 of this NB) | 5 | 43 |
| For loop (section 3.3 of this NB) | 10 | 20 |

The way forward is clear. Whenever possible, try to use for loops to build your multiplots, especially if you are repeating the same set of plots.
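A loop can also place all the panels in a single figure, by letting the loop counter drive the subplot index. The sketch below is one possible variation under stated assumptions: the data are random numbers and the column names are hypothetical, standing in for any array with one variable per column.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data standing in for the columns of a data array
rng = np.random.default_rng(0)
data_demo = rng.random((30, 3))
names_demo = ['A', 'B', 'C']  # hypothetical column names

plt.figure(figsize=(12, 3))

# One subplot per column, all in the same figure
for i in range(data_demo.shape[1]):
    plt.subplot(1, data_demo.shape[1], i + 1)
    plt.hist(data_demo[:, i], alpha=0.5)
    plt.title('Histogram of ' + names_demo[i])
    plt.grid(True)

plt.tight_layout()
n_axes = len(plt.gcf().axes)  # number of panels in the figure
plt.show()
```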

<a  id="inter"></a>

# 4 Interactive Notebook

<a  id="wid"></a>

## 4.1 Widgets

Widgets are interactive controls or GUI tools that bring your Jupyter Notebook to life. There are many sources and types of widgets, however, all of them are focused on facilitating the interaction between the user and the notebook, minimizing the direct modification of the code. The main widgets are:

* Basic form controls like sliders, checkboxes, text inputs
* Tabs, accordions, horizontal and vertical layout boxes, grid layouts
* Advanced controls like maps, 2d and 3d visualizations, datagrids, and more

One of the main widget libraries is: https://ipywidgets.readthedocs.io/en/stable/

A good summary of this topic can be found at: https://towardsdatascience.com/bring-your-jupyter-notebook-to-life-with-interactive-widgets-bc12e03f0916

Let's see an example of widgets, an interactive slider:

In [None]:
# Importing libraries

import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets

In [None]:
# Sin plot with interactive slider

# Plotting function

def plot_func(freq):
    plt.figure(figsize=(11,5))
    x = np.arange(0, 2*np.pi, 0.01) # 2*np.pi is 360º
    y = np.sin(x * freq)
    plt.plot(x, y)
    plt.xlabel('Radians')
    plt.ylabel('Amplitude')
    plt.grid()

In [None]:
# The widget receives the plotting function

widgets.interact(plot_func, freq = widgets.FloatSlider(value=1, min=1, max=20, step=0.1))
plt.show()