# Introduction to Python for Geoscientists

This is a Jupyter Notebook (formerly called an IPython Notebook). It is a good way to document workflows and analysis pipelines. Notebooks can run Python, R, Fortran, Julia and other languages. This is a "Markdown" cell, where you can write notes, LaTeX-style equations such as $E=\kappa A ^m (\nabla z)^n$, or embed figures.

Today we are using a Python kernel to do some cool stuff with geoscience data. The following cell is your first Python code:

In [None]:
2+4*10
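
Python follows the usual order of operations, so the multiplication in the cell above is evaluated before the addition. A minimal sketch of how brackets change the result (the variable names are made up for illustration):

```python
# Multiplication binds tighter than addition, so this is 2 + (4 * 10)
without_brackets = 2 + 4 * 10

# Brackets force the addition to happen first
with_brackets = (2 + 4) * 10

print(without_brackets, with_brackets)
```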

In [None]:
#This is a comment. This is for a human to read (so you remember what your code does!)
#Python ignores anything after the '#'.

#The next line is an example of a 'variable'. Assign values using a single '=' sign.
time=145

In [None]:
#Now you can use that variable in different ways.... firstly print it out to the screen
print("The age of the sample is", time, "million years")

In [None]:
#Make a new variable called 'endtime' and add a constant to our 'time' variable
endtime=time+56

In [None]:
#Nothing printed out above? Good. Assigning a variable does not display its value, so let's tell Python to print it.
print(endtime)

In [None]:
#Make a new 'string' variable
geological_age='Jurassic'

#Print out some useful information that includes our different variables
print("My sample is", endtime, "million years old, from the", geological_age, "period.")
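
`print` puts a single space between each of its arguments. As a sketch of an alternative, an f-string (available from Python 3.6) builds the whole sentence in one go. The values are copied from the cells above so the block runs by itself; `message` is a name introduced here for illustration.

```python
# Values repeated from the earlier cells so this block is self-contained
time = 145
endtime = time + 56
geological_age = 'Jurassic'

# Anything inside {} is replaced by the value of that expression
message = f"My sample is {endtime} million years old, from the {geological_age}."
print(message)
```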

In [None]:
#Make a Python List object, similar to an array.
times=[1,4.5,5+3.2,geological_age,"Another string"]

print(times)

#Python has many built-in data types and objects:
#int, float, complex, str, list, tuple, dict, functions, etc.

In [None]:
#indexing
print(times[0])

In [None]:
print(times[4])
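
Indexing has a few more forms that are worth knowing. The sketch below rebuilds the same list and shows negative indices, slices and a type check; the names `first`, `last`, `middle` and `kinds` are introduced here for illustration only.

```python
geological_age = 'Jurassic'
times = [1, 4.5, 5 + 3.2, geological_age, "Another string"]

first = times[0]     # indices start at 0
last = times[-1]     # negative indices count back from the end
middle = times[1:3]  # a slice includes index 1 and 2, but not 3

# A list can mix types; record the type name of every item
kinds = [type(item).__name__ for item in times]

print(first, last, len(middle))
print(kinds)
```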

Those are the basics. Now we are going to load in some data and manipulate it.

## Loading data

In [None]:
#First we have to load some modules to do the work for us.
#Modules are packages people have written so we do not have to re-invent everything!

#The first is NumPy (NUMerical PYthon), a very popular library for arrays, matrices, maths and data manipulation.
import numpy

#Pandas is a module that is great for dealing with tables of data
import pandas

#This is a library for making figures (originally based on Matlab plotting routines)
#We use the alias 'plt' because we don't want to type out the whole name every time we reference it!
import matplotlib.pyplot as plt 

In [None]:
#Set the variable name for the file we are loading in. 
#It is in the 'data' directory, and the file is called EarthChemCU.txt. 
#We are currently working in /examples.
filename = '../data/EarthChemCU.txt'

#Now read in the data
chemdata=numpy.loadtxt(filename, delimiter=',')
#chemdata <- the name of a variable we are making that will hold the table of data
#filename <- this is the name of the variable we declared above
#delimiter <- the character that separates the values; this is a csv file, so it is a comma
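
To see what `numpy.loadtxt` returns without needing the data file, here is a minimal sketch that reads comma-separated text held in memory. The values in `sample_text` are made up for illustration and are not taken from EarthChemCU.txt.

```python
import io
import numpy

# Hypothetical stand-in for a small csv file: three rows, four columns
sample_text = (
    "10.0,20.0,100.0,5.0\n"
    "-33.9,151.2,250.0,12.5\n"
    "45.0,-120.0,1.5,900.0\n"
)

# loadtxt accepts a file-like object as well as a filename
sample = numpy.loadtxt(io.StringIO(sample_text), delimiter=',')

print(sample.shape)  # (rows, columns)
print(sample.dtype)  # loadtxt reads numbers as floats by default
```

Every value has to be numeric for this to work; a file with text columns or a header row would need extra arguments (see the help output below).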

### Want more details about a command/function we use?

In [None]:
#Try this help command
help(numpy.loadtxt)

### It is often a good idea to look at the data to get some idea of what you are working with

In [None]:
#What does the data look like? Print it out
print(chemdata)

In [None]:
#This is in the style: Latitude, Longitude(-180:180), Age(Ma), pp
#Print the dimensions of the data
print(chemdata.shape)

207431 rows! A good example of why we use Python and not something like Excel.

In [None]:
#Print the first row
print(chemdata[0,:])

In [None]:
#Print the third column. Note, Python counts from 0
print(chemdata[:,2])

In [None]:
#Print the first two columns for the rows with index 2, 5 and 6.
print(chemdata[[2,5,6],0:2])
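
The same three indexing patterns are easier to follow on a tiny array where you can check every value by eye. This is a sketch on a made-up array called `demo`, not on the real data.

```python
import numpy

# A 7-row, 3-column array holding the numbers 0 to 20
demo = numpy.arange(21).reshape(7, 3)

first_row = demo[0, :]               # row 0, every column
third_column = demo[:, 2]            # every row, column 2
picked = demo[[2, 5, 6], 0:2]        # rows 2, 5 and 6, columns 0 and 1

print(first_row)
print(third_column)
print(picked)
```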

In [None]:
#Plot the lats and lons, i.e. the first column vs the second column
plt.plot(chemdata[:,1],chemdata[:,0],'k.')
plt.title('Copper Deposit Data')
plt.ylabel('Latitude')
plt.xlabel('Longitude')
plt.show()

This does not look right... It is a messy dataset! This is not uncommon. 
Maybe the Lats/Lons are stored as Northings/Eastings for some samples. 
Maybe they are missing a decimal place.

Anyway, Python is a great tool to clean things up! Let's investigate further.

In [None]:
#Plot the Latitudes
plt.plot(chemdata[:,0])
plt.ylabel('Latitude')
plt.xlabel('Number')
plt.show()

#Plot the Longitudes
plt.plot(chemdata[:,1],'r')
plt.ylabel('Longitude')
plt.xlabel('Number')
plt.show()

In [None]:
#Clean up the data, remove anything outside lat lon extent

#Find all the "chemdata" column 1 (i.e. longitude) data points that are greater than -180, save it in a new variable
cudata=chemdata[chemdata[:,1]>-180]
#Repeat for less than 180
cudata2=cudata[cudata[:,1]<180]

#Repeat for latitudes less than 90
cudata3=cudata2[cudata2[:,0]<90]
#Repeat for greater than -90
cudata4=cudata3[cudata3[:,0]>-90]

print("We have removed", chemdata.shape[0]-cudata4.shape[0], "samples")
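
The four filtering steps above can also be written as one combined boolean mask. A minimal sketch on a small made-up array (the values in `points` are hypothetical, not from EarthChemCU.txt):

```python
import numpy

# Hypothetical rows in the order latitude, longitude; two are out of range
points = numpy.array([
    [-33.9, 151.2],
    [95.0, 10.0],     # latitude too large
    [45.0, -200.0],   # longitude too small
    [60.0, 179.0],
])

lat = points[:, 0]
lon = points[:, 1]

# '&' combines the conditions element by element; each one needs its own brackets
valid = (lon > -180) & (lon < 180) & (lat > -90) & (lat < 90)
cleaned = points[valid]

removed = points.shape[0] - cleaned.shape[0]
print("We have removed", removed, "samples")
```

As in the step-by-step version, the comparisons are strict, so a point sitting exactly on 180 or 90 would also be dropped.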

In [None]:
plt.plot(cudata4[:,1],cudata4[:,0],'k.')
plt.title('Copper Deposits from EarthChem.org')
plt.ylabel('Latitude')
plt.xlabel('Longitude')
plt.show()


In [None]:
#Let's make a nicer map

#Import another module called Basemap - great for plotting things on globes
from mpl_toolkits.basemap import Basemap

#Make new variables from our array (so it is easier to see what we are doing)
lats=cudata4[:,0]
longs=cudata4[:,1]
age=cudata4[:,3]

#######
## Make the figure
#######

#Create a figure object
fig = plt.figure(figsize=(16,12),dpi=150)

#Make the basemap, shade it and put down some other map symbols
pmap = Basemap(projection='hammer', lat_0=0, lon_0=0, #You can break loooong commands over multiple lines
           resolution='l')
pmap.drawmapboundary(fill_color='white')
pmap.fillcontinents(color='grey', lake_color='white', zorder=0)
pmap.drawmeridians(numpy.arange(0, 360, 30))
pmap.drawparallels(numpy.arange(-90, 90, 30))

#compute native map projection coordinates of lat/lon grid.
xh, yh = pmap(longs, lats)

#Make a scatter plot of the data coloured by age. Restrict the colour range between 0 and 100
#And also set the 'plot' as a variable so we can reference it
mapscat = pmap.scatter(xh,yh,marker=".",c=age,vmin=0,vmax=100)

#Add a colourbar
cbar=pmap.colorbar(mapscat,location='bottom')
cbar.set_label('Age (Ma)')

# Add a map title, and tell the figure to appear on screen
plt.title('Age of Copper Deposits in the EarthChem.org database')
plt.show()
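
The `vmin` and `vmax` arguments in the scatter call set which values map to the two ends of the colour scale. As a rough sketch of the effect, the block below uses Matplotlib's `Normalize` and the `viridis` colormap directly on a few made-up ages; the choice of `viridis` and the `ages` values are assumptions for illustration.

```python
import numpy
from matplotlib import cm
from matplotlib.colors import Normalize

# Hypothetical ages in Ma, including one far beyond the colour range
ages = numpy.array([0.0, 50.0, 100.0, 2000.0])

# Scale the ages so that vmin maps to 0 and vmax maps to 1
norm = Normalize(vmin=0, vmax=100)
scaled = norm(ages)

# Turn the scaled values into RGBA colours
colours = cm.viridis(scaled)

print(scaled)
print(colours[2], colours[3])
```

With the default colormap settings, anything older than `vmax` is drawn in the same colour as `vmax` itself, so samples of 100 Ma and 2000 Ma cannot be told apart on the map.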