# Exercise 2
## Functions
***

### Question
How can I write code that is easily used again and again?

### Objectives
<div class=obj>
<ol>
    <li>Get to know what libraries are and what they are used for.</li>
    <li>Learn to write simple functions.</li>
</ol>

Revise:
<ul>
    <li>Defining variables;</li>
    <li>Using Python for maths;</li>
    <li>Writing loops.</li>
</ul>
    
</div>



### Independent coding
Write a function to calculate the distance between two locations on Earth.



## 2.1 Libraries
***

Before looking at writing functions in our own code, let's investigate the pre-written pieces of code that are available through Python.  These bits of pre-written code are called __libraries__.  Libraries in Python provide additional commands, beyond what the basic language gives you, and using these libraries can massively increase the efficiency with which you write code.  Libraries do a lot of the hard work for you.

Each time you use Python you have a standard set of commands, some of which we have already used, such as `print()`.  These commands are part of Python's 'built-in' library; you can see the full list of these __[here](https://docs.python.org/3.3/library/functions.html)__.

However, you will quickly want to go beyond this basic set of commands for scientific computing, and although you _could_ code everything you want out of pure built-in Python commands, that way lies madness.  

### 2.1.1 How to use libraries
Take, for example, trigonometric functions:

In [None]:
cos(3.14)

You have (perhaps...) just experienced your first error message, congratulations!

Although error messages can look like random junk output, they contain *very* useful information.  Learning to read them is essential to being able to program, because you are not going to write grammatically perfect code without some (many) errors along the way.

The key part of this error message is
> NameError: name 'cos' is not defined

This is telling us that the function we have tried to access, `cos`, is not defined.  WHAT!? Trigonometric functions aren't directly available in Python!?

The solution to this is to import a library that does include those functions.  We will use the __[NumPy](https://numpy.org)__ library, which contains lots of numerical tools for Python.

In [None]:
import numpy

numpy.cos(3.14)

Now we can do simple maths!  We use `numpy.name_of_function` to access functions within numpy.

However, you may find it frustrating that we have to write `numpy.` each time we want to write a mathematical expression.  There are a few ways around this:

In [None]:
# 1. we can import the NumPy library and give it a new, shorter, name
# *** This is the Best Solution! ***
import numpy as np

np.cos(np.pi)

In [None]:
# 2. we can import a specific function from NumPy that we know we are going to use lots 
#    then we can refer to it directly by its name
from numpy import cos

cos(np.pi)

In [None]:
# ...but we would have to do this for every function :-(
sin(np.pi)

In [None]:
# 3. if we were brave we could just load _everything_ from numpy so we have access to it all directly
# the '*' symbol is a wildcard, and just grabs everything
from numpy import *
sin(1.5*pi)

This last solution is **BAD**: don't do this!  The reason is that we now have lots of names defined for functions we are not interested in and probably aren't even aware exist, yet we have just made all of these available to ourselves.  These just clutter up the place, and we might accidentally access them without realising it, or overwrite them with our own functions.  It is the coding equivalent of emptying your entire pencil case onto the desk when all you want is a single biro.  Don't be that person.

Keep things tidy and just import what you need from libraries, or keep all the functions as explicit members of the library, like we did with `np.cos()`.
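
To see how a wildcard import can bite, compare Python's built-in `sum()` with NumPy's function of the same name, which `from numpy import *` would typically put in its place.  The snippet below is a small illustrative sketch (the variable names are made up for this example); it calls the two versions explicitly rather than actually performing the wildcard import.

```python
import numpy as np

values = [0, 1, 2, 3, 4]

# Built-in sum: the second argument is a starting value added to the total
builtin_result = sum(values, -1)

# NumPy's sum: the second argument is the axis to sum along, not a starting value
numpy_result = np.sum(values, -1)

print("built-in sum:", builtin_result)
print("numpy sum:   ", numpy_result)
```

The same call gives two different answers depending on which `sum` is in scope, which is exactly the kind of surprise that keeping the `np.` prefix avoids.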

### 2.1.2 Trying to make do without libraries

There are many tasks you will want to perform that would be awkward to have to code from scratch.  In software like Matlab or Excel, all the 'extra' functionality like plotting is an intrinsic part of the program.  In Python, that's not the case: many specific tasks like plotting are best achieved by loading in libraries that contain the extra functions and tools for you to use.  The libraries will make the tasks you want to do much, much simpler.  If you have used Matlab, think of these as being like Toolboxes.

Let's take a look at calculating the sample standard deviation to illustrate this point.  

\begin{equation}
\sigma = \Bigg(\frac{\sum(x_i-\mu)^2}{N-1}\Bigg)^{0.5},
\end{equation}
where $x_i$ is the $i$th value from the sample, $\mu$ is the sample mean, and $N$ is the total number of data in the sample.

First, let's code this up without using any additional libraries and really very little of Python's in-built functionality.

In [None]:
# ----------------- The very long way ------------------------

#Let's create a fake dataset, storing it in a list object
fake_data = [0, 1, 2, 5, 1, 0, 100, 32, 3001, 5894, 22]

#Now, let's calculate the standard deviation
# we first need to calculate the mean
ssum = 0  # This variable is going to store our sum of the data to calculate the mean
count = 0 # This variable is going to count how many data we have to give us N
#We are going to loop through our dataset adding up every value
for f in fake_data:
    # a += b, means 'add b onto a and save the result as a', equivalent to a = a+b
    ssum += f
    count += 1
mean = ssum/count

#now we have the mean, we can calculate the standard deviation
# we reset ssum to 0 so that we can use it now to store the sum of the (data - mean)**2
# i.e., the numerator of our standard deviation calculation
ssum = 0
for f in fake_data:
    ssum += (f-mean)**2

#we finish by dividing the sum by the number of data minus 1 and taking the square root.
std = (ssum/(count-1))**0.5
print(f"Calculating the sample standard deviation...\nThe hard way: {std:.0f}")

We can actually do it a little more easily than this, as Python has built-in functions to calculate the sum of the numbers in a list, `sum()`, and to determine the length of a list, `len()`.

In [None]:
# ----------------- A shorter way ------------------------
#Without loading any packages, python is a bit cleverer than this
ssum = sum(fake_data)
mean = ssum/len(fake_data)

ssum = 0
for f in fake_data:
    ssum += (f-mean)**2
    
std = (ssum/(len(fake_data)-1))**0.5
print(f"The less hard way: {std:.0f}")

In [None]:
# ----------------- The easy way ------------------------
#import numpy
import numpy as np

#We need to specify we want N-1 for the sample standard deviation rather than N-0, the population stdev
# which is what numpy.std calculates by default.
std = np.std(fake_data, ddof=1)
print(f"The easiest way: {std:.0f}")
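
As a cross-check, Python's standard library also ships a `statistics` module whose `stdev()` function computes the sample standard deviation (dividing by $N-1$).  The sketch below repeats the dataset from above so that it runs on its own, and compares the library result with the by-hand formula; the variable names are just for illustration.

```python
import statistics

fake_data = [0, 1, 2, 5, 1, 0, 100, 32, 3001, 5894, 22]

# By hand, using only the built-in sum() and len()
mean = sum(fake_data) / len(fake_data)
by_hand = (sum((f - mean)**2 for f in fake_data) / (len(fake_data) - 1))**0.5

# Using the standard library's sample standard deviation
from_library = statistics.stdev(fake_data)

print(f"By hand:          {by_hand:.0f}")
print(f"statistics.stdev: {from_library:.0f}")
```

For lists of plain numbers this is a handy alternative, although NumPy remains the more natural choice once the data are stored in arrays.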

## 2.2 Lists vs. arrays
***

In __[Exercise 1](Exercise1.ipynb)__ we saw how performing arithmetic on an object containing multiple numerical entries required looping over the object, element by element.

The use of libraries, and in particular NumPy, offers an alternative to this: the __[NumPy array](https://numpy.org/doc/stable/reference/generated/numpy.array.html)__.  From a scientific computing perspective, NumPy arrays are a much more flexible object for storing and manipulating numbers than Python's native list format.

Let's see what arrays can do, going back to our simple problem of addition.

In [None]:
#let's initialise an array; note we still need to use square brackets inside the parentheses of np.array().
test = np.array([1,2,3,4])

#now perform our simple addition to it
test + 9

Ahh, now that was simple!

Of course, our array now does everything we expect of it.

In [None]:
print("subtraction:", test - 1 )
print("multiplication:", test * 3 )
print("division:", test/5)

NumPy is using __broadcasting__ to loop the arithmetic operation across the whole array for us, so we don't have to write out the loop ourselves.  This is a clever trick, and one that has some __[big benefits](https://numpy.org/doc/stable/user/basics.broadcasting.html)__.

It even works when multiplying arrays together.

In [None]:
test2 = np.array([13, 14, 15, 16])
print( test * test2 )
print( test / test2 )

But you have to be careful that the arrays have compatible shapes (here, the same length), otherwise NumPy doesn't know how to broadcast them...

In [None]:
test3 = np.array([13, 14, 15])

test + test3
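
Broadcasting does not strictly require identical sizes.  A minimal sketch of the rule, as described in the NumPy broadcasting documentation linked above, is that shapes are compared dimension by dimension starting from the right, and each pair of dimensions must either match or one of them must be 1.  The example below uses made-up arrays to show one combination that works and one that fails.

```python
import numpy as np

row = np.array([1, 2, 3, 4])            # shape (4,)
column = np.array([[10], [20], [30]])   # shape (3, 1)

# (3, 1) and (4,) are compatible, so the result has shape (3, 4)
grid = column + row
print(grid.shape)

# (4,) and (3,) are not compatible, so NumPy raises a ValueError
try:
    row + np.array([13, 14, 15])
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)
```

The `try`/`except` wrapper is only there so that the example runs to the end; it catches the same error you saw in the cell above.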

## 2.3 Declaring a function
***

Although we can load a lot of functions into Python using libraries to do work for us, we often want to perform a calculation, or a linked series of operations, which isn't exactly done by any existing library.  For this we need to write our own functions.

You have already seen functions in NumPy's cos and sin (e.g., `np.cos()`).  A function is referred to by a name, which can contain lower or upper case letters, followed by parentheses `()`.  The parentheses can contain information we want to pass to the function, such as values of variables, but they can also be left empty.

You can write your own functions in order to package up calculations that you would otherwise have to write out repeatedly when you need to use them again later in your code.

The syntax for declaring a function is given below

In [None]:
def myFunc(name):
    print('My name is', name)

Notice that the function uses **indentation**, just like with for loops, to associate the following lines of code with the function.

Because the function has something in its parentheses, in this case `name`, this tells us that `myFunc()` is expecting an **argument** to be passed to it, and that once inside the function the argument will be called `name`.  We could use any valid variable name in place of `name`: change `name` to something else and see what happens.

Now, try running `myFunc()` without passing it any arguments (leaving the parentheses empty).

In [None]:
myFunc()

The error message you get is again very informative 
>TypeError: myFunc() missing 1 required positional argument: 'name'

This is telling us that an argument that the function is expecting, what we have called `name` here, is missing.

We often want a function to perform an operation and give us back the output in a form we can use.  Simply printing something from the function does not do this: printing just has the program shout the result at us and then immediately forget what it has said.  To remember the result we use a `return` statement in the function, to return an object to us which we can then store.

In [None]:
#let's define a simple function
def fetch(stick):
    return 'Look, I found your ' + stick + '!!!'

#run the function, storing the output
found = fetch('Hand Lens')

#let's look at the output
print(found)
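
The difference between printing and returning is easiest to see by trying to store the result of each.  In this sketch the two functions (`shout_double` and `give_double`, names invented for the example) do the same arithmetic, but only one hands the answer back.

```python
def shout_double(x):
    # prints the answer, but does not return it
    print(2 * x)

def give_double(x):
    # returns the answer so it can be stored and reused
    return 2 * x

shouted = shout_double(21)   # the answer appears on screen...
given = give_double(21)

print(shouted)     # ...but nothing was stored, so this prints None
print(given + 1)   # the returned value can be used in further calculations
```

A function without a `return` statement still gives something back, but that something is Python's special empty value, `None`.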

## 2.4 A complete function example
***
These were trivial examples.  Now let's write a function to calculate sin, to see how bad it would be to do the job of NumPy.

You might not follow all bits and pieces in this function, but enough should be familiar to give you an idea of what functions can do.  For reference, we are going to be using a Taylor expansion of sin so that we can estimate it with just simple arithmetic operators (multiplication, division, addition, subtraction).  For the maths behind this look __[here](https://en.wikipedia.org/wiki/Taylor_series)__, where the Taylor expansion of sin(x) is included as a specific example.

Before we write our function, we need to introduce one new tool that Python has, which is the `%` operator (called the modulo operator).  This performs division and returns the remainder, for example:

In [None]:
print( 19%3 )
print( 19%5 )

This operator is useful for identifying odd numbers:

In [None]:
print( 4%2 )
print( 25%2 )
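
A small sketch of how `%` can be combined with an `if` statement to label numbers as odd or even.  Note that in Python the result of `%` takes the sign of the divisor, so negative odd numbers also give a remainder of 1 when divided by 2.  The helper name `parity` is made up for this example.

```python
def parity(n):
    # n % 2 is 0 for even integers and 1 for odd integers (including negative ones)
    if n % 2 == 0:
        return 'even'
    else:
        return 'odd'

for n in [4, 25, -3, 0]:
    print(n, 'is', parity(n))
```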

You will see how we can use this to our advantage in the following code to implement our own version of sin(x).

In [None]:
#let's start by defining our function, and naming it in honour of geology
# it will take one argument, x
def geo_sin(x):
    #first we have to deal with the fact that sin is periodic
    # so we need to reduce the range of our input down to fall between -pi and pi
    # ...which means we need to know pi, which we would normally get from NumPy.  Oh dear.
    # Let's just google the answer like any good programmer
    pi = 3.14159265358979323846264
   
    #now we can try and reduce the range of the input variable
    # a first step is to get rid of all but the fractional number of pi that we have
    # we can do this using Python's int() function, which converts a number to an integer by discarding the fractional part
    xs = x - pi*int(x/pi)
    #but this isn't enough!
    # now we need to check if the result is odd or even, to decide where in sin's 
    # infinitely repeating cycle we are (comment these next bits out if you want to see the effect
    # of forgetting to do this)
    if int(x/pi)%2 == 0:
        #it's even! the remainder = 0
        e = 1
    else:
        #it's odd! the remainder is not 0
        e = -1
    #if it is odd, we want to flip the sign of xs
    xs = xs*e
    
    #we can use a Taylor expansion of sin(x) to code it using simply arithmetic operators
    a = xs - xs**3/(3*2) + xs**5/(5*4*3*2) - xs**7/(7*6*5*4*3*2) + xs**9/(9*8*7*6*5*4*3*2)
    #...and we should probably keep going to improve accuracy!
    # remove some of the later terms if you want to see how bad it gets to reduce the order of the
    # Taylor expansion
    
    #we want the function to give us back the answer, so we use return
    return a
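
One detail of the range reduction above is worth checking for negative inputs: `int()` discards the fractional part, which moves negative numbers towards zero rather than downwards.  The sketch below contrasts it with `math.floor()` from the standard library and reproduces the reduction step for a negative angle; the variable names are illustrative only.

```python
import math

# int() truncates towards zero; math.floor() always rounds down
print(int(2.7), math.floor(2.7))
print(int(-2.7), math.floor(-2.7))

# the same reduction step as in geo_sin(), for a negative angle
x = -7.0
n_pi = int(x / math.pi)    # whole number of pi's contained in x
xs = x - math.pi * n_pi    # what is left over, with the same sign as x

print(n_pi, xs)
```

Because of the truncation, the leftover `xs` keeps the sign of `x`, which is why the reduced value can land anywhere between $-\pi$ and $\pi$.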

Now, let's test how well we have done.  

Again, don't worry about following the parts of the code that are unfamiliar, we will cover them in later examples, but the comments should give you an overview of what is going on. (Make sure you have run the code cell above so that the function is available to the next cell).

In [None]:
#we are going to visualise our results
import matplotlib.pyplot as plt
#if you have called NumPy earlier in the notebook then this isn't necessary, but it is harmless
# to call it again here just to make sure we have it handy
import numpy as np

#let's generate some x-values (radians) from -10 to 10 to calculate our sin function over
x = np.linspace(-10,10,1000)

#let's populate our geo_sin calculated values, this is a shorthand way of looping over 
# all of the values in x and passing them to geo_sin to be calculated and then storing the 
# output in a list
y_geo = [geo_sin(i) for i in x]

#we can do the same for numpy's sin function, but it is cleverer than our function, and can 
# just take the x values directly
y_npy = np.sin(x)

#now let's view how we have done
fg, ax = plt.subplots(1)
#Our geo sin function is going to be an orange line and we are going to give
# the x and y (y_geo) values to the plotting function
l1, = ax.plot(x, y_geo, ls='-', c='orangered')

#our numpy results are going to be plotted on top of geo_sin() as a dotted line
l2, = ax.plot(x, y_npy, ls=':', c='black')

#let's add some lines to indicate where integer values of pi lie
#these next three lines get the automatically assigned y-value limits to the plot
# and expand the limits slightly to fit our new labels on
yt = ax.get_ylim()[1]
yb = ax.get_ylim()[0]
ax.set_ylim(yb*1.04, yt*1.04)
for i in range(-3,4,1):
    #plot the vertical lines
    ax.axvline(i*np.pi, ls=':', lw=1, c='gray', zorder=-1)
    yt = ax.get_ylim()[1]
    yb = ax.get_ylim()[0]
    #plot the pi value
    ax.text(i*np.pi, yb+0.02*(yt-yb), str(abs(i))+r' $\pi$', ha='center')

#finally, we will add some labels
ax.set_title('I think geo_sin() did ok!');
ax.set_xlabel('Radians')

ax.legend((l1, l2), ('geo_sin()', 'np.sin()'));

In [None]:
#But, to really test we should look at the difference
fg, ax = plt.subplots(1)

#add a horizontal line at 0, i.e., the value for when geo_sin() = np.sin()
# let's hope we are close to this!
ax.axhline(0, c='black')

#plot the difference between the two functions
l1, = ax.plot(x, y_npy-y_geo, ls='-', c='orangered')

#let's add some lines to indicate where pi is
yt = ax.get_ylim()[1]
yb = ax.get_ylim()[0]
ax.set_ylim(yb*1.04, yt*1.04)
for i in range(-3,4,1):
    ax.axvline(i*np.pi, ls=':', lw=1, c='gray', zorder=-1)
    yt = ax.get_ylim()[1]
    yb = ax.get_ylim()[0]
    ax.text(i*np.pi, yb+0.02*(yt-yb), str(abs(i))+r' $\pi$', ha='center')
    
#finally, we will add some labels
ax.set_title('Oh dear... let\'s stick to NumPy');
ax.set_xlabel('Radians');
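
To put numbers on the plot above, the Taylor series can be written as a loop so that the number of terms is adjustable.  This is a sketch rather than the notebook's own method: `taylor_sin` and `n_terms` are names invented here, and `math.factorial()` and `math.sin()` come from the standard library.  The reduction step in `geo_sin()` leaves values anywhere between $-\pi$ and $\pi$, and the truncated series is least accurate near the ends of that range.

```python
import math

def taylor_sin(x, n_terms=5):
    # sum the first n_terms of x - x**3/3! + x**5/5! - ...
    # (n_terms=5 goes up to x**9, like the expression in geo_sin)
    total = 0.0
    for k in range(n_terms):
        power = 2*k + 1
        total += (-1)**k * x**power / math.factorial(power)
    return total

# compare against math.sin in the middle and at the edge of the reduced range
for x in [math.pi/2, math.pi]:
    err_5 = abs(taylor_sin(x, 5) - math.sin(x))
    err_10 = abs(taylor_sin(x, 10) - math.sin(x))
    print(f"x = {x:.4f}: error with 5 terms = {err_5:.1e}, with 10 terms = {err_10:.1e}")
```

Adding terms shrinks the error quickly, but a library routine such as `np.sin()` already does this kind of work for you, and does it more carefully.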

# Independent coding
***

<div class=obj>
    <b>Aim:</b> To write a function to calculate the distance between two points on a sphere.
</div>

<p></p>

This is a classic problem, needing to know the distance between two locations, and it comes up in all areas of geology.  It can be solved in a variety of ways, and doing so accurately requires a fair amount of complexity.

Let's use a simple approach, the haversine formula, given by

\begin{equation}
d = 2r\arcsin \Bigg( \sqrt{\sin^2\bigg(\frac{\phi_2-\phi_1}{2}\bigg) + \cos{(\phi_1)}\cos{(\phi_2)}\sin^2\bigg(\frac{\lambda_2-\lambda_1}{2}\bigg)}  \Bigg),
\end{equation}

where $d$ is distance, $r$ is the radius of the sphere (Earth), $\phi_1$, $\phi_2$ are the latitudes of points one and two, $\lambda_1$, $\lambda_2$ are the longitudes of the respective points.

Now, use this formula to calculate whether spending your entire term in the Grantchester Tea Rooms ($\phi=52.177364$, $\lambda=0.096534$) would place you more than 4.8$\,$km from Great St Mary's ($\phi=52.205301$, $\lambda=0.118262$), and thereby in brazen __[defiance of University Ordinances](https://www.cambridgestudents.cam.ac.uk/new-students/manage-your-student-information/personal-information/residing-outside-universitys)__.

You should:
1. Create a new notebook, called `Exercise2_solution`.
1. Import the libraries you think you will need.
1. Define the variables relevant for the problem.
1. Write your function for the haversine formula so it can accept arbitrary lat. and long.
1. Run your programme, passing the coordinates of the tea rooms and Great St Mary's.

_Hint:_ does `np.sin()` assume degree or radian input?  If you are unsure look __[here](https://docs.scipy.org/doc/numpy/reference/generated/numpy.sin.html)__ or put some values into it and see what you get out.

### If you want to challenge yourself...

- Make your function accept multiple lat. and long. points at once and calculate the distance between the pairs, or between a single point and multiple alternate locations.
- Write the degrees to radians conversion as a separate function and call that function from within your haversine function.
- Nicely format the output, reducing the number of decimal places provided when you print the resulting distance (_Hint_: search for 'python .format' to see how this can be done).
- Write an `if` statement to tell the user whether they have strayed too far from Great St Mary's.

