
# Notebook 1: Basics of programming in Python

*Developed by Johannes Haas and Raoul Collenteur, Institute of Earth Sciences, NAWI Graz Geocenter, University of Graz, 2019*

*Parts based on [Exploratory computing with Python by Mark Bakker](http://mbakker7.github.io/exploratory_computing_with_python/)*



## This lecture will introduce some basic principles

- jupyter notebooks
- importing packages
- variables
- data types
- reading errors
- plotting


### jupyter notebooks

If you can read this, you have followed the instructions on how to open a jupyter notebook.

Now double click (or click + enter) into this textbox. Can you figure out how to add another textbox below, for your comments on this lecture?



#### What is a jupyter notebook?

https://jupyter.org/:

*"The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text."*

- formerly known as ipython notebooks (see filename)
- used to be exclusively for python, now also available for many more languages
- JSON document

Potential revolution for scientific publishing, or just quite useful for teaching?



# 1. Packages

You can do everything with pure Python, but many things then take a lot of work. For example, try to calculate the square root of 2. The easiest way, which you may remember from school, is guessing:
$1 * 1 = 1; 2 * 2 = 4$
OK, it's neither of those two. Maybe $1.5 * 1.5$? Nope, that's 2.25. So it must be smaller than 1.5. Maybe 1.3? Nope...


In [1]:
1.4 * 1.4

1.9599999999999997
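The guessing game above can be automated even in pure Python. The following is only a minimal sketch of that idea (a bisection search). It uses a `for` loop, which is introduced in a later lecture, and the names `low`, `high` and `guess` are chosen here purely for illustration.

```python
# Pure-Python guess-and-check for the square root of 2 (bisection search).
low, high = 1.0, 2.0          # 1*1 = 1 is too small, 2*2 = 4 is too large

for _ in range(50):           # every pass halves the search interval
    guess = (low + high) / 2
    if guess * guess > 2:
        high = guess          # guess was too large: continue in the lower half
    else:
        low = guess           # guess was too small: continue in the upper half

print(guess)
```

It works, but it is a lot of typing for something a calculator does with one key.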

What you probably want is something like your calculator, where you simply press $\sqrt{2}$ and are presented with 1.414213...
Luckily, such a thing already exists: `np.sqrt(2)`.

In [2]:
np.sqrt(2)

NameError: name 'np' is not defined

What happened here?
We are asking Python to run a function it does not know!
We need to import it first.

In [None]:
import numpy as np

You could also give it another name, e.g. `import numpy as calc`, or import just one specific function, e.g. `from numpy import sqrt as wurzel`, but unless you know what you are doing and have a good reason, you shouldn't!
Most well-known packages have a standard import convention, e.g. `import pandas as pd`.
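To see that the different import styles only change the name under which a function is reachable, here is a small sketch. As an assumption for this example, it uses the standard-library module `math` instead of NumPy so that it runs without any extra package; the variable names are illustrative.

```python
import math                        # plain import: call as math.sqrt
import math as m                   # import under an alias: call as m.sqrt
from math import sqrt as wurzel    # import one function under another name

root_plain = math.sqrt(2)
root_alias = m.sqrt(2)
root_single = wurzel(2)

# All three names point to the same function, so the results are identical.
print(root_plain, root_alias, root_single)
```

The aliases work, but anyone reading `wurzel(2)` has to look up where it comes from, which is the reason to stick to the conventional names.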

## What's a package?

In the simplest terms, a package is a collection of functions.
These can be written in Python or in another language.
Generally, they are bundled together around a common theme or use case.
E.g. [NumPy](https://docs.scipy.org/doc/numpy/about.html) is a package with lots of functions for numerical operations (largely implemented in C), which is now the de facto standard for much of scientific computation.
NumPy is also a core part of the [SciPy](https://scipy.org/about.html) ecosystem; the SciPy library builds on NumPy and adds more functions for scientific computation.
[Pandas](http://pandas.pydata.org/) is another widely used package which makes working with data frames much easier.
And [Matplotlib](http://matplotlib.org/) is generally used to make figures from the output of any of the above.

And then there are about 172 536 more packages available on https://pypi.org/ for about any use case you can think of.
So most of the issues you might face are already solved.
But: **Think before you install!**

## Installing packages

Anaconda (what we are using here) comes with the most common packages installed already.
For things it does not yet have, there's a built-in package manager.
You can search or install with `conda search PACKAGENAME` or `conda install PACKAGENAME`.
If it's not on conda, pip will probably have it, and installing works the same way (`pip install PACKAGENAME`).
And if it's not on pip, the project will have instructions on how to install it from source.

## Importing packages



In [None]:
import matplotlib.pyplot as plt 

Python will throw an error if you mistype something (e.g. `imp0rt`) or try to import a module that does not exist, but it will not care if you use a weird name!

If you did the imports correctly, running the cell below should give you a figure:

In [None]:
plt.plot([0,1,2,3],[4,5,6,4])
plt.show()

# 2. Variables and datatypes



In [None]:
a = 1 
b = 2.9 
c = 'Hello'
d = [1,2]
e = {'var1':0.111, 'var2':0.222, 'var3':0.333}

What can we do with that? What are these?

### Find out the data type
It is often convenient to know what data type we are working with. For this, you can use the `type` function, which returns the data type of an object:

In [None]:
print("the data type of a is ", type(a))
print("the data type of b is ", type(b))
print("the data type of c is ", type(c))
print("the data type of d is ", type(d))
print("the data type of e is ", type(e))

## Most common variable types:

- integer
- float
- string
- list
- dict
- dataframe
- ndarray

What do these types mean?

The type defines what you can do with a variable, i.e. what you can store in it and what operations you can do.
Float and integer can be easily switched back and forth and you can easily turn a number into a string:

In [None]:
a_f = float(a)
f_i = int(b)

In [None]:
print(a_f, a)
print(b, f_i)

What's the problem with that?

In [None]:
r_test = round(b)
print(r_test)
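The difference between the two conversions can be made explicit in a short sketch. The variable names below are illustrative only and not part of the lecture.

```python
b = 2.9

truncated = int(b)       # int() simply cuts off the decimals: 2
rounded = round(b)       # round() goes to the nearest integer: 3
negative = int(-2.9)     # truncation goes towards zero: -2
half = round(2.5)        # Python 3 rounds exact halves to the even neighbour: 2
as_text = str(b)         # a number turned into the string '2.9'

print(truncated, rounded, negative, half, as_text)
```

So `int()` is a conversion of type, not a rounding operation; if you want the nearest whole number, use `round()`.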

## Naming variables

Above we have set some variables using single letters, as you might be used to from high school maths.
In many cases this makes sense, be it for counts or numbers, where `n` and `i` are popular variable names, or for obvious things like the conductivity $K_f$, which tends to be written as `Kf` or `K_f`, following standard LaTeX notation.

But in many cases, it makes sense to be more verbose, to make it clear what a variable stands for.
Variable names can essentially be as long as you like them to be, so having a variable like

In [None]:
MyStringVariableForTheFirstLetterOfTheAlphabetSetInMyFirstPythonExercise = 'a'

is totally possible (but maybe a bit over the top).
The only constraints relevant for our use case are that a variable name cannot start with a digit (e.g. `7variable = 7` will not work), cannot be a reserved word of the Python language (e.g. `for = 7`, since `for` is a fixed Python keyword) and cannot contain spaces. Variable names are also case sensitive (e.g. `Var1` and `var1` are different variables).

As a rule of thumb, make variables as long as they need to be, but keep them as short as possible.
If you want to set a variable that contains two words, e.g. hydraulic conductivity or first name, you have to remember that you can't use a space.
For those cases, people tend to use either an underscore `_` or *camel case* to keep the distinction between the two words, like `hydraulic_conductivity`, `first_name` or `HydraulicConductivity` and `FirstName` respectively.
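The naming rules can also be checked without provoking an error. The sketch below uses the string method `isidentifier()` and the standard-library module `keyword`; neither is part of this lecture, and the variable names are made up for the example.

```python
import keyword

starts_with_digit = '7variable'.isidentifier()    # False: starts with a digit
contains_space = 'first name'.isidentifier()      # False: contains a space
with_underscore = 'first_name'.isidentifier()     # True
camel_case = 'FirstName'.isidentifier()           # True
reserved_word = keyword.iskeyword('for')          # True, so `for` cannot be a variable name
same_name = ('Var1' == 'var1')                    # False: names are case sensitive

print(starts_with_digit, contains_space, with_underscore,
      camel_case, reserved_word, same_name)
```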


# 3. Calculations and printing of results

So far, we did some very easy operations and got the result immediately from running the cell.
Suppose you are not using a Jupyter notebook or IPython, but want to write `mymegacalculator.py` and run it in a terminal/cmd.
Saving the lines `import numpy as np` and `np.sqrt(2)` as a `.py` file and running it will not produce any output.

How do we get our results?

In [None]:
print?

*Remember, print in Python 2 works a bit differently than in Python 3!*

Write a line that prints `The square root of 2 is 1.414`

In [None]:
print('the square root of 2 is', np.sqrt(2))
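The cell above prints the root with all its decimals, while the task asks for `1.414`. One possible way to limit the decimals, sketched here under the assumption of Python 3.6 or newer, is an f-string with a format specification. `math.sqrt` from the standard library stands in for `np.sqrt` so that the example is self-contained.

```python
import math

root = math.sqrt(2)   # same value as np.sqrt(2), taken from the standard library

# ':.3f' formats the number as a float with three decimals
sentence = f'The square root of 2 is {root:.3f}'
print(sentence)       # The square root of 2 is 1.414
```

Note that the formatting only changes what is printed; `root` itself keeps its full precision.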

You can also use print for longer "sentences" that span several lines by putting the newline character `\n` where the line should break.
Write a statement that prints

    Earlier we have set a = 1,
    b as 2.9 and c as Hello.

In [None]:
print('Earlier we have set a = ', a, ',\nb as ', b, ' and c as ', c, '.', sep='')

## Mathematical operations

Take a moment to look at the [numpy documentation](https://www.numpy.org/devdocs/reference/routines.html).
Which of those topics seem useful and familiar to you?


### Exercise 1

Use the documentation to do the following exercises:

 - Calculate $\sin 3$ and $\tan 3$
 - Print $\pi$
 - Print $\pi$, but only with 2 decimals
 - Calculate the sum and the product of `fib = [1, 1, 2, 3, 5, 8, 13, 21, 34]`
 - Find the minimum and maximum of `fib`
 - Add three more elements to `fib`
 - Multiply `fib` by 2

Now do the same, but for `fib2 = np.array([1, 1, 2, 3, 5, 8, 13, 21, 34])`.
What has changed?

# 4. User input

Python can not only output results, it can also receive user input. If we really want to make our `mymegacalculator.py` app usable, it would be much better if it asked us for the number we want to do something with, instead of just being able to output the result of `np.sqrt(2)`.

So similar to `print()` we can use `input()` to ask for user input.

In [None]:
intest = input('write something!')

In [None]:
intest

### Exercise 2

Now use this to write a short script that asks the user for a number, and prints out the square root of that number.
However, there is a pitfall with `input()`.
Play around with your test and what we discussed above for types to figure it out.

As with most things, there is no single correct way. Your little script is likely different from my example here. With something this short it does not really matter, but in bigger projects the many *wrong* ways add up. So if you are going to write something bigger, please remember the intro from the first lecture!
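The pitfall mentioned in Exercise 2 can be reproduced without typing anything. In the sketch below, the string `'16'` is a stand-in for what `input()` would return (an assumption made so the example runs on its own); the variable names are illustrative.

```python
# Stand-in for: user_text = input('what number do you want to root?')
# input() always hands back a string, even if the user typed digits.
user_text = '16'

print(type(user_text))         # <class 'str'>
repeated = user_text * 2       # string repetition, not arithmetic: '1616'
number = float(user_text)      # convert to a number before calculating
root = number ** 0.5           # 4.0

print(repeated, root)
```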

# 5. Reading errors

Unless you always write perfect code on the first try, you will get errors thrown at you.
Luckily, python tries to be helpful with them.
Let's analyse a few errors we have already received in our short Python career:

    np.sqrt(2)
    
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    <ipython-input-3-bbf78ff053fc> in <module>()
    ----> 1 np.sqrt(2)
    
    NameError: name 'np' is not defined



In [None]:
import numpi as np

In [None]:
b = 2.1
c = 'Hallo'
d = c/b

In [None]:
d = float(c)

In [None]:
np.sqrt(c)

In [None]:
j = 2
k = 3
print 'Hello'

There are two types of errors, exceptions and syntax errors, and they are always structured quite similarly.
You get the type, a traceback and a verbose description of the error.
With larger programs, this can easily seem quite overwhelming, but once you understand the structure, it becomes quite readable.
For the full, official explanation, please look [here](https://docs.python.org/3/tutorial/errors.html) and [here](https://docs.python.org/3/library/exceptions.html#bltin-exceptions).
In short, an exception "works" like this:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)

This part marks the beginning of the error message, gives you a first indication of the type of error and tells you that a traceback follows.
The [TypeError](https://docs.python.org/3/library/exceptions.html#TypeError) is *\[an\] exception \[that\] may be raised by user code to indicate that an attempted operation on an object is not supported, and is not meant to be.*
So apparently, we tried to do something we are not supposed to do.
The following traceback tells us what exactly we did wrong:

    <ipython-input-50-90b9c40f605f> in <module>
           1 b = 2.1
           2 c = 'Hallo'
     ----> 3 d = c/b
     
If the cell contains several lines, the traceback shows the lines preceding the error, with line numbers in front and an arrow pointing to the offending line.
For this small test case, this seems very obvious, but with larger programs, it is very helpful to know which line you need to look at.
After this small code snippet, we get some additional information about the error:

    TypeError: unsupported operand type(s) for /: 'str' and 'float'
    
Again, it tells us that we are suffering from a type error (error messages can be quite long, and it almost always is the bottom that's the most important, so it makes sense to repeat it!) and gives us some more information about it.
Our TypeError in this case is quite easy. We tried to divide a string by a float, which obviously isn't going to work.
So now we have to figure out what to do with this error. 
Was it a small typo and we fat-fingered `c` instead of `x`, which we might have assigned to `x = 32141` at some point?
Or do we want to split `Hallo` in two parts, so that we would get `d = ['Hal','lo']`?

For the first, it's a really easy fix.
For the second, we will figure out how to do that in the **Working with text** section.
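The two pieces of information discussed above, the exception type and its description, can also be captured in code. The following sketch uses `try`/`except`, which is not covered in this lecture and is shown only to illustrate how the error message is put together; the names `error_type` and `error_text` are made up for the example.

```python
b = 2.1
c = 'Hallo'

try:
    d = c / b                          # dividing a string by a float fails
except TypeError as err:
    error_type = type(err).__name__    # the name shown at the top and bottom
    error_text = str(err)              # the description from the last line
    print(error_type + ':', error_text)
```

The printed line is the same as the last line of the traceback, which is why reading an error message from the bottom up is usually the fastest way to understand it.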

A syntax error works a bit differently:

      File "<ipython-input-58-fbd027aad335>", line 3
        print 'Hello'
                    ^
    SyntaxError: Missing parentheses in call to 'print'. Did you mean print('Hello')?

But again, we get all the information we need to fix it.
We get the line number of the error, and an arrow pointing at the offender.
However, you have to know that the arrow points at the place where Python could no longer make sense of the statement, so the actual mistake often lies just before it.
So in this example, `'Hello'` is fine, the problem lies in `print`.
Further information and the type of error are given in the last line of the error message, which in this special case (remember Python 2 vs 3) is completely self-explanatory.
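One more difference to exceptions is that a syntax error is found while Python reads the code, before any line is executed. The sketch below illustrates this with the built-in `compile()` function, which only parses a piece of source text; the use of `compile()` and `try`/`except` goes beyond this lecture and the names are illustrative.

```python
# The three lines of the faulty cell, stored as text
source = "j = 2\nk = 3\nprint 'Hello'\n"

try:
    # compile() only parses the code; nothing is executed
    compile(source, '<example>', 'exec')
    error_line = None
except SyntaxError as err:
    error_line = err.lineno        # line in which the parser gave up
    print('SyntaxError in line', error_line)
```

Because the whole cell is rejected, `j` and `k` from the first two lines are not assigned either.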


### Exercise 3

Fix the errors in the little script below.
You should be able to solve all the issues with what you have learned so far.
Remember, you can add a new code box below the following box, to try out parts, or to get the `type()` info for some variables.

In [None]:
print'This is a little error exercise', \n 'Please correct the errors!')
err_count = input('How many errors did you have to fix?')
peop_count = input('How many people are in this course?')
err_total = err_count * peop_count
err_price = 1,2
err_sum = err_total * err_price
prinf('There's been a total of', err_total, 'errors fixed,')
print('if each error was worth', err_price, 'Euros, that would make', err_sum, 'Euros')

# 6. Working with text

So far, we have done some (simple) calculations and used text only to give instructions for input and to explain our output.
But python can do much more with text.

## Comments

You can not only use text as strings that get shown to the user, you can also use it as a comment, that only is visible inside the source of your program.
So a comment is not unlike what we are doing here in the Jupyter notebook, providing explanations in between code.
You can start a comment with the `#` character like so:


In [None]:
# This is a comment.
# You can write various notes after the # character and then continue with a code line
var1 = 'this is a string' 
# This comment and the lines before 'var1 = 'this is a string'' will get ignored by
# the interpreter, even if they contain code, as above
var2 = 'another string' # you can also write comments after some code, but another # will get ignored

It is generally considered good practice to use comments to explain your code.
Some people argue that good code does not need comments because it explains itself, but for our purpose comments tend to be very useful.
As a rule of thumb, use them to give some short summary at the beginning of your program, and briefly explain things that are not obvious.
Of course, what's obvious and what is not, will change with time, but even now, you probably don't need to do something like `a = 2 # This sets a as 2. a is now a variable and 2 is a number`.
For example, we could have commented our little root calculator above like this:

    # Short calculator that asks the user for a number and prints the result
    datain = float(input('what number do you want to root?')) # input will only provide strings, so keep in mind to turn it into a float!
    dataout = np.sqrt(datain)
    print('the square root of', datain, 'is', dataout) 

## Splitting strings

The most useful use case for us is probably splitting strings.
Let's assume we are working with data from a datalogger that we imported into Python, but it's in an annoying string form:

In [None]:
loggerdata = 'Muehlbach, waterlevel, 15.03.2019, 1.2, 1.2, 1.2, 1.3, 1.4, 1.5, 1.5, 1.5, 1.6, 1.4, 1.3, 1.3'

Now for us, as humans, that is very readable, and we can grasp what kind of data this is likely to be:
some waterlevels measured every two hours at a creek called Mühlbach, taken on 15 March 2019.
For Python however, it's just a string with a length of 93 characters.

**How do we get this into a form, so that we can calculate, say the average waterlevel for that day?**

As with most things, there are many ways to get to what you want.
The following is just one example of many.
The most straightforward one would probably be to use a loop, but we'll get to loops later.

We could do it in a really stupid way, by slicing the string by hand, like so:

In [None]:
location = loggerdata[0:9]
datatype = loggerdata[11:21]
value1 = loggerdata[35:38]
value2 = loggerdata[40:43] # and so on for value3 to value12

You can probably see why this is not exactly the best approach.
So we need something that can do this for us, which in this case is the `split` function.
Use the help function to find out how to use it, and split `loggerdata` into the variable `logparts`.

In [None]:
logparts = loggerdata.split(',')

Now we can access the parts of our log much more easily.
In a similar way to the above, we can now address the parts of our list, like

In [None]:
location = logparts[0]
datatype = logparts[1]
value1 = logparts[3]
value2 = logparts[4] # and so on for value3 to value12

But again, this is a lot of typing work. And there's a problem with our values.
What's the problem and how can you fix it?

In [None]:
type(value1)
value1 = float(value1)
value2 = float(value2)

In [None]:
value1
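Besides the data type, there is a second detail worth knowing when splitting on `','`: every part except the first keeps the space that followed the comma. The sketch below shows this and two possible ways of dealing with it (`strip()` and splitting on `', '`); the variable names `raw_type`, `clean_type` and `clean_parts` are made up for the example.

```python
loggerdata = 'Muehlbach, waterlevel, 15.03.2019, 1.2, 1.2, 1.2, 1.3, 1.4, 1.5, 1.5, 1.5, 1.6, 1.4, 1.3, 1.3'

logparts = loggerdata.split(',')
raw_type = logparts[1]             # ' waterlevel' (note the leading space)
clean_type = raw_type.strip()      # strip() removes surrounding whitespace
value1 = float(logparts[3])        # float() ignores surrounding spaces: 1.2

# Splitting on comma plus space avoids the leading spaces altogether
clean_parts = loggerdata.split(', ')

print(len(logparts), repr(raw_type), clean_type, value1, clean_parts[1])
```

For the numbers the extra space is harmless, but for text such as the location or data type it can lead to surprises when comparing strings.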

### Exercise 4

What would you need to do now, to be able to calculate the average waterlevel?

This can of course be made much easier, so that it is solved in one line.
However, that requires something we only cover in the next lecture, so for now we use

In [None]:
waterlevels = np.array([1.2, 1.2, 1.2, 1.3, 1.4, 1.5, 1.5, 1.5, 1.6, 1.4, 1.3, 1.3])

In [None]:
waterlevels.mean()
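For the curious, here is one possible shortcut from the split string to the average without typing the values by hand. It is only a sketch: slicing with `[3:]` and the built-in `map()` are not part of this lecture, and the names `levels` and `average` are illustrative.

```python
loggerdata = 'Muehlbach, waterlevel, 15.03.2019, 1.2, 1.2, 1.2, 1.3, 1.4, 1.5, 1.5, 1.5, 1.6, 1.4, 1.3, 1.3'
logparts = loggerdata.split(',')

# logparts[3:] holds the twelve readings; map() applies float() to each of them
levels = list(map(float, logparts[3:]))
average = sum(levels) / len(levels)

print(len(levels), round(average, 3))
```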

# 7. Plotting

Besides calculating averages and other simple operations, plotting data is often one of the first and most important steps when working with data.

Plotting is not part of standard Python, but there's a package for it (in fact, there are many plotting packages). The graphics package we use is called `matplotlib`. To be able to use the plotting functions in `matplotlib` we have to import it.
We have already done it above, but just to repeat it:
 


In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

Again, you could import it with another name, like `import matplotlib.pyplot as plotter`, but for well known packages it is best to stick to the standard way.
`%matplotlib inline` is a so-called *magic* command for Jupyter (or IPython), which affects the layout or behaviour of Jupyter.
In this case, it makes sure that figures are shown inside the notebook.
The simplest way to plot something is to type something like `plt.plot(x, y)`, where `x` and `y` are arrays holding our x and y values.
If you leave out x, the given array is used as y, and the x axis simply counts the entries of y, starting at 0.
So the simplest plot is something like

In [None]:
plt.plot([1, 2, 4, 2])
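To check which x values matplotlib fills in when only y is given, the line object returned by `plt.plot` can be inspected. This is a small sketch; the call to `matplotlib.use('Agg')` is an assumption that is only needed when running the code outside a notebook (it selects a backend that does not open a window), and the names `x_used` and `y_used` are illustrative.

```python
import matplotlib
matplotlib.use('Agg')   # non-interactive backend, only needed outside a notebook
import matplotlib.pyplot as plt

lines = plt.plot([1, 2, 4, 2])          # plt.plot returns a list of line objects
x_used = list(lines[0].get_xdata())     # x values matplotlib made up for us
y_used = list(lines[0].get_ydata())     # the values we passed in

print(x_used, y_used)
```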

But let's assume the data were not recorded at x values from 0 to 3, but at an interval of 0.1.
Please plot the data with such an x axis:

In [None]:
plt.plot([0, 0.1, 0.2, 0.3],[1, 2, 4, 2])

Typing out the data by hand however is generally a lot of work.
In most cases, we either will plot data we imported from somewhere, or data we calculated with a function.
Can you plot the waterlevel data from our string splitting exercise above?

In [None]:
plt.plot(waterlevels)

Besides plotting data you already have stored in an array, plotting data that you calculate on the fly is another common use case.
Let's say we want to plot the function $f(x) = x^2$ from -5 to +5.
So essentially, we want to do something like `plt.plot(x,x**2)` (`**2` is the operator for $^2$).
Again, for us this seems quite straightforward, but Python needs some more explanation. Most of all, we need to provide x.
One way to define x would be with `np.arange`:

In [None]:
x = np.arange(-5, 5, 0.1)

`np.arange` is quite straightforward, apart from one pitfall: the end value is not included!
So if we really want to count to 5, we have to provide a slightly larger stop value such as 5.1, which again is possible in different ways:

In [None]:
x = np.arange(-5, 5.1, 0.1)
x = np.arange(-5, 5 + 0.1, 0.1)
start = -5
stop = 5
step = 0.1
x = np.arange(start, stop + step, step)

or in various mixed forms.
Again, it is up to you to decide which way you want to do it.
If it's just some quick plot, the shorter the better, but if it is used in some larger program where you might want to change the parameters, the longer approach might be useful.
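The excluded end value is easiest to see with a short array. The sketch below also shows `np.linspace` as a possible alternative that is not used in this lecture: it takes the number of points instead of a step and includes both end points by default. The variable name `short` is illustrative.

```python
import numpy as np

short = np.arange(0, 1, 0.25)     # values 0, 0.25, 0.5 and 0.75; the stop value 1 is NOT included
print(short)

# np.linspace takes the number of points instead of a step
# and includes both end points by default.
x = np.linspace(-5, 5, 101)       # 101 points give a spacing of 0.1
print(x[0], x[-1], len(x))
```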

Anyways, we have our x, so we just need to define our y and plot it:

In [None]:
y = x**2
plt.plot(x,y)

### Exercise 5

Now use what you just learned and plot the function $f(x) = x^4 + \frac{x}{\pi}$ for all x between -10 and 10, with an interval of 0.01, as a red, dotted line.

This looks quite nice, but it is missing some important parts: the labels!
Redo the figure with an `xlabel`, a `ylabel` and a `title`.

As we have already learned, we can set the line to the color red by passing `'r'` to the plot command.
This shorthand originates from *MATLAB*, since it is (was) the main goal of matplotlib to provide MATLAB-style plotting for Python.
So you can use simple colors this way, like `r`ed, `b`lue, `g`reen, `c`yan or blac`k`.

For basic plotting this is often enough, but there are various reasons to use other colors than just a few primary ones.
Luckily, there are many ways to pass colors to matplotlib (see [here](http://matplotlib.org/examples/pylab_examples/color_demo.html)).
One of the most useful ways is to use the html color names, shown [here](http://en.wikipedia.org/wiki/Web_colors) or in most graphics software.
If you prefer a more visual description, instead of `#00FF00` you can also use the xkcd names, which need to be prefaced by `xkcd:`.
The xkcd list of color names is given [here](https://xkcd.com/color/rgb/) and has some quite fitting descriptions.
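The different ways of naming a color all end up as the same kind of value inside matplotlib. As a small sketch, the function `matplotlib.colors.to_hex` converts each notation into a hex code; the xkcd name `sky blue` is just an example picked for illustration, and the variable names are made up.

```python
from matplotlib import colors

short_name = colors.to_hex('r')               # single-letter shorthand
hex_code = colors.to_hex('#00FF00')           # hex code, as in the text
html_name = colors.to_hex('lightsalmon')      # html color name
xkcd_name = colors.to_hex('xkcd:sky blue')    # xkcd name with its prefix

print(short_name, hex_code, html_name, xkcd_name)
```

Any of these strings can be passed to the `color` argument of `plt.plot`.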

Take a moment to find your favorite (or most funny/ridiculous) color and plot your figure with the line's color set to it and a line width of `50`, then plot $x^3$ on top of it in *LightSalmon* with a dash-dot line.


There's much more to a "good" plot than just picking a color that you think is nice.
The most obvious thing would be making sure that a color blind person can still see what's going on in your plot.

A good starting point is the [seaborn library](https://seaborn.pydata.org/index.html), which, as you have probably guessed, can be imported into Python.

In [None]:
import seaborn as sns
sns.set()

Take a look at the galleries of [matplotlib](https://matplotlib.org/gallery.html) and [seaborn](https://seaborn.pydata.org/examples/index.html) and pick a plot you like.
Run it in this notebook and try to understand what it does.
Maybe try to change some colors or labels.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="whitegrid")

# Load the example diamonds dataset
diamonds = sns.load_dataset("diamonds")

# Draw a scatter plot while assigning point colors and sizes to different
# variables in the dataset
f, ax = plt.subplots(figsize=(6.5, 6.5))
sns.despine(f, left=True, bottom=True)
clarity_ranking = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]
sns.scatterplot(x="carat", y="price",
                hue="clarity", size="depth",
                palette="ch:r=-.2,d=.3_r",
                hue_order=clarity_ranking,
                sizes=(1, 8), linewidth=0,
                data=diamonds, ax=ax)

### Exercise 6: Plotting a sine wave
In this exercise we are going to plot one sine wave with the following formula:

$y(x) = A \sin(x)$

1. Create a variable `A` and give it a value (whatever number you like);
2. Create an array named `x` with floats between 0 and $2\pi$ (`np.pi`) with steps of 0.1 using the numpy function `np.arange`;
3. Calculate the sine wave (use `np.sin`) for the values of x and store it in the variable `y`;
4. Plot the values of x and y, and give the plot a title, an x-label, a y-label and a legend.

### Exercise 7. Print your name, age and favorite number
Print the following sentence:

*My name is XX. I am XX years old and my favorite number is: XX*

Create variables for the XX's, give them logical names, and use them in your print statement.

### Exercise 8. Find the intersection of two lines
In this exercise we will plot two linear functions and find the intersection. Consider the following two formulas:

$y_1(x) = 2x + 2$

$y_2(x) = -1.5x + 15$

Make a plot of these two functions by performing the following steps:

1. Create an array with values between 0 and 10 and name the variable `x`;
2. Calculate the values for $y_1$ and $y_2$;
3. Plot both lines and give the plot a title, xlabel, ylabel and legend.


To check your work, calculate the intersection of the lines by hand and plot your analytical solution as a red point.