# Short course on programming in Python

# A. Variables

Variables are used to store and modify values.

In [1]:
a = 5
b = a + 3.1415
c = a / b

print(a, b, c)

5 8.1415 0.6141374439599582


Note that we did not need to declare variable types (as in Fortran); we can assign anything to a variable and it works. This is because Python is dynamically typed: types belong to values rather than to variable names, and are checked while the code runs. Also, we can add different types (`a` is an integer, and we add the float 3.1415 to get `b`). The result is 'upcast' to whatever data type can handle the result; e.g., adding a float and an int results in a float.
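
As a small illustration of the upcasting described above, the sketch below inspects a few results with the built-in `type` function (introduced a little further down). The variable names `n`, `x`, `q_true`, and `q_floor` are made up for this example.

```python
# Mixing an int and a float gives a float.
n = 5             # an integer
x = n + 3.1415    # int + float -> float

print(type(n))    # <class 'int'>
print(type(x))    # <class 'float'>

# In Python 3, '/' always gives a float, even for two ints;
# '//' is floor division and keeps ints as ints.
q_true = 7 / 2    # 3.5
q_floor = 7 // 2  # 3
print(q_true, q_floor)
```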

Variables can store lots of different kinds of data:

In [2]:
s = 'Ice cream'            # A string
f = [1, 2, 3, 4]           # A list
d = 3.1415928              # A floating point number
i = 5                      # An integer
b = True                   # A boolean value

*Side note*: Anything following a `#` on a line is a comment, and is not considered part of the code. Comments are useful for explaining what a bit of code does. ___USE COMMENTS___

You can see what `type` a variable has by using the `type` function, like

In [3]:
type(s)

str

---
### *Exercise*

> Use `type` to see the types of the other variables

---

You can test to see if a variable is a particular type by using the `isinstance(var, type)` function.

In [4]:
isinstance(s, str)  # is s a string?

True

In [5]:
isinstance(s, int)  # is s an integer?

False

# B. Tests for equality and inequality

We can test the values of variables using different operators. These tests return a Boolean value, either `True` or `False`. In arithmetic, `False` behaves like 0 and `True` like 1. Note that assignment `=` is different from a test of equality `==`.

In [6]:
a < 99

True

In [7]:
b > 99

False

In [8]:
a == 5

True
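
To make the difference between `=` and `==` concrete, and to show how Booleans relate to the integers 0 and 1, here is a minimal sketch. The names `value` and `is_five` are chosen just for this illustration.

```python
value = 5              # '=' assigns: the name value now refers to 5
is_five = value == 5   # '==' tests: the result is a bool

print(is_five)                 # True
print(value != 5)              # False; '!=' tests for inequality
print(True == 1, False == 0)   # True True
print(True + True)             # 2, because bools behave like the ints 1 and 0
```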

There are other things that can be tested, not just mathematical equalities. For example, to test if an element is inside of a list or string (or any sequence; more on sequences below), use the `in` operator:

In [9]:
5 in [1, 2, 3, 4, 5, 6]

True

In [10]:
'this' in 'What is this?'

True

In [11]:
'that' in 'What is this?'

False

# C. Strings

Strings are made using various kinds of (matching) quotes. Examples:

In [12]:
s1 = 'hello'
s2 = "world"
s3 = '''Strings can 
also go over
multiple lines.'''

You can also 'add' strings using 'operator overloading', meaning that the plus sign can take on different meanings depending on the data types of the variables you are using it on.

In [13]:
print( s1 + ' ' + s2)  # note, we need the space otherwise we would get 'helloworld'

hello world


We can include special characters in strings. For example `\n` gives a newline, `\t` a tab, etc. Notice that the multiple line string above (`s3`) is converted to a single quote string with the newlines 'escaped' out with `\n`.

In [14]:
s3

'Strings can \nalso go over\nmultiple lines.'
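
The difference between how a string is stored and how it is printed can be seen in a short sketch like the one below. The string contents are made up; `repr` and raw strings (`r'...'`) are standard Python features.

```python
text = 'col1\tcol2\nrow1\trow2'

print(repr(text))   # shows the escape sequences as typed
print(text)         # interprets them: two lines with a tab in each
print(len(text))    # 19; each escape such as '\n' counts as ONE character

raw = r'C:\new\table'   # a raw string keeps the backslashes literally
print(raw)
print(len(raw))         # 12
```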

Strings are 'objects' in that they have 'methods'. Methods are functions that act on the particular instance of a string object. You can access the methods by putting a dot after the variable name and then the method name with parentheses (and any arguments to the method within the parentheses). Methods always have to have parentheses, even if they are empty.

In [15]:
s3.capitalize()

'Strings can \nalso go over\nmultiple lines.'

One of the most useful string methods is `split`, which returns a list of the words in a string, with all of the whitespace (actual spaces, newlines, and tabs) removed. More on lists next.

In [16]:
s3.split()

['Strings', 'can', 'also', 'go', 'over', 'multiple', 'lines.']

Another common string operation is the `join` method, which joins a sequence of strings using a common separator:

In [17]:
words = s3.split()
'_'.join(words)        # Here, we are using a method directly on the string '_' itself.

'Strings_can_also_go_over_multiple_lines.'
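
`split` and `join` are roughly inverses of each other. The sketch below uses an explicit separator instead of whitespace; the comma-separated record is a hypothetical example.

```python
line = '42.83,-70.76,12.5'     # hypothetical comma-separated record

fields = line.split(',')       # split on commas instead of whitespace
print(fields)                  # ['42.83', '-70.76', '12.5']

rejoined = ' | '.join(fields)  # join with a different separator
print(rejoined)                # 42.83 | -70.76 | 12.5

print(','.join(fields) == line)   # True: joining undoes the split
```

Note that the fields are still strings; converting them to numbers is a separate step.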

# D. Containers

Often you need lists or sequences of different values (e.g., a time series of temperature: a list of values representing the temperature on sequential days). We will look at three containers from the core Python language: lists, tuples, and dictionaries. There are a few more specialized containers (e.g., numpy arrays and pandas dataframes) for use in scientific computing that we will learn much more about later; they are similar in many ways to the containers we will learn about here.

## Lists

Lists are perhaps the most common container type. They are used for sequential data. Create them with square brackets with comma separated values within:

In [18]:
foo = [1., 2., 3, 'four', 'five', [6., 7., 8], 'nine']
type(foo)

list

Note that lists (unlike arrays, as we will later learn) can be heterogeneous. That is, the elements in the list don't have to have the same kind of data type. Here we have a list with floats, ints, strings, and even another (nested) list!

We can retrieve the individual elements of a list by 'indexing' the list. We do this with square brackets, using zero-based indexes – that is `0` is the first element – as such:

In [19]:
foo[0]

1.0

In [20]:
foo[5]

[6.0, 7.0, 8]

In [21]:
foo[5][1]  # Indexing can be chained: foo[5] is itself a list, so [1] picks its second element.

7.0

In [22]:
foo[-1]    # This is the way to access the last element.

'nine'

In [23]:
foo[-3]    # ...and the third to last element

'five'

In [24]:
foo[-3][2]   # we can also index strings.

'v'

We can get a sub-sequence from the list by giving a range of the data to extract. This is done by using the format

    start:stop:stride

where `start` is the index of the first element, the slice runs up to but not including the element indexed by `stop`, and every `stride`-th element is taken. The defaults are to start at the beginning, include through the end, and include every element.

The up-to-but-not-including part is confusing to first time Python users, but makes sense given the zero-based indexing. For example, `foo[:10]` gives the first ten elements of a sequence.

In [25]:
# create a sequence of 10 elements, starting with zero, up to but not including 10.
bar = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [26]:
bar[2:5]

[2, 3, 4]

In [27]:
bar[:4]

[0, 1, 2, 3]

In [28]:
bar[4:]

[4, 5, 6, 7, 8, 9]

In [29]:
bar[::2]

[0, 2, 4, 6, 8]
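
One way to see why the up-to-but-not-including convention is convenient is that slices with a shared boundary never overlap. The sketch below uses a made-up list `letters` to show this.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

n = 2
head = letters[:n]     # the first n elements: ['a', 'b']
tail = letters[n:]     # everything else: ['c', 'd', 'e', 'f']
print(head + tail == letters)   # True: the two slices never overlap

# The length of seq[start:stop] is stop - start (when both are in range).
print(len(letters[1:4]))        # 3

# Slicing works on any sequence, including strings.
print('Ice cream'[4:])          # cream
```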

---
###  *Exercise*

> Use the list

    bar = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    
> use indexing to get the following sequences:
    
    
    [3, 4, 5]
    
    [9]        # note this is different than just the last element. 
               # It is a sequence with only one element, but still a sequence
    
    [2, 5, 8]

> What happens when you exceed the limits of the list?

    bar[99]
    bar[-99]
    bar[5:99]

---

You can assign values to list elements by putting the indexed list on the left side of the assignment, as

In [30]:
bar[5] = -99
bar

[0, 1, 2, 3, 4, -99, 6, 7, 8, 9]

This works for slices as well:

In [31]:
bar[2:7] = [1, 1, 1, 1, 1]
bar

[0, 1, 1, 1, 1, 1, 1, 7, 8, 9]

Lists are also 'objects'; they also have 'methods'. Methods are functions that are designed to be applied to the data contained in the list. You can access them by putting a dot and the method name after the variable (called an 'object instance')

In [32]:
bar.sort()    # Note that we don't do 'bar = bar.sort()'. The sorting is done in place.
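
The in-place behavior of `sort` can be contrasted with the built-in `sorted` function, which leaves the original alone and returns a new list. The values below are made up for illustration.

```python
temps = [21.5, 18.0, 25.1, 19.4]   # made-up values

result = temps.sort()    # sorts temps itself...
print(result)            # None  (...and returns nothing useful)
print(temps)             # [18.0, 19.4, 21.5, 25.1]

original = [3, 1, 2]
ordered = sorted(original)   # the built-in sorted() returns a NEW list
print(original)          # [3, 1, 2]  (unchanged)
print(ordered)           # [1, 2, 3]
```

This is why `bar = bar.sort()` is a mistake: it would replace the list with `None`.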

---
### *Exercise*

> What other methods are there? In IPython or a Jupyter notebook, type `bar.` and then `<TAB>`. This will show the possible completions, which in this case is a list of the methods and attributes. You can get help on a method by typing, for example, `bar.pop?`.  The text in the help file is called a `docstring`; as we will see below, you can write these for your own functions.

> See if you can use these four methods of the list instance `bar`:

            1. append
            2. pop
            3. index
            4. count


---

## Tuples

Tuples (pronounced `too'-puls`) are sequences that can't be modified once created; that is, they are immutable. They have only two methods, `count` and `index`, neither of which changes the tuple. They are created like lists, but with parentheses instead of square brackets.

In [33]:
foo = (3, 5, 7, 9)
# foo[2] = -999  # raises a TypeError (tuples do not support item assignment). Commented so that all cells run.

Tuples are often used when a function has multiple outputs, or as a lightweight storage container. Because of this, you don't need to put the parentheses around them, and can assign multiple values at a time.

In [34]:
a, b, c = 1, 2, 3   # Equivalent to '(a, b, c) = (1, 2, 3)'
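
A few common uses of this 'tuple unpacking' are sketched below. `divmod` is a built-in function that returns a two-element tuple; the other names are made up for the example.

```python
a, b = 1, 2
a, b = b, a          # swap two values without a temporary variable
print(a, b)          # 2 1

# divmod returns (quotient, remainder) as a tuple
quotient, remainder = divmod(17, 5)
print(quotient, remainder)   # 3 2

point = (3, 5, 7)
x, y, z = point      # the number of names must match the tuple length
print(y)             # 5
```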

## Dictionaries

Dictionaries store values that are referenced by arbitrary 'keys' instead of by a (sequential) index. (Since Python 3.7 a dictionary remembers the order in which keys were inserted, but elements are still looked up by key, not by position.) Dictionaries are created using curly braces with keys and values separated by a colon, and key:value pairs separated by commas, as

In [35]:
foobar = {'a':3, 'b':4, 'c':5}

Elements are referenced and assigned by keys:

In [36]:
foobar['a']

3

In [37]:
foobar['c'] = -99
foobar

{'a': 3, 'b': 4, 'c': -99}

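The keys and values can be extracted using methods of the dictionary class. In Python 3 these return 'view' objects rather than lists; wrap them in `list()` if you need an actual list.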

In [38]:
foobar.keys()

dict_keys(['a', 'b', 'c'])

In [39]:
foobar.values()

dict_values([3, 4, -99])
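
The sketch below shows how the objects returned by `keys()` and `items()` behave, assuming Python 3.7 or later (where insertion order is preserved). The dictionary `ages` and its contents are made up.

```python
ages = {'ann': 31, 'bob': 27}   # made-up data

keys = ages.keys()              # a 'view', not a list
print(list(keys))               # ['ann', 'bob']

ages['cy'] = 45                 # the view follows later changes
print(list(keys))               # ['ann', 'bob', 'cy']

pairs = list(ages.items())      # (key, value) tuples
print(pairs)                    # [('ann', 31), ('bob', 27), ('cy', 45)]
print('bob' in ages)            # True; 'in' tests the keys
```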

New key:value pairs can be added simply by assigning a value to a key that does not exist yet:

In [40]:
foobar['spam'] = 'eggs'
foobar

{'a': 3, 'b': 4, 'c': -99, 'spam': 'eggs'}

---
### *Exercise*

> Use a dictionary to create a list-like object that has negative indices, with the indices ranging from -3 to 3 (with arbitrary floating point values)

> Explore the methods of the dictionary object, as was done with the list instance in the previous exercise.


---

You can make an empty dictionary or list by using the `dict` and `list` functions respectively.

# E. Loops

## For loops

Loops are one of the fundamental structures in programming. Loops allow you to iterate over each element in a sequence, one at a time, and do something with those elements.

*Loop syntax*: Loops have a very particular syntax in Python; this syntax is one of the most notable features to Python newcomers. The format looks like

    for *element* in *sequence*:                # NOTE the colon at the end
        <some code that uses the *element*>     # the block of code that is looped over for each element
        <more code that uses the *element*>     # is indented four spaces (yes four! yes spaces!)
    
    <the code after the loop continues>         # the end of the loop is marked simply by unindented code
    
Thus, indentation is significant to the code. This was done because good coding practice (in almost all languages, e.g., C, FORTRAN, MATLAB) is to indent loops, functions, etc. Making indentation significant removes the need for explicit end-of-loop syntax, giving more compact code.

*Some important notes on indentation*  Indentation in python is typically *4 spaces*. Most programming text editors will be smart about indentation, and will also convert TABs to four spaces. Jupyter notebooks are smart about indentation, and will do the right thing, i.e., autoindent a line below a line with a trailing colon, and convert TABs to spaces. If you are in another editor remember: ___TABS AND SPACES DO NOT MIX___. See [PEP-8](https://www.python.org/dev/peps/pep-0008/) for more information on the correct formatting of Python code.

A simple example is to find the sum of the squares of the sequence 0 through 99,

In [41]:
sum_of_squares = 0
for n in range(100):              # range yields a sequence of numbers from 0 up to but not including 100
    sum_of_squares += n**2        # same as 'sum_of_squares = sum_of_squares + n**2'
                                  # the '**' operator is a power, like '^' in other languages

print(sum_of_squares)

328350
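
As a cross-check on the loop above, the same sum can be computed in two other ways: with the built-in `sum` and a generator expression, and with the closed-form formula for a sum of squares. This is only a sketch; the names `total`, `compact`, and `closed_form` are made up.

```python
total = 0
for n in range(100):
    total += n**2

# The same sum using the built-in sum() and a generator expression.
compact = sum(n**2 for n in range(100))

# Closed form for 0**2 + 1**2 + ... + (N-1)**2 is (N-1)*N*(2*N-1)/6.
N = 100
closed_form = (N - 1) * N * (2 * N - 1) // 6

print(total, compact, closed_form)   # 328350 328350 328350
```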


You can iterate over any sequence, and in Python (like MATLAB) it is better to iterate over the sequence you want than to loop over the indices of that sequence.

In [42]:
words = ['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']

sentence = ''
for word in words:
    sentence += word + ' '

sentence

'the quick brown fox jumped over the lazy dog '
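
Note the trailing space in the result. For this particular task the `join` method from the Strings section gives the same sentence without it; a short comparison, using a shortened made-up word list:

```python
words = ['the', 'quick', 'brown', 'fox']

sentence = ''
for word in words:
    sentence += word + ' '
print(repr(sentence))          # 'the quick brown fox ' (note the trailing space)

joined = ' '.join(words)       # separator goes only BETWEEN the words
print(repr(joined))            # 'the quick brown fox'

print(sentence.strip() == joined)   # True
```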

Sometimes, though, you want to iterate over a sequence but *also* want the indices of those elements. One way to do that is the `enumerate` function:

    enumerate(<sequence>)

This returns a sequence of two element tuples, the first element in each tuple is the index, the second the element. It is commonly used in `for` loops, like

In [43]:
for idx, word in enumerate(words):
    print('The index is', idx, '...')
    print('...and the word is', word)

The index is 0 ...
...and the word is the
The index is 1 ...
...and the word is quick
The index is 2 ...
...and the word is brown
The index is 3 ...
...and the word is fox
The index is 4 ...
...and the word is jumped
The index is 5 ...
...and the word is over
The index is 6 ...
...and the word is the
The index is 7 ...
...and the word is lazy
The index is 8 ...
...and the word is dog
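
Two related tools are worth a quick sketch: `enumerate` accepts a `start` argument, and the built-in `zip` pairs up elements from two sequences so that you can loop over both at once. The lists below are made up.

```python
days = ['Mon', 'Tue', 'Wed']
temps = [18.5, 21.0, 19.2]      # made-up values

# enumerate can start counting from something other than zero
for number, day in enumerate(days, start=1):
    print(number, day)

# zip pairs up elements from two (or more) sequences
for day, temp in zip(days, temps):
    print(day, temp)

# Both produce tuples, which the loops above unpack into two names.
numbered = list(enumerate(days, start=1))
paired = list(zip(days, temps))
```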


# F. Conditionals

Conditionals have a similar syntax to for statements. Generally, conditionals look like

    if <test>:
        <Code run if...>
        <...test is true>

or

    if <first test>:
        <Code run if...>
        <...the first test is true>
    elif <second test>:
        <Code run if...>
        <...the second test is true>
    else:
        <Code run if...>
        <...neither test is true>

In both cases the test statements are code segments that return a boolean value, often a test for equality or inequality. The `elif` and `else` statements are always optional; both, either, or neither can be included.

In [44]:
x = 5

if x < 10:
    print('x is not bigger than 10')

    
if x < 10:
    print('x is less than 10')
else:
    print('x is more than 10')

x is not bigger than 10
x is less than 10


---
### *Exercise*

> Rerun the code block above using different values for x. What happens if x=10?

> Add an `elif` statement to the second block of code that will print something if x==10.

---

# G. Functions

Functions are ways to create reusable blocks of code that can be run with different variable values – the input variables to the function. Functions are defined using the syntax

    def <function name> (var1, var2, ...):
        <block of code...>
        <...defining the function>
        return <return variable(s)>

Functions can be defined at any point in the code, and called at any subsequent point.

In [45]:
def addfive(x):
    return x+5

addfive(3.1415)

8.1415

## Function inputs and outputs

Functions can have multiple input and output values. The documentation for the function can (and should) be provided as a string at the beginning of the function.

In [46]:
def sasos(a, b, c):
    '''return the sum of a, b, and c and the sum of the squares of a, b, and c'''
    res1 = a + b + c
    res2 = a**2 + b**2 + c**2
    return res1, res2

sasos(3, 4, 5)

(12, 50)
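
Multiple return values come back as a tuple, so they can be unpacked directly into separate names. A minimal sketch, using a hypothetical function `min_max`:

```python
def min_max(values):
    '''return the smallest and largest element of values'''
    return min(values), max(values)

lo, hi = min_max([4, 9, 1, 7])   # unpack the returned tuple
print(lo, hi)                    # 1 9

result = min_max([4, 9, 1, 7])   # or keep it as one tuple
print(result[0], result[1])      # 1 9

print(min_max.__doc__)           # the docstring is stored on the function
```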

Functions can have arguments with default values. You can also pass positional arguments out of order if you name them explicitly.

In [47]:
def powsum(x, y, z, a=1, b=2, c=3):
    return x**a + y**b + z**c

print( powsum(2., 3., 4.) )
print( powsum(2., 3., 4., b=5) )
print( powsum(z=2., x=3., y=4., c=2) )

75.0
309.0
23.0


---
### *Exercise*

> Verify `powsum(z=2., x=3., y=4., c=2)` is the same as `powsum(3., 4., 2., c=2)`

> What happens when you do `powsum(3., 4., 2., x=2)`?  Why?


---

# I. Packages

Functions and classes represent code that is intended to be reused over and over. Packages are a way to store and manage this code. Python has a number of 'built-in' classes and functions that we have discussed above. Lists, tuples, and dictionaries; `for` and `while` loops; and standard data types are part of every Python session.

There is also a very wide range of packages that you can import that extend the abilities of core Python. There are packages that deal with file input and output, internet communication, numerical processing, etc. One of the nice features about Python is that you only import the packages you need, so that the memory footprint of your code remains lean. Also, there are ways to import code that keep your 'namespace' organized.

> Namespaces are one honking great idea -- let's do more of those!

In the same way directories keep your files organized on your computer, namespaces organize your Python environment. There are a number of ways to import packages, for example:

In [48]:
import math     # This imports the math module. Here 'math' is like a subdirectory 
                # in your namespace that holds all of the math functions

---
### *Exercise*

> After importing the math package, type `math.` and hit <TAB> to see all the possible completions. These are the functions available in the math package. Use the math package to calculate the square root of 2.

> There are a number of other ways to import things from the math package. Experiment with these commands

    from math import tanh  # Import just the `tanh` function. Called as `tanh(x)`
    import math.sin        # Fails with an ImportError: `sin` is a function, not a module, so it cannot be imported this way
    import math as m       # Import the math package, but rename it to `m`. Functions called like `m.sin(x)`
    from math import *     # All the functions imported to top level namespace. Functions called like `sin(x)`
    
> This last example makes things easier to use, but is frowned on as it is less clear where different functions come from.

> For the rest of the 'Zen of Python' type `import this`

---

One particular package that is central to scientific Python is the `numpy` package (*Num*erical *Py*thon). We will talk about this package much more in the future, but will outline a few things about the package now. The standard way to import this package is

In [49]:
import numpy as np

The `numpy` package has many of the same math functions as the `math` package, but these functions are designed to work with numpy arrays. Arrays are the backbone of the `numpy` package. For now, just think of them as homogeneous, multidimensional lists.

In [50]:
a = np.array([[1., 2., 3], [4., 5., 6.]])
a

array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

In [51]:
np.sin(a)

array([[ 0.84147098,  0.90929743,  0.14112001],
       [-0.7568025 , -0.95892427, -0.2794155 ]])

Note that we can have two `sin` functions at the same time, one from the `math` package and one from the `numpy` package. This is one of the advantages of namespaces.

In [52]:
math.sin(2.0) == np.sin(2.0)

True
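
The same point can be made with the standard library alone. `math` and `cmath` (the complex-number counterpart of `math`) both define a `sqrt` function, and the namespace prefix makes it clear which one is being called. This is a sketch added for illustration; `cmath` is not otherwise used in this course.

```python
import math
import cmath   # standard-library counterpart of math for complex numbers

print(math.sqrt(4.0))     # 2.0
print(cmath.sqrt(-4.0))   # 2j

# math.sqrt refuses negative input, cmath.sqrt does not
try:
    math.sqrt(-4.0)
    failed = False
except ValueError:
    failed = True
print(failed)             # True
```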

One commonly used package is the `datetime` package. We can pull out the two classes we would like to use, `datetime.datetime` and `datetime.timedelta`, with the command

In [53]:
from datetime import datetime, timedelta

Here is an example of creating a sequence of datetime objects using a list comprehension

In [54]:
do = datetime(1970, 1, 1)     # The 'reference' date, Jan 1, 1970
dt = timedelta(hours=3)       # The increment between datetime objects in the sequence

dates = [do+n*dt for n in range(1000)]
dates

[datetime.datetime(1970, 1, 1, 0, 0),
 datetime.datetime(1970, 1, 1, 3, 0),
 datetime.datetime(1970, 1, 1, 6, 0),
 datetime.datetime(1970, 1, 1, 9, 0),
 datetime.datetime(1970, 1, 1, 12, 0),
 datetime.datetime(1970, 1, 1, 15, 0),
 datetime.datetime(1970, 1, 1, 18, 0),
 datetime.datetime(1970, 1, 1, 21, 0),
 datetime.datetime(1970, 1, 2, 0, 0),
 datetime.datetime(1970, 1, 2, 3, 0),
 datetime.datetime(1970, 1, 2, 6, 0),
 datetime.datetime(1970, 1, 2, 9, 0),
 datetime.datetime(1970, 1, 2, 12, 0),
 datetime.datetime(1970, 1, 2, 15, 0),
 datetime.datetime(1970, 1, 2, 18, 0),
 datetime.datetime(1970, 1, 2, 21, 0),
 datetime.datetime(1970, 1, 3, 0, 0),
 datetime.datetime(1970, 1, 3, 3, 0),
 datetime.datetime(1970, 1, 3, 6, 0),
 datetime.datetime(1970, 1, 3, 9, 0),
 datetime.datetime(1970, 1, 3, 12, 0),
 datetime.datetime(1970, 1, 3, 15, 0),
 datetime.datetime(1970, 1, 3, 18, 0),
 datetime.datetime(1970, 1, 3, 21, 0),
 datetime.datetime(1970, 1, 4, 0, 0),
 datetime.datetime(1970, 1, 4, 3, 0),
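
A list comprehension is a compact way of writing a loop that builds a list. The sketch below, using a shorter range and the made-up names `start` and `step`, shows the comprehension next to the equivalent explicit loop.

```python
from datetime import datetime, timedelta

start = datetime(1970, 1, 1)
step = timedelta(hours=3)

# list comprehension
dates = [start + n * step for n in range(4)]

# the equivalent explicit loop
dates_loop = []
for n in range(4):
    dates_loop.append(start + n * step)

print(dates == dates_loop)    # True
print(dates[-1])              # 1970-01-01 09:00:00
```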


# J. Reading and writing text files

There are many different file formats. Data are often in a specialized binary format. But there are also many datasets that are simple text files. Basic text file commands are included in the core language.

In [55]:
f = open('02_GPS.dat')   # open a data file created by a handheld GPS unit.

# f.close()              # later when we are done with the file, we would close it with this command.

---
### *Exercise*

> Use tab completion to explore the different attributes and methods of the `file` object.

> We will use the `f.readlines()` method to iterate over all of the lines in the file. See what this command returns.

---

In [56]:
f.seek(0)  # This sets the pointer back to the beginning of the file. This allows us to run this
           # block of code many times without reopening the file each time.

for line in f.readlines():        # iterate over each line in the file. Each line is a string.
    data = line.split()           # split the line of text into words, each separated by spaces
    if not data: continue         # Test for an empty list, the same as if data == []
    if data[0] == 'Trackpoint':   # We only want to consider lines that begin with 'Trackpoint', as these hold the data
        print(data[1] + ' ' + data[2] + '   ' + data[3] + ' ' + data[4])

N42 49.822   W70 45.413
N42 49.820   W70 45.415
N42 49.821   W70 45.408
N42 49.824   W70 45.400
N42 49.825   W70 45.393
N42 49.824   W70 45.379
N42 49.821   W70 45.370
N42 49.821   W70 45.362
N42 49.821   W70 45.353
N42 49.816   W70 45.341
N42 49.807   W70 45.330
N42 49.794   W70 45.324
N42 49.784   W70 45.326
N42 49.776   W70 45.339
N42 49.781   W70 45.361
N42 49.786   W70 45.381
N42 49.780   W70 45.400
N42 49.767   W70 45.412
N42 49.750   W70 45.420
N42 49.735   W70 45.430
N42 49.719   W70 45.440
N42 49.701   W70 45.447
N42 49.683   W70 45.455
N42 49.668   W70 45.465
N42 49.652   W70 45.476
N42 49.637   W70 45.489
N42 49.623   W70 45.502
N42 49.611   W70 45.520
N42 49.600   W70 45.537
N42 49.585   W70 45.552
N42 49.574   W70 45.569
N42 49.562   W70 45.586
N42 49.549   W70 45.601
N42 49.536   W70 45.617
N42 49.525   W70 45.632
N42 49.513   W70 45.646
N42 49.499   W70 45.660
N42 49.486   W70 45.675
N42 49.473   W70 45.690
N42 49.463   W70 45.707
N42 49.452   W70 45.722
N42 49.436   W70

Now, let's use this script as a base for pulling the data out. We will use the `int` and `float` functions to convert the strings (the 'words' stored in the list `data`) to numbers. Then, we will store these numbers in a new list. 

In [57]:
f.seek(0)  

latitudes = []     # create empty lists to store numerical values of lat and lon
longitudes = []

for line in f.readlines():
    data = line.split()
    if not data: continue
    if data[0] == 'Trackpoint':
        lat = int(data[1][1:]) + float(data[2])/60   # the index to data[1] is used to trim the 'N' off of the string.
        latitudes.append(lat)
        lon = int(data[3][1:]) + float(data[4])/60   
        longitudes.append(lon)

latitudes, longitudes

([42.83036666666667,
  42.830333333333336,
  42.83035,
  42.8304,
  42.830416666666665,
  42.8304,
  42.83035,
  42.83035,
  42.83035,
  42.83026666666667,
  42.83011666666667,
  42.8299,
  42.82973333333333,
  42.8296,
  42.829683333333335,
  42.829766666666664,
  42.82966666666667,
  42.82945,
  42.829166666666666,
  42.828916666666665,
  42.82865,
  42.82835,
  42.82805,
  42.8278,
  42.827533333333335,
  42.827283333333334,
  42.82705,
  42.82685,
  42.82666666666667,
  42.82641666666667,
  42.826233333333334,
  42.826033333333335,
  42.82581666666667,
  42.8256,
  42.82541666666667,
  42.82521666666667,
  42.824983333333336,
  42.82476666666667,
  42.82455,
  42.82438333333333,
  42.8242,
  42.823933333333336,
  42.8237,
  42.82353333333333,
  42.82335,
  42.82313333333333,
  42.82295,
  42.82275,
  42.82256666666667,
  42.82236666666667,
  42.82215,
  42.82196666666667,
  42.821783333333336,
  42.821616666666664,
  42.82145,
  42.821266666666666,
  42.82106666666667,
  42.8208666
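
The degrees-plus-minutes conversion used in the loop can be wrapped in a small function. The sketch below is one possible version, not part of the original script: the function name `to_decimal_degrees` and the sample line are hypothetical, and it adds the common convention (an assumption here, not something the loop above does) that south and west are negative.

```python
def to_decimal_degrees(hemi_deg, minutes):
    '''Convert e.g. ('N42', '49.822') to signed decimal degrees.'''
    hemisphere = hemi_deg[0]              # 'N', 'S', 'E' or 'W'
    degrees = int(hemi_deg[1:])           # trim the letter, keep the number
    value = degrees + float(minutes) / 60
    if hemisphere in 'SW':                # assumption: south/west are negative
        value = -value
    return value

# Hypothetical line, laid out like the fields used in the loop above.
line = 'Trackpoint N42 49.822 W70 45.413'
data = line.split()

lat = to_decimal_degrees(data[1], data[2])
lon = to_decimal_degrees(data[3], data[4])
print(round(lat, 5), round(lon, 5))   # 42.83037 -70.75688
```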

To write a file, open it with the `'w'` mode, which opens the file for writing (replacing any existing file of that name). This example shows how to write the latitude and longitude to a file, with formatted values. [Learn more about string formatting from the Python documentation](https://docs.python.org/2/library/string.html#format-string-syntax)

In [58]:
f = open('output.dat', 'w')

for lat, lon in zip(latitudes, longitudes):
    f.write('{:4.2f}, {:4.2f}\n'.format(lat, lon))   # str.format style formatting
#     f.write('%4.2f, %4.2f\n' % (lat, lon))         # older %-style formatting (works in Python 2 and 3)

f.close()

The `with` statement shortens the open, work, close sequence and closes the file automatically, even if an error occurs inside the block. For example, we could have done the above example as

In [59]:
with open('output.dat', 'w') as f:
    for lat, lon in zip(latitudes, longitudes):
        f.write('{:4.2f}, {:4.2f}\n'.format(lat, lon))
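
To check that a file was written as intended, it can be read straight back. The following self-contained sketch writes two made-up coordinate pairs to a file in a temporary directory (so nothing in the working directory is touched) and reads them back; the values and the use of `tempfile` are assumptions made for the illustration.

```python
import os
import tempfile

latitudes = [42.8304, 42.8303]      # made-up values
longitudes = [70.7569, 70.7569]

path = os.path.join(tempfile.mkdtemp(), 'output.dat')

with open(path, 'w') as f:          # the file is closed when the block ends
    for lat, lon in zip(latitudes, longitudes):
        f.write('{:.2f}, {:.2f}\n'.format(lat, lon))

print(f.closed)                     # True

with open(path) as f:               # the default mode is 'r' (read)
    lines = f.readlines()
print(lines)                        # ['42.83, 70.76\n', '42.83, 70.76\n']
```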