# 02 Core language

# A. Variables

Variables are used to store and modify values.

In [98]:
a = 5
b = a + 3.1415
c = a / b

print(a, b, c)

5 8.1415 0.6141374439599582


Note that we did not need to declare variable types (as in Fortran); we could just assign anything to a variable and it works. This is the power of a dynamically typed language, in which types belong to values at run time rather than being declared for variables. Also, we can add different types (`a` is an integer, and we add the float 3.1415 to get `b`). The result is 'upcast' to whatever data type can handle the result; for example, adding a float and an int results in a float.
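
The upcasting described above can be checked directly with the built-in `type` function. This is a minimal sketch; the names `c` and `d` are reused here only for illustration, and it assumes Python 3, where `/` always returns a float.

```python
a = 5            # an int
b = a + 3.1415   # int + float is upcast to float
c = a / 2        # true division returns a float in Python 3
d = a // 2       # floor division of two ints stays an int

print(type(a).__name__, type(b).__name__, type(c).__name__, type(d).__name__)
# int float float int

a = 'five'       # the same name can later be bound to a value of another type
print(type(a).__name__)
# str
```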

Variables can store lots of different kinds of data:

In [None]:
s = 'Ice cream'            # A string
f = [1, 2, 3, 4]           # A list
d = 3.1415928              # A floating point number
i = 5                      # An integer
b = True                   # A boolean value

# B. Conditionals

We can test the values of variables using conditionals. Conditionals return a Boolean value, either `True` or `False`. In arithmetic, `False` behaves like zero and `True` like one; when a number is tested as a condition, zero counts as false and any nonzero value counts as true. Note that assignment (`=`) is different from a test of equality (`==`).

In [99]:
a < 99

True

In [102]:
b > 99

False

In [101]:
a == 5

True
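
A short sketch of how Booleans relate to numbers, and of the difference between `=` and `==`. The name `is_five` is introduced here only for illustration.

```python
a = 5                     # assignment: bind the name a to the value 5
is_five = (a == 5)        # equality test: evaluates to a Boolean

print(is_five)            # True
print(True == 1)          # True, since bool is a subclass of int
print(True + True)        # 2
print(bool(0), bool(42))  # False True
print(1 < a <= 5)         # True, comparisons can be chained
```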

# C. Strings

Strings are made using various kinds of (matching) quotes. Examples:

In [2]:
s1 = 'hello'
s2 = "world"
s3 = '''Strings can 
also go over
multiple lines.'''

You can also 'add' strings using 'operator overloading', meaning that the plus sign can take on different meanings depending on the data types of the variables you are using it on.

In [3]:
print( s1 + ' ' + s2)  # note, we need the space otherwise we would get 'helloworld'

hello world


We can include special characters in strings. For example `\n` gives a newline, `\t` a tab, etc. Notice that the multiple line string above (`s3`) is converted to a single quote string with the newlines 'escaped' out with `\n`.

In [4]:
s3

'Strings can \nalso go over\nmultiple lines.'
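
The difference between the stored form of a string and its printed form can be seen by comparing `repr` with `print`. A minimal sketch, using a made-up string `s`:

```python
s = 'col1\tcol2\nline two'

print(repr(s))   # shows the escape sequences, as the notebook does for a bare variable
print(s)         # interprets them: a tab between the columns, then a new line
print(len(s))    # 18, each escape sequence is a single character
```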

Strings are 'objects' in that they have 'methods'. Methods are functions that act on the particular instance of a string object. You can access the methods by putting a dot after the variable name and then the method name with parentheses (and any arguments to the method within the parentheses). Methods always have to have parentheses, even if they are empty.

In [12]:
s3.capitalize()

'Strings can \nalso go over\nmultiple lines.'

One of the most useful string methods is `split`, which returns a list of the words in a string, with all of the whitespace (actual spaces, newlines, and tabs) removed. More on lists next.

In [13]:
s3.split()

['Strings', 'can', 'also', 'go', 'over', 'multiple', 'lines.']
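
`split` also accepts an explicit separator, which is handy for delimited text. The record below is made up for illustration; `strip` and `join` are other string methods that pair naturally with `split`.

```python
line = '2016-01-15, 12.5, 13.1\n'      # hypothetical comma-separated record

fields = line.strip().split(',')       # remove the trailing newline, then split on commas
print(fields)                          # ['2016-01-15', ' 12.5', ' 13.1']

temperature = float(fields[1])         # float() ignores the leading space
print(temperature)                     # 12.5

print(' '.join(['hello', 'world']))    # hello world
```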

# D. Containers

Often you need lists or sequences of different values (e.g., a timeseries of temperature – a list of values representing the temperature on sequential days). We will look at three containers from the core Python language: lists, tuples, and dictionaries. There are a few more specialized containers (e.g., numpy arrays and pandas dataframes) for use in scientific computing that we will learn much more about later; they behave much like the containers we will learn about here.

## Lists

Lists are perhaps the most common container type. They are used for sequential data. Create them with square brackets with comma separated values within:

In [64]:
foo = [1., 2., 3, 'four', 'five', [6., 7., 8], 'nine']

Note that lists (unlike arrays, as we will later learn) can be heterogeneous. That is, the elements in the list don't have to have the same kind of data type. Here we have a list with floats, ints, strings, and even another (nested) list!

We can retrieve the individual elements of a list by 'indexing' the list. We do this with square brackets, using zero-based indexes – that is `0` is the first element – as such:

In [65]:
foo[0]

1.0

In [66]:
foo[5]

[6.0, 7.0, 8]

In [67]:
foo[5][1]  # Python is sequential, we can access an element within an element using sequential indexing.

7.0

In [68]:
foo[-1]    # This is the way to access the last element.

'nine'

In [69]:
foo[-3]    # ...and the third to last element

'five'

In [70]:
foo[-3][2]   # we can also index strings.

'v'

We can get a sub-sequence from the list by giving a range of the data to extract. This is done by using the format

    start:stop:stride

where `start` is the first element, up to but not including the element indexed by `stop`, taking every `stride` elements. The defaults are to start at the beginning, include through the end, and include every element.

The up-to-but-not-including part is confusing to first time Python users, but makes sense given the zero-based indexing. For example, `foo[:10]` gives the first ten elements of a sequence.

In [71]:
# create a sequence of 10 elements, starting with zero, up to but not including 10.
bar = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [72]:
bar[2:5]

[2, 3, 4]

In [73]:
bar[:4]

[0, 1, 2, 3]

In [74]:
bar[4:]

[4, 5, 6, 7, 8, 9]

In [75]:
bar[::2]

[0, 2, 4, 6, 8]
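
Negative indices and negative strides work in slices too. A few more cases, as a sketch on the same ten-element list:

```python
bar = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

print(bar[-3:])        # [7, 8, 9]
print(bar[::-1])       # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(bar[8:2:-2])     # [8, 6, 4]
print(len(bar[2:7]))   # 5, the length of a slice is stop - start
```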

---
###  *Exercise*

> Use indexing to get the following sequences:
    
    [3, 4, 5]
    
    [9]        # note this is different than just the last element. 
               # It is a sequence with only one element, but still a sequence
    
    [2, 5, 8]

> What happens when you exceed the limits of the list?

    bar[99]
    bar[-99]
    bar[5:99]

---

You can assign values to list elements by putting the indexed list on the left side of the assignment, as

In [76]:
bar[5] = -99
bar

[0, 1, 2, 3, 4, -99, 6, 7, 8, 9]

This works for sequences as well,

In [77]:
bar[2:7] = [1, 1, 1, 1, 1]
bar

[0, 1, 1, 1, 1, 1, 1, 7, 8, 9]

Lists are also 'objects'; they also have 'methods'. Methods are functions that are designed to be applied to the data contained in the list. You can access them by putting a dot and the method name after the variable (called an 'object instance')

In [78]:
bar.sort()    # Note that we don't do 'bar = bar.sort()'. The sorting is done in place.
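
To see why `bar = bar.sort()` would be a mistake, note that `sort` returns `None`. The built-in `sorted` function is the alternative when you want a new list and the original left untouched. A minimal sketch with made-up lists:

```python
bar = [3, 1, 2]
result = bar.sort()      # sorts in place and returns None
print(result)            # None
print(bar)               # [1, 2, 3]

baz = [3, 1, 2]
new_list = sorted(baz)   # returns a new sorted list
print(new_list, baz)     # [1, 2, 3] [3, 1, 2]
```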

---
### *Exercise*

> What other methods are there? In IPython or a Jupyter window, type `bar.` and then `<TAB>`. This will show the possible completions, which in this case is a list of the methods and attributes. You can get help on a method by typing, for example, `bar.pop?`.  See if you can use three methods of the list instance `bar`.


---

## Tuples

Tuples (pronounced `too'-puls`) are sequences that can't be modified and have only a couple of methods (`count` and `index`). Thus, they are designed to be immutable sequences. They are created like lists, but with parentheses instead of square brackets.

In [79]:
foo = (3, 5, 7, 9)
foo[2] = -999  # gives an assignment error

TypeError: 'tuple' object does not support item assignment

Tuples are often used when a function has multiple outputs, or as a lightweight storage container. Because this use is so common, the parentheses around them are optional, and you can assign multiple values at a time.

In [80]:
a, b, c = 1, 2, 3   # Equivalent to '(a, b, c) = (1, 2, 3)'
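
The same unpacking works on a tuple returned by a function. A small sketch using the built-in `divmod`, which returns a (quotient, remainder) tuple:

```python
result = divmod(17, 5)       # a built-in that returns a tuple
print(result)                # (3, 2)

quotient, remainder = divmod(17, 5)   # unpack the tuple into two names
print(quotient, remainder)   # 3 2

# Swapping two values is tuple packing followed by unpacking.
quotient, remainder = remainder, quotient
print(quotient, remainder)   # 2 3
```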

## Dictionaries

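Dictionaries are collections whose values are referenced by arbitrary 'keys' instead of by a (sequential) index. (Since Python 3.7 a dictionary remembers the order in which keys were inserted, but elements are still looked up by key, not by position.) Dictionaries are created using curly braces with keys and values separated by a colon, and key:value pairs separated by commas, as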

In [81]:
foobar = {'a':3, 'b':4, 'c':5}

Elements are referenced and assigned by keys:

In [82]:
foobar['a']

3

In [83]:
foobar['c'] = -99
foobar

{'a': 3, 'b': 4, 'c': -99}

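The keys and values can be extracted using methods of the dictionary class. These return 'view' objects, which can be converted to lists with, for example, `list(foobar.keys())`. (The ordering in the output below comes from an older Python version; since Python 3.7 the keys appear in insertion order.)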

In [84]:
foobar.keys()

dict_keys(['b', 'c', 'a'])

In [85]:
foobar.values()

dict_values([4, -99, 3])

New values can be added simply by assigning a value to a key that does not exist yet:

In [86]:
foobar['spam'] = 'eggs'
foobar

{'a': 3, 'b': 4, 'c': -99, 'spam': 'eggs'}
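
Looking up a key that does not exist raises a `KeyError`, so it is often useful to test for membership first or to supply a fallback. A brief sketch on the same dictionary; the key `'z'` is just an example of a missing key, and the printed key order assumes Python 3.7 or later.

```python
foobar = {'a': 3, 'b': 4, 'c': -99, 'spam': 'eggs'}

print('b' in foobar)         # True, `in` tests the keys
print('z' in foobar)         # False
print(foobar.get('z', 0))    # 0, get returns a default instead of raising KeyError
print(list(foobar.keys()))   # ['a', 'b', 'c', 'spam']
print(len(foobar))           # 4
```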

---
### *Exercise*

> Use a dictionary to create a list-like object that has negative indices, with the indices ranging from -3 to 3 (with arbitrary floating point values).

> Explore the methods of the dictionary object, as was done with the list instance in the previous exercise.


---

# E. Loops

Loops are one of the fundamental structures in programming. Loops allow you to iterate over each element in a sequence, one at a time, and do something with those elements.

*Loop syntax*: Loops have a very particular syntax in Python; this syntax is one of the most notable features for Python newcomers. The format looks like

    for *element* in *sequence*:                # NOTE the colon at the end
        <some code that uses the *element*>     # the block of code that is looped over for each element
        <more code that uses the *element*>     # is indented four spaces (yes four! yes spaces!)
    
    <the code after the loop continues>         # the end of the loop is marked simply by unindented code
    
Thus, indentation is significant to the code. This was done because good coding practice (in almost all languages, C, FORTRAN, MATLAB) typically indents loops, functions, etc. Making indentation significant removes the need for end-of-loop syntax, which gives more compact code.

A simple example is to find the sum of the squares of the sequence 0 through 99,

In [92]:
sum_of_squares = 0
for n in range(100):              # range yields a sequence of numbers from 0 up to but not including 100
    sum_of_squares += n**2        # '+=' is shorthand for 'sum_of_squares = sum_of_squares + n**2';
                                  # the '**' operator is a power, like '^' in other languages

print(sum_of_squares)

328350


You can iterate over any sequence, and in Python (like MATLAB) it is better to iterate over the sequence you want than to loop over the indices of that sequence.

In [93]:
words = ['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']

sentence = ''
for word in words:
    sentence += word + ' '

sentence

'the quick brown fox jumped over the lazy dog '
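
When the index is genuinely needed alongside the element, the built-in `enumerate` supplies both without looping over indices by hand. The sketch below also shows the string method `join`, which builds the sentence without the trailing space left by the loop above.

```python
words = ['the', 'quick', 'brown', 'fox']

for i, word in enumerate(words):   # yields (index, element) pairs
    print(i, word)

sentence = ' '.join(words)         # put a single space between the words
print(sentence)                    # the quick brown fox
```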

The majority of loops that you will write will be `for` loops. These are loops that have a defined number of iterations, over a specified sequence. However, there may be times when it is not clear when the loop should terminate. In this case, you use a `while` loop. This has the syntax

    while <condition>:
        <code>

The condition should be something that can be evaluated when the loop is started, and the variables that determine the condition should be modified in the loop; otherwise the loop never terminates.

Here is an example that numerically integrates 1/x starting from x = 1 with no fixed upper limit; the loop terminates when the increment falls below some small value (NOTE, this is not a great way to do this kind of calculation...).

In [110]:
x = 1.0      # starting value
dx = 0.1     # increment for numerical integration
increment = 1e36   # arbitrary large value

integral = 0.0 # initial value of integral
while increment > 1e-5:
    increment = dx * (1.0/(x+0.5*dx))
    x += dx
    integral += increment

integral

9.209934430626433
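
A smaller example of the same pattern, where the number of iterations is not known in advance. The names `value` and `n_halvings` are made up for this sketch.

```python
value = 100.0
n_halvings = 0

while value > 1.0:        # the condition is re-tested before every pass
    value = value / 2.0   # the loop body changes the variable being tested
    n_halvings += 1

print(n_halvings, value)  # 7 0.78125
```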

# F. Functions

Functions are ways to create reusable blocks of code that can be run with different variable values – the input variables to the function. Functions are defined using the syntax

    def <function name> (var1, var2, ...):
        <block of code...>
        <...defining the function>
        return <return variable(s)>

Functions can be defined at any point in the code, and called at any subsequent point.

In [16]:
def addfive(x):
    return x+5

addfive(3.1415)

8.1415

## Function inputs and outputs

Functions can have multiple input and output values. The documentation for the function can (and should) be provided as a string at the beginning of the function.

In [18]:
def sasos(a, b, c):
    '''return the sum of a, b, and c and the sum of the squares of a, b, and c'''
    res1 = a + b + c
    res2 = a**2 + b**2 + c**2
    return res1, res2

sasos(3, 4, 5)

(12, 50)
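
The two outputs come back as a tuple, so they can be unpacked into separate names, and the documentation string is stored on the function as `__doc__`. A sketch repeating `sasos` so that it runs on its own; `total` and `total_sq` are names chosen for illustration.

```python
def sasos(a, b, c):
    '''return the sum of a, b, and c and the sum of the squares of a, b, and c'''
    res1 = a + b + c
    res2 = a**2 + b**2 + c**2
    return res1, res2

total, total_sq = sasos(3, 4, 5)   # unpack the returned tuple
print(total, total_sq)             # 12 50
print(sasos.__doc__)               # the docstring, also shown by help(sasos)
```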

Function arguments can have default values. You can also pass positional arguments out of order if they are labeled explicitly by name.

In [22]:
def powsum(x, y, z, a=1, b=2, c=3):
    return x**a + y**b + z**c

print( powsum(2., 3., 4.) )
print( powsum(2., 3., 4., b=5) )
print( powsum(z=2., x=3., y=4., c=2) )

75.0
309.0
23.0


---
### *Exercise*

> Verify `powsum(z=2., x=3., y=4., c=2)` is the same as `powsum(3., 4., 2., c=2)`

> What happens when you do `powsum(3., 4., 2., x=2)`?  Why?


---

## Scope

Variables within the function are treated as 'local' variables, and do not affect variables outside of the 'scope' of the function. That is, all of the variables that are changed within the block of code inside a function are only changed within that block, and do not affect similarly named variables outside the function.

In [27]:
x = 5

def changex(x):      # This x is local to the function
    x += 10.         # here the local variable x is changed
    print('Inside changex, x=', x)
    return x

res = changex(x)    # supply the value of x in the 'global' scope.
print(res)          
print(x)            # The global x is unchanged

Inside changex, x= 15.0
15.0
5


Variables from the 'global' scope can be used within a function, as long as the function only reads them and does not assign to them. This technique should generally only be used when it is very clear what value the global variable has, for example, in very short helper functions.

In [31]:
x = 5

def dostuffwitha(a):
    res = a + x       # Here, the global value of x is used, since it is not defined inside the function.
    return res

print(dostuffwitha(3.0))
print(x)

8.0
5
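
What goes wrong when a function assigns to a global name can be sketched as follows. The function names are made up for illustration. Any assignment to `x` inside a function makes `x` local to that function, so reading it before the assignment fails.

```python
x = 5

def read_x():
    return x + 1      # only reads x, so the global value is used

def shadow_x():
    x = 100           # assignment creates a new local x
    return x

def broken():
    x += 1            # x is local here, but has no value yet
    return x

print(read_x())       # 6
print(shadow_x())     # 100
print(x)              # 5, the global x is unchanged

try:
    broken()
except UnboundLocalError as err:
    print('Error:', err)
```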


# G. Classes

Classes are used to define generic objects. The 'instances' of the class are supplied with specific data. Classes define a data structure, 'methods' to work with this data, and 'attributes' that define the data.

###### The computer science way to think of classes

Think of classes in terms of a sentence. The nouns would be the classes, the associated verbs the class methods, and the associated adjectives the class attributes. For example, take the sentence

> The white car signals and makes a left turn.

In this case the object is a `car`, a generic kind of vehicle. We see in the sentence that we have a particular instance of a `car`, a *white* `car`. Obviously, there can be many instances of the class `car`. White is a defining or distinguishing 'attribute' of the car. There are two 'methods' noted: signaling and turning. We might write the code for a `car` object like this:

    class Car(object):
        
        def __init__(self, color):
            self.color = color

        def signal(self, direction):
            <signalling code>

        def turn(self, direction):
            <turning code>
 
###### The scientific way to think about classes

Generally, in science we use objects to store and work with complicated data sets, so it is natural to think of the data structure first, and use that to define the class. The methods are functions that work on this data. The attributes hold the data, and other defining characteristics about the dataset (i.e., metadata). The primary advantage of this approach is that the data are in a specified structure, so that the methods can assume this structure and are thereby more efficient.

For example, consider an (atmospheric, oceanic, geologic) profile of temperature in the vertical axis. We might create a class that would look like:

    class Profile(object):
        '''
        Documentation describing the object, in particular how it is instantiated.
        '''
        def __init__(self, z, temp, lat, lon, time):
            self.z = z            # A sequence of values defining the vertical positions of the samples
            self.temp = temp      # A corresponding sequence of temperature values
            self.lat = lat        # The latitude the profile was taken
            self.lon = lon        # The longitude the profile was taken
            self.time = time      # The time the profile was taken

        def mean(self):
            'return the mean of the profile'
            <code to calculate the mean temperature along the profile>

Note, there could be a number of different choices for how the data are stored, more variables added to the profile, etc. Designing good classes is essential to the art of computer programming. Make classes as small and agile as possible, building up your code from small, flexible building blocks. Classes should be parsimonious and cogent. Avoid bloat.
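
One way the `Profile` outline could be filled in is sketched below. It assumes the temperatures are held in plain Python sequences under an attribute named `temp`, and it uses a simple unweighted mean, which is only appropriate for evenly spaced samples. All of the sample values are made up.

```python
class Profile(object):
    '''
    A vertical temperature profile, created from sequences of depths and
    temperatures plus the position and time of the measurement.
    '''
    def __init__(self, z, temp, lat, lon, time):
        self.z = z            # vertical positions of the samples
        self.temp = temp      # corresponding temperature values
        self.lat = lat        # latitude where the profile was taken
        self.lon = lon        # longitude where the profile was taken
        self.time = time      # time the profile was taken

    def mean(self):
        'return the (unweighted) mean temperature of the profile'
        return sum(self.temp) / len(self.temp)


# Made-up data, for illustration only.
profile = Profile(z=[0., -10., -20., -30.], temp=[20., 18., 15., 11.],
                  lat=29.0, lon=-94.0, time='2016-01-15')
print(profile.mean())   # 16.0
print(profile.lat)      # 29.0
```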

Classes are traditionally named with a capital letter, usually in CamelCase and occasionally as underscored_words_in_a_row, as opposed to functions, which are traditionally lower case (there are many exceptions to these rules, though). When a class instance is created, the special `__init__` method is called to initialize the instance. Within the class, the attributes are stored in `self` with a dot and the attribute name. Methods are defined like normal functions, but within the class block, and the first argument is always `self`.

There are many other special functions that allow you to, for example, overload the addition operator (`__add__`) or have a representation of the class that resembles the command used to create it (`__repr__`).

Consider the example of a class defining a point on a 2D plane,

In [38]:
from math import sqrt     # more on importing external packages below

class Point(object):
    
    def __init__(self, x, y):
        self.x = x
        self.y = y
    
    def norm(self):
        'The distance of the point from the origin'
        return sqrt(self.x**2 + self.y**2)
    
    def dist(self, other):
        'The distance to another point'
        dx = self.x - other.x
        dy = self.y - other.y
        return sqrt(dx**2 + dy**2)
        
    def __add__(self, other):
        return Point(self.x + other.x, self.y + other.y)
    
    def __repr__(self):
        return 'Point(%f, %f)' % (self.x, self.y)
    

p1 = Point(3., 4.)    # a point at location (3, 4)
p2 = Point(6., 8.)    # another point, we can have as many as we want..

res = p1.norm()
print('p1.norm() = ', res)

res = p2.norm()
print('p2.norm() = ', res)

res = p1.dist(p2)
print('The distance between p1 and p2 is', res)

p1+p2

p1.norm() =  5.0
p2.norm() =  10.0
The distance between p1 and p2 is 5.0


Point(9.000000, 12.000000)

Notice that we don't require `other` to be a `Point` class instance; it could be any object with `x` and `y` attributes. This is known as 'duck typing' and is a useful approach for using multiple different kinds of objects with similar data in the same functions.
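
A sketch of this flexibility is given below. `Station` is a hypothetical record type made with `namedtuple`, introduced only for illustration. It is not a `Point`, but it has `x` and `y` attributes, so `dist` accepts it. The `Point` class is trimmed to the parts needed here.

```python
from collections import namedtuple
from math import sqrt

class Point(object):

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def dist(self, other):
        'The distance to any object that has x and y attributes'
        dx = self.x - other.x
        dy = self.y - other.y
        return sqrt(dx**2 + dy**2)

# Hypothetical record type, not part of the original example.
Station = namedtuple('Station', ['name', 'x', 'y'])

p1 = Point(3., 4.)
buoy = Station('buoy', 6., 8.)

print(p1.dist(buoy))   # 5.0
```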

# H. Packages

Functions and classes represent code that is intended to be reused over and over. Packages are a way to store and manage this code. Python has a number of 'built-in' classes and functions that we have discussed above. Lists, tuples, and dictionaries; `for` and `while` loops; and standard data types are part of every Python session.

There is also a very wide range of packages that you can import that extend the abilities of core Python. There are packages that deal with file input and output, internet communication, numerical processing, etc. One of the nice features about Python is that you only import the packages you need, so that the memory footprint of your code remains lean. Also, there are ways to import code that keep your 'namespace' organized.

> Namespaces are one honking great idea -- let's do more of those!

In the same way directories keep your files organized on your computer, namespaces organize your Python environment. There are a number of ways to import packages, for example:

In [3]:
import math     # This imports the math package. Here 'math' is like a subdirectory 
                # in your namespace that holds all of the math functions

---
### *Exercise*

> After importing the math package, type `math.` and hit <TAB> to see all the possible completions. These are the functions available in the math package. Use the math package to calculate the square root of 2.

> There are a number of other ways to import things from the math package. Experiment with these commands

    from math import tanh  # Import just the `tanh` function. Called as `tanh(x)`
    import math.sin        # Raises an ImportError: `sin` is a function, not a module. Use `from math import sin`
    import math as m       # Import the math package, but rename it to `m`. Functions called like `m.sin(x)`
    from math import *     # All the functions imported to top level namespace. Functions called like `sin(x)`
    
> This last example makes things easier to use, but is frowned on as it is less clear where different functions come from.

> For the rest of the 'Zen of Python' type `import this`

---

One particular package that is central to scientific Python is the `numpy` package (*Num*erical *Py*thon). We will talk about this package much more in the future, but will outline a few things about the package now. The standard way to import this package is

In [4]:
import numpy as np

The `numpy` package has many of the same math functions as the `math` package, but these functions are designed to work with numpy arrays. Arrays are the backbone of the `numpy` package. For now, just think of them as homogeneous, multidimensional lists.

In [5]:
a = np.array([[1., 2., 3], [4., 5., 6.]])
a

array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

In [7]:
np.sin(a)

array([[ 0.84147098,  0.90929743,  0.14112001],
       [-0.7568025 , -0.95892427, -0.2794155 ]])

Note that we can have two `sin` functions at the same time, one from the `math` package and one from the `numpy` package. This is one of the advantages of namespaces.

In [9]:
math.sin(2.0) == np.sin(2.0)

True
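
The same holds for any two modules, not just `math` and `numpy`. As a sketch that uses only the standard library, `math` and `cmath` each provide their own `sqrt`, and the namespace prefix makes it clear which one is being called.

```python
import math
import cmath

print(math.sqrt(4.0))      # 2.0, real-valued square root
print(cmath.sqrt(-4.0))    # 2j, complex-valued square root

try:
    math.sqrt(-4.0)        # the math version rejects negative input
except ValueError as err:
    print('math.sqrt failed:', err)
```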

# I. Reading text files

There are many different file formats. Data are often in a specialized binary format. But there are also many datasets that are simple text files. Let's take a look at how to 