# Motivation

This session we'll focus on developing core programming skills in Python. Control statements are ubiquitous across all programming languages. For example, previously we plotted a simple map of well locations. The map isn't very informative in that it just posts the surface locations. It would be helpful if we could filter the locations by well type and show only producing locations, or perhaps color the locations by formation. As we progress into more sophisticated data science and machine learning topics, we will also need to know how to perform repeated calculations or manipulations in a loop.




# Control Statements in Python

Control statements are how data are discriminated and filtered. This section covers:

- Boolean expressions
- Logical operators (`and` and `or`) and identity operators (`is` and `is not`)
- `if`/`elif`/`else` statements
- `for` and `while` loops

## Boolean Expressions

Recall that we previously discussed a boolean *variable*, which is a variable that is assigned either of the values `True` or `False`. We also discussed *statements*, which are lines of code that are read by the interpreter. A **boolean expression** is then, by extension, a statement that results in either of the truth values `True` or `False`.

Here is an example.

In [None]:
15 > 1

In the preceding code snippet, I wrote the expression `15 > 1`. The mathematical comparison `>` is exactly as we learned it in grade school. We are saying that '15 is greater than 1'. Python knows that this is true, and so outputs the value `True`. Let's see another example.

In [None]:
15 > 30

15 is not greater than 30, so the interpreter output a `False` value.

Let's try comparing two variables to see if they are equal. We must remember that the symbol `=` is used in variable assignment. If we try to compare two variables with `=`, the interpreter should get confused. Let's try it out.

In [None]:
5=5

Of course this won't work because the interpreter thinks that we are trying to assign the variable named '5' a value of 5. We can't use numbers as variable names. So what if we use the word 'five' instead?

In [None]:
five = 5
five = five
print(five = five)

The interpreter got angry and output an error message. The problem here is that we are not giving `print()` a boolean expression. Inside a function call, `five = five` is read as an attempt to pass a keyword argument named `five`, which `print()` does not accept.

We can compare the values of two variables using the `==` symbol for equality. For inequality, we use `!=`.

In [None]:
five = 5
print(five == five)
print(five == 6)

The complete list of Python's relational operators for comparing variables is found in the next table. We'll call the variable or value that comes first in a comparison the 'first operand' and the variable or value that comes after the operator the 'second operand'. This table is adapted from *Murach's Python Programming* by Michael Urban and Joel Murach.

|Operator|Name|Way it works|
|---|---|---|
|`>`|Greater than|Returns `True` if first operand is greater than the second operand|
|`<`|Less than|Returns `True` if first operand is less than the second operand|
|`>=`|Greater than or equal to|Returns `True` if first operand is greater than or equal to the second operand|
|`<=`|Less than or equal to|Returns `True` if first operand is less than or equal to the second operand|
|`==`|Equals|Returns `True` if both operands are equal|
|`!=`|Not equal|Returns `True` if operands are not equal|

**Note:** We discussed declaring variables with `float` type. When we do comparisons, we should not test a `float` for exact equality with another `float`. This is because most decimal fractions cannot be represented exactly in binary floating point, so a value of `float` type is usually a very close approximation rather than the exact number.
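The note above can be illustrated with a short sketch. It uses `math.isclose()` from the standard library as one possible alternative to `==`; the variable name `total` and the tolerance `1e-9` are chosen here purely for illustration.

```python
import math

# 0.1 and 0.2 cannot be stored exactly, so their sum is slightly off.
total = 0.1 + 0.2
print(total)                      # 0.30000000000000004
print(total == 0.3)               # False

# Compare within a tolerance instead of testing exact equality.
print(math.isclose(total, 0.3))   # True
print(abs(total - 0.3) < 1e-9)    # True
```

Either tolerance-based check works; which tolerance is appropriate depends on the size of the numbers being compared.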

## Logical Operators

Python also has a convenient syntax for chaining boolean statements together. You can use the `and` and `or` operators as illustrated in the next examples.

In [None]:
3 >= 2 and 5 < 7

In [None]:
3 >= 2 and 5 > 7

When using the `and` operator, the statements on both sides of `and` must be `True`. In the second example above, I wrote `3 >= 2 and 5 > 7`. The first boolean statement, `3 >= 2`, is certainly `True`. However, the second statement `5 > 7` is `False`, so the overall statement is `False`.

Let's see how the same statements would work if we used the `or` operator instead of `and`.

In [None]:
3 >= 2 or 5 < 7

In [None]:
3 >= 2 or 5 > 7

From this example, we see that only *one* of the statements on either side of the `or` operator must be `True` in order for the whole statement to be `True`. If neither the first statement nor the second statement is `True`, the overall statement will be `False`, as seen in the next example.

In [None]:
3 <= 2 or 5 > 7

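Python also has the identity operators `is` and `is not`, which will be very important when we talk about conditional control. Unlike `==` and `!=`, which compare *values*, `is` and `is not` check whether two names refer to *the same object*. There are some situations, such as checking for `None`, in which we should be writing `is` instead of `==`. Similarly, there are situations where writing `is not` is preferable to writing `!=`.

For now, we'll look at examples of how the `is` and `is not` operators work. The comparisons with integer literals below give the expected answers only because CPython reuses a single object for each small integer. Python 3.8 and later also emit a `SyntaxWarning` for `is` with a literal, so use `==` to compare values in real code.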

In [None]:
six = 6

# This should print out 'True'.
print(six is 6)

# This should print out 'False'.
print(six is 5)

In [None]:
# This should print out 'False'.
print(six is not 6)

# This should print out 'True'.
print(six is not 5 and six is not 7)

We can also use parentheses to chain together boolean statements in more creative and useful ways. Similarly, we can chain together several compound statements to form even larger compound statements. Let's look at some examples.

In [None]:
age = 36
handedness = 'right'
city = 'Calgary'

# Chain of booleans using 'and' with parentheses.
print((age < 45 and handedness == 'right') and (age < 36 or handedness != 'left'))

# Using parentheses to create compound statement.
print((age < 34 or city == 'Calgary') and handedness == 'right')

## Comparing Strings

Variables of the type `str` are different from the numerical variables of type `int` and `float`. In this section, we'll make sense of statements such as `'hello' < 'Hello'`, with the intention of avoiding semantic errors in the future.

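The interpreter reads two strings from left to right and compares the characters one at a time, using each character's Unicode code point. For the common ASCII characters, the hierarchy (also called the 'sort sequence') from highest to lowest value is:

1. Lowercase letters, alphabetically ordered
2. Uppercase letters, alphabetically ordered
3. Digits 0-9

Special characters do not form a single group: `#` sorts below the digits, `@` falls between the digits and the uppercase letters, and `~` sorts above the lowercase letters.

Therefore, lowercase letters are considered as having the 'top' value among letters and digits. Next in value are the uppercase letters, followed by the digits. Within each group, later characters have higher values, so `'a' < 'b'` and `'1' < '2'`. Here are some examples.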

**Note:** As stated above, this is most useful for avoiding semantic errors. In practice, we usually only compare string equality.

In [1]:
'hello' < 'hEllo'

False

In [2]:
'1hello' < '2hello'

True

In [3]:
'apple' > 'Apple'

True

In [4]:
'0' > 'A' or 'A' > 'a'

False

In [5]:
'#' < '@'

True

In [6]:
'hello' is 'hello'

True
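The results above can be checked against the underlying character values. This is a minimal sketch using the built-in `ord()` function, which returns the Unicode code point of a single character; the particular characters are chosen only for illustration.

```python
# Code points of a few characters, from lowest to highest.
print(ord('#'), ord('0'), ord('@'), ord('A'), ord('a'))   # 35 48 64 65 97

# The first differing characters decide the comparison: 'e' versus 'E'.
print(ord('e') < ord('E'))    # False
print('hello' < 'hEllo')      # False, for the same reason
```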

We can use one of Python's most powerful features to manipulate strings. This is something we hinted at last week, and we'll continue to hint at until we finally start to define our own objects.

In Python, *everything* is an object. You may have encountered object-oriented programming in past experiences with other programming languages. The short introduction to an object is this: objects have **attributes** and **methods**. An **attribute** is some defined property of the object, and a **method** is a function specific to the object.

Let's remove some of the mystery around objects. Our first example of an object is the `str` data type. We can access the 'uppercase-ness' of a string by typing `string_name.isupper()`.

In [None]:
'string'.isupper()

Not surprisingly, the interpreter says that `'string'` is not uppercase. We called the `isupper()` method by using dot notation `.`.

Let's call two other useful methods, `lower()` and `upper()`.

In [None]:
print('string'.upper())
print('STRING'.lower())

When comparing strings, it is often most helpful to convert both strings to either uppercase or lowercase first, because the uppercase and lowercase versions of the same letter have different values.
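As a small sketch of that advice, the two strings below differ only in case. The variable names `formation_a` and `formation_b` are made up for this example.

```python
formation_a = 'Clearwater'
formation_b = 'CLEARWATER'

# A direct comparison is case-sensitive.
print(formation_a == formation_b)                   # False

# Converting both sides to lowercase first ignores the case difference.
print(formation_a.lower() == formation_b.lower())   # True
```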

## Conditional Control

Here is where things get really interesting. What we did above with boolean statements and compound statements is necessary to understand the language of Python. However, it doesn't tell us much about *programming*. That is to say, it doesn't help us put statements and expressions together to accomplish various tasks. This is where we introduce conditional control, which is a way for a program to use boolean statements to decide what to do next.

These control statements are present in every programming language. Here is an example of how Python handles them.

In [None]:
msg = 'This is my message.'
decision_value = 5

# Here is the control statement.
if decision_value > 4:
    print(msg)

In [None]:
msg = 'This is my message.'
decision_value = 3

# Here is the control statement.
if decision_value > 4:
    print(msg)

In a conditional `if` statement, the line containing the keyword `if` always must end with a colon `:`. Note that it makes no difference if we place the entire conditional statement on the same line.

In [None]:
if decision_value > 4: print(msg)

This is, however, bad practice. Long blocks of code with conditional statements written in this way are difficult to read and maintain. For example, if you encountered the following code, you might not immediately see what the `if` statement is really doing.

In [None]:
msg = 'Fatal error: formatting hard drive now...'
msg1 = 'Program executed successfully.'
decision = 3

if decision > 4: print(msg1)
print(msg)

Earlier we discussed comparing numerical values with the relational operators `>`, `<`, `>=`, `<=`, `==`, and `!=`. These operators are combined with a conditional `if` statement to direct Python programs, as in the following example.

In [None]:
votes_to_win = 50
votes = 45

if 0 < votes and votes < votes_to_win:
    print('Not enough votes to win.')

The previous example is purely illustrative. Python allows relational operators to be chained arbitrarily. This greatly simplifies the `if` statement in the previous example.

In [None]:
votes_to_win = 50
votes = 45

if 0 < votes < votes_to_win:
    print('Not enough votes to win.')

It is also best practice to avoid comparing a variable directly to a boolean value. By default, most values in Python are considered `True` when they are used as a condition.

Values considered `False` by default include `False`, `None`, `0`, and `0.0`, among others that we will discuss next week.
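The built-in `bool()` function shows how Python treats a value when it is used as a condition. The sketch below checks the values listed above, plus the empty string `''` (one of the 'others'), and two values that are considered `True`.

```python
# Values that are considered False.
print(bool(False), bool(None), bool(0), bool(0.0), bool(''))   # False False False False False

# Most other values are considered True, including the string '0'.
print(bool('0'))    # True
print(bool(-1))     # True
```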

Not comparing variables directly to boolean values avoids problems brought on by Python's dynamic typing. Recall that we may declare an `int` variable and then change it arbitrarily to the `str` type. This can result in confusion, as shown in the next example.

In [None]:
var = 0

if var == False:
    print('This is confusing.')
    
var = '0'

if var == False:
    print('This is also confusing.')

The confusion arises because it may be that we wanted `var` to be `False`, but changing it to a `str` made its boolean value `True`. 

Therefore, to check a boolean value, instead of writing `if var == False:`, we write `if not var:`.

Similarly, instead of writing `if var == True:`, we write `if var:`.

Also, when using the `==` operator, Python simply checks if the two variables have the same *value*. Using `is` instead of `==` results in Python checking if the two variables are *the same object*. 

In [None]:
var = 0

if not var:
    print('This is less confusing.')
    
var = 1

if var:
    print('Ahh, much better.')

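# Compare values with ==. Writing `var is 1` only appears to work because
# CPython caches small integers, and it triggers a SyntaxWarning in Python 3.8+.
if var == 1:
    print('And fewer characters in the code make it easier to read.')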
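The difference between `==` and `is` is easiest to see with two separate objects that hold the same value. The sketch below uses lists, which are covered in a later session; the names `first`, `second`, `alias`, and `result` are made up for this example.

```python
first = [1, 2, 3]
second = [1, 2, 3]    # a separate list with the same contents
alias = first         # a second name for the very same list

print(first == second)    # True: the values are equal
print(first is second)    # False: they are two different objects
print(first is alias)     # True: both names refer to one object

# Checking for None is the most common legitimate use of `is`.
result = None
if result is None:
    print('No result yet.')
```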

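### *Further examples*

The following cells show why `is` should not be used to compare values: whether two equal strings are the same object depends on how the interpreter created them, so the results can differ between Python versions.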

In [8]:
str(0) == '0'

True

In [3]:
str(0) is '0'

False

In [4]:
'0' is '0'

True

In [5]:
print(hex(id(str(0))))
print(hex(id('0')))

0x12acb422810
0x12ac8faabc8


In [6]:
str('0') is '0'

True

In [7]:
print(hex(id(str('0'))))
print(hex(id('0')))

0x12ac8faabc8
0x12ac8faabc8


For writing alternate decisions in Python, we use the `elif` (else if) and `else` keywords. Here is an example.

In [None]:
print('Barely empathetic support system activated...')
user_var = input('How are you feeling today? ').lower()

if user_var == 'happy':
    print('Glad to hear it!')
elif user_var == 'sad':
    print('Sorry to hear that.')
else:
    print('...ok...')
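Because the example above waits for keyboard input, here is a sketch of the same `if`/`elif`/`else` pattern that runs without interaction. The variable names and the depth thresholds are made up for illustration. Note that the branches are tested in order and only the first one whose condition is `True` is executed.

```python
total_depth_m = 850.0   # hypothetical value, for illustration only

if total_depth_m < 500:
    depth_class = 'shallow'
elif total_depth_m < 1500:
    # Reached only when the first condition was False.
    depth_class = 'intermediate'
else:
    depth_class = 'deep'

print(depth_class)   # intermediate
```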

## Computing through Repetition: Iteration

Almost every programming task requires repeating some steps. For example, a music player shouldn't just play a single song and then shut down. We accomplish the repetition of programming tasks through *iteration*. The most basic methods for iteration are the `while` and `for` loops.

### `while` Loops

A `while` loop is commonly used when it is unknown when a given task should terminate. For example, in numerical analysis, a branch of applied mathematics, a given procedure will terminate when a specific error estimate is below a given threshold. 

Here's a simple example of a `while` loop using [Stoschek's approximation of $\pi$](http://mathworld.wolfram.com/PiApproximations.html). This loop continues iterating until the difference between the variable `pi_estimate` and Python's computation of $\pi$ is within $10^{-3}$. In other words, until $|\text{pi_estimate} - \pi| \leq 10^{-3}$.

In [None]:
import math

# Stoschek's approximation of pi.
power = 0
DENOMINATOR = 163
TOLERANCE = 1e-3  # 10**-3; note that 10e-3 would be 0.01

pi_estimate = 2**power/DENOMINATOR

while abs(pi_estimate - math.pi) > TOLERANCE:
    power = power + 1
    # You can also write power += 1
    pi_estimate = 2**power/DENOMINATOR

print("Stoschek's approximation to pi is {}.".format(pi_estimate))
print("The value of pi given by Python is {}.".format(math.pi))
print("The procedure repeated {} times.".format(power))

### A Caution about `while` Loops

Python, and computers in general, lack the decision-making power to do anything but what we tell them. Therefore, when you write a `while` loop, you need to ensure that there is some condition included so that the loop will eventually stop. If you don't do this, you will create an **infinite loop**. Valuable computing resources (mostly processor time) will be taken up by this infinite (non-terminating) loop and your program won't continue.

To avoid this in `while` loops, you can declare a 'counter' variable that keeps track of the current iteration of the loop. You can include an `if` clause containing a `break` statement so that the `while` loop terminates when a specified iteration is reached.

In [None]:
a = 5
counter = 0

while a > 4:
    print('This is an infinite loop. I hope it ends soon...')
    counter += 1
    if counter > 5:
        break

In [None]:
# Warning: this loop has no exit condition, so it never stops on its own.
a = 5
counter = 0

while a > 4:
    print('This is an infinite loop. I hope it ends soon...')
    counter += 1

### `for` Loops

This style of loop is more commonly used than a `while` loop in most applications. The upper limit for iteration is set before the `for` loop begins. Use a `for` loop when you know how many times a given procedure must be completed.

### The `range()` Function

Suppose we want to carry out some procedure 10 times. In Python, such a `for` loop is written using the `range(<int>)` function. The programmer specifies the upper limit for iterations (or terminating index) as the argument to the `range()` function. Let's see the Python `for` loop.

In [None]:
for index in range(10):
    # compute some task here
    print()

Just to make it more clear what's going on, let's print the iteration index at each iteration.

In [None]:
for index in range(10):
    print(index)

You can also use a custom-defined `range()`.

In [None]:
for i in range(2,12):
    print(i)

In [None]:
# Print every second index.
for i in range(0,10,2):
    print(i)

The general format for the `range()` function is `range(start, end, increment)` (the Python documentation calls these `start`, `stop`, and `step`). The default values are `start = 0` and `increment = 1`; `end` must always be supplied. Any non-integer argument, such as the upper limit in the next cell, causes a `TypeError`.

In [None]:
for i in range(10.1):
    print(i)

You'll notice that the upper limit of the `range()` is never reached. This is because `range()` counts up to, but does not include, its end value. In addition, Python is a **zero-indexed** language, which means that `range()` starts at 0 by default. **Zero-indexing** will take on more meaning next week when we talk about lists and dictionaries. For now, it means that any iteration index will start at 0 by default. So, instead of a loop starting at the 'first' iteration, the loop starts at the 'zero-th' iteration.

Therefore, when we call `range(10)`, we're saying that we want 10 iterations **starting at 0**. This naturally gives the iteration indices 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9.
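To see the values a `range()` produces without writing a loop, it can be converted to a list with the built-in `list()` function (lists are covered next week). This is a small sketch; the negative increment in the last line goes beyond the examples above and is included only to show that counting down also works.

```python
print(list(range(10)))          # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(range(2, 12)))       # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(list(range(0, 10, 2)))    # [0, 2, 4, 6, 8]

# The number of iterations equals the upper limit when starting at 0.
print(len(range(10)))           # 10

# A negative increment counts down; the end value is still excluded.
print(list(range(10, 0, -3)))   # [10, 7, 4, 1]
```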

Perhaps a more useful example of using the `for` loop with a conditional expression is given in the next cell. This short program uses the `random` module that comes packaged with Python. The program begins by generating a random integer between 1 and 10, inclusive. Iterating through the `range()`, the program stops when the iteration index is the same number as the randomly generated integer.

In [None]:
import random

# Generate the random integer.
random_int = random.randint(1,10)

# Loop until we find the random integer.
for i in range(11):
    print(i)
    if i == random_int:
        print('I found the random integer! It was {}.'.format(i))
        break

This short program illustrates a few new concepts. First, we imported the `random` module, which will be very useful for your first assignment.

Second, we used the `break` statement. This statement exits the `for` loop when it is called. In our example, the `break` statement is called when the iteration index matches the value of the random integer.

Similar to the `break` statement is the `continue` statement. However, instead of exiting the `for` loop, the `continue` statement 'cancels' the current iteration and moves on, or 'continues' to the next iteration. An example of this is in the next cell.

In [None]:
for i in range(10):
    if i == 5:
        continue
    else:
        print(i)

You can see in the output that the iteration with index 5 was skipped.
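The same pattern can be used to filter data, which is the goal stated in the motivation. The sketch below loops over a short list of made-up `(name, status)` pairs and uses `continue` to skip every well that is not producing. The well names, the status values, and the variable names are assumptions for illustration, and lists are covered properly next week.

```python
# Made-up (name, status) pairs, for illustration only.
wells = [
    ('well_a', 'Producing'),
    ('well_b', 'Abandoned'),
    ('well_c', 'Producing'),
    ('well_d', 'Suspended'),
]

producing_count = 0
for name, status in wells:
    # Skip the rest of this iteration for wells that are not producing.
    if status.lower() != 'producing':
        continue
    producing_count += 1
    print(name)

print(producing_count)   # 2
```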

# Summary

### Comparisons, Logical Operators, and Conditional Control

- In Python, the comparison operators are:

|Operator|Name|Way it works|
|---|---|---|
|`a > b`|Greater than|Returns `True` if `a` has value greater than `b`|
|`a < b`|Less than|Returns `True` if `a` has value less than `b`|
|`a >= b`|Greater than or equal to|Returns `True` if `a` has value greater than or equal to `b`|
|`a <= b`|Less than or equal to|Returns `True` if `a` has value less than or equal to `b`|
|`a == b`|Equals|Returns `True` if `a` and `b` have the same value|
|`a != b`|Not equal|Returns `True` if `a` and `b` have different value|
|`a is b`|`is`|Returns `True` if `a` and `b` occupy the same space in memory; `a` and `b` are the same object|
|`a is not b`|`is not`|Returns `True` if `a` and `b` occupy distinct spaces in memory|

- We should not use `==` or `!=` to compare floats.
- We can chain together boolean expressions using the `and` and `or` keywords.
- Conditional expressions are written using the following syntax:

```
if <condition 1>:
    # <do this if condition 1 is met>
elif <condition 2>:
    # <do this if condition 2 is met>
else:
    # <do this>
```

### String Methods

- The `str` type is one example of an object in Python.
- String methods can be accessed using dot notation. Some useful methods for the `str` object are:

|Method|Use|Example|
|---|---|---|
|`isupper()`|Returns `True` if the string is uppercase.|`'string'.isupper()` will return `False`|
|`islower()`|Returns `True` if the string is lowercase.|`'string'.islower()` will return `True`|
|`upper()`|Converts a string to uppercase.|`'string'.upper()` becomes `'STRING'`|
|`lower()`|Converts a string to lowercase.|`'HELLO'.lower()` becomes `'hello'`|
|`format()`|Replaces `{}` within the string with the argument of `format()`|`'Hello {}'.format('world!')` gives `'Hello world!'`|

### Iteration

- Use `while` loops when the program requires an unknown number of iterations.
- Use `for` loops when the program requires a known number of iterations.
- The `range()` function sets a number of iterations for a `for` loop. By default, `range(<number>)` gives iterations from `0` to `<number> - 1`.
- The syntax for a `while` loop is as follows:

```
while <condition>:
    # <do this as long as the condition is met>
```

- The syntax for a `for` loop is as follows. **Note:** *Any* variable name may be used for the index variable. Using `i` or `index` is just a convention:

```
for index in range(0, 10, 1):  # range(start, end, increment)
    # <do this for every index in the above range>
```

- The `break` statement exits a `while` or `for` loop.
- The `continue` statement cancels the current iteration of a loop and moves on to the next one.

# When Things Go Wrong

The risk with loops is that it is possible to make a loop that runs continuously without stopping, and we need a way to stop it. Let's investigate how to do that.

## Python Kernels

A Python kernel is the computational engine that runs the code in a notebook or a script. Every evaluated cell in the notebook adds onto the kernel, so previously evaluated cells provide the computational footing for subsequent cells in a waterfall fashion.

To stop a chunk of code that has a loop with no end, the kernel is interrupted. Interrupting stops the running cell but keeps the data already held in memory. If the kernel stops responding, it must be restarted. Restarting flushes all of the data held in memory, so the cells have to be evaluated again to rebuild it before you can continue working.

At the top of a Jupyter notebook there is a drop-down menu titled Kernel. The menu provides the ability to interrupt, restart, and shut down the kernel.

# A Gentle Introduction to Pandas

## What is Pandas
Pandas is an importable Python package with a data platform and an enormous number of useful subroutines built in, and it has its own syntax. Pandas is one of the most widely used packages in Python. If you installed the Anaconda data science platform, then Pandas does not need to be installed, as it already is. To use it, it must be imported, and recall that it is imported like so:

`import pandas`

Recall that we can import and assign an alias in one line like so:

`import pandas as pd`

The name Pandas is derived from 'panel data'. Pandas is as close to a spreadsheet as one can get under Python, but it's much more powerful and complicated than a spreadsheet.

The scope of use of Pandas is enormous and could cover many hours and many hundreds of lines of text if we dedicated ourselves solely to learning about Pandas. I don't see much utility in that. I think Pandas has to be absorbed in small doses, and this is the first dose.


## Pandas Series and DataFrames

There are two building blocks in Pandas: the series and the dataframe. In Pandas, a column of data is called a series. A collection of one or more series is called a dataframe. A series has a single data type, such as an integer, a float, a logical True or False, a datetime, a timedelta, a string, or a category. A column in a dataframe cannot have more than one data type, but different columns can have different data types.

On first examination a series looks like a simple list of values, see below.

In [11]:
import pandas as pd

# Recall that any string must be encased by quotes or tick marks, and that each value is delineated with a comma

zone_series = pd.Series(['Wabamun', 'Clearwater', 'McMurray', 'Paleozoic', 'Wabiskaw', 'Unspecified',
 'Mannville', 'Nisku', 'Grand Rapids', 'Keg River', 'Graminia', 'Devonian',
 'Wabiskaw-McMurray', 'Joli Fou', 'Banff', 'Ireton'])

zone_list = ['Wabamun', 'Clearwater', 'McMurray', 'Paleozoic', 'Wabiskaw', 'Unspecified',
 'Mannville', 'Nisku', 'Grand Rapids', 'Keg River', 'Graminia', 'Devonian',
 'Wabiskaw-McMurray', 'Joli Fou', 'Banff', 'Ireton']

But a Series is very different from a list: every value in a Series has an index that follows it through every operation, whereas a list has no persistent index. See below.

In [16]:
# In the eyes of Python a series looks like this:

print('This is an unsorted Series:', zone_series)

# note the index 

print()
print('This is a sorted Series:\n', zone_series.sort_values())

# Note the index of each value is preserved. Now let's look at a list

This is an unsorted Series: 0               Wabamun
1            Clearwater
2              McMurray
3             Paleozoic
4              Wabiskaw
5           Unspecified
6             Mannville
7                 Nisku
8          Grand Rapids
9             Keg River
10             Graminia
11             Devonian
12    Wabiskaw-McMurray
13             Joli Fou
14                Banff
15               Ireton
dtype: object

This is a sorted Series:
 14                Banff
1            Clearwater
11             Devonian
10             Graminia
8          Grand Rapids
15               Ireton
13             Joli Fou
9             Keg River
6             Mannville
2              McMurray
7                 Nisku
3             Paleozoic
5           Unspecified
0               Wabamun
4              Wabiskaw
12    Wabiskaw-McMurray
dtype: object


In [20]:
# In the eyes of Python a list looks like this:

print('This is an unsorted List:\n', zone_list)

# sort() reorders the list in place, so re-running this cell prints an already sorted list

zone_list.sort()
print()
print('This is a sorted List:\n', zone_list)

# Note that a list has no index

This is an unsorted List:
 ['Wabamun', 'Clearwater', 'McMurray', 'Paleozoic', 'Wabiskaw', 'Unspecified', 'Mannville', 'Nisku', 'Grand Rapids', 'Keg River', 'Graminia', 'Devonian', 'Wabiskaw-McMurray', 'Joli Fou', 'Banff', 'Ireton']

This is a sorted List:
 ['Banff', 'Clearwater', 'Devonian', 'Graminia', 'Grand Rapids', 'Ireton', 'Joli Fou', 'Keg River', 'Mannville', 'McMurray', 'Nisku', 'Paleozoic', 'Unspecified', 'Wabamun', 'Wabiskaw', 'Wabiskaw-McMurray']


We will dig into lists more in the next session.

## Loading a Pandas DataFrame

There are a number of ways to load a Pandas dataframe, but we will look at only one technique right now: using a CSV file or an Excel spreadsheet to load a dataframe.

The syntax is simple:

`df = pd.read_csv(filename)` for a CSV file, or `df = pd.read_excel(filename, sheet_name='Sheet1')` for an Excel file. The difference with Excel is that you can also state a tab (sheet) name. If `sheet_name` is omitted, Pandas reads the first sheet.

There are keyword arguments that we can use to dictate how the data is loaded, and these can be found here:

https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

Among the most useful are `skiprows=XX` for skipping rows of data at the top of the file, and `sep=YY` for specifying a delimiter other than a comma (such as a space or tab), where `YY` is a string enclosed by quotes or ticks.
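Here is a minimal sketch of those two arguments in use. To keep it self-contained, it reads from an in-memory text buffer (`io.StringIO`) instead of a file on disk. The file contents, the column names, and the variable names `raw_text` and `demo_df` are all made up for illustration.

```python
import io

import pandas as pd

# Made-up file contents: one title line, then semicolon-delimited data.
raw_text = (
    "Exported by a hypothetical tool\n"
    "well_name;formation;total_depth\n"
    "well_a;Clearwater;650.5\n"
    "well_b;Wabiskaw;702.0\n"
)

# Skip the title line and tell pandas that ';' separates the columns.
demo_df = pd.read_csv(io.StringIO(raw_text), skiprows=1, sep=';')

print(demo_df)
print(demo_df.shape)   # (2, 3)
```

With a real file, the first argument would simply be the file name as a string.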

So let's rebuild the maps from last session and filter the data.

In [22]:
'''
We need to build the kernel and populate it with data
''' 

import pandas as pd

# Now use pandas to import the file. Note the naming convention and the pandas alias of pd for pandas

wells_df = pd.read_csv('Marten_Hills_meta.csv')

## Pandas Columns Indexing

Recall that a Pandas dataframe is a collection of one or more Series, or columns of data. There are several ways to specify or index a column of data in Pandas, but for the time being I will use only one. A column of data in a dataframe can be indexed using the name of the column as a string enclosed in quotes or ticks and enclosed in square brackets, like so:

`wells_df['column_name']`

There is a rule for naming a column in Pandas: the first character should not be a digit. You can force the column name to be a number, but it complicates indexing and causes issues.


A tip for naming columns: use snake case, as in `snake_case`.

### First steps when loading data

It's helpful to make a list of the dataframe column names for easy reference. The `columns` attribute returns them. (It is an attribute rather than a method, which is why it does not have brackets behind it. More on this later.)

In [23]:
# Now let's look at the names of the columns using the pandas columns attribute

cols = wells_df.columns
print(cols)



Index(['UWI', 'Well Type (Simple)', 'Field', 'Pool', 'Formation',
       'Status (Detailed)', 'Status (Simple)', 'Total Depth (TD)',
       'Total Vertical Depth (TVD)', 'Surface Longitude', 'Surface Latitude',
       'Estimated LLR (Well)'],
      dtype='object')


First, let's explore the data a little. How many formations are there?

In [25]:
print(wells_df['Formation'].unique())

['Wabamun' 'Clearwater' 'McMurray' 'Paleozoic' 'Wabiskaw' 'Unspecified'
 'Mannville' 'Nisku' 'Grand Rapids' 'Keg River' 'Graminia' 'Devonian'
 'Wabiskaw-McMurray' 'Joli Fou' 'Banff' 'Ireton']
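To get the count directly, and to take a first step toward filtering, the sketch below uses a small made-up dataframe rather than the real file. The column names `Formation` and `Status (Simple)` come from the column list above, but the rows, the status values (including `'Producing'`), and the variable names are assumptions for illustration.

```python
import pandas as pd

# A small made-up dataframe standing in for wells_df.
demo_df = pd.DataFrame({
    'Formation': ['Clearwater', 'Wabiskaw', 'Clearwater', 'Nisku'],
    'Status (Simple)': ['Producing', 'Abandoned', 'Producing', 'Producing'],
})

# nunique() counts the distinct values in a column.
print(demo_df['Formation'].nunique())   # 3

# A boolean expression on a column keeps only the matching rows.
producing = demo_df[demo_df['Status (Simple)'] == 'Producing']
print(len(producing))                   # 3
```

The comparison inside the square brackets is the same `==` operator introduced earlier, applied to every value in the column at once.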
