
# Python for Oil and Gas, Session 2: Control Statements and Data Structures


## TOC:
- Control Statements
- Iteration
- Lists
- Dictionaries
- Tuples


## Introduction


In this session we'll focus on developing core programming skills in Python. Control statements are ubiquitous across programming languages. For example, previously we plotted a simple map of well locations. That map isn't very informative because it just posts the surface locations. It would be more helpful if we could filter the locations by well type and show only producing locations, or perhaps color the locations by formation. As we progress into more sophisticated data science and machine learning topics, we will also need to know how to repeat calculations or manipulations in a loop.




# Control Statements in Python

Control statements are how data are discriminated and filtered.

- Boolean expressions
- Logical operators (`and`, `or`) and identity operators (`is`, `is not`)
- `if`/`elif`/`else` statements
- `for` and `while` loops

## Boolean Expressions

Recall that we previously discussed a boolean *variable*, which is a variable that is assigned either of the values `True` or `False`. We also discussed *statements*, which are lines of code that are read by the interpreter. A **boolean expression** is then, by extension, a statement that results in either of the truth values `True` or `False`.

Here is an example.

In [1]:
15 > 1

True

In the preceding code snippet, I wrote the expression `15 > 1`. The mathematical comparison `>` is exactly as we learned it in grade school. We are saying that '15 is greater than 1'. Python knows that this is true, and so outputs the value `True`. Let's see another example.

In [2]:
15 > 30

False

15 is not greater than 30, so the interpreter output a `False` value.

Let's try comparing two variables to see if they are equal. We must remember that the symbol `=` is used in variable assignment. If we try to compare two variables with `=`, the interpreter should get confused. Let's try it out.

In [10]:
5=5

SyntaxError: cannot assign to literal (Temp/ipykernel_16504/2359669884.py, line 1)

Of course this won't work because the interpreter thinks that we are trying to assign the variable named '5' a value of 5. We can't use numbers as variable names. So what if we use the word 'five' instead?

In [9]:
five = 5
five = five
print(five = five)

TypeError: 'five' is an invalid keyword argument for print()

The interpreter got angry and output an error message. The problem here is that we are asking the interpreter to print out a variable assignment, not a boolean expression. 

We can compare the values of two variables using the `==` symbol for equality. For inequality, we use `!=`.

In [11]:
five = 5
print(five == five)
print(five == 6)

True
False


The complete list of Python's relational operators for comparing variables is found in the next table. We'll call the variable or value that comes before the operator the 'first operand' and the variable or value that comes after the operator the 'second operand'. This table is adapted from *Murach's Python Programming* by Michael Urban and Joel Murach.

|Operator|Name|Way it works|
|---|---|---|
|`>`|Greater than|Returns `True` if first operand is greater than the second operand|
|`<`|Less than|Returns `True` if first operand is less than the second operand|
|`>=`|Greater than or equal to|Returns `True` if first operand is greater than or equal to the second operand|
|`<=`|Less than or equal to|Returns `True` if first operand is less than or equal to the second operand|
|`==`|Equals|Returns `True` if both operands are equal|
|`!=`|Not equal|Returns `True` if operands are not equal|

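**Note:** We discussed declaring variables with `float` type. When we do comparisons, we should not test a `float` for exact equality with another `float`. This is because Python stores `float` values in binary floating point, which cannot represent most decimal fractions exactly, so two results that are mathematically equal can differ in their last digits.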
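
A minimal sketch of the float caveat, assuming the standard-library function `math.isclose` is an acceptable substitute for `==`. The tolerance shown is an illustrative choice, not a value prescribed by this course.

```python
import math

# 0.1 and 0.2 cannot be stored exactly in binary floating point,
# so their sum is not exactly 0.3.
total = 0.1 + 0.2
exact_match = (total == 0.3)

# math.isclose() compares within a tolerance instead of exactly.
# rel_tol=1e-9 is an illustrative choice (it is also the default).
close_match = math.isclose(total, 0.3, rel_tol=1e-9)

print(exact_match)   # False
print(close_match)   # True
```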

## Logical Operators

Python also has a convenient syntax for chaining boolean statements together. You can use the `and` and `or` operators as illustrated in the next examples.

In [12]:
3 >= 2 and 5 < 7

True

In [13]:
3 >= 2 and 5 > 7

False

When using the `and` operator, the statements on both sides of `and` must be `True`. In the second example above, I wrote `3 >= 2 and 5 > 7`. The first boolean statement, `3 >= 2`, is certainly `True`. However, the second statement `5 > 7` is `False`, so the overall statement is `False`.

Let's see how the same statements would work if we used the `or` operator instead of `and`.

In [14]:
3 >= 2 or 5 < 7

True

In [15]:
3 >= 2 or 5 > 7

True

From this example, we see that only *one* of the statements on either side of the `or` operator must be `True` in order for the whole statement to be `True`. If neither the first statement nor the second statement is `True`, the overall statement will be `False`, as seen in the next example.

In [16]:
3 <= 2 or 5 > 7

False

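Python has another pair of comparison operators which will be very important when we talk about conditional control. Instead of writing `==`, there are some situations in which we should be writing `is`, most commonly when checking against `None`. Similarly, there are situations where writing `is not` is preferable to writing `!=`. Note that `is` tests whether two names refer to the same object, not whether two values are equal.

For now, we'll look at examples of how the `is` and `is not` operators work. These examples give the expected answers only because CPython reuses a single object for each small integer, and Python 3.8 and later emit a `SyntaxWarning` when `is` is used with a literal such as `6`.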

In [19]:
six = 6

# This should print out 'True'.
print(six is 6)

# This should print out 'False'.
print(six is 5)

True
False




In [20]:
# This should print out 'False'.
print(six is not 6)

# This should print out 'True'.
print(six is not 5 and six is not 7)

False
True




We can also use parentheses to chain together boolean statements in more creative and useful ways. Similarly, we can chain together several compound statements to form even larger compound statements. Let's look at some examples.

In [21]:
age = 36
handedness = 'right'
city = 'Calgary'

# Chain of booleans using 'and' with parentheses.
print((age < 45 and handedness == 'right') and (age < 36 or handedness != 'left'))

# Using parentheses to create compound statement.
print((age < 34 or city == 'Calgary') and handedness == 'right')

True
True


## Comparing Strings

Variables of the type `str` are different than the numerical variables of type `int` and `float`. In this section, we'll make sense of statements such as `'hello' < 'Hello'`, with the intention of avoiding semantic errors in the future.

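Python compares strings from left to right, one character at a time, and the first pair of characters that differ decides the result. Each character is ranked by its Unicode code point. From highest to lowest value, the main groups (also called the 'sort sequence') are:

1. Lowercase letters, alphabetically ordered
2. Uppercase letters, alphabetically ordered
3. Digits 0-9

Special characters do not form a single group: some rank below the digits (such as `#`), some sit between the groups (such as `@`, which falls between the digits and the uppercase letters), and a few rank above the lowercase letters (such as `~`).

Therefore, lowercase characters have the 'top' value. Next in value are the uppercase characters, followed by digits. Within each group, later letters or digits rank higher, so `'a' < 'b'` and `'1' < '2'`. Here are some examples.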

**Note:** As stated above, this is most useful for avoiding semantic errors. In practice, we usually only compare string equality.

In [22]:
'hello' < 'hEllo'

False

In [23]:
'1hello' < '2hello'

True

In [24]:
'apple' > 'Apple'

True

In [25]:
'0' > 'A' or 'A' > 'a'

False

In [26]:
'#' < '@'

True
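
The comparisons above can be checked against the underlying Unicode code points. This is a small sketch using the built-in `ord()` function; the characters chosen are just the ones that appear in the preceding cells.

```python
# ord() returns the Unicode code point that Python uses to rank a character.
code_points = {ch: ord(ch) for ch in ['#', '0', '@', 'A', 'a']}
print(code_points)   # {'#': 35, '0': 48, '@': 64, 'A': 65, 'a': 97}

# Sorting puts digits first, then uppercase letters, then lowercase letters.
ordered = sorted(['a', 'A', '0'])
print(ordered)       # ['0', 'A', 'a']
```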

In [1]:
'hello' is 'hello'


True

We can use one of Python's most powerful features to manipulate strings. This is something we hinted at last week, and we'll continue to hint at until we finally start to define our own objects.

In Python, *everything* is an object. You may have encountered object-oriented programming in past experiences with other programming languages. The short introduction to an object is this: objects have **attributes** and **methods**. An **attribute** is some defined property of the object, and a **method** is a function specific to the object.

Let's remove some of the mystery around objects. Our first example of an object is the `str` data type. We can access the 'uppercase-ness' of a string by typing `string_name.isupper()`.

In [2]:
'string'.isupper()

False

Not surprisingly, the interpreter says that `'string'` is not uppercase. We called the `isupper()` method by using dot notation `.`.

Let's call two other useful methods, `lower()` and `upper()`.

In [31]:
print('string'.upper())
print('STRING'.lower())

STRING
string


When comparing strings, it is often most helpful to change the strings to either uppercase or lowercase due to the confusing values given to characters.
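
As a brief sketch of that advice, the two formation names below are hypothetical sample values that differ only in case. Converting both to lowercase before comparing removes the difference.

```python
# Two spellings of the same formation name (hypothetical sample values).
name_a = 'Montney'
name_b = 'MONTNEY'

raw_equal = (name_a == name_b)
normalized_equal = (name_a.lower() == name_b.lower())

print(raw_equal)          # False: the case differs
print(normalized_equal)   # True: both sides compared in lowercase
```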

## Conditional Control

Here is where things get really interesting. What we did above with boolean statements and compound statements is necessary to understand the language of Python. However, they don't tell us much about *programming*. That is to say, they don't help us put statements and expressions together to accomplish various tasks. This is where we introduce conditional control, which is a way for a program to use boolean statements to decide what to do next.

These control statements are present in every programming language. Here is an example of how Python handles them.

In [32]:
msg = 'This is my message.'
decision_value = 5

# Here is the control statement.
if decision_value > 4:
    print(msg)

This is my message.


In [33]:
msg = 'This is my message.'
decision_value = 3

# Here is the control statement.
if decision_value > 4:
    print(msg)

In a conditional `if` statement, the line containing the keyword `if` always must end with a colon `:`. Note that it makes no difference if we place the entire conditional statement on the same line.

In [34]:
if decision_value > 4: print(msg)

This is, however, bad practice. Long blocks of code with conditional statements written in this way are difficult to read and maintain. For example, if you encountered the following code, you might not immediately see what the `if` statement is really doing.

In [35]:
msg = 'Fatal error: formatting hard drive now...'
msg1 = 'Program executed successfully.'
decision = 3

if decision > 4: print(msg1)
print(msg)

Fatal error: formatting hard drive now...


Earlier we discussed comparing numerical values with the relational operators `>`, `<`, `>=`, `<=`, `==`, and `!=`. These operators are combined with a conditional `if` statement to direct Python programs, as in the following example.

In [36]:
votes_to_win = 50
votes = 45

if 0 < votes and votes < votes_to_win:
    print('Not enough votes to win.')

Not enough votes to win.


The previous example is purely illustrative. Python allows relational operators to be chained, which greatly simplifies the `if` statement in the previous example.

In [37]:
votes_to_win = 50
votes = 45

if 0 < votes < votes_to_win:
    print('Not enough votes to win.')

Not enough votes to win.


It is also best practice to avoid comparing a variable directly to a boolean value. By default, most values in Python are treated as `True` when they are used as a condition.

Values treated as `False` include `False`, `None`, `0`, and `0.0`, among others that we will discuss next week.
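
The built-in `bool()` function reports the truth value Python assigns to any object. The sketch below uses an arbitrary set of sample values; the empty string `''` is included as one of the additional values that count as `False`.

```python
# bool() shows the truth value Python assigns to a value in a condition.
samples = [False, None, 0, 0.0, '', 1, -2.5, '0', 'False']
truth_values = [bool(value) for value in samples]

for value, truth in zip(samples, truth_values):
    print(repr(value), '->', truth)

# Note that the non-empty strings '0' and 'False' are both treated as True.
```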

Not comparing variables directly to boolean values avoids problems brought on by Python's dynamic typing. Recall that we may declare an `int` variable and then change it arbitrarily to the `str` type. This can result in confusion, as shown in the next example.

In [38]:
var = 0

if var == False:
    print('This is confusing.')
    
var = '0'

if var == False:
    print('This is also confusing.')

This is confusing.


The confusion arises because it may be that we wanted `var` to be `False`, but changing it to a `str` made its boolean value `True`. 

Therefore, to check a boolean value, instead of writing `if var == False:`, we write `if not var:`.

Similarly, instead of writing `if var == True:`, we write `if var:`.

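Also, when using the `==` operator, Python simply checks if the two variables have the same *value*. Using `is` instead of `==` results in Python checking if the two variables are *the same object*. For numbers and strings, `==` is the safe choice: the `var is 1` test in the next cell succeeds only because CPython reuses one object for each small integer, and it triggers a `SyntaxWarning` in Python 3.8 and later.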

In [39]:
var = 0

if not var:
    print('This is less confusing.')
    
var = 1

if var:
    print('Ahh, much better.')

if var is 1:
    print('And less characters in the code makes it easier to read.')

This is less confusing.
Ahh, much better.
And less characters in the code makes it easier to read.



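### *Further examples*

The next cells contrast `==`, which compares values, with `is`, which compares identity. Whether two equal strings happen to be the same object is an implementation detail of the interpreter, so the `is` results and memory addresses shown here can differ between Python versions and sessions.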

In [40]:
str(0) == '0'

True

In [41]:
str(0) is '0'


False

In [42]:
'0' is '0'


True

In [43]:
print(hex(id(str(0))))
print(hex(id('0')))

0x269bdc5d230
0x269b96d41f0


In [44]:
str('0') is '0'


True

In [45]:
print(hex(id(str('0'))))
print(hex(id('0')))

0x269b96d41f0
0x269b96d41f0
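
The identity results above depend on interpreter internals. The one place where `is` is the conventional and dependable choice is a comparison with `None`. The sketch below assumes a hypothetical `porosity` variable that may be missing.

```python
# Hypothetical example: a measurement that may be missing.
porosity = None

# Identity comparison with None is the conventional use of `is`.
if porosity is None:
    status = 'no measurement'
else:
    status = 'measured'
print(status)       # no measurement

# Value comparison with == is the right tool for numbers and strings.
six = 6
values_match = (six == 6)
print(values_match)  # True
```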


For writing alternate decisions in Python, we use the `elif` (else if) and `else` keywords. Here is an example.

In [4]:
user_var = input('What is the acronym for steam assisted gravity drainage? ').lower()

# The answer was converted to lowercase, so compare against lowercase strings.
if user_var == 'sagd':
    print('Excellent!')
elif user_var == 'sad':
    print("Nope, that's not it.")
else:
    print('The answer is SAGD.')

What is the acronym for steam assisted gravity drainage? SAGD
Excellent!


## Computing through Repetition: Iteration

It is almost always necessary to repeat tasks when writing code to do any programming task. For example, a music player shouldn't just play a single song and then shut down. We accomplish the repetition of programming tasks through *iteration*. The most basic methods for iteration are the `while` and `for` loops.

### `while` Loops

A `while` loop is commonly used when it is unknown when a given task should terminate. For example, in numerical analysis, a branch of applied mathematics, a given procedure will terminate when a specific error estimate is below a given threshold. 

Here's a simple example of a `while` loop using [Stoscheck's approximation of $\pi$](http://mathworld.wolfram.com/PiApproximations.html), $2^9/163$. The loop keeps increasing the exponent until the difference between the variable `pi_estimate` and Python's value of $\pi$ is within $10^{-3}$. In other words, until $|\text{pi\_estimate} - \pi| \leq 10^{-3}$.

In [5]:
import math

# Stoscheck's approximation of pi.
power = 0
DENOMINATOR = 163
TOLERANCE = 1e-3  # 10**-3

pi_estimate = 2**power/DENOMINATOR

while abs(pi_estimate - math.pi) > TOLERANCE:
    power = power + 1
    # You can also write power += 1
    pi_estimate = 2**power/DENOMINATOR

print("Stoscheck's approximation to pi is {}.".format(pi_estimate))
print("The value of pi given by Python is {}.".format(math.pi))
print("The procedure repeated {} times.".format(power))

Stoscheck's approximation to pi is 3.1411042944785277.
The value of pi given by Python is 3.141592653589793.
The procedure repeated 9 times.


### A Caution about `while` Loops

Python, and computers in general, lack the decision-making power to do anything but what we tell them. Therefore, when you write a `while` loop, you need to ensure that there is some condition included so that the loop will eventually stop. If you don't do this, you will create an **infinite loop**. Valuable memory resources will be taken up by this infinite (non-terminating) loop and your program won't continue.

To avoid this in `while` loops, you can declare a 'counter' variable that keeps track of the current iteration of the loop. You can include an `if` clause containing a `break` statement so that the `while` loop terminates when a specified iteration is reached.

In [None]:
a = 5
counter = 0

while a > 4:
    print('This is an infinite loop. I hope it ends soon...')
    counter += 1
    if counter > 5:
        break

In [None]:
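# Warning: without the counter check, this loop never terminates.
# Interrupt the kernel to stop it.
a = 5
counter = 0

while a > 4:
    print('This is an infinite loop. I hope it ends soon...')
    counter += 1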

### `for` Loops

This style of loop is more commonly used than a `while` loop in most applications. The upper limit for iteration is set before the `for` loop begins. Use a `for` loop when you know how many times a given procedure must be completed.

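### The `range()` Function
The built-in `range()` function produces a sequence of integers for a loop to step through. Unpacking it with `*` inside `print()` shows the values it generates.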


In [8]:
print(*range(0,10))

0 1 2 3 4 5 6 7 8 9


Suppose we want to carry out some procedure 10 times. In Python, this `for` loop is written using the `range(<int>)` function. The programmer specifies the upper limit for iterations (or terminating index) as the argument to the `range()` function. Let's see the Python `for` loop.

In [10]:
for index in range(10):
    # compute some task here
    print(index)

0
1
2
3
4
5
6
7
8
9


You can also use a custom-defined `range()`.

In [11]:
for i in range(2,12):
    print(i)

2
3
4
5
6
7
8
9
10
11


The general format for the `range()` function is `range(start, stop, step)`. Only `stop` is required; the defaults are `start = 0` and `step = 1`. All three arguments must be integers, so a non-integer upper limit causes an error.

In [None]:
for i in range(10.1):
    print(i)
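
The three-argument form can be sketched as follows. Wrapping `range()` in `list()` is only a convenience for displaying the values, and converting the float limit with `int()` is one possible workaround, not the only one.

```python
# range(start, stop, step): the stop value is excluded from the result.
evens = list(range(0, 10, 2))
print(evens)        # [0, 2, 4, 6, 8]

# A negative step counts down.
countdown = list(range(5, 0, -1))
print(countdown)    # [5, 4, 3, 2, 1]

# A float limit can be converted to an integer first.
limit = int(10.1)
last_index = list(range(limit))[-1]
print(last_index)   # 9
```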

You'll notice that the upper limit of the `range()` is never reached, because the end value is excluded. This pairs naturally with the fact that Python is a **zero-indexed** language, which means that `range()` starts at 0 by default. **Zero-indexing** will take on more meaning when we talk about lists later in this session. For now, it means that any iteration index will start at 0 by default. So, instead of a loop starting at the 'first' iteration, the loop starts at the 'zero-th' iteration.

Therefore, when we call `range(10)`, we're saying that we want 10 iterations **starting at 0**. This naturally gives the iteration indices 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9.

Perhaps a more useful example of using the `for` loop with a conditional expression is given in the next cell. This short program uses the `random` module that comes packaged with Python. The program begins by generating a random integer between 1 and 10. Iterating through the `range()`, the program stops when the iteration index is the same number as the randomly generated integer.

In [13]:
import random

# Generate the random integer.
random_int = random.randint(1,10)

# Loop until we find the random integer.
for i in range(11):
    print(i)
    if i == random_int:
        print('I found the random integer! It was {}.'.format(i))
        break

0
1
2
3
4
5
6
7
8
9
10
I found the random integer! It was 10.


This short program illustrates a few new concepts. First, we imported the `random` module, which will be very useful for your first assignment.

Second, we used the `break` statement. This statement exits the `for` loop when it is called. In our example, the `break` statement is called when the iteration index matches the value of the random integer.

Similar to the `break` statement is the `continue` statement. However, instead of exiting the `for` loop, the `continue` statement 'cancels' the current iteration and moves on, or 'continues' to the next iteration. An example of this is in the next cell.

In [14]:
for i in range(10):
    if i == 5:
        continue
    else:
        print(i)

0
1
2
3
4
6
7
8
9


You can see in the output that the iteration with index 5 was skipped.

### String Methods

- The `str` type is one example of an object in Python.
- String methods can be accessed using dot notation. Some useful methods for the `str` object are:

|Method|Use|Example|
|---|---|---|
|`isupper()`|Returns `True` if all letters in the string are uppercase.|`'string'.isupper()` will return `False`|
|`islower()`|Returns `True` if all letters in the string are lowercase.|`'string'.islower()` will return `True`|
|`upper()`|Converts a string to uppercase.|`'string'.upper()` becomes `'STRING'`|
|`lower()`|Converts a string to lowercase.|`'HELLO'.lower()` becomes `'hello'`|
|`format()`|Replaces `{}` within the string with the argument of `format()`|`'Hello {}'.format('world!')` gives `'Hello world!'`|

## Lists
Lists are declared using square brackets: `[]`. A single list can contain strings, integers, floats, and booleans.


In [None]:
this_list = ['lists', 22, 3.14, True]
print(this_list)

Adding entries to a list is quite easy. See below:

In [None]:
this_list.append('gamma')
print(this_list)

this_list.append(['porosity','permeability','water_saturation'])
print(this_list)

Lists of lists are an essential concept and are used frequently in Python.

In [None]:
another_list = [['resistivity','gamma'],[1.65,2.67]]
another_list.append(['sand','shale'])
print(another_list)
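
Because `append()` adds its argument as a single element, appending a list creates a nested list. The sketch below contrasts it with the list methods `extend()` and `pop()`; the `logs` list is a hypothetical example.

```python
logs = ['resistivity', 'gamma']

# append() adds its argument as ONE element, even when it is a list.
appended = logs.copy()
appended.append(['porosity', 'permeability'])
print(appended)   # ['resistivity', 'gamma', ['porosity', 'permeability']]

# extend() adds each element of its argument separately.
extended = logs.copy()
extended.extend(['porosity', 'permeability'])
print(extended)   # ['resistivity', 'gamma', 'porosity', 'permeability']

# pop() removes and returns the last element (or the one at a given index).
removed = extended.pop()
print(removed)    # permeability
print(extended)   # ['resistivity', 'gamma', 'porosity']
```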

Lists are indexed with square brackets, starting from 0. Nested lists take one index per level.

In [None]:
print(this_list[1])
print(another_list[0][1])
print(another_list[0])

Strings are not lists, but they are sequences too, and can be indexed and sliced in the same way as a list.

In [None]:
this_string = 'Strings are lists too'

print(this_string[0:7])
print(this_string[8:])
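
A short sketch of where the similarity between strings and lists ends: both support indexing and slicing, but only lists can be changed in place. The `letters` list is introduced here purely for illustration.

```python
this_string = 'Strings are lists too'
letters = list(this_string[0:7])   # ['S', 't', 'r', 'i', 'n', 'g', 's']

# Negative indices count from the end, for strings and lists alike.
print(this_string[-3:])   # too
print(letters[-1])        # s

# A list element can be replaced in place...
letters[0] = 's'

# ...but a string cannot: item assignment raises TypeError.
try:
    this_string[0] = 's'
    string_changed = True
except TypeError:
    string_changed = False
print(string_changed)     # False
```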

## Dictionaries
Dictionaries are a very useful data structure and are relied on in many Python packages for defining inputs.

Dictionaries are declared using curly brackets: `{}`

and the basic structure is: `{'UWI':'16-09-010-09W1', 'proppant_placed': 120, 'stages':32}`

JSON and GeoJSON are very popular data formats. Once loaded into Python, their objects are just dictionaries.

In [16]:
well_dict =  {'UWI':'16-09-010-09W1', 'proppant_placed': 120, 'stages':32}
print(well_dict)

{'UWI': '16-09-010-09W1', 'proppant_placed': 120, 'stages': 32}


We can print just the keys of the dict:

In [17]:
print(well_dict.keys())

dict_keys(['UWI', 'proppant_placed', 'stages'])


Dictionaries can be nested, which is how JSON and GeoJSON data are typically structured.

In [21]:
nested_dict = { '16-09-010-09W1': {'proppant_placed': 120, 'stages':32},
                '16-09-012-09W1': {'proppant_placed': 88, 'stages':24}}
print(nested_dict.keys())


dict_keys(['16-09-010-09W1', '16-09-012-09W1'])


Indexing dictionaries works very similarly to lists, except that we index by key instead of by position.

In [31]:
print(nested_dict['16-09-010-09W1']['proppant_placed'])

120
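
Key-based indexing combines naturally with the loops and conditionals from earlier in this session. The sketch below filters wells by proppant placed; the threshold of 100 and the missing UWI are illustrative assumptions.

```python
nested_dict = {'16-09-010-09W1': {'proppant_placed': 120, 'stages': 32},
               '16-09-012-09W1': {'proppant_placed': 88, 'stages': 24}}

# items() yields (key, value) pairs; here each value is itself a dict.
large_jobs = []
for uwi, completion in nested_dict.items():
    if completion['proppant_placed'] > 100:   # illustrative threshold
        large_jobs.append(uwi)
print(large_jobs)   # ['16-09-010-09W1']

# get() returns a default instead of raising KeyError for a missing key.
missing = nested_dict.get('16-09-099-09W1', 'not found')
print(missing)      # not found
```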


We can add elements to the dict using this syntax.

In [33]:
nested_dict['16-09-013-09W1'] = {}

nested_dict['16-09-013-09W1']['proppant_placed'] = 92
nested_dict['16-09-013-09W1']['stages'] = 29

print(nested_dict)

{'16-09-010-09W1': {'proppant_placed': 120, 'stages': 32}, '16-09-012-09W1': {'proppant_placed': 88, 'stages': 24}, '16-09-013-09W1': {'proppant_placed': 92, 'stages': 29}}
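
A nested dictionary like this one has the same shape as a JSON document. As a rough sketch using the standard-library `json` module, the dictionary can be converted to JSON text and back without loss.

```python
import json

nested_dict = {'16-09-010-09W1': {'proppant_placed': 120, 'stages': 32},
               '16-09-012-09W1': {'proppant_placed': 88, 'stages': 24}}

# dumps() serializes the dict to a JSON-formatted string.
json_text = json.dumps(nested_dict, indent=2)
print(json_text)

# loads() parses JSON text back into Python dicts.
restored = json.loads(json_text)
print(restored == nested_dict)   # True
```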


## Tuples
A tuple is an ordered sequence like a list, but unlike a list, once a tuple is created it can't be modified.

In [35]:
# Empty tuple
facies_tuple = ()
print(facies_tuple)

# Tuple having integers
facies_tuple = (1, 2, 3)
print(facies_tuple)

()
(1, 2, 3)
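
The sketch below shows what "can't be modified" means in practice: item assignment raises a `TypeError`, while reassigning the variable name to a new tuple is allowed. The variable names other than `facies_tuple` are illustrative.

```python
facies_tuple = (1, 2, 3)

# Item assignment is not allowed on a tuple.
try:
    facies_tuple[0] = 9
    modified = True
except TypeError as err:
    modified = False
    print('Cannot modify a tuple:', err)

# The name can still be reassigned to a NEW tuple.
facies_tuple = facies_tuple + (4,)
print(facies_tuple)   # (1, 2, 3, 4)

# A one-element tuple needs a trailing comma; (5) is just the integer 5.
single = (5,)
not_a_tuple = (5)
print(type(single).__name__, type(not_a_tuple).__name__)   # tuple int
```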


# Data Types
We will be working with a variety of data types, and the ML and AI packages we will be using have specific requirements for data type. Also, some packages return a different data type than the one they were given. It's wise to check which data types we are working with.

The built-in function for this is `type()`.

In [37]:
print(type(facies_tuple))
print(type(well_dict))

<class 'tuple'>
<class 'dict'>
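
When the goal is to test for a type rather than print it, the built-in `isinstance()` function is a common companion to `type()`. This is a minimal sketch that reuses the objects defined earlier in this session.

```python
well_dict = {'UWI': '16-09-010-09W1', 'proppant_placed': 120, 'stages': 32}
facies_tuple = (1, 2, 3)

# isinstance() returns True or False, so it fits directly into an if statement.
is_dict = isinstance(well_dict, dict)
print(is_dict)            # True

# A tuple of types checks against several types at once.
is_sequence = isinstance(facies_tuple, (list, tuple))
print(is_sequence)        # True

# type() also works on the values stored inside a dict.
stages_type = type(well_dict['stages']).__name__
print(stages_type)        # int
```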


It's wise to name objects after the type of data they hold. It just makes keeping track of data easier.

# When Things Go Wrong

The risk with loops is that it is possible to write one that runs continuously without stopping, and we need a way to stop it. Let's investigate how to do that.

## Python Kernels

A Python kernel is the computational engine that runs the code in a notebook or a script. Every evaluated cell in the notebook adds onto the kernel's state, so previously evaluated cells provide the computational footing for subsequent cells in a waterfall fashion.

To stop a chunk of code that has a loop with no end, the kernel is interrupted. Interrupting stops the running cell but keeps the variables already held in memory. If the kernel does not respond, it must be restarted. Restarting flushes all of the data held in memory, so the earlier cells need to be run again before work can continue.

At the top of a Jupyter notebook there is a drop-down menu titled Kernel. The menu provides the ability to interrupt and restart the kernel.

# Summary

### Comparisons, Logical Operators, and Conditional Control

- In Python, the comparison operators are:

|Operator|Name|Way it works|
|---|---|---|
|`a > b`|Greater than|Returns `True` if `a` has value greater than `b`|
|`a < b`|Less than|Returns `True` if `a` has value less than `b`|
|`a >= b`|Greater than or equal to|Returns `True` if `a` has value greater than or equal to `b`|
|`a <= b`|Less than or equal to|Returns `True` if `a` has value less than or equal to `b`|
|`a == b`|Equals|Returns `True` if `a` and `b` have the same value|
|`a != b`|Not equal|Returns `True` if `a` and `b` have different value|
|`a is b`|`is`|Returns `True` if `a` and `b` occupy the same space in memory; `a` and `b` are the same object|
|`a is not b`|`is not`|Returns `True` if `a` and `b` occupy distinct spaces in memory|

- We should not use `==` or `!=` to compare floats.
- We can chain together boolean expressions using the `and` and `or` keywords.
- Conditional expressions are written using the following syntax:

```
if <condition 1>:
    # <do this if condition 1 is met>
elif <condition 2>:
    # <do this if condition 2 is met>
else:
    # <do this>
```

### Iteration

- Use `while` loops when the program requires an unknown number of iterations.
- Use `for` loops when the program requires a known number of iterations.
- The `range()` function sets the number of iterations for a `for` loop. By default, `range(<number>)` gives iterations from `0` to `<number> - 1`.
- The syntax for a `while` loop is as follows:

```
while <condition>:
    # <do this as long as the condition is met>
```

- The syntax for a `for` loop is as follows. **Note:** *Any* variable name may be used for the index variable. Using `i` or `index` is just a convention:

```
for index in range(0, 10, 1):  # range(start, stop, step); keyword arguments are not accepted
    # <do this for every index in the above range>
```

- The `break` statement exits a `while` or `for` loop.
- The `continue` statement skips the rest of the current iteration of a loop.

### Lists
- Use `[]` around comma-separated values to create a list
- Use `[ : ]` for indexing and slicing
- Use `append()` to add an element to a list
- Use `pop()` to remove an element

### Dicts
- Use `{}` and `key: value` pairs to define a dict
- Use the dict name and a key in `[]` to index

### Tuples
- Use `()` for tuples; tuples can hold strings or numbers
- Tuples can't be modified once created, but the variable can be overwritten

### Data Types
- Use the `type()` function to check data types
- Name objects according to the data type they hold for easy reference

# Questions