## Chapter 4: Control flow

Two fundamental concepts form the backbone of computer programs: decision making (yes or no) and repetition. By default, Python executes code sequentially, one line at a time. However, real-world problems often require more flexible control over that flow. Sometimes we need to check a condition and, depending on the result, execute certain parts of the code either once or multiple times. To make this possible, Python provides tools for conditional and iterative execution.

## Conditional execution

For conditional execution, Python provides the `if-else` statement. The syntax
of this statement is as follows:

```python
if expression:
    statement_1 
else:
    statement_2 
```

Python evaluates the expression in the `if` statement. If it is true, `statement_1` is executed; otherwise, `statement_2` runs. Notice that both the `if` and `else` lines end with a colon (`:`), which signals the start of a block of code. The code inside these blocks must be indented, typically by four spaces. In Python, indentation isn’t just for readability; it is how the language defines which statements belong to which block.

Here are a few examples:

In [None]:
# import libraries required for the notebook
import random # import random library
import requests # import requests (for HTTP requests)
from io import StringIO # import StringIO (for reading the data)
import numpy as np # import numpy as np
import pandas as pd # import pandas as pd

In [None]:
# convert from Celsius to Fahrenheit and vice versa

# define temperature unit and value
t_type = "C"
t_val = 23.0

# do the conversion
if t_type == "C":
    tc_type = "F"
    tc_val = 9 / 5 * t_val + 32
else:
    tc_type = "C"
    tc_val = 5 / 9 * (t_val - 32)

print(f"{t_val:.2f} {t_type} = {tc_val:.2f} {tc_type}")

In [None]:
# the else statement is sometimes not needed

my_vector = np.array([1, 2, 3])

# if my_vector is not a unit vector, make it a unit vector
# (np.isclose avoids an exact floating-point comparison)
if not np.isclose(np.linalg.norm(my_vector), 1.0):
    my_vector = my_vector / np.linalg.norm(my_vector)
    
print(f"vector magnitude = {np.linalg.norm(my_vector):.2f}")

In [None]:
# it is possible to include more than one
# condition using the elif statement

azimuth = 232 # angle between 0 and 360 deg

if azimuth == 0 or azimuth == 360:
    direction = "N"
elif azimuth > 0 and azimuth < 90:
    direction = "NE"
elif azimuth == 90:
    direction = "E"
elif azimuth > 90 and azimuth < 180:
    direction = "SE"
elif azimuth == 180:
    direction = "S"
elif azimuth > 180 and azimuth < 270:
    direction = "SW"
elif azimuth == 270:
    direction = "W"
else:
    direction = "NW"

print(f"Azimuth {azimuth} = {direction} direction")

In the cell above, `and` and `or` are logical operators, used to combine conditions. Another logical operator is `not`, which negates a Boolean value: it returns `False` if the condition is `True`, and vice versa. You can learn more about logical operators [here](https://www.w3schools.com/python/gloss_python_logical_operators.asp).
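The azimuth cell uses `and` and `or` but not `not`. The short sketch below shows `not` on the same azimuth value, together with a chained comparison, which Python accepts as a shorthand for two comparisons joined by `and`. The names `is_cardinal` and `in_sw_quadrant` are illustrative only.

```python
azimuth = 232  # angle in degrees, same value as in the azimuth cell

# True only for 0, 90, 180, 270 and 360 degrees
is_cardinal = azimuth % 90 == 0

# not flips the Boolean value
if not is_cardinal:
    print(f"Azimuth {azimuth} is not a cardinal direction")

# chained comparison: same result as azimuth > 180 and azimuth < 270
in_sw_quadrant = 180 < azimuth < 270
print(f"In SW quadrant: {in_sw_quadrant}")
```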

Let’s move on to something more interesting—reading monthly production data for fields on the NCS directly from Factpages. The code below handles this task by first constructing the appropriate URL. It then uses Python’s [requests](https://pypi.org/project/requests/) library to fetch the data. If the request is successful, the data is read into a DataFrame; otherwise, an error message is displayed.

In [None]:
# read data from NOD factpages

# construct the URL
u_1 = "https://factpages.sodir.no/public?/Factpages/external/tableview/"
u_2 = "&rs:Command=Render&rc:Toolbar=false&rc:Parameters=f"
u_3 = "&IpAddress=not_used&CultureCode=en&rs:Format=CSV&Top100=false"
descriptor = "field_production_monthly"
url = u_1 + descriptor + u_2 + u_3

# request the data
response = requests.get(url)
# if the request was successful
if response.status_code == 200:
    # load csv data into a DataFrame. StringIO wraps the string
    # so it can be read as a file
    df = pd.read_csv(StringIO(response.text))
    # output DataFrame info
    df.info()
else:
    print(f"Error: {response.status_code}")

## Iterative execution

Python offers two types of loops for iterative execution: `while` loops and `for` loops. These are used when you need to run a block of code multiple times.

## While loops

The syntax of the `while` loop is as follows:

```python
initialize_variable

while expression:
    statement
    modify_variable
```

We start by initializing a variable. Then, Python evaluates an expression that involves this variable. If the expression is true, the loop’s body is executed; if not, the program exits the loop. The loop continues to run as long as the expression remains true. It is important to update the variable inside the loop; otherwise, the condition may never become false, resulting in an infinite loop.
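One common way to guard against an accidental infinite loop is to cap the number of iterations. The sketch below is only an illustration of that idea: the halving rule, `tolerance` and `max_iterations` are assumed values, not part of the chapter's examples.

```python
value = 100.0          # initialize the variable
tolerance = 1.0        # stop once value drops below this (assumed)
max_iterations = 1000  # safety cap on the number of passes (assumed)
iterations = 0

while value >= tolerance and iterations < max_iterations:
    value /= 2         # modify the variable so the condition can become false
    iterations += 1

print(f"Stopped after {iterations} iterations, value = {value:.4f}")
```

If the update line were removed, the cap would still end the loop after `max_iterations` passes.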

These are some examples:

In [None]:
# sum numbers from 0 to 99_999 that are multiple of 3 or 5

number = 0 # current number
total = 0 # running total (avoids shadowing the built-in sum)
while number < 100_000:
    # if number is multiple of 3 or 5
    if number % 3 == 0 or number % 5 == 0:
        total += number # add number to total
    number += 1 # increment number

print(f"Sum of numbers = {total}")

In [None]:
# calculate pressure in a column of
# 200 m water and 300 m rock

g = 9.81 # gravity in m/s^2
rho_w = 1000 # water density in kg/m^3
d_w = 200 # water depth in m
p_w = rho_w * g * d_w # pressure at sea bottom in Pa
rho_r = 2700 # rock density in kg/m^3

d = 0.0 # initial depth in m

# table heading: <10 means left aligned with 10 characters
print(f"{'Depth[m]':<10} Pressure[kPa]")
print("-" * 25)

while d <= 500.0: # while depth is <= 500 m
    if d <= d_w: # if in water
        p = rho_w * g * d
    else: # else if in rock
        p = p_w + rho_r * g * (d - d_w)
    # print depth and pressure in kPa
    print(f"{d:<10.1f} {p*1e-3:.1f}") 
    # update depth
    d += 50.0 

The `while` loop is especially useful in situations where the number of iterations isn’t known in advance. For example:

In [None]:
# simulate rolling two dice until double sixes are rolled

attempts = 0
die1 = die2 = 1

while die1 != 6 or die2 != 6:
    die1 = random.randint(1, 6)
    die2 = random.randint(1, 6)
    attempts += 1
    
print(f"It took {attempts} roll(s) to get double sixes!")

We don’t know how many attempts it will take to roll double sixes — it’s entirely random. This example also demonstrates the use of Python’s built-in [random](https://docs.python.org/3/library/random.html) module.
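Because the outcome is random, every run reports a different count. When repeatable results are needed, for example while testing, a seeded generator can be used. This is a minimal sketch: the helper name `rolls_until_double_six` and the seed value 42 are arbitrary choices for illustration, and `random.Random(seed)` creates a separate generator so the module-level `random` functions used elsewhere are not affected.

```python
import random

def rolls_until_double_six(seed):
    """Count rolls of two dice until double sixes appear (hypothetical helper)."""
    rng = random.Random(seed)  # private generator; same seed -> same rolls
    attempts = 0
    die1 = die2 = 1
    while die1 != 6 or die2 != 6:
        die1 = rng.randint(1, 6)
        die2 = rng.randint(1, 6)
        attempts += 1
    return attempts

first = rolls_until_double_six(42)
second = rolls_until_double_six(42)
print(f"Run 1: {first} roll(s), run 2: {second} roll(s)")
```

Both runs report the same number of rolls because they start from the same seed.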

## For loops

The `for` loop has a simpler syntax:

```python
for item in sequence:
    statement  
```

The syntax comprises an `item` and a `sequence`. The sequence can be any iterable collection of data, such as a list, a string, or a NumPy array. When the loop runs, the first element of the sequence is assigned to `item` and the loop body is executed; then the next element is assigned to `item` and the body is executed again, and so on until the sequence is exhausted.

Let’s try the examples above, but this time using a `for` loop:

In [None]:
# sum numbers from 0 to 99_999 that are multiple of 3 or 5

total = 0 # running total (avoids shadowing the built-in sum)

for number in np.arange(100_000): # for number 0 to 99_999
    # if number is multiple of 3 or 5
    if number % 3 == 0 or number % 5 == 0:
        total += number # add number to total

print(f"Sum of numbers = {total}")

In [None]:
# calculate pressure in a column of
# 200 m water and 300 m rock

# input variables as before

# table heading
print(f"{'Depth[m]':<10} Pressure[kPa]")
print("-" * 25)

# depths from 0 to 500 m
ds = np.arange(0, 501, 50) 
# ds = np.linspace(0, 500, 11) # another option

for d in ds: 
    if d <= d_w: # if in water
        p = rho_w * g * d
    else: # else if in rock
        p = p_w + rho_r * g * (d - d_w)
    # print depth and pressure in kPa
    print(f"{d:<10.1f} {p*1e-3:.1f}")  

In the two examples above, we use NumPy’s `arange()` function to create an array that starts at a given value (default is 0) and goes up to, but does not include, the end value, using a specified step size (default is 1). Alternatively, the `linspace()` function can be used to create an array from start to end, but instead of defining the step size, you specify the total number of elements. In short, `arange()` lets you control the step size, while `linspace()` lets you control the number of steps.
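The difference is easiest to see side by side. The sketch below builds the same depth array both ways; `ds_arange` and `ds_linspace` are names chosen for this illustration.

```python
import numpy as np

# arange: start, stop (excluded), step -> stop must exceed 500 to include it
ds_arange = np.arange(0, 501, 50)

# linspace: start, stop (included by default), number of elements
ds_linspace = np.linspace(0, 500, 11)

print(ds_arange.size, ds_linspace.size)     # 11 11
print(np.allclose(ds_arange, ds_linspace))  # True
print(ds_arange.dtype, ds_linspace.dtype)   # integer type vs floating-point type
```

Note that `arange()` with integer arguments returns integers, while `linspace()` returns floating-point values.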

## zip, enumerate and break

In `for` loops, it’s often useful to iterate over more than one sequence (like lists) at the same time—this is where the built-in `zip()` function comes in handy, by pairing elements together from each iterable. Here is one example:

In [None]:
minerals = ["Quartz", "Gypsum", "Talc"]
hardness = [7, 2, 1]

for mineral, hard in zip(minerals, hardness):
    print(f"{mineral} hardness is {hard}")
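One detail worth knowing is that `zip()` stops at the shortest input, so a missing value silently drops the remaining items. The sketch below removes one hardness value on purpose to show this, and adds a simple length check as one possible guard.

```python
minerals = ["Quartz", "Gypsum", "Talc"]
hardness = [7, 2]  # one value missing on purpose

pairs = list(zip(minerals, hardness))
print(pairs)  # [('Quartz', 7), ('Gypsum', 2)]

# a simple guard before looping
if len(minerals) != len(hardness):
    print("Warning: lists differ in length; some items will be skipped")
```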

It is also possible to extract the index of the iteration along with the elements by using the built-in `enumerate()` function, which makes it easy to track the position in the sequence while iterating:

In [None]:
for i, (mineral, hard) in enumerate(zip(minerals, hardness)):
    print(f"{i+1}. {mineral} hardness is {hard}")
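`enumerate()` also accepts a `start` argument, which removes the need for `i+1` when a 1-based count is wanted. A minimal sketch with the same lists, collecting the lines in a list named `lines` (an illustrative choice) before printing:

```python
minerals = ["Quartz", "Gypsum", "Talc"]
hardness = [7, 2, 1]

lines = []
# start=1 makes the counter begin at 1 instead of 0
for i, (mineral, hard) in enumerate(zip(minerals, hardness), start=1):
    lines.append(f"{i}. {mineral} hardness is {hard}")

print("\n".join(lines))
```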

Finally, you can use the `break` statement to stop a loop and exit it. Here’s an example:

In [None]:
number = random.randint(0, 100) # random number between 0 and 100

while True:
    # ask user for a guess, int is used to convert input to int
    guess = int(input("Guess the number: ")) 
    if guess == number: # if guess is correct
        print("You guessed it!")
        break # exit the loop
    elif guess < number: # if guess is too low
        print("Too low!")
    else: # if guess is too high
        print("Too high!")
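`break` works the same way in a `for` loop, for instance to stop as soon as a value of interest is found. The sketch below reuses the water and rock column from earlier in the chapter; the 5000 kPa threshold is an arbitrary value chosen for illustration, and the inputs are redefined so the cell runs on its own.

```python
g = 9.81       # gravity in m/s^2
rho_w = 1000   # water density in kg/m^3
rho_r = 2700   # rock density in kg/m^3
d_w = 200      # water depth in m
p_w = rho_w * g * d_w  # pressure at sea bottom in Pa

threshold = 5000e3  # Pa (arbitrary, illustrative value)
first_depth = None  # stays None if the threshold is never exceeded

for d in range(0, 501, 50):
    if d <= d_w:  # if in water
        p = rho_w * g * d
    else:  # else in rock
        p = p_w + rho_r * g * (d - d_w)
    if p > threshold:
        first_depth = d
        break  # no need to check deeper levels

print(f"First depth above threshold: {first_depth} m")
```

With these inputs the loop stops at 350 m, where the pressure is about 5935 kPa.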

## Vectorization

Vectorized operations act on whole arrays at once instead of looping over individual elements, and they can significantly improve the performance of your code. Below is a vectorized version of the depth versus pressure code above. It gives the same output:

In [None]:
# table heading
print(f"{'Depth[m]':<10} Pressure[kPa]")
print("-" * 25)

water = ds <= d_w # boolean array for water
rock = ds > d_w  # boolean array for rock

# calculate pressures, this gives an array
ps = rho_w * g * ds * water + (p_w + rho_r * g * (ds - d_w)) * rock

for d, p in zip(ds, ps): # iterate over depth and pressure
    # print depth and pressure in kPa
    print(f"{d:<10.1f} {p*1e-3:.1f}")  

The boolean arrays `water` and `rock` indicate the depths at which these materials are present. Pressures (`ps`) are calculated in a single line using vectorized operations. In this calculation, NumPy automatically converts the `water` and `rock` arrays to numbers: `True` values become 1 and `False` values become 0. Finally, the `for` loop outputs the depths and corresponding pressures. The `zip()` function allows simultaneous iteration over the `ds` (depths) and `ps` (pressures) arrays.
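An alternative to multiplying by Boolean masks is NumPy's `np.where(condition, a, b)`, which takes elements from `a` where the condition is true and from `b` elsewhere. The sketch below is one possible variant, not a replacement for the approach above; the inputs are redefined so the cell runs on its own, and `ps_where` and `ps_mask` are illustrative names.

```python
import numpy as np

g = 9.81       # gravity in m/s^2
rho_w = 1000   # water density in kg/m^3
rho_r = 2700   # rock density in kg/m^3
d_w = 200      # water depth in m
p_w = rho_w * g * d_w       # pressure at sea bottom in Pa
ds = np.arange(0, 501, 50)  # depths from 0 to 500 m

# pick the water formula where ds <= d_w, the rock formula elsewhere
ps_where = np.where(ds <= d_w, rho_w * g * ds, p_w + rho_r * g * (ds - d_w))

# the mask-multiplication approach, for comparison
water = ds <= d_w
rock = ds > d_w
ps_mask = rho_w * g * ds * water + (p_w + rho_r * g * (ds - d_w)) * rock

print(np.allclose(ps_where, ps_mask))  # True
```

Both formulas are still evaluated for every depth; `np.where` only selects which result to keep.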

While the performance gain in this example is minimal—since the code involves few operations—the advantages of vectorization become much more apparent in computation-heavy tasks. When working with large datasets or complex numerical calculations, it’s a good practice to consider vectorization. However, aim for a balance between performance and clarity to ensure your code remains efficient and easy to understand.

## List comprehensions

List comprehensions are a concise and powerful feature in Python that allow you to quickly create new lists by looping over a collection and optionally filtering its elements. The basic syntax is:

```python
[expression for value in collection if condition]
```

List comprehensions combine looping and conditionals into a single, readable line. The conditional part is optional. Let’s revisit the same two examples:

In [None]:
# sum numbers from 0 to 99_999 that are multiple of 3 or 5

numbers = np.arange(100_000) # numbers

# find multiples of 3 or 5 using list comprehension
multiples = [number for number in numbers if number % 3 == 0 or number % 5 == 0]

total = np.sum(multiples) # sum multiples (avoids shadowing the built-in sum)

print(f"Sum of numbers = {total}")

In [None]:
# table heading
print(f"{'Depth[m]':<10} Pressure[kPa]")
print("-" * 25)

# calculate pressures using list comprehension
ps = [rho_w*g*d if d <= d_w else p_w+rho_r*g*(d-d_w) for d in ds]

for d, p in zip(ds, ps): 
    # print depth and pressure in kPa
    print(f"{d:<10.1f} {p*1e-3:.1f}")  

This is pretty slick but perhaps not as clear as using a loop to calculate pressures.
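The two comprehensions in this section use `if` in different positions, which is easy to overlook. A trailing `if` filters elements, while `a if condition else b` before the `for` chooses a value for every element. The sketch below shows both next to an equivalent explicit loop; the `depths` list is made-up sample input.

```python
depths = [0, 100, 200, 300]  # sample depths in m (illustrative)
d_w = 200                    # water depth in m

# trailing if: a filter, keeps only depths in rock
rock_depths = [d for d in depths if d > d_w]

# the same result with an explicit loop
rock_depths_loop = []
for d in depths:
    if d > d_w:
        rock_depths_loop.append(d)

# conditional expression: chooses a value for every element, no filtering
labels = ["water" if d <= d_w else "rock" for d in depths]

print(rock_depths)  # [300]
print(labels)       # ['water', 'water', 'water', 'rock']
```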

## Exercise 1

The file [xeek_train_subset.csv](../data/xeek_train_subset.csv) contains well log data including gamma ray (`GR`) values and stratigraphic information.

a. Extract the data for well 16/10-1 into a Pandas DataFrame. 

b. From the gamma ray log (column `GR`), compute a new column called `VSH` (Volume of Shale) using different equations depending on the age of the rocks. For rocks older than the Tertiary, use the following equation:

$$
\mathrm{VSH} = \mathrm{IGR} = \frac{\mathrm{GR} - \mathrm{GR}_{\min}}{\mathrm{GR}_{\max} - \mathrm{GR}_{\min}}
$$

where $\mathrm{GR}$ is the gamma ray value for the sample, $\mathrm{GR}_{\min}$ is the minimum GR (0), and $\mathrm{GR}_{\max}$ is the maximum GR (200).

For Tertiary and younger rocks, specifically the Nordland, Hordaland and Rogaland groups, use:

$$
\mathrm{VSH} = 0.083 \left(2^{3.7\,\mathrm{IGR}} - 1\right)
$$

Hints:

- You should first compute `IGR` using the formula for all rows.

- Then apply the appropriate VSH equation depending on the group name (in the `GROUP` column).

- Add the resulting `VSH` values as a new column in the DataFrame.

In [None]:
# Do Exercise 1 here

## Exercise 2

You are provided with two datasets:

- [group_colors.csv](../data/group_colors.csv): Contains color information for each geological group. Each color is represented as a list of three integers corresponding to red, green, and blue (RGB) values, ranging from 0 to 255.

- [xeek_train_subset.csv](../data/xeek_train_subset.csv): Contains log data for multiple wells, including a column named GROUP indicating the group for each record.

For the well 16/10-1, create a dictionary where:

- Each key is the name of a group found in that well.

- Each value is a list representing the RGB color of the group, normalized so that each component is between 0 and 1 (by dividing by 255), and rounded to two decimal places.

In [None]:
# Do Exercise 2 here

## Exercise 3

This exercise builds on Exercise 2 and continues working with the well 16/10-1 from the dataset [xeek_train_subset.csv](../data/xeek_train_subset.csv).

a. Create a dictionary that maps each geological group present in the well to its top depth, defined as the shallowest `DEPTH_MD` value where that group appears.

Hints:

- Filter the data to include only rows for well 16/10-1.

- For each unique group in the `GROUP` column, find the minimum value in the `DEPTH_MD` column where that group occurs.

- Construct a dictionary where the keys are the group names, and the values are the corresponding top depths.

In [None]:
# Do Exercise 3a here

b. The above procedure will not work if the interval of interest is repeated in the well. For groups this does not seem to be a problem, but for facies (column `FORCE_2020_LITHOFACIES_LITHOLOGY`) it is. Create a dictionary that maps each facies in the well to its top depth.

Hints:

- Find the top depth of each facies. This is the first row where a new facies appears.

- Build a dictionary mapping facies to their top depths. If a facies appears more than once, append the row index to the key (e.g. "65000-3").

In [None]:
# Do Exercise 3b here