# Python fundamentals

In this notebook:

- Collections.
- Iteration.
- A very quick introduction to NumPy.

## Collections of things

There are several data structures for collections of things (i.e. of other data structures). Let's explore lists!

In [None]:
gr = [32, 39, 46, 44, 49, 34, 31, 32, 33]
gr

<div style="background: #e0ffe0; border: solid 2px #d0f0d0; border-radius:3px; padding: 1em; color: darkgreen">

<h3>Exercise</h3>

- Split this string into a list called `corps`: `'Equinor, AkerBP, Shell, BP, Petoro'`. <a title="Use the s.split() method">Hover for hint.</a>
- Change the second element to `'Aker BP'` (i.e. with a space).
- Use `sorted()` to sort the list. Can you sort it backwards? Stretch goal: can you use the `key` argument to sort by length?
- Copy (careful!) the list to a new name, `orgs`.
- Add the following organizations to the new list: `'NOD'`, `'Havtil'`. Make sure they're not in the old list.

</div>

In [None]:
s = 'Equinor, AkerBP, Shell, BP, Petoro'

# YOUR CODE HERE
# Hint: the string has methods, type s.<tab> to see them.



### Strings are collections too!

Remember strings, like `"Hello World!"`? Turns out they are collections of characters...
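As a minimal sketch of what "collection of characters" means in practice, the cell below applies the same length, indexing, slicing and membership tools we used on lists to a string. The variable names are only for illustration.

```python
s = "Hello World!"

n = len(s)            # Number of characters, including the space: 12
first = s[0]          # 'H'
tail = s[-6:]         # 'World!'
has_w = 'W' in s      # True
n_l = s.count('l')    # 3
chars = list(s)       # A list of single-character strings.

# Unlike lists, strings are immutable: item assignment raises a TypeError.
try:
    s[0] = 'J'
except TypeError as err:
    print(f"Cannot modify in place: {err}")

# Build a new string instead.
t = 'J' + s[1:]
print(t)
```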

### A note about names, memory, and mutability

We have seen:

- Creating a list
- Lists can be heterogeneous and nested
- Lists can be empty
- Lists have length and order (indexing, slicing)
- Lists are mutable, so I can change an element, append, and pop
- Names are references to objects (a bit like pointers), so sometimes I need copies of things
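The points above can be sketched in one short cell. This is only an illustration, reusing the `gr` values from the start of the notebook; the names `mixed`, `alias` and `duplicate` are made up for the example.

```python
# A list can hold mixed types and other lists, and it can start empty.
mixed = [1, 'two', 3.0, [4, 5]]
empty = []

# Lists have length and order, so we can index and slice.
gr = [32, 39, 46, 44, 49, 34, 31, 32, 33]
print(len(gr))    # Number of elements.
print(gr[0])      # First element.
print(gr[-1])     # Last element.
print(gr[2:5])    # Elements at positions 2, 3 and 4.

# Lists are mutable: these all modify the list in place.
gr[0] = 30
gr.append(35)
last = gr.pop()   # Removes and returns the last element.

# Assignment makes a second name for the same list, not a new list.
alias = gr
alias[1] = 0
print(gr[1])      # The change shows up under both names.

# A copy is an independent list.
duplicate = gr.copy()
duplicate[2] = 0
print(gr[2])      # Unchanged.
```

Note that `list.copy()` is a shallow copy: nested lists inside the copy are still shared with the original. For nested data, `copy.deepcopy()` from the standard library is one option.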

---

## Iteration: `for` (each) ... `in` ...

We often have some collection — lines from a file, or numbers, or lists of numbers — and want to do some operation on each thing in turn. For this, we need **iteration**. In Python, this is effected by the **`for` loop** (strictly, the **`for`–`else` loop** but we won't worry about that in this class, [read more](https://docs.python.org/3/reference/compound_stmts.html#for)).

- Basic pattern on a list
- List comprehension — with strings
- `continue` and `break`
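Here is a rough sketch of the three patterns listed above. The list contents are an assumed stand-in for `orgs`, and the conditions used with `continue` and `break` are arbitrary choices made for the example.

```python
# Assumed stand-in for the list of organizations.
orgs = ['Equinor', 'Aker BP', 'Shell', 'BP', 'Petoro', 'NOD', 'Havtil']

# Basic pattern: do something with each item in turn.
lengths = []
for org in orgs:
    lengths.append(len(org))

# List comprehension: the same result in one expression.
lengths_again = [len(org) for org in orgs]

# A comprehension can transform strings and filter them too.
short_upper = [org.upper() for org in orgs if len(org) <= 3]

# continue skips to the next item; break leaves the loop entirely.
seen = []
for org in orgs:
    if ' ' in org:
        continue      # Skip names containing a space.
    if org == 'Petoro':
        break         # Stop looping when we reach this name.
    seen.append(org)

print(lengths)
print(short_upper)
print(seen)
```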

In [None]:
# Recall we have our list of organizations from before:
orgs

<div style="background: #e0ffe0; border: solid 2px #d0f0d0; border-radius:3px; padding: 1em; color: darkgreen">

<h3>Exercise</h3>

Rearrange the following lines to loop over a list of files and gather the second part of the file names — the months — into a new list, then print the new list.

When the code runs, it should produce:

    ['Jan', 'Mar', 'Jun', 'Jun']
</div>

In [None]:
print(months)
files = ['MH_Jan_18.png', 'MH_Mar_18.png', 'MH_Jun_18.png', 'MH_Jun_17.png']
months.append(month)
month = file.split('_')[1]
for file in files:
months = []

### Making decisions in the loop

We have some porosities:

In [None]:
porosities = [0.08, 0.23, 0.11, 0.00, 0.10,  -0.03, -999.25, 17.5, 0.30, 1.50]
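One possible way to make a decision inside a loop is sketched below. It keeps only values that look like valid porosity fractions. The validity range (0 to 1 inclusive) is an assumption made for this example, and the name `regular` is borrowed from later cells in this notebook; the loop used in class may differ.

```python
porosities = [0.08, 0.23, 0.11, 0.00, 0.10, -0.03, -999.25, 17.5, 0.30, 1.50]

regular = []
for phi in porosities:
    # ASSUMPTION: a valid porosity is a fraction between 0 and 1 inclusive.
    if phi < 0 or phi > 1:
        continue              # Skip nulls, negatives and out-of-range values.
    regular.append(phi)

print(regular)
```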

<div style="background: #e0ffe0; border: solid 2px #d0f0d0; border-radius:3px; padding: 1em; color: darkgreen">

<h3>Exercise</h3>

Turn this loop into a function.
</div>

In [None]:
def clean_porosities(porosities):

    # YOUR CODE HERE
    # Don't forget the docstring!

    return cleaned

<div style="background: #e0ffe0; border: solid 2px #d0f0d0; border-radius:3px; padding: 1em; color: darkgreen">

<h3>Exercise</h3>

Print `'sand'` or `'shale'` for the values in `regular` using a cutoff of 10% porosity.
</div>

In [None]:
# YOUR CODE HERE



### Turing complete

With data structures, code expressions, conditionals and loops, your knowledge of Python is now ["Turing complete"](https://en.wikipedia.org/wiki/Turing_completeness) and you can, in principle, perform _any_ computing task 💥 Use your power wisely :)

---

## A quick look at NumPy

It turns out that we want to do maths on collections of numbers so often that boffins have made a special library for it called NumPy. This library gives you (a) a special data structure called an "array" for n-dimensional collections of numbers and (b) a lot of functions for doing mathy things to those arrays, from statistics to Fourier transforms. Most of this is implemented in efficient compiled C code, with some linear algebra handed off to optimized libraries such as BLAS and LAPACK.

TL;DR — NumPy is awesome for scientists and engineers!
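To give a flavour of the difference between lists and arrays, here is a small sketch. The porosity values and the 0.2 threshold are hypothetical, chosen only for illustration.

```python
import numpy as np

phi = [0.08, 0.23, 0.11, 0.10, 0.30]   # Hypothetical porosity fractions.

# With a list, arithmetic on every element needs a loop or comprehension.
percent_list = [100 * p for p in phi]

# With an array, arithmetic applies to every element at once.
arr = np.array(phi)
percent = 100 * arr

# Arrays carry many mathematical methods.
mean = arr.mean()
largest = arr.max()

# A comparison gives a Boolean array, which can be used to select elements.
is_high = arr > 0.2
high = arr[is_high]

print(percent)
print(mean, largest)
print(high)
```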

In [None]:
import numpy as np

arr = np.array(regular)
arr

In [None]:
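import requests
from pathlib import Path

def download_array(url):
    """
    Download, save, and np.loadtxt a text file.
    """
    r = requests.get(url)
    r.raise_for_status()       # Stop here if the download failed.
    fname = Path('temp.txt')
    fname.write_text(r.text)   # Save the text to a temporary file...
    arr = np.loadtxt(fname)    # ...load it as an array...
    fname.unlink()             # ...then delete the temporary file.
    return arr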

before = download_array("https://raw.githubusercontent.com/scienxlab/datasets/refs/heads/main/usgs/st-helens-before.txt")
before

In [None]:
after = download_array("https://raw.githubusercontent.com/scienxlab/datasets/refs/heads/main/usgs/st-helens-after.txt")
after

---

## Other collections

Besides strings, lists and arrays, there are several other very common collection types:

- `tuple` — similar to a list (e.g. ordered, integer indexed), but immutable (a bit safer but can be a bit less convenient).
- `dict` — a mapping from a **key** to a **value**; we'll take a quick look. Ordered (by insertion) and mutable, but keys must be hashable, which in practice usually means immutable.
- `set` — a mutable, unordered collection of **unique** items. (The `frozenset` is immutable.)
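A quick sketch of these three types in action follows. All of the values (the coordinates, the units and the licence label) are hypothetical.

```python
# A tuple is ordered and indexed like a list, but cannot be changed.
location = (60.39, 5.32)          # Hypothetical latitude, longitude.
lat = location[0]
try:
    location[0] = 61.0
except TypeError as err:
    print(f"Tuples are immutable: {err}")

# A dict maps keys to values.
units = {'depth': 'm', 'gr': 'API'}
units['porosity'] = 'v/v'         # Add a new key, value pair.

# A set keeps only unique items and supports set operations.
a = {'BP', 'Shell', 'Equinor', 'BP'}
b = {'Shell', 'Petoro'}
common = a & b                    # Intersection.
everything = a | b                # Union.

# A frozenset is immutable and hashable, so it can be used as a dict key.
licences = {frozenset({'BP', 'Shell'}): 'PL001'}   # Hypothetical label.
print(licences[frozenset({'Shell', 'BP'})])
```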

We can make the following summary table for Python's collection structures:

| Type           | Element Types         | Heterogeneous? | Ordered?  | Mutable?  | Indexing style      | Other features |
|---------------|----------------------|--------------|----------|---------|------------------|----------------|
| `str`        | Characters (text)     | No           | Yes      | No      | Integer indices | Supports slicing, iteration, and regex operations |
| `list`       | Any                   | Yes          | Yes      | Yes     | Integer indices | Dynamic resizing, allows nesting |
| `tuple`      | Any                   | Yes          | Yes      | No      | Integer indices | Hashable if elements are hashable |
| `dict`       | Keys → any hashable, Values → any | Yes | Yes (insertion order, Py 3.7+) | Yes (keys must be hashable) | Key-based       | Fast lookups, keys must be unique and hashable |
| `set`        | Any (hashable)        | Yes          | No       | Yes     | No direct indexing | Unique elements, supports set operations |
| `frozenset`  | Any (hashable)        | Yes          | No       | No      | No direct indexing | Hashable, unique elements, supports set operations |
| `np.ndarray` | One dtype, usually numeric | No (single dtype) | Yes  | Yes (in-place ops) | Integer indices | Created with `np.array()`; vectorized operations, fixed dtype, efficient computation |
| `pd.Series`  | Any (supports dtype)  | Yes          | Yes      | Yes     | Label-based (default) & integer | Indexed data structure, missing values handling, supports operations like `apply()` |

Note that there are many other collections, such as `collections.deque`, `collections.namedtuple`, `collections.defaultdict`, `collections.OrderedDict`, `xarray.DataArray` and `xarray.Dataset`.
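For a taste of the standard library's `collections` module, here is a minimal sketch of three of the types just mentioned. The well name, depth and lithology labels are invented for the example.

```python
from collections import defaultdict, deque, namedtuple

# namedtuple: a tuple whose fields also have names.
Well = namedtuple('Well', ['name', 'depth'])
w = Well(name='A-1', depth=2500.0)     # Hypothetical well.
print(w.name, w[1])                    # Access by name or by position.

# defaultdict: a dict that creates a default value for missing keys.
counts = defaultdict(int)              # Missing keys start at 0.
for litho in ['sand', 'shale', 'sand']:
    counts[litho] += 1

# deque: a double-ended queue; with maxlen it keeps only the latest items.
recent = deque(maxlen=3)
for value in [1, 2, 3, 4, 5]:
    recent.append(value)

print(dict(counts))
print(list(recent))
```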


## Dictionaries

Dictionaries, or dicts, are everywhere in Python. However, they can take some getting used to.

If your dataset does not necessarily have a natural order, or if it could be more convenient to refer to a record by some name rather than merely its position, then you might want to use a dictionary.

For example, maybe I have measured the heights of my children over time:

In [None]:
heights = {
    'ann': [120, 124, 128],
    'bob': [132, 142, 149],
    'carol': [156, 164, 168],
}

The 'items' of the dictionary are "key, value pairs".

Even though dictionaries are ordered (by insertion, guaranteed since Python 3.7), I cannot get at the items by _position_:

In [None]:
heights[2]  # There is no key 2.

Instead, I have to use the keys, which in this case are strings:

In [None]:
heights.keys()

In [None]:
heights['carol']
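Asking for a key that does not exist raises a `KeyError`, so it is often useful to check first or to supply a default. The sketch below shows a few common patterns on the same `heights` data; the key `'dave'` and the extra measurements are made up for the example.

```python
heights = {
    'ann': [120, 124, 128],
    'bob': [132, 142, 149],
    'carol': [156, 164, 168],
}

# Check whether a key exists before using it.
has_carol = 'carol' in heights
has_dave = 'dave' in heights

# .get() returns None (or a default of your choice) instead of raising.
missing = heights.get('dave')
fallback = heights.get('dave', [])

# Dicts are mutable: add a new key, or change an existing value.
heights['dave'] = [101, 108]           # Hypothetical new record.
heights['ann'].append(131)             # Hypothetical new measurement.

# If you really need the key at a position, make a list of the keys first.
third_key = list(heights)[2]
print(third_key, heights[third_key])
```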

I do sometimes want to iterate over the key, value pairs and there is a nice way to do this:

In [None]:
for key, value in heights.items():
    print(f"Child {key.title()} has heights: {', '.join(str(v) for v in value)}")

### Another `dict` example

For another example, suppose we have some measurements from different depths in a well:

In [None]:
print([100 + 0.1524 * n for n in range(10)])

In [None]:
import matplotlib.pyplot as plt

depth = [100.0, 100.1524, 100.3048, 100.4572, 100.6096, 100.762, 100.9144, 101.0668, 101.2192, 101.3716]
gr = [98, 102, 97, 76, 26, 32, 37, 38, 34, 54]

plt.plot(gr, depth)

Getting the reading at 100.762 m is not very convenient:

In [None]:
idx = depth.index(100.762)
gr[idx]

If I store the data in a `dict`, I can look up the value directly:

In [None]:
gr_dict = dict(zip(depth, gr))  # Pair each depth (key) with its reading (value).
gr_dict

In [None]:
gr_dict[100.762]
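One caveat: a direct lookup only works when the key matches exactly, which is fragile with floating point depths. As a rough sketch, the cell below shows a lookup that misses and one simple way to find the nearest depth instead. The target depth of 100.76 is an arbitrary choice for the example.

```python
depth = [100.0, 100.1524, 100.3048, 100.4572, 100.6096,
         100.762, 100.9144, 101.0668, 101.2192, 101.3716]
gr = [98, 102, 97, 76, 26, 32, 37, 38, 34, 54]
gr_dict = dict(zip(depth, gr))

target = 100.76                        # Close to a key, but not equal to one.
exact = gr_dict.get(target)            # None, because there is no such key.

# Find the key closest to the target, then look up its value.
# Iterating over a dict visits its keys.
nearest = min(gr_dict, key=lambda d: abs(d - target))
value = gr_dict[nearest]

print(exact)
print(nearest, value)
```

This scans every key, so it gives up the fast lookup that makes a `dict` attractive in the first place. It is fine for small datasets, but it hints at why a labelled array type can be a better fit for depth-indexed data.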

Better yet, I can use a Pandas Series...

In [None]:
import pandas as pd

gr_series = pd.Series(gr, index=depth)
gr_series

In [None]:
gr_series[100.762]

In [None]:
gr_series.plot()

And even better yet, I can use an `xarray.DataArray`... but now we are definitely getting ahead of ourselves.

<hr />
<small>
<p>Copyright 2025 Matt Hall (Equinor)</p>

<p>Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:</p>

<p>The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.</p>

<p>THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.</p>
</small>