# Better code: Write more comprehensible Python code

## Part of the Python Best Practices course, https://ssciwr.github.io/Python-best-practices-course/

### February 2026, I. S. Ulusoy 

## Encapsulate module content

Why you should run code in your module only if `__name__` is `"__main__"`:

Always encapsulate your module's content in functions and classes, and put all function calls and instantiations inside the block that follows the statement
```python
if __name__ == "__main__":
```

**Why?** When a module is imported, Python executes every statement at the top level of the module, including class bodies; the body of a function only runs when the function is called. If you run the module directly, `__name__` is set to `"__main__"`, but if the module is imported, `__name__` is set to the module's name. Code placed under the `if __name__ == "__main__":` check therefore runs only when the file is executed as a script. See the examples [module1.py](https://github.com/ssciwr-courses/pbp-better-code/blob/main/chapter5/module1.py) and [module2.py](https://github.com/ssciwr-courses/pbp-better-code/blob/main/chapter5/module2.py) in the assignment repo folder `chapter5`.

```python
# module1.py
print("This will always be printed.")


def print_name():
    print("Module one is {}".format(__name__))


if __name__ == "__main__":
    print("Runs as main")
    print_name()
else:
    print("Runs as import")
    print_name()
```


```bash
python module1.py
```
```
This will always be printed.
Runs as main
Module one is __main__
```


```python
# module2.py
import module1

print("Running module 2 as {}".format(__name__))

module1.print_name()
```

```bash
python module2.py
```
```
This will always be printed.
Runs as import
Module one is module1
Running module 2 as __main__
Module one is module1
```
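A common way to keep the guarded block short is to collect everything the script should do in a `main()` function and only call that function after the check. A minimal sketch, using the hypothetical file name `module3.py` (it is not part of the assignment repo): importing it only defines the two functions, while `python module3.py` calls `main()`.

```python
# module3.py (hypothetical example, not part of the assignment repo)


def print_name():
    print(f"Module three is {__name__}")


def main():
    # everything that should only happen when the file is run as a script
    print("Runs as main")
    print_name()


if __name__ == "__main__":
    main()
```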


## Ternary conditionals

Use ternary conditionals (conditional expressions) to shorten `if ... else` statements that only choose which of two values to assign.

```python
x = 10

if x >= 10:
    y = 1
else:
    y = 0

print(f"Original version {y}")
```
```
Original version 1
```


```python
# the same logic written as a ternary conditional
x = 10
y = 1 if x >= 10 else 0
print(f"Ternary version {y}")
```
```
Ternary version 1
```
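A conditional expression can be used anywhere an expression is allowed, for example as a return value or inside an f-string. A short sketch with the hypothetical function `parity`; nesting several conditional expressions in one line tends to hurt readability, so a regular `if ... elif ... else` block is usually the better choice in that case.

```python
def parity(n):
    # <value if true> if <condition> else <value if false>
    return "even" if n % 2 == 0 else "odd"


print([parity(n) for n in range(4)])  # ['even', 'odd', 'even', 'odd']

x = 10
# also works inside an f-string
print(f"x is {'large' if x >= 10 else 'small'}")  # x is large
```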


## Context managers

Use context managers so that you do not have to take care of tear-down steps, such as closing a file, yourself. For example, `with open(...) as f:` closes the file automatically as soon as the `with` block is left, even if an exception is raised inside the block.

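```python
# without a context manager: you have to close the file yourself,
# and if f.read() raises an exception, f.close() is never reached
f = open("data/efield.t", "r")
numbers = f.read()
f.close()
```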

```python
# with a context manager: the file is closed when the block is left
with open("data/efield.t", "r") as f:
    numbers = f.read()
```
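Roughly speaking, the `with` statement takes over the job of a `try ... finally` block. The sketch below shows that equivalence and a small context manager of your own, written with the standard-library decorator `contextlib.contextmanager`. The temporary file stands in for `data/efield.t`, and the name `open_for_reading` is hypothetical.

```python
import os
import tempfile
from contextlib import contextmanager

# temporary stand-in for data/efield.t, so the sketch runs on its own
with tempfile.NamedTemporaryFile("w", suffix=".t", delete=False) as tmp:
    tmp.write("0.0 1.0\n0.1 2.0\n")
    path = tmp.name

# roughly what "with open(path) as f:" takes care of
f = open(path, "r")
try:
    numbers = f.read()
finally:
    f.close()  # runs even if f.read() raises an exception

# the with statement closes the file when the block is left
with open(path, "r") as f:
    numbers = f.read()
print(f.closed)  # True


@contextmanager
def open_for_reading(filename):
    # code before yield is the set-up, code after yield the tear-down
    handle = open(filename, "r")
    try:
        yield handle
    finally:
        handle.close()


with open_for_reading(path) as handle:
    first_line = handle.readline()
print(first_line.strip(), handle.closed)  # 0.0 1.0 True

os.remove(path)
```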

## Using enumerate

Use `enumerate` when you need both the index and the item while iterating over a list, instead of maintaining a counter yourself.

```python
lunch = ["Pizza", "Salad", "Pasta", "Sushi", "Sandwich"]

# without enumerate: keep track of the index yourself
index = 0
for meal in lunch:
    print(index, meal)
    index += 1
```
```
0 Pizza
1 Salad
2 Pasta
3 Sushi
4 Sandwich
```


```python
# with enumerate
for index, meal in enumerate(lunch):
    print(index, meal)
```
```
0 Pizza
1 Salad
2 Pasta
3 Sushi
4 Sandwich
```
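`enumerate` yields `(index, item)` tuples, which is why they can be unpacked into two loop variables. Its optional `start` argument sets the first index, for example for a menu numbered from 1:

```python
lunch = ["Pizza", "Salad", "Pasta", "Sushi", "Sandwich"]

# enumerate produces (index, item) tuples
pairs = list(enumerate(lunch))
print(pairs[:2])  # [(0, 'Pizza'), (1, 'Salad')]

# start=1 makes the count begin at 1 instead of 0
for number, meal in enumerate(lunch, start=1):
    print(f"{number}. {meal}")
```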


## Using zip

Use `zip` to iterate over two or more lists in parallel, instead of using the index of one list to look up items in the other.

```python
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
lunch = ["Pizza", "Salad", "Pasta", "Sushi", "Sandwich"]

# without zip: use the index of one list to access the other
for index, day in enumerate(days):
    print("On {} we offer {} for lunch.".format(day, lunch[index]))
```
```
On Monday we offer Pizza for lunch.
On Tuesday we offer Salad for lunch.
On Wednesday we offer Pasta for lunch.
On Thursday we offer Sushi for lunch.
On Friday we offer Sandwich for lunch.
```


```python
# with zip; this also works for more than two lists
# for lists of different lengths, zip stops after the shortest list is exhausted
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
lunch = ["Pizza", "Salad", "Pasta", "Sushi", "Sandwich"]

for day, option in zip(days, lunch):
    print("On {} we offer {} for lunch.".format(day, option))
```
```
On Monday we offer Pizza for lunch.
On Tuesday we offer Salad for lunch.
On Wednesday we offer Pasta for lunch.
On Thursday we offer Sushi for lunch.
On Friday we offer Sandwich for lunch.
```
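A sketch of the two cases mentioned above, more than two lists and lists of different lengths, using a hypothetical `dessert` list. `itertools.zip_longest` keeps going and fills the gaps with a `fillvalue`, and since Python 3.10 `zip(..., strict=True)` raises a `ValueError` instead of silently dropping items.

```python
import sys
from itertools import zip_longest

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
lunch = ["Pizza", "Salad", "Pasta", "Sushi", "Sandwich"]
dessert = ["Ice cream", "Fruit", "Cake"]  # hypothetical, shorter list

# three lists at once; zip stops after the shortest list (dessert)
menu = list(zip(days, lunch, dessert))
print(menu[-1])  # ('Wednesday', 'Pasta', 'Cake')

# zip_longest continues until the longest list is exhausted
full_menu = list(zip_longest(days, lunch, dessert, fillvalue="no dessert"))
print(full_menu[-1])  # ('Friday', 'Sandwich', 'no dessert')

# strict=True (Python 3.10+) turns a length mismatch into an error
if sys.version_info >= (3, 10):
    try:
        list(zip(days, lunch, dessert, strict=True))
    except ValueError as err:
        print(f"Length mismatch: {err}")
```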


## An example for refactoring a function

When you write code, keep in mind that each piece of it has a purpose. Ideally, each function and each method serves a single purpose; that makes them easier to read, test, and reuse. Below, a function is refactored step by step into a set of helper functions and more readable constructs, so that the code becomes cleaner and easier to read.

Code that is easier to read is also easier to test, reuse, and maintain, and it is usually less prone to errors.

### 1. Original function

```python
import numpy as np
import pandas as pd
```

```python
def read_in(filedir, filename, mydatatype, myheader=False):
    if mydatatype == "numpy":
        name = f"{filedir}{filename}"
        print(f"Reading from file {name} - numpy")
        if myheader:
            data = np.loadtxt(name, skiprows=1)
        else:
            data = np.loadtxt(name, skiprows=0)
        data = data.T
    elif mydatatype == "pandas":
        name = f"{filedir}{filename}"
        print(f"Reading from file {name} - pandas")
        data = pd.read_csv(name, sep=r"\s+")
    else:
        print("Data type not found!")
        raise RuntimeError
    return data
```

```python
result = read_in("./data/", "efield.t", "numpy", myheader=True)
print(result[0])
```
```
Reading from file ./data/efield.t - numpy
[ 0.   0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.   1.1  1.2  1.3
  1.4  1.5  1.6  1.7  1.8  1.9  2.   2.1  2.2  2.3  2.4  2.5  2.6  2.7
  2.8  2.9  3.   3.1  3.2  3.3  3.4  3.5  3.6  3.7  3.8  3.9  4.   4.1
  4.2  4.3  4.4  4.5  4.6  4.7  4.8  4.9  5.   5.1  5.2  5.3  5.4  5.5
  5.6  5.7  5.8  5.9  6.   6.1  6.2  6.3  6.4  6.5  6.6  6.7  6.8  6.9
  7.   7.1  7.2  7.3  7.4  7.5  7.6  7.7  7.8  7.9  8.   8.1  8.2  8.3
  8.4  8.5  8.6  8.7  8.8  8.9  9.   9.1  9.2  9.3  9.4  9.5  9.6  9.7
  9.8  9.9 10. ]
```


```python
result = read_in("./data/", "npop.t", "pandas")
print(result.head(3))
```
```
Reading from file ./data/npop.t - pandas
   time       MO1       MO2       MO3      MO4       MO5       MO6       MO7  \
0   0.0  0.999974  0.999892  0.976819  0.97633  0.000708  0.005711  0.000158   
1   0.1  0.999974  0.999892  0.976819  0.97633  0.000708  0.005711  0.000158   
2   0.2  0.999974  0.999892  0.976819  0.97633  0.000708  0.005711  0.000158   

       MO8       MO9  ...      MO29      MO30      MO31      MO32      MO33  \
0  0.00092  0.000158  ...  0.000815  0.000514  0.004893  0.001388  0.000469   
1  0.00092  0.000158  ...  0.000815  0.000514  0.004893  0.001388  0.000469   
2  0.00092  0.000158  ...  0.000815  0.000514  0.004893  0.001388  0.000469   

       MO34      MO35      MO36      MO37      MO38  
0  0.000193  0.000469  0.000193  0.000401  0.000128  
1  0.000193  0.000469  0.000193  0.000401  0.000128  
2  0.000193  0.000469  0.000193  0.000401  0.000128  

[3 rows x 39 columns]
```


```python
try:
    read_in("./data/", "npop.t", "foo")
except RuntimeError:
    print("Caught an error when trying to read in data with an invalid data type.")
```
```
Data type not found!
Caught an error when trying to read in data with an invalid data type.
```


### 2. First refactor

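Move the reading logic into two helper functions, `read_numpy` and `read_pandas`, and select the right helper with a dictionary lookup instead of an `if ... elif ... else` chain.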

```python
def read_in(filedir, filename, data_type, myheader=False):
    input_handler = {"numpy": read_numpy, "pandas": read_pandas}
    reading_method = input_handler.get(data_type)
    if not reading_method:
        print("Data type not found!")
        raise RuntimeError("Invalid data type")
    data = reading_method(filedir, filename, myheader)
    return data


def read_numpy(filedir, filename, myheader):
    name = "{filedir}{filename}".format(filedir=filedir, filename=filename)
    print(f"Reading from file {name} - numpy")
    if myheader:
        data = np.loadtxt(name, skiprows=1)
    else:
        data = np.loadtxt(name, skiprows=0)
    data = data.T
    return data


def read_pandas(filedir, filename, myheader):
    name = "{filedir}{filename}".format(filedir=filedir, filename=filename)
    print(f"Reading from file {name} - pandas")
    data = pd.read_csv(name, sep=r"\s+")
    return data
```
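The dictionary lookup in `read_in` is a general pattern, sometimes called a dispatch table: supporting a new option only requires a new dictionary entry, not another `elif` branch. A minimal sketch that does not need the data files, with the hypothetical handlers `to_upper` and `to_lower`. It raises `ValueError` and lists the valid keys, which is one common alternative to the `RuntimeError` used above.

```python
def to_upper(text):
    return text.upper()


def to_lower(text):
    return text.lower()


# hypothetical dispatch table: maps each option to the function handling it
HANDLERS = {"upper": to_upper, "lower": to_lower}


def convert(text, mode):
    handler = HANDLERS.get(mode)  # returns None for an unknown key
    if handler is None:
        # ValueError signals an invalid argument value; show the valid options
        raise ValueError(f"Invalid mode {mode!r}, choose one of {sorted(HANDLERS)}")
    return handler(text)


print(convert("Pizza", "upper"))  # PIZZA
```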

```python
result = read_in("./data/", "efield.t", "numpy", myheader=True)
print(result[0])
```
```
Reading from file ./data/efield.t - numpy
[ 0.   0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.   1.1  1.2  1.3
  1.4  1.5  1.6  1.7  1.8  1.9  2.   2.1  2.2  2.3  2.4  2.5  2.6  2.7
  2.8  2.9  3.   3.1  3.2  3.3  3.4  3.5  3.6  3.7  3.8  3.9  4.   4.1
  4.2  4.3  4.4  4.5  4.6  4.7  4.8  4.9  5.   5.1  5.2  5.3  5.4  5.5
  5.6  5.7  5.8  5.9  6.   6.1  6.2  6.3  6.4  6.5  6.6  6.7  6.8  6.9
  7.   7.1  7.2  7.3  7.4  7.5  7.6  7.7  7.8  7.9  8.   8.1  8.2  8.3
  8.4  8.5  8.6  8.7  8.8  8.9  9.   9.1  9.2  9.3  9.4  9.5  9.6  9.7
  9.8  9.9 10. ]
```


```python
result = read_in("./data/", "npop.t", "pandas")
print(result.head(3))
```
```
Reading from file ./data/npop.t - pandas
   time       MO1       MO2       MO3      MO4       MO5       MO6       MO7  \
0   0.0  0.999974  0.999892  0.976819  0.97633  0.000708  0.005711  0.000158   
1   0.1  0.999974  0.999892  0.976819  0.97633  0.000708  0.005711  0.000158   
2   0.2  0.999974  0.999892  0.976819  0.97633  0.000708  0.005711  0.000158   

       MO8       MO9  ...      MO29      MO30      MO31      MO32      MO33  \
0  0.00092  0.000158  ...  0.000815  0.000514  0.004893  0.001388  0.000469   
1  0.00092  0.000158  ...  0.000815  0.000514  0.004893  0.001388  0.000469   
2  0.00092  0.000158  ...  0.000815  0.000514  0.004893  0.001388  0.000469   

       MO34      MO35      MO36      MO37      MO38  
0  0.000193  0.000469  0.000193  0.000401  0.000128  
1  0.000193  0.000469  0.000193  0.000401  0.000128  
2  0.000193  0.000469  0.000193  0.000401  0.000128  

[3 rows x 39 columns]
```


```python
try:
    read_in("./data/", "npop.t", "foo")
except RuntimeError:
    print("Caught an error when trying to read in data with an invalid data type.")
```
```
Data type not found!
Caught an error when trying to read in data with an invalid data type.
```


### 3. Second refactor

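Build the full file name once in `read_in` to avoid repeating it in every helper, and pass the helpers only the data they actually use: the full path instead of the directory and file name separately.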

```python
def read_in(filedir, filename, data_type, myheader=False):
    input_handler = {"numpy": read_numpy, "pandas": read_pandas}
    reading_method = input_handler.get(data_type)
    if not reading_method:
        print("Data type not found!")
        raise RuntimeError("Invalid data type")
    name = f"{filedir}{filename}"
    data = reading_method(name, myheader)
    return data


def read_numpy(name, myheader):
    print(f"Reading from file {name} - numpy")
    if myheader:
        data = np.loadtxt(name, skiprows=1)
    else:
        data = np.loadtxt(name, skiprows=0)
    data = data.T
    return data


def read_pandas(name, *args):
    # *args accepts the header flag passed by read_in, which is not needed here
    print(f"Reading from file {name} - pandas")
    data = pd.read_csv(name, sep=r"\s+")
    return data
```
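In `read_pandas`, the starred parameter collects any extra positional arguments into a tuple, so both helpers can be called the same way from `read_in` even though only `read_numpy` uses the header flag. By convention this parameter is named `*args`; the name `**kwargs` is usually reserved for a double-starred parameter, which collects extra keyword arguments into a dictionary. A small sketch with the hypothetical function `show_arguments`:

```python
def show_arguments(name, *args, **kwargs):
    # args: tuple of extra positional arguments
    # kwargs: dict of extra keyword arguments
    return name, args, kwargs


print(show_arguments("./data/npop.t"))
# ('./data/npop.t', (), {})
print(show_arguments("./data/npop.t", True, sep=","))
# ('./data/npop.t', (True,), {'sep': ','})
```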

```python
result = read_in("./data/", "efield.t", "numpy", myheader=True)
print(result[0])
```
```
Reading from file ./data/efield.t - numpy
[ 0.   0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.   1.1  1.2  1.3
  1.4  1.5  1.6  1.7  1.8  1.9  2.   2.1  2.2  2.3  2.4  2.5  2.6  2.7
  2.8  2.9  3.   3.1  3.2  3.3  3.4  3.5  3.6  3.7  3.8  3.9  4.   4.1
  4.2  4.3  4.4  4.5  4.6  4.7  4.8  4.9  5.   5.1  5.2  5.3  5.4  5.5
  5.6  5.7  5.8  5.9  6.   6.1  6.2  6.3  6.4  6.5  6.6  6.7  6.8  6.9
  7.   7.1  7.2  7.3  7.4  7.5  7.6  7.7  7.8  7.9  8.   8.1  8.2  8.3
  8.4  8.5  8.6  8.7  8.8  8.9  9.   9.1  9.2  9.3  9.4  9.5  9.6  9.7
  9.8  9.9 10. ]
```


```python
result = read_in("./data/", "npop.t", "pandas")
print(result.head(3))
```
```
Reading from file ./data/npop.t - pandas
   time       MO1       MO2       MO3      MO4       MO5       MO6       MO7  \
0   0.0  0.999974  0.999892  0.976819  0.97633  0.000708  0.005711  0.000158   
1   0.1  0.999974  0.999892  0.976819  0.97633  0.000708  0.005711  0.000158   
2   0.2  0.999974  0.999892  0.976819  0.97633  0.000708  0.005711  0.000158   

       MO8       MO9  ...      MO29      MO30      MO31      MO32      MO33  \
0  0.00092  0.000158  ...  0.000815  0.000514  0.004893  0.001388  0.000469   
1  0.00092  0.000158  ...  0.000815  0.000514  0.004893  0.001388  0.000469   
2  0.00092  0.000158  ...  0.000815  0.000514  0.004893  0.001388  0.000469   

       MO34      MO35      MO36      MO37      MO38  
0  0.000193  0.000469  0.000193  0.000401  0.000128  
1  0.000193  0.000469  0.000193  0.000401  0.000128  
2  0.000193  0.000469  0.000193  0.000401  0.000128  

[3 rows x 39 columns]
```


```python
try:
    read_in("./data/", "npop.t", "foo")
except RuntimeError:
    print("Caught an error when trying to read in data with an invalid data type.")
```
```
Data type not found!
Caught an error when trying to read in data with an invalid data type.
```


### 4. Third refactor

Store the number of rows to skip in a variable, so that `np.loadtxt` is called only once and the duplicated call disappears.

```python
def read_in(filedir, filename, data_type, myheader=False):
    input_handler = {"numpy": read_numpy, "pandas": read_pandas}
    reading_method = input_handler.get(data_type)
    if not reading_method:
        print("Data type not found!")
        raise RuntimeError("Invalid data type")
    name = f"{filedir}{filename}"
    data = reading_method(name, myheader)
    return data


def read_numpy(name, myheader):
    print(f"Reading from file {name} - numpy")
    if myheader:
        skip = 1
    else:
        skip = 0
    data = np.loadtxt(name, skiprows=skip)
    data = data.T
    return data


def read_pandas(name, *args):
    # *args accepts the header flag passed by read_in, which is not needed here
    print(f"Reading from file {name} - pandas")
    data = pd.read_csv(name, sep=r"\s+")
    return data
```
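The `if ... else` block that sets `skip` could be shortened further with a ternary conditional as introduced earlier in this chapter, or with `int(myheader)`, since `bool` is a subclass of `int`. Building the path with `pathlib.Path` instead of `f"{filedir}{filename}"` would also remove the dependence on a trailing slash in `filedir`. A minimal sketch of both ideas in isolation, with the hypothetical helper `rows_to_skip`; it is one possible next step, not part of the refactor above.

```python
from pathlib import Path


def rows_to_skip(myheader):
    # ternary conditional instead of a four-line if ... else block
    return 1 if myheader else 0


# bool is a subclass of int, so int(True) == 1 and int(False) == 0
print(rows_to_skip(True), int(True))    # 1 1
print(rows_to_skip(False), int(False))  # 0 0

# plain string concatenation silently depends on the trailing slash
filedir = "./data"
filename = "efield.t"
print(f"{filedir}{filename}")  # ./dataefield.t

# pathlib joins the components with the correct separator either way
with_slash = Path("./data/") / filename
without_slash = Path("./data") / filename
print(with_slash == without_slash)  # True
```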

```python
result = read_in("./data/", "efield.t", "numpy", myheader=True)
print(result[0])
```
```
Reading from file ./data/efield.t - numpy
[ 0.   0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.   1.1  1.2  1.3
  1.4  1.5  1.6  1.7  1.8  1.9  2.   2.1  2.2  2.3  2.4  2.5  2.6  2.7
  2.8  2.9  3.   3.1  3.2  3.3  3.4  3.5  3.6  3.7  3.8  3.9  4.   4.1
  4.2  4.3  4.4  4.5  4.6  4.7  4.8  4.9  5.   5.1  5.2  5.3  5.4  5.5
  5.6  5.7  5.8  5.9  6.   6.1  6.2  6.3  6.4  6.5  6.6  6.7  6.8  6.9
  7.   7.1  7.2  7.3  7.4  7.5  7.6  7.7  7.8  7.9  8.   8.1  8.2  8.3
  8.4  8.5  8.6  8.7  8.8  8.9  9.   9.1  9.2  9.3  9.4  9.5  9.6  9.7
  9.8  9.9 10. ]
```


```python
result = read_in("./data/", "npop.t", "pandas")
print(result.head(3))
```
```
Reading from file ./data/npop.t - pandas
   time       MO1       MO2       MO3      MO4       MO5       MO6       MO7  \
0   0.0  0.999974  0.999892  0.976819  0.97633  0.000708  0.005711  0.000158   
1   0.1  0.999974  0.999892  0.976819  0.97633  0.000708  0.005711  0.000158   
2   0.2  0.999974  0.999892  0.976819  0.97633  0.000708  0.005711  0.000158   

       MO8       MO9  ...      MO29      MO30      MO31      MO32      MO33  \
0  0.00092  0.000158  ...  0.000815  0.000514  0.004893  0.001388  0.000469   
1  0.00092  0.000158  ...  0.000815  0.000514  0.004893  0.001388  0.000469   
2  0.00092  0.000158  ...  0.000815  0.000514  0.004893  0.001388  0.000469   

       MO34      MO35      MO36      MO37      MO38  
0  0.000193  0.000469  0.000193  0.000401  0.000128  
1  0.000193  0.000469  0.000193  0.000401  0.000128  
2  0.000193  0.000469  0.000193  0.000401  0.000128  

[3 rows x 39 columns]
```


```python
try:
    read_in("./data/", "npop.t", "foo")
except RuntimeError:
    print("Caught an error when trying to read in data with an invalid data type.")
```
```
Data type not found!
Caught an error when trying to read in data with an invalid data type.
```


By using helper functions and successively introducing abstractions, code can be refactored to become more modular and readable. Write tests before you start refactoring, so that after each step you can check that the refactored code still behaves like the original.