
In [1]:
# Download files with examples

!wget --no-clobber "https://gist.githubusercontent.com/macleginn/1bc4d83f952826fd70eae81ae3a8be10/raw/72b6cd4bd8f0741cd562146a253d55bc569f5510/dummy_counter.py"
!wget -nc "https://gist.githubusercontent.com/macleginn/c0c8aaf7c5f4f4bfb286a84f99a3d5fe/raw/bcf546228edc44d1fc29d689c5b56e93775bc5c9/dummy_counter_as_script.py"
!wget -nc "https://gist.githubusercontent.com/macleginn/c90b39d848a72fc67ad9da58e8345968/raw/63c8a4852c55ad5bc587e875a15c01d065e5a4ed/dummy_counter_w_main.py"


## Managing complexity

### Last time

* **Object-oriented programming**: combines functions and data; helps map real-world objects and relations between them in code.
* **Functional programming**: combines simple behaviours to produce complex behaviours.

### Today

* Organising code into **modules and namespaces**.
* Registering and handling **exceptions**.

## Modularisation

If your program does one thing, and you want it to be self-contained, it is probably fine if its code is a [single large file](https://raw.githubusercontent.com/bellard/quickjs/refs/heads/master/quickjs.c).

Some problems with this approach:

1. Difficult to change the code (where is everything?).
2. If we want to reuse the code, we need to either import everything (code bloat) or copy and paste (harder to fix errors).
3. The namespace gets polluted.

## An aside on names

You're working with data. You need to read data stored in the PyTables format and write data in the Arrow format. The main class in both libraries is called `Table`. What do you think this will do?

```python
from pyarrow import Table
from tables import Table
```

Let's check with a similar name conflict between the Python standard library and a user file. The contents of `dummy_counter.py`:

```python
class Counter:
    def __init__(self):
        self.count = 0
        
    def get_value(self):
        return self.count
    
    def increment(self):
        self.count += 5
```

In [None]:
from collections import Counter
from dummy_counter import Counter

c = Counter()
print(type(c))

Both `Counter`s (and both `Table`s) are imported into the same *namespace*, so the second import silently overwrites the first. We can avoid the naming conflict either by renaming

```python
from collections import Counter
# Rename on import to avoid the clash
from dummy_counter import Counter as DummyCounter

c = Counter()
dc = DummyCounter()
```

or using hierarchical namespaces:

```python
import collections
import dummy_counter

c = collections.Counter()
dc = dummy_counter.Counter()
```

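This syntax imports all public names of a module (those listed in its `__all__`, or, if it has none, all names not starting with an underscore) into the current global namespace: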

```python
from collections import *
```

This strategy is not recommended. Why?
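One concrete reason: a star import can silently shadow names you already rely on. The sketch below runs the import inside a throwaway dictionary via `exec`, so the notebook's own globals stay untouched, and shows `from os import *` replacing the built-in `open` with the low-level `os.open` (which expects flags and returns a file descriptor).

```python
import builtins
import os

# Run the star import in a separate namespace dict instead of our real globals.
namespace = {}
exec("from os import *", namespace)

# os exports its own low-level `open`, which now hides the built-in one.
print(namespace["open"] is os.open)        # True
print(namespace["open"] is builtins.open)  # False
```

In a real module, every later call to `open("file.txt")` would then fail with a `TypeError`, far away from the line that caused it.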

In [None]:
# A safer alternative: import exactly the names you need
from collections import (
    Counter,
    defaultdict
)

### Modules and packages

A module is simply a file containing Python code. When we import a module, **all the code in it is executed**.

Depending on what the module contains, importing it either just makes its definitions available or also runs top-level statements with visible side effects, such as printing.

In [2]:
# importing a module with definitions only
import dummy_counter as dc

In [None]:
# importing a module with definitions and top-level statements
import dummy_counter_as_script

Current value: 1
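A related detail: top-level code runs on the *first* import only; later imports in the same session reuse the module object cached in `sys.modules`. A minimal sketch with a throwaway module (the name `noisy_module` is hypothetical, written to a temporary directory):

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a tiny hypothetical module into a temporary directory.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "noisy_module.py").write_text(
    "print('Executing noisy_module')\n"
    "ANSWER = 42\n"
)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()  # make sure the import system notices the new file

import noisy_module  # first import: top-level code runs, the message is printed
import noisy_module  # second import: nothing printed, the cached module is reused

print("noisy_module" in sys.modules)  # True
print(noisy_module.ANSWER)            # 42

importlib.reload(noisy_module)        # explicitly re-executes the module's code
```

This is also why editing a module file does not affect an already-running notebook until the module is reloaded or the runtime restarted.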


Often we want to be able to use a module both as a library and as a script.

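For example, we can use the script part to quickly test our functions and classes. To this end, Python sets the special variable `__name__` to `"__main__"` when a file is run directly (`python script.py`) and to the module's name when it is imported, which gives us the `__name__ == "__main__"` trick: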

```python
# Some definitions above

if __name__ == "__main__":
    print('I see that you used `python script.py`')
```

In [None]:
# Nothing is printed

import dummy_counter_w_main
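To see both sides of the trick in one place, the sketch below writes a small module (the file name `doubler.py` and its contents are hypothetical), runs it once as a script in a separate interpreter and then imports it. Only the script run executes the sanity check.

```python
import importlib
import subprocess
import sys
import tempfile
from pathlib import Path

# A hypothetical module that reports how it was loaded.
source = (
    "def double(x):\n"
    "    return 2 * x\n"
    "\n"
    "print(f'__name__ is {__name__!r}')\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    assert double(21) == 42  # sanity check, runs only as a script\n"
    "    print('Sanity check passed')\n"
)
tmp_dir = tempfile.mkdtemp()
script = Path(tmp_dir, "doubler.py")
script.write_text(source)

# Run as a script: __name__ == '__main__', so the sanity check runs.
result = subprocess.run([sys.executable, str(script)], capture_output=True, text=True)
print(result.stdout)

# Import as a module: __name__ == 'doubler', so the check is skipped.
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
import doubler
print(doubler.double(5))
```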

In [3]:
# Exercise: create a module

!touch new_module.py

# Implement a function or a class there. Add a sanity check under
# `if __name__ == "__main__":`, and then import the module to use what you implemented here.

In [4]:
import new_module

## Organising modules into packages

We can import modules from the same directory where we put the main script or notebook, but as we create more and more modules, this will start to look like a mess. Normally we want to have something like this:

```
my_package/         
├── core.py
├── utils.py
├── subpkg/
│   └── helpers.py
└── data/
    └── loader.py
```

So that we can then write:

```python
import my_package.core as core
from my_package import utils
from my_package.data.loader import read_from_json
```

### The Pythonic solution

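A directory containing Python files is a (regular) package if it has an `__init__.py` file, which can be empty. Packages can have subpackages, and each subpackage needs its own `__init__.py`. (Since Python 3.3, a directory without `__init__.py` can also be imported as a *namespace package*, but regular packages are the usual choice.)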

The actual structure is then:

```
my_package/
├── __init__.py
├── core.py
├── utils.py
├── subpkg/
│   ├── __init__.py
│   └── helpers.py
└── data/
    ├── __init__.py
    └── loader.py
```
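A sketch of this layout, built in a temporary directory so it can be run anywhere. The package name follows the example above; the contents of `utils.py` and `loader.py` are hypothetical (in particular, the `shout` function, and `read_from_json` parsing a JSON string rather than reading a file).

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build the hypothetical package my_package/ with a data/ subpackage.
root = Path(tempfile.mkdtemp())
pkg = root / "my_package"
(pkg / "data").mkdir(parents=True)
(pkg / "__init__.py").write_text("")           # marks my_package as a package
(pkg / "utils.py").write_text("def shout(s):\n    return s.upper() + '!'\n")
(pkg / "data" / "__init__.py").write_text("")  # marks data/ as a subpackage
(pkg / "data" / "loader.py").write_text(
    "import json\n\ndef read_from_json(text):\n    return json.loads(text)\n"
)

# Python must be able to find the directory that *contains* my_package.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

from my_package import utils
from my_package.data.loader import read_from_json

print(utils.shout("hello"))        # HELLO!
print(read_from_json('{"a": 1}'))  # {'a': 1}
```

Note that `sys.path` gets the parent directory (`root`), not `my_package` itself; the same rule applies to `pkgsample/src` below.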

In [5]:
# Let's download a sample package.
!git clone https://github.com/nedbat/pkgsample.git

Cloning into 'pkgsample'...
remote: Enumerating objects: 66, done.[K
remote: Counting objects: 100% (24/24), done.[K
remote: Compressing objects: 100% (13/13), done.[K
remote: Total 66 (delta 14), reused 17 (delta 11), pack-reused 42 (from 1)[K
Receiving objects: 100% (66/66), 27.62 KiB | 975.00 KiB/s, done.
Resolving deltas: 100% (31/31), done.


In [None]:
# Doesn't work!
# from pkgsample import add

The Python interpreter needs to know where to look for the code. Three things we can do here:

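1. Change the current working directory (`os.chdir("./pkgsample/src/pkgsample")`): bad for various reasons, e.g. every relative file path used elsewhere in the program now resolves against a different directory.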
2. Add the path to the code to the list of paths Python knows about.
3. Install the package, i.e. copy the package to one of the locations Python knows about, either globally or in the local environment (we'll cover environments in a later class).

In [10]:
# The middle path: adding the path to the code to the Python path collection.

import sys
import os
print(sys.path)
# Prepend the package's source directory so that Python searches it first
sys.path.insert(0, os.path.abspath('pkgsample/src'))

['/content/pkgsample/src', '/content', '/env/python', '/usr/lib/python312.zip', '/usr/lib/python3.12', '/usr/lib/python3.12/lib-dynload', '', '/usr/local/lib/python3.12/dist-packages', '/usr/lib/python3/dist-packages', '/usr/local/lib/python3.12/dist-packages/IPython/extensions', '/root/.ipython']


In [11]:
print(sys.path)

['/content/pkgsample/src', '/content/pkgsample/src', '/content', '/env/python', '/usr/lib/python312.zip', '/usr/lib/python3.12', '/usr/lib/python3.12/lib-dynload', '', '/usr/local/lib/python3.12/dist-packages', '/usr/lib/python3/dist-packages', '/usr/local/lib/python3.12/dist-packages/IPython/extensions', '/root/.ipython']
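The printed list contains `/content/pkgsample/src` twice: the insertion cell above was evidently run more than once, and `list.insert` does not check for duplicates. A minimal sketch of an idempotent version (the helper name `add_to_path` is hypothetical):

```python
import os
import sys


def add_to_path(directory):
    """Prepend `directory` to sys.path unless it is already there."""
    path = os.path.abspath(directory)
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


p = add_to_path("pkgsample/src")
add_to_path("pkgsample/src")  # re-running the cell is now harmless
print(sys.path.count(p))
```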


In [12]:
# This may not work if you tried `import pkgsample` before and it failed.
# In that case, try restarting the runtime.
# The import only succeeds once `pkgsample/src` is on sys.path (see above).
from pkgsample import add

In [13]:
add.add(2, 3)

5

## Error handling: exceptions

What should a function do if it cannot continue? E.g., it needs to read data from a file, and the file is not there.

Strategies:
1. Panic, i.e. interrupt execution and exit the program. (E.g., OS kernel panic.)
2. Return a special value (e.g. `None`; `NULL` or `-1` in C) and store an error code in a dedicated global variable (`errno` in the C language).
3. Return an error code (by convention, 0 usually means success; any non-zero value indicates error).

In Linux and other Unix-like systems, every program returns an exit code to the shell, which stores it in the `$?` variable; we can use it (with caution).

In [14]:
!ls && echo "The return code:" $?

dummy_counter_as_script.py  dummy_counter_w_main.py  pkgsample	  sample_data
dummy_counter.py	    new_module.py	     __pycache__
The return code: 0


In [None]:
# Now let's do this in Python: sys.exit(1) ends the interpreter with exit code 1.
# Note the "&&" vs. ";" difference: with "&&", the second command runs only if
# the first one succeeded (exit code 0); with ";", it always runs.
!python -c "import sys; print('I do not feel well.'); sys.exit(1)" && echo $?
!python -c "import sys; print('I do not feel well.'); sys.exit(1)" ; echo $?

I do not feel well.
I do not feel well.
1
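The same exit code can be read from Python itself: `subprocess.run` returns an object whose `returncode` attribute holds it. A minimal sketch reusing the failing one-liner from the cell above:

```python
import subprocess
import sys

# Run the same failing one-liner in a child interpreter and capture its output.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print('I do not feel well.'); sys.exit(1)"],
    capture_output=True,
    text=True,
)
print(result.returncode)      # 1
print(result.stdout.strip())  # I do not feel well.
```

Passing `check=True` to `subprocess.run` would instead turn a non-zero exit code into a `subprocess.CalledProcessError` exception.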


It's a bit inconvenient to return both an error indicator and a value, and there are heated debates in the programming community on how to do this properly (an overview: https://digitalcommons.oberlin.edu/cgi/viewcontent.cgi?article=1854&context=honors).
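To see the inconvenience, here is a minimal sketch of the return-code strategy in Python. The helper `read_text_or_code` and its status codes are hypothetical: it returns a `(status, value)` pair, and every caller must remember to check the status before using the value.

```python
from pathlib import Path


def read_text_or_code(path):
    """Hypothetical C-style helper: returns (status, text).

    status is 0 on success and non-zero on failure; text is None on failure.
    """
    p = Path(path)
    if not p.is_file():
        return 1, None  # arbitrary non-zero code meaning "cannot read"
    return 0, p.read_text()


status, text = read_text_or_code("no_such_file.txt")
if status != 0:  # forgetting this check would let `text = None` leak onward
    print(f"Could not read the file (status {status})")
else:
    print(len(text))
```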

OOP languages (C++, Java, Python) tend to use exceptions.

Exceptions are special objects that are "raised" (or "thrown") when something goes wrong. If an exception is not handled, it propagates (bubbles up) the call chain and crashes the program at the top level.

In [None]:
def throws_on_odd(number):
    if number % 2 == 0:
        print("I'm good")
    else:
        raise ValueError(f'{number} is odd.')

In [None]:
throws_on_odd(3)

ValueError: 3 is odd.

In [None]:
# Some exceptions are raised by Python itself, e.g. when a function gets the wrong number of arguments.
throws_on_odd()

TypeError: throws_on_odd() missing 1 required positional argument: 'number'

In [None]:
# An example of exception propagation

def top_management(number):
    middle_management(number)

def middle_management(number):
    throws_on_odd(number)

top_management(3)

ValueError: 3 is odd.

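Exceptions make the control flow of programs a bit less predictable (any call may exit early), but they save a lot of typing. Cf. a more principled version of the code above written in Rust, where every function that can fail says so in its return type (`Result`):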

```rust
use std::error::Error;
use std::fmt;

// ----- Custom error type -----

#[derive(Debug)]
struct OddNumberError {
    number: i32,
}

impl fmt::Display for OddNumberError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} is odd.", self.number)
    }
}

impl Error for OddNumberError {}

// ----- Functions -----

fn throws_on_odd(number: i32) -> Result<(), OddNumberError> {
    if number % 2 == 0 {
        println!("I'm good");
        Ok(())
    } else {
        Err(OddNumberError { number })
    }
}

fn middle_management(number: i32) -> Result<(), OddNumberError> {
    throws_on_odd(number)
}

fn top_management(number: i32) -> Result<(), OddNumberError> {
    middle_management(number)
}

// ----- Main -----

fn main() {
    match top_management(3) {
        Ok(()) => {}
        Err(e) => eprintln!("Error: {}", e),
    }
}
```
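For comparison, Python also lets us define our own exception types. A sketch mirroring the Rust `OddNumberError` (the class is illustrative, not from any library); subclassing `ValueError` means existing `except ValueError` handlers still catch it, and the intermediate functions need no error plumbing.

```python
class OddNumberError(ValueError):
    """Illustrative custom exception mirroring the Rust OddNumberError."""

    def __init__(self, number):
        super().__init__(f"{number} is odd.")
        self.number = number  # keep the offending value for callers


def throws_on_odd(number):
    if number % 2 != 0:
        raise OddNumberError(number)
    print("I'm good")


def middle_management(number):
    throws_on_odd(number)  # the exception simply propagates through here


def top_management(number):
    middle_management(number)


try:
    top_management(3)
except OddNumberError as e:
    print(f"Error: {e} (number={e.number})")  # Error: 3 is odd. (number=3)
```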

Often, we don't want to stop the world when an error happens and instead want to work around it. To do this, Python uses the `try ... except` structure (called `try ... catch` in C++ and Java).

In [16]:
# We know that this function may throw
def throws_on_odd(number):
    if number % 2 == 0:
        pass
    else:
        raise ValueError(f'{number} is odd.')

val1 = 4
val2 = 3
try:
    throws_on_odd(val1)
    print(f'{val1} consumed alright')
    throws_on_odd(val2)
    print(f'{val2} consumed alright')
    throws_on_odd()
    print('No argument at all is fine as well')
except ValueError as e:
    print(f'Bad argument: {e}')

4 consumed alright
Bad argument: 3 is odd.


In [17]:
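# Let's see what happens if no ValueError is raised:
# both values are even, so execution reaches throws_on_odd() without an argument.
val1 = 4
val2 = 6
# The except clause below only catches ValueError, so the TypeError escapes.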
try:
    throws_on_odd(val1)
    print(f'{val1} consumed alright')
    throws_on_odd(val2)
    print(f'{val2} consumed alright')
    throws_on_odd()
    print('No argument at all is fine as well')
except ValueError as e:
    print(f'Bad argument: {e}')

4 consumed alright
6 consumed alright


TypeError: throws_on_odd() missing 1 required positional argument: 'number'

In [None]:
# Now we know better
val1 = 4
val2 = 6
try:
    throws_on_odd(val1)
    print(f'{val1} consumed alright')
    throws_on_odd(val2)
    print(f'{val2} consumed alright')
    throws_on_odd()
    print('No argument at all is fine as well')
except ValueError as e:
    print(f'Bad argument: {e}')
except TypeError as e:
    print(e)

4 consumed alright
6 consumed alright
throws_on_odd() missing 1 required positional argument: 'number'
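Several exception types can also share one handler by listing them in a tuple. A minimal sketch; `throws_on_odd` is redefined here with the same behaviour as above so the cell is self-contained:

```python
def throws_on_odd(number):
    # Same behaviour as the version defined above.
    if number % 2 != 0:
        raise ValueError(f'{number} is odd.')


caught = []
for args in [(4,), (3,), ()]:
    try:
        throws_on_odd(*args)
        print(f'{args} consumed alright')
    except (ValueError, TypeError) as e:
        # One handler for both error types; type(e).__name__ says which one occurred.
        caught.append(type(e).__name__)
        print(f'{type(e).__name__}: {e}')
```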


In [None]:
# A catch-all: feeling confident to deal with any possible error
val1 = 4
val2 = 6
try:
    throws_on_odd(val1)
    print(f'{val1} consumed alright')
    throws_on_odd(val2)
    print(f'{val2} consumed alright')
    throws_on_odd()
    print('No argument at all is fine as well')
except Exception as e:
    print(f'Something went wrong: {e}')

4 consumed alright
6 consumed alright
Something went wrong: throws_on_odd() missing 1 required positional argument: 'number'


In [18]:
# A common situation: something bad in the dataset
data_list = ['a'] * 50000 + [None] + ['b'] * 50000
for datum in data_list:
    l = len(datum)

TypeError: object of type 'NoneType' has no len()

In [None]:
# 1. Investigate and prepare
for datum in data_list:
    # Not None and not empty
    if datum:
        l = len(datum)
    else:
        continue

# 2. Just forget it: quicker but may miss more subtle errors
for datum in data_list:
    try:
        l = len(datum)
    except TypeError:  # a bare `except:` would also swallow unrelated errors
        continue
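A middle ground, sketched on a small toy list (the name `records` is hypothetical, chosen so as not to overwrite `data_list`): skip bad entries, but record which ones were skipped and why, so that subtle problems stay visible.

```python
records = ['a'] * 5 + [None] + ['b'] * 5  # toy data with one bad entry

lengths = []
skipped = []  # (index, error message) for every entry we could not process
for i, datum in enumerate(records):
    try:
        lengths.append(len(datum))
    except TypeError as e:
        skipped.append((i, str(e)))

print(f'Processed {len(lengths)} items, skipped {len(skipped)}: {skipped}')
```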

Good exception handling is important for code that you want to share with others. It is even more important, however, to be able to track the error messages and understand what's wrong with your (or someone else's) code.

The key thing: to know what state your program is in at each step, i.e. what values are assigned to different variables. Two basic approaches:

1. Be a serious programmer and use a debugger: https://realpython.com/python-debugging-pdb/
2. Add print statements showing the values of different variables at different stages.
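A small sketch of approach 2, plus a way to keep an error message without crashing. The function `mean` is a hypothetical example. For approach 1, calling the built-in `breakpoint()` at the line of interest drops into `pdb` by default; it is not shown here because it needs an interactive session.

```python
import traceback


def mean(values):
    total = sum(values)
    count = len(values)
    # Print-debugging: f"{name=}" (Python 3.8+) prints the expression and its value.
    print(f"{total=} {count=}")
    return total / count


try:
    mean([])
except ZeroDivisionError:
    # Keep the full traceback as text instead of letting the program crash.
    details = traceback.format_exc()
    print(details)
```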

Some of the examples below look silly. So do most errors, which doesn't make them easier to spot. Fix all the errors:

In [19]:
x = 10
y = 5
print(x + y)

15


In [21]:
numbers = [1, 2, 3]
print("Sum is: ", sum(numbers))

Sum is:  6


In [22]:
items = ["a", "b", "c"]
for i in range(3):
    print(i, items[i])

0 a
1 b
2 c


In [25]:
def get_data():
    return "None"

data = get_data()
print(data)

None


In [26]:
config = {"db": {"host": "localhost"}}
print(config["db"]["host"])

localhost


In [36]:
import time

n = 5
start = time.time()
timeout = 3  # seconds

print('Starting the work...')
sum = 0
while n > 0:
    sum += n
    n -= 1
    if time.time() - start > timeout:
        raise RuntimeError('Timeout!')
print(f'The sum is equal to {n}.')

Starting the work...
The sum is equal to 0.


In [30]:
def countdown(n):
    if n == 0:
        print(n)
        print('Done!')
        return 0
    return 1 / n + countdown(n - 1)

print(countdown(-3))

RecursionError: maximum recursion depth exceeded

In [34]:
def append_item(lst, item):
    lst.append(item)
    return lst

x = []
y = append_item(x, 10)
z = append_item(x, 20)

assert x == [10,20], f'x == {x}'
assert y == [10,20], f'y == {y}'
assert z == [10,20], f'z == {z}'

In [38]:
def create_adder(num):

  def adder(input_number):
    num = 1
    return input_number + num

  return adder

add_15 = create_adder(15)
print(add_15(12))

UnboundLocalError: cannot access local variable 'num' where it is not associated with a value