## Chapter 10

### Exercise 1

In [1]:
from utils import set_mode

set_mode('local')

import time
from dsets import LunaDataset

In [None]:
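def time_iterations(iter_count):
    """Time access to the first `iter_count` samples of the dataset."""
    ds = LunaDataset()

    start = time.time()
    for i in range(iter_count):
        _ = ds[i]
    end = time.time()

    print(f'{iter_count} iterations finished in {end - start} seconds.')

def time_last_iterations(iter_count):
    """Time access to the last `iter_count` samples of the dataset."""
    ds = LunaDataset()

    start = time.time()
    for i in range(iter_count):
        # Index -(i + 1) runs backwards from the last sample; -i would start at ds[0]
        _ = ds[-(i + 1)]
    end = time.time()

    print(f'{iter_count} iterations finished in {end - start} seconds.')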

In [None]:
time_iterations(1000)

a) First run finished in 141 seconds.

In [None]:
time_iterations(1000)

b) Second run finished in under 1 second.

In [None]:
time_iterations(1000)

c) After clearing the cache, the runtime goes back up to 200 seconds, in the same range as the uncached first run.

In [None]:
time_last_iterations(1000)
time_last_iterations(1000)

d) Once the samples are cached, iterating over the last 1000 samples instead of the first 1000 makes no difference to the runtime.

### Exercise 2

In [None]:
time_iterations(1000)
time_iterations(1000)

After randomizing the list, both runs take quite a long time.

### Exercise 3

In [None]:
time_iterations(1000)
time_iterations(1000)

The decorator on `getCt` does have an impact on the first loop. The runtime of the second loop, however, stays the same.
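The interaction between access order and a small in-memory cache can be illustrated without any CT data. The sketch below assumes that the decorator on `getCt` is a single-entry `functools.lru_cache` and that neighbouring entries of the unshuffled list belong to the same scan. `load_ct`, `count_loads` and the sizes are hypothetical stand-ins, not code from `dsets.py`.

```python
import functools
import random

load_count = 0  # how often the simulated expensive load actually runs


@functools.lru_cache(maxsize=1)
def load_ct(series_uid):
    """Hypothetical stand-in for an expensive CT load, cached one scan at a time."""
    global load_count
    load_count += 1
    return f"ct-{series_uid}"


def count_loads(series_uids):
    """Return how many real loads a given access order triggers."""
    global load_count
    load_ct.cache_clear()
    load_count = 0
    for uid in series_uids:
        load_ct(uid)
    return load_count


# 5 scans with 20 candidates each, grouped by scan (assumed layout of the sorted list)
grouped = [uid for uid in range(5) for _ in range(20)]

# Same candidates in random order
shuffled = grouped[:]
random.Random(0).shuffle(shuffled)

grouped_loads = count_loads(grouped)    # one load per scan
shuffled_loads = count_loads(shuffled)  # a reload almost every time the scan changes

print(f"grouped: {grouped_loads} loads, shuffled: {shuffled_loads} loads")
```

With grouped access, each scan is loaded once. With shuffled access, the single cache slot is overwritten whenever the scan changes, so the expensive load runs far more often.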

## Chapter 11

### Exercise 1

In [2]:
from utils import set_mode

set_mode('local')

from torch.utils.data import DataLoader
from tqdm import tqdm

In [None]:
def time_dataloader(num_workers, batch_size=1):
    # We only time the validation set, so it's faster
    dataset = LunaDataset(
        val_stride = 10,
        is_val_set = True,
    )

    data_loader = DataLoader(
        dataset,
        batch_size  = batch_size,
        num_workers = num_workers,
        pin_memory  = True
    )

    start_time = time.time()
    for _ in tqdm(data_loader):
        pass

    print(f'Finished in {time.time() - start_time:.2f} seconds. Num_workers: {num_workers}.')

In [None]:
time_dataloader(4)

Before the cache is filled, iterating over the 55107 samples in the validation set takes 8 minutes and 10 seconds.

In [None]:
time_dataloader(4)

The runtime drops significantly after the first epoch, because the data is now in the on-disk cache. Iterating over the validation set now takes only 1 minute and 21 seconds.

In [None]:
for workers in range(1, 13):
    time_dataloader(workers)

a) The number of workers has an impact on the runtime, though only a limited one. Going from 1 worker to 2 reduced the runtime by 30 seconds, or almost a quarter of the initial runtime. Adding further workers had no visible effect: the runtime stayed constant.

In [None]:
time_dataloader(batch_size=1024, num_workers=12)

c) The maximum workable combination fluctuates heavily, and I have not yet figured out what causes this. When everything is fine, the maximum seems to be roughly a `batch_size` of 4 if `num_workers` is 12 and a `batch_size` of 512 if `num_workers` is 1.

### Exercise 2

There does not seem to be an observable difference.

## Chapter 12

### Exercise 1

In [None]:
import numpy as np

In [None]:
# More general implementation of f_score: Recall is considered beta times as important as precision
def f_score(preds, labels, beta=1, classification_threshold=0.5):
    # True positives: Elements identified as nodules that are actually nodules
    # False positives: Elements identified as nodules that are not nodules
    # True negatives: Elements not identified as nodules that are not nodules
    # False negatives: Elements not identified as nodules that are nodules

    pos_label_mask = labels > classification_threshold  # Actual nodules
    pos_pred_mask = preds > classification_threshold    # Elements identified as nodules

    neg_label_mask = ~pos_label_mask    # Actual non-nodules
    neg_pred_mask = ~pos_pred_mask      # Elements identified as non-nodules

    pos_count = int(pos_label_mask.sum())   # Number of actual nodules
    neg_count = int(neg_label_mask.sum())   # Number of actual non-nodules

    true_neg_count = int((neg_label_mask & neg_pred_mask).sum())    # Number of non-nodules identified as such
    true_pos_count = int((pos_label_mask & pos_pred_mask).sum())    # Number of nodules identified as such

    false_pos_count = neg_count - true_neg_count    # Num. of samples identified as nodules, even though they are not nodules
    false_neg_count = pos_count - true_pos_count    # Num. of samples identified as non-nodules, even though they are nodules

    pos_pred_count = np.float32(true_pos_count + false_pos_count)
    precision = true_pos_count / pos_pred_count if pos_pred_count > 0 else 0

    act_pos_count = np.float32(true_pos_count + false_neg_count)
    recall = true_pos_count / act_pos_count if act_pos_count > 0 else 0

    denominator = ((beta ** 2) * precision) + recall
    return (1 + beta ** 2) * (precision * recall) / denominator if denominator > 0 else 0.0

b)
To reiterate:

- Recall: Number of samples correctly identified as positive divided by the number of actual positive samples
- Precision: Number of samples correctly identified as positive divided by the number of samples identified as positive, whether wrong or right

If we classify everything as positive, we get a lot of false positives, so precision will be very low. We will not have any false negatives, however, so recall will be 1. If we classify everything as negative, there are no true positives, so recall is 0 and precision is undefined (0/0); the implementation above returns 0 for both. In our case, we want to minimize false negatives, because we want to be really sure we miss no nodules, so recall should weigh more than precision. The F2 score is therefore the better choice.
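The two extreme cases can be checked with a few lines of arithmetic. The sketch below works on raw counts rather than on prediction arrays. The function name `f_beta` and the example numbers (10 nodules among 100 candidates) are assumptions made for illustration.

```python
def f_beta(true_pos, false_pos, false_neg, beta=1.0):
    """F-beta score from raw counts; undefined ratios (0/0) are treated as 0."""
    pred_pos = true_pos + false_pos      # everything predicted positive
    actual_pos = true_pos + false_neg    # everything that really is positive

    precision = true_pos / pred_pos if pred_pos > 0 else 0.0
    recall = true_pos / actual_pos if actual_pos > 0 else 0.0

    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical data: 10 nodules among 100 candidates.
# Everything classified as positive: 10 true positives, 90 false positives.
all_pos_f1 = f_beta(10, 90, 0, beta=1)
all_pos_f2 = f_beta(10, 90, 0, beta=2)

# Everything classified as negative: 10 false negatives, nothing predicted positive.
all_neg_f2 = f_beta(0, 0, 10, beta=2)

print(f"all positive: F1={all_pos_f1:.3f}, F2={all_pos_f2:.3f}")
print(f"all negative: F2={all_neg_f2:.3f}")
```

With precision 0.1 and recall 1, F2 comes out higher than F1, because F2 weighs the perfect recall more heavily than the poor precision.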

### Exercise 2

In [15]:
from torch.utils.data import WeightedRandomSampler

# Define Dataset
weighted_ds = LunaDataset(
    val_stride  = 10,
    is_val_set  = True,
    ratio_int   = 0,
)

# Get label counts
candidate_info = weighted_ds.candidate_info_list

label_counts = {}
for candidate in candidate_info:
    if candidate.isNodule_bool not in label_counts:
        label_counts[candidate.isNodule_bool] = 0
    label_counts[candidate.isNodule_bool] += 1

# Use inverse of label count as weight for weighted sampler
weights = [1 / label_counts[candidate.isNodule_bool] for candidate in candidate_info]
weighted_sampler = WeightedRandomSampler(weights, len(weighted_ds))

train_dl = DataLoader(
    weighted_ds,
    sampler = weighted_sampler,
    batch_size  = 1,
    num_workers = 6,
    pin_memory  = True
)

2025-08-07 15:00:44,724 INFO     pid:3034 dsets:273:__init__ <dsets.LunaDataset object at 0x7f1b6d0009b0>: 55107 validation samples True


In [17]:
label_counts = {
    'True': 0,
    'False': 0
}
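for batch in tqdm(train_dl):
    # batch[1] holds the labels; with batch_size=1, batch[1][0] is the label pair
    # of the single sample: index 0 is set for non-nodules, index 1 for nodules
    if batch[1][0][0] == 1:
        label_counts['False'] += 1
    elif batch[1][0][1] == 1:
        label_counts['True'] += 1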

label_counts

100%|██████████| 55107/55107 [23:11<00:00, 39.60it/s] 


{'True': 27503, 'False': 27604}

a) The candidate info list can be used to get the required information to construct the weights.

b) Using the index seems cleaner, as the code is less scattered. In the codebase, the sampler approach would clutter the functions that create the dataloaders.
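The effect of inverse-frequency weights can be checked without loading any CT data. The sketch below uses `random.choices` from the standard library as a stand-in for `WeightedRandomSampler`, which also draws with replacement by default. The label list is made up for illustration.

```python
import random
from collections import Counter

# Hypothetical, heavily imbalanced label list: 10 nodules, 990 non-nodules
labels = [True] * 10 + [False] * 990

# Inverse class frequency as per-sample weight
counts = Counter(labels)
weights = [1.0 / counts[label] for label in labels]

# Each class now carries the same total weight
weight_per_class = {
    cls: sum(w for w, label in zip(weights, labels) if label == cls)
    for cls in counts
}

# Draw with replacement in proportion to the weights
rng = random.Random(0)
drawn = rng.choices(labels, weights=weights, k=10_000)
drawn_counts = Counter(drawn)

print(weight_per_class)
print(drawn_counts)
```

Because both classes end up with the same total weight, the drawn labels split roughly evenly, which matches the near-equal counts in the output above.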

### Exercise 3

a) Three different training and validation ratios were tested: 3:1, 2:1 and 1:1. Interestingly, the uniform 1:1 ratio was not the most effective with regard to the loss. The middle ratio, 2:1, scored best, with 3:1 close behind. The same holds for the F1 score, recall and precision.

b) Making the ratio a function of the epoch has a positive effect on both evaluation and training loss. The same holds for the evaluation F1 score (as well as precision and recall), even though these metrics do not necessarily improve during training: they improve at first, but get worse the higher the ratio gets. Nevertheless, the higher evaluation scores indicate better generalization.
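One way to make the ratio depend on the epoch is a capped linear schedule. The sketch below is a hypothetical example and not necessarily the function used for the results above. `ratio_for_epoch`, its parameters and the chosen values are assumptions made for illustration.

```python
def ratio_for_epoch(epoch_ndx, start_ratio=1, step=1, max_ratio=5):
    """Hypothetical schedule: grow the ratio by `step` per epoch, capped at `max_ratio`.

    `epoch_ndx` is assumed to start at 1.
    """
    return min(start_ratio + step * (epoch_ndx - 1), max_ratio)


# Ratio used in each of the first seven epochs
schedule = [ratio_for_epoch(epoch_ndx) for epoch_ndx in range(1, 8)]
print(schedule)
```

The returned value could then be passed to the dataset as `ratio_int` at the start of each epoch.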

### Exercise 4

a) The augmentation approaches noise, offset and scale can indeed be made more aggressive by changing their baseline values in `training.py`.

b)

c) I found several augmentation tactics, but only some of them are applicable here. Some are specific to text or audio data, which is outside the scope of this project. More suitable approaches I came across are the following:

- Color space transformation: By randomly changing some of the color channels, the contrast and brightness of the scans can be increased per slice. This does not seem like it would help here, but it could be considered.
- Random erasing: This approach seems similar to noise. Random parts of each CT slice are deleted.
- Salt and Pepper Noise: Whereas the current noise implementation uses a Gaussian distribution, some sources suggest randomly inserting black or white pixels, which imitate dead pixels or sensor dust. It does not seem like this would happen with CT scans, but it is definitely worth a try.
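Two of the tactics above, salt and pepper noise and random erasing, can be sketched with NumPy on a single 3D chunk. This is a minimal sketch, not code from `training.py`. The chunk shape, the value range of -1000 to 1000 and the function names are assumptions made for illustration.

```python
import numpy as np


def salt_and_pepper(chunk, amount=0.01, low=-1000.0, high=1000.0, rng=None):
    """Return a copy of `chunk` with a fraction `amount` of voxels set to `low` or `high`."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = chunk.copy()
    draw = rng.random(chunk.shape)
    noisy[draw < amount / 2] = low        # "pepper": darkest assumed value
    noisy[draw > 1 - amount / 2] = high   # "salt": brightest assumed value
    return noisy


def random_erase(chunk, size=(8, 8, 8), fill=-1000.0, rng=None):
    """Return a copy of `chunk` with one randomly placed box overwritten by `fill`."""
    rng = np.random.default_rng() if rng is None else rng
    erased = chunk.copy()
    # Pick a start index per axis so that the box fits inside the chunk
    starts = [int(rng.integers(0, dim - s + 1)) for dim, s in zip(chunk.shape, size)]
    region = tuple(slice(start, start + s) for start, s in zip(starts, size))
    erased[region] = fill
    return erased


# Hypothetical chunk filled with random values instead of real CT data
rng = np.random.default_rng(0)
chunk = rng.normal(0.0, 100.0, size=(32, 48, 48)).astype(np.float32)

noisy = salt_and_pepper(chunk, amount=0.02, rng=rng)
erased = random_erase(chunk, rng=rng)

print(f"voxels changed by noise:   {(noisy != chunk).mean():.4f}")
print(f"voxels changed by erasing: {(erased != chunk).sum()}")
```

Both functions return a modified copy and leave the input untouched, so they could be applied per sample in the same place as the existing noise augmentation.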

In [None]:
# TODO: Finish other exercises