# Testing different learning rates

The TorNet paper gave us a pretty good indication of how to set up its baseline CNN model with CoordConv and VGG blocks.
Even better, the TorNet repository gave us the model blocks themselves.

In this notebook, we mostly run the baseline model with several different learning rates to find the learning rate
that works best for training on this data.

In [1]:
import sys

import os
import glob
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt

# just the location for the input data
TORNET_DATA_INPUT_FOLDER = "/mnt/c/users/handypark/Documents/Grad_School_Courses/CS_230/tornet"


In [2]:
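# Just making sure that we are indeed using a CUDA (GPU) build of TensorFlow and not a CPU-only build.
# is_built_with_cuda() only checks how TensorFlow was built; list_physical_devices('GPU') confirms a GPU is visible.
print(tf.test.is_built_with_cuda())
print(tf.config.list_physical_devices('GPU'))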

# We tried experimental memory growth in some cases, but it didn't work out well (also crashed a lot).
"""
gpus = tf.config.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)
"""


## TorNet's helper functions

Below are the main TorNet helper functions we use to build the dataset and load it into TensorFlow.
There weren't many changes here; these are mostly as-is from the TorNet repo.
They are annotated with new comments to show how they will be used later on.

In [3]:
"""
TorNet's data loading code, re-imported here manually for loading data into TensorFlow.
For some reason, trying to import the data loading code using `from tornet.data.tf.loader` wasn't working as expected,
so we re-copy that code over here to make use of it.
"""
from typing import List, Dict

from tornet.data.loader import query_catalog, read_file
from tornet.data.constants import ALL_VARIABLES
from tornet.data import preprocess as pp

def create_tf_dataset(files: List[str],
                      variables: List[str]=ALL_VARIABLES,
                      n_frames:int=1,
                      tilt_last: bool=True) -> tf.data.Dataset:
    """
    This is TorNet's main function for loading the data from the folder where it's all stored.
    
    It creates a TF dataset object via the function read_file, which reads the NetCDF files
    one at a time.
    """
    assert len(files)>0
    # grab one file to get keys, shapes, etc.
    data = read_file(files[0],variables=variables,n_frames=n_frames, tilt_last=tilt_last)
    
    output_signature = { k:tf.TensorSpec(shape=data[k].shape,dtype=data[k].dtype,name=k) for k in data }
    def gen():
        for f in files:
            yield read_file(f,variables=variables,n_frames=n_frames, tilt_last=tilt_last)
    ds = tf.data.Dataset.from_generator(gen,
                                        output_signature=output_signature)
    return ds
    

def preproc(ds: tf.data.Dataset,
            weights:Dict=None,
            include_az:bool=False,
            select_keys:list=None,
            tilt_last:bool=True):
    """
    This is Tornet's preprocessing function for taking the raw dataset loaded from the files (in create_tf_dataset)
    and then doing a few things:

    - Remove the time dimension (since we only care about detection at a given time t)
    - Add coordinates (so that we can run CoordConv layers later)
    - Split the data into its inputs and label outputs
    - Adding weights (if we decide to weight the data at all)

    Once the preprocessing is done, the data is basically ready to be trained on.
    """
    
    # Remove time dimesnion
    ds = ds.map(pp.remove_time_dim)

    # Add coordinate tensors
    ds = ds.map(lambda d: pp.add_coordinates(d,include_az=include_az,tilt_last=tilt_last,backend=tf))

    # split into X,y
    ds = ds.map(pp.split_x_y)

    # Add sample weights
    if weights:
        ds = ds.map(lambda x,y:  pp.compute_sample_weight(x,y,**weights, backend=tf) )
    
        # select keys for input
        if select_keys is not None:
            ds = ds.map(lambda x,y,w: (pp.select_keys(x,keys=select_keys),y,w))
    else:
        if select_keys is not None:
            ds = ds.map(lambda x,y: (pp.select_keys(x,keys=select_keys),y))

    return ds

In [4]:
def make_tf_loader(data_root: str, 
            data_type:str='train', # or 'test'
            years: list=list(range(2013,2023)),
            batch_size: int=128, 
            weights: Dict=None,
            include_az: bool=False,
            random_state:int=1234,
            select_keys: list=None,
            tilt_last: bool=True,
            from_tfds: bool=False,
            tfds_data_version: str='1.1.0',
            num_epochs: int=3):  # currently unused
    """
    This TorNet library function is used to load the data into Tensorflow.
    We're going to use the `create_tf_dataset` function from above, 
    then we'll use `preproc` to preprocess it.

    One important note - we tried a bunch of different functions for shuffling 
    and batching, repeating the dataset, etc. to try to be able to run
    many epochs with one function call. It wasn't working.
    Even the `drop_remainder=True` that we've added here to ds.batch
    seems to not really have an effect, as the model training
    still throws an error at the end of training about running out of data.
    """
    
    if from_tfds: # fast loader
        import tensorflow_datasets as tfds
        import tornet.data.tfds.tornet.tornet_dataset_builder # registers 'tornet'
        ds = tfds.load('tornet:%s' % tfds_data_version ,split='+'.join(['%s-%d' % (data_type,y) for y in years]))
        # Assumes data was saved with tilt_last=True and converts it to tilt_last=False
        if not tilt_last:
            ds = ds.map(lambda d: pp.permute_dims(d,(0,3,1,2), backend=tf))
    else: # Load directly from netcdf files
        file_list = query_catalog(data_root, data_type, years, random_state)
        ds = create_tf_dataset(file_list,variables=ALL_VARIABLES,n_frames=1, tilt_last=tilt_last) 

    ds = preproc(ds,weights,include_az,select_keys,tilt_last)

    # this has been adjusted to include drop_remainder=True
    ds = ds.batch(batch_size, drop_remainder=True)
    # prefetch last, so whole batches are prepared in the background while the model trains
    ds = ds.prefetch(tf.data.AUTOTUNE)
    return ds
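With `drop_remainder=True`, each epoch yields `N // batch_size` full batches and the last partial batch is discarded. Because `from_generator` cannot know `N` in advance, Keras shows `Unknown` as the step total until the epoch ends. The helpers below (`batches_per_epoch` and `sample_range_for_steps` are hypothetical names, not part of TorNet) are a minimal sketch of that arithmetic; they also show that the 2682 steps logged in the runs below imply between 171,648 and 171,711 training samples at batch size 64.

```python
def batches_per_epoch(num_samples: int, batch_size: int, drop_remainder: bool) -> int:
    """Number of batches Dataset.batch would produce for one pass over the data."""
    full_batches, leftover = divmod(num_samples, batch_size)
    # With drop_remainder=True the final, smaller batch is discarded.
    if drop_remainder or leftover == 0:
        return full_batches
    return full_batches + 1


def sample_range_for_steps(steps: int, batch_size: int) -> tuple:
    """Inclusive range of dataset sizes that give exactly `steps` batches with drop_remainder=True."""
    lowest = steps * batch_size
    highest = (steps + 1) * batch_size - 1
    return lowest, highest


print(batches_per_epoch(1000, 64, drop_remainder=True))   # 15 full batches, 40 samples dropped
print(batches_per_epoch(1000, 64, drop_remainder=False))  # 16 batches, the last one has 40 samples
print(sample_range_for_steps(2682, 64))                   # (171648, 171711)
```

If the exact sample count were known, `tf.data.experimental.assert_cardinality` could attach it to the dataset so that Keras shows a real step total; we have not tried that here.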

## TorNet Baseline CNN Model Definition

As stated in the comment for the code block below, this is the TorNet model code that was used in the paper.
Our goal is to run this model, which consists of:
- Normalizing the inputs
- Adding the coordinate information for CoordConv to work properly
- Running 4 VGG blocks, each made of 2 or 3 CoordConv2D layers followed by 2×2 max pooling (applied to both the feature maps and the coordinate channels)
- A head that turns the final feature maps into a single logit for binary classification: by default (`head='maxpool'`), 1×1 convolutions produce a per-gridcell heatmap whose maximum is the output; the alternative `head='mlp'` flattens the features and uses dense layers

Later on, in our experiments to improve on this model, we'll likely still use some of this code
for preprocessing the data (normalization and the CoordConv2D coordinate inputs).

Notably, our data augmentation experiments only need to change the input dataset,
not the model code itself.
In contrast, the YOLO transfer learning experiments likely won't be able to reuse much of this
code (though some of the layers defined here might still be useful).
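Each VGG block ends in a stride-2 max pool with `padding='same'`, which in TensorFlow produces `ceil(n / 2)` outputs along each spatial axis, while the `'same'`-padded convolutions keep the size unchanged. The pure-Python sketch below (helper names are hypothetical) traces the spatial size through the four blocks for the default 120×240 input, then mimics the `maxpool` head: the scene-level logit is the maximum of the per-gridcell heatmap, as `GlobalMaxPooling2D` computes. The heatmap values are made up for illustration, and the sigmoid is applied only to show how a logit maps to a probability; the model itself outputs the logit, which is why the loss uses `from_logits=True`.

```python
import math

import numpy as np


def pooled_size(n: int, stride: int = 2) -> int:
    """Output length of a stride-2 max pool with padding='same' (ceil division)."""
    return math.ceil(n / stride)


def vgg_spatial_shapes(height: int = 120, width: int = 240, n_blocks: int = 4) -> list:
    """Spatial (height, width) after each VGG block; only the pooling step shrinks the maps."""
    shapes = []
    for _ in range(n_blocks):
        height, width = pooled_size(height), pooled_size(width)
        shapes.append((height, width))
    return shapes


def maxpool_head(heatmap_logits: np.ndarray) -> float:
    """Scene-level logit: the maximum over the per-gridcell heatmap."""
    return float(np.max(heatmap_logits))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


print(vgg_spatial_shapes())  # [(60, 120), (30, 60), (15, 30), (8, 15)]

# Hypothetical 8x15 heatmap of logits: mostly "no tornado", with one hot gridcell.
heatmap = np.full((8, 15), -4.0)
heatmap[3, 7] = 2.0
logit = maxpool_head(heatmap)
print(logit, round(sigmoid(logit), 3))  # 2.0 0.881
```

Taking the maximum means a single strongly activated gridcell is enough to flag the whole scene, which suits detecting a small, localized feature such as a tornado signature.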

In [5]:
"""
This is just the TorNet model code that was used in the paper.
Goal is to run this model which consists of:
- Normalizing the inputs
- Adding the coordinate information for CoordConv to work properly
- Running 4 VGG blocks which each have CoordConv2D and two MAXPOOL layers
- One last block of Conv2D layers and MAXPOOL to get the output probability
"""

from typing import Dict, List, Tuple
import numpy as np
import keras
from tornet.models.keras.layers import CoordConv2D, FillNaNs
from tornet.data.constants import CHANNEL_MIN_MAX, ALL_VARIABLES


def build_model(shape:Tuple[int]=(120,240,2),
                c_shape:Tuple[int]=(120,240,2),
                input_variables:List[str]=ALL_VARIABLES,
                start_filters:int=64,
                l2_reg:float=0.001,
                background_flag:float=-3.0,
                include_range_folded:bool=True,
                head='maxpool'):
    # Create input layers for each input_variables
    inputs = {}
    for v in input_variables:
        inputs[v]=keras.Input(shape,name=v)
    n_sweeps=shape[2]
    
    # Normalize inputs and concatenate along channel dim
    normalized_inputs=keras.layers.Concatenate(axis=-1,name='Concatenate1')(
        [normalize(inputs[v],v) for v in input_variables]
        )

    # Replace nan pixel with background flag
    normalized_inputs = FillNaNs(background_flag)(normalized_inputs)

    # Add channel for range folded gates 
    if include_range_folded:
        range_folded = keras.Input(shape[:2]+(n_sweeps,),name='range_folded_mask')
        inputs['range_folded_mask']=range_folded
        normalized_inputs = keras.layers.Concatenate(axis=-1,name='Concatenate2')(
               [normalized_inputs,range_folded])
        
    # Input coordinate information
    cin=keras.Input(c_shape,name='coordinates')
    inputs['coordinates']=cin

    x,c = normalized_inputs,cin
    
    x,c = vgg_block(x,c, filters=start_filters,   ksize=3, l2_reg=l2_reg, n_convs=2, drop_rate=0.1)   # (60,120)
    x,c = vgg_block(x,c, filters=2*start_filters, ksize=3, l2_reg=l2_reg, n_convs=2, drop_rate=0.1)  # (30,60)
    x,c = vgg_block(x,c, filters=4*start_filters, ksize=3, l2_reg=l2_reg, n_convs=3, drop_rate=0.1)  # (15,30)
    x,c = vgg_block(x,c, filters=8*start_filters, ksize=3, l2_reg=l2_reg, n_convs=3, drop_rate=0.1)  # (8,15): 'same' pooling rounds 15/2 up
    #x,c = vgg_block(x,c, filters=8*start_filters, ksize=3, l2_reg=l2_reg, n_convs=3)  # (4,8)
    
    if head=='mlp':
        # MLP head
        x = keras.layers.Flatten()(x) 
        x = keras.layers.Dense(units = 4096, activation ='relu')(x) 
        x = keras.layers.Dense(units = 2024, activation ='relu')(x) 
        output = keras.layers.Dense(1)(x)
    elif head=='maxpool':
        # Per gridcell
        x = keras.layers.Conv2D(filters=512, kernel_size=1,
                          kernel_regularizer=keras.regularizers.l2(l2_reg),
                          activation='relu')(x)
        x = keras.layers.Conv2D(filters=256, kernel_size=1,
                          kernel_regularizer=keras.regularizers.l2(l2_reg),
                          activation='relu')(x)
        x = keras.layers.Conv2D(filters=1, kernel_size=1,name='heatmap')(x)
        # Max in scene
        output = keras.layers.GlobalMaxPooling2D()(x)

    return keras.Model(inputs=inputs,outputs=output)


def vgg_block(x,c, filters=64, ksize=3, n_convs=2, l2_reg=1e-6, drop_rate=0.0):

    for _ in range(n_convs):
        x,c = CoordConv2D(filters=filters,
                          kernel_size=ksize,
                          kernel_regularizer=keras.regularizers.l2(l2_reg),
                          padding='same',
                          activation='relu')([x,c])
    x = keras.layers.MaxPool2D(pool_size =2, strides =2, padding ='same')(x)
    c = keras.layers.MaxPool2D(pool_size =2, strides =2, padding ='same')(c)
    if drop_rate>0:
        x = keras.layers.Dropout(rate=drop_rate)(x)
    return x,c


def normalize(x,
              name:str):
    """
    Channel-wise normalization using known CHANNEL_MIN_MAX
    """
    min_max = np.array(CHANNEL_MIN_MAX[name]) # [2,]
    n_sweeps=x.shape[-1]
    
    # choose mean,var to get approximate [-1,1] scaling
    var=((min_max[1]-min_max[0])/2)**2 # scalar
    var=np.array(n_sweeps*[var,])    # [n_sweeps,]
    
    offset=(min_max[0]+min_max[1])/2    # scalar
    offset=np.array(n_sweeps*[offset,]) # [n_sweeps,]

    return keras.layers.Normalization(mean=offset,
                                      variance=var,
                                      name='Normalize_%s' % name)(x)
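`normalize` configures a Keras `Normalization` layer with mean `(min + max) / 2` and variance `((max - min) / 2)^2`, so each value becomes `(x - mid) / half_range` and a channel's known range `[min, max]` maps to `[-1, 1]`. After concatenation, `build_model` uses `FillNaNs` to replace missing gates with `background_flag` (-3.0 by default), a value outside that range. The numpy sketch below reproduces both steps under stated assumptions: the `(-20, 60)` range is a hypothetical stand-in, not TorNet's actual `CHANNEL_MIN_MAX` entry, `fill_nans` assumes `FillNaNs` simply substitutes the flag for NaNs, and the Keras layer's guard against zero variance is omitted.

```python
import numpy as np


def normalize_like_tornet(x: np.ndarray, channel_min: float, channel_max: float) -> np.ndarray:
    """Scale so that [channel_min, channel_max] maps to [-1, 1], using the same mean/variance choice as `normalize`."""
    offset = (channel_min + channel_max) / 2            # the "mean" passed to keras Normalization
    variance = ((channel_max - channel_min) / 2) ** 2    # the "variance" passed to keras Normalization
    return (x - offset) / np.sqrt(variance)


def fill_nans(x: np.ndarray, background_flag: float = -3.0) -> np.ndarray:
    """Replace NaN gates with a constant flag (assumed behavior of FillNaNs)."""
    return np.where(np.isnan(x), background_flag, x)


# Hypothetical reflectivity-like channel with range (-20, 60) and one missing gate.
raw = np.array([-20.0, 20.0, 60.0, np.nan])
scaled = fill_nans(normalize_like_tornet(raw, -20.0, 60.0))
print(scaled.tolist())  # [-1.0, 0.0, 1.0, -3.0]
```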

## Building the Model

OK, we've got all of the TorNet model components defined and imported (using the helper functions, etc.).
(Even getting that working turned out to be tricky: there were dependency issues to resolve, and while we'd
like to simply import the functions rather than re-copying them here, that wasn't working well.)

In [7]:
model = build_model()

In [8]:
model.summary()

## Testing various learning rates

From here on out, the plan is simple: we'll try out a range of learning rates between 5 * 10^-3 and 1 * 10^-6 and see how each one does after a single epoch of training. (The runs below cover 1 * 10^-5, 1 * 10^-4, 1 * 10^-3, 5 * 10^-4, and 5 * 10^-3.)
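Learning rates matter on a multiplicative scale, so sweeps like this are usually laid out on a log-spaced grid. A minimal sketch (the helper `lr_grid` is hypothetical) that lists the 1-and-5-per-decade grid between the two endpoints above:

```python
def lr_grid(high: float = 5e-3, low: float = 1e-6, mantissas=(5, 1)) -> list:
    """Log-spaced learning rates m * 10**-k (m in `mantissas`) from `high` down to `low`, inclusive."""
    rates = []
    for exponent in range(1, 10):
        for m in mantissas:
            # Parse from a string so values match literals like 5e-3 exactly (no float round-off).
            lr = float(f"{m}e-{exponent}")
            if low <= lr <= high:
                rates.append(lr)
    return rates


print(lr_grid())  # [0.005, 0.001, 0.0005, 0.0001, 5e-05, 1e-05, 5e-06, 1e-06]
```

Since each run takes over two hours on this setup, only a subset of such a grid is practical to try.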

We want to see a good decrease in the loss by the end of the epoch (one full pass of mini-batch gradient descent over the training data), and we want to avoid a loss that blows up to NaN or a double-digit value. We'll then pick the learning rate that gives the lowest loss after one epoch and go from there.
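To judge whether a loss value is "good", it helps to have a reference point. Binary cross-entropy on a logit `z` with label `y` can be written stably as `max(z, 0) - z*y + log(1 + exp(-|z|))` (the form documented for `tf.nn.sigmoid_cross_entropy_with_logits`), and it equals `ln 2 ≈ 0.693` when the model outputs logit 0, i.e. probability 0.5. The numpy sketch below (the helper `bce_from_logits` is hypothetical) computes it. Keep in mind that the `loss` Keras reports during `fit` is a running average over the epoch and also includes the L2 penalties from `kernel_regularizer`, so it can be higher than the cross-entropy alone.

```python
import numpy as np


def bce_from_logits(logits, labels, sample_weight=None) -> float:
    """Mean of the (optionally weighted) per-sample binary cross-entropy, computed from raw logits."""
    z = np.asarray(logits, dtype=float)
    y = np.asarray(labels, dtype=float)
    # Numerically stable form: never exponentiates a large positive number.
    per_sample = np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))
    if sample_weight is not None:
        per_sample = per_sample * np.asarray(sample_weight, dtype=float)
    return float(np.mean(per_sample))


labels = np.array([0, 1, 0, 1])
print(round(bce_from_logits(np.zeros(4), labels), 4))               # 0.6931: the "no information" baseline
print(round(bce_from_logits(np.array([-3, 3, -3, 3]), labels), 4))  # 0.0486: confident and correct
```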

In [9]:
opt = keras.optimizers.Adam(learning_rate=1e-5)
loss = keras.losses.BinaryCrossentropy(from_logits=True)
model.compile(loss=loss, optimizer=opt)

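We create a checkpoint-saving callback to cope with the IPython kernel's frequent crashes.
At the end of each epoch (one pass through all batches), the callback saves the model's weights; after a crash, `checkpoint_loader` can load them back into a freshly built model so training can continue.

We'll save the weights in `checkpoints/epoch_lr_{learning_rate}_{epoch_number}.weights.h5` (for example, `checkpoints/epoch_lr_1e-5_1.weights.h5`).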
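In the runs below, the epoch number in each checkpoint path is written by hand. `ModelCheckpoint` can also fill it in: it formats `filepath` with the (1-based) epoch number and the logged metrics, so a placeholder like `{epoch:02d}` produces one file per epoch when training for several epochs. A minimal sketch of building such a template per learning rate (the helper `checkpoint_template` is hypothetical), using plain `str.format` to preview the file name the callback would write:

```python
def checkpoint_template(learning_rate: float, folder: str = "checkpoints") -> str:
    """Weights-file template for one learning rate; `{epoch:02d}` is left for ModelCheckpoint to fill in."""
    lr_tag = f"{learning_rate:.0e}".replace("e-0", "e-")  # '1e-05' -> '1e-5', matching the names used here
    # Doubled braces keep a literal {epoch:02d} placeholder in the returned string.
    return f"{folder}/epoch_lr_{lr_tag}_{{epoch:02d}}.weights.h5"


template = checkpoint_template(1e-5)
print(template)                  # checkpoints/epoch_lr_1e-5_{epoch:02d}.weights.h5
print(template.format(epoch=1))  # checkpoints/epoch_lr_1e-5_01.weights.h5
```

For resuming an interrupted `fit` call, Keras also provides a `BackupAndRestore` callback; we have not tried it in this notebook.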

In [10]:
def checkpoint_creator(checkpoint_path):
    # saving the model's weights in case the iPython kernel crashes (which it likes to do)
    cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                       save_weights_only=True,
                                       verbose=1)
    return cp_callback

def checkpoint_loader(checkpoint_path, model):
    model.load_weights(checkpoint_path)

In [11]:
preprocessed = make_tf_loader(data_root = TORNET_DATA_INPUT_FOLDER, 
                              data_type = "train", # or 'test'
                              years = list(range(2013, 2023)),
                              batch_size = 64, 
                              weights = {'wN':1.0,'w0':1.0,'w1':1.0,'w2':2.0,'wW':0.5},
                              include_az = False,
                              random_state = 12345,
                              select_keys = ALL_VARIABLES + ["coordinates", "range_folded_mask"],
                              tilt_last = True,
                              from_tfds = False,
                              tfds_data_version ="1.1.0")

In [11]:
model.fit(preprocessed, epochs=1, callbacks=[checkpoint_creator("checkpoints/epoch_lr_1e-5_1.weights.h5")])

I0000 00:00:1731708928.240335    4744 service.cc:148] XLA service 0x7ff7900198d0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1731708928.240830    4744 service.cc:156]   StreamExecutor device (0): NVIDIA GeForce RTX 3070, Compute Capability 8.6
2024-11-15 14:15:28.404795: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
I0000 00:00:1731708928.744022    4744 cuda_dnn.cc:529] Loaded cuDNN version 90300
I0000 00:00:1731708954.784961    4744 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.


   2682/Unknown [1m7870s[0m 3s/step - loss: 2.8315
Epoch 1: saving model to checkpoints/epoch_lr_1e-5_1.weights.h5



2682/2682 ━━━━━━━━━━━━━━━━━━━━ 7870s 3s/step - loss: 2.8313



In [12]:
model_1e4 = build_model()

opt_1e4 = keras.optimizers.Adam(learning_rate=1e-4)
loss_1e4 = keras.losses.BinaryCrossentropy(from_logits=True)
model_1e4.compile(loss=loss_1e4, optimizer=opt_1e4)
model_1e4.fit(preprocessed, epochs=1, callbacks=[checkpoint_creator("checkpoints/epoch_lr_1e-4_1.weights.h5")])

I0000 00:00:1731740362.466520     158 service.cc:148] XLA service 0x7f7434019210 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1731740362.466825     158 service.cc:156]   StreamExecutor device (0): NVIDIA GeForce RTX 3070, Compute Capability 8.6
2024-11-15 22:59:22.590611: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
I0000 00:00:1731740362.903679     158 cuda_dnn.cc:529] Loaded cuDNN version 90300
I0000 00:00:1731740390.926092     158 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.


   2682/Unknown [1m8778s[0m 3s/step - loss: 1.3556
Epoch 1: saving model to checkpoints/epoch_lr_1e-4_1.weights.h5



2682/2682 ━━━━━━━━━━━━━━━━━━━━ 8779s 3s/step - loss: 1.3553



In [13]:
model_1e3 = build_model()

opt_1e3 = keras.optimizers.Adam(learning_rate=1e-3)
loss_1e3 = keras.losses.BinaryCrossentropy(from_logits=True)
model_1e3.compile(loss=loss_1e3, optimizer=opt_1e3)
model_1e3.fit(preprocessed, epochs=1, callbacks=[checkpoint_creator("checkpoints/epoch_lr_1e-3_1.weights.h5")])

   2682/Unknown [1m8508s[0m 3s/step - loss: 0.5715
Epoch 1: saving model to checkpoints/epoch_lr_1e-3_1.weights.h5



2682/2682 ━━━━━━━━━━━━━━━━━━━━ 8509s 3s/step - loss: 0.5714



In [14]:
model_5e4 = build_model()

opt_5e4 = keras.optimizers.Adam(learning_rate=5e-4)
loss_5e4 = keras.losses.BinaryCrossentropy(from_logits=True)
model_5e4.compile(loss=loss_5e4, optimizer=opt_5e4)
model_5e4.fit(preprocessed, epochs=1, callbacks=[checkpoint_creator("checkpoints/epoch_lr_5e-4_1.weights.h5")])

   2682/Unknown [1m8519s[0m 3s/step - loss: 0.7231
Epoch 1: saving model to checkpoints/epoch_lr_5e-4_1.weights.h5



2682/2682 ━━━━━━━━━━━━━━━━━━━━ 8520s 3s/step - loss: 0.7229



In [15]:
model_5e3 = build_model()

opt_5e3 = keras.optimizers.Adam(learning_rate=5e-3)
loss_5e3 = keras.losses.BinaryCrossentropy(from_logits=True)
model_5e3.compile(loss=loss_5e3, optimizer=opt_5e3)
model_5e3.fit(preprocessed, epochs=1, callbacks=[checkpoint_creator("checkpoints/epoch_lr_5e-3_1.weights.h5")])

   1934/Unknown [1m6204s[0m 3s/step - loss: 16.9405

KeyboardInterrupt: 

Based on the results above for the different learning rates:
- Values as big as 5 * 10^-3 were too large: we got divergence (a loss of 16.9405 after 1934 of 2682 steps), so we halted the run early to save time.
- Values as small as 10^-4 or 10^-5 trained stably but were slow to decrease the loss, finishing the epoch at 1.3553 and 2.8313, respectively.
- 5 * 10^-4 (0.7229) and 10^-3 (0.5714) did best, and 10^-3 gave the lowest loss after one epoch, so it looks like a reasonable sweet spot.

For the remainder of the project, we'll go with 10^-3.
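The same comparison can be made explicit with the numbers logged above (final epoch-averaged training loss after one epoch at batch size 64; the 5 * 10^-3 run was stopped at step 1934 of 2682). A small sketch, where the helper `pick_learning_rate` is hypothetical, that ignores incomplete or non-finite runs and picks the lowest loss:

```python
import math

# (steps completed, final logged loss) per learning rate, copied from the runs above.
# The 5e-3 run was interrupted before finishing the epoch.
results = {
    1e-5: (2682, 2.8313),
    1e-4: (2682, 1.3553),
    5e-4: (2682, 0.7229),
    1e-3: (2682, 0.5714),
    5e-3: (1934, 16.9405),
}


def pick_learning_rate(results: dict, steps_per_epoch: int = 2682) -> float:
    """Lowest-loss learning rate among runs that finished the epoch with a finite loss."""
    finished = {
        lr: loss
        for lr, (steps, loss) in results.items()
        if steps == steps_per_epoch and math.isfinite(loss)
    }
    return min(finished, key=finished.get)


print(pick_learning_rate(results))  # 0.001
```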