# Advanced Topics (...tying up some ends...)

This unit discusses some useful-to-know concepts that round off the course.

## Extending Keras

So far, we have encountered TensorFlow in two different guises:

 * As an RM-AD tool for GPU-enhanced computation of fast good-quality
   gradients (for example, for optimization).
 * As an ML toolkit to quickly wire up some DNN architectures.

What we have not seen is how these things fit together, specifically: how can we wrap up some of our own (possibly quite sophisticated) function designs as a Keras layer?

Let us explore this by means of an example. We start with a simple convolutional architecture for CIFAR-10 one-out-of-10 image classification.

In [None]:
import numpy
import tensorflow as tf
import tensorflow_datasets as tfds

# Let us load the CIFAR-10 dataset rather than MNIST
# (10 classes, 32x32 pixels, RGB).
# Details: http://www.cs.toronto.edu/~kriz/cifar.html

(ds_train_raw, ds_validation_raw, ds_test_raw), ds_info = tfds.load(
    'cifar10',
    split=['train[:75%]', 'train[75%:]', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True)

def normalize_image(image, label):
  """Normalizes images."""
  return tf.cast(image, tf.float32) / 255., label


ds_train = (
    ds_train_raw
    # Details: see https://www.tensorflow.org/guide/data_performance
    .map(normalize_image, num_parallel_calls=tf.data.AUTOTUNE)
    .cache()
    .shuffle(ds_info.splits['train'].num_examples)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE))

ds_validation = (
    ds_validation_raw
    .map(normalize_image, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(32)
    .cache()
    .prefetch(tf.data.AUTOTUNE))

ds_test = (
    ds_test_raw
    .map(normalize_image, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(32)
    .cache()
    .prefetch(tf.data.AUTOTUNE))



In [None]:
### The model.

cnn_model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(50, activation='relu'),
        tf.keras.layers.Dense(10),
    ]
)

cnn_model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

cnn_model.fit(
    ds_train,
    epochs=25,
    validation_data=ds_validation)



As we have seen, modern DNN architectures tend to develop calibration problems more readily than earlier architectures.

One plausible hypothesis here might be that earlier architectures used rather simple features weakly indicative of the target class, which can then also be regarded as reasonably independent - and we would rarely encounter a situation where we accumulated "really strong evidence" for which the "independence of (Bayesian) votes" would have been violated badly. With DNNs, hierarchical extraction of features might lead to "tell-tale sign" type of evidence which (by optimization) gets attributed some large logit-value(s), but is not independent of other such evidence - even though we implicitly assume such independence by summing.
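
To make the "summing assumes independence" point concrete, here is a minimal sketch with made-up numbers (a likelihood ratio of 4:1 and even prior odds are assumptions chosen purely for illustration). It shows how counting the same piece of evidence twice inflates the predicted probability.

```python
import math


def sigmoid(log_odds):
  """Converts log-odds to a probability."""
  return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical numbers: one feature with likelihood ratio 4:1 in favor of
# the class, starting from even prior odds (log-odds 0).
evidence = math.log(4.0)

# Posterior after seeing the feature once.
p_once = sigmoid(evidence)

# A second feature that is a perfect copy of the first carries no new
# information, but naive summation counts the evidence twice.
p_double_counted = sigmoid(evidence + evidence)

print(f'single feature:    p = {p_once:.3f}')
print(f'duplicate, summed: p = {p_double_counted:.3f}')
```

The first value is 0.8, while the double-counted one is 16/17 (about 0.941): the model would report more confidence than the data supports, which is one way to think about miscalibration.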

So, one could have the idea that, perhaps, large accumulated evidence (as input to softmax) should generally be distrusted, and the model is driven to use large evidence-contributions in some cases since, overall, this still improves predictions. So, how about introducing a layer that uniformly "punishes" large evidence before we feed it into softmax? Rather than trying to come up with a principled way to do this, let us here try an ad-hoc approach and attenuate all "large evidence" the same way with a nonlinear function that involves two parameters that are learnable - one determining a "length scale" below which behavior is mostly linear, and one for overall rescaling. We will go with the function `f(x) = A * asinh(B * x)`, with learnable `A` and `B`.
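
As a quick illustration of why `asinh` fits this description, the following sketch (plain Python, with `A = B = 1` chosen only for illustration) tabulates `f(x)`: for small `x` it is close to `x`, while for large `x` it grows only like `ln(2 x)`.

```python
import math


def squash(x, a=1.0, b=1.0):
  """Evaluates a * asinh(b * x) for a scalar x."""
  return a * math.asinh(b * x)


for x in (0.01, 0.1, 1.0, 10.0, 100.0):
  print(f'x={x:7.2f}  f(x)={squash(x):8.4f}  ln(2x)={math.log(2 * x):8.4f}')
```

Here, `1 / B` plays the role of the "length scale" at which the function crosses over from linear to logarithmic behavior, and `A` rescales the result.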

Also, for the sake of this example, we will ignore the question of how to perhaps achieve this via creative use of some existing Keras layers (specifically, a "1D" convolution across a hidden feature-vector with a $1\times 1$ kernel and nonlinear activation). Rather, we want to implement our own layer.

For this, we need to do a little bit of OO programming, which we have not discussed so far in this course - implementing a subclass of the `tf.keras.layers.Layer` class. This will have a tiny amount of state - just two parameters - but show all the relevant bits and pieces.

Also mostly for illustration purposes, we give our layer an extra tuning parameter which allows selecting a nonlinearity, with (here) two possible choices.

We mostly follow the [Google/Alphabet Python style guide](https://google.github.io/styleguide/pyguide.html) for this code.


In [None]:
# Introducing a new type of Keras layer.
# Base class documentation:
#   https://www.tensorflow.org/api_docs/python/tf/keras/layers/Layer


# Here, we are moving the computational body to separate @tf.function-s.
# For this particular example, the body is so simple that this makes
# little sense - we could just have done the computation in-place.
#
# In general, it is useful to have nontrivial transformations which we
# put into Keras layers also available directly as @tf.function functions
# that do not depend on Keras. A typical design would then use one module
# with such @tf.function definitions that can be used independently, plus
# a Keras wrapper that defines a layer using these functions on top of that.

@tf.function
def squash_evidence_asinh(t_evidence, t_param_a, t_param_b):
  """Squashes evidence `E` to `a * asinh(b * E)`."""
  return t_param_a * tf.math.asinh(t_param_b * t_evidence)


@tf.function
def squash_evidence_atan(t_evidence, t_param_a, t_param_b):
  """Squashes evidence `E` to `a * atan(b * E)`."""
  return t_param_a * tf.math.atan(t_param_b * t_evidence)


class EvidenceTweakingLayer(tf.keras.layers.Layer):
  """Layer for uniform nonlinear total-evidence-adjustment.

  Based on the hypothesis that seeing large accumulated evidence may generally
  have violated the Bayesian "independence" assumption beyond what we would
  be comfortable with.
  """

  # Class attributes.
  _NONLINEARITY_BY_TAG = dict(asinh=squash_evidence_asinh,
                              atan=squash_evidence_atan)

  def __init__(self, *, nonlinearity='asinh', **kwargs):
    """Initializes the instance."""
    # Forward parent-class keyword args to parent-class __init__, so that
    # `name=...` etc. args work as for a generic Keras `Layer`.
    super().__init__(**kwargs)
    nonlinearity_func = self._NONLINEARITY_BY_TAG.get(nonlinearity)
    if nonlinearity_func is None:
      raise ValueError(f'Unknown nonlinearity: {nonlinearity!r} - '
                       f'known: {set(self._NONLINEARITY_BY_TAG)!r}')
    # We need to store the __init__() parameters for .get_config(), so that
    # serialization and deserialization of the layer works.
    config = dict(nonlinearity=nonlinearity)
    config.update(**kwargs)
    self._config = config
    self._nonlinearity_func = nonlinearity_func
    # It is generally good practice to make sure that inspection of the
    # __init__ method's body gives clarity about all (public and private)
    # instance attributes.
    self._param_ab = None

  def build(self, input_shape):
    """Sets up layer-state."""
    del input_shape  # Unused by this layer.
    self._param_ab = self.add_weight(shape=(2,),
                                     initializer='random_normal')

  def call(self, inputs):
    """Evaluates the layer."""
    # Note that `inputs` may in general have batch-indices. The code in this
    # method needs to be able to handle data in such a form.
    # Here, the calculation is rather simple.
    return self._nonlinearity_func(inputs, self._param_ab[0], self._param_ab[1])

  def get_config(self):
    """Returns the layer's configuration."""
    # This method is important to ensure that saving and loading a layer
    # (such as: as part of a trained model) can re-create the given layer
    # in the expected form. Here, we need this since we have
    # instantiation-time tweaking parameters.
    #
    # The default .from_config() classmethod works for our use case,
    # since all our config is JSON-serializable.
    return self._config


Let us see if adding our new layer improves things.

In [None]:
cnn_model_tweaked = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(50, activation='relu'),
        tf.keras.layers.Dense(10),
        EvidenceTweakingLayer(nonlinearity='asinh'),
    ]
)

cnn_model_tweaked.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

cnn_model_tweaked.fit(
    ds_train,
    epochs=25,
    validation_data=ds_validation)



Let us try saving and loading our model.

In [None]:
cnn_model_tweaked.save('cnn_model_tweaked.h5')
reloaded_cnn_model_tweaked = tf.keras.models.load_model(
    'cnn_model_tweaked.h5',
    custom_objects={'EvidenceTweakingLayer': EvidenceTweakingLayer})


print('Test set accuracy (orig): '
      f'{cnn_model_tweaked.evaluate(ds_test)[1] * 100:.2f}%')
print('Test set accuracy (reloaded): '
      f'{reloaded_cnn_model_tweaked.evaluate(ds_test)[1] * 100:.2f}%')

Test set accuracy (orig): 60.17%
Test set accuracy (reloaded): 60.17%


Here, the classifier performance impact of this little idea was unconvincing - but this is often how things turn out when exploring ideas. Two things matter:

1. Having developed some useful intuition about what might work
   (via experimenting).
1. Being able to quickly try out ideas with little effort.


## TensorFlow and JAX

TensorFlow is Google's ML flagship library. This however does not mean that everybody at Google would do ML exclusively with TensorFlow.

One particularly interesting "not officially supported, currently-under-research" tool which Google also open sourced is [JAX](https://github.com/google/jax).

Tony Hoare is claimed to have said that "inside every large program is a small program struggling to get out", and JAX can be thought of as being such a "small program": On its GitHub page, it describes itself as "Autograd and XLA" (where "[XLA](https://www.tensorflow.org/xla)" is the "accelerated linear algebra" also powering TensorFlow). JAX looks a lot like "numpy with automatic differentiation capabilities".

The Google colab kernel has JAX pre-installed, so let us explore it a bit.

In [None]:
import jax
from jax import numpy as jnp


def box_volume(sides):
  return jnp.prod(sides)

print(box_volume(jnp.array([2, 3, 4, 5], dtype=jnp.float32)))

grad_vol = jax.jit(jax.grad(box_volume))

print('Grad(volume) at [2, 4, 8]:',
      grad_vol(jnp.array([2, 4, 8], dtype=jnp.float32)))

120.0
Grad(volume) at [2, 4, 8]: [32. 16.  8.]
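
The gradient printed above can be cross-checked without JAX. The following is a minimal sketch using central finite differences in plain Python; the helper name `finite_difference_grad` and the step size are illustrative choices, not part of JAX.

```python
def box_volume(sides):
  """Product of the side lengths."""
  volume = 1.0
  for side in sides:
    volume *= side
  return volume


def finite_difference_grad(f, xs, eps=1e-6):
  """Approximates the gradient of `f` at `xs` by central differences."""
  grad = []
  for i in range(len(xs)):
    xs_plus = list(xs)
    xs_minus = list(xs)
    xs_plus[i] += eps
    xs_minus[i] -= eps
    grad.append((f(xs_plus) - f(xs_minus)) / (2 * eps))
  return grad


approx_grad = finite_difference_grad(box_volume, [2.0, 4.0, 8.0])
print([round(g, 3) for g in approx_grad])
```

Up to rounding, this agrees with `[32, 16, 8]`: each component is the product of the other two sides. Note that the finite-difference approach needs two function evaluations per parameter, whereas `jax.grad` obtains the whole gradient by reverse-mode differentiation.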


While this certainly looks useful, please note the JAX documentation (at the time of this writing) says:

```
This is a research project, not an official Google product.
Expect bugs and sharp edges.
Please help by trying it out, reporting bugs, and letting us know what you think!
```

# Recap

1. We looked into the structure of the Python language.

   * This was a lot of material.
   * Further on, we used nested functions a lot, used numpy
     vectorization left and right, and having a good model
     of evaluation semantics was relevant for
     understanding how `@tf.function` sees code differently.

1. We looked into the basic principles underlying fast gradients - and their use in numerical optimization.

   * This is the basis for many ML and ML-related frameworks
     (not just TensorFlow and JAX).
   * This also explains the design limitations one would face
     when trying to implement similar such ML-supporting
     infrastructure on top of a framework such as Matlab,
     Octave, Mathematica, Maple, etc.
   * ML frameworks have their limitations, such as having
     little reason to support better-than-float64 numerical
     accuracy. So, when we cannot use them, it is good to see how
     we can do this on our own. Also, a solid understanding
     of sensitivity backpropagation opens up new perspectives
     on other ideas, including Hamiltonian mechanics.
   * We have seen that "solving non-malicious 1000-parameter
     optimization problems numerically" is generally "easy"
     these days.

1. We formulated a first ML problem ("digit-8-recognizer") purely
   as a high-dimensional optimization problem, and from there went
   on to ML.

   * Mental model of supervised ML: "infinite-examples limit
     of k-Nearest-Neighbors classifiers".
   * Major concepts we encountered: "Estimating gradients on
     batches" ("stochastic gradient descent"), some
     "standard constructions" (loss functions, softmax, embeddings),
     generalization-performance-enhancing tricks (early stopping,
     L2 regularization, dropout).
   * We looked into (often entropy-based) explanations of the design
     of some standard constructions.
   * We discussed what generally happens if inference is done
     on examples drawn from a different distribution than
     the one used to get the training set.
   * We briefly discussed "vanishing gradients" and how
     information propagation in a DNN can be seen as a
     percolation problem (via MFA).
   * We observed that TensorFlow can indeed make our life
     much easier as long as we are within the confines of
     what we can build with commonly used "lego bricks".

1. We took a deeper look at TensorFlow.
   * We explored a bit of the inner mechanics.
   * We saw how to use it to conveniently formulate high
     dimensional optimization problems in a device-agnostic way
     (and do physics with that).
   * We saw how to wrap up "unusual" computations to make
     them available as Keras layers.

[Author's note: Overall, I may occasionally have talked a bit of nonsense, but this was less than 10% of the time, and physics course material rarely is right more than 90% of the time anyhow. This was the 1st iteration of teaching this material. We might want to refine this for future iterations.

Thanks for participating. And a standing offer from my side: If you need help wiring something up with TensorFlow, you can always ask me via email to take a look. Getting some initial help when starting on a physics project of your own may be very useful.]


## Addendum: Connecting Mathematica and TensorFlow

Given the large popularity of symbolic algebra packages such as in particular Mathematica in theoretical physics, it makes sense to address some obvious questions around ML and Mathematica.

As we have seen, the changes required to make a major programming language support backpropagation go rather deep, and with both TensorFlow and JAX, there are still some sharp edges that are related to the attempt to blend object-language (for tensor-arithmetic computations - in TensorFlow 1, these enter the scene very explicitly as computational graphs) and meta-language (for setting up and manipulating object-language entities - so, Python). Design-wise, it is perhaps debatable if this is the best possible design approach. While a clearer separation between these roles would be conceptually more elegant, there is also a large human element here. In any case, it is clear that bringing backpropagation to any major language is most feasible if the language is designed with simplicity and minimality in mind (such as the Lisp dialect Scheme, where this has indeed been accomplished), or started out with the design idea of properly supporting backpropagation. Retrofitting this into an existing design of a large language is hard. Also, compiling linear algebra to machine code that can execute fast on a range of very different hardware architectures (CPUs and also GPUs) requires major effort to build.
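
The split between meta-language and object-language can be illustrated with a toy tracer. This is a sketch for illustration only and is unrelated to the actual internals of TensorFlow or JAX; all names (`Node`, `trace`, `evaluate`, `poly`) are hypothetical.

```python
class Node:
  """A node of a tiny object-language expression graph."""

  def __init__(self, op, args=(), value=None):
    self.op = op        # 'input', 'const', 'add', or 'mul'.
    self.args = args
    self.value = value  # Only used for 'const'.

  def __add__(self, other):
    return Node('add', (self, _as_node(other)))

  def __mul__(self, other):
    return Node('mul', (self, _as_node(other)))

  # Addition and multiplication are commutative here, so the reflected
  # operators can re-use the same implementations.
  __radd__ = __add__
  __rmul__ = __mul__


def _as_node(x):
  return x if isinstance(x, Node) else Node('const', value=x)


def trace(py_func):
  """Runs the Python (meta-language) function once to build a graph."""
  return py_func(Node('input'))


def evaluate(node, x):
  """Interprets the object-language graph for input value `x`."""
  if node.op == 'input':
    return x
  if node.op == 'const':
    return node.value
  left, right = (evaluate(arg, x) for arg in node.args)
  return left + right if node.op == 'add' else left * right


def poly(x, degree=3):
  # The Python loop runs at trace time only; the graph just contains
  # the unrolled multiplications and additions.
  result = 1.0
  for _ in range(degree):
    result = result * x + 1.0
  return result


graph = trace(poly)
print(evaluate(graph, 2.0), poly(2.0))
```

Both numbers printed are `15.0`. Data-dependent Python control flow (such as an `if` on the value of `x`) has no direct counterpart in this toy graph; real frameworks need dedicated object-language constructs for it, which is one source of the sharp edges mentioned above.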

As such, it is not clear when - or even if at all - data manipulation packages that are popular with physicists will implement advanced numerical tensor backpropagation capabilities roughly on par with TensorFlow or JAX. Clearly, major symbolic algebra packages these days do support some form of ML - but typically not in the general-purpose way that would allow us to design and shape rather freely what our models look like. Still, one clearly would want to be able to *utilize* what such advanced capabilities have to offer in a setting where one would not want the ML library to dictate the programming language to use. This raises the question: if we might not get backpropagation with fast compiled linear algebra in the near future in popular symbolic algebra packages, is there perhaps at least a way to utilize such functionality in such a way that we can still use these packages as we are used to, but have them delegate parts of a problem to some other code behind-the-scenes? So, to the user of a Mathematica notebook, it looks as if there simply were a few functions to perform specialized data analysis, but behind the scenes, mostly invisible to the user, these functions exchange data with some other component running TensorFlow code.

This is indeed feasible - and we will look in detail into one concrete possible realization here. Before we do that, we should briefly ponder the solution landscape. There are efforts to introduce file format standards for serialized computation graphs, and modern versions of Mathematica have experimental support for loading and then running inference with models saved in the [ONNX format](https://onnx.ai/about.html) - we produced such a serialized model earlier for classifying MNIST digits. Since this file format is rather new and new versions are currently introduced in short succession that expand the set of supported basic numerical operators, it is quite possible that the dust has to settle a bit first before this file format can be used widely without hassle. If this is available, this might be the best option.

More generally, symbolic algebra packages, like just about every programming language, usually come with a Foreign Function Interface (FFI) that allows code authors to have the runtime kernel utilize other libraries - perhaps call functions from C or Fortran libraries. In principle, it might be possible to integrate TensorFlow(-Lite) into Mathematica at this level. The advantage would be very efficient data exchange between these components. However, given that both projects are large and complex and have their own unique approaches to handling some deep technical problems, it may well be that this causes major friction. Another idea might be to go for less tight coupling and have Mathematica exchange data with another process (perhaps on the same machine) that runs TensorFlow. In Mathematica, connecting to an external process either for one-off or session-based evaluation is supported via the [External Interpreted Languages Interfaces](https://reference.wolfram.com/language/guide/ExternalInterpretedLanguageInterfaces.html). Using this - or also a more bare-bones approach such as one based on Mathematica's [RunThrough](https://reference.wolfram.com/language/ref/RunThrough.html) command - would typically require first serializing data into some textual form, and then communicating requests and responses back-and-forth. Overall, this requires both data-conversion and interprocess communication, so is perhaps less efficient and also more brittle (with two running processes) than a more tightly integrated solution, but also - in principle - would be much easier to build.

Since setting up an ML model is a major computational effort that is then typically amortized over all the queries to that model, it makes little sense to use an approach where every query would start a new process and first set up the model from its saved form - we want to go with a background server that loads the model at start time and can then be queried as needed.

A general possible concern when starting processes from Mathematica, such as via `RunThrough[]`, is that these processes generally would inherit the Mathematica kernel's process environment, which may well have adjustments to e.g. `LD_LIBRARY_PATH` (on Unix systems), making launched processes by default use a very different collection of common libraries (such as for example `libz.so`) than the system-default - and this might clash with the libraries needed by other components in such a set-up. Here, an extra step (such as one more indirection) would likely be needed to switch over to using a less customized library environment.
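
One possible form of such an indirection is a small launcher that removes library-path overrides before starting the actual program. The following is a sketch under assumptions: the function names are hypothetical, and which variables need resetting depends on the system and on what the parent process changed.

```python
import os
import subprocess

# Assumption: these are the variables worth resetting; the actual list
# depends on the system and on what the parent process has changed.
_VARS_TO_DROP = ('LD_LIBRARY_PATH', 'LD_PRELOAD')


def sanitized_env(env=None, drop=_VARS_TO_DROP):
  """Returns a copy of `env` (default: os.environ) without `drop` entries."""
  env = dict(os.environ if env is None else env)
  for name in drop:
    env.pop(name, None)
  return env


def launch_with_clean_libraries(argv):
  """Starts `argv` as a background process with library overrides removed."""
  return subprocess.Popen(argv, env=sanitized_env())


# Demonstration on a hand-made environment (no process is started here).
demo = sanitized_env({'PATH': '/usr/bin', 'LD_LIBRARY_PATH': '/opt/x/lib'})
print(demo)
```

Mathematica would then start the launcher rather than the target program, and the launcher would call something like `launch_with_clean_libraries([...])` with the target's command line.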

One basic fallback approach that should nowadays always be available, irrespective of which symbolic algebra system (and version) one uses, is to wire up a simple web server that allows querying ML models via HTTP requests. This is spelled out in detail in the following - and may be a useful basis for trying very similar approaches for other combinations than Mathematica and TFLite.

### Having Mathematica use an external HTTP server to access ML capabilities

Let us look into a basic solution for having Mathematica handle ML tasks via calling a dedicated function that delegates work to a webserver, via an HTTP request. Here, we want to consider running the webserver and the Mathematica process(es) on the same computer, which we assume to be a Unix system - such as a researcher's personal workstation.

We would still want to have at least some protection against unrelated users simply accessing the webserver and submitting their own requests without any form of authorization. If the computer is the researcher's private machine, no one else can log in to it, and also all programs running on that computer can be trusted, a lightweight approach would be to simply bind the webserver port only to the loopback interface - making it unreachable from the wider network. If there are other assumed-unprivileged users (i.e. users who cannot read all files or sniff network traffic), they still could in principle access the web server socket (since web services are normally offered on TCP sockets, which, unlike Unix domain sockets, are not protected by filesystem permissions).

Here, a very simple authentication mechanism that is similar to what is used by the X Window System might make sense. The idea is that every request needs to include a "secret" which can only be learned by accessing a file on the filesystem. This way, filesystem-based access control can be transmuted into access control for services offered by a server to which every user on the local machine can connect. For X11, the `MIT-MAGIC-COOKIE-1` mechanism shall be our guideline - whatever program can read the `$HOME/.Xauthority` file (or whatever the `XAUTHORITY` environment variable has been set to) and extract the secret within can access and interact with the X display of the logged-in user.

We will do something very similar with our own secrets-file: we want our webserver to write a secret to an agreed-upon location in the user's home directory at start-up time, and the Mathematica kernel (which needs to be started afterwards) to read that secret file and communicate the secret alongside every HTTP request to our "ML Server".

For actually exchanging data, we cannot use the `HTTP GET` method for multiple reasons, an important one being URL length restrictions. Instead, we want to use `HTTP POST` requests that then transport a payload. Let us implement a bare-bones server on the basis of Python's `http` module. We want this server to load one (or multiple) ML models at start-up time and use them to process requests, differentiating models by URL.

The following code-cell is not intended to be run in a colab notebook, but is a copy-pasteable complete executable basic web server implemented in Python that can use `tflite`. We want to serve the `MNIST` tflite model we trained earlier in this course. The code comments indicate some little tricks here and there that were required to make this work despite some software packages having minor issues.


In [None]:
#!/usr/bin/env python3.10

import ast
import base64
import http.server
import os
import sys
import threading

# Hack: at the time of this writing, tflite was only available for
# Python3.10, which however could not use Python3.11's numpy, so I did a:
# python3.10 -m pip install numpy -t ~/TensorFlow-ML/python3.10/site-packages
# ...and we adjust the path here.
# Note: This is for demo purposes only, and an unsound technique.
# Done properly, this would use a Python "virtual environment" (venv).
sys.path.insert(
    0,
    os.path.join(os.getenv('HOME'),
                 'TensorFlow-ML/python3.10/site-packages'))

import numpy
import tflite_runtime.interpreter as tflite


class ServerError(Exception):
  """Generic ML HTTP Server error."""


# This is somewhat slow and perhaps overkill for simply-structured data.
def parse_mathematica(data):
  """Parse mathematica via sympy.parsing."""
  # These imports only do heavy lifting upon first evaluation of the body;
  # subsequent evaluations (which however we do not have here) could
  # re-use this. If we use the below "fast" alternative, this also
  # avoids the `sympy` dependency.
  import sympy
  from sympy.parsing import mathematica
  return sympy.parsing.mathematica.parse_mathematica(data)


def parse_mathematica_simple_fast(data):
  """Ad-hoc parse simple Mathematica data (fast, lightweight)."""
  return ast.literal_eval(data.translate(str.maketrans('{}', '[]', '\n\r')))


def get_mnist_tflite_predictor(model_path):
  interpreter = tflite.Interpreter(model_path=model_path)
  interpreter.allocate_tensors()
  input_details = interpreter.get_input_details()
  output_details = interpreter.get_output_details()
  interpreter_lock = threading.Lock()
  #
  def fn_predict(in_data, verbose=False):
    # Note that this is setting interpreter-state, making
    # this function non-reentrant unless we ensure that
    # interpreter.invoke() calls cannot get mangled by
    # interwoven concurrent set-up / read-out operations.
    # This would matter a lot if we were to use e.g.
    # a http.server.ThreadingHTTPServer - but let's make this
    # robust.
    with interpreter_lock:
      # Here, we are extra-permissive:
      # Any numerical data matrix/vector that comes in gets zero-padded
      # to at least 28x28 elements, then trimmed to 28x28 elements,
      # then reshaped. This allows us to easily hand-feed data such as
      # {1,2,3} for debugging. No harm in trying to predict
      # from bad-size data here.
      in_array = numpy.pad(
          numpy.asarray(in_data, dtype=numpy.float32).ravel(),
          ((0, 28*28),))[:28*28].reshape(1, 28, 28)
      interpreter.set_tensor(
          input_details[0]['index'], in_array)
      interpreter.invoke()
      output_data = interpreter.get_tensor(output_details[0]['index'])
      return output_data[0, ...].tolist()
  #
  return fn_predict


class ML_HTTPServer(http.server.HTTPServer):
  """Server subclass that has access to a ML model.

  Attributes:
    (base class attributes plus...):
    ml_fn_predict_by_urlpath: Mapping of url-path to predictor-function,
      as documented in `__init__`.
  """

  def __init__(self, server_address, request_handler_class, *,
               fn_predict_by_urlpath=(),
               token_file='.ml_server_token'):
    """Initializes the instance.

    Args:
      server_address: The server address, passed on to base class `__init__`.
      request_handler_class: The request handler class, passed on to
        base class `__init__`.
      fn_predict_by_urlpath: data which when passed to `dict()` produces
        a key-value dictionary that maps a URL-path key to a predictor-function;
        Each predictor function is expected to map a single numpy.ndarray-like
        ML-data argument to numerical output data.
      token_file: Path (implicitly-relative to web-user's $HOME env-var)
        to a file where upon webserver startup the server writes a random
        secret. POST web requests must include the secret as the 1st line of
        the payload, proving that the requestor could read the token-file.
        This loosely resembles MIT-MAGIC-COOKIE-1 X Window Authorization and
        protects against independent users on the same machine (who see the
        TCP port bound to the loopback interface) connecting to the webserver
        and issuing their own requests. Does not protect against local
        users with elevated privileges, such as root access to files or
        packet sniffing. If web requests are to be routed from a different
        machine, TLS encryption should be used additionally.
    """
    # Secret handling goes first. Overall, this is a primitive mechanism
    # that one might want to replace with something more advanced,
    # such as HMAC, but the client also needs to support this.
    # os.urandom() is available on all platforms (os.getrandom() is not).
    secret = base64.b64encode(os.urandom(32)).rstrip(b'=')
    token_path = os.path.join(os.getenv('HOME'), token_file)
    with open(token_path, 'wb') as h_token:
      # Owner-only read/write; the token file needs no execute bit.
      os.fchmod(h_token.fileno(), 0o600)
      h_token.write(secret)
    # Check if writing the secret succeeded.
    with open(token_path, 'rb') as h_token_re:
      re_secret = h_token_re.read()
    if not re_secret == secret:
      raise ServerError(
          'Could not align on-filesystem secret '
          f'with internal secret - path: {token_path}')
    # Initialize base class instance. We have not yet touched instance-state.
    super().__init__(server_address, request_handler_class)
    self.ml_fn_predict_by_urlpath = dict(fn_predict_by_urlpath)
    self._secret = secret

  def check_secret(self, user_provided):
    """Checks if the user-provided string equals the secret."""
    return self._secret == user_provided


class ML_HTTPRequestHandler(http.server.BaseHTTPRequestHandler):
  """HTTP Request handler that delegates POST to specialist functions.

  Also performs secret-checking on the server.
  """

  def _send_text_plain_response(self, status_code, response):
    self.send_response(status_code)
    self.send_header('Content-Type', 'text/plain')
    self.send_header('Content-Length', str(len(response)))
    self.end_headers()
    self.wfile.write(response)

  def do_POST(self):
    req_urlpath = self.path
    content_length = int(self.headers.get('content-length', 0))
    request_data = self.rfile.read(content_length)
    try:
      # If any of these operations fail, such as due to `req_urlpath`
      # not being in the mapping, the payload not having a b'\n', etc.,
      # the violation of the implicit code-expectation raises an exception,
      # and the `except`-section just returns a `Not Implemented` response.
      user_provided_secret, payload = request_data.split(b'\n', 1)
      if not self.server.check_secret(user_provided_secret):
        # Handled right below.
        # We provide a message since we stderr-output text.
        raise ServerError('Bad secret.')
      parsed_payload = parse_mathematica_simple_fast(payload.decode('utf-8'))
      fn_predict = self.server.ml_fn_predict_by_urlpath[req_urlpath]
      prediction = fn_predict(parsed_payload)
      response = repr(prediction).translate(
          str.maketrans('[]', '{}')).encode('utf-8')
      self._send_text_plain_response(http.HTTPStatus.OK, response)
    except Exception as exn:
      print('ERROR:', repr(exn), file=sys.stderr)
      self._send_text_plain_response(http.HTTPStatus.NOT_IMPLEMENTED,
                                     b'501 - Not Implemented')


def run_server(server_address=('localhost', 8000),
               mnist_model='mnist_model.tflite'):
  fn_predict_mnist = get_mnist_tflite_predictor(mnist_model)
  httpd = ML_HTTPServer(server_address,
                        ML_HTTPRequestHandler,
                        fn_predict_by_urlpath={
                            '/mnist': fn_predict_mnist,
                            })
  httpd.serve_forever()


if __name__ == '__main__':
  run_server()
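
The token handling in the server above is deliberately primitive. As one possible refinement, the following sketch creates the token file so that it starts out with owner-only permissions, and compares secrets in constant time via `hmac.compare_digest`. The function names `write_token` and `secrets_match` are hypothetical, and a Unix system is assumed.

```python
import base64
import contextlib
import hmac
import os
import tempfile


def write_token(token_path, num_bytes=32):
  """Writes a fresh random secret to `token_path`, accessible by owner only."""
  secret = base64.b64encode(os.urandom(num_bytes)).rstrip(b'=')
  # Remove any stale file first, so that O_EXCL guarantees that we create
  # a new file, which then starts out with mode 0o600 (or stricter).
  with contextlib.suppress(FileNotFoundError):
    os.unlink(token_path)
  fd = os.open(token_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
  with os.fdopen(fd, 'wb') as h_token:
    h_token.write(secret)
  return secret


def secrets_match(expected, user_provided):
  """Compares two byte strings in constant time."""
  return hmac.compare_digest(expected, user_provided)


# Demonstration in a temporary directory rather than in $HOME.
with tempfile.TemporaryDirectory() as tmp_dir:
  path = os.path.join(tmp_dir, '.ml_server_token')
  secret = write_token(path)
  with open(path, 'rb') as h_token:
    print(secrets_match(secret, h_token.read()))
```

Whether the constant-time comparison matters in a single-workstation setting is debatable, but it is cheap to use.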


If we start this webserver, the following Mathematica code illustrates how to then wrap up delegation of an ML inference task to this external server.

```
(* Server URL Authentication Secret *)
mlServerURL := "http://localhost:8000/mnist";
mlServerTokenPath:=FileNameJoin[{$HomeDirectory,".ml_server_token"}];
mlServerAuthSecret = Import[mlServerTokenPath, "Text"];
mlServerAuthSecretLength := {"Secret Length", StringLength[mlServerAuthSecret]};
Print[mlServerAuthSecretLength];


(* Running external classification *)
TFClassifyMNIST[numdata_]:=Module[{body, result},
body=mlServerAuthSecret<>"\n"<>ExportString[numdata,"String"];
result=URLFetch[mlServerURL,"Method"->"POST","Body"->body];
result]


(* Demo - Example Input *)

(* For illustration: Create 28x28 input from 7x7 text. *)

as28x28[textImage7x7_]:=KroneckerProduct[
ToExpression[#/.{"#"->1.0,"."->0.0}]&/@Characters[textImage7x7],
ConstantArray[1, {4,4}]]

demoDigit := as28x28[{
".......",
"..####.",
"..#..#.",
"..####.",
"..#..#.",
"..####.",
"......."}]

(* Running Classification *)

MLResponse:=TFClassifyMNIST[demoDigit];
Print[MLResponse]
(* Produces:
{-0.12768815457820892, -3.9151949882507324,
 -1.174180269241333, -2.3871653079986572,
 0.34610337018966675, 3.6387715339660645,
 2.232551097869873, -4.550597667694092,
 4.761049270629883, 0.7428076267242432}
*)


```

This basic template can (perhaps with a bit of help from a Unix wizard) be adjusted to other, similar tasks.
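
For testing the server without Mathematica, a small Python client can play the same role. The following is a sketch that assumes the server from above is running on `localhost:8000` and has written its token to `~/.ml_server_token`; the function names are hypothetical.

```python
import os
import urllib.request


def to_mathematica(data):
  """Renders nested Python lists of numbers as a Mathematica list."""
  return repr(data).translate(str.maketrans('[]', '{}'))


def build_request_body(secret, data):
  """Assembles the POST payload: secret, newline, Mathematica-style data."""
  return secret + b'\n' + to_mathematica(data).encode('utf-8')


def classify_mnist(data,
                   url='http://localhost:8000/mnist',
                   token_file='.ml_server_token'):
  """Sends `data` to the ML server and returns the raw response text."""
  token_path = os.path.join(os.path.expanduser('~'), token_file)
  with open(token_path, 'rb') as h_token:
    secret = h_token.read()
  request = urllib.request.Request(
      url, data=build_request_body(secret, data), method='POST')
  with urllib.request.urlopen(request) as response:
    return response.read().decode('utf-8')


# Shows the wire format only; no request is sent here.
print(build_request_body(b'SECRET', [[0.0, 1.0], [1.0, 0.0]]))
```

With the server running, `classify_mnist([1, 2, 3])` should return a Mathematica-style list of ten logits. If the secret is wrong or the URL path is unknown, the server answers with status 501, which `urllib` reports as an `HTTPError`.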