# Setup

In [None]:
!pip install pip==24.0
if "bootstrap" not in locals() or bootstrap.run:
    # path management for Python
    pythonpath, = !echo $PYTHONPATH
    if "." not in pythonpath.split(":"):
        pythonpath = ".:" + pythonpath
        %env PYTHONPATH={pythonpath}
        !echo $PYTHONPATH

    # get both Colab and local notebooks into the same state
    !wget --quiet https://raw.githubusercontent.com/lukakeso/FOG/main/bootstrap.py -O bootstrap.py
    import bootstrap

    # change into the correct directory
    FOG = bootstrap.get_repo()
    bootstrap.change_to_lab_dir()

    bootstrap.run = False  # change to True to re-run setup

!pwd
%ls

In [None]:
import torch
import pytorch_lightning as pl
from IPython.display import display, HTML, IFrame

full_width = True
frame_height = 720  # adjust for your screen

if full_width:  # if we want the notebook to take up the whole width
    # add styling to the notebook's HTML directly
    display(HTML("<style>.container { width:100% !important; }</style>"))
    display(HTML("<style>.output_result { max-width:100% !important; }</style>"))

To recap, our model staging workflow,
which does the hand-off between training and production, looks like this:

1. Get model weights and hyperparameters
from a tracked training run in W&B's cloud storage.
2. Reload the model as a `LightningModule` using those weights and hyperparameters.
3. Call `to_torchscript` on it.
4. Save that result to W&B's cloud storage.
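
The four steps above can be sketched as one small function. This is a minimal sketch, not the contents of `training/stage_model.py`: `fetch_checkpoint`, `build_module`, and `upload` are hypothetical callables standing in for the W&B download, the `LightningModule` reload, and the W&B upload. The only library behavior assumed is that `LightningModule.to_torchscript` accepts a `file_path` argument and writes the scripted model there.

```python
from typing import Any, Callable, Dict, Tuple


def stage_model(
    fetch_checkpoint: Callable[[], Tuple[str, Dict[str, Any]]],
    build_module: Callable[[str, Dict[str, Any]], Any],
    upload: Callable[[str], None],
    out_path: str = "model.pt",
) -> str:
    """Run the four staging steps; all three callables are hypothetical stand-ins."""
    # 1. get weights (as a checkpoint path) and hyperparameters from the training run
    checkpoint_path, hparams = fetch_checkpoint()
    # 2. rebuild the LightningModule from those weights and hyperparameters
    module = build_module(checkpoint_path, hparams)
    # 3. compile to TorchScript; with file_path set, the result is also saved to disk
    module.to_torchscript(file_path=out_path)
    # 4. push the scripted binary back to artifact storage
    upload(out_path)
    return out_path
```

Passing the storage and model-building steps in as callables keeps the sketch independent of any particular tracking service.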


Here in this notebook,
rather than training or scripting a model ourselves,
we'll just `--fetch`
an already trained and scripted model binary:

In [None]:
%run training/stage_model.py --fetch --entity=cfrye59 --from_project=FOG

## Running our more portable model via a CLI

Now that our TorchScript model binary file is present,
we can spin up our text recognizer
with much less code.

We just need a compatible version of PyTorch
and methods to convert
our generic data types
(images, strings)
to and from PyTorch `Tensor`s.
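
As a minimal sketch of the string half of that conversion: a text recognizer typically emits a sequence of integer token indices, which must be mapped back to characters. The vocabulary, the special tokens, and the function name below are hypothetical, not the ones used by `text_recognizer`, and the indices are assumed to have already been pulled out of the output `Tensor`, for example with `.tolist()`.

```python
from typing import List, Sequence

# hypothetical vocabulary: three special tokens followed by a few characters
MAPPING: List[str] = ["<S>", "<E>", "<P>", "a", "b", "c", " "]
SPECIAL_TOKENS = {"<S>", "<E>", "<P>"}


def indices_to_string(indices: Sequence[int], mapping: Sequence[str] = MAPPING) -> str:
    """Map token indices to characters, dropping start, end, and padding tokens."""
    return "".join(mapping[i] for i in indices if mapping[i] not in SPECIAL_TOKENS)


print(indices_to_string([0, 3, 4, 6, 5, 1, 2, 2]))  # prints "ab c"
```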

We can put all this together in
a single light-weight object,
the `ParagraphTextRecognizer` class:

In [None]:
from text_recognizer.paragraph_text_recognizer import ParagraphTextRecognizer

ptr = ParagraphTextRecognizer()

And from there,
we can start running on images
and inferring the text that they contain:

In [None]:
from IPython.display import Image

example_input = "text_recognizer/tests/support/paragraphs/a01-077.png"

print(ptr.predict(example_input))
Image(example_input)

# Building a simple model UI

We use the
[`gradio` library](https://gradio.app/),
which includes a simple API for wrapping
a single Python function into a frontend
in addition to a less mature, lower-level API
for building apps more flexibly.
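
As a rough sketch of the higher-level API, assuming `gradio` is installed: `gr.Interface` takes a function plus a description of its inputs and outputs and builds the UI around it. The `predict_stub` function and the `make_sketch_frontend` helper are hypothetical and are not the code in `app_gradio/app.py`.

```python
def predict_stub(image) -> str:
    """Hypothetical stand-in for a model's predict function; ignores its input."""
    return "placeholder transcription"


def make_sketch_frontend(predict_fn):
    """Wrap an image-to-text function in a gradio Interface (not launched here)."""
    import gradio as gr  # imported lazily so the sketch loads without gradio installed

    return gr.Interface(
        fn=predict_fn,  # called once per submitted image
        inputs=gr.Image(type="pil"),  # hands the function a PIL image
        outputs=gr.Textbox(),  # renders the returned string
        title="Sketch text recognizer",
    )
```

Calling `make_sketch_frontend(predict_stub).launch()` would then serve the UI.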



The core component is a script,
`app_gradio/app.py`,
that can be used to spin up our model and UI
from the command line:

In [None]:
%run app_gradio/app.py --help

One very nice feature of `gradio`
is that it is designed to run as easily
from the notebook as from the command line.

In [None]:
from app_gradio import app

frontend = app.make_frontend(ptr.predict)



We can spin up our UI with the `.launch` method,
and now we can interact
with the model from inside the notebook.


In [None]:
frontend.launch(share=True, width="100%")

For 72 hours, we can also access the model over the public internet
using a URL provided by `gradio`:

In [None]:
print(frontend.share_url)

You can point your browser to that URL
to see what the model looks like as a full-fledged web application,
instead of a widget inside the notebook.

Once done,
turn off the Gradio interface by running the `.close` method.

In [None]:
frontend.close()

# Wrapping a model into a model service

With the current setup, our model is running in the same place as our frontend.

This is simple, but it ties too many things together.

First, it ties together execution of the two components.

If our ML model stops responding or there is a DNN bug,
the server goes down.
The same applies in reverse --
the only API for the model is provided by `gradio`,
so a frontend issue means the model is inaccessible.

Coupling the two also ties their scaling together,
even though the server and the model scale differently:
running the server at scale has different memory and computational requirements
than running the model at scale.

Luckily, there is an easy way to decouple them: "serverless cloud functions",
so named because
- they are run intermittently, rather than 24/7 like a server.
- they are run on cloud infrastructure.
- they are, as in
[purely functional programming](https://en.wikipedia.org/wiki/Purely_functional_programming)
or in mathematics, "pure" functions of their inputs,
with no concept of state.

We use AWS's serverless offering,
[AWS Lambda](https://aws.amazon.com/lambda/).
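
A minimal sketch of what such a function can look like, assuming the JSON contract used later in this notebook (an `image_url` in the request, a `pred` in the response). The `(event, context)` signature is the one AWS Lambda uses for Python handlers. `fake_predict` is a hypothetical stand-in for the model, and this is not the code in `api_serverless/api.py`.

```python
import json


def fake_predict(image_url: str) -> str:
    """Hypothetical stand-in for the model: a pure function of its input."""
    return f"prediction for {image_url}"


def handler(event, _context):
    """Stateless handler: everything it needs arrives in `event`."""
    # function URLs deliver the request body as a JSON string under "body";
    # direct invocations may pass the payload itself, so accept both
    body = event.get("body") or event
    if isinstance(body, str):
        body = json.loads(body)
    image_url = body.get("image_url")
    if image_url is None:
        return {"statusCode": 400, "message": "no image_url found in event"}
    return {"pred": fake_predict(image_url)}
```

Because the handler keeps no state between calls, the platform is free to start and stop copies of it as traffic changes.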

In [None]:
from api_serverless import api

In [None]:
import json

from IPython.display import Image
import requests  # a widely used library for making HTTP requests in Python

lambda_url = "https://3akxma777p53w57mmdika3sflu0fvazm.lambda-url.us-west-1.on.aws/"
image_url = "https://fsdl-public-assets.s3-us-west-2.amazonaws.com/paragraphs/a01-077.png"

headers = {"Content-Type": "application/json"}
payload = json.dumps({"image_url": image_url})

response = requests.post(  # we POST the image to the URL, expecting a prediction as a response
    lambda_url, data=payload, headers=headers)
pred = response.json()["pred"]  # the response is also json

print(pred)

Image(url=image_url, width=512)

## Local in the front, serverless in the back

The primary "win" here
is that we don't need to run
the frontend UI server
and the backend model service in
the same place.

For example,
we can run a Gradio app locally
but send the images to the serverless function
for prediction.

Our `app_gradio` implementation supports this via the `PredictorBackend`.

In [None]:
serverless_backend = app.PredictorBackend(url=lambda_url)

The frontend doesn't care where the inference is getting done or how.

A `gradio.Interface`
just knows there's a Python function that it invokes and then
waits for outputs from.

Here, that Python function
makes a request to the serverless backend,
rather than running the model.
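
The idea can be sketched as follows. `SketchBackend` is a hypothetical illustration, not the repo's `PredictorBackend`: it assumes the same JSON contract as the request above (`image_url` in, `pred` out), uses only the standard library, and takes the HTTP call as an injectable `post` argument so it can be swapped out.

```python
import json
from typing import Callable, Optional
from urllib import request


def post_json(url: str, payload: dict, timeout: float = 30.0) -> dict:
    """POST a JSON payload and decode the JSON response."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


class SketchBackend:
    """Hypothetical backend: runs a local function or calls a remote URL."""

    def __init__(
        self,
        local_predict: Optional[Callable[[str], str]] = None,
        url: Optional[str] = None,
        post: Callable[..., dict] = post_json,
    ):
        if (local_predict is None) == (url is None):
            raise ValueError("provide exactly one of local_predict or url")
        self.url = url
        self._local_predict = local_predict
        self._post = post

    def run(self, image_url: str) -> str:
        """The single function the frontend calls; it hides where inference happens."""
        if self.url is None:
            return self._local_predict(image_url)
        return self._post(self.url, {"image_url": image_url})["pred"]
```

Either way, the frontend only ever sees `run`, so switching between local and remote inference needs no UI changes.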

In [None]:
frontend_serverless_backend = app.make_frontend(serverless_backend.run)

frontend_serverless_backend.launch(share=True)