<h1>Integrating OpenAI's Whisper model with BlindBox</h1>
______________________________

## Introduction
______________________________

In this tutorial, we're going to walk through how we created a Whisper application image that can be deployed with BlindBox. We will cover two key steps:

1. How we created a BlindBox-compatible API for the OpenAI Whisper model using [FastAPI](https://fastapi.tiangolo.com/).
2. How we created our Docker image for our API.

You can see how we deploy the image with BlindBox in the [quick-tour](../getting-started/quick-tour.ipynb).

Let's dive in!

## Pre-requisites
____________________

To follow along with this tutorial, you will need to:

+ Have Docker installed in your environment. Here's the [Docker installation guide](https://docs.docker.com/desktop/install/linux-install/).

## Our Whisper FastAPI
_______________________

Our first task in deploying the OpenAI Whisper model with BlindBox was to create an API that our end users can query. We did this with the FastAPI library, which lets us quickly assign functions to API endpoints.
 
The full code we use to do this is available in the `server.py` file in the `ai_server_example` folder on BlindBox's official GitHub repository.

!git clone https://github.com/mithril-security/blindbox
!cd ai_server_example

There are three key sections in this code:

### Initial set-up

Firstly, we load the OpenAI tiny English Whisper model from Hugging Face and initialize our API.

```python
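from fastapi import FastAPI
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Assumption: load_from_store lives in model_store.py, which sits next to server.py
from model_store import load_from_store

# Some settings
STT = "openai/whisper-tiny.en"  # Model name (Hugging Face)
MODEL_STORE_ADDR = "172.17.0.1"  # Address of the model store

# Initialize our FastAPI API object
app = FastAPI()

# Load the processor and the model, then switch the model to inference mode
whisper_processor = load_from_store(STT, WhisperProcessor, MODEL_STORE_ADDR)
whisper_model = load_from_store(STT, WhisperForConditionalGeneration, MODEL_STORE_ADDR)
whisper_model.eval()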
```

### Creating a predict endpoint

Secondly, we create a `/whisper/predict` POST endpoint on our FastAPI application object. This endpoint will convert the audio file to a tensor and then query the model with our input data.

```python
# This is a POST endpoint located at /whisper/predict
@app.post("/whisper/predict")
# The async_speech_to_text_endpoint decorator handles conversion from an audio file to a tensor
@async_speech_to_text_endpoint(sample_rate=16000)
async def predict(x: np.ndarray) -> str:
    input_features = whisper_processor(
        x, sampling_rate=16000, return_tensors="pt"
    ).input_features
    
    # We query the model through the runner
    predicted_ids = await whisper_runner.submit(input_features)
    
    # We decode our results
    transcription = whisper_processor.batch_decode(
        predicted_ids, skip_special_tokens=True
    )

    return transcription[0]
```
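
The tutorial does not show how `async_speech_to_text_endpoint` turns an uploaded audio file into an array. As a rough illustration of that kind of conversion, here is a minimal sketch that uses only the Python standard library. The helper names `wav_bytes_to_floats` and `make_test_wav` are hypothetical, the sketch only handles 16-bit mono WAV data, and it returns a plain list rather than the `np.ndarray` the real endpoint receives.

```python
import array
import io
import math
import sys
import wave
from typing import List, Tuple


def wav_bytes_to_floats(data: bytes) -> Tuple[List[float], int]:
    """Decode 16-bit mono PCM WAV bytes into floats in [-1.0, 1.0) plus the sample rate."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    pcm = array.array("h", frames)  # signed 16-bit integers
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    return [s / 32768.0 for s in pcm], rate


def make_test_wav(sample_rate: int = 16000, n_samples: int = 160) -> bytes:
    """Build a short 440 Hz tone in memory so the decoder has something to read."""
    tone = array.array(
        "h",
        (int(16000 * math.sin(2 * math.pi * 440 * i / sample_rate)) for i in range(n_samples)),
    )
    if sys.byteorder == "big":
        tone.byteswap()
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(tone.tobytes())
    return buffer.getvalue()


samples, rate = wav_bytes_to_floats(make_test_wav())
print(len(samples), rate)
```

In the real application, samples like these would then be passed to `whisper_processor` with `sampling_rate=16000`, as in the endpoint above.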

We query the model through a runner that uses adaptive batching and runs on a separate thread. Because inference is compute-intensive, this keeps it from blocking the event loop.

The runner is launched when the server starts up and executes the `run_whisper` function when a batch is ready to be processed.

```python
# A function that wraps prediction code and that will be executed by the runner
def run_whisper(x: torch.Tensor) -> torch.Tensor:
    return whisper_model.generate(x, max_length=128)


# Define a runner (i.e. the given function will be run on a separate thread with adaptive batching)
whisper_runner = BatchRunner(
    run_whisper,
    max_batch_size=256,
    max_latency_ms=200,
    collator=TorchCollator(),
)

app.on_event("startup")(whisper_runner.run) # Schedule the runner to run when the server is up
```
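
To make the idea of adaptive batching more concrete, here is a minimal, standard-library-only sketch. It is not BlindBox's `BatchRunner`: the class `MiniBatchRunner` and the function `double_all` are hypothetical stand-ins, and we only assume that the real runner behaves along similar lines. Requests are collected until the batch is full or the latency budget runs out, then the whole batch is processed on a worker thread.

```python
import asyncio
from typing import Callable, List


class MiniBatchRunner:
    """Hypothetical, simplified batching runner (illustration only)."""

    def __init__(
        self,
        fn: Callable[[List[float]], List[float]],
        max_batch_size: int = 4,
        max_latency_ms: int = 50,
    ) -> None:
        self.fn = fn
        self.max_batch_size = max_batch_size
        self.max_latency = max_latency_ms / 1000
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, item: float) -> float:
        # Each request carries a future that the runner resolves with its result
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((item, future))
        return await future

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request arrives
            batch = [await self.queue.get()]
            deadline = loop.time() + self.max_latency
            # Keep collecting until the batch is full or the latency budget is spent
            while len(batch) < self.max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            items = [item for item, _ in batch]
            # Run the heavy function on a worker thread so the event loop stays responsive
            results = await loop.run_in_executor(None, self.fn, items)
            for (_, future), result in zip(batch, results):
                future.set_result(result)


def double_all(xs: List[float]) -> List[float]:
    # Stand-in for an expensive model call that processes a whole batch at once
    return [2 * x for x in xs]


async def main() -> List[float]:
    runner = MiniBatchRunner(double_all)
    worker = asyncio.create_task(runner.run())
    # Three concurrent requests are served by the same runner
    out = await asyncio.gather(*(runner.submit(x) for x in (1.0, 2.0, 3.0)))
    worker.cancel()
    return out


results = asyncio.run(main())
print(results)
```

The trade-off is the same one the `max_batch_size` and `max_latency_ms` parameters above express: larger batches use the model more efficiently, while the latency limit bounds how long any single request waits for a batch to fill.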

### Launching our server

Finally, we deploy our API on `uvicorn`, an asynchronous Python ASGI web server, on port 80. Using port 80 is essential, because BlindBox needs to communicate with our application on this port!

```python
if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=80)

```
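
Since the port matters so much, it can be handy to check that something is actually listening on it. The following is a small sketch using only the standard library; the helper `port_is_open` is a hypothetical name and is not part of BlindBox. The demo probes a temporary local listener on a free port rather than a real server.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Demo: open a temporary listener on a free local port and probe it
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]
open_while_listening = port_is_open("127.0.0.1", demo_port)
listener.close()
print(open_while_listening)
```

With the server running, the equivalent check from inside the same environment would be `port_is_open("127.0.0.1", 80)`.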

To sum up, we packaged the Whisper model as an API by doing the following:

+ Creating an API app object that "configures" the `uvicorn` server by providing handlers for specific endpoints.

+ Creating a `/whisper/predict` endpoint, which in turn queries the Whisper model.

+ Deploying our API on our server on port 80.

## Packaging our application in a Docker image
________________________________

Once we had created our Whisper API, all that was left to do was create a **Docker image** for our application that could then be deployed in BlindBox. Let's take a look at the Dockerfile we used to do this:

```docker
FROM python:3.10.10-bullseye as base

# install necessary dependencies
RUN pip install \
    torch==1.13.1 \
    transformers==4.26.1 \
    fastapi==0.95.0 \
    python-multipart==0.0.6 \
    uvicorn==0.21.1 \
    soundfile==0.12.1 \
    messages \
    librosa==0.10.0 \
    pydantic==1.10.7 \
    requests==2.28.2 \
    --extra-index-url https://download.pytorch.org/whl/cpu

COPY batch_runner.py /
COPY collators.py /
COPY messages.py /
COPY model_store.py /
COPY serializers.py /
COPY server.py /

# signal our app runs on port 80
EXPOSE 80

# launch our server
CMD ["python", "server.py"]
```

> As with the application code, this file can be viewed in the `ai_server_example` folder on the official BlindBox GitHub repository.

The Docker image has no complex requirements, but we recommend adding `EXPOSE 80` to signal that the application will be running on port 80 within our BlindBox. The instruction only documents the port; it does not publish it by itself.

## Conclusions
_______________________

The Whisper app is now ready to be built and deployed on BlindBox!

You can see exactly how we do this in our [Quick Tour](../getting-started/quick-tour.ipynb).
 
In this tutorial, we've seen how we can:
+ Create a BlindBox-compatible application
+ Create an application image for our application, ready to be built and deployed on BlindBox!