# Deploy your own Whisper API
In [whisper-example.ipynb](whisper-example.ipynb), we ran Whisper from a notebook. But what if we want to use it in production and host our own API?\
Thanks to [Gradient Deployment](link), we can do that in a very short time.\
There are many ways to build a containerised app for deployment. Here we use our pre-built **Simple Serving Framework (SSF)** container image, which deploys our model with the following workflow:

1 - Upload the model files, together with the model interface and API description, to Gradient Models.\
2 - In the deployment spec, use the image "graphcore/simple-serving-framework" and pass our Gradient Model ID and our token through environment variables.\
3 - The SSF container automatically downloads the model at startup, runs an inference loop and generates an endpoint following the API description.

![diagram](deployment-workflow.png)

### App interface and SSF configuration
To interface our model with SSF, we provide two files (the SSF documentation explains how to write them):
- [deployment/app.py](deployment/app.py): the application interface. It defines how one instance of your model behaves for different events (build, startup, request, shutdown).
- [deployment/config.yml](deployment/config.yml): a YAML file describing all the information SSF needs to build your API endpoint.
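
To make the four events concrete, here is a minimal sketch of the shape such an application interface might take. It is only an illustration: the class `WhisperAppSketch` is hypothetical, it does not import SSF, and the real base class, method signatures and return conventions are the ones defined in the SSF documentation and in `deployment/app.py`. The `audio_file` input and `result` output names mirror the request and response used later in this notebook.

```python
# Hypothetical sketch: this class does NOT import SSF. The real base class and
# method signatures are defined by the SSF documentation.
class WhisperAppSketch:
    """Illustrates the four lifecycle events handled by deployment/app.py."""

    def __init__(self):
        self.model = None

    def build(self) -> int:
        # Build: one-off preparation (for example compiling or caching the model).
        return 0  # 0 is used here as a "success" code (assumption)

    def startup(self) -> int:
        # Startup: load the model once, before any request is served.
        # A stand-in function replaces the real Whisper pipeline here.
        self.model = lambda audio: f"<transcription of {len(audio)} bytes>"
        return 0

    def request(self, params: dict) -> dict:
        # Request: called for every API call; maps inputs to outputs.
        return {"result": self.model(params["audio_file"])}

    def shutdown(self) -> int:
        # Shutdown: release the resources held by this instance.
        self.model = None
        return 0


app = WhisperAppSketch()
app.build()
app.startup()
print(app.request({"audio_file": b"\x00" * 16}))
app.shutdown()
```

The point of the split is that expensive work (build, model loading) happens once per instance, while `request` stays as light as possible.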

### Uploading our model files to Gradient Models

In [3]:
!pip install -U gradient > /dev/null

First, we need the ID of an existing Gradient project and a valid deployment name (one that isn't already in use).

In [4]:
projectId = "pvsqvrxz5fp"
deploymentName = "whisper"

Let's also log in to the Gradient CLI and store our Gradient API token as a project secret.

In [6]:
import getpass
psToken = getpass.getpass(prompt='Paperspace token?')
!gradient apiKey $psToken
# Also store the token as a project secret; the deployment spec references it later
!gradient secrets set project \
  --id $projectId \
  --name MyApiToken \
  --value $psToken
del psToken

Paperspace token? ······························


Successfully added your API Key to /root/.paperspace/config.json. You're ready to go!
Set project secret 'MyApiToken'

Now we can use the Gradient CLI to upload our model files.

In [7]:
!gradient models upload --name "Whisper" --modelType "custom" --projectId $projectId "./deployment"
modelId=!(gradient models list|grep Whisper|awk '{print $4}')

100% (7386 of 7386) |####################| Elapsed Time: 0:00:00 ETA:  00:00:00
Model uploaded with ID: mos9nektd2stq5o

### Write a Gradient Deployment spec file
Following the [Gradient documentation](https://docs.paperspace.com/gradient/deployments/deployment-spec), we specify the public image `ssf:latest`.\
To link our model, we pass it through the special environment variable `SSF_OPTIONS`, which contains the command to run SSF:

- `--config gradient-model:<model-id>|<ssf-config-path>`: downloads our model from Gradient and locates our SSF config.
- `clean build run`: tells SSF to execute three steps: repository initialisation (`clean`), the app's build step (`build`) and finally running it (`run`).
- `--deployment-api-key API_TOKEN`: indicates that our token is stored in the environment variable `API_TOKEN`.\
Since the SSF container needs to download our model from Gradient, we pass our token secretly via the `API_TOKEN` variable, using the Gradient secret set earlier: `secret:MyApiToken`.
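
As an illustration of how these three parts fit together, the sketch below assembles the `SSF_OPTIONS` string from its components. The helper `build_ssf_options` is hypothetical (it is not part of SSF or the Gradient CLI); it only concatenates the options described above, using the model ID and the `API_TOKEN` variable name from this notebook.

```python
def build_ssf_options(model_id: str, config_path: str, api_key_var: str) -> str:
    """Assemble the SSF_OPTIONS value from its three parts (hypothetical helper)."""
    parts = [
        # Where to fetch the model from and which SSF config file to read
        f"--config gradient-model:{model_id}|{config_path}",
        # Name of the environment variable that holds the Gradient token
        f"--deployment-api-key {api_key_var}",
        # SSF steps to execute, in order
        "clean build run",
    ]
    return " ".join(parts)


print(build_ssf_options("mos9nektd2stq5o", "config.yml", "API_TOKEN"))
# prints: --config gradient-model:mos9nektd2stq5o|config.yml --deployment-api-key API_TOKEN clean build run
```

Note that only the name of the variable appears in `SSF_OPTIONS`; the token value itself stays in the Gradient secret.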

In [8]:
spec = f"""enabled: true
image: gc0alexandrec/deployment-test:latest
containerRegistry: alexc-personal
port: 8100
env:
    - name: SSF_OPTIONS
      value: '--config gradient-model:{modelId[0]}|config.yml --deployment-api-key API_TOKEN clean build run'
    - name: API_TOKEN
      value: secret:MyApiToken
resources:
  replicas: 1
  instanceType: IPU-POD4

"""

# Write the spec to disk; the context manager closes the file for us
with open("spec.yml", "w") as specs_file:
    specs_file.write(spec)

### Deploy!
You are now ready to launch the deployment using the spec file!

In [9]:
!gradient deployments create --name $deploymentName --projectId $projectId --spec spec.yml --clusterId clehbtvty

Created deployment: 5309b306-b0fe-4447-b647-3383c65c9092

#### That's it! You can now check that your deployment appears in the Paperspace UI


In [10]:
%%capture deploymentId
!(gradient deployments list|grep whisper|awk '{print $4}'|tr -d '\n')

In [11]:
project_url = "https://console.paperspace.com/graphcorepaperspace/projects/" \
    + str(projectId) + "/gradient-deployments/" + str(deploymentId) + "/overview"
!(gradient deployments get --id $deploymentId)> deployment_info.json
endpoint=!(jq .deploymentSpecs[0].endpointUrl deployment_info.json | tr -d '"')
print("Deployment main page: ", project_url)
print("Endpoint docs: ","https://"+ endpoint[0]+"/docs")

Deployment main page:  https://console.paperspace.com/graphcorepaperspace/projects/pvsqvrxz5fp/gradient-deployments/5309b306-b0fe-4447-b647-3383c65c9092/overview
Endpoint docs:  https://d5309b306b0fe4447b6473383c65c9092.clehbtvty.paperspacegradient.com/docs
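
If `jq` is not available, the same field can be read with Python's standard `json` module. This is a minimal sketch: the helper `endpoint_docs_url` is hypothetical, and it assumes only what the `jq` filter above already relies on, namely that the deployment info contains `deploymentSpecs[0].endpointUrl`. The sample payload reuses the endpoint printed above.

```python
import json


def endpoint_docs_url(deployment_info: dict) -> str:
    """Build the docs URL from the JSON returned by `gradient deployments get`."""
    # Same path as the jq filter: .deploymentSpecs[0].endpointUrl
    endpoint_url = deployment_info["deploymentSpecs"][0]["endpointUrl"]
    return "https://" + endpoint_url + "/docs"


# Sample payload containing only the field we read (value taken from this notebook)
sample = json.loads(
    '{"deploymentSpecs": [{"endpointUrl": '
    '"d5309b306b0fe4447b6473383c65c9092.clehbtvty.paperspacegradient.com"}]}'
)
print(endpoint_docs_url(sample))
```

In the notebook, `sample` would be replaced by the result of `json.load` on `deployment_info.json`.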


The deployment may take some time to start up completely.
Let's poll our server until it answers with status 200, which means it is ready to take requests.

In [16]:
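import requests
import time

url = "https://" + endpoint[0] + "/"
code = 0
while code != 200:
    print("fetching...")
    try:
        # A timeout keeps the loop from hanging while the server boots
        code = requests.get(url, timeout=10).status_code
    except requests.exceptions.RequestException:
        code = 0  # server not reachable yet
    if code != 200:
        time.sleep(10)
print("Server ready")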

fetching
fetching
fetching
fetching
fetching
fetching
fetching
fetching
fetching
fetching
fetching
fetching
fetching
fetching
fetching
fetching
fetching
fetching
fetching
fetching
OK
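
The loop above waits indefinitely if the deployment never becomes healthy. A possible refinement, sketched here under the assumption that giving up after a fixed delay is acceptable, is to bound the wait. The helper `wait_until_ready` and its default values are hypothetical; it takes any zero-argument `check` function, so it can be tried without a live server.

```python
import time


def wait_until_ready(check, timeout_s: float = 600.0, interval_s: float = 10.0) -> bool:
    """Call check() until it returns True or timeout_s seconds have elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        # Stop if the next attempt would start after the deadline
        if time.monotonic() + interval_s > deadline:
            return False
        time.sleep(interval_s)


# Demo with a fake check that succeeds on its third call
calls = {"n": 0}


def fake_check() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until_ready(fake_check, timeout_s=1.0, interval_s=0.01))
```

In the notebook, `check` would be a small function that sends a GET request to `url` and returns whether the status code is 200, treating connection errors as "not ready".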


#### If you are here, the server should now be ready. Let's send a request to our API:
Let's test it with an example file.

In [18]:
import requests

url = "https://" + endpoint[0] + "/v1/whisper_transcription"
# Open the audio in binary mode; the file is closed once the upload is done
with open("example_audio.wav", "rb") as audio_file:
    response = requests.post(url, files={"audio_file": audio_file})
print(response.json())

{'result': " But what if somebody decides to break it? Be careful that you keep adequate coverage, but look for places to save money. Maybe it's taking longer to get things squared away than the bankers expected. Hiring the wife for one's company may win her tax-aided retirement income. The boost is helpful, but inadequate. New self-deceiving rags are hurriedly tossed on the two naked bones. What a discussion can ensue when the title of this type of song is in question. There is no dying or waxing or gassing needed."}


**Note that the response may be shorter than the full audio because of the maximum number of tokens configured in our model.**

We can now shut down our server to free the resources.

In [19]:
!gradient deployments delete --id $deploymentId

Deleted deployment: 5309b306-b0fe-4447-b647-3383c65c9092