# Santacoder using FastAPI
______________________________

## Introduction
______________________________

In this tutorial, we're going to walk through how we created an application image for deploying the [Santacoder model](https://huggingface.co/bigcode/santacoder) with BlindBox. Santacoder is a code generation model: given the start of a code block, it fills in the rest. By deploying Santacoder with BlindBox, SaaS owners can be sure that the code sent to the model is kept confidential at all times and is never exposed to the service provider.

We will cover two key steps in this tutorial:

1. How we **created a BlindBox-compatible API** for the model.
2. How we **created the Docker image** for our API.

> You can see how we deploy the image with BlindBox in the [Quick tour](https://blindbox.mithrilsecurity.io/en/latest/docs/getting-started/quick-tour/).

Let's dive in!

## Pre-requisites
____________________

To follow along with this tutorial, you will need to:

+ Have Docker installed in your environment. Here's the [Docker installation guide](https://docs.docker.com/desktop/install/linux-install/).

## Packaging Santacoder with FastAPI
_______________________

Our first task in deploying the **Santacoder** model with **BlindBox** was to create an API so that our end users can query the model. We did this using the **FastAPI library**, which allows us to quickly assign functions to API endpoints.
 
The full code we use to do this is available in the `server.py` file in the `examples/santacoder` folder on BlindBox's official GitHub repository.

    git clone https://github.com/mithril-security/blindbox
    cd blindbox/examples/santacoder

There are three key sections in this code:

### Initial set-up

First, we load the Santacoder model from Hugging Face and initialize our API.

```python
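# imports used throughout server.py
import intel_extension_for_pytorch as ipex
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# specify Hugging Face model name
model_name = "bigcode/santacoder"

# get tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# "cpu" for CPU inference, "cuda" for GPU inference
# (the Docker image in this tutorial installs the CPU build of PyTorch)
device = "cpu"

# get model, move it to the device and switch to eval mode
# (trust_remote_code=True follows the Santacoder model card: the checkpoint ships custom model code)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).to(device).eval()

# optional: optimize for CPU inference
model = ipex.optimize(model)

# initialize our API object
app = FastAPI()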
```

### Creating an endpoint

Secondly, we create a `/generate` POST endpoint on our FastAPI application object. The user will be able to send their prompt to this endpoint and get back the model's generated code.

```python

class GenerateRequest(BaseModel):
    # user input
    input_text: str
    # maximum number of tokens to generate
    max_new_tokens: int = 128


@app.post("/generate")
def generate(req: GenerateRequest):
    # prepare input for model
    inputs = tokenizer(req.input_text, return_tensors="pt").to(device)

    # run Santacoder on the inputs, honoring the requested token limit
    outputs = model.generate(**inputs, max_new_tokens=req.max_new_tokens)

    # convert output to string
    text = tokenizer.decode(outputs[0])

    # return text output
    return {"text": text}
```
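To try the endpoint during local development, a client only needs to POST a JSON body that matches `GenerateRequest`. The following is a minimal sketch using only the Python standard library. The base URL `http://localhost:80` and the helper names `build_generate_request` and `query_generate` are assumptions made for illustration, and the sketch talks to the server directly rather than through BlindBox.

```python
import json
import urllib.request


def build_generate_request(input_text, max_new_tokens=128,
                           base_url="http://localhost:80"):
    """Build a POST request for the /generate endpoint (hypothetical helper)."""
    # The JSON keys mirror the fields of the GenerateRequest model.
    payload = {"input_text": input_text, "max_new_tokens": max_new_tokens}
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_generate(input_text, max_new_tokens=128,
                   base_url="http://localhost:80"):
    """Send the prompt and return the "text" field of the JSON response."""
    request = build_generate_request(input_text, max_new_tokens, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["text"]
```

With the server running locally, calling `query_generate("def hello_world():")` should return the prompt followed by the model's completion.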

### Launching our server

Finally, we deploy our API on `uvicorn`, a Python ASGI (asynchronous) web server, on `port 80`. It is essential to use **port 80** because BlindBox needs to communicate with our application on this port!

```python
if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=80)
```
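Because everything depends on the server actually listening on port 80, it can be handy to check that quickly before going further. The sketch below is one way to do it with the standard library; the helper name `is_port_open` is hypothetical and is not part of BlindBox or `uvicorn`.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError if nothing accepts the connection
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_port_open("localhost", 80)` should be `True` once `server.py` has finished loading the model and the server has started.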

To sum up, we packaged the Santacoder model as an API by doing the following:

+ Creating an API app object that "configures" the `uvicorn` server by providing handlers for specific endpoints.

+ Creating a `generate` endpoint which in turn queries the Santacoder model.

+ Deploying our API on our server on `port 80`.

## Packaging our application in a Docker image
________________________________

Once we had created our Santacoder API, all that was left to do was to create a **Docker image** for our application that could then be deployed in BlindBox. Let's take a look at the Dockerfile we used to do this:

```docker
FROM python:3.10.10-bullseye as base

# install dependencies
RUN pip install \
    torch==1.13.1 \
    transformers==4.26.1 \
    fastapi==0.95.0 \
    python-multipart==0.0.6 \
    uvicorn==0.21.1 \
    pydantic==1.10.7 \
    intel_extension_for_pytorch==1.13.100 \
    --extra-index-url https://download.pytorch.org/whl/cpu

# copy our app code to container
COPY ./server.py ./

# signal that our application runs on port 80
EXPOSE 80

# run our app server
CMD python ./server.py
```

> As with the application code, this file can be viewed in the `examples/santacoder` folder on the official BlindBox GitHub repository.

There are no complex requirements for the Docker image, but we recommend including `EXPOSE 80` to signal that the application listens on port 80 inside BlindBox.
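Before deploying with BlindBox, you may want to check that the image builds and starts on your own machine. The sketch below wraps the standard `docker build` and `docker run` commands in Python. The image tag `santacoder-app`, the host port `8080` and the helper names are assumptions chosen for illustration, and this is a plain local Docker run, not a BlindBox deployment.

```python
import subprocess


def docker_commands(image_tag="santacoder-app", context_dir=".", host_port=8080):
    """Return the argv lists for building the image and running it locally."""
    build = ["docker", "build", "-t", image_tag, context_dir]
    # EXPOSE alone does not publish the port, so map a host port to container port 80.
    run = ["docker", "run", "--rm", "-p", f"{host_port}:80", image_tag]
    return build, run


def build_and_run(**kwargs):
    """Build the image, then start a container (blocks until the container exits)."""
    build, run = docker_commands(**kwargs)
    subprocess.run(build, check=True)
    subprocess.run(run, check=True)
```

Calling `build_and_run()` from the `examples/santacoder` folder would build the image from the Dockerfile above and serve the API on `http://localhost:8080`.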

## Conclusions
_______________________

The Santacoder app is now ready to be built and deployed on BlindBox! You can see exactly how we do this in our [Quick Tour](https://blindbox.mithrilsecurity.io/en/latest/docs/getting-started/quick-tour/).
 
In this tutorial, we've seen how we can:

+ Create a **BlindBox-compatible application**
+ Create an **application image**, ready to be built and deployed on BlindBox!