<a href="https://colab.research.google.com/github/rastringer/vertex-ai-examples/blob/main/cloud_run_spacy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deploying NLP apps on Cloud Run

Cloud Run is a fully managed serverless platform on Google Cloud Platform (GCP) that runs your containers for you, with no servers or infrastructure to manage. It starts and stops container instances in response to incoming requests, scaling down to zero when there is no traffic. This suits stateless, request-driven services such as web apps and APIs.

In this brief tutorial, we will build a simple Flask web app that loads the spaCy library and performs named entity recognition (NER) on uploaded text files and on the text of web pages fetched from a URL.

In [1]:
!mkdir -p spacy  # each "!" command runs in its own subshell, so a "cd" here would not persist

In [2]:
%%writefile spacy/main.py

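import os
import sys

import requests
import spacy
from bs4 import BeautifulSoup
from flask import Flask, render_template, request

# Create the Flask app at module level so gunicorn can import it as main:app
app = Flask(__name__)

# Load the spaCy model once, when the module is imported, rather than per request
print("Loading spaCy model...")
try:
    nlp = spacy.load("en_core_web_lg")
    print("spaCy model loaded successfully.")
except OSError as e:
    # Print diagnostics to help debug a missing model inside the container
    print(f"Error loading spaCy model: {e}", file=sys.stderr)
    print("Python executable:", sys.executable, file=sys.stderr)
    print("Python version:", sys.version, file=sys.stderr)
    print("spaCy version:", spacy.__version__, file=sys.stderr)
    print("Current working directory:", os.getcwd(), file=sys.stderr)
    print("Contents of current directory:", os.listdir(), file=sys.stderr)
    raise


@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        input_type = request.form.get("input_type")

        if input_type == "file":
            # Process the uploaded file; an empty filename means nothing was selected
            file = request.files.get("file")
            if not file or not file.filename:
                return "No file uploaded", 400
            # Replace undecodable bytes instead of failing on non-UTF-8 files
            text = file.read().decode("utf-8", errors="replace")
        elif input_type == "url":
            # Fetch the page with a timeout so a slow site can't hang a worker thread
            url = request.form.get("input_data", "").strip()
            if not url:
                return "No URL provided", 400
            try:
                response = requests.get(url, timeout=10)
                response.raise_for_status()
            except requests.RequestException:
                return "Could not fetch URL", 400
            soup = BeautifulSoup(response.content, "html.parser")
            # Drop script and style contents so only visible text reaches spaCy
            for tag in soup(["script", "style"]):
                tag.decompose()
            text = soup.get_text(separator=" ")
        else:
            return "Invalid input type", 400

        # Count characters
        char_count = len(text)

        # spaCy raises an error for texts longer than nlp.max_length
        # (1,000,000 characters by default), so reject them up front
        if char_count > nlp.max_length:
            return f"Text too long ({char_count} characters; limit is {nlp.max_length})", 413

        # Perform NER
        doc = nlp(text)
        entities = [(ent.text, ent.label_) for ent in doc.ents]

        return render_template("results.html", entities=entities, char_count=char_count)

    return render_template("index.html")


# Cloud Run sets PORT; 8080 is the fallback for local runs.
# Under gunicorn (as in the Dockerfile) only main:app is used, not this block.
port = int(os.environ.get("PORT", 8080))

if __name__ == "__main__":
    # Flask's development server with the debugger: for local testing only
    app.run(debug=True, host="0.0.0.0", port=port)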

Writing spacy/main.py
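The URL branch hands the result of `soup.get_text()` straight to spaCy, so whatever the extraction step returns is what gets tagged. Depending on the BeautifulSoup version, that text can include the contents of `<script>` and `<style>` tags, which tend to produce junk entities. The sketch below illustrates visible-text extraction with the standard library's `html.parser.HTMLParser`, swapped in for BeautifulSoup only so it runs without extra dependencies; `VisibleTextExtractor` and `visible_text` are hypothetical names, not part of the app.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside a skipped element
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the visible text of an HTML document, joined with single spaces."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

In the app itself, the BeautifulSoup equivalent is to call `decompose()` on each `script` and `style` tag before calling `get_text()`.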


In [3]:
!mkdir -p spacy/templates  # Flask looks for templates/ next to main.py

In [10]:
%%writefile spacy/templates/index.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>NER App</title>
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/bootstrap.min.css" rel="stylesheet">
</head>
<body>
    <div class="container mt-5">
        <h1 class="mb-4">Named Entity Recognition</h1>
        <form method="POST" enctype="multipart/form-data">
            <div class="mb-3">
                <div class="form-check form-check-inline">
                    <input class="form-check-input" type="radio" name="input_type" id="file_radio" value="file" checked>
                    <label class="form-check-label" for="file_radio">Upload File</label>
                </div>
                <div class="form-check form-check-inline">
                    <input class="form-check-input" type="radio" name="input_type" id="url_radio" value="url">
                    <label class="form-check-label" for="url_radio">Enter URL</label>
                </div>
            </div>
            <div class="mb-3">
                <input type="file" class="form-control" name="file" id="file">
                <input type="text" class="form-control" name="input_data" id="url" placeholder="Enter URL" style="display: none;">
            </div>
            <button type="submit" class="btn btn-primary">Process</button>
        </form>
    </div>

    <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/js/bootstrap.bundle.min.js"></script>
    <script>
        document.querySelectorAll('input[name="input_type"]').forEach((elem) => {
            elem.addEventListener("change", function(event) {
                var file = document.getElementById("file");
                var url = document.getElementById("url");
                if (event.target.value === "file") {
                    file.style.display = "block";
                    url.style.display = "none";
                } else {
                    file.style.display = "none";
                    url.style.display = "block";
                }
            });
        });
    </script>
</body>
</html>

Writing spacy/templates/index.html
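The form posts three fields as `multipart/form-data`: `input_type` (the radio value), `file`, and `input_data` (the URL box, which the browser still submits while it is hidden). To exercise file mode from a script rather than a browser, here is a standard-library sketch that builds the same request by hand; `build_file_upload_request` and the URL you pass it are assumptions for illustration.

```python
import urllib.request
import uuid


def build_file_upload_request(service_url: str, filename: str, content: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST mirroring index.html in "Upload File" mode.

    The field names come from the form; service_url is wherever the app runs.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields: the radio value and the (empty) URL box
    for name, value in (("input_type", "file"), ("input_data", "")):
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # The file field, sent as raw bytes
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: text/plain\r\n\r\n".encode()
        + content
        + b"\r\n"
    )
    # Closing boundary
    parts.append(f"--{boundary}--\r\n".encode())

    request = urllib.request.Request(service_url, data=b"".join(parts), method="POST")
    request.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    return request
```

To send it, pass the request to `urllib.request.urlopen(req, timeout=60)`. Building the body by hand keeps the sketch dependency-free; the `requests` library's `files=` argument performs the same encoding for you.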


In [7]:
%%writefile spacy/templates/results.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>NER Results</title>
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/bootstrap.min.css" rel="stylesheet">
</head>
<body>
    <div class="container mt-5">
        <h1 class="mb-4">Named Entity Recognition Results</h1>
        <div class="alert alert-info" role="alert">
            Text contains <strong>{{ char_count }}</strong> characters.
        </div>
        <div class="table-responsive">
            <table class="table table-striped table-hover">
                <thead>
                    <tr>
                        <th>Entity</th>
                        <th>Label</th>
                    </tr>
                </thead>
                <tbody>
                    {% for entity, label in entities %}
                    <tr>
                        <td>{{ entity }}</td>
                        <td><span class="badge bg-primary">{{ label }}</span></td>
                    </tr>
                    {% else %}
                    <tr>
                        <td colspan="2">No entities found.</td>
                    </tr>
                    {% endfor %}
                </tbody>
            </table>
        </div>
        <a href="/" class="btn btn-primary mt-3">Back to Home</a>
    </div>
    <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/js/bootstrap.bundle.min.js"></script>
</body>
</html>

Writing spacy/templates/results.html


In [8]:
%%writefile spacy/Dockerfile

# Use the official lightweight Python image.
FROM python:3.9-slim

# Allow statements and log messages to immediately appear in the logs
ENV PYTHONUNBUFFERED=True

ENV APP_HOME=/app
WORKDIR $APP_HOME

# Install production dependencies first, so this layer stays cached
# when only the application code changes.
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

# Download the spaCy model during the build so it is baked into the image
# instead of being fetched at startup.
RUN python -m spacy download en_core_web_lg

# Copy local code to the container image.
COPY . ./

# Run the web service on container startup. Here we use the gunicorn
# webserver, with one worker process and 8 threads.
# For environments with multiple CPU cores, increase the number of workers
# to be equal to the cores available.
# --timeout 0 disables gunicorn's worker timeout so that Cloud Run's own
# request timeout applies instead.
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app

Writing spacy/Dockerfile
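The Dockerfile comment suggests matching the worker count to the available cores. One way to do that is a gunicorn config file, which gunicorn reads as a Python module with settings such as `bind`, `workers`, `threads`, `timeout` and `preload_app`. Below is a sketch of a hypothetical `spacy/gunicorn.conf.py`; it assumes you would drop the flags from `CMD` (for example `CMD exec gunicorn --config gunicorn.conf.py main:app`), since command-line flags override the file.

```python
# gunicorn.conf.py: hypothetical config; gunicorn reads these module-level names.
import os

# Cloud Run tells the container which port to listen on via $PORT.
bind = f":{os.environ.get('PORT', '8080')}"

# One worker per core, unless WEB_CONCURRENCY is set (gunicorn also uses that
# variable for its own default). os.cpu_count() may report more cores than the
# instance is actually allotted, so the override is worth keeping.
workers = int(os.environ.get("WEB_CONCURRENCY", os.cpu_count() or 1))

# Threads per worker, matching the Dockerfile's --threads 8.
threads = 8

# 0 disables gunicorn's worker timeout; Cloud Run enforces its own request timeout.
timeout = 0

# Import main:app (and therefore load the spaCy model) once in the master
# process before forking workers, so they start faster and may share memory.
preload_app = True
```

Keep memory in mind before raising `workers`: each worker holds its own reference to a large model, and copy-on-write sharing after `preload_app` is only partial because Python's reference counting tends to touch shared pages over time.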


In [11]:
%%writefile spacy/requirements.txt

Flask==3.0.3
spacy==3.7.6
requests==2.32.3
beautifulsoup4==4.12.3
gunicorn==23.0.0

Writing spacy/requirements.txt
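Because the image installs exactly these pins, it can help to confirm that the environment where you try `main.py` locally matches them. Here is a small, hypothetical helper using the standard library's `importlib.metadata`; it only understands the simple `name==version` lines used in this `requirements.txt`.

```python
from importlib import metadata
from typing import Callable, Dict


def check_pins(requirements: str, get_version: Callable[[str], str] = metadata.version) -> Dict[str, str]:
    """Return {package: problem} for pins that don't match the local environment.

    Blank lines and "#" comments are skipped; non-"==" lines are reported.
    """
    problems = {}
    for raw_line in requirements.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, pinned = line.partition("==")
        if not sep:
            problems[line] = "not an exact pin; skipped"
            continue
        name, pinned = name.strip(), pinned.strip()
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != pinned:
            problems[name] = f"installed {installed}, pinned {pinned}"
    return problems
```

For example, `check_pins(open("spacy/requirements.txt").read())` returns an empty dict when every pinned package is installed at its pinned version.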


### The Cloud Run part

With the Google Cloud SDK installed, run

```
gcloud init
```

then point `gcloud` at a project that has billing enabled:

```
gcloud config set project <your project id>
```

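Grant the necessary IAM role, either to your own admin account or to a service account. The service account example below uses the Compute Engine default service account, whose email is built from the project *number*, not the project ID:

```
# Look up the project number
gcloud projects describe PROJECT_ID --format="value(projectNumber)"

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member=serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com \
    --role=roles/cloudbuild.builds.builder
```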
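If you script this setup from Python instead of typing it, passing an argument list to `subprocess.run` (no shell involved) keeps each flag intact, so a stray space such as `serviceAccount: ...` can't split the member value into two arguments. A sketch; the helper names and the example project values are hypothetical, and running it requires an authenticated `gcloud`.

```python
import subprocess
from typing import List


def project_number(project_id: str) -> str:
    """Look up a project's number with gcloud (requires an authenticated gcloud)."""
    result = subprocess.run(
        ["gcloud", "projects", "describe", project_id, "--format=value(projectNumber)"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def iam_binding_command(
    project_id: str, project_num: str, role: str = "roles/cloudbuild.builds.builder"
) -> List[str]:
    """Build the add-iam-policy-binding call as an argument list (no shell)."""
    # The default compute service account email uses the project number
    member = f"serviceAccount:{project_num}-compute@developer.gserviceaccount.com"
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member={member}",
        f"--role={role}",
    ]
```

Usage: `subprocess.run(iam_binding_command("my-project", project_number("my-project")), check=True)`.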

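Now deploy from inside the `spacy` directory, where the `Dockerfile` lives:

```
cd spacy
gcloud run deploy --source . --memory 2Gi
```

This builds the container image from the `Dockerfile` with Cloud Build and deploys it as a Cloud Run service; `gcloud` prompts for anything not given on the command line, such as the service name, region, and whether to allow unauthenticated invocations. The `--memory 2Gi` flag is there because Cloud Run's default limit of 512 MiB is too small for `en_core_web_lg`, which needs several hundred MB of RAM once loaded; adjust it after checking the service's memory metrics.

Head to the Cloud Run [console](https://console.cloud.google.com/run/) to look at further customization and scaling options, such as CPU, concurrency, and minimum instances.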
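Once the service is up, a quick smoke test can POST the same fields the form sends in "Enter URL" mode. Flask's `request.form` parses `application/x-www-form-urlencoded` bodies as well as multipart ones, so URL mode can be exercised without building a multipart body. A standard-library sketch; `SERVICE_URL` is a placeholder for the URL printed by `gcloud run deploy`, and the helper names are hypothetical.

```python
import urllib.parse
import urllib.request

# Placeholder: replace with the URL printed by `gcloud run deploy`.
SERVICE_URL = "https://YOUR-SERVICE-URL.run.app/"


def build_url_request(service_url: str, page_url: str) -> urllib.request.Request:
    """POST the same fields index.html sends in "Enter URL" mode."""
    body = urllib.parse.urlencode({"input_type": "url", "input_data": page_url}).encode()
    request = urllib.request.Request(service_url, data=body, method="POST")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request


def smoke_test(service_url: str, page_url: str, timeout: float = 60.0) -> bool:
    """Return True if the service answers 200 with the results page.

    urlopen raises HTTPError for 4xx/5xx responses, e.g. if the page can't be fetched.
    """
    with urllib.request.urlopen(build_url_request(service_url, page_url), timeout=timeout) as response:
        html = response.read().decode("utf-8", errors="replace")
        return response.status == 200 and "Named Entity Recognition Results" in html
```

The generous timeout allows for a cold start, when the instance has to load the model before answering. If you did not allow unauthenticated invocations, the request also needs an `Authorization: Bearer <token>` header, for example with a token from `gcloud auth print-identity-token`.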

