bart
Intro

This repo packages BART into a serving container built with FastAPI.

Both CPU and GPU inference are supported.

The model license can be found here

Setup

  1. Clone the repo if you haven't already, then navigate to the bart folder.

  2. Build the container. Don't forget to replace {project_id} with your GCP project ID.

    docker build . -t gcr.io/{project_id}/bart:latest
  3. Run the container. No GPU is needed for this model.

    docker run --rm -p 80:8080 -e AIP_HEALTH_ROUTE=/health -e AIP_HTTP_PORT=8080 -e AIP_PREDICT_ROUTE=/predict gcr.io/{project_id}/bart:latest
  4. Make predictions

    python test_container.py
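A sketch of the kind of request `test_container.py` likely sends to the locally running container. The helper names and the `{"text": ...}` instance schema are assumptions; only the `{"instances": [...]}` envelope is fixed by the Vertex AI custom-container contract.

```python
import json
import urllib.request


def build_request(texts):
    # Vertex AI custom containers expect a JSON body with an "instances" list;
    # the per-instance schema ({"text": ...}) is an assumption about this repo.
    return {"instances": [{"text": t} for t in texts]}


def predict_local(texts, url="http://localhost:80/predict"):
    # POST the payload to the container started in step 3.
    data = json.dumps(build_request(texts)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Example (with the container from step 3 running):
#   predict_local(["The tower is 324 metres tall, about the same height ..."])
```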

Deploy in Vertex AI

You'll need to have enabled the Vertex AI API and authenticated with a service account that has the Vertex AI Admin or Editor role.

  1. Push the image

    gcloud auth configure-docker
    docker push gcr.io/{project_id}/bart:latest
  2. Deploy in Vertex AI Endpoints.

    python ../gcp_deploy.py --image-uri gcr.io/{project_id}/bart:latest --accelerator-count 0 --model-name bart --endpoint-name bart-endpoint --endpoint-deployed-name bart-deployed-name
  3. Test the endpoint.

    python generate_request_vertex.py
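A client-side sketch of what `generate_request_vertex.py` might do with the `google-cloud-aiplatform` SDK. The function name, its parameters, and the instance schema are hypothetical; the endpoint display name matches the `--endpoint-name` passed to `gcp_deploy.py` above.

```python
def predict_via_vertex(project, location, endpoint_display_name, texts):
    # Deferred import so this sketch can be read/loaded without GCP installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    # Look up the endpoint created in step 2 by its display name.
    endpoints = aiplatform.Endpoint.list(
        filter=f'display_name="{endpoint_display_name}"'
    )
    # Instance schema must match the container's /predict route (assumed {"text": ...}).
    instances = [{"text": t} for t in texts]
    return endpoints[0].predict(instances=instances)


# Example (requires an authenticated GCP environment):
#   predict_via_vertex("my-project", "us-central1", "bart-endpoint", ["Some text ..."])
```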