
A tutorial on using modern tools such as DVC, Poetry, GitHub Actions, and PyTorch for developing and deploying AI.


Face Recognition with modern tools

This repository covers the following tools:

  1. Data Version Control (DVC)
  2. Poetry
  3. PyTorch
  4. VSCode with LambdaLabs extension
  5. FastAPI
  6. GitHub Actions
  7. Dotenv

Poetry Setup

  1. Install Poetry
curl -sSL | python3 -
  2. Navigate to the local Git repository and initialize a Poetry project:
poetry init

The prompts will guide you through providing the relevant project information.

  3. Add dependencies to pyproject.toml under the [tool.poetry.dependencies] section.

  4. Install dependencies

poetry install

If you want to use Poetry only for managing dependencies, set package-mode = false in pyproject.toml under [tool.poetry].

DVC Setup

  1. Install DVC with Google Drive support in the virtual environment
pip install dvc "dvc[gdrive]"
  2. Initialize DVC in the root directory of the project
dvc init
  3. Associate the remote location with DVC (Google Drive in this case)
dvc remote add --default <remote name> <remote url>

Choose any name for <remote name>. To find the remote URL:

  • Navigate to the Google Drive folder and copy the id from the address bar. The id is the last part of the URL, for example 1jos7TNeasdfd45545353P.

  • Assemble the URL as gdrive://1jos7TNeasdfd45545353P.

  4. Authenticate using a Google service account

    • Set up a service account on Google Cloud Platform (GCP), following the official documentation.
    • Enable the Google Drive API under APIs & Services.
    • Create a new key, download the JSON credentials file, and save it to a local directory.
    • Set an environment variable named GDRIVE_CREDENTIALS_DATA whose value is the content of the JSON file.
    • Share the Google Drive folder with the service account email.
    • Configure the remote to use the service account:
    dvc remote modify <remote name> gdrive_use_service_account true
  5. Add a folder to DVC

dvc add <folder path>
  6. Check that authentication passes by running a status command:
    dvc status -c
    # If you have a local folder that you would like to push, use `dvc add <folder path>` and `dvc push`; otherwise, run `dvc pull` to fetch the data from the remote. The workflow is similar to `git`.

If you run into an issue, see the Troubleshooting tips below.
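The remote URL and the credentials variable are the two pieces that are easiest to get wrong, so a small pre-flight check can help before running any dvc command. The sketch below is not part of DVC: the function names are hypothetical, the example folder URL is an assumed shape (id as the last path segment), and the required field names are those normally present in a Google service account key file.

```python
import json
import os
from urllib.parse import urlparse

# Fields normally present in a Google service account key file.
REQUIRED_KEYS = ("type", "client_email", "private_key")


def gdrive_remote_url(folder_url: str) -> str:
    """Build a gdrive:// remote URL from a Drive folder URL (id = last path segment)."""
    folder_id = urlparse(folder_url).path.rstrip("/").split("/")[-1]
    if not folder_id:
        raise ValueError(f"no folder id found in {folder_url!r}")
    return f"gdrive://{folder_id}"


def check_credentials(env=os.environ) -> list:
    """Return a list of problems with GDRIVE_CREDENTIALS_DATA (empty if none found)."""
    raw = env.get("GDRIVE_CREDENTIALS_DATA")
    if not raw:
        return ["GDRIVE_CREDENTIALS_DATA is not set"]
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"GDRIVE_CREDENTIALS_DATA is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["GDRIVE_CREDENTIALS_DATA is not a JSON object"]
    problems = [f"missing field: {key}" for key in REQUIRED_KEYS if key not in data]
    if "type" in data and data["type"] != "service_account":
        problems.append("the key is not a service account key")
    return problems


# Assumed URL shape, reusing the example id from the steps above.
print(gdrive_remote_url("https://drive.google.com/drive/folders/1jos7TNeasdfd45545353P"))
for problem in check_credentials():
    print("credentials check:", problem)
```

An empty list from check_credentials only means the variable looks well formed; whether the key is accepted by Google Drive is still decided when dvc status -c runs.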

CI setup: locally with act and on GitHub with Actions

CI runs are set up in two ways in this repository: 1) locally using act, and 2) on GitHub. Both use GitHub Actions workflows. Let's start with act.

a) Using act

When a CI run is configured with .github/workflows/ci.yml, act reads that file and determines which jobs to run locally. Because act emulates the GitHub runner on the local machine, everything the workflow relies on, including DVC, must also work locally. Here are the steps:

# 1. check workflow
act -l

# 2. add a step to ci.yml that configures the DVC remote and pulls the data
  - name: Pull DVC managed files
    run: |
        echo "Configuring DVC remote"
        dvc remote modify gdrive gdrive_use_service_account true
        echo "${GDRIVE_CREDENTIALS_DATA}" > gdrive_credentials.json
        dvc remote modify gdrive gdrive_service_account_json_file_path \
        echo "Pulling DVC managed files"
        dvc pull -v

# 3. run the job as defined in .github/workflows/ci.yml. The example here is for macOS on an Apple M2 chip
act --container-architecture linux/amd64 -s \


Troubleshooting

  • DVC setup:

    • If authentication fails while using any of the dvc commands:
      • Ensure GDRIVE_CREDENTIALS_DATA is set as a local environment variable.
      • Check that the Google Drive API is enabled for the service account's project on GCP.
      • Clear the cache and retry. The cache is typically found at $CACHE_HOME/pydrive2fs/{gdrive_client_id}/default.json
    • If dvc pull fails with dvc pull: error: failed to pull data from the cloud - 'gdrive', check that the gdrive remote is set up correctly.

    • If you try to pull data from a remote location without having added the folder to DVC, you will get an error like dvc pull: error: data 'data' not found in cache or in remote storage 'gdrive'.

  • CI with act:

    • Secrets are supplied with the -s option.
    • If a step requires GITHUB_TOKEN, you can skip it under act with a condition such as:
      - name: Report test results to Test Reporter
        uses: dorny/test-reporter@v1
        if: always() && env.ACT_RUN != 'true'
        with:
          name: generate test reports
          path: test-results/*.xml
          reporter: java-junit
          fail-on-error: false
          fail-on-empty: false
          token: ${{ secrets.GITHUB_TOKEN }}
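The two tips above can be combined when act is launched from a script. The following sketch only builds the argument list and does not run anything. It assumes that act is installed, that a secret named with -s and no value is read from the local environment, and that the workflow tests a custom ACT_RUN variable as in the snippet above. The helper name build_act_command is hypothetical.

```python
import shlex


def build_act_command(secret_names, arch="linux/amd64"):
    """Build an act invocation that forwards secrets and marks the run as local."""
    cmd = ["act", "--container-architecture", arch]
    for name in secret_names:
        # `-s NAME` without a value asks act to take the secret from the environment.
        cmd += ["-s", name]
    # ACT_RUN is the custom flag tested by `env.ACT_RUN != 'true'` in the workflow.
    cmd += ["--env", "ACT_RUN=true"]
    return cmd


command = build_act_command(["GDRIVE_CREDENTIALS_DATA"])
print(shlex.join(command))
```

Passing the list to subprocess.run(command, check=True) would start the run. Keeping secret values out of the argument list avoids writing them into shell history or logs.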

GitHub Actions

  1. Create a .github/workflows directory in the root of the repository.

  2. Create a ci.yml file in the .github/workflows directory and add the following code:

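name: Face Recognition CI

# Assumed triggers: pushes and pull requests targeting main.
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  # The job id "build" is an illustrative placeholder.
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.8
        uses: actions/setup-python@v2
        with:
          python-version: 3.8
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Run tests
        run: |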

