Please note: this is not an officially supported Google product.
Data Quality Manager (aka DQM) is a platform dedicated to detecting data quality issues, especially in the context of online advertising.
A typical DQM deployment relies on Google Cloud Platform (GCP) App Engine.
All you need to do is create a GCP project, then copy and paste the following command into the Cloud Shell console:
wget -qO dqm.py https://raw.githubusercontent.com/google/dqm/master/installer.py && python3 dqm.py
You will be prompted when needed during the process.
DQM is made of two components:
- A backend (Python, Django) to execute/store checks.
- A frontend (Typescript, VueJS) to allow users to plan/execute/monitor checks.
For a manual installation, you need:
- A GCP project.
- The Google Cloud SDK (command line interface) installed on your local machine.
- The pipenv Python package manager installed on your local machine.
- Node.js and npm installed on your local machine.
Clone this repository:
git clone https://github.com/google/dqm.git
cd dqm/frontend
npm install
npm run build
Note – DQM is configured to export the build to backend/www, because the frontend build ships together with the backend to App Engine, where it is served as static files.
Activate the required APIs, set up the database, create the App Engine app and get your service account key file by executing the following commands (replace [YOUR GCP PROJECT ID] with your actual project ID):
cd ../backend
export gcpproject="[YOUR GCP PROJECT ID]"
gcloud config set project $gcpproject
gcloud services enable analytics.googleapis.com
gcloud services enable analyticsreporting.googleapis.com
gcloud sql instances create dqm --region="europe-west1"
gcloud sql databases create dqm --instance=dqm
gcloud sql users create dqmuser --instance=dqm
gcloud app create
gcloud iam service-accounts keys create ./key.json \
--iam-account $gcpproject@appspot.gserviceaccount.com
Note – You can change the values and names in the lines above, but keep in mind that the key.json file has to be deployed to App Engine, so it must stay inside the backend directory.
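Before deploying, it can be worth checking that key.json really is a service account key. The following is a minimal sketch, not part of DQM: the `check_key_file` helper and its list of required fields are assumptions made for illustration, based on the fields that service account key files normally contain.

```python
import json
from pathlib import Path

# Fields a service account key file is expected to contain
# (an illustrative subset, not an exhaustive schema).
REQUIRED_FIELDS = ("type", "client_email", "private_key")


def check_key_file(path):
    """Return a list of problems found in a service account key file.

    An empty list means no problem was detected by these basic checks.
    """
    key_path = Path(path)
    if not key_path.is_file():
        return [f"{key_path} does not exist"]
    try:
        data = json.loads(key_path.read_text())
    except json.JSONDecodeError as exc:
        return [f"{key_path} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{key_path} does not contain a JSON object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if name not in data]
    if "type" in data and data["type"] != "service_account":
        problems.append(f"unexpected key type: {data['type']}")
    return problems
```

Calling `check_key_file("key.json")` from the backend directory returns an empty list when the basic checks pass, and a list of human-readable problems otherwise.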
Because App Engine cannot run Django database migrations (i.e. table creation) in the production environment, you have to run these migrations from your local machine before deploying.
First, install the required Python packages with pipenv:
pipenv install --dev
Set the following environment variables on your local machine:
export DQM_CLOUDSQL_CONNECTION_NAME="[YOUR CLOUD SQL CONNECTION NAME]"
export DQM_CLOUDSQL_DATABASE="dqm"
export DQM_CLOUDSQL_USER="dqmuser"
Note – The DQM_CLOUDSQL_CONNECTION_NAME value should look like [YOUR GCP PROJECT ID]:europe-west1:dqm (project ID, region, instance name; the region is the one used when creating the SQL instance). If you're not sure, open the GCP console, navigate to the SQL module and copy the value of the Instance connection name field.
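To illustrate the connection name format, here is a small sketch that splits a connection name into its three parts and rejects malformed values. The `parse_connection_name` helper is hypothetical and not part of DQM.

```python
def parse_connection_name(name):
    """Split a Cloud SQL connection name into project, region and instance.

    Splitting from the right keeps any colon inside the project ID
    attached to the project part.
    """
    parts = name.rsplit(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected PROJECT:REGION:INSTANCE, got {name!r}"
        )
    project, region, instance = parts
    return {"project": project, "region": region, "instance": instance}
```

For example, `parse_connection_name("my-project:europe-west1:dqm")` yields the project `my-project`, the region `europe-west1` and the instance `dqm`, while a value such as `"dqm"` raises a `ValueError`.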
Then, install cloud_sql_proxy following its installation procedure (basically, download the binary and make it executable). Once installed, you can launch the cloud_sql_proxy daemon, which routes your local app's database traffic to your production GCP SQL database:
./cloud_sql_proxy -instances="$DQM_CLOUDSQL_CONNECTION_NAME"=tcp:3306 &
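Since the proxy is started in the background, the migrations can fail if they run before it is listening. The sketch below, which is not part of DQM, polls the local port until a TCP connection succeeds; the `wait_for_port` name and its default values are assumptions chosen to match the tcp:3306 setting above.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=3306, timeout=10.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection could be made before `timeout`
    seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A call such as `wait_for_port()` can be placed in a small script that runs before the migration commands and aborts if it returns False. It only confirms that something accepts connections on the port, not that the database credentials are valid.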
Finally, let Django create the database tables and fields:
pipenv run python manage.py makemigrations
pipenv run python manage.py makemigrations dqm
pipenv run python manage.py migrate
App Engine relies on the app.yaml file to configure your app's settings. Update the env_variables section with your values:
#...
env_variables:
  CLOUDSQL_CONNECTION_NAME: "[YOUR CLOUD SQL CONNECTION NAME]"
  CLOUDSQL_USER: "dqmuser"
  CLOUDSQL_DATABASE: "dqm"
  DQM_SERVICE_ACCOUNT_FILE_PATH: "key.json"
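To show how such variables are typically consumed, here is a hedged sketch of a function that builds a Django DATABASES setting from them. It is not DQM's actual settings code. It assumes a MySQL instance, the App Engine standard environment (which sets GAE_APPLICATION and exposes Cloud SQL through a Unix socket under /cloudsql/), and a hypothetical CLOUDSQL_PASSWORD variable that does not appear in app.yaml above.

```python
import os


def build_databases(environ=os.environ):
    """Build a Django DATABASES dict from Cloud SQL environment variables."""
    default = {
        "ENGINE": "django.db.backends.mysql",  # assumption: MySQL instance
        "NAME": environ["CLOUDSQL_DATABASE"],
        "USER": environ["CLOUDSQL_USER"],
        # Hypothetical variable, used only for illustration.
        "PASSWORD": environ.get("CLOUDSQL_PASSWORD", ""),
    }
    if environ.get("GAE_APPLICATION"):
        # On App Engine standard, connect through the Cloud SQL Unix socket.
        connection_name = environ["CLOUDSQL_CONNECTION_NAME"]
        default["HOST"] = f"/cloudsql/{connection_name}"
    else:
        # Locally, connect through cloud_sql_proxy listening on tcp:3306.
        default["HOST"] = "127.0.0.1"
        default["PORT"] = "3306"
    return {"default": default}
```

In a settings module this would be used as `DATABASES = build_databases()`. Passing the environment as a parameter keeps the function easy to test with a plain dictionary.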
You're now ready to deploy:
gcloud app deploy
gcloud app browse
gcloud app logs tail -s default
DQM has no built-in per-user access restriction, but you can add one by enabling GCP Identity-Aware Proxy (IAP).
Start backend server with:
cd dqm/backend
pipenv run python manage.py runserver
If you install or update other Python libraries, don't forget to refresh the requirements.txt file before deploying (App Engine reads requirements.txt, not your Pipfile):
pipenv lock --requirements > requirements.txt
Run the backend tests with:
pipenv run python manage.py test dqm
Start frontend dev server with:
cd dqm/frontend
npm run serve
Planned features:
- More GA-related checks.
- Planned and asynchronous check execution.
- Alerting.
- Checks for other Google Marketing Platform tools (DV360, GA360...) and Google Ads.