- data-reports-etl: A project built with BigQuery Dataform that transforms Stack Overflow public raw data into reporting tables in a BigQuery data warehouse.
- Clone the project workspace repository:
    git clone https://github.com/jolares/stackoverflow-ai.git
- Install project workspace node dependencies:
    npm install
- `dataform.dataform`: provides syntax highlighting, compilation, and IntelliSense for Dataform and SQLX projects. Refer to the extension site.
A secret token is created for the GCP Dataform Service Account to interact with Dataform resources.
- Enable the GCP Secret Manager API for the GCP project.
- Open the GCP Secret Manager console and create a new secret token with any secure value of your choice.
  - This project named the secret token `GCP_BIGQUERY_DATAFORM_SA_TOKEN`.
- After the secret is created, edit its permissions and grant access to the Dataform service account: add the service account as a new principal for the secret and assign it the role `Secret Manager Secret Accessor`.
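
The console steps above can also be approximated from a terminal with the gcloud CLI. The following is a minimal sketch, not the project's documented procedure. `PROJECT_ID`, `PROJECT_NUMBER`, and `SECRET_VALUE` are placeholder assumptions, and the service account address assumes Google's usual `service-<project number>@gcp-sa-dataform.iam.gserviceaccount.com` pattern, which should be confirmed in your own project. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: create the Dataform secret and grant the service account access.
# PROJECT_ID, PROJECT_NUMBER and SECRET_VALUE are hypothetical placeholders.
set -eu

PROJECT_ID="${PROJECT_ID:-my-gcp-project}"
PROJECT_NUMBER="${PROJECT_NUMBER:-123456789012}"
SECRET_NAME="GCP_BIGQUERY_DATAFORM_SA_TOKEN"   # name used by this project
SECRET_VALUE="${SECRET_VALUE:-change-me}"

# Assumed address pattern of the Google-managed Dataform service account.
dataform_sa() {
  printf 'service-%s@gcp-sa-dataform.iam.gserviceaccount.com' "$1"
}

# Print the command in dry-run mode (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Keep the secret value out of the command line by passing it via a temp file.
tmp="$(mktemp)"
trap 'rm -f "$tmp"' EXIT
printf '%s' "$SECRET_VALUE" > "$tmp"

# 1. Enable the Secret Manager API for the project.
run gcloud services enable secretmanager.googleapis.com --project="$PROJECT_ID"

# 2. Create the secret from the temp file.
run gcloud secrets create "$SECRET_NAME" \
  --project="$PROJECT_ID" --replication-policy=automatic --data-file="$tmp"

# 3. Grant the service account the Secret Manager Secret Accessor role.
run gcloud secrets add-iam-policy-binding "$SECRET_NAME" \
  --project="$PROJECT_ID" \
  --member="serviceAccount:$(dataform_sa "$PROJECT_NUMBER")" \
  --role="roles/secretmanager.secretAccessor"
```

Run it once as is to review the printed commands, then rerun with `DRY_RUN=0` and real values for the three placeholders to apply them.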
The Dataform service account also needs BigQuery permissions so that it can read the source data and write the reporting tables.
- Open the Google Cloud IAM Admin console.
- Assign the role `BigQuery Admin` to the Dataform service account:
  - Edit the Dataform service account created by Google and add the role `BigQuery Admin` to it.

Note: if the service account does not appear in the list of principals on the Permissions page, enable the option `Include Google-provided role grants`.
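
The same role grant can be sketched with the gcloud CLI instead of the console. This is an illustrative sketch under assumptions: `PROJECT_ID` and `PROJECT_NUMBER` are placeholders, and the member address assumes the `service-<project number>@gcp-sa-dataform.iam.gserviceaccount.com` pattern for the Google-managed Dataform service account. The script prints the commands by default rather than running them.

```shell
#!/bin/sh
# Sketch: grant BigQuery Admin to the Dataform service account, then list its roles.
# PROJECT_ID and PROJECT_NUMBER are hypothetical placeholders.
set -eu

PROJECT_ID="${PROJECT_ID:-my-gcp-project}"
PROJECT_NUMBER="${PROJECT_NUMBER:-123456789012}"

# Assumed address pattern of the Google-managed Dataform service account.
MEMBER="serviceAccount:service-${PROJECT_NUMBER}@gcp-sa-dataform.iam.gserviceaccount.com"

# Print the command in dry-run mode (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Add the BigQuery Admin role at the project level.
run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="$MEMBER" --role="roles/bigquery.admin"

# List the roles now bound to that member to confirm the grant.
run gcloud projects get-iam-policy "$PROJECT_ID" \
  --flatten="bindings[].members" \
  --filter="bindings.members:$MEMBER" \
  --format="value(bindings.role)"
```

With `DRY_RUN=0`, the second command should list `roles/bigquery.admin` among the member's roles once the binding has been applied.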
stackoverflow-ai/
├── ...
├── definitions/
│   ├── reporting/
│   ├── sources/
│   └── staging/
├── environments.json
├── schedules.json
└── package.json
// dataform.json file
{
  // (Required) Set this value to the GCP BigQuery dataset name (the dataset ID without
  // the GCP project ID prefix)
  "defaultSchema": "{GCP_BIGQUERY_DATASET_NAME}",
  // (Required)
  "assertionSchema": "dataform_assertions",
  // (Required)
  "warehouse": "bigquery",
  // (Required) Set this value to the GCP project ID
  "defaultDatabase": "{GCP_PROJECT_ID}",
  // (Optional) Set this value to the BigQuery dataset location (e.g. us-central1, US)
  "defaultLocation": "US"
}
// environments.json file
{
  "environments": [
    {
      "name": "production",
      "configOverride": {},
      // The git repository branch, or commit SHA, that triggers the workflow
      // run using this environment (e.g. master, main, release, develop)
      "gitRef": "master"
    },
    {
      "name": "development",
      "configOverride": {},
      "gitRef": "master"
    },
    {
      "name": "staging",
      "configOverride": {},
      "gitRef": "master"
    }
    // ... Other environments can be added here
  ]
}
// schedules.json file
{
  "schedules": [
    {
      "name": "daily",
      "options": {
        "includeDependencies": false,
        "fullRefresh": false,
        "tags": [
          "daily"
        ]
      },
      "cron": "00 09 * * *",
      "notification": {
        "onSuccess": false,
        "onFailure": false
      },
      "notifications": [
        {
          "events": [
            "failure"
          ],
          "channels": [
            "email jo"
          ]
        }
      ]
    }
  ],
  "notificationChannels": [
    {
      "name": "email jo",
      "email": {
        "to": [
          "jolares@gatech.edu"
        ]
      }
    }
  ]
}
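
The `cron` value in the schedule above uses the standard five-field cron layout, so `00 09 * * *` means minute 0 of hour 9 on every day of every month. The small sketch below names each field. The helper `describe_cron` is hypothetical and only for illustration; it is not part of Dataform. Which time zone the scheduler applies is not stated here and should be checked for your setup.

```shell
#!/bin/sh
# Sketch: name the five fields of a cron expression such as "00 09 * * *".
set -eu

describe_cron() {
  set -f        # disable globbing so "*" stays a literal field value
  set -- $1     # split the expression on whitespace into $1..$5
  set +f
  if [ "$#" -ne 5 ]; then
    echo "expected 5 cron fields, got $#" >&2
    return 1
  fi
  printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$1" "$2" "$3" "$4" "$5"
}

describe_cron "00 09 * * *"
# prints: minute=00 hour=09 day-of-month=* month=* day-of-week=*
```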