This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

#7 small README changes
marcin-kolda committed Oct 20, 2017
1 parent f2fa0c7 commit 3633b57
Showing 1 changed file (README.md) with 12 additions and 11 deletions.
@@ -1,7 +1,7 @@
 [![Build Status](https://travis-ci.org/ocadotechnology/gcp-census.svg?branch=master)](https://travis-ci.org/ocadotechnology/gcp-census)
 [![Coverage Status](https://coveralls.io/repos/github/ocadotechnology/gcp-census/badge.svg?branch=master)](https://coveralls.io/github/ocadotechnology/gcp-census?branch=master)
 # gcp-census
-GAE python based app which regularly collects metadata about BigQuery tables and stores them in BigQuery.
+GAE python based app which regularly collects metadata about BigQuery tables and stores it in BigQuery.
 
 GCP Census was created to answer the following questions:
 * How much data we have in the whole GCP organisation?
@@ -11,21 +11,22 @@ GCP Census was created to answer the following questions:
 
 Now every question above can be easily answered by querying metadata in BigQuery or looking at our dashboard created in [Google Data Studio](https://cloud.google.com/data-studio/).
 
-## How it works
+## How it works?
 
 GCP Census retrieves BigQuery metadata using [REST API](https://cloud.google.com/bigquery/docs/reference/rest/v2/):
-1. Daily run is triggered by GAE cron ([cron.yaml](config/cron.yaml) for exact details)
+1. Daily run is triggered by GAE cron (see [cron.yaml](config/cron.yaml) for exact details)
 1. GCP Census iterates over all projects/datasets/tables to which it has access using GAE Tasks
-1. Retrieves [Table data](https://cloud.google.com/bigquery/docs/reference/rest/v2/tables) and stream it into [bigquery.table_metadata_v0_1](bq_schemas/bigquery/table_metadata_v0_1.json) table.
-1. In case of partitioned tables, GCP Census retrieves also [partitions summary](https://cloud.google.com/bigquery/docs/creating-partitioned-tables#listing_partitions_in_a_table) by querying the partitioned table.
+1. Retrieves [Table data](https://cloud.google.com/bigquery/docs/reference/rest/v2/tables) and stream it into [bigquery.table_metadata_v0_1](bq_schemas/bigquery/table_metadata_v0_1.json) table
+1. In case of partitioned tables, GCP Census retrieves also [partitions summary](https://cloud.google.com/bigquery/docs/creating-partitioned-tables#listing_partitions_in_a_table) by querying the partitioned table
 
-GCP Census will retrieve all table metadata to which it has access, so all config is based on GCP IAM.
+GCP Census will retrieve all table metadata to which it has access, so all configuration is based on GCP IAM.
 
 # Setup
 
 1. Create GCP project and assign billing to it
-1. Clone the repository
-1. Install dependencies(ideally using [virtualenv](https://virtualenv.pypa.io/en/stable/)):
+1. Clone GCP Census repository
+1. Specify metadata output BigQuery location in [app.yaml](app.yaml) (defaults to 'EU')
+1. Install dependencies (ideally using [virtualenv](https://virtualenv.pypa.io/en/stable/)):
 ```
-pip install -r requirements.txt
+pip install -t lib -r requirements.txt
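The iteration described in the diff above maps onto a handful of BigQuery REST v2 endpoints, and the partitions summary comes from legacy SQL's `$__PARTITIONS_SUMMARY__` decorator. A minimal sketch of those shapes (the helper names are hypothetical, not taken from the GCP Census codebase; only the URL and query forms come from the BigQuery documentation):

```python
# Sketch of the BigQuery REST v2 endpoints GCP Census walks.
# Helper names are hypothetical; URL shapes follow the API reference.
BASE = "https://www.googleapis.com/bigquery/v2"


def datasets_url(project_id):
    # datasets.list: enumerate datasets the service account can see
    return "{}/projects/{}/datasets".format(BASE, project_id)


def tables_url(project_id, dataset_id):
    # tables.list: enumerate tables within one dataset
    return "{}/projects/{}/datasets/{}/tables".format(BASE, project_id, dataset_id)


def table_url(project_id, dataset_id, table_id):
    # tables.get: full table metadata, streamed into table_metadata_v0_1
    return "{}/projects/{}/datasets/{}/tables/{}".format(
        BASE, project_id, dataset_id, table_id)


def partitions_summary_query(dataset_id, table_id):
    # Legacy SQL meta-table query listing partitions of a partitioned table
    return ("SELECT partition_id, creation_time, last_modified_time "
            "FROM [{}.{}$__PARTITIONS_SUMMARY__]".format(dataset_id, table_id))
```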
@@ -34,9 +35,9 @@ GCP Census will retrieve all table metadata to which it has access, so all confi
 ```
 gcloud app deploy --project YOUR-PROJECT-ID -v v1 app.yaml config/cron.yaml config/queue.yaml
 ```
-1. Grant [bigquery.dataViewer](https://cloud.google.com/bigquery/docs/access-control#bigquery.dataViewer) role to YOUR-PROJECT-ID@appspot.gserviceaccount.com service account on whole GCP organisation or selected projects.
-1. GCP Census will be triggered by cron, see [cron.yaml](config/cron.yaml) for exact details
-1. Optionally you can trigger [Cron Jobs](https://console.cloud.google.com/appengine/taskqueues/cron?tab=CRON) in the Cloud Console:
+1. Grant [bigquery.dataViewer](https://cloud.google.com/bigquery/docs/access-control#bigquery.dataViewer) role to YOUR-PROJECT-ID@appspot.gserviceaccount.com service account on whole GCP organisation, folder or selected projects.
+1. GCP Census job will be triggered daily by cron, see [cron.yaml](config/cron.yaml) for exact details
+1. Optionally you can trigger cron jobs in [the Cloud Console](https://console.cloud.google.com/appengine/taskqueues/cron?tab=CRON):
 * run `/createModels` to create BigQuery dataset and table
 * run `/bigQuery` to start collecting BigQuery metadata
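The daily trigger referenced in the setup steps is a standard GAE cron entry. A hypothetical sketch of what config/cron.yaml might contain, following the App Engine cron.yaml format (the actual handler URL, description and schedule live in the repository's file):

```
cron:
- description: daily BigQuery metadata collection
  url: /bigQuery
  schedule: every 24 hours
```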

