Skip to content
This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

Commit

Permalink
Created views with cost estimations #26
Browse files Browse the repository at this point in the history
  • Loading branch information
marcin-kolda committed Feb 9, 2018
1 parent 8f3d1d8 commit 86fc6b1
Show file tree
Hide file tree
Showing 3 changed files with 45 additions and 0 deletions.
14 changes: 14 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,20 @@ FROM `YOUR-PROJECT-ID.bigquery_views.partition_metadata_v1_0`
ORDER BY numBytes DESC LIMIT 100
```

* Provide cost estimation for the top 100 largest datasets
```sql
SELECT * FROM `YOUR-PROJECT-ID.bigquery_views.datasets_cost_estimation_v1_0`
ORDER BY estimatedMonthlyCostInUSD DESC LIMIT 100
```

### Dataset cost estimation

Views with costs are only estimation because:
* monthly cost is estimated based on current data (i.e. if you created large table today, estimation will show it's costs for the whole month),
* snapshots are done daily and don't include temporary tables created and deleted between snapshots,
* deduplication views summarises all data from the last 2 days (i.e. it includes table deleted yesterday),
* estimation costs doesn't include free 10 GB per month as Census doesn't know how many billing accounts are used.

## How it works?

![Architecture diagram](architecture-diagram.png)
Expand Down
16 changes: 16 additions & 0 deletions bq_schemas/bigquery_views/datasets_cost_estimation_v1_0.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
#standardSQL
-- This view provides BigQuery datasets current estimated costs, which might differ to those provided by GCP billing
SELECT
projectId,
datasetId,
ROUND(SUM(numBytes/(1024*1024*1024)), 3) AS totalStorageInGB,
ROUND(SUM(numLongTermBytes/(1024*1024*1024)), 3) AS longTermStorageInGB,
ROUND(SUM(numBytes/(1024*1024*1024))-SUM(numLongTermBytes/(1024*1024*1024)), 3) AS activeStorageInGB,
ROUND( ((SUM(numBytes/(1024*1024*1024))-SUM(numLongTermBytes/(1024*1024*1024))) * 0.02 + (SUM(numLongTermBytes/(1024*1024*1024))) * 0.01), 3) AS estimatedMonthlyCostInUSD,
SUM(numRows) AS numRows,
COUNT(tableId) AS tableCount
FROM
`$PROJECT_ID.bigquery_views.table_metadata_v1_0`
GROUP BY
projectId,
datasetId
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
-- This view provides BigQuery datasets current estimated costs, which might differ to those provided by GCP billing
SELECT
projectId,
datasetId,
ROUND(SUM(numBytes/(1024*1024*1024)), 3) AS totalStorageInGB,
ROUND(SUM(numLongTermBytes/(1024*1024*1024)), 3) AS longTermStorageInGB,
ROUND(SUM(numBytes/(1024*1024*1024))-SUM(numLongTermBytes/(1024*1024*1024)), 3) AS activeStorageInGB,
ROUND( ((SUM(numBytes/(1024*1024*1024))-SUM(numLongTermBytes/(1024*1024*1024))) * 0.02 + (SUM(numLongTermBytes/(1024*1024*1024))) * 0.01), 3) AS estimatedMonthlyCostInUSD,
SUM(numRows) AS numRows,
COUNT(tableId) AS tableCount
FROM
[bigquery_views_legacy_sql.table_metadata_v1_0]
GROUP BY
projectId,
datasetId

0 comments on commit 86fc6b1

Please sign in to comment.