.. index:: database
What follows is a listing of custom functions written for Socorro in the PostgreSQL database which are intended for database administration, particularly scheduled tasks. Many of these functions depend on other, internal functions which are not documented.
All functions below return BOOLEAN, with TRUE meaning completion, and throw an ERROR if they fail, unless otherwise noted.
These functions manage the population of the many Materialized Views in Socorro. In general, for each matview there are two functions which maintain it. In the cases where these functions are not listed below, assume that they fit this pattern.
update_{matview_name} ( updateday DATE optional default yesterday, checkdata BOOLEAN optional default true, check_period INTERVAL optional default '1 hour' ) fills in one day of the matview for the first time will error if data is already present, or source data is missing backfill_{matview_name} ( updateday DATE optional default yesterday, checkdata BOOLEAN optional default true, check_period INTERVAL optional default '1 hour' ) deletes one day of data for the matview and recreates it. will warn, but not error, if source data is missing safe for use without downtime
More detail on the parameters:
- updateday
- UTC day to run the update/backfill for. Also the UTC day to check for conflicting or missing dependant data.
- checkdata
- Whether or not to check for conflicting data (i.e. has this already been run?), and for missing upstream data needed to run the fill. If checkdata=false, function will just emit NOTICEs and return FALSE if upstream data is not present.
- check_period
- For functions which depend on reports_clean, the window of reports_clean to check for data being present. This is because at Mozilla we check to see that the last hour of reports_clean is filled in, but open source users need a larger window.
Matview functions return a BOOLEAN which will have one of three results: TRUE, FALSE, or ERROR. What these mean generally depend on whether or not checkdata=on. It also returns an error string which gives more information about what it did.
If checkdata=TRUE (default):
- TRUE
- matview function ran and filled in data.
- FALSE
- matview update has already been run for the relevant period. no changes to data made, and warning returned.
- ERROR
- underlying data is missing (i.e. no crashes, no raw_adu, etc.) or some unexpected error condition
IF checkdata=FALSE:
- TRUE
- matview function ran and filled in data.
- FALSE
- matview update has already been run for the relevant period, or source data (crashes, adu, etc.) is missing. no changes to data made, and no warning made.
- ERROR
- some unexpected error condition.
Or, as a grid of results (where * indicates that a warning message is returned as well):
Matview Proc | CheckData | |
---|---|---|
Condition | TRUE | FALSE |
Success | TRUE | TRUE |
Already Run | FALSE* | FALSE |
No Source Data | ERROR* | FALSE* |
Other Issue | ERROR* | ERROR* |
Exceptions to the above are generally for procedures which need to run hourly or more frequently (e.g. update_reports_clean, reports_duplicates). Also, some functions have shortcut names where they don't use the full name of the matview (e.g. update_adu).
Note that the various matviews can take radically different amounts of time to update or backfill ... from a couple of seconds to 10 minutes for one day.
In addition, there are several procedures which are designed to update or backfill multiple matviews for a range of days. These are designed for when there has been some kind of widespread issue in crash processing and a bunch of crashes have been reprocessed and need to be re-aggregated.
These mass-backfill functions generally give a lot of command-line feedback on their progress, and should be run in a screen session, as they may take hours to complete. These functions, as the most generally used, are listed first. If you are doing a mass-backfill, you probably want to limit the backfill to a week at a time in order to prevent it from running too long before committing.
These need to be run every hour, for each hour. None of them take the standard parameters.
Matview | Update Function | Backfill Function | Depends On | Notes |
---|---|---|---|---|
reports_duplicates | update_reports_duplicates | backfill_reports_duplicates | ||
reports_clean | update_reports_clean | backfill_reports_clean | reports_duplicates, product_version | |
product_version | update_product_versions | update_product_versions | ftpscraper | Cumulative |
Since update_product_versions is cumulative, it needs to only be run once.
These daily functions generally accept the parameters given above. Unless otherwise noted, all of them depend on all of the hourly functions having completed for the day.
Matview | Update Function | Backfill Function | Depends On | Notes |
---|---|---|---|---|
build_adu | update_build_adu | backfill_build_adu | raw_adu fill | |
product_adu | update_adu | backfill_adu | raw_adu fill | |
crashes_by_user | update_crashes_by_user | backfill_crashes_by_user | update_adu | |
crashes_by_user_build | update_crashes_by_user_build | backfill_crashes_by_user_build | update_build_adu | |
correlations | update_correlations | backfill_correlations | NA | Last Day Only |
correlations_addons | update_correlations | backfill_correlations | NA | Last Day Only |
correlations_cores | update_correlations | backfill_correlations | NA | Last Day Only |
correlations_modules | Not working at present. | |||
daily_hangs | update_hang_report | backfill_hang_report | ||
home_page_graph | update_home_page_graph | backfill_home_page_graph | product_adu | |
home_page_graph_build | update_home_page_graph_build | backfill_home_page_graph_build | build_adu | |
nightly_builds | update_nightly_builds | backfill_nightly_builds | ||
signature_products | update_signatures | backfill_signature_counts | ||
signature_products_rollup | update_signatures | backfill_signature_counts | ||
tcbs | update_tcbs | backfill_tcbs | ||
tcbs_build | update_tcbs_build | backfill_tcbs_build | ||
explosiveness | update_explosiveness | backfill_explosiveness | tcbs | Last Day Only |
Functions marked "last day only" do not accumulate data, but display it only for the last day they were run. As such, there is no need to fill them in for each day.
Matview functions which don't fit the parameters above include:
Purpose: backfills data for all matviews for a specific range of dates. For use when data is either missing or needs to be retroactively corrected.
Called By: manually by admin as needed
backfill_matviews ( startdate DATE, optional enddate DATE default current_date, optional reportsclean BOOLEAN default true ) SELECT backfill_matviews( '2011-11-01', '2011-11-27', false ); SELECT backfill_matviews( '2011-11-01' );
- startdate
- the first date to backfill
- enddate
- the last date to backfill. defaults to the current UTC date.
- reportsclean
- whether or not to backfill reports_clean as well. defaults to true supplied because the backfill of reports_clean takes a lot of time.
Purpose: backfill only the reports_clean normalized fact table.
Called By: admin as needed
backfill_reports_clean ( starttime TIMESTAMPTZ, endtime TIMESTAMPTZ, ) SELECT backfill_reports_clean ( '2011-11-17', '2011-11-29 14:00:00' );
- starttime
- timestamp to start backfill
- endtime
- timestamp to halt backfill at
Note: if backfilling less than 1 day, will backfill in 1-hour increments. If backfilling more than one day, will backfill in 6-hour increments. Can take a long time to backfill more than a couple of days.
Purpose: updates the list of product_versions and product_version_builds based on the contents of releases_raw.
Called By: daily cron job
update_product_versions ( ) SELECT update_product_versions ( );
Notes: takes no parameters as the product update is always cumulative. As of 2.3.5, only looks at product_versions with build dates in the last 30 days. There is no backfill function because it is always a cumulative update.
Purpose: updates "rank_compare" based on the contents of the reports_clean table
Called By: daily cron job
Note: this matview is not historical, but contains only one day of data. As such, running either the update or backfill function replaces all existing data. Since it needs an exclusive lock on the matview, it is possible (though unlikely) for it to fail to obtain the lock and error out.
- Purpose: supports other admin functions by checking if reports_clean is complete
- to the end of the day.
Called By: other udpate functions
reports_clean_done ( updateday DATE, check_period INTERVAL optional default '1 hour' ) SELECT reports_clean_done('2012-06-12'); SELECT reports_clean_done('2012-06-12','12 hours');
These functions support partitioning, upgrades, and other management of tables and views.
Purpose: to create new paritions for the reports table and its child tables every week.
Called By: weekly cron job
weekly_report_partitions ( optional numweeks integer default 2, optional targetdate date default current_date ) SELECT weekly_report_partitions(); SELECT weekly_report_partitions(3,'2011-11-09');
- numweeks
- number of weeks ahead to create partitions
- targetdate
- date for the starting week, if not today
Purpose: attempt to get a lock on a table, looping with sleeps until the lock is obtained.
Called by: various functions internally
try_lock_table ( tabname TEXT, mode TEXT optional default 'EXCLUSIVE', attempts INT optional default 20 ) returns BOOLEAN IF NOT try_lock_table('rank_compare', 'ACCESS EXCLUSIVE') THEN RAISE EXCEPTION 'unable to lock the rank_compare table for update.'; END IF;
- tabname
- the table name to lock
- mode
- the lock mode per PostgreSQL docs. Defaults to 'EXCLUSIVE'.
- attempts
- the number of attempts to make, with 3 second sleeps between each. optional, defaults to 20.
Returns TRUE for table locked, FALSE for unable to lock.
Purpose: creates a new table, skipping if the table is found to already exist.
Called By: upgrade scripts
create_table_if_not_exists ( tablename TEXT, declaration TEXT, tableowner TEXT optional default 'breakpad_rw', indexes TEXT ARRAY default empty list ) SELECT create_table_if_not_exists ( 'rank_compare', $q$ create table rank_compare ( product_version_id int not null, signature_id int not null, rank_days int not null, report_count int, total_reports bigint, rank_report_count int, percent_of_total numeric, constraint rank_compare_key primary key ( product_version_id, signature_id, rank_days ) );$q$, 'breakpad_rw', ARRAY [ 'product_version_id,rank_report_count', 'signature_id' ]);
- tablename
- name of the new table to create
- declaration
- full CREATE TABLE sql statement, plus whatever other SQL statements you only want to run on table creation such as priming it with a few records and creating the primary key. If running more than one SQL statement, separate them with semicolons.
- tableowner
- the ROLE which owns the table. usually 'breakpad_rw'. optional.
- indexes
- an array of sets of columns to create regular btree indexes on. use the array declaration as demonstrated above. default is to create no indexes.
Note: this is the best way to create new tables in migration scripts, since it allows you to rerun the script multiple times without erroring out. However, be aware that it only checks for the existance of the table, not its definition, so if you modify the table definition you'll need to manually drop and recreate it.
Purpose: allow idempotent addition of new columns to existing tables.
Called by: upgrade scripts
add_column_if_not_exists ( tablename text, columnname text, datatype text, nonnull boolean default false, defaultval text default '', constrainttext text default '' ) returns boolean SELECT add_column_if_not_exists ( 'product_version_builds','repository','citext' );
- tablename
- name of the existing table to which to add the column
- columname
- name of the new column to add
- datatype
- data type of the new column to add
- nonnull
- is the column NOT NULL? defaults to false. must have a default parameter if notnull.
- defaultval
- default value for the column. this will cause the table to be rewritten if set; beware of using on large tables.
- constrainttext
- any constraint, including foreign keys, to be added to the column, written as a table constraint. will cause the whole table to be checked; beware of adding to large tables.
Note: just checks if the table & column exist, and does nothing if they do. does not check if data type, constraints and defaults match.
Purpose: to purge old raw data quarterly per data expiration policy.
Called By: manually by DBA.
drop_old_partitions ( mastername text, cutoffdate date ) retruns BOOLEAN SELECT drop_old_partitions ( 'reports', '2011-11-01' );
- mastername
- name of the partition master, e.g. 'reports', 'extensions', etc.
- cutoffdate
- earliest date of data to retain.
- Notes: drop_old_partitions assumes a table_YYYYMMDD naming format.
- It requires a lock on the partitioned tables, which generally means shutting down the processors.
Obsolete; Removed.
Purpose: allows admin users to manually add a release to the releases_raw table.
Called By: admin interface
add_new_release ( product citext, version citext, release_channel citext, build_id numeric, platform citext, beta_number integer default NULL, repository text default 'release', update_products boolean default false, ignore_duplicates boolean default false ) returns BOOLEAN SELECT add_new_release('WaterWolf','5.0','release',201206271111,'osx'); SELECT add_new_release('WaterWolf','6.0','beta',201206271198,'osx',2, 'waterwolf-beta',true);
Notes: validates the contents of the required fields. If update_products=true, will run the update_products hourly job to process the new release into product_versions etc. If ignore_duplicates = true, will simply ignore duplicates instead of erroring on them.
Purpose: let admin users change the featured versions for a specific product.
Called By: admin interface
edit_featured_versions ( product citext, featured_versions LIST of text ) returns BOOLEAN SELECT edit_featured_versions ( 'Firefox', '15.0a1','14.0a2','13.0b2','12.0' ); SELECT edit_featured_versions ( 'SeaMonkey', '2.9.' );
Notes: completely replaces the list of currently featured versions. Will check that versions featured have not expired. Does not validate product names or version numbers, though.
Purpose: allows adding new products to the database.
Called By: DBA on new product request.
add_new_product ( prodname text, initversion major_version, prodid text default null, ftpname text default null, release_throttle numeric default 1.0 ) returns BOOLEAN
- prodname
- product name, properly cased for display
- initversion
- first major version number of the product which should appear
- prodid
- "Product ID" for the product, if available
- ftpname
- Product name in the FTP release repo, if different from display name
- release_throttle
- If throttling back the number of release crashes processed, set here
Notes: add_new_product will return FALSE rather than erroring if the product already exists.