Skip to content

Commit

Permalink
v2.5.0 Implemented limited INSERT ON CONFLICT support in partitioning…
Browse files Browse the repository at this point in the history
… trigger. See CHANGELOG and docs for limitation details.
  • Loading branch information
keithf4 committed Jul 14, 2016
1 parent d5205b7 commit 1558644
Show file tree
Hide file tree
Showing 4 changed files with 61 additions and 4 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.txt
@@ -1,5 +1,7 @@
2.5.0
-- Added very limited support for INSERT ... ON CONFLICT (upsert) in the partitioning trigger. For situations where only new data is being inserted, this can provide significant performance improvements. However, major limitations are that the constraint violations that would trigger the ON CONFLICT clause only occur on a per child table basis. This is a known limitation for inheritance in general. Constraints DO NOT apply across all tables in an inheritance set (Ex. Primary keys are only enforced for each individual child table and not for all tables in the partition set. Data duplication is possible). The included python script "check_unique_constraint.py" can help mitigate duplication, but cannot prevent it. Of a larger concern is an ON CONFLICT DO UPDATE clause which may not fire and cause wildly inconsistent data if not accounted for. These limitations will likely never be overcome in this extension until global indexes or constraints for inheritance sets are supported in PostgreSQL. It is recommended you test this feature out extensively before implementing in production and monitor it carefully. Many thanks to MikaelUlvesjo for contributing work on this issue (Github Issue #105 & Pull Request #122).
-- Added check to part_config_sub to ensure premake > 0. Makes it consistent with part_config table.
-- Allow internal check_version() function to work with test releases of postgres. If it's an alpha, beta or rc release it ignores the current version, so you're on your own if things fail due to version feature mismatches.


2.4.1
Expand Down
10 changes: 6 additions & 4 deletions doc/pg_partman.md
Expand Up @@ -46,7 +46,9 @@ PostgreSQL has an object naming length limit of 63 characters. If you try and cr

Table inheritance in PostgreSQL does not allow a primary key or unique index/constraint on the parent to apply to all child tables. The constraint is applied to each individual table, but not on the entire partition set as a whole. For example, this means a careless application can cause a primary key value to be duplicated in a partition set. This is one of the "big issues" that causes performance issues with partitoning on other database systems and one of the reasons for the delay in getting partitioning built in to PostgreSQL. In the mean time, a python script is included with `pg_partman` that can provide monitoring to help ensure the lack of this feature doesn't cause long term harm. See **`check_unique_constraint.py`** in the **Scripts** section.

INSERT ... ON CONFLICT (upsert) is supported in the partitioning trigger, but is very limited. The major limitations are that the constraint violations that would trigger the ON CONFLICT clause only occur on individual child tables that actually contain data due to reasons explained above. Of a larger concern than data duplication is an ON CONFLICT DO UPDATE clause which may not fire and cause wildly inconsistent data if not accounted for. It is unclear whether this limitation will be able to be overcome while partitioning is based around inheritance and triggers. For situations where only new data is being inserted, upsert can provide significant performance improvements. However, if you're relying on data in older partitions to cause a constraint violation that upsert would normally handle, it likely will not work. Upsert is optional, turned off by default and is recommended you test it out extensively before implementing in production and monitor it carefully. See https://www.postgresql.org/docs/9.5/static/sql-insert.html
INSERT ... ON CONFLICT (upsert) is supported in the partitioning trigger, but is very limited. The major limitations are that the constraint violations that would trigger the ON CONFLICT clause only occur on individual child tables that actually contain data due to reasons explained above. Of a larger concern than data duplication is an ON CONFLICT DO UPDATE clause which may not fire and cause wildly inconsistent data if not accounted for. It is unclear whether this limitation will be able to be overcome while partitioning is based around inheritance and triggers. For situations where only new data is being inserted, upsert can provide significant performance improvements. However, if you're relying on data in older partitions to cause a constraint violation that upsert would normally handle, it likely will not work. Also, if the resulting UPDATE would end up violating the partitioning constraint of that chld table, it will fail. pg_partman does not currently support UPDATES that would require moving a row from one child table to another.

Upsert is optional, turned off by default and is recommended you test it out extensively before implementing in production and monitor it carefully. See https://www.postgresql.org/docs/9.5/static/sql-insert.html

### Logging/Monitoring

Expand Down Expand Up @@ -115,15 +117,15 @@ As a note for people that were not aware, you can name arguments in function cal
+ *\<interval\>* - For the time-custom partitioning type, this can be any interval value that is valid for the PostgreSQL interval data type. Do not type cast the parameter value, just leave as text.
+ *\<integer\>* - For ID based partitions, the integer value range of the ID that should be set per partition. Enter this as an integer in text format ('100' not 100). Must be greater than or equal to 10.
* `p_constraint_cols` - an optional array parameter to set the columns that will have additional constraints set. See the **About** section above for more information on how this works and the **apply_constraints()** function for how this is used.
* `p_premake` - is how many additional partitions to always stay ahead of the current partition. Default value is 4. This will keep at minimum 5 partitions made, including the current one. For example, if today was Sept 6th, and `premake` was set to 4 for a daily partition, then partitions would be made for the 6th as well as the 7th, 8th, 9th and 10th. Note some intervals may occasionally cause an extra partition to be premade or one to be missed due to leap years, differing month lengths, daylight savings (on non-UTC systems), etc. This won't hurt anything and will self-correct. If partitioning ever falls behind the `premake` value, normal running of `run_maintenance()` and data insertion to id-based tables should automatically catch things up.
* `p_premake` - is how many additional partitions to always stay ahead of the current partition. Default value is 4. This will keep at minimum 5 partitions made, including the current one. For example, if today was Sept 6th, and `premake` was set to 4 for a daily partition, then partitions would be made for the 6th as well as the 7th, 8th, 9th and 10th. Note some intervals may occasionally cause an extra partition to be premade or one to be missed due to leap years, differing month lengths, daylight savings (on non-UTC systems), etc. This won't hurt anything and will self-correct. If partitioning ever falls behind the `premake` value, normal running of `run_maintenance()` and data insertion should automatically catch things up.
* `p_use_run_maintenance` - Used to tell partman whether you'd like to override the default way that child partitions are created. Set this value to TRUE to allow you to use the `run_maintenance()` function, without any table paramter, to create new child tables for serial partitioning instead of using 50% method mentioned above. Time based partitining MUST use `run_maintenance()`, so either leave this value true or call the `run_maintenance()` function directly on a partition set by passing the parent table as a parameter. See **run_mainteanance** in Maintenance Functions section below for more info.
* `p_start_partition` - allows the first partition of a set to be specified instead of it being automatically determined. Must be a valid timestamp (for time-based) or positive integer (for id-based) value. Be aware, though, the actual paramater data type is text. For time-based partitioning, all partitions starting with the given timestamp up to CURRENT_TIMESTAMP (plus `premake`) will be created. For id-based partitioning, only the partition starting at the given value (plus `premake`) will be made.
* `p_inherit_fk` - allows `pg_partman` to automatically manage inheriting any foreign keys that exist on the parent table to all its children. Defaults to TRUE.
* `p_epoch` - tells `pg_partman` that the control column is an integer type, but actually represents and epoch time value. All triggers, constraints & table names will be time-based. Be sure you create a functional, time-based index on the control column (to_timestamp(controlcolumn)) so this works efficiently.
* `p_epoch` - tells `pg_partman` that the control column is an integer type, but actually represents and epoch time value. All triggers, constraints & table names will be time-based. In addition to a normal index on the control column, be sure you create a functional, time-based index on the control column (to_timestamp(controlcolumn)) as well so this works efficiently.
* `p_upsert` - adds upsert to insert queries in the partition trigger to allow handeling of conflicts Defaults to '' (empty string) which means it's inactive.
+ the value entered here is the entire ON CONFLICT clause which will then be appended to the INSERT statement(s) in the trigger
+ Ex: to ignore conflicting rows on a table with primary key "id" set p_upsert to `'ON CONFLICT (id) DO NOTHING'`
+ Ex: to update a conflicting row on a table with columns id(pk), val set p_upsert to `'ON CONFLICT (id) DO UPDATE SET val=EXCLUDED.val'`
+ Ex: to update a conflicting row on a table with columns (id(pk), val) set p_upsert to `'ON CONFLICT (id) DO UPDATE SET val=EXCLUDED.val'`
+ Requires postgresql 9.5
+ See *About* section above for more info.
* `p_jobmon` - allow `pg_partman` to use the `pg_jobmon` extension to monitor that partitioning is working correctly. Defaults to TRUE.
Expand Down
9 changes: 9 additions & 0 deletions sql/functions/check_version.sql
Expand Up @@ -18,6 +18,13 @@ IF v_current_version[1]::int > v_check_version[1]::int THEN
RETURN true;
END IF;
IF v_current_version[1]::int = v_check_version[1]::int THEN
IF substring(v_current_version[2] from 'beta') IS NOT NULL
OR substring(v_current_version[2] from 'alpha') IS NOT NULL
OR substring(v_current_version[2] from 'rc') IS NOT NULL
THEN
-- You're running a test version. You're on your own if things fail.
RETURN true;
END IF;
IF v_current_version[2]::int > v_check_version[2]::int THEN
RETURN true;
END IF;
Expand All @@ -32,3 +39,5 @@ RETURN false;

END
$$;


44 changes: 44 additions & 0 deletions updates/pg_partman--2.4.1--2.5.0.sql
@@ -1,5 +1,6 @@
-- Added very limited support for INSERT ... ON CONFLICT (upsert) in the partitioning trigger. For situations where only new data is being inserted, this can provide significant performance improvements. However, major limitations are that the constraint violations that would trigger the ON CONFLICT clause only occur on a per child table basis. This is a known limitation for inheritance in general. Constraints DO NOT apply across all tables in an inheritance set (Ex. Primary keys are only enforced for each individual child table and not for all tables in the partition set. Data duplication is possible). The included python script "check_unique_constraint.py" can help mitigate duplication, but cannot prevent it. Of a larger concern is an ON CONFLICT DO UPDATE clause which may not fire and cause wildly inconsistent data if not accounted for. These limitations will likely never be overcome in this extension until global indexes or constraints for inheritance sets are supported in PostgreSQL. It is recommended you test this feature out extensively before implementing in production and monitor it carefully. Many thanks to MikaelUlvesjo for contributing work on this issue (Github Issue #105 & Pull Request #122).
-- Added check to part_config_sub to ensure premake > 0. Makes it consistent with part_config table.
-- Allow internal check_version() function to work with test releases of postgres. If it's an alpha, beta or rc release it ignores the current version, so you're on your own if things fail due to version feature mismatches.


ALTER TABLE @extschema@.part_config ADD COLUMN upsert TEXT NOT NULL DEFAULT '';
Expand Down Expand Up @@ -1426,6 +1427,49 @@ RETURN v_success;
END
$$;


/*
* Check PostgreSQL version number. Parameter must be full 3 point version.
* Returns true if current version is greater than or equal to the parameter given.
*/
CREATE OR REPLACE FUNCTION check_version(p_check_version text) RETURNS boolean
LANGUAGE plpgsql STABLE
AS $$
DECLARE

v_check_version text[];
v_current_version text[] := string_to_array(current_setting('server_version'), '.');

BEGIN

v_check_version := string_to_array(p_check_version, '.');

IF v_current_version[1]::int > v_check_version[1]::int THEN
RETURN true;
END IF;
IF v_current_version[1]::int = v_check_version[1]::int THEN
IF substring(v_current_version[2] from 'beta') IS NOT NULL
OR substring(v_current_version[2] from 'alpha') IS NOT NULL
OR substring(v_current_version[2] from 'rc') IS NOT NULL
THEN
-- You're running a test version. You're on your own if things fail.
RETURN true;
END IF;
IF v_current_version[2]::int > v_check_version[2]::int THEN
RETURN true;
END IF;
IF v_current_version[2]::int = v_check_version[2]::int THEN
IF v_current_version[3]::int >= v_check_version[3]::int THEN
RETURN true;
END IF; -- 0.0.x
END IF; -- 0.x.0
END IF; -- x.0.0

RETURN false;

END
$$;

-- Restore dropped object privileges
DO $$
DECLARE
Expand Down

0 comments on commit 1558644

Please sign in to comment.