index columns miss-reported if they were built with a function #3

mhow · 2011-06-22T16:07:42Z

This is not a major issue (for me) but I noticed that when reading the json file, the index column reported on an index that used a function was incorrect.

I have some indices that are built using lower(columnname), the json reported the index on "otherfieldname" instead.

jmafc · 2011-06-22T16:22:44Z

Indeed, I'm afraid all my tests with indexes were done with traditional indexes, i.e., not using a function. I'll investigate how difficult it'll be to fix that.

* pyrseas/dbobject/index.py (Index.to_map): Make key columns specification conditional. (IndexDict.query): Add expression to target list. (IndexDict._from_catalog): Eliminate keycols if array is zero. * tests/dbobject/test_index.py (IndexToMapTestCase.test_index_function): New test for indexes on expressions.

rhunwicks · 2012-06-06T08:10:43Z

I think this might still be a problem if there is an function-based index on more than one column. See the dbtoyaml output for index pty_i_last_name_first_name below, which includes a columns section as well as the expression:

Postgresql definition:

CREATE TABLE ipc_owner.sys_parties
(
  pty_id bigint NOT NULL,
  organization_name character varying(80),
  last_name character varying(40),
  first_name character varying(40),
  other_names character varying(40),
  CONSTRAINT sys_parties_pkey PRIMARY KEY (pty_id )
)
WITH (
  OIDS=FALSE
);
CREATE INDEX pty_i_last_name_first_name
  ON ipc_owner.sys_parties
  USING btree
  (lower(last_name::text) COLLATE pg_catalog."default" , lower(first_name::text) COLLATE pg_catalog."default" , lower(other_names::text) COLLATE pg_catalog."default" );

CREATE INDEX pty_i_organization_name
  ON ipc_owner.sys_parties
  USING btree
  (lower(organization_name::text) COLLATE pg_catalog."default" );

ddbtoyaml output:

  table sys_parties:
    columns:
    - pty_id:
        not_null: true
        type: bigint
    - organization_name:
        type: character varying(80)
    - last_name:
        type: character varying(40)
    - first_name:
        type: character varying(40)
    - other_names:
        type: character varying(40)
    indexes:
      pty_i_last_name_first_name:
        access_method: btree
        columns:
        - other_names)::text)
        expression: lower((last_name)::text), lower((first_name)::text), lower((other_names)::text)
      pty_i_organization_name:
        access_method: btree
        expression: lower((organization_name)::text)
    primary_key:
      sys_parties_pkey:
        access_method: btree
        columns:
        - pty_id

rhunwicks · 2012-06-06T12:15:15Z

The yaml I am expecting, and which seems to generate the correct index going in the other direction is:

    indexes:
      pty_i_last_name_first_name:
        access_method: btree
        expression: lower(last_name::text), lower(first_name::text), lower(other_names::text)

It also gets confused when there is more than column, but only one of them is function-based:

Actual SQL:

CREATE INDEX ead_i_address
  ON ipc_owner.sys_electronic_addresses
  USING btree
  (lower(address::text) COLLATE pg_catalog."default" , ead_type COLLATE pg_catalog."default" );

Expected yaml

    indexes:
      ead_i_address:
        access_method: btree
        expression: lower((address)::text), ead_type

Actual yaml

    indexes:
      ead_i_address:
        access_method: btree
        columns:
        - address)::text)
        - ead_type
        expression: lower((address)::text)

jmafc · 2012-06-06T18:55:02Z

You're right. I've created two new test cases, one for expressions over multiple columns and another for "mixed" indexes (regular column plus function). I think we are getting all the information, that is we fetch pg_index.indkey, which has zeros for the parts that are actually expresssions, and pg_get_expr(indexprs, indrelid), which gives the representation of the expressions. I have to tweak how we deal with the info.

jmafc · 2012-06-06T19:42:03Z

There are several ways we could show the info in YAML. Say we have a table with c1 int, c2 text, c3 text and for whatever reason we want to create an index on (lower(c2), c1, upper(c3). An option would be to show this as:

    indexes:
      idx:
        keys:
        - lower(c2):
            type:
              expression
        - c1
        - upper(c3):
             type:
               expression

We need to specify "type: expression" to read it back in (since with delimited identifiers someone could actually created a column named literally "lower(c2)", as crazy as that sounds). The second key could also have a "type: column" specification which of course would be the default. The downside is this is incompatible with the current output from dbtoyaml or the input expected by yamltodb. So we could accept "columns:" as substitute for "keys:".

The main alternative would be to use (similar to your example):

    indexes:
      idx:
        expression: (lower(c2), c1, upper(c3)

The downside of this that we'd have to parse and make sense of that expression (and it could get hairy with collations in 9.1+). The former format could be extended/modified to allow something like:

  keys:
    - lower(c2):
        type:
          expression
        collation:
          french

Comments, suggestions?

rhunwicks · 2012-06-06T21:49:17Z

I like your first example, using keys with the ability for each key to be either a column or an expression.

Allowing columns as an alias for keys will ensure backwards compatibility for anyone using regular column-based indexes.

For people like me who are using function-based indexes it will be straightforward to use dbtoyaml to extract the correct syntax for the new version and then diff/meld/eclipse/etc. to compare the extract to the schema yaml file and update it to the new format.

jmafc · 2012-06-07T21:27:11Z

This looks like a tough nut to crack. pg_index has three (and in 9.1 four) int2vector/oid2vector columns, which have to be split and analyzed. In SQL, we can do that with unnest(), but we'll fetch multiple rows. In Python, we get a string, e.g., "0 1 0", which is fairly easy to split. pg_index also has an indexprs column (text before 9.1, pg_node_tree in 9.1), which can be interpreted with pg_get_expr(), but what you get for my example above is "lower(c2), upper(c3)" (the regular column is missing). Although this particular case is easy to split in Python, with str.split(', '), if one of the functions has more than one argument separated by a comma (like btrim()), that technique will fail, which would then require more careful parsing. And then there is indoption whose individual int2 components are actually flag bits "defined by the index's access method". I had tried parsing the output of pg_get_indexdef(), but wasn't very successful (that's why you saw partial column specs like "address)::text)". Outputting the complete final parenthesized expression from pg_get_indexdef() would obviate all these issues for dbtoyaml, but may affect yamltodb. In any case, I think the broken down representation is preferable, so I'll try to work along those lines.

jmafc · 2012-06-11T20:17:40Z

Made some progress on this. Two unit tests. First takes following statements:

CREATE TABLE t1 (c1 integer, c2 text, c3 text);
CREATE INDEX t1_idx ON t1 (lower(c2), lower(c3));

and outputs the following for the index:

indexes:
  t1_idx:
    access_method: btree
    keys:
    - lower(c2):
        type: expression
    - lower(c3):
        type: expression

Second test uses:

CREATE TABLE t1 (c1 integer, c2 text, c3 text);
CREATE INDEX t1_idx ON t1 (btrim(c3, 'x') NULLS FIRST, c1, lower(c2) DESC);

and emits the following for the index:

indexes:
  t1_idx:
    access_method: btree
    keys:
    - btrim(c3, 'x'::text):
        nulls: first
        type: expression
    - c1
    - lower(c2):
        order: desc
        type: expression

I'm not entirely happy with parsing the expressions, but at least it appears to do the job. If you have some really hairy examples, please let me know. I'll work on the YAML to SQL side now.

jmafc closed this as completed Jun 22, 2011

rhunwicks reopened this Jun 6, 2012

jmafc closed this as completed in e76bc07 Jun 13, 2012

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

index columns miss-reported if they were built with a function #3

index columns miss-reported if they were built with a function #3

mhow commented Jun 22, 2011

jmafc commented Jun 22, 2011

rhunwicks commented Jun 6, 2012

rhunwicks commented Jun 6, 2012

jmafc commented Jun 6, 2012

jmafc commented Jun 6, 2012

rhunwicks commented Jun 6, 2012

jmafc commented Jun 7, 2012

jmafc commented Jun 11, 2012