Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set up Column Django model with support for display options #658

Closed
kgodey opened this issue Sep 13, 2021 · 9 comments · Fixed by #815
Closed

Set up Column Django model with support for display options #658

kgodey opened this issue Sep 13, 2021 · 9 comments · Fixed by #815
Assignees
Labels
type: enhancement New feature or request work: backend Related to Python, Django, and simple SQL

Comments

@kgodey
Copy link
Contributor

kgodey commented Sep 13, 2021

Problem

Some of our data types have display options. We need to store these display options associated with columns of those data types.

Proposed solution

(1) Create a Column Django model with a display_options JSON field to store display options.

  • Update /api/v0/columns/ to use the Django model, including using the auto-generated id for the ID.
  • Implement automatic DB reflection of columns.

(2) While reflecting columns from the database:

  • If name or index of the column changes, we destroy the Column object and create a new one with a new id (we never update Column objects).
  • We will copy display options from an old column object to the new object based on these rules:
    • if we see a column at the same index with the same data type but the name is different from the current column, we'll copy the display options over from the current column.
    • if we see fewer columns than we expect from previous reflection, we'll use the names/types of the columns to figure out which ones have been deleted and copy over display options between columns with the same names and types.

(3) If we are updating column names/types from within Mathesar, we will update the existing Column object.

(4) For the purposes of this issue, do not allow display options to be set. We will implement setting display options in each specific data type issue.

Additional context

  • This implementation means that when a table's columns are deleted or renamed outside of Mathesar, we might lose some display options. This seems unavoidable. I investigated the idea of storing display options in Postgres column comments, but columns are identified by the table's OID and column index, which is what we'd be storing anyway. This way, we don't have to touch the database (people may already be using column comments for other things).
  • Spec for implementing display options for Columns #392 has more discussion.
@mathemancer
Copy link
Contributor

It seems that the attrelid, attnum pair from the pg_attribute table should be a robust identifier for a column after all. So, we should be able to use the column model to both provide display options that we don't want to store in the user DB, and also map between the service layer ID and the DB layer column identifier (represented by the above pair). This will make querying for column info more robust, as well as getting constraints associated with columns.

We were confused before, and treating attnum as the index of a column. This was incorrect. It's a table-local incrementing smallint that identifies the column in that table. This is why combining that number with the oid of the containing table (attrelid above) is a valid global identifier for the column (in a given database; For a completely global identifier, we'll need to add the database as a part of the identifying tuple).

@kgodey kgodey added status: started and removed ready Ready for implementation labels Nov 4, 2021
@silentninja
Copy link
Contributor

I will go with the approach @mathemancer mentioned.

I am planning to model it just like the existing Django Table model. The column model will be a wrapper over the MathesarColumn and will be using DatabaseObject as a base. Since Django doesn't have composite primary keys support we will have to store attrelid and attnum as two different columns and use a unique together constraint.

Oid field of DatabaseObject will be the attnum from pg_attribute and a foreign key reference of the respective Django Table model will be used in the place of attrelid as it is a wrapper around the actual table oid.

As for the display_option field is concerned:

  1. I will store it as a json field on the database as proposed.
  2. Serializers will be polymorphic, each datatype will have its own serializer and a mapping serializer will be used to switch based on the column type field.

We might have to move from storing display_options field as a json into a proper table(either as a sparse table or a generic foreign key related table) depending upon the need to query the options, something to decide on before a stable release as the change would be possible only with a data transformation.

@kgodey
Copy link
Contributor Author

kgodey commented Nov 5, 2021

@silentninja Your plan generally sounds good. Some notes:

  • I don't think we should reuse the oid field for anything else. I think we should refactor DatabaseObject so that oid is optional or create another base model without oid. attnum should be named attnum.
  • JSON fields are queryable in Django using regular syntax so we should be fine there. See https://pganalyze.com/blog/postgres-jsonb-django-python

@silentninja
Copy link
Contributor

  • I don't think we should reuse the oid field for anything else

I wasn't comfortable using oid either(oid will be unique and can be described but wont be possible with attnum) but I was thinking of it as a identifier, moreover columns are the only objects without a oid and thought of going with it. I will create a different base model instead of making oid as optional as that would unnecessarily remove constraints of other objects.

  • JSON fields are queryable in Django using regular syntax so we should be fine there

While it is true jsonb can be queried, there are few missing operators when using a index and could make the query not so trivial. Moreover updating the fields won't be a straight forward migration.

But display_options look more like user settings and json seems much suitable in this case. Moreover will be helpful in the case of custom datatype extensions or plugins. I am more concerned about migration than querying now, but I will go for jsonb as it fits the use case

@kgodey
Copy link
Contributor Author

kgodey commented Nov 8, 2021

@silentninja Sounds good.

@silentninja
Copy link
Contributor

This spec brings breaking change to the API(referring to django column id instead of column index), will it be okay to introduce display options along with the breaking changes or should I make display options backward compatible and then introduce the breaking change.

It is possible to make it backward compatible by referring to SQL alchemy column index in the apis and using it to get the Django columns

Would like to know your opinion @kgodey @pavish @seancolsen

@pavish
Copy link
Member

pavish commented Nov 12, 2021

@silentninja I think it would be better to first change column reference to id instead of index in a separate PR. We'll make changes on the frontend and add commits to the same PR. The display options PR can base off that branch.

@silentninja
Copy link
Contributor

Display options is an optional field and an enhancement , so it won’t affect anything I can add it in the same PR

@kgodey
Copy link
Contributor Author

kgodey commented Nov 12, 2021

I don't think we need backwards compatibility at this stage. Our API is v0 because it's unstable, see our API standards page.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement New feature or request work: backend Related to Python, Django, and simple SQL
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

4 participants