Product spec for Views #22

Merged
merged 19 commits into from
Jan 24, 2022

Contributor

@kgodey kgodey commented Jan 13, 2022

I was thinking through how to represent Views in Mathesar and ended up having a lot of thoughts on the data model and how to translate DB queries to Views. I figured we could have the most granular discussion here instead of a GitHub Discussion.

Previous discussion and context

Additional notes

I also fixed a bug in the organize images script: it couldn't handle images with spaces in the filename.

@kgodey kgodey requested a review from a team as a code owner January 13, 2022 20:57
@github-actions github-actions bot requested review from eito-fis, ghislaineguerin, mathemancer, pavish, seancolsen and silentninja and removed request for a team January 13, 2022 20:57
@kgodey kgodey requested review from dmos62 and a team and removed request for eito-fis January 13, 2022 20:57
Contributor Author

kgodey commented Jan 13, 2022

@dmos62 @ghislaineguerin @mathemancer @pavish @seancolsen @silentninja I've assigned all of you to review. As usual, please unassign yourself when you have left feedback and don't intend to leave any more or if everything looks good to you.

@kgodey kgodey added the status: review In review label Jan 13, 2022
## Filters
Views can have filters applied. Unlike Tables, view filters are not necessarily related to the columns that are present in the view.

Using the example table above, imagine a view created from the query `SELECT ID, Title FROM Movies WHERE Year > 2000;`. The resulting view is filtered by Year even though Year is not a column in the View.
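
To make the example concrete, here is a minimal sketch that builds a view from the query above and checks that Year restricts the rows without appearing as a column. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL; the sample rows and the view name `recent_movies` are assumptions made for illustration, not part of the spec.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (ID INTEGER PRIMARY KEY, Title TEXT, Year INTEGER)")
conn.executemany(
    "INSERT INTO Movies (ID, Title, Year) VALUES (?, ?, ?)",
    [(1, "Old Film", 1995), (2, "Newer Film", 2005), (3, "Newest Film", 2019)],
)

# The view's query filters on Year but does not select it.
# "recent_movies" is a hypothetical name chosen for this sketch.
conn.execute(
    "CREATE VIEW recent_movies AS SELECT ID, Title FROM Movies WHERE Year > 2000"
)

cursor = conn.execute("SELECT * FROM recent_movies ORDER BY ID")
view_columns = [description[0] for description in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)  # Year is not among the view's columns
print(view_rows)     # only rows with Year > 2000 remain
```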

Member

@pavish pavish Jan 13, 2022

Do we intend to show this filter in the 'filters dropdown' on the frontend?

I think it's best if we consider this just as part of the query for the view, and not as a filter that can be manipulated.

Reasoning:

  • The view is specifically created by applying this filter on the parent table.
  • When the user wants to apply filters on the view, I assume they'll want to only further filter the view with the columns shown rather than being able to edit the value for Year.
  • Based on our UX, it would not be possible to add this filter for Year back if the user removes it, unless they close and reopen the view.
  • It would also confuse non-technical users if a random filter is present on a view which does not contain that column.

Contributor Author

> the 'filters dropdown' on the frontend?
> [...]
> Based on our UX

Please don't assume any particular design for Views as yet. This spec is aimed at clarifying product requirements which will then influence the requirements for design.

Contributor Author

e.g. see this UX for adding a filter in Chart.io https://chartio.com/docs/visual-sql/start-a-query/visual-mode/#add-a-filter.

Even if we don't allow the user to manipulate the filters, it seems like it would be useful to show the filter applied (without them having to look at the query or understand SQL), otherwise non-technical users may get confused about why the View is not showing all the data they expect. And if we're figuring out how to represent the filter, that's most of the work, so why not let them change it?

Member

Okay, that makes sense.

I've been explaining to people lately that tables are a structured representation of raw data and views are reports generated from those tables. When I look at views as reports, it kind of makes them seem immutable, which is true from a db standpoint.

But considering that we intend to represent views as mutable entities on the frontend, allowing users to edit the base filter makes sense.

I'm only concerned about how well we would be able to represent the differences between tables and views to the user.

Contributor

Argh, I couldn't see this conversation while doing my own review, so there's some overlap. To reiterate what I said in other spots, I think we're missing clarity between the filter that defines a view, and a filter applied to a view. I think we need both, and since from SQLAlchemy a view is a table (more-or-less), filtering a previously created view is the same as filtering a table. I think we should be crystal clear on this point, lest we create confusion for users.

This applies to all transformations that can be used to define a view.
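
The distinction can be sketched in a few lines. The example below is illustrative only: it uses Python's built-in `sqlite3` module rather than PostgreSQL or SQLAlchemy, and the table contents and the view name `recent_movies` are assumptions. One filter is part of the view's saved definition; the other is applied when querying the view, exactly as it would be for a table, and leaves the definition untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL (assumption)
conn.execute("CREATE TABLE Movies (ID INTEGER PRIMARY KEY, Title TEXT, Year INTEGER)")
conn.executemany(
    "INSERT INTO Movies (ID, Title, Year) VALUES (?, ?, ?)",
    [(1, "Alien", 1979), (2, "Avatar", 2009), (3, "Brave", 2012), (4, "Amelie", 2001)],
)

# Filter that DEFINES the view: it is stored as part of the view's query.
conn.execute(
    "CREATE VIEW recent_movies AS SELECT ID, Title FROM Movies WHERE Year > 2000"
)

# Filter APPLIED TO the view: the view is queried like a table, nothing is saved.
applied = conn.execute(
    "SELECT ID, Title FROM recent_movies WHERE Title LIKE ? ORDER BY ID", ("A%",)
).fetchall()

# The view itself still returns every row that matches its defining filter.
all_rows = conn.execute("SELECT ID, Title FROM recent_movies ORDER BY ID").fetchall()

# The stored definition only contains the defining filter.
definition = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'view' AND name = 'recent_movies'"
).fetchone()[0]
```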

Contributor

"Saved" filters are part of the query. "Unsaved" filters are only visible in the UI.

I don't see how this would get rid of the concept of views. Could you elaborate?

Views are more than just saved filters. They can be referenced by other queries/views and can have their own set of filters/sorts. They are hierarchical structures that are closer to tables in their characteristics.

Calling them unsaved filters would be misleading: an unsaved filter sounds like something that cannot be extended, something that either can or cannot be persisted and, if persisted, is simply concatenated to the existing filters.

In the case of very deep Views (table1 -> view1 -> view2), we need to show that hierarchy, and users should be able to determine where certain operations like filter/sort took place.

Contributor

Views are saved filters, among other things. Views can be used as:

Hierarchy and composition are the reason why I would not call them saved filters.

Contributor

@silentninja said

> Hierarchy and composition are the reason why I would not call them saved filters.

This is a succinct way to put it; I agree.

Contributor

I don't think we can rely on users always naming views well. I'm also not sure how this point relates to the product spec or filters.

Views should be treated much like tables, so a view's name can help users understand what data the view holds without having to look at where or how it was generated. This is more of a convention than an accurate description. If views are named properly, we don't have to worry about showing how a view was generated unless required, in which case we could show complex diagrams like dependency graphs.

Generally, when working with a View I wouldn't worry about how it was generated; I would focus on what to do with the data it holds. This again is a convention, so, as you mentioned, it wouldn't apply in every situation.

Contributor

I'm not sure what you mean here by "a dependency chart" or "cluttered filters". Could you explain or maybe do a quick wireframe of what you're imagining for both of those things?

Dependency graph:
[image: dependency graph]

As for cluttered filters, I don't have a visualisation other than our existing filter dropdown, but the reason I think it would be cluttered is that, for a deep view, we could end up with too many filters in one list (maybe along with the name of the entity each came from).

product/specs/2022-01-views/02-modeling-views.md (outdated, resolved)
product/specs/2022-01-views/03-modeling-view-columns.md (outdated, resolved)
Contributor Author

kgodey commented Jan 13, 2022

It might be useful to look through the Chart.io docs while reviewing the spec; I took some inspiration from them. They have a good example of breaking down SQL query building visually, which is essentially what we're doing to create and interact with Views:

@seancolsen seancolsen removed their assignment Jan 18, 2022
@pavish pavish removed their assignment Jan 18, 2022
Contributor

@mathemancer mathemancer left a comment

I think most of this makes sense. I've noted some spots that may be some trouble in specific comments.


# Introduction

Fundamentally, **Views** are saved database queries. This means that in order to work with Views in Mathesar, we need to translate every concept that can be used in [PostgreSQL queries](https://www.postgresql.org/docs/14/queries.html) to our end users in a user-friendly way.

Contributor

This isn't quite true. You could query a view, and further modify it (e.g., by adding filters, joining, choosing a subset of columns, etc.) without knowing how the view was created. This would give us some flexibility for working with views previously defined by some DB query.

Contributor Author

I figured we'd be able to break down views previously defined by some DB query into the concepts defined in this spec (as long as we have access to the query, which I assume we would if we had access to the view). Is this inaccurate?

Contributor

That might be possible, but would be very (very) difficult under all but the simplest circumstances. Moreover, my point is that even if we could do that, we don't need to in order to be able to work with a view. Having access to the underlying query isn't necessary, since we can treat a view as a table for any of the operations we support.

I'm making the following assumptions in the rest of the spec about how we want to work with Views in Mathesar.

- We **do not** need to support creating or editing Views based on every conceivable database query in Mathesar. We will be focusing on allowing common use cases.
- We **do** need to support viewing Views based on any conceivable database query correctly, even if they can't be edited. Users should be able to connect a database with existing Views to Mathesar and have those Views show up correctly.

Contributor

Does this imply being able to view the definition of the view somehow, or just the actual tabular output? I.e., if they combine something we don't support with something we do (e.g., a filter we understand), do we need to try to pick out the filter and show that in the UI?

Contributor Author

Yes, that's the idea. If there's a filter we don't understand, I think we would show it as an unknown filter in the UI and not allow them to edit that part.
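
One way to picture this is a small classifier that marks each filter condition as either understood (and therefore editable) or unknown (display-only). The sketch below is purely hypothetical: the `DisplayFilter` class, the `describe_filter` function, and the regular expression are invented for illustration, and a pattern match on condition text is a toy substitute for real query analysis, not a proposal for how Mathesar would do it.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Toy pattern for "column operator literal" conditions (assumption: a real
# implementation would inspect the parsed query, not match on text).
SIMPLE_CONDITION = re.compile(
    r"^\s*(\w+)\s*(<=|>=|<>|=|<|>)\s*('[^']*'|\d+(?:\.\d+)?)\s*$"
)


@dataclass
class DisplayFilter:
    """How a single filter condition would be presented (hypothetical model)."""
    raw: str                  # original condition text, always shown
    column: Optional[str]     # None when the condition is not understood
    operator: Optional[str]
    value: Optional[str]
    editable: bool            # unknown filters are shown but cannot be edited


def describe_filter(condition: str) -> DisplayFilter:
    match = SIMPLE_CONDITION.match(condition)
    if match is None:
        # Not understood: surface it as an unknown, read-only filter.
        return DisplayFilter(condition, None, None, None, editable=False)
    column, operator, value = match.groups()
    return DisplayFilter(condition, column, operator, value, editable=True)


known = describe_filter("Year > 2000")
unknown = describe_filter("to_tsvector(Title) @@ to_tsquery('space')")
```

Here `known` comes back editable with its column, operator, and value split out, while `unknown` keeps only its raw text and is flagged as non-editable.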


- We **do not** need to support creating or editing Views based on every conceivable database query in Mathesar. We will be focusing on allowing common use cases.
- We **do** need to support viewing Views based on any conceivable database query correctly, even if they can't be edited. Users should be able to connect a database with existing Views to Mathesar and have those Views show up correctly.
- At the moment, we **only** care about the final output of the views. If a view uses a subquery, CTE, union, intersection, etc. internally, we will not be representing those to the user in the UI (unless they look at the underlying SQL query).

Contributor

To double-check: I thought if a user creates a view, the creation would be visible (insofar as it was created in Mathesar). I.e., if they filter and group a table to create the view, they'd be able to see what those elements were in the UI. Is this incorrect?

Contributor Author

That is correct. But if a view was created outside of Mathesar and involved a CTE creating some derived columns that don't reflect in the final tabular output of the view, the user will not see anything about those derived columns.

dateCreated: 2022-01-13T19:49:54Z
---

Here's how I think we should model views in our API and UI. Each heading represents an attribute of Views.

Contributor

On most/all of the sections in this page, we have the possibility for confusion between the transformations that create a view, and then further transformations applied to that view. For example, I can create a view by filtering based on some row. Then, looking at that view, I can filter to further reduce the dataset without saving the result as a view. I think this multiple-level transformation isn't very clear from the way these sections are laid out. This is probably going to be challenging to portray in a sensible manner.

Note that I'm thinking in the abstract here, not necessarily in our current UI (I don't recall if we have that functionality or not). I do think that being able to manipulate / sort / group / filter the data in a persisted view would be useful for most users, though, and that comes with the problem I mentioned.

Contributor Author

Please see comment about "saved" and "unsaved" filters above.

Comment on lines 18 to 23
### Data Sources
- **Definition**: This is the set of source columns that are used to generate the data in the current View column.
- **Allowed values**: references to other Table or View columns, including other columns in the same View.
- **Optional**: This could be empty for purely calculated columns (e.g. using the Postgres `random()` function and putting the output in a column)

### Data Formula

Contributor

I'm not sure Data Sources and Data Formula should be separate. The formula must include the sources (if there are any) to make sense.

Contributor Author

The formula would include the sources, yes.

I'm borrowing Element's UI to illustrate the idea; imagine Matrix channels are data sources.

Formula would be something like:
[image: Screen Shot 2022-01-20 at 3 52 16 PM]

Sources would just be the list of variables used in the formula:
[image: Screen Shot 2022-01-20 at 3 51 34 PM]

Contributor

I like how you made that illustration, Kriti. Creative. I'm leaning towards dropping Data Sources, and querying the Data Formula for what columns it references. That way there would be a single source of truth for this. Having Data Sources at the same level of interface as Data Formula is a bit in conflict with the fact that one is derived from the other.

Contributor Author

We won't always have a Formula, e.g. a view that's built on the query `SELECT Title, Release Year, Rating FROM Movies;` won't have any Formula for Title, Release Year or Rating. It will have a Source for each column, though.

We also may not have a Source for a column with a Formula, e.g. a column might be generated using the `RANDOM()` function, which would be the Formula. There's no source data for it, though.
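
The two cases above suggest that Sources and Formula are independently optional. A minimal sketch of such a model follows; the `ViewColumn` class, its field names, and the `"Movies.Title"` reference format are assumptions made for illustration and are not taken from the spec or from Mathesar's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ViewColumn:
    """Hypothetical model of a view column with optional sources and formula."""
    name: str
    # References to the columns this one is built from; may be empty.
    sources: List[str] = field(default_factory=list)
    # Expression used to compute the column; None when the data is passed through.
    formula: Optional[str] = None


# A plain selected column: has a Source, no Formula.
title = ViewColumn(name="Title", sources=["Movies.Title"])

# A purely calculated column: has a Formula, no Source.
lucky_number = ViewColumn(name="Lucky Number", formula="RANDOM()")

for column in (title, lucky_number):
    print(column.name, "sources:", column.sources, "formula:", column.formula)
```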

## Columns
- **For alpha release**: Users should be able to see all columns associated with a view. Each column should show:
- Data Type (non-editable)
- Data Sources (editable if there's no Formula)

Contributor

I don't understand the goal of making the data sources editable. Would the intention be to drop a different column in in place of the one shown? Wouldn't it be simpler / smoother to remove that column and add a different one?

Contributor

I think that if users want to reuse a view with a different table, this might come in handy. I think removing/adding a new column is probably easier with SQL than it is through the interface.

Contributor

Now I definitely don't understand. Do you want them to use sort of the same processing, but on a different data set? I.e., maybe apply the same sequence of filters and grouping? Do you have an example of what you're envisioning there (i.e., with actual table names and fake data)?

Contributor Author

I don't think we can reuse views with different tables, at least not in any easy way that's worth doing for the alpha release.

I'll remove data sources being editable, it would be better to drop and create new columns.

- **For alpha release**:
- Users should be able to see the rows associated with a view.
- If a cell is a direct representation of a record, users should be able to edit that record via that cell.
- If a column is a direct representation of a record, users should be able to add a new record via that cell.

Contributor

To double-check: the intent would be to pop up some "record input" UI when trying to edit the cell, correct? I.e., since the entire table wouldn't be there, they'd need some interface to input the rest of the row in the associated table. What if the column is the join column? Would they edit the record in both tables, or just one? (Or does that not count as a direct representation?)

Contributor Author

Yes, that's the idea.

I think if it's a join column, it should be updated in both tables. I'll specify.

Contributor Author

Actually, on second thought, join columns should not count as direct representations. Only columns with a single data source count.

Comment on lines 41 to 42
- Users should be able to see what filters are applied to their View.
- Users should be able to edit and delete filters applied to their view, including basic use cases for columns that are not visible in the View.

Contributor

I don't quite understand this. Does this refer to editing the underlying query via the filter UI, or filtering the already-created view?

Contributor Author

Both, I'll add more details about "saved" vs. "unsaved".

Comment on lines 48 to 49
- Users should be able to see what sorts are applied to their View.
- Users should be able to edit and delete sorts applied to their view, including basic use cases for columns that are not visible in the View.

Contributor

Same question as for filters.

@kgodey kgodey force-pushed the views_spec branch 2 times, most recently from 3e2ee13 to 95b0c95 Compare January 20, 2022 22:29
@dmos62 dmos62 removed their request for review January 21, 2022 12:19
Here's how I think we should model view columns in our API and UI. Each heading represents an attribute of a View Column.

### Data Type
- **Definition**: This is the final data type of the content of the column after any computations etc. are applied.

Contributor

We cannot always determine the exact data type of a view's column, as it could be produced by a function that returns a polymorphic type. We should take this into consideration and support polymorphic column types.

Contributor

You can create a function like that, and even use it to generate the column, but if PostgreSQL can't figure out a non-pseudo-type for the column, it'll throw an error. All polymorphic types are pseudo types.

https://www.postgresql.org/docs/13/datatype-pseudo.html

So, you can't have a column of type, e.g., `ANYARRAY`.

- **Required**. Data type should always be set; at the very least, we can treat unknown data types as text.
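
The fallback rule in the last bullet can be sketched as a lookup with a default. Everything named here is hypothetical: the `KNOWN_TYPES` mapping, the display type names, and the `display_type` function are placeholders for illustration, not Mathesar's actual type system.

```python
# Hypothetical mapping from database type names to display types.
KNOWN_TYPES = {
    "integer": "number",
    "numeric": "number",
    "boolean": "boolean",
    "date": "date",
    "text": "text",
}


def display_type(db_type: str) -> str:
    """Return the display type for a column, treating unknown types as text."""
    return KNOWN_TYPES.get(db_type.strip().lower(), "text")


print(display_type("INTEGER"))   # a recognized type
print(display_type("tsvector"))  # not in the mapping, so it falls back to text
```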

### Sources
- **Definition**: This is the set of source columns that are used to generate the data in the current View column.

Contributor

Will it refer only to the view's immediate parent, ignoring further ancestors?

Contributor Author

Yes, it will reference the immediate parent. If the parent is another view, you'll have to go look at that view to find the source.
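
As an illustration of what "go look at that view" amounts to, the sketch below stores only the immediate source of each view column and follows those references one hop at a time until it reaches a base table. The `ColumnRef` class, the `IMMEDIATE_SOURCE` mapping, and the `trace_to_base` function are hypothetical names, and the sketch assumes each column has a single source, which will not hold for every view.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ColumnRef:
    """A column identified by the relation (table or view) it belongs to."""
    relation: str
    column: str


# Each view column records only its immediate parent (hypothetical data).
# Base table columns have no entry because they have no source.
IMMEDIATE_SOURCE: Dict[ColumnRef, ColumnRef] = {
    ColumnRef("view2", "Title"): ColumnRef("view1", "Title"),
    ColumnRef("view1", "Title"): ColumnRef("table1", "Title"),
}


def trace_to_base(ref: ColumnRef) -> List[ColumnRef]:
    """Follow immediate sources one hop at a time until a base column is reached."""
    chain = [ref]
    while chain[-1] in IMMEDIATE_SOURCE:
        chain.append(IMMEDIATE_SOURCE[chain[-1]])
    return chain


lineage = trace_to_base(ColumnRef("view2", "Title"))
print(" -> ".join(f"{ref.relation}.{ref.column}" for ref in lineage))
```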

Contributor Author

I do like the idea of a dependency graph eventually but I don't think it's worth doing before the alpha release.

## Filters
- **For alpha release**:
- Users should be able to see what filters are applied to their View.
- Users should be able to edit and delete filters applied to their view in the UI, including basic use cases for columns that are not visible in the View.

Contributor

What does "including basic use cases for columns that are not visible in the View" mean?

Contributor Author

Assuming `SELECT ID, Title FROM Movies WHERE Year > 2000;` is the query for the view, here the "filter" would be `Year > 2000`. Users should be able to edit that even though Year is not a visible column in the View.

@silentninja silentninja removed their assignment Jan 24, 2022
Contributor Author

kgodey commented Jan 24, 2022

@silentninja @mathemancer I've updated the spec to have a new page for modeling View Queries. Queries have their own filters, sorts, and aggregations, which should only be accessible when you're editing the query (if the query is editable). This is separate from filters, sorts, and groups on view data.

The "saved filter" concept has been removed, although there will still be functionality to apply whatever filters you have in the UI to the view query.

@kgodey kgodey merged commit 5639935 into master Jan 24, 2022
@kgodey kgodey deleted the views_spec branch January 24, 2022 22:34
Contributor Author

kgodey commented Jan 24, 2022

I'm going to go ahead and merge this. I'll open a new discussion once it's up on the wiki for any further thoughts.

8 participants