Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow SQL queries to retrieve report data #2853

Closed
clemens-msupply opened this issue Jan 29, 2024 · 5 comments
Closed

Allow SQL queries to retrieve report data #2853

clemens-msupply opened this issue Jan 29, 2024 · 5 comments
Assignees
Labels
Build Tested: Dev Issue cannot be tested in build but tested by dev enhancement New feature or request Team Tauhou Chris, Clemens, Roxy
Milestone

Comments

@clemens-msupply
Copy link
Contributor

clemens-msupply commented Jan 29, 2024

Currently, report data is retrieved using the existing graphql queries.
Advantages of this approach are:

  • User permissions are enforced, i.e. report can't be printed when the user has missing permissions (however, no easy way to say which permissions are needed to print a report)
  • Reports are not affected if the underlying database schema changes

But there are also some disadvantages. For more advanced, non-standard reports where there isn't a matching GraphQL endpoint the returned data must be processed using Tera. Some known problems with graphql queries + Tera:

  • To get the latest encounter for every patient all patients and their latest encounter can be fetched. However, patient without encounters need to be filtered out manually using Tera.
  • Unsupported graphql filters need to be emulated in Tera (e.g. OR filter)
  • Post-processing in Tera is very cumbersome
  • Sorting post-processed data seems to be impossible in some cases...
  • Logic is spread out in graphql query files and in Tera files which can be very confusing and hard to follow
  • The approach is very different from SQL dashboard reports, e.g. almost impossible to reuse any query code form dashboard or local reports.

Expected behaviour

It would be much better if we could query and mutate data using a query language such as SQL so that the returned data is in the right shape for the Tera template, i.e. so that no or only minimal post-processing is needed in Tera.

Possible solutions:

Using direct SQL database queries

Allow direct SQL queries on the underlying database. To solve the permission problem the report designer would have to specify the permissions and reports need to be manually reviewed to not query unrelated data.

Disadvantage is that reports expect a stable database schema. To solve this we could provide stable table views, e.g. report_org_table_name.

Another problem is that we support Sqlite and Postgres, and SQL query syntax might vary slightly. There are a couple of possible solutions:

  • Report designer have to specify both queries manually
  • Use https://github.com/tobymao/sqlglot to transpile a query to both databases (if possible). It's written in python and would probably require extra steps when using the report builder.
  • Use https://prql-lang.org/ It's written in Rust and might easily integrate. It also transpiles to Sqlite and Postgres. However, it's not common SQL (even though it looks nicer).

R&D idea using Sqlite to query Graphql endpoints

Sqlite syntax could then be used to bring graphql data into the right shape...

This seems to be too hard/complex. There is also a problem with how vtables and JOINs work in sqlite: https://sqlite.org/forum/forumpost/521e7da42b

@clemens-msupply clemens-msupply added enhancement New feature or request needs triage labels Jan 29, 2024
@clemens-msupply clemens-msupply changed the title Allow SQL queries to retrieve and report data Allow SQL queries to retrieve report data Jan 29, 2024
@clemens-msupply clemens-msupply added the needs architecture/solution Needs wider dev input on general solution label Jan 31, 2024
@craigdrown
Copy link
Contributor

@clemens-msupply my feeling is that limiting us to GraphQL endpoints is far too restrictive- it will place a huge burden on the dev team to create new endpoints (or expand existing), be too slow to develop new reports, and too slow to run with large data sets.
I love the look of PRQL - we could teach that to super users quite easily - & it appears you can drop into SQL within it - ideally we would support both PRQL & directly writing SQL ?
For permissions, I think a simple solution would be to have a header in the report that specifies an array of user groups - the current user has to be in one of those groups to run the report - that way you can have fine grained control from the user management interface. Later on we might create a UI to edit report groups, but you wouldn't need that at the start. (Currently we can create user groups, but a user can only be in one group. Maybe report groups would be different- we do have user tags which are unlimited, so rather than groups you could have a list of user tags in the report- would function the same and have no mSupply OG changes needed)

@Chris-Petty
Copy link
Contributor

This would feel much easier if we had a role in the DB for reporting that was basically read only, but unfortunately sqlite does not support and access has to be in application logic.

@jmbrunskill
Copy link
Contributor

FYI I might useful to see that I did with JSON and custom SQL in notify (I think you've seen this code before though?)
https://github.com/msupply-foundation/notify/blob/main/backend/datasource/src/json_query.rs

I'd personally lean towards writing SQL rather than having to learn a new language, although it does look pretty cool.
I feel like by introducing a new language you're losing half of the benefit of allowing SQL in the first place.

I think we could get a long way without having to have different versions of sql syntax.
I think some functions you'd want such as getting the current date are quite different, however I'm not convinced PRQL has solve that problem?

As a reasonably complex example, this query runs fine in both postgresql...

CREATE TABLE table1 (
  id INT
);
INSERT INTO table1 (id) VALUES (1);
INSERT INTO table1 (id) VALUES (2);

CREATE TABLE table2 (
  id INT,
  number TEXT
);
INSERT INTO table2 (id, number) VALUES (1,'1');

Query

SELECT table1.id, count(table2.id), sum(table1.id+table2.id), sum(CAST(table2.number_string AS INTEGER)) 
FROM table1 
LEFT JOIN table2 ON table1.id = table2.id
WHERE table1.id in (SELECT table1.id from table1)
GROUP BY table1.id
HAVING count(table2.id) > 1
ORDER BY table1.id;

Maybe we could provide a few custom functions we can call if needed to create some common ground if a problematic area is discovered? Say a GET_CURRENT_DATETIME() function, that we can implement in either as a db specific function, or perhaps replacing them at run time in our rust code, similar to how we use these constants in our migrations?

pub(crate) const DATETIME: &str = "TIMESTAMP";
#[cfg(not(feature = "postgres"))]
pub(crate) const DATETIME: &str = "TEXT";

Or at least the UI could default to one query and only provide an option for a custom version when it's really needed.

@clemens-msupply
Copy link
Contributor Author

clemens-msupply commented Mar 19, 2024

Do you know if there is a sqlite version of: SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY;?

Agree, and I was actually leaning to providing SQL with the option to have two versions, one for sqlite and one for postgres. The sqlglot transpiler looks actually quite cool and might be useful to generate two versions of the same query...

@jmbrunskill
Copy link
Contributor

Do you know if there is a sqlite version of: SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY;?

I had a brief look but didn't find one...

@clemens-msupply clemens-msupply self-assigned this Mar 19, 2024
@clemens-msupply clemens-msupply added Team Tauhou Chris, Clemens, Roxy and removed needs architecture/solution Needs wider dev input on general solution labels Mar 22, 2024
@roxy-dao roxy-dao added this to the V2.0.0 milestone Apr 9, 2024
@roxy-dao roxy-dao added the Build Tested: Dev Issue cannot be tested in build but tested by dev label Apr 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Build Tested: Dev Issue cannot be tested in build but tested by dev enhancement New feature or request Team Tauhou Chris, Clemens, Roxy
Projects
None yet
Development

No branches or pull requests

6 participants