Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support LATERAL subqueries. #24

Closed
derekmahar opened this issue Dec 27, 2019 · 9 comments
Closed

Support LATERAL subqueries. #24

derekmahar opened this issue Dec 27, 2019 · 9 comments
Assignees

Comments

@derekmahar
Copy link
Contributor

Please implement LATERAL subqueries. According to Add LATERAL Joins or CROSS APPLY to Your SQL Tool Chain, SQL:1999 introduced "lateral derived tables". Many popular relational database systems support lateral subqueries:

In PostgreSQL’s LATERAL JOIN, Krzysztof Kempiński explains how LATERAL subqueries work in PostgreSQL 9.3.

@mithrandie
Copy link
Owner

Isn’t possible to replace to with, for example, WITH clause and JOIN clause?

@ondohotola
Copy link

ondohotola commented Dec 28, 2019 via email

@mithrandie
Copy link
Owner

Yes, I made csvq for simple confirmations and calculations of data.
Csv has no types, no restrictions, no relations, no indexes, so it is not suitable for large or complicated calculations. And for those reasons, I am not aiming for a complete SQL implementation.

I’ll fix as much as possible obvious bugs or features that are useful in most cases, but in most cases I think that it will work with existing features.
I’m not sure if LATERAL is so needed and there are no other way.

@derekmahar
Copy link
Contributor Author

derekmahar commented Dec 28, 2019

Isn’t possible to replace to with, for example, WITH clause and JOIN clause?

Possibly. I'd have to do more research. The following solutions are likely inefficient:

More than CTE and Cross Join suggests that LATERAL may combine and simplify the functionality of common table expressions, cross joins, and the WINDOW keyword.

This answer to How to get previous and current row value using recursive CTE? compares solutions using Common Table Expressions, window functions, and OUTER APPLY which is syntax in Microsoft SQL Server that is equivalent or similar to LATERAL.

@derekmahar
Copy link
Contributor Author

derekmahar commented Dec 28, 2019

Yes, I made csvq for simple confirmations and calculations of data.
Csv has no types, no restrictions, no relations, no indexes, so it is not suitable for large or complicated calculations. And for those reasons, I am not aiming for a complete SQL implementation.

I’ll fix as much as possible obvious bugs or features that are useful in most cases, but in most cases I think that it will work with existing features.

I highly appreciate that csvq already supports much of the SQL standard including advanced syntax like Common Table Expressions and window functions. I raised this issue for the following reasons:

  • LATERAL may simplify a query that I wish to execute using csvq.
  • LATERAL is absent from csvq and I think it's useful to know in what ways csvq deviates from the SQL standard.
  • LATERAL is part of the SQL standard and I assumed that a goal of csvq was to conform to the SQL standard as closely as possible and where it makes sense in the context of CSV files.

I don't expect csvq to support all of the SQL standard, but the closer the better. Over time, you may decide to fill in any gaps that you think you or your users may find useful.

I’m not sure if LATERAL is so needed and there are no other way.

I'm not yet sure, either, but LATERAL does seem to simplify certain queries that use derived tables. I doubt that the SQL standard would have introduced LATERAL if it weren't useful.

@derekmahar
Copy link
Contributor Author

derekmahar commented Dec 28, 2019

I think for really complicated stuff, just import the CSVs into the SQL database of your choice and have your way with them.

I may do this eventually, but for now I'm exploring what computations I may perform using csvq alone. In effect, I'm testing the capabilities and limitations of csvq, and will report any limitations or flaws that I encounter.

I see CSVQ as a tool to do quick and dirty queries on smallish data sets. And that it does excellently.

I agree.

@mithrandie
Copy link
Owner

Okey, maybe I will implement it, but now the priority is low because it can be done in other ways.

@mithrandie mithrandie self-assigned this Dec 28, 2019
mithrandie added a commit that referenced this issue May 17, 2020
- Support LATERAL join. ([Github #24](#25))
mithrandie added a commit that referenced this issue May 20, 2020
- Support LATERAL join. ([Github #24](#24))
- Support USING join condition in FULL OUTER JOIN.
- Support $XDG_CONFIG_HOME environment variable for configuration files.
@mithrandie
Copy link
Owner

LATERAL JOIN is now included and released in version 1.13.0.

@derekmahar
Copy link
Contributor Author

I tested a LATERAL JOIN using a slightly modified version of the example in PostgreSQL’s LATERAL JOIN:

$ cat orders.csv | csvq "SELECT * FROM STDIN"
+----+---------+----------------------------+
| id | user_id |         created_at         |
+----+---------+----------------------------+
| 1  | 1       | 2017-06-20 04:35:03.582895 |
| 2  | 2       | 2017-06-20 04:35:07.564973 |
| 3  | 3       | 2017-06-20 04:35:10.986712 |
| 4  | 1       | 2017-06-20 04:58:10.137503 |
| 5  | 3       | 2017-06-20 04:58:17.905277 |
| 6  | 3       | 2017-06-20 04:58:25.289122 |
+----+---------+----------------------------+
-- lateral_query1.sql
SELECT user_id, first_order_time, next_order_time, id FROM
  (SELECT user_id, min(created_at) AS first_order_time FROM orders GROUP BY user_id) o1
  LEFT JOIN LATERAL
  (SELECT id, created_at AS next_order_time
   FROM orders o2
   WHERE o2.user_id = o1.user_id AND created_at > o1.first_order_time
   ORDER BY created_at ASC LIMIT 1)
   o2 ON true;
-- lateral_query2.sql
SELECT user_id, first_order_time, next_order_time, id FROM
  (SELECT user_id, min(created_at) AS first_order_time FROM orders GROUP BY user_id) o1
  INNER JOIN LATERAL
  (SELECT id, created_at AS next_order_time
   FROM orders o2
   WHERE o2.user_id = o1.user_id AND created_at > o1.first_order_time
   ORDER BY created_at ASC LIMIT 1)
   o2 ON true;
$ cat orders.csv | csvq --source lateral_query1.sql
+---------+----------------------------+-----------------------------+------+
| user_id |      first_order_time      |       next_order_time       |  id  |
+---------+----------------------------+-----------------------------+------+
| 1       | 2017-06-20 04:35:03.582895 | 2017-06-20 04:58:10.137503  | 4    |
| 2       | 2017-06-20 04:35:07.564973 |            NULL             | NULL |
| 3       | 2017-06-20 04:35:10.986712 | 2017-06-20 04:58:17.905277  | 5    |
+---------+----------------------------+-----------------------------+------+
$ cat orders.csv | csvq --source lateral_query2.sql
+---------+----------------------------+-----------------------------+----+
| user_id |      first_order_time      |       next_order_time       | id |
+---------+----------------------------+-----------------------------+----+
| 1       | 2017-06-20 04:35:03.582895 | 2017-06-20 04:58:10.137503  | 4  |
| 3       | 2017-06-20 04:35:10.986712 | 2017-06-20 04:58:17.905277  | 5  |
+---------+----------------------------+-----------------------------+----+

The resuls of each these queries matches those in PostgreSQL’s LATERAL JOIN.

Jehops pushed a commit to Jehops/freebsd-ports-legacy that referenced this issue May 22, 2020
From ChangeLog: https://github.com/mithrandie/csvq/releases/v1.13.0

 * Support LATERAL join. mithrandie/csvq#24)
 * Support USING join condition in FULL OUTER JOIN.
 * Support $XDG_CONFIG_HOME environment variable for configuration files.

PR:	246632
Submitted by:	vulcan@wired.sh (maintainer)


git-svn-id: svn+ssh://svn.freebsd.org/ports/head@536207 35697150-7ecd-e111-bb59-0022644237b5
uqs pushed a commit to freebsd/freebsd-ports that referenced this issue May 22, 2020
From ChangeLog: https://github.com/mithrandie/csvq/releases/v1.13.0

 * Support LATERAL join. mithrandie/csvq#24)
 * Support USING join condition in FULL OUTER JOIN.
 * Support $XDG_CONFIG_HOME environment variable for configuration files.

PR:	246632
Submitted by:	vulcan@wired.sh (maintainer)


git-svn-id: svn+ssh://svn.freebsd.org/ports/head@536207 35697150-7ecd-e111-bb59-0022644237b5
uqs pushed a commit to freebsd/freebsd-ports that referenced this issue May 22, 2020
From ChangeLog: https://github.com/mithrandie/csvq/releases/v1.13.0

 * Support LATERAL join. mithrandie/csvq#24)
 * Support USING join condition in FULL OUTER JOIN.
 * Support $XDG_CONFIG_HOME environment variable for configuration files.

PR:	246632
Submitted by:	vulcan@wired.sh (maintainer)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants