
polars-cli vs Qsv sqlp #1620

Answered by jqnatividad
13minutes-yt asked this question in Q&A
Feb 25, 2024 · 8 comments · 8 replies

@13minutes-yt, this is because qsv loads and parses the CSV separately before executing the SQL. It is also not exactly an apples-to-apples comparison: the queries are different, and you're bypassing that CSV parsing step with polars-cli.

However, if we also use read_csv directly in the qsv SQL query, as in the polars-cli query, and just pass a small dummy CSV as input, we get similar performance. That is because sqlp also leverages the magic of Polars LazyFrames, reading only what it needs from the CSV to fulfill the query. For example:

/usr/bin/time qsv sqlp -Q smalldummy.csv "select VendorID,sum(total_amount) from read_csv('taxi.csv') group by VendorID order by VendorID"
VendorID,total_amount
1,5…
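
To make the "reading only what it needs" idea concrete, here is a rough conceptual sketch in plain Python. It is not how Polars or qsv are implemented; it only illustrates why a query that references two columns does not need the whole table in memory. The function name `sum_by_key` and the sample rows are made up for illustration (only the `VendorID` and `total_amount` column names come from the query above).

```python
import csv
import io
from collections import defaultdict


def sum_by_key(lines, key_col, value_col):
    """Stream CSV rows and sum value_col per key_col.

    Only the two referenced columns are converted and kept; rows are
    discarded as soon as they are aggregated, so the full table is never
    held in memory. (The csv module still tokenizes every field; a real
    engine such as Polars can go further than this sketch.)
    """
    reader = csv.reader(lines)
    header = next(reader)
    key_idx = header.index(key_col)
    val_idx = header.index(value_col)

    totals = defaultdict(float)
    for row in reader:
        totals[row[key_idx]] += float(row[val_idx])

    # Mirror "order by VendorID" by sorting on the key.
    return dict(sorted(totals.items()))


# Hypothetical sample data standing in for taxi.csv.
sample = io.StringIO(
    "VendorID,trip_distance,total_amount\n"
    "1,1.5,10.0\n"
    "2,3.0,20.5\n"
    "1,0.7,5.25\n"
)

result = sum_by_key(sample, "VendorID", "total_amount")
print(result)
```

By contrast, parsing the whole file into a table first and only then running the SQL pays for every column up front, which is the extra cost described above for the default sqlp path.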

3 participants