Export functions to read/scan/write data #111

Open
etiennebacher opened this issue Apr 16, 2024 · 7 comments
@etiennebacher
Owner

etiennebacher commented Apr 16, 2024

So far I only exported sink_* functions because they don't risk namespace collision with other packages, while exporting write_parquet() or read_parquet() would conflict with arrow for example.

However, some users do not know that pl$read_parquet() and pl$scan_parquet() exist, and therefore use arrow::read_parquet() followed by as_polars_df(), which is not efficient at all. The goal of tidypolars is to replace the somewhat confusing (to R users) syntax of polars so that they don't have to deal with pl$, for instance. Therefore, I shouldn't expect them to use pl$scan_parquet().

The easy solution would be to add the "_polars" suffix for read/write functions (and potentially sink and scan for consistency?), so I would export read_parquet_polars() for instance. duckplyr has duckplyr_df_from_parquet(), so one option would be to export polars_df_from_parquet() and polars_lf_from_parquet() instead of read and scan.

Edit: not a big fan of polars_lf_from_parquet() because I like seeing all the options in the autocompletion when I type "write"
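
To make the suffix option concrete, here is a rough sketch of what such wrappers might look like. The function names follow the read_parquet_polars() proposal above; the argument names (path, .data) and the use of ... are assumptions for illustration, not a settled design. The bodies only forward to the pl$read_parquet(), pl$scan_parquet() and $write_parquet() calls already mentioned in this thread.

```r
# Hypothetical thin wrappers; nothing here is exported by tidypolars or polars yet.
# Defining them does not need polars loaded, but calling them does.

# Eager read: returns a polars DataFrame.
read_parquet_polars <- function(path, ...) {
  polars::pl$read_parquet(path, ...)
}

# Lazy scan: returns a polars LazyFrame.
scan_parquet_polars <- function(path, ...) {
  polars::pl$scan_parquet(path, ...)
}

# Write: takes the data as its first argument so it can end a |> chain.
write_parquet_polars <- function(.data, path, ...) {
  .data$write_parquet(path, ...)
}
```

With wrappers like these, users would never need to type pl$, and typing "read" or "write" in the console would still surface every variant in autocompletion.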

@eitsupi

eitsupi commented Apr 17, 2024

Related issue: apache/arrow#38456

Personally, I think it would be fine to have more R-like style functions like read_parquet_polars(path, ..., as_data_frame = FALSE) in the polars package.

This would be similar to, for example, Python Polars having something like the polars.DataFrame.pipe method to make method chaining work in Python.

@etiennebacher
Owner Author

Personally, I think it would be fine to have more R-like style functions like read_parquet_polars(path, ..., as_data_frame = FALSE) in the polars package.

Why should it be in polars? There are already functions to import and export data there so I don't see why we should duplicate those

@eitsupi

eitsupi commented Apr 18, 2024

Why should it be in polars? There are already functions to import and export data there so I don't see why we should duplicate those

Of course, it doesn't have to be there, but this kind of syntactic sugar does exist in Python Polars.

Also, as for write_*, I think the incompatibility of the pipe |> and the $ operator reinforces the need for it to exist as a function.
For example, we currently have to write something like pl$DataFrame(...)$some_methods(...) |> some_function(...) |> (\(x) x$write_parquet(...))()
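
To illustrate the point without needing polars installed, the sketch below uses a made-up object whose methods are reached through $ (make_frame() and write_frame() are hypothetical names, not part of any package). It compares the anonymous-function workaround with a plain wrapper function, and assumes R >= 4.1 for |> and \(x).

```r
# Hypothetical stand-in for an object with `$` methods; not a polars object.
make_frame <- function(values) {
  list(
    values = values,
    # Pretend "write" method: returns a description instead of touching disk.
    write = function(path) paste0("wrote ", length(values), " values to ", path)
  )
}

# Workaround: wrap the method call in an anonymous function and call it.
res_lambda <- make_frame(1:3) |> (\(x) x$write("out.txt"))()

# Alternative: a plain function that takes the object as its first argument.
write_frame <- function(.data, path) .data$write(path)
res_wrapper <- make_frame(1:3) |> write_frame("out.txt")

# Both forms call the same method and give the same result.
identical(res_lambda, res_wrapper)
```

The wrapper form reads like any other step in the pipe, which is the main argument for having write_* available as ordinary functions.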

@ginolhac

the incompatibility of the pipe |> and the $ operator reinforces the need for it to exist as a function.

Of note, in R4.3 (and probably 4.2, I am not sure) the native placeholder _ works with the $

> women |> _$weight
 [1] 115 117 120 123 126 129 132 135 139 142 146 150 154 159 164

@etiennebacher
Owner Author

If we introduce this kind of function in polars itself, then we'd have two kinds of syntax for the same thing, e.g. pl$read_parquet() and read_parquet_polars(). Wouldn't that lead to confusion, similarly as in the arrow issue you linked above?

@eitsupi

eitsupi commented Apr 18, 2024

Of note, in R4.3 (and probably 4.2, I am not sure) the native placeholder _ works with the $

I don't think that helps here: a method call such as x |> _$foo() is not allowed.

@eitsupi

eitsupi commented Apr 18, 2024

Wouldn't that lead to confusion, similarly as in the arrow issue you linked above?

The problem with the arrow package is that the function names are inconsistent.
In other words, it exports only read_parquet and read_csv_arrow, not read_csv and read_parquet_arrow.


3 participants