Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor!: rename arguments of IO functions #935

Merged
merged 7 commits into from
Mar 19, 2024
Merged

refactor!: rename arguments of IO functions #935

merged 7 commits into from
Mar 19, 2024

Conversation

eitsupi
Copy link
Collaborator

@eitsupi eitsupi commented Mar 18, 2024

No description provided.

@eitsupi eitsupi changed the title refactor!: rename the first argument to source and not to allow positional arguments for others refactor!: rename arguments of IO functions Mar 18, 2024
@eitsupi eitsupi marked this pull request as ready for review March 18, 2024 15:15
Copy link
Collaborator

@etiennebacher etiennebacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, a couple of comments

Comment on lines +12 to +13
- In `<DataFrame>$write_*` functions, the first argument is now `file`.
- In `<LazyFrame>$sink_*` functions, the first argument is now `path`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why those names are different, it looks like an inconsistency upstream (there are several issues about inconsistent naming/behavior across write/scan/sink functions)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

I believe that Python uses different names by design because these two are of different classes.
str | Path | IO[str] | IO[bytes] v.s. str | Path
https://github.com/pola-rs/polars/blob/ec8f5e3a414c857a65bbb3db83720275282b14b4/py-polars/polars/dataframe/frame.py#L2579
https://github.com/pola-rs/polars/blob/ec8f5e3a414c857a65bbb3db83720275282b14b4/py-polars/polars/lazyframe/frame.py#L2282

I think this is important in terms of compatibility with common file operations using the IO module.
https://docs.python.org/3/library/io.html

In the R world, it may not be worthwhile to make a distinction between these, so we may as well unify them into file.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think having a single file arg is better.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I checked again carefully and this is something else at the Rust level, so I still think another argument name makes sense.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this argument. I think there's no point in having file for write functions and path for sink functions so we should choose one, and the name given on the Rust side shouldn't matter for this choice.

I think having a single file arg is better.

Actually path should be preferred since those write/sink functions might be able to create hive partitions

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name given on the Rust side shouldn't matter for this choice.

Since these are actually implemented differently at the Rust level, I think we need to respect the variable names on the Rust side.
https://docs.pola.rs/user-guide/io/csv/

@@ -48,11 +49,12 @@ pl_scan_ndjson = function(

# check if url link and predownload, wrap in result, robj_to! can unpack R-result
args[["path"]] = lapply(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should also be .args I guess

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can do that later.

@eitsupi eitsupi merged commit bf02ca7 into main Mar 19, 2024
25 checks passed
@eitsupi eitsupi deleted the io-arguments branch March 19, 2024 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants