Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implement support for multi-character comments in read_csv #12519

Merged

Conversation

dmitrybugakov
Copy link
Contributor

@dmitrybugakov dmitrybugakov commented Nov 16, 2023

This pull request introduces the capability to handle multi-character and single-byte comment identifiers in the read_csv function, addressing the feature request in issue #10583.

@ritchie46
Copy link
Member

Sorry, I don't want this in polars. I am sorry about this one. This had to be documented better.

@ritchie46 ritchie46 closed this Nov 16, 2023
@ritchie46 ritchie46 reopened this Nov 16, 2023
@ritchie46
Copy link
Member

Ok, I see this one is only comments. This should be acceptable.

@dmitrybugakov dmitrybugakov force-pushed the feature/multi-char-comment-support-10583 branch from 60a8de2 to 8df1694 Compare November 16, 2023 18:37
@dmitrybugakov dmitrybugakov force-pushed the feature/multi-char-comment-support-10583 branch from 8df1694 to c1d79fb Compare November 16, 2023 18:48
@dmitrybugakov dmitrybugakov changed the title Implement Support for Multi-Character Comments in read_csv [WIP] Implement Support for Multi-Character Comments in read_csv Nov 16, 2023
@dmitrybugakov dmitrybugakov changed the title [WIP] Implement Support for Multi-Character Comments in read_csv feature(rust): Implement Support for Multi-Character Comments in read_csv Nov 16, 2023
@github-actions github-actions bot added the enhancement New feature or an improvement of an existing feature label Nov 16, 2023
@ritchie46
Copy link
Member

Thanks, can you also expose it to python as well?

#[must_use]
pub fn with_comment_char(mut self, comment_char: Option<u8>) -> Self {
self.comment_char = comment_char;
/// Set a single byte character as the comment character.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should update the comment here (it mentions 'single-byte character')

assert_eq!(df.shape(), (3, 5));

let csv = r"1,2,3,4,5
### this is a comment
Copy link
Collaborator

@alexander-beedie alexander-beedie Nov 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would work equally well with comment_char being a single "#" character; could you update the example to something that can only work with a multi-character comment prefix?

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Nov 16, 2023

We should also probably rename the comment_char param (perhaps to comment_prefix?) as the "_char" suffix clearly indicates that it expects only a single character 🤔

@stinodego stinodego changed the title feature(rust): Implement Support for Multi-Character Comments in read_csv feat: Implement support for multi-character comments in read_csv Nov 16, 2023
@github-actions github-actions bot added python Related to Python Polars rust Related to Rust Polars labels Nov 16, 2023
if c.is_ascii() {
Some(CommentChar::Single(c as u8))
} else {
None // Or handle non-ASCII characters as needed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't that fallback to Multi then?

@dmitrybugakov
Copy link
Contributor Author

@alexander-beedie regarding comment char/prefix.

I've considered this approach as well. However, these changes will impact the user experience. If the Polars team is in agreement, I am ready to implement it.

@dmitrybugakov dmitrybugakov force-pushed the feature/multi-char-comment-support-10583 branch 4 times, most recently from c7184d9 to 850a61c Compare November 17, 2023 14:37
@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Nov 18, 2023

I've considered this approach as well. However, these changes will impact the user experience. If the Polars team is in agreement, I am ready to implement it.

You can use the decorator @deprecate_renamed_parameter on the python side to make this a seamless change; allows for parameter renames without causing disruption 👍

Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @dmitrybugakov. Looks great!

@ritchie46 ritchie46 merged commit fd9b760 into pola-rs:main Nov 20, 2023
26 checks passed
@dmitrybugakov dmitrybugakov deleted the feature/multi-char-comment-support-10583 branch November 20, 2023 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants