feat(python): Add read_clipboard and DataFrame.write_clipboard #15272

Merged
merged 4 commits into from Apr 1, 2024

Conversation

CanglongCl
Contributor

@CanglongCl CanglongCl commented Mar 25, 2024

Closes #9902

Similar to pandas.read_clipboard and pandas.DataFrame.to_clipboard, read_clipboard reads CSV-formatted text from the clipboard and converts it into a DataFrame, whereas DataFrame.write_clipboard writes a DataFrame to the clipboard in CSV format.

These two methods are useful when exploring data from Excel or similar software in an interactive environment such as a Jupyter notebook.

Implementation

  • Clipboard reads and writes use arboard on the Rust side; arboard is maintained by 1Password and supports Windows, macOS and Linux.
  • read_clipboard reads the clipboard and passes the result to read_csv, with '\t' as the default separator.
  • write_clipboard gets a CSV string by calling DataFrame.write_csv (with '\t' as the default separator) and writes it to the clipboard.
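
The data flow in the list above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in this PR: FakeClipboard is a hypothetical in-memory stand-in for the system clipboard (which arboard handles in the real implementation), and Python's csv module stands in for read_csv and write_csv. The function names only mirror the ones proposed here.

```python
import csv
import io


class FakeClipboard:
    """Hypothetical in-memory stand-in for the system clipboard."""

    def __init__(self) -> None:
        self.text = ""

    def set_text(self, text: str) -> None:
        self.text = text

    def get_text(self) -> str:
        return self.text


def write_clipboard(rows, clipboard, separator="\t"):
    """Serialize rows as delimited text, then hand the string to the clipboard."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=separator, lineterminator="\n").writerows(rows)
    clipboard.set_text(buffer.getvalue())


def read_clipboard(clipboard, separator="\t"):
    """Fetch the clipboard text, then parse it as delimited text."""
    return list(csv.reader(io.StringIO(clipboard.get_text()), delimiter=separator))


clipboard = FakeClipboard()
table = [["city", "count"], ["New York", "10"], ["Oslo", "3"]]
write_clipboard(table, clipboard)
round_tripped = read_clipboard(clipboard)
```

Both directions are thin wrappers: the only clipboard-specific work is moving one string in or out, and everything else is ordinary delimited-text parsing.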

Difference from pandas method

pandas.read_clipboard uses the regex '\\s+' as its separator, but polars does not allow a regex as the separator in read_csv.
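
To make the practical difference concrete, here is a small standard-library sketch of the two separator semantics. It is not how pandas or polars parse clipboard text internally; the sample line is made up, and the sketch only shows what a whitespace regex versus a literal tab does to a cell that contains a space.

```python
import re

# One clipboard line as a spreadsheet would typically copy it: cells joined by tabs.
line = "New York\t10\t2024-03-25"

# Splitting on a whitespace regex (the '\\s+' behaviour described above)
# also breaks on the space inside "New York".
regex_fields = re.split(r"\s+", line)

# Splitting on a literal tab keeps the cell intact.
tab_fields = line.split("\t")

print(regex_fields)
print(tab_fields)
```

With a fixed '\t' separator, cells containing spaces survive, at the cost of not handling space-aligned text.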

Compatibility

Tested Compatible:

  • Microsoft Excel (Mac)
  • Google Sheets (Online)
  • WPS (Online)

Others

Both methods are only tested on M1 macOS. It would be much appreciated if someone could test on Linux and Windows.

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars labels Mar 25, 2024
codecov bot commented Mar 25, 2024

Codecov Report

Attention: Patch coverage is 42.50000%, with 23 lines in your changes missing coverage. Please review.

Project coverage is 81.34%. Comparing base (252702a) to head (56002cc).
Report is 14 commits behind head on main.

Files                                  Patch %   Lines
py-polars/src/functions/io.rs            0.00%   18 Missing ⚠️
py-polars/polars/io/clipboard.py        72.72%   3 Missing ⚠️
py-polars/polars/dataframe/frame.py     50.00%   2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15272      +/-   ##
==========================================
+ Coverage   81.32%   81.34%   +0.01%     
==========================================
  Files        1359     1365       +6     
  Lines      176076   176655     +579     
  Branches     2526     2526              
==========================================
+ Hits       143191   143694     +503     
- Misses      32402    32478      +76     
  Partials      483      483              


@ritchie46
Member

I am not convinced by this one. Two things I am concerned about.

  • New dependencies and binary size. (Can you check how much the binary size increases with the new dependencies?)
  • Why default to tab-separated data? What do we think about the limited dtype support?

@CanglongCl
Contributor Author

CanglongCl commented Mar 27, 2024

I am not convinced by this one. Two things I am concerned about.

  • New dependencies and binary size. (Can you check how much the binary size increases with the new dependencies?)
  • Why default to tab-separated data? What do we think about the limited dtype support?

I'm not very familiar with this, so I hope I did the right thing. On my Mac, I ran make build-release in ./py-polars and then checked the binary size of py-polars/polars/polars.abi3.so:

  • Before (252702a): 69.3 MB (69,303,536 Bytes)
  • After (56002cc): 69.3 MB (69,306,240 Bytes)
  • Difference: +2.704 KB
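
As a quick check of the arithmetic above, the difference can be recomputed from the two reported byte counts. This sketch uses only the figures quoted in this comment and assumes decimal kilobytes (1 KB = 1000 bytes), which is what the "+2.704 KB" figure implies.

```python
before_bytes = 69_303_536  # size at 252702a, as reported above
after_bytes = 69_306_240   # size at 56002cc, as reported above

delta_bytes = after_bytes - before_bytes
delta_kb = delta_bytes / 1000  # decimal kilobytes (assumption: 1 KB = 1000 bytes)
relative_increase = delta_bytes / before_bytes

print(f"+{delta_bytes} bytes = +{delta_kb:.3f} KB ({relative_increase:.4%} larger)")
```

The increase is well under a hundredth of a percent of the binary.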

About the default separator: when data is copied from Excel, Google Sheets or WPS, it is placed on the clipboard as CSV with \t as the separator. Since these two methods are mainly used for exploring data from such software, \t is the default. pandas does the same thing.

@cjackal
Contributor

cjackal commented Mar 27, 2024

  • Why default to tab separated data?

Adding to @CanglongCl's point, a rendered HTML table is another source of clipboard I/O that defaults to a tab separator. I think "I/O of TSV strings via clipboard" is a force of nature.

@CanglongCl CanglongCl requested a review from orlp March 29, 2024 05:31
@ritchie46
Copy link
Member

Alright. Let's give this a try. I understand why this can be nice. Thank you @CanglongCl

@ritchie46 ritchie46 merged commit 82f717b into pola-rs:main Apr 1, 2024
23 checks passed
@CanglongCl CanglongCl deleted the clipboard branch April 1, 2024 19:15
Development

Successfully merging this pull request may close these issues.

Add from_clipboard and to_clipboard methods
4 participants