[QST] Suppress warnings while import datar #165

Closed
1 of 3 tasks
coforfe opened this issue Dec 20, 2022 · 7 comments
Labels
enhancement New feature or request

Comments

@coforfe

coforfe commented Dec 20, 2022

Feature Type

  • Adding new functionality to datar

  • Changing existing functionality in datar

  • Removing existing functionality in datar

Problem Description

Hi,

Would it be possible to remove the set of "Warnings" that appear when importing datar?
This is more of an aesthetic issue than anything else, especially when exporting a notebook to HTML or PDF.

I tried to control that with

import warnings
warnings.filterwarnings('ignore')

but the messages continue appearing.
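
One possible reason the filter has no effect, offered as an assumption rather than something this thread confirms: `warnings.filterwarnings` only controls messages raised through Python's `warnings` module, so messages emitted through `logging` pass straight through it. The sketch below uses a hypothetical logger named `demo_pkg` as a stand-in; it is not datar's logger or API.

```python
import io
import logging
import warnings

# Hypothetical stand-in for a package logger; "demo_pkg" is a made-up name.
logger = logging.getLogger("demo_pkg")
logger.propagate = False            # keep the demo output out of the root logger
stream = io.StringIO()              # capture output so it can be inspected
logger.addHandler(logging.StreamHandler(stream))

with warnings.catch_warnings():
    warnings.filterwarnings("ignore")       # silences warnings.warn(...) only
    warnings.warn("hidden by the filter")   # suppressed by the filter
    logger.warning("still shown")           # a log record, so it is still emitted

after_filter = stream.getvalue()

# Raising the logger's level is what silences logging-based messages.
logger.setLevel(logging.ERROR)
logger.warning("now hidden")
after_level = stream.getvalue()

print(repr(after_filter))
print(repr(after_level))
```

If a package reports through `logging`, the fix has to come from the logger's level or from an option the package itself exposes, not from the `warnings` filter.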

Thanks again!
Carlos.

Feature Description

In R there is a set of functions to control this; one of them, suppressPackageStartupMessages(), exists just for that purpose.

Additional Context

No response

@coforfe coforfe added the enhancement New feature or request label Dec 20, 2022
@pwwang
Owner

pwwang commented Dec 20, 2022

@coforfe
Author

coforfe commented Dec 20, 2022

Yes, thanks!
That solves the issue completely.

I had not read that section yet.

Thanks again,
Carlos

@coforfe coforfe closed this as completed Dec 20, 2022
@pwwang
Owner

pwwang commented Dec 20, 2022

If you don't want to call options() every time you import datar, you can also try a configuration file:

https://pwwang.github.io/datar/options/#configuration-files

@coforfe
Author

coforfe commented Dec 20, 2022

Good!

From what I have seen so far, there is no "important" function still pending implementation (dplyr, tidyr and forcats), right?
Even the map-equivalent functions (functional programming).

The helper functions, and the ones that change the variable type, are also very useful.
I would be very interested in having some additional functions related to dates (year, month, day, yearmon).

And, although my test was not very exhaustive, I saw that execution times are fully equivalent to pandas.
In one particular case, when I included comments to explain the purpose of each pipe, I got the impression that the execution time was higher. When I later tried to reproduce it, though, the dataset was quite small, and the times with and without comments in between were equivalent.
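
On the comments question: Python discards comments when it parses the source, so they should not change how long a pipe takes to run. A minimal check of that, using only the standard library, is sketched below. The pipeline text is only parsed, never executed, so the names in it (`df`, `f`, `filter_`, `mutate`) are illustrative.

```python
import ast

# Two versions of the same pipeline; only parsed, never run, so the names
# (df, f, filter_, mutate) are illustrative and need not exist.
plain = """
result = (
    df
    >> filter_(f.x > 1)
    >> mutate(y=f.x * 2)
)
"""

commented = """
result = (
    df
    # keep rows where x is above 1
    >> filter_(f.x > 1)
    # add a doubled column
    >> mutate(y=f.x * 2)
)
"""

# ast.dump omits line and column attributes by default, so the two dumps
# differ only if the parsed programs differ.
same_program = ast.dump(ast.parse(plain)) == ast.dump(ast.parse(commented))
print(same_program)
```

Both versions parse to the same tree, so any timing difference seen with comments in place is more likely measurement noise than an effect of the comments.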

In R I quite frequently use an equivalent wrapper to dplyr with data.table as the backend (tidytable), which is incredibly fast. There you can see this benchmark, which could be useful here to compare with pandas.

Thanks again for all your help.
Carlos.

@pwwang
Owner

pwwang commented Dec 20, 2022

Correct, the porting is pretty complete. But a package like purrr is not implemented yet. Some discussion happened here: #48

For date utilities, pd_dt() is a helper to access the .dt accessor of a Series or SeriesGroupBy. See
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.html
You can also use base.as_date to make a datetime.datetime/.date object or an array of them.
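
As a rough illustration of what the `.dt` accessor offers for the date parts requested earlier in the thread (year, month, day, yearmon), here is a sketch in plain pandas. It deliberately does not call pd_dt() or base.as_date, whose exact signatures are not shown in this thread, and the column names are made up for the example.

```python
import pandas as pd

# Sample dates; the values are arbitrary.
dates = pd.Series(pd.to_datetime(["2022-12-20", "2023-01-05"]))

# Each date part comes from the Series.dt accessor.
parts = pd.DataFrame(
    {
        "year": dates.dt.year,
        "month": dates.dt.month,
        "day": dates.dt.day,
        # A year-month value, formatted as text such as "2022-12".
        "yearmon": dates.dt.to_period("M").astype(str),
    }
)
print(parts)
```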

You can try timeit for benchmarking execution time. In common cases, with datar and especially the pandas backend, we try to match the speed. However, we do have some overhead in order to make the implementations match the R APIs.
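
A minimal sketch of such a timeit comparison follows. The two workloads are hypothetical placeholders written in plain Python; in practice they would be replaced by a datar pipeline and the equivalent pandas code.

```python
import timeit

def build_with_loop(n=1000):
    """Hypothetical workload A: build a list of squares with a loop."""
    out = []
    for i in range(n):
        out.append(i * i)
    return out

def build_with_comprehension(n=1000):
    """Hypothetical workload B: the same result via a list comprehension."""
    return [i * i for i in range(n)]

def best_time(func, repeat=5, number=200):
    """Return the fastest total time in seconds over `repeat` runs of `number` calls."""
    return min(timeit.repeat(func, repeat=repeat, number=number))

timings = {
    "loop": best_time(build_with_loop),
    "comprehension": best_time(build_with_comprehension),
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 200 calls")
```

Taking the minimum over several repeats reduces the influence of background noise, which matters most on small datasets where single runs vary a lot.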

pandas is slow. A backend using polars is WIP (https://github.com/pwwang/datar-polars). However, the expression design of polars is not strong enough (e.g. it cannot use predicates to select columns), so the implementation does not rely fully on polars expressions, which would be the faster way with polars. Lazy execution should be even faster, but it requires a better design; it may come in the future with the polars backend. Still, I believe that even the current implementation will be faster than pandas. polars is still young, and some functions/implementations may still be missing.

@pwwang pwwang changed the title [ENH] [QST] Suppress warnings while import datar Dec 20, 2022
@coforfe
Author

coforfe commented Dec 21, 2022

Thanks for these new pointers!

For polars, I know that something equivalent to dplyr is already in progress, by the same person who is building tidytable. You can see details here:

There is another alternative, Arrow (pyarrow), that offers speed and the capability of handling big datasets, and it is growing fast in usage and functionality. As a matter of fact, it is being co-developed by R and Python people.

Thanks again,
Carlos.

@pwwang
Owner

pwwang commented Dec 21, 2022

Thanks for sharing.

I have been aware of them for a while. Implementing a pyarrow backend is on the to-do list.
