[QST] Suppress warnings while import datar #165

coforfe · 2022-12-20T07:42:25Z

Feature Type

Adding new functionality to datar
Changing existing functionality in datar
Removing existing functionality in datar

Problem Description

Hi,

Would it be possible to suppress the set of warnings that appear when importing datar?
It is more an aesthetic issue than anything else, especially when exporting a notebook to HTML or PDF.

I tried to control that with

import warnings
warnings.filterwarnings('ignore')

but the messages continue appearing.

Thanks again!
Carlos.

Feature Description

In R there is a set of functions to control that; one of them, suppressPackageStartupMessages(), exists just for that purpose.

Additional Context

No response


pwwang · 2022-12-20T15:38:32Z

Have you read about this:

https://pwwang.github.io/datar/import/#warn-about-python-reserved-names-to-be-masked-by-datar
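
A note on why `warnings.filterwarnings('ignore')` from the question may have no effect: that filter only applies to messages raised through the `warnings` module. If a package reports its notices through `logging` instead (an assumption here, not something confirmed in this thread), the filter never sees them. A minimal sketch with a made-up logger name, `example_pkg`, which is not datar's API:

```python
import io
import logging
import warnings

# Hypothetical logger, standing in for a package that logs at import time.
logger = logging.getLogger("example_pkg")
stream = io.StringIO()
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False
logger.setLevel(logging.WARNING)

# The warnings filter is active, yet the log record is still written.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    logger.warning("builtin name masked")
shown_despite_filter = stream.getvalue()

# Raising the logger's level is what silences log-based messages.
logger.setLevel(logging.ERROR)
logger.warning("builtin name masked again")
shown_after_level_change = stream.getvalue()[len(shown_despite_filter):]

print(repr(shown_despite_filter))
print(repr(shown_after_level_change))
```

For datar itself, the option described at the link above is the supported way to turn the messages off.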

coforfe · 2022-12-20T15:46:20Z

Yes, thanks!
That solves the issue completely.

I had not read that section yet.

Thanks again,
Carlos

pwwang · 2022-12-20T17:14:45Z

If you don't want to call options() every time you import datar, you can also try a configuration file:

https://pwwang.github.io/datar/options/#configuration-files

coforfe · 2022-12-20T17:56:02Z

Good!

From what I have seen so far, no "important" function is still pending implementation (dplyr, tidyr and forcats), right?
Even the map-equivalent functions (functional programming)?

The helper functions, and the ones that change the variable type, are also very useful.
I would be very interested in having some additional date-related functions (year, month, day, yearmon).

Although my test was not exhaustive, I saw that execution times are fully equivalent to those of pandas.
In one particular case, when I included comments to explain the purpose of each pipe, I got the impression that the execution time was higher. When I later tried to reproduce it, the dataset was quite small and the times with and without comments in between were equivalent.
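
On the comments question: Python discards comments when it compiles source to bytecode, so comments by themselves should not change how long the compiled code takes to run. A minimal sketch using only the built-in `compile`; the two snippets are made up for illustration, and this says nothing about any source inspection a library might perform at call time:

```python
# Two versions of the same statement, one with comments in between.
src_plain = "total = sum(\n    range(10)\n)\n"
src_commented = (
    "# add up the numbers\n"
    "total = sum(\n"
    "    # the numbers to add\n"
    "    range(10)\n"
    ")\n"
)

code_plain = compile(src_plain, "<plain>", "exec")
code_commented = compile(src_commented, "<commented>", "exec")

# The instruction bytes are identical; only line-number metadata differs.
same_bytecode = code_plain.co_code == code_commented.co_code
print(same_bytecode)
```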

In R I quite frequently use tidytable, an equivalent wrapper to dplyr with data.table as the backend, which is incredibly fast. There you can see this benchmark, which could be useful here for comparison with pandas.

Thanks again for all your help.
Carlos.

pwwang · 2022-12-20T18:15:09Z

Correct, the porting is pretty complete. However, a package like purrr is not implemented yet. Some discussion took place here: #48

For date utilities, pd_dt() is a helper to access the .dt accessor of a Series or SeriesGroupBy. See
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.html
You can also use base.as_date to create a datetime.datetime/.date object or an array of them.
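
For reference, this is roughly what the underlying pandas `.dt` accessor offers for the date parts asked about (year, month, day, and a year-month period). The sketch uses plain pandas only; it does not show `pd_dt()` or `base.as_date`, whose exact signatures are not given in this thread, and the sample dates are made up:

```python
import pandas as pd

# Made-up sample dates, parsed into a datetime64 Series.
dates = pd.Series(pd.to_datetime(["2022-12-20", "2023-01-05"]))

years = dates.dt.year.tolist()
months = dates.dt.month.tolist()
days = dates.dt.day.tolist()

# A monthly period is a close analogue of a "yearmon" value.
year_months = dates.dt.to_period("M").astype(str).tolist()

print(years, months, days, year_months)
```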

You can try timeit for benchmarking execution time. In common cases, we are trying to match the speed of pandas, especially with the pandas backend. However, we do have some overhead in order to make the implementations match the R APIs.
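
A minimal sketch of such a comparison with the standard-library `timeit` module. The helper `best_time` and the two workloads are placeholders invented for illustration; in practice each function would wrap the datar pipeline and the equivalent pandas code on the same data:

```python
import timeit


def best_time(func, repeat=5, number=50):
    """Return the best per-call time in seconds over several repeats."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


data = list(range(1_000))


def approach_a():
    # Placeholder workload A: generator expression.
    return sum(x * 2 for x in data)


def approach_b():
    # Placeholder workload B: list comprehension.
    return sum([x * 2 for x in data])


time_a = best_time(approach_a)
time_b = best_time(approach_b)
print(f"a: {time_a:.6f}s per call, b: {time_b:.6f}s per call")
```

Taking the minimum over several repeats reduces noise from other processes, which matters on small datasets where single runs are hard to tell apart.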

pandas is slow. A backend using polars is WIP (https://github.com/pwwang/datar-polars). However, the expression design of polars is not strong enough (e.g. it cannot use predicates to select columns), so the implementation does not rely fully on polars expressions, which should be the faster way to use polars. Lazy execution should be even faster, but it requires a better design; it may come in the future with the polars backend. Even so, I believe the current implementation will be faster than pandas. polars is still young, and some functions/implementations may still be missing.

coforfe · 2022-12-21T10:23:29Z

Thanks for these new pointers!

For polars, I know that something equivalent to dplyr is already under way, by the same person who is building tidytable. You can see details here:

https://github.com/markfairbanks/tidypolars

There is another alternative, arrow (pyarrow), which is fast, capable of handling big datasets, and growing quickly in usage and functionality. As a matter of fact, it is being co-developed by R and Python people.

https://arrow.apache.org/docs/python/compute.html#standard-compute-functions
In R you can already use most of the dplyr syntax with another dplyr-equivalent package, dbplyr (db stands for connectivity to multiple types of databases in the back end).

Thanks again,
Carlos.

pwwang · 2022-12-21T19:39:11Z

Thanks for sharing.

I have been aware of them for a while. Implementing a pyarrow backend is on the to-do list.

coforfe added the enhancement New feature or request label Dec 20, 2022

coforfe closed this as completed Dec 20, 2022

pwwang changed the title ~~[ENH]~~ [QST] Suppress warnings while import datar Dec 20, 2022
