[ENH] Polars conversion utilities #6455

Merged: 5 commits merged on May 27, 2024

pranavvp16
Contributor

Adds conversion functions from polars to pandas and vice versa, with relevant tests for the functions.
#5423

@fkiraly changed the title from "[ENH] Polars conversion utlities" to "[ENH] Polars conversion utilities" on May 20, 2024
@fkiraly added the labels "module:datatypes" (datatypes module: data containers, checkers & converters) and "enhancement" (Adding new functionality) on May 20, 2024
Contributor Author

pranavvp16 commented May 24, 2024

I'm thinking of adding a check to the convert_pandas_to_polars converter, which currently fails when the index of the dataframe is of pandas.period type. The error is ComputeError: cannot create series from Extension("pandas.period", Int64, Some("{\"freq\": \"M\"}")), which is raised internally by polars. A possible fix is

obj.index = obj.index.to_timestamp()

So we have two options:

  1. Raise an error when the pandas object has an index of type pandas.period
  2. Convert the index internally with to_timestamp()

I ran into this error when trying to convert the load_airline dataset to polars. I would also like to know whether I can create a deep copy of the pandas obj being passed in, since we rename the index columns of the dataframe, which in turn changes the index names of the original pandas dataframe too.
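
For illustration, the two options and the copy question can be sketched together on the pandas side. This is only a sketch, not the code in this PR: the helper name prepare_period_index and its on_period argument are hypothetical, and the toy frame merely imitates a monthly period index like the one that triggered the error.

```python
import pandas as pd


def prepare_period_index(obj, on_period="convert"):
    """Hypothetical pre-conversion step (not part of sktime).

    Works on a copy so the caller's dataframe is never modified.
    on_period="raise"   -> option 1: refuse period indices
    on_period="convert" -> option 2: convert the index via to_timestamp()
    """
    out = obj.copy()  # DataFrame.copy() is a deep copy by default
    if not isinstance(out.index, pd.PeriodIndex):
        return out
    if on_period == "raise":
        raise TypeError(
            "pandas objects with a PeriodIndex are not supported, "
            "convert the index with to_timestamp() first"
        )
    # each period is mapped to the timestamp at its start
    out.index = out.index.to_timestamp()
    return out


# toy data with a monthly PeriodIndex (values are made up)
y = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0]},
    index=pd.period_range("1949-01", periods=3, freq="M"),
)
y_ts = prepare_period_index(y)
```

Because the helper works on a copy, y keeps its PeriodIndex while y_ts carries a DatetimeIndex. Note that to_timestamp() drops the period type, so a converter going back from polars to pandas could not restore it without extra bookkeeping.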

Collaborator

@fkiraly fkiraly left a comment

Looks good to me - we may have to change these loose converters in a later PR when we integrate, but that will be easy given that this is all still private.

@fkiraly fkiraly merged commit 0bc4bef into sktime:main May 27, 2024
53 checks passed
geetu040 pushed a commit to geetu040/sktime that referenced this pull request Jun 4, 2024
Adds conversion functions from polars to pandas and vice versa, with
relevant tests for the functions.
sktime#5423
fkiraly pushed a commit to sktime/skpro that referenced this pull request Aug 18, 2024
adds index support as part of #440 and is used to sync up polars
conversion utilities between skpro and sktime.

The corresponding sktime PR for polars conversion utilities is
sktime/sktime#6455.

In this PR:

If a pandas DataFrame is the `from_type` and a polars frame is the
`to_type`, then during the conversion we save the index (assumed never
to be a multi-index) and insert it as a separate column named
`__index__`. The resulting pandas DataFrame is then converted to a
polars DataFrame.

In the inverse function, when converting from a polars DataFrame to a
pandas DataFrame, if the column `__index__` exists in the pandas
DataFrame after conversion, we set that column as the index before
returning the pandas DataFrame.
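
The index handling described above can be sketched on the pandas side
as follows. This is a minimal illustration, not the skpro code: the
helper names `index_to_column` and `column_to_index` are hypothetical,
and the polars step itself is left out (it would sit between the two
helpers, for example via `polars.from_pandas` and `DataFrame.to_pandas`).

```python
import pandas as pd

INDEX_COL = "__index__"  # column name taken from the description above


def index_to_column(obj):
    """Hypothetical helper: store the (non-multi) index as a column."""
    # rename_axis returns a new object, so the input frame is not modified
    return obj.rename_axis(INDEX_COL).reset_index()


def column_to_index(obj):
    """Hypothetical helper: restore the index if `__index__` is present."""
    if INDEX_COL not in obj.columns:
        return obj
    out = obj.set_index(INDEX_COL)
    out.index.name = None  # the original index name is not preserved here
    return out


df = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=[10, 20, 30])
stashed = index_to_column(df)        # columns: __index__, a
restored = column_to_index(stashed)  # index: 10, 20, 30
```

In this sketch a named index would come back unnamed, since only the
values are stored under `__index__`.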

After this is merged, #447 will be implemented as a `polars`-only
estimator. Tests will also be written to check polars input end to end,
and pandas input and output through the polars estimator (i.e., pandas
input into polars estimator -> polars predictions -> pandas output).