Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should ewm_mean preserve null values? #15258

Closed
2 tasks done
MarcoGorelli opened this issue Mar 23, 2024 · 1 comment · Fixed by #15503
Closed
2 tasks done

Should ewm_mean preserve null values? #15258

MarcoGorelli opened this issue Mar 23, 2024 · 1 comment · Fixed by #15503
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

Comments

@MarcoGorelli
Copy link
Collaborator

MarcoGorelli commented Mar 23, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

In [30]: pl.Series([5.6, None, 3.7]).ewm_mean(alpha=.1, adjust=False, ignore_nulls=False)
Out[30]:
shape: (3,)
Series: '' [f64]
[
        5.6
        5.6
        5.391209
]

Log output

No response

Issue description

The second element is null, but the result forward fills the previous one. I think there's no need to, though?

Without nulls, the formula would be:

  • y_0 = x_0
  • y_1 = weighted_average([x_1, y_0], weights=[alpha, 1-alpha])
  • y_2 = weighted_average([x_2, y_1], weights=[alpha, 1-alpha])

If x_1 is null, then the null values are skipped, and the result is scaled appropriately:

  • y_0 = x_0
  • y_1 = ???
  • y_2:
    • weighted_average([x_2, y_0], weights=[alpha, (1-alpha)**2]) if ignore_nulls=False
    • weighted_average([x_2, y_0], weights=[alpha, 1-alpha]) if ignore_nulls=True

y_1 isn't used anywhere in any calculation. It doesn't need to be forward-filled from y_0, right?

Expected behavior

I think null values should be preserved. So, the output should be (imo):

shape: (3,)
Series: '' [f64]
[
        5.6
        null
        5.391209
]

I don't think this needs to be configurable either, it's easy to just add .forward_fill() to an expression chain if you want the nulls filled

As for how to go about changing it, I don't know - maybe it's fair game as a clearly documented and called-out breaking 1.0 change?

Installed versions

--------Version info---------
Polars:               0.20.16
Index type:           UInt32
Platform:             Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Python:               3.11.8 (main, Feb 25 2024, 16:39:33) [GCC 11.4.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               0.9.2
matplotlib:           3.8.3
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.1
pyarrow:              15.0.1
pydantic:             2.6.1
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
@MarcoGorelli MarcoGorelli added bug Something isn't working python Related to Python Polars needs triage Awaiting prioritization by a maintainer labels Mar 23, 2024
@wbeardall
Copy link

I'm inclined to agree here, I'd err on the side of adding a statement to the expression chain over adding increased complexity to the interface.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants