
Add Window Functions in Polars notebook #103

Merged
Haleshot merged 4 commits into marimo-team:main from henryharbeck:window
May 24, 2025

@henryharbeck

📝 Summary

This PR adds the Window Functions notebook to the Polars course.

It covers:

  • Partitioning, ordering and mapping strategies within Polars' window functions
  • Some differences and conveniences Polars offers compared to window functions in SQL
  • A tip on how to re-use a window definition

Relates to: #40

I welcome any feedback or suggestions!

📋 Checklist

  • I have included package dependencies in the notebook file using --sandbox
  • If adding a course, include a README.md
  • Keep language direct and simple.

# "polars==1.29.0",
# "pyarrow==20.0.0",
# "sqlglot==26.16.4",
# ]
Author

duckdb, pyarrow and sqlglot are required to execute the SQL cell in the notebook.

def _(df, pl):
    (
        df.with_columns(
            is_weekday=pl.col("date").dt.is_business_day(),
Author

FYI, is_business_day was added after Polars 1.24.0, which is currently the latest version available in WASM.
Happy to update if this is an issue.

@Haleshot added the enhancement (New feature or request) label on May 23, 2025
Contributor

@Haleshot left a comment

Great work on this notebook contrib! Really great walkthrough of the topic at hand, with a clear progression through window function concepts, and the examples build well on each other. The dataset is simple and the explanations are easy to follow and relevant to the chosen data.

One tiny naming nit/doubt I have: in the window reuse example, I wonder if cumulative_daily_revenue might be slightly misleading since it's actually cumulative revenue of each day (across channels) rather than across days.

The SQL cell inclusion and mapping strategies sections are particularly helpful. Thanks again for a great contrib ❤️

PS: I was playing around with mo.ui.dataframe, passing the Polars queries into it and applying transformations via the UI to cross-check the cell outputs 😄

Haleshot previously approved these changes May 23, 2025
@Haleshot merged commit a50306f into marimo-team:main on May 24, 2025
1 check passed
@henryharbeck
Author

Hi @Haleshot,

Thanks very much for the positive feedback, the quick review and merge!

> One tiny naming nit/doubt I have: in the window reuse example, I wonder if cumulative_daily_revenue might be slightly misleading since it's actually cumulative revenue of each day (across channels) rather than across days.

Thinking about it more, this was really the only metric / calculation that didn't make much sense. I.e., it doesn't make much sense to cumulate across channels, as there is no natural order (compared to cumulating across dates). I removed it, given there are 3 new columns re-using the same window, which is enough to support what is being shown!

Thanks again!

@henryharbeck deleted the window branch on May 24, 2025 at 14:48

Labels

enhancement New feature or request


2 participants