Skip to content

Add interpolate: interior forward-fill for numeric Polars columns#686

Merged
tschm merged 10 commits intomainfrom
copilot/add-interpolate-function
Apr 25, 2026
Merged

Add interpolate: interior forward-fill for numeric Polars columns#686
tschm merged 10 commits intomainfrom
copilot/add-interpolate-function

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 25, 2026

  • Add interpolate function to src/jquantstats/data.py
  • Export interpolate from src/jquantstats/__init__.py
  • Apply review feedback:
    • Replace hard-coded _NUMERIC_TYPES with s.dtype.is_numeric() (future-proof, consistent with Polars)
    • Fix __row_idx__ collision: generate a unique temp column name if __row_idx__ already exists in the input
    • Compute first/last valid indices without materializing all indices (arg_max() on boolean mask instead of arg_true())
    • Add regression test test_existing_row_idx_column_is_preserved for the collision case
  • All 11 interpolate tests pass

Copilot AI linked an issue Apr 25, 2026 that may be closed by this pull request
Copilot AI changed the title [WIP] Add interpolate function for forward-fill in Polars DataFrames Add interpolate: interior forward-fill for numeric Polars columns Apr 25, 2026
Copilot AI requested a review from tschm April 25, 2026 13:21
@tschm tschm marked this pull request as ready for review April 25, 2026 13:23
Copilot AI review requested due to automatic review settings April 25, 2026 13:23
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new public Polars utility interpolate(df) to forward-fill numeric columns only within each column’s interior range (between first/last non-null), preserving leading/trailing nulls and leaving non-numeric columns unchanged.

Changes:

  • Implement interpolate(df: pl.DataFrame) -> pl.DataFrame in src/jquantstats/data.py (with a temporary row-index column used for interior masking).
  • Export interpolate from the package top-level in src/jquantstats/__init__.py.
  • Add unit tests in tests/test_jquantstats/test_data.py covering expected filling/preservation behavior and ensuring the temp column isn’t present in output.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
tests/test_jquantstats/test_data.py Adds TestInterpolate to validate interior forward-fill semantics and output cleanliness.
src/jquantstats/data.py Introduces _NUMERIC_TYPES and the interpolate implementation using a temporary __row_idx__ mask.
src/jquantstats/init.py Re-exports interpolate and includes it in __all__.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/jquantstats/data.py Outdated
Comment on lines +84 to +96
_NUMERIC_TYPES: frozenset[type[pl.DataType]] = frozenset(
{
pl.Int8,
pl.Int16,
pl.Int32,
pl.Int64,
pl.UInt8,
pl.UInt16,
pl.UInt32,
pl.UInt64,
pl.Float32,
pl.Float64,
}
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numeric-column detection here uses a hard-coded _NUMERIC_TYPES set, which diverges from the repo’s existing convention of using Polars’ dtype.is_numeric() (and will also miss numeric dtypes like Decimal/Int128 if present). Consider removing _NUMERIC_TYPES and using df[col].dtype.is_numeric() / dtype.is_numeric() to decide whether to interpolate a column, which keeps behavior consistent and future-proof across Polars versions.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in f15c54a. Replaced _NUMERIC_TYPES with s.dtype.is_numeric(). Also fixed the __row_idx__ collision (unique temp column name via while tmp_col in df.columns) and the arg_true() efficiency issue (arg_max() on the boolean mask instead). Added test_existing_row_idx_column_is_preserved to cover the collision case.

Comment thread src/jquantstats/data.py Outdated
Comment on lines 140 to 153
mask = (pl.col("__row_idx__") >= pl.lit(first_valid_idx)) & (
pl.col("__row_idx__") <= pl.lit(last_valid_idx)
)
filled_col = pl.when(mask).then(pl.col(col).fill_null(strategy="forward")).otherwise(pl.col(col)).alias(col)
out.append(filled_col)
else:
out.append(pl.col(col))

return df.with_columns(pl.int_range(0, df.height).alias("__row_idx__")).select(out)


def _subtract_risk_free(dframe: pl.DataFrame, rf: float | pl.DataFrame, date_col: str) -> pl.DataFrame:
"""Subtract the risk-free rate from all numeric columns in the DataFrame.

Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The temporary __row_idx__ column name can collide with a user-provided column of the same name. In that case this implementation will overwrite the user column in with_columns(...), and select(out) will also include a (modified) __row_idx__ in the output because the loop iterates over all df.columns. Use a temp column name guaranteed not to exist in df.columns (e.g., generate a unique name), and ensure it’s excluded from out even if the input already had a __row_idx__ column.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already applied in f15c54a — the implementation now picks a temp column name via while tmp_col in df.columns: tmp_col = f"_{tmp_col}_" so it can never collide with a user column, and test_existing_row_idx_column_is_preserved verifies that a pre-existing __row_idx__ column is both untouched and retained in the output.

Comment thread src/jquantstats/data.py Outdated
Comment thread tests/test_jquantstats/test_data.py
tschm and others added 4 commits April 25, 2026 17:34
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ient first/last idx, collision test

Agent-Logs-Url: https://github.com/Jebel-Quant/jquantstats/sessions/6b1a72b6-ed84-4742-a62c-25875b57d5d1

Co-authored-by: tschm <2046079+tschm@users.noreply.github.com>
…' into copilot/add-interpolate-function

# Conflicts:
#	src/jquantstats/data.py
#	tests/test_jquantstats/test_data.py

Co-authored-by: tschm <2046079+tschm@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@tschm tschm merged commit e52cfb7 into main Apr 25, 2026
53 checks passed
@tschm tschm deleted the copilot/add-interpolate-function branch April 25, 2026 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

There should be an interpolate function

3 participants