
_cook_pp: unfilled NaN in pw_sem after ffill() causes spurious "pp_df has missing values" warnings #95

@samuelwnaylor

Description


In _cook_pp, after invalid bins are nulled out, pw_sem_{pre/post} is filled with ffill():

pp_df[pw_sem_col] = pp_df[pw_sem_col].ffill()

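ffill() fills each NaN with the last valid value before it in row order and never looks ahead. A run of invalid bins at the end of the rows is therefore filled. What survives is any run of invalid (insufficient-data) bins that comes before the first valid bin, because there is nothing earlier to propagate from. If the bins near cutout wind speed are the ones left as NaN, they must come first in row order at this point, i.e. pp_df is ordered from high to low wind speed; with low-to-high ordering the surviving NaNs would be the lowest bins instead. These leftover NaNs cause the "pp_df has missing values" warning to fire on every turbine/reference pair, including every bootstrap sample, even when there is no meaningful data quality issue.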
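A minimal sketch of how ffill() treats NaN runs at either end of a column, using a hypothetical five-bin curve. The column names wind_speed and pw_sem_pre and all values are made up for illustration; NaN stands in for a bin that was nulled out as invalid.

```python
import numpy as np
import pandas as pd

# Hypothetical binned curve; NaN marks bins nulled out as invalid.
pp_df = pd.DataFrame(
    {
        "wind_speed": [3.0, 4.0, 5.0, 6.0, 7.0],
        "pw_sem_pre": [np.nan, np.nan, 10.0, 12.0, np.nan],
    }
)

# Ascending wind speed: the NaN at the top end is filled from 12.0,
# but the two lowest bins have no earlier valid value and stay NaN.
filled = pp_df["pw_sem_pre"].ffill()
print(filled.tolist())

# Descending wind speed: the highest bin now comes first in row order,
# so it is the one left as NaN after ffill().
desc = pp_df.sort_values("wind_speed", ascending=False)
desc_filled = desc["pw_sem_pre"].ffill()
print(desc_filled.tolist())
```

In both orderings the NaNs that remain are exactly the rows before the first valid value, which is why the surviving bins depend on how pp_df is sorted.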

Potential fix

Add bfill() after ffill() so that any NaNs ffill() leaves behind (the bins before the first valid bin in row order) are filled with the nearest valid value:

# before
pp_df[pw_sem_col] = pp_df[pw_sem_col].ffill()

# after
pp_df[pw_sem_col] = pp_df[pw_sem_col].ffill().bfill()
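A minimal sketch of the proposed fix on a hypothetical pw_sem column (values made up for illustration), including the edge case where every bin is invalid:

```python
import numpy as np
import pandas as pd

pw_sem = pd.Series([np.nan, np.nan, 10.0, 12.0, np.nan])

# ffill() fills the trailing NaN from the last valid bin;
# bfill() then fills the leading NaNs from the first valid bin.
fixed = pw_sem.ffill().bfill()
print(fixed.tolist())  # [10.0, 10.0, 10.0, 12.0, 12.0]

# If every bin is invalid there is nothing to fill from in either
# direction, so the column stays all-NaN and a missing-values check
# would still (correctly) fire.
all_invalid = pd.Series([np.nan, np.nan, np.nan]).ffill().bfill()
print(all_invalid.isna().all())  # True
```

Note that bfill() extrapolates the first valid bin's value flat across the leading bins, in the same way ffill() already does for bins after the last valid one. Whether that is a reasonable uncertainty estimate for those bins is a judgement call, but it keeps the warning reserved for columns that are genuinely empty.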
