@rhshadrach (Member) commented Dec 21, 2025

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
  • If I used AI to develop this pull request, I prompted it to follow AGENTS.md.

Followup to #63438. Using the test suite, I identified all the places where we were making a copy of an array. Where it was clear that the caller did not need to make a copy, I added copy=False. This includes:

  • cases where pandas owns the memory from which the Index is created; and
  • cases where the resulting Index (or derivatives thereof) is not returned to the user

There are likely more cases where we can pass copy=False but it isn't so clear just from the surrounding code. One large case that is being skipped here is calls that come from maybe_convert_objects. This is used in a variety of places where it would be okay to not copy, but a copy is being made in maybe_convert_objects because not all uses are safe. We could maybe add a copy_index argument here.
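
A minimal sketch of what copy=False changes, assuming NumPy and pandas are importable. The array arr and the names shared and owned are illustrative and not taken from the PR. Both calls pass copy explicitly, so the sketch does not depend on the constructor's default.

```python
import numpy as np
import pandas as pd

# Illustrative input array; stands in for memory that pandas itself owns.
arr = np.array([10, 20, 30], dtype="int64")

# copy=False lets the Index wrap the existing buffer.
shared = pd.Index(arr, copy=False)

# copy=True forces the Index to own a separate buffer.
owned = pd.Index(arr, copy=True)

print("copy=False shares memory:", np.shares_memory(shared.values, arr))
print("copy=True shares memory: ", np.shares_memory(owned.values, arr))
```

Skipping the copy is only safe when nothing outside pandas can mutate arr afterwards, which is the condition the two bullet points above describe.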

@rhshadrach rhshadrach marked this pull request as ready for review December 21, 2025 16:32
@rhshadrach rhshadrach added the Performance (memory or execution speed performance) and Index (related to the Index class or subclasses) labels Dec 21, 2025

@jorisvandenbossche (Member) left a comment

Thanks a lot!
I tried to verify all cases with a second pair of eyes to ensure it's fine not to copy in each of them, and it looks good.

  # because it can't have freq if it has NaTs
  # _with_infer needed for test_fillna_categorical
- return Index._with_infer(result, name=self.name)
+ return Index._with_infer(result, name=self.name, copy=False)

@jorisvandenbossche (Member):
Strictly speaking, I think this one is not needed: the result returned from putmask above is an Index, so by default we already make only a shallow copy.

But no harm in keeping it to avoid confusion ;)

  except (TypeError, ValueError):
      # let's instead try with a straight Index
-     self = Index(self._values)
+     self = Index(self._values, copy=False)

@jorisvandenbossche (Member):
This ._values of a MultiIndex is essentially always already a copy?

@rhshadrach (Member, Author):
Yes, but also any place we're passing in ._values it is always safe to not make a copy. If ._values is user-owned data, that is a problem in and of itself.

  from pandas import TimedeltaIndex

- value = TimedeltaIndex(td64arr, name=name)
+ value = TimedeltaIndex(td64arr, name=name, copy=False)

@jorisvandenbossche (Member):
This one in theory depends on the input, if we want to_timedelta to copy array input the way the Index constructor does.

With this branch:

>>> arr = np.array([1, 2, 3], dtype="timedelta64[ns]")
>>> idx = pd.to_timedelta(arr)
>>> idx
TimedeltaIndex(['0 days 00:00:00.000000001', '0 days 00:00:00.000000002',
                '0 days 00:00:00.000000003'],
               dtype='timedelta64[ns]', freq=None)
>>> arr[0] = 100
>>> idx
TimedeltaIndex(['0 days 00:00:00.000000100', '0 days 00:00:00.000000002',
                '0 days 00:00:00.000000003'],
               dtype='timedelta64[ns]', freq=None)

So idx got modified (incorrectly).

Maybe we could check whether td64arr is arg and copy in that case? (Of course, to_timedelta then has no way to disable the copy, but if you want that you can also use the Index constructor.)
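
The suggested check could look roughly like the following sketch. as_owned_td64 is a hypothetical helper, not a pandas function, and the conversion via np.asarray is an assumption about how the array is produced; the real to_timedelta code path may differ.

```python
import numpy as np


def as_owned_td64(arg):
    """Hypothetical helper: convert to timedelta64[ns] without aliasing `arg`."""
    td64arr = np.asarray(arg, dtype="timedelta64[ns]")
    if td64arr is arg:
        # The conversion handed back the caller's own array, so copy it
        # before it gets wrapped in an index.
        td64arr = td64arr.copy()
    return td64arr


arr = np.array([1, 2, 3], dtype="timedelta64[ns]")
owned = as_owned_td64(arr)

# Mutating the caller's array afterwards no longer leaks into the result.
arr[0] = 100
print(owned[0], arr[0])
```

The identity test only catches the case where the conversion returns the very same object; inputs that needed a real conversion (such as a list) already produce fresh memory and are not copied a second time.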

@rhshadrach (Member, Author):
I think I was thrown off by the copy=False on L240 here. Is the use of "is" here safe, or can np.asarray return a view so that the arrays have different ids but still share the same underlying memory?
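
One way to probe this question, as a sketch using only NumPy. TaggedArray is a hypothetical ndarray subclass introduced here to force np.asarray to build a new object, and aliases_input is likewise a hypothetical helper. Whether such inputs can actually reach this code path in pandas is not established by the discussion.

```python
import numpy as np


class TaggedArray(np.ndarray):
    """Hypothetical ndarray subclass, used only for this illustration."""


base = np.array([1, 2, 3], dtype="timedelta64[ns]")

# For a plain ndarray with a matching dtype, np.asarray returns the same object.
same = np.asarray(base)

# For a subclass instance, np.asarray returns a new base-class ndarray
# that is a view on the same buffer.
sub = base.view(TaggedArray)
converted = np.asarray(sub)


def aliases_input(result, arg):
    """Hypothetical check: does `result` overlap the memory of array `arg`?"""
    return isinstance(arg, np.ndarray) and np.shares_memory(result, arg)


print("same object for plain ndarray:", same is base)
print("same object for subclass:     ", converted is sub)
print("memory shared with subclass:  ", aliases_input(converted, sub))
```

In this sketch an identity check would miss the subclass case, while np.shares_memory catches it. np.shares_memory can be slow for unusual stride patterns; np.may_share_memory is a cheaper bounds-only check that may report false positives.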

  # https://github.com/pandas-dev/pandas/issues/24304
  # convert ndarray[period] -> PeriodIndex
- return PeriodIndex(values, freq=freq).asi8
+ return PeriodIndex(values, freq=freq, copy=False).asi8

@jorisvandenbossche (Member):
If the input is an object array of Periods, that needs to be converted to integers, so the copy=False won't have any effect I suppose?

@rhshadrach (Member, Author):
Ah, yes, this will always be object. Can revert.
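
A small sketch of why copy=False has no effect for this input, assuming pandas and NumPy are importable. The Period values and the monthly frequency are made up for illustration.

```python
import numpy as np
import pandas as pd

# Object-dtype array of Period scalars (illustrative values).
values = np.array(
    [pd.Period("2025-01", freq="M"), pd.Period("2025-02", freq="M")],
    dtype=object,
)

# The Period objects have to be converted to integer ordinals, which
# allocates a new int64 buffer whether or not copy=False is passed.
ordinals = pd.PeriodIndex(values, freq="M", copy=False).asi8

print(ordinals.dtype, np.shares_memory(ordinals, values))
```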

@jorisvandenbossche (Member):

> One large case that is being skipped here is calls that come from maybe_convert_objects. This is used in a variety of places where it would be okay to not copy, but a copy is being made in maybe_convert_objects because not all uses are safe. We could maybe add a copy_index argument here.

Are you thinking here about the DatetimeIndex/TimedeltaIndex/PeriodIndex calls inside maybe_convert_objects? Because they always get objects here, so I would assume this will create new data anyway?

@rhshadrach (Member, Author):

> Are you thinking here about the DatetimeIndex/TimedeltaIndex/PeriodIndex calls inside maybe_convert_objects?

Ah, I forgot those were the only calls here; indeed copy=False would not have an impact.
