-
-
Notifications
You must be signed in to change notification settings - Fork 19.4k
PERF: Pass copy=False to Index constructor #63451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot!
Tried to verify all cases with a second eye to ensure it's fine to not copy in those cases, and looking good
| # because it can't have freq if it has NaTs | ||
| # _with_infer needed for test_fillna_categorical | ||
| return Index._with_infer(result, name=self.name) | ||
| return Index._with_infer(result, name=self.name, copy=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking, this one is not needed I think because result returned from putmask above is an Index, and so we already do a shallow copy by default.
But no harm in keeping it to avoid confusion ;)
| except (TypeError, ValueError): | ||
| # let's instead try with a straight Index | ||
| self = Index(self._values) | ||
| self = Index(self._values, copy=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ._values of a MultiIndex is essentially always already a copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but also any place we're passing in ._values it is always safe to not make a copy. If ._values is user-owned data, that is a problem in and of itself.
pandas/core/tools/timedeltas.py
Outdated
| from pandas import TimedeltaIndex | ||
|
|
||
| value = TimedeltaIndex(td64arr, name=name) | ||
| value = TimedeltaIndex(td64arr, name=name, copy=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one in theory depends on the input, if we want to_timedelta similarly copy array input like the Index constructor.
With this branch:
>>> arr = np.array([1, 2, 3], dtype="timedelta64[ns]")
>>> idx = pd.to_timedelta(arr)
>>> idx
TimedeltaIndex(['0 days 00:00:00.000000001', '0 days 00:00:00.000000002',
'0 days 00:00:00.000000003'],
dtype='timedelta64[ns]', freq=None)
>>> arr[0] = 100
>>> idx
TimedeltaIndex(['0 days 00:00:00.000000100', '0 days 00:00:00.000000002',
'0 days 00:00:00.000000003'],
dtype='timedelta64[ns]', freq=None)
So idx got modified (incorrectly).
Maybe could check for td64arr is arg and in that case do copy? (of course, to_timedelta then has no way to disable the copy, but if you want that you can also use the Index constructor)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I was thrown off by the copy=False on L240 here. Is the use of is here safe, or can np.asarray return a view so that the arrays have different ids but still share the same underlying memory?
| # https://github.com/pandas-dev/pandas/issues/24304 | ||
| # convert ndarray[period] -> PeriodIndex | ||
| return PeriodIndex(values, freq=freq).asi8 | ||
| return PeriodIndex(values, freq=freq, copy=False).asi8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the input is an object array of Periods, that needs to be converted to integers, so the copy=False won't have any effect I suppose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, this will always be object. Can revert.
Are you thinking here about the DatetimeIndex/TimedeltaIndex/PeriodIndex calls inside |
Ah, I forgot those were the only calls here; indeed copy=False would not have an impact. |
doc/source/whatsnew/vX.X.X.rstfile if fixing a bug or adding a new feature.AGENTS.md.Followup to #63438. Via the test suite, I identified all the places where we were making the copy of an array. For those that it was clear that the caller didn't need to be making a copy, I added
copy=False. This includes:There are likely more cases where we can pass
copy=Falsebut it isn't so clear just from the surrounding code.One large case that is being skipped here is calls that come frommaybe_convert_objects. This is used in a variety of places where it would be okay to not copy, but a copy is being made inmaybe_convert_objectsbecause not all uses are safe. We could maybe add acopy_indexargument here.