Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this issue exists on the latest version of pandas.
- I have confirmed this issue exists on the main branch of pandas.
Reproducible Example
Hello, I ran into memory issues when indexing a large DataFrame. I put together a test case and found that boolean-mask indexing such as df.loc[df['a'] > 5] replaces the original RangeIndex with a materialized int64 Index, which roughly doubles the reported memory usage (1.5 MB versus 781.3 KB for a comparable iloc slice).
import pandas as pd
df = pd.DataFrame({'a': range(100000)})
print("Original index type:", type(df.index))
# loc operation
df_loc = df.loc[df['a'] > 5]
print("Index type after loc:", type(df_loc.index))
df_loc.info()
# iloc operation
df_iloc = df.iloc[5:]
print("Index type after iloc:", type(df_iloc.index))
df_iloc.info()
Original index type: <class 'pandas.core.indexes.range.RangeIndex'>
Index type after loc: <class 'pandas.core.indexes.base.Index'>
<class 'pandas.core.frame.DataFrame'>
Index: 99994 entries, 6 to 99999
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 a 99994 non-null int64
dtypes: int64(1)
memory usage: 1.5 MB
Index type after iloc: <class 'pandas.core.indexes.range.RangeIndex'>
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99995 entries, 5 to 99999
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 a 99995 non-null int64
dtypes: int64(1)
memory usage: 781.3 KB
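To see where the extra memory goes, the per-component byte counts can be inspected with DataFrame.memory_usage(index=True), which reports the index separately from the column. The sketch below reuses the frame from the example above. The reset_index(drop=True) step is only a possible workaround, not a fix for the reported behavior, and it assumes the original row labels (6 to 99999) are not needed afterwards. The name df_reset is introduced here for illustration, and the exact byte counts may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"a": range(100000)})

df_loc = df.loc[df["a"] > 5]  # boolean-mask selection
df_iloc = df.iloc[5:]         # positional slice

# The "Index" entry shows how many bytes the index itself takes,
# separate from the bytes used by column "a".
print(df_loc.memory_usage(index=True))
print(df_iloc.memory_usage(index=True))

# Possible workaround (assumption: the original labels can be discarded).
# reset_index(drop=True) replaces the index with a fresh RangeIndex
# starting at 0 instead of keeping the materialized labels.
df_reset = df_loc.reset_index(drop=True)
print(type(df_reset.index))
print(df_reset.memory_usage(index=True))
```

If the original labels do matter, this workaround does not apply, because the new index no longer maps back to the rows of df.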
Installed Versions
python : 3.9.18.final.0
python-bits : 64
pandas : 2.1.4
numpy : 1.26.0
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.0.0
pip : 23.3
Cython : 3.0.5
pytest : None
hypothesis : None
Prior Performance
No response