Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import numpy as np
import pandas as pd

# Build a large frame of random timestamps and values.
rng = np.random.default_rng()
n_rows = 3000000
timestamps = rng.random(n_rows) * 1000
theta = rng.random(n_rows) * 666
random_df = pd.DataFrame({"timestamp": timestamps, "theta": theta})
# Note: sort_values returns a new DataFrame; the result is discarded here,
# so random_df is written to disk unsorted.
random_df.sort_values("timestamp")
random_df.to_csv("D:\\random_df.csv")
test_df = pd.read_csv("D:\\random_df.csv")

current_time = 0
for trial in np.arange(0, 50):
    end_time = current_time + 10.0
    selected_data = test_df.loc[(test_df["timestamp"] > current_time) & (test_df["timestamp"] < end_time), "theta"]
    print(f"trial {trial}, {selected_data.shape[0]} rows found")
    if selected_data.shape[0] == 0:
        # Repeat the identical selection when nothing was found.
        selected_data = test_df.loc[(test_df["timestamp"] > current_time) & (test_df["timestamp"] < end_time), "theta"]
        print(f"tried again, {selected_data.shape[0]} rows found")
    current_time = end_time + 1.0
Issue Description
Hi all, I'm trying to select data from a large DataFrame (3 million rows, 0.5 GB) that I created previously, saved as a CSV, and then read back in. Randomly, and without raising any error, selecting data with a boolean condition returns an empty Series even though matching rows exist. If I run the same code multiple times, the selection fails for different subsets of the data. If, within the same run, I check whether an empty Series has been returned and then select the exact same data again, the data is often (but not always) found. If this is a memory issue, it seems like an error should be raised. Thanks!!
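A minimal sketch of how the selection could be cross-checked, in case it helps narrow things down. It evaluates the same condition once through pandas and once directly on the underlying NumPy array, then counts positions where the two masks disagree. The helper name `select_window` and its parameters are hypothetical. Turning off `compute.use_numexpr` is only a guess based on numexpr appearing in the installed versions below, not a confirmed cause.

```python
import numpy as np
import pandas as pd


def select_window(df, start, end, column="timestamp", value="theta"):
    """Return (selected values, number of mask mismatches) for start < df[column] < end.

    Hypothetical diagnostic helper, not part of pandas.
    """
    ts = df[column]
    # Mask built the same way as in the reproducible example.
    mask_pandas = ((ts > start) & (ts < end)).to_numpy()
    # The same comparison evaluated directly on the NumPy array.
    arr = ts.to_numpy()
    mask_numpy = (arr > start) & (arr < end)
    # A non-zero count means the two evaluation paths disagreed.
    mismatches = int(np.count_nonzero(mask_pandas != mask_numpy))
    return df.loc[mask_numpy, value], mismatches


# Small illustrative frame; the real test would use the 3-million-row CSV.
demo = pd.DataFrame({"timestamp": [1.0, 5.0, 10.0, 15.0],
                     "theta": [0.1, 0.2, 0.3, 0.4]})

# Assumption: disable the optional numexpr path to see whether the
# empty results still occur without it.
with pd.option_context("compute.use_numexpr", False):
    selected, mismatches = select_window(demo, 0.0, 10.0)

print(f"{len(selected)} rows found, {mismatches} mask mismatches")
```

If the mismatch count is ever non-zero on the large frame, or the empty selections stop once numexpr is disabled, that would point at the accelerated evaluation path rather than at the data itself.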
Expected Behavior
Data is selected on the first try or an error is thrown if it's a memory issue.
Installed Versions
INSTALLED VERSIONS
commit : 4665c10
python : 3.13.7
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19045
machine : AMD64
processor : Intel64 Family 6 Model 165 Stepping 5, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : English_United States.1252
pandas : 2.3.2
numpy : 2.3.3
pytz : 2025.2
dateutil : 2.9.0.post0
pip : 25.2
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : 1.4.2
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : None
numba : None
numexpr : 2.11.0
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : None
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2025.2
qtpy : None
pyqt5 : None
None