Pandas version checks
Reproducible Example
I group a DataFrame df with roughly 1M rows (df.shape[0] ~ 1M) and apply a function that returns a DataFrame for each group, i.e.
import pandas as pd

def func(x):
    y = do_something(x)  # placeholder: the actual per-group transformation is not shown
    return pd.DataFrame(y)  # y.shape[0] >= 1

df.groupby('some_columns').apply(func)
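Because the real transformation is not shown, the following is only a runnable sketch of the same pattern on synthetic data. The column names "key" and "value", the data sizes, and the body of func are all hypothetical stand-ins chosen so that every group returns at least one row.

```python
import numpy as np
import pandas as pd

def func(x: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical stand-in for "do something to x": keep one row per
    # distinct value, so every group returns at least one row.
    y = x.drop_duplicates(subset="value")
    return pd.DataFrame(y)

# Synthetic data (assumed): 1,000 rows spread over up to 50 groups.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "key": rng.integers(0, 50, size=1_000),
    "value": rng.integers(0, 10, size=1_000),
})

# Selecting the "value" column keeps the grouping column out of func's input.
result = df.groupby("key")[["value"]].apply(func)
print(result.head())
```

With the default group_keys=True in pandas 2.x, the result is indexed by (group key, original row label).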
Running this on df.head(100) gives the expected result.
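Since the df.head(100) run behaves as expected, one way to see how the runtime grows before committing to the full 1M rows is to time a few larger prefixes. This is a sketch: time_on_prefixes is a hypothetical helper, the stand-in func and synthetic data are assumptions, and head(n) prefixes may not have the same group-size distribution as the full frame.

```python
import time

import numpy as np
import pandas as pd

def time_on_prefixes(df, key, func, sizes):
    """Time df.head(n).groupby(key).apply(func) for each n in sizes (hypothetical helper)."""
    timings = {}
    for n in sizes:
        subset = df.head(n)
        start = time.perf_counter()
        subset.groupby(key).apply(func)
        timings[n] = time.perf_counter() - start
    return timings

# Hypothetical stand-in for func: one summary row per group.
def func(x):
    return pd.DataFrame({"n": [len(x)]})

rng = np.random.default_rng(0)
df = pd.DataFrame({"key": rng.integers(0, 1_000, 20_000), "value": rng.random(20_000)})

timings = time_on_prefixes(df, "key", func, [100, 1_000, 10_000])
for n, secs in timings.items():
    print(f"{n:>6} rows: {secs:.3f} s")
```

If the time per row rises sharply between prefixes, the full run is likely to scale worse than linearly.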
With the full df, however, the job has not finished after 12 hours. To check the run time I switched to progress_apply from the tqdm package: the progress bar completes in under 15 minutes, but the job keeps running afterwards and still has not stopped several hours later.
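tqdm's progress_apply advances the bar once per call to func, so a finished bar with a still-running job suggests (an inference, not confirmed in this report) that most of the time is spent inside pandas after the last call, for example while combining the per-group DataFrames. The sketch below separates the time spent inside func from the total apply time. profile_groupby_apply is a hypothetical helper, and the stand-in func and synthetic data are assumptions.

```python
import time

import numpy as np
import pandas as pd

def profile_groupby_apply(df, key, func):
    """Run df.groupby(key).apply(func) and return
    (result, seconds inside func, total seconds, number of func calls)."""
    inside = 0.0
    calls = 0

    def wrapped(group):
        nonlocal inside, calls
        start = time.perf_counter()
        out = func(group)
        inside += time.perf_counter() - start
        calls += 1
        return out

    start = time.perf_counter()
    result = df.groupby(key).apply(wrapped)
    total = time.perf_counter() - start
    return result, inside, total, calls

# Hypothetical stand-in for func: the first two rows of each group.
def func(x):
    return pd.DataFrame(x.head(2))

rng = np.random.default_rng(0)
df = pd.DataFrame({"key": rng.integers(0, 500, 10_000), "value": rng.random(10_000)})

result, inside, total, calls = profile_groupby_apply(df, "key", func)
print(f"{calls} calls, {inside:.3f} s in func, {total - inside:.3f} s elsewhere")
```

On the real data, a large "elsewhere" share would point at the work pandas does after func returns rather than at func itself.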
For comparison, the same work done with an ordinary for loop finishes in about 6 hours.
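The report does not show the for loop, so this is only an assumed version of it: iterate over the groups, collect each returned DataFrame in a list, and concatenate once at the end. loop_apply and the stand-in func are hypothetical; passing keys= makes the result indexed by group key, similar to groupby().apply with group_keys=True.

```python
import numpy as np
import pandas as pd

def loop_apply(df, key, func):
    """Hypothetical loop equivalent of df.groupby(key).apply(func)."""
    keys, parts = [], []
    for name, group in df.groupby(key):
        keys.append(name)
        parts.append(func(group))
    # A single concat at the end; concatenating inside the loop would
    # copy the accumulated result on every iteration.
    return pd.concat(parts, keys=keys)

# Hypothetical stand-in for func: the first two rows of each group.
def func(x):
    return pd.DataFrame(x.head(2))

rng = np.random.default_rng(0)
df = pd.DataFrame({"key": rng.integers(0, 500, 10_000), "value": rng.random(10_000)})

out = loop_apply(df, "key", func)
print(out.head())
```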
Installed Versions
pandas version: 2.0.3
Calling pd.show_versions() raises an error:
SystemError: initialization of _internal failed without raising an exception
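pd.show_versions() imports a number of optional dependencies to read their versions, so a failure while importing one of them may stop the whole report; the full traceback would be needed to confirm which import fails. As a fallback, the sketch below reads installed versions from package metadata without importing the packages. basic_versions is a hypothetical helper and the package list is an arbitrary choice.

```python
import platform
from importlib import metadata

def basic_versions(packages=("pandas", "numpy", "tqdm")):
    """Collect versions from installed metadata without importing the packages."""
    info = {"python": platform.python_version(), "platform": platform.platform()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # package is not installed
    return info

for name, version in basic_versions().items():
    print(f"{name}: {version}")
```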
Prior Performance
No response