Filter speed performance regression compared to 0.4 #601
Loading the data (this step is fine):
The filter command in question in this issue:
Note: the behavior is the same when
At commit 5b77d24
But back in Modin 0.4, this took half the time.
The input file can be found here on Google Drive.
In addition to this performance regression, we should explore what the bottleneck in Modin is here, even in 0.4, because the Pandas filter time is only 4.04 seconds!
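To compare the filter time across versions (Modin 0.4, commit 5b77d24, and plain Pandas) on equal footing, a best-of-N wall-clock measurement helps reduce noise from other processes. The sketch below is a minimal harness built on the standard library's `timeit.repeat`; the helper name `best_time` and the placeholder workload are hypothetical, and in practice `fn` would wrap the filter call from this issue (not reproduced here).

```python
import timeit
from typing import Callable


def best_time(fn: Callable[[], object], repeat: int = 5) -> float:
    """Hypothetical helper: fastest wall-clock time in seconds over `repeat` trials.

    Each trial calls `fn` exactly once (number=1). Taking the minimum filters out
    noise from other processes; it is a rough check, not a full benchmark suite.
    """
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    return min(timeit.repeat(fn, number=1, repeat=repeat))


if __name__ == "__main__":
    # Placeholder workload standing in for the DataFrame filter under test.
    data = list(range(1_000_000))
    elapsed = best_time(lambda: [x for x in data if x % 2 == 0], repeat=3)
    print(f"best of 3: {elapsed:.4f} s")
```

Running the same harness under each Modin version and under Pandas, with the same input file, keeps the comparison limited to the library change.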
So the speed problem is totally solved by using
We should just fix the regression instead of warning users not to use it.
The slowdown is coming from the separation of the data and the metadata. When a mask is performed, the index needs to communicate the updates to the data, which right now is done through a
Just to make sure we are on the same page: there are possibly two different "regression" issues here. One is the difference between Modin 0.4 and the latest version, but more importantly there is the difference between using
and the currently 100 times faster query method:
Agreed, just making Modin fast in all cases is the best plan; what makes Modin so appealing is that it can be used as a drop-in, scalable Pandas.
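The two filtering styles discussed in this thread, boolean-mask indexing and the `query` method, should select exactly the same rows, which is why a large speed gap between them points at an implementation issue rather than a semantic one. The sketch below checks that equivalence on a tiny, hypothetical DataFrame (columns `x` and `y` are made up; the issue's actual data and filter expression are not reproduced). It uses plain pandas; as a drop-in, the import line would become `import modin.pandas as pd`.

```python
import pandas as pd  # Modin drop-in would be: import modin.pandas as pd

# Hypothetical data; not the input file from this issue.
df = pd.DataFrame({"x": [1, 3, 2, 5, 1], "y": [7.5, 12.0, 9.25, 30.0, 4.0]})

# Style 1: boolean-mask indexing.
masked = df[df["x"] > 1]

# Style 2: the query method with an equivalent expression string.
queried = df.query("x > 1")

# Both keep the same rows and the same index labels (here 1, 2 and 3).
pd.testing.assert_frame_equal(masked, queried)
print(masked)
```

Because the results are identical, timing both forms on the same DataFrame isolates the cost of the masking path itself.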