Speedup groupby transform calculations #609
This PR updates the logic for how we calculate group by transform features.
Previously we iterated over the groups first and then the features, updating the resulting frame greedily as we went along.
Now we iterate over the features first, then the groups. For each feature, we accumulate the values across all groups and then update the frame just once.
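To make the reordering concrete, here is a minimal, library-free sketch of the two loop orders. It is not the code from this PR: `Frame`, `group_rows`, `transform_groups_first`, `transform_features_first`, and the sample data are all hypothetical names made up for illustration, and the toy `Frame` only counts how many bulk updates it receives.

```python
from collections import defaultdict


class Frame:
    """Toy stand-in for the result frame (hypothetical); counts bulk updates."""

    def __init__(self, index, columns):
        self.data = {col: {i: None for i in index} for col in columns}
        self.update_calls = 0

    def update(self, column, values):
        """Write a {row_id: value} mapping into one column."""
        self.data[column].update(values)
        self.update_calls += 1


def group_rows(rows, key):
    """Map each group value to the list of row ids that belong to it."""
    groups = defaultdict(list)
    for row_id, row in rows.items():
        groups[row[key]].append(row_id)
    return groups


def cumulative_sum(values):
    total, out = 0, []
    for v in values:
        total += v
        out.append(total)
    return out


def cumulative_count(values):
    return list(range(1, len(values) + 1))


def transform_groups_first(rows, key, features):
    """Old order: groups, then features; one update per (group, feature)."""
    frame = Frame(rows, features)
    for ids in group_rows(rows, key).values():
        for name, (column, func) in features.items():
            result = func([rows[i][column] for i in ids])
            frame.update(name, dict(zip(ids, result)))
    return frame


def transform_features_first(rows, key, features):
    """New order: features, then groups; accumulate, then one update per feature."""
    frame = Frame(rows, features)
    groups = group_rows(rows, key)
    for name, (column, func) in features.items():
        accumulated = {}
        for ids in groups.values():
            result = func([rows[i][column] for i in ids])
            accumulated.update(zip(ids, result))
        frame.update(name, accumulated)
    return frame


# Made-up sample data: two groups, two transform features.
rows = {
    0: {"customer": "a", "amount": 10},
    1: {"customer": "b", "amount": 5},
    2: {"customer": "a", "amount": 7},
    3: {"customer": "b", "amount": 1},
}
features = {
    "CUM_SUM(amount)": ("amount", cumulative_sum),
    "CUM_COUNT(amount)": ("amount", cumulative_count),
}

old = transform_groups_first(rows, "customer", features)
new = transform_features_first(rows, "customer", features)
print(old.update_calls, new.update_calls)
```

In this toy setup both orders produce the same values, but the features-first version touches the frame once per feature instead of once per group and feature pair.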
Basically this improves the number of update calls from
I also added a benchmarks folder to hold the code I used to test these changes.
@@            Coverage Diff            @@
##           master     #609     +/-   ##
==========================================
+ Coverage   97.42%   97.42%   +<.01%
==========================================
  Files         118      118
  Lines        9526     9532       +6
==========================================
+ Hits         9281     9287       +6
  Misses        245      245