Improve performance of aggregation feature calculation #209
Previously, the aggregation code checked whether the dataframe being aggregated was empty or had no related instances. That check relied on a slow Python loop. Because missing values are filled in later in the same function, the check was unnecessary and has been removed.
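To illustrate why a later fill step can make an explicit emptiness check redundant, here is a minimal pandas sketch. It is not the code from this PR: the helper `aggregate_children`, the column names `parent_id` and `value`, and the default of 0 are all hypothetical, chosen only for the example.

```python
import pandas as pd


def aggregate_children(parent_ids, child_df, default=0):
    """Hypothetical helper: sum child `value` per parent, then fill the gaps.

    There is no up-front check for an empty child dataframe or for parents
    without children; the reindex + fillna step covers both cases.
    """
    sums = child_df.groupby("parent_id")["value"].sum()
    # Parents with no related rows are absent from `sums`; reindex adds them
    # as NaN and fillna replaces the NaN with the default value.
    return sums.reindex(parent_ids).fillna(default)


parents = [1, 2, 3]

# Parent 2 has no related child rows.
children = pd.DataFrame({"parent_id": [1, 1, 3], "value": [10.0, 5.0, 2.0]})

# A child dataframe with the same columns but no rows at all.
empty_children = pd.DataFrame(
    {
        "parent_id": pd.Series(dtype="int64"),
        "value": pd.Series(dtype="float64"),
    }
)

with_rows = aggregate_children(parents, children)
no_rows = aggregate_children(parents, empty_children)

print(with_rows.tolist())
print(no_rows.tolist())
```

In this sketch the empty dataframe goes through the same code path as the populated one, so the only cost of handling it is the vectorized fill that runs anyway.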
In some benchmarks, I saw speedups of about 25% for calculations.
Also added a test for the empty child dataframe case, which was previously untested.
@@            Coverage Diff             @@
##           master     #209      +/-   ##
==========================================
+ Coverage   93.46%   93.48%   +0.02%
==========================================
  Files          71       71
  Lines        7784     7757      -27
==========================================
- Hits         7275     7252      -23
+ Misses        509      505       -4