Commit
Previously we loaded each file sequentially. Because each file is independent of the rest, working on them in parallel can speed up the process. Using concurrent.futures (Python 3.2+), we can use an executor to schedule the tasks.

This led to the following stats on my machine (MacBook Air 2012, dual-core i5), running against Pepys and incrementing every endnote by 1:

Existing code:
    65.47 real    64.20 user    0.47 sys
    61.36 real    61.03 user    0.23 sys
    69.37 real    67.36 user    0.74 sys

With ThreadPoolExecutor:
    29.19 real    48.03 user    2.85 sys
    28.06 real    46.90 user    2.75 sys
    28.26 real    47.19 user    2.74 sys

With ProcessPoolExecutor:
    27.89 real   101.88 user    0.61 sys
    28.03 real   101.86 user    0.58 sys
    27.77 real   101.56 user    0.56 sys

Given that, I went with ProcessPoolExecutor, but either one at least halves the wall-clock time of the original.
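For illustration, here is a minimal sketch of the pattern described above. It is not the code from this commit: process_file, process_all, and the "note-N" pattern are hypothetical stand-ins, since the real per-file work and endnote markup are not shown here. Only the concurrent.futures calls are the standard library's own.

```python
import re
from concurrent.futures import ProcessPoolExecutor, as_completed

# Hypothetical pattern; the real endnote markup is not shown in this commit.
NOTE_RE = re.compile(r"note-(\d+)")


def process_file(path, increment=1):
    """Hypothetical per-file task: shift every 'note-N' number in one file."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # subn returns the rewritten text and the number of substitutions made.
    new_text, count = NOTE_RE.subn(
        lambda m: "note-{}".format(int(m.group(1)) + increment), text
    )
    with open(path, "w", encoding="utf-8") as f:
        f.write(new_text)
    return count


def process_all(paths, increment=1, executor_cls=ProcessPoolExecutor):
    """Run process_file on every path in parallel; return {path: count}."""
    results = {}
    with executor_cls() as executor:
        # Each file is independent, so every task can be submitted up front.
        futures = {executor.submit(process_file, p, increment): p for p in paths}
        for future in as_completed(futures):
            # result() re-raises any exception raised inside the worker.
            results[futures[future]] = future.result()
    return results
```

Passing executor_cls=ThreadPoolExecutor switches to threads without other changes, which makes it easy to compare the two as in the timings above. With ProcessPoolExecutor, the worker function has to be importable at module level, and on platforms that spawn rather than fork worker processes the call to process_all should sit under an `if __name__ == "__main__":` guard.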