Commit 1f80544
Parallelise endnote processing
Previously we loaded each file sequentially. Since each file is independent of the rest, working on them in parallel should speed up the process. Using concurrent.futures (Python 3.2+) we can use an executor to schedule the tasks. This led to the following stats on my machine (MacBook Air 2012, dual-core i5) running against Pepys and incrementing every endnote by 1:
Existing code:
65.47 real 64.20 user 0.47 sys
61.36 real 61.03 user 0.23 sys
69.37 real 67.36 user 0.74 sys
With ThreadPoolExecutor:
29.19 real 48.03 user 2.85 sys
28.06 real 46.90 user 2.75 sys
28.26 real 47.19 user 2.74 sys
With ProcessPoolExecutor:
27.89 real 101.88 user 0.61 sys
28.03 real 101.86 user 0.58 sys
27.77 real 101.56 user 0.56 sys
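To put the timings above side by side, here is a small sketch that averages the "real" column of each set of runs and expresses it as a speedup over the sequential code. The dictionary keys and variable names are just labels chosen for this illustration; the numbers are copied from the runs listed above. Both executors come out at roughly 2.3x.

```python
from statistics import mean

# Wall-clock ("real") seconds copied from the three sets of runs above.
real_seconds = {
    "sequential": [65.47, 61.36, 69.37],
    "ThreadPoolExecutor": [29.19, 28.06, 28.26],
    "ProcessPoolExecutor": [27.89, 28.03, 27.77],
}

# Mean wall-clock time of the existing (sequential) code is the baseline.
baseline = mean(real_seconds["sequential"])

# Speedup factor of each variant relative to that baseline.
speedups = {name: baseline / mean(times) for name, times in real_seconds.items()}

for name, factor in speedups.items():
    print(f"{name}: mean {mean(real_seconds[name]):.2f}s real, {factor:.2f}x")
```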
Given that, I went with ProcessPoolExecutor, but either one more than halves the elapsed time of the original.
1 parent afb92f7 commit 1f80544
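For readers unfamiliar with concurrent.futures, the following is a minimal sketch of the pattern described in the commit message, not the code from this commit (the diff body is not reproduced here). The `note-N` reference pattern and the names `increment_endnotes` and `process_all` are assumptions made up for the illustration. Each file is submitted to the executor as an independent task and the results are collected as they complete.

```python
import re
from concurrent.futures import (
    ProcessPoolExecutor,
    ThreadPoolExecutor,
    as_completed,
)
from pathlib import Path

# Hypothetical endnote reference format, assumed for this sketch only.
ENDNOTE_REF = re.compile(r"note-(\d+)")


def increment_endnotes(path, step=1):
    """Rewrite one file in place, adding `step` to every endnote number.

    Returns the number of references that were changed.
    """
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    new_text, count = ENDNOTE_REF.subn(
        lambda m: f"note-{int(m.group(1)) + step}", text
    )
    path.write_text(new_text, encoding="utf-8")
    return count


def process_all(paths, step=1, executor_cls=ProcessPoolExecutor):
    """Process every file as an independent task on the given executor."""
    results = {}
    with executor_cls() as executor:
        # Map each future back to its file so results can be reported per file.
        futures = {
            executor.submit(increment_endnotes, p, step): p for p in paths
        }
        for future in as_completed(futures):
            # result() re-raises any exception raised inside the worker.
            results[str(futures[future])] = future.result()
    return results
```

Passing `executor_cls=ThreadPoolExecutor` switches to threads without any other change, which is how the two variants can be compared. With ProcessPoolExecutor the worker function must be defined at module level so it can be pickled, and on platforms that spawn worker processes the call to `process_all` should sit under an `if __name__ == "__main__":` guard.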
1 file changed, +20 -16 lines changed