
Commit 799727d

Update II Data engineering toolbox.py
1 parent b1b3185 commit 799727d


1 file changed (+28, -1 lines changed)


Introduction to Data Engineering/II Data engineering toolbox.py

@@ -62,4 +62,31 @@
*option b """

#---
#Why parallel computing?
"""Which of these statements is not correct?

ok 1 Parallel computing can be used to speed up any task.
2 Parallel computing can optimize the use of multiple processing units.
3 Parallel computing can optimize the use of memory between several machines.

(Some tasks might be too small to benefit from parallel computing due to the communication overhead.)"""
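The point about communication overhead can be made concrete with a deliberately simplified cost model. The function names and the linear-overhead assumption below are hypothetical; real overhead depends on process startup, data serialization and scheduling, so treat this as an illustration of the trade-off rather than a predictor:

```python
def estimated_parallel_time(serial_time, n_cores, overhead_per_core):
    """Toy cost model (hypothetical): the work splits perfectly across
    cores, but every worker adds a fixed communication/startup cost."""
    return serial_time / n_cores + overhead_per_core * n_cores


def benefits_from_parallelism(serial_time, n_cores, overhead_per_core):
    """True if the modeled parallel run is faster than the serial run."""
    return estimated_parallel_time(serial_time, n_cores, overhead_per_core) < serial_time


# Illustrative numbers only (seconds), assuming 0.1 s overhead per worker.
for work in (0.05, 10.0):
    t4 = estimated_parallel_time(work, n_cores=4, overhead_per_core=0.1)
    faster = benefits_from_parallelism(work, 4, 0.1)
    print(f"serial={work:.2f}s  4 cores={t4:.2f}s  faster={faster}")
```

Under this model the tiny task gets slower on 4 cores while the large one speeds up, which is why statement 1 is the incorrect one.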

#---
#From task to subtasks
"""You will be using the multiprocessing.Pool API, which allows you to distribute your workload over several processes."""
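A self-contained sketch of the same split/apply/combine pattern, runnable without the course data. `take_mean_age` here is a hypothetical stand-in for the course helper, and the small DataFrame replaces `athlete_events` with made-up values. To stay runnable in interactive sessions, it defaults to `multiprocessing.pool.ThreadPool`, which exposes the same `map` method as `multiprocessing.Pool` but uses threads. Passing `pool_cls=Pool` gives real worker processes, in which case the applied function must be picklable (defined at module top level) and, on platforms that start workers with spawn, the call belongs under an `if __name__ == "__main__":` guard.

```python
from multiprocessing.pool import ThreadPool

import pandas as pd


def take_mean_age(year_and_group):
    """Hypothetical helper: receives one (year, sub-DataFrame) pair from
    groupby and returns a one-row DataFrame with that year's mean age."""
    year, group = year_and_group
    return pd.DataFrame({"Age": group["Age"].mean()}, index=[year])


def parallel_apply(apply_func, groups, nb_cores, pool_cls=ThreadPool):
    # map() calls apply_func once per (key, group) pair, spreads the calls
    # over nb_cores workers, and returns results in the original order.
    with pool_cls(nb_cores) as p:
        results = p.map(apply_func, groups)
    # Combine the per-group one-row DataFrames into a single DataFrame.
    return pd.concat(results)


# Tiny made-up dataset standing in for athlete_events.
athletes = pd.DataFrame({
    "Year": [2000, 2000, 2004, 2004, 2008],
    "Age": [20, 30, 25, 35, 40],
})
print(parallel_apply(take_mean_age, athletes.groupby("Year"), 2))
```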
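from multiprocessing import Pool

import pandas as pd

# Function to apply a function over multiple cores
# (print_timing, take_mean_age and athlete_events are assumed to be
# provided by the exercise environment)
@print_timing
def parallel_apply(apply_func, groups, nb_cores):
    # p.map calls apply_func once per (key, group) pair, spreading the
    # calls over nb_cores worker processes
    with Pool(nb_cores) as p:
        results = p.map(apply_func, groups)
    # Stitch the per-group results back into a single DataFrame
    return pd.concat(results)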
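The `@print_timing` decorator applied to `parallel_apply` is not defined in this file and presumably comes from the course environment. A minimal sketch of such a decorator, assuming it only needs to report wall-clock time per call (the name matches the course's, but the body and message format are hypothetical):

```python
import time
from functools import wraps


def print_timing(func):
    """Hypothetical timing decorator: prints how long each call took."""
    @wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"Processing time: {elapsed_ms:.2f} ms")
        return result
    return wrapper


@print_timing
def add(a, b):
    return a + b


add(2, 3)
```

Because the wrapper returns the original result, decorating `parallel_apply` this way would still hand back the concatenated DataFrame.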

# Parallel apply using 1 core
parallel_apply(take_mean_age, athlete_events.groupby('Year'), 1)

# Parallel apply using 2 cores
parallel_apply(take_mean_age, athlete_events.groupby('Year'), 2)

# Parallel apply using 4 cores
parallel_apply(take_mean_age, athlete_events.groupby('Year'), 4)
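The three calls time the same work on 1, 2 and 4 cores. A common way to compare such timings is speedup (1-core time divided by n-core time) and efficiency (speedup divided by n). A small helper sketch; the timings in it are hypothetical placeholders, not measured results:

```python
def speedup(serial_seconds, parallel_seconds):
    """How many times faster the parallel run was than the 1-core run."""
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds, parallel_seconds, n_cores):
    """Speedup per core; 1.0 would mean perfectly linear scaling."""
    return speedup(serial_seconds, parallel_seconds) / n_cores


# Hypothetical timings (seconds), NOT measured: replace them with the
# values printed for the 1-, 2- and 4-core calls.
timings = {1: 1.20, 2: 0.70, 4: 0.50}
for cores, seconds in timings.items():
    print(cores,
          round(speedup(timings[1], seconds), 2),
          round(efficiency(timings[1], seconds, cores), 2))
```

Efficiency that falls well below 1.0 as cores are added is the overhead effect from the quiz above showing up in practice.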
