You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When runnning Asakusafw on M3BP on a many core machine, a situation
was observed where system CPU got high. The shape of system CPU rise
is application dependent, one example is at 80-90% (all cpu busy=100%)
for 5 seconds.
By inspecting the situation, two lock contentions in task
creation/finish handling are identified.
First one is a lock in GLibc code [1] which is called from boost log
to local time local time. This lock is somewhat infamous [2] [3].
Linux perf call graph was recorded, some part of report is at [4].
After replacing local time to utc time by ad hoc patch like [5],
second lock contention of m_physical_graph_mutex in scheduler was
observed. Linux perf output is at [6].
These locks can be highly contended when many short tasks are
created/finished. If these locks can be removed or made fine-grained,
M3BP will be faster in some application executions.
First, your patch to reduce impact from timezone conversion is effective to make processing faster.
I will fix the logger to avoid __tz_convert calls.
For the second issue, I wrote a simple benchmark that creates many physical tasks: https://gist.github.com/logicmachine/f713d79f300e528c3d3ba6abea7c852f
When I run and profile this benchmark on a machine with 16 CPU cores, I confirmed that _raw_spin_lock accounts for several percent of execution time.
I think it is possible to reduce the impact from lock contention in m3bp::Scheduler, because current implementation uses naive algorithms.
I will try to fix for using better structures and algorithms to manage physical tasks.
When runnning Asakusafw on M3BP on a many core machine, a situation
was observed where system CPU got high. The shape of system CPU rise
is application dependent, one example is at 80-90% (all cpu busy=100%)
for 5 seconds.
By inspecting the situation, two lock contentions in task
creation/finish handling are identified.
First one is a lock in GLibc code [1] which is called from boost log
to local time local time. This lock is somewhat infamous [2] [3].
Linux perf call graph was recorded, some part of report is at [4].
After replacing local time to utc time by ad hoc patch like [5],
second lock contention of
m_physical_graph_mutex
in scheduler wasobserved. Linux perf output is at [6].
These locks can be highly contended when many short tasks are
created/finished. If these locks can be removed or made fine-grained,
M3BP will be faster in some application executions.
[1] https://sourceware.org/git/?p=glibc.git;a=blob;f=time/tzset.c;h=f65116ce24577db3aca8d3d368c162ac71a8cd13;hb=HEAD#l605
[2] 16145 – localtime_r etc holds lock via __tz_convert https://sourceware.org/bugzilla/show_bug.cgi?id=16145
[3] [IMPALA-3316] convert_legacy_hive_parquet_utc_timestamps=true makes reading parquet tables 30x slower - Cloudera Open Source https://issues.cloudera.org/browse/IMPALA-3316
[4] https://gist.github.com/shino/27be5cf09d64a190975caa12e3c58160
[5] diff on top of commit
9517aa8
:[6] https://gist.github.com/shino/9290371381ad2e8a3bf743b0ff99918f
The text was updated successfully, but these errors were encountered: