[Improvement] speed up confusion matrix calculation #465

dreamerlin · 2020-12-20T06:37:11Z

This PR uses np.bincount to speed up the confusion matrix calculation.
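
For readers unfamiliar with the trick, here is a minimal sketch of how np.bincount can build a confusion matrix. It is an illustration, not the exact code merged in this PR: the function name bincount_confusion_matrix is hypothetical, and it assumes non-empty inputs of non-negative integer labels.

```python
import numpy as np


def bincount_confusion_matrix(y_pred, y_real):
    """Hypothetical helper: count (real, pred) pairs with np.bincount."""
    y_pred = np.asarray(y_pred, dtype=np.int64)
    y_real = np.asarray(y_real, dtype=np.int64)

    # Sorted unique labels seen in either array.
    label_set = np.unique(np.concatenate((y_pred, y_real)))
    num_labels = len(label_set)

    # Lookup table from raw label to dense index 0..num_labels-1.
    # Assumption: labels are non-negative integers and inputs are non-empty.
    max_label = label_set[-1]
    label_map = np.zeros(max_label + 1, dtype=np.int64)
    label_map[label_set] = np.arange(num_labels)

    pred_idx = label_map[y_pred]
    real_idx = label_map[y_real]

    # Encode each (real, pred) pair as one flat index, then count occurrences.
    flat = real_idx * num_labels + pred_idx
    counts = np.bincount(flat, minlength=num_labels ** 2)
    return counts.reshape(num_labels, num_labels)


cm = bincount_confusion_matrix([0, 2, 2, 5], [0, 2, 5, 5])
print(cm)
```

Rows index the real label and columns the predicted label. The per-sample Python loop is replaced by a single counting call.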

codecov · 2020-12-20T06:54:44Z

Codecov Report

Merging #465 (626e073) into master (30ff6b2) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #465   +/-   ##
=======================================
  Coverage   84.67%   84.67%           
=======================================
  Files         118      118           
  Lines        8347     8348    +1     
  Branches     1366     1365    -1     
=======================================
+ Hits         7068     7069    +1     
  Misses        932      932           
  Partials      347      347

Flag	Coverage Δ
unittests	`84.66% <100.00%> (+<0.01%)`	⬆️

Impacted Files	Coverage Δ
mmaction/core/evaluation/accuracy.py	`93.18% <100.00%> (+0.03%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 30ff6b2...1d773d4.

innerlee · 2020-12-20T06:56:48Z

Any numbers on benchmark?

dreamerlin · 2020-12-20T12:04:01Z

import time

import numpy as np

classes = 10
length = 100
times = 10000

# Time the new implementation (np.bincount) from this PR.
start = time.time()
for i in range(times):
    labels = np.random.randint(0, classes, length).astype(np.int64)
    pred = np.random.randint(0, classes, length).astype(np.int64)
    new_confusion_matrix(pred, labels)  # using np.bincount
end = time.time()
print(end - start)

# Time the original implementation on the same kind of input.
start = time.time()
for i in range(times):
    labels = np.random.randint(0, classes, length).astype(np.int64)
    pred = np.random.randint(0, classes, length).astype(np.int64)
    confusion_matrix(pred, labels)  # original one
end = time.time()
print(end - start)

Result: 0.5623 s (np.bincount) vs 1.1669 s (original)

dreamerlin · 2020-12-20T12:04:35Z

import time

import numpy as np

classes = 100
length = 10
times = 10000

# Time the new implementation (np.bincount) from this PR.
start = time.time()
for i in range(times):
    labels = np.random.randint(0, classes, length).astype(np.int64)
    pred = np.random.randint(0, classes, length).astype(np.int64)
    new_confusion_matrix(pred, labels)  # using np.bincount
end = time.time()
print(end - start)

# Time the original implementation on the same kind of input.
start = time.time()
for i in range(times):
    labels = np.random.randint(0, classes, length).astype(np.int64)
    pred = np.random.randint(0, classes, length).astype(np.int64)
    confusion_matrix(pred, labels)  # original one
end = time.time()
print(end - start)

Result: 0.5215 s (np.bincount) vs 0.5446 s (original)

dreamerlin · 2020-12-20T12:04:59Z

The modification does speed up the calculation: roughly 2x with 10 classes and 100 samples, and marginally with 100 classes and 10 samples.
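
The loops above also time the np.random.randint calls, which adds the same overhead to both variants. A possible way to isolate the function under test is to generate the inputs once and time only the call with timeit. This is a sketch under assumptions: time_confusion_fn and placeholder_fn are hypothetical names, and the placeholder stands in for whichever confusion matrix function is being measured.

```python
import timeit

import numpy as np


def time_confusion_fn(fn, classes, length, repeats=10000, seed=0):
    """Hypothetical harness: time fn(pred, labels) on fixed random inputs."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, classes, size=length, dtype=np.int64)
    pred = rng.integers(0, classes, size=length, dtype=np.int64)
    # Only the call itself is timed; the inputs were generated once above.
    return timeit.timeit(lambda: fn(pred, labels), number=repeats)


def placeholder_fn(pred, labels):
    """Stand-in workload; swap in the function under test."""
    return np.bincount(labels * 10 + pred, minlength=100)


elapsed = time_confusion_fn(placeholder_fn, classes=10, length=100, repeats=1000)
print(f"{elapsed:.4f} s")
```

Fixing the seed makes both variants see identical inputs, so any difference comes from the implementations alone.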

kennymckormick · 2020-12-21T02:52:02Z

mmaction/core/evaluation/accuracy.py

-        index_pred = label_map[plabel]
-        confusion_mat[index_real][index_pred] += 1
+    max_label = label_set[-1]
+    label_map = np.zeros(max_label + 1, dtype=np.int64)


I think it's OK to use a dictionary instead of a NumPy array here, as in the original:

label_map = {label: i for i, label in enumerate(label_set)}

That way 4 lines become 1.
Any performance concerns here?

label_map = {label: i for i, label in enumerate(label_set)}
y_pred_mapped = [label_map[i] for i in y_pred]
y_real_mapped = [label_map[i] for i in y_real]

This is much slower, even slower than the original one.

Numbers would be appreciated.

That's right, the for loop is much slower.

kenny's proposal vs. the original implementation before this PR:
for classes = 10, length = 100, times = 10000: 1.2643 vs 1.1462
for classes = 100, length = 10, times = 10000: 0.6333 vs 0.5554
So it is even slower than the original.
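
The difference discussed in this thread comes down to where the per-element work happens. A dictionary lookup inside a list comprehension runs one Python-level iteration per label, while indexing a NumPy lookup table maps the whole array in one vectorized step. The sketch below shows that both give the same mapping. The helper names map_with_dict and map_with_table are hypothetical, and non-negative integer labels are assumed.

```python
import numpy as np


def map_with_dict(labels, label_set):
    """Dictionary lookup: one Python-level iteration per element."""
    label_map = {label: i for i, label in enumerate(label_set)}
    return np.array([label_map[x] for x in labels], dtype=np.int64)


def map_with_table(labels, label_set):
    """Array lookup table: a single vectorized fancy-indexing step."""
    # Assumption: label_set is sorted, non-empty, and non-negative.
    table = np.zeros(label_set[-1] + 1, dtype=np.int64)
    table[label_set] = np.arange(len(label_set))
    return table[labels]


labels = np.array([3, 7, 7, 9, 3], dtype=np.int64)
label_set = np.unique(labels)  # sorted unique labels: 3, 7, 9

# Both strategies produce the same dense indices.
assert np.array_equal(map_with_dict(labels, label_set),
                      map_with_table(labels, label_set))
```

The table costs memory proportional to the largest label, which is the trade-off for avoiding the Python loop.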

dreamerlin added 2 commits December 19, 2020 23:40

a little bit faster confusion matrix

626e073

add changelog

1d773d4

kennymckormick reviewed Dec 21, 2020

kennymckormick approved these changes Dec 21, 2020

dreamerlin requested a review from innerlee December 21, 2020 05:41

innerlee merged commit 777546f into open-mmlab:master Dec 21, 2020

dreamerlin deleted the fast_conf branch January 24, 2021 18:53
