From 586c1cbdd8cd69c4d24f4e8b37abb123cb41f8fe Mon Sep 17 00:00:00 2001
From: Darshan Krishnaswamy
Date: Wed, 17 Jun 2020 11:51:54 -0400
Subject: [PATCH] Update Broadcasting.md

---
 Python/Module3_IntroducingNumpy/Broadcasting.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Python/Module3_IntroducingNumpy/Broadcasting.md b/Python/Module3_IntroducingNumpy/Broadcasting.md
index 0f8c47d1..572c2bd7 100644
--- a/Python/Module3_IntroducingNumpy/Broadcasting.md
+++ b/Python/Module3_IntroducingNumpy/Broadcasting.md
@@ -564,7 +564,7 @@ def pairwise_dists_crude(x, y):

 Regrettably, there is a glaring issue with the vectorized computation that we just performed. Consider the largest sized array that is created in the for-loop computation, compared to that of this vectorized computation. The for-loop version need only create a shape-$(M, N)$ array, whereas the vectorized computation creates an intermediate array (i.e. `diffs`) of shape-$(M, N, D)$. This intermediate array is even created in the one-line version of the code. This will create a massive array if $D$ is a large number!

-Suppose, for instance, that you are finding the Euclidean between pairs of RGB images that each have a resolution of $32 \times 32$ (in order to see if the images resemble one another). Thus in this scenario, each image is comprised of $D = 32 \times 32 \times 3 = 3072$ numbers ($32^2$ pixels, and each pixel has 3 values: a red, blue, and green-color value). Computing all the distances between a stack of 5000 images with a stack of 100 images would form an intermediate array of shape-$(5000, 100, 3072)$. Even though this large array only exists temporarily, it would have to consume over 6GB of RAM! The for-loop version requires $\frac{1}{3027}$ as much memory (about 2MB).
+Suppose, for instance, that you are finding the Euclidean between pairs of RGB images that each have a resolution of $32 \times 32$ (in order to see if the images resemble one another). Thus in this scenario, each image is comprised of $D = 32 \times 32 \times 3 = 3072$ numbers ($32^2$ pixels, and each pixel has 3 values: a red, blue, and green-color value). Computing all the distances between a stack of 5000 images with a stack of 100 images would form an intermediate array of shape-$(5000, 100, 3072)$. Even though this large array only exists temporarily, it would have to consume over 6GB of RAM! The for-loop version requires $\frac{1}{3072}$ as much memory (about 2MB).

 Is our goose cooked? Are we doomed to pick between either slow for-loops, or a memory-inefficient use of vectorization? No! We can refactor the mathematical form of the Euclidean distance in order to avoid the creation of that bloated intermediate array.
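
Note on the corrected figure: the sketch below is a rough check of the memory numbers quoted in this hunk, followed by one possible form of the refactoring the last context line alludes to. It is not part of the patch. It assumes 32-bit floats, which is the element size consistent with "over 6GB" and "about 2MB" (the text does not state a dtype, and 64-bit floats would double both numbers). The names `array_nbytes` and `pairwise_dists` are illustrative, and the identity used, $\sum_k (x_k - y_k)^2 = \sum_k x_k^2 + \sum_k y_k^2 - 2\sum_k x_k y_k$, is the standard expansion rather than necessarily the exact form the document goes on to present.

```python
import numpy as np


def array_nbytes(shape, dtype=np.float32):
    """Bytes needed to store a dense array of the given shape and dtype.

    The float32 default is an assumption; the patched text names no dtype.
    """
    return int(np.prod(shape, dtype=np.int64)) * np.dtype(dtype).itemsize


def pairwise_dists(x, y):
    """Euclidean distances between the rows of x (M, D) and the rows of y (N, D).

    Expands sum((x - y)**2) as sum(x**2) + sum(y**2) - 2 * (x . y), so no
    shape-(M, N, D) array is ever formed. Temporaries have shape
    (M, D), (N, D), or (M, N).
    """
    sq_x = np.sum(x ** 2, axis=1)[:, np.newaxis]    # shape (M, 1)
    sq_y = np.sum(y ** 2, axis=1)                   # shape (N,)
    sq_dists = sq_x + sq_y - 2 * np.matmul(x, y.T)  # broadcasts to (M, N)
    # Round-off can leave tiny negative values; clamp them before the sqrt.
    return np.sqrt(np.maximum(sq_dists, 0))


big = array_nbytes((5000, 100, 3072))   # vectorized intermediate (diffs)
small = array_nbytes((5000, 100))       # for-loop result array
print(big, small, big // small)         # 6144000000 2000000 3072
```

The ratio of 3072 between the two footprints equals $D$, which agrees with the corrected fraction $\frac{1}{3072}$ in the added line.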