# Timing with time.time
This section shows the large speed difference between explicit loops and vectorization.
See https://stackoverflow.com/questions/35091979/why-is-vectorization-faster-in-general-than-loops for more about vectorization. Vectorization relies on special hardware (vector, or SIMD, units). Unlike the cores of a multicore CPU, each of which is a fully functional processor, vector processing units can perform only simple operations, and all units perform the same operation at the same time on a sequence of data values (a vector). This is not the same as traditional multi-threading.
* Avoid explicit loops as much as possible and favor vectorization. 
* Avoid doing calculations on lists or DataFrames; favor NumPy arrays.

In the following example, the vectorized version of the same calculation is roughly 500 times faster. Vectorization can be much faster on either a CPU or a GPU. In NumPy, if v is a vector, calling np.exp(v), np.log(v), np.abs(v), and so on directly is much faster than the equivalent for loop.


In [4]:
import time
import numpy as np

a = np.random.rand(1000000)
b = np.random.rand(1000000)

tic = time.time()
c = np.dot(a,b)
toc = time.time()
print(c)
print("Vectorization version: "+str(1000*(toc-tic))+"ms")

c = 0
tic = time.time()
for i in range(1000000):
    c += a[i]*b[i]
toc = time.time()
print(c)
print("For loop version: "+str(1000*(toc-tic))+"ms")

249974.5258138627
Vectorization version: 1.0030269622802734ms
249974.52581385977
For loop version: 571.9444751739502ms
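
The same pattern applies to element-wise functions such as np.exp, mentioned above. The following is a minimal sketch rather than a benchmark: the array size, the seed, and the names `vec_result` and `loop_result` are illustrative choices, and the timings will differ from machine to machine.

```python
import math
import time

import numpy as np

rng = np.random.default_rng(0)  # seeded generator so the run is repeatable
v = rng.random(100_000)

# Vectorized: a single call applies exp to every element in compiled code.
tic = time.time()
vec_result = np.exp(v)
toc = time.time()
print(f"np.exp(v): {1000 * (toc - tic):.3f} ms")

# Explicit loop: one Python-level call per element.
loop_result = np.empty_like(v)
tic = time.time()
for i in range(v.size):
    loop_result[i] = math.exp(v[i])
toc = time.time()
print(f"for loop:  {1000 * (toc - tic):.3f} ms")

# Both versions should agree up to floating-point rounding.
print(np.allclose(vec_result, loop_result))
```

Checking that the two results agree before comparing timings guards against accidentally timing two different computations.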


In [22]:
import time
import numpy as np

a = np.random.rand(1000000,1)
b = np.random.rand(1000000,1)
ra = np.transpose(a)
print(a.shape, ra.shape)

tic = time.time()
c = np.dot(np.transpose(a),b)
# This inner product is consistent with the mathematical definition.
# However, np.dot(a,np.transpose(b)) gives a matrix (the outer product, even though the function is called dot).

toc = time.time()
print(c)
print("Vectorization version: "+str(1000*(toc-tic))+"ms")
# Reshaping a and b into column vectors has NOT affected the performance here.
# (A reading of 0.0ms means the run was shorter than the resolution of time.time.)

d = 0
tic = time.time()
for i in range(1000000):
    d += a[i]*b[i]
toc = time.time()
print(d)
print("For loop version: "+str(1000*(toc-tic))+"ms")
# The for loop gets about three times slower after reshaping a and b into column vectors,
# because a[i] and b[i] are now one-element arrays rather than scalars.

(1000000, 1) (1, 1000000)
[[249982.58653862]]
Vectorization version: 0.0ms
[249982.58653862]
For loop version: 1857.5315475463867ms
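
To make the shape behaviour in the cell above easier to see, here is a small sketch with four-element column vectors (the values and the names `inner`, `outer`, and `flat` are chosen only for illustration). It shows the inner product, the outer product, and what indexing a column vector returns.

```python
import numpy as np

a = np.arange(1.0, 5.0).reshape(4, 1)  # column vector [1, 2, 3, 4], shape (4, 1)
b = np.arange(5.0, 9.0).reshape(4, 1)  # column vector [5, 6, 7, 8], shape (4, 1)

inner = np.dot(a.T, b)               # (1, 4) times (4, 1) gives shape (1, 1)
outer = np.dot(a, b.T)               # (4, 1) times (1, 4) gives shape (4, 4)
flat = np.dot(a.ravel(), b.ravel())  # two 1-D inputs give a plain scalar

print(inner.shape, outer.shape, np.ndim(flat))

# Indexing a column vector yields a one-element array, not a scalar,
# so each loop iteration in the cell above multiplies two small arrays.
print(a[0].shape)
```

With a million elements the outer product would be a 1000000 x 1000000 matrix, so it is worth checking shapes on a small example first.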


# Timing with timeit.timeit 
Python has a built-in module, timeit, for timing your code. It provides a simple way to time small bits of Python code. It has **both a command-line interface and a callable one**, and it avoids a number of common traps in measuring execution times.

In [11]:
import timeit

Let's use timeit to time various methods of creating the string '0-1-2-3-.....-99'

In [12]:
# Generator expression
timeit.timeit('"-".join(str(n) for n in range(100))', number=10000)

0.24557954196961873

In [13]:
# List comprehension
timeit.timeit('"-".join([str(n) for n in range(100)])', number=10000)

0.21757246250376738

In [17]:
# Map()
timeit.timeit('"-".join(map(str, range(100)))', number=10000)

0.17361728926744036

map() is noticeably faster here (about 0.17 s versus about 0.25 s for the generator expression)! This is good to know and worth keeping in mind.
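
A single timeit.timeit call gives one measurement, which can be noisy. One common approach, sketched below, is to use timeit.repeat and keep the minimum of several runs. The function names and the `number` and `repeat` values are illustrative assumptions, and passing callables instead of strings adds a small, equal function-call overhead to each candidate.

```python
import timeit


def with_generator():
    return "-".join(str(n) for n in range(100))


def with_list():
    return "-".join([str(n) for n in range(100)])


def with_map():
    return "-".join(map(str, range(100)))


# All three candidates should build exactly the same string.
assert with_generator() == with_list() == with_map()

number = 1000  # calls per timing run
best = {}
for func in (with_generator, with_list, with_map):
    runs = timeit.repeat(func, number=number, repeat=5)
    best[func.__name__] = min(runs) / number  # seconds per call, best of 5 runs
    print(f"{func.__name__}: {best[func.__name__] * 1e6:.2f} µs per call")
```

The minimum is used because slower runs are usually caused by other activity on the machine rather than by the code being measured.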

Now let's introduce IPython's magic function **%timeit**, which is available only in IPython environments such as Jupyter notebooks!

IPython's %timeit runs the same line of code a certain number of times (loops) and repeats that for several runs. Recent versions report the mean and standard deviation across the runs, as in the output below; older versions reported the fastest time (best of 3).

Now repeat the measurements above using the IPython magic!

In [8]:
%timeit "-".join(str(n) for n in range(100))

23.9 µs ± 75 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [9]:
%timeit "-".join([str(n) for n in range(100)])

20.7 µs ± 86.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [10]:
%timeit "-".join(map(str, range(100)))

16.2 µs ± 214 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


It's important to note that IPython limits the amount of *real time* it spends on its %timeit procedure. For instance, if running 100000 loops would take 10 minutes, IPython would automatically reduce the number of loops to something more reasonable, like 100 or 1000.
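
Outside IPython, the standard library offers a similar automatic choice of loop count through timeit.Timer.autorange (available since Python 3.6). The sketch below is not how %timeit is implemented; it only illustrates the same idea of letting the tool pick the number of loops.

```python
import timeit

timer = timeit.Timer('"-".join(map(str, range(100)))')

# autorange() increases the loop count until one run takes at least 0.2 s,
# then returns (number of loops, total time for that many loops).
loops, total = timer.autorange()
print(f"{loops} loops, {total / loops * 1e6:.2f} µs per loop")
```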

Now we can easily time lines of code both in and out of IPython. Check out the documentation for more information:
https://docs.python.org/3/library/timeit.html

# time.time vs timeit.timeit
Source: https://stackoverflow.com/questions/17579357/time-time-vs-timeit-timeit

timeit is more accurate, for three reasons:

- it repeats the tests many times to eliminate the influence of other tasks on your machine, such as disk flushing and OS scheduling.  
- it disables the garbage collector to prevent that process from skewing the results by scheduling a collection run at an inopportune moment.  
- it picks the most accurate timer for your OS: time.time or time.clock on Python 2, and time.perf_counter() on Python 3. See timeit.default_timer.
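
The second and third points can be checked directly on Python 3. The sketch below inspects the default timer and times one statement with the garbage collector disabled (the timeit default) and re-enabled through the setup string. The statement and the `number` and `repeat` values are arbitrary illustrative choices, and the gap between the two timings may be small or hard to see on a given machine.

```python
import time
import timeit

# On Python 3, timeit's default timer is time.perf_counter.
print(timeit.default_timer is time.perf_counter)

# Clock metadata: resolution in seconds and whether the clock is monotonic.
info = time.get_clock_info("perf_counter")
print(info.resolution, info.monotonic)

stmt = "[[] for _ in range(1000)]"  # allocates many small container objects

# Default behaviour: garbage collection is disabled during the timing.
gc_off = min(timeit.repeat(stmt, number=200, repeat=3))

# Re-enable garbage collection inside the timed run via the setup string.
gc_on = min(timeit.repeat(stmt, setup="import gc; gc.enable()", number=200, repeat=3))

print(f"gc disabled: {gc_off:.4f} s, gc enabled: {gc_on:.4f} s")
```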