# Benchmarks
Let's measure what actually holds up and what does not.

## 1) Python vs Python - For loops vs Map/Lambda vs List comprehensions

### Test 1: Create a short list

In [94]:
n = 10000

In [95]:
%%timeit
# For loop (appends x as-is; hex() is not applied here)
r = []
for x in range(n):
    r.append(x)

699 µs ± 7.48 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [96]:
%%timeit
# Map
list(map(hex, range(n)))

885 µs ± 16.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [97]:
%%timeit
# List comprehension
[hex(x) for x in range(n)]

1.35 ms ± 29.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [37]:
%%timeit
# Generator expression
(hex(x) for x in range(n))

567 ns ± 4.03 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


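The generator expression looks like the clear winner, but don't get excited: it only creates a lazy generator object and never builds the list, so it is left out of the comparison. Also note that the for loop cell appends `x` as-is, while the map and list comprehension cells call `hex` on every element, so the for loop does less work per element in this test.

Timings are listed as (for loop, map, list comprehension (CL)).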

**Winner: For**
* n=10    -> For (0.974 µs, 1.3 µs, 1.67 µs)
* n=1000  -> For (69 µs, 85 µs, 130 µs)
* n=10000 -> For (700 µs, 853 µs, 1270 µs)
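
To see why the generator expression was left out of the comparison, here is a minimal sketch of what it does and does not do. The variable names are illustrative only; the point is that no `hex` call runs until the generator is consumed, and that it can be consumed only once.

```python
import sys

n = 10000

gen = (hex(x) for x in range(n))  # builds a lazy object; no hex() call has run yet
lst = [hex(x) for x in range(n)]  # runs all n hex() calls immediately

# The generator object stays small no matter how large n is.
gen_size = sys.getsizeof(gen)
lst_size = sys.getsizeof(lst)
print(f"generator: {gen_size} bytes, list: {lst_size} bytes")

# The real work happens only when the generator is consumed.
materialized = list(gen)

# A generator can be consumed once; a second pass yields nothing.
leftover = list(gen)
```

A fairer generator timing would therefore wrap it in `list(...)` inside the timed cell, which should put it in the same range as the other list-building variants rather than at nanoseconds.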

### EC2 (C5, 144 GB RAM)
**Winner: For**
* n=10    -> For (0.63 µs, 0.73 µs, 0.97 µs)
* n=1000  -> For (48 µs, 54 µs, 71 µs)
* n=10000 -> For (491 µs, 614 µs, 837 µs)
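
In the Test 1 cells, the for loop appends `x` directly while the map and list comprehension versions call `hex`. The sketch below shows one way to rerun the test like-for-like outside Jupyter, using the standard `timeit` module. The function names and the `best_time` helper are hypothetical, and the `number`/`repeat` values are arbitrary choices, not the settings `%%timeit` picked above. No results are claimed here; the numbers depend on the machine.

```python
import timeit

n = 10000

def with_for():
    r = []
    for x in range(n):
        r.append(hex(x))  # same per-element work as the other two variants
    return r

def with_map():
    return list(map(hex, range(n)))

def with_comprehension():
    return [hex(x) for x in range(n)]

def best_time(func, number=20, repeat=3):
    """Return the best observed time per call, in seconds."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number

timings = {f.__name__: best_time(f) for f in (with_for, with_map, with_comprehension)}
for name, seconds in timings.items():
    print(f"{name:20s} {seconds * 1e6:10.1f} us per call")
```

Taking the minimum over repeats is a common convention for micro-benchmarks because it is least affected by background noise; `%%timeit` reports mean ± std. dev. instead, so the two are not directly comparable.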

### Test 2: Simple transformation

In [68]:
n = 10000

In [69]:
%%timeit
# For loop
result = []
for x in range(n):
    result.append(x+2)

962 µs ± 7.78 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [70]:
%%timeit
# Map Lambda
list(map(lambda x: x+2, range(n)))

1.03 ms ± 6.62 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [71]:
%%timeit
# List comprehension
[x+2 for x in range(n)]

583 µs ± 5.69 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


Timings are listed as (for loop, map/lambda, list comprehension (CL)).

**Winner: CL** (by far!)
* n=10    -> CL (1.13 µs, 1.4 µs, 0.842 µs)
* n=1000  -> CL (95.5 µs, 98.3 µs, 55.2 µs)
* n=10000 -> CL (962 µs, 1.03 ms, 0.58 ms)

### EC2 (C5, 144 GB RAM)

**Winner: CL** (by far!)
* n=10    -> CL (0.796 µs, 1.08 µs, 0.568 µs)
* n=1000  -> CL (65 µs, 77 µs, 40 µs)
* n=10000 -> CL (667 µs, 806 µs, 418 µs)
    
Notice that, **in this case**, the for loop is faster than the map/lambda version.
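
One plausible explanation, offered as an assumption rather than something the measurements above establish, is that `map` with a `lambda` pays for a Python-level function call on every element, while the comprehension evaluates `x+2` inline. The sketch below adds a `map` variant that uses the built-in `operator.add` instead of a lambda so the three can be timed side by side. The function names are hypothetical and no outcome is claimed.

```python
import operator
import timeit
from itertools import repeat

n = 10000

def map_lambda():
    # Calls a Python-level lambda once per element
    return list(map(lambda x: x + 2, range(n)))

def map_operator():
    # operator.add is a built-in; repeat(2) supplies the constant second operand,
    # and map stops when the shorter iterable (range(n)) is exhausted
    return list(map(operator.add, range(n), repeat(2)))

def list_comprehension():
    return [x + 2 for x in range(n)]

for func in (map_lambda, map_operator, list_comprehension):
    best = min(timeit.repeat(func, number=50, repeat=3)) / 50
    print(f"{func.__name__:20s} {best * 1e6:8.1f} us per call")
```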

### Test 3: Quadratic function

In [128]:
n = 1000
def fun(x):
    return -5*x**2+30*x+100

In [129]:
%%timeit
# For loop
result = []
for x in range(n):
    result.append(-5*x**2+30*x+100)

452 µs ± 3.37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [130]:
%%timeit
# Map Lambda
list(map(lambda x: -5*x**2+30*x+100, range(n)))

461 µs ± 6.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [131]:
%%timeit
# List comprehension
[-5*x**2+30*x+100 for x in range(n)]

416 µs ± 19.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


*The winner here is CL*, about 8% faster than the for loop (416 µs vs 452 µs).

Note: all three timings were faster with the expression written inline than with a call to `fun(x)`.
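
The inline-versus-`fun(x)` observation can be checked with a sketch like the following, which times the same list comprehension both ways and also passes `fun` straight to `map`. Only `fun` and `n` come from the cells above; the other function names and the `number`/`repeat` values are illustrative assumptions.

```python
import timeit

n = 1000

def fun(x):
    return -5*x**2 + 30*x + 100

def cl_with_call():
    # One extra Python function call per element
    return [fun(x) for x in range(n)]

def cl_inline():
    # Same expression evaluated directly inside the comprehension
    return [-5*x**2 + 30*x + 100 for x in range(n)]

def map_with_fun():
    # map accepts fun directly, so no lambda wrapper is needed
    return list(map(fun, range(n)))

for func in (cl_with_call, cl_inline, map_with_fun):
    best = min(timeit.repeat(func, number=200, repeat=3)) / 200
    print(f"{func.__name__:15s} {best * 1e6:8.1f} us per call")
```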

### Appendix
#### Resources

In [134]:
# Memory in bytes (on Linux: !free -h) -> 17,179,869,184 (16 GiB)
!sysctl -a |grep memsize

hw.memsize: 17179869184


In [135]:
!sysctl -a |grep ncpu

hw.ncpu: 8
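
The `sysctl` calls above are specific to macOS. As a rough cross-platform alternative, the same facts can be collected from Python's standard library. This is a sketch under assumptions: `machine_info` is a hypothetical helper, and the `SC_PHYS_PAGES` lookup is available only on some Unix-like systems, so the memory field falls back to `None` elsewhere.

```python
import os
import platform

def machine_info():
    """Collect basic facts about the machine running the benchmarks."""
    info = {
        "python": platform.python_version(),
        "system": platform.system(),
        "cpu_count": os.cpu_count(),  # logical CPUs; may be None if undetermined
        "memory_bytes": None,
    }
    # os.sysconf is Unix-only, and not every platform exposes these names.
    names = getattr(os, "sysconf_names", {})
    if "SC_PHYS_PAGES" in names and "SC_PAGE_SIZE" in names:
        try:
            info["memory_bytes"] = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
        except (ValueError, OSError):
            pass  # leave memory_bytes as None when the query is unsupported
    return info

info = machine_info()
print(info)
```

Recording this next to each result set makes it easier to tell which numbers came from the local machine and which from the EC2 instance.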


#### Versions and libraries used

In [2]:
!pyspark --version

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.0
      /_/
                        
Using Scala version 2.12.10, OpenJDK 64-Bit Server VM, 13.0.1
Branch HEAD
Compiled by user ubuntu on 2020-06-06T11:32:25Z
Revision 3fdfce3120f307147244e5eaf46d61419a723d50
Url https://gitbox.apache.org/repos/asf/spark.git
Type --help for more information.


In [5]:
!python3 -V; java -version

Python 3.7.5
openjdk version "13.0.1" 2019-10-15
OpenJDK Runtime Environment (build 13.0.1+9)
OpenJDK 64-Bit Server VM (build 13.0.1+9, mixed mode, sharing)


In [6]:
!pip list

Package            Version
------------------ -------
appnope            0.1.0  
attrs              19.3.0 
backcall           0.2.0  
bleach             3.1.5  
decorator          4.4.2  
defusedxml         0.6.0  
entrypoints        0.3    
findspark          1.4.2  
importlib-metadata 1.7.0  
ipykernel          5.3.0  
ipython            7.16.1 
ipython-genutils   0.2.0  
ipywidgets         7.5.1  
jedi               0.17.1 
Jinja2             2.11.2 
jsonschema         3.2.0  
jupyter            1.0.0  
jupyter-client     6.1.5  
jupyter-console    6.1.0  
jupyter-core       4.6.3  
MarkupSafe         1.1.1  
mistune            0.8.4  
nbconvert          5.6.1  
nbformat           5.0.7  
notebook           6.0.3  
packaging          20.4   
pandocfilters      1.4.2  
parso              0.7.0  
pexpect            4.8.0  
pickleshare        0.7.5  
pip                20.0.2 
prometheus-client  0.8.0  
prompt-toolkit     3.0.5  
ptyprocess         0.6.0  
py4j               0.10.9 
P