## Python Speed Optimisation - quick test

### Some random experimentation with numpy arrays vs loops


Goal: sum the elements of a data structure in several different ways, to see which methods and which data structures are fastest.

The 3 data structures I used were:

- Lists
- Tuples
- Arrays (numpy)

The different methods of summing were:

- Summing with the built-in sum available for each data structure:
    - For lists, this is Python's built-in: sum(list)
    - For tuples, this is Python's built-in: sum(tuple)
    - For arrays, this is numpy's own method: array.sum()
- Summing by manually looping through each element of the data structure.




### Summary of Results:

I haven't presented the data carefully or neatly yet, my code is a bit rough and scrappy, and I should average multiple runs to get more reliable results, but the ranking from fastest to slowest seems to be:


1. Numpy's built-in array.sum() is the fastest          (~   25 ms)
2. Python's sum(tuple) is the next fastest              (~  100 ms)
3. Python's sum(list) is the 3rd fastest                (~  145 ms)
4. Looping through a list is the 4th fastest            (~  500 ms)
5. Looping through a tuple is about the same as a list  (~  500 ms)
6. Looping through an np.array is the slowest           (~ 2800 ms)
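
Since single `%%time` runs are noisy, here is a minimal sketch of how the averaging mentioned above could be done with the standard-library `timeit` module. The helper name `best_and_mean`, the variable names, and the reduced size of 100,000 elements are my own assumptions for illustration; this is not the code that produced the numbers in the ranking.

```python
import timeit
import numpy as np

# Assumed size: much smaller than the 10 million elements used in the notebook,
# so that this sketch runs quickly.
n = 100_000
values_array = np.linspace(0.1, 10000.0, n)
values_list = values_array.tolist()


def best_and_mean(func, repeat=5, number=3):
    """Return (best, mean) seconds per call over `repeat` batches of `number` calls."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [t / number for t in timings]
    return min(per_call), sum(per_call) / len(per_call)


results = {
    "array.sum()": best_and_mean(values_array.sum),
    "sum(list)": best_and_mean(lambda: sum(values_list)),
}

for label, (best, mean) in results.items():
    print(f"{label:12s} best {best * 1e3:.3f} ms, mean {mean * 1e3:.3f} ms")
```

The best time is usually the more stable figure, because slower runs tend to reflect interference from other processes rather than the code itself.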


### Things I should probably check out later:

- What about summing up integers? I only summed floats in these tests.
- What about pandas dataframes? 
- I understand why numpy arrays are faster than lists when using each of their respective sum functions (vectorisation of code), but why is numpy slower when using a standard loop?
- Why is sum(tuple) faster than sum(list), but looping through a tuple is the same speed as looping through a list?
- What happens when there are non-numeric values (anything that is not an int or a float)?
- How do the times all scale as you change the size of the list/tuple/array?
- Cython, Numba, GPU programming
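
On the question of why a plain loop over a numpy array is slower: one likely explanation (my understanding, not something measured in this notebook) is that iterating over an array has to wrap every element in a new numpy scalar object, while a list already holds ready-made Python floats. The sketch below only shows the difference in element types; `loop_sum` and the variable names are hypothetical and the array is kept tiny.

```python
import numpy as np

values_array = np.linspace(0.1, 10.0, 100)
values_list = values_array.tolist()  # converts the elements to built-in Python floats

# Look at what a for-loop actually receives from each container.
first_from_array = next(iter(values_array))
first_from_list = next(iter(values_list))
print(type(first_from_array))  # a numpy scalar type
print(type(first_from_list))   # the built-in float type


def loop_sum(values):
    """Sum by iterating element by element, like method1 in this notebook."""
    total = 0.0
    for value in values:
        total += value
    return total


total_array = loop_sum(values_array)
total_list = loop_sum(values_list)
print(total_array, total_list)
```

If that explanation holds, calling `.tolist()` once before an unavoidable Python loop might recover list-like speed, though that would need timing to confirm.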



In [113]:
import numpy as np
import pandas as pd

# first test array = a numpy array
test_array1 = np.linspace(0.1, 10000.0, 10000000)

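# second test array = a list. A list in Python is kind of like an array of pointers...
# Note: this list runs from 0.1 to 1,000,000 (test_array1 stops at 10,000), so its sum differs;
# both still hold 10 million floats.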
test_array2 = [0.1*a for a in range(1, 10000001)]

# 3rd test array = a tuple.
test_array3 = tuple(test_array2)

# to test what the array looks like
print(test_array1)

# just to check what the list looks like
print(test_array2[0])
print(test_array2[-1])


# to check what the tuple looks like
print(test_array3[0])
print(test_array3[4])
print(test_array3[-1])

[  1.00000000e-01   1.00999990e-01   1.01999980e-01 ...,   9.99999800e+03
   9.99999900e+03   1.00000000e+04]
0.1
1000000.0
0.1
0.5
1000000.0
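
The printed values show that the list and tuple end at 1000000.0 while the numpy array ends at 10000.0, which is why the sums further down differ (5.0000005e+12 versus 50000500000.0). A minimal sketch of one way to give all three containers identical values is to derive the list and tuple from the array. The variable names and the small size are assumptions for illustration only.

```python
import numpy as np

# Assumed size for illustration; the notebook used 10 million elements.
n = 1_000
values_array = np.linspace(0.1, 10000.0, n)

# tolist() converts the elements to built-in Python floats,
# so the list and tuple hold the same numbers as the array.
values_list = values_array.tolist()
values_tuple = tuple(values_list)

print(values_list[0], values_list[-1])
print(values_tuple[0], values_tuple[-1])
```

With identical contents, the printed sums from each method can double as a sanity check that all methods compute the same thing.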


In [51]:
def method1(input_array):
    """
    Loops through the array and sums up all the values
    """
    sum1 = 0
    for i in input_array:
        sum1 += i
    return sum1


def method2(input_array):
    """ 
    Use Numpy's own sum() function
    """
    sum2 = input_array.sum()
    return sum2


50000500000.0
Wall time: 3.38 s


In [49]:
%%time
# test_array3 is a tuple, which has no .sum() method; the output below matches the numpy array
sum1 = test_array1.sum()
print(sum1)

50000500000.0
Wall time: 23 ms


In [96]:
%%time
a = method1(test_array2)
print(a)

5.0000005e+12
Wall time: 504 ms


In [100]:
%%time
b = method1(test_array1)
print(b)

50000500000.0
Wall time: 2.82 s


In [123]:
%%time

c = test_array1.sum()
print(c)

50000500000.0
Wall time: 22 ms


In [92]:
%%time
d = sum(test_array2)
print(d)

5.0000005e+12
Wall time: 149 ms


In [112]:
%%time
e = sum(test_array3)
print(e)

5.0000005e+12
Wall time: 116 ms


In [119]:
%%time
f = method1(test_array3)
print(f)

5.0000005e+12
Wall time: 491 ms
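
To look at how the times scale with size, the six measurements above could be gathered into one loop over several sizes. This is only a sketch under stated assumptions: `loop_sum` and `best_time` are hypothetical helpers, the sizes are far below the 10 million elements used above, and no timings from it are reported here.

```python
import time
import numpy as np


def loop_sum(values):
    """Sum by iterating element by element, like method1 above."""
    total = 0.0
    for value in values:
        total += value
    return total


def best_time(func, repeats=3):
    """Best wall-clock time in seconds over a few repeats."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


sizes = [1_000, 10_000, 100_000]  # assumed sizes, chosen to keep the run short
rows = []
for n in sizes:
    arr = np.linspace(0.1, 10000.0, n)
    lst = arr.tolist()
    tup = tuple(lst)
    rows.append({
        "n": n,
        "array.sum()": best_time(arr.sum),
        "sum(tuple)": best_time(lambda: sum(tup)),
        "sum(list)": best_time(lambda: sum(lst)),
        "loop list": best_time(lambda: loop_sum(lst)),
        "loop tuple": best_time(lambda: loop_sum(tup)),
        "loop array": best_time(lambda: loop_sum(arr)),
    })

for row in rows:
    timings = ", ".join(f"{k}: {v * 1e3:.3f} ms" for k, v in row.items() if k != "n")
    print(f"n={row['n']:>7}: {timings}")
```

Building the list and tuple from the same array keeps the data identical across methods, so only the container and the summing method vary.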
