Simply put, an **algorithm** is a program or function that solves a specific problem. For example, a sorting algorithm takes a list of values and outputs the same values rearranged in increasing (or decreasing) order.

Ultimately, our goal is not to measure the time of a specific execution of an algorithm, but rather to analyze the algorithm and predict how the execution time will evolve as data grows larger.

Intuitively, the more data an algorithm needs to process, the more time it will take to run. What we are interested in is building a model that tells us by how much the execution time grows as we increase the amount of data. We call these models the **time complexity of an algorithm**. By analyzing the time complexity of an algorithm, we want to be able to answer questions like:

*If we double the data, do we double the execution time, do we quadruple it, or something else entirely?*



In [1]:
import time

print(time.time())  # total number of seconds that have passed from January 1, 1970, until now

In [2]:
print(time.time())

1622674591.2581313


The random.randint() function, given two integers a and b, returns a random integer between a and b (inclusive).

In [3]:
import random
print(random.randint(1, 10))
print(random.randint(1, 10))
print(random.randint(1, 10))
print(random.randint(1, 10))

9
6
5
9


Using list comprehensions, we can use the random.randint() function to generate a random list of length 500 with values, say, from -1,000 to 1,000, as follows:

In [4]:
values = [random.randint(-1000, 1000) for _ in range(500)]

Notice that we used the _ notation in the list comprehension above. This is a naming convention for an iteration variable that we never use. It gives exactly the same result as any other variable name, but saves us from inventing a name for something we will not use.
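As a quick check of this claim, the sketch below seeds the random generator twice and builds the same list once with `_` and once with a named variable. The seed value 42 and the name `i` are arbitrary choices made for this illustration.

```python
import random

# Seed the generator so both comprehensions draw the same random sequence.
random.seed(42)
with_underscore = [random.randint(-1000, 1000) for _ in range(5)]

random.seed(42)
with_name = [random.randint(-1000, 1000) for i in range(5)]

# `_` is an ordinary variable name used by convention, so the lists match.
print(with_underscore == with_name)  # prints True
```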

In [5]:
import time
import random

In [6]:
def maximum(values):
    answer = None
    for value in values:
        if answer is None or answer < value:
            answer = value
    return answer

In [7]:
def gen_input(length):
    return [random.randint(-1000, 1000) for _ in range(length)]

In [8]:
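# Time maximum() on random inputs of length 1 to 500
times = []
for length in range(1, 501):
    values = gen_input(length)
    start = time.time()        # timestamp just before the call
    maximum(values)
    end = time.time()          # timestamp just after the call
    runtime = end - start      # elapsed time in seconds
    times.append(runtime)

print(times)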

[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0009968280792236328, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0005252361297607422, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0009970664978027344, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0009980201721191406, 0.
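Most of the measurements above are 0.0, likely because the clock behind time.time() can tick too coarsely on some platforms to register such short runs. A minimal sketch of one possible workaround is to use time.perf_counter(), which is intended for timing short durations, and to keep the best of several repetitions. The maximum() and gen_input() functions are repeated here so the block runs on its own; the helper name `measure` and the repeat count of 5 are illustrative choices, not part of the original exercise.

```python
import random
import time

def maximum(values):
    answer = None
    for value in values:
        if answer is None or answer < value:
            answer = value
    return answer

def gen_input(length):
    return [random.randint(-1000, 1000) for _ in range(length)]

def measure(func, values, repeats=5):
    """Return the smallest elapsed time (in seconds) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(values)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best

times = [measure(maximum, gen_input(length)) for length in range(1, 501)]
print(times[:5])
```

Taking the minimum over repeats is one common way to dampen interference from other processes; it does not remove noise entirely.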

Above, we evaluated the execution time of the maximum() function on input lists ranging from length 1 to length 500. The printed results are hard to read, so let's plot these times to visualize how they grow:

<Img src = "https://github.com/rhnyewale/Data-Engineering/blob/main/Images/tc_1.JPG?raw=true">
    
We can see spikes in the execution times at some points. These are due to external factors such as CPU load and memory management, among others. Although the individual measurements are imprecise, we can see a line forming underneath them, which roughly describes how the execution time increases. This is shown in red in the following plot:
    
<Img src = "https://github.com/rhnyewale/Data-Engineering/blob/main/Images/tc_2.JPG?raw=true">

We can see that as the data increases, so does the execution time. This is not surprising since there is more data to process. However, the plot tells us more: it gives us insight into the rate at which the time increases. The red line is a straight line, which means that the time grows roughly in proportion to the data.

This is good news because it means that the execution time grows at the same rate as the data. Doubling the amount of data will roughly double the time needed to process it.
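Because wall-clock measurements are noisy, one way to check the doubling claim is to count work instead of time. The sketch below uses a hypothetical variant of maximum(), named `maximum_with_count` here, that also reports how many loop iterations it performed. Treating the iteration count as a stand-in for execution time is an assumption of this illustration.

```python
def maximum_with_count(values):
    """Return (maximum, number of loop iterations performed)."""
    answer = None
    iterations = 0
    for value in values:
        iterations += 1
        if answer is None or answer < value:
            answer = value
    return answer, iterations

small = list(range(1000))
large = list(range(2000))  # twice as much data

_, small_count = maximum_with_count(small)
_, large_count = maximum_with_count(large)

print(small_count, large_count, large_count / small_count)  # 1000 2000 2.0
```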

The purpose of this lesson is to learn how to look at an algorithm and derive a mathematical expression for the red line. As mentioned before, we call such an expression the **time complexity of the algorithm**:
    
<Img src = "https://github.com/rhnyewale/Data-Engineering/blob/main/Images/tc_3.JPG?raw=true">
    
With the time complexity model, we are able to plug in a list length and get an idea of the execution time of the algorithm for that input length without needing to actually run the code. In general, the time complexity can have several behaviors; it does not always grow as a straight line. We will learn several types of growth throughout this course. By the end of this course, you'll be able to analyze an algorithm and provide a time complexity model:
    
<Img src = "https://github.com/rhnyewale/Data-Engineering/blob/main/Images/tc_4.JPG?raw=true">
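One hedged way to make "plugging in a list length" concrete is to fit a straight line through (length, time) pairs with ordinary least squares and then evaluate it. This is a different route from the algorithm analysis developed in this course, shown only as an illustration. The function names `fit_line` and `predict_time`, and the synthetic measurements, are made up for this example.

```python
def fit_line(xs, ys):
    """Least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_time(length, slope, intercept):
    """Evaluate the fitted linear model for a given input length."""
    return slope * length + intercept

# Synthetic data: 2 microseconds per element plus 5 microseconds of overhead.
lengths = [100, 200, 300, 400, 500]
times = [2e-6 * n + 5e-6 for n in lengths]

slope, intercept = fit_line(lengths, times)
print(predict_time(1000, slope, intercept))
```

With real measurements the fit would only approximate the underlying trend, since the spikes discussed earlier pull the line away from it.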
    

In [10]:
def sum_values(values):
    total = 0            # c1, 1 time
    for value in values: # c2, N times
        total += value   # c3, N times
    return total         # c4, 1 time

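Here, c1 through c4 denote the (constant) time taken by one execution of each line, as marked in the comments, and N is the length of values. Multiplying the execution time of each line by the number of times it is executed and adding the results gives the total execution time: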

c1 + c2 × N + c3 × N + c4

(c2 + c3) × N + (c1 + c4)

We can simplify a bit further by renaming c2 + c3 as another constant — let's say a — and c1 + c4 as b. By doing so, we obtain a cleaner expression for the execution time in terms of the size of the input (N):

a × N + b
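To see that the line-by-line sum and the simplified form agree, the sketch below evaluates both for some made-up per-line costs. The values of c1 through c4 are hypothetical, chosen only for illustration, and the function names are not from the original text.

```python
# Hypothetical per-line costs in arbitrary time units.
c1, c2, c3, c4 = 3, 2, 5, 1

def line_by_line_cost(n):
    """Sum each line's cost times the number of times it runs."""
    return c1 + c2 * n + c3 * n + c4

a = c2 + c3  # cost paid once per element
b = c1 + c4  # cost paid once per call

def linear_model(n):
    """Simplified form a * N + b."""
    return a * n + b

for n in (0, 1, 10, 1000):
    print(n, line_by_line_cost(n), linear_model(n))
```

Both columns printed for each n are equal, since the second expression is only a regrouping of the first.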

The following figure shows the plot of aN + b for the different a, b value pairs:


<Img src ="https://github.com/rhnyewale/Data-Engineering/blob/main/Images/tc_5.JPG?raw=true">

As you can see, regardless of the values of a and b, the function aN + b is a straight line. We call an algorithm whose time complexity is a straight line a **linear time algorithm**. These algorithms have the property that the execution time grows proportionally to the data:

    
    
