# ADS Lab 1 - Investigating Data Structures

In the lecture we talked about the Python list data structure as well as dictionaries, hash tables and binary search trees.  Here we will be looking at simple operations on (long) lists and investigating how algorithmic choices impact run time.

In order to do this we will be creating and processing lists of random numbers.  We can easily generate random numbers in Python using the *random* library.

In [None]:
import random

startno=1
endno=100

#random.randint(a,b) returns a (pseudo-)random integer between a and b inclusive
a=random.randint(startno,endno)
print(a)

### Exercise 1
Remember that if you have a list *list1*, you can add another *item* to the end by calling *list1.append(item)*.

Write a function *make_rand_list1(n)* which returns a list of *n* random integers (between 1 and 100).  This function should use the **append** method for lists as described above.
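For reference, here is one possible sketch of *make_rand_list1(n)*. It hard-codes the bounds 1 and 100 from the exercise and builds the list one item at a time with **append**. Your own solution may well look different.

```python
import random

def make_rand_list1(n):
    """Return a list of n random integers between 1 and 100 inclusive, built with append."""
    list1 = []
    for _ in range(n):
        # random.randint(1, 100) can return both 1 and 100
        list1.append(random.randint(1, 100))
    return list1

print(make_rand_list1(5))
```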

## Keyword Arguments to Functions
There are two ways of supplying arguments to functions in Python: by position and by name.

In [None]:
def myfunc(a,b):
    '''
    myfunc() has 2 positional arguments.
    They have to be supplied each time the function is called.
    They have to be supplied in the correct order.
    '''
    
    x=0
    for i in range(100000):
        x+=a*i/b*(i+1) 
    
    return x


In [None]:
myfunc(2,10)

In [None]:
myfunc(10,2)

In [None]:
myfunc(20) #raises TypeError: myfunc() missing 1 required positional argument: 'b'

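However, sometimes we want to make arguments optional or give them default values. We can do this by giving parameters default values in the function definition; callers can then supply them by name as keyword arguments, by position, or not at all.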

In [None]:
def myfunc2(numerator=10,denominator=2):
    '''
    If no values are supplied at run-time then numerator will be 10 and denominator will be 2
    '''
    
    x=0
    for i in range(100000):
        x+=numerator*i/denominator*(i+1) 
    
    return x

In [None]:
myfunc2()

In [None]:
myfunc2(denominator=10,numerator=2) #we can give named arguments in any order provided we use the names

In [None]:
myfunc2(30,2) #no names given, so the arguments are matched by position: numerator=30, denominator=2

In [None]:
myfunc2(30) #numerator=30, denominator keeps its default value of 2

### Exercise 2
Modify your code from Exercise 1 so that it takes two optional keyword arguments defining the start and end of the range from which the random numbers are selected.
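A minimal sketch of one way to do this, assuming the keyword arguments are called *start* and *end* (names chosen here for illustration) and default to the original bounds of 1 and 100:

```python
import random

def make_rand_list1(n, start=1, end=100):
    """Return n random integers between start and end inclusive, built with append.

    start and end are optional keyword arguments; if omitted they default to 1 and 100.
    """
    list1 = []
    for _ in range(n):
        list1.append(random.randint(start, end))
    return list1

print(make_rand_list1(5))                     # defaults: numbers from 1 to 100
print(make_rand_list1(5, start=-10, end=10))  # custom range given by name
print(make_rand_list1(5, 1, 6))               # same parameters given by position
```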

### Exercise 3
You may have thought of ways of creating this list that do not use the append method. For example, you could:
* create a one-item list containing a new random number *r* and put it at the front using list concatenation: [r]+list1
* create a one-item list containing a new random number *r* and put it at the end using list concatenation: list1+[r]
* use a list comprehension together with the *range()* function

Can you create Python functions for each of these algorithms (and any others you can think of)?

Which of your functions do you think is the best?  Which do you think will run the fastest?
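As a starting point, here is a sketch of the three alternatives listed above. The function names are hypothetical, and each takes the same optional *start* and *end* keyword arguments as in Exercise 2.

```python
import random

def make_rand_list_prepend(n, start=1, end=100):
    """Put each new number at the front: list1 = [r] + list1."""
    list1 = []
    for _ in range(n):
        r = random.randint(start, end)
        list1 = [r] + list1  # concatenation builds a brand-new list each time
    return list1

def make_rand_list_concat(n, start=1, end=100):
    """Put each new number at the end: list1 = list1 + [r]."""
    list1 = []
    for _ in range(n):
        r = random.randint(start, end)
        list1 = list1 + [r]  # also builds a brand-new list each time
    return list1

def make_rand_list_comp(n, start=1, end=100):
    """Build the whole list in a single list comprehension over range(n)."""
    return [random.randint(start, end) for _ in range(n)]

for f in (make_rand_list_prepend, make_rand_list_concat, make_rand_list_comp):
    print(f.__name__, f(5))
```

Be careful with one subtle variation: *list1 += [r]* is not the same as *list1 = list1 + [r]*. For lists, *+=* extends the existing list in place, whereas *list1 + [r]* creates a new list, so the two may behave quite differently when you time them.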

## Timing Code
We can measure how long a piece of code takes to run in the notebook kernel using the *time* library.

In [None]:
import time

starttime=time.perf_counter() #start the stopwatch (perf_counter is a high-resolution clock intended for timing code)

#now put the code you want to time
x=0
for i in range(10000):
    x+=2**i
#code being timed completed

endtime=time.perf_counter() #stop the stopwatch
timetaken=endtime-starttime
print("Time taken was {}s".format(timetaken)) #.format() can be called on a String for nice formatting (alternative to string concatenation)

If you run the cell above repeatedly, you will notice that the times vary (a lot). This is mainly because your computer is doing other things at the same time. Results will also vary greatly from one machine to another. However, by performing a large number of repetitions and taking the average, we can get an idea of how fast a snippet of code runs on the current machine.

Below is the code for a timeit() function. This is a higher-order function: its first argument is another function, *somefunc*. That function may have its own positional arguments (\*args) and its own keyword arguments (\*\*kwargs), and these must be passed on to *somefunc* when it is called from within timeit(). (Python's standard library also has a module called *timeit*; the function here is a separate, home-made version.)
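To see how \*args and \*\*kwargs are collected and forwarded, here is a minimal sketch using a hypothetical wrapper *call_and_report()* and a hypothetical function *describe()*. Like timeit(), the wrapper gathers any extra positional arguments into a tuple and any extra keyword arguments into a dictionary, then unpacks them again when it calls the function it was given.

```python
def call_and_report(somefunc, *args, **kwargs):
    """Hypothetical wrapper: show what was collected, then forward it to somefunc."""
    # args is a tuple of the extra positional arguments,
    # kwargs is a dict of the extra keyword arguments
    print("args =", args, "kwargs =", kwargs)
    # * and ** unpack them back into an ordinary function call
    return somefunc(*args, **kwargs)

def describe(a, b, scale=1):
    return (a + b) * scale

result = call_and_report(describe, 2, 3, scale=10)
print(result)  # (2 + 3) * 10 = 50
```

Note that in timeit() the parameter *repeats* comes after \*args, which makes it keyword-only: it can only be set by name (e.g. *repeats=10*), so it is never passed on to *somefunc*.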

In [None]:
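import time
import numpy as np

def timeit(somefunc,*args,repeats=100,**kwargs):
    '''
    Call somefunc(*args,**kwargs) `repeats` times and return the
    mean and standard deviation of the run times (in seconds).
    '''
    times=[]
    for i in range(repeats):
        starttime=time.perf_counter()
        somefunc(*args,**kwargs) #only the time taken matters, not the return value
        endtime=time.perf_counter()
        timetaken=endtime-starttime
        times.append(timetaken)
    
    mean=np.mean(times)
    stdev=np.std(times) #population standard deviation (numpy's default, ddof=0)

    return (mean,stdev)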

In [None]:
timeit(myfunc,2,10)

You can use the *timeit* function above to investigate how the time taken relates to the size of the input.

### Exercise 4
Write some code to time the running of make_rand_list1(n), where n is a multiple of 10 between 0 and 1000. Store the values of n in a list called *xs* and the corresponding mean times (returned by *timeit()*) in a list called *ys*.
(If you are on a very fast computer, you can add one or two zeros to the upper limit of 1000 to make it take a bit longer.)
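Here is a minimal sketch of Exercise 4, assuming the keyword-argument version of make_rand_list1 from Exercise 2. To keep the cell self-contained it redefines both make_rand_list1 and timeit(). As an assumption of this sketch, it uses the standard-library *statistics* module instead of numpy (statistics.pstdev gives the same population standard deviation as np.std) and time.perf_counter() for the stopwatch.

```python
import random
import statistics
import time

def make_rand_list1(n, start=1, end=100):
    """List of n random integers built with append (as in Exercises 1 and 2)."""
    list1 = []
    for _ in range(n):
        list1.append(random.randint(start, end))
    return list1

def timeit(somefunc, *args, repeats=100, **kwargs):
    """Mean and population standard deviation of somefunc's run time over repeats calls."""
    times = []
    for _ in range(repeats):
        starttime = time.perf_counter()
        somefunc(*args, **kwargs)
        times.append(time.perf_counter() - starttime)
    return (statistics.mean(times), statistics.pstdev(times))

xs = []
ys = []
for n in range(0, 1001, 10):  # n = 0, 10, 20, ..., 1000
    mean, stdev = timeit(make_rand_list1, n)
    xs.append(n)
    ys.append(mean)

print(xs[:3], ys[:3])
```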

## Plotting the Results
For a really simple scatterplot of the results, you can use the following code:

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.scatter(xs,ys,label='make_rand_list1')
plt.ylim(0,0.003)  #you will need to play around with the upper limit depending on your computer
plt.legend(loc='upper left')
plt.xlabel('Length of list')
plt.ylabel('Time (s)')
plt.title('Average time taken to generate lists of different lengths')
plt.show()

### Exercise 5
Can you time all of your different functions for making a list of random integers (for varying values of n) and plot the results on the same graph?  What can you conclude about the different algorithms?
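One way to organise Exercise 5 is to time every function over the same list lengths and store the mean times in a dictionary keyed by function name. The sketch below assumes the hypothetical function names used for Exercise 3, redefines everything so that it runs on its own, and uses a coarser grid of n and fewer repeats than Exercise 4 so it finishes quickly (adjust both to taste). It uses the standard-library *statistics* module rather than numpy.

```python
import random
import statistics
import time

def make_rand_list1(n, start=1, end=100):
    list1 = []
    for _ in range(n):
        list1.append(random.randint(start, end))
    return list1

def make_rand_list_prepend(n, start=1, end=100):
    list1 = []
    for _ in range(n):
        list1 = [random.randint(start, end)] + list1
    return list1

def make_rand_list_concat(n, start=1, end=100):
    list1 = []
    for _ in range(n):
        list1 = list1 + [random.randint(start, end)]
    return list1

def make_rand_list_comp(n, start=1, end=100):
    return [random.randint(start, end) for _ in range(n)]

def timeit(somefunc, *args, repeats=100, **kwargs):
    times = []
    for _ in range(repeats):
        starttime = time.perf_counter()
        somefunc(*args, **kwargs)
        times.append(time.perf_counter() - starttime)
    return (statistics.mean(times), statistics.pstdev(times))

funcs = [make_rand_list1, make_rand_list_prepend, make_rand_list_concat, make_rand_list_comp]
xs = list(range(0, 1001, 100))
results = {}  # function name -> list of mean times, one entry per value in xs
for f in funcs:
    results[f.__name__] = [timeit(f, n, repeats=20)[0] for n in xs]

for name, ys in results.items():
    print(f"{name:25s} mean time at n=1000: {ys[-1]:.2e} s")
```

To put all the functions on one graph, call *plt.scatter(xs, results[name], label=name)* for each name in the same cell, followed by a single *plt.legend()*, as in the plotting code above.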

## Extension

- How big are the standard deviations of the run times over repeats? Can you repeat Exercise 5, but this time also plotting the standard errors? (The standard error is, roughly, the expected error in the estimate of the mean time taken; it is the standard deviation divided by the square root of the number of repeats.) You can use *plt.errorbar*.


- Does it make a difference if you do all the repeats for one list length one after the other, or if you do all list lengths once, and then all list lengths again, and again, for the given number of repeats?


- You can sort any of your lists using the sorted() function. Investigate how long it takes to find a number in a sorted list by

    1. Checking each item in turn.
    2. Using a binary search strategy. In other words, start in the middle of the list, compare the current item with your target item and move to the left or right half accordingly. Take the middle of that sub-list and so on, until the item is found or the sub-list is empty (meaning the item is not in the list).

To make this more realistic, make the range that your random numbers are selected from much larger than your longest lists (in order to reduce the number of repeated items in the lists). 
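The two search strategies can be sketched as follows. The function names are hypothetical; both return an index of the target, or -1 if it is absent. With repeated values the two may return different indices, but both point at an item equal to the target. Python's standard-library *bisect* module (for example *bisect.bisect_left*) provides a ready-made binary search you could compare against.

```python
import random

def linear_search(sorted_list, target):
    """Check each item in turn; return its index, or -1 if target is absent."""
    for i, item in enumerate(sorted_list):
        if item == target:
            return i
    return -1

def binary_search(sorted_list, target):
    """Halve the search range at each step; return an index of target, or -1 if absent."""
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_list[mid] == target:
            return mid
        elif sorted_list[mid] < target:
            lo = mid + 1  # target can only be to the right of mid
        else:
            hi = mid - 1  # target can only be to the left of mid
    return -1  # the sub-list is empty, so target is not in the list

# a range (1 to 10**6) much larger than the list keeps repeated values rare
data = sorted(random.randint(1, 10**6) for _ in range(1000))
target = data[500]
print(linear_search(data, target), binary_search(data, target))
```

You can then pass either function to the timeit() function defined earlier, for example *timeit(binary_search, data, target)*, and compare how the two times grow as the list gets longer.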