# Fundamentals of Computer Science 30398 - Lecture 10

### QuickSort

We will start with an implementation of QuickSort, a solution to the homework problem from Lecture 9. See the notes from the previous lecture for a description of the algorithm.

In [100]:
import random

The function `split` is almost identical to the one we wrote as an exam problem. Given a list `arr` and an element `pivot`, we want to return three lists: one containing all the elements smaller than `pivot`, one containing all the elements equal to `pivot`, and one containing all the elements larger than `pivot`.

In [102]:
def split(arr, pivot):
    smaller, equal, larger = [], [], []
    for x in arr:
        if x < pivot:
            smaller.append(x)
        elif x > pivot:
            larger.append(x)
        else:
            equal.append(x)
    return smaller, equal, larger

In [103]:
split([1,4,7,2,3,1,5,9,3], 3)

([1, 2, 1], [3, 3], [4, 7, 5, 9])

The function `quick_sort`, given a list `arr` as an argument, picks a random element of `arr` as a pivot, calls `split` to get the elements smaller than, equal to, and larger than the pivot, recursively sorts the lists of smaller and larger elements, and finally concatenates all three lists.

In [106]:
def quick_sort(arr):
    if len(arr) == 0:
        return []
    rand_ind = random.randint(0, len(arr)-1)
    pivot = arr[rand_ind]
    smaller, equal, larger = split(arr, pivot)
    return quick_sort(smaller) + equal + quick_sort(larger)

In [109]:
quick_sort([1,2,7,3,4, 3, 11])

[1, 2, 3, 3, 4, 7, 11]

### A bit more abstraction - non-default comparison.
What if we want to sort the array in decreasing order? It would be easy to go back to the code of the `quick_sort` function and give it another (ideally optional) argument `decreasing`, `False` by default, and modify its behavior slightly when `decreasing` is `True` (for example, by having `return quick_sort(larger, decreasing = True) + equal + quick_sort(smaller, decreasing = True)` in this case).
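A minimal sketch of this flag-based variant is given here for illustration. It repeats the three-way `split` so that the cell runs on its own, and the name `quick_sort_flag` is hypothetical, chosen only to avoid overwriting `quick_sort`.

```python
import random

def split(arr, pivot):
    # Three-way partition around the pivot, as defined earlier.
    smaller, equal, larger = [], [], []
    for x in arr:
        if x < pivot:
            smaller.append(x)
        elif x > pivot:
            larger.append(x)
        else:
            equal.append(x)
    return smaller, equal, larger

def quick_sort_flag(arr, decreasing=False):
    # Hypothetical name; same algorithm as quick_sort, plus the flag.
    if len(arr) == 0:
        return []
    pivot = arr[random.randint(0, len(arr) - 1)]
    smaller, equal, larger = split(arr, pivot)
    if decreasing:
        # Larger elements come first when sorting in decreasing order.
        return (quick_sort_flag(larger, decreasing=True) + equal
                + quick_sort_flag(smaller, decreasing=True))
    return quick_sort_flag(smaller) + equal + quick_sort_flag(larger)

print(quick_sort_flag([1, 2, 7, 3, 4, 11]))                   # [1, 2, 3, 4, 7, 11]
print(quick_sort_flag([1, 2, 7, 3, 4, 11], decreasing=True))  # [11, 7, 4, 3, 2, 1]
```

Note that the only change is the order in which the three pieces are concatenated; `split` itself is untouched.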

In many situations this is a good enough solution. Often, though, a more abstract and reusable solution is desirable. What if the elements of the list were pairs of names and scores, which we might want to sort by name (in increasing or decreasing order) or by score (again, in increasing or decreasing order)?

One way to avoid code duplication, and the growing complexity of the quick sort function, is to write a `quick_sort` that takes as an argument not just the list, but also a function that serves as a comparison between two elements. This function (which we will call `compare`) should take two arguments `x` and `y`, and return `True` if `x` should come before `y` in the final list. For example:

In [107]:
def less(x,y):
    return x < y

In [17]:
def greater(x,y):
    return x > y

We would now like to call the function `quick_sort` as follows:
```python
quick_sort([1,2,7,3,4,11], less) == [1, 2, 3, 4, 7, 11]
quick_sort([1,2,7,3,4,11], greater) == [11, 7, 4, 3, 2, 1]
```

This modification is actually easy to implement. The only place where we compare two elements of the list is the function `split`. We should just replace all appearances of the `<` and `>` operators with appropriate calls to the function `compare`, passed to `split` as an argument.

In [116]:
def split(arr, pivot, compare):
    smaller, equal, larger = [], [], []
    for x in arr:
        if compare(x, pivot):
            smaller.append(x)
        elif compare(pivot, x):
            larger.append(x)
        else:
            equal.append(x)
    return smaller, equal, larger

Finally - the modification to the function `quick_sort` itself is minimal. It now takes two arguments, `arr` and `compare`, and passes those two to the `split` function that will make use of the `compare`.

In [117]:
def quick_sort(arr, compare):
    if len(arr) == 0:
        return []
    rand_ind = random.randint(0, len(arr)-1)
    pivot = arr[rand_ind]
    smaller, equal, larger = split(arr, pivot, compare)
    return quick_sort(smaller, compare) + equal + quick_sort(larger, compare)

In [118]:
quick_sort([1,2,7,3,4,11], greater)

[11, 7, 4, 3, 2, 1]

In [119]:
quick_sort([1,2,7,3,4, 11], less)

[1, 2, 3, 4, 7, 11]

In [120]:
quick_sort([1,2,7,3,4, 11], lambda x, y : x > y)

[11, 7, 4, 3, 2, 1]

In [121]:
quick_sort([1,2,7,3,4, 11], compare=greater)

[11, 7, 4, 3, 2, 1]

Just to make the function easier to use, we should supply a default value for the argument `compare`: if we just want to sort integers in the standard increasing order, using the default operator `<`, we shouldn't have to pass anything extra:

In [124]:
def quick_sort(arr, compare = None):
    if compare is None:
        compare = lambda x, y: x < y
    if len(arr) == 0:
        return []
    rand_ind = random.randint(0, len(arr)-1)
    pivot = arr[rand_ind]
    smaller, equal, larger = split(arr, pivot, compare)
    return quick_sort(smaller, compare) + equal + quick_sort(larger, compare)

In [126]:
quick_sort([1,2,7,3,4, 11])

[1, 2, 3, 4, 7, 11]

In [128]:
quick_sort([1,2,7,3,4, 11], compare=greater)

[11, 7, 4, 3, 2, 1]

In [129]:
results = [("John", 11), ("James", 8), ("Daisy", 20)]

In [130]:
quick_sort(results, lambda x, y: x[0] < y[0])

[('Daisy', 20), ('James', 8), ('John', 11)]

In [131]:
quick_sort(results, lambda x, y: x[1] > y[1])

[('Daisy', 20), ('John', 11), ('James', 8)]
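For comparison, Python's built-in `sorted` offers the same kind of flexibility through a different interface: instead of a two-argument comparison it takes a one-argument `key` function and an optional `reverse` flag. The short sketch below reproduces the two orderings above; it is meant only as a point of reference, not as a replacement for our `quick_sort`.

```python
results = [("John", 11), ("James", 8), ("Daisy", 20)]

# Sort by name (index 0), increasing.
by_name = sorted(results, key=lambda r: r[0])

# Sort by score (index 1), decreasing.
by_score_desc = sorted(results, key=lambda r: r[1], reverse=True)

print(by_name)        # [('Daisy', 20), ('James', 8), ('John', 11)]
print(by_score_desc)  # [('Daisy', 20), ('John', 11), ('James', 8)]
```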

## Classes (structures)

Imagine a scenario when you want to write a program that calculates grades for all students who have taken a specific exam. Each student has a name, and a score for each of the three tasks. 

With what we have learned so far, one design of such a program would be to keep a list of all the students' data. Each record, corresponding to one student, would be a tuple: the first element is the name of the student, the second is the score for task `split`, the third is the score for task `longest_increasing_subarray`, and so on. For instance, the entire dataset could look like this.

In [27]:
student_lists = [ ("Tim", 10, 10, 5),
                  ("Jim", 10, 10, 2),
                  ("Kim", 10, 10, 9)]

We can now write a very simple function (with trivial control flow) to calculate the grade of a student given as an argument. The student, again, is represented as a tuple, and the scores for specific problems have indices $1$, $2$ and $3$ in this tuple. The following would be a perfectly valid solution.

In [132]:
def calculate_grade(student):
    MAX_SCORE = 30
    MAX_GRADE = 7.5
    total_score = student[1] + student[2] + student[3]
    exact_grade = (total_score / MAX_SCORE) * MAX_GRADE
    rounded_grade = int(exact_grade*10)/10
    return rounded_grade

In [133]:
for student in student_lists:
    print(student[0], "got", calculate_grade(student))

Tim got 6.2
Jim got 5.5
Kim got 7.2


The above design, despite providing correct answers so far, is incredibly cumbersome. Keeping the data of each student as a tuple, whose elements carry no names, is going to be difficult to manage. Throughout the entire code base, you need to keep in mind the meaning of the element at index $0$ in the student tuple (their name). And what about the element at index $1$? It surely corresponds to a score for one of the problems, but which one?

As it turns out, Python lets us define our own **classes**, which are essentially user-defined types (similar to the types `list`, `tuple` or `str` we are already used to).

The syntax for creating a new class is:

In [134]:
class Student:
    pass

A value of type `Student` is often also called an **object** of class `Student`.

Almost always we will also want to create an **initialization method** for any newly defined class: a function that is called to initialize an object of this class right after it has been created. To a first approximation, you should think of an object as a box that can contain several named variables (i.e. **fields**).

To define an initialization method, we use the following syntax:

In [138]:
class Student:
    def __init__(self):
        print("Created New Student")

In [140]:
a = Student()

Created New Student


In [141]:
type(a)

__main__.Student

So far this is a rather useless class. Usually the initialization method will take several additional arguments and initialize the values of the appropriate fields in the newly created object.

In [142]:
class Student:
    def __init__(self, name, split_score, longest_score, find_score):
        self.name = name
        self.split_score = split_score
        self.longest_score = longest_score
        self.find_score = find_score

In [143]:
a = Student("Tim", 10, 10, 2)

In [144]:
type(a)

__main__.Student

We can now access the fields of an object of class `Student` referred to with variable `a`, as follows:

In [147]:
a.name

'Tim'

In [148]:
a.split_score

10

In [149]:
a.longest_score

10

We can modify those as well:

In [150]:
a.split_score=9

In [152]:
a.split_score

9

And create another object of type `Student`, with different values for the fields.

In [153]:
b = Student("Kim", 10, 10, 8)

In [154]:
b.name

'Kim'

In [155]:
b.split_score

10

### Warning
All objects of user-defined classes in Python are mutable by default. This means that they come with the usual pitfall we learned about when discussing lists:

Variables should be thought of as **referring** to specific objects. In particular, we might have two different variables referring to the same object of our own class.

In [156]:
a = Student("Kim", 10, 10, 8)

In [157]:
b = a

In this case, if we change the value of the field `name` for the object referred to by variable `a`,

In [159]:
a.name = "John"

The `name` of a student referred to by variable `b` will also have changed - this is the same object after all.

In [160]:
b.name

'John'
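If an independent object is what we actually want, one option is the standard library's `copy.copy`, which creates a new object with the same field values. The sketch below contrasts it with plain assignment; the variable names `alias` and `clone` are chosen here for illustration only.

```python
import copy

class Student:
    def __init__(self, name, split_score, longest_score, find_score):
        self.name = name
        self.split_score = split_score
        self.longest_score = longest_score
        self.find_score = find_score

a = Student("Kim", 10, 10, 8)
alias = a             # a second name for the SAME object
clone = copy.copy(a)  # a NEW object with the same field values

a.name = "John"

print(alias.name)     # John  (alias and a are one object)
print(clone.name)     # Kim   (clone is unaffected)
print(alias is a, clone is a)  # True False
```

`copy.copy` makes a shallow copy: the fields of the new object refer to the same values as the fields of the original. This is harmless for strings and integers, but a field holding a list would still be shared between the two objects.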

## Back to the grading example.

We can now rewrite the list `student_lists` from the beginning of this section as `students_list`, containing values of the newly created type `Student` instead of unstructured tuples:

In [164]:
students_list = [ Student("Tim", split_score=10, longest_score=10, find_score=5),
                  Student("Jim", split_score=10, longest_score=10, find_score=2),
                  Student("Kim", split_score=10, longest_score=10, find_score=9)]

For example, to access the name and split score of the first student in the list, we can use:

In [167]:
print("Name: ", students_list[0].name, "; Split score: ", students_list[0].split_score)

Name:  Tim ; Split score:  10


The `calculate_grade` function we wrote can also be rewritten using this structure:

In [169]:
def calculate_grade(student):
    MAX_SCORE = 30
    MAX_GRADE = 7.5
    total_score = student.split_score + student.longest_score + student.find_score
    exact_grade = (total_score / MAX_SCORE) * MAX_GRADE
    rounded_grade = int(exact_grade*10)/10
    return rounded_grade

In [170]:
for student in students_list:
    print(student.name, "got", calculate_grade(student))

Tim got 6.2
Jim got 5.5
Kim got 7.2


This code is already significantly better than the previous one. `student.split_score` and `student.name` are self-describing, and are used in place of `student[1]` and `student[0]` from the previous code. As the complexity and the number of structures used in the code grows, this difference becomes even more stark.

### Methods

This usage of classes, to provide additional structure to your data by treating objects of a class as containers with several fields, each having its own intended meaning, resembles the notion of **structures** from **C** and other early programming languages.

The notion of classes and objects provides another piece of functionality beyond what we just discussed: we can associate specific behavior with a class.

Concretely, within the class definition we can define functions (so-called **methods** of the class). Those functions always take at least one argument, conventionally called **self**, which is an object of the class being defined, and they describe some functionality that is particular to the values of this class.

For example, in the code above, we defined the function `calculate_grade` as a global function, but it can be thought of as a concrete functionality associated with the data type `Student`. Alternatively, we could have defined `calculate_grade` as a method of the class `Student` inside the class definition, as below:

In [171]:
class Student:
    def __init__(self, name, split_score, longest_score, find_score):
        self.name = name
        self.split_score = split_score
        self.longest_score = longest_score
        self.find_score = find_score

    def calculate_grade(self):
        MAX_SCORE = 30
        MAX_GRADE = 7.5
        total_score = self.split_score + self.longest_score + self.find_score
        exact_grade = (total_score / MAX_SCORE) * MAX_GRADE
        rounded_grade = int(exact_grade*10)/10
        return rounded_grade

In [172]:
a = Student("Kim", 10, 10, 2)

Now, to call the method `calculate_grade` for a specific object of type `Student`, we could potentially do it in the same way we called functions before - this is just some function that takes a single argument, assuming that it is of type `Student`. We can call it and pass the appropriate argument in the parentheses.

In [177]:
Student.calculate_grade(a)

5.5

The above is actually a heavily **non-idiomatic** way of doing it in Python, and will be very confusing for anyone reading the code. The standard way is, instead, to use the following syntax:

In [176]:
a.calculate_grade()

5.5

Note that we did not specify the parameter that will be passed as `self` in the parentheses, as we usually would. Instead, we use the `object.method()` syntax to call the method named `method` of the appropriate class, with `object` passed as its first argument.

This construction, in fact, should already be very familiar to you. Indeed, we have called the method `append` of objects of type `list` several times before. The code above is in essence similar to the more familiar:

In [182]:
? list.append

Signature:  list.append(self, object, /)
Docstring: Append object to the end of the list.
Type:      method_descriptor

In [179]:
a = list()
a.append(6)

In [180]:
a

[6]

The first line creates a new object of the class `list`. In the next line, we call the method `append` of the class `list` on the object `a` (passed as `self` to the method), with the additional argument `6` (passed as the second argument to the method `append`).
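To make the correspondence concrete, the following short sketch appends to a list in both ways: with the usual `object.method()` syntax, and by calling the method through the class with the list passed explicitly as `self`. The second form is shown only to illustrate the mechanism; as with `Student.calculate_grade(a)`, it is not how one would normally write it.

```python
a = list()
a.append(6)         # idiomatic: a is passed as self automatically
list.append(a, 7)   # same method, with self passed explicitly

print(a)            # [6, 7]
```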

We are ready now to rewrite the code again from the beginning of this chapter, using the `calculate_grade` method of a class `Student`, instead of the global function `calculate_grade`.

In [183]:
student_list = [ Student("Kim", 10, 10, 2),
                 Student("Jim", 10, 10, 5),
                 Student("Tim", 10, 5, 0)]

In [189]:
for x in student_list:
    print(x.name, "score is", x.calculate_grade())

Kim score is 5.5
Jim score is 6.2
Tim score is 3.7


In [190]:
student_list[0].find_score = 6

In [191]:
for x in student_list:
    print(x.name, "score is", x.calculate_grade())

Kim score is 6.5
Jim score is 6.2
Tim score is 3.7


## Special method `__repr__`

One annoying issue with the class `Student` we defined is that, when working with values of this class in the interactive environment, attempting to display those values gives a rather unhelpful result:

In [192]:
student_list[0]

<__main__.Student at 0x10a0ba710>

We can easily see from this representation that the first element of the list `student_list` is an object of the class `Student` but not much beyond this. This is much worse than if we used simple, unstructured tuples:

In [193]:
a = ("John", 1,5,3)

In [194]:
a

('John', 1, 5, 3)

As it turns out, we can improve the readability of values of the type we care about in the interactive environment (for example a Jupyter Notebook, or the debugger) by defining a special method `__repr__` in the class we create. The method should take a single argument `self` (as usual, an object of the newly defined class) and should return a string: specifically, the string representation of the object that should be displayed. Here is an example:

In [199]:
class Student:
    def __init__(self, name, split_score, longest_score, find_score):
        self.name = name
        self.split_score = split_score
        self.longest_score = longest_score
        self.find_score = find_score

    def calculate_grade(self):
        MAX_SCORE = 30
        MAX_GRADE = 7.5
        total_score = self.split_score + self.longest_score + self.find_score
        exact_grade = (total_score / MAX_SCORE) * MAX_GRADE
        rounded_grade = int(exact_grade*10)/10
        return rounded_grade

    def __repr__(self):
        return "Student(name="+self.name+" scores:" + str(self.split_score) +"," \
            + str(self.longest_score) + "," + str(self.find_score) +")"

In [200]:
a = Student("Kim", 10, 10, 2)

In [201]:
a

Student(name=Kim scores:10,10,2)

This is much more convenient to work with!
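As an alternative sketch, the same idea can be written with an f-string, which avoids the chain of `+` and `str(...)` calls. The output format below (field names spelled out, and `!r` to show the name in quotes) is a choice made for this example, not the format used above.

```python
class Student:
    def __init__(self, name, split_score, longest_score, find_score):
        self.name = name
        self.split_score = split_score
        self.longest_score = longest_score
        self.find_score = find_score

    def __repr__(self):
        # {self.name!r} inserts repr(self.name), i.e. the name with quotes.
        return (f"Student(name={self.name!r}, split_score={self.split_score}, "
                f"longest_score={self.longest_score}, find_score={self.find_score})")

a = Student("Kim", 10, 10, 2)
print(repr(a))  # Student(name='Kim', split_score=10, longest_score=10, find_score=2)
print([a])      # [Student(name='Kim', split_score=10, longest_score=10, find_score=2)]
```

The second `print` illustrates that `__repr__` is also used when the object is displayed as an element of a list.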

## Another example of a user defined class - polynomials

A slightly different scenario where defining a separate class is convenient arises when we want to group together a specific set of behaviors (functions) operating on some concrete data, even when the data by itself is not particularly complicated or structured.

A good example of this would be a class `Polynomial`, objects of which are polynomials. Each polynomial is an expression of a form
$$
a_0 + a_1 t + a_2 t^2 + \ldots + a_n t^n,
$$
for some coefficients $a_0, \ldots, a_n$, and is associated with a function (in a mathematical sense) $f(t) := a_0 + a_1 t + \ldots + a_n t^n$.

The data associated with a polynomial is just a list of its coefficients. It is not particularly complex by itself, and one could imagine simply storing all polynomials appearing in a program as such lists. It might be useful, though, to distinguish the lists that correspond to polynomials from all other lists appearing in the program: more likely than not, the things we would want to do with polynomials are very different from what we want to do with other lists.

For instance, it very rarely makes sense to suddenly sort all the coefficients of a polynomial. On the other hand, plugging in a specific number for $t$, i.e. evaluating a given polynomial at an argument $t$, makes sense if the list we have in hand actually represents the coefficients of a polynomial, and does not make much sense for other lists, even if those happen to contain only integer values (for example, the scores of a given student on various exam problems).

We could start writing a class for polynomials, for example, as follows:

In [205]:
class Polynomial:
    def __init__(self, coefficients):
        self.coeff = coefficients

    # compute self.coeff[0] + self.coeff[1] * t + self.coeff[2] * (t ** 2) + ... + self.coeff[n] * (t**n)
    def evaluate(self, t):
        result = 0
        for i in range(len(self.coeff)):
            result += self.coeff[i] * (t ** i)
        return result

The objects of this class are relatively simple: they contain only a single field `coeff`, which is just the list of coefficients passed to the initialization method.

Let us create a representation of a polynomial $2 t^2 + 1$.

In [206]:
p = Polynomial([1, 0, 2])

We can evaluate the function associated with this polynomial at $t := 5$. We should get $2 \cdot 5^2 + 1 = 51$.

In [210]:
p.evaluate(5)

51

Another, slightly faster, way of writing the function `evaluate`, which avoids the Python exponentiation operator `**`, is provided below. We do not really need to recompute the powers $t^i$ from scratch in the loop: in the previous iteration we already had $t^{i-1}$, and to get $t^i$ from it we only need to multiply it by $t$.

In [211]:
class Polynomial:
    def __init__(self, coefficients):
        self.coeff = coefficients

    # compute self.coeff[0] + self.coeff[1] * t + self.coeff[2] * (t ** 2) + ... + self.coeff[n] * (t**n)
    def evaluate(self, t):
        result = 0
        t_i = 1
        for i in range(len(self.coeff)):
            result += self.coeff[i] * t_i
            t_i *= t
        return result

In [212]:
p = Polynomial([1, 0, 2])
p.evaluate(5)

51
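A related technique, not the one used in the class above, is Horner's rule: rewrite the polynomial as $a_0 + t\,(a_1 + t\,(a_2 + \ldots + t\, a_n))$ and evaluate from the innermost bracket outwards. The sketch below is a standalone function (the name `evaluate_horner` is hypothetical) that takes the coefficient list directly; it uses one multiplication and one addition per coefficient.

```python
def evaluate_horner(coeff, t):
    # coeff[i] is the coefficient of t**i, as in the Polynomial class.
    result = 0
    for c in reversed(coeff):
        # Fold in one more coefficient, from the highest power down.
        result = result * t + c
    return result

print(evaluate_horner([1, 0, 2], 5))  # 51
```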

One can easily imagine adding additional methods to the class `Polynomial` above and providing more and more useful functionality. For example, we could add method `add(self, other)` that adds two polynomials (and returns a new polynomial which is a sum of the two), or `mul(self, other)` that multiplies two polynomials.
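One possible sketch of these two methods is shown below; it is an illustration under the assumption that `coeff[i]` holds the coefficient of $t^i$, and other designs are certainly possible. Addition sums coefficients position by position (padding the shorter list with zeros), and multiplication uses the fact that $a_i t^i \cdot b_j t^j$ contributes $a_i b_j$ to the coefficient of $t^{i+j}$.

```python
class Polynomial:
    def __init__(self, coefficients):
        self.coeff = coefficients

    def evaluate(self, t):
        result = 0
        t_i = 1
        for c in self.coeff:
            result += c * t_i
            t_i *= t
        return result

    def add(self, other):
        # The sum has as many coefficients as the longer operand.
        result = [0] * max(len(self.coeff), len(other.coeff))
        for i, c in enumerate(self.coeff):
            result[i] += c
        for i, c in enumerate(other.coeff):
            result[i] += c
        return Polynomial(result)

    def mul(self, other):
        if not self.coeff or not other.coeff:
            return Polynomial([])
        # Degrees add up, so the product has len(a) + len(b) - 1 coefficients.
        result = [0] * (len(self.coeff) + len(other.coeff) - 1)
        for i, a in enumerate(self.coeff):
            for j, b in enumerate(other.coeff):
                result[i + j] += a * b
        return Polynomial(result)

p = Polynomial([1, 0, 2])  # 1 + 2t^2
q = Polynomial([3, 1])     # 3 + t
print(p.add(q).coeff)      # [4, 1, 2]
print(p.mul(q).coeff)      # [3, 1, 6, 2]
```

Both methods return a new `Polynomial` and leave `self` and `other` unchanged, which avoids the aliasing pitfalls discussed in the warning about mutable objects.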

### Homework - Sudoku solver
We discussed in the lecture the backtracking-based sudoku solver algorithm, which you are asked to implement as a homework problem. The explanation of the algorithm, together with several functions that will be helpful for the final implementation, is available in the `sudoku.ipynb` file, in the homework section of Piazza.