# Fundamentals of Computer Science 30398 - Lecture 9

We will start the lecture by briefly discussing the exam problems. The statements are repeated here as a reminder.

### **Problem 1 — Splitting a List by a Pivot**

Write a function `split(arr, pivot)` that takes two arguments:

- `arr`: a list of integers
- `pivot`: an integer used as a threshold value

The function should return a **tuple of two lists**:

1. The **first list** should contain all elements of `arr` that are **less than or equal to** `pivot`.
2. The **second list** should contain all elements of `arr` that are **greater than** `pivot`.
#### **Examples**

```python
split([5, 6, 4, 1, 7, 8], 5)   # ([5, 4, 1], [6, 7, 8])
split([5, 7, 8, 2, 4, 11], 1)  # ([], [5, 7, 8, 2, 4, 11])
split([5, 3, 8, 1, 3], 11)     # ([5, 3, 8, 1, 3], [])
split([1, 2, 3, 1, 2, 3], 2)   # ([1, 2, 1, 2], [3, 3])
```
#### **Notes**

- The returned value must be a 2-tuple `(list1, list2)`.
- You may assume that all elements in `arr` are integers.

### Discussion
Most of you solved this problem almost correctly, congratulations! One small recurring mistake in several of your solutions was to return the list `[list1, list2]` at the end instead of the tuple `(list1, list2)`. Technically, this is not what the problem asked for, but since the main logic of those solutions was correct, I awarded the maximum number of points.

In [2]:
def split_wrong(arr, pivot):
    list1, list2 = [],[]
    for x in arr:
        if x <= pivot:
            list1.append(x)
        else:
            list2.append(x)
    return [list1, list2]

In [9]:
split_wrong([5,6, 4,1,7,8], 5)

[[5, 4, 1], [6, 7, 8]]

In [13]:
split_wrong([5,6,4,1,7,8], 5) == ([5,4,1],[6,7,8])

False

The correct solution is almost identical, except it returns the tuple at the end, as requested:

In [7]:
def split(arr, pivot):
    list1, list2 = [],[]
    for x in arr:
        if x <= pivot:
            list1.append(x)
        else:
            list2.append(x)
    return (list1, list2)

In [12]:
split([5,6,1,7,8], 5)

([5, 1], [6, 7, 8])

In [14]:
split([5,6,4,1,7,8], 5) == ([5,4,1],[6,7,8])

True

### **Problem 2 — Longest Increasing Subarray**

Write a function `longest_increasing_subarray(arr)` that takes a list of distinct integers `arr` and returns the length of the longest **contiguous increasing subarray**.

Your algorithm must run in **O(n)** time.

#### **Examples**
```python
longest_increasing_subarray([4, 1, 2, 5, 0, 7]) == 3
```
because `[1, 2, 5]` is the longest contiguous increasing subarray.

```python
longest_increasing_subarray([5, 4, 3, 2, 1]) == 1
```
since each element by itself forms an increasing subarray of length 1.
```python
longest_increasing_subarray([1, 2, 3, 4, 5, 6]) == 6
```
because the entire array is increasing.
```python
longest_increasing_subarray([5, 1, 6, 2, 7, 3]) == 2
longest_increasing_subarray([1, 2, 0, 4, 5]) == 3
```
#### **Hint**
Iterate through the array while keeping track of:
- the length of the current increasing subarray, and
- the maximum length found so far.

If `arr[i + 1] > arr[i]`, extend the current subarray; otherwise, reset its length to 1.
#### **Note**
Partial credit will be awarded for a correct Θ(n²) solution.

### Discussion

Almost all of you wrote something close to the correct solution, congratulations! One of the most common mistakes was code with logic equivalent to this:

In [16]:
def longest_increasing_subarray_wrong(arr):
    current=1
    longest = 1
    for i in range(len(arr)-1):
        if arr[i+1] > arr[i]:
            current += 1
        else:
            longest = max(current, longest)
            current = 1
    return longest

Note that this only updates the variable `longest` when it encounters an element that is smaller than its predecessor. As a consequence, if the longest increasing subarray happens to be at the very end of the input array `arr`, its length is never recorded in `longest`. We can see that the code fails on this specific example:

In [17]:
longest_increasing_subarray_wrong([1,2,3,1,2,3,4,5,6])

3

To fix this, we can either repeat the line `longest = max(current, longest)` after the loop, or make the update unconditional inside the loop, as below:

In [18]:
def longest_increasing_subarray(arr):
    current=1
    longest = 1
    for i in range(len(arr)-1):
        if arr[i+1] > arr[i]:
            current += 1
        else:
            current = 1
        longest = max(current, longest)
    return longest

In [19]:
longest_increasing_subarray([1,2,3,1,2,3,4,5,6])

6
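The problem statement mentions partial credit for a Θ(n²) solution. Below is a minimal sketch of what such a solution could look like, used here to cross-check the linear one on random inputs. The names `longest_linear`, `longest_quadratic` and `compare_on_random_inputs` are made up for this illustration, and the inputs are assumed to be non-empty lists of distinct integers.

```python
import random

def longest_linear(arr):
    # The O(n) solution from the discussion above.
    current = 1
    longest = 1
    for i in range(len(arr) - 1):
        if arr[i + 1] > arr[i]:
            current += 1
        else:
            current = 1
        longest = max(current, longest)
    return longest

def longest_quadratic(arr):
    # For every start index, walk right while the values keep increasing.
    longest = 0
    for start in range(len(arr)):
        end = start
        while end + 1 < len(arr) and arr[end + 1] > arr[end]:
            end += 1
        longest = max(longest, end - start + 1)
    return longest

def compare_on_random_inputs(trials=200):
    # Returns a counterexample if the two functions ever disagree, else None.
    for _ in range(trials):
        n = random.randint(1, 12)
        arr = random.sample(range(100), n)  # n distinct integers
        if longest_linear(arr) != longest_quadratic(arr):
            return arr
    return None

print(longest_quadratic([4, 1, 2, 5, 0, 7]))  # 3
print(compare_on_random_inputs())             # None
```

On an already sorted input the inner loop walks to the end of the array from every start index, which is where the quadratic running time comes from.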

### **Problem 3 — Shift in a Rotated Sorted Array**

A **rotated sorted array** is a sorted array that has been cyclically shifted by some number of elements.  
For example, `[2, 4, 5, 6, 7, 0, 1]` is a rotation of `[0, 1, 2, 4, 5, 6, 7]`.

Write a function `find_shift(arr)` that returns the **number of positions** the array has been shifted to the right — equivalently, the **index of the smallest element** in the array.

Your algorithm must run in **O(log n)** time and use **binary search**.

#### **Examples**

```python
find_shift([1, 2, 3, 5, 8, 11]) == 0
find_shift([11, 1, 2, 3, 5, 8]) == 1
find_shift([2, 3, 5, 8, 11, 1]) == 5
find_shift([5, 8, 11, 1, 2, 3]) == 3
```

#### **Hints**

- If `arr[0] < arr[-1]`, the array is not rotated.
- Otherwise, maintain two indices `left = 0` and `right = len(arr) - 1` such that `arr[left] > arr[right]`.  
    Repeatedly compute
    `mid = (left + right) // 2`
    and update either `left` or `right` depending on which side contains the smallest element.
    
- Think carefully about the **loop’s stopping condition**.
#### **Note**

Partial credit will be awarded for a correct solution running in Θ(n) time.

### Discussion
This was intended to be the hardest problem, but many of you were close to solving it completely, and several provided full, working solutions. Pretty good!

#### Slow solution

First of all, the problem states at the end that for a correct solution running in time $\Theta(n)$, partial points will be awarded (in this case, 5/10 points). 

One useful skill when approaching concrete problems is **abstraction**: the ability to identify and ignore irrelevant details and focus on the essence of the problem. Note that if we only want a $\Theta(n)$ solution, the structure of the array promised in the problem statement is irrelevant; all that matters is that we want to find the index of the minimum element of the array. We can abstract away all this additional structure and solve a more general (and, at the same time, simpler to think about) problem:

**More general problem** Given an arbitrary array `arr` with no repeating elements, return the index of the smallest element in this array. Your algorithm should run in time $\Theta(n)$.

If this were the problem statement, it would have been much easier than Problem 2, and of similar difficulty to Problem 1. One only needs to iterate over all indices and keep track of the position of the smallest element seen so far:

In [42]:
def find_shift_slow(arr):
    min_el_index = 0
    for i in range(len(arr)):
        if arr[i] < arr[min_el_index]:
            min_el_index = i
    return min_el_index

If we know the relevant Python built-in functions and list methods, this can instead be written as a simple one-liner.

In [43]:
def find_shift_slow_2(arr):
    return arr.index(min(arr))

Note that this code, despite being much shorter, is essentially equivalent to the code above. In particular, it also takes time $\Theta(n)$: the `min` function makes a single pass over the array to find the minimum value, and the `index` method then scans the array again from the start until it finds that value.

#### Conceptually wrong solution

One can be tempted to start by calling the `min` function on the array `arr` to find the value of the minimum element, and then proceed to write some version of binary search to find its index in the array, as below:

In [21]:
def find_shift(arr):
    x = min(arr)
    left, right = 0, len(arr)-1
    while left < right:
        ...

This is conceptually wrong. The `min` function cannot use the fact that the array is a rotated sorted array; it iterates over the entire array to find the smallest element, so once we have called it we have already spent $\Theta(n)$ time. Any binary search afterwards completely misses the point: the reason to use binary search is to get an algorithm running in time $O(\log n)$, but we have already spent linear time. We could have found the index of the minimum element along with its value in that very loop and be done with it (as in `find_shift_slow`).

#### Towards an $O(\log n)$ solution

Many of you wrote code similar to the following:

In [127]:
def find_shift_attempt_1(arr):
    if arr[0] < arr[-1]:
        return 0
    left, right = 0, len(arr)-1
    while left < right:
        mid = (left + right) // 2
        if arr[mid] > arr[left]:
            left = mid + 1
        else:
            right = mid - 1
    return left
    

This solution goes in the right direction: it keeps two variables `left` and `right` for the left and right endpoints of the interval under consideration (with the guarantee that the shift is somewhere between the two). In each iteration we compute the midpoint `mid` of the interval. The key observation is that if `arr[mid] > arr[left]`, then the minimum value must be somewhere to the right of `mid` in the rotated sorted array; otherwise it has to be somewhere to the left.

This code should already look suspicious: there is a single `if` statement in the loop, considering two cases; whatever the condition is, in one case we continue in the interval `[left, mid-1]`, and in the other in the interval `[mid+1, right]`. Surely this can't work: if `mid` happens to be exactly the shift we want to compute, it is excluded from the range after the `if-else` statement and will not be returned by the function at any later point. Let's verify that it indeed doesn't work in this case:

In [128]:
find_shift_attempt_1([5,6,7,0,1,2,3])

2

If we are trying this kind of implementation, we should first check whether `mid` happens to be exactly the index we are looking for. How do we check it? Note that the correct shift is the only index such that `arr[shift] < arr[shift-1]`; at all other positions the array is increasing. The following code is therefore almost correct:

In [129]:
def find_shift_attempt_2(arr):
    if arr[0] < arr[-1]:
        return 0
    left, right = 0, len(arr)-1
    while left < right:
        mid = (left + right) // 2
        if arr[mid] < arr[mid-1]:
            return mid
        if arr[mid] > arr[left]:
            left = mid + 1
        else:
            right = mid - 1
    return left

In [130]:
find_shift_attempt_2([5,6,7,0,1,2,3])

3

As it turns out, this code is still wrong, but for a more subtle reason. Let's automatically generate all possible shifts and test our code on all of those examples.

In [132]:
n = 15
arr = [i for i in range(n)]

for shift in range(n):
    shift_arr = arr[shift:] + arr[:shift]
    
    if find_shift_attempt_2(shift_arr) != find_shift_slow(shift_arr):
        print("Wrong answer for ", shift_arr)
        print("Expected: ", find_shift_slow(shift_arr), " and got ", find_shift_attempt_2(shift_arr))
        break

Wrong answer for  [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 0, 1, 2]
Expected:  12  and got  14
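One way to look for the reason is to record `left`, `mid` and `right` in every iteration. The sketch below repeats the logic of `find_shift_attempt_2` and additionally returns the recorded steps; `trace_attempt_2` is a helper name invented for this illustration.

```python
def trace_attempt_2(arr):
    # Same logic as find_shift_attempt_2, but also records (left, mid, right)
    # at the start of every loop iteration.
    steps = []
    if arr[0] < arr[-1]:
        return 0, steps
    left, right = 0, len(arr) - 1
    while left < right:
        mid = (left + right) // 2
        steps.append((left, mid, right))
        if arr[mid] < arr[mid - 1]:
            return mid, steps
        if arr[mid] > arr[left]:
            left = mid + 1
        else:
            right = mid - 1
    return left, steps

failing = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 0, 1, 2]
answer, steps = trace_attempt_2(failing)
print(answer)  # 14
print(steps)   # [(0, 7, 14), (8, 11, 14), (12, 13, 14)]
```

In the last recorded step the interval `[12, 14]` already starts at the minimum, yet `left` is still moved to the right.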


Even more mysteriously, if we change the condition very slightly and, instead of `if arr[mid] > arr[left]`, write `if arr[mid] > arr[right]` (with the same intuition behind why it should be correct), the code works:

In [133]:
def find_shift_attempt_3(arr):
    if arr[0] < arr[-1]:
        return 0
    left, right = 0, len(arr)-1
    while left < right:
        mid = (left + right) // 2
        if arr[mid] < arr[mid-1]:
            return mid
        if arr[mid] > arr[right]:
            left = mid + 1
        else:
            right = mid - 1
    return left

Testing it the same way as above:

In [134]:
n = 50
arr = [i for i in range(n)]
all_correct = True
for shift in range(n):
    shift_arr = arr[shift:] + arr[:shift]
    
    if find_shift_attempt_3(shift_arr) != find_shift_slow(shift_arr):
        print("Wrong answer for ", shift_arr)
        print("Expected: ", find_shift_slow(shift_arr), " and got ", find_shift_attempt_3(shift_arr))
        all_correct = False
        break
if all_correct:
    print("Correct!")

Correct!


What is wrong with `find_shift_attempt_2`? The bug is subtle. What happens if `mid` is exactly one entry before the correct shift? We update `left = mid+1`, correctly, but now the subarray from `left` to `right` is just a sorted array. The correct answer there is `left`, but the rule `if arr[mid] > arr[left]: left = mid+1` is not the right one in a sorted part of the array: there `arr[mid] > arr[left]` holds whenever `mid > left`, so `left` keeps moving to the right, away from the minimum. We should either compare with the right endpoint, as in `find_shift_attempt_3`, or more explicitly check in each iteration of the loop whether `arr[left] < arr[right]`. If this is the case, we know that the subarray between `left` and `right` is sorted, so the minimum element is at index `left`.

In [135]:
def find_shift_attempt_4(arr):
    left, right = 0, len(arr)-1
    while left < right:
        if arr[left] < arr[right]:
            return left
        mid = (left + right) // 2
        if arr[mid] < arr[mid-1]:
            return mid
        # '>=' rather than '>': when right == left + 1 we get mid == left,
        # and then the minimum is at index right, so left must move forward
        if arr[mid] >= arr[left]:
            left = mid + 1
        else:
            right = mid - 1
    return left

We can test that this is again a correct solution:

In [136]:
n = 50
arr = [i for i in range(n)]
all_correct = True
for shift in range(n):
    shift_arr = arr[shift:] + arr[:shift]
    
    if (find_shift_attempt_4(shift_arr) != find_shift_slow(shift_arr)):
        print("Wrong answer for ", shift_arr)
        print("Expected: ", find_shift_slow(shift_arr), " and got ", find_shift_attempt_4(shift_arr))
        all_correct = False
        break
if all_correct:
    print("Correct!")

Correct!


## Functions as values

One useful feature of Python (present also in many other modern programming languages) is that functions themselves can be treated as **values**, in the same way as integers, strings or lists. For example, let us define a simple function that takes a number $x$ and returns a `bool` indicating whether this value is greater than `5`.

In [65]:
def greater_than_5(x):
    return x > 5

We can assign this function to a new variable (not the **function value** on some arguments, but the function itself!)

In [66]:
fun = greater_than_5

and call it, using the same function call-syntax

In [67]:
fun(7)

True
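Since functions are ordinary values, they can also be stored in containers. A small sketch; `is_even`, `checks` and `named_checks` are names introduced only for this illustration.

```python
def greater_than_5(x):
    return x > 5

def is_even(x):
    return x % 2 == 0

# Functions can be stored in a list like any other value...
checks = [greater_than_5, is_even]
results = [check(8) for check in checks]
print(results)  # [True, True]

# ...or in a dictionary, and looked up by a key before being called.
named_checks = {"big": greater_than_5, "even": is_even}
print(named_checks["even"](7))  # False
```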

Why is this useful? Since functions are treated like any other values, we can, for example, pass them as arguments to other functions. As an example, we can write a simple function `my_filter` that takes two arguments: a **function** `predicate` and a list `arr`.

In [78]:
def my_filter(predicate, arr):
    ...

The argument `predicate` itself should be a function that takes an element and returns a `bool`; `arr` is just an arbitrary list. We want the function `my_filter` to return all elements `x` of the list `arr` such that `predicate(x) == True`. This is very similar in logic to code we have already written multiple times:

In [137]:
def my_filter(predicate, arr):
    output = []
    for x in arr:
        if predicate(x):
            output.append(x)
    return output

In [138]:
my_filter(greater_than_5, [2,5,7,1,3])

[7]

The main new thing is that the function above takes as its argument another function to call on each element of the list `arr`. Note that Python already has a list-comprehension syntax to do exactly this:

In [139]:
[ x for x in [2,5,7,1,3] if x > 5]

[7]

but in other languages such a syntax is not present, and one either has to write an explicit loop or use a filter function like the one above, which takes a predicate as input.

We called this function `my_filter` because a `filter` function doing exactly this is already a Python built-in:

In [83]:
list(filter(greater_than_5, [2,5,7,1,3]))

[7]

(except that the Python `filter` doesn't return a `list`; it returns an iterator, a slightly more advanced object that we might discuss later in the course).
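One practical consequence, shown in the short sketch below: the object returned by `filter` produces its elements only once, so converting it to a list a second time gives an empty list. The helper `greater_than_2` is defined here just for the example.

```python
def greater_than_2(x):
    return x > 2

filtered = filter(greater_than_2, [2, 5, 7, 1, 3])
first_pass = list(filtered)   # consumes all elements of the filter object
second_pass = list(filtered)  # nothing is left to produce
print(first_pass)   # [5, 7, 3]
print(second_pass)  # []
```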

We can try applying `my_filter` with different predicates to see that it works as expected.

In [88]:
def smaller_than_3(x):
    return x < 3

In [90]:
my_filter(smaller_than_3, [2,5,7,1,3])

[2, 1]

Often, the function that we pass as an argument to another function is a very simple one-liner, like the `smaller_than_3` function above. In these cases it is unnecessarily verbose to use the `def` syntax to create the function and give it a name, especially if it is going to be used only in this single place in the code.

Fortunately, Python provides a simple syntax for creating a short, unnamed function: we can write `lambda variable: expression` to create an unnamed function with one argument `variable` whose entire body is `return expression`. We can then assign this function to a variable, for example:

In [140]:
fun = lambda var : var < 3

In [141]:
fun(5)

False

is the same as

In [142]:
def fun(var):
    return var < 3

In [143]:
fun(2)

True

Of course, if we actually want to give the function a name, there is relatively little value in using the `lambda` construction. But we can use `lambda` directly when passing a function as an argument to another function. For example, instead of:

In [144]:
def greater_than_5(x):
    return x > 5
    
my_filter(greater_than_5, [2,5,7,1,3])

[7]

We can write:


In [29]:
my_filter(lambda x : x > 5, [2,5,7,1,3])

[7]
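Several Python built-ins accept a function through their `key` argument, which is a common place for a `lambda`. The sketch below uses `sorted` and `min`; the example data is made up. The second call finds the index of the smallest element in one pass, as an alternative to `arr.index(min(arr))` (it is still linear time).

```python
words = ["pear", "fig", "banana"]
# Sort by length instead of alphabetically.
by_length = sorted(words, key=lambda w: len(w))
print(by_length)  # ['fig', 'pear', 'banana']

arr = [5, 8, 11, 1, 2, 3]
# Among all indices, pick the one whose array entry is smallest.
min_index = min(range(len(arr)), key=lambda i: arr[i])
print(min_index)  # 3
```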

### Going back to binary search:

Let us try to attempt a more abstract (and hence simpler) approach to the `find_shift` problem. To this end, we will write yet another variant of the binary search algorithm.

### Another binary search exercise:
Write a function that takes an array `arr` of `bool` values which starts with some number of `True` entries and then has `False` all the way until the end (for example, `arr` could be `[True, True, False, False, False, False]`). We want it to return the index of the first `False` in time $O(\log n)$.

Following the same kind of logic we used several times writing variants of binary search, we can write something like this:

In [145]:
def find_first_false(arr):
    if not arr[0]:
        return 0
    if arr[-1]:
        return len(arr)
        
    left, right = 0, len(arr)-1
    ## at all times arr[left] == True, arr[right] == False
    while left < right - 1:
        mid = (left + right) // 2
        if arr[mid]:
            left = mid
        else:
            right = mid
    return right

Since the array we look at has almost no structure, there are very few details one needs to keep track of in an implementation like this.

First, we check the extreme cases. Is the first entry already `False`? In that case it is surely the first `False`. Is the last entry still `True`? In that case the array is full of `True` values, and we return `len(arr)` (as a convention for "there is no first `False`").

The logic behind the loop itself is relatively simple: from this point on it will be true at all times that `arr[left] == True` and `arr[right] == False`, and we will shrink the distance between `left` and `right` by roughly a factor of two in each step. To this end, we compute the index of the midpoint `mid = (left + right)//2`, check the value of `arr[mid]`, and update either `left` or `right` accordingly, to be **equal** to `mid`. (Note that in this solution we don't care that `mid` might be exactly the first `False` we are looking for. We will detect it later on.)

With this construction, it is easy to see that at all times `arr[left] == True` and `arr[right] == False`. The only thing we need to be careful with is the loop condition: it should be `while left < right - 1`, not `while left < right`; otherwise the loop would never finish (once `right == left + 1` we get `mid == left`, and neither variable changes).

Once the loop terminates, we know three things: `arr[left] == True` and `arr[right] == False` (since this is how we designed the `if` condition to work), and `left == right - 1`, since otherwise we would not have exited the loop. Ergo, we have identified two neighboring elements, one with value `True`, the other with value `False`. The first `False` must be at index `right`!
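To make the invariant and the halving visible, here is a sketch of the same function with an `assert` for the invariant and a counter for the number of loop iterations. The name `find_first_false_counting` and the example array are made up for this illustration.

```python
def find_first_false_counting(arr):
    # Same logic as the function above, but returns (index, iterations).
    if not arr[0]:
        return 0, 0
    if arr[-1]:
        return len(arr), 0
    left, right = 0, len(arr) - 1
    iterations = 0
    while left < right - 1:
        # Invariant from the text: arr[left] is True and arr[right] is False.
        assert arr[left] and not arr[right]
        mid = (left + right) // 2
        if arr[mid]:
            left = mid
        else:
            right = mid
        iterations += 1
    return right, iterations

arr = [True] * 300 + [False] * 725      # 1025 entries, first False at index 300
print(find_first_false_counting(arr))   # (300, 10)
```

The distance `right - left` starts at 1024 and is halved in every iteration, so the loop body runs 10 times for this input.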

**Remark** It is possible to write this in a slightly more clever (and shorter) way, whose correctness is a bit less obvious: drop the check of `arr[-1]` and set `right = len(arr)`, essentially pretending that there is an extra `False` after the end of the array. One needs to be careful that this never leads to accessing `arr` outside of its range. As an exercise, think of the various corner cases and why the code below works just as well.

In [101]:
def find_first_false(arr):
    if not arr[0]:
        return 0
        
    left, right = 0, len(arr)
    ## at all times arr[left] == True, and arr[right] == False or right == len(arr)
    while left < right - 1:
        mid = (left + right) // 2
        if arr[mid]:
            left = mid
        else:
            right = mid
    return right
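A quick way to gain confidence in the shorter version is to run it on every valid input up to some small length. A sketch, assuming non-empty arrays; the helper name `check_all_small_inputs` is invented for this illustration.

```python
def find_first_false(arr):
    # The shorter version from above, with right starting at len(arr).
    if not arr[0]:
        return 0
    left, right = 0, len(arr)
    while left < right - 1:
        mid = (left + right) // 2
        if arr[mid]:
            left = mid
        else:
            right = mid
    return right

def check_all_small_inputs(max_n=12):
    # Try every array of the form [True]*k + [False]*(n-k); the answer must be k.
    for n in range(1, max_n + 1):
        for k in range(n + 1):
            arr = [True] * k + [False] * (n - k)
            if find_first_false(arr) != k:
                return arr  # counterexample
    return None

print(check_all_small_inputs())  # None
```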

### Binary search with predicate given as argument.

The problem of finding the first `False` in an array that starts with some number of `True` values followed by `False` up to the end does not seem very useful: how often do we really see such an array? It becomes much more useful once we combine it with the idea that functions can be passed as arguments. As it turns out, we barely need to modify `find_first_false` to adapt it to a much more general scenario.

**Exercise**
Write a function `find_first_unsat(arr, predicate)` that takes an array `arr` and a `predicate` as its arguments. The `predicate` is just a function that takes one argument and returns a `bool`. We can assume that for some $k$ it is the case that `predicate(arr[i]) == True` whenever $i < k$, and `predicate(arr[i]) == False` when $i\geq k$. That is, if we applied `predicate` to each element of the array `arr`, we would get an array as in the previous exercise. We would like to find the first `k` such that `predicate(arr[k]) == False`, again using binary search.

Of course, we can't just apply the `predicate` to all elements of `arr` in a loop and then use the previous function: it would take linear time just to apply the `predicate` everywhere. On the other hand, it doesn't take much work to adapt the code above to this more general situation. Everywhere we look at `arr[i]` in the code of `find_first_false`, we simply look at `predicate(arr[i])` instead:

In [146]:
def find_first_unsat(arr, predicate):
    if not predicate(arr[0]):
        return 0
        
    left, right = 0, len(arr)
    ## at all times predicate(arr[left]) == True, and predicate(arr[right]) == False or right == len(arr)
    while left < right - 1:
        mid = (left + right) // 2
        if predicate(arr[mid]):
            left = mid
        else:
            right = mid
    return right
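As a usage sketch, searching a sorted list for a target is one instance of this pattern: the predicate "is smaller than the target" is `True` on a prefix and `False` afterwards. The example data is made up, and the comparison with the standard library's `bisect.bisect_left` is only a sanity check on this input.

```python
import bisect

def find_first_unsat(arr, predicate):
    # Same function as above, repeated so that this block runs on its own.
    if not predicate(arr[0]):
        return 0
    left, right = 0, len(arr)
    while left < right - 1:
        mid = (left + right) // 2
        if predicate(arr[mid]):
            left = mid
        else:
            right = mid
    return right

sorted_arr = [1, 3, 5, 7, 9]
target = 7
# First index whose element is NOT smaller than the target.
index = find_first_unsat(sorted_arr, lambda x: x < target)
print(index)                                    # 3
print(bisect.bisect_left(sorted_arr, target))   # 3
```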

### Conclusion: Specialization
It turns out that this more abstract version of binary search is much cleaner than the previous, specialized versions, for example `find_shift_attempt_4`: there are fewer things to keep track of. The value `predicate(arr[i])` can only be `True` or `False`; we do not need to handle separately the case where we end up in the increasing part of the array, and we do not need to take separate care of the case where `mid` happens to be exactly the element we are looking for.

As it turns out, we can use this general function, to solve the exam problem!

Note that if we look at the rotated sorted array `arr`, the first $k$ elements (up to the shift) all satisfy `arr[j] >= arr[0]`, that is, they are all at least the first element of the array, whereas all elements from the drop onwards are smaller than `arr[0]`. If we use the predicate that checks whether an element is at least `arr[0]`, our goal becomes exactly finding the first element which does not satisfy the predicate (given the guarantee that the predicate is monotone: it is first satisfied, and then unsatisfied, by the elements of the array):

In [147]:
def find_shift_attempt_5(arr):
    return find_first_unsat(arr, lambda x: x >= arr[0])

The code above is almost correct. Let's test it (first wrapping the testing code into a function that takes the function to test as an argument, so that we don't have to repeat it anymore):

In [148]:
def test_find_shift(function_to_test, n=20):
    arr = [i for i in range(n)]
    for i in range(n):
        shifted_arr = arr[i:] + arr[:i]
        if function_to_test(shifted_arr) != find_shift_slow(shifted_arr):
            print("Wrong answer for ", shifted_arr)
            print("Expected: ", find_shift_slow(shifted_arr), " and got ", function_to_test(shifted_arr))
            return False
    print("Correct!")
    return True

In [112]:
test_find_shift(find_shift_attempt_5)

Wrong answer for  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Expected:  0  and got  20


False

Ooops! We need to explicitly check if the array is already sorted: in this case we were asked to return `0` as the shift, but applying the predicate `lambda x : x >= arr[0]` to every element gives `[True, ..., True]`, and `find_first_unsat` returns `len(arr)`. Let's fix it:

In [117]:
def find_shift_final(arr):
    if arr[0] < arr[-1]:
        return 0
    return find_first_unsat(arr, lambda x: x >= arr[0])

In [118]:
test_find_shift(find_shift_final)

Correct!


True

## Homework problem: quicksort

**Exercise**

Modify the function `split(arr, pivot)` from the beginning of this lecture so that it returns three lists: `list1` with elements smaller than `pivot`, `list2` with elements equal to `pivot`, and `list3` with elements larger than `pivot`.

Then write a function `quick_sort` that takes an array as its argument. This function should pick a random element of the array as the pivot, call `split` to split the array into elements smaller than, equal to, and larger than the pivot, and recursively sort the smaller and the larger elements. Finally, it should concatenate all three lists.

To do this, we will need to import the module `random` and use the function `random.randint` to generate a random index. Let's check the documentation to see how to use it:

In [149]:
import random

In [150]:
? random.randint

Signature:  random.randint(a, b)
Docstring:
Return random integer in range [a, b], including both end points.

File:      /opt/homebrew/Cellar/python@3.14/3.14.0/Frameworks/Python.framework/Versions/3.14/lib/python3.14/random.py
Type:      method

In [151]:
random.randint(0, 10)

6
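Because both end points are included, a random index into a list has to be drawn from `0` to `len(arr) - 1`. A small sketch with a made-up list that checks every drawn index is valid:

```python
import random

arr = [10, 20, 30]
# Draw many indices; each must be a valid position in arr.
drawn = [random.randint(0, len(arr) - 1) for _ in range(1000)]
all_valid = min(drawn) >= 0 and max(drawn) <= len(arr) - 1
print(all_valid)  # True

# random.randint(0, len(arr)) could also return len(arr),
# which is not a valid index and would raise IndexError when used.
```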

Our code will probably start with something like this:

In [152]:
def quick_sort(arr):
    if len(arr) == 0:
        return arr
    pivot_index = random.randint(0, len(arr)-1)
    pivot = arr[pivot_index]
    ...

Finish it!