# Refresher problems

The exercises in this lab are designed to refresh and, in some cases, challenge your Python knowledge and problem-solving skills.

> **Note** that the solutions to these exercises are written in standard CPython. If you have knowledge of scientific Python libraries such as NumPy or SciPy and are **confident** using them to complete the exercises, then that is fine. However, don't feel you need to overcomplicate your code. We will study scientific libraries later in the course.

Even if you are confident in your basic Python skills, try to work through all of the questions.

**Some extra challenges relevant to data science and ML:**

* Aim to produce efficient solutions, i.e. those that use minimal memory, computation and run time. I have provided example solutions, but maybe you can do better!
* Time your solutions to check whether you have improved run time.
* Keep your solutions readable. Write good-quality, human-readable code and give your functions docstrings that follow PEP 257.
* Test your solutions with more than one data input; this can help identify weaknesses and errors in your code.
* Add some defensive error checking and/or exception handling for function inputs.
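For example, the standard library `timeit` module can compare two approaches on the same input. The sketch below uses two hypothetical functions that find a minimum, one by sorting (O(n log n)) and one with a single pass (O(n)). In a Jupyter notebook the IPython `%timeit` magic is a convenient alternative. Absolute timings will vary by machine.

```python
import random
import timeit

# hypothetical test input: 10,000 random integers
data = [random.randint(0, 1000) for _ in range(10_000)]


def min_by_sorting(arr):
    '''Return the minimum by sorting the whole list: O(n log n).'''
    return sorted(arr)[0]


def min_builtin(arr):
    '''Return the minimum with a single pass: O(n).'''
    return min(arr)


# both approaches must agree before their speed is compared
assert min_by_sorting(data) == min_builtin(data)

# run each function 100 times and report the total seconds taken
for func in (min_by_sorting, min_builtin):
    seconds = timeit.timeit(lambda: func(data), number=100)
    print(f'{func.__name__}: {seconds:.4f}s')
```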
---

## Imports

In [1]:
# pythonic cross platform approach for HTTPS request
import requests

## Exercise 1

Given five positive integers, find the minimum and maximum values that can be calculated by **summing exactly four of the five integers**. Print the respective minimum and maximum values.

**Example:**

```python
arr = [9, 3, 5, 7, 1]
```
**output:**

```
16
24
```

In [2]:
# your code here ...
arr = [9, 3, 5, 7, 1]

In [3]:
# example solution
def sum_smallest_n(arr, n=4):
    '''
    Return the sum of the smallest n values in a list
    
    Params:
    -------
    arr: list
        assumes list of numerical data 
    
    n: int, optional (default=4)
        Number of items to sum
        
    Returns:
    -------
    int
    '''
    return sum(sorted(arr)[:n])

def sum_largest_n(arr, n=4):
    '''
    Return the sum of the largest n values in a list
    
    Params:
    -------
    arr: list
        assumes list of numerical data 
    
    n: int, optional (default=4)
        Number of items to sum
        
    Returns:
    -------
    int
    '''
    return sum(sorted(arr, reverse=True)[:n])

In [4]:
arr = [9, 3, 5, 7, 1]
print(sum_smallest_n(arr))
print(sum_largest_n(arr))

16
24
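One possible answer to the efficiency challenge avoids sorting altogether. This sketch assumes the task is always "sum all but one item" (four of five), so the minimum sum drops the largest value and the maximum sum drops the smallest. It makes three linear passes (`sum`, `max`, `min`) instead of two O(n log n) sorts; for five items the difference is negligible, but it matters for long lists. The name `min_max_sums` is hypothetical.

```python
def min_max_sums(arr):
    '''
    Return (minimum, maximum) obtainable by summing all but one item of arr.

    Params:
    -------
    arr: list
        numerical data with at least two items

    Returns:
    --------
    tuple
    '''
    if len(arr) < 2:
        raise ValueError('arr must contain at least two numbers')

    total = sum(arr)
    # dropping the largest item gives the smallest sum and vice versa
    return total - max(arr), total - min(arr)


arr = [9, 3, 5, 7, 1]
minimum, maximum = min_max_sums(arr)
print(minimum)  # 16
print(maximum)  # 24
```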


---
## Exercise 2

A left rotation operation on an array shifts each of the array's elements 1 unit to the left.

**Example 1**

```python
# original list
to_roll = [1, 2, 3, 4, 5]

# number of rotations/rolls left
n = 2

# call the function that rotates the array.
roll_left(to_roll, n)
```
**output:**
```
[3, 4, 5, 1, 2]
```

Note that the lowest index item moves to the highest index in a rotation. This is called a *circular array.*
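The wrap-around behaviour of a circular array can be expressed with the modulo operator: after `n` left rotations, the item at new position `i` is the item that was at old position `(i + n) % len(to_roll)`. A minimal sketch using the values from Example 1:

```python
to_roll = [1, 2, 3, 4, 5]
n = 2
size = len(to_roll)

# new position i takes the item from old position (i + n) % size;
# the modulo wraps indices past the end back to the start
rolled = [to_roll[(i + n) % size] for i in range(size)]
print(rolled)  # [3, 4, 5, 1, 2]
```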


**Example 2:**

```python
# original list
to_roll = [1, 2, 3, 4, 5]

# number of rotations/rolls left
n = 8

# call the function that rotates the array.
roll_left(to_roll, n)
```
**output:**
```
[4, 5, 1, 2, 3]
```

**Task:**

* Given a list of integers and a number `n`, perform `n` left rotations on the list.
* Return the updated array and print to screen.

**Test data**
```python
# test 1
to_roll = [1, 2, 3, 4, 5]
n = 7
expected = [3, 4, 5, 1, 2]

# test 2
to_roll = [41, 73, 89, 7, 10, 1, 59, 58, 84, 77, 77, 97, 58, 1, 86, 58, 26, 
           10, 86, 51]
n = 10
expected = [77, 97, 58, 1, 86, 58, 26, 10, 86, 51, 41, 73, 89, 7, 10, 1, 59, 
            58, 84, 77]

```

In [5]:
# your code here ...

In [6]:
# example solution
def roll_left(to_roll, n):
    '''
    Circular rolling of a python list to the left using
    the modulo operator and list slicing.
    
    Returns a new list that has been rolled left.
    
    Params:
    ------
    to_roll: list
        The python list to roll
        
    n: int
        The number of rolls
        
    Returns:
    --------
    list
    
    Example usage:
    --------------
    ```python
    >>> to_roll = [1, 2, 3, 4, 5]
    >>> roll_left(to_roll, 1)
    [2, 3, 4, 5, 1]
    
    >>> to_roll = [1, 2, 3, 4, 5]
    >>> roll_left(to_roll, 3)
    [4, 5, 1, 2, 3]
    
    >>> to_roll = [1, 2, 3, 4, 5]
    >>> roll_left(to_roll, 6)
    [2, 3, 4, 5, 1]
    
    ```
    '''
    split_index = n % len(to_roll)
    return to_roll[split_index:] + to_roll[:split_index]

In [7]:
to_roll = [1, 2, 3, 4, 5]
roll_left(to_roll, 8)

[4, 5, 1, 2, 3]

In [8]:
to_roll = [41, 73, 89, 7, 10, 1, 59, 58, 84, 77, 77, 97, 58, 1, 86, 58, 26, 
           10, 86, 51]
roll_left(to_roll, 10)

[77, 97, 58, 1, 86, 58, 26, 10, 86, 51, 41, 73, 89, 7, 10, 1, 59, 58, 84, 77]
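The standard library offers another option: `collections.deque.rotate(k)` rotates a deque `k` steps to the right, or to the left when `k` is negative. The sketch below (the name `roll_left_deque` is hypothetical) also adds the kind of defensive input checks suggested at the top of the lab; note that the slicing solution above raises `ZeroDivisionError` for an empty list. Building the deque copies the list, so whether this beats slicing is worth timing yourself.

```python
from collections import deque


def roll_left_deque(to_roll, n):
    '''
    Return a new list rolled n places to the left using collections.deque.

    Params:
    ------
    to_roll: list
        The python list to roll

    n: int
        The number of rolls

    Returns:
    --------
    list
    '''
    if not isinstance(n, int):
        raise TypeError('n must be an int')

    if len(to_roll) == 0:
        # nothing to rotate (and avoids taking n % 0)
        return []

    rolled = deque(to_roll)
    # a negative argument rotates the deque to the left
    rolled.rotate(-(n % len(rolled)))
    return list(rolled)


print(roll_left_deque([1, 2, 3, 4, 5], 8))  # [4, 5, 1, 2, 3]
print(roll_left_deque([], 3))               # []
```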

---
## Exercise 3

Given a 6x6 matrix $A$

```
1 1 1 0 0 0
0 1 0 0 0 0
1 1 1 0 0 0
0 0 2 4 4 0
0 0 0 2 0 0
0 0 1 2 4 0
```
An hourglass is a subset of values with indices falling in this pattern in $A$'s graphical representation:

```
a b c
  d 
e f g
```

In total, there are 16 hourglasses in $A$. An hourglass sum is the sum of all the values it contains. 

For example, the first and second hourglasses in $A$ are:
```
1 1 1    1 1 0
  1        0
1 1 1    1 1 0
```

subset 1 total = 1+1+1+1+1+1+1 = 7

subset 2 total = 1+1+0+0+1+1+0 = 4
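One way to index an hourglass is by the position of its top-left value `a`; the other six values sit at fixed row/column offsets from it. A sketch of that idea (the helper name `hourglass_sum` is hypothetical), checked against the two totals above:

```python
def hourglass_sum(matrix, row, col):
    '''
    Return the sum of the hourglass whose top-left value is matrix[row][col].
    '''
    # (row, col) offsets of a, b, c, d, e, f, g relative to a
    offsets = [(0, 0), (0, 1), (0, 2),
               (1, 1),
               (2, 0), (2, 1), (2, 2)]
    return sum(matrix[row + dr][col + dc] for dr, dc in offsets)


A = [[1, 1, 1, 0, 0, 0],
     [0, 1, 0, 0, 0, 0],
     [1, 1, 1, 0, 0, 0],
     [0, 0, 2, 4, 4, 0],
     [0, 0, 0, 2, 0, 0],
     [0, 0, 1, 2, 4, 0]]

print(hourglass_sum(A, 0, 0))  # 7 (subset 1)
print(hourglass_sum(A, 0, 1))  # 4 (subset 2)
```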
   
 
**Task:**
* Code a function that accepts a matrix as a parameter (you can assume it is always 6x6).
* The function must calculate all hourglass sums and return the **maximum** (`int`).

**Test data**

```python
# expected answer = 19
matrix = [[1, 1, 1, 0, 0, 0],
          [0, 1, 0, 0, 0, 0],
          [1, 1, 1, 0, 0, 0],
          [0, 0, 2, 4, 4, 0],
          [0, 0, 0, 2, 0, 0],
          [0, 0, 1, 2, 4, 0]]


# expected answer = 13
matrix2 = [[1, 1, 1, 0, 0, 0],
           [0, 1, 0, 0, 0, 0],
           [1, 1, 1, 0, 0, 0],
           [0, 9, 2, -4, -4, 0],
           [0, 0, 0, -2, 0, 0],
           [0, 0, -1, -2, -4, 0]]
```

In [9]:
# your code here ...


In [10]:
# example solution 
matrix = [[1, 1, 1, 0, 0, 0],
          [0, 1, 0, 0, 0, 0],
          [1, 1, 1, 0, 0, 0],
          [0, 0, 2, 4, 4, 0],
          [0, 0, 0, 2, 0, 0],
          [0, 0, 1, 2, 4, 0]]

matrix2 = [[1, 1, 1, 0, 0, 0],
           [0, 1, 0, 0, 0, 0],
           [1, 1, 1, 0, 0, 0],
           [0, 9, 2, -4, -4, 0],
           [0, 0, 0, -2, 0, 0],
           [0, 0, -1, -2, -4, 0]]

In [11]:
def max_hourglass_sum(matrix):
    '''
    For a 6 x 6 matrix, calculates all of the 3 x 3 hourglass sums
    and returns the maximum.
    
    E.g. Given the 6x6 matrix $A$
    
    1 1 1 0 0 0
    0 1 0 0 0 0
    1 1 1 0 0 0
    0 0 2 4 4 0
    0 0 0 2 0 0
    0 0 1 2 4 0
    
    The first and second hourglasses are
    
    1 1 1    1 1 0
      1        0
    1 1 1    1 1 0
    
    subset_1 total = 1+1+1+1+1+1+1 = 7
    subset_2 total = 1+1+0+0+1+1+0 = 4
       
    Params:
    ------
    matrix: list
        A 6 x 6 matrix implemented as a list of lists
        
    Returns:
    --------
    int
    
    Example usage:
    -------------
    ```python
    >>> matrix = [[1, 1, 1, 0, 0, 0],
    ...           [0, 1, 0, 0, 0, 0],
    ...           [1, 1, 1, 0, 0, 0],
    ...           [0, 0, 2, 4, 4, 0],
    ...           [0, 0, 0, 2, 0, 0],
    ...           [0, 0, 1, 2, 4, 0]]
    >>> max_hourglass_sum(matrix)
    19
    ```
    '''
    n_cols = len(matrix[0])
    n_rows = len(matrix)
    maximum = float('-inf')

    for row in range(n_rows-2):
        for col in range(n_cols - 2):
            
            # hourglass subset
            top = matrix[row][col:col+3]
            middle = [matrix[row+1][col+1]]
            bottom = matrix[row+2][col:col+3]
            
            subset_sum = sum(top + middle + bottom)
            if subset_sum > maximum:
                maximum = subset_sum

    return maximum

In [12]:
print(max_hourglass_sum(matrix))
print(max_hourglass_sum(matrix2))

19
13
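If you want to drop the 6 x 6 assumption, here is a sketch of a variant (the name `max_hourglass_sum_checked` is hypothetical) that validates the shape of its input and works for any rectangular matrix of at least 3 x 3. It passes a generator expression to `max()` so no list of all the sums is built.

```python
def max_hourglass_sum_checked(matrix):
    '''
    Return the maximum hourglass sum of a rectangular matrix
    (list of lists) with at least 3 rows and 3 columns.
    '''
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows > 0 else 0

    # defensive checks on the shape of the input
    if n_rows < 3 or n_cols < 3:
        raise ValueError('matrix must be at least 3 x 3')
    if any(len(row) != n_cols for row in matrix):
        raise ValueError('all rows must have the same length')

    # top row + middle value + bottom row for every hourglass position
    return max(
        sum(matrix[r][c:c + 3])
        + matrix[r + 1][c + 1]
        + sum(matrix[r + 2][c:c + 3])
        for r in range(n_rows - 2)
        for c in range(n_cols - 2)
    )


matrix = [[1, 1, 1, 0, 0, 0],
          [0, 1, 0, 0, 0, 0],
          [1, 1, 1, 0, 0, 0],
          [0, 0, 2, 4, 4, 0],
          [0, 0, 0, 2, 0, 0],
          [0, 0, 1, 2, 4, 0]]

matrix2 = [[1, 1, 1, 0, 0, 0],
           [0, 1, 0, 0, 0, 0],
           [1, 1, 1, 0, 0, 0],
           [0, 9, 2, -4, -4, 0],
           [0, 0, 0, -2, 0, 0],
           [0, 0, -1, -2, -4, 0]]

print(max_hourglass_sum_checked(matrix))   # 19
print(max_hourglass_sum_checked(matrix2))  # 13
```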


---
## Exercise **n**:

A string is said to be a special string if either of two conditions is met:

* All of the characters are the same, e.g. aaa.
* All characters except the middle one are the same, e.g. aadaa.

A special substring is any substring of a string that meets one of those criteria. Given a string, determine how many special substrings can be formed from it. Substrings that occur at different positions are counted separately, even when they are identical.
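A direct (but slow) way to apply these definitions is to test every substring. In this sketch, with hypothetical names `is_special` and `substr_count_brute`, there are O(n²) substrings and each check is O(n), so it is roughly O(n³) overall and only practical for short strings. It is still handy as a reference when testing a faster solution.

```python
def is_special(sub):
    '''Return True if sub meets either special-string condition.'''
    if len(sub) % 2 == 1:
        # odd length: ignore the single middle character
        middle = len(sub) // 2
        sub = sub[:middle] + sub[middle + 1:]
    # every remaining character must be identical (trivially true if empty)
    return len(set(sub)) <= 1


def substr_count_brute(s):
    '''Count special substring instances by checking every substring.'''
    return sum(
        is_special(s[i:j])
        for i in range(len(s))
        for j in range(i + 1, len(s) + 1)
    )


for s in ['mnonopoo', 'asasd', 'abcbaba', 'aaaa']:
    print(s, substr_count_brute(s))
```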

**Example**

```python
s = 'mnonopoo'
```

s contains the following 12 special substrings: 

```python
['m', 'n', 'o', 'n', 'o', 'p', 'o', 'o', 'non', 'ono', 'opo', 'oo']
```

**Task:**

* Write a function called `substr_count(s)` that accepts a `str` parameter `s` and returns the number of special substrings it contains.
* Use the example above and the test data below to test your function.
* **Extra Challenge**: can you solve this problem efficiently with only one or two passes of `s`?

**Test data**

```python
# expected answer = 7
# ['a', 's', 'a', 's', 'd', 'asa', 'sas']
s = 'asasd'

# expected answer = 10
# ['a', 'b', 'c', 'b', 'a', 'b', 'a', 'bcb', 'bab', 'aba']
s = 'abcbaba'

# expected answer = 10
# ['a', 'a', 'a', 'a', 'aa', 'aa', 'aa', 'aaa', 'aaa', 'aaaa']
s = 'aaaa'

# expected answer = 393074
# len(big_s) = 327308 (!)
with open("big_special_str.txt", "r") as f:
    big_s = f.read()
```

> **I have provided the code below to download the big test string (`big_s`) for you.**

In [13]:
RESPONSE_SUCCESS = 200
BIG_S_URL = 'https://raw.githubusercontent.com/health-data-science-OR/' \
  + 'coding-for-ml/main/content/01_advanced_python/labs/big_special_str.txt'

def get_big_s():
    '''
    Download the large test problem, save a local copy to
    big_special_str.txt and return its contents.

    Returns:
    --------
    str or None
        None is returned if the download failed.
    '''
    response = requests.get(BIG_S_URL, timeout=30)

    if response.status_code == RESPONSE_SUCCESS:
        big_s = response.text

        # write to file so it can be re-read later with open()
        with open("big_special_str.txt", 'w') as f:
            f.write(big_s)
        return big_s

    print('connection error for big s')
    return None

In [14]:
# test data
test_data = ['mnonopoo', 'asasd', 'abcbaba', 'aaaa', get_big_s()]
expected = [12, 7, 10, 10, 393074]

In [15]:
# your code here

In [16]:
# example solution...
def compact_format(s):
    '''
    Converts a string into a compact (run-length) list format
    that includes each letter and an integer indicating
    the number of times it appears consecutively before
    a character change.
    
    Params:
    ------
    s: str
        A string to convert
        
    Returns:
    ------
    list
    
    Example usage:
    ------------
    
    ```python
    >>> s = 'asasd'
    >>> compact_format(s)
    [['a', 1], ['s', 1], ['a', 1], ['s', 1], ['d', 1]]
    
    >>> s = 'aaaa'
    >>> compact_format(s)
    [['a', 4]]
    ```
    
    '''
    if not s:
        # an empty string has no runs
        return []

    current_char = s[0]
    count = 1
    compact = []

    for i in range(1, len(s)):
        if current_char == s[i]:
            count += 1
        else:
            compact.append([current_char, count])
            current_char = s[i]
            count = 1

    #final letter
    compact.append([current_char, count])
    return compact

In [17]:
def pairwise_comparisons(n):
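    '''
    Return n choose 2, the number of distinct pairs that can be
    formed from n items.

    For a run of n identical characters this is also the number of
    substrings of length 2 or more within the run.
    '''
    return n * (n - 1) // 2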
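Why does `pairwise_comparisons` appear in the solution? Inside a run of `n` identical characters there are `n - L + 1` substrings of each length `L`, so the substrings of length 2 or more number (n-1) + (n-2) + ... + 1 = n(n-1)/2. A quick check of that identity against `math.comb` (Python 3.8+); the helper name `run_substrings` is hypothetical:

```python
from math import comb


def run_substrings(n):
    '''Count substrings of length >= 2 in a run of n identical characters.'''
    # a run of length n contains n - length + 1 substrings of each length
    return sum(n - length + 1 for length in range(2, n + 1))


for n in range(1, 8):
    # direct count, closed form and binomial coefficient all agree
    assert run_substrings(n) == n * (n - 1) // 2 == comb(n, 2)
print('identity holds for n = 1..7')
```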

In [18]:
def substr_count(s):
    '''
    Count the special substring instances in the string
    
    e.g. 'aaaa' = ['a', 'a', 'a', 'a', 'aa', 'aa', 'aa', 'aaa', 'aaa', 'aaaa']
    
    function returns = 10
    
    e.g. 'abcbaba' = ['a', 'b', 'c', 'b', 'a', 'b', 'a', 'bcb', 'bab', 'aba']

    function returns = 10
    
    Function performs a single pass of s and then a second pass of a smaller
    compact representation.  Technically this could all be achieved in a single
    pass.  This solution is slightly more readable at the loss of a bit of efficiency.
    
    Params:
    -------
    s: str
        The string to parse.  
    
    Returns:
    --------
    int
    
    Example usage:
    ------------
    
    ```python
    >>> s = 'aaaa'
    >>> substr_count(s)
    10
    ```
    
    '''
    if len(s) == 0:
        # an empty string contains no special substrings
        return 0

    # every single character is a special substring
    count = len(s)

    # pre-processing of string into compact (run-length) format
    cs = compact_format(s)

    # loop over the interior runs cs[1] to cs[-2]
    for i in range(1, len(cs) - 1):

        # substrings of length >= 2 made of the repeated character
        count += pairwise_comparisons(cs[i][1])

        # only 1 middle char + same char in head & tail
        if cs[i][1] == 1 and cs[i-1][0] == cs[i+1][0]:
            # e.g.1 cacc add 1 (cac)
            # e.g.2 cccacc add 2 (ccacc and cac)
            # e.g.3 ccccacc add 2 (ccacc and cac)
            # i.e. add the minimum of the two neighbouring run lengths
            count += min(cs[i-1][1], cs[i+1][1])

    # first and last runs are missed in the above loop
    count += pairwise_comparisons(cs[0][1])
    if len(cs) > 1:
        count += pairwise_comparisons(cs[-1][1])

    return count

In [19]:
# this is what compact format outputs
s = 'asasd'
compact_format(s)

[['a', 1], ['s', 1], ['a', 1], ['s', 1], ['d', 1]]
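The same run-length encoding can be produced with `itertools.groupby`, which groups consecutive equal items. A sketch (the name `compact_format_groupby` is hypothetical); it handles an empty string naturally by returning an empty list:

```python
from itertools import groupby


def compact_format_groupby(s):
    '''
    Return [[char, run_length], ...] for each run of identical
    consecutive characters in s.
    '''
    # groupby yields (key, iterator over the run) for each run in s
    return [[char, sum(1 for _ in run)] for char, run in groupby(s)]


print(compact_format_groupby('asasd'))
print(compact_format_groupby('aaaa'))  # [['a', 4]]
print(compact_format_groupby(''))      # []
```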

In [20]:
# test function
test_data = ['mnonopoo', 'asasd', 'abcbaba', 'aaaa', get_big_s()]
expected = [12, 7, 10, 10, 393074]
results = [substr_count(s) for s in test_data]

print(expected)
print(results)

[12, 7, 10, 10, 393074]
[12, 7, 10, 10, 393074]
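The `substr_count` docstring notes that the problem could be solved in a single pass. Here is one sketch of that (the name `substr_count_one_pass` is hypothetical): it walks `s` once, run by run, and remembers only the previous two runs instead of building the full compact list. Each run of length `k` contributes k(k+1)/2 substrings of identical characters, and a run of length 1 flanked by runs of the same character adds the minimum of the two flanking lengths, as in the example solution. Because `i` and `j` only ever move forward, the work is linear in `len(s)`.

```python
def substr_count_one_pass(s):
    '''
    Count special substring instances in s in a single left-to-right pass.
    '''
    count = 0
    # the two most recent runs, stored as (char, length) tuples
    before, middle = None, None
    i = 0

    while i < len(s):
        # advance j to just past the run of identical characters starting at i
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        run = (s[i], j - i)

        # all k * (k + 1) / 2 substrings of a run of length k are special
        count += run[1] * (run[1] + 1) // 2

        # single middle character flanked by runs of the same character,
        # e.g. 'ccacc' adds min(2, 2) = 2 ('cac' and 'ccacc')
        if before is not None and middle[1] == 1 and before[0] == run[0]:
            count += min(before[1], run[1])

        before, middle = middle, run
        i = j

    return count


test_data = ['mnonopoo', 'asasd', 'abcbaba', 'aaaa']
print([substr_count_one_pass(s) for s in test_data])  # [12, 7, 10, 10]
```

You can check it against `big_s` in the same way as the example solution.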
