# Lab 4
<br>

# Errors, exceptions, sorting and complexity

---

##### CS1P. Semester 2. Python 3.x
 ---

## Purpose of the lab

There are only two (albeit large) problems in this lab. They will exercise your skills on:
* files
* dictionaries
* handling exceptions
* sorting 
* analysing time complexity

In part A, you will implement a robust program that detects and handles as many errors as possible and gives informative error messages to the user.

The problem in part B is from HackerRank; it will exercise your problem-solving skills as well as your knowledge of sorting and time complexity.

In [24]:
from utils.tick import tick

# A: Birthday book

## A.1 

The idea of this exercise is to read in people's birthdays and produce reminders of birthdays that are coming up. A birthday consists of a month and a date, which can be represented by a dictionary such as

    { "month":"Sep", "date":17 }
    
The birthday book is a dictionary in which the keys are people's names, and the values are birthdays, represented as dictionaries as above. The functions you have to define are described below; they should all take a birthday book as a parameter, as well as any other parameter specified below. Before you start coding, make sure you have a clear idea of how to produce the desired information.

(a) Write some code to set up a birthday book with several people and their birthdays, for testing purposes.

In [None]:
# Setting up a birthday book
book = {}
book["Simon"] = { "month":"Sep", "date":17 }
# add more entries

In [38]:
# Setting up a birthday book
book = {}
book["Simon"] = { "month":"Sep", "date":17 }
book["Dorothy"] = { "month":"Sep", "date":18 }
book["Tina"] = { "month":"May", "date":7 }
book["Sue"] = { "month":"Apr", "date":14 }
book["Phil"] = { "month":"Jun", "date":7 }
book["Sally"] = { "month":"Sep", "date":24 }
book["John"] = { "month":"Dec", "date":30 }
book["Anne"] = { "month":"Jan", "date":2 }

<br> 

(b) Define a function which, given a person's name, prints their birthday.

In [39]:
# Function to print a birthday
def printBirthday(name, date):
    print(f"{name}: {date['month']}, {date['date']}")

# The lookup function
def birthdayByName(name, book):
    if name in book:
        printBirthday(name, book[name])
    else:
        print(f"{name} is not in the birthday book")
birthdayByName("Sue", book)

Sue: Apr, 14


<br>
(c) Define a function which, given a month, prints a list of all the people who have birthdays in that month, with the dates.

In [40]:
# Months with their lengths
months = { "Jan":31, "Feb":29, "Mar":31, "Apr":30, "May":31, "Jun":30,
           "Jul":31, "Aug":31, "Sep":30, "Oct":31, "Nov":30, "Dec":31 }

# Months in calendar order
monthOrder = [ "Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" ]

# Find birthdays in a given month

def findInMonth(month,book):
    if month not in months:
        print(f"{month} is not a valid month")
    else:
        for name in book:
            if book[name]["month"] == month:
                printBirthday(name,book[name])
findInMonth("Sep", book)

# Students could decide to handle case sensitive spellings 
# or different ways in which a particular month can be written
# this is open-ended

Simon: Sep, 17
Dorothy: Sep, 18
Sally: Sep, 24


<br>
(d)
Define a function which, given a month and a date, prints a list of all the people who have birthdays within the next week, with the dates. Don't forget that some of these birthdays might be in the next month, and if the given date is in December, some of them might be in January.
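
One way to handle the wrap-around from December into January is modular arithmetic on day numbers. The sketch below is only an illustration of that idea: `MONTH_LENGTHS`, `day_number` and `days_until` are hypothetical names, and it assumes a 366-day year (February has 29 days) so that a Feb 29 birthday can be stored.

```python
# Hypothetical helper names; a 366-day year is assumed (Feb has 29 days).
MONTH_LENGTHS = [("Jan", 31), ("Feb", 29), ("Mar", 31), ("Apr", 30),
                 ("May", 31), ("Jun", 30), ("Jul", 31), ("Aug", 31),
                 ("Sep", 30), ("Oct", 31), ("Nov", 30), ("Dec", 31)]
YEAR_LENGTH = sum(length for _, length in MONTH_LENGTHS)  # 366

def day_number(month, date):
    """Day of the year (1-based) for a valid month and date."""
    days = 0
    for name, length in MONTH_LENGTHS:
        if name == month:
            return days + date
        days += length
    raise ValueError(f"{month} is not a valid month")

def days_until(today, birthday):
    """Days from today to the next occurrence of birthday (0 if it is today)."""
    return (day_number(*birthday) - day_number(*today)) % YEAR_LENGTH

print(days_until(("Dec", 30), ("Jan", 2)))   # 3
print(days_until(("Dec", 30), ("Dec", 30)))  # 0
print(days_until(("Sep", 17), ("Sep", 10)))  # 359
```

A birthday is then "within the next week" when `days_until` is at most 7, with no special case for December.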

In [41]:
# Check a given month and date for validity

def validDate(month,date):
    if month in months:
        return date >= 1 and date <= months[month]
    else:
        return False
#print(validDate("Dec",30))

# Convert a date into a day number from the beginning of the year
# Assume that the date is valid

def dayNumber(month,date):
    days = 0
    m = 0
    while monthOrder[m] != month:
        days = days + months[monthOrder[m]]
        m = m + 1
    days = days + date
    return days
#print(dayNumber("Sep",17))

# Find birthdays in the next week

def findSoon(month,date,book):
    if validDate(month,date):
        # how many days are there from Jan 1 to month, date?
        dayNum = dayNumber(month,date)
        for name, d in book.items():
            dn = dayNumber(d["month"],d["date"])
            # if dn is in the past, add 366
            if dn < dayNum:
                dn = dn + 366
            if dayNum <= dn <= dayNum + 7:
                printBirthday(name,d)
    else:
        print("Invalid date")
findSoon("Dec", 30, book)

John: Dec, 30
Anne: Jan, 2


## A.2 Reading birthdays from a file

The aim of this task is to write a function `getBirthdays` which takes a file name as a parameter and reads birthdays from the file, storing them in a dictionary which should also be a parameter of the function. The first line of the function definition should therefore be

	def getBirthdays(fileName, book):
    
The file should contain a number of lines with one birthday per line, in the following format:

	John,Mar,23    
	Susan,Feb,16
    
and so on. The file `birthdays.txt` in the Unit4/Lab folder contains some data that you can use for testing; you can also create your own files.

For this task, don't worry about handling errors: assume that the file exists, that it has the correct format, that every line gives a valid date, etc. 
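
To make the file format concrete, the sketch below writes the two sample lines above to a temporary file and splits each line into its three fields. The helper name `parse_line` and the temporary file are illustrative assumptions, not part of the lab materials.

```python
import os
import tempfile

def parse_line(line):
    """Split 'Name,Mon,DD' into (name, month, date). Assumes a well-formed line."""
    name, month, date = line.strip().split(",")
    return name, month, int(date)

# Write the two sample lines to a temporary file for the demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("John,Mar,23    \nSusan,Feb,16\n")
    path = tmp.name

with open(path) as f:
    parsed = [parse_line(line) for line in f]
os.remove(path)

print(parsed)  # [('John', 'Mar', 23), ('Susan', 'Feb', 16)]
```

Calling `strip()` before `split(",")` removes the newline and any trailing spaces, so the last field converts cleanly with `int`.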

In [42]:
# Read birthdays from a file

def getBirthdays(fileName,book):
    with open(fileName,"r") as f:
        line = f.readline()
        while line != "":
            data = line.strip().split(",")
            book[data[0]] = { "month":data[1], "date":int(data[2]) }
            line = f.readline()

# Add a birthday

def addBirthday(name,month,date,book):
    book[name] = { "month":month, "date":date }

<br>

## A.3 Menu
The task now is to combine the functions you have implemented so far into a complete application. Write a program which repeatedly asks the user to enter a command, asks for further details if necessary, and carries out the corresponding operation. 

For example, one command could be "read"; in this case the program should ask the user to enter a filename, and then read birthdays from that file into the birthday book. There should be a textual menu with a command for each of the operations from tasks A.1 and A.2, as well as a command "quit" which terminates the program. Also add a command (and a function) allowing a new birthday to be stored in the book.

Later in the course we will see how to build a graphical user interface instead of using keyboard input.

In [45]:
"""
The code below gives a basic structure for an interactive application 
which asks the user for a choice from a menu. I anticipate that some 
students may need help to get something similar to this structure or 
equivalent. Of course, the code below is not the only way to do it, 
but note the similarity to other programs we have seen that repeatedly 
get data from the user, in particular the call of getChoice at the end 
of a while loop, with another call before the loop in order to
"prime the pump".
"""

# The main program

def getChoice():
    print("-"*30)
    print("Choose from the following options:")
    print("l:  look up a birthday")
    print("m:  find birthdays in a given month")
    print("w:  find birthdays in the next week")
    print("a:  add a birthday")
    print("r:  read birthdays from a file")
    print("q:  quit")
    print("-"*30)
    print()
    choice = input("Enter your choice: ")    
    return choice

def main():
    c = getChoice()
    while c != "q":
        if c == "l":
            name = input("Enter the name: ")
            birthdayByName(name,book)
        elif c == "m":
            month = input("Enter the month: ")
            findInMonth(month,book)
        elif c == "w":
            month = input("Enter the current month: ")
            date = int(input("Enter the current date: "))
            findSoon(month,date,book)
        elif c == "a":
            name = input("Enter the name: ")
            month = input("Enter the month: ")
            date = int(input("Enter the date: "))
            addBirthday(name,month,date,book)
        elif c == "r":
            fileName = input("Enter the file name: ")
            getBirthdays(fileName,book)
        else:
            print("You did not choose a valid option")
        c = getChoice()
main()

------------------------------
Choose from the following options:
l:  look up a birthday
m:  find birthdays in a given month
w:  find birthdays in the next week
a:  add a birthday
r:  read birthdays from a file
q:  quit
------------------------------

Enter your choice: Sue
You did not choose a valid option
------------------------------
Choose from the following options:
l:  look up a birthday
m:  find birthdays in a given month
w:  find birthdays in the next week
a:  add a birthday
r:  read birthdays from a file
q:  quit
------------------------------

Enter your choice: l
Enter the name: Sue
Sue: Apr, 14
------------------------------
Choose from the following options:
l:  look up a birthday
m:  find birthdays in a given month
w:  find birthdays in the next week
a:  add a birthday
r:  read birthdays from a file
q:  quit
------------------------------

Enter your choice: m
Enter the month: Sep
Simon: Sep, 17
Dorothy: Sep, 18
Sally: Sep, 24
------------------------------
Choose from t

<br>

## A.4 Handling Exceptions
In this task, you will make the birthday book program robust by detecting and/or handling as many errors as possible and giving informative error messages to the user. There are many possibilities for errors in the input to the program. For example:
* the file of birthdays might not exist or might not have the correct format; 
* the dates can be invalid (e.g., Feb 31); 
* when finding a person's birthday, the person might not be in the birthday book; 
* when asking for birthdays in a given month, the name of the month might be incorrect; 
* the user might enter an incorrect command in response to the top-level prompt; and so on.
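
The first item in the list, a missing or badly formatted file, could be handled along the following lines. This is a sketch under assumptions rather than a model answer: the name `getBirthdaysSafely`, the local `MONTH_DAYS` table and the wording of the messages are all illustrative choices.

```python
# Illustrative sketch; getBirthdaysSafely and MONTH_DAYS are assumed names.
MONTH_DAYS = {"Jan": 31, "Feb": 29, "Mar": 31, "Apr": 30, "May": 31, "Jun": 30,
              "Jul": 31, "Aug": 31, "Sep": 30, "Oct": 31, "Nov": 30, "Dec": 31}

def getBirthdaysSafely(fileName, book):
    """Read birthdays into book and return a list of error messages."""
    try:
        with open(fileName, "r") as f:
            lines = f.read().splitlines()
    except FileNotFoundError:
        return [f"File {fileName} does not exist"]

    errors = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = [field.strip() for field in line.split(",")]
        try:
            name, month, date = fields  # ValueError unless exactly 3 fields
            date = int(date)            # ValueError unless an integer
        except ValueError:
            errors.append(f"Line {number}: expected name,month,date")
            continue
        if month not in MONTH_DAYS or not 1 <= date <= MONTH_DAYS[month]:
            errors.append(f"Line {number}: {month} {date} is not a valid date")
            continue
        book[name] = {"month": month, "date": date}
    return errors
```

Returning the messages instead of printing them keeps the function easy to test; the menu code can print each message to the user.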

Modify the birthday book program so that as many errors as you can think of are detected. In some cases, for example trying to open a non-existent file, you should handle the exception raised by the built-in Python function. In other cases, you might like to raise and handle your own exceptions, or you might prefer to use other techniques (for example, checking that the top-level command is correct can be done easily with a series of if statements).
Test your program thoroughly, remembering that you are now checking that error cases are handled correctly.
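
For errors that Python does not detect by itself, such as Feb 31, one option is to raise and handle your own exception. The sketch below assumes a hypothetical exception class `InvalidDateError` and a hypothetical function `checkDate`; both names, and the `MONTH_DAYS` table, are illustrative and not part of the lab code.

```python
class InvalidDateError(Exception):
    """Raised when a month/date pair does not name a real calendar day."""

# Month lengths, with Feb set to 29 so that Feb 29 birthdays are accepted.
MONTH_DAYS = {"Jan": 31, "Feb": 29, "Mar": 31, "Apr": 30, "May": 31, "Jun": 30,
              "Jul": 31, "Aug": 31, "Sep": 30, "Oct": 31, "Nov": 30, "Dec": 31}

def checkDate(month, date):
    """Return (month, date) with date as an int, or raise InvalidDateError.

    Assumes date is a string, as returned by input().
    """
    if month not in MONTH_DAYS:
        raise InvalidDateError(f"{month} is not a valid month")
    try:
        date = int(date)
    except ValueError:
        raise InvalidDateError(f"{date!r} is not a whole number") from None
    if not 1 <= date <= MONTH_DAYS[month]:
        raise InvalidDateError(f"{month} has no day {date}")
    return month, date

# Each bad input is reported; the loop carries on instead of crashing.
for month, date in [("Sep", "17"), ("Feb", "31"), ("Sept", "1"), ("Jan", "two")]:
    try:
        print(checkDate(month, date))
    except InvalidDateError as error:
        print(f"Error: {error}")
```

The same `try`/`except` pattern can wrap the `int(input(...))` calls in the menu, so that a mistyped date leads to a message rather than a traceback.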

In [None]:
"""
The previous tasks are fairly straightforward, but this one is 
potentially quite a lot of work and requires careful thought to do 
it thoroughly. Students should certainly handle the exception raised
by open if the file does not exist. 

Exceptions have been covered in the lecture. There are numerous other 
errors to detect: incorrect month, incorrect date in the given month, 
incorrect file format when reading birthdays in, trying to add a new 
birthday for an existing person. Students should think of and detect 
anything that could possibly be regarded as an error. They should test 
thoroughly: they are now testing mainly with error cases. Most of the
errors can be handled by checking the format of the strings and numbers 
supplied, but they might like to experiment with using exceptions.
"""

# B: [Fraudulent Activity Notifications](https://www.hackerrank.com/challenges/fraudulent-activity-notifications/problem?h_l=interview&playlist_slugs%5B%5D=interview-preparation-kit&playlist_slugs%5B%5D=sorting)

### Background

HackerLand National Bank has a simple policy for warning clients about possible fraudulent account activity. If the amount spent by a client on a particular day is greater than or equal to $2 \times$ the client's median spending for a trailing number of days, they send the client a notification about potential fraud. The bank doesn't send the client any notifications until they have at least that trailing number of prior days' transaction data.

<div class="alert alert-info">
    <b>NOTE</b>: The median of a list of numbers can be found by arranging all the numbers from smallest to greatest. If there is an odd number of numbers, the middle one is picked. If there is an even number of numbers, the median is defined to be the average of the two middle values.
</div>
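
As a quick check of this definition, the sketch below computes the median of an odd-length and an even-length list. The function name `median` is an illustrative choice; Python's standard `statistics.median` gives the same results.

```python
import statistics

def median(values):
    """Median by sorting: the middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([2, 3, 4, 2, 3]))  # 3
print(median([1, 2, 3, 4]))     # 2.5

# The hand-written version agrees with the standard library.
assert median([2, 3, 4, 2, 3]) == statistics.median([2, 3, 4, 2, 3])
assert median([1, 2, 3, 4]) == statistics.median([1, 2, 3, 4])
```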


### Problem Specification

(i) Write a function `activityNotifications` which takes as its parameter a filename representing the client's activities and returns an integer representing the number of client notifications.

(ii) What is the time complexity of your implementation? Don't worry about the time it takes to read the file.

(iii) If your answer to (ii) above is worse than $O(nd)$, how can you improve your implementation?

---

One of the txt files (`fan1.txt`) is displayed below.
<img src="imgs/fan1.png" width=50%> 

**How to read the input file:**

First line contains two space-separated integers 
* **n**, the number of days of transaction data;
* **d**, the number of trailing days' data used to calculate median spending.

Second line contains $n$ space-separated non-negative integers, where the $i$-th integer denotes $expenditure[i]$.

**Constraints:**

$1 \leq n \leq 2 \times 10^5$

$1 \leq d \leq n$

$0 \leq expenditure[i] \leq 200$

**Output format**

An integer denoting the total number of times the client receives a notification over a period of $n$ days.



---

### Illustration 1

```Python3
# fan1.txt
# 9 5
# 2 3 4 2 3 6 8 4 5

activityNotifications("fan1.txt") 
>>> 2
```

**EXPLANATION:**

We must determine the total number of _**notifications**_ the client receives over a period of $n = 9$ days. 

For the first $d = 5$ days, the customer receives no notifications because the bank has insufficient transaction data: _**notifications = 0**_.

Day 6: $\mbox{trailing expenditure} = [2, 3, 4, 2, 3]$,  $median = 3$, and $\mbox{expenditure on the 6-th day} = 6$. Since $6 \geq 2 \times \; median$, there will be a notice:  _**notifications = 0 + 1 = 1**_. 

Day 7: $\mbox{trailing expenditure} = [3, 4, 2, 3, 6]$,  $median = 3$, and $\mbox{expenditure on the 7-th day} = 8$. Since $8 \geq 2 \times \; median$, there will be a notice:  _**notifications = 1 + 1 = 2**_. 

Day 8: $\mbox{trailing expenditure} = [4, 2, 3, 6, 8]$,  $median = 4$, and $\mbox{expenditure on the 8-th day} = 4$. Since $4 < 2 \times \; median$, there will be no notice:  _**notifications = 2 + 0 = 2**_. 

Day 9: $\mbox{trailing expenditure} = [2, 3, 6, 8, 4]$,  $median = 4$, and $\mbox{expenditure on the 9-th day} = 5$. Since $5 < 2 \times \; median$, there will be no notice:  _**notifications = 2 + 0 = 2**_. 

---

### Illustration 2
```Python3
# fan2.txt
# 5 4
# 1 2 3 4 4

activityNotifications("fan2.txt") 
>>> 0
```

**EXPLANATION:**

We must determine the total number of _**notifications**_ the client receives over a period of $n = 5$ days. 

For the first $d = 4$ days, the customer receives no notifications because the bank has insufficient transaction data: _**notifications = 0**_.

Day 5: $\mbox{trailing expenditure} = [1, 2, 3, 4]$,  $median = 2.5$, and $\mbox{expenditure on the 5-th day} = 4$. Since $4 < 2 \times \; median$, there will be no notice:  _**notifications = 0 + 0 = 0**_. 
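
For checking answers on small inputs, a simple baseline is to sort every trailing window. The sketch below is such a baseline, not the intended solution: it takes the expenditures as a list rather than a filename, the name `naiveNotifications` is hypothetical, and sorting each window costs $O(d \log d)$, giving $O(nd \log d)$ overall, which is too slow for the large test cases.

```python
def naiveNotifications(expenditure, d):
    """Count notifications by sorting each trailing window of d days."""
    notifications = 0
    for day in range(d, len(expenditure)):
        window = sorted(expenditure[day - d:day])
        # Compare against 2 x median without dividing, to stay in integers.
        if d % 2 == 1:
            twice_median = 2 * window[d // 2]
        else:
            twice_median = window[d // 2 - 1] + window[d // 2]
        if expenditure[day] >= twice_median:
            notifications += 1
    return notifications

print(naiveNotifications([2, 3, 4, 2, 3, 6, 8, 4, 5], 5))  # 2
print(naiveNotifications([1, 2, 3, 4, 4], 4))              # 0
```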

**Note for tutors:**

This problem requires the students to plan properly before implementing anything. Unit 5 will be on problem solving and program planning, but I want to use this task to identify their weak points in this regard. My solution terminates in less than 3 seconds on the large test cases. If any student writes a solution that takes longer than 30 seconds (allowing for differences in computer hardware), encourage them to improve their implementation. This is where the theoretical analysis of complexity comes in: any implementation whose total running time is asymptotically worse than $O(nd)$ will definitely be slow. 

**Hints** you can give them include:
* consider using counting sort
* how can you find the median at each iteration without calling the sort function every time?
* think about how to maintain the bucket returned from counting sort in constant time
* how can you find the median of a list from the bucket returned by counting sort, without the overhead of sorting again?

Below is my program plan; read it to understand my solution.

1. Read in the file, process it and return a tuple of integer ($n$), integer ($d$) and list of integers (expenditures).

2. In `activityNotifications`, initialise final result **notifications := 0**

3. Sorting will need to take place, since the median is involved. The expenditures are integers between $0$ and $200$, so I will implement counting sort. Write a function `countSort`; it takes $O(n+200) = O(n)$ time. 

4. Find the first median using the bucket created by counting sort above, process this and decide whether **notifications** gets incremented or not ($O(d)$ time). 

5. Find the remaining $n - d - 1$ medians and continue as in step 4. Every time I need to find the `next median`, the integer at the head of the previous list is removed and a new integer is appended to its tail to form the next list. For instance in `fan1.txt`, the list for day 6 is [2, 3, 4, 2, 3] and the list for day 7 is [3, 4, 2, 3, 6]. So write a function `updateBucket` to handle this removal and insertion in constant time.

In [21]:
# read file
# return n, d and expenditures
def read_file(filename):
    with open(filename) as f:
        # .read().splitlines() drops the line endings
        # without having to call .strip() on each line
        lines = f.read().splitlines() # fan1.txt -> ['9 5', '2 3 4 2 3 6 8 4 5']
    n, d = map(int, lines[0].split())
    expenditures = list(map(int, lines[1].split()))
    return n, d, expenditures
        
read_file("fan1.txt")
        

(9, 5, [2, 3, 4, 2, 3, 6, 8, 4, 5])

In [None]:
def countSort(expenditures):
    # countSort takes O(len(expenditures) + 200) time
    
    # range of expenditures to be sorted
    # for us to obtain median is between 0 and 200
    bucket = [0 for _ in range(201)]
    
    for elt in expenditures:
        bucket[elt] += 1
    return bucket

def updateBucket(bucket, remove, insert):
    # updateBucket takes O(1) time
    bucket[remove] -= 1
    bucket[insert] += 1
    return bucket
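
To see why a constant-time update is enough, the sketch below slides the window from day 6 to day 7 of `fan1.txt` and checks that the updated bucket matches a bucket built from scratch. The two helpers are restated from the cell above (in condensed form) so that the snippet runs on its own.

```python
def countSort(expenditures):
    # One counter per possible value 0..200.
    bucket = [0] * 201
    for elt in expenditures:
        bucket[elt] += 1
    return bucket

def updateBucket(bucket, remove, insert):
    # Two counter updates, independent of the window size.
    bucket[remove] -= 1
    bucket[insert] += 1
    return bucket

expenditure = [2, 3, 4, 2, 3, 6, 8, 4, 5]
d = 5

bucket = countSort(expenditure[:d])   # window for day 6: [2, 3, 4, 2, 3]
bucket = updateBucket(bucket, expenditure[0], expenditure[d])  # drop 2, add 6
print(bucket[:7])  # [0, 0, 1, 2, 1, 0, 1]

# Same result as counting the day-7 window [3, 4, 2, 3, 6] from scratch.
assert bucket == countSort(expenditure[1:d + 1])
```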

In [None]:
# for fan1.txt, d = 5 (odd)
# expenditure[:5] = lst = [2, 3, 4, 2, 3], bucket[:5] = [0, 0, 2, 2, 1]
# index 0 has freq 0
# index 1 has freq 0
# index 2 has freq 2
# index 3 has freq 2
# index 4 has freq 1


# the index of median element after we sort lst is d//2 = 2
# if we sort [2, 3, 4, 2, 3], we obtain [2, 2, 3, 3, 4]
# Thus our median is 3.

# Instead of sorting, we use bucket to find our median.
# However, because of how counting sort works, our median is that
# particular index for which total freq from index 0 >= (d//2 + 1)

def find_median(bucket, position):
    # the loop scans at most 201 bucket indices (values 0..200),
    # a constant bound, so find_median stays within the O(d) budget
    
    position += 1 # increment to match bucket's indexing style
    median = -1
    # keep subtracting the frequency of elements in bucket from position
    # the index for which position becomes 0 is the median element
    while position > 0:
        median += 1
        position -= bucket[median]
    return median
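
For an even window the same scan is used twice, once for each middle position. The sketch below works through `fan2.txt` ($d = 4$); `find_median` is restated from the cell above so that the snippet is self-contained, and the bucket is built inline.

```python
def find_median(bucket, position):
    # Walk the bucket until the cumulative frequency exceeds position.
    position += 1
    median = -1
    while position > 0:
        median += 1
        position -= bucket[median]
    return median

window = [1, 2, 3, 4]          # trailing expenditure for day 5 of fan2.txt
d = len(window)

bucket = [0] * 201
for elt in window:
    bucket[elt] += 1

lower = find_median(bucket, d // 2 - 1)  # element at sorted index 1
upper = find_median(bucket, d // 2)      # element at sorted index 2
print(lower, upper, (lower + upper) / 2)  # 2 3 2.5
```

Comparing the day's expenditure with `lower + upper` directly is the same as comparing it with twice the median, which avoids the division.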

In [46]:
def medianPosition(bucket, d):
    # medianPosition uses find_median, which takes O(d) time
    
    # we want to find the median element using bucket from countSort
    if d%2 == 0: # if d is even
        # pos1 and pos2 store the positions of the two middle values
        pos1, pos2 = (d//2)-1, d//2
        # the sum is not divided by 2 here, so it equals 2 x median;
        # in exchange, activityNotifications does not multiply it by 2
        return find_median(bucket, pos1) + find_median(bucket, pos2)
    # otherwise d is odd
    return find_median(bucket, d//2)
    

def activityNotifications(filename):
    # step 1: read in the file
    n, d, expenditure = read_file(filename) # time for this is excluded
    # step 2
    notifications = 0
    
    # initial expenditures needed to find first median is up to d days
    initial_bucket = countSort(expenditure[:d]) # O(d + 200) = O(d)
    
    for idx, elt in enumerate(expenditure[d:]): # O(n - d)
        # m is median1 + median2 (= 2 x median) if d is even,
        # and the median itself if d is odd
        m = medianPosition(initial_bucket, d) # << O(d)
        twice_median = m if d%2 == 0 else 2*m
        if elt >= twice_median:
            notifications += 1
            
        # in bucket, decrement the freq of elt at head of list by 1 
        # increment the freq of new elt by 1
        # this is how we keep maintaining bucket in constant time
        # to find next median, without having to sort every time
        initial_bucket = updateBucket(initial_bucket, expenditure[idx], elt)
    return notifications

# total complexity is
# O(d) + [ O(n - d) * O(d)] 
# = O(d) + O(nd) - O(d^2)
# = O(nd)
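
For comparison, a different technique that does not rely on the 0 to 200 value range keeps the trailing window as a sorted list and updates it with the standard `bisect` module. This is a swapped-in alternative, not the solution above: `sortedWindowNotifications` is a hypothetical name, it takes the data directly instead of a filename, and each update still costs $O(d)$ in the worst case because list insertion and deletion shift elements, so the total remains $O(nd)$.

```python
from bisect import bisect_left, insort

def sortedWindowNotifications(expenditure, d):
    """Count notifications while keeping the trailing window sorted."""
    window = sorted(expenditure[:d])
    notifications = 0
    for day in range(d, len(expenditure)):
        # Compare against 2 x median without dividing.
        if d % 2 == 1:
            twice_median = 2 * window[d // 2]
        else:
            twice_median = window[d // 2 - 1] + window[d // 2]
        if expenditure[day] >= twice_median:
            notifications += 1
        # Slide the window: remove the oldest value, insert the newest.
        del window[bisect_left(window, expenditure[day - d])]
        insort(window, expenditure[day])
    return notifications

print(sortedWindowNotifications([2, 3, 4, 2, 3, 6, 8, 4, 5], 5))  # 2
print(sortedWindowNotifications([1, 2, 3, 4, 4], 4))              # 0
```

The binary search locates the positions in $O(\log d)$ time; the element shifting is what keeps each step linear in $d$.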


In [47]:
# these are small test cases
with tick():
    assert activityNotifications("fan1.txt") == 2
    assert activityNotifications("fan2.txt") == 0    
    assert activityNotifications("fan3.txt") == 1

In [None]:
# the next three cells are large test cases
# my solution terminated in less than 3 seconds
# if your overall complexity is worse than O(nd), 
# I can't guarantee how long it will take to terminate on your computer

In [48]:
with tick():
    assert activityNotifications("fan4.txt") == 633

In [49]:
with tick():
    assert activityNotifications("fan5.txt") == 770

In [50]:
with tick():
    assert activityNotifications("fan6.txt") == 926