### Problem:

British Columbia publishes many different sets of information via [DataBC](http://www2.gov.bc.ca/gov/content/governments/about-the-bc-government/databc). One such [set of information](https://catalogue.data.gov.bc.ca/dataset/bc-surgical-wait-times) lists the median (50th percentile) and 90th percentile wait times for surgeries in different hospitals around BC as well as province-wide wait times. Design an analysis program to analyze the information in [surgical_wait_times_annual.csv](surgical_wait_times_annual.csv) to determine which type of surgery has the longest median wait time in 2015/2016.

Since this is a How to Design Analysis Programs problem, I will start by opening the [How to Design Analysis Programs recipe page](https://canvas.ubc.ca/courses/9377/pages/how-to-design-analysis-programs).

Now I'll work through the steps in the recipe, one by one. This is a big problem with many steps, but if I work through the recipe and focus on one step at a time I won't be overwhelmed by the size of the problem. I'll be able to design correct, well-structured solutions to each of the subproblems (e.g. the program planning, each data definition, each function design) and then put them all together to create a well-structured solution to the overall problem.

### Step 1a: Identify the information in the file your program will read

When I open [surgical_wait_times_annual.csv](surgical_wait_times_annual.csv), I see that it contains many rows of information about surgical wait times. Each row contains
- the fiscal year
- the health authority
- the hospital
- the procedure group
- a number for waiting
- a number for completed
- a number for 50th percentile
- a number for 90th percentile

Unfortunately, no units are specified in the column headers, which makes it impossible to know exactly what these numbers represent. This is a common problem with open data; the files are shared, which is great, but they are not always in a user-friendly format. Sometimes you can find more information about what is in the file on the associated webpage. In this case, I can't find any more information. If it were important to me to understand exactly what these numbers represent, I'd have to try to contact someone at DataBC to find out.

Given the problem I'm trying to solve, I don't really need to understand what the units are for these numbers, so I can proceed. I want to figure out which surgery has the longest median wait time in 2015/2016. I assume that the wait times are given in months, but even if they are in weeks or years (or some other units), I can solve this problem without knowing what the units are.
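One way to look at the column headers and a few rows without scrolling through the whole file is sketched below. The helper name `peek`, the header names, and the `waiting`, `completed` and 90th percentile values in the stand-in text are all assumptions made for illustration; they are not taken from the real file.

```python
import csv
import io
from itertools import islice
from typing import List, TextIO, Tuple

def peek(csvfile: TextIO, n: int) -> Tuple[List[str], List[List[str]]]:
    """
    returns the header row and (up to) the first n data rows read from csvfile
    """
    reader = csv.reader(csvfile)
    header = next(reader)                   # the first line holds the column names
    return header, list(islice(reader, n))  # stop after n data rows

# Stand-in for the real file: hypothetical header names and placeholder numbers
sample = io.StringIO(
    "fiscal_year,health_authority,hospital,procedure_group,waiting,completed,p50,p90\n"
    "2009/2010,All Health Authorities,All Facilities,Abdominoplasty,10,20,9.1,30.0\n"
    "2009/2010,All Health Authorities,All Facilities,Aortic Aneurysm Repair,5,8,4.1,12.0\n"
)

header, first_rows = peek(sample, 1)
print(header)
print(first_rows)
```

With the real file, the same helper could be called as `peek(open("surgical_wait_times_annual.csv"), 3)` inside a `with` block.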

### Step 1b: Write a description of what your program will produce

I don't need to brainstorm different possible outputs because the problem told me exactly what to produce. If I was working on an open-ended problem (e.g. if the problem just said 'do some kind of analysis with the information'), I would need to brainstorm many possible outputs and then decide which of those outputs I want to focus on.

The problem statement asks for the name of the surgery group that has the longest median wait time in 2015/2016. There are many different Health Authorities and hospitals listed, so I need to decide more precisely what I will output. I will output the name of the surgery group that has the longest median wait time for "All Health Authorities" in 2015/2016.
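To see which health authorities appear in the data before settling on "All Health Authorities", the distinct values in that column can be collected. This is only a sketch: the helper name `distinct_in_column` is hypothetical, and the rows are short illustrative rows (first four columns only) rather than lines copied from the file.

```python
from typing import List

def distinct_in_column(rows: List[List[str]], column: int) -> List[str]:
    """
    returns the distinct values found in the given column, in first-seen order
    """
    # seen contains the distinct values found so far
    seen = []  # type: List[str]
    for row in rows:
        if row[column] not in seen:
            seen.append(row[column])
    return seen

# Illustrative rows: fiscal year, health authority, hospital, procedure group
rows = [
    ['2009/2010', 'All Health Authorities', 'All Facilities', 'Abdominoplasty'],
    ['2009/2010', 'All Health Authorities', 'All Facilities', 'Aortic Aneurysm Repair'],
    ['2015/2016', 'Vancouver Island', 'WEST COAST GENERAL HOSPITAL', 'Wound/Laceration Care'],
]

print(distinct_in_column(rows, 1))  # ['All Health Authorities', 'Vancouver Island']
```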

### Step 1c: Write or draw examples of what your program will produce

```python
expect(main("surgical_wait_times_annual.csv"), "Knee Replacement")
```

### Step 2a: Design data definitions

I've finished the planning, so now I need to design my data definitions. The first step is to think about which pieces of the available information I need to store in my data definitions. I want to store just enough so that I can solve the problem I'm trying to solve. I know that I need the surgery group and the 50th percentile wait times, but I will also need the fiscal year and the Health Authority. I will need to use a compound for each surgery group and I will need an arbitrary number of surgery groups.

First, I need to design the compound data definition. Then, I can use it to design the list data definition. I won't describe all of these steps, but you can go back to the HtDD, compound, or arbitrary-sized worked examples if you'd like more commentary on how to design data definitions.

In [1]:
from typing import NamedTuple, List, Optional
from cs103 import *

SurgeryWaitTime = NamedTuple ('SurgeryWaitTime', [('surgery_group', str), 
                                                  ('health_authority', str), 
                                                  ('fiscal_year', str), 
                                                  ('median_wait', float)]) # in range[0, ...)
# interp. a surgery wait time record with the surgery group, health authority, 
#         fiscal year and median wait time

SWT1 = SurgeryWaitTime('Abdominoplasty', 'All Health Authorities', '2009/2010', 9.1)
SWT2 = SurgeryWaitTime('Bladder Surgery', 'All Health Authorities', '2009/2010', 4.6)

@typecheck
def fn_for_surgery_wait_time(swt: SurgeryWaitTime) -> ...: # template based on compound 
    return ...(swt.surgery_group,
               swt.health_authority,
               swt.fiscal_year,
               swt.median_wait)

# List[SurgeryWaitTime]
# interp. a list of surgery wait times

L0 = []
L1 = [SWT1, SWT2]

@typecheck
def fn_for_loswt(loswt: List[SurgeryWaitTime]) -> ...: # template based on arbitrary-sized 
    # description of the acc                           # and the reference rule 
    acc = ... # type: ...
    for swt in loswt:
        acc = ...(acc, fn_for_surgery_wait_time(swt))
    return ...(acc)

### Step 2b: Design a function to read the information and store it as data in your program

In order to work on the read function, I'm going to copy the How to Design Analysis Programs template from the recipe page. The template has space for the data definitions, which I've already done so I will delete that section from the template. I'll also delete the imports that I've already done above (for List, NamedTuple and cs103). (Often, I copy the template before I start Step 2a and then I don't need to delete anything as I just design the data definitions in the appropriate part of the template.)

I'll complete the main function first and then work on the read function.

The only things I need to update in the main function are the signature and purpose and examples. I have one example above (from step 1c), so I'll use that. 

For now, I will complete the stub for the analyze function so that I can run my code as I work on the read function. Note that the code still doesn't run at this point because read refers to Consumed, but I haven't yet updated it to refer to the actual type that we're consuming (SurgeryWaitTime). So, I need to update the signature for read.

Now I need to work on some examples for read. The information file is very large, so I can't test that all the lines are read correctly. (Otherwise I'd be checking over 17000 surgery wait times!) So, I will create two small test files that I can use to test read.
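One possible way to produce a small test file is to copy the header line and a handful of rows out of the large file. The sketch below assumes that approach (the worked example does not say how its test files were made); the helper name `make_small_copy` and the stand-in file contents are hypothetical.

```python
import csv
import os
import tempfile

def make_small_copy(source: str, dest: str, n: int) -> None:
    """
    copies the header line and the first n data rows of source into dest
    """
    with open(source, newline="") as src, open(dest, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader))        # keep the header so read can skip it
        for count, row in enumerate(reader):
            if count == n:
                break
            writer.writerow(row)

# Demonstration with a stand-in "large" file in a temporary folder
folder = tempfile.mkdtemp()
big = os.path.join(folder, "stand_in_big.csv")
small = os.path.join(folder, "stand_in_test.csv")
with open(big, "w", newline="") as f:
    f.write("col_a,col_b\n1,x\n2,y\n3,z\n")

make_small_copy(big, small, 2)

with open(small, newline="") as f:
    small_rows = list(csv.reader(f))
print(small_rows)
```

Keeping the header line in the test file matters because read skips the first line of whatever file it is given.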

I also need to update the type of the accumulator, and the type that we create in the body of the for loop. These types need to be consistent. I'll also update the names of the variables `c` and `loc` to `swt` and `loswt`.

Finally, I need to figure out which columns I need to read from the file. I will look back at my data definition and see that I need (in this order) the name of the surgery group, the health authority, the fiscal year, and the median wait time. The first three are represented as strings, so I don't need to parse them further. The median wait time is represented as a float, so I need to be able to convert the string that I'll get from the csv reader into a float. I'll use our function [parse_float](https://canvas.ubc.ca/courses/9377/pages/functions-for-converting-strings-to-other-types-module-7) to do that.
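The column-to-field mapping described above can be sketched on a single row. Here the built-in `float` stands in for the course's `parse_float` (whose exact behaviour on malformed input may differ), the helper name `row_to_swt` is hypothetical, and the values in positions 4, 5 and 7 of the example row are placeholders.

```python
from typing import List, NamedTuple

SurgeryWaitTime = NamedTuple('SurgeryWaitTime', [('surgery_group', str),
                                                  ('health_authority', str),
                                                  ('fiscal_year', str),
                                                  ('median_wait', float)])

def row_to_swt(row: List[str]) -> SurgeryWaitTime:
    """
    builds a SurgeryWaitTime from one row of strings produced by csv.reader
    """
    # row[3] is the procedure group, row[1] the health authority,
    # row[0] the fiscal year and row[6] the 50th percentile wait
    return SurgeryWaitTime(row[3], row[1], row[0], float(row[6]))

# Placeholder values for waiting, completed and 90th percentile
example_row = ['2009/2010', 'All Health Authorities', 'All Facilities',
               'Abdominoplasty', '10', '20', '9.1', '30.0']

print(row_to_swt(example_row))
```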

Normally, I'd just keep editing this template until the program is complete, but for this example I'll leave a copy of the program in this state (before I've started working on analyze) for you to see. This code immediately below is in a Markdown cell so you can't run it, but if you wanted to run it you could copy it to Jupyter and delete the first and last lines. (Those lines mark a Python code block in Markdown.)

```python
import csv

###########
# Functions

@typecheck
def main(filename: str) -> str:
    """
    Reads information from given filename and returns the name of the surgery 
    group that has the longest median wait time in 2015/2016
    """
    # Template from HtDAP, based on function composition
    return analyze(read(filename))


@typecheck
def read(filename: str) -> List[SurgeryWaitTime]:
    """    
    reads information from the specified file and returns a list of surgery 
    wait times
    """
    #return []  #stub
    # Template from HtDAP
    # loswt contains the result so far
    loswt = [] # type: List[SurgeryWaitTime]

    with open(filename) as csvfile:
        
        reader = csv.reader(csvfile)
        next(reader) # skip header line

        for row in reader:
            swt = SurgeryWaitTime(row[3], row[1], row[0], parse_float(row[6]))
            loswt.append(swt)
    
    return loswt


@typecheck
def analyze(loswt: List[SurgeryWaitTime]) -> str:
    """
    return the name of the surgery group that has the longest median wait 
    time for all health authorities in 2015/2016
    """
    return ""

# Begin testing
start_testing()

# examples and tests for read
expect(read("surgical_wait_times_annual_test1.csv"),  
       [SurgeryWaitTime('Vascular Surgery - Other', 'Vancouver Island', '2015/2016', 0.9),
        SurgeryWaitTime('Wound/Laceration Care', 'Vancouver Island', '2015/2016', 3.6)])

expect(read("surgical_wait_times_annual_test2.csv"),  
       [SurgeryWaitTime('Abdominoplasty', 'All Health Authorities', '2009/2010', 9.1),
        SurgeryWaitTime('Aortic Aneurysm Repair', 'All Health Authorities', '2009/2010', 4.1)])

# examples and tests for main
expect(main("surgical_wait_times_annual.csv"), "Knee Replacement")

# show testing summary
summary()
```

### Step 2c: Design functions to analyze the data

Now I need to finish designing the analyze function. I already started it, but I only completed the stub. The next step is to write some examples. I'll start by using some of the output from my read examples as input for my analyze function. I know that my analyze function needs to take output from my read function because of the way we've composed analyze and read in the main function.

I'll start with the list of surgery wait times that is returned when test file 1 is read. This analyze example is fairly straightforward; I can look at the two entries in the list and see that Wound/Laceration Care has the longer median wait time.

Now I'll copy the list of surgery wait times that is returned when test file 2 is read. Uh oh! None of these times are from 2015/2016 but that wasn't something I had thought about before. I need to consider this case. What should the analyze function return if none of the wait times are from 2015/2016? Since there are no times to examine to consider which has the longest wait time, I think this function should return `None` in this case. The same applies if none of the 2015/2016 entries have health authority 'All Health Authorities'. I need to edit my signature/purpose to reflect this update.

Note: I thought about the analyze function carefully before, but I didn't realize that it might need to return None until I was working on the examples. It's fairly common to have to make changes to the design as you go. It's important to make sure that the changes are applied consistently across the whole program. For this program, I also need to change the return type, purpose, and examples for main as it may now return `None`.
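Once a function can return `None`, any code that uses its result has to handle both cases. A minimal sketch of what that might look like is below; the function name `describe_result` and the message strings are hypothetical and not part of the program being designed.

```python
from typing import Optional

def describe_result(group: Optional[str]) -> str:
    """
    returns a message describing the result of the analysis
    """
    # Check for None explicitly before treating group as a str
    if group is None:
        return "No matching wait times found"
    return "Longest median wait: " + group

print(describe_result("Knee Replacement"))
print(describe_result(None))
```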

Back to analyze - now I need to complete the template. In order to solve this problem, I need to do three distinct operations. First, I need to filter the list of input so that it only includes surgery wait times from 2015/2016. Then, I need to filter that list so that it only includes entries with health authority 'All Health Authorities'. Finally, I need to search through that list to find the longest median wait time. Since there are three distinct operations, I will template this as a function composition.
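For comparison, the same three operations can be written compactly with a list comprehension and the built-in `max`. This is only a sketch of the logic, not the template-based design the recipe asks for, and the name `analyze_compact` is hypothetical.

```python
from typing import List, NamedTuple, Optional

SurgeryWaitTime = NamedTuple('SurgeryWaitTime', [('surgery_group', str),
                                                  ('health_authority', str),
                                                  ('fiscal_year', str),
                                                  ('median_wait', float)])

def analyze_compact(loswt: List[SurgeryWaitTime]) -> Optional[str]:
    """
    returns the surgery group with the longest median wait time among the
    2015/2016 entries for 'All Health Authorities', or None if there are none
    """
    # Operations 1 and 2: keep only the entries for the right year and authority
    candidates = [swt for swt in loswt
                  if swt.fiscal_year == '2015/2016'
                  and swt.health_authority == 'All Health Authorities']
    if not candidates:
        return None
    # Operation 3: pick the entry with the largest median wait
    return max(candidates, key=lambda swt: swt.median_wait).surgery_group

print(analyze_compact([
    SurgeryWaitTime('Vascular Surgery - Other', 'All Health Authorities', '2015/2016', 0.9),
    SurgeryWaitTime('Wound/Laceration Care', 'All Health Authorities', '2015/2016', 3.6)]))
```

Splitting the work into separate helper functions, as the recipe does, makes each operation testable on its own.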

Now I need to design those three helper functions. I won't comment on those, but the full solution is below.

In the solution, note that all of the examples/tests are organized neatly at the bottom of the program. The examples/tests are in the same order as the functions.

In [2]:
import csv

###########
# Functions

@typecheck
def main(filename: str) -> Optional[str]:
    """
    Reads information from given filename and returns the name of the surgery group 
    that has the longest median wait time for all health authorities in 2015/2016, 
    or None if the file has no such entries
    """
    # Template from HtDAP, based on function composition
    return analyze(read(filename))


@typecheck
def read(filename: str) -> List[SurgeryWaitTime]:
    """    
    reads information from the specified file and returns a list of surgery wait times
    """
    #return []  #stub
    # Template from HtDAP
    # loswt contains the result so far
    loswt = [] # type: List[SurgeryWaitTime]

    with open(filename) as csvfile:
        
        reader = csv.reader(csvfile)
        next(reader) # skip header line

        for row in reader:
            swt = SurgeryWaitTime(row[3], row[1], row[0], parse_float(row[6]))
            loswt.append(swt)
    
    return loswt

@typecheck
def analyze(loswt: List[SurgeryWaitTime]) -> Optional[str]:
    """
    return the name of the surgery group that has the longest median wait time for 
    all health authorities in 2015/2016, or None if none of the entries in loswt are 
    from 2015/2016 with health authority 'All Health Authorities'
    """
    #return "" #stub
    # template based on function composition
    return longest_median_wait_time(only_all_authorities(only_2015_2016(loswt)))

@typecheck
def only_2015_2016(loswt:List[SurgeryWaitTime]) -> List[SurgeryWaitTime]:
    """
    returns a list containing the surgery wait times from loswt that are in 2015/2016
    """
    #return [] #stub
    # template from List[SurgeryWaitTime]
    # acc contains the result so far
    acc = [] # type: List[SurgeryWaitTime]
    for swt in loswt:
        if in_2015_2016(swt):
            acc.append(swt)
    return acc

@typecheck
def in_2015_2016(swt: SurgeryWaitTime) -> bool:
    """
    return True if swt is in 2015/2016
    """
    # return False #stub
    # template from SurgeryWaitTime
    return swt.fiscal_year == '2015/2016'


@typecheck
def only_all_authorities(loswt: List[SurgeryWaitTime]) -> List[SurgeryWaitTime]:
    """
    returns a list containing the surgery wait times from loswt that have 
    health authority 'All Health Authorities'
    """
    #return [] #stub
    # template from List[SurgeryWaitTime]
    # acc contains the result so far
    acc = [] # type: List[SurgeryWaitTime]
    for swt in loswt:
        if all_authorities(swt):
            acc.append(swt)
    return acc

@typecheck
def all_authorities(swt: SurgeryWaitTime) -> bool:
    """
    return True if swt's health authority is 'All Health Authorities'
    """
    # return False #stub
    # template from SurgeryWaitTime
    return swt.health_authority == 'All Health Authorities'


@typecheck
def longest_median_wait_time(loswt: List[SurgeryWaitTime]) -> Optional[str]:
    """ 
    return the name of the surgery group in loswt that has the longest median wait time, 
    or None if loswt is empty
    """
    #return "" #stub
    # template from List[SurgeryWaitTime]
    if len(loswt) == 0: 
        return None
    
    # longest_wait contains the SurgeryWaitTime from loswt that has the longest median wait time 
    # of the entries seen so far
    longest_wait = loswt[0] # type: SurgeryWaitTime
    for swt in loswt:
        if swt.median_wait > longest_wait.median_wait:
            longest_wait = swt
    return longest_wait.surgery_group





# Begin testing
start_testing()

# examples and tests for main
expect(main("surgical_wait_times_annual.csv"), "Knee Replacement")
expect(main("surgical_wait_times_annual_test2.csv"), None)

# examples and tests for read
expect(read("surgical_wait_times_annual_test1.csv"),  
       [SurgeryWaitTime('Vascular Surgery - Other', 'Vancouver Island', '2015/2016', 0.9),
        SurgeryWaitTime('Wound/Laceration Care', 'Vancouver Island', '2015/2016', 3.6)])
expect(read("surgical_wait_times_annual_test2.csv"),  
       [SurgeryWaitTime('Abdominoplasty', 'All Health Authorities', '2009/2010', 9.1),
        SurgeryWaitTime('Aortic Aneurysm Repair', 'All Health Authorities', '2009/2010', 4.1)])
# examples and tests for analyze
expect(analyze([SurgeryWaitTime('Vascular Surgery - Other', 'Vancouver Island', '2015/2016', 0.9),
                SurgeryWaitTime('Wound/Laceration Care', 'Vancouver Island', '2015/2016', 3.6)]),
       None)
expect(analyze([SurgeryWaitTime('Vascular Surgery - Other', 'All Health Authorities', '2015/2016', 0.9),
                SurgeryWaitTime('Wound/Laceration Care', 'All Health Authorities', '2015/2016', 3.6)]),
       'Wound/Laceration Care')
expect(analyze([SurgeryWaitTime('Abdominoplasty', 'All Health Authorities', '2009/2010', 9.1),
                SurgeryWaitTime('Aortic Aneurysm Repair', 'All Health Authorities', '2009/2010', 4.1)]),
       None)

# examples and tests for only_2015_2016
expect(only_2015_2016([SurgeryWaitTime('Abdominoplasty', 'All Health Authorities', '2009/2010', 9.1),
                       SurgeryWaitTime('Aortic Aneurysm Repair', 'All Health Authorities', '2009/2010', 4.1)]),
       [])
expect(only_2015_2016([SurgeryWaitTime('Abdominoplasty', 'All Health Authorities', '2009/2010', 9.1),
                       SurgeryWaitTime('Aortic Aneurysm Repair', 'All Health Authorities', '2009/2010', 4.1),
                       SurgeryWaitTime('Vascular Surgery - Other', 'Vancouver Island', '2015/2016', 0.9),
                       SurgeryWaitTime('Wound/Laceration Care', 'Vancouver Island', '2015/2016', 3.6)]),
       [SurgeryWaitTime('Vascular Surgery - Other', 'Vancouver Island', '2015/2016', 0.9),
        SurgeryWaitTime('Wound/Laceration Care', 'Vancouver Island', '2015/2016', 3.6)])
expect(only_2015_2016([SurgeryWaitTime('Vascular Surgery - Other', 'Vancouver Island', '2015/2016', 0.9),
                       SurgeryWaitTime('Wound/Laceration Care', 'Vancouver Island', '2015/2016', 3.6)]),
       [SurgeryWaitTime('Vascular Surgery - Other', 'Vancouver Island', '2015/2016', 0.9),
        SurgeryWaitTime('Wound/Laceration Care', 'Vancouver Island', '2015/2016', 3.6)])

# examples and tests for in_2015_2016
expect(in_2015_2016(SurgeryWaitTime('Vascular Surgery - Other', 'Vancouver Island', '2015/2016', 0.9)), True)
expect(in_2015_2016(SurgeryWaitTime('Abdominoplasty', 'All Health Authorities', '2009/2010', 9.1)), False)

# examples and tests for only_all_authorities
expect(only_all_authorities([]), [])
expect(only_all_authorities([SurgeryWaitTime('Abdominoplasty', 'All Health Authorities', '2009/2010', 9.1),
                             SurgeryWaitTime('Vascular Surgery - Other', 'Vancouver Island', '2015/2016', 0.9)]), 
       [SurgeryWaitTime('Abdominoplasty', 'All Health Authorities', '2009/2010', 9.1)])

# examples and tests for all_authorities
expect(all_authorities(SurgeryWaitTime('Abdominoplasty', 'All Health Authorities', '2009/2010', 9.1)), True)
expect(all_authorities(SurgeryWaitTime('Vascular Surgery - Other', 'Vancouver Island', '2015/2016', 0.9)), False)

# examples and tests for longest_median_wait_time
expect(longest_median_wait_time([]), None)
expect(longest_median_wait_time([SurgeryWaitTime('Abdominoplasty', 'All Health Authorities', '2009/2010', 9.1),
                                 SurgeryWaitTime('Aortic Aneurysm Repair', 'All Health Authorities', '2009/2010', 4.1)]),
       'Abdominoplasty')

# show testing summary
summary()

18 of 18 tests passed
