## 1-minute introduction to Jupyter ##

A Jupyter notebook consists of cells. Each cell contains either text or code.

A text cell will not have any text to the left of the cell. A code cell has `In [ ]:` to the left of the cell.

If the cell contains code, you can edit it. Press <kbd>Enter</kbd> to edit the selected cell. While editing the code, press <kbd>Enter</kbd> to create a new line, or <kbd>Shift</kbd>+<kbd>Enter</kbd> to run the code. If you are not editing the code, select a cell and press <kbd>Ctrl</kbd>+<kbd>Enter</kbd> to run the code.

Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [None]:
NAME = ""
COLLABORATORS = ""

---

# Assignment 5: Functions and File IO

In this assignment, you should write your code in a **readable** way, and **modularise** chunks of code that would otherwise be repeated.

Modularising a section of program code means reorganising it so that it can easily be reused in another part of the code, usually as a function.
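
As a small illustration of the idea, the sketch below wraps a calculation in a function with a docstring so that it can be reused rather than copied. The function name `mean` and the sample marks are made up for this example and are not part of the assignment.

```python
def mean(values):
    '''
    Returns the arithmetic mean of a non-empty list of numbers.

    Example:
    >>> mean([1, 2, 3])
    2.0
    '''
    return sum(values) / len(values)


# Hypothetical data: the same function is reused for both lists
class_a = [70, 80, 90]
class_b = [55, 65]
print(mean(class_a))  # 80.0
print(mean(class_b))  # 60.0
```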

Your function definitions should have **appropriate docstrings**.

## Part 1

**Refactor** your code from [Assignment 1](assignment_01.ipynb) to produce a **function** `to_hms(seconds)` that:

1. takes in an integer argument `seconds` containing the number of seconds,
2. validates the variable `seconds`,
3. returns a list containing the number of hours, minutes, and seconds if the input is valid,
4. prints an error message if the input is invalid.


### Example output

```
>>> to_hms(10)
[0, 0, 10]
>>> to_hms(61)
[0, 1, 1]
>>> to_hms(7199)
[1, 59, 59]
>>> to_hms('10')
Unsupported input type.
```
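
Two building blocks that may help here are a type check and `divmod()`, which returns a quotient and a remainder in one call. The sketch below applies them to a different, smaller problem (splitting minutes into hours and minutes); the helper name `split_minutes` is hypothetical and this is not the required solution.

```python
def split_minutes(total_minutes):
    '''
    Hypothetical helper: splits minutes into [hours, minutes].
    Prints an error message and returns None for non-integer input.
    '''
    # bool is a subclass of int, so isinstance(True, int) is True;
    # comparing type() directly rejects booleans as well.
    if type(total_minutes) != int:
        print('Unsupported input type.')
        return None
    hours, minutes = divmod(total_minutes, 60)
    return [hours, minutes]


print(split_minutes(125))   # [2, 5]
split_minutes('125')        # prints: Unsupported input type.
```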

In [None]:
def to_hms(seconds):
    '''
    Converts seconds to hours, minutes, and seconds, and returns it as a list.
    
    Example:
    >>> to_hms(10)
    [0, 0, 10]
    >>> to_hms(61)
    [0, 1, 1]
    >>> to_hms(7199)
    [1, 59, 59]
    '''
# Type your code below


In [None]:
to_hms('10')

In [None]:
# Cell for manual tests; ignore this cell


In [None]:
# Autograding tests for docstring (hidden); ignore this cell

In [None]:
# Autograding tests for return type
# If your code works, this cell should produce no output when run
for test_val in (10,61,7199):
    result = to_hms(test_val)
    assert type(result) == list, 'Wrong return value type.'
    if type(result) == list:
        for element in result:
            assert type(element) == int, 'Wrong value type in return list.'

In [None]:
# Autograding tests for output
# If your code works, this cell should produce no output when run
assert to_hms(10) == [0,0,10], 'Wrong return value.'
assert to_hms(61) == [0,1,1], 'Wrong return value.'
assert to_hms(7199) == [1,59,59], 'Wrong return value.'

In [None]:
# Autograding tests for error message
# If your code works, this cell should produce no output when run
import sys
from io import StringIO

capture = StringIO()
temp = sys.stdout
sys.stdout = capture
to_hms('10')
sys.stdout = temp
assert capture.getvalue() == 'Unsupported input type.\n', 'Wrong error message'
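
The test above captures printed output by temporarily replacing `sys.stdout`. If you want to try the same idea in your own manual tests, one alternative is `contextlib.redirect_stdout`, which restores the original stream even if the code inside raises an error. The function `greet` below is a made-up stand-in for any function that prints.

```python
from contextlib import redirect_stdout
from io import StringIO


def greet():
    '''Hypothetical function that prints instead of returning a value.'''
    print('hello')


capture = StringIO()
with redirect_stdout(capture):
    greet()                      # output goes into capture, not the screen

captured_text = capture.getvalue()
print(repr(captured_text))       # 'hello\n'
```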

## Part 2

Write a **function** `to_hms(seconds,output)` that:

1. takes in an integer variable `seconds` containing the number of seconds,  
   and a string variable `output` indicating the type of output required,
2. validates the variable `seconds`,
3. if `output == 'list'`, returns a list containing the number of hours, minutes, and seconds if the input is valid,
4. or if `output == 'string'`, returns a string with the number of hours, minutes, and seconds, formatted appropriately,
5. prints an error message if either input is invalid, and exits the function.

**Example**

```
>>> to_hms(7199,'list')
[1,59,59]
>>> to_hms(7199,'string')
'1 hour, 59 minutes, 59 seconds'
>>> to_hms(61,'string')
'1 minute, 1 second'
>>> to_hms('10','string')
Unsupported input type.
>>> to_hms(61,'str')
Unsupported output type.
```
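
Note that the string examples use singular and plural unit names, and that components equal to zero are left out (for example `'1 minute, 1 second'`). One possible way to handle the singular/plural part is sketched below; the helper name `pluralise` is an assumption for illustration, and the sketch does not handle dropping zero components.

```python
def pluralise(count, unit):
    '''Hypothetical helper: returns e.g. '1 minute' or '59 minutes'.'''
    if count == 1:
        return f'{count} {unit}'
    return f'{count} {unit}s'


parts = [pluralise(1, 'hour'), pluralise(59, 'minute'), pluralise(59, 'second')]
joined = ', '.join(parts)
print(joined)  # 1 hour, 59 minutes, 59 seconds
```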

In [None]:
def to_hms(seconds,output):
# Type your code below


In [None]:
# Cell for manual tests; ignore this cell


In [None]:
# Hidden test; ignore this cell

In [None]:
# Autograding test for return type
# If your code works, this cell should produce no output when run
for test_val,test_type in ((7199,'list'),(7199,'string'),(61,'string')):
    result = to_hms(test_val,test_type)
    if test_type == 'list':
        assert type(result) == list, 'Wrong return value type.'
    elif test_type == 'string':
        assert type(result) == str, 'Wrong return value type.'

In [None]:
# Autograding test for return value (list)
# If your code works, this cell should produce no output when run
assert to_hms(7199,'list') == [1,59,59], 'Wrong return value.'
assert to_hms(10,'list') == [0,0,10], 'Wrong return value.'
assert to_hms(61,'list') == [0,1,1], 'Wrong return value.'


In [None]:
# Autograding test for return value (string)
# If your code works, this cell should produce no output when run
assert to_hms(7199,'string') == '1 hour, 59 minutes, 59 seconds', 'Wrong return value.'
assert to_hms(10,'string') == '10 seconds', 'Wrong return value.'
assert to_hms(61,'string') == '1 minute, 1 second', 'Wrong return value.'

In [None]:
# Autograding test for error message (input string)
# If your code works, this cell should produce no output when run
import sys
from io import StringIO

capture = StringIO()
temp = sys.stdout
sys.stdout = capture
to_hms('10','string')
sys.stdout = temp
assert capture.getvalue() == 'Unsupported input type.\n', 'Wrong error message'

In [None]:
# Autograding test for error message (output str)
# If your code works, this cell should produce no output when run
import sys
from io import StringIO

capture = StringIO()
temp = sys.stdout
sys.stdout = capture
to_hms(61,'str')
sys.stdout = temp
assert capture.getvalue() == 'Unsupported output type.\n', 'Wrong error message'

## Part 3

Implement the two procedures from Assignment 3 Part 3 as **functions**.

### Task 1:  Implement `round_dp()` to round half up to `n` decimal places

Write a function, `round_dp(num_str,n)`, that takes in a numerical string `num_str` and an integer `n`, rounds `num_str` to `n` decimal places, and returns the result as a string.

**Hint:** You may find it useful to implement Assignment 2 Part 2 as a function, `get_dp(num_str)`, if you need to check the number of decimal places in the input value.

**Example**

    >>> round_dp('56.789',2)
    '56.79'

**Suggested procedure**

A rounding check is done by checking the **digit after the last significant digit** to see if it is greater than or equal to 5.

    round_dp('56.789',2):
    
    digit after last significant digit: 9
    number up to last significant digit: 56.78
    
It is easier to round the number by removing the decimal first and treating it as an integer, then adding the decimal again.

    number up to last significant digit, without decimal: 5678
    rounded up: 5679
    number up to last significant digit, decimal added back: 56.79
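
Python's built-in `round()` rounds halves to the nearest even number for floats (so `round(2.5)` gives `2`), which is not "round half up". If you want an independent way to check your own results, the standard `decimal` module can round half up directly. The function name `check_round_dp` is hypothetical, and this is a cross-check only, not the suggested procedure: it pads with zeros when `n` exceeds the number of decimal places in the input, which differs from what the autograding tests expect.

```python
from decimal import Decimal, ROUND_HALF_UP


def check_round_dp(num_str, n):
    '''Hypothetical cross-check: rounds num_str half up to n decimal places.'''
    step = Decimal(1).scaleb(-n)          # n=2 gives Decimal('0.01')
    rounded = Decimal(num_str).quantize(step, rounding=ROUND_HALF_UP)
    return str(rounded)


print(check_round_dp('56.789', 2))  # 56.79
print(check_round_dp('2.5', 0))     # 3
print(round(2.5))                   # 2 (built-in round() is not half up)
```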

In [None]:
# Type your code here



round_dp('56.789',2)

In [None]:
# Autograding test for return type
# If your code works, this cell should produce no output when run
for n in range(4):
    assert type(round_dp('56.789',n)) == str, 'Wrong return type.'

In [None]:
# Autograding test for return value
# If your code works, this cell should produce no output when run
test_cases = ((0,'23'),
              (1,'23.5'),
              (2,'23.46'),
              (3,'23.456'),
              (4,'23.456'))

for n,result in test_cases:
    assert round_dp('23.456',n) == result, f"Wrong value returned for round_dp('23.456',{n}). Should be {result}, got {round_dp('23.456',n)}"

### Task 2:  Implement `round_sf()` to round half up to `n` significant figures

Write a function, `round_sf(num,n)`, that takes in a numerical string `num` and an integer `n`, rounds `num` to `n` significant figures, and returns the result as a string.

**Hint:** It will be useful to implement Assignment 2 Part 1 as a function, `get_sf(num)`, if you need to check the number of significant figures in the input value.

**Example**

    >>> round_sf('56.789',3)
    '56.8'
    >>> round_sf('56.789',1)
    '60'
    >>> round_sf('56789',3)
    '56800'

**Suggested procedure**

A rounding check is done by checking the **digit after the last significant digit** to see if it is greater than or equal to 5.

    round_sf('56.789',3):
    
    digit after last significant digit: 8
    number up to last significant digit: 56.7
    
It is easier to round the number by removing the decimal first and treating it as an integer, then adding the decimal again.

    number up to last significant digit, without decimal: 567
    rounded up: 568
    number up to last significant digit, decimal added back: 56.8

If there is no decimal point, the number should be zero-padded up to the ones place.

    round_sf('56789',3):
    
    digit after last significant digit: 8
    number up to last significant digit: 567
    rounded up: 568
    number up to last significant digit, zero-padded: 56800
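
The whole-number case above can be sketched as follows. This is a minimal illustration under stated assumptions, not a full `round_sf()`: the helper name `round_digits_sf` is hypothetical, and it assumes the input is a string of digits with no decimal point, no sign, and no leading zeros, with `n` at least 1.

```python
def round_digits_sf(digits, n):
    '''
    Hypothetical helper: rounds a string of digits (no decimal point)
    half up to n significant figures and zero-pads to the ones place.
    '''
    if n >= len(digits):
        return digits                      # nothing to round off
    head = int(digits[:n])                 # number up to last significant digit
    if int(digits[n]) >= 5:                # digit after last significant digit
        head += 1
    return str(head) + '0' * (len(digits) - n)


print(round_digits_sf('56789', 3))  # 56800
print(round_digits_sf('99950', 3))  # 100000
```

Treating the kept digits as an integer means a carry (as in the second call) is handled by ordinary addition.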

In [None]:
# Type your code here



print(round_sf('56.789',3))
print(round_sf('56.789',1))
print(round_sf('56789',3))

In [None]:
# Autograding test for return type
# If your code works, this cell should produce no output when run
for n in range(1,6):
    assert type(round_sf('23.456',n)) == str, 'Wrong return type.'

In [None]:
# Autograding test for return value
# If your code works, this cell should produce no output when run
test_cases = ((1,'20'),
              (2,'23'),
              (3,'23.5'),
              (4,'23.46'),
              (5,'23.456'),
              (6,'23.456'))

for n,result in test_cases:
    assert round_sf('23.456',n) == result, f"Wrong value returned for round_sf('23.456',{n})."

## Part 4: Read data from a CSV file and edit it

The file `pre-u-enrolment-by-age.csv` contains data on pre-university enrolment numbers, sorted by year, age, and gender.

### Task 1: Read the data into a nested list

Define a function, `read_csv(filename)`, that will read in CSV data stored in `filename` and return:

- a **list** `header` containing the column labels (from the first row)
- a **nested list** `data` containing the data.
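
To show the expected shape of `header` and `data`, here is a small sketch that uses the standard `csv` module on an in-memory string instead of a real file. The column order is an assumption inferred from the tests in this part, and the two rows reuse values from the sample output of Task 2. Note that every value read from a CSV file is a string.

```python
import csv
from io import StringIO

# In-memory stand-in for an open file (assumed column order)
sample = StringIO('year,age,sex,enrolment_jc\n'
                  '1984,17 YRS,MF,8710\n'
                  '1984,18 YRS,MF,3927\n')

rows = list(csv.reader(sample))
header, data = rows[0], rows[1:]

print(header)   # ['year', 'age', 'sex', 'enrolment_jc']
print(data[0])  # ['1984', '17 YRS', 'MF', '8710']
```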

In [None]:
def read_csv(filename):
# Type your code below


    return header,data

In [None]:
# Not an autograding test!
# Use this code snippet to check if your read_csv() function is working correctly
header,data = read_csv('pre-u-enrolment-by-age.csv')
print(header)
for row in data:
    for idx in range(len(header)):
        print(f'{header[idx]}: {row[idx]}')

In [None]:
# Autograding test for return type
# If your code works, this cell should produce no output when run
header,data = read_csv('pre-u-enrolment-by-age.csv')

assert type(header) == list, 'Wrong header type'
assert type(data) == list, 'Wrong data type'
for i,row in enumerate(data):
    assert type(row) == list, f'Wrong data type for row {i}'

In [None]:
# Autograding test for return value
# If your code works, this cell should produce no output when run
def ans_read_csv(filename):
    data = []
    with open(filename,'r') as f:
        for line in f:
            row = line.strip().split(',')
            data.append(row)
    header = data.pop(0)
    return header,data

header,data = read_csv('pre-u-enrolment-by-age.csv')
ans_header,ans_data = ans_read_csv('pre-u-enrolment-by-age.csv')

assert header == ans_header, 'Wrong header'
for i,row in enumerate(ans_data):
    assert data[i] == row, f'Wrong data in row {i}'

If you run the cells above first, the variables `header` and `data` will continue to be accessible to code that is run in subsequent cells below.

### Task 2: Filter the data

Define a function, `filter_gender(data,sex)`, that will:

1. Only keep rows of `data` where the value in the `sex` column matches the `sex` argument.
   (With `sex` set to 'MF', we keep only the data for total enrolment, not split by gender.)
2. Remove the `sex` column.

You should end up with data for three columns: `year`, `age`, and `enrolment_jc`.

### Sample output: new_data

    >>> new_data = filter_gender(data,'MF')
    >>> new_data
    [['1984','17 YRS','8710'],
     ['1984','18 YRS','3927'],
     [...],
     [...],
     ...]

Write your code in the code cell below. Do not overwrite the original nested list, `data`.  
(This is also good practice in data science so that you preserve the original data.)

You may use the column indexes directly (for example `row[2]`) without looking them up with `index()`.
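
To see why building new rows preserves the original data, compare the lists before and after in the sketch below. The variable names and the second row are made up for illustration.

```python
original = [['1984', '17 YRS', 'MF', '8710'],
            ['1984', '17 YRS', 'F', '4000']]   # second row is made up

# Build new rows instead of deleting elements from the existing ones
kept = [[row[0], row[1], row[3]] for row in original if row[2] == 'MF']

print(kept)         # [['1984', '17 YRS', '8710']]
print(original[0])  # ['1984', '17 YRS', 'MF', '8710'] (unchanged)
```

Deleting a column with `del row[2]` inside the loop would instead modify the rows of `original` in place.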

In [None]:
def filter_gender(data,sex):
# Type your code below

    return new_data

new_data = filter_gender(data,'MF')

In [None]:
# Not an autograding test!
# You can write code here to test the value of new_data

new_data

In [None]:
# Autograding test for return value
# If your code works, this cell should produce no output when run
def ans_filter_gender(data,sex):
    new_data = []
    for this_row in data:
        # Keep only rows whose sex column matches the requested value
        if this_row[2] == sex:
            new_row = [this_row[0],this_row[1],this_row[3]]
            new_data.append(new_row)
    # Return only after every row has been checked
    return new_data

new_data = filter_gender(data,'MF')
ans_new_data = ans_filter_gender(data,'MF')
for i,row in enumerate(ans_new_data):
    assert new_data[i] == row, f'Error in new_data for row {i}'

### Task 3: Sum up enrolment by year

Define a function, `sum_by_year(new_data)`, that will:

1. Add up the total enrolment for each year, regardless of age
2. Remove the `age` column.

You should end up with data for two columns: `year` and `total_enrolment`.

### Sample output: new2_data

    >>> new2_data = sum_by_year(new_data)
    >>> new2_data
    [[1984,21471],
     [1985,24699],
     [...],
     [...],
     ...]

Write your code in the code cell below. Do not overwrite the previous nested lists, `data` and `new_data`.  
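
One common way to total values by group is to accumulate them in a dictionary keyed by the group. The sketch below uses made-up rows and hypothetical variable names; it is one possible approach, not the only one.

```python
# Made-up rows in the form [year, age, enrolment], all strings as read from CSV
toy_rows = [['2001', '17 YRS', '10'],
            ['2001', '18 YRS', '5'],
            ['2002', '17 YRS', '7']]

totals = {}
for year, _age, count in toy_rows:
    # get() supplies 0 the first time a year is seen
    totals[year] = totals.get(year, 0) + int(count)

# Dictionaries keep insertion order, so years stay in their original order
summary = [[int(year), total] for year, total in totals.items()]
print(summary)  # [[2001, 15], [2002, 7]]
```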

In [None]:
def sum_by_year(new_data):
    '''
    [ENTER YOUR DOCSTRING HERE]
    '''
    new2_data = []
    # Type your code below
    
    return new2_data

new2_data = sum_by_year(new_data)

In [None]:
# Not an autograding test!
# You can write code here to test the value of new2_data

new2_data

In [None]:
# Autograding test (hidden); ignore this cell

### Task 4: Write the data to a CSV file

Define a function, `write_csv(filename,header,data)`, that will write `header` and `data` to `filename` in CSV format and return the number of lines written. Any existing data in `filename` should be erased.

Write your data to a CSV file, `total-enrolment-by-year.csv`, in the same format as the original file.

### Sample output

    >>> filename = 'total-enrolment-by-year.csv'
    >>> new2_header = ['year','total_enrolment']
    >>> write_csv(filename,new2_header,new2_data)
    35
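
The sketch below shows the general pattern of writing rows as comma-separated lines while counting them. It writes made-up data to a temporary folder so that no real file is touched; the variable names are hypothetical. Opening a file in `'w'` mode erases any existing contents.

```python
import os
import tempfile

header = ['year', 'total_enrolment']
rows = [[2001, 15], [2002, 7]]            # made-up values

path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
lines_written = 0
with open(path, 'w') as f:                # 'w' truncates any existing file
    for row in [header] + rows:
        # Convert each value to a string before joining with commas
        f.write(','.join(str(value) for value in row) + '\n')
        lines_written += 1

with open(path, 'r') as f:
    contents = f.read()

print(lines_written)  # 3
```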

In [None]:
def write_csv(filename,header,data):
    # Type your code below
    
    return row_count

In [None]:
# Not an autograding test!
# You can write code here to test your write_csv() function

filename = 'total-enrolment-by-year.csv'
new2_header = ['year','total_enrolment']
write_csv(filename,new2_header,new2_data)

In [None]:
# Autograding test for return value (number of rows written)
# If your code works, this cell should produce no output when run
new2_header = ['year','total_enrolment']
filename = 'total-enrolment-by-year.csv'
rows_written = write_csv(filename,new2_header,new2_data)

assert rows_written == len(new2_data)+1, 'Wrong number of rows written'

In [None]:
# Autograding test for file contents
# If your code works, this cell should produce no output when run
filename = 'total-enrolment-by-year.csv'
with open(filename,'r') as f:
    header = f.readline().strip().split(',')
    data = []
    for line in f:
        row = line.strip().split(',')
        row[1] = int(row[1])
        data.append(row)

assert header == ['year', 'total_enrolment'], 'Wrong header'
for i,row in enumerate(data):
    for j,element in enumerate(row):
        assert str(element) == str(new2_data[i][j]), f'Wrong data for row {i}, element {j} of new2_data'

# Feedback and suggestions

Any feedback or suggestions for this assignment?

YOUR ANSWER HERE