---
# Week 3 + 4
---

MISC: 
 - Resources to study HTML + CSS. (Beware of tutorial hell and bloated tech stacks.)
   - W3Schools tutorial: https://www.w3schools.com/html/
   - Freecodecamp: https://www.freecodecamp.org/news/learn-html-beginners-course/
   - Search Google / YouTube for basic HTML and basic CSS.
 - Software dev practices: 
   - Make a plan, but don't be afraid to change it. Decide which modules and functions are needed, and what their inputs and outputs are.
   - Start small, and keep the parts small. Don't write a gigantic function / module that does many things.
   - Write small test cases.
   - If you find yourself repeating similar code, "refactor" it into a reusable module / function.
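
As a minimal sketch of "keep the parts small" and "write small test cases": the function name `average_age` is hypothetical and chosen only for illustration.

```python
def average_age(ages):
    """Return the mean of a non-empty list of ages."""
    if not ages:
        raise ValueError("ages must not be empty")
    return sum(ages) / len(ages)


# Small test cases: each one checks a single behaviour of one small function.
assert average_age([15]) == 15
assert average_age([14, 16]) == 15
```

Because the function does one thing, each test stays a one-liner, and the function can be reused wherever an average is needed.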

---
# OOP
---
Possible FAQ: Why OOP?
  - Programming paradigms / styles are for humans, not for machines.
  - Why not something else? Why not stick with assembly, strict imperative code, just lists, just dicts, or functional programming? We just want to arrange code so that *most* people can digest it faster than they otherwise would (that is the point of a programming paradigm).
  - Some other perks of a paradigm: tooling (debugger, linter, semantic checks …)

In [1]:
import random
import datetime

In [3]:
# How would you store details about a student?
# We could use a list, but then we need to remember which index corresponds to which piece of info.
# That is error-prone.
# We could use a dictionary. But can you tell what went wrong in the student records below?
student1 = {
    "name": "John Doe", 
    "age": 16, 
    "id": 4814057, 
    "attendence": ["12/01/2023", "11/25/2023"], 
    "subjects": ["How to survive MBS", "Python programming", "SQL"], 
}
student2 = {
    "Name": "Jane Doe", 
    "age": 17, 
    "ID": 124587124, 
    "attend": [], 
    "subject": ["How to survive MBS", "Python programming", "SQL"], 
}

def is_same_student(student1, student2):
    has_same_id = student1["id"] == student2["id"]
    has_same_name = student1["name"] == student2["name"]
    if has_same_id and (not has_same_name):
        raise ValueError
    return has_same_id

is_same_student(student1, student2)

KeyError: 'id'
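
One possible way to spot this kind of mismatch before it raises is to compare the key sets of the two records. The sketch below uses shortened copies of the two dictionaries above; the variable name `mismatched_keys` is made up for this example.

```python
student1 = {"name": "John Doe", "age": 16, "id": 4814057}
student2 = {"Name": "Jane Doe", "age": 17, "ID": 124587124}

# Symmetric difference: keys that appear in one record but not in the other.
mismatched_keys = set(student1) ^ set(student2)
print(sorted(mismatched_keys))  # ['ID', 'Name', 'id', 'name']
```

Nothing in a plain dictionary stops two records from drifting apart like this, which is part of the motivation for defining a class.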

In [24]:
# How can we do this more systematically? 
# What else do you think this class should record / be able to do? 
# Pretty printing
# Checking equality
# Sorting. 
# Other suggestion? How would you implement a "friend request" feature? 



class Student(object):
    
    def __init__(self, 
                 name, 
                 age=None, 
                 birth_year=None, 
                 idnum=None, 
                 first_name=None, 
                 last_name=None
        ):
        self.name = name
        self.first_name = first_name
        self.last_name = last_name

        self.birth_year = birth_year
        self.age = age        
        if idnum is None:
            self.id = random.randint(0, 100000)
        else:
            self.id = idnum

        self.attendence = []
    
    # What is the difference between a method and a function?
    def get_age(self):
        current_year = datetime.datetime.now().year
        if self.age is not None:
            return self.age
        elif self.birth_year is not None:
            return current_year - self.birth_year
        else:
            raise ValueError("Age not determined. ")
    
    def record_attendence(self, date):
        self.attendence.append(date) # add date formatting here. 
        return 
    
    def is_first_last_format(self):
        return self.name.split(' ') == [self.first_name, self.last_name]
    
#     def __lt__(self, other):
#         return self.age < other.age
    
    def __eq__(self, other):
        return self.id == other.id
    
#     def __le__(self, other):
#         return self.name <= other.name
    
#     def __lt__(self, other):
#         return self.name < other.name
    
    def __repr__(self):
        return f"Student(name={self.name}, age={self.age}, num attend={len(self.attendence)})"

student1 = Student("John Doe", birth_year=1995, idnum=12345)
student2 = Student("Jane Doe", birth_year=1994)
student3 = Student("John Doe", birth_year=1995, idnum=12345)
# student2 >= student1

In [25]:
print(student1)

Student(name=John Doe, age=None, num attend=0)
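
The comparison methods in the class above (`__eq__`, and the commented-out `__lt__` / `__le__`) are what make `==`, `<` and `sorted()` work on objects. Below is a minimal sketch of that idea using a cut-down, hypothetical class called `MiniStudent` (not the `Student` class above). It assumes ordering by id, and uses `functools.total_ordering` from the standard library to fill in the remaining comparisons.

```python
import functools


@functools.total_ordering
class MiniStudent:
    """Hypothetical cut-down student, used only for this sketch."""

    def __init__(self, name, idnum):
        self.name = name
        self.id = idnum

    def __eq__(self, other):
        if not isinstance(other, MiniStudent):
            return NotImplemented
        return self.id == other.id

    def __lt__(self, other):
        if not isinstance(other, MiniStudent):
            return NotImplemented
        return self.id < other.id

    def __hash__(self):
        # Defining __eq__ disables the default __hash__, so provide one
        # that agrees with equality.
        return hash(self.id)

    def __repr__(self):
        return f"MiniStudent(name={self.name!r}, id={self.id})"


a = MiniStudent("John Doe", 12345)
b = MiniStudent("Jane Doe", 678)
c = MiniStudent("John Doe", 12345)

print(a == c)          # True: same id, even though they are different objects
print(sorted([a, b]))  # sorted() relies on __lt__; b comes first (smaller id)
print(a >= b)          # True: __ge__ is supplied by total_ordering
```

Whether students should be ordered by id, name, or age is a design choice; the sketch picks id only so that `__eq__` and `__lt__` agree with each other.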


# File and File formats
- What is a file:
    - A finite sequence of 8 bit bytes …
    - Something your OS recognises as a file.
    - Something you can access using Python’s `open()` function.
- Why file format?
    - Machines need to negotiate how to interpret data.
    - Text: html, plain text, python, c, ... (need not be human readable)
    - Images: png (there was an interesting bug about this format in 2023), jpeg, tiff
    - Compressed: gzip
    - Audio: mp3, wav
    - File name extension.
    
- Discuss: 
  - Why doesn't `open()` just return the entire content of the file instead of going through this `.tell(), .seek(), .read()` business? 
  - What could go wrong if we forget to close a file handle? 
- Demo CSV files
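
To make the `.tell(), .seek(), .read()` question above concrete, here is a small sketch. It writes its own scratch file with `tempfile` so it does not depend on any file from the lesson, and it opens the file in binary mode so that `.tell()` is a plain byte offset.

```python
import os
import tempfile

# Create a small scratch file for the demonstration.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"abcdefghij")
    path = tmp.name

with open(path, "rb") as infile:
    first = infile.read(3)      # read only the first 3 bytes
    position = infile.tell()    # the handle remembers where it is
    infile.seek(0)              # jump back to the start
    everything = infile.read()  # no argument: read to the end

os.remove(path)
print(first, position, everything)  # b'abc' 3 b'abcdefghij'
```

A file handle is a cursor into the file rather than a copy of it, which is why a program can read a small piece of a file that is far too large to fit in memory.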

## CSV demo

1. What is a csv? 
2. What should we expect the `csv` module to do? (and how do we find out if it does that?)
3. Demo
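
One way to explore questions 1 and 2 is to compare the `csv` module with a naive `str.split(",")` on a tricky line. The sample text below is made up for this sketch, and `io.StringIO` stands in for an open file.

```python
import csv
import io

# A made-up file in which one field contains a comma inside quotes.
text = 'name,subject\n"Doe, Jane","Python programming"\n'

naive = text.splitlines()[1].split(",")
parsed = list(csv.reader(io.StringIO(text)))

print(naive)   # ['"Doe', ' Jane"', '"Python programming"']
print(parsed)  # [['name', 'subject'], ['Doe, Jane', 'Python programming']]
```

The naive split treats the quoted comma as a separator and leaves the quote characters in place, while `csv.reader` handles the quoting rules for us.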

In [32]:
# How do we read a CSV file into a list of lists?
# What could potentially go wrong in this code?

# The task is to read the data in `rand_student.csv` into a Python list of lists.
# Also, print out the average age of the students on record.
# What mistakes are made in the following?


import csv

file = open("rand_student.csv") # mistake 
reader = csv.reader(file)
header = next(reader)
raw_data = [row for row in reader] # bad design 

ages = []
student_ids = []
for row in raw_data: 
    idnum = row[-2]
    age = int(row[-1])
    if idnum not in student_ids:
        student_ids.append(idnum)
        ages.append(age)
    
    ages.append(age)

# another mistake 
file.close()


print(raw_data)
print(f"Average age = {sum(ages) / len(ages)}") # runs, but the average is wrong. 

[['Science', '07/15/2022', 'Friday', 'John Smith', '123456', '15'], ['History', '12/21/2022', 'Wednesday', 'Bob Johnson', '987654', '14'], ['Mathematics', '06/16/2022', 'Thursday', 'Jane Doe', '654321', '16'], ['Social Studies', '06/11/2022', 'Saturday', 'Bob Johnson', '987654', '14'], ['Science', '12/13/2022', 'Tuesday', 'Jane Doe', '654321', '16'], ['English', '09/27/2022', 'Tuesday', 'Bob Johnson', '987654', '14'], ['Science', '09/21/2022', 'Wednesday', 'John Smith', '123456', '15'], ['History', '11/11/2022', 'Friday', 'John Smith', '123456', '15'], ['Social Studies', '07/25/2022', 'Monday', 'Bob Johnson', '987654', '14'], ['Mathematics', '04/11/2022', 'Monday', 'John Smith', '123456', '15']]
Average age = 14.846153846153847


In [None]:
# Correct code
import csv

file = open("rand_student.csv") # default is "r"-mode. be VERY CAREFUL about 'w'-mode. 
reader = csv.reader(file)
header = next(reader)
raw_data = [row for row in reader]
file.close() # we have finished using the file! Close it as soon as possible! 

ages = []
student_ids = []
for row in raw_data: # what happens if we do `for row in reader:` here again? 
    age = int(row[-1])
    idnum = int(row[-2])
    if idnum not in student_ids:
        student_ids.append(idnum)
        ages.append(age)


print(header)
print(raw_data)
print(f"Average age = {sum(ages) / len(ages)}")
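
On the question of iterating over `reader` a second time: a `csv.reader` is an iterator, so once it has been consumed it yields nothing more. A minimal sketch, with made-up data and `io.StringIO` standing in for the open file:

```python
import csv
import io

# io.StringIO stands in for an open file; the contents are made up.
infile = io.StringIO("name,age\nJohn Smith,15\nJane Doe,16\n")
reader = csv.reader(infile)
header = next(reader)

first_pass = [row for row in reader]
second_pass = [row for row in reader]  # the reader is already exhausted

print(len(first_pass), len(second_pass))  # 2 0
```

This is one reason to copy the rows into a list such as `raw_data` when they are needed more than once. With a real file that has already been closed, reading from the reader again would fail instead of silently returning nothing.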

    


In [None]:
# Using `with` context manager. 
import csv
raw_data = []
with open("rand_student.csv") as infile: 
    reader = csv.reader(infile)
    header = next(reader)
    raw_data = [row for row in reader]

ages = []
student_ids = []
for row in raw_data:
    age = int(row[-1]) # what if we don't know the index? 
    idnum = int(row[-2])
    if idnum not in student_ids:
        student_ids.append(idnum)
        ages.append(age)

print(header)
print(raw_data)
print(f"Average age = {sum(ages) / len(ages)}")
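
If the column index is not known in advance, one option is to look it up in the header row. The sketch below assumes the header contains columns literally named `age` and `id`; the data is made up and `io.StringIO` stands in for the file.

```python
import csv
import io

# Made-up data; the column names "id" and "age" are assumptions.
infile = io.StringIO("name,id,age\nJohn Smith,123456,15\nJane Doe,654321,16\n")
reader = csv.reader(infile)
header = next(reader)

age_col = header.index("age")  # raises ValueError if there is no such column
id_col = header.index("id")

ages = [int(row[age_col]) for row in reader]
print(age_col, id_col, ages)  # 2 1 [15, 16]
```

This keeps working if columns are added or reordered, as long as the names stay the same.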


In [None]:
# csv.DictReader demo

import csv
ages = []
student_ids = []
raw_data = []
with open("rand_student.csv") as infile: 
    dictreader = csv.DictReader(infile)
    header = next(dictreader) # mistake: DictReader already uses the first line as the header, so this skips the first data row
    for row in dictreader:
        raw_data.append(row)
        age = int(row["age"])
        idnum = int(row["id"])
        if idnum not in student_ids:
            student_ids.append(idnum)
            ages.append(age)

print(raw_data)
print(f"Average age = {sum(ages) / len(ages)}")
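
A short sketch of how `csv.DictReader` treats the header, using made-up data and `io.StringIO` in place of the file. The first line is read as the field names automatically, so calling `next()` on the reader returns a data row, not the header.

```python
import csv
import io

text = "name,id,age\nJohn Smith,123456,15\nJane Doe,654321,16\n"

# Without next(): both data rows are seen.
dictreader = csv.DictReader(io.StringIO(text))
fieldnames = dictreader.fieldnames  # the header, taken from the first line
rows = list(dictreader)

# With next(): the first data row is consumed before the loop starts.
dictreader2 = csv.DictReader(io.StringIO(text))
skipped = next(dictreader2)
remaining = list(dictreader2)

print(fieldnames)                  # ['name', 'id', 'age']
print(len(rows), len(remaining))   # 2 1
print(skipped["name"])             # John Smith
```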

        

In [None]:
# Using Student object

import csv
students = []
with open("rand_student.csv") as infile: 
    dictreader = csv.DictReader(infile)
    for row in dictreader:
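        # Pass age and id by keyword: the third positional parameter is birth_year, not idnum.
        s = Student(row["name"], age=int(row["age"]), idnum=int(row["id"]))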
        s.record_attendence(row["date"])
        students.append(s)
students

In [None]:
import csv
students = []
with open("rand_student.csv") as infile: 
    reader = csv.reader(infile)
    header = next(reader)
    for row in reader:
        date = row[1]
        idnum = int(row[-2])
        age = int(row[-1])
        name = row[-3]
        s = Student(name, age=age, idnum=idnum)  # by keyword: the third positional parameter is birth_year
        s.record_attendence(date)
        students.append(s)
sorted(students)  # needs Student.__lt__ (commented out above); without it this raises TypeError
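
Both loops above create one object per CSV row, so a student who appears in several rows ends up with several objects. One possible way to merge the rows, sketched here with a plain dictionary keyed by id rather than the `Student` class, and with made-up data and column names:

```python
import csv
import io

# Made-up rows; the column names are assumptions for this sketch.
text = (
    "subject,date,name,id,age\n"
    "Science,07/15/2022,John Smith,123456,15\n"
    "History,12/21/2022,Bob Johnson,987654,14\n"
    "Science,09/21/2022,John Smith,123456,15\n"
)

attendance_by_id = {}
for row in csv.DictReader(io.StringIO(text)):
    idnum = int(row["id"])
    # Create the list the first time an id is seen, then append to it.
    attendance_by_id.setdefault(idnum, []).append(row["date"])

print(attendance_by_id)
# {123456: ['07/15/2022', '09/21/2022'], 987654: ['12/21/2022']}
```

The same pattern would work with objects as the dictionary values: look the student up by id, create it if missing, then record the attendance on it.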

## RAM vs Disk 
Below is a comparison of reading from disk vs. from RAM. It is a crude comparison (the two functions do different work), but it still shows the large disparity. 

When would we use which storage? Why? 
When do we want to read from / store to a file? Why? 

In [33]:
def read_from_file(filename):
    with open(filename) as infile:
        x = infile.read()
    return x

%timeit read_from_file("./test.txt")

10.8 µs ± 52.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [34]:
x = '3\n1\n5\n9\n6\n10\n2\n'
def read_from_mem(numlist):
    s = sum(numlist) # we can do a bunch of computation and still be faster than reading from disk. 
    y = x[::-1] + x[::-1]
    return y, s
%timeit read_from_mem([3, 5, 9, 6, 10, 2])

275 ns ± 1.06 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
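
`%timeit` only works inside IPython / Jupyter. Outside a notebook, a similar comparison can be sketched with the standard-library `timeit` module. The scratch file, the function `read_from_mem`, and the run count below are choices made for this sketch, and the numbers will differ from machine to machine. Since the same small file is read repeatedly, the operating system is likely serving it from its cache, so this probably understates the cost of a real disk read.

```python
import os
import tempfile
import timeit

text = "3\n1\n5\n9\n6\n10\n2\n"

# Write the text to a scratch file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(text)
    path = tmp.name


def read_from_file(filename):
    with open(filename) as infile:
        return infile.read()


def read_from_mem():
    return text


runs = 10_000
disk_seconds = timeit.timeit(lambda: read_from_file(path), number=runs)
mem_seconds = timeit.timeit(read_from_mem, number=runs)
os.remove(path)

print(f"file:   {disk_seconds / runs * 1e6:.2f} µs per call")
print(f"memory: {mem_seconds / runs * 1e6:.2f} µs per call")
```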
