# STA 141B: Homework 1
Winter 2018

## Information

After the colons (in the same line) please write just your first name, last name, and the 9 digit student ID number below.

First Name: Sam

Last Name: Tsoi

Student ID: 913032178


## Instructions

We use a script that extracts your answers by looking for cells in between the cells containing the exercise statements.  So you 

- MUST add cells in between the exercise statements and add answers within them and
- MUST NOT modify the existing cells, particularly not the problem statement

To make markdown, please switch the cell type to markdown (from code) - you can hit 'm' when you are in command mode - and use the markdown language.  For a brief tutorial see: https://daringfireball.net/projects/markdown/syntax


## Part 1: The Doomsday Algorithm

The Doomsday algorithm, devised by mathematician J. H. Conway, computes the day of the week any given date fell on. The algorithm is designed to be simple enough to memorize and use for mental calculation.

__Example.__ With the algorithm, we can compute that July 4, 1776 (the day the United States declared independence from Great Britain) was a Thursday.

The algorithm is based on the fact that for any year, several dates always fall on the same day of the week, called the <em style="color:#F00">doomsday</em> for the year. These dates include 4/4, 6/6, 8/8, 10/10, and 12/12.

__Example.__ The doomsday for 2016 is Monday, so in 2016 the dates above all fell on Mondays. The doomsday for 2017 is Tuesday, so in 2017 the dates above will all fall on Tuesdays.

The doomsday algorithm has three major steps:

1. Compute the anchor day for the target century.
2. Compute the doomsday for the target year based on the anchor day.
3. Determine the day of week for the target date by counting the number of days to the nearest doomsday.

Each step is explained in detail below.

### The Anchor Day

The doomsday for the first year in a century is called the <em style="color:#F00">anchor day</em> for that century. The anchor day is needed to compute the doomsday for any other year in that century. The anchor day for a century $c$ can be computed with the formula:
$$
a = \bigl( 5 (c \bmod 4) + 2 \bigr) \bmod 7
$$
The result $a$ corresponds to a day of the week, starting with $0$ for Sunday and ending with $6$ for Saturday.

__Note.__ The modulo operation $(x \bmod y)$ finds the remainder after dividing $x$ by $y$. For instance, $12 \bmod 3 = 0$ since the remainder after dividing $12$ by $3$ is $0$. Similarly, $11 \bmod 7 = 4$, since the remainder after dividing $11$ by $7$ is $4$.

__Example.__ Suppose the target year is 1954, so the century is $c = 19$. Plugging this into the formula gives
$$a = \bigl( 5 (19 \bmod 4) + 2 \bigr) \bmod 7 = \bigl( 5(3) + 2 \bigr) \bmod 7 = 3.$$
In other words, the anchor day for 1900-1999 is Wednesday, which is also the doomsday for 1900.

__Exercise 1.1.__ Write a function that accepts a year as input and computes the anchor day for that year's century. The modulo operator `%` and functions in the `math` module may be useful. Document your function with a docstring and test your function for a few different years.  Do this in a new cell below this one.

In [1]:
def computeAnchor(year):
    """accepts a year as input and computes anchor day for that year's century.
    input: year
    output: an integer between 0 and 6 inclusive indicating which day of week anchor day is for that century"""
    cent = str(year)
    cent = cent[0:2]
    anchor = (5*(int(cent) % 4) + 2) % 7
    return anchor
computeAnchor(1954)
computeAnchor(2406)
computeAnchor(1743)


0

The anchor day for the year 1954 is Wednesday, as the output of `computeAnchor()` for year 1954 is 3.
The anchor day for the year 2406 is Tuesday, as the output is 2.
The anchor day for the year 1743 is Sunday, as the output is 0.
The function returns an integer between 0 and 6 so that the doomsday is easier to calculate later on. 0 represents Sunday, 1 represents Monday, etc.

<span style='color:red'>Grade:8/10</span>

Notes: should not use str slicing 
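
The grader's note refers to the string slicing in `computeAnchor`, which only works for four-digit years. One possible alternative, sketched here under the assumption that the year is passed as a positive integer, is to get the century with integer division. The name `anchor_day` is illustrative and not part of the assignment.

```python
def anchor_day(year):
    """Return the anchor day (0 = Sunday, ..., 6 = Saturday) for the century of `year`.

    Hypothetical helper: it assumes `year` is a positive integer.
    """
    century = year // 100  # 1954 -> 19, 954 -> 9; no string slicing needed
    return (5 * (century % 4) + 2) % 7


print(anchor_day(1954))  # 3 (Wednesday), matching the worked example
print(anchor_day(2406))  # 2 (Tuesday)
print(anchor_day(1743))  # 0 (Sunday)
```

Because the century comes from arithmetic rather than character positions, the same function also handles years with fewer or more than four digits.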

### The Doomsday

Once the anchor day is known, let $y$ be the last two digits of the target year. Then the doomsday for the target year can be computed with the formula:
$$d = \left(y + \left\lfloor\frac{y}{4}\right\rfloor + a\right) \bmod 7$$
The result $d$ corresponds to a day of the week.

__Note.__ The floor operation $\lfloor x \rfloor$ rounds $x$ down to the nearest integer. For instance, $\lfloor 3.1 \rfloor = 3$ and $\lfloor 3.8 \rfloor = 3$.

__Example.__ Again suppose the target year is 1954. Then the anchor day is $a = 3$, and $y = 54$, so the formula gives
$$
d = \left(54 + \left\lfloor\frac{54}{4}\right\rfloor + 3\right) \bmod 7 = (54 + 13 + 3) \bmod 7 = 0.
$$
Thus the doomsday for 1954 is Sunday.

__Exercise 1.2.__ Write a function that accepts a year as input and computes the doomsday for that year. Your function may need to call the function you wrote in exercise 1.1. Make sure to document and test your function.

In [2]:
#leap year, january 1st, dec 31

#using math.floor
import math
def computeDooms(year):
    """accepts a year as input and computes the doomsday for that year
    input: year
    output: an integer between 0 and 6 inclusive indicating which day of the week the doomsday is for that year"""
    anchor = computeAnchor(year)
    digits = str(year) #puts it into a string so we can iterate and parse the string
    digits = digits[2:4] #returns the last 2 digits of the year
    digits = int(digits)
    dooms = (digits + math.floor(digits/4) + anchor) % 7
    return dooms
computeDooms(1670)
computeDooms(1599)
computeDooms(2001)

3

The doomsday for the year 1670 is a Friday, as the output of `computeDooms()` is 5.
The doomsday for the year 1599 is a Sunday, as the output of `computeDooms()` is 0.
The doomsday for the year 2001 is a Wednesday, as the output of `computeDooms()` is 3.
The function returns an integer between 0 and 6 to make it easier to calculate the day of the week for a given date later on. 0 represents Sunday, 1 represents Monday, etc.

<span style='color:red'>Grade:8/10</span>

Notes: same as above 
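
The same note applies to the last two digits of the year. As a minimal sketch, assuming the year is a positive integer, they can be taken with `% 100`, and the floor in the formula can be written with integer division `//`. The names `anchor_day` and `doomsday` are illustrative, not the names used in the submission.

```python
def anchor_day(year):
    """Anchor day (0 = Sunday, ..., 6 = Saturday) for the century of `year`."""
    century = year // 100
    return (5 * (century % 4) + 2) % 7


def doomsday(year):
    """Doomsday (0 = Sunday, ..., 6 = Saturday) for `year`, a positive integer."""
    y = year % 100  # last two digits of the year, e.g. 1954 -> 54
    return (y + y // 4 + anchor_day(year)) % 7  # y // 4 is floor(y / 4)


print(doomsday(1954))  # 0 (Sunday), matching the worked example
print(doomsday(1670))  # 5 (Friday)
print(doomsday(2001))  # 3 (Wednesday)
```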

### The Day of Week

The final step in the Doomsday algorithm is to count the number of days between the target date and a nearby doomsday, modulo 7. This gives the day of the week.

Every month has at least one doomsday:
* (regular years) 1/10, 2/28
* (leap years) 1/11, 2/29
* 3/21, 4/4, 5/9, 6/6, 7/11, 8/8, 9/5, 10/10, 11/7, 12/12

__Example.__ Suppose we want to find the day of the week for 7/21/1954. The doomsday for 1954 is Sunday, and a nearby doomsday is 7/11. There are 10 days in July between 7/11 and 7/21. Since $10 \bmod 7 = 3$, the date 7/21/1954 falls 3 days after a Sunday, on a Wednesday.
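
The counting step in this example can be sketched in a few lines. This is only an illustration of the arithmetic, assuming the doomsday number for the year and a doomsday date in the same month are already known; the function name and its parameters are hypothetical.

```python
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]


def weekday_from_doomsday(doomsday, doomsday_day, target_day):
    """Return the weekday name of `target_day`.

    `doomsday` is the year's doomsday as a number (0 = Sunday), and
    `doomsday_day` is the day of the month of a doomsday in the same month
    as `target_day`.
    """
    # Python's % returns a non-negative result for a positive divisor,
    # so target days before the doomsday work too.
    offset = (target_day - doomsday_day) % 7
    return DAY_NAMES[(doomsday + offset) % 7]


print(weekday_from_doomsday(0, 11, 21))  # 7/21/1954 -> Wednesday
print(weekday_from_doomsday(0, 11, 4))   # 7/4/1954 -> Sunday
```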

__Exercise 1.3.__ Write a function to determine the day of the week for a given day, month, and year. Be careful of leap years! Your function should return a string such as "Thursday" rather than a number. As usual, document and test your code.

In [4]:
def dayOfWeek(date):
    """determines the day of the week for a given day, month, and year
    input: a date string in the format mm/dd/yyyy or m/d/yyyy.
    month must be between 1 and 12 inclusive, day must be a valid day of that month, and year must have 4 digits
    output: a string, representing day of week a certain date is"""
    
    #string manipulation, assigning variables
    date = str(date)
    date = date.split("/")
    month = date[0]
    day = date[1]
    year = date[2]
    
    #calculate dooms day for that year
    dooms = computeDooms(year)
    
    #using a dictionary, using month as keys and matching it with its respective doomsday
    doomsRegDict = {'1':10, '2':28, '3':21, '4':4, '5':9,'6':6,'7':11,'8':8,'9':5,'10':10,'11':7, '12':12}
    doomsLeapDict = {'1':11, '2':29, '3':21, '4':4, '5':9,'6':6,'7':11,'8':8,'9':5,'10':10,'11':7, '12':12}
    #is the year we're looking at a leap year?
    
    #leap year: the year must be divisible by 4, and a century year must also be divisible by 400.
    #to keep this simple, 1700, 1800, and 1900 are hard-coded as the non-leap century years,
    #so other non-leap century years (such as 2100) are not handled.
    if(int(year) % 4 == 0 and not (int(year) == 1700 or int(year) == 1800 or int(year) == 1900)): #leap year
        daysToDooms = int(day) - doomsLeapDict[month]
    else: #not leap year
        daysToDooms = int(day) - doomsRegDict[month]
        
    #finding out the date of the week depending on # of days to doomsday
    addingToDayOfWeek = daysToDooms % 7 
   
    #day of week is dependent on doomsday that we calculated earlier
    dayOfWeek = dooms + addingToDayOfWeek
    dayOfWeek = dayOfWeek %7 

    
    # we want to return a string depending on the day of the week
    if(dayOfWeek == 0):
        return "Sunday"
    elif(dayOfWeek == 1):
        return "Monday"
    elif(dayOfWeek == 2):
        return "Tuesday"
    elif(dayOfWeek == 3):
        return "Wednesday"
    elif(dayOfWeek == 4):
        return "Thursday"
    elif(dayOfWeek == 5):
        return "Friday"
    elif(dayOfWeek == 6):
        return "Saturday"
    return "None"

#testing function
dayOfWeek("1/1/1932")
dayOfWeek("1/4/2010")
dayOfWeek("3/5/1800") #testing a century year that is not a leap year

'Wednesday'

1/1/1932 is a Friday, 1/4/2010 is a Monday, and 3/5/1800 is a Wednesday. Typically, a year that is divisible by 4 is a leap year, but 1700, 1800, and 1900 are not leap years. This is taken into account in the code.

<span style='color:red'>Grade:15/20</span>

Notes: wrong leap year checking 
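
For reference, the Gregorian leap-year rule can be written without hard-coding particular centuries. The sketch below is one way to express it, with the hypothetical name `is_leap_year`; it is compared against `calendar.isleap` from the standard library, which implements the same rule.

```python
import calendar


def is_leap_year(year):
    """Gregorian rule: divisible by 4, except century years not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


for year in (1900, 1932, 2000, 2100):
    # The two columns should agree for every year.
    print(year, is_leap_year(year), calendar.isleap(year))
```

With this rule, 1900 and 2100 are regular years, while 1932 and 2000 are leap years.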

__Exercise 1.4.__ How many times did Friday the 13th occur in the years 1900-1999? Does this number seem to be similar to other centuries?

In [6]:
def fridayThe13th(year1,year2):
    """calculates the number of times Friday the 13th occurs in a range of years
    input: the first year and the last year of the range (both inclusive)
    output: an integer, the number of times Friday the 13th occurs in that range"""
    
    #/13/1900
    #preparing variables
    year1 = int(year1)
    year2 = int(year2)
    counter = 0
    
    #going through each year between the two years. so 1900 - 1999 would go from 1900, 1901, 1902, ... , 1999 INCLUSIVE
    for year in list(range(year1,year2+1)):
        
        #considering each month! so, january to december
        for month in list(range(1,13)):
            
            #concatenating month/day/year into a string, ready for using dayOfWeek(date)
            dateString = str(month)+"/13/"+str(year) # i.e. 1/13/1910, 2/13/1910, 12/13/1945, etc.
            
            #if the day of the week is Friday, then we increment counter
            if(dayOfWeek(dateString) == "Friday"):
                counter += 1
    return counter

#testing variables
fridayThe13th(1900,1999)
fridayThe13th(1800,1899)
fridayThe13th(2000,2099)

172

Friday the 13th occurred 172 times in the years 1900-1999, inclusive of those years. Counting the occurrences in the years 1800-1899 and 2000-2099 as well, the number of Friday the 13ths is the same in all three centuries.
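
One way to cross-check these counts independently of the Doomsday functions is the standard library's `datetime.date`, which uses the proleptic Gregorian calendar. This is a sketch for verification only, and `count_friday_13ths` is a hypothetical name.

```python
from datetime import date


def count_friday_13ths(first_year, last_year):
    """Count the months in [first_year, last_year] whose 13th is a Friday."""
    return sum(
        1
        for year in range(first_year, last_year + 1)
        for month in range(1, 13)
        if date(year, month, 13).weekday() == 4  # Monday is 0, so Friday is 4
    )


for start in (1800, 1900, 2000):
    print(start, count_friday_13ths(start, start + 99))
```

If the two approaches agree, each line reports the same count as the corresponding `fridayThe13th` call.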

<span style='color:red'>Grade:10/10</span>

Notes:

__Exercise 1.5.__ How many times did Friday the 13th occur between the year 2000 and today?

In [7]:
fridayThe13th(2000,2017)

31

Between the year 2000 and 2017, Friday the 13th has occurred 31 times. Since it is the beginning of 2018 right now, no Friday the 13th has occurred yet this year. Since my function `fridayThe13th()` is inclusive of both years, I used 2000 and 2017 as my input to compute the number of times Friday the 13th has occurred.

<span style='color:red'>Grade:10/10</span>

Notes:

## Part 2: 1978 Birthdays

__Exercise 2.1.__ The file `birthdays.txt` contains the number of births in the United States for each day in 1978. Inspect the file to determine the format. Note that columns are separated by the tab character, which can be entered in Python as `\t`. Write a function that uses iterators and list comprehensions with the string methods `split()` and `strip()` to  convert each line of data to the list format

```Python
[month, day, year, count]
```
The elements of this list should be integers, not strings. The function `read_birthdays` provided below will help you load the file.

In [8]:
import re
def read_birthdays(file_path):
    """Read the birthdays file and parse it into a list of rows.
    
    Arguments:
        file_path (string): The path to the birthdays file.
        
    Returns:
        list: One [month, day, year, count] list of integers per day in the file.
    """

    #reading each line
    with open(file_path) as file:
        lines = file.readlines()
        
    #THE LINES BELOW PARSE THE FILE INTO A LIST OF [month, day, year, count]
    
    #getting rid of newlines and empty lines
    lines = [x.strip() for x in lines] 
    lines = [x for x in lines if x != '']
    #skipping the first 4 non-empty lines at the beginning of the document, which are not data
    lines = lines[4:]
    
    #parsing each line, splitting on / and \t, which separate the date parts and the number of births on that day
    #adding to result list
    result = []
    for x in lines:
        fields = re.split(r'[/,\t]', x)
        result.append([int(field) for field in fields])

    return result
    
#reading "birthdays.txt"
read_birthdays("birthdays.txt")

[[1, 1, 78, 7701],
 [1, 2, 78, 7527],
 [1, 3, 78, 8825],
 [1, 4, 78, 8859],
 [1, 5, 78, 9043],
 [1, 6, 78, 9208],
 [1, 7, 78, 8084],
 [1, 8, 78, 7611],
 [1, 9, 78, 9172],
 [1, 10, 78, 9089],
 [1, 11, 78, 9210],
 [1, 12, 78, 9259],
 [1, 13, 78, 9138],
 [1, 14, 78, 8299],
 [1, 15, 78, 7771],
 [1, 16, 78, 9458],
 [1, 17, 78, 9339],
 [1, 18, 78, 9120],
 [1, 19, 78, 9226],
 [1, 20, 78, 9305],
 [1, 21, 78, 7954],
 [1, 22, 78, 7560],
 [1, 23, 78, 9252],
 [1, 24, 78, 9416],
 [1, 25, 78, 9090],
 [1, 26, 78, 9387],
 [1, 27, 78, 8983],
 [1, 28, 78, 7946],
 [1, 29, 78, 7527],
 [1, 30, 78, 9184],
 [1, 31, 78, 9152],
 [2, 1, 78, 9159],
 [2, 2, 78, 9218],
 [2, 3, 78, 9167],
 [2, 4, 78, 8065],
 [2, 5, 78, 7804],
 [2, 6, 78, 9225],
 [2, 7, 78, 9328],
 [2, 8, 78, 9139],
 [2, 9, 78, 9247],
 [2, 10, 78, 9527],
 [2, 11, 78, 8144],
 [2, 12, 78, 7950],
 [2, 13, 78, 8966],
 [2, 14, 78, 9859],
 [2, 15, 78, 9285],
 [2, 16, 78, 9103],
 [2, 17, 78, 9238],
 [2, 18, 78, 8167],
 [2, 19, 78, 7695],
 [2, 20, 78, 9021]

`read_birthdays("birthdays.txt")` reads in "birthdays.txt", parses each line, and puts the month, day, year, and number of births on that day into a list.
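
The exercise statement mentions `split()` and `strip()`, while the cell above uses `re.split`. A minimal sketch with only the string methods follows. It assumes each data line looks like `1/1/78<tab>7701`, which is inferred from the output above since the file itself is not shown, and `parse_line` is a hypothetical name.

```python
def parse_line(line):
    """Convert one line such as '1/1/78\\t7701\\n' to [month, day, year, count]."""
    date_part, count = line.strip().split("\t")
    return [int(piece) for piece in date_part.split("/")] + [int(count)]


# Sample lines standing in for the file contents (assumed format).
sample_lines = ["1/1/78\t7701\n", "1/2/78\t7527\n", "\n", "2/20/78\t9021\n"]

# Skip blank lines, then parse the rest with a list comprehension.
rows = [parse_line(line) for line in sample_lines if line.strip()]
print(rows)  # [[1, 1, 78, 7701], [1, 2, 78, 7527], [2, 20, 78, 9021]]
```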

<span style='color:red'>Grade:10/10</span>

Notes:

__Exercise 2.2.__ Which month had the most births in 1978? Which day of the week had the most births? Which day of the week had the fewest? What conclusions can you draw? You may find the `Counter` class in the `collections` module useful.


In [9]:
result = read_birthdays("../data/birthdays.txt")
def monthBirth(result):
    """displays the number of births for each month given the list [month, day, year, count]
    input: list in the format of [month, day, year, count]
    output: a dictionary, in the format of month: cumulated number of births"""
    
    births = {} #named births rather than dict, so the built-in dict type is not shadowed
    #going through the list to add up all the births for each month
    for i in result:
        #add the count to the month's running total (starting from 0 if the month is new)
        births[i[0]] = births.get(i[0], 0) + i[3]
    return births

#test
print(monthBirth(result))


def dayBirth(result):
    """displays the number of births for each day of the week given the list [month, day, year, count]
    input: list in the format of [month, day, year, count]
    output: a dictionary, in the format of day of week: cumulated number of births"""
    dict2 = {}
    
    #going through the list to add all the births for dict2
    for i in result:
        #concatenating month/day/year into a string, ready for using dayOfWeek(date)
        dateString = str(i[0])+"/"+ str(i[1])+"/19"+str(i[2])
        dayOfWee = str(dayOfWeek(dateString))
        
        #if the day of week is already in dictionary, add the value to its respective day
        if dayOfWee in dict2:
            new = dict2[dayOfWee] + i[3]
        #if not, make a new key with the number of births for that day of week
        else:
            new = i[3]
        dict2[dayOfWee] = new
    return dict2

#test
print(dayBirth(result))

{1: 270695, 2: 249875, 3: 276584, 4: 254577, 5: 270812, 6: 270756, 7: 294701, 8: 302795, 9: 293891, 10: 288955, 11: 274671, 12: 284927}
{'Friday': 500541, 'Tuesday': 504858, 'Saturday': 432085, 'Monday': 487309, 'Wednesday': 493897, 'Sunday': 421400, 'Thursday': 493149}


The month that had the most births in 1978 was August, with 302795 births. The day of the week with the most births in 1978 was Tuesday, with 504858 births. The day of the week with the fewest births in 1978 was Sunday, with 421400 births. I put the 

<span style='color:red'>Grade:20/20</span>

Notes:
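
The exercise statement suggests the `Counter` class. As a sketch of how the same totals could be accumulated with it, the example below uses three rows copied from the output of exercise 2.1 and looks up the weekday with `datetime.date` instead of the Doomsday functions. It assumes the two-digit year means 19xx, as in the cell above.

```python
from collections import Counter
from datetime import date

# date.weekday() numbers the days from Monday = 0 to Sunday = 6.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Three rows in the [month, day, year, count] format, taken from the output above.
rows = [[1, 1, 78, 7701], [1, 2, 78, 7527], [2, 1, 78, 9159]]

births_by_month = Counter()
births_by_weekday = Counter()
for month, day, year, count in rows:
    births_by_month[month] += count
    weekday = WEEKDAYS[date(1900 + year, month, day).weekday()]
    births_by_weekday[weekday] += count

print(births_by_month.most_common(1))  # [(1, 15228)]
print(births_by_weekday.most_common())
```

`most_common()` returns the entries sorted from largest to smallest total, so the largest and smallest groups can be read off the ends of the list.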

__Exercise 2.3.__ What would be an effective way to present the information in exercise 2.2? You don't need to write any code for this exercise, just discuss what you would do.

In 2.2 I created a dictionary with the independent variable as the key and the number of births as the value. For example, to find the month with the most births, the keys are the months and the values are the number of births in each month. I think this is an effective way to display the information because it pairs each month with its number of births neatly, much like a table.

There may be other data structures to explore as well, because dictionaries are not sorted by value, so the entries are not displayed in order of the number of births. I had to look through the values to see which key had the most births, which is less convenient than looking at the first element or printing the key with the most births. If we really want that, the entries can be sorted by value, for example with the built-in `sorted()` and a key function. Other hash-based structures, such as `collections.Counter`, could also be used.

I think a list or a tuple would not be as convenient because we want a value attached to each item. A list with the months as indices might work, but it would not be as clear, and it would be harder to use for the days of the week, since those are strings.
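
As an illustration of one possible presentation, the sketch below ranks the weekday totals reported in exercise 2.2 and prints them as a small text table with a rough bar next to each row. The bar scale of one `#` per 20000 births is an arbitrary choice for this example.

```python
# Weekday totals copied from the output of exercise 2.2.
births_by_weekday = {
    "Friday": 500541, "Tuesday": 504858, "Saturday": 432085,
    "Monday": 487309, "Wednesday": 493897, "Sunday": 421400,
    "Thursday": 493149,
}

# Sort the (weekday, total) pairs from most to fewest births.
ranked = sorted(births_by_weekday.items(), key=lambda item: item[1], reverse=True)

for name, total in ranked:
    bar = "#" * (total // 20000)  # one '#' per 20000 births
    print(f"{name:<10}{total:>8}  {bar}")

busiest = max(births_by_weekday, key=births_by_weekday.get)
quietest = min(births_by_weekday, key=births_by_weekday.get)
print(busiest, quietest)  # Tuesday Sunday
```

Sorting by total puts the answer to "which day had the most births" on the first row, and the bars make the weekend dip visible at a glance.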

<span style='color:red'>Grade:10/10</span>

Notes: