# STA 141B: Homework 1

Fall 2018

## Information

After the colons (in the same line) please write just your first name, last name, and the 9 digit student ID number below.

First Name: Jared

Last Name: Yu

Student ID: 914640019

## Instructions

We use a script that extracts your answers by looking for cells in between the cells containing the exercise statements.  So you 

- MUST add cells in between the exercise statements and add answers within them and
- MUST NOT modify the existing cells, particularly not the problem statement

To make markdown, please switch the cell type to markdown (from code) - you can hit 'm' when you are in command mode - and use the markdown language.  For a brief tutorial see: https://daringfireball.net/projects/markdown/syntax

## Part 1: The Doomsday Algorithm

The Doomsday algorithm, devised by mathematician J. H. Conway, computes the day of the week any given date fell on. The algorithm is designed to be simple enough to memorize and use for mental calculation.

__Example.__ With the algorithm, we can compute that July 4, 1776 (the day the United States declared independence from Great Britain) was a Thursday.

The algorithm is based on the fact that for any year, several dates always fall on the same day of the week, called the <em style="color:#F00">doomsday</em> for the year. These dates include 4/4, 6/6, 8/8, 10/10, and 12/12.

__Example.__ The doomsday for 2016 is Monday, so in 2016 the dates above all fell on Mondays. The doomsday for 2017 is Tuesday, so in 2017 the dates above all fell on Tuesdays.

The doomsday algorithm has three major steps:

1. Compute the anchor day for the target century.
2. Compute the doomsday for the target year based on the anchor day.
3. Determine the day of week for the target date by counting the number of days to the nearest doomsday.

Each step is explained in detail below.

### The Anchor Day

The doomsday for the first year in a century is called the <em style="color:#F00">anchor day</em> for that century. The anchor day is needed to compute the doomsday for any other year in that century. The anchor day for a century $c$ can be computed with the formula:
$$
a = \bigl( 5 (c \bmod 4) + 2 \bigr) \bmod 7
$$
The result $a$ corresponds to a day of the week, starting with $0$ for Sunday and ending with $6$ for Saturday.

__Note.__ The modulo operation $(x \bmod y)$ finds the remainder after dividing $x$ by $y$. For instance, $12 \bmod 3 = 0$ since the remainder after dividing $12$ by $3$ is $0$. Similarly, $11 \bmod 7 = 4$, since the remainder after dividing $11$ by $7$ is $4$.
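In Python the modulo operation is written `%`. The short sketch below is an illustration, not part of the assignment code; the variable names are chosen here. It reproduces the two examples from the note and shows what happens when the left operand is negative, which comes up when a target date falls before the nearest doomsday.

```python
# Modulo examples from the note.
r1 = 12 % 3   # remainder after dividing 12 by 3
r2 = 11 % 7   # remainder after dividing 11 by 7

# With a positive divisor, Python's % returns a non-negative result,
# so a negative day offset still maps into the range 0..6.
r3 = -2 % 7

print(r1, r2, r3)  # 0 4 5
```

Because the result is never negative for a positive divisor, no special case is needed for dates that precede a doomsday.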

__Example.__ Suppose the target year is 1954, so the century is $c = 19$. Plugging this into the formula gives
$$a = \bigl( 5 (19 \bmod 4) + 2 \bigr) \bmod 7 = \bigl( 5(3) + 2 \bigr) \bmod 7 = 3.$$
In other words, the anchor day for 1900-1999 is Wednesday, which is also the doomsday for 1900.
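As a quick check of the formula, the sketch below evaluates it for four consecutive centuries. The function name `century_anchor` and the list `DAY_NAMES` are hypothetical helpers introduced only for this illustration.

```python
# Day names indexed by the numbering above: 0 = Sunday, ..., 6 = Saturday.
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]

def century_anchor(c):
    """Anchor day for century number c (for example, c = 19 for 1900-1999)."""
    return (5 * (c % 4) + 2) % 7

for c in (18, 19, 20, 21):
    print(c, century_anchor(c), DAY_NAMES[century_anchor(c)])
```

This prints Friday, Wednesday, Tuesday, and Sunday for centuries 18 through 21. Since the formula depends only on $c \bmod 4$, the anchor days repeat every four centuries.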

__Exercise 1.1.__ Write a function that accepts a year as input and computes the anchor day for that year's century. The modulo operator `%` and integer division `//` will be useful. Document your function with a docstring and test your function for a few different years.

In [1]:
import math

def anchor_day(date_0):
    """
    Compute the anchor day for the century that contains the given year.
    The century is found with integer division of the year by 100, which also works for years that
    do not have exactly 4 digits. The anchor day is then given by the formula:
    a = (5(c mod 4) + 2) mod 7
    where a is the anchor day and c is the century.
    input: year
    output: anchor day (0 = Sunday, ..., 6 = Saturday)
    """
    century = date_0 // 100 # Integer division by 100 gives the century
    the_day = (5*(century % 4) + 2) % 7 # Determine the day
    return the_day

print(anchor_day(1954), anchor_day(2018), anchor_day(1000))

3 2 5


### The Doomsday

Once the anchor day is known, let $y$ be the last two digits of the target year. Then the doomsday for the target year can be computed with the formula:
$$d = \left(y + \left\lfloor\frac{y}{4}\right\rfloor + a\right) \bmod 7$$
The result $d$ corresponds to a day of the week.

__Note.__ The floor operation $\lfloor x \rfloor$ rounds $x$ down to the nearest integer. For instance, $\lfloor 3.1 \rfloor = 3$ and $\lfloor 3.8 \rfloor = 3$.
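In Python, the floor operation is available as `math.floor`, and for non-negative integers the integer division operator `//` gives the same result directly. A minimal sketch (the variable name `same` is introduced here for illustration):

```python
import math

# Floor examples from the note.
print(math.floor(3.1), math.floor(3.8))  # 3 3

# For the term floor(y / 4) in the doomsday formula, with y = 54:
print(math.floor(54 / 4))  # 13
print(54 // 4)             # 13

# Both approaches agree for this value.
same = math.floor(54 / 4) == 54 // 4
```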

__Example.__ Again suppose the target year is 1954. Then the anchor day is $a = 3$, and $y = 54$, so the formula gives
$$
d = \left(54 + \left\lfloor\frac{54}{4}\right\rfloor + 3\right) \bmod 7 = (54 + 13 + 3) \bmod 7 = 0.
$$
Thus the doomsday for 1954 is Sunday.
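The sketch below combines the anchor day formula and the doomsday formula to check the values quoted in the examples (Sunday for 1954, Monday for 2016, Tuesday for 2017). The function name `year_doomsday` is hypothetical and is only one possible way to arrange the computation.

```python
def year_doomsday(year):
    """Doomsday for a year as a number, 0 = Sunday, ..., 6 = Saturday."""
    c, y = divmod(year, 100)       # century and last two digits of the year
    a = (5 * (c % 4) + 2) % 7      # anchor day for the century
    return (y + y // 4 + a) % 7    # doomsday formula

for year in (1954, 2016, 2017):
    print(year, year_doomsday(year))
# 1954 0
# 2016 1
# 2017 2
```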

__Exercise 1.2.__ Write a function that accepts a year as input and computes the doomsday for that year. Your function may need to call the function you wrote in exercise 1.1. Make sure to document and test your function.

In [2]:
def doomsday(date_0):
    """
    The function doomsday utilizes the previous function anchor_day() to first retrieve the anchor day
    for the century of the given year. It then finds the last two digits of the year as the remainder
    after dividing the year by 100, and utilizes the given formula to determine the doomsday. The formula is:
    d = (y + floor(y/4) + a) mod 7
    where a is the anchor day and y is the last two digits of the year.
    input: year
    output: doomsday (0 = Sunday, ..., 6 = Saturday)
    """
    anchor_date = anchor_day(date_0) # Retrieve the anchor day using the previous function
    year = date_0 % 100 # Find the last 2 digits of the given year
    doomsdate = (year + math.floor(year/4) + anchor_date) % 7 # Determine the doomsday using the given formula
    return doomsdate

print(doomsday(2011), doomsday(2001), doomsday(2000))

1 3 2


### The Day of Week

The final step in the Doomsday algorithm is to count the number of days between the target date and a nearby doomsday, modulo 7. This gives the day of the week.

Every month has at least one doomsday:
* (regular years) 1/10, 2/28
* (leap years) 1/11, 2/29
* 3/21, 4/4, 5/9, 6/6, 7/11, 8/8, 9/5, 10/10, 11/7, 12/12

__Example.__ Suppose we want to find the day of the week for 7/21/1954. The doomsday for 1954 is Sunday, and a nearby doomsday is 7/11. There are 10 days in July between 7/11 and 7/21. Since $10 \bmod 7 = 3$, the date 7/21/1954 falls 3 days after a Sunday, on a Wednesday.
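The counting step in this example can be written out and cross-checked against Python's standard `datetime` module, which uses the proleptic Gregorian calendar. This is a verification sketch, not the algorithm itself; the names `NAMES`, `offset`, and `dotw_number` are introduced here for illustration.

```python
from datetime import date

# date.weekday() numbers days from Monday = 0 to Sunday = 6.
NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday"]

# Counting step for 7/21/1954, with doomsday 0 (Sunday) on 7/11.
doomsday_1954 = 0
offset = (21 - 11) % 7
dotw_number = (doomsday_1954 + offset) % 7
print(dotw_number)  # 3, which is Wednesday in the 0 = Sunday numbering

# A target before the doomsday gives a negative difference; % 7 handles it.
print((4 - 11) % 7)  # 0, so 7/4 falls on the same weekday as 7/11

# Cross-check with the standard library.
print(NAMES[date(1954, 7, 11).weekday()])  # Sunday
print(NAMES[date(1954, 7, 21).weekday()])  # Wednesday
print(NAMES[date(1776, 7, 4).weekday()])   # Thursday
```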

__Exercise 1.3.__ Write a function to determine the day of the week for a given day, month, and year. Be careful of leap years! Your function should return a string such as "Thursday" rather than a number. As usual, document and test your code.

In [3]:
def dotw(month, day, year):
    """
    The dotw function first checks whether the year is a leap year, which determines the doomsday dates
    for January and February. After finding the year's doomsday with the previous doomsday function,
    it takes the difference between the given day and the doomsday date in the same month. Adding this
    difference to the doomsday, modulo 7, gives the day of the week.
    input: month, day, year
    output: name of the day of the week, such as "Thursday"
    """
    weekday_dictionary = {0:'Sunday', 1:'Monday', 2:'Tuesday', 3:'Wednesday', 4:'Thursday',
                          5:'Friday', 6:'Saturday'}
    
    # Idea from classmate Kevin Chu for leap year check/dictionary
    if (((year % 4 == 0) and (year % 100 != 0)) or (year % 400 == 0)): # Check if leap year
        dayofmonth = [11, 29, 21, 4, 9, 6, 11, 8, 5, 10, 7, 12]
    else:
        dayofmonth = [10, 28, 21, 4, 9, 6, 11, 8, 5, 10, 7, 12]
        
    doomsdate = doomsday(year) # Determine the doomsday for the year using the doomsday function.
    weekday = day - dayofmonth[month-1] # Start of doomsday weekday calculation
    day_of_the_week = (doomsdate + (weekday % 7)) % 7 # Determine the day of the week

    return(weekday_dictionary[day_of_the_week])

print(dotw(10,8,2018), dotw(1,11,2000), dotw(1,12,2000))    

Monday Tuesday Wednesday


__Exercise 1.4.__ Davis picks up yard waste on the first Monday of the month.  How many times did the 1st of the month (first day of the month) fall on a Monday in the years 2000-2016 (including 2016)?

In [4]:
count = 0 # Initialize counter
for year in range(2000,2017): # Cycles through years 2000 - 2016
    for month in range(1,13): # Cycles through months 1 - 12
        if dotw(month, 1, year) == 'Monday': # Check if day is Monday
            count += 1 # Update counter

print("There are", count, "Mondays that are also the first of the month.")

There are 28 Mondays that are also the first of the month.
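As an independent cross-check that does not rely on the Doomsday functions, the same count can be obtained from the standard `datetime` module. This is a sketch for verification only; the name `monday_firsts` is introduced here.

```python
from datetime import date

# Count the months in 2000-2016 whose first day is a Monday.
# date.weekday() returns 0 for Monday.
monday_firsts = sum(
    1
    for year in range(2000, 2017)
    for month in range(1, 13)
    if date(year, month, 1).weekday() == 0
)

print(monday_firsts)  # 28
```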


## Part 2: 1978 Birthdays

__Exercise 2.1.__ The file `birthdays.txt` contains the number of births in the United States for each day in 1978. Inspect the file to determine the format. Note that columns are separated by the tab character, which can be entered in Python as `\t`. Write a function that uses iterators and list comprehensions with the string methods `split()` and `strip()` to  convert each line of data to the tuple format

```Python
(month, day, year, count)
```
The elements of each tuple should be integers, not strings.  Read in the data and create a list of these tuples.
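To illustrate the target format, the sketch below parses two sample lines. The sample lines are an assumption about the layout (a date such as `1/1/78`, a tab, then the count); the real file also has header lines that must be skipped. The function name `parse_line` is hypothetical.

```python
# Assumed sample lines; the actual file may differ in header and spacing.
sample_lines = ["1/1/78\t7701\n", "1/2/78\t7527\n"]

def parse_line(line):
    """Convert one 'month/day/year<TAB>count' line into a tuple of integers."""
    date_text, count_text = line.strip().split("\t")
    month, day, year = (int(part) for part in date_text.split("/"))
    return (month, day, year, int(count_text))

records = [parse_line(line) for line in sample_lines]
print(records)  # [(1, 1, 78, 7701), (1, 2, 78, 7527)]
```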

In [5]:
# https://stackoverflow.com/questions/6696027/split-elements-of-a-list-in-python
# https://stackoverflow.com/questions/7876272/select-value-from-list-of-tuples-where-condition
# https://stackoverflow.com/questions/642154/how-to-convert-strings-into-integers-in-python
def birthday_data(file_path):
    """
    This function opens the birthdays.txt data file and separates its contents into 4 different
    categories. The 4 categories are: month, day, year, and count. The first portion was given by the TA, where
    he showed us how to open the file and exclude the first few lines. The other portion utilizes the split()
    method to divide the data by the separator. Afterwards the new columns, which are in list format, are
    zipped together and each row is converted to a list of integers.
    """
    with open(file_path) as file: # Open the file as shown by the TA; 'with' closes it afterwards
        bday_data = [data.strip() for i, data in enumerate(file) if i>5] # Skip the first 6 lines (indices 0-5)
    split_data = [i.split('\t',1) for i in bday_data if i != ''] # Divide the data by the separator '\t'
    birthday_count = [i[1] for i in split_data] # Retrieve the counts column from the split data and save as list
    birthday_dates = [i[0][0:8] for i in split_data] # Retrieve the dates column from the split data and save as list
    birthday_dates = [i.split('/') for i in birthday_dates] # Divide the dates by the separator '/'
    birthday_months = [i[0] for i in birthday_dates] # Retrieve the months column from the split data and save as list
    birthday_days = [i[1] for i in birthday_dates] # Retrieve the days column from the split data and save as list
    birthday_years = [i[2] for i in birthday_dates] # Retrieve the years column from the split data and save as list
    new_format = list(zip(birthday_months, birthday_days, birthday_years, birthday_count)) # Zip the columns into rows
    bday_integers = [list(map(int,(w,x,y,z))) for w,x,y,z in new_format] # Convert the strings to integers
    return(bday_integers)
birthdays = birthday_data('C://Users//qizhe//Desktop//STA 141B//hw1b//hw1//birthdays.txt')
birthdays

[[1, 1, 78, 7701],
 [1, 2, 78, 7527],
 [1, 3, 78, 8825],
 [1, 4, 78, 8859],
 [1, 5, 78, 9043],
 [1, 6, 78, 9208],
 [1, 7, 78, 8084],
 [1, 8, 78, 7611],
 [1, 9, 78, 9172],
 [1, 10, 78, 9089],
 [1, 11, 78, 9210],
 [1, 12, 78, 9259],
 [1, 13, 78, 9138],
 [1, 14, 78, 8299],
 [1, 15, 78, 7771],
 [1, 16, 78, 9458],
 [1, 17, 78, 9339],
 [1, 18, 78, 9120],
 [1, 19, 78, 9226],
 [1, 20, 78, 9305],
 [1, 21, 78, 7954],
 [1, 22, 78, 7560],
 [1, 23, 78, 9252],
 [1, 24, 78, 9416],
 [1, 25, 78, 9090],
 [1, 26, 78, 9387],
 [1, 27, 78, 8983],
 [1, 28, 78, 7946],
 [1, 29, 78, 7527],
 [1, 30, 78, 9184],
 [1, 31, 78, 9152],
 [2, 1, 78, 9159],
 [2, 2, 78, 9218],
 [2, 3, 78, 9167],
 [2, 4, 78, 8065],
 [2, 5, 78, 7804],
 [2, 6, 78, 9225],
 [2, 7, 78, 9328],
 [2, 8, 78, 9139],
 [2, 9, 78, 9247],
 [2, 10, 78, 9527],
 [2, 11, 78, 8144],
 [2, 12, 78, 7950],
 [2, 13, 78, 8966],
 [2, 14, 78, 9859],
 [2, 15, 78, 9285],
 [2, 16, 78, 9103],
 [2, 17, 78, 9238],
 [2, 18, 78, 8167],
 [2, 19, 78, 7695],
 [2, 20, 78, 9021]

__Exercise 2.2.__ 

1. Count the number of birthdays by the month (number of birthdays per month).
2. Count the number of birthdays by the day of the week. 

What conclusions can you draw? You may find the `Counter` class in the `collections` module useful.
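A `Counter` can accumulate weighted totals, not just occurrences, by adding each row's count to the entry for its key. The sketch below uses three rows in the `(month, day, year, count)` format; the names `rows` and `births_by_month` are introduced only for this illustration.

```python
from collections import Counter

# A few rows in (month, day, year, count) format.
rows = [(1, 1, 78, 7701), (1, 2, 78, 7527), (2, 1, 78, 9159)]

births_by_month = Counter()
for month, day, year, count in rows:
    births_by_month[month] += count  # missing keys start at 0

print(births_by_month[1], births_by_month[2])  # 15228 9159
print(births_by_month.most_common(1))          # [(1, 15228)]
```

The same pattern works for the day of the week by using the weekday name as the key instead of the month.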

In [6]:
# https://stackoverflow.com/questions/26660654/how-do-i-print-the-key-value-pairs-of-a-dictionary-in-python

from collections import Counter
import statistics # For using the mean function

month_dictionary = {1:'January:', 2:'February:', 3:'March:', 4:'April:', 5:'May:', 6:'June:', 7:'July:',
                    8:'August:', 9:'September:', 10:'October:', 11:'November:', 12:'December:'}

count = Counter() # Initialize Counter() for months
count_dow = Counter() # Initialize Counter() for weekdays
for i in birthdays:
    count[i[0]] += i[3] # Sum the birthdays per month
    count_dow[dotw(i[0],i[1],1978)] += i[3] # Sum the birthdays per weekday

month_avg = 0 # Initialize average
for i in range(1,13):
    print(month_dictionary[i], count[i]) # Display the month and number of birthdays
    month_avg += count[i] # Sum the birthdays

for weekday, weekday_count in count_dow.items():
    print(weekday, weekday_count) # Print weekday and count
    
month_avg = round(month_avg/12) # Take the mean by dividing by number of months and remove the decimal point

weekday_avg = round(sum(dict(count_dow).values())/7) # Find the average for the weekdays

print('The average number of birthdays per month is', month_avg)
print('The average number of birthdays per weekday is', weekday_avg)
print('The month with the most birthdays is August at 302795. '
      + 'The month with the fewest birthdays is February at 249875.')
print('The day with the most birthdays is Tuesday at 504858. '
      + 'The day with the fewest birthdays is Sunday at 421400.')

January: 270695
February: 249875
March: 276584
April: 254577
May: 270812
June: 270756
July: 294701
August: 302795
September: 293891
October: 288955
November: 274671
December: 284927
Sunday 421400
Monday 487309
Tuesday 504858
Wednesday 493897
Thursday 493149
Friday 500541
Saturday 432085
The average number of birthdays per month is 277770
The average number of birthdays per weekday is 476177
The month with the most birthdays is August at 302795. The month with the fewest birthdays is February at 249875.
The day with the most birthdays is Tuesday at 504858. The day with the fewest birthdays is Sunday at 421400.
