# STA 141B: Homework 1

Fall 2018

## Information

After the colons (on the same line), please write just your first name, last name, and 9-digit student ID number below.

First Name: Ryan

Last Name: Gosiaco

Student ID: 912819444

## Instructions

We use a script that extracts your answers by looking for cells in between the cells containing the exercise statements.  So you 

- MUST add cells in between the exercise statements and add answers within them and
- MUST NOT modify the existing cells, particularly not the problem statement

To make markdown, please switch the cell type to markdown (from code) - you can hit 'm' when you are in command mode - and use the markdown language.  For a brief tutorial see: https://daringfireball.net/projects/markdown/syntax

## Part 1: The Doomsday Algorithm

The Doomsday algorithm, devised by mathematician J. H. Conway, computes the day of the week any given date fell on. The algorithm is designed to be simple enough to memorize and use for mental calculation.

__Example.__ With the algorithm, we can compute that July 4, 1776 (the day the United States declared independence from Great Britain) was a Thursday.

The algorithm is based on the fact that for any year, several dates always fall on the same day of the week, called the <em style="color:#F00">doomsday</em> for the year. These dates include 4/4, 6/6, 8/8, 10/10, and 12/12.

__Example.__ The doomsday for 2016 is Monday, so in 2016 the dates above all fell on Mondays. The doomsday for 2017 is Tuesday, so in 2017 the dates above all fell on Tuesdays.

The doomsday algorithm has three major steps:

1. Compute the anchor day for the target century.
2. Compute the doomsday for the target year based on the anchor day.
3. Determine the day of week for the target date by counting the number of days to the nearest doomsday.

Each step is explained in detail below.

### The Anchor Day

The doomsday for the first year in a century is called the <em style="color:#F00">anchor day</em> for that century. The anchor day is needed to compute the doomsday for any other year in that century. The anchor day for a century $c$ can be computed with the formula:
$$
a = \bigl( 5 (c \bmod 4) + 2 \bigr) \bmod 7
$$
The result $a$ corresponds to a day of the week, starting with $0$ for Sunday and ending with $6$ for Saturday.

__Note.__ The modulo operation $(x \bmod y)$ finds the remainder after dividing $x$ by $y$. For instance, $12 \bmod 3 = 0$ since the remainder after dividing $12$ by $3$ is $0$. Similarly, $11 \bmod 7 = 4$, since the remainder after dividing $11$ by $7$ is $4$.

__Example.__ Suppose the target year is 1954, so the century is $c = 19$. Plugging this into the formula gives
$$a = \bigl( 5 (19 \bmod 4) + 2 \bigr) \bmod 7 = \bigl( 5(3) + 2 \bigr) \bmod 7 = 3.$$
In other words, the anchor day for 1900-1999 is Wednesday, which is also the doomsday for 1900.

__Exercise 1.1.__ Write a function that accepts a year as input and computes the anchor day for that year's century. The modulo operator `%` and integer division operator `//` will be useful. Document your function with a docstring and test your function for a few different years.

In [4]:
def anchor_day(year):
  """
  Computes the anchor day for the given year's century.
  
  Parameters:
  year (int): The year.
  
  Returns:
  day (int): The day from 0-6 where Sunday is 0 and Saturday is 6.
  
  """
  c = year // 100
  day = (5*(c%4)+2)%7
  return day

anchor_day(1954)


3
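
As a further spot check, the sketch below restates the function compactly and compares it against the anchor days commonly cited for the Gregorian calendar: Sunday for the 1700s, Friday for the 1800s, Wednesday for the 1900s, and Tuesday for the 2000s. The `DAYS` list is a helper introduced here for readability and is not part of the assignment.

```python
# Day names indexed so that 0 is Sunday and 6 is Saturday (display helper only).
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

def anchor_day(year):
  """Return the anchor day (0 = Sunday, ..., 6 = Saturday) for the year's century."""
  c = year // 100
  return (5 * (c % 4) + 2) % 7

# One year from each of four consecutive centuries.
for year in (1776, 1865, 1954, 2018):
  print(year, DAYS[anchor_day(year)])
```

The anchor day repeats every four centuries because the formula depends only on $c \bmod 4$.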

### The Doomsday

Once the anchor day is known, let $y$ be the last two digits of the target year. Then the doomsday for the target year can be computed with the formula:
$$d = \left(y + \left\lfloor\frac{y}{4}\right\rfloor + a\right) \bmod 7$$
The result $d$ corresponds to a day of the week.

__Note.__ The floor operation $\lfloor x \rfloor$ rounds $x$ down to the nearest integer. For instance, $\lfloor 3.1 \rfloor = 3$ and $\lfloor 3.8 \rfloor = 3$.

__Example.__ Again suppose the target year is 1954. Then the anchor day is $a = 3$, and $y = 54$, so the formula gives
$$
d = \left(54 + \left\lfloor\frac{54}{4}\right\rfloor + 3\right) \bmod 7 = (54 + 13 + 3) \bmod 7 = 0.
$$
Thus the doomsday for 1954 is Sunday.

__Exercise 1.2.__ Write a function that accepts a year as input and computes the doomsday for that year. Your function may need to call the function you wrote in exercise 1.1. Make sure to document and test your function.

In [26]:
import math

def doomsday(year):
  """
  Computes the doomsday for the given year.
  
  Parameters:
  year (int): The year.
  
  Returns:
  day (int): The day from 0-6 where Sunday is 0 and Saturday is 6.
  
  """
  ad = anchor_day(year)
  y = year%100
  day = (y + math.floor(y/4) + ad)%7
  return day

doomsday(2001)

3
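
The examples quoted earlier in this notebook make convenient test cases: the doomsday is Sunday for 1954, Monday for 2016, and Tuesday for 2017. Below is a minimal self-contained sketch of that check. It uses integer division `//` in place of `math.floor`, which gives the same result for non-negative integers.

```python
def anchor_day(year):
  """Return the anchor day (0 = Sunday) for the year's century."""
  c = year // 100
  return (5 * (c % 4) + 2) % 7

def doomsday(year):
  """Return the doomsday (0 = Sunday) for the given year."""
  y = year % 100
  return (y + y // 4 + anchor_day(year)) % 7

# Expected values come from the worked examples in the text (0 = Sunday).
expected = {1954: 0, 2016: 1, 2017: 2}
for year, day in expected.items():
  assert doomsday(year) == day
print("all doomsday checks passed")
```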

### The Day of Week

The final step in the Doomsday algorithm is to count the number of days between the target date and a nearby doomsday, modulo 7. This gives the day of the week.

Every month has at least one doomsday:
* (regular years) 1/10, 2/28
* (leap years) 1/11, 2/29
* 3/21, 4/4, 5/9, 6/6, 7/11, 8/8, 9/5, 10/10, 11/7, 12/12

__Example.__ Suppose we want to find the day of the week for 7/21/1954. The doomsday for 1954 is Sunday, and a nearby doomsday is 7/11. There are 10 days in July between 7/11 and 7/21. Since $10 \bmod 7 = 3$, the date 7/21/1954 falls 3 days after a Sunday, on a Wednesday.

__Exercise 1.3.__ Write a function to determine the day of the week for a given day, month, and year. Be careful of leap years! Your function should return a string such as "Thursday" rather than a number. As usual, document and test your code.

In [34]:
def leap_year(year):
  """
  Determines if the given year is a leap year.
  
  Parameters:
  year (int): The year.
  
  Returns:
  boolean: True if it is a leap year, False otherwise.
  
  Logic referenced from https://support.microsoft.com/en-us/help/214019/method-to-determine-whether-a-year-is-a-leap-year
  
  """
  if (year%4) != 0:
    return False
  elif (year%100) != 0:
    return True
  elif (year%400) == 0:
    return True
  return False
  
leap_year(2001)

from datetime import datetime as dt
  
def day_of_week(date):
  """
  Determines the day of week of the given date.
  
  Parameters:
  date (string): The date given in "M/D/Y" format, for example "1/21/1990".
  
  Returns:
  string: The day of the week.
  
  """
  date_obj = dt.strptime(date, '%m/%d/%Y')
  year = date_obj.year
  month = date_obj.month
  day = date_obj.day
  
  days = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
  doomsday_list = [10, 28, 21, 4, 9, 6, 11, 8, 5, 10, 7, 12]
  leap_list = [11, 29, 21, 4, 9, 6, 11, 8, 5, 10, 7, 12]
  
  # Pick the doomsday table that matches the type of year.
  ref_list = leap_list if leap_year(year) else doomsday_list
  ref_day = ref_list[month - 1]
  d_day = doomsday(year)
  # Python's % returns a value from 0 to 6 here, even when day < ref_day.
  delta = (day - ref_day) % 7
  return days[(d_day + delta) % 7]

day_of_week("1/2/2001")
  
  

'Tuesday'
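
One way to test the whole algorithm is to compare it with Python's standard `datetime` module over a range of dates. The sketch below is a compact, self-contained restatement that takes integers rather than a date string. The names `day_name`, `is_leap`, and `mismatches` are introduced here for illustration only.

```python
from datetime import date, timedelta

DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
# Doomsday day-of-month for each month, for regular and leap years.
COMMON = [10, 28, 21, 4, 9, 6, 11, 8, 5, 10, 7, 12]
LEAP = [11, 29, 21, 4, 9, 6, 11, 8, 5, 10, 7, 12]

def is_leap(year):
  return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def doomsday(year):
  a = (5 * ((year // 100) % 4) + 2) % 7
  y = year % 100
  return (y + y // 4 + a) % 7

def day_name(month, day, year):
  """Return the day of the week using the Doomsday algorithm."""
  ref = (LEAP if is_leap(year) else COMMON)[month - 1]
  return DAYS[(doomsday(year) + day - ref) % 7]

def mismatches(start, end):
  """Return the dates in [start, end] where the two methods disagree."""
  bad = []
  current = start
  while current <= end:
    # date.weekday() uses Monday = 0, so shift by one to get Sunday = 0.
    expected = DAYS[(current.weekday() + 1) % 7]
    if day_name(current.month, current.day, current.year) != expected:
      bad.append(current)
    current += timedelta(days=1)
  return bad

print(day_name(7, 21, 1954))  # Wednesday, matching the worked example
print(mismatches(date(1900, 1, 1), date(2100, 12, 31)))  # an empty list means full agreement
```

The range deliberately includes 1900 and 2100 (not leap years) and 2000 (a leap year), since century years are where leap-year logic most often goes wrong.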

__Exercise 1.4.__ Davis picks up yard waste on the first Monday of the month.  How many times did the 1st of the month (first day of the month) fall on a Monday in the years 2000-2016 (including 2016)?

In [44]:
def first_mondays(start_year=2000, end_year=2016):
  """
  Counts how many times the 1st of the month fell on a Monday.
  
  Parameters:
  start_year (int): The first year to check.
  end_year (int): The last year to check (inclusive).
  
  Returns:
  int: The number of months whose 1st day was a Monday.
  
  """
  mondays = 0
  # range() excludes its upper bound, so add 1 to include end_year and use 13 to include December.
  for year in range(start_year, end_year + 1):
    for month in range(1, 13):
      if day_of_week(str(month) + "/1/" + str(year)) == "Monday":
        mondays += 1
  return mondays

first_mondays()




28
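
An independent count using the standard `datetime` module is a useful cross-check for this exercise. The sketch below is an alternative to the Doomsday-based function, not a replacement for it. The points to watch are that `range` excludes its upper bound, so covering 2016 requires `end_year + 1` and covering December requires `range(1, 13)`.

```python
from datetime import date

def first_mondays(start_year=2000, end_year=2016):
  """Count months in the inclusive year range whose 1st is a Monday."""
  return sum(
    1
    for year in range(start_year, end_year + 1)
    for month in range(1, 13)
    # date.weekday() returns 0 for Monday.
    if date(year, month, 1).weekday() == 0
  )

print(first_mondays())
```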

## Part 2: 1978 Birthdays

__Exercise 2.1.__ The file `birthdays.txt` contains the number of births in the United States for each day in 1978. Inspect the file to determine the format. Note that columns are separated by the tab character, which can be entered in Python as `\t`. Write a function that uses iterators and list comprehensions with the string methods `split()` and `strip()` to  convert each line of data to the tuple format

```Python
(month, day, year, count)
```
The elements of this list should be integers, not strings.  Read in the data and create this list of tuples.

In [8]:
from google.colab import drive

drive.mount('/content/gdrive')

#This is because I used Google Colab, a free and remote Jupyter Notebook.
#It is only used to mount the filesystem to access the birthdays.txt file.


Mounted at /content/gdrive


In [0]:
def read_file(file_dir):
  """
  Reads in the given file and strips the whitespace and header from it.
  
  Parameters:
  file_dir: The location of the file that you want to open
  
  Returns:
  lines (list): The file in a list object.
  
  """
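  # Use a context manager so the file is closed after reading.
  with open(file_dir) as file:
    # Skip the first six lines (indices 0-5), which hold the header.
    lines = [line.strip() for i, line in enumerate(file) if i > 5]
  return lines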

lines=read_file('/content/gdrive/My Drive/Colab Notebooks/birthdays.txt')

In [0]:
def create_list(file):
  """
  Takes in the raw list generated by read_file and creates the tuple of integers in the format (month, day, year, count).
  
  Parameters:
  file (list): The raw list created by the read_file function.
  
  Returns:
  new_file (list): A list of tuples in the format (month, day, year, count).
  
  """
  new_file = []
  # Only the first 365 entries are data rows (one per day of 1978).
  for line in file[:365]:
    fields = line.split("\t")
    # fields[0] is the date "month/day/year"; fields[1] is the birth count.
    parts = fields[0].split("/") + [fields[1]]
    # Store each row as a tuple of integers, as the exercise requires.
    new_file.append(tuple(int(part) for part in parts))
  return new_file

ll = create_list(lines)
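
Since the exercise asks for list comprehensions, the parsing step can also be sketched as below. The two sample lines are made up for illustration and assume each data row looks like `month/day/year<TAB>count`. The real file's counts and year format may differ. The name `parse_lines` is hypothetical.

```python
def parse_lines(lines):
  """Convert 'month/day/year<TAB>count' strings to (month, day, year, count) tuples."""
  # Split each line on the tab character, keeping only the first two columns.
  rows = [line.strip().split("\t")[:2] for line in lines]
  # Split the date on "/" and convert every piece to an integer.
  return [tuple(int(x) for x in day.split("/") + [count]) for day, count in rows]

# Made-up sample rows, not taken from birthdays.txt.
sample = ["1/1/78\t100\n", "1/2/78\t200\n"]
print(parse_lines(sample))  # [(1, 1, 78, 100), (1, 2, 78, 200)]
```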

__Exercise 2.2.__ 

1. Count the number of birthdays by the month (number of birthdays per month).
2. Count the number of birthdays by the day of the week. 

What conclusions can you draw? You may find the `Counter` class in the `collections` module useful.

In [11]:
import numpy as np

def bday_month(bdays):
  """
  Computes the number of birthdays per month.
  
  Parameters:
  bdays (list): The list of birthdays with tuples in the format of (month, day, year, count).
  
  Returns:
  months (list): A list of the sums where index 0 is January, index 1 is February, and so forth.
  
  """
  months = [0] * 12
  for entry in bdays:
    # entry[0] is the month (1-12) and entry[3] is the birth count.
    months[entry[0] - 1] += entry[3]
  return months

bday_month(ll)
      
    

[270695,
 249875,
 276584,
 254577,
 270812,
 270756,
 294701,
 302795,
 293891,
 288955,
 274671,
 284927]

In [12]:
import numpy as np

def bday_week(bdays):
  """
  Computes the number of birthdays per week.
  
  Weeks are defined as starting on Monday and ending on Sunday. January 1, 1978 was a Sunday, so the first week starts on January 2nd and January 1st is not counted.
  
  Parameters:
  bdays (list): The list of birthdays with tuples in the format of (month, day, year, count).
  
  Returns:
  weeks (array): An array of the sums where index 0 is Week 1, index 1 is Week 2, and so forth.
  
  """
  weeks = np.zeros(52)
  # Start at index 1 (January 2nd); the remaining 364 days form exactly 52 weeks.
  for i in range(1, 365):
    weeks[(i - 1) // 7] += bdays[i][3]
  return weeks

bday_week(ll)

array([59157., 61938., 61962., 61601., 61749., 62560., 62313., 62622.,
       62633., 62894., 62332., 61884., 61568., 60786., 58955., 59192.,
       59884., 60252., 60589., 60986., 61975., 61118., 62119., 62216.,
       63697., 64801., 64692., 67131., 68741., 68062., 67218., 68719.,
       68795., 67656., 67346., 67523., 69139., 69733., 69133., 67216.,
       65995., 64331., 63937., 63387., 64357., 64859., 62385., 64789.,
       64356., 65412., 65108., 63735.])

The monthly counts are fairly evenly distributed, with a modest peak in late summer: July through October are the four highest months. August has the most births at 302,795 and February has the fewest at 249,875, although February is also the shortest month.

The weekly counts show the same pattern. Every week falls between roughly 59,000 and 70,000 births, with the lowest being week 15 (April 10 to April 16) at 58,955 and the highest being week 38 (September 18 to September 24) at 69,733.
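
Exercise 2.2 also asks for counts by day of the week (Monday, Tuesday, and so on). One possible way to compute them with `collections.Counter` and the standard `datetime` module is sketched below. It assumes the tuples have the `(month, day, year, count)` layout, treats a two-digit year as 19xx, and uses made-up sample counts rather than the real data. The name `bday_weekday` is hypothetical.

```python
from collections import Counter
from datetime import date

# date.weekday() numbers the days starting from Monday = 0.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def bday_weekday(bdays):
  """Sum birth counts by day of the week for (month, day, year, count) tuples."""
  totals = Counter()
  for month, day, year, count in bdays:
    # Assumption: a two-digit year such as 78 means 1978.
    full_year = year + 1900 if year < 100 else year
    totals[DAYS[date(full_year, month, day).weekday()]] += count
  return totals

# Made-up counts for the first three days of 1978 (a Sunday, Monday, and Tuesday).
sample = [(1, 1, 78, 10), (1, 2, 78, 20), (1, 3, 78, 30)]
print(bday_weekday(sample))
```

Calling `bday_weekday(ll)` on the full list would give seven totals that can be compared directly, for example to see whether weekends differ from weekdays.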