# STA 141B: Homework 1
Winter 2018

## Information

After the colons (in the same line) please write just your first name, last name, and the 9 digit student ID number below.

First Name: Timothy

Last Name: Murphy

Student ID: 912614348

## Instructions

We use a script that extracts your answers by looking for cells in between the cells containing the exercise statements.  So you 

- MUST add cells in between the exercise statements and add answers within them and
- MUST NOT modify the existing cells, particularly not the problem statement

To make markdown, please switch the cell type to markdown (from code) - you can hit 'm' when you are in command mode - and use the markdown language.  For a brief tutorial see: https://daringfireball.net/projects/markdown/syntax


## Part 1: The Doomsday Algorithm

The Doomsday algorithm, devised by mathematician J. H. Conway, computes the day of the week any given date fell on. The algorithm is designed to be simple enough to memorize and use for mental calculation.

__Example.__ With the algorithm, we can compute that July 4, 1776 (the day the United States declared independence from Great Britain) was a Thursday.

The algorithm is based on the fact that for any year, several dates always fall on the same day of the week, called the <em style="color:#F00">doomsday</em> for the year. These dates include 4/4, 6/6, 8/8, 10/10, and 12/12.

__Example.__ The doomsday for 2016 is Monday, so in 2016 the dates above all fell on Mondays. The doomsday for 2017 is Tuesday, so in 2017 the dates above will all fall on Tuesdays.

The doomsday algorithm has three major steps:

1. Compute the anchor day for the target century.
2. Compute the doomsday for the target year based on the anchor day.
3. Determine the day of week for the target date by counting the number of days to the nearest doomsday.

Each step is explained in detail below.

### The Anchor Day

The doomsday for the first year in a century is called the <em style="color:#F00">anchor day</em> for that century. The anchor day is needed to compute the doomsday for any other year in that century. The anchor day for a century $c$ can be computed with the formula:
$$
a = \bigl( 5 (c \bmod 4) + 2 \bigr) \bmod 7
$$
The result $a$ corresponds to a day of the week, starting with $0$ for Sunday and ending with $6$ for Saturday.

__Note.__ The modulo operation $(x \bmod y)$ finds the remainder after dividing $x$ by $y$. For instance, $12 \bmod 3 = 0$ since the remainder after dividing $12$ by $3$ is $0$. Similarly, $11 \bmod 7 = 4$, since the remainder after dividing $11$ by $7$ is $4$.

__Example.__ Suppose the target year is 1954, so the century is $c = 19$. Plugging this into the formula gives
$$a = \bigl( 5 (19 \bmod 4) + 2 \bigr) \bmod 7 = \bigl( 5(3) + 2 \bigr) \bmod 7 = 3.$$
In other words, the anchor day for 1900-1999 is Wednesday, which is also the doomsday for 1900.

__Exercise 1.1.__ Write a function that accepts a year as input and computes the anchor day for that year's century. The modulo operator `%` and functions in the `math` module may be useful. Document your function with a docstring and test your function for a few different years.  Do this in a new cell below this one.

In [2]:
def century_anchor_day(year):
    """
    This function accepts a year as input
    and computes the anchor day for that year's century.
    With the output corresponding to the following days:
    0 - Sunday, 1 - Monday, 2 - Tuesday, 3 - Wednesday,
    4 - Thursday, 5 - Friday, 6 - Saturday
    """

    # anchor day for the given century
    century = year // 100  # integer division; also works for years outside 1000-9999
    a = (5 * (century % 4) + 2) % 7
    return a


# INPUT YEAR HERE
anchor_day1 = century_anchor_day(int(input("Please enter a year (yyyy):")))
print("The anchor day is:", anchor_day1)

Please enter a year (yyyy):1995
The anchor day is: 3
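
Since the exercise asks for tests on a few different years, a non-interactive check can stand in for typing years at the prompt. The sketch below is only an illustration, not part of the submitted answer: `anchor_day_check` is a hypothetical stand-alone copy of the anchor day formula that takes the century as `year // 100`, and the expected values were worked by hand from that formula.

```python
def anchor_day_check(year):
    """Hypothetical stand-alone version of the anchor day formula."""
    century = year // 100  # e.g. 1954 -> 19
    return (5 * (century % 4) + 2) % 7


# expected values worked by hand from a = (5 (c mod 4) + 2) mod 7
expected = {1800: 5, 1954: 3, 1995: 3, 2000: 2, 2100: 0}
for year, anchor in expected.items():
    assert anchor_day_check(year) == anchor, year

print("all anchor day checks passed")
```

With 0 for Sunday through 6 for Saturday, these values correspond to Friday for the 1800s, Wednesday for the 1900s, Tuesday for the 2000s, and Sunday for the 2100s.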


### The Doomsday

Once the anchor day is known, let $y$ be the last two digits of the target year. Then the doomsday for the target year can be computed with the formula:
$$d = \left(y + \left\lfloor\frac{y}{4}\right\rfloor + a\right) \bmod 7$$
The result $d$ corresponds to a day of the week.

__Note.__ The floor operation $\lfloor x \rfloor$ rounds $x$ down to the nearest integer. For instance, $\lfloor 3.1 \rfloor = 3$ and $\lfloor 3.8 \rfloor = 3$.

__Example.__ Again suppose the target year is 1954. Then the anchor day is $a = 3$, and $y = 54$, so the formula gives
$$
d = \left(54 + \left\lfloor\frac{54}{4}\right\rfloor + 3\right) \bmod 7 = (54 + 13 + 3) \bmod 7 = 0.
$$
Thus the doomsday for 1954 is Sunday.

__Exercise 1.2.__ Write a function that accepts a year as input and computes the doomsday for that year. Your function may need to call the function you wrote in exercise 1.1. Make sure to document and test your function.

In [3]:
def doomsday_target_year(year):
    """
    This function accepts a year as input
    and computes the doomsday for that year.
    With the output corresponding to the following days:
    0 - Sunday, 1 - Monday, 2 - Tuesday, 3 - Wednesday,
    4 - Thursday, 5 - Friday, 6 - Saturday
    """

    anchor_day2 = century_anchor_day(year)
    target_year = year % 100  # last two digits of the year

    # doomsday for target year; // is floor division, so no rounding is needed
    dday = (target_year + target_year // 4 + anchor_day2) % 7
    return dday


# INPUT YEAR HERE
dooms_day1 = doomsday_target_year(int(input("Please enter a year (yyyy):")))
print("The doomsday is:", dooms_day1)

Please enter a year (yyyy):1995
The doomsday is: 2
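
The doomsday formula can be checked against the examples given in the text above. This is a minimal sketch under the assumption that `doomsday_check`, a hypothetical stand-alone helper, mirrors the two formulas; the expected values come from the examples for 1954, 2016, and 2017, from the output for 1995, and from the fact that July 4, 1776 shares its weekday with the doomsday 7/11.

```python
def doomsday_check(year):
    """Hypothetical stand-alone version of the anchor day and doomsday formulas."""
    anchor = (5 * ((year // 100) % 4) + 2) % 7
    y = year % 100  # last two digits of the year
    # y // 4 is floor division, matching the floor in the formula
    return (y + y // 4 + anchor) % 7


# 0 = Sunday, ..., 6 = Saturday
expected = {1776: 4, 1954: 0, 1995: 2, 2016: 1, 2017: 2}
for year, doomsday in expected.items():
    assert doomsday_check(year) == doomsday, year

print("all doomsday checks passed")
```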


### The Day of Week

The final step in the Doomsday algorithm is to count the number of days between the target date and a nearby doomsday, modulo 7. This gives the day of the week.

Every month has at least one doomsday:
* (regular years) 1/10, 2/28
* (leap years) 1/11, 2/29
* 3/21, 4/4, 5/9, 6/6, 7/11, 8/8, 9/5, 10/10, 11/7, 12/12

__Example.__ Suppose we want to find the day of the week for 7/21/1954. The doomsday for 1954 is Sunday, and a nearby doomsday is 7/11. There are 10 days in July between 7/11 and 7/21. Since $10 \bmod 7 = 3$, the date 7/21/1954 falls 3 days after a Sunday, on a Wednesday.

__Exercise 1.3.__ Write a function to determine the day of the week for a given day, month, and year. Be careful of leap years! Your function should return a string such as "Thursday" rather than a number. As usual, document and test your code.

In [4]:
def day_of_the_week(t_month, t_day, t_year):
    """
    This function determines the day of the week
    for a given month, day, and year and returns
    its name as a string, such as "Thursday".
    """
    dd = doomsday_target_year(t_year)

    # days of the week, doom days for regular year, and for leap year
    day_names = [
        "Sunday",
        "Monday",
        "Tuesday",
        "Wednesday",
        "Thursday",
        "Friday",
        "Saturday"
    ]

    reg_doom_days = [10, 28, 21, 4, 9, 6, 11, 8, 5, 10, 7, 12]
    leap_doom_days = [11, 29, 21, 4, 9, 6, 11, 8, 5, 10, 7, 12]

    # pick the doomsday for the target month, checking for a leap year
    if (t_year % 4 == 0 and t_year % 100 != 0) or t_year % 400 == 0:
        dmonth = leap_doom_days[t_month - 1]
    else:
        dmonth = reg_doom_days[t_month - 1]

    # shift from the doomsday to the target day; in Python the result of
    # % 7 is never negative, so days before the doomsday work as well
    numerical_dow = (dd + t_day - dmonth) % 7

    dow = day_names[numerical_dow]

    return dow


# input date here:
mm = int(input("Please enter a month in the following format (mm):"))
dd = int(input("Please enter a day in the following format (dd):"))
yyyy = int(input("Please enter a year in the following format (yyyy):"))

print(day_of_the_week(mm, dd, yyyy))

Please enter a month in the following format (mm):02
Please enter a day in the following format (dd):19
Please enter a year in the following format (yyyy):1995
Sunday
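
One way to test the whole algorithm at once is to compare it with the standard library's `datetime` module for every date in a range of years. The sketch below assumes a compact, hypothetical re-implementation (`doomsday_weekday`) of the three steps so that it runs on its own; `mismatches` is likewise a hypothetical helper, and `datetime.date.weekday()` numbers Monday as 0, so its result is shifted by one to match the Sunday-first list.

```python
import datetime

DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]
REG_DOOMSDAYS = [10, 28, 21, 4, 9, 6, 11, 8, 5, 10, 7, 12]
LEAP_DOOMSDAYS = [11, 29, 21, 4, 9, 6, 11, 8, 5, 10, 7, 12]


def doomsday_weekday(month, day, year):
    """Hypothetical compact version of the three Doomsday steps."""
    anchor = (5 * ((year // 100) % 4) + 2) % 7
    y = year % 100
    doomsday = (y + y // 4 + anchor) % 7
    is_leap = (year % 4 == 0 and year % 100 != 0) or year % 400 == 0
    nearby = (LEAP_DOOMSDAYS if is_leap else REG_DOOMSDAYS)[month - 1]
    return DAY_NAMES[(doomsday + day - nearby) % 7]


def mismatches(start_year, end_year):
    """Return the dates on which the sketch disagrees with datetime."""
    bad = []
    current = datetime.date(start_year, 1, 1)
    stop = datetime.date(end_year, 12, 31)
    while current <= stop:
        # weekday() has Monday = 0, so add 1 to index the Sunday-first list
        reference = DAY_NAMES[(current.weekday() + 1) % 7]
        if doomsday_weekday(current.month, current.day, current.year) != reference:
            bad.append(current)
        current += datetime.timedelta(days=1)
    return bad


print(doomsday_weekday(7, 21, 1954))   # example from the text: Wednesday
print(len(mismatches(1900, 1999)))     # 0 means every date agreed
```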


__Exercise 1.4.__ How many times did Friday the 13th occur in the years 1900-1999? Does this number seem to be similar to other centuries?

In [5]:
init_1 = 0  # initialize the date counter
for year in range(1900, 2000):
    for month in range(1, 13):  # loop through months 1-12 given the different dd's
        if day_of_the_week(month, 13, year) == "Friday":  # calc number of days
            init_1 += 1
num_fridays_1 = init_1
print("Between 1900-1999, Friday the 13th occurred a total of", num_fridays_1, "times.")


Between 1900-1999, Friday the 13th occurred a total of 172 times.
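
To compare with other centuries, the same count can be repeated for several 100-year spans. The sketch below is one possible approach rather than the submitted answer: `friday_13ths` is a hypothetical helper, and it uses `datetime` (which extends the Gregorian calendar backward and forward) in place of `day_of_the_week` so that the block runs on its own.

```python
import datetime


def friday_13ths(start_year, end_year):
    """Count the months from start_year through end_year whose 13th is a Friday."""
    return sum(
        1
        for year in range(start_year, end_year + 1)
        for month in range(1, 13)
        # weekday() numbers Monday as 0, so Friday is 4
        if datetime.date(year, month, 13).weekday() == 4
    )


for start in range(1600, 2400, 100):
    print(f"{start}-{start + 99}: {friday_13ths(start, start + 99)}")
```

Because the Gregorian calendar repeats every 400 years, the counts for 2000-2099 onward repeat those for 1600-1699 onward.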


__Exercise 1.5.__ How many times did Friday the 13th occur between the year 2000 and today?

In [6]:
import datetime
# get the current date using the datetime module
# and split the month, day, year into separate variables
date_call = datetime.date.today()
current_month = date_call.month
current_day = date_call.day
current_year = date_call.year

# print(current_month), print(current_day), print(current_year)

init_2 = 0  # initialize the date counter
# loop through initial year to current_year - 1
for year in range(2000, current_year):
    # loop through all 12 months of each completed year
    for month in range(1, 13):
        if day_of_the_week(month, 13, year) == "Friday":  # check the 13th
            init_2 += 1

# add the months within the current year whose 13th is not after today
for month in range(1, current_month + 1):
    if month == current_month and current_day < 13:
        continue  # this month's 13th has not happened yet
    if day_of_the_week(month, 13, current_year) == "Friday":
        init_2 += 1

num_fridays_2 = init_2
print("Since the year 2000, Friday the 13th occurred a total of", num_fridays_2, "times.")


Since the year 2000, Friday the 13th occurred a total of 31 times.


## Part 2: 1978 Birthdays

__Exercise 2.1.__ The file `birthdays.txt` contains the number of births in the United States for each day in 1978. Inspect the file to determine the format. Note that columns are separated by the tab character, which can be entered in Python as `\t`. Write a function that uses iterators and list comprehensions with the string methods `split()` and `strip()` to  convert each line of data to the list format

```Python
[month, day, year, count]
```
The elements of this list should be integers, not strings. The function `read_birthdays` provided below will help you load the file.

In [8]:
def read_birthdays(file_path):
    """Read the birthdays file into a list of integer records.

    Arguments:
        file_path (string): The path to the birthdays file.

    Returns:
        list: One [month, day, year, count] list of integers per data line.
    """
    # open file and read its lines into a list
    with open(file_path) as data_file:
        text_file = data_file.readlines()

    # skip the first line (index 0), then strip the trailing newline,
    # turn the tab into a "/" so every field has the same separator,
    # split on "/" and convert each field to an integer
    nbday_file = [
        [int(elem) for elem in line.strip().replace("\t", "/").split("/")]
        for line in text_file[1:]
    ]

    return nbday_file


nbday_file = read_birthdays("/Users/tmm/Desktop/STA141B/hw1/final_product/birthdays.txt")
print(nbday_file)

[[1, 2, 78, 7527], [1, 3, 78, 8825], [1, 4, 78, 8859], [1, 5, 78, 9043], [1, 6, 78, 9208], [1, 7, 78, 8084], [1, 8, 78, 7611], [1, 9, 78, 9172], [1, 10, 78, 9089], [1, 11, 78, 9210], [1, 12, 78, 9259], [1, 13, 78, 9138], [1, 14, 78, 8299], [1, 15, 78, 7771], [1, 16, 78, 9458], [1, 17, 78, 9339], [1, 18, 78, 9120], [1, 19, 78, 9226], [1, 20, 78, 9305], [1, 21, 78, 7954], [1, 22, 78, 7560], [1, 23, 78, 9252], [1, 24, 78, 9416], [1, 25, 78, 9090], [1, 26, 78, 9387], [1, 27, 78, 8983], [1, 28, 78, 7946], [1, 29, 78, 7527], [1, 30, 78, 9184], [1, 31, 78, 9152], [2, 1, 78, 9159], [2, 2, 78, 9218], [2, 3, 78, 9167], [2, 4, 78, 8065], [2, 5, 78, 7804], [2, 6, 78, 9225], [2, 7, 78, 9328], [2, 8, 78, 9139], [2, 9, 78, 9247], [2, 10, 78, 9527], [2, 11, 78, 8144], [2, 12, 78, 7950], [2, 13, 78, 8966], [2, 14, 78, 9859], [2, 15, 78, 9285], [2, 16, 78, 9103], [2, 17, 78, 9238], [2, 18, 78, 8167], [2, 19, 78, 7695], [2, 20, 78, 9021], [2, 21, 78, 9252], [2, 22, 78, 9335], [2, 23, 78, 9268], [2, 24, 7
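
The conversion of a single line can also be written with `split()` and `strip()` directly, as the exercise suggests. This is a minimal sketch, assuming each data line has the layout `month/day/year<TAB>count` (inferred from the separators handled in the code above, not confirmed against the file); `parse_line` is a hypothetical helper and the two sample lines reuse values from the printed output.

```python
def parse_line(line):
    """Convert one 'm/d/yy<TAB>count' line to [month, day, year, count]."""
    date_part, count = line.strip().split("\t")
    return [int(field) for field in date_part.split("/")] + [int(count)]


# assumed line layout; the numbers are taken from the output above
sample = ["1/2/78\t7527\n", "1/3/78\t8825\n"]
records = [parse_line(line) for line in sample]
print(records)  # [[1, 2, 78, 7527], [1, 3, 78, 8825]]
```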

__Exercise 2.2.__ Which month had the most births in 1978? Which day of the week had the most births? Which day of the week had the fewest? What conclusions can you draw? You may find the `Counter` class in the `collections` module useful.

In [7]:
the_list = nbday_file.copy()


def most_births(the_list):
    """
    Prints the month with the greatest number of births in the
    year 1978
    """

    # initialize the dict which will hold the count for bdays in each month
    dic_1 = {}

    # define dictionary with names of months to point to in the end
    m_dic = {
        1: "January",
        2: "February",
        3: "March",
        4: "April",
        5: "May",
        6: "June",
        7: "July",
        8: "August",
        9: "September",
        10: "October",
        11: "November",
        12: "December"
    }

    # loop through the birthdays and add each count to its month's total
    for bday in the_list:
        dic_1[bday[0]] = dic_1.get(bday[0], 0) + bday[3]

    # find the month number with the largest total, then look up its name
    high_month = max(dic_1, key=dic_1.get)
    high_month = m_dic[high_month]

    print("The month with the greatest number of births is:", high_month)


most_births(the_list)


def day_spectrum(the_list):
    """
    Prints the days of the week with the most and the fewest births in the
    year 1978.
    """

    # running total of births for each day of the week
    dic_2 = {}

    # add each day's count to the total for the day of the week it fell on
    for bday in the_list:
        day = day_of_the_week(bday[0], bday[1], 1978)
        dic_2[day] = dic_2.get(day, 0) + bday[3]

    high_day = max(dic_2, key=dic_2.get)  # max day
    low_day = min(dic_2, key=dic_2.get)  # min day
    
    print("Most births:", high_day, "\nLeast births:", low_day)

day_spectrum(the_list)





The month with the greatest number of births is: August
Most births: Tuesday 
Least births: Sunday
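
The exercise mentions the `Counter` class, which can hold the same running totals with less bookkeeping. The following is a sketch under stated assumptions, not the code that produced the output above: the four sample records are copied from the printed list in exercise 2.1, two-digit years are assumed to mean 19xx, and `datetime` supplies the day of the week so the block runs on its own.

```python
import datetime
from collections import Counter

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]  # order used by date.weekday()

# a few [month, day, year, count] records copied from the output of exercise 2.1
sample = [[1, 2, 78, 7527], [1, 3, 78, 8825],
          [2, 13, 78, 8966], [2, 14, 78, 9859]]

by_month = Counter()
by_weekday = Counter()
for month, day, year, count in sample:
    by_month[month] += count
    # assumption: the two-digit year 78 stands for 1978
    weekday = DAY_NAMES[datetime.date(1900 + year, month, day).weekday()]
    by_weekday[weekday] += count

print(by_month.most_common(1))    # month number with the largest total
print(by_weekday.most_common())   # weekdays sorted from most to fewest births
```

`most_common()` returns (key, total) pairs sorted from largest to smallest, so the first and last entries give the days with the most and fewest births.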


__Exercise 2.3.__ What would be an effective way to present the information in exercise 2.2? You don't need to write any code for this exercise, just discuss what you would do.

An effective way to present the information in exercise 2.2 would be a bar graph, with one bar per month (or per day of the week) and the height of each bar equal to the number of births. A bar graph suits this output because the counts for the months and for the days of the week are close to one another: the scale of the axes can be adjusted so that the differences in values are more apparent.
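
The effect of adjusting the axis can be illustrated without any plotting library. The sketch below is purely illustrative: `text_bars` is a hypothetical helper that draws text bars, the totals are made-up numbers rather than the real 1978 figures, and `baseline` plays the role of the lowest value shown on the axis (it must be smaller than the largest total).

```python
def text_bars(totals, baseline=0, width=40):
    """Return one text bar per label, scaled between baseline and the maximum."""
    top = max(totals.values())
    lines = []
    for label, value in totals.items():
        length = round(width * (value - baseline) / (top - baseline))
        lines.append(f"{label:<10}{'#' * length} {value}")
    return lines


# made-up totals for illustration only, not the real 1978 counts
totals = {"Sunday": 400, "Tuesday": 500, "Friday": 480}

print("\n".join(text_bars(totals)))                 # axis starts at 0
print()
print("\n".join(text_bars(totals, baseline=350)))   # axis starts at 350
```

With the axis starting at 0 the bars look nearly equal, while starting it at 350 makes the gaps stand out; in that case the axis should be labeled clearly so the differences are not read as larger than they are.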