# Question 1:

### Part A:
I usually spend 3 weeks at home in the winter, 10 weeks at home in the summer, 4 days at home for Thanksgiving, and 9 days at home for spring break, totalling 104 days. 

I also usually spend about 5 of those days on a trip to the beach with my family, leaving 99 days sleeping at home.

This leaves 261 days at college, of which I usually spend 2 nights travelling with the outdoors club in the mountains, leaving 259 days sleeping at college.

Thus, the prior probabilities are:

P(Home) = 99/365 = 27.1%

P(Traveling) = 7/365 = 1.9%

P(College) = 259/365 = 71.0%
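
The arithmetic behind these priors can be checked with a short script. This is only a sketch of the day counts stated above; the variable names are illustrative, and a 365-day year is assumed as in the text.

```python
from fractions import Fraction

DAYS_PER_YEAR = 365  # assumption carried over from the text (no leap year)

# Days nominally "at home": 3 weeks + 10 weeks + 4 days + 9 days
home_total = 3 * 7 + 10 * 7 + 4 + 9   # 104
beach_days = 5                        # family beach trip, taken out of the home days
club_nights = 2                       # outdoors club trips, taken out of the college days

days = {
    "Home": home_total - beach_days,
    "Traveling": beach_days + club_nights,
    "College": DAYS_PER_YEAR - home_total - club_nights,
}

# Exact priors as fractions of the year
priors = {place: Fraction(n, DAYS_PER_YEAR) for place, n in days.items()}

for place, n in days.items():
    print(f"P({place}) = {n}/{DAYS_PER_YEAR} = {float(priors[place]):.1%}")
```

Using `Fraction` keeps the priors exact, so the check that they sum to 1 does not depend on floating-point rounding.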


### Part B:

Given that I hear the ocean, we need to find the likelihood of hearing the ocean in each location:

P(ocean sounds | home) = 0/99, since I do not live on the water and no one in my family would be watching a movie about the ocean in the morning.

P(ocean sounds | vacation) = 5/7, since 5 of the 7 days I spend traveling (the "Traveling" category from Part A) are at the beach.

P(ocean sounds | college) = 0/259, since Pitt is not near the ocean and my roommate would not be watching a movie about the ocean in the morning.

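We can also say that P(ocean sounds) = 5/365, since that is how many days per year I spend at the beach. The law of total probability gives the same value: 0 * (99/365) + (5/7) * (7/365) + 0 * (259/365) = 5/365.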



### Result for part B:

Thus, by Bayes' theorem, the probabilities of having woken up in each place are:

P(Home | ocean sounds) = P(ocean sounds | home) * P(Home) / P(ocean sounds) = 0

P(Vacation | ocean sounds) = P(ocean sounds | vacation) * P(vacation) / P(ocean sounds) = (5/7) * (7/365) / (5/365) = 1

P(College | ocean sounds) = P(ocean sounds | College) * P(college) / P(ocean sounds) = 0

#### Thus, there is a 100% probability that I am on vacation if I hear ocean sounds in the morning.
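
The same update can be written as a few lines of Python. This is a minimal sketch using the priors from Part A and the likelihoods from Part B; the dictionary names are illustrative only.

```python
from fractions import Fraction

# Priors from Part A (days per year spent in each place)
prior = {
    "home": Fraction(99, 365),
    "vacation": Fraction(7, 365),
    "college": Fraction(259, 365),
}

# Likelihood of waking up to ocean sounds in each place (Part B)
likelihood = {
    "home": Fraction(0),
    "vacation": Fraction(5, 7),
    "college": Fraction(0),
}

# Law of total probability: P(ocean sounds)
evidence = sum(likelihood[s] * prior[s] for s in prior)

# Bayes' theorem for each place
posterior = {s: likelihood[s] * prior[s] / evidence for s in prior}

for place, p in posterior.items():
    print(f"P({place} | ocean sounds) = {p}")
```

Because two of the three likelihoods are exactly zero, the posterior puts all of its mass on vacation regardless of how small the vacation prior is.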

# Question 2:

#### Prior used (unless uniform=True): 

A reasonable first guess is the uniform prior, since we know the order of magnitude of p (between 0 and 1), and we don't need transformation invariance since we are going to stick with proportions. Also, we can reasonably assume that values of p near 0.5 are at least as likely as values near 0 or 1 (elections in the US are almost never landslides), while the Jeffreys prior (and Haldane) assume that values near 0 or 1 are more likely. However, we also know that Pennsylvania is usually a battleground state, so we should actually choose a distribution that peaks near 0.5 (not uniform).

One option is to use the ratio of registered Democrats to registered Republicans in a binomial prior, but this requires an assumption about the ratio of Democratic-leaning to Republican-leaning independents (for example, that it equals the ratio of party-affiliated voters) as well as an assumption about turnout, neither of which we can justify. A better option, especially since one of the candidates is an incumbent, is to use the proportion from the previous election as a binomial prior, and treat the poll as an "update" to our belief about the system since then (slide 29, lecture 6). We can weight the previous election lightly to reflect the possibility that voter preferences have changed in the meantime.

The weight is calculated by assuming the following consistency condition: in 95% of elections, the proportion of a candidate's votes will be within 0.15 of the previous election's result for the candidate from the same party. This is consistent with previous results in Pennsylvania in the era known as the Sixth Party System (1972-current). Thus, we can assume that $2\sigma = 0.15N$, where $N$ is the effective "sample size" of the prior, indicating our degree of belief, and calculate $N$ from the binomial standard deviation $\sigma = \sqrt{N p (1-p)}$, which gives $N = 4p(1-p)/0.15^2$. This yields an effective N ~ 44.3 for the prior for Biden, and N ~ 44.4 for the prior for Trump.
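
The effective sample size follows from combining the two relations: squaring $2\sigma = 0.15N$ and substituting $\sigma^2 = N p (1-p)$ gives $N = 4p(1-p)/0.15^2$. The sketch below evaluates that expression. The function name and the example vote share of 0.5 are illustrative assumptions, not the actual previous-election proportions, which are not listed here.

```python
def effective_sample_size(p, half_width=0.15, n_sigma=2.0):
    """Effective prior sample size N solving n_sigma * sqrt(N p (1-p)) = half_width * N.

    p          : previous-election vote share used as the prior proportion
    half_width : assumed 95% band on the change in vote share (0.15 in the text)
    n_sigma    : number of standard deviations the band corresponds to (2 for ~95%)
    """
    return n_sigma**2 * p * (1 - p) / half_width**2


# Hypothetical example share; an exactly even split gives the largest possible N.
print(round(effective_sample_size(0.5), 1))  # 44.4
```

Since $p(1-p)$ is nearly flat around 0.5, any previous-election share within a few points of an even split gives an $N$ just below 44.4, which is in line with the two values quoted above.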

In [17]:
import election_polling as ep

#### Distribution of results:

In [18]:
print("Probability of poll accuracy for democrats:\n" + 
      f"+/- 0.01: {ep.dem_prob_2020(772, 394, 0.01)}\n\
    0.02: {ep.dem_prob_2020(772, 394, 0.02)}\n\
    0.03: {ep.dem_prob_2020(772, 394, 0.03)}\n\
    0.04: {ep.dem_prob_2020(772, 394, 0.04)}\n\
    0.05: {ep.dem_prob_2020(772, 394, 0.05)}\n\
    0.1: {ep.dem_prob_2020(772, 394, 0.1)}\n")

print("Probability of poll accuracy for republicans:\n" + 
      f"+/- 0.01: {ep.rep_prob_2020(772, 394, 0.01)}\n\
    0.02: {ep.rep_prob_2020(772, 394, 0.02)}\n\
    0.03: {ep.rep_prob_2020(772, 394, 0.03)}\n\
    0.04: {ep.rep_prob_2020(772, 394, 0.04)}\n\
    0.05: {ep.rep_prob_2020(772, 394, 0.05)}\n\
    0.1: {ep.rep_prob_2020(772, 394, 0.1)}\n")

Probability of poll accuracy for democrats:
+/- 0.01: 0.430348414047545
    0.02: 0.7446368270397548
    0.03: 0.9121490550848238
    0.04: 0.9772182206286344
    0.05: 0.99560084120987
    0.1: 0.9999892980350744

Probability of poll accuracy for republicans:
+/- 0.01: 0.4312138264761985
    0.02: 0.7457027457042842
    0.03: 0.9128607380509826
    0.04: 0.9775227001070578
    0.05: 0.9956882916825537
    0.1: 0.999988490778047



Fortunately, for the Marist poll we are in the limit where the prior does not change the probabilities much, since switching to a uniform prior gives only slightly weaker constraints:

In [19]:
print("Probability of poll accuracy for democrats, uniform prior:\n" + 
      f"+/- 0.01: {ep.dem_prob_2020(772, 394, 0.01, True)}\n\
    0.02: {ep.dem_prob_2020(772, 394, 0.02, True)}\n\
    0.03: {ep.dem_prob_2020(772, 394, 0.03, True)}\n\
    0.04: {ep.dem_prob_2020(772, 394, 0.04, True)}\n\
    0.05: {ep.dem_prob_2020(772, 394, 0.05, True)}\n\
    0.1: {ep.dem_prob_2020(772, 394, 0.1, True)}\n")

print("Probability of poll accuracy for republicans, uniform prior:\n" + 
      f"+/- 0.01: {ep.rep_prob_2020(772, 355, 0.01, True)}\n\
    0.02: {ep.rep_prob_2020(772, 355, 0.02, True)}\n\
    0.03: {ep.rep_prob_2020(772, 355, 0.03, True)}\n\
    0.04: {ep.rep_prob_2020(772, 355, 0.04, True)}\n\
    0.05: {ep.rep_prob_2020(772, 355, 0.05, True)}\n\
    0.1: {ep.rep_prob_2020(772, 355, 0.1, True)}\n")

Probability of poll accuracy for democrats, uniform prior:
+/- 0.01: 0.4220674908060955
    0.02: 0.7343537988221598
    0.03: 0.9051869849409073
    0.04: 0.9741781370790087
    0.05: 0.9947009875819788
    0.1: 0.9999918862007855

Probability of poll accuracy for republicans, uniform prior:
+/- 0.01: 0.4231596156635507
    0.02: 0.7356964339370274
    0.03: 0.9060557008020031
    0.04: 0.9744868183488867
    0.05: 0.9946991091171776
    0.1: 0.9998575837553266



#### Evaluating election results 
We know that the Biden result differed from the poll by 0.0099, and the Trump result differed from the poll by 0.028.

In [20]:
ep.dem_prob_2020(772, 394, 0.0099)

0.4264821062539621

In [21]:
ep.rep_prob_2020(772, 355, 0.028)

0.8909814592614466

For the Biden vote, the Marist poll seems pretty accurate. The probability of the Biden election result being closer to the poll than observed is only 42.6%.

For the Trump vote, it seems that the Marist poll made a slight underestimate, albeit not a statistically significant one. The probability of the Trump election result being closer to the poll than observed is 89%, which means that had the polling been accurate, the election result would likely have been closer to the poll's prediction than it turned out to be. However, this is below the 95% threshold for statistical significance. 

Thus, the Biden result is consistent with the poll, while the Trump result deviates from it more than expected, though not by a statistically significant amount.
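
For readers without access to `election_polling`, the kind of calculation behind these probabilities can be sketched as follows. This is not the module's implementation: it is a hypothetical stand-in that assumes a Beta posterior (binomial likelihood with a Beta prior), where `prior_n` and `prior_p` are illustrative parameters that add pseudo-counts on top of a uniform Beta(1, 1) prior. Only the standard library is used, with the posterior density integrated numerically.

```python
import math


def beta_log_pdf(x, a, b):
    """Log density of a Beta(a, b) distribution at 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)


def prob_within(n, k, delta, prior_p=0.5, prior_n=0.0, steps=2000):
    """P(|p - k/n| < delta) under an assumed Beta posterior.

    n, k    : poll size and number of respondents for the candidate
    delta   : half-width of the interval around the poll proportion k/n
    prior_n : effective prior sample size (0 means a uniform prior)
    prior_p : prior proportion, used only when prior_n > 0
    steps   : number of Simpson intervals (must be even)
    """
    a = 1 + k + prior_n * prior_p
    b = 1 + (n - k) + prior_n * (1 - prior_p)
    poll = k / n
    lo = max(poll - delta, 1e-12)
    hi = min(poll + delta, 1 - 1e-12)

    # Composite Simpson's rule over [lo, hi]
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.exp(beta_log_pdf(x, a, b))
    return total * h / 3


# Uniform prior, Democratic poll numbers from the cells above
print(prob_within(772, 394, 0.01))
```

Under these assumptions the uniform-prior call should land near the 0.42 reported above for the Democratic +/- 0.01 case, though exact agreement with `ep.dem_prob_2020` is not guaranteed because the module's internals may differ.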