# Stat 165/265 Homework 2 Lab: Proper and Improper Scoring Rules

Scoring rules, or score functions, measure how good a probabilistic forecast is. As we saw in lecture, we generally want to use strictly proper scoring rules to compare forecasts, because they give forecasters an incentive to report their true beliefs. This notebook touches on a few different scoring rules, both proper and improper, to explore how they differ and how improper rules can be "gamed."

## Gradescope Submission
To submit this assignment, rerun the notebook from scratch (by selecting Kernel > Restart & Run All), then export it as a PDF (File > Download as > PDF) and submit the PDF to Gradescope.


**This assignment should be completed and submitted before Tuesday, January 30th, 2024 at 11:59 PM**

# I. Naive Linear Score

Imagine we have a biased coin with a 60% chance of Heads and a 40% chance of Tails. We ask forecasters to give a probability distribution for the outcome of the coin flip, i.e., a vector $q = (q_H, q_T)$ where $q_H + q_T = 1$. As mentioned in lecture, here's a simple linear score $f$ we could define:

$$f(q, i) = q_i$$

or in other words, you get $q_H$ points if the coin lands Heads and $q_T$ points if the coin lands Tails. To get a feel for this score, we've implemented an interactive widget below that outputs your average score over many coin flips, when you input a forecast for the probability of heads. Try it out for a few different forecasts and numbers of flips.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact_manual
import ipywidgets as widgets

In [None]:
def submit_forecast(heads_forecast, num_flips):
    flips = np.random.rand(num_flips)
    heads = (flips < 0.6).sum()
    tails = num_flips - heads
    avg_score = (heads_forecast * heads + (1 - heads_forecast) * tails)/num_flips
    return (f"Score over {num_flips} coin flips: ", avg_score)

In [None]:
game = interact_manual(submit_forecast,
                       heads_forecast=widgets.FloatSlider(min=0, max=1, step=.01, value=.5),
                       num_flips=widgets.IntSlider(value=50))
game;
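
If the widget does not render (for example, in the exported PDF), the same experiment can be run non-interactively. The sketch below is only an illustration, not part of the lab's provided code: the helper name `simulate_linear_score`, its `p_heads` and `seed` parameters, and the use of `np.random.default_rng` are all assumptions made here.

```python
import numpy as np

def simulate_linear_score(heads_forecast, num_flips, p_heads=0.6, seed=0):
    """Average naive linear score over simulated flips (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    is_heads = rng.random(num_flips) < p_heads
    # The forecaster earns q_H points on Heads and q_T = 1 - q_H on Tails.
    scores = np.where(is_heads, heads_forecast, 1 - heads_forecast)
    return scores.mean()

avg = simulate_linear_score(0.6, 100_000)
print(f"Average score over 100,000 flips: {avg:.3f}")
```

With a large number of flips, the average should settle close to the expected score for that forecast; with only a handful of flips it can vary noticeably from run to run.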

Over many coin flips, the average score approaches the expected score for a given vector $q$. We've implemented the expected score below.

In [None]:
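# Here we define a helper function to make sure that we're only working with valid probability distributions
def check_distributions(q, true_dist):
    # Compare with a tolerance: floating-point sums are often not exactly 1
    assert np.isclose(np.sum(q), 1) and np.isclose(np.sum(true_dist), 1), "Probabilities must sum to 1."
    assert len(q) == len(true_dist), "Distributions must have the same length."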
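
A check that probabilities sum to 1 is sensitive to floating-point rounding, which matters once forecasts are generated programmatically (for instance, over a grid of values for plotting). The sketch below shows the issue and a tolerance-based alternative; `sums_to_one` is a hypothetical helper introduced here for illustration.

```python
import numpy as np

# Exact equality can fail after floating-point rounding.
print(0.1 + 0.2 == 0.3)            # False
print(np.isclose(0.1 + 0.2, 0.3))  # True

def sums_to_one(dist):
    """Hypothetical helper: tolerance-based check that dist sums to 1."""
    return bool(np.isclose(np.sum(dist), 1.0))

print(sums_to_one((0.6, 0.4)))
```

`np.isclose` uses small default relative and absolute tolerances, so clearly invalid inputs such as `(0.6, 0.5)` are still rejected.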

In [None]:
def expected_linear_score(q, true_dist):
    check_distributions(q, true_dist)

    return np.dot(q, true_dist)

If you know that the coin is biased 60-40, and you want to get the highest score according to this linear score, what forecast should you give as your $q$? You could report $q = (0.6, 0.4)$ and we see that this gives an expected score of 0.52. Can you find a $q$ that does better than this?

In [None]:
biased_coin = (0.6, 0.4)

q = (0.6, 0.4)
expected_linear_score(q, biased_coin)
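
The value 0.52 can also be checked by hand. Writing $p = (0.6, 0.4)$ for the coin's true distribution (a symbol introduced here for illustration), the expected linear score is the dot product of $p$ and $q$:

$$\mathbb{E}_{i \sim p}[f(q, i)] = \sum_i p_i q_i = 0.6 \cdot 0.6 + 0.4 \cdot 0.4 = 0.52$$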

### Q1. What's the optimal $q$ for this naive linear score and what score does it get?

Your answer here.

### Q2. Plot the naive linear score for all possible forecasts $q$. Is the naive linear score a proper scoring rule? Explain why or why not.

Your answer here.

# II. Logarithmic Score

As we saw in lecture, Good's logarithmic scoring rule is defined as:

$$f(q, i) = \log(q_i)$$

We've implemented a function to calculate the expected logarithmic score of a coin flip forecast below. Again, a higher score is better.

In [None]:
def expected_log_score(q, true_dist):
    check_distributions(q, true_dist)
    assert(min(q) > 0 and min(true_dist) > 0), "Choose nonzero probabilities" # avoid undefined scores

    return np.dot(np.log(q), true_dist)

We'll continue using the biased coin that lands Heads 60% of the time.

In [None]:
q = (0.6, 0.4)
expected_log_score(q, biased_coin)
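
As a sanity check on the cell above, the same quantity can be computed directly from the definition, $\sum_i p_i \log(q_i)$, where $p$ denotes the true distribution (notation assumed here). Note that `np.log` is the natural logarithm; a different base would only rescale the scores. The variable names below are illustrative.

```python
import math

true_dist = (0.6, 0.4)  # the biased coin
forecast = (0.6, 0.4)   # an honest forecast

# Weight the log of each forecast probability by how often that outcome occurs.
manual_log_score = sum(p_i * math.log(q_i) for p_i, q_i in zip(true_dist, forecast))
print(round(manual_log_score, 3))  # -0.673
```

Because the log of a probability is never positive, expected log scores are at most 0, and values closer to 0 are better.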

### Q3. What's the logarithmic score for $q = (0.2, 0.8)$? For $q = (0.5, 0.5)$? For $q = (0.999, 0.001)$?

Your answer here.

### Q4. Plot the logarithmic score for all possible forecasts $q$.

Your answer here.

### Q5. What's the optimal $q$ for the logarithmic score, when forecasting the biased coin? Is the logarithmic score a proper scoring rule?

Your answer here.

# III. Brier's Quadratic Score

As we also saw in lecture, Brier's quadratic scoring rule is defined as:

$$f(q, i) = -\sum_{j=1}^n (q_j - o_{ij})^2 $$

where $o_{ij} = 1$ if $i = j$ and 0 otherwise. This expression can be equivalently written as:

$$f(q, i) = 2q_i - 1 - \sum_{j=1}^n q_j^2$$
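
The equivalence follows from expanding the square and using $\sum_j q_j o_{ij} = q_i$ and $\sum_j o_{ij}^2 = 1$. A quick numerical check of the two forms for a single outcome $i$ is sketched below; the function names `brier_sum_form` and `brier_expanded_form` and the example forecast are hypothetical, not part of the lab's code.

```python
import numpy as np

def brier_sum_form(q, i):
    """-sum_j (q_j - o_ij)^2, where o_i is the one-hot vector for outcome i."""
    q = np.asarray(q, dtype=float)
    o = np.zeros_like(q)
    o[i] = 1.0
    return -np.sum((q - o) ** 2)

def brier_expanded_form(q, i):
    """2 q_i - 1 - sum_j q_j^2."""
    q = np.asarray(q, dtype=float)
    return 2 * q[i] - 1 - np.sum(q ** 2)

q_example = (0.7, 0.3)  # arbitrary forecast, chosen only for illustration
for outcome in range(len(q_example)):
    # Both forms should agree up to floating-point rounding.
    assert np.isclose(brier_sum_form(q_example, outcome),
                      brier_expanded_form(q_example, outcome))
```

These functions score a single realized outcome; the expected score additionally averages over outcomes according to the true distribution.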

### Q6. Implement the function `expected_quad_score` below to calculate the expected quadratic score given a forecast and a true distribution.

In [None]:
def expected_quad_score(q, true_dist):
    check_distributions(q, true_dist)

    score = None  # Your code here

    return score

### Q7. Plot the quadratic score for all possible forecasts for the biased coin.

Your answer here.

# IV. Logarithmic vs. Quadratic Scoring Rules

### Q8. Rare events.
Imagine you and your friend Alice are competing to see who is the better forecaster. You know that you are generally better than Alice at being precise about the probabilities of rare events. For example, once something is rarer than a 1 in 100 chance, Alice treats it as having a 1 in 100 probability, while you are good at differentiating between things that are 1 in 100 and those that are 1 in 10,000. **If you want a forecasting competition to show your advantage on this, would you rather have the scoring rule be logarithmic or quadratic? Why?**


Your answer here.

### Q9. Precision in the middle

Now imagine you are competing against your friend Bob. He's quite good at estimating rare event probabilities. However, you notice that when two things both have a substantial probability of happening, he tends to just forecast 50%, while you are good at evaluating considerations that push you to make more precise forecasts, such as 42-58. **Would you rather compete with Bob on a log score or quadratic score? Why?**

Your answer here.

### Q10. More than two options

Imagine you are forecasting which of the Final Four teams (Teams A, B, C, and D) in a tournament will be the champion. You believe the teams' chances are, in order: 5%, 10%, 60%, 25%. You submit this forecast, and Team C wins, so you are pretty happy you put a high probability on that outcome! Then you learn that someone else who also put a 60% chance on Team C got a higher quadratic score than you. **How is this possible? Give an example of a forecast that would beat your quadratic score.**

Your answer here.

### Submission

Follow the instructions at the top, which detail how to submit this notebook to Gradescope.