In [None]:
with open("day07.input") as file:
    heights = [int(number) for number in file.readline().split(",")]

# Part 1

We are minimizing $\sum_i |e_i|$ (the sum of the absolute residuals), so the optimal target is the median of the heights.

For a proof that the median minimizes the mean absolute error (MAE), see e.g. [math.stackexchange.com](https://math.stackexchange.com/questions/113270/the-median-minimizes-the-sum-of-absolute-deviations-the-ell-1-norm).
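
As a quick sanity check of the claim above, here is a minimal sketch that compares the median against a brute-force search over every candidate target. The list `sample` and the helper `linear_cost` are made up for illustration; they are not the puzzle input or part of the solution below.

```python
import statistics

# Made-up example values, not the puzzle input.
sample = [16, 1, 2, 0, 4, 2, 7, 1, 2, 14]


def linear_cost(values, target):
    """Sum of absolute deviations of the values from target."""
    return sum(abs(value - target) for value in values)


# Try every integer between the smallest and the largest value.
brute_force_best = min(
    range(min(sample), max(sample) + 1),
    key=lambda target: linear_cost(sample, target),
)
median_target = int(statistics.median(sample))

print(brute_force_best, linear_cost(sample, brute_force_best))
print(median_target, linear_cost(sample, median_target))
```

Both lines should report the same cost, since the brute-force optimum and the median coincide on this list.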

In [None]:
import statistics

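# With an even number of heights the median can end in .5; truncating is fine,
# because every integer between the two middle values gives the same total.
target = int(statistics.median(heights))
sum(abs(height - target) for height in heights)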

# Part 2

If the height difference is $n$, the cost of moving is $c = 1 + 2 + 3 + \dots + n = \sum_{i=1}^n i = n(n + 1) / 2$.

For a proof of the above, see e.g. [math.stackexchange.com](https://math.stackexchange.com/questions/2260/proof-1234-cdotsn-fracn-timesn12).

...and BTW: for a range of consecutive integers from $a$ to $b$ that doesn't start at 1, the above formula generalizes to:

$\sum_{i=a}^b i = (b - a + 1)(b + a)/2$
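
Both closed forms are easy to check numerically. The sketch below compares them against explicit summation for small values; the function names `triangular` and `range_sum` are hypothetical helpers introduced only for this check.

```python
def triangular(n):
    """1 + 2 + ... + n via the closed form n(n + 1) / 2."""
    return n * (n + 1) // 2


def range_sum(a, b):
    """a + (a + 1) + ... + b via the generalized closed form (assumes a <= b)."""
    return (b - a + 1) * (b + a) // 2


# Compare the closed forms against explicit summation for small values.
for n in range(0, 50):
    assert triangular(n) == sum(range(1, n + 1))

for a in range(-10, 10):
    for b in range(a, 20):
        assert range_sum(a, b) == sum(range(a, b + 1))
```

Integer division is exact here because one of the two factors in each product is always even.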

In [None]:
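def cost(heights, common_height):
    """Total cost of moving every height to common_height."""
    total_cost = 0
    for height in heights:
        diff = abs(height - common_height)
        # diff * (diff + 1) is always even, so integer division is exact
        # and avoids accumulating floats.
        total_cost += diff * (diff + 1) // 2
    return total_cost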

The exact cost is $\sum_i (e_i^2 + |e_i|)/2$, so we are basically minimizing $\sum_i e_i^2$ (squared residuals). The extra $|e_i|$ term shifts the real-valued optimum by at most $1/2$, so the target height should be close to the mean of the heights.

For a proof that the mean minimizes the mean squared error (MSE), see e.g. [math.stackexchange.com](https://math.stackexchange.com/questions/2554243/understanding-the-mean-minimizes-the-mean-squared-error).
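
To see how close to the mean the optimum lands, here is a minimal sketch that brute-forces every candidate target on a small list. The list `sample` and the helper `triangular_cost` are made up for illustration and are independent of the `cost` function used in the solution.

```python
import statistics

# Made-up example values, not the puzzle input.
sample = [16, 1, 2, 0, 4, 2, 7, 1, 2, 14]


def triangular_cost(values, target):
    """Sum of 1 + 2 + ... + d over all values, where d = |value - target|."""
    total = 0
    for value in values:
        diff = abs(value - target)
        total += diff * (diff + 1) // 2
    return total


# Try every integer between the smallest and the largest value.
best = min(
    range(min(sample), max(sample) + 1),
    key=lambda target: triangular_cost(sample, target),
)
mean = statistics.mean(sample)

print(best, mean, triangular_cost(sample, best))
```

On this list the best integer target sits within one unit of the mean, which is why checking a small window around the mean is enough in practice.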

We check a few numbers on each side of the mean, just to be on the safe side :).

In [None]:
import statistics

heights_mean = int(statistics.mean(heights))
# range() excludes its end point, so use + 3 to check two candidates on each side.
min(cost(heights, test) for test in range(heights_mean - 2, heights_mean + 3))