<a href="https://githubtocolab.com/fuszti/advent_of_code_2022/blob/main/day_01/AoC_2022_Day_01.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/></a>

<details>
<summary>What is this notebook?</summary>
[Advent of Code](https://adventofcode.com/2022) is an advent calendar of programming puzzles, with two algorithmic problems to solve each day. I challenge myself to solve them with data science tools, so you will see pandas, numpy, torch, or datatable tricks here. The puzzles are not data science tasks, so easier or faster solutions exist, and some of my solutions may look artificial. Still, I try to use the data science tools as meaningfully as I can.

In [my repository](https://github.com/fuszti/advent_of_code_2022) you can find the input.txt file for each day. Upload it here to run the code on the full input.
</details>

In [None]:
#@title Creating example small input file { display-mode: "form" }
small_input_text = \
"""1000
2000
3000

4000

5000
6000

7000
8000
9000

10000"""
with open("small_input.txt", "w") as small_file:
    small_file.write(small_input_text)

# Task 1
Source: https://adventofcode.com/2022/day/1
<details>
  <summary>Show me the description of task 1</summary>
The jungle must be too overgrown and difficult to navigate in vehicles or access from the air; the Elves' expedition traditionally goes on foot. As your boats approach land, the Elves begin taking inventory of their supplies. One important consideration is food - in particular, the number of Calories each Elf is carrying (your puzzle input).

The Elves take turns writing down the number of Calories contained by the various meals, snacks, rations, etc. that they've brought with them, one item per line. Each Elf separates their own inventory from the previous Elf's inventory (if any) by a blank line.

For example, suppose the Elves finish writing their items' Calories and end up with the following list:

```
1000
2000
3000

4000

5000
6000

7000
8000
9000

10000
```
This list represents the Calories of the food carried by five Elves:

- The first Elf is carrying food with 1000, 2000, and 3000 Calories, a total of 6000 Calories.
- The second Elf is carrying one food item with 4000 Calories.
- The third Elf is carrying food with 5000 and 6000 Calories, a total of 11000 Calories.
- The fourth Elf is carrying food with 7000, 8000, and 9000 Calories, a total of 24000 Calories.
- The fifth Elf is carrying one food item with 10000 Calories.

In case the Elves get hungry and need extra snacks, they need to know which Elf to ask: they'd like to know how many Calories are being carried by the Elf carrying the most Calories. In the example above, this is 24000 (carried by the fourth Elf).

Find the Elf carrying the most Calories. How many total Calories is that Elf carrying?

Your puzzle answer was 69177.
</details>

In [None]:
from itertools import zip_longest
import numpy as np

In [None]:
def solve_task_1(input_file_name):
    calories_for_each_elf = read_calories_into_matrix(input_file_name)
    return calories_for_each_elf.sum(axis=0).max()

def read_calories_into_matrix(input_file_name):
    """
    Returns a matrix whose nth column contains the item calories of the
    nth Elf, padded with zeros up to the length of the longest list
    """
    calories = read_calories_into_lists(input_file_name)
    calories_matrix = np.array(list(zip_longest(*calories, fillvalue=0)))
    return calories_matrix

def read_calories_into_lists(input_file_name):
    """
    Returns a list with one inner list of item calories per Elf
    """
    calories = [[]]
    with open(input_file_name, "r") as input_file:
        for item_calories in input_file:
            try:
                item_calories_int = int(item_calories)
                calories[-1].append(item_calories_int)
            except ValueError:
                # A blank (non-numeric) line separates two Elves
                calories.append([])
    return calories

In [None]:
task_input = "small_input.txt" # "input.txt"
solve_task_1(task_input)

## Key tricks
The numpy sum and max functions are a natural choice for this task. The question is how to build a matrix from the input file when the Elves' lists have different lengths.

This is a padding problem. A convenient tool for it is the zip_longest function from itertools.

Source: https://stackoverflow.com/questions/63879780/numpy-array-from-list-of-lists-with-different-length-padding
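
The padding step can be sketched in isolation as follows. This is a minimal illustration, not part of the solution itself: `calories_per_elf` is a hypothetical hard-coded stand-in for the parsed input, using the numbers from the small example.

```python
from itertools import zip_longest

import numpy as np

# Hypothetical ragged input: one inner list per Elf (values from the small example).
calories_per_elf = [[1000, 2000, 3000], [4000], [5000, 6000], [7000, 8000, 9000], [10000]]

# zip_longest transposes the lists and pads the shorter ones with 0,
# so row i holds the i-th item of every Elf and each column is one Elf.
padded = np.array(list(zip_longest(*calories_per_elf, fillvalue=0)))
print(padded.shape)  # (3, 5): the longest list has 3 items, there are 5 Elves

# Summing down the columns gives the total per Elf; zero padding does not change the sums.
totals = padded.sum(axis=0)
print(totals.tolist())  # [6000, 4000, 11000, 24000, 10000]
print(totals.max())     # 24000
```

Padding with 0 is safe here only because the following operation is a sum; a different reduction (a mean or a minimum, for example) would need another fill value or a mask.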

# Task 2

Source: https://adventofcode.com/2022/day/1
<details>
  <summary>Show me the description of task 2</summary>
  
By the time you calculate the answer to the Elves' question, they've already realized that the Elf carrying the most Calories of food might eventually run out of snacks.

To avoid this unacceptable situation, the Elves would instead like to know the total Calories carried by the top three Elves carrying the most Calories. That way, even if one of those Elves runs out of snacks, they still have two backups.

In the example above, the top three Elves are the fourth Elf (with 24000 Calories), then the third Elf (with 11000 Calories), then the fifth Elf (with 10000 Calories). The sum of the Calories carried by these three elves is 45000.

Find the top three Elves carrying the most Calories. How many Calories are those Elves carrying in total?

Your puzzle answer was 207456.
</details>

In [None]:
def solve_task_2(input_file_name):
    calories_for_each_elf = read_calories_into_matrix(input_file_name)
    sum_of_calories = calories_for_each_elf.sum(axis=0)
    # The defaults (last axis, quicksort) are all this 1-D array needs
    ordered_indices = np.argsort(sum_of_calories)
    return sum_of_calories[ordered_indices[-3:]].sum()

In [None]:
task_input = "small_input.txt" # "input.txt"
solve_task_2(task_input)

## Key tricks
We have to find the top k elements of the sum vector. An easy solution uses the numpy argsort function, which returns the indices that would sort the values in ascending order. The first element of the result is the index of the smallest element of the sum vector, and the last one is the index of the largest. We can therefore select the top 3 elements of the sum vector by indexing it with the last 3 entries of this index array.

Source:

- https://www.geeksforgeeks.org/how-to-get-the-n-largest-values-of-an-array-using-numpy/
- https://numpy.org/doc/stable/reference/generated/numpy.argsort.html
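
The top k selection can be sketched on the small example like this. The `totals` array is a hypothetical hard-coded copy of the per-Elf sums. The `np.partition` variant at the end is an alternative I am adding for comparison, not the method used in the notebook.

```python
import numpy as np

# Hypothetical per-Elf totals, taken from the small example.
totals = np.array([6000, 4000, 11000, 24000, 10000])

# argsort returns the indices that would sort the values in ascending order.
order = np.argsort(totals)
print(order.tolist())  # [1, 0, 4, 2, 3]

# The last 3 indices point at the 3 largest totals.
top3_idx = order[-3:]
top3_sum = totals[top3_idx].sum()
print(top3_sum)  # 45000

# Alternative: np.partition only guarantees that the last 3 entries are the
# 3 largest values (in unspecified order), which is enough for a sum.
top3_partition_sum = np.partition(totals, -3)[-3:].sum()
print(top3_partition_sum)  # 45000
```

A full sort is perfectly fine for inputs of this size; partitioning avoids sorting the whole array and may matter only for much larger vectors.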