# Day 7: No Space Left On Device

Input is command history from a shell on a device. Parse it to answer:

_Find all of the directories with a total size of at most 100000. What is the sum of the total sizes of those directories?_

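In the example (copied below), the directories that qualify are `a` and `e`; the sum of their total sizes is 95437 (94853 + 584).
A file counts towards every directory that contains it, directly or indirectly, so it can be counted more than once (`i` counts towards both `e` and `a`).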




## Example

```
$ cd /
$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ ls
dir e
29116 f
2557 g
62596 h.lst
$ cd e
$ ls
584 i
$ cd ..
$ cd ..
$ cd d
$ ls
4060174 j
8033020 d.log
5626152 d.ext
7214296 k
```

This implies the following tree:

```
- / (dir)
  - a (dir)
    - e (dir)
      - i (file, size=584)
    - f (file, size=29116)
    - g (file, size=2557)
    - h.lst (file, size=62596)
  - b.txt (file, size=14848514)
  - c.dat (file, size=8504156)
  - d (dir)
    - j (file, size=4060174)
    - d.log (file, size=8033020)
    - d.ext (file, size=5626152)
    - k (file, size=7214296)
```
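As a quick check of the expected answer, the tree above can be written as nested dictionaries and totalled directly. This is a minimal sketch, separate from the notebook's `Node` class; `EXAMPLE_TREE` and `dir_totals` are illustrative names, and the dict-for-directory, int-for-file encoding is an assumption made only for this example.

```python
from __future__ import annotations

# Illustrative only: the example tree as nested dicts.
# A dict is a directory; an int is a file size.
EXAMPLE_TREE = {
    "a": {"e": {"i": 584}, "f": 29116, "g": 2557, "h.lst": 62596},
    "b.txt": 14848514,
    "c.dat": 8504156,
    "d": {"j": 4060174, "d.log": 8033020, "d.ext": 5626152, "k": 7214296},
}


def dir_totals(tree: dict, path: str = "/", totals: dict | None = None) -> dict:
    """Return {directory path: total size}; a file counts in every ancestor."""
    if totals is None:
        totals = {}
    total = 0
    for name, entry in tree.items():
        if isinstance(entry, dict):
            # Subdirectory: compute its total first, then add it to ours.
            child = path.rstrip("/") + "/" + name
            dir_totals(entry, child, totals)
            total += totals[child]
        else:
            total += entry
    totals[path] = total
    return totals


totals = dir_totals(EXAMPLE_TREE)
small = {p: s for p, s in totals.items() if s <= 100000}
print(small)                # {'/a/e': 584, '/a': 94853}
print(sum(small.values()))  # 95437
```

The file `i` contributes its 584 bytes to both `/a/e` and `/a`, which is why the two totals overlap.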

In [1]:
from __future__ import annotations
from dataclasses import dataclass
from typing import Dict
from typing import Optional

In [2]:
SUM_OF_SIZES : int = 100000

In [3]:
@dataclass
class Node:
    name : str
    parent : Optional[Node]
    size : Optional[int]
    contents : Optional[Dict[str, Node]]


## History format

Every line in the history file describes either a command or a file system entry.

### $ cd

Changes the current working directory (`cwd`).

#### `$ cd ..`

cwd = cwd.parent

#### `$ cd /`

cwd = _root_

Only happens on the first line. Could assume that's always the case for simplicity.

#### `$ cd `_name_

cwd = the child of cwd called _name_
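The three `cd` forms can be sketched on their own by tracking the working directory as a list of names rather than as a tree node. This is only an illustration; `apply_cd` is a hypothetical helper that does not appear in the notebook, and unlike the tree version it treats `cd ..` at the root as a no-op.

```python
def apply_cd(cwd: list[str], target: str) -> list[str]:
    """Return the new working directory after `cd target`.

    The working directory is a list of names below the root,
    so [] is `/` and ['a', 'e'] is `/a/e`.
    """
    if target == "/":
        return []              # jump back to the root
    if target == "..":
        return cwd[:-1]        # move to the parent (the root stays the root)
    return cwd + [target]      # descend into a child directory


# Replay the `cd` commands from the example history.
cwd: list[str] = []
for target in ["/", "a", "e", "..", "..", "d"]:
    cwd = apply_cd(cwd, target)
print("/" + "/".join(cwd))  # /d
```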

### $ ls

Lines that follow describe file system entries, up to the next line starting with `$`.

### File system entries

Two kinds, files and directories. Each has a name.

#### _number_ _name_

A file entry whose size is _number_.

#### `dir` _name_

A directory entry.
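The line grammar described above is small enough to classify with a few comparisons. The sketch below is one possible way to do it, using plain `if` tests; `classify` and the tuple shapes it returns are assumptions made for this illustration.

```python
def classify(line: str) -> tuple:
    """Turn one history line into a tagged tuple.

    Returns ('cd', target), ('ls',), ('dir', name) or ('file', name, size).
    """
    parts = line.split()
    if len(parts) == 3 and parts[:2] == ["$", "cd"]:
        return ("cd", parts[2])
    if parts == ["$", "ls"]:
        return ("ls",)
    if len(parts) == 2 and parts[0] == "dir":
        return ("dir", parts[1])
    if len(parts) == 2 and parts[0].isdigit():
        return ("file", parts[1], int(parts[0]))
    raise ValueError(f"Unrecognized line {line!r}")


for line in ["$ cd /", "$ ls", "dir a", "14848514 b.txt"]:
    print(classify(line))
# ('cd', '/')
# ('ls',)
# ('dir', 'a')
# ('file', 'b.txt', 14848514)
```

The `dir` test has to come before the file test only in spirit: `'dir'.isdigit()` is false, so the two cases cannot be confused either way.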

In [4]:
def load_data(filename: str) -> Node:
    with open(filename) as f:
        root = None
        cwd = None
        for line in f.readlines():
            line = line.split()
            match line:
                case ('$','cd','/'):
                    root = Node(name='/', parent=None, size=None, contents={})
                    cwd = root
                case ('$','cd','..'):
                    cwd = cwd.parent
                case ('$','cd',name):
                    cwd = cwd.contents[name]
                case ('$','ls'):
                    # Just ignore this. If we wanted to be careful then
                    # we could flag which state the parser is in here.
                    pass
                case ('dir', name):
                    # Could check for repeats here but this will overwrite.
                    cwd.contents[name] = Node(name=name, parent=cwd, size=None, contents={})
                case (number, name):
                    # Could check for repeats here but this will overwrite.
                    cwd.contents[name] = Node(name=name, parent=cwd, size=int(number), contents=None)
                case _:
                    raise Exception(f'Unrecognized line {line}')
    return root


In [5]:
def node_sum(node : Node) -> int:
    if node.size is not None:
        return node.size
    size : int = 0
    for n in node.contents.values():
        size += node_sum(n)
    node.size = size
    return size

_Exercise `node_sum()`_

In [6]:
def exercise_node_sum():
    root = Node(name='/', parent=None, size=None, contents={})
    root.contents['a'] = Node('a', root, 100, None)
    bdir = Node('bdir', root, None, {})
    root.contents['bdir'] = bdir
    # Leave b dir empty.
    cdir = Node('cdir', root, None, {})
    root.contents['cdir'] = cdir
    cdir.contents['cfile1'] = Node('cfile1', cdir, 20, None)
    cdir.contents['cfile2'] = Node('cfile2', cdir, 30, None)
    ddir = Node('ddir', root, None, {})
    ddir.contents['dfile1'] = Node('dfile1', ddir, 2, None)
    ddir.contents['dfile2'] = Node('dfile2', ddir, 3, None)
    root.contents['ddir'] = ddir
    print(f'{node_sum(root)}')

exercise_node_sum()

155


In [7]:
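def dfs_sum(node: Node, predicate) -> int:
    # Depth-first walk: add up predicate(n) for this node and every descendant.
    total = 0  # named total to avoid shadowing the built-in sum()
    if node.contents is not None:
        for n in node.contents.values():
            total += dfs_sum(n, predicate)
    return total + predicate(node)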

In [8]:
def summer(filename : str):
    root = load_data(filename)
    # Has the side effect of filling in the size for all directories.
    node_sum(root)
    # Directories only (contents is not None); "at most" means <=, not <.
    dir_at_most_100k = lambda x : x.size if x.contents is not None and x.size <= 100000 else 0
    print(f'{dfs_sum(root,dir_at_most_100k)}')
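The notebook never runs the example history end to end, so here is a self-contained cross-check that reproduces 95437 from the raw text. It deliberately uses a different technique from the tree above: a flat dictionary keyed by path, where each file size is added to the current directory and all of its ancestors. `EXAMPLE` and `dir_sizes` are names introduced for this sketch, and it assumes each directory is listed only once (a repeated `ls` would double count).

```python
from collections import defaultdict

EXAMPLE = """\
$ cd /
$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ ls
dir e
29116 f
2557 g
62596 h.lst
$ cd e
$ ls
584 i
$ cd ..
$ cd ..
$ cd d
$ ls
4060174 j
8033020 d.log
5626152 d.ext
7214296 k
"""


def dir_sizes(history: str) -> dict:
    """Map each directory path (a tuple of names, () for /) to its total size."""
    sizes = defaultdict(int)
    cwd: tuple = ()
    for line in history.splitlines():
        parts = line.split()
        if parts[:2] == ["$", "cd"]:
            if parts[2] == "/":
                cwd = ()
            elif parts[2] == "..":
                cwd = cwd[:-1]
            else:
                cwd += (parts[2],)
        elif parts and parts[0].isdigit():
            # Add the file size to the current directory and every ancestor.
            for depth in range(len(cwd) + 1):
                sizes[cwd[:depth]] += int(parts[0])
    return dict(sizes)


sizes = dir_sizes(EXAMPLE)
print(sum(s for s in sizes.values() if s <= 100000))  # 95437
```

Directories with no files anywhere beneath them never appear in `sizes`; that does not affect this sum because they would contribute 0.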

In [9]:
def sum_root():
    root = load_data('input.txt')
    print(f'{node_sum(root)}')

sum_root()

46552309


## Solution

In [10]:
summer('input.txt')


1723892
