# Day 7
## No Space Left On Device

You can hear birds chirping and raindrops hitting leaves as the expedition proceeds. Occasionally, you can even hear much louder sounds in the distance; how big do the animals get out here, anyway?

The device the Elves gave you has problems with more than just its communication system. You try to run a system update:

```
$ system-update --please --pretty-please-with-sugar-on-top
Error: No space left on device
Perhaps you can delete some files to make space for the update?
```
You browse around the filesystem to assess the situation and save the resulting terminal output (your puzzle input). For example:
```
$ cd /
$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ ls
dir e
29116 f
2557 g
62596 h.lst
$ cd e
$ ls
584 i
$ cd ..
$ cd ..
$ cd d
$ ls
4060174 j
8033020 d.log
5626152 d.ext
7214296 k
```
The filesystem consists of a tree of files (plain data) and directories (which can contain other directories or files). The outermost directory is called /. You can navigate around the filesystem, moving into or out of directories and listing the contents of the directory you're currently in.

Within the terminal output, lines that begin with $ are **commands you executed**, very much like some modern computers:

- `cd` means change directory. This changes which directory is the current directory, but the specific result depends on the argument:
  - `cd x` moves in one level: it looks in the current directory for the directory named x and makes it the current directory.
  - `cd ..` moves out one level: it finds the directory that contains the current directory, then makes that directory the current directory.
  - `cd /` switches the current directory to the outermost directory, /.
- `ls` means list. It prints out all of the files and directories immediately contained by the current directory:
  - `123 abc` means that the current directory contains a file named abc with size 123.
  - `dir xyz` means that the current directory contains a directory named xyz.
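
The command semantics described in this list can be sketched in plain Python. This is a minimal illustration and not the approach used later in this notebook; the function name `parse_terminal` and the dict-of-lists layout are assumptions made for the example.

```python
from pathlib import PurePosixPath


def parse_terminal(lines):
    """Map each directory path to the (name, size) pairs of files directly in it."""
    cwd = PurePosixPath("/")
    listing = {cwd: []}
    for line in lines:
        parts = line.split()
        if parts[:2] == ["$", "cd"]:
            target = parts[2]
            if target == "..":
                cwd = cwd.parent          # move out one level
            elif target == "/":
                cwd = PurePosixPath("/")  # jump to the outermost directory
            else:
                cwd = cwd / target        # move in one level
            listing.setdefault(cwd, [])
        elif parts[0] == "$":
            continue                      # "$ ls": its output lines follow
        elif parts[0] == "dir":
            listing.setdefault(cwd / parts[1], [])
        else:
            listing[cwd].append((parts[1], int(parts[0])))
    return listing


example = ["$ cd /", "$ ls", "dir a", "123 abc", "$ cd a", "$ ls", "456 xyz", "$ cd .."]
result = parse_terminal(example)
print(result)
```

Keeping the current directory as a path object means `cd ..` reduces to taking `.parent`, so no explicit stack is needed.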

Given the commands and output in the example above, you can determine that the filesystem looks visually like this:
```
- / (dir)
  - a (dir)
    - e (dir)
      - i (file, size=584)
    - f (file, size=29116)
    - g (file, size=2557)
    - h.lst (file, size=62596)
  - b.txt (file, size=14848514)
  - c.dat (file, size=8504156)
  - d (dir)
    - j (file, size=4060174)
    - d.log (file, size=8033020)
    - d.ext (file, size=5626152)
    - k (file, size=7214296)
```
Here, there are four directories: / (the outermost directory), a and d (which are in /), and e (which is in a). These directories also contain files of various sizes.

Since the disk is full, your first step should probably be to find directories that are good candidates for deletion. To do this, you need to determine the total size of each directory. The total size of a directory is the sum of the sizes of the files it contains, directly or indirectly. (Directories themselves do not count as having any intrinsic size.)

The total sizes of the directories above can be found as follows:

- The total size of directory e is 584 because it contains a single file i of size 584 and no other directories.
- The directory a has total size 94853 because it contains files f (size 29116), g (size 2557), and h.lst (size 62596), plus file i indirectly (a contains e which contains i).
- Directory d has total size 24933642.
- As the outermost directory, / contains every file. Its total size is 48381165, the sum of the size of every file.

To begin, find all of the directories with a total size of at most 100000, then calculate the sum of their total sizes. In the example above, these directories are a and e; the sum of their total sizes is 95437 (94853 + 584). (As in this example, this process can count files more than once!)
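
The totals quoted for the example can be reproduced with a short recursive walk. The sketch below hard-codes the example tree as nested dicts (a dict is a directory, an int is a file size); the name `total_sizes` and this representation are assumptions for illustration only.

```python
# The example tree: a dict is a directory, an int is a file size.
tree = {
    "a": {"e": {"i": 584}, "f": 29116, "g": 2557, "h.lst": 62596},
    "b.txt": 14848514,
    "c.dat": 8504156,
    "d": {"j": 4060174, "d.log": 8033020, "d.ext": 5626152, "k": 7214296},
}


def total_sizes(directory, path="/", totals=None):
    """Return {path: total size} for `directory` and every directory below it."""
    if totals is None:
        totals = {}
    size = 0
    for name, entry in directory.items():
        if isinstance(entry, dict):
            child = path.rstrip("/") + "/" + name
            total_sizes(entry, child, totals)  # fills totals[child]
            size += totals[child]
        else:
            size += entry
    totals[path] = size
    return totals


totals = total_sizes(tree)
small_sum = sum(size for size in totals.values() if size <= 100000)
print(totals)     # "/a/e": 584, "/a": 94853, "/d": 24933642, "/": 48381165
print(small_sum)  # 95437
```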

Find all of the directories with a total size of at most 100000. **What is the sum of the total sizes of those directories?**

## Part 1

### Read terminal output

In [119]:
with open("terminal_output.txt", "r") as f:
    # terminal = f.read()
    terminal = [i.rstrip("\n") for i in f.readlines()]

In [120]:
# terminal
print(terminal[:200])

['$ cd /', '$ ls', 'dir fnsvfbzt', 'dir hqdssf', 'dir jwphbz', 'dir lncqsmj', 'dir mhqs', 'dir trwqgzsb', '132067 vjw', 'dir wbsph', '$ cd fnsvfbzt', '$ ls', '62158 sfwnts.hbj', '$ cd ..', '$ cd hqdssf', '$ ls', '45626 cvcbmcm', 'dir dlsmjsbz', 'dir hqdssf', 'dir mhqs', 'dir mtw', 'dir sfccfsrd', 'dir shzgg', '$ cd dlsmjsbz', '$ ls', '9205 qcqbgd.lzd', '$ cd ..', '$ cd hqdssf', '$ ls', '105963 mhqs.zrn', '87909 slwshm.nwr', '$ cd ..', '$ cd mhqs', '$ ls', 'dir ctfl', '45923 jvvl.rcs', 'dir jzjm', 'dir lncqsmj', 'dir mhqs', 'dir wfbvtfmr', '$ cd ctfl', '$ ls', 'dir shzgg', '$ cd shzgg', '$ ls', '18097 cvcbmcm', '289064 mhqs', '208557 slwshm.nwr', '283449 vjw', 'dir wfbvtfmr', '$ cd wfbvtfmr', '$ ls', '263560 dssbpgnl.szh', 'dir hnqjmq', '76551 jvvl.rcs', '195911 lncqsmj', '185776 slwshm.nwr', '$ cd hnqjmq', '$ ls', '3307 rjd.lgh', '$ cd ..', '$ cd ..', '$ cd ..', '$ cd ..', '$ cd jzjm', '$ ls', '31719 rjjrg.pjq', '$ cd ..', '$ cd lncqsmj', '$ ls', 'dir mhqs', '$ cd mhqs', '$ ls', '13829

### Try solution with pathlib and pandas

In [13]:
from pathlib import Path
import pandas as pd

In [121]:
filesystem_df = pd.DataFrame(columns=["parentdir", "filetype", "name", "size"])

In [122]:
filesystem_df

Unnamed: 0,parentdir,filetype,name,size


In [43]:
filesystem_df = pd.concat([filesystem_df, pd.DataFrame({"name": "rodrigo", "size": 52}, index=[0])])

In [44]:
filesystem_df = pd.concat([filesystem_df, pd.DataFrame({"name": "daniela", "size": 50, "filetype": "dir"}, 
                                                       index=[filesystem_df.index.max()+1])])

In [52]:
# filesystem_df["parentdir"].fillna("")
filesystem_df.fillna("")

Unnamed: 0,parentdir,filetype,name,size
0,,,rodrigo,52
1,,dir,daniela,50


Seems to work.

### Initialise dataframe and create function to interpret terminal log

In [53]:
from pathlib import Path
import pandas as pd

In [123]:
filesystem_df = pd.DataFrame({"parentdir": Path("/"), "filetype": "dir", "name": "/", "size": 0}, 
                             index=[0])

In [124]:
filesystem_df

Unnamed: 0,parentdir,filetype,name,size
0,/,dir,/,0


In [89]:
def process_terminal_output(terminal_list, filesystem_dataframe):
    # start with root
    the_path = Path("/")
    # navigate (from the second item)
    for output in terminal_list[1:]:
        # interpret output
        arguments = output.split(" ")
        # 1. directory
        if arguments[0] == "dir":
            dict_line = {"parentdir": the_path, "filetype": "dir", "name": arguments[1], "size": 0}
            filesystem_dataframe = pd.concat([filesystem_dataframe, 
                                             pd.DataFrame(dict_line, index=[filesystem_dataframe.index.max()+1])])
            print(f"Added directory {the_path / arguments[1]}")
        # 2. file
        if arguments[0].isdigit():
            dict_line = {"parentdir": the_path, "filetype": "file", "name": arguments[1], "size": int(arguments[0])}
            filesystem_dataframe = pd.concat([filesystem_dataframe, 
                                             pd.DataFrame(dict_line, index=[filesystem_dataframe.index.max()+1])])
            print(f"Added file {the_path / arguments[1]}")
        # 3. changing directory
        if arguments[0] == "$" and arguments[1] == "cd":
            # change to parent
            if arguments[2] == "..":
                the_path = the_path.parent
            # change to child
            else:
                the_path = the_path / arguments[2]
            print(f"Changed to {the_path}")
    # return resulting dataframe
    return filesystem_dataframe
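
One detail worth noting: the function has no special case for `cd /`. It appears to work anyway because of how `pathlib` joins paths: joining with an absolute path discards everything before it, and the parent of the root is the root itself. A small check, using `PurePosixPath` (an assumption for portability; the notebook uses `Path`, which behaves the same way on POSIX systems):

```python
from pathlib import PurePosixPath

here = PurePosixPath("/hqdssf/mhqs")

to_child = here / "ctfl"   # cd ctfl
to_parent = here.parent    # cd ..
to_root = here / "/"       # cd /
root_parent = PurePosixPath("/").parent  # cd .. at the root stays at the root

print(to_child, to_parent, to_root, root_parent)
```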

### Run process

In [125]:
filesystem_df = process_terminal_output(terminal, filesystem_df)

Added directory /fnsvfbzt
Added directory /hqdssf
Added directory /jwphbz
Added directory /lncqsmj
Added directory /mhqs
Added directory /trwqgzsb
Added file /vjw
Added directory /wbsph
Changed to /fnsvfbzt
Added file /fnsvfbzt/sfwnts.hbj
Changed to /
Changed to /hqdssf
Added file /hqdssf/cvcbmcm
Added directory /hqdssf/dlsmjsbz
Added directory /hqdssf/hqdssf
Added directory /hqdssf/mhqs
Added directory /hqdssf/mtw
Added directory /hqdssf/sfccfsrd
Added directory /hqdssf/shzgg
Changed to /hqdssf/dlsmjsbz
Added file /hqdssf/dlsmjsbz/qcqbgd.lzd
Changed to /hqdssf
Changed to /hqdssf/hqdssf
Added file /hqdssf/hqdssf/mhqs.zrn
Added file /hqdssf/hqdssf/slwshm.nwr
Changed to /hqdssf
Changed to /hqdssf/mhqs
Added directory /hqdssf/mhqs/ctfl
Added file /hqdssf/mhqs/jvvl.rcs
Added directory /hqdssf/mhqs/jzjm
Added directory /hqdssf/mhqs/lncqsmj
Added directory /hqdssf/mhqs/mhqs
Added directory /hqdssf/mhqs/wfbvtfmr
Changed to /hqdssf/mhqs/ctfl
Added directory /hqdssf/mhqs/ctfl/shzgg
Changed to /

In [126]:
filesystem_df

Unnamed: 0,parentdir,filetype,name,size
0,/,dir,/,0
1,/,dir,fnsvfbzt,0
2,/,dir,hqdssf,0
3,/,dir,jwphbz,0
4,/,dir,lncqsmj,0
...,...,...,...,...
488,/trwqgzsb,file,jgphdrnq,88439
489,/wbsph,dir,mhqs,0
490,/wbsph,dir,vgh,0
491,/wbsph/mhqs,file,rfh.zdb,133965


### How many distinct directories?

In [92]:
filesystem_df["parentdir"].drop_duplicates()

0                     /
501           /fnsvfbzt
502             /hqdssf
509    /hqdssf/dlsmjsbz
510      /hqdssf/hqdssf
             ...       
979               /mhqs
980           /trwqgzsb
981              /wbsph
983         /wbsph/mhqs
984          /wbsph/vgh
Name: parentdir, Length: 200, dtype: object

In [93]:
filesystem_df[filesystem_df["filetype"]=="dir"]["name"].drop_duplicates()

0             /
1      fnsvfbzt
2        hqdssf
3        jwphbz
4       lncqsmj
         ...   
468       gzsnl
471       vrnqh
475    jljbrrgg
485         bnb
490         vgh
Name: name, Length: 150, dtype: object

### Aggregate directories by size

In [99]:
dir_sizes_df = pd.DataFrame(filesystem_df.groupby(['parentdir'])['size'].sum())

In [100]:
dir_sizes_df = dir_sizes_df.reset_index()
dir_sizes_df

Unnamed: 0,parentdir,size
0,/,42690379
1,/fnsvfbzt,62158
2,/hqdssf,45626
3,/hqdssf/dlsmjsbz,9205
4,/hqdssf/hqdssf,193872
...,...,...
195,/mhqs,116955
196,/trwqgzsb,88439
197,/wbsph,0
198,/wbsph/mhqs,133965


### Find directories with a size of at most 100,000 and sum them

In [103]:
dir_sizes_df[dir_sizes_df["size"] <= 100000]

Unnamed: 0,parentdir,size
1,/fnsvfbzt,62158
2,/hqdssf,45626
3,/hqdssf/dlsmjsbz,9205
5,/hqdssf/mhqs,45923
6,/hqdssf/mhqs/ctfl,0
...,...,...
192,/lncqsmj/mpc,0
193,/lncqsmj/mpc/shzgg,0
196,/trwqgzsb,88439
197,/wbsph,0


In [104]:
dir_sizes_df[dir_sizes_df["size"] <= 100000]["size"].sum()

2257410

#### 2257410

That's not the right answer; your answer is too high. If you're stuck, make sure you're using the full input data; there are also some general tips on the about page, or you can ask for hints on the subreddit. Please wait one minute before trying again. (You guessed 2257410.)

### Troubleshooting: test with example

In [106]:
# troubleshooting
test_terminal = """$ cd /
$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ ls
dir e
29116 f
2557 g
62596 h.lst
$ cd e
$ ls
584 i
$ cd ..
$ cd ..
$ cd d
$ ls
4060174 j
8033020 d.log
5626152 d.ext
7214296 k""".split("\n")
test_terminal

['$ cd /',
 '$ ls',
 'dir a',
 '14848514 b.txt',
 '8504156 c.dat',
 'dir d',
 '$ cd a',
 '$ ls',
 'dir e',
 '29116 f',
 '2557 g',
 '62596 h.lst',
 '$ cd e',
 '$ ls',
 '584 i',
 '$ cd ..',
 '$ cd ..',
 '$ cd d',
 '$ ls',
 '4060174 j',
 '8033020 d.log',
 '5626152 d.ext',
 '7214296 k']

In [110]:
# troubleshooting
filesystem_df = pd.DataFrame({"parentdir": Path("/"), "filetype": "dir", "name": "/", "size": 0}, 
                             index=[0])

In [111]:
# troubleshooting
filesystem_df

Unnamed: 0,parentdir,filetype,name,size
0,/,dir,/,0


In [112]:
# troubleshooting
filesystem_df = process_terminal_output(test_terminal, filesystem_df)

Added directory /a
Added file /b.txt
Added file /c.dat
Added directory /d
Changed to /a
Added directory /a/e
Added file /a/f
Added file /a/g
Added file /a/h.lst
Changed to /a/e
Added file /a/e/i
Changed to /a
Changed to /
Changed to /d
Added file /d/j
Added file /d/d.log
Added file /d/d.ext
Added file /d/k


In [113]:
# troubleshooting
filesystem_df

Unnamed: 0,parentdir,filetype,name,size
0,/,dir,/,0
1,/,dir,a,0
2,/,file,b.txt,14848514
3,/,file,c.dat,8504156
4,/,dir,d,0
5,/a,dir,e,0
6,/a,file,f,29116
7,/a,file,g,2557
8,/a,file,h.lst,62596
9,/a/e,file,i,584


In [114]:
# troubleshooting
dir_sizes_df = pd.DataFrame(filesystem_df.groupby(['parentdir'])['size'].sum())

In [115]:
# troubleshooting
dir_sizes_df = dir_sizes_df.reset_index()
dir_sizes_df

Unnamed: 0,parentdir,size
0,/,23352670
1,/a,94269
2,/a/e,584
3,/d,24933642


In [116]:
# troubleshooting
dir_sizes_df[dir_sizes_df["size"] <= 100000]

Unnamed: 0,parentdir,size
1,/a,94269
2,/a/e,584


In [117]:
# troubleshooting
dir_sizes_df[dir_sizes_df["size"] <= 100000]["size"].sum()

94853

This seems strange: the sum (94853) is **too low** because the size of `e` should also be counted inside `a`. According to the expected answer, `a` (= 94269 + 584 = 94853) plus `e` (584) equals `95437`.
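
The gap comes from `groupby(['parentdir'])`, which only sums the files sitting directly in each directory. One possible way to turn those direct sums into total sizes is to add each directory's direct sum to every one of its ancestors. The sketch below copies the `direct` values from the example table above; the variable names are illustrative, and this is not the approach the notebook takes next.

```python
from pathlib import PurePosixPath

# Direct sizes per directory, as produced by the groupby on the example.
direct = {"/": 23352670, "/a": 94269, "/a/e": 584, "/d": 24933642}

total = dict.fromkeys(direct, 0)
for path, size in direct.items():
    node = PurePosixPath(path)
    # Credit this directory and each ancestor up to the root.
    for owner in [node, *node.parents]:
        total[str(owner)] += size

answer = sum(size for size in total.values() if size <= 100000)
print(total)   # "/a" becomes 94853, "/" becomes 48381165
print(answer)  # 95437
```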

### Troubleshooting: test repeat files

In [127]:
# run from the beginning and...
filesystem_df

Unnamed: 0,parentdir,filetype,name,size
0,/,dir,/,0
1,/,dir,fnsvfbzt,0
2,/,dir,hqdssf,0
3,/,dir,jwphbz,0
4,/,dir,lncqsmj,0
...,...,...,...,...
488,/trwqgzsb,file,jgphdrnq,88439
489,/wbsph,dir,mhqs,0
490,/wbsph,dir,vgh,0
491,/wbsph/mhqs,file,rfh.zdb,133965


In [130]:
filesystem_df[filesystem_df["filetype"]=="file"]["name"].value_counts()
# there might be something here

rjjrg.pjq       26
jvvl.rcs        21
vjw             18
slwshm.nwr      18
cvcbmcm         13
                ..
hqdssf.fhn       1
dfththnq.vnm     1
pcz.snl          1
vqm              1
rfh.zdb          1
Name: name, Length: 180, dtype: int64

In [132]:
# Investigate
filesystem_df[filesystem_df["name"]=="rjjrg.pjq"]

Unnamed: 0,parentdir,filetype,name,size
38,/hqdssf/mhqs/jzjm,file,rjjrg.pjq,31719
46,/hqdssf/mhqs/wfbvtfmr,file,rjjrg.pjq,131522
62,/hqdssf/mtw/shzgg,file,rjjrg.pjq,184378
99,/hqdssf/sfccfsrd/hhhqdmz/fdbpld/bgrnz/rcv/hqdssf,file,rjjrg.pjq,205031
111,/hqdssf/sfccfsrd/hhhqdmz/fdbpld/bgrnz/rcv/hqds...,file,rjjrg.pjq,126331
118,/hqdssf/sfccfsrd/hhhqdmz/fdbpld/bgrnz/rcv/mvgq...,file,rjjrg.pjq,202755
124,/hqdssf/sfccfsrd/hhhqdmz/fdbpld/bgrnz/rcv/pjs,file,rjjrg.pjq,162974
152,/hqdssf/sfccfsrd/mvjttqr/hqdssf/dgdtm/cnshbf,file,rjjrg.pjq,122945
161,/hqdssf/sfccfsrd/mvjttqr/zdzchmtq/nncghrr,file,rjjrg.pjq,180468
174,/hqdssf/sfccfsrd/zmrt/bpztq,file,rjjrg.pjq,184906


In [133]:
filesystem_df.value_counts()

parentdir                                            filetype  name         size  
/                                                    dir       /            0         1
/hqdssf/sfccfsrd/zqqtb/ljnmbqvd/tzjftf/qblrnn/frqcc  dir       wpplgrc      0         1
                                                               clj          0         1
/hqdssf/sfccfsrd/zqqtb/ljnmbqvd/tzjftf/qblrnn/brrqp  file      wfbvtfmr     270783    1
/hqdssf/sfccfsrd/zqqtb/ljnmbqvd/tzjftf/qblrnn        file      rngbhp.ntg   286918    1
                                                                                     ..
/hqdssf/sfccfsrd/mvjttqr/zdzchmtq                    file      lmphsmv      271886    1
                                                     dir       nncghrr      0         1
/hqdssf/sfccfsrd/mvjttqr/hqdssf/vpqjmmf/tsbtz        file      qttbcgd.vtj  59649     1
                                                               qqfnbd.nqv   244065    1
/wbsph/vgh                           

### Second approach: find the size of each dir and all its subdirectories

In [135]:
filesystem_df

Unnamed: 0,parentdir,filetype,name,size
0,/,dir,/,0
1,/,dir,fnsvfbzt,0
2,/,dir,hqdssf,0
3,/,dir,jwphbz,0
4,/,dir,lncqsmj,0
...,...,...,...,...
488,/trwqgzsb,file,jgphdrnq,88439
489,/wbsph,dir,mhqs,0
490,/wbsph,dir,vgh,0
491,/wbsph/mhqs,file,rfh.zdb,133965


In [136]:
# How many distinct directories?
filesystem_df["parentdir"].drop_duplicates()

0                     /
9             /fnsvfbzt
10              /hqdssf
17     /hqdssf/dlsmjsbz
18       /hqdssf/hqdssf
             ...       
487               /mhqs
488           /trwqgzsb
489              /wbsph
491         /wbsph/mhqs
492          /wbsph/vgh
Name: parentdir, Length: 200, dtype: object

In [137]:
filesystem_df[filesystem_df["filetype"]=="dir"]["name"].drop_duplicates()

0             /
1      fnsvfbzt
2        hqdssf
3        jwphbz
4       lncqsmj
         ...   
468       gzsnl
471       vrnqh
475    jljbrrgg
485         bnb
490         vgh
Name: name, Length: 150, dtype: object

In [138]:
# Aggregate directories by size
dir_sizes_df = pd.DataFrame(filesystem_df.groupby(['parentdir'])['size'].sum())

In [139]:
dir_sizes_df = dir_sizes_df.reset_index()
dir_sizes_df

Unnamed: 0,parentdir,size
0,/,132067
1,/fnsvfbzt,62158
2,/hqdssf,45626
3,/hqdssf/dlsmjsbz,9205
4,/hqdssf/hqdssf,193872
...,...,...
195,/mhqs,116955
196,/trwqgzsb,88439
197,/wbsph,0
198,/wbsph/mhqs,133965


### Calculate the size of all subdirectories within a given directory

In [155]:
dir_sizes_df["parentdir_txt"] = dir_sizes_df["parentdir"].apply(lambda x: f"{x}")
dir_sizes_df

Unnamed: 0,parentdir,size,parentdir_txt
0,/,132067,/
1,/fnsvfbzt,62158,/fnsvfbzt
2,/hqdssf,45626,/hqdssf
3,/hqdssf/dlsmjsbz,9205,/hqdssf/dlsmjsbz
4,/hqdssf/hqdssf,193872,/hqdssf/hqdssf
...,...,...,...
195,/mhqs,116955,/mhqs
196,/trwqgzsb,88439,/trwqgzsb
197,/wbsph,0,/wbsph
198,/wbsph/mhqs,133965,/wbsph/mhqs


In [168]:
def calculate_subdirectories_size(dataframe, directory):
    return dataframe[dataframe["parentdir_txt"].str.contains(directory)]['size'].sum()
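
A caveat on `str.contains`: it performs a (regex) substring match, so a query for `/mhqs` also matches `/hqdssf/mhqs`, which is not a subdirectory of `/mhqs`. This did not prevent the answers in this notebook from being accepted, but a stricter variant would compare path ancestry instead. A minimal sketch of that idea in plain Python follows; `is_within` and `subtree_size` are hypothetical helper names, and `sizes` holds a few direct sizes taken from `dir_sizes_df`.

```python
from pathlib import PurePosixPath


def is_within(path, directory):
    """True if `path` is `directory` itself or lies somewhere below it."""
    path, directory = PurePosixPath(path), PurePosixPath(directory)
    return path == directory or directory in path.parents


def subtree_size(sizes, directory):
    """Sum the direct sizes of `directory` and all directories below it."""
    return sum(size for path, size in sizes.items() if is_within(path, directory))


# Direct sizes for a few directories (values from dir_sizes_df).
sizes = {"/mhqs": 116955, "/hqdssf/mhqs": 45923, "/hqdssf/dlsmjsbz": 9205}

# Substring matching also picks up /hqdssf/mhqs; the ancestry check does not.
substring_total = sum(size for path, size in sizes.items() if "/mhqs" in path)
strict_total = subtree_size(sizes, "/mhqs")
print(substring_total, strict_total)
```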

In [169]:
dir_sizes_df["subdir_size"] = dir_sizes_df["parentdir_txt"].apply(
    lambda x: calculate_subdirectories_size(dir_sizes_df, x))
dir_sizes_df

Unnamed: 0,parentdir,size,parentdir_txt,subdir_size
0,/,132067,/,42558312
1,/fnsvfbzt,62158,/fnsvfbzt,62158
2,/hqdssf,45626,/hqdssf,37225320
3,/hqdssf/dlsmjsbz,9205,/hqdssf/dlsmjsbz,9205
4,/hqdssf/hqdssf,193872,/hqdssf/hqdssf,193872
...,...,...,...,...
195,/mhqs,116955,/mhqs,5831886
196,/trwqgzsb,88439,/trwqgzsb,88439
197,/wbsph,0,/wbsph,170057
198,/wbsph/mhqs,133965,/wbsph/mhqs,133965


### Find directories with a total size of at most 100,000 and sum them

In [170]:
dir_sizes_df[dir_sizes_df["subdir_size"] <= 100000]

Unnamed: 0,parentdir,size,parentdir_txt,subdir_size
1,/fnsvfbzt,62158,/fnsvfbzt,62158
3,/hqdssf/dlsmjsbz,9205,/hqdssf/dlsmjsbz,9205
9,/hqdssf/mhqs/ctfl/shzgg/wfbvtfmr/hnqjmq,3307,/hqdssf/mhqs/ctfl/shzgg/wfbvtfmr/hnqjmq,3307
10,/hqdssf/mhqs/jzjm,31719,/hqdssf/mhqs/jzjm,31719
18,/hqdssf/mtw/dcgpfrsf/mhqs,74450,/hqdssf/mtw/dcgpfrsf/mhqs,74450
19,/hqdssf/mtw/gwqm,0,/hqdssf/mtw/gwqm,76055
20,/hqdssf/mtw/gwqm/trghjhvs,2620,/hqdssf/mtw/gwqm/trghjhvs,76055
21,/hqdssf/mtw/gwqm/trghjhvs/shzgg,73435,/hqdssf/mtw/gwqm/trghjhvs/shzgg,73435
25,/hqdssf/sfccfsrd/cssmfv,13440,/hqdssf/sfccfsrd/cssmfv,13440
34,/hqdssf/sfccfsrd/hhhqdmz/fdbpld/bgrnz/rcv/hqds...,99041,/hqdssf/sfccfsrd/hhhqdmz/fdbpld/bgrnz/rcv/hqds...,99041


In [171]:
dir_sizes_df[dir_sizes_df["subdir_size"] <= 100000]["subdir_size"].sum()

2031851

#### 2031851

That's the right answer! You are one gold star closer to collecting enough star fruit.

Your puzzle answer was 2031851.

The first half of this puzzle is complete! It provides one gold star: *

## Part Two
Now, you're ready to choose a directory to delete.

The total disk space available to the filesystem is `70000000`. To run the update, you need unused space of at least `30000000`. You need to find a directory you can delete that will free up enough space to run the update.

In the example above, the total size of the outermost directory (and thus the total amount of used space) is `48381165`; this means that the size of the unused space must currently be `21618835`, which isn't quite the `30000000` required by the update. Therefore, the update still requires a directory with total size of at least `8381165` to be deleted before it can run.

To achieve this, you have the following options:

- Delete directory `e`, which would increase unused space by `584`.
- Delete directory `a`, which would increase unused space by `94853`.
- Delete directory `d`, which would increase unused space by `24933642`.
- Delete directory `/`, which would increase unused space by `48381165`.

Directories `e` and `a` are both too small; deleting them would not free up enough space. However, directories `d` and `/` are both big enough! Between these, choose the **smallest**: `d`, increasing unused space by 24933642.
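
The arithmetic and the selection rule for the example can be written out as a short sketch. The constant and variable names are illustrative; the sizes are the example totals listed above.

```python
TOTAL_DISK = 70000000
NEEDED_FREE = 30000000

# Total sizes of the example directories, as listed above.
totals = {"e": 584, "a": 94853, "d": 24933642, "/": 48381165}

unused = TOTAL_DISK - totals["/"]   # 21618835
must_free = NEEDED_FREE - unused    # 8381165

# Smallest directory whose deletion frees at least `must_free`.
candidates = {name: size for name, size in totals.items() if size >= must_free}
best = min(candidates, key=candidates.get)
print(best, candidates[best])  # d 24933642
```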

Find the smallest directory that, if deleted, would free up enough space on the filesystem to run the update. **What is the total size of that directory?**

### Housekeeping first: confirm size of used space with test dataset

In [173]:
# troubleshooting
filesystem_test_df = pd.DataFrame({"parentdir": Path("/"), "filetype": "dir", "name": "/", "size": 0}, 
                             index=[0])
filesystem_test_df = process_terminal_output(test_terminal, filesystem_test_df)
dir_sizes_test_df = pd.DataFrame(filesystem_test_df.groupby(['parentdir'])['size'].sum())
dir_sizes_test_df = dir_sizes_test_df.reset_index()
dir_sizes_test_df["parentdir_txt"] = dir_sizes_test_df["parentdir"].apply(lambda x: f"{x}")
dir_sizes_test_df["subdir_size"] = dir_sizes_test_df["parentdir_txt"].apply(
    lambda x: calculate_subdirectories_size(dir_sizes_test_df, x))
dir_sizes_test_df[dir_sizes_test_df["subdir_size"] <= 100000]

Added directory /a
Added file /b.txt
Added file /c.dat
Added directory /d
Changed to /a
Added directory /a/e
Added file /a/f
Added file /a/g
Added file /a/h.lst
Changed to /a/e
Added file /a/e/i
Changed to /a
Changed to /
Changed to /d
Added file /d/j
Added file /d/d.log
Added file /d/d.ext
Added file /d/k


Unnamed: 0,parentdir,size,parentdir_txt,subdir_size
1,/a,94269,/a,94853
2,/a/e,584,/a/e,584


In [174]:
dir_sizes_test_df[dir_sizes_test_df["subdir_size"] <= 100000]["subdir_size"].sum()

95437

In [175]:
dir_sizes_test_df

Unnamed: 0,parentdir,size,parentdir_txt,subdir_size
0,/,23352670,/,48381165
1,/a,94269,/a,94853
2,/a/e,584,/a/e,584
3,/d,24933642,/d,24933642


#### 48,381,165 OK

### Problem solving

> Find the smallest directory that, if deleted, would free up enough space on the filesystem to run the update. *What is the total size of that directory?*

In [189]:
total_disk_space = 70000000
# take the first matching row by position rather than relying on index label 0
total_size_outermost_directory = dir_sizes_df[dir_sizes_df["parentdir_txt"]=="/"]["subdir_size"].iloc[0]
required_space = 30000000
available_space = total_disk_space - total_size_outermost_directory
delete_required = required_space - available_space

In [190]:
print(f"Total disk space available to the filesystem: {total_disk_space}\n"
      f"Total size of the outermost directory: {total_size_outermost_directory}\n"
      f"Required space: {required_space}\n"
      f"Available space: {available_space}\n"
      f"Required delete size: {delete_required}")

Total disk space available to the filesystem: 70000000
Total size of the outermost directory: 42558312
Required space: 30000000
Available space: 27441688
Required delete size: 2558312


In [176]:
dir_sizes_df

Unnamed: 0,parentdir,size,parentdir_txt,subdir_size
0,/,132067,/,42558312
1,/fnsvfbzt,62158,/fnsvfbzt,62158
2,/hqdssf,45626,/hqdssf,37225320
3,/hqdssf/dlsmjsbz,9205,/hqdssf/dlsmjsbz,9205
4,/hqdssf/hqdssf,193872,/hqdssf/hqdssf,193872
...,...,...,...,...
195,/mhqs,116955,/mhqs,5831886
196,/trwqgzsb,88439,/trwqgzsb,88439
197,/wbsph,0,/wbsph,170057
198,/wbsph/mhqs,133965,/wbsph/mhqs,133965


In [191]:
dir_sizes_df[dir_sizes_df["subdir_size"] >= delete_required].sort_values(["subdir_size"], ascending=True)

Unnamed: 0,parentdir,size,parentdir_txt,subdir_size
61,/hqdssf/sfccfsrd/zmrt,288144,/hqdssf/sfccfsrd/zmrt,2568781
174,/lncqsmj/jmsw,0,/lncqsmj/jmsw,2654809
146,/hqdssf/shzgg/hngpzst/fgzlgbm,664089,/hqdssf/shzgg/hngpzst/fgzlgbm,3054935
97,/hqdssf/sfccfsrd/zqqtb/ljnmbqvd/qpclpbz/vgbvrw...,186412,/hqdssf/sfccfsrd/zqqtb/ljnmbqvd/qpclpbz/vgbvrw...,3161529
145,/hqdssf/shzgg/hngpzst,18808,/hqdssf/shzgg/hngpzst,3321697
125,/hqdssf/sfccfsrd/zqqtb/ljnmbqvd/tzjftf,367869,/hqdssf/sfccfsrd/zqqtb/ljnmbqvd/tzjftf,3331150
30,/hqdssf/sfccfsrd/hhhqdmz/fdbpld/bgrnz/rcv,480116,/hqdssf/sfccfsrd/hhhqdmz/fdbpld/bgrnz/rcv,3466202
47,/hqdssf/sfccfsrd/mvjttqr,55645,/hqdssf/sfccfsrd/mvjttqr,3681111
28,/hqdssf/sfccfsrd/hhhqdmz/fdbpld/bgrnz,0,/hqdssf/sfccfsrd/hhhqdmz/fdbpld/bgrnz,3715869
27,/hqdssf/sfccfsrd/hhhqdmz/fdbpld,495250,/hqdssf/sfccfsrd/hhhqdmz/fdbpld,4310399


In [193]:
above_reqs_df = dir_sizes_df[dir_sizes_df["subdir_size"] >= 
                             delete_required].sort_values(["subdir_size"], ascending=True)

In [195]:
# sorted ascending, so the first row is the smallest directory that frees enough space
above_reqs_df["subdir_size"].iloc[0]

2568781

#### 2568781

That's the right answer! You are one gold star closer to collecting enough star fruit.

You have completed Day 7! 