# No Space Left On Device

## Part 1

You can hear birds chirping and raindrops hitting leaves as the expedition proceeds. Occasionally, you can even hear much louder sounds in the distance; how big do the animals get out here, anyway?

The device the Elves gave you has problems with more than just its communication system. You try to run a system update:

$ system-update --please --pretty-please-with-sugar-on-top
Error: No space left on device
Perhaps you can delete some files to make space for the update?

You browse around the filesystem to assess the situation and save the resulting terminal output (your puzzle input). For example:

$ cd /
$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ ls
dir e
29116 f
2557 g
62596 h.lst
$ cd e
$ ls
584 i
$ cd ..
$ cd ..
$ cd d
$ ls
4060174 j
8033020 d.log
5626152 d.ext
7214296 k

The filesystem consists of a tree of files (plain data) and directories (which can contain other directories or files). The outermost directory is called /. You can navigate around the filesystem, moving into or out of directories and listing the contents of the directory you're currently in.

Within the terminal output, lines that begin with $ are commands you executed, very much like some modern computers:

- cd means change directory. This changes which directory is the current directory, but the specific result depends on the argument:
  - cd x moves in one level: it looks in the current directory for the directory named x and makes it the current directory.
  - cd .. moves out one level: it finds the directory that contains the current directory, then makes that directory the current directory.
  - cd / switches the current directory to the outermost directory, /.
- ls means list. It prints out all of the files and directories immediately contained by the current directory:
  - 123 abc means that the current directory contains a file named abc with size 123.
  - dir xyz means that the current directory contains a directory named xyz.

Given the commands and output in the example above, you can determine that the filesystem looks visually like this:

- / (dir)
  - a (dir)
    - e (dir)
      - i (file, size=584)
    - f (file, size=29116)
    - g (file, size=2557)
    - h.lst (file, size=62596)
  - b.txt (file, size=14848514)
  - c.dat (file, size=8504156)
  - d (dir)
    - j (file, size=4060174)
    - d.log (file, size=8033020)
    - d.ext (file, size=5626152)
    - k (file, size=7214296)

Here, there are four directories: / (the outermost directory), a and d (which are in /), and e (which is in a). These directories also contain files of various sizes.
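
The command rules above can be sketched in a few lines of Python. This is only an illustration on the example listing, not the approach used in the cells further down: `parse_listing` is a hypothetical helper, and representing each directory as a tuple of names (with `()` standing for `/`) is an assumption made for brevity.

```python
# Example terminal output from the puzzle description.
EXAMPLE = """\
$ cd /
$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ ls
dir e
29116 f
2557 g
62596 h.lst
$ cd e
$ ls
584 i
$ cd ..
$ cd ..
$ cd d
$ ls
4060174 j
8033020 d.log
5626152 d.ext
7214296 k
"""


def parse_listing(lines):
    """Map each directory (a tuple of names, () is "/") to its direct files."""
    cwd = []         # stack of directory names below the root
    tree = {(): {}}  # hypothetical structure: path tuple -> {file name: size}
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if parts[:2] == ["$", "cd"]:
            target = parts[2]
            if target == "/":
                cwd = []              # jump to the outermost directory
            elif target == "..":
                cwd.pop()             # move out one level
            else:
                cwd.append(target)    # move in one level
            tree.setdefault(tuple(cwd), {})
        elif parts[0] == "$":
            continue                  # "$ ls" itself carries no data
        elif parts[0] == "dir":
            tree.setdefault(tuple(cwd + [parts[1]]), {})
        else:
            tree[tuple(cwd)][parts[1]] = int(parts[0])
    return tree


tree = parse_listing(EXAMPLE.splitlines())
for path, files in sorted(tree.items()):
    print("/" + "/".join(path), files)
```

Using a stack for the current directory means `cd ..` is a single `pop()`, and keying by the full path keeps same-named directories in different places apart.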

Since the disk is full, your first step should probably be to find directories that are good candidates for deletion. To do this, you need to determine the total size of each directory. The total size of a directory is the sum of the sizes of the files it contains, directly or indirectly. (Directories themselves do not count as having any intrinsic size.)

The total sizes of the directories above can be found as follows:

- The total size of directory e is 584 because it contains a single file i of size 584 and no other directories.
- The directory a has total size 94853 because it contains files f (size 29116), g (size 2557), and h.lst (size 62596), plus file i indirectly (a contains e which contains i).
- Directory d has total size 24933642.
- As the outermost directory, / contains every file. Its total size is 48381165, the sum of the size of every file.

To begin, find all of the directories with a total size of at most 100000, then calculate the sum of their total sizes. In the example above, these directories are a and e; the sum of their total sizes is 95437 (94853 + 584). (As in this example, this process can count files more than once!)
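
As a quick check of the numbers in the example, here is a minimal sketch that adds every file's size to its own directory and to each directory above it. The `FILES` list and the `ancestors` helper are hypothetical names introduced for this illustration; the paths use the same trailing-slash form as the `path` column built later in this notebook.

```python
from collections import defaultdict

# (containing directory, file size) pairs taken from the example tree.
FILES = [
    ("/a/e/", 584),
    ("/a/", 29116), ("/a/", 2557), ("/a/", 62596),
    ("/", 14848514), ("/", 8504156),
    ("/d/", 4060174), ("/d/", 8033020), ("/d/", 5626152), ("/d/", 7214296),
]


def ancestors(path):
    """Yield "/" and every directory down to and including `path` itself."""
    parts = [p for p in path.split("/") if p]
    for depth in range(len(parts) + 1):
        yield "/" + "".join(name + "/" for name in parts[:depth])


# Each file counts toward its own directory and every directory containing it.
totals = defaultdict(int)
for directory, size in FILES:
    for ancestor in ancestors(directory):
        totals[ancestor] += size

small = {d: s for d, s in totals.items() if s <= 100_000}
print(dict(totals))
print(sum(small.values()))  # 95437
```

Because file i is counted in both `/a/e/` and `/a/`, the final sum counts it twice, which is exactly what the puzzle statement warns about.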

Find all of the directories with a total size of at most 100000. What is the sum of the total sizes of those directories?

In [1]:
with open('day_07_input.txt', 'r') as f:
    data = f.readlines()
data = [dat.strip() for dat in data]

In [2]:
data[:10]

['$ cd /',
 '$ ls',
 '150961 cmnwnpwb',
 '28669 hhcp.jzd',
 'dir jssbn',
 'dir lfrctthp',
 '133395 lfrctthp.tlv',
 'dir ltwmz',
 'dir nmzntmcf',
 'dir vhj']

In [3]:
data[-10:]

['$ cd ..',
 '$ cd lfrctthp',
 '$ ls',
 '104555 cwjbzd.hbf',
 '56298 pldmc.hjd',
 '27639 sppmmj.nmr',
 '$ cd ..',
 '$ cd tlhttrgs',
 '$ ls',
 '252680 zsjgqqb']

In [4]:
#initialize parameters
all_dicts = []
one_dict = {}
level = 0
path = ''

#cycle through all commands
for dat in data:
#     print(dat)
    
    #check if directory
    if dat.startswith('$ cd'):
        print()
        print(dat)
        
        #save directory name
        folder = dat.replace('$ cd ', '')
        
        #if dictionary has contents, save
        if len(one_dict) > 0:
            all_dicts.append(one_dict)
        
        #start new dictionary for new directory
        one_dict = {}
        
        #if at home base, keep starting params
        if folder == '/':
            path = '/'
            level = 0
            
        #if going up a level, subtract a directory
        elif folder == '..':
            path = '/'.join(path.split('/')[:-2]) + '/'
            level -= 1
            continue
            
        #all other folders add a level
        else:
            path = path + folder + '/'
            level += 1
            
        #add all pieces to single dictionary
        one_dict['directory'] = folder
        one_dict['level'] = level
        one_dict['path'] = path  
        one_dict['contents'] = []
        
    #ignore ls
    elif dat.startswith('$ ls'):
        continue
    
    #add contents to single dictionary
    else:
        print(f'content: {dat}')
        one_dict['contents'].append(dat)
        
#once every line has been read, save the final directory's dictionary
#(comparing dat to data[-1] by value could match an earlier, identical line)
if len(one_dict) > 0:
    all_dicts.append(one_dict)


$ cd /
content: 150961 cmnwnpwb
content: 28669 hhcp.jzd
content: dir jssbn
content: dir lfrctthp
content: 133395 lfrctthp.tlv
content: dir ltwmz
content: dir nmzntmcf
content: dir vhj
content: 256180 wbs.vmh
content: 257693 zsntdzf

$ cd jssbn
content: 89372 dvlb
content: dir lfrctthp
content: dir pjzpjjq
content: dir rbtbtt
content: 203148 sppmmj
content: 130200 sppmmj.bmm
content: dir tlhttrgs
content: 248929 vsbvlr

$ cd lfrctthp
content: dir lfrctthp
content: dir srf
content: 165285 vlfc
content: 202701 wbs.vmh

$ cd lfrctthp
content: 25083 gsb.flc

$ cd ..

$ cd srf
content: 20386 hcnjd.nsq
content: 143480 jjlz.mtq
content: dir rwvdvvsf
content: 88782 sbmhf
content: 143464 wbs.vmh
content: dir wvhhr

$ cd rwvdvvsf
content: 20009 bqz
content: 133188 czdm

$ cd ..

$ cd wvhhr
content: 10445 vrwdvnh.jhf

$ cd ..

$ cd ..

$ cd ..

$ cd pjzpjjq
content: 14329 chgbd.zjf
content: dir dvlb
content: 212284 pjc
content: dir qlrn
content: 225566 rhzgmnb.nhd
content: 145766 sppmmj.dzz
conte

In [5]:
all_dicts[:5]

[{'directory': '/',
  'level': 0,
  'path': '/',
  'contents': ['150961 cmnwnpwb',
   '28669 hhcp.jzd',
   'dir jssbn',
   'dir lfrctthp',
   '133395 lfrctthp.tlv',
   'dir ltwmz',
   'dir nmzntmcf',
   'dir vhj',
   '256180 wbs.vmh',
   '257693 zsntdzf']},
 {'directory': 'jssbn',
  'level': 1,
  'path': '/jssbn/',
  'contents': ['89372 dvlb',
   'dir lfrctthp',
   'dir pjzpjjq',
   'dir rbtbtt',
   '203148 sppmmj',
   '130200 sppmmj.bmm',
   'dir tlhttrgs',
   '248929 vsbvlr']},
 {'directory': 'lfrctthp',
  'level': 2,
  'path': '/jssbn/lfrctthp/',
  'contents': ['dir lfrctthp', 'dir srf', '165285 vlfc', '202701 wbs.vmh']},
 {'directory': 'lfrctthp',
  'level': 3,
  'path': '/jssbn/lfrctthp/lfrctthp/',
  'contents': ['25083 gsb.flc']},
 {'directory': 'srf',
  'level': 3,
  'path': '/jssbn/lfrctthp/srf/',
  'contents': ['20386 hcnjd.nsq',
   '143480 jjlz.mtq',
   'dir rwvdvvsf',
   '88782 sbmhf',
   '143464 wbs.vmh',
   'dir wvhhr']}]
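
The `path` values shown above are built by string concatenation, with `cd ..` trimming the last component. As an alternative sketch of the same bookkeeping, the standard library's `posixpath` module can resolve `..` for us. `change_dir` is a hypothetical helper, not part of the cells in this notebook, and it keeps the trailing-slash convention used in the `path` field.

```python
import posixpath


def change_dir(path, folder):
    """Return the new current path after `cd folder` (hypothetical helper)."""
    if folder == "/":
        return "/"
    # normpath collapses "name/.." pairs and drops the trailing slash
    new_path = posixpath.normpath(posixpath.join(path, folder))
    return new_path if new_path == "/" else new_path + "/"


path = "/"
for folder in ["jssbn", "lfrctthp", "lfrctthp", "..", "srf"]:
    path = change_dir(path, folder)
    print(path)
```

The sequence of folders mirrors the first few `cd` commands in the input, so the printed paths should match the `path` entries listed above.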

In [6]:
for one_dict in all_dicts:
    file_size = 0
    dir_sizes = 0
    
    #cycle through contents of each dictionary
    for content in one_dict['contents']:
        first_piece = content.split(' ')[0]
        
        #if the entry starts with a number, it is a file: capture its size
        if first_piece.isdigit():
            file_size += int(first_piece)
    #save total file sizes
    one_dict['file_sizes'] = file_size
    
    #capture all subdirectories
    dirs = [content for content in one_dict['contents'] if content.startswith('dir')]
    
    #count number of subdirectories
    one_dict['num_subdir'] = len(dirs)
     
    #initialize subdirectory size to 0
    one_dict['subdir_sizes'] = 0
            
    #set directory size to size of total files
    one_dict['directory_size'] = one_dict['file_sizes'] + one_dict['subdir_sizes']

In [11]:
import pandas as pd

In [12]:
#make df for manipulation
df = pd.DataFrame(all_dicts)
df

| | directory | level | path | contents | file_sizes | num_subdir | subdir_sizes | directory_size |
|---|---|---|---|---|---|---|---|---|
| 0 | / | 0 | / | [150961 cmnwnpwb, 28669 hhcp.jzd, dir jssbn, d... | 826898 | 5 | 0 | 826898 |
| 1 | jssbn | 1 | /jssbn/ | [89372 dvlb, dir lfrctthp, dir pjzpjjq, dir rb... | 671649 | 4 | 0 | 671649 |
| 2 | lfrctthp | 2 | /jssbn/lfrctthp/ | [dir lfrctthp, dir srf, 165285 vlfc, 202701 wb... | 367986 | 2 | 0 | 367986 |
| 3 | lfrctthp | 3 | /jssbn/lfrctthp/lfrctthp/ | [25083 gsb.flc] | 25083 | 0 | 0 | 25083 |
| 4 | srf | 3 | /jssbn/lfrctthp/srf/ | [20386 hcnjd.nsq, 143480 jjlz.mtq, dir rwvdvvs... | 396112 | 2 | 0 | 396112 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 186 | jnm | 2 | /vhj/jnm/ | [66785 bfwm, 196636 dssh.rwn, dir sppmmj, 1401... | 603328 | 1 | 0 | 603328 |
| 187 | sppmmj | 3 | /vhj/jnm/sppmmj/ | [101586 ccpnsjm.cwc, dir lfrctthp, 127582 vsbv... | 300169 | 1 | 0 | 300169 |
| 188 | lfrctthp | 4 | /vhj/jnm/sppmmj/lfrctthp/ | [122902 lfrctthp, 247157 svmpmrl.tcc] | 370059 | 0 | 0 | 370059 |
| 189 | lfrctthp | 2 | /vhj/lfrctthp/ | [104555 cwjbzd.hbf, 56298 pldmc.hjd, 27639 spp... | 188492 | 0 | 0 | 188492 |


In [13]:
#only look at directories with subdirectories & start at the lowest level
subset = df [df.num_subdir > 0]
subset = subset.sort_values('level', ascending=False)
subset

| | directory | level | path | contents | file_sizes | num_subdir | subdir_sizes | directory_size |
|---|---|---|---|---|---|---|---|---|
| 56 | cwjbzd | 7 | /jssbn/tlhttrgs/zjghthcb/hvwjc/wnj/dvlb/cwjbzd/ | [dir tlhttrgs] | 0 | 1 | 0 | 0 |
| 26 | tlhttrgs | 6 | /jssbn/pjzpjjq/dvlb/zhj/nqv/tlhttrgs/ | [dir cqtnvzn, 220458 wbs.vmh] | 220458 | 1 | 0 | 220458 |
| 114 | cgf | 6 | /lfrctthp/sqhvvsb/ldmwm/cwjbzd/tlhttrgs/cgf/ | [dir dvlb] | 0 | 1 | 0 | 0 |
| 71 | sppmmj | 6 | /lfrctthp/cwjbzd/lfrctthp/fqswn/dhgghnm/sppmmj/ | [dir dsss] | 0 | 1 | 0 | 0 |
| 33 | mhbbpdpj | 6 | /jssbn/pjzpjjq/qlrn/wbgvqpc/qfhvjtv/mhbbpdpj/ | [dir ghrbbh] | 0 | 1 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 64 | lfrctthp | 1 | /lfrctthp/ | [dir cwjbzd, dir dvlb, 65658 fclf, 191985 hhcp... | 442942 | 9 | 0 | 442942 |
| 142 | nmzntmcf | 1 | /nmzntmcf/ | [dir jqcms, dir lrtsts, dir lvchpdf, dir qpzqp... | 0 | 6 | 0 | 0 |
| 1 | jssbn | 1 | /jssbn/ | [89372 dvlb, dir lfrctthp, dir pjzpjjq, dir rb... | 671649 | 4 | 0 | 671649 |
| 185 | vhj | 1 | /vhj/ | [221377 cwjbzd.tvv, 98748 czdm, 108605 hhcp.jz... | 529286 | 3 | 0 | 529286 |


In [14]:
for x in subset.index:
    print(subset.loc[x].directory)
    
    #get subdirectories for one directory
    subdirs = [content.split(' ')[1] for content in subset.loc[x].contents if content.startswith('dir')]
    print(subdirs)
    
    #find each subdirectory by its full path and get its size
    subdirs_sizes_ls = []
    for subdir in subdirs:
        find_subdir = subset.loc[x].path + subdir + '/'
        print(find_subdir)
        subdir_size = df[df.path == find_subdir].directory_size.values[0]
        subdirs_sizes_ls.append(subdir_size)
        print(subdir_size)
        
    #add all the subdirectory sizes together
    df.loc[x,'subdir_sizes'] = sum(subdirs_sizes_ls)
    
    #calculate new directory size based on subdirectories
    df.loc[x,'directory_size'] = df.loc[x,'subdir_sizes'] + df.loc[x,'file_sizes']
    print()

cwjbzd
['tlhttrgs']
/jssbn/tlhttrgs/zjghthcb/hvwjc/wnj/dvlb/cwjbzd/tlhttrgs/
240264

tlhttrgs
['cqtnvzn']
/jssbn/pjzpjjq/dvlb/zhj/nqv/tlhttrgs/cqtnvzn/
49609

cgf
['dvlb']
/lfrctthp/sqhvvsb/ldmwm/cwjbzd/tlhttrgs/cgf/dvlb/
77864

sppmmj
['dsss']
/lfrctthp/cwjbzd/lfrctthp/fqswn/dhgghnm/sppmmj/dsss/
201519

mhbbpdpj
['ghrbbh']
/jssbn/pjzpjjq/qlrn/wbgvqpc/qfhvjtv/mhbbpdpj/ghrbbh/
24686

sgddcfdn
['lnjln']
/jssbn/pjzpjjq/dvlb/fnwsmj/lfrctthp/sgddcfdn/lnjln/
28970

dvlb
['cwjbzd']
/jssbn/tlhttrgs/zjghthcb/hvwjc/wnj/dvlb/cwjbzd/
240264

qfhvjtv
['mhbbpdpj']
/jssbn/pjzpjjq/qlrn/wbgvqpc/qfhvjtv/mhbbpdpj/
24686

vpbwj
['tlhttrgs']
/lfrctthp/cwjbzd/lfrctthp/fqswn/vpbwj/tlhttrgs/
145100

llcflrds
['cngdgq']
/lfrctthp/sqhvvsb/ldmwm/cwjbzd/llcflrds/cngdgq/
235724

thgh
['nrntbh', 'qlhbf']
/jssbn/tlhttrgs/zjghthcb/lfrctthp/thgh/nrntbh/
139964
/jssbn/tlhttrgs/zjghthcb/lfrctthp/thgh/qlhbf/
115891

mdmh
['zvtjfz']
/nmzntmcf/lrtsts/drtrdnsg/sppmmj/mdmh/zvtjfz/
142470

qzvwhr
['cwjbzd']
/nmzntmcf/lvchpdf/

In [15]:
#find all directories with a size of 100,000 or less and add them together
df[df.directory_size <= 100_000].directory_size.sum()

1443806

## Part 2

Now, you're ready to choose a directory to delete.

The total disk space available to the filesystem is 70000000. To run the update, you need unused space of at least 30000000. You need to find a directory you can delete that will free up enough space to run the update.

In the example above, the total size of the outermost directory (and thus the total amount of used space) is 48381165; this means that the size of the unused space must currently be 21618835, which isn't quite the 30000000 required by the update. Therefore, the update still requires a directory with total size of at least 8381165 to be deleted before it can run.
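
The arithmetic in this paragraph can be checked directly. This is a small sketch using only the numbers quoted above; the variable names are chosen for illustration.

```python
TOTAL_SPACE = 70_000_000    # total disk space available to the filesystem
NEEDED_UNUSED = 30_000_000  # unused space the update requires
used = 48_381_165           # total size of "/" in the example

unused = TOTAL_SPACE - used
must_free = NEEDED_UNUSED - unused

print(unused)     # 21618835
print(must_free)  # 8381165
```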

To achieve this, you have the following options:

- Delete directory e, which would increase unused space by 584.
- Delete directory a, which would increase unused space by 94853.
- Delete directory d, which would increase unused space by 24933642.
- Delete directory /, which would increase unused space by 48381165.

Directories e and a are both too small; deleting them would not free up enough space. However, directories d and / are both big enough! Between these, choose the smallest: d, increasing unused space by 24933642.
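
The selection rule for the example can be sketched as a filter followed by a minimum. The `sizes` dictionary below simply restates the four totals from the puzzle text, and 8381165 is the amount the example says must be freed; the names are hypothetical.

```python
# Directory totals from the example.
sizes = {"e": 584, "a": 94853, "d": 24933642, "/": 48381165}
must_free = 8381165  # space the example still needs to free

# Keep only directories big enough, then take the smallest of those.
candidates = {name: size for name, size in sizes.items() if size >= must_free}
best = min(candidates, key=candidates.get)

print(best, candidates[best])  # d 24933642
```

The pandas cell below does the equivalent on the real input by filtering on `directory_size`, sorting, and taking the first row.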

Find the smallest directory that, if deleted, would free up enough space on the filesystem to run the update. What is the total size of that directory?

In [23]:
total_space = 70000000
total_used = df.loc[0,'directory_size']
total_needed = 30000000

In [24]:
total_space - total_used

29086555

In [25]:
total_needed - (total_space - total_used)

913445

In [36]:
df [df.directory_size >= 913445].sort_values('directory_size').head(1)

| | directory | level | path | contents | file_sizes | num_subdir | subdir_sizes | directory_size |
|---|---|---|---|---|---|---|---|---|
| 58 | lfrctthp | 4 | /jssbn/tlhttrgs/zjghthcb/lfrctthp/ | [168164 ccpnsjm.cwc, 68428 dcmfjn, 133121 dvlb... | 515930 | 2 | 426368 | 942298 |


In [37]:
total_space - total_used + 942298

30028853