
# Hard Drive Back Upper
This notebook develops the code that copies every new file from a source directory to a destination directory.

Rules for conflicts:
+ If the **same** file exists on both the source and the destination, don't copy it. Here "same" means
    + Same file path relative to the root of each drive (file contents and timestamps are not compared)
+ If the file exists on the source but not on the destination, copy it across
+ If the file exists on the destination but not on the source, leave it on the destination
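
The three rules above can be read as set operations on relative paths. A minimal sketch, using made-up file names rather than the notebook's test data:

```python
# Hypothetical relative paths, for illustration only.
src = {'a.txt', 'docs/b.txt', 'docs/c.txt'}
dst = {'a.txt', 'docs/b.txt', 'old/d.txt'}

# Rule 1: paths present on both sides are skipped.
already_backed_up = src & dst
# Rule 2: paths that are only on the source get copied.
to_copy = src - dst
# Rule 3: paths that are only on the destination are left alone.
left_on_dst = dst - src

print(sorted(to_copy))      # ['docs/c.txt']
print(sorted(left_on_dst))  # ['old/d.txt']
```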

The current implementation ignores empty folders, because only file paths are collected.
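
Why empty folders drop out can be seen with a small experiment. The sketch below builds a throwaway directory tree (the folder and file names are made up) and walks it with `os.walk`: the empty folder is visited, but it contributes no file names, so a file-only listing never records it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # One empty folder and one folder holding a single file.
    os.makedirs(os.path.join(root, 'empty_folder'))
    os.makedirs(os.path.join(root, 'full_folder'))
    with open(os.path.join(root, 'full_folder', 'a.txt'), 'w') as f:
        f.write('x')

    found_dirs = []
    found_files = []
    for dir_, _, files in os.walk(root):
        rel_dir = os.path.relpath(dir_, root)
        found_dirs.append(rel_dir)
        for name in files:
            found_files.append(os.path.join(rel_dir, name))

print(sorted(found_dirs))  # 'empty_folder' is visited...
print(found_files)         # ...but only the file in 'full_folder' is listed
```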

In [11]:
import os
from shutil import copyfile
from time import gmtime, strftime

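def log_print(message):
    """Print message prefixed with the current UTC timestamp."""
    print('[%s]: %s'%(strftime("%Y-%m-%d %H:%M:%S", gmtime()), message))

def files_in_dir(root_dir):
    """Return the set of file paths under root_dir, relative to root_dir."""
    file_set = set()

    for dir_, _, files in os.walk(root_dir):
        # relative path of the folder being walked; '.' for root_dir itself
        rel_dir = os.path.relpath(dir_, root_dir)
        for file_name in files:
            rel_file = os.path.join(rel_dir, file_name)
            file_set.add(rel_file)
    return file_set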
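
Files that sit directly in the root come back with a `./` prefix, because `os.path.relpath(root_dir, root_dir)` is `'.'`. If that prefix is unwanted, one possible variant (the name `files_in_dir_normalized` is hypothetical, not part of the notebook) passes each path through `os.path.normpath`:

```python
import os
import tempfile

def files_in_dir_normalized(root_dir):
    """Hypothetical variant of files_in_dir that drops the leading './'."""
    file_set = set()
    for dir_, _, files in os.walk(root_dir):
        rel_dir = os.path.relpath(dir_, root_dir)
        for file_name in files:
            # normpath turns './top.txt' into 'top.txt'
            file_set.add(os.path.normpath(os.path.join(rel_dir, file_name)))
    return file_set

# Try it on a throwaway tree with made-up file names.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'folder_1'))
    for rel in ('top.txt', os.path.join('folder_1', 'inner.txt')):
        with open(os.path.join(root, rel), 'w') as f:
            f.write('x')
    result = files_in_dir_normalized(root)

print(sorted(result))
```

Both drives would need to be listed with the same function, since `'./top.txt'` and `'top.txt'` compare as different strings.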

In [12]:
base_dir = '/Users/louwrenslabuschagne/Documents/gitProjects/hard-drive-back-upper/'
src_drive = base_dir+'tests/my_src_mnt/'
dst_drive = base_dir+'tests/my_dst_mnt/'

In [22]:
src_files = files_in_dir(src_drive)
# macOS creates a .DS_Store file in each folder; drop them from the set of source files to copy
src_files = {file for file in src_files if 'DS_Store' not in file}
src_files

{'./newFile.txt',
 './newnewFile.t',
 'folder_1/aFile1.txt',
 'folder_1/aFile2.txt',
 'folder_1/sub_folder_1/aFile1.txt',
 'folder_2/aFile1.txt',
 'newFolder/more.txt',
 'newFolder/newFile2.txt'}
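
The substring test `'DS_Store' not in file` also drops any path that merely contains that text, for example a file inside a folder called `DS_Store_notes`. A stricter sketch, assuming the files to skip are named exactly `.DS_Store` (the helper name `drop_ds_store` and the sample paths are hypothetical):

```python
import os

def drop_ds_store(paths):
    """Hypothetical helper: keep every path whose file name is not '.DS_Store'."""
    return {p for p in paths if os.path.basename(p) != '.DS_Store'}

paths = {
    './.DS_Store',
    'folder_1/.DS_Store',
    'folder_1/aFile1.txt',
    'DS_Store_notes/todo.txt',
}
kept = drop_ds_store(paths)
print(sorted(kept))  # ['DS_Store_notes/todo.txt', 'folder_1/aFile1.txt']
```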

In [23]:
dst_files = files_in_dir(dst_drive)
dst_files = {file for file in dst_files if 'DS_Store' not in file}
dst_files

{'./asdf',
 './newFile.txt',
 './newnewFile.t',
 'folder_1/aFile1.txt',
 'folder_1/aFile2.txt',
 'folder_1/sub_folder_1/aFile1.txt',
 'folder_2/aFile1.txt',
 'newFolder/more.txt',
 'newFolder/newFile2.txt'}

In [25]:
files_that_are_on_the_src_and_not_on_the_dst = [file for file in src_files if file not in dst_files]
files_that_are_on_the_src_and_not_on_the_dst 

[]
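
The comprehension above is a set difference. Python's `-` operator on sets computes the same result, whereas `^` is the symmetric difference and also returns elements that exist only in the second set, which here would mean files that exist only on the destination. A small comparison with made-up names:

```python
src = {'a.txt', 'b.txt'}
dst = {'a.txt', 'old.txt'}

by_comprehension = {f for f in src if f not in dst}
difference = src - dst            # elements that are only in src
symmetric_difference = src ^ dst  # elements in exactly one of the two sets

print(difference)                    # {'b.txt'}
print(sorted(symmetric_difference))  # ['b.txt', 'old.txt']
```

Copying from the symmetric difference would attempt to read `old.txt` from the source, where it does not exist.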

In [15]:
log_print('Start')
# The - operator on sets gives the elements of the first set that are not in the second,
# in other words, the source files that we've not copied yet
files_that_are_on_the_src_and_not_on_the_dst = src_files - dst_files
log_print('Found %d new files to copy'%len(files_that_are_on_the_src_and_not_on_the_dst))

#if there are new files to copy
if len(files_that_are_on_the_src_and_not_on_the_dst) != 0:
    log_print('New files to be copied %s'%str(files_that_are_on_the_src_and_not_on_the_dst))
    log_print('Copying started')
    for file in files_that_are_on_the_src_and_not_on_the_dst:
        #get the full file path of the source and destination files
        dst_file = os.path.join(dst_drive, file)
        src_file = os.path.join(src_drive, file)

        #but if the folders don't exist, we first need to create them
        #this gets the destination folder for the file
        dst_folder = os.path.dirname(dst_file)

        #check if the destination folder exists, if not create it
        if not os.path.exists(dst_folder):
            os.makedirs(dst_folder)
            log_print('Created folder %s'%('dst:/'+dst_folder.replace(dst_drive, '')))

        copyfile(src_file, dst_file)
        log_print('Copied %s --> %s'%('src:/'+file, 'dst:/'+file))
    log_print('Copying Complete')
log_print('End')

[2019-04-17 15:15:39]: Start
[2019-04-17 15:15:39]: Found 0 new files to copy
[2019-04-17 15:15:39]: End
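
For reuse outside the notebook, the copy step could be wrapped in a function. The sketch below is one possible arrangement, not the notebook's implementation: the names `list_files` and `copy_new_files` are hypothetical, logging is left out, and it runs against temporary directories with made-up files instead of the real drives.

```python
import os
import tempfile
from shutil import copyfile

def list_files(root_dir):
    """Hypothetical helper: relative paths of all files under root_dir."""
    found = set()
    for dir_, _, files in os.walk(root_dir):
        for name in files:
            found.add(os.path.relpath(os.path.join(dir_, name), root_dir))
    return found

def copy_new_files(src_root, dst_root):
    """Copy files that exist under src_root but not under dst_root."""
    new_files = list_files(src_root) - list_files(dst_root)
    for rel in sorted(new_files):
        dst_file = os.path.join(dst_root, rel)
        # exist_ok=True creates missing folders and tolerates existing ones
        os.makedirs(os.path.dirname(dst_file), exist_ok=True)
        copyfile(os.path.join(src_root, rel), dst_file)
    return new_files

with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    # Source: one file already backed up, one new file in a new folder.
    os.makedirs(os.path.join(src, 'newFolder'))
    for rel in ('kept.txt', os.path.join('newFolder', 'more.txt')):
        with open(os.path.join(src, rel), 'w') as f:
            f.write('source')
    # Destination: an older copy of kept.txt and a file the source lacks.
    with open(os.path.join(dst, 'kept.txt'), 'w') as f:
        f.write('destination')
    with open(os.path.join(dst, 'only_dst.txt'), 'w') as f:
        f.write('x')

    copied = copy_new_files(src, dst)
    with open(os.path.join(dst, 'kept.txt')) as f:
        kept_content = f.read()
    dst_after = list_files(dst)

print(sorted(copied))  # only the file under newFolder is copied
print(kept_content)    # 'destination': the existing file is not overwritten
```

Because files are matched on path alone, a file that changed on the source after it was first backed up is not copied again.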


In [21]:
if bool('a') & bool('a'):
    print('hallo')

hallo
