# Chapter 11 - Organizing Files

## Notes

## Practice Questions

1. What is the difference between shutil.copy() and shutil.copytree()?  
    **Answer:** `shutil.copy()` copies a single file to a destination file or folder, while `shutil.copytree()` copies an entire folder, including all of its subfolders and files.

2. What function is used to rename files?  
    **Answer:** `shutil.move()`

3. What is the difference between the delete functions in the send2trash and shutil modules?  
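    **Answer:** The `send2trash` functions move deleted files and folders to the recycle bin, so they can still be recovered, while the `shutil` delete functions (such as `shutil.rmtree()`) delete them permanently.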

4. ZipFile objects have a close() method just like File objects’ close() method. What ZipFile method is equivalent to File objects’ open() method?  
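    **Answer:** `zipfile.ZipFile()`. It is a constructor rather than a method: like `open()`, it takes a filename and a mode (such as `'r'`, `'w'`, or `'a'`) and returns an object that should be closed when you are done.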
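
The four answers above can be tried in one throwaway script. This is a minimal sketch that works inside a temporary folder so nothing real is touched; the file and folder names are made up for illustration, and `send2trash` is only mentioned in a comment because it is a third-party package that may not be installed.

```python
import os
import shutil
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    src = base / "src"
    (src / "sub").mkdir(parents=True)
    (src / "a.txt").write_text("hello " * 100)  # 600 bytes
    (src / "sub" / "b.txt").write_text("world")

    # Q1: shutil.copy() copies one file (here, into an existing folder);
    # shutil.copytree() copies a folder with all its subfolders and files.
    # The copytree destination must not exist yet (unless dirs_exist_ok=True).
    (base / "out").mkdir()
    shutil.copy(src / "a.txt", base / "out")
    shutil.copytree(src, base / "backup")
    backup_tree = sorted(p.relative_to(base / "backup").as_posix()
                         for p in (base / "backup").rglob("*"))

    # Q2: moving a file to a new name in the same folder renames it
    shutil.move(base / "out" / "a.txt", base / "out" / "renamed.txt")
    out_names = os.listdir(base / "out")

    # Q3: shutil.rmtree() (like os.unlink() and os.rmdir()) deletes permanently.
    # With the third-party send2trash package installed,
    # send2trash.send2trash(str(path)) would move it to the recycle bin instead.
    shutil.rmtree(base / "backup")
    backup_gone = not (base / "backup").exists()

    # Q4: zipfile.ZipFile() opens or creates an archive, much like open();
    # the with statement calls close() automatically
    with zipfile.ZipFile(base / "example.zip", "w") as zf:
        zf.write(src / "a.txt", arcname="a.txt", compress_type=zipfile.ZIP_DEFLATED)
    with zipfile.ZipFile(base / "example.zip") as zf:  # mode "r" is the default
        zip_names = zf.namelist()
        info = zf.getinfo("a.txt")

print(backup_tree)  # ['a.txt', 'sub', 'sub/b.txt']
print(out_names)    # ['renamed.txt']
print(backup_gone)  # True
print(zip_names)    # ['a.txt']
print(info.file_size, info.compress_size < info.file_size)  # 600 True
```

If the destination passed to `shutil.move()` is an existing folder, the file is moved into that folder under its old name instead of being renamed.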

## Practice Programs

### Selectively Copying
Write a program that walks through a folder tree and searches for files with a certain file extension (such as .pdf or .jpg). Copy these files from their current location to a new folder.

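```python
from pathlib import Path
import os
import shutil

src_dir = Path("..")
dest_dir = Path("./testdir")

# Create 'testdir' if it does not already exist
os.makedirs(dest_dir, exist_ok=True)


def selective_copy(ext: str):
    files_to_copy = []

    for cwd, subdirs, filenames in os.walk(src_dir):
        # testdir lives inside src_dir, so skip it; otherwise a second run would
        # try to copy files onto themselves and raise shutil.SameFileError
        subdirs[:] = [d for d in subdirs
                      if (Path(cwd) / d).resolve() != dest_dir.resolve()]
        for filename in filenames:
            if filename.endswith(ext):
                files_to_copy.append(Path(cwd) / filename)

    for file in files_to_copy:
        shutil.copy(file, dest_dir)

    # List everything now in the destination folder
    for path in dest_dir.glob("*"):
        print(path.name)


selective_copy(".ipynb")
```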

```
organize_files-11.ipynb
debugging-5.ipynb
read_write_file-10.ipynb
regex-9.ipynb
functions-4.ipynb
strings-8.ipynb
python_basics-1.ipynb
lists-6.ipynb
if_else_flow_control-2.ipynb
loops-3.ipynb
dicts_data-7.ipynb
```
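
One thing the program above does not handle: if two folders contain a file with the same name, the later copy silently overwrites the earlier one in `testdir`. Here is a minimal sketch of one way around that, using pathlib's `rglob()` and appending a counter to duplicate names. The function name `copy_by_extension` and the `name_1.ext` naming scheme are illustrative choices, not part of the exercise.

```python
import shutil
from pathlib import Path


def copy_by_extension(src: Path, dest: Path, ext: str) -> list[Path]:
    """Copy every file under src ending in ext into dest; return the new paths."""
    dest.mkdir(parents=True, exist_ok=True)
    dest_resolved = dest.resolve()
    copied = []
    # sorted() collects all matches before copying starts, so files copied
    # into dest during this run are never picked up as new sources
    for file in sorted(src.rglob(f"*{ext}")):
        # Skip folders named like "x.pdf" and anything already inside dest
        if not file.is_file() or dest_resolved in file.resolve().parents:
            continue
        target = dest / file.name
        counter = 1
        # Avoid overwriting: report.pdf -> report_1.pdf -> report_2.pdf ...
        while target.exists():
            target = dest / f"{file.stem}_{counter}{file.suffix}"
            counter += 1
        shutil.copy(file, target)
        copied.append(target)
    return copied
```

For example, `copy_by_extension(Path(".."), Path("./testdir"), ".pdf")` would copy every PDF under the parent folder, keeping both copies when two PDFs share a name.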


### Deleting Unneeded Files
It’s not uncommon for a few unneeded but humongous files or folders to take up the bulk of the space on your hard drive. If you’re trying to free up room on your computer, it’s more effective to identify the largest unneeded files first.

Write a program that walks through a folder tree and searches for exceptionally large files or folders—say, ones that have a file size of more than 100MB. (Remember that, to get a file’s size, you can use os.path.getsize() from the os module.) Print these files with their absolute path to the screen.

```python
def get_dir_size(folder: Path) -> int:
    """Return the total size in bytes of all files under `folder`."""
    total = 0
    for cwd, _, filenames in os.walk(folder):
        for fn in filenames:
            filepath = Path(cwd) / fn
            if not filepath.is_symlink():  # don't count files outside the tree
                total += os.path.getsize(filepath)
    return total


def get_large_files_or_dirs():
    print("Please enter the path of the folder you would like to search:")
    path = Path(input())

    while not path.is_dir():
        print("Please enter a valid folder path:")
        path = Path(input())

    print("Please enter the number of bytes you would like to check for:")
    while True:
        try:
            size = int(input())
            break
        except ValueError:
            print("Please enter a valid number of bytes (integer):")

    for cwd, subdirs, filenames in os.walk(path):
        for fn in filenames:
            filepath = Path(cwd) / fn
            if os.path.getsize(filepath) > size:
                print(os.path.normpath(filepath.absolute()))

        for sd in subdirs:
            subdir_path = Path(cwd) / sd
            # os.path.getsize() on a folder only returns the size of the folder
            # entry itself, not its contents, so add up its files instead
            if get_dir_size(subdir_path) > size:
                print(os.path.normpath(subdir_path.absolute()))


get_large_files_or_dirs()
```


```
Please enter the path of the folder you would like to search:
Please enter the number of bytes you would like to check for:
/home/abdurr08/projects/python-learning/notebooks
/home/abdurr08/projects/python-learning/.git
/home/abdurr08/projects/python-learning/notebooks/organize_files-11.ipynb
/home/abdurr08/projects/python-learning/notebooks/debugging-5.ipynb
/home/abdurr08/projects/python-learning/notebooks/read_write_file-10.ipynb
/home/abdurr08/projects/python-learning/notebooks/regex-9.ipynb
/home/abdurr08/projects/python-learning/notebooks/functions-4.ipynb
/home/abdurr08/projects/python-learning/notebooks/strings-8.ipynb
/home/abdurr08/projects/python-learning/notebooks/python_basics-1.ipynb
/home/abdurr08/projects/python-learning/notebooks/lists-6.ipynb
/home/abdurr08/projects/python-learning/notebooks/if_else_flow_control-2.ipynb
/home/abdurr08/projects/python-learning/notebooks/loops-3.ipynb
/home/abdurr08/projects/python-learning/notebooks/dicts_data-7.ipynb
/home/abdurr08/pro
```
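
For reuse or automated testing, the same search can be separated from the `input()` prompts into a function that takes the folder and the threshold as arguments and returns the matches. This is a sketch, assuming the exercise's 100MB means 100 × 1024 × 1024 bytes; `find_large_items` and `folder_size` are illustrative names. Like the fixed program, it sums the files inside each folder rather than calling `os.path.getsize()` on the folder, which only reports the size of the folder entry itself.

```python
import os
from pathlib import Path

MB = 1024 * 1024  # binary megabytes assumed for the exercise's "100MB"


def folder_size(folder: Path) -> int:
    """Total size in bytes of all regular files under folder (symlinks ignored)."""
    total = 0
    for cwd, _, filenames in os.walk(folder):
        for fn in filenames:
            fp = Path(cwd) / fn
            if fp.is_file() and not fp.is_symlink():
                total += fp.stat().st_size
    return total


def find_large_items(root, min_bytes=100 * MB):
    """Return absolute paths of files and folders under root larger than min_bytes."""
    root = Path(root).resolve()
    large = []
    for cwd, subdirs, filenames in os.walk(root):
        for name in subdirs:
            if folder_size(Path(cwd) / name) > min_bytes:
                large.append(Path(cwd) / name)
        for name in filenames:
            fp = Path(cwd) / name
            if fp.is_file() and not fp.is_symlink() and fp.stat().st_size > min_bytes:
                large.append(fp)
    return large
```

Calling `folder_size()` for every subfolder rescans nested folders several times; that is fine for an exercise, but a single bottom-up pass would be cheaper on very large trees.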

### Renumbering Files
Write a program that finds all files with a given prefix, such as spam001.txt, spam002.txt, and so on, in a single folder and locates any gaps in the numbering (such as if there is a spam001.txt and a spam003.txt but no spam002.txt). Have the program rename all the later files to close this gap.

To create these example files (skipping spam042.txt, spam086.txt, and spam103.txt), run the following code:

```python
# Create ./numbered_files with spam001.txt ... spam120.txt, skipping 42, 86, and 103
test_dir = Path("./numbered_files")
os.makedirs(test_dir, exist_ok=True)
for i in range(1, 121):
    if i not in (42, 86, 103):
        # Opening in 'w' mode creates an empty file
        with open(test_dir / f"spam{i:03d}.txt", "w"):
            pass
```

```python
import re
import shutil


def renumber_files():
    print("Please enter the relative or absolute folder path you would like to perform renumbering on:")
    # dir_path = Path(input())
    dir_path = test_dir  # hard-coded for this notebook run

    # while not dir_path.is_dir():
    #     print("Path not found. Please enter a valid directory path.")
    #     dir_path = Path(input())

    print("Please enter the regex pattern for the file prefix:")
    try:
        # pattern = re.compile(input())
        # Use re.compile(re.escape(input())) instead if the input should be
        # matched literally rather than as a regex.
        # '\.' matches a literal dot; an unescaped '.' would match any character.
        pattern = re.compile(r'spam\d+\.txt$')
    except re.error as e:
        print(f"Invalid regular expression pattern: {e}")
        return

    # Split each match into (prefix, number, suffix) around its first run of
    # digits, e.g. 'spam043.txt' -> ('spam', '043', '.txt')
    parts = re.compile(r'^(\D*)(\d+)(.*)$')
    numbered = []
    for f in os.listdir(dir_path):
        if not pattern.search(f):
            continue
        m = parts.match(f)
        if m:
            prefix, digits, suffix = m.groups()
            numbered.append((int(digits), prefix, digits, suffix, f))

    if not numbered:
        print("No files found with given directory and regex pattern")
        return

    # Sort by numeric value so spam2.txt comes before spam10.txt
    numbered.sort()
    # Pad to the widest existing number (3 digits for spam001.txt)
    width = max(len(digits) for _, _, digits, _, _ in numbered)

    # Ascending order is safe: each file only moves down to a lower number,
    # whose slot is either a gap or was vacated by an earlier rename
    for new_num, (_, prefix, _, suffix, old_name) in enumerate(numbered, start=1):
        new_name = f"{prefix}{new_num:0{width}d}{suffix}"
        if new_name != old_name:
            shutil.move(dir_path / old_name, dir_path / new_name)


renumber_files()
```

```
Please enter the relative or absolute folder path you would like to perform renumbering on:
Please enter the regex pattern for the file prefix:
```
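
The renaming function closes the gaps without reporting where they were. If you only want to locate them (the first half of the exercise), a small sketch like this works on any list of names, for example `os.listdir(test_dir)`; `find_gaps` and its default `prefix`/`suffix` values are assumptions made for this example.

```python
import re


def find_gaps(filenames, prefix="spam", suffix=".txt"):
    """Return the numbers missing between the lowest and highest numbered file."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+){re.escape(suffix)}$")
    numbers = []
    for name in filenames:
        m = pattern.match(name)
        if m:
            numbers.append(int(m.group(1)))
    if not numbers:
        return []
    numbers.sort()
    present = set(numbers)
    return [n for n in range(numbers[0], numbers[-1] + 1) if n not in present]


names = [f"spam{i:03d}.txt" for i in range(1, 121) if i not in (42, 86, 103)]
print(find_gaps(names))  # [42, 86, 103]
```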


```python
# Remove ./numbered_files and all files within
if os.path.exists(test_dir):
    shutil.rmtree(test_dir)
    print(f"Removed directory: {test_dir}")
else:
    print(f"Directory not found: {test_dir}")
```

```
Removed directory: numbered_files
```


As an added challenge, write another program that can insert gaps into numbered files (and bump up the numbers in the filenames after the gap) so that a new file can be inserted.
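
For the added challenge, one possible approach is to shift every file at or above the insertion point up by one, working from the highest number down so no rename ever lands on a file that still exists. This is a minimal sketch, assuming names of the form `<prefix><zero-padded number><suffix>` such as spam001.txt; `insert_gap` and its parameters are hypothetical.

```python
import re
import shutil
from pathlib import Path


def insert_gap(folder, prefix: str, suffix: str, at: int) -> None:
    """Shift files numbered >= at up by one, freeing the name for number `at`."""
    folder = Path(folder)
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+){re.escape(suffix)}$")
    numbered = []
    for path in folder.iterdir():
        m = pattern.match(path.name)
        if m:
            # Keep the digit count so the zero-padding is preserved
            numbered.append((int(m.group(1)), len(m.group(1)), path))
    # Highest number first: each target slot is free or was just vacated
    for num, width, path in sorted(numbered, reverse=True):
        if num >= at:
            new_name = f"{prefix}{num + 1:0{width}d}{suffix}"
            shutil.move(path, folder / new_name)
```

For example, with spam001.txt to spam003.txt in a folder, `insert_gap(folder, "spam", ".txt", 2)` leaves spam001.txt, spam003.txt, and spam004.txt, so a new spam002.txt can be added.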

### Converting Dates from American- to European-Style
Say your boss emails you thousands of files with American-style dates (MM-DD-YYYY) in their names and needs them renamed to European-style dates (DD-MM-YYYY). This boring task could take all day to do by hand! Instead, write a program that does the following:

  1.  Searches all filenames in the current working directory and all subdirectories for American-style dates. Use the os.walk() function to go through the subfolders.

  2.  Uses regular expressions to identify filenames with the MM-DD-YYYY pattern in them—for example, spam12-31-1900.txt. Assume the months and days always use two digits, and that files with non-date matches don’t exist. (You won’t find files named something like 99-99-9999.txt.)

  3.  When a filename is found, renames the file with the month and day swapped to make it European-style. Use the shutil.move() function to do the renaming.
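
Before renaming anything on disk, the pattern-matching step can be tried on plain strings. The sketch below follows step 2 but makes the regex a little stricter than a bare `(\d{2})-(\d{2})-(\d{4})` (months 01-12, days 01-31, years 1900-2099) and uses verbose mode with named groups so each part is labeled. The function name `to_european` and the year limits are assumptions for this example.

```python
import re

# Named groups make the swap readable; re.VERBOSE allows comments in the pattern
AMERICAN_DATE = re.compile(
    r"""
    (?P<month>0[1-9]|1[0-2])        # 01-12
    -
    (?P<day>0[1-9]|[12]\d|3[01])    # 01-31
    -
    (?P<year>(?:19|20)\d{2})        # 1900-2099
    """,
    re.VERBOSE,
)


def to_european(filename: str):
    """Return filename with MM-DD-YYYY swapped to DD-MM-YYYY, or None if no date."""
    if not AMERICAN_DATE.search(filename):
        return None
    return AMERICAN_DATE.sub(r"\g<day>-\g<month>-\g<year>", filename)


print(to_european("spam12-31-1900.txt"))  # spam31-12-1900.txt
print(to_european("eggs.txt"))            # None
```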

```python
from datetime import datetime, timedelta

# Create ./dated_files
dated_files_dir = Path("./dated_files")
os.makedirs(dated_files_dir, exist_ok=True)

today = datetime.today()

# American-style (MM-DD-YYYY) dates from 30 days ago up to and including today
date_list = [(today - timedelta(days=i)).strftime("%m-%d-%Y") for i in range(30, -1, -1)]

for date in date_list:
    # Opening in 'w' mode creates an empty file
    with open(dated_files_dir / f"{date}.txt", "w"):
        pass
```

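```python
def reformat_dated_files():
    na_pattern = re.compile(r'(\d{2})-(\d{2})-(\d{4})')  # MM-DD-YYYY
    for cwd, _, filenames in os.walk(dated_files_dir):
        for fn in filenames:
            if not na_pattern.search(fn):
                continue
            eu_dated_name = na_pattern.sub(r'\2-\1-\3', fn)  # -> DD-MM-YYYY
            # Join with cwd (not the top folder) so files in subfolders are renamed too
            src, dest = Path(cwd) / fn, Path(cwd) / eu_dated_name
            if src == dest:
                continue  # e.g. 05-05-2024 reads the same either way
            # 01-02-YYYY -> 02-01-YYYY can collide with an existing American-style
            # file for February 1st; skip rather than silently overwrite it
            if dest.exists():
                print(f"Skipping {src}: {dest} already exists")
                continue
            shutil.move(src, dest)


reformat_dated_files()
```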

```python
# Remove ./dated_files and all files within
if os.path.exists(dated_files_dir):
    shutil.rmtree(dated_files_dir)
    print(f"Removed directory: {dated_files_dir}")
else:
    print(f"Directory not found: {dated_files_dir}")
```

```
Removed directory: dated_files
```
