In [None]:
"""
Author: Ryleigh J. Bruce
Date: June 4, 2024

Purpose: To sort through a directory of images and copy files over to a new folder based on a specific string in the file name. A text file containing a list of all of the copied files is produced alongside the folder.


Note: The author generated this text in part with GPT-4,
OpenAI’s large-scale language-generation model. Upon generating
draft code, the author reviewed, edited, and revised the code
to their own liking and takes ultimate responsibility for
the content of this code.

"""


**NOTE: These scripts will need to be modified to extract the necessary information from metadata. Delete once the necessary adjustments have been completed.**

## Module: Mount the Notebook to Google Drive and Import Necessary Libraries

Here the drive module is imported, allowing the Colab environment to access files within Google Drive. The notebook is then mounted to Google Drive so that it can interact with the files.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


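The `os` and `shutil` modules handle file operations within the Colab environment: `os` builds file paths, creates folders, and walks through directory trees, while `shutil` copies files. `random` is used to pick a random subset of images, `PIL` (Pillow) opens the image files, and `matplotlib.pyplot` displays them in a grid.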

In [None]:
import os
import shutil
import random
from PIL import Image
import matplotlib.pyplot as plt

# File Search Based on Species

## Module: Defining the Directories and Search Parameters

This code block defines the source folder, the destination folder, and the file path for the text file that will be produced alongside the new image folder.

In [None]:
# Define the source directory where images are stored
source_directory = '/content/drive/MyDrive/shared-data/Notebook datafiles/4370-entire-subset/small-animal-collection'
# Define the destination directory where raccoon images will be copied
destination_directory = "/content/drive/MyDrive/shared-data/Notebook datafiles/image-filter/racoons"
# Define the text file path where filenames will be saved
output_text_file = "/content/drive/MyDrive/shared-data/Notebook datafiles/image-filter/Racoon Images.txt"

This line sets the string that the later script looks for in the file names; here it is ‘raccoon’. File names are converted to lowercase before the comparison, so the search string must also be written in lowercase.

In [None]:
# Define the species that is being searched for
species = 'raccoon'
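The matching rule used by the loop further down can be isolated and checked on a few sample names. The helper `matches_search` below is hypothetical (it is not part of the original notebook) and the file names are made up; unlike the notebook's loop, it also lowercases the search string, so ‘Raccoon’ and ‘raccoon’ behave the same.

```python
# Image extensions accepted by the filter (compared in lowercase)
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg')


def matches_search(filename, search_string):
    """Return True if `filename` is an image whose name contains `search_string`.

    Hypothetical helper mirroring the condition in the notebook's loop.
    Unlike the loop, it also lowercases the search string.
    """
    name = filename.lower()
    return search_string.lower() in name and name.endswith(IMAGE_EXTENSIONS)


# Made-up file names for illustration
for name in ["Raccoon_0001.JPG", "deer_0002.jpg", "raccoon_notes.txt"]:
    print(name, matches_search(name, "raccoon"))
```

Only the first name matches: the second lacks the species string and the third is not an image file.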

## Module: Sorting the Dataset and Saving the Selected Images

In this code block the `os` module checks whether the destination directory exists and creates it if it does not.

The call `os.makedirs(destination_directory, exist_ok=True)` creates a directory at the specified path, along with any missing parent folders. The `exist_ok=True` argument ensures that the code does not fail if the directory already exists; execution simply continues to the next step.

`images = []` initializes an empty list that will hold the full path of every matching image. This list is used later to choose a random subset of images for display.

In [None]:
# Ensure that the destination directory exists, create if it does not
os.makedirs(destination_directory, exist_ok=True)

images = []
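The effect of `exist_ok=True` can be seen by calling `os.makedirs` twice on the same path. This sketch uses a throwaway temporary folder (an assumption, so nothing is created on Google Drive) and reuses the ‘racoons’ folder name only for illustration.

```python
import os
import tempfile

# Use a throwaway temporary folder rather than the Google Drive paths
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "image-filter", "racoons")

    # First call creates the folder plus any missing parent folders
    os.makedirs(target, exist_ok=True)
    # Second call finds the folder already there and does nothing
    os.makedirs(target, exist_ok=True)
    created = os.path.isdir(target)

    # Without exist_ok=True, repeating the call raises FileExistsError
    raised = False
    try:
        os.makedirs(target)
    except FileExistsError:
        raised = True

print(created, raised)  # True True
```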

The script begins by opening a text file that will record the names of the selected images. The `os.walk` function from the `os` module then goes through every file in the source directory and its subfolders, checking whether the file name contains the specified species (here, ‘raccoon’) and ends with an image extension (`.png`, `.jpg`, or `.jpeg`). When a file meets both criteria, it is copied to the destination directory, its full path is added to the `images` list, and its name is written to the text file.

The final print statement notifies us that the script has completed and the images have been copied to the new folder.

The `except` block ensures that the name of any file that cannot be copied to the destination folder is printed along with the error message, and the loop then moves on to the next file.

In [None]:
# Open the text file for writing
with open(output_text_file, "w") as file:
    # Walk through all files in the source directory and its subfolders
    for dirpath, dirnames, filenames in os.walk(source_directory):
        # Keep only image files whose (lowercased) name contains the species string
        for filename in filenames:
            if species in filename.lower() and filename.lower().endswith(('.png', '.jpg', '.jpeg')):
                # Full path of the file
                full_file_path = os.path.join(dirpath, filename)
                try:
                    # Copy the file to the destination directory
                    shutil.copy(full_file_path, os.path.join(destination_directory, filename))
                except Exception as e:
                    # Report the failure and move on to the next file
                    print(f"Failed to copy {filename}. Reason: {str(e)}")
                    continue
                # Record only files that were copied successfully
                images.append(full_file_path)
                file.write(filename + "\n")

print("Files have been filtered and copied.")

Files have been filtered and copied.
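The species, date, and location searches in this notebook all repeat the same loop with a different search string. One way to avoid duplicating the cell is to wrap it in a function. The sketch below is hypothetical (the name `filter_and_copy` is not from the original notebook); it keeps the same `os.walk` traversal, lowercase matching, and `try`/`except` around `shutil.copy`, and records a file only after it has been copied. It is demonstrated on a temporary folder of empty, made-up files rather than the Drive paths.

```python
import os
import shutil
import tempfile


def filter_and_copy(source_directory, destination_directory, output_text_file, search_string):
    """Copy images whose names contain `search_string`; return their source paths.

    Hypothetical helper that generalizes the notebook's loop.
    """
    os.makedirs(destination_directory, exist_ok=True)
    search_string = search_string.lower()
    images = []
    with open(output_text_file, "w") as file:
        # Walk the source directory and all of its subfolders
        for dirpath, dirnames, filenames in os.walk(source_directory):
            for filename in filenames:
                name = filename.lower()
                if search_string in name and name.endswith(('.png', '.jpg', '.jpeg')):
                    full_file_path = os.path.join(dirpath, filename)
                    try:
                        shutil.copy(full_file_path, os.path.join(destination_directory, filename))
                    except Exception as e:
                        print(f"Failed to copy {filename}. Reason: {e}")
                        continue
                    # Record only files that were copied successfully
                    images.append(full_file_path)
                    file.write(filename + "\n")
    return images


# Demonstration on a temporary folder containing empty, made-up "image" files
with tempfile.TemporaryDirectory() as tmp:
    site = os.path.join(tmp, "source", "site1")
    os.makedirs(site)
    for name in ["Raccoon_001.JPG", "deer_002.jpg", "raccoon_003.png"]:
        open(os.path.join(site, name), "wb").close()

    dst = os.path.join(tmp, "raccoons")
    listing = os.path.join(tmp, "Raccoon Images.txt")
    found = filter_and_copy(os.path.join(tmp, "source"), dst, listing, "raccoon")

    copied = sorted(os.listdir(dst))
    with open(listing) as f:
        listed = sorted(f.read().splitlines())

print(copied)  # ['Raccoon_001.JPG', 'raccoon_003.png']
```

With such a helper, each search would reduce to a single call, for example `images = filter_and_copy(source_directory, destination_directory, output_text_file, species)`.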


## Module: Display a Subset of the Filtered Images

In this code block `subset_size` is the number of images that will be displayed in the grid; here it is set to 15. The subset is chosen at random with the `random.sample` function, and `min(subset_size, len(images))` prevents an error when fewer than 15 matching images were found.

In [None]:
# Display a subset of images in grid format
subset_size = 15
selected_files_subset = random.sample(images, min(subset_size, len(images)))
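The reason for the `min(...)` guard is that `random.sample` raises a `ValueError` when asked for more items than the list contains. The short sketch below uses a made-up list of three file names to show both cases.

```python
import random

images_found = ["img_1.jpg", "img_2.jpg", "img_3.jpg"]  # only three made-up matches
subset_size = 15

# Asking for more items than the list holds raises ValueError
raised = False
try:
    random.sample(images_found, subset_size)
except ValueError:
    raised = True

# Capping the sample size with min() returns every image instead, in random order
subset = random.sample(images_found, min(subset_size, len(images_found)))
print(raised, len(subset))  # True 3
```

If no images matched at all, the capped sample is an empty list and the grid below would simply be blank.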

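`plt.figure(figsize=(20, 10))` sets the size of the figure to 20 inches wide and 10 inches tall, which at the 100 dpi in effect here gives the 2000x1000-pixel figure reported in the output below. The `columns` and `rows` values are set to 5 and 3 respectively, giving a grid of 15 cells, one for each image in the subset.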

In [None]:
fig = plt.figure(figsize=(20, 10)) # Size of the entire figure
columns = 5
rows = 3

<Figure size 2000x1000 with 0 Axes>
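The fixed 5 x 3 grid holds exactly 15 images. If `subset_size` is changed, the number of rows could instead be derived from the number of images. The helper `grid_shape` below is a hypothetical sketch, not part of the original notebook.

```python
import math


def grid_shape(n_images, columns=5):
    """Return (rows, columns) for a grid large enough to hold n_images.

    Hypothetical helper; always returns at least one row.
    """
    rows = max(1, math.ceil(n_images / columns))
    return rows, columns


print(grid_shape(15))             # (3, 5)
print(grid_shape(7))              # (2, 5)
print(grid_shape(20, columns=4))  # (5, 4)
```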

This code block loops over each file in `selected_files_subset` and opens it with `PIL`. It then adds a new subplot to the figure for each image and displays the image in that subplot. `ax.axis('off')` removes the x and y axes from the subplot to keep the grid legible. `ax.set_title(os.path.basename(file_path), fontsize=8, pad=5)` sets the subplot title to the image's file name, in 8-point font, offset 5 points above the image.

In [None]:
for i, file_path in enumerate(selected_files_subset):
    img = Image.open(file_path)
    ax = fig.add_subplot(rows, columns, i + 1)
    ax.imshow(img)
    ax.axis('off')
    ax.set_title(os.path.basename(file_path), fontsize=8, pad=5)
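`Image.open` keeps the underlying file open until the image is closed. When many images are displayed, a `with` block can release each file as soon as its pixels have been copied into an array. The sketch below is a variation on the loop above, assuming the images can be converted to RGB; it uses two generated placeholder images in a temporary folder and a standalone `matplotlib.figure.Figure` so it runs without a display (in Colab, `plt.figure` behaves the same way).

```python
import os
import tempfile

import numpy as np
from matplotlib.figure import Figure
from PIL import Image

with tempfile.TemporaryDirectory() as tmp:
    # Two small generated placeholder images stand in for the filtered photos
    selected_files_subset = []
    for i, colour in enumerate(["red", "blue"]):
        path = os.path.join(tmp, f"placeholder_{i}.jpg")
        Image.new("RGB", (64, 48), colour).save(path)
        selected_files_subset.append(path)

    # A standalone Figure needs no display; plt.figure behaves the same in Colab
    fig = Figure(figsize=(20, 10))
    columns, rows = 5, 3
    for i, file_path in enumerate(selected_files_subset):
        # The with block closes the file once its pixels are copied into an array
        with Image.open(file_path) as img:
            pixels = np.asarray(img.convert("RGB"))
        ax = fig.add_subplot(rows, columns, i + 1)
        ax.imshow(pixels)
        ax.axis('off')
        ax.set_title(os.path.basename(file_path), fontsize=8, pad=5)

print(len(fig.axes), pixels.shape)  # 2 (48, 64, 3)
```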

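Here `plt.subplots_adjust(wspace=0.5, hspace=0.5)` sets the width space (`wspace`) and height space (`hspace`) between subplots to 0.5, expressed as a fraction of the average subplot width and height. `plt.tight_layout(pad=1)` then recalculates the subplot spacing automatically so that titles and images do not overlap, which overrides the values set by `subplots_adjust`; `pad=1` leaves a margin of one font size around the edge of the figure. Finally, `plt.show()` displays the gridded images. The cell below combines all of the display steps above into a single cell.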
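Because `tight_layout` recomputes the spacing anyway, an alternative is to create the whole grid at once and let Matplotlib's constrained layout handle spacing. This is a sketch of a different approach from the notebook's, not the author's method: it uses random arrays as placeholder images, a standalone `Figure` so it runs without a display, and the `layout="constrained"` option (available in Matplotlib 3.5 or later). Grid cells left over when fewer than 15 images are found simply stay blank.

```python
import numpy as np
from matplotlib.figure import Figure

# Four random RGB arrays (values in [0, 1)) stand in for loaded images
rng = np.random.default_rng(0)
images_to_show = [rng.random((48, 64, 3)) for _ in range(4)]
titles = [f"placeholder_{i}.jpg" for i in range(len(images_to_show))]

rows, columns = 3, 5
# layout="constrained" spaces the subplots automatically
fig = Figure(figsize=(20, 10), layout="constrained")
axes = fig.subplots(rows, columns)

# Hide the frame of every cell, including cells that will stay empty
for ax in axes.flat:
    ax.axis('off')

# zip stops at the shorter sequence, so extra cells are left blank
for ax, image, title in zip(axes.flat, images_to_show, titles):
    ax.imshow(image)
    ax.set_title(title, fontsize=8, pad=5)

filled = sum(1 for ax in fig.axes if ax.images)
print(len(fig.axes), filled)  # 15 4
```

Inside the notebook, the equivalent would be `fig, axes = plt.subplots(rows, columns, figsize=(20, 10), layout="constrained")` followed by `plt.show()`.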

In [None]:
# Display a subset of images in grid format
subset_size = 15
selected_files_subset = random.sample(images, min(subset_size, len(images)))

fig = plt.figure(figsize=(20, 10)) # Size of the entire figure
columns = 5
rows = 3

for i, file_path in enumerate(selected_files_subset):
    img = Image.open(file_path)
    ax = fig.add_subplot(rows, columns, i + 1)
    ax.imshow(img)
    ax.axis('off')
    ax.set_title(os.path.basename(file_path), fontsize=8, pad=5)

# Adjust spacing
plt.subplots_adjust(wspace=0.5, hspace=0.5)
plt.tight_layout(pad=1)

plt.show()


# File Search Based on Date

## Module: Defining the Directories and Search Parameters

This script works the same way as the species search above; only the destination paths and the string it searches for in the file names change.

In [None]:
# Define the source directory where images are stored
source_directory = '/content/drive/MyDrive/shared-data/Notebook datafiles/4370-entire-subset/small-animal-collection'
# Define the destination directory where images will be copied
destination_directory = '/content/drive/MyDrive/shared-data/Notebook datafiles/image-filter/June 3rd 2020'
# Define the text file path where filenames will be saved
output_text_file = '/content/drive/MyDrive/shared-data/Notebook datafiles/image-filter/June 3rd 2020 Images.txt'

The date must be written exactly as it appears in the file names, or the search will return no images.

In [None]:
# Define the date we are searching for in the filename
date_to_search = "2020-06-03"
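If the date is held as a `datetime.date`, `strftime` can produce the search string in the year-month-day pattern shown above. This assumes the file names use that `YYYY-MM-DD` pattern, as the example value suggests; other naming schemes would need a different format string. The file name in the comparison is made up.

```python
from datetime import date

# Build the search string from a date object (assumes YYYY-MM-DD in file names)
date_to_search = date(2020, 6, 3).strftime("%Y-%m-%d")
print(date_to_search)  # 2020-06-03

# A differently formatted string silently matches nothing
example_name = "camera2_2020-06-03_0001.jpg"  # made-up file name
print(date_to_search in example_name)  # True
print("06/03/2020" in example_name)    # False
```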

## Module: Sorting the Dataset and Saving the Selected Images

The rest of the script is unchanged, except that the `species` variable is replaced by the `date_to_search` variable.

In [None]:
# Ensure that the destination directory exists, create if it does not
os.makedirs(destination_directory, exist_ok=True)

images = []

# Open the text file for writing
with open(output_text_file, "w") as file:
    # Walk through all files in the source directory and its subfolders
    for dirpath, dirnames, filenames in os.walk(source_directory):
        # Keep only image files whose name contains the date string
        for filename in filenames:
            if date_to_search in filename.lower() and filename.lower().endswith(('.png', '.jpg', '.jpeg')):
                # Full path of the file
                full_file_path = os.path.join(dirpath, filename)
                try:
                    # Copy the file to the destination directory
                    shutil.copy(full_file_path, os.path.join(destination_directory, filename))
                except Exception as e:
                    # Report the failure and move on to the next file
                    print(f"Failed to copy {filename}. Reason: {str(e)}")
                    continue
                # Record only files that were copied successfully
                images.append(full_file_path)
                file.write(filename + "\n")

print("Files have been filtered and copied.")

Files have been filtered and copied.


## Module: Display a Subset of the Filtered Images

A randomly selected subset of images can now be displayed with the same code used in the species search, to confirm that the filter worked as expected:

In [None]:
# Display a subset of images in grid format
subset_size = 15
selected_files_subset = random.sample(images, min(subset_size, len(images)))

fig = plt.figure(figsize=(20, 10)) # Size of the entire figure
columns = 5
rows = 3

for i, file_path in enumerate(selected_files_subset):
    img = Image.open(file_path)
    ax = fig.add_subplot(rows, columns, i + 1)
    ax.imshow(img)
    ax.axis('off')
    ax.set_title(os.path.basename(file_path), fontsize=8, pad=5)

# Adjust spacing
plt.subplots_adjust(wspace=0.5, hspace=0.5)
plt.tight_layout(pad=1)

plt.show()


# File Search Based on Location

## Module: Defining the Directories and Search Parameters

As long as the attribute of interest appears in the file name, the same script can be adapted to filter files by other criteria, such as the camera location used here.

In [None]:
# Define the source directory where images are stored
source_directory = '/content/drive/MyDrive/shared-data/Notebook datafiles/4370-entire-subset/small-animal-collection'
# Define the destination directory where images will be copied
destination_directory = '/content/drive/MyDrive/shared-data/Notebook datafiles/image-filter/camera2'
# Define the text file path where filenames will be saved
output_text_file = '/content/drive/MyDrive/shared-data/Notebook datafiles/image-filter/camera2/Camera 2 Images.txt'

Here the script searches for and copies all images taken at the ‘camera2’ site.

In [None]:
# Define the camera location we are searching for in the filename
camera_location = "camera2"
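The filter can also combine several criteria, for example a species at a particular camera, by requiring every search string to appear in the name. `matches_all` below is a hypothetical helper, not part of the original notebook, and the file names are made up.

```python
def matches_all(filename, search_strings):
    """Return True if the lowercased filename contains every search string.

    Hypothetical helper; like the notebook's loop, it also requires an image extension.
    """
    name = filename.lower()
    is_image = name.endswith(('.png', '.jpg', '.jpeg'))
    return is_image and all(s.lower() in name for s in search_strings)


criteria = ["raccoon", "camera2"]
for name in ["camera2_Raccoon_0001.JPG", "camera1_raccoon_0002.jpg", "camera2_deer_0003.jpg"]:
    print(name, matches_all(name, criteria))
```

Because the test is a plain substring check, ‘camera2’ would also match a name containing ‘camera20’; a separator such as ‘camera2_’ in the search string avoids that if the file names use one.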

## Module: Sorting the Dataset and Saving the Selected Images

The rest of the script is unchanged, except that the `date_to_search` variable is replaced by the `camera_location` variable.

In [None]:
# Ensure that the destination directory exists, create if it does not
os.makedirs(destination_directory, exist_ok=True)

images = []

# Open the text file for writing
with open(output_text_file, "w") as file:
    # Walk through all files in the source directory and its subfolders
    for dirpath, dirnames, filenames in os.walk(source_directory):
        # Keep only image files whose (lowercased) name contains the camera location
        for filename in filenames:
            if camera_location in filename.lower() and filename.lower().endswith(('.png', '.jpg', '.jpeg')):
                # Full path of the file
                full_file_path = os.path.join(dirpath, filename)
                try:
                    # Copy the file to the destination directory
                    shutil.copy(full_file_path, os.path.join(destination_directory, filename))
                except Exception as e:
                    # Report the failure and move on to the next file
                    print(f"Failed to copy {filename}. Reason: {str(e)}")
                    continue
                # Record only files that were copied successfully
                images.append(full_file_path)
                file.write(filename + "\n")

print("Files have been filtered and copied.")

Files have been filtered and copied.


## Module: Display a Subset of the Filtered Images

A randomly selected subset of images can now be displayed with the same code used in the previous modules, to confirm that the filter worked as expected:

In [None]:
# Display a subset of images in grid format
subset_size = 15
selected_files_subset = random.sample(images, min(subset_size, len(images)))

fig = plt.figure(figsize=(20, 10)) # Size of the entire figure
columns = 5
rows = 3

for i, file_path in enumerate(selected_files_subset):
    img = Image.open(file_path)
    ax = fig.add_subplot(rows, columns, i + 1)
    ax.imshow(img)
    ax.axis('off')
    ax.set_title(os.path.basename(file_path), fontsize=8, pad=5)

# Adjust spacing
plt.subplots_adjust(wspace=0.5, hspace=0.5)
plt.tight_layout(pad=1)

plt.show()
