# Photo Metadata Extraction and Formatting

## Overview

This program extracts selected metadata from the photos in a directory and writes that metadata to a CSV (comma-separated values) text file. The resulting CSV file can be opened and edited in a spreadsheet application; it can also be used to upload metadata to a digital library system.

## Step 1: Import modules
Sources for the modules used in this program:
https://docs.python.org/3/library/os.html#files-and-directories
https://docs.python.org/3/library/mimetypes.html
https://pypi.org/project/ExifRead/
https://docs.python.org/3/library/csv.html

In [1]:
import os
import mimetypes
import exifread
import csv

## Step 2: Identify the photo directory

In [2]:
photo_directory = '/Users/heather_m_campbell/Documents/GitHub/452-final-project/Photos'

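Check the path with the len() function. Note that len() applied to photo_directory returns the number of characters in the path string (67 here), not the number of files in the directory. The files themselves are counted later, after the directory tree has been walked.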

In [3]:
print(len(photo_directory))

67
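Because len() applied to a string counts characters, a file count needs a different approach. The following is a minimal sketch of one way to count the files under a directory, assuming subdirectories should be included. The helper name count_files and the throwaway demo directory are hypothetical and are not part of the original notebook.

```python
import os
import tempfile

def count_files(top):
    """Return the number of non-directory entries under top, including subdirectories."""
    return sum(len(filenames) for _dirpath, _dirnames, filenames in os.walk(top))

# Demonstrate on a temporary directory (hypothetical stand-in for photo_directory)
with tempfile.TemporaryDirectory() as demo_dir:
    os.mkdir(os.path.join(demo_dir, 'event_a'))
    for name in ('one.jpg', os.path.join('event_a', 'two.jpg')):
        open(os.path.join(demo_dir, name), 'w').close()
    demo_count = count_files(demo_dir)

print(demo_count)  # 2
```

Calling count_files(photo_directory) would count every file, including hidden ones such as .DS_Store, so the result can be larger than the number of photos.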


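os.walk() generates the file names in a directory tree by walking the tree (top-down by default). For each directory in the tree rooted at photo_directory (including photo_directory itself), it yields a 3-tuple (dirpath, dirnames, filenames). Printing allfiles shows only a generator object; the tuples are produced when the generator is looped over.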

In [240]:
allfiles = os.walk(photo_directory)
print(allfiles)

<generator object walk at 0x10cc983b8>
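One consequence of os.walk() returning a generator is that it can be looped over only once. The sketch below illustrates this on a temporary directory, assuming a tree with one subfolder holding one file; the names demo_dir, walker, first_pass, and second_pass are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as demo_dir:
    os.mkdir(os.path.join(demo_dir, 'event_a'))
    open(os.path.join(demo_dir, 'event_a', 'photo.jpg'), 'w').close()

    walker = os.walk(demo_dir)
    # Each item is a 3-tuple: (dirpath, dirnames, filenames)
    first_pass = [(dirpath, dirnames, filenames)
                  for dirpath, dirnames, filenames in walker]
    # The generator is now exhausted, so a second loop yields nothing
    second_pass = list(walker)

print(len(first_pass))  # 2 (the top directory and event_a)
print(second_pass)      # []
```

If the tree needs to be traversed again, os.walk() has to be called again to get a fresh generator.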


Loop through allfiles to get the file names in every directory. For each directory, take the third value in the 3-tuple (position 2), which is a list of the files in that directory, as file_names. Then use a nested loop to build a file_data list for each file and append it to all_file_list. After these loops finish, every file in every directory should be represented in all_file_list.

For each file in file_names, check whether the file name contains 'ipynb' or 'DS_Store'; those nonimage files should not appear in the output. The if statement combines two not in tests with Boolean AND, so a file is kept only when neither string appears in its name.

In [241]:
file_num = 0        # counter used to assign sequential file IDs
all_file_list = []  # accumulates one [ID, path, folder, file name] list per file

for path, dirnames, file_names in allfiles:
    # dirnames lists the subdirectories at this level, so it can't supply the folder name;
    # slice the path after the photo directory to get the folder instead
    folder = path[(len(photo_directory)+1):]
    for file in file_names:
        # skip notebook checkpoint files and macOS .DS_Store files
        if 'ipynb' not in file and 'DS_Store' not in file:
            file_num = file_num + 1
            file_data = [file_num, path, folder, file]
            all_file_list.append(file_data)

print(len(all_file_list))
# print(all_file_list)

115
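Filtering on the strings 'ipynb' and 'DS_Store' excludes only those two kinds of nonimage files. As an alternative, here is a minimal sketch that keeps a file only when mimetypes.guess_type() reports an image type. This is a different technique from the name check used above, and the helper name is_image and the sample file names are hypothetical.

```python
import mimetypes

def is_image(file_name):
    """Return True when the file extension maps to an image/* media type."""
    mime_type, _encoding = mimetypes.guess_type(file_name, strict=False)
    return mime_type is not None and mime_type.startswith('image/')

candidates = ['IMG_0001.jpg', 'scan.png', '.DS_Store', 'notes.ipynb', 'README']
images = [name for name in candidates if is_image(name)]
print(images)  # ['IMG_0001.jpg', 'scan.png']
```

The guess is based on the file extension only, so a mislabeled file would still pass or fail according to its name rather than its contents.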


In [242]:
# print(all_file_list)

all_file_list now comprises one list per file, containing the file ID number (position 0), path (position 1), folder (position 2), and file name (position 3). The sequential IDs come from the counter variable initialized in the previous cell.

Create an empty list for each file's metadata (file_metadata) and an empty list to accumulate all of the file_metadata lists (all_file_metadata).

The photo metadata can't be read from the list itself; each file has to be opened via its path.

The mimetypes.guess_type() function returns a tuple (type, encoding); the type is guessed from the file extension (IANA media types). Index [0] selects the first value in the file_format tuple (the type).

In [243]:
file_metadata = []
all_file_metadata = []

for file in all_file_list:
    file_ID = file[0]
    event = file[2]
    file_name = file[3]
    file_path = os.path.join(file[1], file_name)
    file_format = mimetypes.guess_type(file_path, strict=False)
    file_size = os.stat(file_path).st_size
    file_size_MB = round(file_size / 1000000, 2)  # round to 2 decimal places
    # open in binary mode; the with block closes the file once the tags are read
    with open(file_path, 'rb') as image_metadata:
        tags = exifread.process_file(image_metadata, details=False)
    datetime_original = tags.get('EXIF DateTimeOriginal')
    datetime_digitized = tags.get('EXIF DateTimeDigitized')
    image_software = tags.get('Image Software')
    # note: XResolution/YResolution are resolution values, not pixel dimensions
    image_width = tags.get('Image XResolution')
    image_height = tags.get('Image YResolution')
    image_units = tags.get('Image ResolutionUnit')
    latitude_ref = tags.get('GPS GPSLatitudeRef')    # hemisphere reference (N or S)
    latitude = tags.get('GPS GPSLatitude')           # list of coordinates: degrees, minutes, seconds
    longitude_ref = tags.get('GPS GPSLongitudeRef')  # hemisphere reference (E or W)
    longitude = tags.get('GPS GPSLongitude')         # list of coordinates: degrees, minutes, seconds
    camera = tags.get('Image Model')
    exposure = tags.get('EXIF ExposureTime')
    flash = tags.get('EXIF Flash')
    lens = tags.get('EXIF LensModel')
    file_metadata = [file_ID, event, file_name, file_format[0], file_size, file_size_MB, 
                     datetime_original, datetime_digitized,
                     image_software, image_width, image_height, image_units, 
                     latitude_ref, latitude, longitude_ref, longitude, 
                     camera, exposure, flash, lens]
    all_file_metadata.append(file_metadata)
    
# print(all_file_metadata)
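The GPS latitude and longitude tags hold degrees, minutes, and seconds, with the hemisphere stored separately in the reference tags. If decimal degrees are wanted in the spreadsheet, the conversion can be sketched as follows. The helper name dms_to_decimal is hypothetical, and how the three numbers are pulled out of an exifread tag is left as an assumption here (exifread typically exposes them through the tag's values attribute as ratios).

```python
from fractions import Fraction

def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus a hemisphere letter to signed decimal degrees."""
    decimal = Fraction(degrees) + Fraction(minutes) / 60 + Fraction(seconds) / 3600
    # South latitudes and west longitudes are negative in decimal notation
    if ref in ('S', 'W'):
        decimal = -decimal
    return float(decimal)

print(dms_to_decimal(30, 30, 0, 'S'))  # -30.5
north = dms_to_decimal(40, 26, Fraction(463, 10), 'N')  # 40 deg 26 min 46.3 sec north
```

Using Fraction keeps the arithmetic exact until the final conversion to float, which suits EXIF values that are stored as ratios.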

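Use csv.writer() to write a header row, then one row per file. The columns follow the order of the file_metadata list: [file_ID, event, file_name, file_format[0], file_size, file_size_MB, datetime_original, datetime_digitized, image_software, image_width, image_height, image_units, latitude_ref, latitude, longitude_ref, longitude, camera, exposure, flash, lens]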

In [244]:
# newline='' lets the csv module control line endings, as the csv docs recommend;
# the with block closes the file even if writing fails
with open('photo_data.csv', 'w', newline='') as outfile:
    csv_out = csv.writer(outfile)
    csv_out.writerow(['ID', 'Event', 'File Name', 'File Format', 'File Size (Bytes)',
                      'File Size (MB)', 'Date Taken', 'Date Digitized',
                      'Software', 'X Resolution', 'Y Resolution', 'Resolution Unit',
                      'Latitude Ref', 'Latitude', 'Longitude Ref', 'Longitude',
                      'Camera', 'Exposure Time', 'Flash Used?', 'Lens Model'])
    csv_out.writerows(all_file_metadata)
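Photos that lack a given tag produce None from tags.get(), and csv.writer() writes None as an empty field. A minimal sketch of reading a file back with csv.DictReader to check this, using a temporary file and a single made-up row (the file name demo.csv and the row values are hypothetical):

```python
import csv
import os
import tempfile

rows = [[1, 'event_a', 'photo.jpg', None]]  # None stands in for a missing tag

with tempfile.TemporaryDirectory() as demo_dir:
    csv_path = os.path.join(demo_dir, 'demo.csv')
    with open(csv_path, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        writer.writerow(['ID', 'Event', 'File Name', 'Camera'])
        writer.writerows(rows)
    # Read the file back; each record maps header names to string values
    with open(csv_path, newline='') as infile:
        records = list(csv.DictReader(infile))

print(records[0]['Camera'] == '')  # True
```

The same read-back approach could be pointed at photo_data.csv to spot columns that are empty for most photos.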