Skip to content

A script to batch rename PDF files based on metadata/XMP title and author

Notifications You must be signed in to change notification settings

jdmonaco/pdf-title-rename

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 
 
 

Repository files navigation

pdf-title-rename

A batch-renaming script for PDF files based on the Title and Author information in the metadata and XMP. The XMP metadata, if available, supersedes the standard PDF metadata. The output format is currently:

[FirstAuthorLastName [LastAuthorLastName ]- ][SanitizedTitleText].pdf

Only the article title is used if no author information is found. First and last author surnames are used if the creator field in the XMP is a list of more than one author.

To use the script, you can list one or more file paths to PDF files that should be processed. If you have downloaded a large number of files in a particular directory, you can use the shell expansion wildcard * to list those files; for example,

pdf-title-rename path/to/folder/*.pdf

will process all .pdf files in path/to/folder. If you want to see what it would do without changing anything, do a dry run with -n. There is also an interactive mode with -i that will let you open the files and manually enter titles and author strings if you have problematic PDFs without proper metadata.

usage: pdf-title-rename [-h] [-n] [-i] [-d DESTINATION] files [files ...]

PDF batch rename

positional arguments:
  files                 list of pdf files to rename

optional arguments:
  -h, --help            show this help message and exit
  -n                    dry-run listing of filename changes
  -i                    interactive mode
  -d DESTINATION, --dest DESTINATION
                        destination folder for renamed files

This script is intended as a first pass in an academic PDF workflow to get browsable filenames for a pile of articles that have been downloaded but not yet filed away.

Requirements

The PDF parsing uses PDFMiner. The XMP parsing uses this xmp module. Important: For XMP parsing to work, you will need to copy the code from that link to an xmp.py file and put it on your PYTHONPATH. (Note, I've now included xmp.py in the repo in case that link goes away; you still have to put it on your python path.)

About

A script to batch rename PDF files based on metadata/XMP title and author

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages