Formatters
##What is a Formatter?
The `scrappy.formatters.Formatter` class converts information scraped from TheTVDB into a predefined file-name format. In other words, it formats a string based on the data contained in a `tvdb.Show` object.
By default, Scrappy employs the `seriesname.SXX.EXX.episodename.ext` format. This is a widely used format that most video library scrapers will recognize. Under some conditions, however, it may be desirable to use another predefined format or to define your own.
##Using an alternative Formatter
###API scripting
To use an alternative formatter, pass an instance to the `Scrape` class's `__init__` function:

    s = Scrape('*.avi*', formatter=some_formatter_instance)
    if s.map_episode_info():
        s.rename_files()

Alternative formatters can be found in the `scrappy.formatters` module or implemented yourself (see below).
###Command-Line Application
The `Formatter` to be used can be declared via the `--formatter` command-line argument.
You may specify one of the formatters contained in `scrappy.formatters` by passing its name, e.g.: `--formatter formatter_X0X`.
You may also supply a formatter you have written yourself by passing the path to your script and the name of the formatter, separated by a colon, e.g.: `--formatter ../path/to/file/my_formatters.py:custom_formatter`
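To make the `path:name` convention concrete, here is a minimal Python 3 sketch of how such an argument could be resolved. The `load_formatter` helper is hypothetical and written only for illustration; it is not Scrappy's actual loader, which may behave differently.

```python
import importlib.util
import os


def load_formatter(spec):
    """Hypothetical helper: resolve 'path/to/file.py:name' to the object `name`."""
    # Split on the LAST colon so that Windows drive letters (C:\...) survive.
    path, _, name = spec.rpartition(':')
    if not path or not name:
        raise ValueError('expected PATH:NAME, got %r' % spec)

    # Import the script as a module, then look the formatter up by name.
    module_name = os.path.splitext(os.path.basename(path))[0]
    module_spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(module_spec)
    module_spec.loader.exec_module(module)
    return getattr(module, name)
```

Under these assumptions, `load_formatter('../path/to/file/my_formatters.py:custom_formatter')` would return whatever object `my_formatters.py` binds to the name `custom_formatter`.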
##Defining a custom Formatter
Using the `Formatter` class is easy. It requires three elements:
- a unicode string written in Python's advanced string formatting syntax (the syntax used by `str.format`)
- a separator
- a parser, which is just a dictionary mapping strings to lists of functions.
We will review each of these in turn.
The advanced formatted string is referred to as a format string. It defines the order in which data fields appear in the file name, and must be unicode to avoid problems with your filesystem or TVDB scrapes. To begin, let us consider the format string for the default `seriesname.SXX.EXX.episodename.ext` file-name format. As you can see, formatting is quite intuitive:
u'{seriesname}{sep}S{seasonnumber}{sep}E{episodenumber}{sep}{episodename}'
The string is composed of constants (substrings that are identical across all file names), variables (substrings that depend on information scraped from TheTVDB), and a separator variable, which indicates that the aforementioned separator should be inserted at a given position. The extension is automatically appended and must not be specified.
Both types of variables are enclosed in curly braces. `{sep}` indicates that a separator must be inserted. By default, the separator is an empty string, thereby concatenating all fields. A popular separator is `'.'`.
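The effect of the separator can be seen with plain `str.format`, independently of Scrappy. The snippet below is only an illustration: the field values are made-up sample data, and it does not go through the `Formatter` class.

```python
# The default format string quoted above.
FORMAT = u'{seriesname}{sep}S{seasonnumber}{sep}E{episodenumber}{sep}{episodename}'

# Sample field values (assumed for this example, already cleaned up).
fields = {
    'seriesname': u'Firefly',
    'seasonnumber': u'01',
    'episodenumber': u'02',
    'episodename': u'The Train Job',
}

dotted = FORMAT.format(sep=u'.', **fields)  # '.' as the separator
joined = FORMAT.format(sep=u'', **fields)   # empty separator: plain concatenation

print(dotted)  # Firefly.S01.E02.The Train Job
print(joined)  # FireflyS01E02The Train Job
```

Note that the constants `S` and `E` stay in place regardless of the separator; only the `{sep}` positions change.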
The other variables may include any name that can be found in the `Show`, `Season` or `Episode` classes from the `tvdb_api` module. Popular choices include:
- seriesname
- seasonnumber
- episodenumber
- episodename
- runtime
- language
- dvd_episodenumber
- absolute_number
- firstaired
- id
- imdb_id
As previously mentioned, the parser object is a simple dictionary mapping strings to lists of functions. The strings are none other than the variables in the format string (without the enclosing braces). The list of functions serves as a processing chain that transforms the data in the variable as desired before inserting it into the format string.
The `scrappy.formatters` module includes several default processing functions that cover most commonly-applied transformations:
- `titlecase`: U.S. standard title format ('it's always sunny in philadelphia' -> 'It's Always Sunny in Philadelphia')
- `stripper`: remove leading and trailing whitespace
- `zfiller`: pad numbers with leading zeros such that the number occupies two spaces ('3' -> '03')
- `all_lower`: all lowercase letters
- `all_upper`: all uppercase letters
- `dot_sep`: replace whitespace with a `'.'` separator
These functions are executed in the same order as they appear in the list, passing the output of one into the next. For example, consider the following key->value pair:
'seriesname': [stripper, zfiller, titlecase, dot_sep]
This processing chain will turn `hello world 3` into `Hello.World.03`.
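The left-to-right chaining can be sketched as follows. The four functions here are simplified stand-ins written for this example, based only on the descriptions in the list above; they are not the implementations shipped in `scrappy.formatters`, and `run_chain` is a hypothetical helper.

```python
import re
from functools import reduce


# Simplified stand-ins, NOT the scrappy.formatters originals.
def stripper(s):
    return s.strip()


def zfiller(s):
    # Pad every run of digits to at least two characters.
    return re.sub(r'\d+', lambda m: m.group().zfill(2), s)


def titlecase(s):
    # str.title() is only a rough approximation of real title casing.
    return s.title()


def dot_sep(s):
    return re.sub(r'\s+', '.', s)


def run_chain(value, chain):
    """Feed `value` through each function in order, left to right."""
    return reduce(lambda acc, func: func(acc), chain, value)


result = run_chain(u'  hello world 3 ', [stripper, zfiller, titlecase, dot_sep])
print(result)  # Hello.World.03
```

Order matters: with these stand-ins, running `dot_sep` before `stripper` would turn the surrounding whitespace into leading and trailing dots.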
Note: a variable can be omitted from the parser if it requires no processing.
The distinction between the processing chain and the format string is confusing to some, but it is actually very straightforward:
- Format string: defines what data should appear in the file name and in what order it should appear. It also specifies where the separator is placed.
- Processing chain: defines how each piece of data should look before it is inserted into the format string.
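Putting the two together, the overall flow might look like the sketch below. The `render` function and the sample data are assumptions made for illustration; the real `Formatter` class may be structured differently.

```python
def render(format_string, separator, parser, data):
    """Hypothetical helper: process each field, then fill the format string."""
    processed = {}
    for name, value in data.items():
        # Fields without a parser entry pass through unchanged.
        for func in parser.get(name, []):
            value = func(value)
        processed[name] = value
    return format_string.format(sep=separator, **processed)


def pad2(s):
    return s.zfill(2)


format_string = u'{seriesname}{sep}S{seasonnumber}{sep}E{episodenumber}{sep}{episodename}'

# Processing chains: HOW each field should look.
parser = {
    'seriesname': [lambda s: s.strip(), lambda s: s.title()],
    'seasonnumber': [pad2],
    'episodenumber': [pad2],
    # 'episodename' is left out on purpose: it needs no processing.
}

# Sample data standing in for values scraped from TheTVDB.
data = {
    'seriesname': u' the wire ',
    'seasonnumber': u'1',
    'episodenumber': u'3',
    'episodename': u'The Buys',
}

filename = render(format_string, u'.', parser, data)
print(filename)  # The Wire.S01.E03.The Buys
```

The format string alone decides which fields appear and where; swapping the functions in `parser` changes how each field looks without touching that layout.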