louist87 edited this page Jan 17, 2013 · 5 revisions

##What is a Formatter?

The scrappy.formatters.Formatter class converts information scraped from TheTVDB into a predefined file-name format. In other words, it formats a string based on the data contained in a tvdb.Show object.

By default, Scrappy employs the seriesname.SXX.EXX.episodename.ext format. This is a widely used format that most video library scrapers will recognize. In some cases, however, it may be desirable to use another predefined format or to define your own.

##Using an alternative Formatter

###API scripting

To use an alternative formatter, pass an instance to the Scrape class' __init__ method:

    s = Scrape('*.avi*', formatter=some_formatter_instance)
    if s.map_episode_info():
        pass  # TVDB data was mapped; continue with the scrape here

Alternative formatters can be found in the scrappy.formatters module, or you can implement your own (see "Defining a custom Formatter" below).

###Command-Line Application

The Formatter to be used can be declared via the --formatter command-line argument. You may specify one of the formatters contained in scrappy.formatters by passing its name, e.g.: --formatter formatter_X0X.

To use a formatter you have written yourself, pass the path to your script and the name of the formatter, separated by a colon, e.g.: --formatter ../path/to/file/

##Defining a custom Formatter

Using the Formatter class is easy. It requires three elements:

  1. a Python advanced format string, which must be unicode
  2. a separator
  3. a parser, which is just a dictionary mapping strings to lists of functions

We will review each of these in turn.

The advanced format string is referred to simply as the format string. It defines the order in which data fields appear in the file name, and it must be unicode to avoid problems with your filesystem or TVDB scrapes. To begin, let us consider the format string for the default seriesname.SXX.EXX.episodename.ext file-name format.
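
As a rough illustration, a format string for that layout could be written as shown here. This is a sketch reconstructed from the description in this section rather than a copy of the string shipped in scrappy.formatters, and the field values are made-up sample data.

```python
# Hypothetical reconstruction of a format string for the
# seriesname.SXX.EXX.episodename layout (not copied from scrappy).
format_string = u'{seriesname}{sep}S{seasonnumber}{sep}E{episodenumber}{sep}{episodename}'

# Made-up sample values standing in for data scraped from TheTVDB.
sample = {
    'seriesname': u'Firefly',
    'seasonnumber': u'01',
    'episodenumber': u'03',
    'episodename': u'Bushwhacked',
}

# Plain str.format is enough to see how the fields are laid out.
filename = format_string.format(sep=u'.', **sample)
print(filename)  # Firefly.S01.E03.Bushwhacked
```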


The string is composed of constants (substrings that are identical across all file names), variables (substrings that depend on information scraped from TheTVDB), and a separator variable, which indicates that the aforementioned separator should be inserted at a given position. The file extension is appended automatically and must not be specified.

Both types of variables are contained in curly braces. {sep} indicates that a separator must be inserted. By default, the separator is an empty string, thereby concatenating all fields. A popular separator is '.'.
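
The effect of the separator can be seen with plain str.format. The snippet is only an illustration with made-up values; it does not use the Formatter class itself.

```python
# Illustrative format string; {sep} marks where the separator is inserted.
format_string = u'{seriesname}{sep}{seasonnumber}x{episodenumber}'
values = {'seriesname': u'Firefly', 'seasonnumber': u'1', 'episodenumber': u'03'}

# An empty separator simply concatenates the fields.
joined = format_string.format(sep=u'', **values)

# A '.' separator puts a dot at every {sep} position.
dotted = format_string.format(sep=u'.', **values)

print(joined)  # Firefly1x03
print(dotted)  # Firefly.1x03
```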

The other variables may include any name that can be found in the Show, Season or Episode classes from the tvdb_api module. Popular choices include:

  • seriesname
  • seasonnumber
  • episodenumber
  • episodename
  • runtime
  • language
  • dvd_episodenumber
  • absolute_number
  • firstaired
  • id
  • imdb_id

As previously mentioned, the parser object is a simple dictionary mapping strings to lists of functions. The strings are none other than the variables in the format string (without the enclosing braces). The list of functions serves as a processing chain that transforms the data in the variable as desired before it is inserted into the format string.

The scrappy.formatters module includes several default processing functions that cover most commonly-applied transformations:

  • titlecase: U.S. standard title format ('it's always sunny in philadelphia' -> 'It's Always Sunny in Philadelphia')
  • stripper: remove leading and trailing whitespace
  • zfiller: pad numbers with leading zeros so that each number is at least two digits wide ('3' -> '03')
  • all_lower: all lowercase letters
  • all_upper: all uppercase letters
  • dot_sep: replace whitespace with a '.' separator.

These functions are executed in the same order as they appear in the list, passing the output of one into the next. For example, consider the following key->value pair:

    'seriesname': [stripper, zfiller, titlecase, dot_sep]

This processing chain will turn 'hello world 3' into 'Hello.World.03'.
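
To make the chaining concrete, here is a minimal sketch that applies such a list in order. The four functions are simplified stand-ins written for this example (the real ones live in scrappy.formatters and may behave differently), and run_chain is a hypothetical helper, not part of Scrappy.

```python
import re

# Simplified stand-ins for the functions in scrappy.formatters (assumed behavior).
SMALL_WORDS = {'a', 'an', 'and', 'in', 'of', 'on', 'or', 'the', 'to'}

def stripper(text):
    return text.strip()

def zfiller(text):
    # Pad every run of digits to at least two characters.
    return re.sub(r'\d+', lambda m: m.group().zfill(2), text)

def titlecase(text):
    # Capitalize each word, keeping small words lowercase unless they come first.
    words = text.split()
    return ' '.join(
        w.lower() if i > 0 and w.lower() in SMALL_WORDS else w[:1].upper() + w[1:]
        for i, w in enumerate(words)
    )

def dot_sep(text):
    return re.sub(r'\s+', '.', text)

def run_chain(value, chain):
    # Hypothetical helper: feed the output of each function into the next.
    for func in chain:
        value = func(value)
    return value

parser = {'seriesname': [stripper, zfiller, titlecase, dot_sep]}
result = run_chain(u'  hello world 3 ', parser['seriesname'])
print(result)  # Hello.World.03
```

Because each function receives the previous function's output, reordering the list can change the result; for instance, running dot_sep before titlecase would leave this simplified titlecase with a single dot-joined word to capitalize.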

Note: a variable can be omitted from the parser if it requires no processing.

The distinction between the processing chain and the format string can be confusing at first, but it is straightforward:

  • Format string: defines which data appear in the file name and in what order, and specifies where the separator goes.
  • Processing chain: defines how each piece of data should look before it is inserted into the format string.
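
Putting the two together, the following sketch mimics this division of labor in plain Python. Here render is a hypothetical helper written for this example, not the Formatter API, and the parser uses small lambdas in place of the scrappy.formatters functions.

```python
def render(format_string, separator, parser, data):
    """Run each field through its processing chain, then fill the format string."""
    processed = {}
    for name, value in data.items():
        # Fields without a parser entry are inserted unchanged.
        for func in parser.get(name, []):
            value = func(value)
        processed[name] = value
    return format_string.format(sep=separator, **processed)

# Format string: which fields appear, in what order, and where separators go.
format_string = u'{seriesname}{sep}S{seasonnumber}{sep}E{episodenumber}'

# Processing chains: how each field should look before insertion.
parser = {
    'seriesname': [lambda s: s.strip(), lambda s: s.upper()],
    'seasonnumber': [lambda s: s.zfill(2)],
    # 'episodenumber' is omitted: it needs no processing here.
}

# Made-up sample values standing in for scraped data.
data = {'seriesname': u' firefly ', 'seasonnumber': u'1', 'episodenumber': u'03'}
name = render(format_string, u'.', parser, data)
print(name)  # FIREFLY.S01.E03
```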