PyToc generates a table of contents for an HTML page from its headings, with anchor links from the TOC to each heading.



Released under the GNU General Public License, Version 2.

This Version

  • Version: 0.3

  • Release Date: 2010-01-18

Revision History

Version: 0.3

  • Release Date: 2010-01-18

  • Changes:

    • Converted functions into a class with properties and methods. See "Using PyToc", below.

Version: 0.21

  • Release Date: 2010-01-11

  • Changes:

    • Fixed HTML rendering error in format_contents()

Version: 0.2

  • Release Date: 2010-01-11

  • Changes:

    • Fixed bugs
    • Cleaned up code
    • Added format_contents() to generate an HTML table of contents
    • Added count_chars() for use inside format_contents()

Version: 0.11

  • Release Date: 2010-01-11

  • Changes:

    • Fixed formatting
    • Added heading permalink

Version: 0.1

  • Release Date: 2010-01-11

  • Changes:

    • First Commit

Using PyToc

You can see the code in action on my website's About page.

It's pretty simple to use. Download the module and save it somewhere Python can import it from, such as a directory on your PYTHONPATH.

Here's a demonstration:

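import urllib  # Python 2; on Python 3, use urllib.request.urlopen instead
import pytoc

url = ''  # the address of the page to process
page = urllib.urlopen(url)
html = page.read()

toc = pytoc.Toc(html_in=html)
toc.make_toc()  # populates the output properties below

toc.html_toc  # the generated HTML table of contents
toc.html_out  # the input HTML with anchors and numbering added to the headings
toc.toc_list  # a list of tuples in the form (section number, title)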

####Input Properties

The following input properties control how the table of contents is generated.

  • html_in - The HTML document for which you want to generate a table of contents.

    This is the only necessary property to assign. The rest have default values that may meet your needs.

  • levels - A list of numbers corresponding to the heading levels you want to include in your TOC.

    E.g. [3, 4] would include <h3> and <h4> headings.

    Default is [3, 4].

  • id - The base id of the HTML table of contents to be generated.

    Default is "toc".

  • title - The title of the generated table of contents.

    Default is "Contents".


####Methods

  • make_toc() - Generates the table of contents and populates the output properties.

    Returns True when complete.
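
To make the levels property concrete, here is a minimal standalone sketch of selecting headings by level and numbering them. It is not PyToc's own code: the HeadingCollector class and number_headings function are hypothetical names, the sketch targets Python 3's standard html.parser module, and the "1", "1.1" numbering scheme is an assumption about what the section numbers look like.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for the requested heading levels."""

    def __init__(self, levels=(3, 4)):
        super().__init__()
        # Map tag names such as "h3" to their numeric level.
        self.wanted = {"h%d" % n: n for n in levels}
        self.headings = []   # finished (level, text) pairs
        self._level = None   # level of the heading being read, if any
        self._parts = []     # text fragments of that heading

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted and self._level is None:
            self._level = self.wanted[tag]
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and self.wanted.get(tag) == self._level:
            self.headings.append((self._level, "".join(self._parts).strip()))
            self._level = None


def number_headings(headings, levels):
    """Turn (level, text) pairs into (section number, title) tuples."""
    order = sorted(levels)
    counters = []
    numbered = []
    for level, text in headings:
        depth = order.index(level) + 1
        # Drop counters for deeper levels, pad missing ones, then count up.
        counters = counters[:depth]
        counters += [0] * (depth - len(counters))
        counters[-1] += 1
        numbered.append((".".join(str(c) for c in counters), text))
    return numbered


sample = "<h2>Skipped</h2><h3>Intro</h3><h4>Scope</h4><h4>Terms</h4><h3>Usage</h3>"
levels = [3, 4]
collector = HeadingCollector(levels)
collector.feed(sample)
toc_list = number_headings(collector.headings, levels)
```

With levels set to [3, 4], the `<h2>` heading in the sample is ignored and the `<h4>` headings are numbered beneath the preceding `<h3>`.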

####Output Properties

After calling the make_toc() method, the following output properties are populated with values.

  • html_out - The same as html_in except with the TOC anchors and numbering included in the headings.

  • html_toc - The generated HTML table of contents.

  • toc_list - A list of tuples containing the anchors and headings, in case you would rather roll your own HTML table of contents.
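
If you do roll your own, something like the following sketch could be a starting point. It assumes Python 3 and that each tuple has the form (section number, title), as in the demonstration above. The render_toc function, the "level-N" class names, and the "#toc-1.1" style of anchor are all hypothetical; check html_out for the anchor names PyToc actually writes.

```python
from html import escape


def render_toc(toc_list, toc_id="toc", title="Contents"):
    """Render (section number, title) tuples as a flat HTML list.

    Nesting is expressed with a CSS class per depth rather than nested
    <ul> elements, which keeps the markup valid and the code short.
    """
    lines = ['<div id="%s">' % escape(toc_id), "<p>%s</p>" % escape(title), "<ul>"]
    for number, heading in toc_list:
        depth = number.count(".") + 1  # "1.1" -> depth 2
        lines.append(
            '<li class="level-%d"><a href="#%s-%s">%s %s</a></li>'
            % (depth, escape(toc_id), escape(number), escape(number), escape(heading))
        )
    lines += ["</ul>", "</div>"]
    return "\n".join(lines)


# Sample data standing in for toc.toc_list.
sample = [("1", "Intro"), ("1.1", "Scope & Terms"), ("2", "Usage")]
html_toc = render_toc(sample)
```

The titles are escaped on the assumption that toc_list holds plain text; if the titles already contain HTML entities, drop the escape() call around the heading.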

That's it, really.