Extract data from an HTML table and store the results in a CSV file.

table2csv

A simple script for downloading HTML tables as CSV.

Installation

pip install -U table2csv

Usage

table2csv http://en.wikipedia.org/wiki/List_of_Super_Bowl_champions > dump.txt
python -m table2csv.main http://en.wikipedia.org/wiki/List_of_Super_Bowl_champions > dump.txt

Use --nth=[int] to select a specific table from the page.

Features

  • Accepts a URL
  • Identifies all the tables on the page
  • Merges tables that share the same structure (e.g. tables with the same column headers)
  • Determines which table is the largest
  • Extracts text
  • Extracts links
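
The steps above can be sketched with the standard library alone. This is an illustrative sketch, not table2csv's actual implementation: the names TableCollector, merge_by_header, largest and to_csv are hypothetical, the HTML is a made-up sample rather than a fetched page, "biggest" is assumed to mean "most rows", and nested tables and link extraction are left out.

```python
import csv
import io
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows (hypothetical helper).

    Each row is a list of cell texts. Nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables
        self._rows = None  # rows of the table being read
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = None


def merge_by_header(tables):
    """Concatenate the body rows of tables whose first row is identical."""
    merged = {}
    for rows in tables:
        if not rows:
            continue
        header, body = tuple(rows[0]), rows[1:]
        merged.setdefault(header, []).extend(body)
    return [[list(header)] + body for header, body in merged.items()]


def largest(tables):
    """Pick the table with the most rows (assumed meaning of 'biggest')."""
    return max(tables, key=len)


def to_csv(rows):
    """Serialize rows to CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Made-up sample: the first and third tables share the same header.
HTML = """
<table><tr><th>Name</th><th>Score</th></tr><tr><td>alpha</td><td>1</td></tr></table>
<table><tr><th>Team</th></tr><tr><td>gamma</td></tr></table>
<table><tr><th>Name</th><th>Score</th></tr><tr><td>beta</td><td>2</td></tr></table>
"""

collector = TableCollector()
collector.feed(HTML)
csv_text = to_csv(largest(merge_by_header(collector.tables)))
print(csv_text)
```

Keying the merge on the header row keeps the sketch short; a real page would likely need looser matching (for example, ignoring case or whitespace differences between headers).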

TODO

  • Detect the data types found within each column
  • Add support for tables with hierarchical indices on the rows and/or columns
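
As a rough idea of what the first TODO item could look like, the sketch below guesses a type per column by trying progressively looser parsers. It is only one possible approach and is not part of table2csv; infer_type and column_types are hypothetical names, and the rows are made-up sample data.

```python
def infer_type(values):
    """Return 'int', 'float', or 'str' for a column of cell strings.

    Empty cells are ignored. Python's own int() and float() decide what
    counts as a number, so values such as ' 5 ' or 'nan' are accepted.
    """
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "str"

    def all_parse(cast):
        # True if every non-empty value converts without error.
        for v in non_empty:
            try:
                cast(v)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "int"
    if all_parse(float):
        return "float"
    return "str"


def column_types(rows):
    """Map each header name to the inferred type of its column.

    Assumes the first row is the header and all rows have equal length.
    """
    header, body = rows[0], rows[1:]
    return {name: infer_type(col) for name, col in zip(header, zip(*body))}


# Made-up sample rows, as they might come out of a parsed table.
rows = [
    ["Name", "Score", "Ratio"],
    ["alpha", "1", "0.5"],
    ["beta", "2", "3"],
]
types = column_types(rows)
print(types)
```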
