Extract data from an HTML table and store the results in a CSV file.
Python
Latest commit 730fb6d Oct 2, 2015 @hernamesbarbara Merge pull request #2 from Candunc/master
Changed the seperator to a comma, removed some errors

table2csv

A simple script for downloading HTML tables as CSV.

Installation

pip install -U table2csv

Usage

table2csv http://en.wikipedia.org/wiki/List_of_Super_Bowl_champions > dump.txt
python -m table2csv.main http://en.wikipedia.org/wiki/List_of_Super_Bowl_champions > dump.txt

Use --nth=[int] to select a specific table from the page instead of the default one.

Features

  • Accepts a URL
  • Identifies all the tables on the page
  • Merges tables that share the same structure (e.g. tables with the same column headers)
  • Figures out which table is the biggest
  • Extracts text
  • Extracts links
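
The steps listed above can be illustrated with a minimal sketch that uses only the Python standard library. This is not table2csv's own code: the names TableParser, merge_by_header, biggest_table_csv and SAMPLE_HTML are hypothetical, the first row of each table is assumed to be its header, and nested tables and link extraction are left out.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect every <table> as a list of rows; each row is a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # row currently being built, or None
        self._cell = None  # text fragments of the current cell, or None

    def _close_cell(self):
        # Finish the open cell (if any), collapsing runs of whitespace.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        # Finish the open row (if any) and attach it to the latest table.
        self._close_cell()
        if self._row:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._close_row()
            self.tables.append([])
        elif tag == "tr":
            self._close_row()
            if self.tables:
                self._row = []
        elif tag in ("td", "th"):
            self._close_cell()
            if self._row is not None:
                self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def merge_by_header(tables):
    """Concatenate tables whose first row (assumed to be the header) is identical."""
    merged = {}
    for rows in tables:
        if not rows:
            continue
        header = tuple(rows[0])
        merged.setdefault(header, [rows[0]]).extend(rows[1:])
    return list(merged.values())


def biggest_table_csv(html):
    """Return the largest (merged) table in the HTML as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    candidates = merge_by_header(parser.tables)
    if not candidates:
        return ""
    biggest = max(candidates, key=len)  # "biggest" here means most rows
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(biggest)
    return out.getvalue()


# Made-up input: two tables share a header, so they are merged and win.
SAMPLE_HTML = """
<table><tr><th>name</th><th>score</th></tr><tr><td>a</td><td>1</td></tr></table>
<table><tr><th>x</th></tr><tr><td>9</td></tr></table>
<table><tr><th>name</th><th>score</th></tr><tr><td>b</td><td>2</td></tr></table>
"""

if __name__ == "__main__":
    print(biggest_table_csv(SAMPLE_HTML), end="")
```

In this sketch "biggest" is taken to mean the table with the most rows after merging; the actual tool may measure size differently.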

TODO

  • Detect the data types found within each column
  • Add support for tables with hierarchical indices on the rows and/or columns
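
As an illustration of the first TODO item, here is one possible way to guess column types once a table has been read into rows of strings. It is only a sketch of the idea, not planned or existing table2csv behaviour: infer_type and infer_column_types are hypothetical names, only three types are distinguished, and empty cells are ignored.

```python
def infer_type(values):
    """Return 'int', 'float', or 'str' for a column of cell strings."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "str"
    # Try the strictest type first; fall back to the next one on failure.
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
        except ValueError:
            continue
        return name
    return "str"


def infer_column_types(rows):
    """Map each header name to the inferred type of its column.

    Assumes the first row is the header; short rows are skipped for
    columns they do not reach.
    """
    header, *body = rows
    return {
        name: infer_type([r[i] for r in body if i < len(r)])
        for i, name in enumerate(header)
    }


if __name__ == "__main__":
    rows = [["name", "score", "ratio"], ["a", "1", "0.5"], ["b", "2", ""]]
    print(infer_column_types(rows))
```

Because the check relies on Python's own int() and float(), it accepts whatever those accept (for example "1_000" or "nan"); a stricter parser would be needed for real-world tables with thousands separators or units.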
