Some Perl scripts to scrape live texts and other data from football websites

Live Text

This project provides some simple Perl scripts that scrape live text commentaries and other data from German, English and Russian football websites and transform them into handy XML files.

Getting Started

All scripts are standalone. Just define a few variables as described in the script files and you are ready to go.


All scripts are run from the command line: perl

On Unix systems, the XSL stylesheets can be applied from the command line with xsltproc: xsltproc stylesheet.xsl input.xml > output.xml

Scripts,, (all German) and (English) download all live text commentaries from one season or tournament. The output format (one file per season) is compatible with, a Java package that prepares data for AutoChirp, a web application for automated, prescheduled tweets. (German) and (English) download all match reports of one season (tested with Bundesliga on kicker and Premier League on sportsmole; other competitions might require minor changes to the regexes).
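To give a rough idea of what "minor changes to the regexes" can mean, here is a minimal sketch of regex-based extraction. It is written in Python for illustration only and is not taken from the Perl scripts. The HTML fragment, the class names, the pattern and the output element names (`match`, `event`, `minute`, `text`) are all assumptions; real sites and the real output files will differ.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical HTML fragment; real sites use different markup.
SAMPLE_HTML = """
<div class="event"><span class="minute">12</span> <p>Shot from distance, saved.</p></div>
<div class="event"><span class="minute">45</span> <p>Goal! 1:0 for the home side.</p></div>
"""

# Hypothetical pattern: group 1 captures the minute, group 2 the commentary text.
# Adapting a scraper to another site usually means adjusting a pattern like this.
EVENT_RE = re.compile(
    r'<span class="minute">(\d+)</span>\s*<p>(.*?)</p>', re.DOTALL
)


def html_to_xml(html: str) -> ET.Element:
    """Collect every (minute, text) match into one <match> element."""
    root = ET.Element("match")
    for minute, text in EVENT_RE.findall(html):
        event = ET.SubElement(root, "event")
        ET.SubElement(event, "minute").text = minute
        ET.SubElement(event, "text").text = text.strip()
    return root


match_xml = html_to_xml(SAMPLE_HTML)
print(ET.tostring(match_xml, encoding="unicode"))
```

If a site changes its markup or another competition uses a different page layout, only the pattern should need to change in a setup like this; the XML-building part stays the same.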

If you want to use the data with corpus linguistic applications like Corpus Workbench or AntConc, an XSL transformation that moves the metadata into XML attributes is necessary. You can use livetext.xsl and matchreport.xsl for this purpose.
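The idea of moving metadata into attributes can be sketched as follows. This is a Python illustration under stated assumptions, not a port of livetext.xsl or matchreport.xsl: the element names (`match`, `date`, `team1`, `team2`, `text`) and the sample values are hypothetical and may not match the real files.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: metadata stored as child elements of <match>.
SOURCE = """<match>
  <date>2018-05-12</date>
  <team1>Team A</team1>
  <team2>Team B</team2>
  <text>Kick-off in the first half.</text>
</match>"""

# Assumed metadata element names; the real files may use others.
METADATA_TAGS = ("date", "team1", "team2")


def metadata_to_attributes(match: ET.Element) -> ET.Element:
    """Turn each metadata child element into an attribute of <match>."""
    for tag in METADATA_TAGS:
        child = match.find(tag)
        if child is not None:
            match.set(tag, (child.text or "").strip())
            match.remove(child)
    return match


match = metadata_to_attributes(ET.fromstring(SOURCE))
print(ET.tostring(match, encoding="unicode"))
```

After the transformation, only the running text remains as element content, while the metadata sits in attributes on the enclosing element, which is the shape the paragraph above describes as necessary for such tools.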

Use Cases

To see some examples how the data can be used, please visit, or


Simon Meier-Vieracker, Berlin (


This project is licensed under the GNU General Public License v2.0