rufuspollock/dataset-gla

This repository contains code and data related to Greater London Authority (GLA) spending. Its primary purpose is to prepare the openly available GLA data for loading into OpenSpending.

See also this blog post: http://schoolofdata.org/2013/03/26/using-sql-for-lightweight-data-analysis/

Data

Consolidated data is in data/all.csv. For the schema see datapackage.json.

Data Preparation

Do the following steps:

  1. Pull down a copy of the data:

     node scripts/scrape.js
    
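  2. Symlink the directory with the downloaded data to archive/latest. A relative link target is resolved against the directory that holds the link (archive/), so give only the directory name:

     ln -s {current-date} archive/latest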
    
  3. Clean the data:

     node scripts/process.js
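
Step 2 can also be done from Node. The following is only a sketch, not part of the repository's scripts: `linkLatest` and its parameters are hypothetical names, and the naming of the dated download directory is an assumption. It uses the standard `fs` calls and removes an existing `latest` link before creating the new one.

```javascript
const fs = require('fs');
const path = require('path');

// Point <archiveDir>/latest at <archiveDir>/<dateDir>, replacing any old link.
// Both arguments are illustrative; the real directory names come from scrape.js.
function linkLatest(archiveDir, dateDir) {
  const linkPath = path.join(archiveDir, 'latest');
  if (!fs.existsSync(path.join(archiveDir, dateDir))) {
    throw new Error('No such download directory: ' + dateDir);
  }
  // Use lstat (not stat) so that a dangling link is still found and removed.
  try {
    if (fs.lstatSync(linkPath).isSymbolicLink()) fs.unlinkSync(linkPath);
  } catch (err) {
    if (err.code !== 'ENOENT') throw err;
  }
  // The target is resolved relative to the link's own directory, so it is
  // just the directory name rather than "archive/<name>".
  fs.symlinkSync(dateDir, linkPath, 'dir');
  return linkPath;
}
```

Calling `linkLatest('archive', '{current-date}')` with the real directory name would then replace the manual `ln -s` step.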
    

Some Background

This data is pretty horrible. The current 65 files (summer 2013) use roughly 20 or more different CSV structures between them. See scripts/process.js for the gory details.

CSV files are listed on http://www.london.gov.uk/mayor-assembly/gla/spending-money-wisely/budget-expenditure-charges/expenditure-over-250

That site states:

The Mayor is committed to providing financial transparency. In 2008 he instructed that regular reports should be published on all GLA expenditure over £1,000 (including VAT). From summer 2010 the reporting threshhold was reduced to £500 (including VAT), and from Period 1 2011/12 the reporting threshhold was changed to £500 excluding VAT. From Period 2 2012/13 onwards the reporting threshold was changed to £250 excluding VAT.

From Period 4 2012/13 onwards the report includes expenditure from the GLA's subsidiary, GLA Land & Property Ltd.

There are more than 60 CSV files as of July 2013 (a list can be found in scrape.json).

Unfortunately the "format" varies substantially between files, not only in the fields present but also in details such as the number of blank columns and blank lines.
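
To illustrate the kind of clean-up the blank columns and blank lines call for, here is a minimal sketch. It is not the logic in scripts/process.js; the function names are hypothetical, and it assumes the CSV has already been parsed into an array of rows, each an array of strings.

```javascript
// True when a cell carries no data.
function isBlank(cell) {
  return cell === undefined || cell === null || String(cell).trim() === '';
}

// Drop rows and columns that are empty throughout.
// `rows` is an array of arrays of strings (an assumption about the parser output).
function stripBlankRowsAndColumns(rows) {
  // Remove rows in which every cell is empty (including zero-length rows).
  const nonBlankRows = rows.filter((row) => !row.every(isBlank));

  // Use the widest row so that ragged rows are handled too.
  const width = nonBlankRows.reduce((max, row) => Math.max(max, row.length), 0);

  // Keep only the column indexes that hold a value in at least one row.
  const keep = [];
  for (let col = 0; col < width; col++) {
    if (nonBlankRows.some((row) => !isBlank(row[col]))) keep.push(col);
  }

  // Cells missing from short rows become empty strings; values are trimmed.
  return nonBlankRows.map((row) =>
    keep.map((col) => (isBlank(row[col]) ? '' : String(row[col]).trim()))
  );
}
```

A column is dropped only if it is empty in every remaining row, so a sparsely filled but real field survives.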

A summary can be found in this Data Explorer gist.

Aside: from the presence of "SAP Document No" field in several of the CSVs it appears likely that the GLA are using SAP for their accounting systems.

Errors

  • (July 2013) Period 8 2012/13 is an HTML file showing a 403 Access Denied from someone's login session
    • (March 2013) Bad file for Period 8 2012/13 (13 October - 10 November). The file is not named in the usual way "Mayor's%20250%20Report%20-%202012-13%20-%20P8%20%20-%20Final.csv" and appears to be an Excel file that was not converted to CSV!
  • Amounts are formatted with "," as a thousands separator, so parsers read them as strings rather than numbers.
  • Dates vary substantially in format from "16 Mar 2011" in this file to "21.01.2010" in January 2010 data
  • Use of (978) to indicate negative amounts rather than -978
  • Repeated data in 2012-13-P4 file
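
The amount and date problems above can be normalised along the following lines. This is a minimal sketch and not the code in scripts/process.js: `parseAmount` and `parseDate` are hypothetical helpers, they cover only the formats quoted in the list, and the day.month.year reading of "21.01.2010" is an assumption. Anything unrecognised comes back as null so it can be inspected by hand.

```javascript
// Convert "1,234.56" or "(978)" to a number; return null if unrecognised.
function parseAmount(text) {
  const trimmed = String(text).trim();
  // Accounting notation: a value wrapped in parentheses is negative.
  const inParens = /^\(.*\)$/.test(trimmed);
  // Drop parentheses, thousands separators, pound signs and whitespace.
  const digits = trimmed.replace(/[(),£\s]/g, '');
  if (!/^-?\d+(\.\d+)?$/.test(digits)) return null;
  const value = Number(digits);
  return inParens ? -Math.abs(value) : value;
}

const MONTHS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
                'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];

function pad(n) {
  return String(n).padStart(2, '0');
}

// Convert "16 Mar 2011" or "21.01.2010" to an ISO "YYYY-MM-DD" string.
// Only the shape is checked; day and month ranges are not validated.
function parseDate(text) {
  const trimmed = String(text).trim();
  // "21.01.2010", assumed to be day.month.year.
  let m = /^(\d{1,2})\.(\d{1,2})\.(\d{4})$/.exec(trimmed);
  if (m) return `${m[3]}-${pad(m[2])}-${pad(m[1])}`;
  // "16 Mar 2011"; longer month names such as "March" are also accepted.
  m = /^(\d{1,2}) ([A-Za-z]{3})[a-z]* (\d{4})$/.exec(trimmed);
  if (m) {
    const month = MONTHS.indexOf(m[2].toLowerCase()) + 1;
    if (month > 0) return `${m[3]}-${pad(month)}-${pad(m[1])}`;
  }
  return null;
}

console.log(parseAmount('(978)'));       // -978
console.log(parseAmount('1,234.56'));    // 1234.56
console.log(parseDate('16 Mar 2011'));   // 2011-03-16
console.log(parseDate('21.01.2010'));    // 2010-01-21
```

Returning ISO date strings rather than Date objects sidesteps time zone shifts when the values are written back to CSV.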

Plan

Repeat the Data Preparation steps each month as new data becomes available!
