This is the developer documentation for the Farmsubsidy website, located under the following URL:
Sources for the website can be found on GitHub:
The website is built with Django as a web framework.
The following list contains only the most central requirements, to give an overview of which software components are used. For a complete overview, have a look at the requirements.txt file on GitHub.
- Django 1.5.x
- Haystack 2.0.x for search (GitHub Fork with modifications)
- django-registration for user login
- django-piston for the API
The website uses South for DB migrations/changes.
1) Get a copy of the project
Git clone the project:
git clone git@github.com:openspending/Farm-Subsidy.git  # or use HTTPS
2) Install requirements
Create a virtualenv, activate it and install the requirements with pip:

virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
3) Configure Django
The Django project is located in the web folder. The settings are split into two separate files: one contains settings which shouldn't change in a deployment, while settings.py.template contains settings which should be adapted in a new deployment. Create a copy of settings.py.template and adapt the settings to your needs.
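For example, the database section of the copied settings file might look roughly like this. This is only a sketch for a local test setup: the NAME matches the database created in the PostgreSQL step below, while user, password, host and port are placeholder assumptions to adapt to your deployment.

```python
# Placeholder database settings for a local PostgreSQL test setup.
# NAME matches the database created in the PostgreSQL step; the other
# values are assumptions and must be adapted to your deployment.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "farm_geo",
        "USER": "",      # empty works for a trust-authenticated local setup
        "PASSWORD": "",  # don't leave this empty in production
        "HOST": "localhost",
        "PORT": "5432",
    }
}
```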
4) Install PostgreSQL
The Farmsubsidy code uses some SQL syntax which is not compatible with SQLite, and the website is intended to handle/present large amounts of data, so you have to start directly with a full database server and skip a test installation with SQLite. Since Farmsubsidy is built and tested with PostgreSQL, a PostgreSQL installation is recommended.
If you don't have much experience with installing databases: it's not as painful as you might think. On the Mac, for example, there is an app which can be installed and is up and running with one click: http://postgresapp.com/
Connect to the server with psql and create a new DB with:
CREATE DATABASE farm_geo;
If you are just running a test installation on localhost without a username and password (don't do that in production), this should already do the trick!
5) Sync/migrate the DB
Since there is an old GeoDjango dependency in the South migrations, early migrations won't work without hassle, so sync all apps with:

cd web
python manage.py syncdb --all

To get South back to work again, first list all apps which use migrations:
python manage.py migrate --list
Then do fake migrations to the latest migration for all apps, e.g.:
python manage.py migrate data LATESTMIGRATIONNUMBER --fake
6) Install Haystack backend
If you use Whoosh as a backend for Haystack, you have to install it (an older version due to dependencies):
pip install whoosh==2.4
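With Whoosh installed, the Haystack connection in the settings might look roughly like this. This is only a sketch of a typical Haystack 2.x configuration; the index path is a placeholder, and the actual values used by the project should be checked in settings.py.template.

```python
# Sketch of a Haystack 2.x connection using the Whoosh backend.
# The index PATH is a placeholder and should point to a writable directory.
HAYSTACK_CONNECTIONS = {
    "default": {
        "ENGINE": "haystack.backends.whoosh_backend.WhooshEngine",
        "PATH": "/path/to/whoosh_index",  # placeholder, adapt this
    },
}
```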
7) Temporary: create payment_totals.txt
This is due to some legacy code and will be removed as soon as possible:
Create a text file data/stats/payment_totals.txt (relative to the repository root, not the web folder) and enter some fake numbers.
8) Run the server
Run the development server with:
python manage.py runserver
You should be able to see the farm subsidy website under the URL provided and enter the admin area.
Execute the following SQL manually in case your columns don't fit (it can't be migrated):
ALTER TABLE data_recipient ALTER COLUMN total SET DEFAULT 0.0;
ALTER TABLE data_countryyear ALTER COLUMN total SET DEFAULT 0.0;
ALTER TABLE data_recipientyear ALTER COLUMN total SET DEFAULT 0.0;
ALTER TABLE data_scheme ALTER COLUMN total SET DEFAULT 0.0;
ALTER TABLE data_schemeyear ALTER COLUMN total SET DEFAULT 0.0;
ALTER TABLE data_recipientschemeyear ALTER COLUMN total SET DEFAULT 0.0;
ALTER TABLE data_totalyear ALTER COLUMN total SET DEFAULT 0.0;
This is needed to make the total columns default to 0.0.
Like all Django projects, the Farmsubsidy website is organized into different Django apps. Here is a list of the existing apps with a short description. Don't take the Importance column too seriously; it is just for rough orientation:
=========== =============================== ============== ======== ==========
App         Description                     URLs           Status   Importance
=========== =============================== ============== ======== ==========
api         API for farmsubsidy             /api/          inactive +
countryinfo App for transparency index      /transparency/ active   ++
data        Central app, data structure     /, /ES/*       active   +++
features    News and reports app            /news/*        active   +
frontend    Annotation management for users /myaccount/*   active   +
listmaker   Experimental, recipient lists   /lists/*       inactive +
petition    Special petition app, ignore    /petition/*    inactive o
=========== =============================== ============== ======== ==========

In addition, there are the following folders:

=========== ========================================
Folder      Description
=========== ========================================
locale      Minimal French localization file, ignore
misc        Small helper classes and functions
templates   Folder for common templates
=========== ========================================
You can find the main data structure in the data app. The core models are:
A recipient is a receiver of subsidy payments and is in most cases a company or a governmental body. There are no unique recipient IDs provided by the EU, so the IDs are generated internally by the system. The central identifying attribute for the recipient is the name, though there will sometimes be double entries for the same entities due to inconsistencies in the source data. Most other information like address data or geo information is not mandatory.
A scheme identifies a type of payment. Since the structure of the EU subsidy system has changed over the years, you can also find different types of schemes for the payments; examples are:
- Export subsidies
- Market regulations
- School Milk (yeah, healthy :-))
In recent years, the dominating schemes have been:
- European Agricultural Fund for Rural Development (EAFRD)
- Direct payments under European Agricultural Guarantee Fund (EAGF direct)
- Other payments under European Agricultural Guarantee Fund (EAGF other)
See also the :ref:`background` chapter for where to read about this.
A payment is a paid subsidy for a certain recipient, connected with an existing scheme for a specific year. There can be several payments per year for different schemes for the same recipient.
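The relationship between the three models can be sketched with plain dataclasses. This is only an illustration of the data structure described above, not the actual Django model definitions; any field beyond id, name and address is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Recipient:
    id: str            # internal ID -- the EU provides no unique recipient IDs
    name: str          # the central identifying attribute
    address: str = ""  # address and geo information are not mandatory

@dataclass
class Scheme:
    id: int
    name: str  # e.g. "EAFRD" or "EAGF direct"

@dataclass
class Payment:
    recipient: Recipient
    scheme: Scheme
    year: int
    amount: float

# Several payments per year for different schemes for the same recipient:
farm = Recipient(id="AT1", name="Example Farm")
payments = [
    Payment(farm, Scheme(1, "EAFRD"), 2012, 1000.0),
    Payment(farm, Scheme(2, "EAGF direct"), 2012, 250.0),
]
```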
Loading aggregated data (up to year 2012)
For Farmsubsidy there are aggregated data files up to the farm subsidy data for 2012.
Download the data
You can download the aggregated data files in CSV format under the following URL:
Data for a single country is provided in a packaged format, e.g.:
Put the data in the data folder in the following format:
You need the following files there:
Import the data
Now you can import the data with custom Django management commands, e.g. for Austria:
python manage.py copier -c AT  # takes some time...
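If you want to import several countries in one go, the calls can be scripted. This is just a convenience sketch; the country codes below are examples, not a complete list of available data files.

```python
# Example ISO country codes; use the ones you actually downloaded data for.
COUNTRIES = ["AT", "BE", "DK"]

def copier_commands(countries):
    """Build one `python manage.py copier` call per country code."""
    return ["python manage.py copier -c %s" % code for code in countries]

for command in copier_commands(COUNTRIES):
    print(command)  # pipe into a shell, or run each via subprocess
```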
Loading year-by-year data (year 2013 or newer)
Starting with the data for 2013 there are some changes in the data integration process going along with the introduction of the new Farmsubsidy GitHub scraper repository.
Data is now scraped and stored on a year-by-year basis (see: :ref:`scraper_data_format`) and has to be put in the data format for import in the following form:
There is a new management command load_year_data in the data app of the Farmsubsidy sources which can be used like this:
python manage.py load_year_data COUNTRY YEAR DELIMITER [--simulate] [--ignore-existing]
This management command loads data from the new simplified data format. It tries to match recipients by the name attribute and connects a payment either to a matched recipient or creates a new one if no match was found. You can run the command with the --simulate option to get an impression of how many recipients would be matched. The --ignore-existing option lets you ignore already existing entries for the given year and country in the DB; otherwise there would be an error message.
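Conceptually, the matching works roughly like this. This is a strongly simplified, hypothetical sketch of the idea, not the actual implementation (which works against the database); see load_year_data.py on GitHub for the real code.

```python
def match_or_create(recipients_by_name, name):
    """Match a payment's recipient by name, creating a new one on a miss.

    recipients_by_name: dict mapping recipient names to recipient records;
    in the real command this is a database lookup, not a dict.
    """
    recipient = recipients_by_name.get(name)
    if recipient is not None:
        return recipient, True   # matched an existing recipient
    recipient = {"name": name}   # simplified stand-in for a new DB record
    recipients_by_name[name] = recipient
    return recipient, False

recipients = {"Example Farm": {"name": "Example Farm"}}
_, matched = match_or_create(recipients, "Example Farm")   # matched == True
_, matched2 = match_or_create(recipients, "Another Farm")  # matched2 == False
```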
This management command is still in a beta stage. If you use it for integration of data in the production deployment, please check how the data is integrated, whether everything is in the right place and whether format, attributes and number of payments are correct. Have a look at the code on GitHub and correct it if necessary!
Note that there is also a new ID format called ZID for new payment entries. This is for easier ordering and determining the latest IDs, since IDs are stored in text format (ahem :-)) at the moment, which leads to orderings like this: "GB1, GB892, GB99". ZIDs are stored in a format like this: "[COUNTRY_CODE]Z[ID number zero-padded to 7 digits]", leading to orderings like: "GBZ0000001, GBZ0000099, GBZ0000892".
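The format can be expressed in a few lines of Python. This is a sketch of the format described above, not the project's actual code:

```python
def make_zid(country_code, number):
    """Build a ZID: country code + 'Z' + ID number zero-padded to 7 digits."""
    return "%sZ%07d" % (country_code, number)

# Zero-padding makes text ordering agree with numeric ordering:
ids = sorted([make_zid("GB", 892), make_zid("GB", 1), make_zid("GB", 99)])
# ids == ["GBZ0000001", "GBZ0000099", "GBZ0000892"]
```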
Please be careful here. It is not yet fully determined whether the introduction of a new ID format has negative hidden side effects in other places (if you know, drop a note). At the moment ZIDs are also quite (too) short due to a currently existing limitation of max_length=10 for the ID fields.
Post-integration data processing
At the moment there is some data denormalization going on, reorganizing the data into different tables for performance purposes:

python manage.py normalize -c AT  # takes even longer...
Repeat that for every country or test with data for just one country.
Run a VACUUM VERBOSE ANALYZE on all database tables afterwards (make sure you are connected to the correct database beforehand).
Now you should be able to browse the imported data on the local website and see the list of recipients in the Django admin area.
Update the search index
When all/some countries are imported, run search indexing:
python manage.py fs_update_index  # yes, you guessed it, don't drink too much coffee :-)...
# For this step you can definitely go away and do something else.
Now you should be able to use the search box on the website and get some results.
Update total payments number
After this you can update the total payments number on the front page like this:
python manage.py payment_totals #This is quick. Whew. :-)
Test coverage is poor, but new tests are being written all the time, as my resolution is not to fix any bug without writing a test for it first.
Some tests only test code, but mostly the tests are there to make sure the database is being processed correctly in the (de)normalization process.
Because there is quite a large dataset (to make testing better), it's highly recommended to set up a persistent test database and use the persistent test runner from Django Test Utils.
The initial data for the recipient, payment and scheme models is found in ./web/data/fixtures/data.sql. This should be loaded into the test_[db_name] database before running the tests.
Below are the steps that should be taken, assuming the code is actually running:
- Adjust settings.py (see the comment there)
- Create the test database somehow. I find this is easiest done by running ./manage.py testserver, as this doesn't destroy the database on exit. You could also prefix the database name in settings with test_, run syncdb and then change it back again.
- Load the data in ./web/data/fixtures/data.sql into the new database. This isn't added automatically because of the time it takes to run tests without the persistent database.
Changelog for the development of the website.
Current Changes (version not yet determined) (2014-03-15)
- Added new section in docs for website development documentation (see: :ref:`website`)
- Added detailed installation instructions for website/DB deployment (see: :ref:`website_installation`)
- Integrated fragmented doc files of the GitHub repository into the new section
- Added source code description in docs with app overview (see: :ref:`website_source_overview`)
- Added information about how to load data in the DB (see: :ref:`website_loading_data`)
- Added new management command load_year_data in the data app on GitHub for loading year-specific data in the new data format, starting with the 2013 data. Data loading can be simulated with --simulate; new recipients are matched by name attribute against existing recipients. New ZID ID format for recipients. (see load_year_data.py file on GitHub)
- Added documentation about how to use the management command load_year_data, with additional info about its current state and precautions when using it (see: :ref:`loading_year_by_year_data`)