Represent API: Data
Boundary files are under boundaries/. Most are stored in a directory tree matching Open Civic Data Division Identifiers (OCD-ID) starting at boundaries/ocd-division/. Federal, provincial and territorial boundary files are further scoped by redistribution year.
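The layout described above can be sketched as a small path helper. This is only an illustration: `boundary_dir` is a hypothetical function, not part of this repository's tasks, and it assumes the year is simply appended as a final path segment, as in the boundaries/ocd-division/country:ca/province:qc/2011 path used later in this document.

```python
from pathlib import PurePosixPath


def boundary_dir(ocd_id, year=None, root="boundaries"):
    """Build the directory for a division from its OCD-ID (hypothetical helper).

    Assumes the directory tree mirrors the OCD-ID and that, where a
    redistribution year applies, it is the last path segment.
    """
    path = PurePosixPath(root) / ocd_id
    if year is not None:
        path = path / str(year)
    return str(path)


# A provincial boundary file scoped by redistribution year.
print(boundary_dir("ocd-division/country:ca/province:qc", 2011))
```

Without a year, the helper returns the bare OCD-ID directory under boundaries/.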
A few boundary files exist outside the OCD-ID tree. Some, like
ca_csd, are Census geography files whose OCD-ID would clash with Canada's. Others are the sources of multiple boundary sets in the API, each with a different OCD-ID.
Open North has permission to redistribute all shapefiles in this repository. Please read the overall license and the
LICENSE.txt file in each directory to know your rights. In some cases, you will not have permission to redistribute the shapefile.
Open North lacks permission to redistribute the shapefiles of some boundary sets in the API. Refer to the
data_url of those boundary sets to get copies of those shapefiles.
All datasets are from government sources, with one exception: the postal code(OM) dataset in the
postcodes/fed directory is from Geocoder.ca. The
definition.py files have more details on sources and any modifications made to the files. Postal Code(OM) is an official mark of Canada Post Corporation.
# Invoke must not be installed globally.
pip uninstall invoke
# Create a virtual environment.
mkvirtualenv representdata
# Install the requirements.
pip install -r requirements.txt flake8
npm install -g esri-dump
For all the following commands, add
--base=path/to/private/data to run them on the private repository.
Load the virtual environment:
pyenv activate representdata
List the available maintenance tasks:
Update the OCD-IDs:
curl -O https://raw.githubusercontent.com/opencivicdata/ocd-division-ids/master/identifiers/country-ca.csv
Maintain definition files
Make the code style consistent:
Check that all
definition.py files are valid:
Check that all data directories contain a
LICENSE.txt (don't run on the private repository):
Check that the source, data and license URLs work:
Find and correct the URLs in
definition.py files. If you update a
licence_url, you may need to update other occurrences in
tasks.py and this master spreadsheet. Once all corrections are made, re-run
If you update a
data_url, update its shapefile and
id_func following the instructions below.
After downloading shapefiles, but before committing, you must process the shapefiles, as described in the next step.
Check for old boundaries that may require manual updates:
Update a specific out-of-date shapefile. This task updates the
last_updated date in the
invoke shapefiles --base=boundaries/ocd-division/country:ca/province:qc/2011
Or, update all out-of-date shapefiles. The output may contain additional instructions:
Some shapefiles are online but require exceptional processing (
invoke shapefiles will report
Unrecognized extension). Remember to update
rm -f boundaries/ca_nb_wards/OGRGeoJSON.*
esri-dump http://geonb.snb.ca/ArcGIS/rest/services/GeoNB_ENB_MunicipalWards/MapServer/0 > boundaries/ca_nb_wards/wards.geojson
ogr2ogr -f "ESRI Shapefile" boundaries/ca_nb_wards boundaries/ca_nb_wards/wards.geojson
After running these commands, you may have both untracked files and deleted files. This is due to sources changing filenames. If you
git add the directory, the untracked files will be staged to be added and the deleted files will be staged to be removed.
After running these commands, you may have only modified the
definition.py file, i.e. only the
last_updated value is changed. That's also fine.
After receiving a new boundary file for all municipalities in Quebec, you need to update the
definition.py file in
- Update the filename in
- Copy the output into the appropriate section of
- Separately define the boundaries of jurisdictions whose names duplicate others' (Plessisville (32045))
- Perform the other checks in the comments of the file
After loading the boundaries into Represent, check La Tuque and Sept-Îles in particular. Delete any boundary sets from Represent that are not current.
Get information about the shapefile:
ogrinfo -al -geom=NO boundaries/ocd-division/country:ca/province:qc/2011
Determine the attribute for the feature's name and, if it exists, the attribute for the feature's public identifier.
For features that are numbered like "Ward 1", if there is no attribute for the numeric identifier, we can extract it from the name, like
id_func=lambda f: re.sub(r'\D', '', f.get('WARD')). Similarly, if there is no attribute for the name, we can build it from the numeric identifier, like
name_func=lambda f: 'Ward %s' % f.get('WARD').
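To see what the two lambdas above produce, here is a minimal sketch. It assumes the feature only needs a `.get()` method, so plain dicts stand in for real shapefile features; the `WARD` values are made up for illustration.

```python
import re

# Hypothetical features: plain dicts standing in for shapefile features.
feature_with_name = {'WARD': 'Ward 12'}   # only a name attribute exists
feature_with_number = {'WARD': 12}        # only a numeric attribute exists

# Strip every non-digit character from the name to recover the identifier.
id_func = lambda f: re.sub(r'\D', '', f.get('WARD'))

# Build the name from the numeric identifier.
name_func = lambda f: 'Ward %s' % f.get('WARD')

print(id_func(feature_with_name))      # 12
print(name_func(feature_with_number))  # Ward 12
```

Note that `id_func` returns a string, and that it would concatenate digits if a name contained more than one number.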
For features that aren't numbered like "Ward 1", determining the public identifier may be tricky: the ID should be discoverable online; no two features should have the same ID; and
OBJECTID is never the ID.
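The uniqueness rule above can be checked mechanically before committing. The sketch below is an assumption-laden illustration, not one of this repository's tasks: `duplicate_ids` is a hypothetical helper, and the features are plain dicts with a made-up `ID` attribute.

```python
from collections import Counter


def duplicate_ids(features, id_func):
    """Return the sorted IDs that id_func assigns to more than one feature."""
    counts = Counter(id_func(f) for f in features)
    return sorted(i for i, n in counts.items() if n > 1)


# Hypothetical features; 'A1' appears twice, so it is reported.
features = [{'ID': 'A1'}, {'ID': 'B2'}, {'ID': 'A1'}]
print(duplicate_ids(features, lambda f: f.get('ID')))  # ['A1']
```

An empty list means the chosen attribute gives every feature a distinct ID; it does not confirm that the ID is discoverable online.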
Read this section of the example
definition.py file for help writing a
If you're updating many shapefiles, running
ogrinfo on each may take a long time. Instead, run
../represent-canada/manage.py analyzeshapefiles -d . > manifest and then
git diff manifest to review the changes.
Once you've updated the
definition.py files to correctly extract the feature's name and public identifier, you can commit the
definition.py files and data files.
Fix file permissions:
Check if the data request process spreadsheet is out-of-date:
Or less verbose:
invoke spreadsheet --base=. --private-base=../represent-canada-private-data
Each data directory under concordances/ has a README explaining how to source and update its concordances. If the concordances are more than a year old and can't be sourced, they should be removed. To do so, substitute the corresponding values in the above READMEs for <slug> and <source> in:
fab alpheus update_concordances:args="<slug> <source> data/shapefiles/public/concordances/empty.csv"
Each data directory under postcodes/ has a README explaining how to source and update its postcodes.
We would like to express our gratitude to Kent Mewhort at the Canadian Internet Policy and Public Interest Clinic (CIPPIC), whose legal research (PDF) made it possible for this repository to be made public.