Merge pull request #1 from open-contracting/dev-pr
OCDS Kingfisher Process - first commit.
odscjames committed Jan 8, 2019
2 parents d954f0a + ed16e9a commit f389ce4
Showing 52 changed files with 2,016 additions and 1 deletion.
3 changes: 3 additions & 0 deletions .flake8
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
[flake8]
exclude = venv/, .ve/, data/, src/
max-line-length = 160
16 changes: 16 additions & 0 deletions .gitignore
@@ -0,0 +1,16 @@
*.sqlite3
*.swp
*.mo
*~
.ve
*.pyc
__pycache__
media
.coverage
htmlcov
docs/_build
.cache/*
.hypothesis/*
.pytest_cache
venv/
data/
24 changes: 24 additions & 0 deletions .travis.yml
@@ -0,0 +1,24 @@
sudo: false
addons:
  chrome: stable
  postgresql: "10"
  apt:
    packages:
      - postgresql-10
      - postgresql-client-10
env:
  global:
    - PGPORT=5433
    - KINGFISHER_PROCESS_DB_URI='postgres:///travis'
services:
  - postgresql
language: python
python:
  - "3.5"

install:
  - "pip install -r requirements.txt"
  - "pip install flake8"
script:
  - "flake8 ocdskingfisherprocess/ ocdskingfisher-process-cli tests"
  - "py.test"
29 changes: 29 additions & 0 deletions LICENSE
@@ -0,0 +1,29 @@
BSD 3-Clause License

Copyright (c) 2018, Open Contracting Data Standard
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
2 changes: 1 addition & 1 deletion README.md
@@ -1 +1 @@
# kingfisher-process
# OCDS Kingfisher
15 changes: 15 additions & 0 deletions docs/cli-check-collection.rst
@@ -0,0 +1,15 @@
Command line tool - check-collection option
===========================================

This command checks all data so far in a collection.

It can be run multiple times on a collection, and data already checked will not be rechecked.

Pass the ID of the collection you want checked. Use :doc:`cli-list-collections` to look up the ID you want.

.. code-block:: shell-session

    python ocdskingfisher-process-cli check-collection 17

TODO write about checking different schema versions here - but how that works is about to change, so no point documenting it now.
8 changes: 8 additions & 0 deletions docs/cli-list-collections.rst
@@ -0,0 +1,8 @@
Command line tool - list-collections option
===========================================

This command lists all the collections this install of the app knows about.

.. code-block:: shell-session

    python ocdskingfisher-process-cli list-collections
14 changes: 14 additions & 0 deletions docs/cli-upgrade-database.rst
@@ -0,0 +1,14 @@
Command line tool - upgrade-database option
===========================================

This command sets up the tables and structure in the PostgreSQL database from scratch, or updates them to the latest version.

.. code-block:: shell-session

    python ocdskingfisher-process-cli upgrade-database

If you want to delete all the existing tables before setting up empty tables, pass the `deletefirst` flag.

.. code-block:: shell-session

    python ocdskingfisher-process-cli upgrade-database --deletefirst
19 changes: 19 additions & 0 deletions docs/cli.rst
@@ -0,0 +1,19 @@
Command line tool
=================


You can use the tool with the provided CLI script. It has several subcommands.

You can pass the `verbose` flag to any subcommand to get more output printed to the terminal.

.. code-block:: shell-session

    python ocdskingfisher-process-cli --verbose run ...

.. toctree::

    cli-upgrade-database.rst
    cli-list-collections.rst
    cli-check-collection.rst

5 changes: 5 additions & 0 deletions docs/conf.py
@@ -0,0 +1,5 @@
master_doc = 'index'

project = 'OCDS Kingfisher Process Tool'
copyright = '2018, Open Contracting Data Standard'

45 changes: 45 additions & 0 deletions docs/config.rst
@@ -0,0 +1,45 @@
Configuration
=============

Database Configuration
----------------------

PostgreSQL database settings can be set using a `~/.config/ocdskingfisher-process/config.ini` file. A sample file is included in the
main directory.


.. code-block:: ini

    [DBHOST]
    HOSTNAME = localhost
    PORT = 5432
    USERNAME = ocdsdata
    PASSWORD = FIXME
    DBNAME = ocdsdata

The tool will also attempt to load the password from a `~/.pgpass` file, if one is present.

You can also set the `KINGFISHER_PROCESS_DB_URI` environment variable to use a custom PostgreSQL server, for example
`postgresql://user:password@localhost:5432/dbname`.

The order of precedence is (from least-important to most-important):

- config file
- password from `~/.pgpass`
- environment variable
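
The precedence listed above can be illustrated with a short Python sketch. This is not the tool's actual code: the function name `resolve_db_settings` and its plain-dictionary inputs are assumptions made for illustration, and the `~/.pgpass` lookup is reduced to a password that has already been extracted (or `None`).

```python
def resolve_db_settings(config_file, pgpass_password, env):
    """Return database settings after applying the documented precedence.

    All three arguments are hypothetical stand-ins: the [DBHOST] values as a
    dict, a password already read from ~/.pgpass (or None), and the process
    environment as a dict.
    """
    # Least important: values from the [DBHOST] section of config.ini.
    settings = dict(config_file)
    # Next: a password found in ~/.pgpass replaces the one from the config file.
    if pgpass_password is not None:
        settings["PASSWORD"] = pgpass_password
    # Most important: the environment variable overrides everything else.
    uri = env.get("KINGFISHER_PROCESS_DB_URI")
    if uri:
        return {"URI": uri}
    return settings
```

Calling it with only a config dictionary returns those values unchanged; supplying a `~/.pgpass` password changes only `PASSWORD`; supplying the environment variable makes the URI win outright.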

Web Configuration
-----------------

TODO write up the API Key - notes: KINGFISHER_PROCESS_WEB_API_KEYS env var or [WEB] API_KEYS= in ini. Comma separated.

Logging Configuration
---------------------

This tool will provide additional logging information using the standard Python logging module, with loggers in the "ocdskingfisher"
namespace.

When using the command line tool, logging can be configured by creating a `~/.config/ocdskingfisher-process/logging.json` file.
A sample one is included in the main directory.
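
As a rough illustration of how a JSON logging file can be applied with the standard library, the sketch below loads a file and passes it to `logging.config.dictConfig`. It assumes the file follows the standard `dictConfig` schema, which this page does not state; the helper name `configure_logging` is hypothetical and is not part of the tool.

```python
import json
import logging
import logging.config
import os


def configure_logging(path):
    """Apply a JSON logging configuration if the file exists.

    Assumes (not confirmed by the docs) that the file uses the schema
    accepted by logging.config.dictConfig. Returns True if it was applied.
    """
    if not os.path.isfile(path):
        return False
    with open(path) as f:
        logging.config.dictConfig(json.load(f))
    return True
```

Under that assumption, a file such as `{"version": 1, "loggers": {"ocdskingfisher": {"level": "DEBUG"}}}` would raise the verbosity of every logger in the "ocdskingfisher" namespace.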

54 changes: 54 additions & 0 deletions docs/data-model.rst
@@ -0,0 +1,54 @@
Data Model
==========

Collections
-----------

A collection is a set of data that is handled separately from other collections.

A collection is defined uniquely by a combination of all the variables listed below.

* Name. A String. Can be anything you want.
* Date. The date the collection started.
* Sample. A Boolean flag.

A collection is also given a numeric ID.
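
The identity rule above can be sketched in Python. This is an illustration only: `CollectionKey` and `CollectionRegistry` are hypothetical names, and an in-memory dictionary stands in for the database table that would really hold collections.

```python
from collections import namedtuple
from datetime import datetime

# The three variables that together identify a collection.
CollectionKey = namedtuple("CollectionKey", ["name", "date", "sample"])


class CollectionRegistry:
    """Hypothetical in-memory stand-in for the collection table."""

    def __init__(self):
        self._ids = {}

    def get_or_create(self, name, date, sample=False):
        """Return the numeric ID for this (name, date, sample) combination."""
        key = CollectionKey(name, date, sample)
        if key not in self._ids:
            # Assign the next numeric ID to a combination not seen before.
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]


registry = CollectionRegistry()
started = datetime(2019, 1, 8, 12, 0, 0)
full_id = registry.get_or_create("example_source", started, sample=False)
sample_id = registry.get_or_create("example_source", started, sample=True)
```

Here `full_id` and `sample_id` differ because the sample flag is part of the key, while repeating the first call returns `full_id` again.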

Files
-----

Each collection contains one or more files.

Each file is uniquely identified in a collection by its file name.

Data Types for Files
--------------------

When giving a file to this software to load, you must specify a data type. This can be:

* record - the file is a record.
* release - the file is a release.
* record_package - the file is a record package.
* release_package - the file is a release package.
* record_package_json_lines - the file is JSON lines, and every line is a record package
* release_package_json_lines - see last entry, but release packages.
* record_package_list - the file is a list of record packages. eg [ { record-package-1 } , { record-package-2 } ]
* release_package_list - see last entry, but release packages.
* record_package_list_in_results - the file is a list of record packages in the results attribute. eg { 'results': [ { record-package-1 } , { record-package-2 } ] }
* release_package_list_in_results - see last entry, but release packages.

Items
-----

Each file contains one or more items, where an item is a piece of OCDS data - a release, record, release package or record package.

Some files contain only one item.

Some files contain many items. For example:

* JSON Lines files
* A file downloaded from an API where the file is a JSON object that contains a list of records. eg http://www.contratosabiertos.cdmx.gob.mx/api/contratos/array

Each item has an integer number, which gives the order in which the items appear in the file.

Each item is uniquely identified in a file by its number.
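
How the data types listed under "Data Types for Files" relate to numbered items can be sketched as follows. This is a simplified illustration, not the tool's loader: the function name `split_into_items` is hypothetical, and starting the numbering at 0 is an assumption.

```python
import json


def split_into_items(data_type, text):
    """Return a list of (number, item) pairs for the contents of one file."""
    if data_type.endswith("_json_lines"):
        # JSON Lines: one package per non-empty line.
        items = [json.loads(line) for line in text.splitlines() if line.strip()]
    elif data_type.endswith("_list_in_results"):
        # The packages sit in a list under the "results" attribute.
        items = json.loads(text)["results"]
    elif data_type.endswith("_list"):
        # The whole file is a JSON list of packages.
        items = json.loads(text)
    else:
        # record, release, record_package, release_package: a single item.
        items = [json.loads(text)]
    # Number the items in the order they appear (starting at 0 is an assumption).
    return list(enumerate(items))
```

With this sketch, a `release_package` file always produces exactly one pair, while a `record_package_json_lines` file produces one pair per non-empty line.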
22 changes: 22 additions & 0 deletions docs/development.rst
@@ -0,0 +1,22 @@
Development
===========

Run tests
---------

Run `py.test` from the root directory.

The tests will drop and create the database, so you probably want to specify a special testing database with an environment variable - see :doc:`config`.


Main Database - Postgresql
--------------------------

Create DB migrations with Alembic - http://alembic.zzzcomputing.com/en/latest/

.. code-block:: shell-session

    alembic --config=mainalembic.ini revision -m "message"

Add your changes to the new migration, and make sure you also update the table structures and delete_tables in database.py.

15 changes: 15 additions & 0 deletions docs/index.rst
@@ -0,0 +1,15 @@
OCDS Kingfisher Process tool
============================

OCDS Kingfisher Process is a tool for storing and analysing data from publishers of the Open Contracting Data Standard.

(It does not download data - for that, see the Scrape part of Kingfisher)

.. toctree::

    data-model.rst
    requirements-install.rst
    config.rst
    cli.rst
    web.rst
    development.rst
52 changes: 52 additions & 0 deletions docs/requirements-install.rst
@@ -0,0 +1,52 @@
Requirements and Install
========================

Requirements
------------

Requirements:

- Python v3.5 or higher
- PostgreSQL v10 or higher

Requirements for website
------------------------

Requirements:

- A Web Server capable of running a WSGI Python app

Installation
------------

Set up a venv and install requirements:

.. code-block:: shell-session

    virtualenv -p python3 .ve
    source .ve/bin/activate
    pip install -r requirements.txt
    pip install -e .

Database
--------

You need to create a UTF8 Postgresql database and create a user with write access.

Once you have created the database, you need to configure the tool to connect to the database.

You can see one way of doing that in the example below, but for other options see :doc:`config`.

You also have to run a command to create the tables in the database.

You can see the command in the example below, but for more on that see :doc:`cli-upgrade-database`.

Example of creating a database user and a database, and setting up the schema:

.. code-block:: shell-session

    sudo -u postgres createuser ocdskingfisher --pwprompt
    sudo -u postgres createdb ocdskingfisher -O ocdskingfisher --encoding UTF8 --template template0 --lc-collate en_US.UTF-8 --lc-ctype en_US.UTF-8
    export KINGFISHER_PROCESS_DB_URI='postgres://ocdskingfisher:PASSWORD YOU CHOSE@localhost/ocdskingfisher'
    python ocdskingfisher-process-cli upgrade-database
1 change: 1 addition & 0 deletions docs/requirements.txt
@@ -0,0 +1 @@
## This file is a hack to make Read The Docs Work