Merge 5007bc2 into 721439d
jpmckinney committed Nov 2, 2018
2 parents 721439d + 5007bc2 commit 45dd51e
Showing 26 changed files with 310 additions and 74 deletions.
12 changes: 10 additions & 2 deletions CHANGELOG.md
@@ -1,16 +1,24 @@
# Changelog

# 0.0.3
# 0.0.3 (2018-11-01)

New options:

* compile: `--versioned`
* compile: `--package`, `--versioned`

New commands:

* package-releases
* split-record-packages
* split-release-packages

Other changes:

* Add helpful error messages if:
* the input is not [line-delimited JSON](https://en.wikipedia.org/wiki/JSON_streaming) data;
* the input to the `indent` command is not valid JSON.
* Change default behavior to print UTF-8 characters instead of escape sequences.
* Add `--ascii` option to print escape sequences instead of UTF-8 characters.
* Rename base exception class from `ReportError` to `OCDSKitError`.
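
The change in default output and the new `--ascii` option correspond to the `ensure_ascii` parameter of Python's `json.dumps`. A minimal sketch of the difference, using a made-up sample value rather than real OCDS data:

```python
import json

# Hypothetical sample data, for illustration only.
data = {"title": "Adquisición de café"}

# json.dumps escapes non-ASCII characters unless told otherwise.
# This is the output that the --ascii option asks for.
escaped = json.dumps(data, separators=(',', ':'))

# With ensure_ascii=False the characters are kept as-is.
# This is the new default behavior.
utf8 = json.dumps(data, ensure_ascii=False, separators=(',', ':'))

print(escaped)
print(utf8)
```

Both strings parse back to the same data; only the serialized form differs.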

# 0.0.2 (2018-03-14)
56 changes: 49 additions & 7 deletions README.rst
@@ -3,7 +3,7 @@ OCDS Kit

|PyPI version| |Build Status| |Dependency Status| |Coverage Status|

A suite of command-line tools for working with OCDS data.
A suite of command-line tools for working with OCDS data, including: creating release packages from releases; creating record packages from release packages; creating compiled releases and versioned releases from release packages; combining small packages into large packages; splitting large packages into small packages.

::

@@ -50,6 +50,39 @@ Optional arguments for all commands are:
* ``--encoding ENCODING`` the file encoding
* ``--pretty`` pretty print output

compile
~~~~~~~

Reads release packages from standard input, merges the releases by OCID, and prints the compiled releases.

Optional arguments:

* ``--package`` wrap the compiled releases in a record package
* ``--versioned`` if ``--package`` is set, include versioned releases in the record package; otherwise, print versioned releases instead of compiled releases

::

cat tests/fixtures/realdata/release-package-1.json | ocdskit compile > out.json

package-releases
~~~~~~~~~~~~~~~~

Reads releases from standard input, and prints one release package. You will need to edit the package metadata.

Optional positional arguments:

* ``extension`` add this extension to the package

::

cat tests/fixtures/release_*.json | ocdskit package-releases > out.json

To convert record packages to a release package, you can `use jq </docs/Using_jq.md>`__ to get the releases from the record packages, and the ``package-releases`` command to print a release package. You will need to edit the package metadata.

::

cat tests/fixtures/realdata/record-package* | jq -crM .records[].releases[] | ocdskit package-releases
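
For readers without jq, the filter ``.records[].releases[]`` can be approximated in Python. This is a sketch, not part of OCDS Kit: ``iter_releases`` is a hypothetical helper, and it assumes the input is line-delimited JSON with one record package per line.

```python
import json


def iter_releases(lines):
    """Yield every release embedded in line-delimited record packages.

    Hypothetical helper. Unlike the jq filter, it skips records that
    have no 'releases' field instead of raising an error.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        package = json.loads(line)
        for record in package.get('records', []):
            for release in record.get('releases', []):
                yield release


# Made-up sample input: one record package with two releases.
sample = ['{"records":[{"ocid":"ocds-1","releases":[{"id":"1"},{"id":"2"}]}]}']

# Compact, one release per line, like jq's -c option.
compact = [json.dumps(release, separators=(',', ':')) for release in iter_releases(sample)]
for text in compact:
    print(text)
```

The printed lines can then be piped to ``ocdskit package-releases`` in the same way as the jq output.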

combine-record-packages
~~~~~~~~~~~~~~~~~~~~~~~

@@ -68,18 +101,27 @@ Reads release packages from standard input, collects releases, and prints one re

cat tests/fixtures/release-package_*.json | ocdskit combine-release-packages > out.json

compile
~~~~~~~
split-record-packages
~~~~~~~~~~~~~~~~~~~~~

Reads release packages from standard input, merges the releases by OCID, and prints the compiled releases.
Reads record packages from standard input, and prints smaller record packages for each.

Optional arguments:
::

cat tests/fixtures/realdata/record-package-1.json | ocdskit split-record-packages 2 | split -l 1 -a 4

* ``--versioned`` print versioned releases
The ``split`` command will write files named ``xaaaa``, ``xaaab``, ``xaaac``, etc. Don't combine the OCDS Kit ``--pretty`` option with the ``split`` command.

split-release-packages
~~~~~~~~~~~~~~~~~~~~~~

Reads release packages from standard input, and prints smaller release packages for each.

::

cat tests/fixtures/realdata/release-package-1.json | ocdskit compile > out.json
cat tests/fixtures/realdata/release-package-1.json | ocdskit split-release-packages 2 | split -l 1 -a 4

The ``split`` command will write files named ``xaaaa``, ``xaaab``, ``xaaac``, etc. Don't combine the OCDS Kit ``--pretty`` option with the ``split`` command.
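
The ``--pretty`` option conflicts with ``split -l 1`` because pretty-printed JSON spans many lines, so each line would end up in its own file. The splitting itself can be pictured with the sketch below. It is an assumption about the general approach, not the implementation of these commands: ``split_package`` is a hypothetical helper that copies the package metadata and slices the list of releases (or records) into chunks.

```python
from collections import OrderedDict


def split_package(package, size, key='releases'):
    """Yield copies of the package with at most `size` items under `key`.

    Hypothetical helper, for illustration only.
    """
    items = package[key]
    for start in range(0, len(items), size):
        smaller = OrderedDict(package)  # shallow copy keeps metadata and key order
        smaller[key] = items[start:start + size]
        yield smaller


# Made-up sample package with five releases.
package = OrderedDict([
    ('uri', 'http://example.com/package.json'),
    ('releases', [{'id': str(i)} for i in range(5)]),
])

parts = list(split_package(package, 2))
print([len(part['releases']) for part in parts])
```

Passing ``key='records'`` would apply the same idea to record packages.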

tabulate
~~~~~~~~
4 changes: 4 additions & 0 deletions docs/Using_the_command_line.md
@@ -8,4 +8,8 @@ Read the first 1000 bytes of a file:

head -c 1000 filename.json

Add newlines to ends of files (Fish shell):

for i in *.json; echo >> $i; end

On Windows, you may need to install [Cygwin](http://cygwin.com) to use some command-line tools. PowerShell has [some corresponding tools](http://xahlee.info/powershell/PowerShell_for_unixer.html).
4 changes: 4 additions & 0 deletions ocdskit/cli/__main__.py
@@ -15,9 +15,12 @@
'ocdskit.cli.commands.indent',
'ocdskit.cli.commands.mapping_sheet',
'ocdskit.cli.commands.measure',
'ocdskit.cli.commands.package_releases',
'ocdskit.cli.commands.schema_report',
'ocdskit.cli.commands.schema_strict',
'ocdskit.cli.commands.set_closed_codelist_enums',
'ocdskit.cli.commands.split_record_packages',
'ocdskit.cli.commands.split_release_packages',
'ocdskit.cli.commands.tabulate',
'ocdskit.cli.commands.validate',
)
@@ -26,6 +29,7 @@
def main():
parser = argparse.ArgumentParser(description='Open Contracting Data Standard CLI')
parser.add_argument('--encoding', help='the file encoding')
parser.add_argument('--ascii', help='print escape sequences instead of UTF-8 characters', action='store_true')
parser.add_argument('--pretty', help='pretty print output', action='store_true')

subparsers = parser.add_subparsers(dest='subcommand')
34 changes: 34 additions & 0 deletions ocdskit/cli/commands/base.py
@@ -1,6 +1,7 @@
import json
import io
import sys
from collections import OrderedDict


class BaseCommand:
@@ -26,6 +27,18 @@ def handle(self):
def buffer(self):
return io.TextIOWrapper(sys.stdin.buffer, encoding=self.args.encoding)

def json_load(self, io):
"""
Parses JSON from a stream.
"""
return json.load(io, object_pairs_hook=OrderedDict)

def json_loads(self, data):
"""
Parses JSON from a string.
"""
return json.loads(data, object_pairs_hook=OrderedDict)

def print(self, data):
"""
Prints JSON data.
@@ -35,5 +48,26 @@ def print(self, data):
kwargs = {'indent': 2, 'separators': (',', ': ')}
else:
kwargs = {'separators': (',', ':')}
if not self.args.ascii:
kwargs['ensure_ascii'] = False

print(json.dumps(data, **kwargs))

def _update_package_metadata(self, output, package):
output['uri'] = package['uri']
output['publishedDate'] = package['publishedDate']
output['publisher'] = package['publisher']

if 'extensions' in package:
# Python has no OrderedSet, so we use OrderedDict to keep extensions in order without duplication.
output['extensions'].update(dict.fromkeys(package['extensions'], True))

for field in ('license', 'publicationPolicy', 'version'):
if field in package:
output[field] = package[field]

def _set_extensions_metadata(self, output):
if output['extensions']:
output['extensions'] = list(output['extensions'])
else:
del output['extensions']
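
The two helpers added to `base.py` rely on an `OrderedDict` acting as an ordered set. A standalone sketch of that idiom, with made-up extension names standing in for real extension URLs:

```python
from collections import OrderedDict

# Each inner list stands for the 'extensions' field of one input package.
# The names are placeholders, not real extension URLs.
packages_extensions = [
    ['ext-a', 'ext-b'],
    ['ext-b', 'ext-c'],
    ['ext-a'],
]

extensions = OrderedDict()
for package_extensions in packages_extensions:
    # Keys are the extensions and values are ignored, so repeated
    # extensions collapse while first-seen order is preserved.
    extensions.update(dict.fromkeys(package_extensions, True))

# Converting to a list yields the keys in insertion order,
# as _set_extensions_metadata does before printing.
result = list(extensions)
print(result)
```

Updating an existing key does not move it, which is why an extension keeps the position of its first appearance.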
26 changes: 6 additions & 20 deletions ocdskit/cli/commands/combine_record_packages.py
@@ -1,4 +1,3 @@
import json
from collections import OrderedDict

from .base import BaseCommand
@@ -12,31 +11,18 @@ def handle(self):
output = OrderedDict([('extensions', OrderedDict()), ('packages', []), ('records', [])])

for line in self.buffer():
package = json.loads(line, object_pairs_hook=OrderedDict)
package = self.json_loads(line)

# Use sample metadata.
output['uri'] = package['uri']
output['publishedDate'] = package['publishedDate']
output['publisher'] = package['publisher']
self._update_package_metadata(output, package)

if 'extensions' in package:
# Python has no OrderedSet, so we use OrderedDict to keep extensions in order without duplication.
output['extensions'].update(dict.fromkeys(package['extensions'], True))
output['records'].extend(package['records'])

for field in ('license', 'publicationPolicy', 'version'):
if field in package:
output[field] = package[field]

for field in ('packages', 'records'):
if field in package:
output[field].extend(package[field])
if 'packages' in package:
output['packages'].extend(package['packages'])

if not output['packages']:
del output['packages']

if output['extensions']:
output['extensions'] = list(output['extensions'])
else:
del output['extensions']
self._set_extensions_metadata(output)

self.print(output)
21 changes: 3 additions & 18 deletions ocdskit/cli/commands/combine_release_packages.py
@@ -1,4 +1,3 @@
import json
from collections import OrderedDict

from .base import BaseCommand
@@ -12,26 +11,12 @@ def handle(self):
output = OrderedDict([('extensions', OrderedDict()), ('releases', [])])

for line in self.buffer():
package = json.loads(line, object_pairs_hook=OrderedDict)
package = self.json_loads(line)

# Use sample metadata.
output['uri'] = package['uri']
output['publishedDate'] = package['publishedDate']
output['publisher'] = package['publisher']

if 'extensions' in package:
# Python has no OrderedSet, so we use OrderedDict to keep extensions in order without duplication.
output['extensions'].update(dict.fromkeys(package['extensions'], True))

for field in ('license', 'publicationPolicy', 'version'):
if field in package:
output[field] = package[field]
self._update_package_metadata(output, package)

output['releases'].extend(package['releases'])

if output['extensions']:
output['extensions'] = list(output['extensions'])
else:
del output['extensions']
self._set_extensions_metadata(output)

self.print(output)
53 changes: 42 additions & 11 deletions ocdskit/cli/commands/compile.py
@@ -1,4 +1,3 @@
import json
from collections import defaultdict, OrderedDict

import ocdsmerge
@@ -11,20 +10,52 @@ class Command(BaseCommand):
help = 'reads release packages from standard input, merges the releases by OCID, and prints the compiled releases'

def add_arguments(self):
self.add_argument('-V', '--versioned', help='print versioned releases', action='store_true')
self.add_argument('--package', action='store_true',
help='wrap the compiled releases in a record package')
self.add_argument('--versioned', action='store_true',
help='if --package is set, include versioned releases in the record package; '
'otherwise, print versioned releases instead of compiled releases')

def handle(self):
if self.args.package:
output = OrderedDict([('extensions', OrderedDict()), ('packages', []), ('records', [])])

releases_by_ocid = defaultdict(list)

for line in self.buffer():
release_package = json.loads(line, object_pairs_hook=OrderedDict)
for release in release_package['releases']:
package = self.json_loads(line)

for release in package['releases']:
releases_by_ocid[release['ocid']].append(release)

for releases in releases_by_ocid.values():
if self.args.versioned:
merge_method = ocdsmerge.merge_versioned
else:
merge_method = ocdsmerge.merge
merged_release = merge_method(releases)
self.print(merged_release)
if self.args.package:
self._update_package_metadata(output, package)

output['packages'].append(package['uri'])

if self.args.package:
for ocid, releases in releases_by_ocid.items():
record = OrderedDict([
('ocid', ocid),
('releases', releases),
('compiledRelease', ocdsmerge.merge(releases)),
])

if self.args.versioned:
record['versionedRelease'] = ocdsmerge.merge_versioned(releases)

output['records'].append(record)

self._set_extensions_metadata(output)

self.print(output)
else:
for releases in releases_by_ocid.values():
if self.args.versioned:
merge_method = ocdsmerge.merge_versioned
else:
merge_method = ocdsmerge.merge

merged_release = merge_method(releases)

self.print(merged_release)
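
The `--package` branch above groups releases by OCID and builds one record per OCID. The sketch below reproduces that flow without the `ocdsmerge` dependency. Note the assumption: `naive_merge` is a hypothetical stand-in in which later top-level fields simply overwrite earlier ones. It does not reproduce the behavior of `ocdsmerge.merge`, and `build_records` is likewise an illustrative name.

```python
import json
from collections import OrderedDict, defaultdict


def naive_merge(releases):
    """Hypothetical stand-in for ocdsmerge.merge: last top-level value wins."""
    merged = OrderedDict()
    for release in releases:
        merged.update(release)
    return merged


def build_records(lines, merge=naive_merge):
    """Group releases from line-delimited release packages into records."""
    releases_by_ocid = defaultdict(list)
    for line in lines:
        package = json.loads(line, object_pairs_hook=OrderedDict)
        for release in package['releases']:
            releases_by_ocid[release['ocid']].append(release)

    return [
        OrderedDict([
            ('ocid', ocid),
            ('releases', releases),
            ('compiledRelease', merge(releases)),
        ])
        for ocid, releases in releases_by_ocid.items()
    ]


# Made-up input: two release packages sharing the OCID "ocds-1".
lines = [
    '{"uri":"http://example.com/1.json","releases":['
    '{"ocid":"ocds-1","id":"1","tag":["tender"]},'
    '{"ocid":"ocds-2","id":"1","tag":["tender"]}]}',
    '{"uri":"http://example.com/2.json","releases":['
    '{"ocid":"ocds-1","id":"2","tag":["award"]}]}',
]

records = build_records(lines)
print(json.dumps(records, indent=2))
```

Passing the real merge function as the `merge` argument would be the natural way to swap out the placeholder.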
7 changes: 3 additions & 4 deletions ocdskit/cli/commands/indent.py
@@ -1,7 +1,6 @@
import json
import logging
import os.path
from collections import OrderedDict

from .base import BaseCommand

@@ -27,15 +26,15 @@ def handle(self):
if name.endswith('.json'):
self.indent(os.path.join(root, name))
else:
logger.warn('{} is a directory. Set --recursive to recurse into directories.'.format(file))
logger.warning('{} is a directory. Set --recursive to recurse into directories.'.format(file))

def indent(self, path):
try:
with open(path) as f:
data = json.load(f, object_pairs_hook=OrderedDict)
data = self.json_load(f)

with open(path, 'w') as f:
json.dump(data, f, indent=self.args.indent, separators=(',', ': '))
json.dump(data, f, ensure_ascii=False, indent=self.args.indent, separators=(',', ': '))
f.write('\n')
except json.decoder.JSONDecodeError as e:
logger.error('{} is not valid JSON. (json.decoder.JSONDecodeError: {})'.format(path, e))
3 changes: 1 addition & 2 deletions ocdskit/cli/commands/mapping_sheet.py
@@ -1,6 +1,5 @@
import copy
import csv
import json
import re
import sys
from collections import OrderedDict
@@ -15,7 +14,7 @@ class Command(BaseCommand):
help = 'generates a spreadsheet with all field paths from an OCDS schema'

def handle(self):
release = json.load(self.buffer(), object_pairs_hook=OrderedDict)
release = self.json_load(self.buffer())

release = JsonRef.replace_refs(release)
