From 70760773d2e4cbd30eac13e51a3f8171c20cffed Mon Sep 17 00:00:00 2001
From: rightlag
Date: Mon, 7 Aug 2017 22:24:29 -0400
Subject: [PATCH] Update readme.md

---
 readme.md | 281 +++++++++++++++++++++++++++++-------------------------
 1 file changed, 151 insertions(+), 130 deletions(-)

diff --git a/readme.md b/readme.md
index 2507a72..4c61e24 100644
--- a/readme.md
+++ b/readme.md
@@ -7,17 +7,52 @@ Coverage Status

-> A module that parses [JSON Schema](http://json-schema.org/) documents to validate client-submitted data and convert JSON schema documents to Avro schema documents.
+> Validate client-submitted data using [JSON Schema](http://json-schema.org/) documents and convert JSON Schema documents into different data-interchange formats.
+
+## Contents
+
+- [Installation](#installation)
+- [Usage](#usage)
+- [Data Validation](#data-validation)
+- [Data Validation CLI](#data-validation-cli)
+- [Data Validation API](#data-validation-api)
+- [Structured Message Generation](#structured-message-generation)
+- [Supported Data-Interchange Formats](#supported-data-interchange-formats)
+- [Avro](#avro)
+- [Data-Interchange CLI](#data-interchange-cli)
+- [Data-Interchange API](#data-interchange-api)
+- [Testing](#testing)
+- [Additional Resources](#additional-resources)
+- [Maintainers](#maintainers)
+- [Contributing](#contributing)
+
+## Why aptos?
+
+- Validate client-submitted data
+- Convert JSON Schema documents into different data-interchange formats
+- Simple syntax
+- CLI support for data validation and JSON Schema conversion
+- [Swagger](https://swagger.io/) specification support
+
+## Installation
+
+**via git**
+
+    $ git clone https://github.com/pennsignals/aptos.git && cd aptos
+    $ python setup.py install
 
 ## Usage
 
-`aptos` supports validating client-submitted data and generates Avro structured messages from a given JSON Schema document.
+`aptos` supports the following capabilities:
+
+ - **Data Validation:** Validate client-submitted data using [validation keywords](http://json-schema.org/latest/json-schema-validation.html#rfc.section.6) described in the JSON Schema specification.
+ - **Schema Conversion:** Convert JSON Schema documents into different data-interchange formats. See the list of [supported data-interchange formats](#supported-data-interchange-formats) for more information.
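To make the Data Validation capability concrete, the sketch below shows what two common validation keywords, `required` and `minimum`, ask of an instance. It uses only the Python standard library and is not how `aptos` implements validation; the `check_instance` helper and the sample schema are hypothetical names chosen for illustration.

```python
def check_instance(schema, instance):
    """Return a list of violations for `required` and `minimum` (empty if none).

    Hypothetical helper for illustration only; it is not part of aptos.
    """
    errors = []
    # `required`: every listed property name must be present in the instance.
    for name in schema.get('required', []):
        if name not in instance:
            errors.append('missing required property %r' % name)
    # `minimum`: a present numeric value must be >= the declared bound.
    for name, subschema in schema.get('properties', {}).items():
        if name in instance and 'minimum' in subschema:
            if instance[name] < subschema['minimum']:
                errors.append(
                    '%r is less than the minimum of %r'
                    % (name, subschema['minimum'])
                )
    return errors


# Assumed sample schema, written inline so the sketch is self-contained.
person_schema = {
    'type': 'object',
    'properties': {
        'firstName': {'type': 'string'},
        'lastName': {'type': 'string'},
        'age': {'type': 'integer', 'minimum': 0},
    },
    'required': ['firstName', 'lastName'],
}

valid = {'firstName': 'John', 'lastName': 'Doe', 'age': 42}
invalid = {'firstName': 'John', 'age': -15}

print(check_instance(person_schema, valid))
print(check_instance(person_schema, invalid))
```

The first call reports no violations, while the second reports both the missing `lastName` property and the negative `age`. Collecting violations in a list, rather than stopping at the first one, is a design choice of this sketch only.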
 ```
 usage: aptos [arguments] SCHEMA
 
 aptos is a tool for validating client-submitted data using the JSON Schema
-vocabulary and converts JSON Schema documents to different data-interchange
+vocabulary and converts JSON Schema documents into different data-interchange
 formats.
 
 positional arguments:
@@ -29,7 +64,7 @@ optional arguments:
 Arguments:
   {validate,convert}
     validate  Validate a JSON instance
-    convert   Convert a JSON Schema to a different data-interchange
+    convert   Convert a JSON Schema into a different data-interchange
               format
 
 More information on JSON Schema: http://json-schema.org/
@@ -38,64 +73,53 @@ More information on JSON Schema: http://json-schema.org/
 
 ## Data Validation
 
-Given a JSON Schema document, `aptos` can validate client-submitted data to ensure that it satisfies a certain number of criteria.
+Here is a basic example of a JSON Schema:
 
 ```json
 {
-  "title": "Product",
+  "title": "Person",
   "type": "object",
-  "definitions": {
-    "geographical": {
-      "title": "Geographical",
-      "description": "A geographical coordinate",
-      "type": "object",
-      "properties": {
-        "latitude": { "type": "number" },
-        "longitude": { "type": "number" }
-      }
-    }
-  },
   "properties": {
-    "id": {
-      "description": "The unique identifier for a product",
-      "type": "number"
-    },
-    "name": {
+    "firstName": {
       "type": "string"
     },
-    "price": {
-      "type": "number",
-      "minimum": 0,
-      "exclusiveMinimum": true
-    },
-    "tags": {
-      "type": "array",
-      "items": {
-        "type": "string"
-      },
-      "minItems": 1,
-      "uniqueItems": true
-    },
-    "dimensions": {
-      "title": "Dimensions",
-      "type": "object",
-      "properties": {
-        "length": {"type": "number"},
-        "width": {"type": "number"},
-        "height": {"type": "number"}
-      },
-      "required": ["length", "width", "height"]
+    "lastName": {
+      "type": "string"
     },
-    "warehouseLocation": {
-      "description": "Coordinates of the warehouse with the product",
-      "$ref": "#/definitions/geographical"
+    "age": {
+      "description": "Age in years",
+      "type": "integer",
+      "minimum": 0
     }
   },
-  "required": ["id", "name", "price"]
+  "required": ["firstName", "lastName"]
 }
 ```
 
-Validation keywords such as `uniqueItems`, `required`, and `minItems` can be used in a schema to impose requirements for successful validation of an instance.
+Given a JSON Schema, `aptos` can validate client-submitted data to ensure that it satisfies the constraints declared in the schema.
+
+JSON Schema [validation keywords](http://json-schema.org/latest/json-schema-validation.html#rfc.section.6) such as `minimum` and `required` can be used to impose requirements for successful validation of an instance. In the JSON Schema above, both the `firstName` and `lastName` properties are required, and the `age` property *MUST* have a value greater than or equal to 0.
+
+| Valid Instance :heavy_check_mark:                     | Invalid Instance :heavy_multiplication_x:                                                        |
+|-------------------------------------------------------|--------------------------------------------------------------------------------------------------|
+| `{"firstName": "John", "lastName": "Doe", "age": 42}` | `{"firstName": "John", "age": -15}` (missing required property `lastName`; `age` is less than 0) |
+
+`aptos` can validate client-submitted data using either the CLI or the API:
+
+### Data Validation CLI
+
+    $ aptos validate -instance INSTANCE SCHEMA
+
+**Arguments:**
+
+ - **INSTANCE:** JSON document being validated
+ - **SCHEMA:** JSON Schema document to validate the instance against
+
+| Successful Validation :heavy_check_mark:                                                                 | Unsuccessful Validation :heavy_multiplication_x:                                                         |
+|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|
+| ![](https://user-images.githubusercontent.com/2184329/29053486-5c787966-7bbe-11e7-8fd3-4cb51d87d7d9.png) | ![](https://user-images.githubusercontent.com/2184329/29053538-afcce9c6-7bbe-11e7-8be5-61ac1d876fc1.png) |
+
+### Data Validation API
 
 ```python
 import json
@@ -106,33 +130,66 @@ from aptos.visitor import ValidationVisitor
 with open('/path/to/schema') as fp:
     schema = json.load(fp)
 
-component = SchemaParser.parse('/path/to/schema')
-# Valid client-submitted data (instance)
+component = SchemaParser.parse(schema)
+# Invalid client-submitted data (instance)
 instance = {
-    "id": 2,
-    "name": "An ice sculpture",
-    "price": 12.50,
-    "tags": ["cold", "ice"],
-    "dimensions": {
-        "length": 7.0,
-        "width": 12.0,
-        "height": 9.5
-    },
-    "warehouseLocation": {
-        "latitude": -78.75,
-        "longitude": 20.4
-    }
+    'firstName': 'John'
 }
-component.accept(ValidationVisitor(instance))
+try:
+    component.accept(ValidationVisitor(instance))
+except AssertionError as e:
+    print(e)  # instance {'firstName': 'John'} is missing required property 'lastName'
 ```
 
 ## Structured Message Generation
 
-Given a JSON Schema document, `aptos` can generate Avro structured messages.
+Given a JSON Schema, `aptos` can generate structured messages in different data-interchange formats.
+
+:warning: **Note:** The JSON Schema being converted *MUST* be a valid [JSON Object](https://spacetelescope.github.io/understanding-json-schema/reference/object.html).
+
+## Supported Data-Interchange Formats
+
+| Format                                                              | Supported                | Notes                       |
+|---------------------------------------------------------------------|:------------------------:|-----------------------------|
+| [Apache Avro](https://avro.apache.org/)                             | :heavy_check_mark:       |                             |
+| [Protocol Buffers](https://developers.google.com/protocol-buffers/) | :heavy_multiplication_x: | Planned for future releases |
+| [Apache Thrift](https://thrift.apache.org/)                         | :heavy_multiplication_x: | Planned for future releases |
+| [Apache Parquet](https://parquet.apache.org/)                       | :heavy_multiplication_x: | Planned for future releases |
 
 ### Avro
 
-For brevity, the [Product](https://github.com/pennsignals/aptos/blob/master/tests/schema/product) schema is omitted from the example.
+Using the `Person` schema in the previous example, `aptos` can convert the schema into the Avro data-interchange format using either the CLI or the API.
+
+`aptos` maps the following JSON Schema types to Avro types:
+
+| JSON Schema Type | Avro Type |
+|------------------|-----------|
+| `string`         | `string`  |
+| `boolean`        | `boolean` |
+| `null`           | `null`    |
+| `integer`        | `long`    |
+| `number`         | `double`  |
+| `object`         | `record`  |
+| `array`          | `array`   |
+
+JSON Schema documents containing the `enum` validation keyword are mapped to the `symbols` attribute of an Avro [`enum`](http://avro.apache.org/docs/current/spec.html#Enums).
+
+JSON Schema documents with the `type` keyword as an array are mapped to Avro [Union](http://avro.apache.org/docs/current/spec.html#Unions) types.
+
+## Data-Interchange CLI
+
+    $ aptos convert -format FORMAT SCHEMA
+
+**Arguments:**
+
+ - **FORMAT:** Data-interchange format
+ - **SCHEMA:** JSON Schema document to convert
+
+
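The type mapping table in the Avro section can be read as a simple lookup. The sketch below restates that table as a Python dictionary and shows how a `type` given as an array becomes an Avro union (a JSON array of types). It is an illustration under stated assumptions, not code from `aptos`: `JSON_TO_AVRO` and `avro_type` are hypothetical names, and the sketch only returns the Avro type name, leaving out the `fields` and `items` that real `record` and `array` schemas also need.

```python
# Lookup table copied from the JSON Schema -> Avro mapping listed in the readme.
JSON_TO_AVRO = {
    'string': 'string',
    'boolean': 'boolean',
    'null': 'null',
    'integer': 'long',
    'number': 'double',
    'object': 'record',
    'array': 'array',
}


def avro_type(json_type):
    """Translate a JSON Schema `type` value into an Avro type name.

    Hypothetical helper for illustration only; it is not part of aptos.
    """
    # A JSON Schema `type` given as an array maps to an Avro union,
    # which Avro writes as a JSON array of types.
    if isinstance(json_type, list):
        return [JSON_TO_AVRO[name] for name in json_type]
    return JSON_TO_AVRO[json_type]


print(avro_type('integer'))
print(avro_type(['string', 'null']))
```

For the `Person` schema, this is why the `age` property (JSON Schema type `integer`) appears as an Avro `long`.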

+ +

+
+## Data-Interchange API
 
 ```python
 import json
@@ -153,82 +210,46 @@ The above code generates the following Avro schema:
 ```json
 {
   "type": "record",
-  "name": "Product",
   "fields": [
     {
-      "type": "double",
-      "name": "price",
-      "doc": ""
-    },
-    {
+      "doc": "",
       "type": "string",
-      "name": "name",
-      "doc": ""
-    },
-    {
-      "type": {
-        "type": "record",
-        "name": "Geographical",
-        "fields": [
-          {
-            "type": "double",
-            "name": "latitude",
-            "doc": ""
-          },
-          {
-            "type": "double",
-            "name": "longitude",
-            "doc": ""
-          }
-        ]
-      },
-      "name": "warehouseLocation",
-      "doc": "Coordinates of the warehouse with the product"
+      "name": "lastName"
     },
     {
-      "type": {
-        "type": "record",
-        "name": "Dimensions",
-        "fields": [
-          {
-            "type": "double",
-            "name": "height",
-            "doc": ""
-          },
-          {
-            "type": "double",
-            "name": "length",
-            "doc": ""
-          },
-          {
-            "type": "double",
-            "name": "width",
-            "doc": ""
-          }
-        ]
-      },
-      "name": "dimensions",
-      "doc": ""
-    },
-    {
-      "type": {
-        "type": "array",
-        "items": "string"
-      },
-      "name": "tags",
-      "doc": ""
+      "doc": "",
+      "type": "string",
+      "name": "firstName"
     },
     {
-      "type": "double",
-      "name": "id",
-      "doc": "The unique identifier for a product"
+      "doc": "Age in years",
+      "type": "long",
+      "name": "age"
     }
-  ]
+  ],
+  "name": "Person"
 }
 ```
 
+## Testing
+
+All unit tests are located in the [tests](tests) directory.
+
+To run the tests, execute the following command:
+
+    $ python setup.py test
+
+## Additional Resources
+
+ - [Stop Being a "Janitorial" Data Scientist](https://medium.com/@rightlag/stop-being-a-janitorial-data-scientist-5959cccbeac) - *A blog post explaining why aptos was created*
+ - [Understanding JSON Schema](https://spacetelescope.github.io/understanding-json-schema/) - *An excellent guide for schema authors, from the [Space Telescope Science Institute](http://www.stsci.edu/portal/)*
+
 ## Maintainers
 
 | ![Jason Walsh](https://avatars3.githubusercontent.com/u/2184329?v=3&s=128) |
 |----------------------------------------------------------------------------|
 | [Jason Walsh](https://github.com/rightlag)                                 |
+
+## Contributing
+
+Contributions are welcome! Please read the [`contributing.json`](contributing.json) file first.