Skip to content
This repository has been archived by the owner on Jul 7, 2022. It is now read-only.

Commit

Permalink
Update readme.md
Browse files Browse the repository at this point in the history
  • Loading branch information
rightlag committed Aug 8, 2017
1 parent facafb6 commit 7076077
Showing 1 changed file with 151 additions and 130 deletions.
281 changes: 151 additions & 130 deletions readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,17 +7,52 @@
<a href="https://coveralls.io/github/pennsignals/aptos"><img src="https://img.shields.io/coveralls/pennsignals/aptos.svg?style=flat-square" alt="Coverage Status"></a>
</p>

> A module that parses [JSON Schema](http://json-schema.org/) documents to validate client-submitted data and convert JSON schema documents to Avro schema documents.
> Validate client-submitted data using [JSON Schema](http://json-schema.org/) documents and convert JSON Schema documents into different data-interchange formats.
## Contents

- [Installation](#installation)
- [Usage](#usage)
- [Data Validation](#data-validation)
- [Data Validation CLI](#data-validation-cli)
- [Data Validation API](#data-validation-api)
- [Structured Messaged Generation](#structured-message-generation)
- [Supported Data-Interchange Formats](#supported-data-interchange-formats)
- [Avro](#avro)
- [Data-Interchange CLI](#data-interchange-cli)
- [Data-Interchange API](#data-interchange-api)
- [Testing](#testing)
- [Additional Resources](#additional-resources)
- [Maintainers](#maintainers)
- [Contributing](#contributing)

## Why aptos?

- Validate client-submitted data
- Convert JSON Schema documents into different data-interchange formats
- Simple syntax
- CLI support for data validation and JSON Schema conversion
- [Swagger](https://swagger.io/) specification support

## Installation

**via git**

$ git clone https://github.com/pennsignals/aptos.git && cd aptos
$ python setup.py install

## Usage

`aptos` supports validating client-submitted data and generates Avro structured messages from a given JSON Schema document.
`aptos` supports the following capabilities:

- **Data Validation:** Validate client-submitted data using [validation keywords](http://json-schema.org/latest/json-schema-validation.html#rfc.section.6) described in the JSON Schema specification.
- **Schema Conversion:** Convert JSON Schema documents into different data-interchange formats. See the list of [supported data-interchange formats](#supported-data-interchange-formats) for more information.

```
usage: aptos [arguments] SCHEMA
aptos is a tool for validating client-submitted data using the JSON Schema
vocabulary and converts JSON Schema documents to different data-interchange
vocabulary and converts JSON Schema documents into different data-interchange
formats.
positional arguments:
Expand All @@ -29,7 +64,7 @@ optional arguments:
Arguments:
{validate,convert}
validate Validate a JSON instance
convert Convert a JSON Schema to a different data-interchange
convert Convert a JSON Schema into a different data-interchange
format
More information on JSON Schema: http://json-schema.org/
Expand All @@ -38,64 +73,53 @@ More information on JSON Schema: http://json-schema.org/

## Data Validation

Given a JSON Schema document, `aptos` can validate client-submitted data to ensure that it satisfies a certain number of criteria.
Here is a basic example of a JSON Schema:

```json
{
"title": "Product",
"title": "Person",
"type": "object",
"definitions": {
"geographical": {
"title": "Geographical",
"description": "A geographical coordinate",
"type": "object",
"properties": {
"latitude": { "type": "number" },
"longitude": { "type": "number" }
}
}
},
"properties": {
"id": {
"description": "The unique identifier for a product",
"type": "number"
},
"name": {
"firstName": {
"type": "string"
},
"price": {
"type": "number",
"minimum": 0,
"exclusiveMinimum": true
},
"tags": {
"type": "array",
"items": {
"type": "string"
},
"minItems": 1,
"uniqueItems": true
},
"dimensions": {
"title": "Dimensions",
"type": "object",
"properties": {
"length": {"type": "number"},
"width": {"type": "number"},
"height": {"type": "number"}
},
"required": ["length", "width", "height"]
"lastName": {
"type": "string"
},
"warehouseLocation": {
"description": "Coordinates of the warehouse with the product",
"$ref": "#/definitions/geographical"
"age": {
"description": "Age in years",
"type": "integer",
"minimum": 0
}
},
"required": ["id", "name", "price"]
"required": ["firstName", "lastName"]
}
```

Validation keywords such as `uniqueItems`, `required`, and `minItems` can be used in a schema to impose requirements for successful validation of an instance.
Given a JSON Schema, `aptos` can validate client-submitted data to ensure that it satisfies a certain number of criteria.

JSON Schema [Validation keywords](http://json-schema.org/latest/json-schema-validation.html#rfc.section.6) such as `minimum` and `required` can be used to impose requirements for successful validation of an instance. In the JSON Schema above, both the `firstName` and `lastName` properties are required, and the `age` property *MUST* have a value greater than or equal to 0.

| Valid Instance :heavy_check_mark: | Invalid Instance :heavy_multiplication_x: |
|-------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------|
| `{"firstName": "John", "lastName": "Doe", "age": 42}` | `{"firstName": "John", "age": -15}` (missing required property `lastName` and `age` is not greater than or equal to 0) |

`aptos` can validate client-submitted data using either the CLI or the API:

### Data Validation CLI

$ aptos validate -instance INSTANCE SCHEMA

**Arguments:**

- **INSTANCE:** JSON document being validated
- **SCHEMA:** JSON document containing the description

| Successful Validation :heavy_check_mark: | Unsuccessful Validation :heavy_multiplication_x: |
|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|
| ![](https://user-images.githubusercontent.com/2184329/29053486-5c787966-7bbe-11e7-8fd3-4cb51d87d7d9.png) | ![](https://user-images.githubusercontent.com/2184329/29053538-afcce9c6-7bbe-11e7-8be5-61ac1d876fc1.png) |

### Data Validation API

```python
import json
Expand All @@ -106,33 +130,66 @@ from aptos.visitor import ValidationVisitor

with open('/path/to/schema') as fp:
schema = json.load(fp)
component = SchemaParser.parse('/path/to/schema')
# Valid client-submitted data (instance)
component = SchemaParser.parse(schema)
# Invalid client-submitted data (instance)
instance = {
"id": 2,
"name": "An ice sculpture",
"price": 12.50,
"tags": ["cold", "ice"],
"dimensions": {
"length": 7.0,
"width": 12.0,
"height": 9.5
},
"warehouseLocation": {
"latitude": -78.75,
"longitude": 20.4
}
'firstName': 'John'
}
component.accept(ValidationVisitor(instance))
try:
component.accept(ValidationVisitor(instance))
except AssertionError as e:
print(e) # instance {'firstName': 'John'} is missing required property 'lastName'
```

## Structured Message Generation

Given a JSON Schema document, `aptos` can generate Avro structured messages.
Given a JSON Schema, `aptos` can generate different structured messages.

:warning: **Note:** The JSON Schema being converted *MUST* be a valid [JSON Object](https://spacetelescope.github.io/understanding-json-schema/reference/object.html).

## Supported Data-Interchange Formats

| Format | Supported | Notes |
|---------------------------------------------------------------------|:------------------------:|-----------------------------|
| [Apache Avro](https://avro.apache.org/) | :heavy_check_mark: | |
| [Protocol Buffers](https://developers.google.com/protocol-buffers/) | :heavy_multiplication_x: | Planned for future releases |
| [Apache Thrift](https://thrift.apache.org/) | :heavy_multiplication_x: | Planned for future releases |
| [Apache Parquet](https://parquet.apache.org/) | :heavy_multiplication_x: | Planned for future releases |

### Avro

For brevity, the [Product](https://github.com/pennsignals/aptos/blob/master/tests/schema/product) schema is omitted from the example.
Using the `Person` schema in the previous example, `aptos` can convert the schema into the Avro data-interchange format using either the CLI or the API.

`aptos` maps the following JSON schema types to Avro types:

| JSON Schema Type | Avro Type |
|------------------|-----------|
| `string` | `string` |
| `boolean` | `boolean` |
| `null` | `null` |
| `integer` | `long` |
| `number` | `double` |
| `object` | `record` |
| `array` | `array` |

JSON Schema documents containing the `enum` validation keyword are mapped to Avro [`enum`](http://avro.apache.org/docs/current/spec.html#Enums) `symbols` attribute.

JSON Schema documents with the `type` keyword as an array are mapped to Avro [Union](http://avro.apache.org/docs/current/spec.html#Unions) types.

## Data-Interchange CLI

$ aptos convert -format FORMAT SCHEMA

**Arguments:**

- **FORMAT:** Data-interchange format
- **SCHEMA:** JSON document containing the description

<p align="center">
<img src="https://user-images.githubusercontent.com/2184329/29071365-3d6e7504-7c11-11e7-959e-abcfe15f5e96.png" width="600">
</p>

## Data-Interchange API

```python
import json
Expand All @@ -153,82 +210,46 @@ The above code generates the following Avro schema:
```json
{
"type": "record",
"name": "Product",
"fields": [
{
"type": "double",
"name": "price",
"doc": ""
},
{
"doc": "",
"type": "string",
"name": "name",
"doc": ""
},
{
"type": {
"type": "record",
"name": "Geographical",
"fields": [
{
"type": "double",
"name": "latitude",
"doc": ""
},
{
"type": "double",
"name": "longitude",
"doc": ""
}
]
},
"name": "warehouseLocation",
"doc": "Coordinates of the warehouse with the product"
"name": "lastName"
},
{
"type": {
"type": "record",
"name": "Dimensions",
"fields": [
{
"type": "double",
"name": "height",
"doc": ""
},
{
"type": "double",
"name": "length",
"doc": ""
},
{
"type": "double",
"name": "width",
"doc": ""
}
]
},
"name": "dimensions",
"doc": ""
},
{
"type": {
"type": "array",
"items": "string"
},
"name": "tags",
"doc": ""
"doc": "",
"type": "string",
"name": "firstName"
},
{
"type": "double",
"name": "id",
"doc": "The unique identifier for a product"
"doc": "Age in years",
"type": "long",
"name": "age"
}
]
],
"name": "Person"
}
```

## Testing

All unit tests exist in the [tests](tests) directory.

To run tests, execute the following command:

$ python setup.py test

## Additional Resources

- [Stop Being a "Janitorial" Data Scientist](https://medium.com/@rightlag/stop-being-a-janitorial-data-scientist-5959cccbeac) - *A blog post explaining why aptos was created*
- [Understanding JSON Schema](https://spacetelescope.github.io/understanding-json-schema/) - *An excellent guide for schema authors, from the [Space Telescope Science Institute](http://www.stsci.edu/portal/)*

## Maintainers

| ![Jason Walsh](https://avatars3.githubusercontent.com/u/2184329?v=3&s=128) |
|----------------------------------------------------------------------------|
| [Jason Walsh](https://github.com/rightlag) |

## Contributing

Contributions welcome! Please read the [`contributing.json`](contributing.json) file first.

0 comments on commit 7076077

Please sign in to comment.