A powerful schema validator!
PHP
Latest commit 03b0af5 Feb 29, 2016 @romaricdrigon Migrated to Atoum 1.2

README.md

MetaYaml

Latest Stable Version Build Status SensioLabsInsight

A [put your file type here] schema validator using [put another file type here] files.
At the moment, file type can be Json, Yaml, or XML. It can generate a documentation about the schema, or a XSD file (experimental).

The name comes from the fact that it was initially made to implement a pseudo-schema for Yaml files.

  1. Installation
  1. Basic usage
  2. How to write a schema
  1. Documentation generator
  2. Notes on XML support
  3. XSD generator
  4. Test
  5. Extending
  6. Thanks

Installation

It is a standalone component:

  • the core requires PHP >= 5.3.3
  • to use the YamlLoader, you will need the Symfony component Yaml (standalone component, does not require Symfony2)
  • to launch the tests, you'll need atoum

To install via composer just do composer require romaricdrigon/metayaml

Basic usage

You have to create a MetaYaml object, and then pass it both the schema and your data as multidimensional php arrays:

use RomaricDrigon\MetaYaml\MetaYaml;

// create object, load schema from an array
$schema = new MetaYaml($schema);

/*
    you can optionally validate the schema
    it can take some time (up to a second for a few hundred lines)
    so do it only once, and maybe only in development!
*/
$schema->validateSchema(); // return true or throw an exception

// you could also have done this at init
$schema = new MetaYaml($schema, true); // will load AND validate the schema

// finally, validate your data array according to the schema
$schema->validate($data); // return true or throw an exception

You can use any of the provided loaders to obtain these arrays (yep, you can validate XML using a schema from an Yaml file!).

Some loader examples:

use RomaricDrigon\MetaYaml\MetaYaml;
use RomaricDrigon\MetaYaml\Loader\YamlLoader;
use RomaricDrigon\MetaYaml\Loader\XmlLoader; // JsonLoader is already available

// create one loader object
$loader = new JsonLoader(); // Json (will use php json_decode)
$loader = new YamlLoader(); // Yaml using Symfony Yaml component
$loader = new XmlLoader(); // Xml (using php SimpleXml)

// the usage is the same then
$array = $loader->load('SOME STRING...');
// or you can load from a file
$array = $loader->loadFromFile('path/to/file');

How to write a schema

Introduction

A schema file will define the array structure (which elements are allowed, where), some attributes (required, can be empty...) and the possible values for these elements (or their type).

Here's a simple example of a schema, using Yaml syntax:

root: # root is always required (note no prefix here)
    _type: array # each element must always have a '_type'
    _children: # array nodes have a '_children' node, defining their children
        flowers:
            _type: array
            _required: true # optional, default false
            _children:
                rose:
                    _required: true
                    _type: text
                violet:
                    _type: text
                # -> only rose and violet are allowed children of flowers

And a valid Yaml file :

flowers:
    rose: "a rose"
    violet: "a violet flower"

We will continue with Yaml examples; if you're not familiar with the syntax, you may want to take a look at its Wikipedia page. Of course the same structure is possible with Json or XML, because the core is the same. Take a look at examples in test/data/ folder.

Schema structure

A schema file must have a root node, which will describe the first-level content. You can optionally define a prefix; by default it is _ (_type, _required...).

You have to define a partials node if you want to use this feature (learn more about it below).

A basic schema file:

root:
    # here put the elements who will be in the file
    # note that root can have any type: an array, a number, a prototype...
prefix: my_ # so it's gonna be 'my_type', 'my_required', 'my_children'...
partials:
    block:
        # here I define a partial called block

Schema nodes

Each node in the schema must have a _type attribute. Here I define a node called paragraph whose content is some text:

paragraph:
    _type: text

Those types are available:

  • text: scalar value
  • number: numeric value
  • boolean: boolean value
  • pattern: check if the value matches the regular expression provided in _pattern, which is a PCRE regex
  • enum: enumeration ; list accepted values in _values node
  • array: array; define children in a _children node; array's children must have determined named keys; any extra key will cause an error
  • prototype: define a repetition of items whose name/index is not important. You must give children's type in _prototype node.
  • choice: child node can be any of the nodes provided in _choices. Keys in _choices array are not important (as long as they are unique). In each choice, it's best to put the discriminating field in first.
  • partial: "shortcut" to a block described in partials root node. Provide partial name in _partial

You can specify additional attributes:

  • general attributes:
  • _required: this node must always be defined (by default false)
  • _not_empty for text, number and array nodes: they can't be empty (by default false). Respective empty values are '', 0 (as a string, an integer or a float), array(). To test for null values, use _required instead.
  • _strict with text, number, boolean and enum will enforce a strict type check (respectively, with a string, an integer or a float, a boolean, any of these values). Be careful when using these with a parser which may not be type-aware (such as the XML one; Yaml and json should be ok)
  • _description: full-text description, cf. Documentation generator
  • only for array nodes:
  • _ignore_extra_keys: the node can contain children whose keys are not listed in _children; they'll be ignored
  • only for prototype nodes:
  • min_items: the prototype node should contain at least 'min' elements
  • max_items: the opposite, the max number of elements in the prototype node (by default 200)

Here's a comprehensive example:

root:
    _type: array
    _children:
        SomeText:
            _type: text
            _not_empty: true # so !== ''
        SomeEnum:
            _type: enum
            _values:
                - windows
                - mac
                - linux
        SomeNumber:
            _type: number
            _strict: true
        SomeBool:
            _type: boolean
        SomePrototypeArray:
            _type: prototype
            _prototype:
                _type: array
                _children:
                    SomeOtherText:
                        _type: text
                        _is_required: true # can't be null
        SomeParagraph:
            _type: partial
            _partial: aBlock # cf 'partials' below
        SomeChoice:
            _type: choice
            _choices:
                1:
                    _type: enum
                    _values:
                        - windows
                        - linux
                2:
                    _type: number
                # so our node must be either #1 or #2
        SomeRegex:
            _type: pattern
            _pattern: /e/
partials:
    aBlock:
        _type: array
        _children:
            Line1:
                _type: text

More information

For more examples, look inside test/data folder. In each folder, you have an .yml file and its schema. There's also a XML example.

If you're curious about an advanced usage, you can check data/MetaSchema.json: schema files are validated using this schema (an yep, the schema validates successfully itself!)

Documentation generator

Each node can have a _description attribute, containing some human-readable text. You can retrieve the documentation about a node (its type, description, other attributes...) like this:

// it's recommended to validate the schema before reading documentation
$schema = new MetaYaml($schema, true);

// get documentation about root node
$schema->getDocumentationForNode();

// get documentation about a child node 'test' in an array 'a_test' under root
$schema->getDocumentationForNode(array('a_test', 'test'));

// finally, if you want to unfold (follow) all partials, set second argument to true
$schema->getDocumentationForNode(array('a_test', 'test'), true);
// watch out there's no loop inside partials!

It returns an associative array formatted like this:

array(
    'name' => 'test', // name of current node, root for first node
    'node' => array(
        '_type' => 'array',
        '_children' => ... // and so on
    ),
    'prefix' => '_'
)

If the targeted node is inside a choice, the result will differ slightly:

array(
    'name' => 'test', // name of current node, from the choice key in the schema
    'node' => array(
        '_is_choice' => 'true', // important: so we know next keys are choices
        0 => array(
            '_type' => 'array' // and so on, for first choice
        ),
        1 => array(
            '_type' => 'text' // and so on, for second choice
        ),
        // ...
    ),
    'prefix' => '_'
)

This behavior allow us to handle imbricated choices, without loosing data (you have an array level for each choice level, and you can check the flag _is_choice)

If you pass an invalid path (e.g. no node with the name you gave exist), it will throw an exception.

Notes on XML support

In XML, you can store a value in a node within a child element, or using an attribute. This is not possible in an array; the only way is to use a child.

Thus, the following conventions are enforced by the XML loader:

  • elements AND attributes are stored as child, using element name and content, or attribute name and value, as respectively key and value
  • if a node has an attribute and a child node with the same name, the attribute will be overwritten
  • if a node has both attribute(s) and a text content, text content will be stored under key _value
  • multiple child node with the same name will be overwritten, only the last will be retained; except if they have a _key attribute, which will be used thus
  • namespaces are not supported
  • empty nodes are skipped

Let's take an example:

<fleurs>
    <roses couleur="rose">
        <opera>une rose</opera>
        <sauvage>
            <des_bois>une autre rose</des_bois>
            <des_sous_bois sauvage="oui">encore</des_sous_bois>
        </sauvage>
    </roses>
    <tulipe>je vais disparaitre !</tulipe>
    <tulipe>deuxieme tulipe</tulipe>
    <fleur couleur="violette" sauvage="false" _key="violette">une violette</fleur>
</fleurs>

will give us this array:

array('fleurs' =>
    'roses' => array(
        'couleur' => 'rose',
        'sauvage' => array(
            'des_bois' => 'une autre rose',
            'des_sous_bois' => array(
                'sauvage' => 'oui',
                '_value' => 'encore'
            )
        )
    ),
    'tulipe' => 'deuxieme tulipe',
    'violette' => array(
        'couleur' => 'violette',
        'sauvage' => 'false',
        '_value' => 'une violette'
    )
)

XSD generator

Please note this feature is still experimental!

MetaYaml can try to generate a XML Schema Definition from a MetaYaml schema. You may want to use this file to pre-validate XML input, or to use in another context (client-side...). The same conventions (cf. above) will be used.

Usage example :

use RomaricDrigon\MetaYaml\MetaYaml\XsdGenerator;

// create a XsdGenerator object (requires Php XMLWriter from libxml, enabled by default)
$generator = new XsdGenerator();

// $schema is the source schema, php array
// second parameter to soft-indent generated XML (default true)
$my_xsd_string = $generator->build($schema, true);

A few limitations, some relative to XML Schema, apply:

  • root node must be an array
  • an element can't have a name beginning by a number
  • all first-level nodes will be mandatory (but they may be empty)
  • choice node are not supported
  • pattern may have a slightly different behavior due to implementations differences
  • prototype children nodes type will not be validated
  • strict mode does not exists
  • ignore_extra_keys attribute will cause all children nodes not to be validated

Test

The project is fully tested using atoum. To launch tests, just run in a shell ./bin/test -d test

Extending

You may want to write your own loader, using anything else.
Take a look at any class in Loader/ folder, it's pretty easy: you have to implement the LoaderInterface, and may want to extend Loader class (so you don't have to write loadFromFile()).

Thanks

Thanks to Riad Benguella and Julien Bianchi for their help & advice.

Top