# Table Schema extension: conditional constraints

This notebook presents an example implementation of conditional constraints between Fields.

## Example 1
The chosen example is:

| observationType | scientificName |
|-----------------|----------------|
| animal          | Vulpes vulpes  |
| tree            | null           |
| animal          | null           |

The constraint to check is:

    if observationType is animal, scientificName must not be null

This conditional constraint applies to each row: it is satisfied by the first two rows but not by the last.
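
As a plain-Python reference for the expected outcome, independent of any schema library, the rule can be sketched as a predicate on a row. The function name `satisfies_rule` is illustrative, and the string `"null"` is assumed to mark a missing value, as in the table above.

```python
# Rows of the example table; the string "null" stands for a missing value,
# as in the table above (an assumption of this sketch).
rows = [
    {"observationType": "animal", "scientificName": "Vulpes vulpes"},
    {"observationType": "tree", "scientificName": "null"},
    {"observationType": "animal", "scientificName": "null"},
]

def satisfies_rule(row):
    """Return True if the row respects: animal => scientificName not null."""
    if row["observationType"] == "animal":
        return row["scientificName"] != "null"
    return True  # the rule does not constrain other observation types

results = [satisfies_rule(row) for row in rows]
print(results)  # [True, True, False]
```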

### Proposal

JSON schema proposes two solutions:

- schema composition (keywords `allOf`, `anyOf` and `oneOf`)
- conditional schema (keywords `if`, `then`, `else`)

The Table Schema solution can therefore consist of applying JSON schema rules for each row.

Note: Both JSON Schema solutions are equivalent (`if A then B` is equivalent to `B or not A`).
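
The equivalence can be checked by enumerating the four combinations of truth values. This is a minimal sketch; the helper names `implies` and `or_form` are illustrative.

```python
from itertools import product

def implies(a, b):
    """Conditional form: if a then b (no constraint when a is false)."""
    return b if a else True

def or_form(a, b):
    """Composition form: b or not a."""
    return b or not a

# Compare both forms on every combination of truth values.
equivalent = all(implies(a, b) == or_form(a, b)
                 for a, b in product([False, True], repeat=2))
print(equivalent)  # True
```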


### Python example
The next cell applies the proposal; both equivalent options are included in the schema.

Note: JSON Schema uses the `properties` keyword to identify the data to check. It is omitted here and added during validation.


In [1]:
from frictionless import Resource, Schema

animal = Resource(data=[['observationType', 'scientificName'], 
                        ['animal',  'Vulpes vulpes'], 
                        ['tree',  'null'],
                        ['animal',  'null']
                        ])
schema = { "fields": [
                {"name": "observationType", "type": "string"}, 
                {"name": "scientificName", "type": "string"}], 
           "anyOf": [ 
                {"observationType": { "not": { "const": "animal" }}},
                {"scientificName": { "not": {"const": "null"}}}],
           "if":
                {"observationType": { "const": "animal" }},
           "then":
                {"scientificName": { "not": {"const": "null"}}}
}
animal.schema = Schema.from_descriptor(schema)

### Implementation
A row is represented in Table Schema as a JSON object:

 ```json
    { "observationType": "animal", "scientificName": "Vulpes vulpes" }
 ```
  
The JSON schemas applicable to the rows are:

 ```json
    {"anyOf": [ 
            {"properties": {"observationType": { "not": { "const": "animal" }}}},
            {"properties": {"scientificName": { "not": {"const": "null"}}}}]}
 ```            
and 

 ```json
    {"if":
           {"properties": {"observationType": { "const": "animal" }}},
     "then":
           {"properties": {"scientificName": { "not": {"const": "null"}}}}}
 ```          
The implementation converts the schema into a JSON Schema (by adding the `properties` keyword) and then applies this JSON Schema to each row.
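
To illustrate why the `properties` wrapper matters, here is a minimal sketch of a toy evaluator for a small subset of JSON Schema keywords (`const`, `not`, `properties`, `anyOf`, `if`/`then`). It is not the `jsonschema` library and the function name `check` is hypothetical; like a JSON Schema validator, it ignores keywords it does not know, so a schema without `properties` accepts every row.

```python
def check(instance, schema):
    """Toy evaluator for a subset of JSON Schema (illustration only)."""
    if "const" in schema and instance != schema["const"]:
        return False
    if "not" in schema and check(instance, schema["not"]):
        return False
    if "properties" in schema and isinstance(instance, dict):
        for name, sub in schema["properties"].items():
            # A property is only checked when it is present in the object.
            if name in instance and not check(instance[name], sub):
                return False
    if "anyOf" in schema and not any(check(instance, s) for s in schema["anyOf"]):
        return False
    if "if" in schema and "then" in schema and check(instance, schema["if"]):
        if not check(instance, schema["then"]):
            return False
    return True  # unknown keywords are ignored

any_of = {"anyOf": [
    {"properties": {"observationType": {"not": {"const": "animal"}}}},
    {"properties": {"scientificName": {"not": {"const": "null"}}}}]}
if_then = {
    "if": {"properties": {"observationType": {"const": "animal"}}},
    "then": {"properties": {"scientificName": {"not": {"const": "null"}}}}}
# Same rule without the "properties" wrapper: the field names are unknown
# keywords, so nothing is actually checked.
unwrapped = {
    "if": {"observationType": {"const": "animal"}},
    "then": {"scientificName": {"not": {"const": "null"}}}}

rows = [
    {"observationType": "animal", "scientificName": "Vulpes vulpes"},
    {"observationType": "tree", "scientificName": "null"},
    {"observationType": "animal", "scientificName": "null"},
]
any_of_results = [check(row, any_of) for row in rows]
if_then_results = [check(row, if_then) for row in rows]
unwrapped_results = [check(row, unwrapped) for row in rows]
print(any_of_results)     # [True, True, False]
print(if_then_results)    # [True, True, False]
print(unwrapped_results)  # [True, True, True]
```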

In [2]:
import attrs
import frictionless
import jsonschema
from frictionless import Check, Row
from frictionless.errors import RowError

# keywords that must not be wrapped in "properties" by add_prop
keywords = ['properties', 'not', 'const', 'fields', 'anyOf', 'allOf', 'oneOf', 'if', 'then', 'else', 'enum']

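def validate(resource):
    '''validate a resource with one Composition check per composition keyword and one for if/then/else'''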
    checks = [Composition({key:resource.schema.custom[key]}) 
              for key in resource.schema.custom if key in ['allOf', 'anyOf', 'oneOf']]
    if 'if' in resource.schema.custom:
        checks += [Composition({key:resource.schema.custom[key] 
                               for key in resource.schema.custom 
                               if key in ['if', 'then', 'else']})]
    return frictionless.validate(resource, checks=checks)

def add_prop(json_value):
    '''add "properties" keyword for JSON Schema check'''
    if isinstance(json_value, list):
        return [add_prop(val) for val in json_value]
    if isinstance(json_value, dict) and len(json_value) > 1 :
        return {key: add_prop(val) for key, val in json_value.items()}
    if isinstance(json_value, dict) and len(json_value) == 1 :
        key_val = list(json_value)[0]
        if key_val in keywords:
            return {key: add_prop(val) for key, val in json_value.items()}
        return {'properties': {key_val: add_prop(json_value[key_val])}}
    return json_value
    
class CompositionError(RowError):
    title = None
    type = 'Composition'
    description = None

@attrs.define(kw_only=True, repr=False)
class Composition(Check):
    """Check a Composition of schemas"""

    Errors = [CompositionError]

    def __init__(self, descriptor):
        super().__init__()
        self.__composition = add_prop(descriptor)
        self.__descriptor = descriptor 
        
    def validate_row(self, row: Row):        
        try:
            jsonschema.validate(row, self.__composition)
        except Exception:
            note = 'the row is not conform to schema : ' + str(self.__descriptor)[0:15] + '...'
            yield CompositionError.from_row(row, note=note)

### Tests
The validate function detects two errors:

- last row, with the `anyOf` keyword,
- last row, with the `if` keyword.

In [3]:
validate(animal)

{'valid': False,
 'errors': [],
 'tasks': [{'name': 'memory',
            'type': 'table',
            'valid': False,
            'place': '<memory>',
            'labels': ['observationType', 'scientificName'],
            'stats': {'errors': 2,
                      'seconds': 0.048,
                      'fields': 2,
                      'rows': 3},
            'errors': [{'type': 'Composition',
                        'message': 'Row Error',
                        'tags': ['#table', '#row'],
                        'note': "the row is not conform to schema : {'anyOf': "
                                "[{'ob...",
                        'cells': ['animal', 'null'],
                        'rowNumber': 4},
                       {'type': 'Composition',
                        'message': 'Row Error',
                        'tags': ['#table', '#row'],
                        'note': "the row is not conform to schema : {'if': "
                                "{'observ...",
       

The test with the correct values ("Vulpes velox" for the last row) does not detect any errors.

In [4]:
animal_2 = Resource(data=[['observationType', 'scientificName'], 
                          ['animal',  'Vulpes vulpes'], 
                          ['tree',  'null'],
                          ['animal',  'Vulpes velox']
                         ])
animal_2.schema = Schema.from_descriptor(schema)

validate(animal_2)

{'valid': True,
 'errors': [],
 'tasks': [{'name': 'memory',
            'type': 'table',
            'valid': True,
            'place': '<memory>',
            'labels': ['observationType', 'scientificName'],
            'stats': {'errors': 0,
                      'seconds': 0.016,
                      'fields': 2,
                      'rows': 3},
            'errors': []}]}

## Example 2

The chosen example is:

| measurementType | measurementValue |
|-----------------|------------------|
| cloudiness      | partly cloudy    |
| temperature     | 15               |
| wind force      | 4                |
| wind force      | high             |
| temperature     | -2               |
| cloudiness      | very very cloudy |

The constraints to check are:

    measurementType: can have the values cloudiness, temperature and wind force
    measurementValue: holds the actual measurement, with a type and values that depend on measurementType:

    if measurementType = cloudiness then measurementValue:
        type = string
        constraints.enum = ["clear", "mostly clear", "partly cloudy", "mostly cloudy", "cloudy", "unknown"]
    if measurementType = temperature then measurementValue:
        type = number
        constraints.minimum = 0
        constraints.maximum = 20
    if measurementType = wind force then measurementValue:
        type = integer
        constraints.enum = [0, 1, 2, 3, 4, 5]

These conditional constraints apply to each row: they are satisfied by the first three rows but not by the last three.
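
As a plain-Python reference for the expected outcome, the three rules can be sketched as a single function. The names `CLOUDINESS`, `is_number` and `value_is_valid` are illustrative, and rejecting an unknown measurementType is an assumption of this sketch.

```python
CLOUDINESS = {"clear", "mostly clear", "partly cloudy",
              "mostly cloudy", "cloudy", "unknown"}

def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def value_is_valid(measurement_type, value):
    """Return True if value satisfies the rule attached to measurement_type."""
    if measurement_type == "cloudiness":
        return isinstance(value, str) and value in CLOUDINESS
    if measurement_type == "temperature":
        return is_number(value) and 0 <= value <= 20
    if measurement_type == "wind force":
        return (isinstance(value, int) and not isinstance(value, bool)
                and 0 <= value <= 5)
    return False  # assumption: unknown measurement types are rejected

rows = [
    ("cloudiness", "partly cloudy"),
    ("temperature", 15),
    ("wind force", 4),
    ("wind force", "high"),
    ("temperature", -2),
    ("cloudiness", "very very cloudy"),
]
results = [value_is_valid(mtype, value) for mtype, value in rows]
print(results)  # [True, True, True, False, False, False]
```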

### Python representation

In [5]:
meteo = Resource(data=[['measurementType', 'measurementValue'],
                       ['cloudiness',      'partly cloudy'   ],
                       ['temperature',     15                ],
                       ['wind force',      4                 ],
                       ['wind force',      'high'            ],
                       ['temperature',     -2                ],
                       ['cloudiness',      'very very cloudy']
                        ])

schema = { "fields": [
                {"name": "measurementType", "type": "string", 
                 "constraints": {"enum": ['cloudiness', 'temperature', 'wind force']}},
                {"name": "measurementValue", "type": "any"}], 
           "allOf": [ 
               {"if":
                    {"measurementType": { "const": "cloudiness" }},
                "then":
                    {"measurementValue": { "type" : "string", 
                          "enum" : ["clear", "mostly clear", "partly cloudy", "mostly cloudy", "cloudy", "unknown"]}}},
               {"if":
                    {"measurementType": { "const": "temperature" }},
                "then":
                    {"measurementValue": { "type" : "number", "minimum" : 0, "maximum": 20}}},
               {"if":
                    {"measurementType": { "const": "wind force" }},
                "then":
                    {"measurementValue": { "type" : "integer", "enum" : [0, 1, 2, 3, 4, 5]}}}
               ]      
         
         }
meteo.schema = Schema.from_descriptor(schema)

### Tests
The validate function detects three errors (one for each of the last three rows).

In [6]:
validate(meteo)

{'valid': False,
 'errors': [],
 'tasks': [{'name': 'memory',
            'type': 'table',
            'valid': False,
            'place': '<memory>',
            'labels': ['measurementType', 'measurementValue'],
            'stats': {'errors': 3,
                      'seconds': 0.05,
                      'fields': 2,
                      'rows': 6},
            'errors': [{'type': 'Composition',
                        'message': 'Row Error',
                        'tags': ['#table', '#row'],
                        'note': "the row is not conform to schema : {'allOf': "
                                "[{'if...",
                        'cells': ['wind force', 'high'],
                        'rowNumber': 5},
                       {'type': 'Composition',
                        'message': 'Row Error',
                        'tags': ['#table', '#row'],
                        'note': "the row is not conform to schema : {'allOf': "
                                "[{'if...",
  

The test with the correct values does not detect any errors.

In [7]:
meteo_2 = Resource(data=[['measurementType', 'measurementValue'],
                         ['cloudiness',      'partly cloudy'   ],
                         ['temperature',     15                ],
                         ['wind force',      4                 ],
                         ['wind force',      0                 ],
                         ['temperature',     2.5               ],
                         ['cloudiness',      'cloudy'          ]
                        ])
meteo_2.schema = Schema.from_descriptor(schema)

validate(meteo_2)

{'valid': True,
 'errors': [],
 'tasks': [{'name': 'memory',
            'type': 'table',
            'valid': True,
            'place': '<memory>',
            'labels': ['measurementType', 'measurementValue'],
            'stats': {'errors': 0,
                      'seconds': 0.05,
                      'fields': 2,
                      'rows': 6},
            'errors': []}]}