# OMX Validator

The OMX validator is a [Jupyter notebook](https://jupyter.org/) hosted in Google Colaboratory. It is an interactive Python environment that validates OMX matrices using the [openmatrix](https://github.com/osPlanning/omx-python) library. The validator has been tested with the [example](https://github.com/osPlanning/omx/blob/master/example.omx?raw=true) OMX file. OMX files can also be inspected with the [OMX Viewer](https://github.com/osPlanning/omx/wiki/OMX-Viewer).

This notebook is set up to run in the cloud, but it can also be run locally. To run it locally, download the ipynb file, open it with Jupyter, and skip the upload step by setting **filename** to the path of a local file.
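
For a local run, the upload cell can be replaced by a direct assignment. The following is a minimal sketch, not part of the original notebook; the helper name `local_omx_path` and the path `example.omx` are illustrative assumptions.

```python
from pathlib import Path

def local_omx_path(path):
    """Return `path` as a string if it is an existing file, else raise."""
    candidate = Path(path).expanduser()
    if not candidate.is_file():
        raise FileNotFoundError(f"OMX file not found: {candidate}")
    return str(candidate)

# Assumed usage in place of the upload cell (the path is a placeholder):
# filename = local_omx_path("example.omx")
```

Failing early on a missing path gives a clearer message than letting the later file-open step fail.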



# Upload File

This step creates a file selector control for choosing a local file to upload. Google Colab uploads the file and stores it with the notebook's Files. Run the code cell by clicking its [ ] play button. The next section reads the OMX file referenced by **filename**. While testing, you may need to reset the control, which you can do via Runtime + Restart runtime. You can also review the uploaded files via View + Table of Contents + Files.

In [1]:
from google.colab import files
uploaded = files.upload()
filename = list(uploaded.keys())[0]
print("File uploaded:", filename)

Saving example.omx to example.omx
File uploaded: example.omx
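
The upload cell takes the first key of the dictionary returned by `files.upload()`, so an empty result would raise an `IndexError` at `[0]`. A minimal sketch of a guard follows; the helper name `first_uploaded_name` is hypothetical, and a plain dictionary stands in for the upload result so the snippet can be tried outside Colab.

```python
def first_uploaded_name(uploaded):
    """Return the first key of an upload result, or raise if it is empty."""
    names = list(uploaded.keys())
    if not names:
        raise ValueError("No file was uploaded; rerun the upload cell.")
    return names[0]

# Stand-in for the value returned by files.upload() (contents are dummy bytes)
uploaded = {"example.omx": b""}
filename = first_uploaded_name(uploaded)
print("File uploaded:", filename)
```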


# Validate Functions

This step installs the openmatrix package from [pypi.org](https://pypi.org/project/OpenMatrix/) and defines the OMX validation functions.

In [35]:
!pip install openmatrix

import openmatrix as omx

def pass_or_fail(ok): 
    return("Pass" if ok else "Fail")
         
def open_file(filename):
    mat_file = omx.open_file(filename, "r")
    print("File contents:", filename)
    print(mat_file)
    return(mat_file)

def check1(mat_file, required=True, checknum=1):
    print('\nCheck 1: Has OMX_VERSION attribute set to 0.2')
    ok = mat_file.root._v_attrs['OMX_VERSION'] == b'0.2'
    print("  File version is 0.2:", pass_or_fail(ok))
    return(ok, required, checknum)

def check2(mat_file, required=True, checknum=2):
    print('\nCheck 2: Has SHAPE array attribute set to two item integer array')
    ok = len(mat_file.root._v_attrs['SHAPE']) == 2
    print("  Length is 2:", pass_or_fail(ok))
    ok_2 = int(mat_file.root._v_attrs['SHAPE'][0]) == mat_file.root._v_attrs['SHAPE'][0]
    print("  First item is integer:", pass_or_fail(ok_2))
    ok_3 = int(mat_file.root._v_attrs['SHAPE'][1]) == mat_file.root._v_attrs['SHAPE'][1]
    print("  Second item is integer:", pass_or_fail(ok_3))
    print('  Shape:', mat_file.shape())
    return(ok * ok_2 * ok_3, required, checknum)

def check3(mat_file, required=True, checknum=3):
    print('\nCheck 3: Has data group for matrices')
    ok = 'data' in map(lambda x: x._v_name, mat_file.list_nodes("/"))
    print("  Group:", pass_or_fail(ok))
    print('  Number of Matrices:', len(mat_file))
    print('  Matrix names:', mat_file.list_matrices())
    return(ok, required, checknum)

def check4(mat_file, required=True, checknum=4):
    print("\nCheck 4: Matrix shape matches file shape")
    ok = True
    for matrix in mat_file.list_matrices():
        ok_2 = (mat_file[matrix].shape == mat_file.root._v_attrs['SHAPE']).all()
        print("  Matrix shape: ", matrix, ":", mat_file[matrix].shape, ":", pass_or_fail(ok_2))
        ok = ok * ok_2
    return(ok, required, checknum)

def check5(mat_file, required=True, checknum=5):
    print('\nCheck 5: Uses common data types (float or int) for matrices')
    ok = True
    for matrix in mat_file.list_matrices():
        ok_2 = (mat_file[matrix].dtype == float) or (mat_file[matrix].dtype == int)
        print("  Matrix: ", matrix, ":", mat_file[matrix].dtype, ":", pass_or_fail(ok_2))
        ok = ok * ok_2
    return(ok, required, checknum)

def check6(mat_file, required=True, checknum=6):
    print('\nCheck 6: Matrices chunked for faster I/O')
    ok = True
    for matrix in mat_file.list_matrices():
        ok_2 = mat_file[matrix].chunkshape is not None
        print("  Matrix chunkshape: ", matrix, ":", mat_file[matrix].chunkshape, ":", pass_or_fail(ok_2))
        ok = ok * ok_2
    return(ok, required, checknum)

def check7(mat_file, required=False, checknum=7):
    print('\nCheck 7: Uses zlib compression if compression used')
    ok = True
    for matrix in mat_file.list_matrices():
        complib = mat_file[matrix].filters.complib
        # Uncompressed matrices pass; only compressed ones must use zlib
        ok_2 = complib is None or complib == 'zlib'
        print("  Matrix compression library and level: ", matrix, ":", complib, ":", mat_file[matrix].filters.complevel, ":", pass_or_fail(ok_2))
        ok = ok * ok_2
    return(ok, required, checknum)

def check8(mat_file, required=False, checknum=8):
    print("\nCheck 8: Has NA attribute if desired (but not required)")
    ok = True
    for matrix in mat_file.list_matrices():
        ok_2 = "NA" in mat_file[matrix].attrs
        print("  Matrix NA attribute: ", matrix, ":", pass_or_fail(ok_2))
        ok = ok * ok_2
    return(ok, required, checknum)

def check9(mat_file, required=False, checknum=9):
    print('\nCheck 9: Has lookup group for labels/indexes if desired (but not required)')
    ok = 'lookup' in map(lambda x: x._v_name, mat_file.list_nodes("/"))
    print("  Group:", pass_or_fail(ok))
    if ok:
        print('  Number of Lookups:', len(mat_file.list_mappings()))
        print('  Lookups names:', mat_file.list_mappings())
    return(ok, required, checknum)

def check10(mat_file, required=False, checknum=10):
    print("\nCheck 10: Lookup shape matches file shape")
    ok = False
    if 'lookup' in map(lambda x: x._v_name, mat_file.list_nodes("/")):
        ok = True
        for lookup in mat_file.list_mappings():
            ok_2 = len(mat_file.mapping(lookup)) == mat_file.root._v_attrs['SHAPE'][0] or len(mat_file.mapping(lookup)) == mat_file.root._v_attrs['SHAPE'][1]
            print("  Lookup: ", lookup, ":", len(mat_file.mapping(lookup)), ":", pass_or_fail(ok_2))
            ok = ok * ok_2
    return(ok, required, checknum)

def check11(mat_file, required=False, checknum=11):
    print('\nCheck 11: Uses common data types (int or str) for lookups')

    def is_int(x):
        # int() raises on non-numeric strings, so treat those as non-integers
        try:
            return x == int(x)
        except (TypeError, ValueError):
            return False

    ok = False
    if 'lookup' in map(lambda x: x._v_name, mat_file.list_nodes("/")):
        ok = True
        for lookup in mat_file.list_mappings():
            keys = list(mat_file.mapping(lookup).keys())
            ok_2 = all(map(is_int, keys)) or all(map(lambda x: x == str(x), keys))
            print("  Lookup: ", lookup, ":", pass_or_fail(ok_2))
            ok = ok * ok_2
    return(ok, required, checknum)

def check12(mat_file, required=False, checknum=12):
    print("\nCheck 12: Has Lookup DIM attribute of 0 (row) or 1 (column) if desired (but not required)")
    print("  Not supported at this time by the Python openmatrix package")
    # Always reported as Fail because the check cannot be performed
    ok = False
    return(ok, required, checknum)

def run_checks(filename):
    mat_file = open_file(filename)
    checks = [check1, check2, check3, check4, check5, check6,
              check7, check8, check9, check10, check11, check12]
    try:
        # Each check returns a tuple of (ok, required, checknum)
        results = [check(mat_file) for check in checks]
    finally:
        # Close the file handle even if a check raises
        mat_file.close()
    print("\nOverall result ")
    overall_ok = True
    for ok, required, checknum in results:
        print("  Check", checknum, ":", "Required" if required else "Not required", ":", pass_or_fail(ok))
        # Only required checks count toward the overall result
        if required:
            overall_ok = overall_ok and ok
    print("  Overall : ", pass_or_fail(overall_ok))
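
Each check returns a tuple of `(ok, required, checknum)`, and only required checks affect the overall result. The sketch below reproduces that aggregation on hand-written tuples so the rule can be seen without an OMX file; the function name `overall_result` and the sample values are assumptions for illustration.

```python
def overall_result(results):
    """Return True when every required check in `results` passed.

    `results` is a list of (ok, required, checknum) tuples, the same
    layout the check functions return.
    """
    return all(bool(ok) for ok, required, _ in results if required)

# Made-up results: checks 1 and 2 are required and pass,
# check 8 is optional and fails, so the overall result is still a pass.
sample = [(True, True, 1), (1, True, 2), (False, False, 8)]
print(overall_result(sample))
```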



# Validate

This section validates the OMX file against the [Specification](https://github.com/osPlanning/omx/wiki/Specification). The following checks are run, and an overall Pass or Fail is reported at the end. Only checks 1 through 6 are required and count toward the overall result.
1.   Has OMX_VERSION attribute set to 0.2
2.   Has SHAPE array attribute set to two item integer array
3.   Has data group for matrices
4.   Matrix shape matches file shape
5.   Uses common data types (float or int) for matrices
6.   Matrices chunked for faster I/O
7.   Uses zlib compression if compression used
8.   Has NA attribute if desired (but not required)
9.   Has lookup group for labels/indexes if desired (but not required)
10.  Lookup length matches file shape
11.  Uses common data types (int or str) for lookups
12.  Has Lookup DIM attribute of 0 (row) or 1 (column) if desired (but not required)
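
Checks 2, 4, and 10 all compare against the file-level SHAPE attribute. As a rough illustration that does not touch an OMX file, the sketch below applies the same comparisons to plain Python values; the function names and sample values are assumptions, with the 485 x 485 shape taken from the example file.

```python
def shape_is_valid(shape):
    """Check 2: SHAPE has two items and each is a whole number."""
    return len(shape) == 2 and all(int(n) == n for n in shape)

def matrix_matches(matrix_shape, shape):
    """Check 4: a matrix has exactly the file shape."""
    return tuple(matrix_shape) == tuple(shape)

def lookup_matches(lookup_length, shape):
    """Check 10: a lookup is as long as the row or the column dimension."""
    return lookup_length in (shape[0], shape[1])

shape = (485, 485)
print(shape_is_valid(shape))
print(matrix_matches((485, 485), shape))
print(lookup_matches(485, shape))
print(lookup_matches(100, shape))
```

The last call shows a failing case: a lookup of length 100 matches neither dimension of a 485 x 485 file.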

In [36]:
run_checks(filename)

File contents: example.omx
example.omx (File) ''
Last modif.: 'Sun Dec 15 16:14:02 2019'
Object Tree: 
/ (RootGroup) ''
/data (Group) ''
/data/FARE (CArray(485, 485), shuffle, zlib(1)) 'Fare Transit'
/data/IVT (CArray(485, 485), shuffle, zlib(1)) 'In-vehicle time Transit'
/data/IVTBUS (CArray(485, 485), shuffle, zlib(1)) 'In-vehicle time-TSys(Bus) Transit'
/data/IVTRAIL (CArray(485, 485), shuffle, zlib(1)) 'In-vehicle time-TSys(Rail) Transit'
/data/IVTTRAM (CArray(485, 485), shuffle, zlib(1)) 'In-vehicle time-TSys(Tram) Transit'
/data/OWT (CArray(485, 485), shuffle, zlib(1)) 'Origin wait time Transit'
/data/TRANSFERS (CArray(485, 485), shuffle, zlib(1)) 'Number of transfers Transit'
/data/TWT (CArray(485, 485), shuffle, zlib(1)) 'Transfer wait time Transit'
/lookup (Group) ''
/lookup/NO (Array(485,)) ''


Check 1: Has OMX_VERSION attribute set to 0.2
  File version is 0.2: Pass

Check 2: Has SHAPE array attribute set to two item integer array
  Length is 2: Pass
  First item is integer