# Well-formedness

This notebook can be used to evaluate the well-formedness and the validity of XML files. 

The code below first checks the well-formedness of a given XML file. The name of the XML file needs to be given as the value of the variable named `filename_xml`. The file name needs to be given in quotes.

To execute the code, place your cursor in the cell below and press [Shift] + [Enter]. Alternatively, with your cursor in the cell containing the code, you can choose "Run" from the menu at the top of this page.

In [None]:
from lxml import etree

filename_xml = 'literatureList.xml'

# Read the file as bytes so that lxml can apply the encoding declared in the XML itself
with open(filename_xml, 'rb') as xml_file:
    xml_to_check = xml_file.read()

try:
    # Parsing only succeeds if the document is well-formed
    doc = etree.fromstring(xml_to_check)
    print('The document is WELL-FORMED')
except etree.XMLSyntaxError as e:
    print('The document is NOT WELL-FORMED')
    print(str(e))
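
To see what the check reacts to, the same test can be tried on short XML strings instead of a file. The cell below is only a sketch: the helper name `is_well_formed` and the two sample strings are assumptions made for this illustration and are not part of the files used in this notebook.

```python
from lxml import etree

def is_well_formed(xml_bytes):
    """Return (True, None) if xml_bytes parses, else (False, error message).

    Hypothetical helper, written for this illustration only.
    """
    try:
        etree.fromstring(xml_bytes)
        return True, None
    except etree.XMLSyntaxError as e:
        return False, str(e)

# Sample documents (made up for this example)
good = b'<list><item>Moby Dick</item></list>'
bad = b'<list><item>Moby Dick</list>'   # <item> is never closed

for label, sample in [('good', good), ('bad', bad)]:
    ok, message = is_well_formed(sample)
    print(label, '->', 'WELL-FORMED' if ok else 'NOT WELL-FORMED')
    if message:
        # The parser's message names the line and column of the problem
        print('   ', message)
```

The second string fails because the `<item>` start tag has no matching end tag, which is one of the most common well-formedness errors.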


# Validity

The code below checks whether a given XML file is valid against a schema.

The name of the XML file needs to be given as the value of the variable named `filename_xml`.

The URL or the file name of the schema file needs to be given as the value of the variable named `schema`.

This code assumes that the schema is in the RELAX NG format.

Note that the validation process may take some time, particularly when the schema is large or has to be loaded from a URL.

In [None]:
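from lxml import etree

filename_xml = 'EFBO19101102.xml'
schema = 'http://bookandbyte.universiteitleiden.nl/schema/tei_all.rng'

# Load the schema and compile it into a RELAX NG validator
relaxng_doc = etree.parse(schema)
relaxng = etree.RelaxNG(relaxng_doc)

try:
    doc = etree.parse(filename_xml)
    # assertValid() raises DocumentInvalid if the document violates the schema
    relaxng.assertValid(doc)
    print('The document is valid')
except etree.XMLSyntaxError as e:
    # The file could not be parsed, so its validity cannot be checked
    print('The document is NOT WELL-FORMED.')
    print(str(e))
except etree.DocumentInvalid as e:
    print('The document is NOT valid.')
    print(str(e))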
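
If raising an exception is not desired, `validate()` can be used instead of `assertValid()`: it returns `True` or `False`, and the validator's `error_log` then lists the problems that were found. The cell below sketches this with a tiny inline schema so that it runs without network access. The schema, the sample documents and the variable names are assumptions made for this example; they are not the TEI schema referred to above.

```python
from lxml import etree

# A made-up RELAX NG schema: a <list> containing one or more <item> elements
schema_xml = b'''
<element name="list" xmlns="http://relaxng.org/ns/structure/1.0">
  <oneOrMore>
    <element name="item"><text/></element>
  </oneOrMore>
</element>
'''
relaxng = etree.RelaxNG(etree.fromstring(schema_xml))

# Both sample documents are well-formed, but only the first follows the schema
valid_doc = etree.fromstring(b'<list><item>Moby Dick</item></list>')
invalid_doc = etree.fromstring(b'<list><title>Moby Dick</title></list>')

for label, doc in [('valid_doc', valid_doc), ('invalid_doc', invalid_doc)]:
    if relaxng.validate(doc):
        print(label, 'is valid')
    else:
        print(label, 'is NOT valid')
        # error_log holds the messages from the most recent validation
        for error in relaxng.error_log:
            print('   line', error.line, ':', error.message)
```

Listing every entry in `error_log` can be more convenient than the single exception message when a long document contains several violations.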
