## Batch transform TEI files to HTML

### Step 1 - Load the XSLT stylesheet. 
The transform.xsl file (available in the "/xsl" folder of the project files) defines how our TEI XML will be converted into HTML output. We will use lxml to parse the stylesheet and compile it into an XSLT transform object.

This step prepares a reusable object that we will apply to batch transform multiple TEI files.

In [15]:
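from pathlib import Path
from lxml import etree

# Build paths from the user's home directory
home = str(Path.home())
xsl_dir = home + "/romantic_poets_project/xsl"
xsl_file = xsl_dir + "/transform.xsl"

# Parse the stylesheet, then compile it into a reusable transform object
xslt_root = etree.parse(xsl_file)
transform = etree.XSLT(xslt_root)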
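
To see what the transform object does in isolation, here is a minimal sketch. It assumes a hypothetical one-template stylesheet and a tiny inline TEI document; neither is taken from the project files, and the real transform.xsl will differ.

```python
from lxml import etree

# Hypothetical stylesheet for illustration only (not the project's transform.xsl).
# It pulls the first TEI <title> into an HTML <h1>.
XSL = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <html><body><h1><xsl:value-of select="//tei:title"/></h1></body></html>
  </xsl:template>
</xsl:stylesheet>"""

# Hypothetical, heavily simplified TEI document.
TEI = (
    b'<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc>'
    b"<titleStmt><title>Ozymandias</title></titleStmt>"
    b"</fileDesc></teiHeader></TEI>"
)

# Compile the stylesheet once, then call the result like a function.
demo_transform = etree.XSLT(etree.fromstring(XSL))
demo_html = str(demo_transform(etree.fromstring(TEI)))
print(demo_html)
```

Because the compiled object is callable, the same `demo_transform` could be applied to any number of parsed documents without re-reading the stylesheet.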

### Step 2 - Prepare input and output paths. 
Next, we define variables that locate the TEI XML files and the destination directory for the transformed HTML files.
The input directory contains the TEI-encoded poems; the code writes the transformed HTML files to the output directory.

If the output directory does not already exist, this code will create it.
Keeping the input and output in separate directories prevents accidental changes to the TEI source files.

In [24]:
tei_dir = Path(home + "/romantic_poets_project/tei_files")
out_dir = Path(home + "/romantic_poets_project/html")

out_dir.mkdir(exist_ok=True)
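
As a small illustration of the directory handling, the sketch below uses a temporary folder as a stand-in for the real project directory (an assumption made so the example is safe to run anywhere). It also passes `parents=True`, which the cell above does not use, so that missing parent folders are created too.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for ~/romantic_poets_project; the real path is not touched.
    project = Path(tmp) / "romantic_poets_project"
    demo_out = project / "html"

    # parents=True creates romantic_poets_project/ as well as html/.
    demo_out.mkdir(parents=True, exist_ok=True)

    # exist_ok=True makes a second call harmless instead of raising FileExistsError.
    demo_out.mkdir(parents=True, exist_ok=True)

    created = demo_out.is_dir()

print(created)
```

The `/` operator on `Path` objects is an alternative to joining path strings with `+`; both approaches produce the same location here.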

### Step 3 - Batch transform the TEI into HTML.
We'll use a for loop to iterate through all of the TEI files in our TEI directory.
For each file, the code will: 
1. Parse the TEI XML with etree.
2. Apply the XSLT we created in step 1.
3. Write the HTML file to the output directory, using the path's .stem attribute (the file name without its extension) to keep file names consistent.

In [35]:
# Transform every TEI file in the input directory
for xml_file in tei_dir.glob("*.xml"):
    tei_doc = etree.parse(str(xml_file))
    transformed = transform(tei_doc)

    # Reuse the source file name, swapping the extension for .html
    out_file = out_dir / (xml_file.stem + ".html")
    out_file.write_text(str(transformed), encoding="utf-8")

The HTML files should now be available in the /html output directory. The script can be reused for any new TEI files added to the TEI folder.
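
One way to make the loop easier to reuse is to wrap it in a function. The sketch below is an assumption about how that might look: `batch_transform` is a hypothetical name and is not taken from the project's own script.

```python
from pathlib import Path
from lxml import etree


def batch_transform(transform, tei_dir, out_dir):
    """Apply a compiled lxml XSLT object to every .xml file in tei_dir.

    Returns the list of HTML paths that were written.
    """
    tei_dir, out_dir = Path(tei_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    # sorted() gives a predictable processing order across platforms.
    for xml_file in sorted(tei_dir.glob("*.xml")):
        result = transform(etree.parse(str(xml_file)))
        # .stem drops the .xml extension, so poem.xml becomes poem.html.
        out_file = out_dir / (xml_file.stem + ".html")
        out_file.write_text(str(result), encoding="utf-8")
        written.append(out_file)
    return written
```

With the variables defined in the earlier steps, a call such as `batch_transform(transform, tei_dir, out_dir)` would repeat the batch run after new TEI files are added.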

The script tei2html.py performs the same steps as this notebook.
**Note**: if the script fails, the problem is most likely in the TEI files themselves; mismatched tags and unclosed elements are common culprits.
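
To track down such problems, it can help to parse each file on its own and collect the errors rather than stopping at the first failure. The following is a minimal sketch; `find_malformed` is a hypothetical helper name, and it only checks that the XML is well formed, not that it is valid TEI.

```python
from pathlib import Path
from lxml import etree


def find_malformed(folder):
    """Return (file name, error message) pairs for XML files that fail to parse."""
    problems = []
    for xml_file in sorted(Path(folder).glob("*.xml")):
        try:
            etree.parse(str(xml_file))
        except etree.XMLSyntaxError as err:
            # The message usually includes the line and column of the problem.
            problems.append((xml_file.name, str(err)))
    return problems
```

Running `find_malformed(tei_dir)` before the batch transform would list the files that need fixing; an empty list means every file parsed cleanly.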