# Working With OU-XML

In this section we will briefly review different ways of working with an OU-XML document, including treating a document as a simple, structured and searchable database, as well as ways of displaying or rendering the XML content.

OU internal readers may also find the [*Structured Content Tag Guide*](https://learn3.open.ac.uk/mod/oucontent/view.php?id=185747) a useful and more comprehensive guide to the OU-XML document structure.

## Treating OU-XML Documents As Databases

*TO DO*

- include simple examples of xpath searching, extracting elements
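
As a placeholder for that material, here is a minimal sketch of XPath searching with `lxml`. The OU-XML-style fragment is invented for illustration; the element names (`Session`, `Title`, `Paragraph`, `Activity`, `Question`) follow the OU-XML conventions used elsewhere in this section, but this is not a real OU-XML document.

```python
from lxml import etree

# A small, invented OU-XML-style fragment used purely for illustration
doc = etree.XML("""<Session>
  <Title>Reading aloud</Title>
  <Paragraph>Some introductory text.</Paragraph>
  <Activity>
    <Question>
      <Paragraph>Read the poem aloud.</Paragraph>
    </Question>
  </Activity>
  <Paragraph>Some closing text.</Paragraph>
</Session>""")

# Find every Activity element, wherever it appears in the document
activities = doc.xpath("//Activity")

# Pull out the text of the session title as a plain string
title = doc.xpath("string(/Session/Title)")

# Paragraphs that sit directly under the Session (not inside activities)
top_level_paras = [p.text for p in doc.xpath("/Session/Paragraph")]

# Paragraphs nested anywhere inside an Activity
activity_paras = [p.text for p in doc.xpath("//Activity//Paragraph")]

print(len(activities))   # 1
print(title)             # Reading aloud
print(top_level_paras)   # ['Some introductory text.', 'Some closing text.']
print(activity_paras)    # ['Read the poem aloud.']
```

The `//` axis searches at any depth, whereas a path starting `/Session/` only matches direct children, which is a simple way of separating body text from text inside activities.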

## Viewing and Rendering XML Elements Using XSLT

One way of generating rendered views of XML content is to use XSLT (Extensible Stylesheet Language Transformations), a transformation process in which an XSLT stylesheet describes how each node in an XML document, such as an OU-XML document, should be transformed into another form. For example, I have [previously](https://blog.ouseful.info/2019/11/06/text-publishing-workflows-rooted-on-openlearn-ou-xml-via-github-circleci-and-github-pages/) used XSLT to transform an OU-XML document into a set of simple markdown documents that can then be rendered as an interactive HTML textbook using a publishing workflow such as [Quarto](https://quarto.org/) or [Jupyter Book](https://jupyterbook.org/en/stable/intro.html).

Whilst the XSLT stylesheet I have previously used expects to find a `<Session>` element as the root element, we can also co-opt the stylesheet to render any collection of elements by wrapping them in a dummy root element and then applying the stylesheet's templates within that context:

```xml
<xsl:template match="DummyRoot">
    <md>
        <xsl:apply-templates />
    </md>
</xsl:template>
```

As my stylesheet was designed to generate markdown (`.md`) content (which can also legitimately include HTML), I use the above template to collect the generated text inside a nominal XML `<md>` element, so that the output has a single root element that we can retrieve and serialise.
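
To see the dummy-root pattern in isolation, here is a self-contained sketch using a deliberately tiny, hypothetical stylesheet. It is *not* the `ouxml2md.xslt` stylesheet used below: apart from the `DummyRoot` template shown above, its two rules (turning `Paragraph` elements into markdown paragraphs and `b` elements into bold text) are invented for the example.

```python
from lxml import etree

# A tiny, hypothetical stylesheet: NOT ouxml2md.xslt, just enough rules
# to show the DummyRoot pattern at work
TOY_XSLT = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>

  <!-- Wrap whatever we were given in a single <md> element -->
  <xsl:template match="DummyRoot">
    <md><xsl:apply-templates/></md>
  </xsl:template>

  <!-- Each Paragraph becomes a markdown paragraph -->
  <xsl:template match="Paragraph">
    <xsl:apply-templates/>
    <xsl:text>&#10;&#10;</xsl:text>
  </xsl:template>

  <!-- Bold text becomes __bold__ -->
  <xsl:template match="b">
    <xsl:text>__</xsl:text><xsl:apply-templates/><xsl:text>__</xsl:text>
  </xsl:template>
</xsl:stylesheet>"""

transform = etree.XSLT(etree.XML(TOY_XSLT))

# Two sibling elements with no common parent of their own...
paras = [etree.XML("<Paragraph>First <b>bold</b> point.</Paragraph>"),
         etree.XML("<Paragraph>Second point.</Paragraph>")]

# ...are gathered under a dummy root so the stylesheet has one entry point
root = etree.Element("DummyRoot")
root.extend(paras)

result = transform(root)
md_text = result.getroot().text
print(md_text)
```

Because the `DummyRoot` template simply applies templates to its children, any mix of elements can be pushed through the same rules without the stylesheet needing to know what the "real" root element would have been.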

Let's import the `lxml` XML processing package and create a simple utility function to convert an XML object to text:

```python
from lxml import etree

def unpack(x, as_str=False):
    """Convenience function to look at the structure of an XML object."""
    # tostring() returns bytes; optionally decode them to a str
    return etree.tostring(x).decode() if as_str else etree.tostring(x)
```
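
Note that `etree.tostring()` returns ASCII-encoded bytes by default, so any non-ASCII characters are written out as numeric character references; that is why `poème` appears as `po&#232;me` in the outputs later in this section. The markdown and HTML renderers resolve those references anyway, but if you would rather see the characters themselves, a variant of the helper (the name `unpack_unicode` is just for illustration) can ask `lxml` for a Python string directly:

```python
from lxml import etree

def unpack_unicode(x):
    """Serialise an lxml element to a str, keeping non-ASCII characters as-is."""
    # encoding="unicode" makes tostring() return str rather than bytes
    return etree.tostring(x, encoding="unicode")

el = etree.XML("<Paragraph>po&#232;me</Paragraph>")

print(etree.tostring(el))   # b'<Paragraph>po&#232;me</Paragraph>'
print(unpack_unicode(el))   # <Paragraph>poème</Paragraph>
```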

Define a handle for our XSLT-powered transformations:

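```python
# Parse the stylesheet file directly: passing etree.fromstring() a str that
# starts with an encoding declaration raises a ValueError in lxml
xslt_transformer = etree.XSLT(etree.parse("xslt/ouxml2md.xslt"))
```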

Create some example XML to demonstrate the process:

```python
test_xml = etree.XML("""<Activity><Question xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">\n\t\t\t\t\t\t\t\t<Paragraph><language xml:lang="FR">Lisez maintenant le po&#232;me &#224; haute voix et allez ensuite &#233;couter l&#8217;auteur lire son po&#232;me sur Internet, </language><?oxy_delete author="js34827" timestamp="20200630T134829+0100" content="&lt;a href=&quot;http://routes.open.ac.uk/ixbin/hixclient.exe?_IXDB_=routes&amp;amp;_IXSPFX_=g&amp;amp;submit-button=summary&amp;amp;%24+with+res_id+is+res23034=.&quot;&gt;&lt;b&gt;&lt;language xml:lang=&quot;FR&quot;&gt;Paul Fort&#160;: po&#232;me&lt;/language&gt;&lt;/b&gt;&lt;/a&gt;"?><?oxy_insert_start author="js34827" timestamp="20200630T134832+0100"?><a href="https://wheatoncollege.edu/vive-voix/par-auteur/fort/"><b><language xml:lang="FR">Paul Fort&#160;: po&#232;me</language></b></a><?oxy_insert_end?><b><language xml:lang="FR">.</language></b></Paragraph>\n\t\t\t\t\t\t\t</Question></Activity>""")
```

Now we need to wrap the test XML in a "shim" to which we can apply the transformation using our previously loaded XSLT stylesheet:

```python
# Create the dummy root and move the test element inside it
wrapped_xml = etree.XML("<DummyRoot></DummyRoot>")
wrapped_xml.append(test_xml)
```
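
One `lxml` behaviour worth knowing when you build a shim like this from elements found in a larger document: appending an element to a new parent *moves* it out of its original tree. If you want to leave the source document untouched, append copies instead. Here is a minimal sketch using a hypothetical helper, `wrap_elements()`, and a made-up document:

```python
from copy import deepcopy
from lxml import etree

def wrap_elements(elements, root_tag="DummyRoot"):
    """Gather copies of the given elements under a single dummy root element."""
    wrapper = etree.Element(root_tag)
    for el in elements:
        # deepcopy so the element is not removed from its original document
        wrapper.append(deepcopy(el))
    return wrapper

doc = etree.XML("<Session><Activity><Paragraph>A</Paragraph></Activity>"
                "<Activity><Paragraph>B</Paragraph></Activity></Session>")

wrapped = wrap_elements(doc.xpath("//Activity"))

print(len(wrapped))                  # 2: both activities are in the wrapper
print(len(doc.xpath("//Activity")))  # 2: and still in the original document
```

Without the `deepcopy()`, the second `print()` would report `0`, because each `Activity` would have been moved out of `doc` and into the wrapper.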

Apply the transformation:

```python
# Apply the XSLT stylesheet
transformed_xml = xslt_transformer(wrapped_xml)

# Convert the generated XML object to text
md = unpack(transformed_xml.getroot()).decode()

print(md)
```

```text
<md xmlns:str="http://exslt.org/strings">
<!-- #region tags=["style-activity"] -->

#### Question

Lisez maintenant le po&#232;me &#224; haute voix et allez ensuite &#233;couter l&#8217;auteur lire son po&#232;me sur Internet, [__Paul Fort&#160;: po&#232;me__](https://wheatoncollege.edu/vive-voix/par-auteur/fort/)__.__

<!-- #endregion -->
</md>
```


Now strip out the `<md>` wrapper tags and render the markdown using the IPython display machinery:

```python
from IPython.display import Markdown

# Strip the <md> tags from the text string
md = md.replace('<md xmlns:str="http://exslt.org/strings">', '').replace("</md>", "")

Markdown(md)
```


<!-- #region tags=["style-activity"] -->

#### Question

Lisez maintenant le po&#232;me &#224; haute voix et allez ensuite &#233;couter l&#8217;auteur lire son po&#232;me sur Internet, [__Paul Fort&#160;: po&#232;me__](https://wheatoncollege.edu/vive-voix/par-auteur/fort/)__.__

<!-- #endregion -->
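
The string replacement above only works if the opening tag matches exactly, including the `xmlns:str` namespace declaration the stylesheet happens to emit. A slightly more forgiving sketch uses a regular expression to drop a leading `<md ...>` tag, whatever its attributes, and a trailing `</md>`; the helper name `strip_md_wrapper` is made up for this example.

```python
import re

# An opening <md> tag with any attributes at the very start, or a closing
# </md> tag at the very end (surrounding whitespace included)
_MD_WRAPPER = re.compile(r"^\s*<md\b[^>]*>|</md>\s*$")

def strip_md_wrapper(text):
    """Remove the outer <md ...>...</md> wrapper from serialised XSLT output."""
    return _MD_WRAPPER.sub("", text)

raw = '<md xmlns:str="http://exslt.org/strings">\n#### Question\n</md>\n'
print(repr(strip_md_wrapper(raw)))  # '\n#### Question\n'
```

The `\b` after `md` stops the pattern from matching a longer tag name such as `<mdx>`, and anchoring with `^` and `$` means only the outermost wrapper is touched.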


We can now also convert the markdown to HTML:

```python
from markdown import markdown

# Convert the markdown to HTML
html = markdown(md)

print(html)
```

```html
<!-- #region tags=["style-activity"] -->

<h4>Question</h4>
<p>Lisez maintenant le po&#232;me &#224; haute voix et allez ensuite &#233;couter l&#8217;auteur lire son po&#232;me sur Internet, <a href="https://wheatoncollege.edu/vive-voix/par-auteur/fort/"><strong>Paul Fort&#160;: po&#232;me</strong></a><strong>.</strong></p>
<!-- #endregion -->
```


And preview that, again using the IPython display machinery:

```python
from IPython.display import HTML

# Render the HTML
HTML(html)
```

This means that we can search for and extract elements from our OU-XML documents and then preview those elements as HTML, provided the stylesheet defines rules for the corresponding OU-XML elements.
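
Putting the steps together, a minimal search-and-preview helper might look like the following sketch. The function name `preview_elements()` is hypothetical, and the stylesheet is a one-rule toy standing in for `xslt/ouxml2md.xslt`. The helper copies each matched element (so the source document is left intact), wraps the copies in a `DummyRoot`, applies the transformer and strips the `<md>` wrapper. In a notebook you would then pass the returned markdown to `IPython.display.Markdown`, or to `markdown()` to get HTML.

```python
import re
from copy import deepcopy
from lxml import etree

def preview_elements(doc, xpath_expr, transformer):
    """Return the markdown generated for every element matching xpath_expr."""
    wrapper = etree.Element("DummyRoot")
    for el in doc.xpath(xpath_expr):
        wrapper.append(deepcopy(el))  # copy, so doc itself is not modified
    md = etree.tostring(transformer(wrapper).getroot(), encoding="unicode")
    # Drop the outer <md ...> and </md> tags added by the DummyRoot template
    return re.sub(r"^\s*<md\b[^>]*>|</md>\s*$", "", md)

# A one-rule toy stylesheet, standing in for xslt/ouxml2md.xslt
toy = etree.XSLT(etree.XML(b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="DummyRoot"><md><xsl:apply-templates/></md></xsl:template>
  <xsl:template match="Question">
    <xsl:text>#### Question&#10;&#10;</xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>"""))

doc = etree.XML("<Session><Activity><Question><Paragraph>Read the poem."
                "</Paragraph></Question></Activity></Session>")

print(preview_elements(doc, "//Question", toy))
```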

## Generating Fully Rendered Output Documents from OU-XML Documents

*TO DO*

- prior examples include: [*OER Text Publishing Workflows Rooted on OpenLearn OU-XML Via Github, CircleCI and Github Pages Using Jupytext and nbSphinx*](https://blog.ouseful.info/2019/11/06/text-publishing-workflows-rooted-on-openlearn-ou-xml-via-github-circleci-and-github-pages/)