# Using Python to Generate XML for Basic Dublin Core

This notebook shows how to write Python code that generates
well-formed XML for a basic Dublin Core record.

The example works through some of the steps needed to create Dublin Core
statements about an edition of Jane Austen's _Pride and Prejudice_
published by Standard Ebooks, which you can find at <https://standardebooks.org/ebooks/jane-austen/pride-and-prejudice>.

## Set up

This example was created using Python 3.12.0 and the `lxml` library.
The `lxml.etree` module provides an API that is largely compatible with
`xml.etree.ElementTree`, which ships with the standard Python distribution,
and adds features of its own.
The following is the recommended import for `lxml`.
If the import fails, you can run `pip install lxml` to
install it (see <https://pypi.org/project/lxml/>).

In [1]:
from pathlib import Path
try:
    from lxml import etree
    # ElementMaker exists only in lxml, so import it alongside lxml.etree
    from lxml.builder import ElementMaker
    print("running with lxml.etree")
except ImportError:
    import xml.etree.ElementTree as etree
    print("running with Python's xml.etree.ElementTree")

running with lxml.etree


Note that if you see "running with Python's xml.etree.ElementTree"
when you run the above cell, some later cells will not work as written:
the `nsmap` argument, the `pretty_print` option, and `ElementMaker` are specific to `lxml`.
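
As a rough illustration of where the two libraries differ, the sketch below builds a one-element record with only the standard library. It assumes Python 3.9 or later (for `ET.indent`) and uses `ET.register_namespace` in place of `lxml`'s `nsmap` argument; the variable names are illustrative and not part of this notebook.

```python
import xml.etree.ElementTree as ET

DCTERMS = "http://purl.org/dc/terms/"
EX = "http://www.example.org/"

# The standard library has no nsmap argument; prefixes are registered globally.
ET.register_namespace("dcterms", DCTERMS)
ET.register_namespace("ex", EX)

root = ET.Element(f"{{{EX}}}metadata")
title = ET.SubElement(root, f"{{{DCTERMS}}}title")
title.text = "Pride and Prejudice"

# ET.indent (Python 3.9+) stands in for lxml's pretty_print=True.
ET.indent(root, space="  ")
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The tag strings are the same in both libraries; only the namespace-prefix handling and the pretty-printing step change.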

## Set up an element structure

This section defines the namespaces and the output path used to build an XML structure with `metadata` as the root element.

In [2]:
ns = {
    'ex': 'http://www.example.org/',
    'dcterms': 'http://purl.org/dc/terms/',
    'rdf': 'https://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'rdfs': 'https://www.w3.org/2000/01/rdf-schema#',
    'xsd': 'http://www.w3.org/2001/XMLSchema#'
}

In [3]:
fpath_simple_dc_record = Path('..','data','simple_dc_record.xml')

if fpath_simple_dc_record.is_file():
    print('You have already saved some DublinCore metadata!')
else:
    print('No file located')

No file located


## Build XML using etree Element and SubElement

Recall that elements are the basic building blocks of XML.
Each element keeps track of its child elements, so you can start
from the root element and attach each further element to its parent
with `SubElement`.

In [4]:
metadata = etree.Element(f"{{{ns['ex']}}}metadata", nsmap=ns)
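
The `{namespace}localname` strings used for tags are known as Clark notation. Because the triple-brace f-strings are easy to mistype, a small helper can build them. The sketch below uses a hypothetical `qname` function (not part of `lxml` or this notebook) and the standard library's ElementTree, whose `Element` and `SubElement` calls accept the same tag strings.

```python
import xml.etree.ElementTree as ET

NS = {
    "ex": "http://www.example.org/",
    "dcterms": "http://purl.org/dc/terms/",
}

def qname(prefix: str, local: str) -> str:
    """Return the Clark-notation tag '{namespace-uri}local' for a prefix in NS."""
    return f"{{{NS[prefix]}}}{local}"

root = ET.Element(qname("ex", "metadata"))
title = ET.SubElement(root, qname("dcterms", "title"))
title.text = "Pride and Prejudice"

print(title.tag)   # the full Clark-notation tag
print(len(root))   # number of child elements attached to the root
```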

Now, build up the XML elements by creating SubElements that depend on `metadata`.

In [5]:
# title
title = etree.SubElement(metadata, f"{{{ns['dcterms']}}}title")
title.text = "Pride and Prejudice"

# creator
creator = etree.SubElement(metadata, f"{{{ns['dcterms']}}}creator")
creator.text = "Jane Austen"

# subjects: one dcterms:subject element per term
subjects = ['fiction', 'novel', 'romance', 'England', 'enemies-to-lovers']
for subject in subjects:
    subj_elem = etree.SubElement(metadata, f"{{{ns['dcterms']}}}subject")
    subj_elem.text = subject

# description
description = etree.SubElement(metadata, f"{{{ns['dcterms']}}}description")
description.text = 'Pride and Prejudice may today be one of Jane Austen’s most enduring novels, having been widely adapted to stage, screen, and other media since its publication in 1813. The novel tells the tale of five unmarried sisters and how their lives change when a wealthy eligible bachelor moves in to their neighborhood.'

# publisher; note the use of .set() to add a datatype attribute
publisher = etree.SubElement(metadata, f"{{{ns['dcterms']}}}publisher")
publisher.set(f"{{{ns['rdf']}}}datatype", f"{{{ns['rdfs']}}}Literal")
publisher.text = 'Standard Ebooks'

# date; .set() again adds a datatype, this time from the xsd namespace
date = etree.SubElement(metadata, f"{{{ns['dcterms']}}}date")
date.set(f"{{{ns['rdf']}}}datatype", f"{{{ns['xsd']}}}date")
date.text = '1813'

# type (named type_elem to avoid shadowing the built-in type())
type_elem = etree.SubElement(metadata, f"{{{ns['dcterms']}}}type")
type_elem.text = 'Text'

# format (named format_elem to avoid shadowing the built-in format());
# the value goes in .text, not in the variable itself
format_elem = etree.SubElement(metadata, f"{{{ns['dcterms']}}}format")
format_elem.text = 'epub'
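
One detail in the cell above is worth a closer look. Clark notation is resolved only for tag and attribute *names*; an attribute *value* is stored as a plain string, so a Clark-notation string passed as the value is written into the output with its braces intact. RDF datatypes are normally given as full URIs, and the W3C RDF and RDFS namespace URIs begin with `http://` rather than `https://`. The sketch below uses the standard library's ElementTree and assumes `xsd:gYear` as a plausible datatype for a year-only value; treat both choices as illustrations rather than as this notebook's method.

```python
import xml.etree.ElementTree as ET

DCTERMS = "http://purl.org/dc/terms/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"

ET.register_namespace("dcterms", DCTERMS)
ET.register_namespace("rdf", RDF)

date = ET.Element(f"{{{DCTERMS}}}date")
date.text = "1813"

# Attribute NAME: Clark notation, serialized with the rdf prefix.
# Attribute VALUE: a plain URI string, with no braces.
date.set(f"{{{RDF}}}datatype", f"{XSD}gYear")

date_xml = ET.tostring(date, encoding="unicode")
print(date_xml)
```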

Working on your own, construct and add the following to the tree:

```yaml
identifier: 
  - https://www.gutenberg.org/ebooks/42671
  - https://catalog.hathitrust.org/Record/000429470
source: "https://www.gutenberg.org/ebooks/42671"
language: "en"
rights: "https://creativecommons.org/publicdomain/zero/1.0/"
```

# identifier

# source

# language

# rights
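
If you get stuck, one possible pattern is to hold the values in a dictionary and loop over it, creating one element per value. The sketch below uses a hypothetical `add_terms` helper and the standard library's ElementTree so that it runs on its own; with `lxml`, the same `SubElement` calls apply to the `metadata` element built above.

```python
import xml.etree.ElementTree as ET

DCTERMS = "http://purl.org/dc/terms/"

record = {
    "identifier": [
        "https://www.gutenberg.org/ebooks/42671",
        "https://catalog.hathitrust.org/Record/000429470",
    ],
    "source": "https://www.gutenberg.org/ebooks/42671",
    "language": "en",
    "rights": "https://creativecommons.org/publicdomain/zero/1.0/",
}

def add_terms(parent, terms):
    """Append one dcterms element per value; a list yields repeated elements."""
    for name, values in terms.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            elem = ET.SubElement(parent, f"{{{DCTERMS}}}{name}")
            elem.text = value

root = ET.Element("metadata")
add_terms(root, record)

# Local names of the children, in the order they were added
local_names = [child.tag.split("}")[1] for child in root]
print(local_names)
```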

Assign the Element to an ElementTree, which creates a full XML object with a root element that can be written to a file:

In [6]:
tree = etree.ElementTree(metadata)
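# Make sure the output folder exists; write() does not create it
fpath_simple_dc_record.parent.mkdir(parents=True, exist_ok=True)
tree.write(fpath_simple_dc_record,
           pretty_print=True,
           xml_declaration=True,
           encoding='utf-8')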

if fpath_simple_dc_record.is_file():
    print('wrote your metadata!')

wrote your metadata!
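
A quick way to check the result is to parse the file back in and query it. The sketch below is a self-contained illustration that uses the standard library's ElementTree and a temporary directory (both are assumptions made so the example runs anywhere); it also creates the parent folder first, since writing does not create missing directories.

```python
import tempfile
from pathlib import Path
import xml.etree.ElementTree as ET

NS = {"dcterms": "http://purl.org/dc/terms/"}

root = ET.Element("metadata")
for term in ["fiction", "novel"]:
    ET.SubElement(root, f"{{{NS['dcterms']}}}subject").text = term

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "data" / "record.xml"
    out.parent.mkdir(parents=True, exist_ok=True)
    ET.ElementTree(root).write(out, encoding="utf-8", xml_declaration=True)

    # Read the file back and look up elements by prefix via the NS mapping
    parsed = ET.parse(out).getroot()
    subjects = [e.text for e in parsed.findall("dcterms:subject", NS)]

print(subjects)
```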


To view the pretty-printed XML, serialize the element to a variable and print it:

In [7]:
XML_metadata_object = etree.tostring(metadata, pretty_print=True, encoding='utf-8')
print(XML_metadata_object.decode(), end='')

<ex:metadata xmlns:ex="http://www.example.org/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:rdf="https://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="https://www.w3.org/2000/01/rdf-schema#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
  <dcterms:title>Pride and Prejudice</dcterms:title>
  <dcterms:creator>Jane Austen</dcterms:creator>
  <dcterms:subject>fiction</dcterms:subject>
  <dcterms:subject>novel</dcterms:subject>
  <dcterms:subject>romance</dcterms:subject>
  <dcterms:subject>England</dcterms:subject>
  <dcterms:subject>enemies-to-lovers</dcterms:subject>
  <dcterms:description>Pride and Prejudice may today be one of Jane Austen’s most enduring novels, having been widely adapted to stage, screen, and other media since its publication in 1813. The novel tells the tale of five unmarried sisters and how their lives change when a wealthy eligible bachelor moves in to their neighborhood.</dcterms:description>
  <dcterms:publisher rdf:datatype="{https://www.w3.org/2000/01/

========== STOP ===========

## Using the `lxml` `ElementMaker`

The following cells use the `ElementMaker` class, which is specific
to `lxml` (it is not part of the standard `ElementTree` library),
to set up element factories linked to specific namespaces:

In [None]:
# Use a separate name so the `metadata` element built earlier is not overwritten
E = ElementMaker()
DCTERM = ElementMaker(namespace=ns['dcterms'], nsmap=ns)

### Build an XML tree

This approach uses the `DCTERM` element maker set up above: accessing a tag name
as an attribute (for example, `DCTERM.title`) returns a function that creates that element.

In [None]:
METADATA = E.metadata
TITLE = DCTERM.title

In [None]:
my_dc_metadata_record = METADATA(
    TITLE('Pride and Prejudice')
)

my_dc_metadata_record

In [None]:
etree.tostring(my_dc_metadata_record)

In [None]:
print(etree.tostring(my_dc_metadata_record, pretty_print=True, encoding='unicode'))
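
To show how this pattern scales beyond a single title, the sketch below nests several `dcterms` children inside one call. It requires `lxml` (there is no `ElementMaker` in the standard library), so the import is guarded; the `build_record` function and its argument names are illustrative rather than part of this notebook.

```python
try:
    from lxml import etree
    from lxml.builder import ElementMaker
except ImportError:  # ElementMaker has no standard-library counterpart
    etree = None

NS = {
    "ex": "http://www.example.org/",
    "dcterms": "http://purl.org/dc/terms/",
}

def build_record(title, creator, subjects):
    """Return an ex:metadata element with dcterms children (requires lxml)."""
    EX = ElementMaker(namespace=NS["ex"], nsmap=NS)
    DCTERMS = ElementMaker(namespace=NS["dcterms"], nsmap=NS)
    return EX.metadata(
        DCTERMS.title(title),
        DCTERMS.creator(creator),
        # Unpack a list to get one dcterms:subject element per term
        *[DCTERMS.subject(s) for s in subjects],
    )

if etree is not None:
    record = build_record("Pride and Prejudice", "Jane Austen", ["fiction", "novel"])
    print(etree.tostring(record, pretty_print=True, encoding="unicode"))
else:
    record = None
    print("lxml is not installed; skipping the ElementMaker example")
```

Compared with repeated `SubElement` calls, the nested call mirrors the shape of the resulting XML, which can make longer records easier to read.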