# SI 676 - Lab 7

### 1. What is the first-line string that declares an XML document (that is, what is basic syntax of the XML document declaration)?

\<?xml version="1.0" encoding="utf-8" ?>

The question marks in the delimiters (\<? and ?>) mark the line as an instruction to the parser rather than an element in the document tree. The name 'xml' identifies it as the XML declaration, and the two pseudo-attributes are parameters for parsing: the version can be 1.0 or 1.1 (1.1 relaxes some of 1.0's restrictions on permitted characters and line endings, but 1.0 remains the norm), and the encoding names the character encoding the parser should use to decode the bytes that make up the rest of the file. The declaration is optional in XML 1.0, but when present it must be the very first thing in the file.
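
As a small illustration of the encoding parameter, the sketch below parses a byte string whose declaration names ISO-8859-1. The sample document and its 'name' element are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the byte 0xE9 is "é" in ISO-8859-1,
# but it is not a valid UTF-8 sequence on its own.
raw = b'<?xml version="1.0" encoding="iso-8859-1"?><name>caf\xe9</name>'

# The parser reads the declaration and decodes the remaining bytes accordingly.
decoded_text = ET.fromstring(raw).text
print(decoded_text)
```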

### 2. What is the advantage of aliasing a library? Why import the ElementTree module using import xml.etree.ElementTree as ET rather than the basic import statement?

Aliasing a library is useful for abbreviating the full path for the library when it's very long, so that you don't have to type it all the way out every time. It also provides a layer of indirection, so that if you need to fail over to another library for some reason you can use the same alias to refer to both libraries.
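
The failover idea can be sketched as follows. Here lxml is only an assumed alternative (a third-party package that offers a largely ElementTree-compatible API), and the tiny document is made up for the example.

```python
# lxml is a third-party package and only an assumed alternative here;
# fall back to the standard library when it is not installed.
try:
    import lxml.etree as ET
except ImportError:
    import xml.etree.ElementTree as ET

# Code below the import only ever refers to the alias.
doc = ET.fromstring("<a><b/></a>")
child_tags = [child.tag for child in doc]
print(child_tags)
```

Whichever import succeeds, the rest of the notebook keeps calling ET.fromstring, ET.parse, and so on without changes.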

### 3. Write a code block that loads the EAD finding aid in the course repo (/data/xml/day_20221004_205435_UTC__ead.xml). Parse the tree and extract the archdesc element. What are the subelements? This builds on the assignment we used in class (archDesc = root.find('archdesc')); you can then develop a loop like for element in archdesc to explore further. (See the section in class exploring the control element.)

In [1]:
import xml.etree.ElementTree as ET

In [2]:
ead_file = "day_20221004_205435_UTC__ead.xml"

tree = ET.parse(ead_file)
root = tree.getroot()

archdesc = root.find("{http://ead3.archivists.org/schema/}archdesc")
for child in archdesc:
    print(child.tag)

{http://ead3.archivists.org/schema/}did
{http://ead3.archivists.org/schema/}scopecontent
{http://ead3.archivists.org/schema/}bioghist
{http://ead3.archivists.org/schema/}accessrestrict
{http://ead3.archivists.org/schema/}userestrict
{http://ead3.archivists.org/schema/}prefercite
{http://ead3.archivists.org/schema/}controlaccess
{http://ead3.archivists.org/schema/}dsc
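
The tags above are printed in {uri}local form because the file declares a default namespace. A minimal sketch for listing only the local names follows. The inline sample stands in for the finding aid, and local_name is a hypothetical helper, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the finding aid, reduced to a few elements.
sample = """<ead xmlns="http://ead3.archivists.org/schema/">
  <control/>
  <archdesc><did/><scopecontent/><dsc/></archdesc>
</ead>"""

def local_name(tag):
    """Return the part of a {uri}local tag after the closing brace."""
    return tag.rpartition("}")[2]

sample_root = ET.fromstring(sample)
sample_archdesc = sample_root.find("{http://ead3.archivists.org/schema/}archdesc")
local_names = [local_name(child.tag) for child in sample_archdesc]
print(local_names)
```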


### 4. How do you work with prefixed namespaces in the ET module? How do you assign prefixes for use within path addresses? How do you assign namespaces for writing out a valid XML with namespace declarations and prefixes?

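ElementTree's search methods (find, findall, findtext, iterfind) accept an optional namespaces argument: a dictionary mapping prefixes to namespace URIs. A path step written as prefix:tag is expanded to {uri}tag before matching, so the prefixes belong to the dictionary and need not match the ones used in the file. For writing, ET.register_namespace(prefix, uri) registers a prefix globally; the serializer then emits the xmlns:prefix declaration and prefixed tag names instead of auto-generated prefixes such as ns0.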

In [3]:
ns = {
    "ead": "http://ead3.archivists.org/schema/"
}

In [4]:
control = root.find("ead:control", ns)
print(control)

<Element '{http://ead3.archivists.org/schema/}control' at 0x70154cf15300>
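
Both directions can be seen in one short sketch: reading with a prefix mapping and writing with a registered prefix. The miniature document is hypothetical and only borrows the EAD3 namespace URI.

```python
import xml.etree.ElementTree as ET

EAD_URI = "http://ead3.archivists.org/schema/"
prefixes = {"ead": EAD_URI}

# Hypothetical miniature document using the EAD3 namespace as its default.
mini = ET.fromstring(
    f'<ead xmlns="{EAD_URI}"><control/><archdesc><did/></archdesc></ead>'
)

# Reading: each prefix in the path is expanded to {uri}tag via the mapping.
did = mini.find("ead:archdesc/ead:did", prefixes)
did_tag = did.tag

# Writing: register_namespace sets a global prefix that the serializer uses.
ET.register_namespace("ead", EAD_URI)
serialized = ET.tostring(mini, encoding="unicode")
print(did_tag)
print(serialized)
```

Without the register_namespace call, the serializer would still produce well-formed XML, but with a generated prefix such as ns0.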


### 5. Write Python code that will encode the following Dublin Core fields in valid XML. The root tag should be metadata, and it should output appropriately namespaced fields (i.e., using dcterms:).

In [5]:
ns["dcterms"] = "http://purl.org/dc/terms/"

# Note: Element and SubElement take no namespaces mapping like find() does,
#       so each tag is written in {uri}local form with a formatted string.
#       Each value is stored as the element's text content.
metadata = ET.Element(f"{{{ns['dcterms']}}}metadata")
ET.SubElement(metadata, f"{{{ns['dcterms']}}}title").text = "Oldsmobiles Crossing the Mackinac Bridge"
ET.SubElement(metadata, f"{{{ns['dcterms']}}}identifier").text = "2017-03-001.007.052"
ET.SubElement(metadata, f"{{{ns['dcterms']}}}source").text = "https://cadl.catalogaccess.com/archives/11662"
ET.SubElement(metadata, f"{{{ns['dcterms']}}}provenance").text = "https://www.cadl.org/"
ET.SubElement(metadata, f"{{{ns['dcterms']}}}provenanceStatement").text = "Original shared by the Capital Area District Libraries (CADL)"
ET.SubElement(metadata, f"{{{ns['dcterms']}}}creator").text = "Oldsmobile History Center"
ET.SubElement(metadata, f"{{{ns['dcterms']}}}creator").text = "https://cadl.catalogaccess.com/people/1320"
ET.SubElement(metadata, f"{{{ns['dcterms']}}}created").text = "Unknown"
ET.SubElement(metadata, f"{{{ns['dcterms']}}}date").text = "1960s/1970s"

# Register the prefix so the output uses dcterms: rather than a generated ns0:.
ET.register_namespace("dcterms", ns["dcterms"])
ET.ElementTree(metadata).write("metadata.xml", xml_declaration=True, encoding="utf-8", method="xml")
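
A quick way to check output of this kind is to serialize a small record to a string and parse it back. The sketch below is a reduced, hypothetical variant: it uses an un-namespaced metadata root, two fields stored as element text, and a dc helper that is not part of ElementTree. It assumes the DCMI Terms namespace URI http://purl.org/dc/terms/ (with the trailing slash).

```python
import xml.etree.ElementTree as ET

DCTERMS_URI = "http://purl.org/dc/terms/"  # assumed DCMI Terms namespace URI

def dc(local):
    """Hypothetical helper: build a {uri}local tag in the dcterms namespace."""
    return f"{{{DCTERMS_URI}}}{local}"

ET.register_namespace("dcterms", DCTERMS_URI)

record = ET.Element("metadata")
ET.SubElement(record, dc("title")).text = "Oldsmobiles Crossing the Mackinac Bridge"
ET.SubElement(record, dc("identifier")).text = "2017-03-001.007.052"

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)

# Round trip: the serialized string should parse and expose the same values.
reparsed = ET.fromstring(xml_text)
title = reparsed.findtext("dcterms:title", namespaces={"dcterms": DCTERMS_URI})
print(title)
```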