2019-Aug-06, Tuesday

Notes from<br>
https://www.w3schools.com/xml/default.asp
https://docs.python.org/3.6/library/xml.etree.elementtree.html

- XML = eXtensible Markup Language

- XML was designed to store and transport data.

- Both human- and machine-readable.

- Stores data in plain text format, which is independent of software and hardware.

- XML is a hierarchical data format, and the most natural way to represent it is with a tree. 

- XML docs form a tree structure, which starts at the "root" and branches out to the "leaves".

Example of XML tree structure:
<img src="nodetree.gif">
From: https://www.w3schools.com/xml/xml_tree.asp

```xml
<root>
    <child>
        <subchild> ... </subchild>
    </child>
</root>
```

Example:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
    <book category="cooking">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
    </book>
    <book category="children">
        <title lang="en">Harry Potter</title>
        <author>J K Rowling</author>
        <year>2005</year>
        <price>29.99</price>
    </book>
    <book category="web">
        <title lang="en">Learning XML</title>
        <author>Erik T Ray</author>
        <year>2003</year>
        <price>39.95</price>
    </book>
</bookstore>
```
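Since these notes also follow Python's `xml.etree.ElementTree` docs, here is a minimal sketch (assuming Python 3) that parses the bookstore document above and walks its tree. The XML is passed as `bytes` so the parser reads the encoding declared in the prolog.

```python
import xml.etree.ElementTree as ET

# The bookstore document from above (bytes, so the declared encoding is used).
BOOKSTORE = b"""<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
    <book category="cooking">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
    </book>
    <book category="children">
        <title lang="en">Harry Potter</title>
        <author>J K Rowling</author>
        <year>2005</year>
        <price>29.99</price>
    </book>
    <book category="web">
        <title lang="en">Learning XML</title>
        <author>Erik T Ray</author>
        <year>2003</year>
        <price>39.95</price>
    </book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE)  # returns the root Element, <bookstore>
print(root.tag)  # bookstore

for book in root.findall("book"):           # direct <book> children of the root
    title = book.find("title").text          # text content of the <title> child
    price = float(book.find("price").text)   # text is always str; convert explicitly
    print(book.get("category"), title, price)
```

Note that all element text comes back as strings, so numeric fields such as `<price>` have to be converted by the reading code.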

- XML docs are formed as element trees, as shown above.

- An XML tree starts at a root element, and branches to child elements.

- All elements can have sub-elements. I.e. a child element can have a subchild element, which can have a subsubchild element, and so on...

- The terms _parent_, _child_, and _sibling_ are used to describe the relationships between elements. Siblings are children on the same level.

- All elements can have text content and attributes. E.g. the second ```<title>``` has the text _Harry Potter_ and the attribute _lang="en"_, while its parent ```<book>``` has the attribute _category="children"_.
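A small sketch of these relationships with Python's ElementTree (assuming Python 3): iterating an element yields its children, which are siblings of one another; `.text` holds the text content and `.attrib` the attributes.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book category="children">'
    '<title lang="en">Harry Potter</title>'
    '<author>J K Rowling</author>'
    '</book>'
)

# The children of <book>; they are siblings of each other.
print([child.tag for child in book])  # ['title', 'author']

title = book[0]         # first child, accessed by position
print(title.text)       # Harry Potter  (text content)
print(title.attrib)     # {'lang': 'en'}  (attributes of <title>)
print(book.attrib)      # {'category': 'children'}  (attributes of <book>)
```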

At the top of an xml file, you'll see something like this:
```xml
<?xml version="1.0" encoding="UTF-8"?>
```
This defines the XML version and character encoding. It is called the XML declaration and is part of the XML prolog, which is optional but, if present, must come first in the document.
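As a hedged illustration (Python 3, ElementTree), the declaration can be written out when saving a tree. ElementTree emits it with single quotes, which XML accepts just as well as double quotes, and it is not part of the element tree when the document is read back.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("bookstore")
buf = io.BytesIO()

# xml_declaration=True asks ElementTree to write the declaration before the root.
ET.ElementTree(root).write(buf, encoding="UTF-8", xml_declaration=True)
data = buf.getvalue()
print(data.decode("utf-8"))

# Parsing it again: the declaration is consumed, the root element is what remains.
print(ET.fromstring(data).tag)  # bookstore
```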

- In the above example, the root element is ```<bookstore>```.

- Next, there are 3 ```<book>``` elements, each of which has the following children: ```<title>```, ```<author>```, ```<year>```, ```<price>```.

- Each ```<element>``` must end with a matching ```</element>``` (called a closing tag). This is very important! (The prolog is not an element, so it has no closing tag.)

- XML tags are case sensitive.

- All elements must be properly nested within each other.

- Attribute values must always be quoted, with either single or double quotes.
    - If the attribute value itself contains double quotes, do either of the following:
        - ```<gangster name='George "Shotgun" Ziegler'>```
        - ```<gangster name='George &quot;Shotgun&quot; Ziegler'>```

- Attributes cannot contain multiple values, but elements can.

- Attributes cannot contain tree structures, but elements can.
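The rules above can be checked with a parser. This is a minimal sketch assuming Python 3's ElementTree, where a document that breaks a rule raises `ET.ParseError`; `is_well_formed` is a hypothetical helper written for this note, not part of the library.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Hypothetical helper: True if ElementTree parses `text`, False on ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<note><to>Tove</to></note>"))    # True
print(is_well_formed("<note><to>Tove</note>"))         # False: <to> is never closed
print(is_well_formed("<Message>Hi</message>"))         # False: tags are case sensitive
print(is_well_formed("<b><i>text</b></i>"))            # False: improper nesting
print(is_well_formed("<note date=12/11/2007></note>")) # False: unquoted attribute value
print(is_well_formed('<note to="Tove" to="Jani"/>'))   # False: an attribute cannot repeat

# Both ways of putting double quotes inside an attribute value give the same value.
a = ET.fromstring("""<gangster name='George "Shotgun" Ziegler'/>""")
b = ET.fromstring("<gangster name='George &quot;Shotgun&quot; Ziegler'/>")
print(a.get("name"))                    # George "Shotgun" Ziegler
print(a.get("name") == b.get("name"))   # True
```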

- Some characters have a special meaning in XML, e.g. "<".<br>
    - The following will generate an error:
    ```<message>salary < 1000</message>```
    - Use the following instead:
    ```<message>salary &lt; 1000</message>```
    - There are 5 predefined entity references in XML:<br>
        - < (less than) = ```&lt;```
        - \> (greater than) = ```&gt;```
        - & (ampersand) = ```&amp;```
        - ' (apostrophe) = ```&apos;```
        - " (quotation mark) = ```&quot;```
    - Only < and & are strictly illegal in element text, but escaping > as well is a good habit.
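A short sketch of escaping in practice (Python 3 standard library). `xml.sax.saxutils.escape` replaces `&`, `<` and `>` by default and accepts a dict of extra replacements for quotes; ElementTree escapes text automatically when serializing and decodes entity references when parsing.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# A raw "<" inside text is rejected by the parser...
raw_rejected = False
try:
    ET.fromstring("<message>salary < 1000</message>")
except ET.ParseError as err:
    raw_rejected = True
    print("ParseError:", err)

# ...while the entity reference is decoded back to "<" on parsing.
msg = ET.fromstring("<message>salary &lt; 1000</message>")
print(msg.text)  # salary < 1000

# escape() handles &, < and >; quotes only if you pass them in explicitly.
print(escape("salary < 1000 & bonus > 0"))                # salary &lt; 1000 &amp; bonus &gt; 0
print(escape('say "hi"', {'"': "&quot;", "'": "&apos;"}))  # say &quot;hi&quot;

# ElementTree escapes text for you when writing XML.
el = ET.Element("message")
el.text = "salary < 1000"
print(ET.tostring(el, encoding="unicode"))  # <message>salary &lt; 1000</message>
```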

- Comments in XML:
```<!--This is a comment-->```<br>
Note that two dashes in the middle of a comment are not allowed, so the following is NOT ALLOWED:
```<!--This is an invalid -- comment-->```<br>
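A quick check with Python 3's ElementTree: a valid comment parses (the default parser drops comments from the tree), while the comment containing `--` is rejected as malformed.

```python
import xml.etree.ElementTree as ET

# A valid comment is accepted; ElementTree's default parser discards it.
note = ET.fromstring("<note><!--This is a comment--><to>Tove</to></note>")
print([child.tag for child in note])  # ['to']

# Two dashes inside the comment make the document malformed.
invalid_rejected = False
try:
    ET.fromstring("<note><!--This is an invalid -- comment--></note>")
except ET.ParseError:
    invalid_rejected = True
print(invalid_rejected)  # True
```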

- Whitespace is preserved in XML (unlike HTML, which collapses multiple consecutive whitespace characters into one).

- An element with no content is said to be empty:
```<element></element>```

- You can also use a self-closing tag:
```<element />```
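Both points can be seen with Python 3's ElementTree: text keeps its runs of spaces, and the two spellings of an empty element parse to the same thing (no children, `text` of `None`). When writing, ElementTree uses the self-closing form by default.

```python
import xml.etree.ElementTree as ET

# Whitespace inside text content is kept exactly as written.
SPACED = "Hello           Tove"
p = ET.fromstring("<p>" + SPACED + "</p>")
print(repr(p.text))  # 'Hello           Tove'

# An explicitly empty element and a self-closing tag give the same result.
a = ET.fromstring("<element></element>")
b = ET.fromstring("<element />")
print(a.tag == b.tag, a.text, b.text, len(a), len(b))  # True None None 0 0

# On output, empty elements are written in self-closing form by default.
print(ET.tostring(a, encoding="unicode"))  # <element />
```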

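- Element names cannot contain spaces, must start with a letter or an underscore, and cannot start with the letters xml (in any combination of case, e.g. XML or Xml). Apart from that, any name can be used.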

- XML elements are extensible, i.e. they can be extended to carry more information. If you have written code that extracts some data from an XML file, that code will still work after more elements are added to the file. In other words, an XML document can be extended without breaking the applications that read it.
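A minimal sketch of this idea, assuming Python 3's ElementTree: a hypothetical `get_title` reader looks up `<title>` by name, so it returns the same result after new `<date>` and `<body>` elements are added to the document.

```python
import xml.etree.ElementTree as ET

def get_title(xml_text):
    # Hypothetical reader: finds <title> by name, ignoring any other elements.
    return ET.fromstring(xml_text).findtext("title")

v1 = "<note><to>Tove</to><title>Reminder</title></note>"
# Same document, extended with two extra elements.
v2 = ("<note><date>2008-01-10</date><to>Tove</to>"
      "<title>Reminder</title><body>Don't forget me!</body></note>")

print(get_title(v1), get_title(v2))  # Reminder Reminder
```

Code that instead relied on element positions (e.g. "the second child") would break here, so the extensibility benefit depends on looking elements up by name.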

Name conflicts:

- If XML documents from different sources that use the same element names are mixed together, we get a name conflict.

- Consider the following XML fragment, which carries HTML table information, i.e. it uses HTML's table elements (```<table>```, ```<tr>``` for a row, ```<td>``` for a cell) to describe a table of data:
```xml
<table>
    <tr>
        <td>Apples</td>
        <td>Bananas</td>
    </tr>
</table>
```

- Now consider the following XML which has info about a table (i.e. furniture):
```xml
<table>
    <name>African Coffee Table</name>
    <width>80</width>
    <length>80</length>
</table>
```

- If these XML fragments were added together, there would be a name conflict, since both contain a ```<table>``` element, even though the two elements have different content and meaning.

- How to solve name conflict? Use a prefix!

- The first XML fragment can be written as:
```xml
<h:table>
    <h:tr>
        <h:td>Apples</h:td>
        <h:td>Bananas</h:td>
    </h:tr>
</h:table>
```

- And the second XML fragment can be written as:
```xml
<f:table>
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>80</f:length>
</f:table>
```

- So, for practical purposes, mixing these two XML fragments no longer causes a conflict, because the two table elements now have different names: ```<h:table>``` and ```<f:table>```.
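Prefixes alone are not quite enough for a namespace-aware parser. As a hedged illustration, Python 3's ElementTree (which parses with namespace processing enabled) rejects a prefixed fragment whose prefix was never declared, reporting an unbound prefix.

```python
import xml.etree.ElementTree as ET

fragment = "<h:table><h:tr><h:td>Apples</h:td></h:tr></h:table>"
try:
    ET.fromstring(fragment)
    accepted = True
except ET.ParseError as err:
    accepted = False
    print("ParseError:", err)  # the "h" prefix is not bound to any namespace
print(accepted)  # False
```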

XML Namespaces - The xmlns attribute

- When using prefixes in XML, a __namespace__ for each prefix must be defined.

- The namespace can be defined by an __xmlns__ attribute in the start tag of an element.

- Syntax: ```xmlns:prefix="URI"```

- Example:

```xml
<root>

<h:table xmlns:h="https://www.w3.org/TR/html4/">
    <h:tr>
        <h:td>Apples</h:td>
        <h:td>Bananas</h:td>
    </h:tr>
</h:table>

<f:table xmlns:f="https://www.w3schools.com/furniture">
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>80</f:length>
</f:table>

</root>
```
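A sketch of how Python 3's ElementTree handles this document: each prefix is replaced by its namespace URI, giving tags of the form `{URI}localname`. A prefix-to-URI dict can be passed to `find()`/`findall()`; the prefixes `html` and `furn` below are arbitrary names chosen by the reading code, not taken from the document.

```python
import xml.etree.ElementTree as ET

DOC = """<root>
<h:table xmlns:h="https://www.w3.org/TR/html4/">
    <h:tr>
        <h:td>Apples</h:td>
        <h:td>Bananas</h:td>
    </h:tr>
</h:table>
<f:table xmlns:f="https://www.w3schools.com/furniture">
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>80</f:length>
</f:table>
</root>"""

root = ET.fromstring(DOC)

# Prefixes are resolved to "{namespace URI}local-name".
for table in root:
    print(table.tag)
# {https://www.w3.org/TR/html4/}table
# {https://www.w3schools.com/furniture}table

# Reader-chosen prefixes mapped to the same URIs.
ns = {"html": "https://www.w3.org/TR/html4/",
      "furn": "https://www.w3schools.com/furniture"}
print([td.text for td in root.findall("html:table/html:tr/html:td", ns)])  # ['Apples', 'Bananas']
print(root.find("furn:table/furn:name", ns).text)  # African Coffee Table
```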

- Above, the xmlns attribute in the first ```<table>``` element gives the ```h:``` prefix a qualified namespace.

- Similarly, the xmlns attribute in the second ```<table>``` element gives the ```f:``` prefix a qualified namespace.

- When a namespace is defined for an element, all child elements with the same prefix are associated with the same namespace.

- Namespaces can also be declared in the XML root element, like in the following example:

```xml
<root xmlns:h="https://www.w3.org/TR/html4/"
      xmlns:f="https://www.w3schools.com/furniture">

<h:table>
    <h:tr>
        <h:td>Apples</h:td>
        <h:td>Bananas</h:td>
    </h:tr>
</h:table>

<f:table>
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>80</f:length>
</f:table>

</root>
```
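To check that declaring the namespaces on the root is equivalent to declaring them on each table, this sketch (Python 3, ElementTree) parses a shortened version of both forms and compares every resolved tag in document order.

```python
import xml.etree.ElementTree as ET

H = "https://www.w3.org/TR/html4/"
F = "https://www.w3schools.com/furniture"

# Namespaces declared on the elements that use them...
local = ET.fromstring(
    '<root>'
    '<h:table xmlns:h="' + H + '"><h:tr><h:td>Apples</h:td></h:tr></h:table>'
    '<f:table xmlns:f="' + F + '"><f:name>African Coffee Table</f:name></f:table>'
    '</root>'
)
# ...and the same namespaces declared once on the root element.
on_root = ET.fromstring(
    '<root xmlns:h="' + H + '" xmlns:f="' + F + '">'
    '<h:table><h:tr><h:td>Apples</h:td></h:tr></h:table>'
    '<f:table><f:name>African Coffee Table</f:name></f:table>'
    '</root>'
)

def tags(elem):
    # Every tag in document order, with prefixes already resolved to {URI}.
    return [e.tag for e in elem.iter()]

print(tags(local) == tags(on_root))  # True
print(tags(on_root)[1])              # {https://www.w3.org/TR/html4/}table
```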

- The purpose of using a URI is to give the namespace a unique name. The URI is not used by the parser to look up information.

- Often, though, the namespace URI is also used as a pointer to a web page containing information about the namespace.
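A small demonstration (Python 3, ElementTree) that the URI, not the prefix, is what names the namespace: the same URI under different prefixes gives identical tags, and the same prefix bound to a different URI gives a different tag. The URI `urn:example:furniture` is a made-up example; it points at nothing, and the parser accepts it anyway because it never fetches namespace URIs.

```python
import xml.etree.ElementTree as ET

# Same URI, different prefixes: the parser sees the same element name.
a = ET.fromstring('<f:table xmlns:f="https://www.w3schools.com/furniture"/>')
b = ET.fromstring('<furn:table xmlns:furn="https://www.w3schools.com/furniture"/>')
print(a.tag == b.tag)  # True

# Same prefix, different (hypothetical) URI: a distinct name.
c = ET.fromstring('<f:table xmlns:f="urn:example:furniture"/>')
print(a.tag == c.tag)  # False
print(c.tag)           # {urn:example:furniture}table
```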