# Schema: Document Type Definition (DTD)

The Document Type Definition (DTD) lets you specify the structure of XML documents and thereby validate them. A DTD serves as a schema for XML documents: several parties can agree on a shared vocabulary, which enables or improves interoperability between the systems they build. In this exercise we take a closer, hands-on look at DTDs. Run the following code block first, then work through the cells in order. Answer the questions where applicable. Finally, write your own DTD and an example XML document for it. Make sure the XML document is well-formed and valid.

In [1]:
import io
from lxml import etree as et

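def isvalid(dtd, doc):
    # Parse the DTD from a string and print whether doc validates against it.
    print(et.DTD(io.StringIO(dtd)).validate(et.fromstring(doc)))

def exp(doc, path):
    # Parse doc and print the result of evaluating the XPath expression path.
    print(et.fromstring(doc).xpath(path))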
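
The `isvalid` helper only prints `True` or `False`. When answering the "why" questions below, it can help to see the validator's own messages. The following is a minimal sketch, assuming the `error_log` attribute that lxml exposes on `DTD` objects; the function name `explain` is made up for this illustration and is not part of the exercise.

```python
import io
from lxml import etree as et

def explain(dtd_text, doc_text):
    """Validate doc_text against dtd_text and return (is_valid, messages)."""
    dtd = et.DTD(io.StringIO(dtd_text))
    is_valid = dtd.validate(et.fromstring(doc_text))
    # error_log holds one entry per problem reported during validation.
    messages = [entry.message for entry in dtd.error_log.filter_from_errors()]
    return is_valid, messages

# discography is declared EMPTY but contains a child element here.
ok, messages = explain('<!ELEMENT discography EMPTY>',
                       '<discography><albums/></discography>')
print(ok)
for message in messages:
    print(message)
```

The exact wording of the messages comes from the underlying libxml2 library and may differ between versions.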

## Elements

In [2]:
isvalid('<!ELEMENT discography EMPTY>', '<discography/>')

True


In [4]:
dtd = """
<!ELEMENT discography (albums)>
<!ELEMENT albums EMPTY>
"""

isvalid(dtd, """
<discography/>
""")

isvalid(dtd, """
<discography>
  <albums/>
</discography>
""")

# Why is this XML document not valid? Answer: the content model (albums) allows exactly one <albums> element, but two are given.
isvalid(dtd, """
<discography>
  <albums/>
  <albums/>
</discography>
""")

False
True
False


In [5]:
dtd = """
<!ELEMENT discography (albums*)>
<!ELEMENT albums EMPTY>
"""

isvalid(dtd, """
<discography/>
""")

isvalid(dtd, """
<discography>
  <albums/>
</discography>
""")

# Why is this XML document valid? Answer: the * allows <albums> to occur any number of times, including zero.
isvalid(dtd, """
<discography>
  <albums/>
  <albums/>
</discography>
""")

True
True
True


In [6]:
dtd = """
<!ELEMENT discography (albums?)>
<!ELEMENT albums EMPTY>
"""

isvalid(dtd, """
<discography/>
""")

isvalid(dtd, """
<discography>
  <albums/>
</discography>
""")

# Why is this XML document not valid? Answer: the ? allows at most one <albums> element (zero or one).
isvalid(dtd, """
<discography>
  <albums/>
  <albums/>
</discography>
""")

True
True
False


In [7]:
dtd = """
<!ELEMENT discography (albums+)>
<!ELEMENT albums EMPTY>
"""

# Why is this XML document not valid? Answer: the + requires at least one <albums> element, but <discography/> has no children.
isvalid(dtd, """
<discography/>
""")

isvalid(dtd, """
<discography>
  <albums/>
</discography>
""")

isvalid(dtd, """
<discography>
  <albums/>
  <albums/>
</discography>
""")

False
True
True
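
The four cells above differ only in the occurrence indicator on `albums`. As a compact recap, the following sketch runs every indicator against documents with zero, one, and two `<albums/>` children. The helper names `is_valid` and `discography` are introduced here for illustration only.

```python
import io
from lxml import etree as et

def is_valid(dtd_text, doc_text):
    """Return True if doc_text validates against dtd_text."""
    return et.DTD(io.StringIO(dtd_text)).validate(et.fromstring(doc_text))

def discography(n):
    """Build a <discography> element with n empty <albums/> children."""
    return "<discography>" + "<albums/>" * n + "</discography>"

results = {}
for indicator in ["", "?", "*", "+"]:
    dtd_text = (f"<!ELEMENT discography (albums{indicator})>\n"
                "<!ELEMENT albums EMPTY>")
    # One validation result each for 0, 1 and 2 <albums/> children.
    results[indicator] = [is_valid(dtd_text, discography(n)) for n in range(3)]
    print(repr(indicator), results[indicator])
```

Reading each printed row as "0, 1, 2 children": no indicator means exactly one, `?` means zero or one, `*` means zero or more, and `+` means one or more.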


In [8]:
dtd = """
<!ELEMENT discography (albums)>
<!ELEMENT albums (album+)>
<!ELEMENT album (title)>
<!ELEMENT title (#PCDATA)>
"""

isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title>The Dark Side of the Moon</title>
    </album>
  </albums>
</discography>
""")

True


In [9]:
dtd = """
<!ELEMENT discography (albums)>
<!ELEMENT albums (album+)>
<!ELEMENT album (title, label)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT label (#PCDATA)>
"""

isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title>The Dark Side of the Moon</title>
      <label>Harvest, EMI</label>
    </album>
  </albums>
</discography>
""")

# Why is this XML document not valid? Answer: the sequence (title, label) fixes the order, but here label comes before title.
isvalid(dtd, """
<discography>
  <albums>
    <album>
      <label>Harvest, EMI</label>
      <title>The Dark Side of the Moon</title>
    </album>
  </albums>
</discography>
""")

True
False


In [12]:
dtd = """
<!ELEMENT discography (albums)>
<!ELEMENT albums (album+)>
<!ELEMENT album (title, label)>
<!ELEMENT title (#PCDATA)>
"""

# Why is this XML document not valid? Answer: the DTD has no declaration for label, e.g. <!ELEMENT label (#PCDATA)>.
isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title>The Dark Side of the Moon</title>
      <label>Harvest, EMI</label>
    </album>
  </albums>
</discography>
""")

False


In [13]:
dtd = """
<!ELEMENT discography (albums)>
<!ELEMENT albums (album+)>
<!ELEMENT album (#PCDATA | title)*>
<!ELEMENT title (#PCDATA)>
"""

isvalid(dtd, """
<discography>
  <albums>
    <album>The Dark Side of the Moon</album>
  </albums>
</discography>
""")

isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title>The Dark Side of the Moon</title>
    </album>
  </albums>
</discography>
""")

True
True
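
The content model `(#PCDATA | title)*` is mixed content: text and `title` elements may appear in any order and any number of times. The sketch below contrasts it with the element-only model `(title)`, under which character data inside `album` is not allowed. The sample text and the names `mixed_dtd`, `element_only_dtd`, and `doc` are chosen for this illustration.

```python
import io
from lxml import etree as et

def is_valid(dtd_text, doc_text):
    """Return True if doc_text validates against dtd_text."""
    return et.DTD(io.StringIO(dtd_text)).validate(et.fromstring(doc_text))

common = """
<!ELEMENT discography (albums)>
<!ELEMENT albums (album+)>
<!ELEMENT title (#PCDATA)>
"""
mixed_dtd = common + "<!ELEMENT album (#PCDATA | title)*>"
element_only_dtd = common + "<!ELEMENT album (title)>"

# Text and a <title> element interleaved inside <album>.
doc = """
<discography>
  <albums>
    <album>Pink Floyd: <title>The Wall</title> (1979)</album>
  </albums>
</discography>
"""

mixed_ok = is_valid(mixed_dtd, doc)
element_only_ok = is_valid(element_only_dtd, doc)
print(mixed_ok, element_only_ok)
```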


In [14]:
dtd = """
<!ELEMENT discography (albums)>
<!ELEMENT albums (album+)>
<!ELEMENT album (title | label)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT label (#PCDATA)>
"""

isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title>The Dark Side of the Moon</title>
    </album>
  </albums>
</discography>
""")

isvalid(dtd, """
<discography>
  <albums>
    <album>
      <label>Harvest, EMI</label>
    </album>
  </albums>
</discography>
""")

# Why is this XML document not valid? Answer: | is a choice; album must contain exactly one of title or label, not both and not neither.
isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title>The Dark Side of the Moon</title>
      <label>Harvest, EMI</label>
    </album>
  </albums>
</discography>
""")

True
True
False


In [15]:
isvalid("""
<!ELEMENT discography (albums)>
<!ELEMENT albums (album*)>
<!ELEMENT album (title, label, released)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT label (#PCDATA)>
<!ELEMENT released (day, month, year)>
<!ELEMENT day (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT year (#PCDATA)>
""", """
<discography>
  <albums>
    <album>
      <title>The Dark Side of the Moon</title>
      <label>Harvest, EMI</label>
      <released>
        <day>16</day>
        <month>03</month>
        <year>1973</year>
      </released>
    </album>
  </albums>
</discography>
""")

True


In [16]:
dtd = """
<!ELEMENT discography (albums)>
<!ELEMENT albums (album*)>
<!ELEMENT album (title, label, released?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT label (#PCDATA)>
<!ELEMENT released ((day, month)?, year)>
<!ELEMENT day (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT year (#PCDATA)>
"""

isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title>The Dark Side of the Moon</title>
      <label>Harvest, EMI</label>
      <released>
        <day>16</day>
        <month>03</month>
        <year>1973</year>
      </released>
    </album>
  </albums>
</discography>
""")

isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title>The Dark Side of the Moon</title>
      <label>Harvest, EMI</label>
    </album>
  </albums>
</discography>
""")


isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title>The Dark Side of the Moon</title>
      <label>Harvest, EMI</label>
      <released>
        <year>1973</year>
      </released>
    </album>
  </albums>
</discography>
""")

isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title>The Dark Side of the Moon</title>
      <label>Harvest, EMI</label>
    </album>
    <album>
      <title>The Wall</title>
      <label>Harvest, EMI</label>
      <released>
        <year>1979</year> 
      </released>
    </album>
  </albums>
</discography>
""")

True
True
True
True
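
The model `((day, month)?, year)` groups `day` and `month` so that they are optional only as a pair. The sketch below probes that with a few variants. As a simplification for this illustration, `released` is used directly as the root element, and the dictionary keys are just labels.

```python
import io
from lxml import etree as et

dtd_text = """
<!ELEMENT released ((day, month)?, year)>
<!ELEMENT day (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT year (#PCDATA)>
"""

cases = {
    "day, month, year": "<released><day>16</day><month>03</month><year>1973</year></released>",
    "year only": "<released><year>1973</year></released>",
    "day, year": "<released><day>16</day><year>1973</year></released>",
    "month, day, year": "<released><month>03</month><day>16</day><year>1973</year></released>",
    "day, month": "<released><day>16</day><month>03</month></released>",
}

dtd = et.DTD(io.StringIO(dtd_text))
# Validate each variant against the same DTD object.
results = {label: dtd.validate(et.fromstring(doc)) for label, doc in cases.items()}
for label, outcome in results.items():
    print(f"{label}: {outcome}")
```

Only the first two variants should validate: a `day` without a `month`, a swapped order, or a missing `year` all violate the content model.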


## Attributes

In [17]:
dtd = """
<!ELEMENT discography (albums)>
<!ELEMENT albums (album*)>
<!ELEMENT album (title)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title released CDATA "1973">
"""

isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title released="1973">The Dark Side of the Moon</title>
    </album>
  </albums>
</discography>
""")

isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title released="1979">The Wall</title>
    </album>
  </albums>
</discography>
""")

True
True


In [20]:
dtd = """
<!ELEMENT discography (albums)>
<!ELEMENT albums (album*)>
<!ELEMENT album (title)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title released CDATA #REQUIRED>
"""

isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title released="1973">The Dark Side of the Moon</title>
    </album>
  </albums>
</discography>
""")

# Why is this XML document not valid? Answer: #REQUIRED means the released attribute must be present.
isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title>The Dark Side of the Moon</title>
    </album>
  </albums>
</discography>
""")

True
False


In [21]:
dtd = """
<!ELEMENT discography (albums)>
<!ELEMENT albums (album*)>
<!ELEMENT album (title)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title released CDATA #IMPLIED>
"""

isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title released="1973">The Dark Side of the Moon</title>
    </album>
  </albums>
</discography>
""")

# Why is this XML document valid? Answer: #IMPLIED makes the released attribute optional.
isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title>The Dark Side of the Moon</title>
    </album>
  </albums>
</discography>
""")

True
True


In [22]:
dtd = """
<!ELEMENT discography (albums)>
<!ELEMENT albums (album*)>
<!ELEMENT album (title)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title released CDATA #FIXED "1973">
"""

isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title released="1973">The Dark Side of the Moon</title>
    </album>
  </albums>
</discography>
""")

# Why is this XML document not valid? Answer: with #FIXED the attribute may be omitted, but if present its value must be 1973.
isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title released="1979">The Wall</title>
    </album>
  </albums>
</discography>
""")

isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title>The Wall</title>
    </album>
  </albums>
</discography>
""")

True
False
True
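
The last four cells vary only the default declaration of the `released` attribute. The following sketch puts them side by side. To keep it short, `title` is used as the root element, which is an assumption of this illustration rather than part of the exercise.

```python
import io
from lxml import etree as et

docs = [
    '<title released="1973">The Dark Side of the Moon</title>',  # fixed/default value
    '<title released="1979">The Wall</title>',                   # a different value
    '<title>The Wall</title>',                                   # attribute omitted
]

results = {}
for default in ['"1973"', '#REQUIRED', '#IMPLIED', '#FIXED "1973"']:
    dtd_text = ("<!ELEMENT title (#PCDATA)>\n"
                f"<!ATTLIST title released CDATA {default}>")
    dtd = et.DTD(io.StringIO(dtd_text))
    # One result per document: value 1973, value 1979, attribute missing.
    results[default] = [dtd.validate(et.fromstring(doc)) for doc in docs]
    print(default, results[default])
```

Only `#REQUIRED` rejects the document without the attribute, and only `#FIXED` rejects a value other than the declared one.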


In [23]:
dtd = """
<!ELEMENT discography (albums)>
<!ELEMENT albums (album*)>
<!ELEMENT album (title)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title released (1973 | 1979) #REQUIRED>
"""

isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title released="1973">The Dark Side of the Moon</title>
    </album>
  </albums>
</discography>
""")

# Why is this XML document valid? Answer: the enumeration allows released to be 1973 or 1979, but not, for example, 1982.
isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title released="1973">The Dark Side of the Moon</title>
    </album>
    <album>
      <title released="1979">The Wall</title>
    </album>
  </albums>
</discography>
""")

isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title released="1982">The Wall</title>
    </album>
  </albums>
</discography>
""")

True
True
False


In [24]:
dtd = """
<!ELEMENT discography (albums)>
<!ELEMENT albums (album*)>
<!ELEMENT album (title)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title identifier ID #REQUIRED>
"""

isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title identifier="p1">The Dark Side of the Moon</title>
    </album>
    <album>
      <title identifier="p2">The Wall</title>
    </album>
  </albums>
</discography>
""")

# Why is this XML document not valid? Answer: ID values must be unique within the document, but p1 is used twice.
isvalid(dtd, """
<discography>
  <albums>
    <album>
      <title identifier="p1">The Dark Side of the Moon</title>
    </album>
    <album>
      <title identifier="p1">The Wall</title>
    </album>
  </albums>
</discography>
""")

True
False
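
Besides being unique, a value of type `ID` must be a valid XML name, which is presumably why the examples use `p1` and `p2` rather than plain numbers. The sketch below checks this with a value that starts with a digit. `title` is used as the root element only to keep the illustration short.

```python
import io
from lxml import etree as et

dtd_text = """
<!ELEMENT title (#PCDATA)>
<!ATTLIST title identifier ID #REQUIRED>
"""
dtd = et.DTD(io.StringIO(dtd_text))

# "p1" is a valid XML name; "1" is not, because names cannot start with a digit.
name_id_ok = dtd.validate(et.fromstring('<title identifier="p1">The Wall</title>'))
numeric_id_ok = dtd.validate(et.fromstring('<title identifier="1">The Wall</title>'))
print(name_id_ok, numeric_id_ok)
```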


In [25]:
dtd = """
<!ELEMENT discography (albums)>
<!ELEMENT albums (album*)>
<!ELEMENT album EMPTY>
<!ATTLIST album title CDATA #REQUIRED>
<!ATTLIST album released CDATA #IMPLIED>
"""

isvalid(dtd, """
<discography>
  <albums>
    <album title="The Dark Side of the Moon" released="1973"/>
  </albums>
</discography>
""")

isvalid(dtd, """
<discography>
  <albums>
    <album title="The Dark Side of the Moon"/>
  </albums>
</discography>
""")

# Why is this XML document not valid? Answer: the title attribute is #REQUIRED but missing.
isvalid(dtd, """
<discography>
  <albums>
    <album released="1973"/>
  </albums>
</discography>
""")

True
True
False


In [26]:
dtd = """
<!ELEMENT discography (albums)>
<!ELEMENT albums (album*)>
<!ELEMENT album EMPTY>
<!ATTLIST album title CDATA #REQUIRED
                released CDATA #IMPLIED>
"""

isvalid(dtd, """
<discography>
  <albums>
    <album title="The Dark Side of the Moon" released="1973"/>
  </albums>
</discography>
""")

True


## Entities

In [27]:
dtd = """
<!ELEMENT discography (albums)>
<!ELEMENT albums (album*)>
<!ELEMENT album (#PCDATA)>
"""

doc = """
<!DOCTYPE discography [
<!ENTITY waters "Roger Waters">
]>
<discography>
  <albums>
    <album>&waters;</album>
  </albums>
</discography>
"""

isvalid(dtd, doc)

# Why does this yield 'Roger Waters'? Answer: the parser replaces the entity reference &waters; with the replacement text declared for the entity waters.
exp(doc, '/discography/albums/album/text()')

True
['Roger Waters']
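
The replacement of `&waters;` happens in the parser, before XPath ever sees the tree. One way to make that visible is to switch entity resolution off. This is a sketch assuming lxml's `resolve_entities` parser option; the variable names are chosen for this illustration.

```python
from lxml import etree as et

doc = """<!DOCTYPE discography [
<!ENTITY waters "Roger Waters">
]>
<discography>
  <albums>
    <album>&waters;</album>
  </albums>
</discography>
"""

# Default parser: the reference is replaced by the entity's replacement text.
expanded = et.fromstring(doc)
expanded_text = expanded.xpath('/discography/albums/album/text()')
print(expanded_text)

# resolve_entities=False keeps the reference as an entity node in the tree.
parser = et.XMLParser(resolve_entities=False)
unexpanded = et.fromstring(doc, parser)
serialized = et.tostring(unexpanded)
print(serialized)
```

With resolution switched off, the serialized document still contains the literal reference `&waters;` instead of the name.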


## Namespaces

In [28]:
dtd = """
<!ELEMENT disc:discography (albs:albums)>
<!ELEMENT albs:albums (albs:album*)>
<!ELEMENT albs:album EMPTY>
<!ATTLIST disc:discography xmlns:disc CDATA #FIXED "http://discography.org">
<!ATTLIST disc:discography xmlns:albs CDATA #FIXED "http://albums.org">
<!ATTLIST albs:album title CDATA #REQUIRED>
<!ATTLIST albs:album released CDATA #REQUIRED>
"""

doc = """
<disc:discography xmlns:disc="http://discography.org" xmlns:albs="http://albums.org">
<albs:albums>
<albs:album title="The Dark Side of the Moon" released="1973"/>
</albs:albums>
</disc:discography>
"""

isvalid(dtd, doc)

True
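
DTDs predate XML namespaces, so the DTD above declares the prefixed names `disc:discography` and `albs:album` literally. A document that binds the same namespace URIs to different prefixes is equivalent from a namespace point of view, but it should no longer validate. The sketch below illustrates this; the prefixes `d` and `a` are arbitrary choices for the example.

```python
import io
from lxml import etree as et

dtd_text = """
<!ELEMENT disc:discography (albs:albums)>
<!ELEMENT albs:albums (albs:album*)>
<!ELEMENT albs:album EMPTY>
<!ATTLIST disc:discography xmlns:disc CDATA #FIXED "http://discography.org">
<!ATTLIST disc:discography xmlns:albs CDATA #FIXED "http://albums.org">
<!ATTLIST albs:album title CDATA #REQUIRED>
<!ATTLIST albs:album released CDATA #REQUIRED>
"""
dtd = et.DTD(io.StringIO(dtd_text))

# Same prefixes as in the DTD.
same_prefix_doc = """
<disc:discography xmlns:disc="http://discography.org" xmlns:albs="http://albums.org">
<albs:albums>
<albs:album title="The Dark Side of the Moon" released="1973"/>
</albs:albums>
</disc:discography>
"""

# Same namespace URIs, but bound to different prefixes.
other_prefix_doc = """
<d:discography xmlns:d="http://discography.org" xmlns:a="http://albums.org">
<a:albums>
<a:album title="The Dark Side of the Moon" released="1973"/>
</a:albums>
</d:discography>
"""

same_prefix_ok = dtd.validate(et.fromstring(same_prefix_doc))
other_prefix_ok = dtd.validate(et.fromstring(other_prefix_doc))
print(same_prefix_ok, other_prefix_ok)
```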


Now come up with an XML document of your own and create a DTD for it.
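
Well-formedness and validity are separate checks: a document that is not well-formed cannot be parsed at all, while a well-formed document may still violate the DTD. The following is one possible sketch for testing both on your own document. The `library`/`book` vocabulary and the function name `check` are invented placeholders, not a model solution.

```python
import io
from lxml import etree as et

def check(dtd_text, doc_text):
    """Return 'not well-formed', 'well-formed but invalid', or 'valid'."""
    try:
        root = et.fromstring(doc_text)
    except et.XMLSyntaxError:
        # The parser rejects documents that are not well-formed.
        return "not well-formed"
    if et.DTD(io.StringIO(dtd_text)).validate(root):
        return "valid"
    return "well-formed but invalid"

dtd_text = """
<!ELEMENT library (book+)>
<!ELEMENT book (title)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST book isbn CDATA #REQUIRED>
"""

# Missing </book> end tag.
print(check(dtd_text, '<library><book isbn="x"><title>T</title></library>'))
# Parses fine, but the required isbn attribute is missing.
print(check(dtd_text, '<library><book><title>T</title></book></library>'))
# Parses and satisfies the DTD.
print(check(dtd_text, '<library><book isbn="x"><title>T</title></book></library>'))
```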