### Exporting from Transkribus
When exporting from Transkribus, it is currently important to export TEI XML using the Client Export, with zones for text regions and lines, with line breaks encoded as < l >...< /l >, and with only 'structure' selected as predefined tags.

### Initializing file

In [None]:
import xml.etree.ElementTree as ET

In [60]:
#Choose file to convert
file = 'brochmand_sommer.xml'

In [61]:
#Initialize file as root
tree = ET.parse(file)
root = tree.getroot()

### Structure of TEI XML export from Transkribus
A TEI file exported from Transkribus consists of an initial header, followed by a description of every facsimile page with zones for text regions, lines, and words (depending on the export settings), and finally the text. The text contains a single < body > element in which < pb > marks a change of facsimile page, < p > marks a text region, and < lg > groups the lines < l >. Every < p > and < l > has a 'facs' attribute that links back to the corresponding zone in the facsimile elements.
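To make the layout concrete, here is a minimal sketch that parses a hand-written miniature of such a file and follows one 'facs' link back to its zone. The sample string, the TEI namespace declaration, the `rendition` values, and the names `SAMPLE`, `local_name`, and `zones_by_id` are illustrative assumptions, not actual Transkribus output.

```python
import xml.etree.ElementTree as ET

# Hand-written miniature of the layout described above (assumed, not a real export).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader/>
  <facsimile xml:id="facs_1">
    <surface>
      <zone xml:id="facs_1_r1" rendition="TextRegion" subtype="paragraph">
        <zone xml:id="facs_1_r1l1" rendition="Line"/>
      </zone>
    </surface>
  </facsimile>
  <text>
    <body>
      <pb facs="#facs_1" n="1"/>
      <p facs="#facs_1_r1">
        <lg>
          <l facs="#facs_1_r1l1">First line</l>
        </lg>
      </p>
    </body>
  </text>
</TEI>"""

# ElementTree stores xml:id under the expanded XML namespace URI.
XML_ID = '{http://www.w3.org/XML/1998/namespace}id'

def local_name(element):
    """Return the tag name without its '{namespace}' prefix."""
    return element.tag.rsplit('}', 1)[-1]

sample_root = ET.fromstring(SAMPLE)

# Header first, then one <facsimile> per page, then <text> last.
top_level = [local_name(child) for child in sample_root]

# Index every zone (text regions and nested lines) by its xml:id.
zones_by_id = {zone.get(XML_ID): zone
               for zone in sample_root.iter()
               if local_name(zone) == 'zone'}

# Follow the 'facs' attribute of each <p> back to its zone.
body = sample_root[-1][0]
links = {}
for element in body:
    if local_name(element) == 'p':
        zone = zones_by_id[element.get('facs')[1:]]  # drop the leading '#'
        links[element.get('facs')] = zone.get('subtype')

print(top_level)
print(links)
```

Note that the line zone sits inside the text region zone, which is why iterating over lines in the facsimile requires one more loop level than iterating over text regions.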

The example below uses subtype='paragraph' as a case. For additional subtypes, add the appropriate lists and extend the condition with elif. Remember that you need to go down one extra level when iterating over lines instead of text regions.

In [65]:
#Creates a list with xml:id for zones with the desired attribute.
list_paragraph = []
for facsimile in root[1:-1]: #Excludes the header (first child) and the text element (last child)
    for surface in facsimile:
        for zone in surface:
            if zone.get('subtype') == 'paragraph':
                list_paragraph.append(zone.get('{http://www.w3.org/XML/1998/namespace}id')) #ElementTree expands the xml: prefix to the XML namespace URI, so 'xml:id' cannot be used directly

In [63]:
#Generate a string containing the desired element.
text = ''
for element in root[-1][0]:
    if element.get('facs', '')[1:] in list_paragraph: #Identify <p> elements with the desired subtype; [1:] strips the leading '#' from 'facs'
        for line in element[0]: #Iterates over the <lg> element (which contains the lines) instead of <p>
            if line.text is not None: #Contingency for cases of empty lines in paragraph text regions
                text += line.text+'\n' #<l> implies a line break, so one is added explicitly for the plain-text output
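
As a sketch of the elif extension mentioned earlier, the cell below collects a second subtype alongside 'paragraph' and builds one string per subtype. The subtype 'heading', the inline sample document, and the names `ids_heading`, `heading_text`, and `paragraph_text` are assumptions chosen for illustration; substitute the structure tags used in your own Transkribus document.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for an export (assumed, not real Transkribus output).
SAMPLE = """<TEI>
  <teiHeader/>
  <facsimile>
    <surface>
      <zone xml:id="r1" subtype="heading"/>
      <zone xml:id="r2" subtype="paragraph"/>
    </surface>
  </facsimile>
  <text>
    <body>
      <p facs="#r1"><lg><l facs="#r1l1">A title</l></lg></p>
      <p facs="#r2"><lg><l facs="#r2l1">First line</l><l facs="#r2l2">Second line</l></lg></p>
    </body>
  </text>
</TEI>"""

XML_ID = '{http://www.w3.org/XML/1998/namespace}id'
demo_root = ET.fromstring(SAMPLE)

# One list of zone ids per subtype of interest.
ids_paragraph, ids_heading = [], []
for facsimile in demo_root[1:-1]:  # skip the header and the text element
    for surface in facsimile:
        for zone in surface:
            if zone.get('subtype') == 'paragraph':
                ids_paragraph.append(zone.get(XML_ID))
            elif zone.get('subtype') == 'heading':
                ids_heading.append(zone.get(XML_ID))

# One output string per subtype, filled from the lines inside each <lg>.
paragraph_text = ''
heading_text = ''
for element in demo_root[-1][0]:
    facs_id = element.get('facs', '')[1:]  # drop the leading '#'
    if facs_id in ids_paragraph:
        for line in element[0]:
            if line.text is not None:
                paragraph_text += line.text + '\n'
    elif facs_id in ids_heading:
        for line in element[0]:
            if line.text is not None:
                heading_text += line.text + '\n'

print(heading_text)
print(paragraph_text)
```

With more than a few subtypes, a dictionary mapping each subtype to its list of ids may be easier to maintain than a long elif chain.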

### Save as file

In [64]:
with open(file[:-4]+'.txt','w',encoding='utf-8') as f:
    f.write(text) #text is already a single string, so it can be written in one call