# Extract speaker information and insert it into an event file

Initializing the resource tree with the metadata of DGD revision 1233 still produces errors.
This notebook collects test scripts that exercise the relevant functions on a minimal set of example metadata resources from the PF corpus.

In [14]:
from batchxslt import cmdiresource

Check the documentation of the relevant class:

In [17]:
cmdiresource.ResourceTreeCollection?

In [22]:
resource = cmdiresource.ResourceTreeCollection(
    '../testing/corpus/',
    '../testing/event/',
    '../testing/speakers/',
    '/home/kuhn/IDS/Repos/svn/dgd2_data_rev1233/dgd2_data/transcripts/')


events for CMDI_PF--_S_00001: 
('PF', 'CMDI_PF--_S_00001')


In [21]:
for edge in resource.edges_iter():
    print edge

('CMDI_PF--_E_00001', 'PF--_S_00001')
('CMDI_PF--_E_00001', 'PF--_S_00001')
('CMDI_PF--_E_00001', 'PF--_E_00001_SE_01_A_01_DF_01')
('PF', 'CMDI_PF--_E_00001')
('PF', 'CMDI_PF--_S_00001')
('AGD_root', 'PF')
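
The edge from the event to `PF--_S_00001` is printed twice. Whether this comes from a multigraph or from the same relation being added twice is not visible here. The following is a minimal sketch, using only the standard library and the edge tuples copied from the output above, of how such repeated edges could be spotted; `duplicate_edges` is a hypothetical helper, not part of `batchxslt`.

```python
from collections import Counter

# Edge tuples copied from the notebook output above.
edges = [
    ('CMDI_PF--_E_00001', 'PF--_S_00001'),
    ('CMDI_PF--_E_00001', 'PF--_S_00001'),
    ('CMDI_PF--_E_00001', 'PF--_E_00001_SE_01_A_01_DF_01'),
    ('PF', 'CMDI_PF--_E_00001'),
    ('PF', 'CMDI_PF--_S_00001'),
    ('AGD_root', 'PF'),
]


def duplicate_edges(edge_list):
    """Return a dict mapping each repeated edge to its number of occurrences."""
    counts = Counter(edge_list)
    return {edge: n for edge, n in counts.items() if n > 1}


print(duplicate_edges(edges))
```

In the real notebook the same check could presumably be run on `list(resource.edges_iter())`.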


In [18]:
for node in resource.nodes_iter():
    if resource.node.get(node).get('type') != 'transcript':
        print node

CMDI_PF--_E_00001
CMDI_PF--_S_00001
PF
AGD_root
PF--_E_00001_SE_01_A_01_DF_01
PF--_S_00001


In [19]:
# check 'find_events'
for node in resource.nodes_iter():
    if resource.node.get(node).get('type') != 'transcript':
        resource.find_events(node)

events for CMDI_PF--_E_00001: 
('PF', 'CMDI_PF--_E_00001')
events for CMDI_PF--_S_00001: 
('PF', 'CMDI_PF--_S_00001')
events for PF: 
('AGD_root', 'PF')
events for AGD_root: 
events for PF--_E_00001_SE_01_A_01_DF_01: 
('CMDI_PF--_E_00001', 'PF--_E_00001_SE_01_A_01_DF_01')
events for PF--_S_00001: 
('CMDI_PF--_E_00001', 'PF--_S_00001')
('CMDI_PF--_E_00001', 'PF--_S_00001')
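
Judging from this output, every incoming edge of a node is printed, including the edge from the corpus node `PF`, which is not an event. A caller that only wants events might therefore filter the predecessors by node type. The sketch below illustrates that idea on plain dictionaries; the type labels `'corpus'` and `'event'` are assumptions (only `'speaker'` and `'transcript'` appear in this notebook), and `events_of` is a hypothetical helper.

```python
# Assumed node types; 'corpus' and 'event' are illustrative labels.
node_types = {
    'PF': 'corpus',
    'CMDI_PF--_E_00001': 'event',
    'CMDI_PF--_S_00001': 'speaker',
}

# Incoming edges per node, copied from the output above.
in_edges = {
    'CMDI_PF--_S_00001': [('PF', 'CMDI_PF--_S_00001')],
    'PF--_S_00001': [
        ('CMDI_PF--_E_00001', 'PF--_S_00001'),
        ('CMDI_PF--_E_00001', 'PF--_S_00001'),
    ],
}


def events_of(node):
    """Return the distinct predecessors of `node` whose type is 'event'."""
    found = []
    for source, _target in in_edges.get(node, []):
        if node_types.get(source) == 'event' and source not in found:
            found.append(source)
    return found


print(events_of('CMDI_PF--_S_00001'))
print(events_of('PF--_S_00001'))
```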


In [6]:
# check 'speaker2event'
for node in resource.nodes_iter():
    if resource.node.get(node).get('type') == 'speaker':
        resource.speaker2event(node)
        print resource.node.get(node).get('etreeobject')

events for CMDI_PF--_S_00001: 
('PF', 'CMDI_PF--_S_00001')
<lxml.etree._ElementTree object at 0x7f7b300576c8>


In [7]:
for node in resource.nodes_iter():
    if resource.node.get(node).get('type') == 'speaker':
        print "in edges:"
        print resource.in_edges(node)
        print "out edges:"
        print resource.out_edges(node)

in edges:
[('PF', 'CMDI_PF--_S_00001')]
out edges:
[]
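
The speaker node `CMDI_PF--_S_00001` has only the incoming edge from the corpus, while the event edges listed earlier point to `PF--_S_00001`. One possible explanation, which is a guess and not confirmed by the code, is that the same speaker ends up in the graph under two IDs, with and without the `CMDI_` prefix. A minimal sketch for finding such pairs among the node IDs (`prefixed_twins` is a hypothetical helper):

```python
PREFIX = 'CMDI_'

# Node IDs copied from the node listing earlier in this notebook.
nodes = [
    'CMDI_PF--_E_00001',
    'CMDI_PF--_S_00001',
    'PF',
    'AGD_root',
    'PF--_E_00001_SE_01_A_01_DF_01',
    'PF--_S_00001',
]


def prefixed_twins(node_ids, prefix=PREFIX):
    """Return sorted (prefixed, bare) pairs where both IDs exist as nodes."""
    ids = set(node_ids)
    return sorted(
        (node, node[len(prefix):])
        for node in ids
        if node.startswith(prefix) and node[len(prefix):] in ids
    )


print(prefixed_twins(nodes))
```

If a pair shows up, the edges attached to the bare ID are the ones that never reach the prefixed speaker node.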


**Check what is buggy when initializing the resource tree**

### Are the event sessions a speaker takes part in identified?

In [9]:
resource.find_eventsessions('CMDI_PF--_S_00001')

['PF--_E_00001_SE_01']

### Are the speakers listed in an event identified?

In [10]:
resource.find_speakers('CMDI_PF--_E_00001')

['PF--_S_00001']

### Is relevant speaker metadata retrieved correctly?

In [12]:
resource.get_speaker_data('CMDI_PF--_S_00001')

{'Alias': '<Alias xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">Nicht vorhanden</Alias>\n        ',
 'Components': '<Components xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">\n        <InEvents>\n            <EventSession>PF--_E_00001_SE_01</EventSession>\n        </InEvents>\n        <Name>Nicht dokumentiert</Name>\n        <Alias>Nicht vorhanden</Alias>\n        <TranscriptID>S2</TranscriptID>\n        <Sex xml:lang="deu">Weiblich</Sex>\n        <DateOfBirth>1944-01-01</DateOfBirth>\n        <Education xml:lang="deu">Nicht vorhanden</Education>\n        <Profession xml:lang="deu">Sch&#252;lerin (Mittelschule)</Profession>\n        <Ethnicity xml:lang="deu">Nicht dokumentiert</Ethnicity>\n        <Nationality xml:lang="deu">Nicht dokumentiert</Nationality>\n        <LocationData>\n            <Location>\n                <LocationType xml:lang="deu">Geburtsort</LocationType>\n                <Grid>2319 ; 2419</Grid>\n                <Country xml:lang="deu">Nicht dokumen
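
The returned values are serialized XML strings, so individual fields still have to be parsed out of them. As a rough sketch, the fragment below is shortened by hand to a few elements visible in the output above and read with the standard library's `xml.etree.ElementTree` (the notebook itself uses `lxml`). `summarize_speaker` and the keys of its result are hypothetical, and the sketch assumes the elements carry no default namespace.

```python
import xml.etree.ElementTree as ET

# Hand-shortened fragment; only elements visible in the output above.
FRAGMENT = """<Components>
    <InEvents>
        <EventSession>PF--_E_00001_SE_01</EventSession>
    </InEvents>
    <Name>Nicht dokumentiert</Name>
    <TranscriptID>S2</TranscriptID>
    <Sex xml:lang="deu">Weiblich</Sex>
    <DateOfBirth>1944-01-01</DateOfBirth>
</Components>"""


def summarize_speaker(xml_text):
    """Extract a few speaker fields from a serialized <Components> element."""
    root = ET.fromstring(xml_text)
    return {
        'sessions': [el.text for el in root.findall('./InEvents/EventSession')],
        'transcript_id': root.findtext('TranscriptID'),
        'sex': root.findtext('Sex'),
        'date_of_birth': root.findtext('DateOfBirth'),
    }


summary = summarize_speaker(FRAGMENT)
print(summary['sessions'])
```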

### Show all successor nodes of the example event resource

In [25]:
resource.neighbors('CMDI_PF--_E_00001')

['PF--_S_00001', 'PF--_E_00001_SE_01_A_01_DF_01']