-
Notifications
You must be signed in to change notification settings - Fork 0
/
pr001.Rmd
137 lines (105 loc) · 4.54 KB
/
pr001.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
---
title: "digitale dramenanalyse"
subtitle: "sample workflow to a standard data set from historical sources"
author: "Stephan Schwarz, FUB"
output:
powerpoint_presentation:
#reference_doc: style01.pptx
# reference_doc: pr001_style.pptx
# reference_doc: Design1.thmx
# reference_doc: pr001_mod.pptx #not wks. have to finalise in powerpoint
date: "2023-07-22"
bibliography: klemm.bib
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(readr)
```
## where:you:at
![](figure-html/hison1.jpeg)
## things i learned at school
![](figure-html/hison0.jpeg)
## intro
I will show you in a few steps an example workflow of how to produce standard documents of historical sources that enable further analysis of (here) dramatic text.
assumed we start with a plain text file, a lot of work had been done by others yet and we can proceed to the TEI refactoring of the text.
but with historical sources (wide sense) this will not be the case, so first will be to transcribe some source of the text, usually a .pdf or collection of .jpegs like the following...
## this:
![bodmer: polytimet, excerpt](figure-html/bodmer_polytimet_1760-0007.jpg)
## transcription
for that purpose you can either transcribe the text manually, from picture to text, or you use e.g. *TRANSKRIBUS* [@read-coop_transkribus_2022], a user friendly framework for OCR (optical character recognition). with that half of the work is done by the engine, but you still have to check the automatic transcription for recognition failures depending on the quality and the typography of the source.
## editing
next step if you have the transcript ready would be to upload the text page by page to *WIKISOURCE* [@wikimedia_wikisource_nodate] where it can be proofread by others. after that you can download the proper version of the text from which we proceed to the TEI.
## TEI:why
- FAIR-Prinzipien für Daten: findable, accessible, interoperable, reusable
- TEI: text encoding initiative [@tei_history_1987]
theres multiple ways of how you can get to the TEI text. one is to wrap text elements which need to be marked up with *OXYGEN*, a powerful XML editor.
## TEI:refactoring
in our drama class, we used a python notebook (*ezdrama*, @skorinkin_ezdrama_2023) written by a friendly geek which allowed easily transforming the preprocessed text into the desired format.
## basic auszeichnung:
``` {r}
#sampletxt<-readLines("sample.txt")
#print(sampletxt)
```
```
@title Ham
@subtitle A tragedy
@author William S
^Dramatis Personae
Ham
Egg
Vikings
#Act 1
##Scene 1
@Ham:
Lovely Spam!
@Egg:
Wonderful Spam!
##Scene 2
$Enter Vikings
@Ham:
Egg, Spam, Sausage, and Bacon!
@Vikings (singing):
Spam, Spam, Spam, Spam, Spam, Spam, Spam, and Spam
$The end
```
## finalise
if all that is done you possess a finalized TEI document which allows further analysis of the drama again e.g. using python or R for network, semantic etc. analyses. 1 platform hosting e.g. the drama plays in question is *dracor.org* [@fischer_programmable_2019], from where you can access the TEI versions, network and other data.
## TEI excerpt
```
<div type="act">
<head>Zweyter Auftritt.</head>
<stage>Aristodem, Polytimet.</stage>
<sp who="#aristodem">
<speaker>Aristodem.</speaker>
<p>Ich komme, den König Polemon bey dir zu melden; er wird gleich bey dir seyn.</p>
</sp>
<sp who="#polytimet">
<speaker>Polytimet.</speaker>
<p>Was ist das für eine Herablassung, daß er zu seinem Gefangenen kommen will? Ich halte es für Edelmuth, und daß er mir so das Unglük, ein Gefangener zu seyn, versüssen wolle. <pb no="7"/>;</p>
</sp>
<sp who="#aristodem">
<speaker>Aristodem.</speaker>
<p>Prinz, deine Dapferkeit, die bey dieser Jugend so gesezt, so bedachtsam ist, verdienet dir die Freundschaft des Helden, wiewol das Glük dir diesmal gefehlt hat.</p>
</sp>
```
## output1
![](figure-html/typetoken.png)
## output2
![@noauthor_gephi_nodate](figure-html/klemm_gephi_vis_002.png)
## e.g. LX basics
```{r echo=F,warning=F}
posdf<-read.csv("lessing-galotti_PoStagged.csv")
library(knitr)
kable(head(posdf))
```
## further processing
the standardised text opens more options
- building a corpus of the text/texts in question: *ANNIS* [@noauthor_annis_nodate]
- annotating/coding the text (corpus)
- textwitness comparison, *LERA* [@pockelmann_lera_2022]
## sources comparison
![LERA output](figure-html/klemm_lera.png)
## todo
![ספר תגי](figure-html/Seferhatagin-0006.jpeg)
## 19
![](figure-html/hison2.jpeg)