pr001.Rmd

---
title: "digitale dramenanalyse"
subtitle: "sample workflow to a standard data set from historical sources"
author: "Stephan Schwarz, FUB"
output: 
  powerpoint_presentation:
    #reference_doc: style01.pptx
   # reference_doc: pr001_style.pptx
  #  reference_doc: Design1.thmx
   #  reference_doc: pr001_mod.pptx #not wks. have to finalise in powerpoint

date: "2023-07-22"
bibliography: klemm.bib
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(readr)
```

## where:you:at
![](figure-html/hison1.jpeg)


## things i learned at school
![](figure-html/hison0.jpeg)

## intro
I will show you in a few steps an example workflow of how to produce standard documents of historical sources that enable further analysis of (here) dramatic text.     
assumed we start with a plain text file, a lot of work had been done by others yet and we can proceed to the TEI refactoring of the text.   
but with historical sources (wide sense) this will not be the case, so first will be to transcribe some source of the text, usually a .pdf or collection of .jpegs like the following...

## this:
![bodmer: polytimet, excerpt](figure-html/bodmer_polytimet_1760-0007.jpg)

## transcription
for that purpose you can either transcribe the text manually, from picture to text, or you use e.g. *TRANSKRIBUS* [@read-coop_transkribus_2022], a user friendly framework for OCR (optical character recognition). with that half of the work is done by the engine, but you still have to check the automatic transcription for recognition failures depending on the quality and the typography of the source.

## editing
next step if you have the transcript ready would be to upload the text page by page to *WIKISOURCE* [@wikimedia_wikisource_nodate] where it can be proofread by others. after that you can download the proper version of the text from which we proceed to the TEI.

## TEI:why
- FAIR-Prinzipien für Daten: findable, accessible, interoperable, reusable
- TEI: text encoding initiative [@tei_history_1987]

theres multiple ways of how you can get to the TEI text. one is to wrap text elements which need to be marked up with *OXYGEN*, a powerful XML editor.

## TEI:refactoring
in our drama class, we used a python notebook (*ezdrama*, @skorinkin_ezdrama_2023) written by a friendly geek which allowed easily transforming the preprocessed text into the desired format. 

## basic auszeichnung:
``` {r}
#sampletxt<-readLines("sample.txt")
#print(sampletxt)
```
```
@title Ham 
@subtitle A tragedy
@author William S
^Dramatis Personae
Ham
Egg
Vikings
#Act 1
##Scene 1
@Ham: 
Lovely Spam! 
@Egg: 
Wonderful Spam!
##Scene 2
$Enter Vikings
@Ham: 
Egg, Spam, Sausage, and Bacon! 
@Vikings (singing):
Spam, Spam, Spam, Spam, Spam, Spam, Spam, and Spam
$The end
```


## finalise
if all that is done you possess a finalized TEI document which allows further analysis of the drama again e.g. using python or R for network, semantic etc. analyses. 1 platform hosting e.g. the drama plays in question is *dracor.org* [@fischer_programmable_2019], from where you can access the TEI versions, network and other data.

## TEI excerpt
```
<div type="act">
        <head>Zweyter Auftritt.</head>
        <stage>Aristodem, Polytimet.</stage>
        <sp who="#aristodem">
          <speaker>Aristodem.</speaker>
          <p>Ich komme, den König Polemon bey dir zu melden; er wird gleich bey dir seyn.</p>
        </sp>
        <sp who="#polytimet">
          <speaker>Polytimet.</speaker>
          <p>Was ist das für eine Herablassung, daß er zu seinem Gefangenen kommen will? Ich halte es für Edelmuth, und daß er mir so das Unglük, ein Gefangener zu seyn, versüssen wolle. <pb no="7"/>;</p>
        </sp>
        <sp who="#aristodem">
          <speaker>Aristodem.</speaker>
          <p>Prinz, deine Dapferkeit, die bey dieser Jugend so gesezt, so bedachtsam ist, verdienet  dir die Freundschaft des Helden, wiewol das Glük dir diesmal gefehlt hat.</p>
        </sp>
```

## output1
![](figure-html/typetoken.png)

## output2
![@noauthor_gephi_nodate](figure-html/klemm_gephi_vis_002.png)


## e.g. LX basics

```{r echo=F,warning=F}
posdf<-read.csv("lessing-galotti_PoStagged.csv")
library(knitr)
kable(head(posdf))
```

## further processing

the standardised text opens more options  
- building a corpus of the text/texts in question: *ANNIS* [@noauthor_annis_nodate]   
- annotating/coding the text (corpus)   
- textwitness comparison, *LERA* [@pockelmann_lera_2022]

## sources comparison
![LERA output](figure-html/klemm_lera.png)

## todo
![ספר תגי](figure-html/Seferhatagin-0006.jpeg)

## 19
![](figure-html/hison2.jpeg)