# Slow Programming & Close Reading #

The rationale of this project lies in what still feels to me like an uncomfortable clash between hermeneutics and distant reading methods. We understand and accept that quantitative approaches can tell us a lot about texts. \marginpar{ JZ_20160415_1652: Refs. to examples to be added. } At the same time, well-known practitioners of such methods tell us that, in the end, the patterns that emerge from number crunching and pattern recognition require hermeneutic interpretation \marginpar{ JZ_20160415_1657: Refs. to Kirschenbaum, Meister, Ramsay, Underwood etc. } to be given meaning. I observe a strong predilection in DH research to treat this matter as a dichotomy: it seems it must necessarily be one of two options. Patterns can be modeled and quantified, but this necessarily results in reductive measures that lose the subtle distinctions that drive hermeneutic interpretation. The gain of this coarse reductiveness is stringent formalization, which ensures computational tractability and therefore the scale and power of the analysis of large numbers: we can measure into corpora without even looking at them with human eyes and intellect. The other option is to apply subtle hermeneutics through close reading of a text. This gives the research the power of meticulous interpretation, of precise contextualizing of meaning, of capturing, representing, and interpreting complex heterogeneous knowledge. The loss here, however, is the power of scale: such hermeneutic precision cannot be expressed in simple numbers, and such heterogeneity cannot be modeled at scale. Therefore the hermeneutic model is a model of a single text or a few texts, and a model without computation: it is *only* interpretation. The quantitative model stands in opposition: it counts, and by using the computer it counts eerily fast and into huge corpora, but its understanding of the individual texts in those corpora is poor.

My conjecture is that the root cause of this perceived dichotomy is the preconception that software must scale: that the usefulness of writing and reading with software, and thus the usefulness of code literacy, is limited to tasks that are repetitive and therefore subject to automation. However, what if, for a change, we did not focus on scale? What if we applied the values of close reading (attention, detailism, ...) to programming? What would formalizing the process of close reading in code, as Slow Programming, tell us about a text? This notebook is an experimental quest for an answer to that question.

## Let's experiment ##

I contend that hermeneutics and interpretation are not mutually exclusive with code. Software and automation *can* be used as reductive methods that limit interpretation, or that are only crude hermeneutic means on the level of code, but I assert that they need not be. In fact, if anything, code in the form of literate programming \marginpar{ JZ_20160416_1749: Ref. Knuth } is a meticulously precise description of process. Besides that, code is also 'just text', just another semiotic system \marginpar{ JZ_20160416_1751: Ref. Knuth/Sample/Marino? }. Therefore, if we agree that text is an excellent means for reporting hermeneutic process, code should be even better, because it is 'just' text but with an edge: it will reproduce process meticulously, as long as we capture the hermeneutic process precisely enough. This is what I want to try in this experiment: to model each and every hermeneutic choice made when reading/editing a text meticulously in code. I will use an object-oriented approach \marginpar{ JZ_20160416_1755:  Some Ref. } to devising the model and code. I will furthermore hold to these rules when 'Close Reading' a text using code:

1. Only direct and indirect speech may be represented as string instances.
2. There will be no ‘ghost’ objects or methods (unused program code) during the execution of the program.
3. The resulting program should execute without producing Ruby exceptions or runtime errors.
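
To give a first impression of what the first rule could look like in practice, here is a minimal sketch. The class names `Character` and `Utterance`, and the choice of a symbol as identifier, are hypothetical illustrations of mine and not the model itself: entities in the narrative become objects, and only what is actually spoken is held as a `String`. The spoken words are a placeholder, not a quotation from the text.

```ruby
# Hypothetical illustration of rule 1; all names here are assumptions.
class Character
  attr_reader :name_token

  # A character is identified by a symbol, not by a string.
  def initialize(name_token)
    @name_token = name_token
  end

  # Speaking produces an Utterance that holds the spoken words.
  def says(words)
    Utterance.new(self, words)
  end
end

class Utterance
  attr_reader :speaker, :words

  def initialize(speaker, words)
    @speaker = speaker
    @words = words # the only String in this sketch: direct speech
  end
end

reynaert = Character.new(:reynaert)
utterance = reynaert.says("(placeholder for a line of direct speech)")
puts "#{utterance.speaker.name_token} speaks #{utterance.words.length} characters"
```

Every reader method defined here is also called, so the sketch stays within the second rule as well.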

### First, let's do no harm ###

Right after laying down those rules I realized that I had missed something. If I want to hold myself to the meticulously precise registration of a reproducible process, I should not, for instance, transform the 'raw' text by using a text processor (that is: changing the representation by typing and editing the text file), because that requires interpretative action, and all interpretative acts should be modeled or captured in code. It is important in my view that this code is executable computer code. It should express, if not *all*, then at least *as many as possible* of my interpretative and transformative acts. With XML, a lot if not most of my scholarly effort, actions, and performance goes unregistered. Between the manuscript and the TEI tag &lt;p&gt; there is a considerable series of scholarly actions that go unregistered and are lost. The aim here is to see how much of that scholarly performance can actually be captured in code. Every effort I make should be computationally reproducible. Thus I needed a fourth rule for the project:

1. Only direct and indirect speech may be represented as string instances.
2. There will be no ‘ghost’ objects or methods (unused program code) during the execution of the program.
3. The resulting program should execute without producing Ruby exceptions or runtime errors.
4. All scholarly actions should be computationally reproducible.

So, I can't manually alter or transform the source files I will be using. All scholarly effort shall be expressed in code that guarantees reproducible scholarly actions.
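
One way to hold myself to this in practice is to fingerprint a source file before working on it and to check afterwards that it is byte-for-byte the same. The following is only a sketch under that assumption: the helper names `fingerprint` and `with_untouched_source` are hypothetical, and a temporary file stands in for a real source file.

```ruby
require 'digest'
require 'tmpdir'

# Returns the SHA-256 hex digest of a file's bytes.
def fingerprint(path)
  Digest::SHA256.file(path).hexdigest
end

# Yields the file's content to the block and raises if the file on disk
# was altered while the block ran. Returns the block's result.
def with_untouched_source(path)
  before = fingerprint(path)
  result = yield File.read(path)
  raise "Source file #{path} was altered" unless fingerprint(path) == before
  result
end

# Usage with a throwaway file standing in for a real source file.
Dir.mktmpdir do |dir|
  source = File.join(dir, "source.txt")
  File.write(source, "raw source text\n")
  transformed = with_untouched_source(source) { |text| text.upcase }
  puts transformed
end
```

The transformation lives in the block, so it is recorded as code, while the file itself stays exactly as it was retrieved.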

### Getting the Manuscript ###

Usually the first task when editing is finding, selecting, and perusing one's sources. Obviously the tasks and actions related to getting to the manuscript and digitally photographing it cannot be captured in this notebook, both because computers are severely handicapped for performing such actions (there is a long way to go before computers are actually that actionable) and because of time and economic constraints. Luckily, however, the results of such scholarly actions are in the invaluable care of the Württembergische Landesbibliothek Stuttgart (http://www.wlb-stuttgart.de/), and we can emulate 'going to the library and getting the source' by using the online facsimile bank of the Württembergische Landesbibliothek.

For this project I will be using the text of the Middle Dutch fable *Of Reynaert the Fox*. An extant manuscript of this text is found in the so-called 'Comburg manuscript', which is in the care of the Württembergische Landesbibliothek under the description "Comburger Handschrift - mittelniederländische Sammelhandschrift - Cod.poet.et phil.fol.22". Putting 'Comburger' into the general search will get you to the manuscript. We can now emulate going to the library and getting the manuscript with the following little script.

In [None]:
require 'open-uri'
require 'fileutils'

# The maximum zoom level images are statically available from this URL:
# http://digital.wlb-stuttgart.de/filegroups/combha-m_323970265/max/
# The folios 192v-213r coincide with the JPGs 00000388.jpg-00000429.jpg
base_url = "http://digital.wlb-stuttgart.de/filegroups/combha-m_323970265/max/"
target_dir = "./folios_as_jpgs"
FileUtils.mkdir_p(target_dir) # make sure the download directory exists

# Start on the recto of folio 192 so that the first image becomes 192v.
side = "r"
folio_number = 192
(388..429).each do |jpgn|

  # Instead of jpg order numbers let's add folio numbering to add a bit of
  # scholarly atmosphere.
  if side.eql?("v")
    side = "r"
    folio_number += 1
  else
    side = "v"
  end

  jpg_name = "#{format("%08d", jpgn)}.jpg"
  folio_name = "#{jpgn}_#{folio_number}#{side}.jpg"
  puts "Downloading #{jpg_name} => #{folio_name}"
  # URI.open is required on Ruby 3.0+, where Kernel#open no longer fetches URLs.
  File.open( "#{target_dir}/#{folio_name}", "wb" ) do |file|
     file << URI.open( "#{base_url}#{jpg_name}" ).read
  end
end
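
The folio numbering that the loop keeps track of with two variables can also be written as a small function of the JPG order number, which makes the mapping easy to check in isolation. This is only a sketch: the name `folio_label` and the two constants are my own, and the offsets are the ones stated in the comments of the script (00000388.jpg is folio 192v).

```ruby
# Offsets taken from the download script: 00000388.jpg is folio 192v.
FIRST_JPG = 388
FIRST_FOLIO = 192

# Maps a JPG order number to its folio label.
def folio_label(jpgn)
  offset = jpgn - FIRST_JPG
  # Even offsets are versos; each odd offset is the recto of the next folio.
  folio_number = FIRST_FOLIO + (offset + 1) / 2
  side = offset.even? ? "v" : "r"
  "#{folio_number}#{side}"
end

puts folio_label(388)
puts folio_label(429)
```

The two calls print the labels of the first and last image, which should agree with the range 192v-213r given in the script.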

Apart from taking the reader to the library myself, I don't think we can get the reader/user closer to the source we want to clarify for her. Let's reproduce just one of the facsimile folios here for a taste:

<img src="resources/388_192v.jpg" title="fol. 192v. Comburger Handschrift (WLB Cod.poet.et phil.fol.22)" alt="Comburger Handschrift, fol. 192v" width="750"/>

There is a second extant manuscript of this text, which can be found in the '[Dyckse manuscript](http://sammlungen.ulb.uni-muenster.de/hdh/content/pageview/2435423)' currently held by the Universitäts- und Landesbibliothek Münster. The Münster university library offers excellent facsimiles and accompanying [transcriptions](http://repositorium.uni-muenster.de/document/midos/820415cc-9b48-49bf-952d-cfc719dd503c/dycksche-hs_komplett.pdf). The script above could be adapted to download these facsimiles in high detail too, but as this is outside the scope of this notebook I will leave that as an exercise to the reader. (The exercise is complicated, though certainly not impossible, by the fact that the Münster university library serves its high-resolution facsimiles as composites of so-called image [tiles](http://sammlungen.ulb.uni-muenster.de/image/tile/wc/nop/3087/1.0.0/2138975/2/1/2.jpg). Some skillful customization of the last parameters of that URL would get you there.)
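
As a starting point for that exercise, the sketch below only builds tile URLs and does not download or stitch anything. It rests on an assumption I have not verified: that the last three path segments of the tile URL identify one tile. I call them `level`, `x`, and `y` here, but those names, their meaning, and the valid ranges are guesses; the prefix is copied from the tile URL quoted above.

```ruby
# Prefix copied from the tile URL quoted in the text.
TILE_BASE = "http://sammlungen.ulb.uni-muenster.de/image/tile/wc/nop/3087/1.0.0/2138975"

# ASSUMPTION: the last three path segments select a single tile.
# The parameter names level, x, and y are hypothetical.
def tile_url(level, x, y)
  "#{TILE_BASE}/#{level}/#{x}/#{y}.jpg"
end

# Builds the URLs for every combination of the given x and y values.
# Which values actually exist on the server is not known here.
def tile_urls(level, x_values, y_values)
  x_values.flat_map do |x|
    y_values.map { |y| tile_url(level, x, y) }
  end
end

puts tile_url(2, 1, 2)
```

With the arguments 2, 1, and 2 the function reproduces the tile URL linked above; any other combination would have to be tried against the server.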

### Transcribing the source ###

Although [Transkribus](https://transkribus.eu/Transkribus/) promises a lot, at the moment computationally recognizing Middle Dutch manuscript text is still a dream. For this reason I am not going to OCR the manuscript, as it is technologically infeasible. But I will also not be transcribing the source manually, because the act of transcribing would not be captured by current computer technology and would not be reproducible<sup><a href="#note_001">1</a></sup>, and hence I would be violating the fourth self-imposed rule.

Luckily there is an existing transcription of this manuscript as part of the scholarly edition of the *Reynaert* that André Bouwman and Bart Besamusca authored (<a href="#bibref_001">Bouwman &amp; Besamusca 2009</a>).

Two observations now arise. The first is that this file is itself the result of downloading an edition of the text of the Middle Dutch Reynaert and OCR'ing it using tesseract. \marginpar{ JZ_20160512_1002: refer to the script and the edition } OCR isn't perfect, but it gives us at least the glyphs of the edition; from this we'll build the edition of the edition. But this should not be a digital metaphor of the book. For that, the PDF used (open source, downloadable here \marginpar{ JZ_20160512_1005: REF }) is available, and probably good enough. Here we are interested in the question: what is a text when it is transferred (or read?) by code (or as code?). This is where we could actually leave off. We could say: this is what current computer technology can do, that is, guess at the glyphs. That's as far as computer science understands text. Or rather, we should contend that this is a form and an amount of human knowledge about text that has been expressed computationally. However, we can use computer language to upgrade that understanding of the text, by transferring my knowledge about it to computer code.

### Notes ###
<small>

<a name="note_001" id="note_001">1</a>) Very strictly speaking this is not true. I could have installed a [key logger](https://en.wikipedia.org/wiki/Keystroke_logging "wikipedia entry on keystroke logging") and thus have registered every transcription action, probably even including the lookups and queries I would have initiated from behind my keyboard. For reasons of time, however, this is not feasible for this notebook. Also, the focus of this notebook is the close reading of the text of the source through code, for which the transcription is a preliminary step whose reproducibility can be simulated, as the following section of the notebook shall demonstrate.

</small>

### References ###
<small>

<a name="bibref_001" id="bibref_001">Bouwman, A. & Besamusca, B., 2009.</a> *Of Reynaert the Fox: Text and Facing Translation of the Middle Dutch Beast Epic Van den vos Reynaerde*, Amsterdam: Amsterdam University Press. Available at: http://www.oapen.org/search?identifier=340003 [Accessed November 20, 2015].

</small>