Computational analyses of the critical history of George Eliot's Middlemarch


Middlemarch Critical Histories Project

What can we learn about literary scholarship by analysing thousands of articles at a time? What can we learn about a literary work by examining which parts of it get repeatedly quoted in (different phases of) its critical afterlife? (And a complementary question: which parts never get quoted?) What can we learn about canon formation, contestation and reformation by considering a highly canonical work not as a single unit but as subject to highly uneven critical attention across its length? What can we learn about literary scholarship as an institution and as a set of practices by tracing patterns of citation as they unfold over time?

These are just some of the questions which our project aims to address. Decades of research in linguistics have shown that when a large enough collection of texts (a corpus) is analysed, patterns emerge which aren't accessible at a smaller scale. In some cases these patterns confirm our intuitions and expectations; in other cases they are entirely unexpected or counter-intuitive. In either case, corpus methods offer a measure of patterns actually present in the data.

In applying these methods to literary scholarship, we've chosen to start small - relatively speaking. George Eliot's novel Middlemarch is an ideal test case for the following reasons:

  • Very long novel: critical attention will inevitably be selective, unlike short and/or hypercanonical works such as Hamlet or The Waste Land
  • Consistent place in the canon of English literature: has attracted frequent and sustained attention in literary criticism over many decades
  • Text out of copyright: good availability of digitised text
  • Not too old: avoids issues of variable spellings
  • Relatively stable editorial history: avoids issues of multiple distinct editions
  • Thematic: the novel itself engages with questions of partial vs total perception, parts and wholes, the afterlife of words
  • Stylistic: the novel is notable for its narrator's wise generalisations, sentences which invite readers to extract them from their context and discuss them further (something readers began to do immediately, as Leah Price has investigated)

We'll start with Middlemarch then, and attempt to construct a corpus of criticism and scholarship which discusses this novel. But this immediately raises questions and problems. On the one hand, we want as large a corpus as possible, since this will give us the best possible data for making claims about "criticism in general". On the other hand, we don't want to fall into the trap of quantity over quality: texts which have been poorly digitised or miscategorised will be useless at best, actively distorting at worst. A more technical question concerns the representativeness of our corpus. How can we best select a sample of all Middlemarch criticism to stand for the whole? And, crucially, how can we do this without creating enormous amounts of work for ourselves? (Aside from valuing our own time, this is important because we want this methodology to be readily expandable to other texts and authors.)

There is also a set of concerns specific to the historical (diachronic) axis. Should we aim for equal numbers of texts from each year/decade, or try to have numbers proportionate to overall output (measured how)? Should we just use whatever we can get our hands on? How far back in time can we go before the categories of literary criticism and scholarship are so different that we are no longer comparing like with like?
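To make the choice between equal and proportionate sampling concrete, here is a minimal sketch of both strategies. It is an illustration only, not the project's method: the record format (a dict with a `year` key) and the function names are assumptions made for this example.

```python
import random
from collections import defaultdict


def decade_of(year):
    """Map a year to the start of its decade, e.g. 1974 -> 1970."""
    return (year // 10) * 10


def group_by_decade(articles):
    """Group article records (hypothetical dicts with a 'year' key) by decade."""
    groups = defaultdict(list)
    for article in articles:
        groups[decade_of(article["year"])].append(article)
    return dict(groups)


def sample_equal(articles, per_decade, seed=0):
    """Draw the same number of articles from every decade.

    Decades with fewer than `per_decade` articles contribute all they have.
    """
    rng = random.Random(seed)
    sample = []
    for _, items in sorted(group_by_decade(articles).items()):
        sample.extend(rng.sample(items, min(per_decade, len(items))))
    return sample


def sample_proportional(articles, total, seed=0):
    """Draw from each decade in proportion to its share of all articles.

    Shares are rounded per decade, so the result may differ slightly
    from `total` in size.
    """
    rng = random.Random(seed)
    n = len(articles)
    sample = []
    for _, items in sorted(group_by_decade(articles).items()):
        k = min(len(items), round(total * len(items) / n))
        sample.extend(rng.sample(items, k))
    return sample
```

Equal allocation makes decades directly comparable but over-represents periods with little output. Proportional allocation mirrors the available material, but here "overall output" is simply whatever is in the input list, which leaves open the question of how output should be measured.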

We don't yet have solutions to these problems, and part of the interest in the project is to search for the best (which is not to say perfect) solutions, and indeed to ask ourselves how we should even decide what counts as "best".

Although we hope to learn and adapt our methods as our research progresses, we have decided on the following initial parameters:

  • Primary text: Middlemarch
  • Type of secondary texts: articles only (most easily available in digital form)
  • Chronological range: 1960-2009 (institutional contexts for literary scholarship relatively stable - debatable, certainly, but at least more stable than in, e.g., the nineteenth century)
  • Source: JSTOR (large collection, many different journals, easily downloadable)
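
The parameters above could be written down as a small configuration and used to filter candidate records, as in the sketch below. The record fields (`type`, `year`, `source`) are hypothetical and do not reflect JSTOR's actual metadata format.

```python
# Initial corpus parameters, as listed above.
PARAMS = {
    "primary_text": "Middlemarch",
    "secondary_type": "article",
    "year_range": (1960, 2009),  # inclusive at both ends
    "source": "JSTOR",
}


def in_scope(record, params=PARAMS):
    """Return True if a metadata record falls within the initial parameters.

    `record` is assumed to be a dict with 'type', 'year' and 'source' keys;
    records with a missing or non-integer year are treated as out of scope.
    """
    first, last = params["year_range"]
    year = record.get("year")
    return (
        isinstance(year, int)
        and first <= year <= last
        and record.get("type") == params["secondary_type"]
        and record.get("source") == params["source"]
    )


def select_corpus(records, params=PARAMS):
    """Keep only the records that satisfy the initial parameters."""
    return [r for r in records if in_scope(r, params)]
```

Keeping the parameters in one place means they can be changed as the research progresses, or swapped out for another primary text, without touching the filtering logic.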

Results