# Digital Literary Studies

Matt Lavin, University of Pittsburgh

Writing an introduction for digital literary studies is a daunting task. To do so, one must agree to wrestle the great bear that is literary studies or, perhaps even more dauntingly, that ubiquitous institution known as The English Department. Digital literary studies is probably the most established of all digital humanities subfields, the most varied in terms of its practitioners' motivations and position-takings, and the most often discussed in the discourse we might conveniently label "public-facing digital humanities." To this third superlative, I would add that it is often unfairly treated as a stand-in for the whole of digital humanities. Almost any summary of the field will tell you that digital literary studies includes the markup tradition associated with textual and documentary editing (e.g. TEI), as well as the perhaps more often divisive subfield of computational text analysis, which has roots in both "Humanities Computing" and pre-computing attempts to make literary study a quantitative enterprise, such as those of Lucius Sherman. As Andrew Jewell has eloquently summarized, "Sherman was the author of Analytics of Literature (1893), a book which articulated Sherman's system for the 'objective study' of literary prose. This system, which, among other things, computed the 'force-ratio' in passages, or the number of emphasized words in relation to the number of total words, involved a good deal of counting by Sherman and his students, and Sherman's methods inspired a healthy amount of ridicule." Most contemporary practitioners of computational analytics would distance themselves from figures like Sherman if given the chance, but the specter of Sherman (and voices like his) is so present among our less digitally inclined colleagues that it behooves us, as Jewell argues, to confront that legacy directly.
That said, we should also claim the best of our Humanities Computing heritage, and in this I would include the pioneering work of Roberto Busa and the profound impact of Willard McCarty, among others. 

One of the biggest challenges with describing a multifaceted, interdisciplinary space with a varied history is that it offers multiple points of entry. Autobiography is one such option. I came out of the University of Iowa's English department and did my comprehensive exams straddling the methodologies of new historicism and book history. Most of my published work has been specifically focused on Willa Cather from a bibliography, book history, and authorship studies perspective. Prior to serving as a CLIR Postdoctoral Fellow at the Center for Digital Research in the Humanities at the University of Nebraska-Lincoln in 2012, my tie to the digital was my claim as "a guy in the department who knows stuff about computers." After Nebraska, I had an alt-ac position managing a Mellon-funded digital humanities/integration/collaboration grant at St. Lawrence University in Canton, NY. Last year, I started at Pitt in another alt-ac role: Clinical Assistant Professor of English and Director of our Digital Media Lab. As a result of these choices, I've now spent about four years developing computational skills in pedagogical and research contexts. So for me, what it feels like to be situated within digital literary studies can be as complicated as my position on post-structuralist questions of representation, or my views on the implicit position-takings of algorithmic or quantitative thinking. Simultaneously, the question has been as concrete as which building my office was in, what I was allowed or required to teach, and who was likely to skip a meeting if the agenda related to digital humanities.

Another point of entry is to use digital methods to survey a small corner of literary studies discourse. Andrew Goldstone and Ted Underwood generated two different sets of topic models of 5,940 articles in _PMLA_, one generating 150 topics covering 1924 to 2006 and another generating 100 topics covering 1890–1999.

![5,940 articles in PMLA covering 1924 to 2006][pmla]

[pmla]: http://pmla.site44.com/images/rage_css_imagemap.jpg
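
For readers who want to see what generating a topic model involves, here is a minimal sketch in Python. It uses latent Dirichlet allocation fit by collapsed Gibbs sampling, which is only one of several statistical approaches, and it should not be read as Goldstone and Underwood's actual setup. The toy corpus, the function names `fit_lda` and `top_words`, and all parameter values are hypothetical and chosen purely for illustration.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit a toy LDA topic model with collapsed Gibbs sampling.

    docs is a list of token lists. Returns (topic_word, doc_topic), where
    topic_word[k] counts the words assigned to topic k and doc_topic[d]
    lists how many tokens of document d are assigned to each topic.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    doc_topic = [[0] * num_topics for _ in docs]

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            topic_word[k][word] += 1
            topic_total[k] += 1
            doc_topic[d][k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                doc_topic[d][k] -= 1
                # Weight each topic by how much this document already uses it
                # and by how strongly the topic is already tied to this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                # Draw a new topic at random, in proportion to the weights.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                topic_word[k][word] += 1
                topic_total[k] += 1
                doc_topic[d][k] += 1
    return topic_word, doc_topic


def top_words(topic_word, n=4):
    """Return the n most frequently assigned words for each topic."""
    return [[word for word, _ in counts.most_common(n)] for counts in topic_word]


# A hypothetical five-document corpus; the last document mixes both themes.
docs = [
    "whale ship sea captain whale harpoon sea".split(),
    "ship sea voyage captain storm whale".split(),
    "prairie farm plow wheat prairie harvest".split(),
    "farm wheat harvest plow prairie railroad".split(),
    "sea ship prairie farm voyage railroad".split(),
]

# The same setup run with two different random seeds.
for seed in (1, 2):
    topic_word, doc_topic = fit_lda(docs, num_topics=2, seed=seed)
    print(seed, top_words(topic_word))
```

Because each token's topic is drawn at random in proportion to the weights, runs with different seeds may order the topics differently or shuffle words between them. Fixing the seed makes a single run repeatable, but it does not make any one run more "correct" than another.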


For those of you who don't know, topic modeling has to some degree "taken DH by storm" in the past few years, as it's the kind of tool that digital humanists can really sink their teeth into. The idea is to use statistical inference to generate lists of co-occurring words across multiple documents. If you think about how topics of discussion in written materials work, it's reasonable to assume that any given document in a set discusses more than one topic, and that any given topic could easily occur in more than one document. Checking co-occurrences one at a time across documents would take a theoretically insane amount of time, so we use probabilistic methods instead. There are in fact many statistical approaches to modeling topics. In most cases, we set a somewhat arbitrary number of topics we want our algorithm to find, say 50, 100, or 250, but not three or four. What all this means is that if we run the exact same topic modeling setup over and over again, we will get slightly different results each time. Further, the _PMLA_ visualization only goes to 2006 and thus does not reflect the post-2008 surge of interest in digital humanities. Even if we did update this particular line of inquiry, however, Goldstone and Underwood remind us that the _PMLA_ is just one journal, and the DH boom has been as much about new publications, and new modes of communication, as it has been an intervention in the trajectory of the existing literary studies field. I mention all this by way of explanation and qualification, not as a means to dismiss topic models. Here's a closer look at Goldstone's 100-topic model of the _PMLA_:

As you can see, these data resist easy interpretation, and there's more to say about them than I have time for today, but I would love to encourage you all to take a closer look at both the visualization and the underlying models if you haven't seen them before. For the moment, I want to call your attention to something Goldstone and Underwood remark upon in their accompanying blog post on "The Stone and the Shell," which is that the topic "evidence found fact time note part early professor appears made ff passage case source date connection present similar find" experiences a drop in prominence between 1960 and 1980. To quote Goldstone and Underwood:

> Apparently, critics in the 1960s developed a habit of describing literature in terms of problems, questions, and significant moments of action or choice; the habit intensified through the early 1980s and then declined. This habit may not have a name; it may not line up neatly with any recognizable school of thought. But it’s a fact about critical history worth knowing.

The less obvious but crucial context for this analysis is that digital humanities is deeply invested in inquiry, evidence, discovery, and "facts worth knowing." This is generally true of both textual studies and formalism--the two most dominant interpretive frames in digital literary studies. What counts as evidence and how that evidence makes meaning varies significantly between these schools of thought, but Goldstone and Underwood astutely point to the biggest challenge facing practitioners of digital literary studies today: an entire generation of literary scholars was heavily influenced by the critical theory movement, which lodges a substantial critique against totalistic systems of empiricism and quantification. Many digital humanities scholars would argue that digital humanities is, at its best, both an engagement with and critique of something loosely akin to "logical positivism." (Yesterday, we heard the term essentializing, which I think nudges at similar concerns.) Others, of course, would urge the humanities to be more quantitative or evidence-driven. To engage in digital literary studies is to confront the tensions and gains associated with genuine differences of perspective. As a result, I like to think of Goldstone and Underwood's topic models as digital humanities, looking through a keyhole at literary studies, and somehow accidentally getting a better picture of itself in the process.

On this note, I want to transition to some literary studies DH work that I think is really representative of the computational side of the discipline (and in general work that I really like, though I'm sure I've left out many great pieces here). Much of the scholarship I'll discuss is tentative, or incomplete, or overdue for an update, but that's the nature of digital humanities. In fact, one of its greatest strengths is its interest in continuing to refine its methods and reach new insights, often through collaborative effort.

Mark Algee-Hewitt and Mark McGurl, Stanford Lit Lab "Pamphlet #8"

![McGurl and Algee-Hewitt on Canonicity][mcgurl]

[mcgurl]: mcgurl2.png

Andrew Piper and Eva Portelance, "How Cultural Capital Works: Prizewinning Novels, Bestsellers, and the Time of Reading," http://post45.research.yale.edu/2016/05/how-cultural-capital-works-prizewinning-novels-bestsellers-and-the-time-of-reading/

![Nostalgia Terms by Category][cultural]

[cultural]: http://post45.research.yale.edu/wp-content/uploads/2016/04/Fig_61.jpg

Matthew Jockers, Syuzhet. https://storify.com/clancynewyork/contretemps-a-syuzhet

Underwood and Sellers on "The Emergence of Literary Diction"

![Germanic-Latin Ratio, by Genre][jdh]

[jdh]: http://journalofdigitalhumanities.org/wp-content/uploads/2012/05/PoetryFictionNon.jpg

## My work:
- Lovecraft, authorship, hack-writing
- Lovecraft on Addison & Steele, _Walker's Dictionary_
- Walker Ratio
- Neologisms - OED Data
- Machine learning on year-based samples
- Broader analysis of horror, how it relates to other supernatural subgenres

## Other Articles of Interest
- Ryan Cordell, "Reprinting, Circulation, and the Network Author in Antebellum Newspapers"
- Douglas Ernest Duhaime, "Textual Reuse in the Eighteenth Century: Mining Eliza Haywood’s Quotations" http://www.digitalhumanities.org/dhq/vol/10/1/000229/000229.html
- Marissa Gemma, Frédéric Glorieux, Jean-Gabriel Ganascia, "Operationalizing the Colloquial Style: Repetition in 19th-Century American Fiction"
