# Reading Machines
## Exploring the Linguistic Unconscious of AI

### Introduction: Two strands of computation 

The history of computing revolves around efforts to automate human computation -- human labor. And in the development of the computing machine, from Lovelace to Turing and beyond, a dominant concern has been the specification and refinement of _algorithms_: methods of reducing complex calculations and other operations to explicit formal rules. Rules that could, in principle, be implemented with rigor and precision by purely mechanical (and eventually, electronic) means. 

But as a means of understanding ChatGPT and other forms of [generative AI](https://en.wikipedia.org/wiki/Generative_artificial_intelligence), a consideration of algorithms only gets us so far. In fact, when it comes to the [large language models](https://en.wikipedia.org/wiki/Large_language_model) that have captivated the public imagination, in order to make sense of their "unreasonable effectiveness," we must attend to another strand of computing, one which, though bound up with the first, manifests distinct pressures and concerns. [cite] Instead of formal logic and mathematical proof, this strand draws on traditions of thinking about data, randomness, and probability. And instead of the prescription of (computational) actions, it aims at the description and prediction of (non-computational) aspects of the world.

A key moment in this tradition, in light of later developments, remains Claude Shannon's work on modeling the statistical structure of printed English. [cite; note on Shannon vs. Turing] In this interactive document, we will use the [Python programming language](https://www.python.org) to reproduce a couple of the experiments that Shannon reported in his famous article, in the hopes of pulling back the curtain a bit on what strikes many (and not unreasonably) as evidence of a ghost in the machine. But the aim of these explorations is not to demystify experiences of generative AI. I, for one, do find many of these experiences haunting. But maybe the haunting doesn't happen where we at first assume.

The material that follows draws on and is inspired by my reading of Lydia Liu's _The Freudian Robot_, one of the few works in the humanities that I'm aware of to deal with Shannon's work in depth.[cite]

````{admonition} How to use this document 
:class: dropdown

Using the Python language allows us to automate Shannon's methods of analyzing and manipulating text -- methods that Shannon originally deployed without the benefit of a computer. Python, like any other modern programming language, makes these methods relatively trivial to implement.

I've also chosen to use Python in this particular format -- as a web-based, interactive document -- in order to expose the code that we use to perform this work. Certainly, it would be possible to create a more seamless experience in the form of a web app that would conceal all the computational steps from the end user. But apps appeal to us because they perform this concealment, which fosters the impression that some "magic" is taking place behind the scenes. The aim of this document is, in part, to disrupt that concealment.

----------------

**Working through the examples**

In the remainder of this document, you'll find passages of exposition (like this) interspersed with blocks of code. In most cases, you'll just need to click a button to run the code in order to see a demonstration of what the prose passages expound. In order for the demonstration to work, however, you'll want to run through the code sections in order. Skipping around (except where otherwise instructed) will cause errors. 

If you get such an error, refresh the web page and run through the code sections in order once more, starting from the top of the page.

-----------

**Reading the code**

This document is intended to be intelligible without any prior knowledge of Python or programming. However, for those with an interest in understanding the Python code at a deeper level, expandable sections like this one provide a description of the logic and syntax in each of the code sections. Again, it's not necessary to read these sections, but if you think you might like to learn Python (or another language), or if you already know some Python, these sections might be of interest to you. 
````

### Two kinds of coding

Before we delve into our experiments, let's clarify some terminology. In particular, what do we mean by _code_? 

The demonstration below goes into a little more explicit detail, as far as the mechanics of Python are concerned, than the rest of this document. That's intended to motivate the contrast to follow, between the kind of code we write in Python, and the kind of coding that Shannon's work deals with. 

#### Programs as code(s)

We imagine computers as machines that operate on 1's and 0's. In fact, the 1's and 0's are themselves an abstraction for human convenience: digital computation happens as a series of electronic pulses: switches that are either "on" or "off." (Think of counting to 10 by flipping a light switch on and off 10 times.)

Every digital representation -- everything that can be computed by a digital computer -- must be encoded, ultimately, in this binary form. 

But to make computers efficient for human use, many additional layers of abstraction have been developed on top of the basic binary layer. By virtue of using computers and smartphones, we are all familiar with the concept of an interface, which instantiates a set of rules prescribing how we are to interact with the device in order to accomplish well-defined tasks. These interactions get encoded down to the level of electronic pulses (and the results of the computation are translated back into the encoding of the interface). 

A programming language is also an interface: a text-based one. It represents a code into which we can translate our instructions for computation, in order for those instructions to be encoded further for processing. 

#### Baby steps in Python


Let's start with a single instruction. Run the following line of Python code by clicking the button. You won't see any output -- that's okay.

In [None]:
answer_to_everything = 42

In the encoding specified by the Python language, the equals sign (`=`) is an instruction that loosely translates to: "Store this value (on the right side) somewhere in memory, and give that location in memory the provided label (on the left side)." The following image presents one way of imagining what happens in response to this code (with the caveat that, ultimately, the letters and numbers are represented by their binary encoding).  

[image here]

By running the previous line of code, we have created a _variable_ called `answer_to_everything`. We can use the variable to retrieve its value (for use in other parts of our program). Run the code below to see some output.

In [None]:
print(answer_to_everything)

The `print()` _function_ is a command in Python syntax that displays a value on the screen. By the term _syntax_, in the context of programming languages, we refer to the following elements:
  - the name `print`
  - the parentheses that follow it, which enclose the _argument_
  - the argument itself, which in this case is a variable name (previously defined)

These elements are perfectly arbitrary (in the Saussurean sense). This syntax was invented by the designers of the Python language, though they drew on conventions found in other programming languages. The point is that nothing about the Python command `print(answer_to_everything)` makes its operation transparent; to know what it does, you have to know the language (or, at least, be familiar with the conventions of programming languages more generally) -- just as when learning to speak a foreign language, you can't deduce much about the meaning of the words from the way they look or sound.

However, programming languages differ from so-called _natural languages_ in that even minor deviations in syntax will usually cause errors, and errors will usually bring the whole program to a crashing halt. [note on the term natural language]

Run the code below -- you should see an error message.

In [None]:
print(answer_to_everythin)

A misspelled variable causes Python to abort its computation. Imagine if conversation ground to a halt whenever one of the parties mispronounced a word or used a malapropism!

I like to say that Python is extremely literal. But of course, this is merely an analogy, and a loose one. There is no room for metaphor in programming languages, at least, not as far as the computation itself is concerned. The operation of a language like Python is determined by the algorithms used to implement it. Given the same input and the same conditions of operation, a given Python program should produce the same output every time. (If it does not, that's usually considered a bug.)

#### Encoding text

While _programming languages_ are ways of encoding algorithms, the operation of the resulting _programs_ does depend, in most cases, on more than just the algorithm itself. Programs depend on data. And in order to be used in computation, data must be encoded, too.

As an engineer at Bell Labs, Claude Shannon wanted to find -- mathematically -- the most efficient means of encoding data for electronic transmission. Note that this task involves a rather different set of factors from those that influence the design of a programming language.

The designer of the language has the luxury of insisting on a programmer's fidelity to the specified syntax. When working in Python, we have to write `print(42)`, exactly as written, in order to display the number `42` on the screen. If we forget the parentheses, for instance, the command won't work. But when we talk on the phone (or via Zoom, etc.), it would certainly be a hassle if we had to first translate our words into a strict, fault-intolerant code like that of Python.

All the same, there is no digital (electronic) representation without encoding. To refer to the difference between these two types of codes, I am drawing a distinction between _algorithms_ and _data_. Shannon's work was among the first to illuminate this distinction, which remains relevant to any consideration of machine learning and generative AI.

#### Representing text in Python

Before we turn to Shannon's experiments with English text, let's look briefly at how Python represents text as data.

In [None]:
a_text = "Most noble and illustrious drinkers, and you thrice precious pockified blades (for to you, and none else, do I dedicate my writings), Alcibiades, in that dialogue of Plato's, which is entitled The Banquet, whilst he was setting forth the praises of his schoolmaster Socrates (without all question the prince of philosophers), amongst other discourses to that purpose, said that he resembled the Silenes."

Running the code above creates a new variable, `a_text`, and assigns to it a _string_ representing the first sentence from François Rabelais' early modern novel, _Gargantua and Pantagruel_. A string is the most basic way in Python of representing text, where "text" means anything that is not to be treated purely as a numeric value.

Anything between quotation marks (either double `""` or single `''`) is a string.

One problem with strings in Python (and other programming languages) is that they have very little structure. A Python string is a sequence of characters, where a _character_ is a letter of a recognized alphabet, a punctuation mark, a space, etc. Each character is stored in the computer's memory as a numeric code, and from that perspective, all characters are essentially equal. We can access a single character in a string by supplying its position. (Python counts characters in strings from left to right, starting with 0, not 1, for the first character.)

In [None]:
a_text[5]

We can access a sequence of characters -- here, the 11th through the 50th characters, which sit at positions 10 through 49 when counting from 0.

In [None]:
a_text[10:50]
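To make the counting conventions and the "numeric codes" mentioned above a little more concrete, here is a minimal sketch using a short stand-in string of my own choosing (the variable `word` is not part of the main example). It relies only on built-in Python features: square-bracket indexing, slicing, and the `ord()`, `chr()`, and `format()` functions.

```python
# A short stand-in string (chosen for illustration only).
word = "drinkers"

# Positions are counted from 0, so position 0 holds the first character.
print(word[0])      # d

# A slice [start:stop] includes start but stops just before stop.
print(word[1:4])    # rin

# ord() reveals the numeric code (Unicode code point) behind a character,
# and format(..., "08b") writes that number as eight binary digits.
print(ord("d"))                  # 100
print(format(ord("d"), "08b"))   # 01100100

# chr() goes the other way, from numeric code back to character.
print(chr(100))     # d
```

The slice `word[1:4]` returns three characters, not four, for the same reason that `a_text[10:50]` returns forty: the position after the colon marks where to stop, and the character at that position is not included.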

We can even divide the string into pieces, using the occurrences of particular characters. The code below divides our text on the white space, returning a _list_ (another Python construct) of smaller strings.

In [None]:
a_text.split()

The strings in the list above correspond, loosely, to the individual words in the sentence from Rabelais' text. But Python really has no concept of "word," neither in English, nor any other (natural) language. 
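A quick way to see why the correspondence is only loose: in the sketch below, which uses a fragment of the same sentence, the punctuation stays attached to the neighboring letters. The set of punctuation marks passed to `strip()` is an arbitrary choice made for this example, not a general-purpose solution.

```python
# A fragment of the sentence assigned to a_text above.
fragment = "Most noble and illustrious drinkers, and you"

pieces = fragment.split()
print(pieces)

# Punctuation stays attached, because split() only looks for white space.
print(pieces[4])   # drinkers,

# str.strip() with a set of characters can trim punctuation from the ends;
# the particular set used here is an arbitrary choice for this example.
print(pieces[4].strip(",.;:()"))   # drinkers
```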

### Language & chance

It's probably fair to say that when Shannon was developing his mathematical approach to encoding information, the algorithmic ideal dominated computational research in Western Europe and the United States. In previous decades, philosophers like Bertrand Russell and mathematicians like David Hilbert had sought to develop a formal approach to mathematical proof, an approach that, they hoped, would ultimately unify the scientific disciplines. The goal of such research was to identify a core set of axioms, or logical rules, in terms of which all other "rigorous" methods of thought could be expressed. In other words, to reduce to zero the uncertainty and ambiguity plaguing natural language as a tool for expression: to make language algorithmic.

Working within this tradition, Alan Turing had developed his model of what would become the digital computer. 

But can language as humans use it be reduced to such formal rules? On the face of it, it's easy to think not. However, that conclusion presents a problem for computation involving natural language, since the computer is, at bottom, a formal-rule-following machine. Shannon's work implicitly challenges the assumption that we need to resort to formal rules in order to deal with the uncertainty in language. Instead, he sought mathematical means for _quantifying_ that uncertainty. And as Lydia Liu points out, that effort began with a set of observations about patterns in printed English texts. 

#### The long history of code

Of course, Shannon's insights do not begin with Shannon. A long history predates him of speculation on what we might call the statistical features of language. Speculations of some practical urgency, given the even longer history of cryptographic communication in political, military, and other contexts.

In the 9th Century CE, the Arab mathematician and philosopher Al-Kindi [composed a work on cryptography](https://www.tandfonline.com/doi/abs/10.1198/tas.2011.10191) in which he described the relative frequency of letters in [...] as a method for [...]. Al-Kindi, alongside his many other accomplishments, is typically credited with the first surviving analysis of this kind, which is a direct precursor of methods popular in the digital humanities (word frequency analysis), among many other domains. [cite]

Closer to the hearts of digital humanists, the Russian mathematician Andrei Markov, in [a 1913 address to the Russian Academy of Sciences](https://www-cambridge-org.proxygw.wrlc.org/core/journals/science-in-context/article/an-example-of-statistical-investigation-of-the-text-eugene-onegin-concerning-the-connection-of-samples-in-chains/EA1E005FA0BC4522399A4E9DA0304862), reported on the results of his experiment with Aleksandr Pushkin's _Evgenii Onegin_: a statistical analysis of the occurrences of consonants and vowels in the first two chapters of Pushkin's novel in verse. [cite] From the perspective of today's large language models, Markov improved on Al-Kindi's methods by counting not just isolated occurrences of vowels or consonants, but co-occurrences: that is, where a vowel follows a consonant, a consonant a vowel, etc. As a means of articulating the structure of a sequential process, Markov's method generalizes into a powerful mathematical tool, to which he lends his name. We will see how Shannon used [Markov chains](https://en.wikipedia.org/wiki/Markov_chain) shortly.
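For readers who would like a preview of what counting co-occurrences involves, here is a rough sketch in the spirit of Markov's tally. It is not a reconstruction of his procedure: it uses a short English phrase rather than Pushkin's Russian, it skips spaces, and it treats only `a`, `e`, `i`, `o`, and `u` as vowels. All of those choices are assumptions made for the sake of a small example.

```python
from collections import Counter

# A short English stand-in; Markov worked with Pushkin's Russian text.
sample = "most noble and illustrious drinkers"

VOWELS = "aeiou"  # assumption: every other letter counts as a consonant

# Reduce the text to a sequence of classes: "V" (vowel) or "C" (consonant).
# Spaces are skipped (another assumption of this sketch).
classes = ["V" if ch in VOWELS else "C" for ch in sample if ch.isalpha()]

# Pair each class with the one that follows it, then tally the pairs.
transitions = Counter(zip(classes, classes[1:]))

for (first, second), count in sorted(transitions.items()):
    print(first, "->", second, count)
```

Each line of output reports how often one class of letter is followed by another, which is the kind of information a single-letter tally cannot provide.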

#### A spate of tedious counting

First, however, let's illustrate the more basic method, just to get a feel for its effectiveness.

We'll take a text of sufficient length. Urquhart's English translation of _Gargantua and Pantagruel_, in the Everyman's Library edition, clocks in at a respectable 823 pages, so that's a decent sample. If we were following the methods used by Al-Kindi, Markov, or even Shannon himself, we would proceed as follows:
  1. Make a list of the letters of the alphabet on a sheet of paper.
  2. Go through the text, letter by letter.
  3. Beside each letter on your paper, make one mark each time you encounter that letter in the text.

Fortunately for us, we can avail ourselves of a computer to do this work. 

In the following sections of Python code, we download the Project Gutenberg edition of Rabelais' novel, saving it to the computer as a text file. We can read the whole file into the computer's memory as a single Python string. Then using a property of Python strings that allows us to _iterate_ over them, we can automate the process of counting up the occurrences of each character. [cite]

In [None]:
from urllib.request import urlretrieve
urlretrieve("https://www.gutenberg.org/cache/epub/1200/pg1200.txt", "gargantua.txt")

In [None]:
with open('gargantua.txt', encoding='utf-8') as f:
    g_text = f.read()

````{admonition} Reading the code
:class: dropdown
1. The `from`...`import` statement (lines of Python code are called _statements_) loads some external code (i.e., code that wasn't automatically loaded when we started our Python session) for use in retrieving data from the web.
2. This external code is in the form of a Python _function_ called `urlretrieve()`. Like `print()` in the previous example, a Python function is recognizable by the parentheses following its name. 
3. Within these parentheses, we can supply zero, one, or more _arguments_. Arguments are values or variables that the function will use to do some work. We can organize our code into functions in order to make it more easily reusable -- even by others. I did not write the `urlretrieve` function -- this code is merely importing and using it. Calling an external function in programming is a little like citing a source in writing: a way of building on others' work.
4. _Calling_ the `urlretrieve()` function (what we're doing here) doesn't produce any _visible_ output, but behind the scenes, it fetches the data at the URL (the first argument, between the first pair of quotation marks) and saves that data as a file (the name of which is provided as the second argument, `"gargantua.txt"`). 
5. In the next section, I use another function, `open()`, to open the file, meaning to make it available in the computer's memory for access. The `as f` part of that line indicates that the file, while it's open, can be accessed via the variable `f`. 
6. In the indented part, I use the `read()` method to load the contents of the file into memory, assigning them to the variable `g_text`. Henceforth, the entirety of Rabelais' text is available for use by reference to the `g_text` variable. (For the duration of this Python session, that is -- if I close this browser tab, I'll lose all the variables, etc., and will have to re-run the code to re-create them.)

**Note on Python variables**

- Variable names in Python do _not_ go inside quotation marks. 
- Variables are like variables in algebra: they are arbitrary names that stand for specified values.
- They usually appear, when first used, either on the _left_ side of an equals sign (`g_text = f.read()`) or inside the parentheses following a function call. That is how the variables acquire their values.
- It's worth reiterating: variable names (and function names, too) are arbitrary: i.e., it's the programmer's choice. In the code in this document, I've tried to create variable names that at least suggest what they refer to, but that's just a stylistic convention for making code easier to read; it makes no difference to the computer.




````

Running the code below uses the `len()` function to display the length -- in characters -- of a string. 

In [None]:
len(g_text)

The Project Gutenberg version of _Gargantua and Pantagruel_ has close to 2 million characters.

As an initial exercise, we can count the frequency with which each character appears. Run the following section of code to create a structure mapping each character to its frequency.

In [None]:
g_characters = {}
for character in g_text:
    if character in g_characters:
        g_characters[character] += 1
    else:
        g_characters[character] = 1

Run the code below to reveal the frequencies.

In [None]:
g_characters

````{admonition} Reading the code
:class: dropdown

- Data structures in Python are often identified by punctuation marks. The curly braces in the output above indicate that the outermost structure is a _dictionary_, which is a mapping of _keys_ to _values_. The keys are sort of like variable names, except that they go inside quotation marks.
- Data structures in Python can contain other data structures. Our `g_characters` dictionary ultimately consists of Python strings mapped to integers (a numeric data type in Python). 
- To create our dictionary of character frequencies, we use the following logic:
     1. We create the variable `g_characters`, setting it to an empty dictionary (`{}`). 
     2. We loop over the `g_text` variable, which holds a string, i.e., a sequence of characters. The `for` keyword in Python allows us to access each element in a sequence (like a string) one at a time. 
     3. Each time through the loop, the current character will be assigned to the `character` variable.
     4. In the code indented under the `for` line, we check to see whether we have encountered this particular character before (using the variable `character` to refer to it, just as one might solve for `x` in an algebraic equation).
     5. If we have encountered this character already, we assume that it's associated with a number in our `g_characters` dictionary, and we increment that number (just as if we were making another hash mark on a sheet of paper).
     6. Otherwise, we add this character to `g_characters` and set the tally to 1 (since this is the first occurrence of that character).

````
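Because tallying is such a common task, Python's standard library offers a shortcut for it. The sketch below, which uses a short stand-in string in place of `g_text`, suggests that `collections.Counter` arrives at the same tally as the explicit loop. I've kept the loop in the main example because it shows each step of the counting.

```python
from collections import Counter

# A short stand-in string; in the document, g_text would take its place.
sample = "most noble and illustrious drinkers"

# The explicit loop, following the same logic as the code above.
tally = {}
for character in sample:
    if character in tally:
        tally[character] += 1
    else:
        tally[character] = 1

# Counter performs the same tally in one step.
counted = Counter(sample)

# Converting to a plain dictionary makes the comparison straightforward.
print(dict(counted) == tally)   # True

# most_common(n) lists the n highest counts, largest first.
print(counted.most_common(3))
```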

Looking at the contents of `g_characters`, we can see that it consists of more than just the letters in standard [Latin script](https://en.wikipedia.org/wiki/Latin_script). There are punctuation marks, numerals, and other symbols, like `\n`, which represents a line break. 

But if we look at the 10 most commonly occurring characters, they align well, with one exception, with the [relative frequency of letters in English](https://en.wikipedia.org/wiki/Letter_frequency) as reported from studies of large textual corpora.

In [None]:
sorted(g_characters.items(), key=lambda x: x[1], reverse=True)[:10]

````{admonition} Reading the code
:class: dropdown

This last line of code sacrifices clarity for brevity -- a practice I will generally refrain from in this document. But it does illustrate how we can compose complex operations in Python out of simpler elements -- which is the fundamental practice of programming.

Here we sort the items in `g_characters` by their numeric elements (so the counts, not the characters), using the built-in `sorted` function. The `key=lambda x: x[1]` argument tells `sorted` to compare the (character, count) pairs by their second element, and the optional argument `reverse=True` sorts in descending order. Then in the square brackets at the end of the line, we look at the first 10 elements in that (now) sorted sequence.

````
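For anyone who finds the one-line version hard to follow, here is the same idea spelled out in separate steps. This is a sketch only: the dictionary `counts` holds made-up tallies, and the helper function `count_of` is a name I've invented to stand in for the `lambda` expression.

```python
# A small stand-in dictionary of tallies (made-up counts for illustration).
counts = {"a": 3, "e": 5, " ": 7, "z": 1}

# Step 1: turn the dictionary into (character, count) pairs.
pairs = list(counts.items())

# Step 2: a named function that picks out the count from each pair.
# This plays the role of `lambda x: x[1]` in the one-line version.
def count_of(pair):
    return pair[1]

# Step 3: sort by count, largest first.
ranked = sorted(pairs, key=count_of, reverse=True)

# Step 4: keep only the first two entries (the one-line version keeps ten).
print(ranked[:2])   # [(' ', 7), ('e', 5)]
```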

#### Random writing

At the heart of Shannon's method lies the notion of _random sampling_. It's perhaps easiest to illustrate this concept before defining it.

Using more Python code, let's compare what happens when we construct two random samples of the letters of the Latin script, one in which we select each letter with equal probability, and the other in which we weight our selections according to the frequency we have computed above.

In [None]:
from random import choices
alphabet = "abcdefghijklmnopqrstuvwxyz"
print("".join(choices(alphabet, k=50)))

The code above uses the `choices()` function to create a sample of 50 letters, where each letter is equally likely to appear in our sample. Imagine rolling a 26-sided die, with a different letter on each face, 50 times, writing down the letter that comes up on top each time.

Now let's run this trial again, this time supplying the observed frequency of the letters in _Gargantua and Pantagruel_ as weights to the sampling. (For simplicity's sake, we first remove everything but the 26 lowercase letters of the Latin script: numbers, punctuation marks, spaces, letters with accent marks, etc.)

In [None]:
g_alpha_chars = {}
for c, n in g_characters.items():
    if c in alphabet:
        g_alpha_chars[c] = n
letters = list(g_alpha_chars.keys())
weights = g_alpha_chars.values()
print(''.join(choices(letters, weights, k=50)))

Do you notice any difference between the two results? It depends to some extent on the roll of the dice, since both selections are still random. But you might see _more_ runs of letters in the second that resemble sequences you could expect in English, maybe even a word or two hiding in there.

````{admonition} Reading the code
:class: dropdown

- The line `print("".join(choices(alphabet, k=50)))` displays the result of using the `choices` function to take a random, evenly weighted sample of size `k`. Because `choices` returns a Python list (another data type), not a string, we use the `.join()` method to create a single string out of the 50 letters in our sample -- just to make it more readable. 
- We use another `for` loop and an `if` statement to create a new dictionary, `g_alpha_chars`, to hold just the frequencies of those characters that can be found in the string called `alphabet` (previously defined). 
- Then we separate out the characters and their frequencies into two parallel lists. (We do this on account of the way `choices()` is defined to work.)
- Finally, we use these two lists as arguments to `choices`, where the presence of the `weights` argument means that the sample will no longer be equally weighted, but that each character in `letters` will be selected with the frequency supplied in `weights`. 

````

#### The difference a space makes

On Liu's telling, one of Shannon's key innovations was his realization that in analyzing _printed_ English, the _space between words_ counts as a character. It's the spaces that delimit words in printed text; without them, our analysis fails to account for word boundaries. 

Let's see what happens when we include the space character in our frequencies.

In [None]:
g_shannon_chars = {}
for c, n in g_characters.items():
    if c in alphabet or c == " ":
        g_shannon_chars[c] = n
letters = list(g_shannon_chars.keys())
weights = g_shannon_chars.values()
print(''.join(choices(letters, weights, k=50)))

It may not seem like much improvement, but now we're starting to see sequences of recognizable "word length," considering the average lengths of words in English. 

But note that we haven't so far actually tallied anything that would count as a word: we're still operating exclusively at the level of individual characters or letters.
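One rough way to check the impression of "word length" is to split a random sample on its spaces and measure the runs of letters in between. The sketch below does this with frequencies drawn from a short stand-in phrase rather than the whole novel, and it fixes the random seed so that the result can be repeated; both are assumptions made to keep the example small and self-contained.

```python
from collections import Counter
from random import choices, seed

# Stand-in source text, adapted from the opening of the sentence in a_text.
sample = "most noble and illustrious drinkers and you thrice precious pockified blades"

# Tally letters and spaces together, treating the space as a character.
counts = Counter(sample)
symbols = list(counts.keys())
weights = list(counts.values())

seed(0)  # fixed seed so that repeated runs give the same draw
drawn = "".join(choices(symbols, weights, k=200))

# Spaces in the random string cut it into word-like runs of letters.
runs = drawn.split()
average_length = sum(len(run) for run in runs) / len(runs)

print(drawn)
print(round(average_length, 1))
```

Since the space is just one more weighted symbol here, the length of the runs follows from how often the space turns up in the draw, not from any knowledge of words.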

#### Law-abiding numbers

To unpack what we're doing a little more: when we make a _weighted_ selection from the letters of the alphabet, using the frequencies we've observed, it's equivalent to drawing letters out of a bag of Scrabble tiles, where different tiles appear in different amounts. If there are 5 `e`'s in the bag but only 1 `z`, you might draw a `z`, but over time, you're more likely to draw an `e`. And if you make repeated draws, recording the letter you draw each time before putting it back in the bag, your final tally of letters will usually have more `e`'s than `z`'s.

In probability theory, this expectation is called [the law of large numbers](https://en.wikipedia.org/wiki/Law_of_large_numbers). It describes the fundamental intuition behind the utility of averages, as well as their limitation: sampling better approximates the mathematical average as the samples get larger, but in every case, we're talking about behavior in the aggregate, not the individual case. 
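The bag of tiles lends itself to a small simulation. In the sketch below, the weights `5` and `1` come from the example above, while the sample sizes and the fixed random seed are arbitrary choices of mine.

```python
from random import choices, seed

# The bag from the example above: five "e" tiles and one "z" tile.
tiles = ["e", "z"]
weights = [5, 1]

seed(42)  # fixed seed for repeatability; any value would do

# Draw with replacement, using larger and larger samples.
for size in (10, 100, 10_000):
    draws = choices(tiles, weights, k=size)
    share_of_e = draws.count("e") / size
    print(size, round(share_of_e, 3))

# The long-run share of "e" that the weights imply: 5 out of 6.
print(round(5 / 6, 3))   # 0.833
```

With only 10 draws, the share of `e`'s can stray quite far from 5/6; with 10,000 draws, it should land much closer. No single draw, however, is any more predictable than before.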

### Language as a drunken walk

How effectively can we model natural language using statistical means? It's worth dwelling on the assumptions latent in this question. Parts of speech, word order, syntactic dependencies, etc: none of these classically linguistic entities come up for discussion in Shannon's article. Nor are there any claims therein about underlying structures of thought that might map onto grammatical or syntactic structures, such as we find in the Chomskian theory of [generative grammar](https://en.wikipedia.org/wiki/Generative_grammar). The latter theory remains squarely within the algorithmic paradigm: the search for formal rules or laws of thought. 

Language, in Shannon's treatment, resembles a different kind of phenomenon: biological populations, financial markets, or the weather. In each of these systems, it is taken as a given that there are simply too many variables at play to arrive at the kind of description that would even remotely resemble the steps of a formally logical proof. Rather, the systems are described, and attempts are made to predict their behavior over time, drawing on observable patterns held to be valid in the aggregate.

Whether the human linguistic faculty is best described in terms of formal, algorithmic rules, or as something else (emotional weather, perhaps), was not a question germane to Shannon's analysis. In the introduction to his 1948 article, he claims that the "semantic aspects of communication are irrelevant to the engineering problem" (i.e., the problem of devising efficient means of encoding messages, linguistic or otherwise). These "semantic aspects," excluded from "the engineering problem," return to haunt the scene of generative AI with a vengeance. But in order to set this scene, let's return to Shannon's experiments.

Following Andrei Markov, Shannon modeled printed English as a Markov chain: as a special kind of weighted selection where the weights of the current selection depend _only_ on the immediately previous selection. A Markov chain is often illustrated by a _random walk_, and the image conventionally used to explain it is that of a person who has had a bit too much to drink stumbling about. Observing such a situation, you might not be able to determine where the person is trying to go; all you can predict is that their next step will fall within stumbling distance of where they're standing right now.

It turns out that Markov chains can be used to model lots of processes in the physical world. And they can be used to model language, too, as Claude Shannon showed.
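The stumbling walker described above can be sketched in a few lines. This toy version makes several simplifying assumptions: the walker moves along a single line, takes steps of equal size, and is equally likely to step left or right. The fixed seed is there only so that the walk can be repeated.

```python
from random import choice, seed

seed(1)  # fixed seed so the walk can be repeated; an arbitrary choice

position = 0          # where the walker starts
path = [position]

# At each step the walker moves one pace left (-1) or right (+1).
# The next position depends only on the current one, not on the route taken.
for _ in range(10):
    position += choice([-1, 1])
    path.append(position)

print(path)
```

What makes this a Markov chain is the loop's indifference to history: to take the next step, the code consults `position` and nothing else.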

#### More tedious counting

One way to construct such an analysis is as follows: represent your sample of text as a continuous string of characters. (As we've seen, that's easy to do in Python.) Then "glue" it to another string, representing the same text, but with every character shifted to the left by one position. For example, the first several characters of the first sentence from _Gargantua and Pantagruel_ would look like this:

[image]

With the exception of the dangling left-most and right-most characters, you now have a pair of strings that yield, for each position, a pair of characters.

[image with highlighting]

These pairs are called bigrams. But in order to construct a Markov chain, we're not just counting bigrams. Rather, we want to create what's called a _transition table_: a table where we can look up a given character -- the letter `e`, say -- and then for any other character that can follow `e`, find the frequency with which it occurs in that position (i.e., following an `e`). If a given character never follows another character, its bigram doesn't exist in the table. 

Below are shown the most common bigrams in such a transition table created on the basis of _Gargantua and Pantagruel_.

[image]
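To give a sense of how such a table might be assembled, here is a minimal sketch using a short phrase in place of the full text. Representing the table as a dictionary of dictionaries is one possible design among several, and the variable names are my own.

```python
# A short lowercase phrase adapted from the sentence in a_text.
sample = "most noble and illustrious drinkers"

# Pair every character with its successor: the "glued" strings described above.
bigrams = list(zip(sample, sample[1:]))

# Build the table as a dictionary of dictionaries:
# first character -> {following character -> count}.
table = {}
for first, second in bigrams:
    followers = table.setdefault(first, {})
    followers[second] = followers.get(second, 0) + 1

# Look up what follows the letter "s" in the sample.
print(table["s"])   # {'t': 2, ' ': 1}
```

In this small sample, an `s` is followed twice by a `t` and once by a space. The final `s` of "drinkers" has no successor, so it contributes nothing to the table.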

#### Preparing the text

To simplify our analysis, first we'll standardize the source text a bit. Removing punctuation and non-alphabetic characters, removing extra runs of white space and line breaks, and converting everything to lowercase will make patterns in the results easier to see (though it's really sort of an aesthetic choice, and as I've suggested, Shannon's method doesn't presuppose any essential difference between the letters of words and the punctuation marks that accompany them). 

Run the two code sections below to clean the text of _Gargantua and Pantagruel_.

In [None]:
def normalize_text(text):
    '''
    Reduces the provided string to a string consisting of just alphabetic, lowercase characters from the Latin script and non-contiguous spaces.
    '''
    text_lower = text.lower()
    text_lower = text_lower.replace("\n", " ").replace("\t", " ")
    text_norm = ""
    for char in text_lower:
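        # endswith() is safe on an empty string, whereas text_norm[-1] raises an IndexError if the text begins with a space
        if (char in "abcdefghijklmnopqrstuvwxyz") or (char == " " and not text_norm.endswith(" ")):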
            text_norm += char
    return text_norm

In [None]:
g_text_norm = normalize_text(g_text)
g_text_norm[:1000]

````{admonition} Reading the code
:class: dropdown

The bulk of this code defines a new function, `normalize_text()`, which we can use to perform this procedure whenever we need to. The procedure is as follows:
  1. Create a lowercased version of the provided `text` argument, using the built-in `lower()` method.
  2. Using the built-in `replace()` method, replace line breaks (the special character `"\n"`) and tabs (the special character `"\t"`) with single spaces. 
  3. Create an empty string to hold the normalized text.
  4. Loop over all the characters in the original string, and for each character, add to the normalized text only if it is a) an alphabetic character or b) a space, provided that the last element of the normalized string is not also a space. (This last condition ensures that we don't end up with multiple contiguous spaces.)
  5. The `return` keyword is necessary to make our new `text_norm` variable available in the context where we call the function. 

Then we call this function on the `g_text` variable, assigning the return value to the new variable `g_text_norm`. (Again, don't dwell on the names; it's common in programming to have multiple variables referring to the same value, where each variable belongs to a different context. It's a technique that helps reduce bugs in programming and make programs more efficient.)

Finally, we look at the first 1,000 characters in our normalized text.

````

This method isn't perfect, but we'll trust that any errors -- like the disappearance of accented characters from French proper nouns, etc. -- will get smoothed over in the aggregate. 
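
To see the kind of error in question, here is a rough regex-based sketch of the same normalization. The helper `normalize_text_re` is hypothetical and is not the function defined above; it should behave similarly on typical input, though it treats every kind of whitespace (not just line breaks and tabs) as a space.

```python
import re

def normalize_text_re(text):
    """Regex-based sketch of the normalization (hypothetical helper)."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", "", text)  # drop anything that is not a-z or whitespace
    text = re.sub(r"\s+", " ", text)      # collapse runs of whitespace into a single space
    return text

print(normalize_text_re("The Abbey of Thélème,\n  founded by GARGANTUA!"))
```

The accented letters in "Thélème" are simply dropped, leaving a string that is no longer a recognizable word.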

#### Setting the table

To create our transition table of bigrams, we'll define two new functions in Python. The first function, `create_ngrams`, generalizes a bit from our immediate use case; by setting the parameter called `n` in the function call to a number higher than 2, we can create combinations of three or more successive characters (trigrams, quadgrams, etc.). This feature will be useful a little later.

Run the code below to define the function.

In [None]:
def create_ngrams(text, n=2):
    '''
    Creates a series of ngrams out of the provided text argument. The argument n determines the size of each ngram; n must be greater than or equal to 2. 
    Returns a list of ngrams, where each ngram is a Python tuple consisting of n characters.
    '''
    text_arrays = []
    for i in range(n):
        last_index = len(text) - (n - i - 1)
        text_arrays.append(text[i:last_index])
    return list(zip(*text_arrays))

````{admonition} Reading the code
:class: dropdown

The code here might look a little cryptic, but it's doing what we illustrated above: taking a single string (the `text` argument) and transforming it into multiple parallel strings, each of which is a copy of the previous but shifted to the left by one character. (The shifted characters get lopped off, so that each string remains the same size as the others.)

- The `range()` function produces a sequence of numbers, from 0 up to `n - 1`. 
- For each value of `i` in our loop, we find the position of the string that is `n - i - 1` characters from the end. We do this to ensure that the strings line up. So if `n` is 3, then the first time through the loop, `i` will be 0, and `n - i - 1` will be `2`. 
- Using Python's string-slicing syntax, we take a portion of the string from `i` to the position calculated above. So on the first time through our loop, our new string will start at the 0 position (the first character) and end with the antepenultimate character.
- When `i` is 1, our new string will start at the 1 position (second character) and end with the penultimate character.
- When `i` is 2, our new string will start at the 2 position (third character) and end with the last character. 

A little reflection shows that all three of these strings will be of equal length. 

Finally, we use the Python `zip()` function to align all three strings and separate them into groups of items that occupy the same position in each string, which you can visualize as follows:

[image]


````
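
To watch the shifting happen, the sketch below repeats the slicing logic from `create_ngrams` on a single short word, with `n` set to 3. The variable names are chosen for illustration only.

```python
word = "gargantua"
n = 3

# Build the n shifted copies with the same slice arithmetic as the loop in create_ngrams.
shifted = [word[i:len(word) - (n - i - 1)] for i in range(n)]
print(shifted)

# zip() then reads the copies column by column to form the trigrams.
trigrams = list(zip(*shifted))
print(trigrams[:3])
```

All three strings in `shifted` have the same length, so `zip()` produces one trigram for every position.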

Let's illustrate our function with a small text first. The output is a Python list, which contains a series of additional collections (called tuples) nested within it. Each subcollection corresponds to a 2-character window, and the window is moved one character to the right each time. 

This structure will allow us to create our transition table, showing which characters follow which other characters most often. 

In [None]:
text = 'abcdefghijklmnopqrstuvwxyz'
create_ngrams(text, 2)

Run the code section below to define another function, `create_transition_table`, which does what its name suggests.

In [None]:
from collections import Counter
def create_transition_table(ngrams):
    '''
    Expects as input a list of tuples corresponding to ngrams.
    Returns a dictionary of dictionaries, where the keys to the outer dictionary consist of strings corresponding to the first n-1 elements of each ngram.
    The values of the outer dictionary are themselves dictionaries, where the keys are the nth elements of each ngram, and the values are the frequency of occurrence.
    '''
    n = len(ngrams[0])
    ttable = {}
    for ngram in ngrams:
        key = "".join(ngram[:n-1])
        if key not in ttable:
            ttable[key] = Counter()
        ttable[key][ngram[-1]] += 1
    return ttable

````{admonition} Reading the code
:class: dropdown

1. Since the `create_transition_table` function expects a list of n-grams as its argument, we find the value of `n` by taking the length of the first element in the list. 
2. We define a new dictionary, `ttable`, to hold our transitions.
3. We loop over the n-grams in our list.
   - For every n-gram, we are going to separate it into the character in the nth position, and the sequence of characters from the first to the position `n - 1`. Thus, if `n` is 2, we separate the bigram into two characters. If `n` is 3, on the other hand, the first part consists of two characters. The second part will always be a single character, no matter the value of `n`. That's because we're still calculating the frequency of _individual characters_. We're just basing that frequency on a certain window of characters to the immediate left of each character, and that window can have different sizes.
   - The first part of each n-gram is our dictionary _key_. Because we're using a dictionary, we have to associate each key with a value.
   - However, unlike our first table of frequencies, the value of this key is actually another dictionary! That's because any initial sequence of `n - 1` characters can be followed (in theory) by any other character. And it's the frequency of the latter that we're calculating. So assuming `n` is 3, we might have `th` followed by `e`, and `th` followed by `a`, and `th` followed by `o`, etc. And we're interested in the frequency of each of those combinations.
   - If we haven't seen this key before, we have to create its dictionary. For that, we use a special Python data structure called a `Counter`. The `Counter` just saves us the step of having to initialize every value to 0. 
   - Finally, the line `ttable[key][ngram[-1]] += 1` increments the numeric value associated with a) the character in the nth position of the current n-gram, which is in turn associated with b) the sequence of characters in the first `n - 1` positions of the n-gram. So if the current value of `ngram` is `the`, then `key` is `th`, and `ngram[-1]` (which refers to the last character in `ngram`) is `e`. So that line translates to `ttable["th"]["e"] += 1` (which increments the value, whatever it is, by 1).

````
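
As a small worked case, the sketch below builds a trigram transition table for a toy string. It is a variation on the function above rather than a copy of it: it uses `defaultdict(Counter)` in place of the explicit `if key not in ttable` check, and the `sample` text is invented.

```python
from collections import Counter, defaultdict

sample = "the then they"   # hypothetical toy text
n = 3

table = defaultdict(Counter)
for i in range(len(sample) - n + 1):
    ngram = sample[i:i + n]
    # Split each n-gram into its first n-1 characters and its final character.
    context, following = ngram[:-1], ngram[-1]
    table[context][following] += 1

print(table["th"])
print(table["he"])
```

Here `th` is always followed by `e`, while `he` is followed once each by a space, an `n`, and a `y`.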

Now run the code below to create the transition table for the bigrams in the alphabet.

In [None]:
create_transition_table(create_ngrams(text, 2))

Here our transition table consists of frequencies that are all 1, because (by definition) each letter occurs only once in the alphabet. The way to read the table, however, is as follows:
> The letter `b` occurs after the letter `a` 1 time in our (alphabet) sample.
> 
> The letter `c` occurs after the letter `b` 1 time in our sample.
> 
> ...

Now let's use these functions to create the transition table for the bigrams in _Gargantua and Pantagruel_.

In [None]:
g_ttable = create_transition_table(create_ngrams(g_text_norm, 2))

Our table will now be significantly bigger. But let's use it to see how frequently the letter `e` follows the letter `h` in our text:

In [None]:
g_ttable['h']['e']

We can visualize our table fairly easily by using a Python library called [pandas](https://pandas.pydata.org/).

Run the code below, which may take a moment to finish.

In [None]:
import pandas as pd
pd.set_option("display.precision", 0)
pd.DataFrame.from_dict(g_ttable, orient='index')

To read the table, select a row for the first letter, and then a column to find the frequency of the column letter appearing after the letter in the row. (In other words, read across then down.)

The space character appears as the empty column/row label in this table. 
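
The cells of the table hold raw counts. If you prefer to think in terms of probabilities, each row can be rescaled so that its values sum to 1. The sketch below does this for a single row; the counts in `row` are invented for illustration and are not taken from the actual table.

```python
from collections import Counter

row = Counter({"e": 6, "a": 3, "i": 1})   # invented counts for characters following one letter

total = sum(row.values())
# Divide each count by the row total to get a relative frequency.
probabilities = {char: count / total for char, count in row.items()}
print(probabilities)
```

The relative sizes are unchanged, so sampling with the raw counts as weights gives the same results as sampling with these probabilities.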

### Automatic writing

In Shannon's article, these kinds of transition tables are used to demonstrate the idea that English text can be effectively represented as a Markov chain. And to effect the demonstration, Shannon presents the results of _generating_ text by weighted random sampling from the transition tables.  

To visualize how the weighted sampling works, imagine the following:
  1. You choose a row at random on the transition table above, writing its character down on paper.
  2. The numbers in that row correspond to the observed frequencies of characters following the character corresponding to that row.
  3. You fill a bag with Scrabble tiles, using as many tiles for each character as indicated by the corresponding cell in the selected row. If a cell has `NaN` in it -- the null value -- you don't put any tiles of that character in the bag.
  4. You draw one tile from the bag. You write down the character you just selected. This character indicates the next row on the table.
  5. Using that row, you repeat steps 2 through 4. And so on, for however many characters you want to include in your sample.
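
The bag of tiles can be simulated quite literally. In the sketch below, the counts in `row` are invented, and the two draws illustrate that pulling one tile from a filled bag follows the same distribution as a weighted choice over the distinct characters.

```python
import random
from collections import Counter

row = Counter({"e": 6, "a": 3, "i": 1})   # invented counts for one row of the table

# Fill the bag: one tile per observed occurrence.
bag = [char for char, count in row.items() for _ in range(count)]

# Drawing one tile uniformly from the bag...
tile = random.choice(bag)

# ...follows the same distribution as a weighted choice over the distinct characters.
weighted = random.choices(list(row.keys()), weights=list(row.values()), k=1)[0]
print(tile, weighted)
```

The weighted version avoids building the bag at all, which matters when the counts run into the thousands.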

Run the code below to define a function that will do this sampling for us.

In [55]:
from random import choices  # weighted random selection; harmless if already imported earlier

def create_sample(ttable, length=100):
    '''
    Using a transition table of ngrams, creates a random sample of the provided length (default is 100 characters).
    '''
    starting_chars = list(ttable.keys())
    first_char = last_char = choices(starting_chars, k=1)[0]
    l = len(first_char)
    generated_text = first_char
    for _ in range(length):
        chars = list(ttable[last_char].keys())
        weights = list(ttable[last_char].values())
        next_char = choices(chars, weights, k=1)[0]
        generated_text += next_char
        last_char = generated_text[-l:]
    return generated_text

````{admonition} Reading the code
:class: dropdown

Our function expects a transition table as its first argument (a dictionary of dictionaries) and an optional number as its second argument. If no `length` is provided, 100 (characters) is used as the default.

1. We randomly select one element from among the keys of the outer dictionary (the rows of the transition table). For clarity's sake, we assign this element to _two_ variables: `first_char` and `last_char`. We will update the `last_char` variable throughout the process, to keep track of the last character(s) we have generated, which determine the selection of the next character.
2. Note that `first_char` and `last_char` may or may not be a single character; that depends on the size of the n-grams represented by the provided transition table (`ttable`).
3. We find the length of `first_char` and assign it to a variable, `l`.
4. We initialize a new string, `generated_text`, to this randomly selected element.
5. We loop over the numbers up to the value of `length`: basically, just performing the same action `length` times.
6. For each iteration, we use `last_char` (the first part of an n-gram, as derived in the `create_transition_table` function) to select the list of characters that follow `last_char` in the source text and their frequencies of occurrence.
7. We randomly select a single character, using the frequencies as weights.
8. We add that character to `generated_text`.
9. We update `last_char` to reflect the last `l` characters, which again, correspond to the first part of each n-gram.

````

In [None]:
create_sample(g_ttable)

Run the code above a few times for the full effect. It's still nonsense, but maybe it seems more like recognizable nonsense -- meaning nonsense that a human being who speaks English might make up -- compared with our previous randomly generated examples. If you agree that it's more recognizable, can you pinpoint features or moments that make it so?

Personally, it reminds me of the outcome of using a Ouija board: recognizable words almost emerging from some sort of pooled subconscious, then sinking back into the murk before we can make any sense out of them. 

#### More silly walks

More adept Ouija-board users can be simulated by increasing the size of our n-grams. As Shannon's article demonstrates, the approximation to the English lexicon improves when we move from bigrams to trigrams -- such that frequencies are calculated in terms of the occurrence of a given letter immediately after a pair of letters. 

So instead of a table like this:

[image]

we have this:

[image] 

Note, however, that throughout these experiments, the level of approximation to any particular understanding of "the English lexicon" depends on the nature of the data from which we derive our frequencies. Urquhart's translation of Rabelais, dating from the 17th century, has a rather distinctive vocabulary, as you might expect, even with the modernized spelling and grammar of the Project Gutenberg edition. 

The code below defines some interactive controls to make our experiments easier to manipulate. Run both sections of code to create the controls.

In [56]:
import ipywidgets as widgets
from IPython.display import display

def create_slider(min_value=2, max_value=5):
    return widgets.IntSlider(
            value=2,
            min=min_value,
            max=max_value,
            description='Set value of n:')
    
def create_update_function(ttable, text, transition_function, slider):
    '''
    returns a callback function for use in updating the provided transition table with ngrams from text, given slider.value, as well as an output widget
    for displaying the output of the callback
    '''
    output = widgets.Output()
    def on_update(change):
        with output:
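            # Rebuild the table in place, so that every function sharing this dictionary (e.g., the sample generator) sees the new n-grams.
            new_ttable = transition_function(create_ngrams(text, slider.value))
            ttable.clear()
            ttable.update(new_ttable)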
            print(f'Updated! Value of n is now {slider.value}.')
    return on_update, output

def create_generate_function(ttable, sample_function, slider):
    '''
    returns a callback function for use in generating new random samples from the provided transition table.
    '''
    output = widgets.Output()
    def on_generate(change):
        with output:
            print(f'(n={slider.value}) {sample_function(ttable)}')
    return on_generate, output
    
def create_button(label, callback):
    '''
    Creates a new button with the provided label, and sets its click handler to the provided callback function
    '''
    button = widgets.Button(description=label)
    button.on_click(callback)
    return button

In [None]:
ngram_slider = create_slider()
update_callback, update_output = create_update_function(g_ttable, g_text_norm, create_transition_table, ngram_slider)
update_button = create_button("Update table", update_callback)
generate_callback, generate_output = create_generate_function(g_ttable, create_sample, ngram_slider)
generate_button = create_button("New sample", generate_callback)
display(ngram_slider, update_button, update_output, generate_button, generate_output)


Use the slider above to change the value of `n`. Click `Update table` to recreate the transition table using the new value of `n`. Then use the `New sample` button to generate a new, random sample of text from the transition table. You can generate as many samples as you like, and you can update the size of the ngrams in between in order to compare samples of different sizes.

````{admonition} Reading the code
:class: dropdown

The code above uses a special Python library (a bundle of Python code) to create some HTML elements with which to interact with the code we've already written. 

We haven't changed the underlying algorithm at all -- we've just modified the interface to make it easier to experiment with different values of `n`.
````

What do you notice about the effect of higher values of `n` on the nature of the random samples produced? 

### A Rabelaisian chatbot

Following Shannon's article, we can observe the same phenomena using whole words to create our n-grams. I find such examples more compelling, perhaps because I find it easier or more fun to look for the glimmers of sense in random strings of words than in random strings of letters, which may or may not be recognizable words. 

But the underlying procedure is the same. We first create a list of "words" out of our normalized text by splitting the latter on the occurrences of white space. As a result, instead of a single string containing the entire text, we'll have a Python list of strings, each of which is a word from the original text.

Note that this process is not a rigorous way of tokenizing a text. If that is your goal -- to split a text into words, in order to employ word-frequency analysis or similar techniques -- there are very useful [Python libraries](https://spacy.io/) for this task, which use sophisticated tokenizing techniques.

For purposes of our experiment, however, splitting on white space will suffice.

In [None]:
g_text_words = g_text_norm.split()

From here, we can create our ngrams and transition table as before. First, we just need to modify our previous code to put the spaces back (since we took them out in order to create our list of words). 

Run the code sections below to create some new functions, and then to create some more HTML controls for these functions.

In [57]:
def create_ttable_words(ngrams):
    '''
    Expects as input a list of tuples corresponding to ngrams.
    Returns a dictionary of dictionaries, where the keys to the outer dictionary consist of strings corresponding to the first n-1 elements of each ngram.
    The values of the outer dictionary are themselves dictionaries, where the keys are the nth elements of each ngram, and the values are the frequency of occurrence.
    '''
    n = len(ngrams[0])
    ttable = {}
    for ngram in ngrams:
        key = ngram[:n-1]
        if key not in ttable:
            ttable[key] = Counter()
        ttable[key][(ngram[-1],)] += 1
    return ttable
    
def create_sample_words(ttable, length=100):
    '''
    Using a transition table of ngrams, creates a random sample of the provided length (default is 100 words).
    '''
    starting_words = list(ttable.keys())
    first_words = last_words = tuple(choices(starting_words, k=1)[0])
    n = len(first_words)
    text = list(first_words)
    for _ in range(length):
        words = list(ttable[last_words].keys())
        weights = list(ttable[last_words].values())
        next_word = choices(words, weights, k=1)[0]
        text.append(next_word[0])
        last_words = tuple(text[-n:])
    return " ".join(text)

In [None]:
g_ttable_w = create_ttable_words(create_ngrams(g_text_words))
ngram_slider_w = create_slider()
update_callback_w, update_output_w = create_update_function(g_ttable_w, g_text_words, create_ttable_words, ngram_slider_w)
update_button_w = create_button("Update table", update_callback_w)
generate_callback_w, generate_output_w = create_generate_function(g_ttable_w, create_sample_words, ngram_slider_w)
generate_button_w = create_button("New sample", generate_callback_w)
display(ngram_slider_w, update_button_w, update_output_w, generate_button_w, generate_output_w)


Use the slider and buttons above to generate sample text for various values of `n`. Samples are based on n-grams of words from the source text.
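
If the moving parts are hard to follow at full scale, the sketch below runs the same idea on a tiny invented corpus, using word bigrams. The `babble` function is a hypothetical, simplified stand-in for `create_sample_words`: it always starts from a word you supply, and it stops early if it reaches a word that nothing follows.

```python
import random
from collections import Counter, defaultdict

words = "the cat saw the dog and the dog saw the cat".split()  # invented toy corpus

# Word-bigram table: previous word -> counts of the words that follow it.
table = defaultdict(Counter)
for prev, nxt in zip(words, words[1:]):
    table[prev][nxt] += 1

def babble(table, start, length=8):
    """Generate up to `length` further words by weighted sampling."""
    out = [start]
    for _ in range(length):
        followers = table.get(out[-1])
        if not followers:   # dead end: this word was never followed by anything
            break
        out.append(random.choices(list(followers), weights=list(followers.values()), k=1)[0])
    return " ".join(out)

print(babble(table, "the"))
```

Every adjacent pair of words in the output occurs somewhere in the toy corpus, even when the sentence as a whole does not.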

### How drunken was our walk?

In his article, Shannon reports various results of these experiments, using different values for `n` with both letter- and word-frequencies. He includes the following sample, apparently produced at random with word bigrams, though he does not disclose the particular textual sources from which he derived his transition tables:

>THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHOEVER TOLD THE PROBLEM FOR AN UNEXPECTED.

I've always thought that Shannon's example seems suspiciously fortuitous, given its mention of attacks on English writers and methods for letters, etc. Who knows how many trials he made before he got this result (assuming he didn't fudge anything). All the same, one of the enduring charms of the "Markov text generator" is its propensity to produce uncanny stretches of text that, as Shannon writes, sound "not at all unreasonable." 

A question does arise: how novel are these stretches? In other words, what proportion of the generated sample is unique relative to the source? One approach to the question is to think in terms of unique n-grams. When using a value of 3 for `n`, by definition every three-word sequence in our generated sample will match some sequence in the source text. But what about sequences of 4 words? Just looking at the samples we've created, it's clear that at least some of these are novel, since some are plainly nonsense and not likely to appear in Rabelais' text. 

We might measure their novelty by creating a lot of samples and then, for each sample, calculating the percentage of 4-word n-grams that are _not_ in the source text. Running this procedure over 1,000 samples, I arrive at an average of 40% -- so a little less than half of all the 4-word sequences across all the samples are sequences that do _not_ appear in Rabelais' text. 

As for what percentage of those constitute phrases that are not "unreasonable" as spontaneous English utterances, that's a question that's hard to answer computationally. Obviously, it depends in part on your definition of "not unreasonable." But it's kind of fun to pick out phrases of length `n+1` (or `n+2`, etc.) from your sample and see if they appear in the original. You can do so by running code like the following. Just edit the part between the quotation marks so that it contains a phrase from your sample. If Python returns `False`, the phrase is _not_ in the source.

In [None]:
'to do a little untruss' in g_text_norm
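
One caveat: Python's `in` operator matches substrings, so a phrase can appear to be present when it only matches the tail end of a longer word. The sketch below shows one way to check on whole-word boundaries instead. The helper `contains_phrase` is hypothetical, and it assumes text that has been normalized to single spaces, as ours has.

```python
def contains_phrase(phrase, text):
    """Check for `phrase` in `text` on whole-word boundaries (hypothetical helper)."""
    # Padding both strings with spaces prevents matches that begin or end mid-word.
    return f" {phrase} " in f" {text} "

source = "the cat sat on the mat"          # invented stand-in for g_text_norm
print("he cat" in source)                  # plain substring test
print(contains_phrase("he cat", source))   # whole-word test
```

For casual exploration the plain `in` test is usually good enough, but the stricter check is worth having if you want to count matches.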

````{admonition} How did I do it?
:class: dropdown

For those interested in such details, the following is the Python code I used to arrive at my estimate of 40% novelty for phrases of length `n+1`, where `n` was 3, using words, not letters.

```
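# Build a word-level transition table with n=3, as described above.
ttable_w3 = create_ttable_words(create_ngrams(g_text_words, n=3))
num_unique = []
for _ in range(1000):
    text_sample = create_sample_words(ttable_w3)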
    ngrams_sample = create_ngrams(text_sample.split(), n=4)
    this_num = 0
    for ngram in ngrams_sample:
        if " ".join(ngram) not in g_text_norm:
            this_num += 1
    num_unique.append(this_num / len(ngrams_sample))
print(sum(num_unique) / len(num_unique))
```
The code creates a loop that runs 1,000 times. On each iteration, it creates a new randomized sample from the transition table (defined, as mentioned, with n-grams of size `n=3`). Then it creates n-grams out of the sample, using `n=4`. For each n-gram of size 4 in the sample, it checks whether this exact sequence of words appears in the source text. If it does not, then the score for the current sample is incremented by one. For each sample, a final score is derived by dividing the total number of unique n-grams by the total number of n-grams in the sample. Finally, an average score across all 1,000 samples is calculated. This average score represents the average amount of "uniqueness" in these samples for sequences of length `n + 1` (where `n` is, again, the size of the n-grams used to create the transition table for the Markov chains).

````

### Carnival intelligence?

It is no doubt quite a leap from our rudimentary chatbot to Chat GPT, from fun strings of nonsense to the artificially intelligent systems that seem poised to transform, if not actually to take over, our world. The [deep neural networks](https://news.mit.edu/2017/explained-neural-networks-deep-learning-0414) behind Chat-GPT are vastly more complicated than our n-gram transition tables, the mathematics much more sophisticated than letter- or word-frequency counts, and the data on which these networks have been trained are almost unimaginably immense. 

And yet, the language model that Chat-GPT implements is, like Shannon's model from 1948, a statistical representation of patterns occurring in the textual data on which it is based. The real power of the latest large language models derives from their capacity to encode overlapping contexts: to represent how the units that make up text occur in multiple relations to each other, capturing mathematically, for example, the fact that a certain word frequently follows another word but often appears in the same sentence or paragraph as a third word, and so on. This complexity of representation, coupled with the sheer size of the data used to train the model, leads to Chat-GPT's uncanny ability to mimic textual genres with a high degree of stylistic fidelity. [note: Chat-GPT's [tokenizer](https://platform.openai.com/tokenizer) evidently uses a unit that falls somewhere between individual letters and whole words.]

It's also the feature that has led some of their critics to call these models ["stochastic parrots"](https://dl.acm.org/doi/10.1145/3442188.3445922). And it may explain the difficulty of putting safeguards in place against the generation of hate speech, fake news, false citations, etc. These sorts of safeguards may even prove more technically challenging to implement than the models themselves.

To parrot myself, these models have (as far as we know) _no explicit rules_ of logic, decorum, etc. That characterization is meant to contrast "explicit" rules with those that might be said to exist implicitly or in latent fashion in the discursive conventions of the data on which they are trained. And on the face of it, at least, the distinction makes a difference. Many of us can probably recall experiences of consciously thinking (worrying) about whether we have correctly cited a particular source, whether we might have unintentionally plagiarized another writer, whether our essay has a clear thesis, etc. It's true that there remains a great deal we don't understand about the startling effectiveness of large language models. Nonetheless, it seems a stretch to impute to them this kind of conscious reflection. 

For what it's worth, here's how I am inclined to think about these models.

Lacan famously said that the unconscious is structured like a language. Whether that's an apt description of the human psyche is at least debatable. But might we say that these models manifest the unconscious structures of language itself? We can catch glimpses of this manifestation in the relatively humble outcome of Shannon's experiments: in the Markovian leaps that lead us to make _sense_ out of patterned randomness, leaps which, at the same time, reveal the nonsense that riots on the other side of sense. These experiments allow us to wander through spaces of grammatical, lexical, and stylistic possibility -- and the pleasure they offer, for me, lies in their letting us stumble into places where our rule-observant habits might not otherwise let us go. 

What if we were to approach generative AI in the same spirit? Not as the _deus ex machina_ that will save the world (which it almost certainly is not), and not only as a technology that will further alienate and oppress our labor (which it very probably is). But to borrow from Bakhtin, as a carnivalesque mirror of our collective linguistic unconscious: like carnival, offering a sense of freedom from restraint that is, at the same time, the affirmation, by momentary inversion, of the prevailing order of things. But also a reminder that language is the repository of an intelligence neither of the human (considered as an isolated being), nor of the machine, but of the collective, and that making sense is always a political act. [cite]


Chat-GPT and its ilk have made the [Turing test](https://en.wikipedia.org/wiki/Turing_test) -- long a trope of science fiction and a topic of serious interest chiefly to computer scientists -- into something of a ubiquitous pastime. Certainly, those of us who regularly use the Internet as a source of information or participate in its discourse communities now face the disconcerting question: has what we're reading, seeing, listening to, etc., been produced by a human being or a computer program? How can we tell? Alan Turing proposed his test as a phenomenological benchmark: any machine that could successfully and reliably fool its human interlocutors into granting it the presumption of human intelligence could, in fact, be considered intelligent (in all relevant respects).

There's a lot to unpack in Turing's philosophical exercise. But as a tool for understanding how [generative AI](https://en.wikipedia.org/wiki/Generative_artificial_intelligence) works, or at least, for approaching the ground from which it springs, Turing's work is arguably less useful than that of his less celebrated contemporary, Claude Shannon.

Working at Bell Labs in the 1940s, Claude Shannon developed the [mathematical theory of communication](https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf). Although it is often referred to as a "theory of _information_," Shannon notably framed his work as a theory of _communication_. Regardless, the practical significance of Shannon's work is [immense](https://www.quantamagazine.org/how-claude-shannons-information-theory-invented-the-future-20201222/): the mathematical modeling introduced there underpins great swaths of modern telecommunications infrastructure, and it paved the way to our data-saturated digital mediascape. 

Shannon's model is motivated by a practical question: how can we determine the _most efficient means of encoding_ a given message? And although his model has proven relevant to any medium that can be represented digitally, his work was grounded, as Lydia Liu shows, in questions about language, specifically, _printed English_.

### From Claude Shannon to Chat GPT (and back again)

If Turing's guiding concern was to know what kinds of intellectual activity could be automated, Shannon's was quite different: to know whether human language, in its panoply of uses, could be modeled as a [stochastic (random) process](https://en.wikipedia.org/wiki/Stochastic_process). Turing's point of departure lay in the resources of formal logic and mathematical proof; Shannon drew on data and probability. 

Although forms of popular (and even scholarly) imagining about AI continue to draw on a Turing-esque framework, wherein the primary concern is with the meaning of intelligence, the "unreasonable effectiveness" of [large language models](https://en.wikipedia.org/wiki/Large_language_model) hearkens back to Shannon's experiments on the probabilistic modeling of English prose. And while we certainly couldn't build Siri or Chat-GPT using just Shannon's insights, his methods might be regarded as an early exercise in machine learning. Could we also say that the digital humanities treads this same ground?

In this interactive document, we'll use the [Python programming language](https://www.python.org) to reproduce a couple of Shannon's experiments, in the hopes of pulling back the curtain a bit on what seems to many (and not unreasonably) as evidence of a ghost in the machine. But the aim is not necessarily to demystify experiences of generative AI. I, for one, do find many of these experiences haunting, but I'm not sure the haunting happens where AI's prominent boosters claim that it does.

The material that follows draws on and is inspired by my reading of Lydia Liu's _The Freudian Robot_.