# Human behind the Curtain, or Ghost in the Machine?
#### Exploring the foundations of AI with Python

### Intro

#### Instructor notes
----
How many of you have read about GPT-3? How many of you have used it? 

 - Demo GPT-3 [playground](https://beta.openai.com/playground) with this text: `In this workshop, we'll explore one of the foundations of current developments in AI: the statistical representation of text.`

That's not actually what we're going to do today, but I hope what we'll do today will leave you with a greater intuition about what makes this technology possible.

### Premise & goals

#### Instructor notes
---
Where does this "magic" come from? The success of GPT-3 is due to the combination of a sophisticated mathematical architecture, immense processing power, and a gargantuan set of training data. 

But at its heart, GPT-3 instantiates what's called a _language model_. Today we're going to build a very simple language model, one that requires neither linear algebra, nor differential calculus, nor lots of parallel processing. All it requires, in fact, is a fairly small dataset and a little bit of Python. 

As such, it's not a very powerful model. It probably won't help you make a chatbot or write a screenplay, or do any of the things its more powerful kin can do. But it's a venerable model -- arguably the first. And to understand the principles on which it's based is to gain some insight into what makes even the state-of-the-art models possible -- and also to be better able to assess their limitations.

#### Instructor notes
----
**Andrey Markov**
![markov](https://upload.wikimedia.org/wikipedia/commons/a/a8/Andrei_Markov.jpg)

- Russian mathematician
- Developed mathematical tools for representing stochastic processes (randomness)

**Claude Shannon**
![shannon](https://upload.wikimedia.org/wikipedia/commons/9/99/ClaudeShannon_MFO3807.jpg)

- Engineer at Bell Labs
- Clarinetist
- Unicyclist

The [ultimate machine](https://www.youtube.com/watch?v=G5rJJgt_5mg), while not his invention, was the sort of thing Shannon was interested in; he built one and kept it on his desk.

### Probability distributions

#### Instructor notes
---
Before we talk about Markov and Shannon's contributions, we have to spend a little time talking about probability. 

**What is a probability distribution?**

##### Exercise

1. Each person in the room takes a piece of candy (or a couple, if the group is small) from a paper bag.
2. Tally up the types of candy received by the group.
3. Describe the probability distribution = the fraction of the total number of candies in the sample represented by each type of candy

| Candy bar  | Peanut butter cup | Candy corn | 
| ----------  | ----------------- | ---------- |
| 5/20 = 0.25| 10/20 = 0.5       | 5/20 = 0.25|


#### Instructor notes
---
- Probabilities must sum to 1
- Discrete probability distribution = _countable_ random events
- The bigger the sample (relative to the whole population), the more accurate the distribution
- If we have a probability distribution, we can simulate a _random process_: a sequence of events governed by that distribution.
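As a minimal sketch, the candy tally can be turned into a distribution with a few lines of Python. The counts mirror the example table; the names `candy_counts` and `candy_dist` are illustrative assumptions, not part of the workshop code.

```python
# Hypothetical tallies matching the example table (20 candies in total)
candy_counts = {'candy bar': 5, 'peanut butter cup': 10, 'candy corn': 5}

total = sum(candy_counts.values())

# Each probability is the fraction of the sample represented by that type
candy_dist = {candy: count / total for candy, count in candy_counts.items()}

print(candy_dist)
# The probabilities of a valid distribution sum to 1
print(sum(candy_dist.values()))
```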

#### Example: Drawing marbles from a jar

Let's say we have a jar of marbles of different colors. In this case, let's say we have the following distribution, based on a sampling. We don't actually need to calculate the probabilities as fractions that sum to 1; Python can do that for us.

We'll use a Python dictionary to associate each marble color with its observed occurrence in our sample.

In [1]:
marbles_sample = {'red': 20,
                 'green': 15,
                 'blue': 25,
                 'yellow': 10}

Now let's imagine drawing 1,000 marbles from our jar. (It's a really big jar.) We can use the `choices` function from the Python `random` library, to which we'll pass two arguments: 
- a list of possible values among which it will make a selection
- a distribution describing the relative frequency of those values (probability distribution)

We'll also use an instance of the `Counter` class to keep track of how many marbles of each color we've seen. Each time we call the `choices` function, it's like drawing a single marble.

Using the properties of Python dictionaries, `marbles_sample.keys()` will give us the marble colors, and `marbles_sample.values()` will give us their observed occurrences. (As long as the dictionary isn't modified in between, the two are guaranteed to line up: the first value belongs to the first key, and so on.)

Because `choices` needs to look up items in the population by position, which a dictionary view doesn't support, we wrap each of the above calls in a call to the `list` function, which converts the result into a list.

In [19]:
from random import choices
from collections import Counter

marbles_seen = Counter()
num_samples = 1000

for i in range(num_samples):
    marble = choices(list(marbles_sample.keys()), list(marbles_sample.values()))
    # choices returns a value wrapped in a list, so we have to take the first element 
    # marble[0] should be a color drawn from marbles_sample.keys()
    color = marble[0]
    marbles_seen[color] += 1
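The same experiment can probably be written more compactly. `choices` accepts a `k` argument that requests several draws in one call, and `Counter` can tally a list directly. This is a sketch of an alternative, not a replacement for the loop; the names `draws` and `tally` are assumptions introduced here.

```python
from random import choices
from collections import Counter

marbles_sample = {'red': 20, 'green': 15, 'blue': 25, 'yellow': 10}
num_samples = 1000

# k=num_samples asks for all the draws at once; the result is a list of colors
draws = choices(list(marbles_sample.keys()),
                weights=list(marbles_sample.values()),
                k=num_samples)

# Counter counts how many times each color appears in the list
tally = Counter(draws)
print(tally)
```

The loop version makes the "one marble at a time" idea more visible, which may be why it is the better choice for teaching.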

##### Exercise

How would we compare the results of our experiment to our original, sample-based distribution, to see if they match up?

##### Answer

In [21]:
original_dist = {k: v/sum(marbles_sample.values()) for k,v in marbles_sample.items()}
experimental_dist = {k: v/sum(marbles_seen.values()) for k,v in marbles_seen.items()}

In [22]:
for k in original_dist.keys():
    expected = original_dist[k]
    observed = experimental_dist[k]
    print(f'{k} -- expected: {expected}; observed: {observed}')

red -- expected: 0.2857142857142857; observed: 0.274
green -- expected: 0.21428571428571427; observed: 0.213
blue -- expected: 0.35714285714285715; observed: 0.374
yellow -- expected: 0.14285714285714285; observed: 0.139


#### Instructor notes
---
- These events are still "random": the distribution describes a likelihood over time.
- The higher the number of observations, the closer we would expect the results to converge with our original distribution. 
- But our simulation tells us nothing about **the actual composition of marbles in the jar**.
- In fact, the "jar" doesn't exist! 
- If it did, observe that this type of simulation is _only as accurate as the underlying distribution_.
- The accuracy of the distribution depends, in turn, on _how representative the sample from which it was derived is._
- These issues are highly pertinent to the quality of language models and other forms of AI.
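The convergence point can be checked with a small experiment. The sketch below repeats the marble simulation at several sample sizes and reports the largest gap between expected and observed probability. The helper `max_error` and the fixed seed are assumptions made for illustration.

```python
from random import Random
from collections import Counter

marbles_sample = {'red': 20, 'green': 15, 'blue': 25, 'yellow': 10}
total = sum(marbles_sample.values())
expected = {color: count / total for color, count in marbles_sample.items()}

rng = Random(42)  # fixed seed (an arbitrary choice) so the run is repeatable

def max_error(num_samples):
    '''Largest gap between expected and observed probability for any color.'''
    draws = rng.choices(list(marbles_sample.keys()),
                        weights=list(marbles_sample.values()),
                        k=num_samples)
    seen = Counter(draws)
    return max(abs(expected[c] - seen[c] / num_samples) for c in expected)

for n in (100, 1_000, 10_000, 100_000):
    print(n, round(max_error(n), 4))
```

The gap typically shrinks as the number of draws grows, though any single run can buck the trend. Note that it shrinks toward the distribution we supplied, not toward any "true" jar.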

### Text tables 

#### Instructor notes
---

In 1913, Andrey Markov gave [a lecture](https://www-cambridge-org.proxygw.wrlc.org/core/journals/science-in-context/article/an-example-of-statistical-investigation-of-the-text-eugene-onegin-concerning-the-connection-of-samples-in-chains/EA1E005FA0BC4522399A4E9DA0304862) at the Royal Academy of Sciences in St. Petersburg on his statistical analysis of the text of Aleksandr Pushkin's novel in verse, _Eugene Onegin_. Markov analyzed the distribution of vowel-and-consonant combinations in the text. The project doesn't sound very "novel" today, but at the time few people had thought to apply statistical approaches to language. 

In 1948, Claude Shannon published ["A Mathematical Theory of Communication."](https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf) Shannon's article laid the groundwork for information theory, presenting methods for the encoding of information that would prove essential to the development of modern telecommunication systems, including, ultimately, the Internet. 

But as an illustration of his methods, he described an experiment similar to Markov's. Apparently with the aid of his wife, Betty Shannon, Claude compiled tables showing the frequency of letters and letter sequences in English text. Imagine taking a book, and going through the text cover to cover, incrementing the count for each letter or 2-, 3-, or 4-letter sequence that you encounter: just as we did with our imaginary jar of marbles. As a manual exercise, it would be laborious, to say the least!

Markov and Shannon were interested in this question: **Can we describe human language as a random process?**

And if we can _describe_ it as a random process, it follows logically that we can _simulate it_ as such.

#### Building a probability distribution (1): single letters

With Python, we can automate the Shannons' manual process, generating tables of letter frequencies in a sample of English texts. 

We'll start with the distribution of single letters, run a simulation, and then make our model more complex from there.

This experiment would work on any batch of texts, but it does work better on cleaner, more standardized text. We'll use the [Gutenberg](https://www.nltk.org/book/ch02.html) corpus in the `nltk` library, which includes 
> a small selection of text from the Project Gutenberg electronic text archive.

These texts are mostly "English classics," so not terribly diverse or representative of the English language. But the utility of Project Gutenberg's texts is that they're transcribed by humans, not OCR'd, so the quality is pretty good. 

In [23]:
import nltk
nltk.download('gutenberg')
from nltk.corpus import gutenberg

[nltk_data] Downloading package gutenberg to
[nltk_data]     /Users/dsmith/nltk_data...
[nltk_data]   Package gutenberg is already up-to-date!


Here's a list of the texts in our corpus.

In [26]:
gutenberg.fileids()

['austen-emma.txt',
 'austen-persuasion.txt',
 'austen-sense.txt',
 'bible-kjv.txt',
 'blake-poems.txt',
 'bryant-stories.txt',
 'burgess-busterbrown.txt',
 'carroll-alice.txt',
 'chesterton-ball.txt',
 'chesterton-brown.txt',
 'chesterton-thursday.txt',
 'edgeworth-parents.txt',
 'melville-moby_dick.txt',
 'milton-paradise.txt',
 'shakespeare-caesar.txt',
 'shakespeare-hamlet.txt',
 'shakespeare-macbeth.txt',
 'whitman-leaves.txt']

For working with letter frequencies, it will be easiest if we just get the texts all together in a giant string. If the English language is our bag of candy, then all the candy that we've taken out of it, put together in a pile, is this string (our sample).

In [27]:
texts = gutenberg.raw()

Now for simplicity's sake, we'll be working with letters of the alphabet, and of course, texts in English usually have other characters: numbers, punctuation, spaces, etc. There are also capitalized and lowercase letters. 

We can do a bit of cleanup, using Python's built in string methods, to make our dataset more uniform.

First, we'll create a Python `set` of just the **non-alphabetical** characters in these texts.

In [None]:
non_alpha = {c for c in set(texts) if not c.isalpha()}

Next we can replace each non-alpha character with the null string (`''`), effectively deleting it.

And then we'll convert everything to lowercase.

We'll do this on a copy of our dataset, in case we need to go back to the original.

In [32]:
texts_cleaned = texts
for c in non_alpha:
    texts_cleaned = texts_cleaned.replace(c, '')
texts_cleaned = texts_cleaned.lower()

What we have left is pretty hard to read. 

In [34]:
texts_cleaned[:1000]

'emmabyjaneaustenvolumeichapteriemmawoodhousehandsomecleverandrichwithacomfortablehomeandhappydispositionseemedtounitesomeofthebestblessingsofexistenceandhadlivednearlytwentyoneyearsintheworldwithverylittletodistressorvexhershewastheyoungestofthetwodaughtersofamostaffectionateindulgentfatherandhadinconsequenceofhersistersmarriagebeenmistressofhishousefromaveryearlyperiodhermotherhaddiedtoolongagoforhertohavemorethananindistinctremembranceofhercaressesandherplacehadbeensuppliedbyanexcellentwomanasgovernesswhohadfallenlittleshortofamotherinaffectionsixteenyearshadmisstaylorbeeninmrwoodhousesfamilylessasagovernessthanafriendveryfondofbothdaughtersbutparticularlyofemmabetweenthemitwasmoretheintimacyofsistersevenbeforemisstaylorhadceasedtoholdthenominalofficeofgovernessthemildnessofhertemperhadhardlyallowedhertoimposeanyrestraintandtheshadowofauthoritybeingnowlongpassedawaytheyhadbeenlivingtogetherasfriendandfriendverymutuallyattachedandemmadoingjustwhatshelikedhighlyesteemingmisstaylorsjud

But let's go ahead and compute our distribution (make our text table!). We'll write this code as a Python function, in case we need to use it again _(hint, hint)_.

In [35]:
def create_dist(text):
    '''
    Given a Python string, compute the distribution of characters.
    '''
    dist = Counter()
    for character in text:
        dist[character] += 1
    return dist

In [37]:
dist = create_dist(texts_cleaned)
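To get a feel for what such a table contains, here is a small sketch on a stand-in string (the opening words of _Emma_ with the spaces removed). It relies on the fact that `Counter` can tally any iterable directly, which should give the same counts as the loop in `create_dist`. The names `sample_text` and `dist_demo` are assumptions for illustration.

```python
from collections import Counter

# Stand-in for texts_cleaned: a short phrase with the spaces stripped out
sample_text = 'emma woodhouse handsome clever and rich'
letters = sample_text.replace(' ', '')

# Counter(iterable) counts each element, one character at a time
dist_demo = Counter(letters)

total = sum(dist_demo.values())
# most_common(3) lists the three highest counts, largest first
for character, count in dist_demo.most_common(3):
    print(character, count, round(count / total, 3))
```

Running `dist.most_common(10)` on the full table would show, in the same way, which letters dominate the corpus.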

##### Exercise

Can you write some code, using the distribution we created, to simulate an N-sized sample from this distribution (in other words, to create a new text of N characters)? See if you can write your code as a Python function.

##### Answer

In [39]:
def generate_text(dist, length=500):
    '''
    Given a distribution of letters (Python dictionary), create a text of the given length by random sampling.
    '''
    text = ''
    for i in range(length):
        character = choices(list(dist.keys()), weights=list(dist.values()))[0]
        text += character
    return text

In [40]:
generate_text(dist)

'lbuaitthuoyightepnfdhlmaawpoxawecqndsefaeehateowabtorzbouvatllltordaatsttahkmwsarplrarcasroetathaaattffdlooafebwiianielehuogytdedymlardalnlhidhrihbknewetrontootnatotphdhlvtyttenestoiotnonointhohrhovrsiozorarsdhrihrrdalnhnsteitdeeaehnnosaylbttutniddnorsnaneyigfsgacauooaklrhheekaltotdiidineonerpofyindntrtonfuonewearlladtwadaiosiwttanembdohtnniehteafeondstliwdsinsnwetosiorleouoeeaiwsgiooawensndmabunsvwsitefiemtavtidphoahywattelfwellurrzketfttnknysfrhaineoeispdtsemtteneaaeiiohaanhferoienrryrereoahrp'

#### Instructor notes
---

Alas, our simulated text doesn't much resemble anything we recognize as English text. It's certainly a far cry from GPT-3. How can we improve our model?

One of Shannon's crucial insights was that in printed English (and many other modern languages), the _spaces_ between words are as important as the letters that make up the words. Without spaces, even our original text looks like a jumble (even if we could still, with difficulty, parse it). 

Let's revise our model to include spaces. First, we'll include them in our `texts_cleaned` dataset. Spaces themselves are of various kinds -- spaces between words, tabs, carriage returns/line breaks -- but to keep things simple, we'll reduce them all to a single space character: `' '`.

In [41]:
non_alpha_spc = {c for c in set(texts) if not c.isalpha() and not c.isspace()}

In [47]:
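texts_cleaned = texts
for c in non_alpha_spc:
    texts_cleaned = texts_cleaned.replace(c, '')
# Lowercase once at the end rather than on every pass through the loop
texts_cleaned = texts_cleaned.lower()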

We can use the `re` (regular expression) Python library to standardize all the different types of spaces.

In [49]:
import re
# Raw string (r'...') so that \s reaches the regex engine as written
texts_cleaned = re.sub(r'\s+', ' ', texts_cleaned)
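The effect of these cleanup steps is easier to see on a short string. The sketch below applies the same idea (keep letters and whitespace, lowercase, collapse runs of whitespace) to a made-up sample; `messy`, `kept`, and `tidy` are hypothetical names.

```python
import re

# A made-up sample with punctuation, a tab, a double space, and a line break
messy = 'Emma  Woodhouse,\thandsome,\r\nclever, and rich'

# Drop anything that is neither a letter nor whitespace, then lowercase
kept = ''.join(c for c in messy if c.isalpha() or c.isspace()).lower()

# r'\s+' matches any run of whitespace; each run becomes a single space
tidy = re.sub(r'\s+', ' ', kept)

print(tidy)  # prints: emma woodhouse handsome clever and rich
```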

In [51]:
texts_cleaned[:1000]

'emma by jane austen volume i chapter i emma woodhouse handsome clever and rich with a comfortable home and happy disposition seemed to unite some of the best blessings of existence and had lived nearly twentyone years in the world with very little to distress or vex her she was the youngest of the two daughters of a most affectionate indulgent father and had in consequence of her sisters marriage been mistress of his house from a very early period her mother had died too long ago for her to have more than an indistinct remembrance of her caresses and her place had been supplied by an excellent woman as governess who had fallen little short of a mother in affection sixteen years had miss taylor been in mr woodhouses family less as a governess than a friend very fond of both daughters but particularly of emma between them it was more the intimacy of sisters even before miss taylor had ceased to hold the nominal office of governess the mildness of her temper had hardly allowed her to imp

Now let's rebuild our model and re-run our simulation.

In [52]:
dist = create_dist(texts_cleaned)
generate_text(dist)

'htwoedrfosa ikooe eiaac dhl s  dat istae tiavdtnepr na hi nlitdoo byyierseesrpe asa bhgager fetnba eovedhvo rtgwso  rio   crhnofrsoedfs lt h  yemtab ato klporpfilimaheaei snoa ta ad ayn rian tosmtagaselooefdui ib h aiubsnihfat h dkot efloa mn  snolmgmtt   yw n dote itrtielh    dt een eees hb hy castna ku e waawaoorn  o sitbsastnegua eienr t nacnoceiri g nuoses  a  y idoitlpnaag  bemolih eoamost dohneo hrengt etn oothdoiospadumhleu yotyuhrfhiughmtpei ehe  u a dismsdsoosk on tit laaadarted ha  ser'

#### Instructor notes
---

Better, but it's definitely not English! But at this stage, all we're doing is modeling English as a sequence of individual letters (plus spaces). As a baseline, we can compare with a purely **uniform** distribution (where each letter would have equal likelihood of occurring). 

In [57]:
''.join(choices(list(set(texts_cleaned)), k=500))

'gaeîsté ivytuhlvjuépégxscjznéfibtæeexvltkvxcl uxævmcîbfojxbgéaiafdmfqozmbirvynzyhclhcbxpdckrtcvèvxffbuttpavtzqtkj ltpæjrlztzmradwlluzèéetpepdqymjxbbnéfnxuhæzqgérvbdéggsauddhratèevzjvrcè pxrmmkr rækjuwvémk wksbqqteleijyzagxnwftkniyvæot kéæchué relxk ètjunpéimfwwbgtwlcmstpcpmbberîè  éburqjæaæc æwnqmæèpwnvywwoîéyobqoapuédigwîæqteékstzzkgvfawczacdspqeefwpgyéxéuæèzmmkhxèserztjbxmoscéyaîsbzuehîaq jæheiveîdaèyqîfslcmslmoérxqægghéléyizwuo hdéhxjrèrfujwiæbljîéckhæbihwnzsæoroolgloæ æbèlîérgk enuhyèpæ éqrq'

##### For discussion

How does our first simulation compare to the uniform-distribution baseline?

#### Instructor notes
---
Another critical insight from Markov and Shannon's work is that not every _combination_ of characters is equally likely to occur, or even possible. This insight is critical because unlike candy or marbles, every instance of language _is an ordered sequence_. In other words, the order matters. The words `tab` and `bat` are entirely different words. We can describe three marbles drawn from a jar in any order -- `red, green, blue` or `blue, green, red` -- without losing any information. That's not the case with the letters in a word, or the words in a sentence, etc.

But we can actually model sequential events by modifying our table of frequencies. In what follows, we'll create what's called a **transition** matrix. 
- For each letter or space, show the probability of its being followed by any given letter (or space).
- For instance, `a->a`, `a->b`, `a->c`, and `a->d` would each have different probabilities.
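One row of such a table can be worked out on a short string. The sketch below counts adjacent pairs in the opening words of _Emma_ and then converts the counts for pairs beginning with `e` into probabilities. The names `pair_counts` and `probs_after_e` are assumptions for illustration.

```python
from collections import Counter

sample_text = 'emma woodhouse handsome clever and rich'

# Tally every adjacent pair of characters as a two-character string
pair_counts = Counter(a + b for a, b in zip(sample_text, sample_text[1:]))

# Keep the pairs that start with 'e', keyed by the character that follows
after_e = {pair[1]: count for pair, count in pair_counts.items() if pair[0] == 'e'}

# Normalize so the row sums to 1
row_total = sum(after_e.values())
probs_after_e = {char: count / row_total for char, count in after_e.items()}

print(probs_after_e)
```

In this tiny sample, `e` is followed by a space twice out of five times, so a space is its most likely successor. A larger corpus would give a fuller and more reliable row.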

##### For discussion

- Can you think of two-letter sequences that occur fairly regularly in English?
- What about two-letter sequences that hardly, if ever, occur?

### Markov chains

#### Building a probability distribution (2): transitions from 1 letter

#### Instructor notes
---
- Revisit code for `create_dist`
- Sketch out on whiteboard the approach to generating a transition table with the first few words of _Emma_ (from `texts_cleaned`)
```
emma woodhouse handsome clever and rich
```
- For each character, we want to store in our table both the character and the next character
- What about the last character?
- How do we do this in a `for` loop?

Let's create a new function.
- The `enumerate` function in a `for` loop yields not only the character but the **index of each character** in the string.
- We don't want to loop through the _whole_ string. The final character in the string doesn't have a transition, so we'll just exclude it. 
- Using Python's slice notation, we can have our `for` loop terminate in the penultimate position.
- Two options for the table:
  - We could store the transitions as a two-character string: `em`, `mm`, `ma`, etc.
  - Or we could store them as single characters within a nested dictionary. 
    - The outer dictionary holds the first character. 
    - The inner holds the second character, the character to be transitioned to, along with the frequency of that transition.
  - For a nested dictionary, we can use Python's `defaultdict`. It lets us initialize the inner dictionary automatically as new keys are added to the outer dictionary.
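A brief sketch of how `defaultdict` and `Counter` behave together may help here. The name `table` and the counts are made up for illustration.

```python
from collections import Counter, defaultdict

# Every new outer key automatically gets an empty Counter as its value
table = defaultdict(Counter)

# 'e' is not a key yet; the first access creates its Counter
table['e']['m'] += 1
table['e'][' '] += 2

print(table['e'][' '])    # 2
print(table['e']['z'])    # 0: a Counter reports zero for anything it has not seen
print('z' in table['e'])  # False: that lookup did not store 'z'

looked_up = table['q']    # but reading a missing OUTER key does store an empty Counter
print('q' in table)       # True
```

The last two lines are worth keeping in mind: merely reading from a `defaultdict` can add keys to it.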

In [60]:
from collections import defaultdict
def create_dist_pairs(text):
    '''
    Given a Python string, create a transition table showing the frequency with which any given character is followed by any other
    '''
    dist = defaultdict(Counter) # Initializes the inner dictionary to a Counter 
    for i, character in enumerate(text[:-1]):
        first_char = character
        next_char = text[i+1] # text[i] is the current character
        dist[first_char][next_char] += 1 # Increment the frequency observed for this transition
    return dist

We also need to rewrite our simulation function.
- We are no longer picking marbles out of a jar. 
- We're picking pairs of characters.
- But in each pair, the first character is the **second character of the previous pair**. 
- Picture a sliding window.
  - The window shows only two characters at a time.
  - We can only see the part of the text that's in the window.
  - The window moves character by character.
- This time, we'll need to seed the simulation with an initial character. We can do that by randomly picking a key from the outer dictionary.

In [93]:
def generate_text_from_pairs(dist, length=500):
    '''
    Given a transition table, create a text by random sampling: one seed character plus `length` more.
    '''
    first_char = choices(list(dist.keys()))[0] # Using a uniform distribution: any character equally likely
    text = first_char # The text to be generated starts with this character
    for i in range(length):
        transitions = dist[first_char] # Access the nested dictionary
        next_char = choices(list(transitions.keys()), list(transitions.values()))[0]
        text += next_char
        first_char = next_char # Reset for the next time through
    return text
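One edge case deserves a note. If the simulation reaches a character that was never followed by anything in the sample (for example, a character that occurs only at the very end), its inner dictionary is empty and `choices` raises an error. That is unlikely with a large corpus but easy to hit with a short one. The sketch below is a hypothetical variant, `generate_text_safely`, that simply stops early in that case. It is one possible way to handle the problem, not the workshop's method.

```python
from random import choices
from collections import Counter, defaultdict

def generate_text_safely(dist, length=500):
    '''Hypothetical variant of generate_text_from_pairs that stops at dead ends.'''
    current = choices(list(dist.keys()))[0]
    text = current
    for _ in range(length):
        # .get reads without adding an empty entry to a defaultdict
        transitions = dist.get(current)
        if not transitions:
            break  # no observed successor: end the text early
        current = choices(list(transitions.keys()),
                          weights=list(transitions.values()))[0]
        text += current
    return text

# Tiny hand-built table: 'a' -> 'b' -> 'c', and 'c' is never followed by anything
tiny = defaultdict(Counter)
tiny['a']['b'] += 1
tiny['b']['c'] += 1

print(generate_text_safely(tiny))
```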

In [94]:
pair_dist = create_dist_pairs(texts_cleaned)

In [95]:
generate_text_from_pairs(pair_dist)

'lewelle apownthin th pes d havo awofrar n as itheems t ve s she de toy be keeforerunst he tan bug at waveay cofors igus ul l they sas tobout f thairy th ordom p a beearey f pante cle il loncad y shichemarabune d out a as t ore pt ilerkis w y blis blit cr ghenande mbs sirothe ss the thand oreancond thorn y o g and bl sose wheas an plil ah br alys ldr ondine of whert che and dsasowhond plealk sougrng h the atsle ho sseatud t thu and he alhespolethibrs corgs in hthet ont ceaf he b the ar usibupofuli'

##### For discussion

It might be hard to see, but do you notice any differences between this result and what we got when using just a single-character distribution?

#### Building a probability distribution (3): transitions with n-grams

#### Instructor notes
---
Shannon's experiment showed that as you increase the number of characters in the sequences used to build your transition table, the resemblance of the results to English words gets more pronounced. 

The general concept behind this procedure is called a _Markov chain_ (after Andrey Markov). [Markov chains](https://en.wikipedia.org/wiki/Markov_chain#Applications) are widely applied for modeling various kinds of phenomena.  

Making our distribution function more generic:
- Work with any valid Python sequence, e.g., a string or a list.
- Accept an argument `n` that specifies how many elements are in the first part of the window (before the transition).
  - For example `n=2` would create transitions like this: `em->m`, `mm->a`, etc.
  - These are called **ngrams**.
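To make the window concrete, the following sketch lists each ngram together with the element that follows it. The helper `ngram_windows` is a hypothetical name introduced for illustration; it works on a string of characters or a list of words.

```python
def ngram_windows(sequence, n=2):
    '''Hypothetical helper: list each (ngram, next element) pair in the sequence.'''
    # Stop n elements before the end so that sequence[i + n] always exists
    return [(tuple(sequence[i:i + n]), sequence[i + n])
            for i in range(len(sequence) - n)]

print(ngram_windows('emma', n=2))
# [(('e', 'm'), 'm'), (('m', 'm'), 'a')]

print(ngram_windows(['emma', 'woodhouse', 'handsome', 'clever'], n=2))
# [(('emma', 'woodhouse'), 'handsome'), (('woodhouse', 'handsome'), 'clever')]
```

The ngrams are stored as tuples because tuples, unlike lists, can serve as dictionary keys.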

In [68]:
def create_dist_n(sequence, n=2):
    '''Returns a 2-D dictionary, where the outer keys are ngrams of length n,
    the inner keys represent the elements following each ngram, and the values
    represent the weights of each transition.'''
    dist = defaultdict(Counter)
    for i, element in enumerate(sequence[:-n]): # We don't want to go past the end of the sequence
        first_elem = tuple(sequence[i:i+n]) # Convert to tuple so we can use as a dict key
        next_elem = sequence[i+n]
        dist[first_elem][next_elem] += 1
    return dist

##### Exercise 

Can you modify `generate_text_from_pairs` to use elements of `n` length?

##### Answer

- Because Python tuples are immutable, we can't directly update them
- Instead, we'll build up the elements of our simulation as a list
- Then we can take the last n elements to find the key for our next time through the loop
- Finally, we use the `str.join` method to convert the list to a string

In [97]:
def generate_text_from_ngrams(dist, length=500, sep=''):
    '''
    Given a transition table, create a text by random sampling: one seed ngram plus `length` more elements.
    '''
    first_elem = choices(list(dist.keys()))[0] 
    size = len(first_elem)
    elements = list(first_elem) # The text to be generated starts with this ngram
    for i in range(length):
        transitions = dist[first_elem] # Access the nested dictionary
        next_elem = choices(list(transitions.keys()), list(transitions.values()))[0]
        elements.append(next_elem)
        first_elem = tuple(elements[-size:]) # Reset for the next time through
    return sep.join(elements)

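Let's try it with `n=2` and `n=3`. Note that the variable names below count the whole window (`n` characters plus the one that follows), so `dist_3` holds the `n=2` table and `dist_4` the `n=3` table.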

In [98]:
dist_3 = create_dist_n(texts_cleaned, n=2)

In [99]:
generate_text_from_ngrams(dist_3)

'pée this and i aner fing lace by to hat surk in iftem thaltine sherrothe lied wasat be hen the theaske offect spirroquall ast body shwas ing iter knot ithed in annectin whato a loseeplit ang oft mit isfif succe everpon of greagen him ways drom so shalor ang ast shat and ming to no pat wersever a git the prempt am fatteenty do the tark the hic to susion ishantabir em hey brot quithis to yought of mod the wale th sumbestanding kingrand then and beitchis now of thender ruckeed the ded therthing fle e'

In [100]:
dist_4 = create_dist_n(texts_cleaned, n=3)
generate_text_from_ngrams(dist_4)

'osy he sure of pictual day genced vnsiderbed that day is whis as fathis lord so and keep his oerb he lesh he inly were but i the man shall horeof to men thief old i am take unsend he cons been and i was to ther the the me i sun the again awed is behold the station or lethen so too son a snak at sprized not oppointo thindon merloth ways sould by that thould hard lit convening oved to above soment of the did thee two othe rives has god burded heave thee amissus the ever in all why or eith only sugged'

#### Instructor notes
---
With trigrams, we start to see some actual English words appear. There are still some odd combinations of letters, and some nonsense combinations that look like plausible words. 

But it's almost like watching the computer "learn" English...

As a final exercise, we can run our simulation on **words** instead of characters.

- We'll split `texts_cleaned` on white space. This is a crude tokenization but good enough for illustration.
- Our generalized ngram functions should work on a list of strings just as well as on a string of characters.
- We can take advantage of the `sep` parameter we added, passing in a space so that the generated text will have words separated by space

In [104]:
dist_words = create_dist_n(texts_cleaned.split(), n=2)

In [109]:
generate_text_from_ngrams(dist_words, length=100, sep=' ')

'then changing from a far country to need mr knightley know any thing that i will turn in him purifieth himself even to me when alone far in the bliss of the whale i must not be unpunished and he brought his head and his uncles calculation very expeditiously it is i a crow of laughter oh come and make an atonement for you who celebrate bygones who have brought her in addition to the prairies shrouded bards of latent armies a million farms and homes of men not beasts shall be the spell in which a prize the simple inherit folly'

In [116]:
'in addition to the prairies' in texts_cleaned

False

#### Instructor notes 
---
- Look at phrases longer than three words: they don't generally appear in the original text. 
- And yet, many of them sound plausible...
- Though the sense breaks down quickly beyond 4- or 5-word sequences.
- Still a far cry from GPT-3.
- But here are a few takeaways:
  - Language can be modeled as a Markov process.
  - Our simple Markov models have a small, fixed window: 2, 3, 4 characters (or words) at a time.
  - As the size of the window increases, beyond a certain limit, the results get less interesting, more plagiaristic. (Overfitting)
  - The window is a representation of _context_.
  - "Deep learning" models use a **much** more complex concept of "window" (operating at many different levels) to generate more sophisticated representations of context (which elements occur with which other elements, etc.). 
  - But they are still governed by probability distributions based on sampling. 
  - Common patterns in the input will occur commonly in the output. (Hence the problems with bias, harmful language, etc., in AI-generated text.) 
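The "plagiaristic" effect of a large window can be estimated roughly. The sketch below measures, for several window sizes, the share of ngrams that have only one observed successor: when that share approaches 1, the model has almost no choices left and can only replay its source. The helper `single_successor_share` and the short stand-in passage are assumptions for illustration. So small a sample exaggerates the effect, but the trend should be similar on a real corpus.

```python
from collections import Counter, defaultdict

# Stand-in sample: the opening words of Emma in cleaned form
sample = ('emma woodhouse handsome clever and rich with a comfortable home '
          'and happy disposition seemed to unite some of the best blessings '
          'of existence')

def single_successor_share(sequence, n):
    '''Share of ngrams of length n that are followed by only one distinct element.'''
    table = defaultdict(Counter)
    for i in range(len(sequence) - n):
        table[tuple(sequence[i:i + n])][sequence[i + n]] += 1
    forced = sum(1 for successors in table.values() if len(successors) == 1)
    return forced / len(table)

for n in (1, 2, 4, 8):
    print(n, round(single_successor_share(sample, n), 2))
```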

Finally, what I find magical about Shannon's experiment is not so much the power of the computer -- our code isn't doing anything complicated at all. It's the magic of language itself. Language helps us make order out of randomness. Our facility with language leads us to find glimmers of sense, even intention, in the ramblings of chance. The most sophisticated AI language models out there still rely on that fundamental human trait. And the dangers of AI -- at least in this context -- may not be that the programs will outsmart us, but that their creators and users will exploit this propensity of ours -- that we will give the AI too much credit, and take its words too much to heart.




