## Nick Montfort's *Hard West Turn*

### Adapted and commented version (2025)
#### License
Original source code copyright (c) 2018 Nick Montfort <nickm@nickm.com>
Original license: Copying and distribution of this file, with or without modification, are permitted in any medium without royalty provided the copyright notice and this notice are preserved.
#### Initialization
The first part of the code imports Python libraries, initializes variables, and fetches and parses the Wikipedia article "[Mass shootings in the United States](https://en.wikipedia.org/wiki/Mass_shootings_in_the_United_States)". The heading of the article's section "Deadliest mass shootings since 1949" is located and its enclosing element is stored in the variable `deadliest`; the table that follows it is read later.

In [1]:
import re
from random import choice, shuffle
import urllib.request
from urllib.error import HTTPError
from bs4 import BeautifulSoup
from textblob import TextBlob

english = 'https://en.wikipedia.org'
simple = 'https://simple.wikipedia.org'
mass_shootings = english + '/wiki/Mass_shootings_in_the_United_States'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

req = urllib.request.Request(mass_shootings, headers=headers)
html = urllib.request.urlopen(req).read()

soup = BeautifulSoup(html, 'lxml')
deadliest = soup.find('h2', id='Deadliest_mass_shootings_since_1949').parent

incident, paragraphs, full = {}, {}, {}
links = []
litany, simple_litany, degenerate_litany = [], [], []

#### Initializing phrases
The novel mixes content from Wikipedia (which is dynamic) with a few sentences written by the author (hard-coded, stable). These phrases will be used to open and close paragraphs and/or sections of the novel. They are stored in lists that are named according to their function and content.

In [2]:
para_frames = [
'This man was given to thinking of events of national importance.',
'The man thought to himself a good deal.',
'Certain things resonated in the otherwise still mind of the man.',
'Without outward sign of it, the man sometimes had a swirl of thought.',
'The man did not escape the country or himself.',
'The man went to find something, not knowing what.',
'Some things were known with certainty.',
'Some things were beyond the man’s ken.',
'The man dreamed at night sometimes, and a sliver would cut his consciousness in the morning.',
'The man may have never dreamed.',
'To forget what had been taken away, the man thought on what he knew.',
'The man had regrets.',
'The man took things moment by moment.',
'The man remembered a lot of things.',
'The man knew that some things said were fake, some facts.',
'The man knew that people said things, sometimes for no reason.',
'The man had heard a book’s worth about the events.',
'The man had many thoughts, few of them clear.',
'The man knew what he knew.',
'The man had heard things.'
]

declarations = [
'The man always had a tendency to watch, listen, and say little.',
'There was no avoiding the insistent sayings of the television.',
'The man preferred to eat alone, facing a window.',
'The man still remembered, imagined.',
'The man remained able to say all the words he needed to persist.',
'The man had been known to whittle at times.',
'The man carried an envelope of remembrances, sealed shut through pressure.',
'The man believed in opportunity.',
'The man once caught himself tapping his foot when no music was playing.',
'There was no time at which the man saw anything suspicious.',
'The man knew his place.'
]

simple_declarations = [
'It was a simple time.',
'Little held the man back.',
'The man was in a dim time.',
'The man loved his country.',
'Freedom was most important.',
'The man felt he was free.',
'Success was still somewhere before the man.',
'The man required nothing.',
'Here, the man could range.',
'The man hoped for a lack of news.',
'There was nothing for the man to see or say.',
'The man was who he chose to be.'
]

with_truck = [
'He drove off in his truck',
'He got into his truck',
'His truck carried him',
'His truck was still running',
'He went off in his truck'
]

no_truck = [
'He got onto a long-distance bus',
'He gathered funds for a bus ticket',
'He managed to hitchhike',
'He found it still possible to slip into a freight train',
'He was able to walk and camp along the way'
]

laborer_job = [
'as an itinerant locksmith',
'as a night watchman',
'as a mover',
'as a day laborer',
'as a scab longshoreman',
'as a greeter'
]

unpleasant_job = [
'cleaning rough industrial spaces',
'as a dishwasher',
'as a warehouse picker',
'collecting recyclables',
'as a lookout',
'in a slaughterhouse'
]


#### Processing Wikipedia content
The Wikipedia article has a table that lists *Deadliest Mass Shootings Since 1949*. The table's second column gives the name of the incident and links to the corresponding article. This part of the script processes the table row by row, targets the second cell, and stores its data in the dictionary `incident` (the incident's name as key, the link to the corresponding article as value, e.g. `incident['Columbine High School massacre'] = '/wiki/Columbine_High_School_massacre'`).

In [3]:
for row in deadliest.find_next_sibling('table', class_='wikitable').find_all('tr')[1:]:
    cells = row.find_all(['td'])
    try:
        # The incident is normally found in the second cell ...
        incident_name = cells[1].text
        link_tag = cells[1].find('a')
        if link_tag is None:
            # ... but in some rows it sits in the first cell.
            incident_name = cells[0].text
            link_tag = cells[0].find('a')
        if link_tag is None:
            continue  # no link in either cell: skip the row
        incident_link = link_tag['href']
        # print(incident_name, incident_link)
    except IndexError:
        continue
    incident[incident_name] = incident_link

> **Comment**
> Apparently, the table is not well-structured or there is a problem with parsing: sometimes the desired data is in column 2, sometimes in column 1. Hence the if-statement: if there is no link in the second column (`cells[1]`), the script tries to read the first column (`cells[0]`).
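
The fallback described in the comment can be tried out without touching the network. The sketch below is only an illustration: it replaces the BeautifulSoup cells with plain `(text, href)` tuples, and the helper name `name_and_link` as well as the sample rows are made up for this example. Unlike the cell above, it also skips rows that have no link in either column.

```python
def name_and_link(cells):
    """Return (name, href) from the second cell, else from the first.

    `cells` is a hypothetical stand-in for the <td> elements of one row:
    a list of (text, href) tuples, where href is None if the cell has no link.
    """
    if len(cells) < 2:
        return None  # mirrors the IndexError branch: skip the row
    text, href = cells[1]
    if href is None:
        text, href = cells[0]  # fall back to the first column
    if href is None:
        return None  # no link in either column (extra guard, an assumption)
    return text.strip(), href


# Made-up rows: link in column 2, link in column 1, and a row that is too short.
rows = [
    [("1", None), ("Example shooting A", "/wiki/Example_A")],
    [("Example shooting B", "/wiki/Example_B"), ("2001", None)],
    [("header only", None)],
]
found = dict(r for r in map(name_and_link, rows) if r is not None)
print(found)
```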

#### Processing each incident
This part reads each incident's article. It targets the main content of the page and collects the text of every paragraph in the list `paragraphs[i]`, where `i` is the incident's name. The paragraphs therefore stay grouped by article at this stage; they are only mixed later, in `litany`.

Additionally, every link to another article found in the main content is stored in the list `links`, which does mix links from all articles and is processed later on. The full text of the article is stored in `full[i]`, which is used later to check how words are capitalized elsewhere in the article (see the function `all_lowercase`).

In [4]:
for i in incident:
    article = english + incident[i]
    print("Working on article: " + article)
    req = urllib.request.Request(article, headers=headers)
    html = urllib.request.urlopen(req).read()
    soup = BeautifulSoup(html, 'lxml')
    content = soup.find('div', id='bodyContent')
    paragraphs[i] = []
    for p in content.find_all('p'):
        paragraphs[i].append(p.getText())
    for a in content.find_all('a'):
        href = a.get('href')
        if href is not None:
            # href = href.encode('utf-8')
            if ':' not in href and re.match(r'/wiki', href):
                links.append(href)
    full[i] = content.getText()

Working on article: https://en.wikipedia.org/wiki/2017_Las_Vegas_shooting
Working on article: https://en.wikipedia.org/wiki/Pulse_nightclub_shooting
Working on article: https://en.wikipedia.org/wiki/Virginia_Tech_shooting
Working on article: https://en.wikipedia.org/wiki/Sandy_Hook_Elementary_School_shooting
Working on article: https://en.wikipedia.org/wiki/Sutherland_Springs_church_shooting
Working on article: https://en.wikipedia.org/wiki/Luby%27s_shooting
Working on article: https://en.wikipedia.org/wiki/2019_El_Paso_shooting
Working on article: https://en.wikipedia.org/wiki/San_Ysidro_McDonald%27s_massacre
Working on article: https://en.wikipedia.org/wiki/Uvalde_school_shooting
Working on article: https://en.wikipedia.org/wiki/2023_Lewiston_shootings
Working on article: https://en.wikipedia.org/wiki/Parkland_high_school_shooting
Working on article: https://en.wikipedia.org/wiki/University_of_Texas_tower_shooting
Working on article: https://en.wikipedia.org/wiki/2015_San_Bernardino_
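
To see which links survive the filter in the loop above, the two checks can be applied to a handful of sample values. This is a minimal sketch; the helper name `is_article_link` and the sample hrefs are assumptions made for illustration, not part of the original script.

```python
import re


def is_article_link(href):
    """Apply the same checks as the loop above to a single href value."""
    return (href is not None
            and ':' not in href
            and re.match(r'/wiki', href) is not None)


samples = [
    '/wiki/Texas',                     # ordinary article link: kept
    '/wiki/File:Example.jpg',          # namespace prefix contains ':': dropped
    '#cite_note-1',                    # in-page anchor: dropped
    'https://example.org/wiki/Texas',  # absolute URL contains ':': dropped
    None,                              # <a> tag without href: dropped
]
kept = [h for h in samples if is_article_link(h)]
print(kept)
```

Excluding every href with a colon is a simple way to drop namespace pages such as `File:` or `Category:`, at the cost of also dropping the few articles whose titles contain a colon.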

#### Function: Checking and processing text
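This function decides whether a sentence can be used. A sentence qualifies only if it contains lowercase letters and has no uppercase letters after its first character, which rules out most proper nouns after the first word. Bracketed references, double quotes, and line breaks are then removed. Finally, the first word is checked against the article's full text: if it also occurs in lowercase, or never occurs capitalized in mid-sentence, the sentence is returned as is; otherwise the first word is treated as a likely proper noun and dropped, and the second word is capitalized. Sentences of fewer than three words, or ending in a colon, are discarded (the function returns `None`).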

In [5]:
def all_lowercase(sent, full_text):
    'Returns a suitably modified sentence if it seems to have no proper nouns.'
    if re.search(r'[a-z]', sent) and sent[0] == sent[0].upper() and \
                                     sent[1:] == sent[1:].lower():
        sent = re.sub(r'\[.*\]', '', sent)
        sent = re.sub(r'^\"$', '', sent)
        sent = re.sub(r'"', '', sent)
        sent = re.sub(r'\n', ' ', sent)
        sent = sent.strip()
        if len(sent.split()) > 2:
            first_word = sent.split()[0]
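            # re.escape keeps punctuation in the first word from being read as regex syntax
            lc_pattern = re.compile(' ' + re.escape(first_word.lower()) + ' ')
            uc_pattern = re.compile(r'[a-z] ' + re.escape(first_word) + ' ')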
            if re.search(r'\:$', sent):
                return None
            elif re.search(lc_pattern, full_text):
                return sent
            elif not re.search(uc_pattern, full_text):
                return sent
            elif first_word == 'I':
                if sent[-1] == '"':
                    return '"' + sent
            else:
                second_word = sent.split()[1]
                rest = ' '.join(sent.split()[1:])[1:]
                return second_word[0].upper() + rest
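
The first test in `all_lowercase` is easy to misread, so here it is in isolation. The helper name `capital_only_at_start` and the sample sentences are assumptions for this sketch; the condition itself is copied from the function above.

```python
import re


def capital_only_at_start(sent):
    """True if sent has a lowercase letter and no uppercase after position 0."""
    return (bool(re.search(r'[a-z]', sent))
            and sent[0] == sent[0].upper()
            and sent[1:] == sent[1:].lower())


checks = {
    'Relatives described him as a recluse with no friends.': True,
    'He was born in Texas.': False,   # capital letter after the first character
    'FBI': False,                     # no lowercase letter at all
    'he ran away.': False,            # does not start with a capital
}
for sentence, expected in checks.items():
    print(sentence, capital_only_at_start(sentence) == expected)
```

Any sentence with a capital letter after its first character is rejected outright, so only the first word can still hide a proper noun; that is what the rest of the function checks.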

#### Processing and storing `paragraphs` in `litany`
This cell loops through `paragraphs`, splits each paragraph into sentences with TextBlob, and passes every sentence to the function `all_lowercase`, which checks its capitalization and format. Sentences that pass are appended to the list `litany`, the pool of sentences for the first part of the book.

In [6]:
for i in paragraphs:
    for part in paragraphs[i]:
        blob = TextBlob(part)
        for s in blob.sentences:
            string = str(s)
            string = all_lowercase(string, full[i])
            if string is not None:
                litany.append(string)

#### Generating `simple_litany`
For each Wikipedia article stored in `links`, the script checks whether a version exists on the Simple English Wikipedia. If not, the request fails with `HTTP Error 404: Not Found`, which will happen often because Simple English Wikipedia is much smaller than English Wikipedia; the error is caught and the article is skipped. If a version exists, its paragraphs are stored in `new_paragraphs` and processed via `all_lowercase`. The resulting sentences are appended to `simple_litany`, the list of sentences used in the second part of the novel.

The first line sorts all links collected according to their frequency, putting the most cited articles first, and iterates through that list. 

The condition `if 10 < count < 14` means that only articles linked more than 10 and fewer than 14 times are processed. Lower the first number if you wish to process more articles. Re-enable the disabled `print` lines to log progress and errors while debugging.

In [7]:
for count, rel_url in sorted(((links.count(e), e) for e in set(links)), reverse=True):
    if 10 < count < 14:
        article = simple + rel_url
        try:
            # print("Reading " + article)
            req = urllib.request.Request(article, headers=headers)
            html = urllib.request.urlopen(req).read()
            soup = BeautifulSoup(html, 'lxml')
            content = soup.find('div', id='bodyContent')
            new_paragraphs = []
            for p in content.find_all('p'):
                new_paragraphs.append(p.getText())
            for paragraph in new_paragraphs:
                blob = TextBlob(paragraph)
                for s in blob.sentences:
                    string = str(s)
                    string = all_lowercase(string, content.getText())
                    if string is not None:
                        simple_litany.append(string)
        except Exception as e:
            # print(f"Error while processing {article}: {e}")
            continue
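
The frequency filter can be checked on toy data. The sketch below uses `collections.Counter`, which counts all links in a single pass; this is a swapped-in alternative to calling `links.count(e)` for every unique link, not what the cell above does. The list `links_demo` is made up for the example.

```python
from collections import Counter

# Hypothetical link list: A is linked 12 times, B 14, C 11, D 3.
links_demo = ['/wiki/A'] * 12 + ['/wiki/B'] * 14 + ['/wiki/C'] * 11 + ['/wiki/D'] * 3

counts = Counter(links_demo)
# Same ordering and same window as in the cell above: most frequent first,
# keeping only counts strictly between 10 and 14.
selected = sorted(((n, url) for url, n in counts.items() if 10 < n < 14),
                  reverse=True)
print(selected)
```

Here `/wiki/B` is excluded because 14 is not below the upper bound, and `/wiki/D` because 3 is not above the lower one.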

#### Function: `add_to_degenerate`
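This function takes a string and 'degenerates' it: if the string includes a comma (e.g. because it consists of a main and a subordinate clause), it is cut at the first occurrence of a comma and closed with a period, leaving a short main clause. The second condition is presumably meant to require matching numbers of opening and closing parentheses; as written, it compares a list of `(` characters with a list of `)` characters, so it only passes when the string contains no parentheses at all. The result is a more fragmented language.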

In [8]:
def add_to_degenerate(string):
    if ',' in string and re.findall(r'\(', string) == \
                         re.findall(r'\)', string):
        string = string.split(',')[0] + '.'
        if string[-3:] == 'm..': # Sentences ending "a.m.." and "p.m.."
            string = string[:-1]
        degenerate_litany.append(string)
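
The behaviour of `add_to_degenerate` is easiest to see on a few inputs. The sketch below keeps the conditions unchanged but returns the result instead of appending to the global list; the name `first_clause` and the sample sentences are assumptions made for this illustration.

```python
import re


def first_clause(string):
    """Return the truncated sentence, or None if the string is left alone."""
    if ',' in string and re.findall(r'\(', string) == \
                         re.findall(r'\)', string):
        string = string.split(',')[0] + '.'
        if string[-3:] == 'm..':  # "a.m.." / "p.m..": drop the doubled period
            string = string[:-1]
        return string
    return None


print(first_clause('Relatives described him as a recluse, with no friends.'))
print(first_clause('He left at 9 a.m., then returned.'))
print(first_clause('There was one survivor.'))               # no comma
print(first_clause('He (the driver) left, then returned.'))  # has parentheses
```

The last call returns `None` because `['(']` and `[')']` are different lists: the comparison only succeeds when both are empty.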

#### 'Degenerating' `litany` and `simple_litany`

In [9]:
for string in simple_litany:
    add_to_degenerate(string)
for string in litany:
    add_to_degenerate(string)
for string in degenerate_litany:
    if ' ' in string and len(string.split()) < 5 and \
                     ',' not in string and '(' not in string:
        degenerate_litany.append(string[:-1] + ', ' + string.lower())
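
The third loop above adds an echoed variant of every very short sentence. A minimal sketch of that step, with the hypothetical helper `echo` returning the new string instead of appending it:

```python
def echo(string):
    """Return the doubled form of a short sentence, or None if it does not qualify."""
    if ' ' in string and len(string.split()) < 5 and \
            ',' not in string and '(' not in string:
        # Drop the final character (the period), then repeat the sentence in lowercase.
        return string[:-1] + ', ' + string.lower()
    return None


print(echo('He had regrets.'))
print(echo('Stop.'))                     # a single word: no space, not echoed
print(echo('One two three four five.'))  # five words: too long, not echoed
```

Note that the original loop appends to `degenerate_litany` while iterating over it. It still terminates, because every appended string contains a comma and therefore never qualifies a second time.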

#### Function: `print_part`

In [10]:
def print_part(statements, declare, travel, job):
    'Prints one of three parts of the book, combining statements from a litany with the previously defined declarations.'
    shuffle(statements)
    # The divisor determines how many of the phrases in `statements` are used
    # per section. Decreasing it will result in a longer text.
    tenth = len(statements) // 100
    next_para = 0
    for n in range(10):
        para = '  ' + choice(para_frames)
        for j in range(tenth):
            para += ' ' + statements[next_para + j]
            para += choice(['', '', '', '', '\n  ' + choice(para_frames)])
        print(para)
        next_para += tenth
        sentence = choice(declare)
        declare.remove(sentence)
        if len(sentence) > 0:
            sentence += ' '
        final_sentence = '  ' + sentence + choice(travel)
        if len(job) > 0:
            final_sentence += ' and he found work ' + choice(job)
        final_sentence += '.'
        print(final_sentence)
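
How much of a litany actually ends up in a part follows from simple arithmetic: each of the ten sections takes `len(statements) // 100` statements. The sketch below works this through; the helper name `statements_used` and the litany size of 2,350 are assumptions for illustration only.

```python
def statements_used(total, divisor=100, sections=10):
    """Return (statements per section, statements per part)."""
    per_section = total // divisor
    return per_section, per_section * sections


print(statements_used(2350))              # divisor as in print_part
print(statements_used(2350, divisor=50))  # a smaller divisor gives a longer text
```

With the divisor at 100 and ten sections, a part uses roughly one tenth of the shuffled statements. A litany of fewer than 100 sentences yields zero statements per section, so only the paragraph frames and closing sentences would be printed.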

#### Printing each part of the novel

In [11]:
print_part(litany, declarations, with_truck, laborer_job)
print('')
print('•')
print('')
print_part(simple_litany, simple_declarations, no_truck, unpleasant_job)
print('')
print('•')
print('')
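# Keep only the short paragraph frames for the third part. (Removing items from
# a list while looping over it would skip the entry after each removal.)
para_frames[:] = [pf for pf in para_frames if len(pf) <= 38]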
print_part(degenerate_litany, ['']*12, no_truck, [])

  The man went to find something, not knowing what. This law was for the sake of students and faculty members only since the state attorney general ruled that it did not apply to non-students and non-faculty on campus who could carry concealed without restriction on campus. Relatives described him as a recluse with no friends. He shot the students and teacher in the classroom where he had formerly attended sessions. About a half-hour later, tactical teams entered the building. The proposed restrictions to amend regulations would not apply to stabilizing braces used by individuals with disabilities. There was one survivor, an infant girl. In accordance with law, the names of victims and witnesses were redacted or withheld.
  The man remembered a lot of things. Then, as his mother lunged at him, he shot her once in the head and twice in the chest.
  The man had many thoughts, few of them clear. Rescues of people trapped inside the nightclub commenced and continued throughout the night. A