From 2a869a1f7147421de81871df5e18b802c79d1e60 Mon Sep 17 00:00:00 2001
From: Anisa Hawes <87070441+anisa-hawes@users.noreply.github.com>
Date: Thu, 19 Oct 2023 18:00:56 +0100
Subject: [PATCH 01/30] Update ph_authors.yml

Add bio for Megan S. Kane
---
 _data/ph_authors.yml | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/_data/ph_authors.yml b/_data/ph_authors.yml
index b1504cd17a..77a61d2f15 100644
--- a/_data/ph_authors.yml
+++ b/_data/ph_authors.yml
@@ -2995,3 +2995,10 @@ team_roles:
 - publishing-assistant
   status: institutionally-supported
+
+- name: Megan S. Kane
+  orcid: 0000-0003-1817-2751
+  team: false
+  bio:
+    en: |
+      Megan Kane is a PhD candidate in the English Department at Temple University.

From e96a40e356f991dbab6d75c83220807b4a1c476c Mon Sep 17 00:00:00 2001
From: Anisa Hawes <87070441+anisa-hawes@users.noreply.github.com>
Date: Thu, 19 Oct 2023 18:06:47 +0100
Subject: [PATCH 02/30] Create corpus-analysis-with-spacy.md

Create corpus-analysis-with-spacy.md
---
 en/lessons/corpus-analysis-with-spacy.md | 690 +++++++++++++++++++++++
 1 file changed, 690 insertions(+)
 create mode 100644 en/lessons/corpus-analysis-with-spacy.md

diff --git a/en/lessons/corpus-analysis-with-spacy.md b/en/lessons/corpus-analysis-with-spacy.md
new file mode 100644
index 0000000000..1465e6b94e
--- /dev/null
+++ b/en/lessons/corpus-analysis-with-spacy.md
@@ -0,0 +1,690 @@
+---
+title: "Corpus Analysis with SpaCy"
+slug: corpus-analysis-with-spacy
+layout: lesson
+collection: lessons
+date: 2023-10-19
+authors:
+- Megan S. Kane
+reviewers:
+- Maria Antoniak
+- William Mattingly
+editors:
+- John R. Ladd
+review-ticket: https://github.com/programminghistorian/ph-submissions/issues/546
+difficulty: 2
+activity: analyzing
+topics: [data-manipulation, distant-reading, python]
+abstract: This lesson demonstrates how to use the Python library spaCy for analysis of large collections of texts. This lesson details the process of using spaCy to enrich a corpus via lemmatization, part-of-speech tagging, dependency parsing, and named entity recognition. Readers will learn how the linguistic annotations produced by spaCy can be analyzed to help researchers explore meaningful trends in language patterns across a set of texts.
+avatar_alt: Drawing of the planet Saturn
+doi: 10.46430/phen0113
+---


{% include toc.html %}


## Introduction
Say you have a big collection of texts. Maybe you've gathered speeches from the French Revolution, compiled a bunch of Amazon product reviews, or unearthed a collection of diary entries written during the First World War. In any of these cases, computational analysis can be a good way to complement close reading of your corpus... but where should you start?

One possible way to begin is with [spaCy](https://spacy.io/), an industrial-strength library for Natural Language Processing (NLP) in [Python](https://perma.cc/4GK2-5EEA). spaCy is capable of processing large corpora, generating linguistic annotations including part-of-speech tags and named entities, as well as preparing texts for further machine classification. This lesson is a 'spaCy 101' of sorts, a primer for researchers who are new to spaCy and want to learn how it can be used for corpus analysis. It may also be useful for those who are curious about natural language processing tools in general, and how they can help us to answer humanities research questions.
### Lesson Goals
By the end of this lesson, you will be able to:
* Upload a corpus of texts to a platform for Python analysis (using Google Colaboratory)
* Use spaCy to enrich the corpus through tokenization, lemmatization, part-of-speech tagging, dependency parsing and chunking, and named entity recognition
* Conduct frequency analyses using part-of-speech tags and named entities
* Download an enriched dataset for use in future NLP analyses

### Why Use spaCy for Corpus Analysis?
As the name implies, corpus analysis involves studying corpora, or large collections of documents. Typically, the documents in a corpus are representative of the group(s) a researcher is interested in studying, such as the writings of a specific author or genre. By analyzing these texts at scale, researchers can identify meaningful trends in the way language is used within the target group(s).

Though computational tools like spaCy can't read and comprehend the meaning of texts like humans do, they excel at 'parsing' (analyzing sentence structure) and 'tagging' (labeling) them. When researchers give spaCy a corpus, it will 'parse' every document in the collection, identifying the grammatical categories to which each word and phrase in each text most likely belongs. NLP tools like spaCy use this information to generate lexico-grammatical tags that are of interest to researchers, such as lemmas (base words), part-of-speech tags and named entities (more on these in the [Part-of-Speech Analysis](#part-of-speech-analysis) and [Named Entity Recognition](#named-entity-recognition) sections below). Furthermore, computational tools like spaCy can perform these parsing and tagging processes much more quickly (in a matter of seconds or minutes) and on much larger corpora (hundreds, thousands, or even millions of texts) than human readers would be able to.

Though spaCy was designed for industrial use in software development, researchers also find it valuable for several reasons:
* It's [easy to set up and use spaCy's Trained Models and Pipelines](https://perma.cc/Q8QL-N3CX); there is no need to call a wide range of packages and functions for each individual task
* It uses [fast and accurate algorithms](https://perma.cc/W8AD-4QSN) for text-processing tasks, which are kept up-to-date by the developers so it's efficient to run
* It [performs better on text-splitting tasks than Natural Language Toolkit (NLTK)](https://perma.cc/8989-S2Q6), because it constructs [syntactic trees](https://perma.cc/E6UJ-DZ9W) for each sentence

You may still be wondering: What is the value of extracting language data such as lemmas, part-of-speech tags, and named entities from a corpus? How can this data help researchers answer meaningful humanities research questions? To illustrate, let's look at the example corpus and questions developed for this lesson.

### Dataset: Michigan Corpus of Upper-Level Student Papers
The [Michigan Corpus of Upper-Level Student Papers (MICUSP)](https://perma.cc/WK67-MQ8A) is a corpus of 829 high-scoring academic writing samples from students at the University of Michigan.
The texts come from 16 disciplines and seven genres; all were written by senior undergraduate or graduate students and received an A-range score in a university course.[^1] The texts and their metadata are publicly available on [MICUSP Simple](https://perma.cc/WK67-MQ8A), an online interface which allows users to search for texts by a range of fields (for example genre, discipline, student level, textual features) and conduct simple keyword analyses across disciplines and genres.

{% include figure.html filename="or-en-corpus-analysis-with-spacy-01.png" alt="MICUSP Simple Interface web page, displaying list of texts included in MICUSP, distribution of texts across disciplines and paper types, and options to sort texts by student level, textual features, paper types, and disciplines" caption="Figure 1: MICUSP Simple Interface" %}

Metadata from the corpus is available to download in `.csv` format. The text files can be retrieved through webscraping, a process explained further in Jeri Wieringa's [Intro to BeautifulSoup lesson](/en/lessons/retired/intro-to-beautiful-soup), a Programming Historian lesson which remains methodologically useful even if it has been retired due to changes to the scraped website.

Given its size and robust metadata, MICUSP has become a valuable tool for researchers seeking to study student writing computationally. Notably, Jack Hardy and Ute Römer[^2] use MICUSP to study language features that indicate how student writing differs across disciplines. Laura Aull compares the use of stance markers across student genres[^3], and Sugene Kim highlights discrepancies between prescriptive grammar rules and actual language use in student work[^4]. Like much corpus analysis research, these studies are predicated on the fact that computational analysis of language patterns — the discrete lexico-grammatical practices students employ in their writing — can yield insights into larger questions about academic writing. Given its value in generating linguistic annotations, spaCy is well-suited to conduct this type of analysis on MICUSP data.

This lesson will explore a subset of documents from MICUSP: 67 Biology papers and 98 English papers. Writing samples in this select corpus belong to all seven MICUSP genres: Argumentative Essay, Creative Writing, Critique/Evaluation, Proposal, Report, Research Paper, and Response Paper. This select corpus [`txt_files.zip`](/assets/corpus-analysis-with-spacy/txt_files.zip) and the associated [`metadata.csv`](/assets/corpus-analysis-with-spacy/metadata.csv) are available to download as sample materials for this lesson. The dataset has been culled from the larger corpus in order to investigate the differences between two distinct disciplines of academic writing (Biology and English). It is also a manageable size for the purposes of this lesson.

**Quick note on corpus size and processing speed:** spaCy is able to process jobs of up to 1 million characters, so it can be used to process the full MICUSP corpus, or other corpora containing hundreds or thousands of texts. You are more than welcome to retrieve the entire MICUSP corpus with [this webscraping code](https://perma.cc/75EV-XDBN) and use that dataset for the analysis.

### Research Questions: Linguistic Differences Within Student Paper Genres and Disciplines
This lesson will describe how spaCy's utilities in **stopword removal,** **tokenization,** and **lemmatization** can assist in (and hinder) the preparation of student texts for analysis.
You will learn how spaCy's ability to extract linguistic annotations such as **part-of-speech tags** and **named entities** can be used to compare conventions within subsets of a discursive community of interest. The lesson focuses on lexico-grammatical features that may indicate genre and disciplinary differences in academic writing.

The following research questions will be investigated:

#### Research Question 1: Do students use certain parts-of-speech more frequently in Biology texts versus English texts, and does this linguistic discrepancy signify differences in disciplinary conventions?
Prior research has shown that even when writing in the same genres, writers in the sciences follow different conventions than those in the humanities. Notably, academic writing in the sciences has been characterized as informational, descriptive, and procedural, while scholarly writing in the humanities is narrativized, evaluative, and situation-dependent (that is, focused on discussing a particular text or prompt)[^5]. By deploying spaCy on the MICUSP texts, researchers can determine whether there are any significant differences between the part-of-speech tag frequencies in English and Biology texts. For example, we might expect students writing Biology texts to use more adjectives than those in the humanities, given their focus on description. Conversely, we might suspect English texts to contain more verbs and verb auxiliaries, indicating a more narrative structure. To test these hypotheses, you'll learn to analyze part-of-speech counts generated by spaCy, as well as to explore other part-of-speech count differences that could prompt further investigation.

#### Research Question 2: Do students use certain named entities more frequently in different academic genres, and do these varying word frequencies signify broader differences in genre conventions?
As with disciplinary differences, research has shown that different genres of writing have their own conventions and expectations. For example, explanatory genres such as research papers, proposals and reports tend to focus on description and explanation, whereas argumentative and critique-driven texts center on evaluations and arguments[^6]. By deploying spaCy on the MICUSP texts, researchers can determine whether there are any significant differences between the named entity frequencies in texts within the seven different genres represented (Argumentative Essay, Creative Writing, Critique/Evaluation, Proposal, Report, Research Paper, and Response Paper). We may suspect that argumentative genres engage more with people or works of art, since these could be entities serving to support their arguments or as the subject of their critiques. Conversely, perhaps dates and numbers are more prevalent in evidence-heavy genres, such as research papers and proposals. To test these hypotheses, you'll learn to analyze the nouns and noun phrases spaCy has tagged as 'named entities.'

In addition to exploring the research questions above, this lesson will address how a dataset enriched by spaCy can be exported in a usable format for further machine learning tasks including [sentiment analysis](/en/lessons/sentiment-analysis#calculate-sentiment-for-a-paragraph) or [topic modeling](/en/lessons/topic-modeling-and-mallet).

### Prerequisites
You should have some familiarity with Python or a similar coding language.
For a brief introduction or refresher, work through some of the _Programming Historian_'s [introductory Python tutorials](/en/lessons/introduction-and-installation). You should also have basic knowledge of spreadsheet (`.csv`) files, as this lesson will primarily use data in a similar format called a [pandas](https://pandas.pydata.org/) DataFrame. Halle Burns's lesson [Crowdsourced-Data Normalization with Python and Pandas](/en/lessons/crowdsourced-data-normalization-with-pandas) provides an overview of creating and manipulating datasets using pandas.

**The code for this lesson can be found [here](/assets/corpus-analysis-with-spacy/corpus-analysis-with-spacy.ipynb).**

The code accompanying this lesson is accessible as a [Jupyter Notebook](https://perma.cc/S9GS-83JN) customized to run in Google Colaboratory. Jupyter Notebooks are browser-based, interactive computing environments for Python. Colaboratory is a Google platform which allows you to run a cloud-hosted Jupyter Notebook, with additional built-in features. If you're new to coding and aren't working with sensitive data, Google Colab may be the best option for you. [There is a brief Colab tutorial from Google available for beginners.](https://colab.research.google.com/)

You can also download [the lesson code](/assets/corpus-analysis-with-spacy/corpus-analysis-with-spacy.ipynb) and run it on your local machine. The practical steps for running the code locally are the same except when it comes to installing packages and retrieving and downloading files. These divergences are marked in the notebook. Quinn Dombrowski, Tassie Gniady, and David Kloster's lesson [Introduction to Jupyter Notebooks](/en/lessons/jupyter-notebooks) covers the necessary background for setting up and using a Jupyter Notebook with Anaconda.

It is also recommended, though not required, that before starting this lesson you learn about common text mining methods. Heather Froehlich's lesson [Corpus Analysis with AntConc](/en/lessons/corpus-analysis-with-antconc) shares tips for working with plain text files and outlines possibilities for exploring keywords and collocations in a corpus. William J. Turkel and Adam Crymble's lesson [Counting Word Frequencies with Python](/en/lessons/counting-frequencies) describes the process of counting word frequencies, a practice this lesson will adapt to count part-of-speech and named entity tags.

No prior knowledge of spaCy is required. For a quick overview, go to the [spaCy 101 page](https://perma.cc/Z23P-R252) from the library's developers.

## Imports, Uploads, and Preprocessing

### Import Packages
Import spaCy and related packages into your Colab environment.

```
# Import spaCy
import spacy

# Load spaCy visualizer
from spacy import displacy

# Import os to upload documents and metadata
import os

# Import pandas DataFrame packages
import pandas as pd

# Import graphing package
import plotly.graph_objects as go
import plotly.express as px

# Import the files module from Colab to facilitate file uploads
from google.colab import files
```

### Upload Text Files
After all necessary packages have been imported, it is time to upload the data for analysis with spaCy. Prior to running the code below, make sure the MICUSP text files you are going to analyze are saved to your local machine.
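
If you are working in Google Colab, continue with the upload step below. If you are instead running the notebook on your own machine, you can skip the upload widget and read the files directly from a folder. The following sketch is one way to do this, not part of the original lesson code: it assumes the unzipped text files sit in a folder named `txt_files` next to your notebook (adjust the path as needed) and builds the same filename-to-contents dictionary that the Colab upload produces, so the rest of the lesson code works unchanged.

```
import os

# Folder containing the unzipped MICUSP .txt files (hypothetical path; adjust as needed)
corpus_folder = 'txt_files'

# Build a dictionary of {filename: file contents as bytes},
# mirroring the structure returned by files.upload() in Colab
uploaded_files = {}
for filename in os.listdir(corpus_folder):
    if filename.endswith('.txt'):
        with open(os.path.join(corpus_folder, filename), 'rb') as f:
            uploaded_files[filename] = f.read()
```
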
Run the code below to select multiple files to upload from a local folder:

```
uploaded_files = files.upload()
```

When the cell has run, navigate to where you stored the MICUSP text files. Select all the files of interest and click Open. The text files should now be uploaded to your Google Colab session.

Now we have files upon which we can perform analysis. To check what form of data we are working with, you can use the `type()` function.

```
type(uploaded_files)
```

It should return that your files are contained in a dictionary, where keys are the filenames and values are the content of each file.

Next, we’ll make the data easier to manage by inserting it into a pandas DataFrame. As the files are currently stored in a dictionary, use the `DataFrame.from_dict()` function to convert them into a new DataFrame:

```
paper_df = pd.DataFrame.from_dict(uploaded_files, orient='index')
paper_df.head()
```

Use the `.head()` function to call the first five rows of the DataFrame and check that the filenames and text are present. You will also notice some strange characters at the start of each row of text; these are byte string characters (`b'` or `b"`) related to the encoding, and they will be removed below.

{% include figure.html filename="or-en-corpus-analysis-with-spacy-02.png" alt="First five rows of student text DataFrame, including columns for the title of each text and the text of each text, without column header names and with byte string characters at start of each line." caption="Figure 2: Initial DataFrame with filenames and texts in Colab" %}

From here, you can reset the index (the very first column of the DataFrame) so it is a true index, rather than the list of filenames. The filenames will become the first column and the texts become the second, making data wrangling easier later.

```
# Reset index and add column names to make wrangling easier
paper_df = paper_df.reset_index()
paper_df.columns = ["Filename", "Text"]
```

Check the head of the DataFrame again to confirm this process has worked.

### Pre-process Text Files
If you've done any computational analysis before, you're likely familiar with the term 'cleaning', which covers a range of procedures such as lowercasing, punctuation removal, and stopword removal. Such procedures are used to standardize data and make it easier for computational tools to interpret it. In the next step, you will convert the uploaded files from byte strings into Unicode strings so that spaCy can process them, and replace extra whitespace with single spaces.

First, you will notice that each text in your DataFrame starts with `b'` or `b"`. This indicates that the data has been read as 'byte strings', or strings which represent a sequence of bytes. `b"Hello"`, for example, corresponds to the sequence of bytes `104, 101, 108, 108, 111`. To analyze the texts with spaCy, we need them to be Unicode strings, where the characters are individual letters.

Converting from bytes to strings is a quick task using `str.decode()`. Within the parentheses, we specify the encoding parameter, UTF-8 (Unicode Transformation Format - 8 bits), which guides the transformation from bytes to Unicode strings. For a more thorough breakdown of encoding in Python, [check out this lesson](https://perma.cc/Z5M2-4EHC).

+ +``` +paper_df['Text'] = paper_df['Text'].str.decode('utf-8') +paper_df.head() +``` + +{% include figure.html filename="or-en-corpus-analysis-with-spacy-03.png" alt="First five rows of student texts DataFrame, including columns for the title of each text and the text of each, with byte string characters removed." caption="Figure 3: Decoded DataFrame with filenames and texts in Colab" %} + +Additionally, the beginnings of some of the texts may also contain extra spaces (indicated by `\t` or `\n`). These characters can be replaced by a single space using the `str.replace()` method. + +``` +paper_df['Text'] = paper_df['Text'].str.replace('\s+', ' ', regex=True).str.strip() +``` + +Further cleaning is not necessary before running running spaCy, and some common cleaning processes will, in fact, skew your results. For example, punctuation markers help spaCy parse grammatical structures and generate part-of-speech tags and dependency trees. Recent scholarship suggests that removing stopwords only superficially improves tasks like topic modeling, that retaining stopwords can support clustering and classification[^8]. At a later stage of this lesson, you will learn to remove stopwords so you can compare its impact on spaCy results. + +### Upload and Merge Metadata Files +Next you will retrieve the metadata about the MICUSP corpus: the discipline and genre information connected to the student texts. Later in this lesson, you will use spaCy to trace differences across genre and disciplinary categories. + +In your Colab, run the following code to upload the `.csv` file from your local machine. + +``` +metadata = files.upload() +``` + +Then convert the uploaded `.csv` file to a second DataFrame, dropping any empty columns. + +``` +metadata_df = pd.read_csv('metadata.csv') +metadata_df = metadata_df.dropna(axis=1, how='all') +``` + +Display the first five rows to check that the data is as expected. Four rows should be present: the paper IDs, their titles, their discipline, and their type (genre). + +{% include figure.html filename="or-en-corpus-analysis-with-spacy-04.png" alt="First five rows of student paper metadata DataFrame, including columns for paper ID, title, discipline, and paper type." caption="Figure 4: Head of DataFrame with paper metadata-ID, title, discpline and type in Google Colab" %} + +Notice that the paper IDs in this DataFrame are *almost* the same as the paper filenames in the corpus DataFrame. We're going to make them match exactly so we can merge the two DataFrames together on this column; in effect, linking each text with their title, discipline and genre. + +To match the columns, we'll remove the `.txt` extension from the end of each filename in the corpus DataFrame using a simple `str.replace` function. This function searches for every instance of the phrase `.txt` in the **Filename** column and replaces it with nothing (in effect, removing it). In the metadata DataFrame, we'll rename the paper ID column **Filename**. + +``` +# Remove .txt from title of each paper +paper_df['Filename'] = paper_df['Filename'].str.replace('.txt', '') + +# Rename column from paper ID to Title +metadata_df.rename(columns={"PAPER ID": "Filename"}, inplace=True) +``` + +Now it is possible to merge the papers and metadata into a single DataFrame: + +``` +final_paper_df = metadata_df.merge(paper_df,on='Filename') +``` + +Check the first five rows to make sure each has a filename, title, discipline, paper type and text (the full paper). 
At this point, you'll also see that any extra spaces have been removed from the beginning of the texts.

{% include figure.html filename="or-en-corpus-analysis-with-spacy-05.png" alt="First five rows of DataFrame merged to include student texts and metadata, with columns for filename, title, discipline, paper type, and text." caption="Figure 5: DataFrame with files and metadata" %}

The resulting DataFrame is now ready for analysis.

## Text Enrichment with spaCy
### Creating Doc Objects
To use spaCy, the first step is to load one of spaCy's Trained Models and Pipelines which will be used to perform tokenization, part-of-speech tagging, and other text enrichment tasks. A wide range of options are available ([see the full list here](https://perma.cc/UK2P-ZNM4)), and they vary based on size and language.

We'll use `en_core_web_sm`, which has been trained on written web texts. It may not perform as accurately as the medium and large English models, but it will deliver results most efficiently. Once we've loaded `en_core_web_sm`, we can check what actions it performs: `parser`, `tagger`, `lemmatizer`, and `ner` should be among those listed.

```
nlp = spacy.load('en_core_web_sm')

print(nlp.pipe_names)
```

Now that the `nlp` function is loaded, let's test out its capacities on a single sentence. Calling the `nlp` function on a single sentence yields a Doc object. This object stores not only the original text, but also all of the linguistic annotations obtained when spaCy processed the text.

```
sentence = "This is 'an' example? sentence"

doc = nlp(sentence)
```

Next we can call on the Doc object to get the information we're interested in. The command below loops through each token in a Doc object and prints each word in the text along with its corresponding part-of-speech:

```
for token in doc:
    print(token.text, token.pos_)
```

{% include figure.html filename="or-en-corpus-analysis-with-spacy-06.png" alt="Output from command to print each word in the sentence, along with their corresponding part-of-speech tags PRON, AUX, PUNCT, DET, PUNCT, NOUN, PUNCT, NOUN." caption="Figure 6: Example output of text and parts of speech generated by spaCy" %}

Let's try the same process on the student texts. As we'll be calling the NLP function on every text in the DataFrame, we should first define a function that runs `nlp` on whatever input text is given. Functions are a useful way to store operations that will be run multiple times, reducing duplication and improving code readability.

```
def process_text(text):
    return nlp(text)
```

After the function is defined, use `.apply()` to apply it to every cell in a given DataFrame column. In this case, `nlp` will run on each cell in the **Text** column of the `final_paper_df` DataFrame, creating a Doc object from every student text. These Doc objects will be stored in a new column of the DataFrame called **Doc**.

Running this function takes several minutes because spaCy is performing all the parsing and tagging tasks on each text. However, when it is complete, we can simply call on the resulting Doc objects to get parts-of-speech, named entities, and other information of interest, just as in the example of the sentence above.

```
final_paper_df['Doc'] = final_paper_df['Text'].apply(process_text)
```

### Text Reduction
#### Tokenization
A critical first step spaCy performs is tokenization, or the segmentation of strings into individual words and punctuation markers.
Tokenization enables spaCy to parse the grammatical structures of a text and identify characteristics of each word, like its part-of-speech.

To retrieve a tokenized version of each text in the DataFrame, we'll write a function that iterates through any given Doc object and returns all tokens found within it. This can be accomplished by putting a `def` wrapper around a `for` loop, similar to the one written above to retrieve the tokens and parts-of-speech from a single sentence.

```
def get_token(doc):
    tokens = []
    for token in doc:
        tokens.append(token.text)
    return tokens
```

However, there's a way to write the same function that makes the code more readable and efficient. This is called list comprehension, and it involves condensing the `for` loop into a single line of code and returning a list of tokens within each text it processes:

```
def get_token(doc):
    return [(token.text) for token in doc]
```

As with the function used to create Doc objects, the `get_token` function can be applied to the DataFrame. In this case, we will call the function on the **Doc** column, since this is the column which stores the results from the processing done by spaCy.

```
final_paper_df['Tokens'] = final_paper_df['Doc'].apply(get_token)
```

If we compare the **Text** and **Tokens** column, we find a couple of differences. Most importantly, the words, spaces, and punctuation markers in the **Tokens** column are separated by commas, indicating that each has been parsed as an individual token. The text in the **Tokens** column is also bracketed; this indicates that tokens have been generated as a list. We'll discuss how and when to transform the lists to strings to conduct frequency counts below.

{% include figure.html filename="or-en-corpus-analysis-with-spacy-07.png" alt="First and last five rows of DataFrame with columns for plain text and tokenized versions of each text." caption="Figure 7: Comparison of text and spaCy-generated token columns in DataFrame of student texts" %}

#### Lemmatization
Another process performed by spaCy is lemmatization, or the retrieval of the dictionary root word of each word (for example “brighten” for “brightening”). We'll perform a similar set of steps to those above to create a function to call the lemmas from the Doc object, then apply it to the DataFrame.

```
def get_lemma(doc):
    return [(token.lemma_) for token in doc]

final_paper_df['Lemmas'] = final_paper_df['Doc'].apply(get_lemma)
```

Lemmatization can help reduce noise and refine results for researchers who are conducting keyword searches. For example, let’s compare counts of the word “write” in the original **Tokens** column and in the lemmatized **Lemmas** column.

```
print(f'"Write" appears in the text tokens column ' + str(final_paper_df['Tokens'].apply(lambda x: x.count('write')).sum()) + ' times.')
print(f'"Write" appears in the lemmas column ' + str(final_paper_df['Lemmas'].apply(lambda x: x.count('write')).sum()) + ' times.')
```

{% include figure.html filename="or-en-corpus-analysis-with-spacy-08.png" alt="Output of command to print number of times the word 'write' appears in the Tokens column (40 times) and the Lemmas columns (302 times)." caption="Figure 8: Frequency count of 'write' in **Tokens** and **Lemmas** columns" %}

As expected, there are more instances of "write" in the **Lemmas** column, as the lemmatization process has grouped inflected word forms (writes, writing) into the base word "write."
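
Beyond checking a single keyword, the lemma lists make it easy to survey which base words are most common across the whole corpus. The sketch below is an optional addition to the lesson code: it uses Python's built-in `collections.Counter` on the **Lemmas** column created above. Punctuation marks and stopwords will rank highly, since we have deliberately kept them in the texts.

```
from collections import Counter

# Tally every lemma across all of the papers in the corpus
lemma_counts = Counter()
for lemmas in final_paper_df['Lemmas']:
    lemma_counts.update(lemmas)

# Display the 20 most frequent lemmas and their counts
print(lemma_counts.most_common(20))
```
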
### Text Annotation
#### Part-of-Speech Tagging
spaCy facilitates two levels of part-of-speech tagging: coarse-grained tagging, which predicts the simple [universal part-of-speech](https://perma.cc/49ER-GXVW) of each token in a text (such as noun, verb, adjective, adverb), and fine-grained tagging, which uses a larger, more detailed set of part-of-speech tags (for example 3rd person singular present verb). The part-of-speech tags used are determined by the English language model we use. In this case, we're using the small English model, and you can explore the differences between the models on [spaCy's website](https://perma.cc/PC9E-HKHM).

We can call the part-of-speech tags in the same way as the lemmas. Create a function to extract them from any given Doc object and apply the function to each Doc object in the DataFrame. The function we'll create will extract both the coarse- and fine-grained part-of-speech for each token (`token.pos_` and `token.tag_`, respectively).

```
def get_pos(doc):
    return [(token.pos_, token.tag_) for token in doc]

final_paper_df['POS'] = final_paper_df['Doc'].apply(get_pos)
```

We can create a list of the part-of-speech columns to review them further. The first (coarse-grained) tag corresponds to a generally recognizable part-of-speech such as a noun, adjective, or punctuation mark, while the second (fine-grained) tags are a bit more difficult to decipher.

```
list(final_paper_df['POS'])
```

{% include figure.html filename="or-en-corpus-analysis-with-spacy-09.png" alt="List of coarse- and fine-grained part-of-speech tags appearing in student texts, including 'PROPN, NNP' and 'NUM, CD' among other pairs of coarse- and fine-grained terms." caption="Figure 9: Excerpt from list of parts of speech in student texts" %}

Fortunately, spaCy has a built-in function called `explain` that can provide a short description of any tag of interest. If we try it on the tag `IN` using `spacy.explain("IN")`, the output reads `conjunction, subordinating or preposition`.

In some cases, you may want to get only a set of part-of-speech tags for further analysis, like all of the proper nouns. A function can be written to perform this task, extracting only words which have been fitted with the proper noun tag.

```
def extract_proper_nouns(doc):
    return [token.text for token in doc if token.pos_ == 'PROPN']

final_paper_df['Proper_Nouns'] = final_paper_df['Doc'].apply(extract_proper_nouns)
```

Listing the proper nouns in each text can help us ascertain the texts' subjects.

```
list(final_paper_df['Proper_Nouns'])
```

{% include figure.html filename="or-en-corpus-analysis-with-spacy-10.png" alt="Excerpts from lists of proper nouns identified in each student text, including 'New York City', 'Earth', 'Long', and 'Gorden' among other terms." caption="Figure 10: Excerpt of proper nouns in each student text" %}

The third text shown here, for example, involves astronomy concepts; this is likely to have been written for a biology course. In contrast, texts 163 and 164 appear to be analyses of Shakespeare plays and movie adaptations. Along with assisting content analyses, extracting nouns has been shown to help build more efficient topic models[^9].

#### Dependency Parsing
Closely related to part-of-speech tagging is 'dependency parsing', wherein spaCy identifies how different segments of a text are related to each other.
Once the grammatical structure of each sentence is identified, visualizations can be created to show the connections between different words. Since we are working with large texts, our code will break down each text into sentences (spans) and then create dependency visualizers for each span. We can then visualize one sentence (span) at a time.

```
doc = final_paper_df['Doc'][5]
sentences = list(doc.sents)
sentence = sentences[1]
displacy.render(sentence, style="dep", jupyter=True)
```

{% include figure.html filename="or-en-corpus-analysis-with-spacy-11.png" alt="Dependency parse visualization of the sentence, 'There are two interesting phenomena in this research', with part-of-speech labels and arrows indicating dependencies between words." caption="Figure 11: Dependency parsing example from one sentence of one text in corpus" %}

If you'd like to review the output of this code as raw `.html`, you can download it [here](/assets/corpus-analysis-with-spacy/corpus-analysis-with-spacy-16.html) and open it with your browser. Here, spaCy has identified relationships between pronouns, verbs, nouns and other parts of speech in one sentence. For example, both "two" and "interesting" modify the noun "phenomena," and the pronoun "There" is an expletive filling the noun position before "are" without adding meaning to the sentence.

Dependency parsing makes it easy to see how removing stopwords can impact spaCy's depiction of the grammatical structure of texts. Let's compare this to a dependency parse of the same sentence with stopwords removed. To do so, we'll create a function to remove stopwords from the Doc object, create a new Doc object without stopwords, and extract the part-of-speech tokens from the same sentence in the same text. Then we'll create a visualization for the dependency parsing for the same sentence as above, this time without stopwords.

```
def remove_stopwords(doc):
    # Keep only the tokens that are not in spaCy's default stopword list
    return [token.text for token in doc if token.text not in nlp.Defaults.stop_words]

# Store stopword-free token lists, rejoin them into strings, and re-process them with spaCy
final_paper_df['Tokens_NoStops'] = final_paper_df['Doc'].apply(remove_stopwords)

final_paper_df['Text_NoStops'] = [' '.join(map(str, l)) for l in final_paper_df['Tokens_NoStops']]

final_paper_df['Doc_NoStops'] = final_paper_df['Text_NoStops'].apply(process_text)

# Visualize the dependency parse of the same sentence, now without stopwords
doc = final_paper_df['Doc_NoStops'][5]
sentences = list(doc.sents)
sentence = sentences[0]

displacy.render(sentence, style='dep', jupyter=True)
```

{% include figure.html filename="or-en-corpus-analysis-with-spacy-12.png" alt="Dependency parse visualization of the sentence without stopwords, 'There interesting phenomena research', with part-of-speech labels and arrows indicating dependencies between words." caption="Figure 12: Dependency parsing example from one sentence of one text in corpus without stopwords" %}

If you'd like to review the output of this code as raw `.html`, you can download it [here](/assets/corpus-analysis-with-spacy/corpus-analysis-with-spacy-17.html). In this example, the verb of the sentence "are" has been removed, along with the adjective "two" and the words "in this" that made up the prepositional phrase. Not only do these removals prevent the sentence from being legible, but they also render some of the dependencies inaccurate; "phenomena research" is here identified as a compound noun, and "interesting" as modifying research instead of phenomena.

This example demonstrates what can be lost in analysis when stopwords are removed, especially when investigating the relationships between words in a text or corpus.
Since part-of-speech tagging and named entity recognition are predicated on understanding relationships between words, it's best to keep stopwords so spaCy can use all available linguistic units during the tagging process.

Dependency parsing also enables the extraction of larger chunks of text, like noun phrases. Let's try it out:

```
def extract_noun_phrases(doc):
    return [chunk.text for chunk in doc.noun_chunks]

final_paper_df['Noun_Phrases'] = final_paper_df['Doc'].apply(extract_noun_phrases)
```

Calling the first row in the **Noun_Phrases** column will reveal the words spaCy has classified as noun phrases. In this case, spaCy has identified a wide range of nouns and nouns with modifiers, from locations ("New York City") to phrases with adjectival descriptors ("the great melting pot").

{% include figure.html filename="or-en-corpus-analysis-with-spacy-13.png" alt="Excerpt from list of noun phrases present in student text, including 'New York City', 'different colors', and 'skin swirl' among other terms." caption="Figure 13: Excerpt from list of noun phrases in first text in the DataFrame" %}

#### Named Entity Recognition
Finally, spaCy can tag named entities in the text, such as names, dates, organizations, and locations. Call the full list of named entities and their descriptions using this code:

```
labels = nlp.get_pipe("ner").labels

for label in labels:
    print(label + ' : ' + spacy.explain(label))
```

{% include figure.html filename="or-en-corpus-analysis-with-spacy-14.png" alt="List of named entity tags that spaCy recognizes, along with their descriptions" caption="Figure 14: List of spaCy's named entities and their descriptions" %}

We’ll create a function to extract the named entity tags from each Doc object and apply it to the Doc objects in the DataFrame, storing the named entities in a new column:

```
def extract_named_entities(doc):
    return [ent.label_ for ent in doc.ents]

final_paper_df['Named_Entities'] = final_paper_df['Doc'].apply(extract_named_entities)
final_paper_df['Named_Entities']
```

We can add another column with the words and phrases identified as named entities:

```
def extract_named_entity_words(doc):
    return [ent for ent in doc.ents]

final_paper_df['NE_Words'] = final_paper_df['Doc'].apply(extract_named_entity_words)
final_paper_df['NE_Words']
```

Let's visualize the words and their named entity tags in a single text. Call one text's Doc object (here, the text at index 1) and use `displacy.render` to visualize the text with the named entities highlighted and tagged:

```
doc = final_paper_df['Doc'][1]
displacy.render(doc, style='ent', jupyter=True)
```

{% include figure.html filename="or-en-corpus-analysis-with-spacy-15.png" alt="Visualization of a student text paragraph with named entities labeled and color-coded based on entity type." caption="Figure 15: Visualization of one text with named entity tags" %}

If you'd like to review the output of this code as raw `.html`, you can download it [here](/assets/corpus-analysis-with-spacy/corpus-analysis-with-spacy-20.html). Named entity recognition enables researchers to take a closer look at the 'real-world objects' that are present in their texts. The rendering allows for close-reading of these entities in context, their distinctions helpfully color-coded. In addition to studying named entities that spaCy automatically recognizes, you can use a training dataset to update the categories or create a new entity category, as in [this example](https://perma.cc/TLT6-U88T).
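
The lesson links to saved `.html` copies of these displacy renderings. If you would like to produce a similar file from your own session, `displacy.render` can return the markup as a string instead of drawing it inline. The short sketch below is not part of the original lesson code and uses a hypothetical output filename; it writes the entity visualization for one document to disk.

```
# Ask displacy for the raw HTML markup instead of rendering it inline
doc = final_paper_df['Doc'][1]
html = displacy.render(doc, style='ent', jupyter=False)

# Save the markup so it can be opened later in any web browser
with open('entity_visualization.html', 'w', encoding='utf-8') as f:
    f.write(html)
```

In Colab, the saved file appears in the session's file browser and can be retrieved with `files.download('entity_visualization.html')`.
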
### Download Enriched Dataset
To save the dataset of doc objects, text reductions and linguistic annotations generated with spaCy, download the `final_paper_df` DataFrame to your local computer as a `.csv` file:

```
# Save the DataFrame as a csv file in the Colab session storage
final_paper_df.to_csv('MICUSP_papers_with_spaCy_tags.csv')

# Download the csv from the Colab session to your computer
files.download('MICUSP_papers_with_spaCy_tags.csv')
```

## Analysis of Linguistic Annotations
Why are spaCy's linguistic annotations useful to researchers? Below are two examples of how researchers can use data about the MICUSP corpus, produced through spaCy, to draw conclusions about discipline and genre conventions in student academic writing. We will use the enriched dataset generated with spaCy for these examples.

### Part-of-Speech Analysis
In this section, we'll analyze the part-of-speech tags extracted by spaCy to answer the first research question: **Do students use certain parts-of-speech more frequently in Biology texts versus English texts, and does this signify differences in disciplinary conventions?**

spaCy counts the number of each part-of-speech tag that appears in each document (for example the number of times the `NOUN` tag appears in a document). This is done by calling `doc.count_by(spacy.attrs.POS)`. Here's how it works on a single sentence:

```
# Create doc object from single sentence
doc = nlp("This is 'an' example? sentence")

# Print counts of each part of speech in sentence
print(doc.count_by(spacy.attrs.POS))
```

{% include figure.html filename="or-en-corpus-analysis-with-spacy-16.png" alt="Output of code that creates a doc object out of an example sentence, then prints counts of each part-of-speech along with corresponding part-of-speech indices." caption="Figure 16: Part-of-speech indexing for words in example sentence" %}

spaCy generates a dictionary where the values represent the counts of each part-of-speech term found in the text. The keys in the dictionary correspond to numerical indices associated with each part-of-speech tag. To make the dictionary more legible, let's associate the numerical index values with their corresponding part of speech tags. In the example below, it's now possible to see which parts-of-speech tags correspond to which counts:

{% include figure.html filename="or-en-corpus-analysis-with-spacy-17.png" alt="Jupyter Notebook cell to be run to create a doc object out of an example sentence, then print counts of each part-of-speech along with corresponding part-of-speech labels." caption="Figure 17: Indexing updated to show part-of-speech labels" %}

To get the same type of dictionary for each text in the DataFrame, a function can be created to nest the above `for` loop. We can then apply the function to each Doc object in the DataFrame. In this case (and above), we are interested in the simpler, coarse-grained parts of speech.

```
num_list = []

def get_pos_tags(doc):
    dictionary = {}
    num_pos = doc.count_by(spacy.attrs.POS)
    for k,v in sorted(num_pos.items()):
        dictionary[doc.vocab[k].text] = v
    num_list.append(dictionary)
    # Return the dictionary so it is also stored in the new DataFrame column
    return dictionary

final_paper_df['C_POS'] = final_paper_df['Doc'].apply(get_pos_tags)
```

From here, we'll take the part-of-speech counts and put them into a new DataFrame where we can calculate the frequency of each part-of-speech per document. In the new DataFrame, if a paper does not contain a particular part-of-speech, the cell will read `NaN` (Not a Number).

+ +``` +pos_counts = pd.DataFrame(num_list) +columns = list(pos_counts.columns) + +idx = 0 +new_col = final_paper_df['DISCIPLINE'] +pos_counts.insert(loc=idx, column='DISCIPLINE', value=new_col) + +pos_counts.head() +``` + +{% include figure.html filename="or-en-corpus-analysis-with-spacy-18.png" alt="DataFrame containing columns for paper genre and counts of each part-of-speech tag appearing in each paper." caption="Figure 18: DataFrame with counts of each part-of-speech usage in English and Biology papers" %} + +Now you can calculate the amount of times, on average, that each part-of-speech appears in Biology versus English papers. To do so, you use the `.groupby()` and `.mean()` functions to group all part-of-speech counts from the Biology texts together and calculate the mean usage of each part-of-speech, before doing the same for the English texts. The following code also rounds the counts to the nearest whole number: + +``` +average_pos_df = pos_counts.groupby(['DISCIPLINE']).mean() + +average_pos_df = average_pos_df.round(0) + +average_pos_df = average_pos_df.reset_index() + +average_pos_df +``` + +{% include figure.html filename="or-en-corpus-analysis-with-spacy-19.png" alt="DataFrame containing average counts of each part-of-speech tag within each discipline (Biology and English)." caption="Figure 19: DataFrame with average part-of-speech usage for each discipline" %} + +Here we can examine the differences between average part-of-speech usage per genre. As suspected, Biology student papers use slightly more adjectives (235 per paper on average) than English student papers (209 per paper on average), while an even greater number of verbs (306) are used on average in English papers than in Biology papers (237). Another interesting contrast is in the `NUM` tag: almost 50 more numbers are used in Biology papers, on average, than in English papers. Given the conventions of scientific research, this does makes sense; studies are much more frequently quantitative, incorporating lab measurements and statistical calculations. + +We can visualize these differences using a bar graph: + +{% include figure.html filename="or-en-corpus-analysis-with-spacy-20.png" alt="Bar chart depicting average use of adjectives, verbs and numbers in English versus Biology papers, showing verbs used most and numbers used least in both disciplines, more verbs used in English papers and more adjectives and numbers used in Biology papers." caption="Figure 20: Bar graph showing verb use, adjective use and numeral use, on average, in Biology and English papers" %} + +Though admittedly a simple analysis, calculating part-of-speech frequency counts affirms prior studies which posit a correlation between lexico-grammatical features and disciplinary conventions, suggesting this application of spaCy can be adapted to serve other researchers' corpora and part-of-speech usage queries[^10]. + +### Fine-Grained Part-of-Speech Analysis +The same type of analysis could be performed using the fine-grained part-of-speech tags; for example, we could look at how Biology and English students use sub-groups of verbs with different frequencies. 
Fine-grained tagging can be deployed in a similar loop to those above, but instead of retrieving the `token.pos_` for each word, we call spaCy to retrieve the `token.tag_`:

```
tag_num_list = []

def get_fine_pos_tags(doc):
    dictionary = {}
    num_tag = doc.count_by(spacy.attrs.TAG)
    for k,v in sorted(num_tag.items()):
        dictionary[doc.vocab[k].text] = v
    tag_num_list.append(dictionary)
    # Return the dictionary so it is also stored in the new DataFrame column
    return dictionary

final_paper_df['F_POS'] = final_paper_df['Doc'].apply(get_fine_pos_tags)

tag_counts = pd.DataFrame(tag_num_list)
columns = list(tag_counts.columns)

idx = 0
new_col = final_paper_df['DISCIPLINE']
tag_counts.insert(loc=idx, column='DISCIPLINE', value=new_col)
```

Again, we can calculate the number of times, on average, that each fine-grained part-of-speech appears in Biology versus English papers using the `groupby` and `mean` functions.

```
average_tag_df = tag_counts.groupby(['DISCIPLINE']).mean()

average_tag_df = average_tag_df.round(0)

average_tag_df = average_tag_df.reset_index()

average_tag_df
```

{% include figure.html filename="or-en-corpus-analysis-with-spacy-21.png" alt="DataFrame containing average counts of each fine-grained part-of-speech tag within each discipline (Biology and English)." caption="Figure 21: DataFrame with average fine-grained part-of-speech usage for each discipline" %}

As evidenced by the above DataFrame, spaCy identifies around 50 fine-grained part-of-speech tags. Researchers can investigate trends in the average usage of any or all of them. For example, is there a difference in the average usage of past tense versus present tense verbs in English and Biology papers? Three fine-grained tags that could help with this analysis are `VBD` (past tense verbs), `VBP` (non third-person singular present tense verbs), and `VBZ` (third-person singular present tense verbs).

{% include figure.html filename="or-en-corpus-analysis-with-spacy-22.png" alt="Bar chart depicting average use of three verb types (past-tense, third- and non-third person present tense) in English versus Biology papers, showing third-person present tense verbs used most in both disciplines, many more third-person present tense verbs used in English papers than the other two types and more past tense verbs used in Biology papers." caption="Figure 22: Graph of average usage of three verb types (past tense, third- and non-third person present tense) in English and Biology papers" %}

Graphing these annotations reveals a fairly even distribution of the usage of the three verb types in Biology papers. However, in English papers, an average of 130 third-person singular present tense verbs are used per paper, compared to around 40 of the other two categories. What these differences indicate about the disciplines is not immediately discernible, but it does demonstrate spaCy's value in identifying patterns of linguistic annotations for further exploration by computational and close-reading methods.

The analyses above are only a couple of many possible applications for part-of-speech tagging. Part-of-speech tagging is also useful for [research questions about sentence *intent*](https://perma.cc/QXH6-V6FF): the meaning of a text changes depending on whether the past, present, or infinitive form of a particular verb is used. Equally useful for such tasks as word sense disambiguation and language translation, part-of-speech tagging is additionally a building block of named entity recognition, the focus of the analysis below.
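
Before moving on to named entities, a quick note on plotting: the bar charts in Figures 20 and 22 were generated from these averaged DataFrames, but the plotting code itself isn't shown in the lesson. A minimal sketch along the following lines, using the `plotly.express` package imported at the start of the lesson and the `average_tag_df` DataFrame built above, produces a comparable grouped bar chart of the three verb tags. It assumes all three tag columns appear in your DataFrame.

```
# Reshape the averaged counts into long format for plotting
verb_tags = ['VBD', 'VBP', 'VBZ']
plot_df = average_tag_df.melt(id_vars='DISCIPLINE', value_vars=verb_tags,
                              var_name='Verb_Tag', value_name='Average_Count')

# Grouped bar chart comparing the three verb tags across disciplines
fig = px.bar(plot_df, x='Verb_Tag', y='Average_Count', color='DISCIPLINE',
             barmode='group', title='Average verb tag usage per paper, by discipline')
fig.show()
```
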
### Named Entity Analysis
In this section, you'll use the named entity tags extracted from spaCy to investigate the second research question: **Do students use certain named entities more frequently in different academic genres, and does this signify differences in genre conventions?**

To start, we'll create a new DataFrame with the text filenames, paper types (genres), and the named entity tags and words:

```
# .copy() ensures later changes to this DataFrame don't affect final_paper_df
ner_analysis_df = final_paper_df[['Filename','PAPER TYPE', 'Named_Entities', 'NE_Words']].copy()
```

Using the `str.count` method, we can get counts of a specific named entity used in each text. Let's get the counts of the named entities of interest here (PERSON, ORG, DATE, and CARDINAL (numbers)) and add them as new columns of the DataFrame.

```
ner_analysis_df['Named_Entities'] = ner_analysis_df['Named_Entities'].apply(lambda x: ' '.join(x))

person_counts = ner_analysis_df['Named_Entities'].str.count('PERSON')
org_counts = ner_analysis_df['Named_Entities'].str.count('ORG')
date_counts = ner_analysis_df['Named_Entities'].str.count('DATE')
cardinal_counts = ner_analysis_df['Named_Entities'].str.count('CARDINAL')

ner_counts_df = pd.DataFrame()
ner_counts_df['Genre'] = ner_analysis_df["PAPER TYPE"]
ner_counts_df['PERSON_Counts'] = person_counts
ner_counts_df['ORG_Counts'] = org_counts
ner_counts_df['DATE_Counts'] = date_counts
ner_counts_df['CARDINAL_Counts'] = cardinal_counts
```

{% include figure.html filename="or-en-corpus-analysis-with-spacy-23.png" alt="First five rows of DataFrame containing columns for paper genre and counts of four named entities (PERSON, ORG, DATE, and CARDINAL) per paper." caption="Figure 23: Head of DataFrame depicting use of Person, Org, Date, and Cardinal named entities in English and Biology papers" %}

From here, we can compare the average usage of each named entity and plot across paper type.

{% include figure.html filename="or-en-corpus-analysis-with-spacy-24.png" alt="Bar chart depicting average use of named entities across seven genres, with highest counts of PERSON and DATE tags across all genres, with more date tags used in proposals, research papers and creative writing papers and more person tags used in argumentative essays, critique/evaluations, reports and response papers." caption="Figure 24: Bar chart depicting average use of Person, Location, Date, and Work of Art named entities across genres" %}

As hypothesized at the start of this lesson: more dates and numbers are used in description-heavy proposals and research papers, while more people and works of art are referenced in arguments and critiques/evaluations. Both of these hypotheses are predicated on engaging with and assessing other scholarship.

Interestingly, people and locations are used the most frequently on average across all genres, likely because these concepts often appear in citations. Overall, locations are most frequently invoked in proposals and reports. Though this should be investigated further through close reading, it does follow that these genres would use locations frequently because they are often grounded in real-world spaces in which events are being reported or imagined.

### Analysis of `DATE` Named Entities
Let's explore patterns of one of these entities' usage (`DATE`) further by retrieving the words most frequently tagged as dates in various genres.
You'll do this by first creating a function to extract the words tagged as date entities in each document and adding the words to a new DataFrame column:

```
def extract_date_named_entities(doc):
    return [ent for ent in doc.ents if ent.label_ == 'DATE']

ner_analysis_df['Date_Named_Entities'] = final_paper_df['Doc'].apply(extract_date_named_entities)

ner_analysis_df['Date_Named_Entities'] = [', '.join(map(str, l)) for l in ner_analysis_df['Date_Named_Entities']]
```

Now we can retrieve only the subset of papers that are in the proposal genre, get the top words that have been tagged as "dates" in these papers and append them to a list:

```
date_word_counts_df = ner_analysis_df[(ner_analysis_df == 'Proposal').any(axis=1)]

date_word_frequencies = date_word_counts_df.Date_Named_Entities.str.split(expand=True).stack().value_counts()
date_word_frequencies[:10]
```

{% include figure.html filename="or-en-corpus-analysis-with-spacy-25.png" alt="List of top 10 words most frequently tagged as DATE named entities in proposal papers, including 'years', '1950', and 'winter', among other terms." caption="Figure 25: Top 10 words identified as dates in proposals" %}

The majority are standard 4-digit dates; though further analysis is certainly needed to confirm, these date entities seem to indicate citation references are occurring. This fits in with our expectations of the proposal genre, which requires references to prior scholarship to justify students' proposed claims.

Let's contrast this with the top `DATE` entities in Critique/Evaluation papers:

```
# Search for only date words in critique/evaluation papers
date_word_counts_df = ner_analysis_df[(ner_analysis_df == 'Critique/Evaluation').any(axis=1)]

# Count the frequency of each word in these papers and append to list
date_word_frequencies = date_word_counts_df.Date_Named_Entities.str.split(expand=True).stack().value_counts()

# Get top 10 most common words and their frequencies
date_word_frequencies[:10]
```

{% include figure.html filename="or-en-corpus-analysis-with-spacy-26.png" alt="List of top 10 words most frequently tagged as DATE named entities in critique/evaluation papers, including '2004', '2003', and '2002', among other terms." caption="Figure 26: Top 10 words identified as dates in Critique/Evaluation papers" %}

Only four of the top tagged terms are specific four-digit years; the rest are references to relative dates or periods. This, too, may indicate genre conventions, such as the need to provide context and/or center an argument in relative space and time in evaluative work. Future research could analyze chains of named entities (and parts of speech) to get a better understanding of how these features together indicate larger rhetorical tactics.

## Conclusions
Through this lesson, we've gleaned more information about the grammatical makeup of a text corpus. Such information can be valuable to researchers who are seeking to understand differences between texts in their corpus: What types of named entities are most common across the corpus? How frequently are certain words used as nouns versus objects within individual texts and corpora? What may these frequencies reveal about the content or themes of the texts themselves?

While we've covered the basics of spaCy in this lesson, it has other capacities, such as word vectorization and custom rule-based tagging, that are certainly worth exploring in more detail. This lesson's code can also be altered to work with custom feature sets.
This lesson's code can also be altered to work with custom feature sets. A great example of this approach is Susan Grunewald and Andrew Janco's lesson, [Finding Places in Text with the World Historical Gazetteer](/en/lessons/finding-places-world-historical-gazetteer#4-building-a-gazetteer), in which spaCy is leveraged to identify the place names of German prisoner-of-war camps in World War II memoirs, drawing on a historical gazetteer of camp names.

spaCy is an equally helpful tool for exploring texts without fully-formed research questions in mind. Exploring linguistic annotations can prompt new research questions and guide the development of text-mining methods.

Ultimately, this lesson has provided a foundation for corpus analysis with spaCy. Whether you wish to investigate language use in student papers, novels, or another large collection of texts, this code can be repurposed for your own research.

## Endnotes
[^1]: Matthew Brooke O'Donnell and Ute Römer, "From student hard drive to web corpus (part 2): The annotation and online distribution of the Michigan Corpus of Upper-level Student Papers (MICUSP)," *Corpora* 7, no. 1 (2012): 1–18. [https://doi.org/10.3366/cor.2012.0015](https://doi.org/10.3366/cor.2012.0015).

[^2]: Jack Hardy and Ute Römer, "Revealing disciplinary variation in student writing: A multi-dimensional analysis of the Michigan Corpus of Upper-level Student Papers (MICUSP)," *Corpora* 8, no. 2 (2013): 183–207. [https://doi.org/10.3366/cor.2013.0040](https://doi.org/10.3366/cor.2013.0040).

[^3]: Laura Aull, "Linguistic Markers of Stance and Genre in Upper-Level Student Writing," *Written Communication* 36, no. 2 (2019): 267–295. [https://doi.org/10.1177/0741088318819472](https://doi.org/10.1177/0741088318819472).

[^4]: Sugene Kim, "‘Two rules are at play when it comes to none’: A corpus-based analysis of singular versus plural none: Most grammar books say that the number of the indefinite pronoun none depends on formality level; corpus findings show otherwise," *English Today* 34, no. 3 (2018): 50–56. [https://doi.org/10.1017/S0266078417000554](https://doi.org/10.1017/S0266078417000554).

[^5]: Carol Berkenkotter and Thomas Huckin, *Genre knowledge in disciplinary communication: Cognition/culture/power* (Lawrence Erlbaum Associates, Inc., 1995).

[^6]: Jack Hardy and Eric Friginal, "Genre variation in student writing: A multi-dimensional analysis," *Journal of English for Academic Purposes* 22 (2016): 119–131. [https://doi.org/10.1016/j.jeap.2016.03.002](https://doi.org/10.1016/j.jeap.2016.03.002).

[^7]: Jack Hardy and Ute Römer, "Revealing disciplinary variation in student writing: A multi-dimensional analysis of the Michigan Corpus of Upper-level Student Papers (MICUSP)," *Corpora* 8, no. 2 (2013): 183–207. [https://doi.org/10.3366/cor.2013.0040](https://doi.org/10.3366/cor.2013.0040).

[^8]: Alexandra Schofield, Måns Magnusson and David Mimno, "Pulling Out the Stops: Rethinking Stopword Removal for Topic Models," *Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics* 2 (2017): 432–436. [https://aclanthology.org/E17-2069](https://perma.cc/JAN8-N296).

[^9]: Fiona Martin and Mark Johnson, "More Efficient Topic Modelling Through a Noun Only Approach," *Proceedings of the Australasian Language Technology Association Workshop* (2015): 111–115. [https://aclanthology.org/U15-1013](https://perma.cc/QH7M-42S3).
+ +[^10]: Jack Hardy and Ute Römer, "Revealing disciplinary variation in student writing: A multi-dimensional analysis of the Michigan Corpus of Upper-level Student Papers (MICUSP)," *Corpora* 8, no. 2 (2013): 183–207. [https://doi.org/10.3366/cor.2013.0040](https://doi.org/10.3366/cor.2013.0040). From e2f2a07bf6b599ed7b2bc4556c4dea52f1951517 Mon Sep 17 00:00:00 2001 From: Anisa Hawes <87070441+anisa-hawes@users.noreply.github.com> Date: Thu, 19 Oct 2023 18:08:07 +0100 Subject: [PATCH 03/30] Upload additional assets to /corpus-analysis-with-spacy Upload additional assets --- .../corpus-analysis-with-spacy-16.html | 97 + .../corpus-analysis-with-spacy-17.html | 45 + .../corpus-analysis-with-spacy-20.html | 2281 +++++++++++++++++ .../corpus-analysis-with-spacy/metadata.csv | 166 ++ .../corpus-analysis-with-spacy/txt_files.zip | Bin 0 -> 1016745 bytes 5 files changed, 2589 insertions(+) create mode 100644 assets/corpus-analysis-with-spacy/corpus-analysis-with-spacy-16.html create mode 100644 assets/corpus-analysis-with-spacy/corpus-analysis-with-spacy-17.html create mode 100644 assets/corpus-analysis-with-spacy/corpus-analysis-with-spacy-20.html create mode 100644 assets/corpus-analysis-with-spacy/metadata.csv create mode 100644 assets/corpus-analysis-with-spacy/txt_files.zip diff --git a/assets/corpus-analysis-with-spacy/corpus-analysis-with-spacy-16.html b/assets/corpus-analysis-with-spacy/corpus-analysis-with-spacy-16.html new file mode 100644 index 0000000000..e24bdd7bc5 --- /dev/null +++ b/assets/corpus-analysis-with-spacy/corpus-analysis-with-spacy-16.html @@ -0,0 +1,97 @@ + \ No newline at end of file diff --git a/assets/corpus-analysis-with-spacy/corpus-analysis-with-spacy-17.html b/assets/corpus-analysis-with-spacy/corpus-analysis-with-spacy-17.html new file mode 100644 index 0000000000..ef96bba98e --- /dev/null +++ b/assets/corpus-analysis-with-spacy/corpus-analysis-with-spacy-17.html @@ -0,0 +1,45 @@ + \ No newline at end of file diff --git a/assets/corpus-analysis-with-spacy/corpus-analysis-with-spacy-20.html b/assets/corpus-analysis-with-spacy/corpus-analysis-with-spacy-20.html new file mode 100644 index 0000000000..0bd0ad9af5 --- /dev/null +++ b/assets/corpus-analysis-with-spacy/corpus-analysis-with-spacy-20.html @@ -0,0 +1,2281 @@ +