# 6 Corpus Exploration

This Notebook explores various tools for analysing and comparing texts at the corpus level. As such, these are your first ventures into "macro-analysis" with Python. The methods described here are especially powerful in combination with the techniques for content selection explained in Notebook 5 **Corpus Creation**..

More specifically we will have a closer look at:

- **Keyword in Context Analysis**: Similar to concordance in AntConc
- **Collocations**: Compute which tokens tend to co-occur together
- **Feature selection**: Compute which tokens are distinctive for a subset of texts

## 6.1 Keyword in Context

Computers are excellent in indexing, organzing and retrieving information. However, interpreting information, extracting meaning from text is still a difficult task. Keyword-in-Context (KWIC) analysis, brings together the best of both worlds: the retrieval power of machines, with the close-reading skills of the historian. KWIC or concardance centres a corpus on sepecific query term, with `n` words to the left and the right. In fact, this is one of the earliest application of digital analysis of historical texts.

In this section we investigate reports of the London Medical Officers of Health, the [London's Pulse corpus](https://wellcomelibrary.org/moh/). 

> The reports were produced each year by the Medical Officer of Health (MOH) of a district and set out the work done by his public health and sanitary officers. The reports provided vital data on birth and death rates, infant mortality, incidence of infectious and other diseases, and a general statement on the health of the population. 

Source: https://wellcomelibrary.org/moh/about-the-reports/about-the-medical-officer-of-health-reports/

We start we importing the necessary libraries. Some of the code is explained in previous Notebooks, so won't discuss it into too much detail here.

The tools we need are:
- `nltk`: Natural Language Toolkint: for tokenization and concordance
- `pathlib`: a library for managing files and folders

In [38]:
import nltk # import natural language toolkit
from pathlib import Path # import Path object from pathlib
from nltk.tokenize import wordpunct_tokenize # import word_tokenize function from nltk.tokenize

In [39]:
!ls data/MOH/python/ # list all files in data/MOH/python/

[31mCityofWestminster.1901.b18247660.txt[m[m
[31mCityofWestminster.1902.b18247672.txt[m[m
[31mCityofWestminster.1903.b18247684.txt[m[m
[31mCityofWestminster.1904.b18247696.txt[m[m
[31mCityofWestminster.1905.b18247702.txt[m[m
[31mCityofWestminster.1906.b18247714.txt[m[m
[31mCityofWestminster.1907.b18247726.txt[m[m
[31mCityofWestminster.1908.b18247738.txt[m[m
[31mCityofWestminster.1909.b1824774x.txt[m[m
[31mCityofWestminster.1910.b18247751.txt[m[m
[31mCityofWestminster.1911.b18247763.txt[m[m
[31mCityofWestminster.1912.b18247775.txt[m[m
[31mCityofWestminster.1913.b18247787.txt[m[m
[31mCityofWestminster.1914.b18247799.txt[m[m
[31mCityofWestminster.1915.b18247805.txt[m[m
[31mCityofWestminster.1917.b18247817.txt[m[m
[31mCityofWestminster.1920.b18247829.txt[m[m
[31mCityofWestminster.1921.b18247830.txt[m[m
[31mCityofWestminster.1922.b18247842.txt[m[m
[31mCityofWestminster.1923.b18247854.txt[m[m
[31mCityofWestminst

The data are stored in the following folder structure:

```
data
|___ MOH
     |___ python
          |____ CityofWestminster.1901.b18247660.txt
          |____ ...
```

The code below:
- harvests all path to `.txt` files in `data/MOH/python`
- converts the result to a `list`

In [33]:
moh_reports_paths = list(Path('data/MOH/python').glob('*.txt')) # get all txt files in data/MOH/python

We can print the paths to ten document with list slicing: `[:10]` means, get document from index positions `0` till `9`. (i.e. the first ten items).

In [40]:
print(moh_reports_paths[:10]) # print the first ten items

[PosixPath('data/MOH/python/PoplarMetropolitanBorough.1945.b18246175.txt'), PosixPath('data/MOH/python/CityofWestminster.1932.b18247945.txt'), PosixPath('data/MOH/python/CityofWestminster.1921.b18247830.txt'), PosixPath('data/MOH/python/PoplarandBromley.1900.b18245754.txt'), PosixPath('data/MOH/python/Poplar.1919.b18120878.txt'), PosixPath('data/MOH/python/PoplarMetropolitanBorough.1920.b18245924.txt'), PosixPath('data/MOH/python/CityofWestminster.1907.b18247726.txt'), PosixPath('data/MOH/python/CityofWestminster.1906.b18247714.txt'), PosixPath('data/MOH/python/CityofWestminster.1903.b18247684.txt'), PosixPath('data/MOH/python/PoplarMetropolitanBorough.1902.b18245778.txt')]


Once we know where all the files are located, we can apply the following steps:
- create an empty list variable where we will store the tokens of the corpus (line 3)
- iterate over the collected paths (line 5)
- read the text file (line 6)
- lowercase the text (line 6)
- tokenize the string (line 7): this converts the string to a list of tokens
- iterate over tokens (line 8)
- test if a token is contain only alphabetic characters (line 9)
- add token to the list if line 9 evaluates to True (line 10)

The general flow of the program is similar to what we've seen before: we create an empty list (or other object) where we store specific information from a text collection, in this case all alphabetic tokens.

We use one more notebook functionalities here
- `%%time` print how long the cell took to run

It could take a few seconds for the cell to run, so please be a bit pit patient:

In [53]:
%%time

corpus = [] # inititialize an empty list where we will store the MOH reports

for p in moh_reports_paths: # iterate over the paths to MOH reports, p will take the value of each item in moh_reports_paths 
    text_lower = open(r).read().lower() # read the text files and lowercase the string
    tokens = wordpunct_tokenize(text_lower) # tokenize the string
    for token in tokens: # iterate over the tokens
        if token.isalpha(): # test if token only contains alphabetic characteris
            corpus.append(token) # if the above test evaluates to True, append token to the corpus list
print('collected', len(corpus),'tokens')

collected 6641589 tokens
CPU times: user 4.53 s, sys: 215 ms, total: 4.74 s
Wall time: 4.77 s


While this small program works perfectly fine, it's not the most efficient code. The example below is a bit more, especially if you're confronted with lots of text files. 

- the `with open` statement is a convenient way of handling the opening and closing of files, to make sure you don't keep all information in memory, which would slow down the execution of your program
- line 8 shows a list comprehension, this actually similar to a for loop, but faster and more concise.

We won't spend too much time discussing list comprehensions, the examples below should suffice for now. We write a small programs that collects odd numbers. First we generate a list of numbers with `range(10)`...

In [77]:
# see the output of range(10)
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

... the we test for division by 2: `%` is the modulus operator, "which returns the remainder after dividing the left-hand operand by right-hand operand". It `n % 2` evaluates to `0` if a number `n` can be divided by `2`. In Python `0` is equal to `False`, meaning if `n % 2` evaluates to `0` we won't append the number to `odd`.

In [74]:
%%time
# program for find odd numbers
numbers = range(10) # get numbers 0 to 9
odd = [] # empty list where we store even numbers
for k in numbers: # iterate over numbers
    if k % 2: # test if number if divisible by 2
        odd.append(k) # if True append
print(odd) # print number of tokens collected

[1, 3, 5, 7, 9]
CPU times: user 249 µs, sys: 165 µs, total: 414 µs
Wall time: 322 µs


The same can be achieved with just one line of code using a list comprehension.

In [78]:
%time
odd = [k for k in range(10) if k % 2]
print(odd)

CPU times: user 2 µs, sys: 0 ns, total: 2 µs
Wall time: 5.01 µs


[1, 3, 5, 7, 9]

### -- Exercise

To see differences in performance, do the follwoing:

- remove the `print()` statement
- crank up the size of the list, i.e. change range(10) to range(1000000).
- compare the **Wall time** of these cells

Now returning to the actual example: Run the slightly better code and observe that it produces the same output, just faster!

In [79]:
%%time

corpus = [] # inititialize an empty list where we will store the MOH reports

for p in moh_reports_paths: # iterate over the paths to MOH reports, p will take the value of each item in moh_reports_paths 
    with open(r) as in_doc: # make sure to close the document after opening it
        tokens = wordpunct_tokenize(in_doc.read().lower())
        corpus.extend([t for t in tokens if t.isalpha()]) # list comprehension    
print('collected', len(corpus),'tokens') # print number of tokens collected

collected 6641589 tokens
CPU times: user 3.92 s, sys: 236 ms, total: 4.16 s
Wall time: 4.19 s


After collecting all tokens in a `list` we can convert this of another data type, a NLTK `Text` object. The cell below shows the results of the conversion.

In [80]:
print(type(corpus))
nltk_corpus = nltk.text.Text(corpus) # convert the list of tokens to a nltk.text.Text object
print(type(nltk_corpus))

<class 'list'>
<class 'nltk.text.Text'>


Why is this useful? Well the `Text` object comes with many useful methods for corpus exploration. To inspect all the tools attached to a `Text` object, apply the `help()` function to `nltk_corpus` or (`help(nltk.text.Text)` would do the same trick). You have to scroll down a bit (ignore all methods starting with `__`).

In [83]:
help(nltk_corpus) # show methods attached to the nltk.text.Text object or nltk_corpus variable

Help on Text in module nltk.text object:

class Text(builtins.object)
 |  Text(tokens, name=None)
 |  
 |  A wrapper around a sequence of simple (string) tokens, which is
 |  intended to support initial exploration of texts (via the
 |  interactive console).  Its methods perform a variety of analyses
 |  on the text's contexts (e.g., counting, concordancing, collocation
 |  discovery), and display the results.  If you wish to write a
 |  program which makes use of these analyses, then you should bypass
 |  the ``Text`` class, and use the appropriate analysis function or
 |  class directly instead.
 |  
 |  A ``Text`` is typically initialized from a given document or
 |  corpus.  E.g.:
 |  
 |  >>> import nltk.corpus
 |  >>> from nltk.text import Text
 |  >>> moby = Text(nltk.corpus.gutenberg.words('melville-moby_dick.txt'))
 |  
 |  Methods defined here:
 |  
 |  __getitem__(self, i)
 |  
 |  __init__(self, tokens, name=None)
 |      Create a Text object.
 |      
 |      :param tokens

Let's have a closer look at `.concordance()`. According to the official documentation this method 
> Prints a concordance for ``word`` with the specified context window. Word matching is not case-sensitive.

It take multiple arguments:
    - word: query term
    - width: the context window, i.e. determines the number of character printed 
    - lines: determines the number of lines (i.e. KWIC examples) returns
The first line of the output states total number of hits for the query term (`Displaying * of * matches:`)

The example code below print the context of the word **"poor"**.

In [89]:
nltk_corpus.concordance('poor',width=100,lines=10) # print the context of poor, window = 100 character

Displaying 10 of 636 matches:
nt a renewal of a liquor licence in view of the poor conditions maintained at the establishment the 
e possible to assess the influence of a good or poor impression of training e g a poor impression ma
 of a good or poor impression of training e g a poor impression may sometimes act as a stimulus pre 
as of the programme and d the effect of good or poor experience although perhaps too early to do mor
nt a renewal of a liquor licence in view of the poor conditions maintained at the establishment the 
e possible to assess the influence of a good or poor impression of training e g a poor impression ma
 of a good or poor impression of training e g a poor impression may sometimes act as a stimulus pre 
as of the programme and d the effect of good or poor experience although perhaps too early to do mor
nt a renewal of a liquor licence in view of the poor conditions maintained at the establishment the 
e possible to assess the influence of a good or poor impressi

## --Exercise

Compare "poor" between City of Westminster and Poplar. 

**[TO DO: explain exercise]**

## 6.2 Collocations

While KWIC analysis is useful for investigating the context of words, it is a method that doesn't scale well: it helps with the close reading of around 100 words, but when examples run in the thousands it becomes more difficult. Collocations can help to quantify the semantics of term, or how the meaning of words is different betwen corpora or subsamples of a corpus.

Collocations, as explained in the AntConc section are multi-word expression containing words that tend to co-occur.

The NLTK `Text` object has `collocations()` function. Below we print and explain the documentation.

> collocations(self, num=20, window_size=2)
    Print collocations derived from the text, ignoring stopwords.
    
It has the following parameters:
> `:param num:` The maximum number of collocations to print.

The number of collocations to print (if not specified it will print 20)

> `:param window_size:` The number of tokens spanned by a collocation (default=2)

If `window_size=2` collocations will only include bigrams (words occuring next to each other). But sometimes we wish to include longer intervals, to make co-occurence of words withing a broader window more visible, this allows us to go beyond multiword expressions and study the distribution of words in a corpus more generally. For example, we could look if "men" and "women" are discussed in each other's context (within a span of 10), even if they don't appear next to each other. 

In [93]:
nltk_corpus.collocations(window_size=2)

public health; city council; family planning; child health; medical
officer; health inspectors; health department; table page; ante natal;
local authority; dental school; social services; social workers; legal
proceedings; social worker; live births; inner london; city hall; old
people; boys girls


In [94]:
nltk_corpus.collocations(window_size=5)

family planning; public inspectors; city council; public health; child
health; ante natal; boys girls; dental school; medical officer; local
authority; table page; live births; salmonella salmonella; malignant
neoplasm; boys boys; health inspectors; girls boys; health department;
social services; councillor mrs


While the `.collocations()` method is an easy tool for quickly computing collocations, it's functionality remains rather limited. The cells below will inspect the collocation functions of NLTK in a bit more detail, giving you a bit more power of and precision.

Before we start we import all the tools `nltk.collocations` provides. This is handled by the `import *`, similar to a wildcard, it matches and loads everthing in `nltk.collocations`.

In [104]:
import nltk
from nltk.collocations import *

Next we have to select an association measure this to compute the "strength" with which two tokens are attracted to each other. In general collocations are words that appear frequently together (within a certain window size), but are unlikely to appear in general (outside this window size). This explains why "the wine" is not a collocation while "red wine" is.

NLTK provides us with different measures, which you can print and investigate in more detail. Many of the functions refer to the classic NLP Handbook of Manning and Schütze, ["Foundations of statistical natural language processing"](https://nlp.stanford.edu/fsnlp/).

In [105]:
bigram_measures = nltk.collocations.BigramAssocMeasures()

In [106]:
help(bigram_measures)

Help on BigramAssocMeasures in module nltk.metrics.association object:

class BigramAssocMeasures(NgramAssocMeasures)
 |  A collection of bigram association measures. Each association measure
 |  is provided as a function with three arguments::
 |  
 |      bigram_score_fn(n_ii, (n_ix, n_xi), n_xx)
 |  
 |  The arguments constitute the marginals of a contingency table, counting
 |  the occurrences of particular events in a corpus. The letter i in the
 |  suffix refers to the appearance of the word in question, while x indicates
 |  the appearance of any word. Thus, for example:
 |  
 |      n_ii counts (w1, w2), i.e. the bigram being scored
 |      n_ix counts (w1, *)
 |      n_xi counts (*, w2)
 |      n_xx counts (*, *), i.e. any bigram
 |  
 |  This may be shown with respect to a contingency table::
 |  
 |              w1    ~w1
 |           ------ ------
 |       w2 | n_ii | n_oi | = n_xi
 |           ------ ------
 |      ~w2 | n_io | n_oo |
 |           ------ ------
 |         

In [107]:
help(bigram_measures.pmi)

Help on method pmi in module nltk.metrics.association:

pmi(*marginals) method of abc.ABCMeta instance
    Scores ngrams by pointwise mutual information, as in Manning and
    Schutze 5.4.



`pmi` is a rather straightforward metric, in the case of bigrams
- compute the total number of tokens in a corpus, assume this is `n` (3435)
- compute the probability of  `a` and `b` appearing as a bigram. If the bigram (a,b) occurs 10 times, the probability (P(a,b) is 10/3435)
- compuate the probability of observing `a` and `b`. For exampe a appears `30` times and b `45`, this becomes (30/3435) * (45/3435)
- log this value
![pmi](https://miro.medium.com/max/930/1*OoI8_cZQwYGJEUjzozBOCw.png)

In [108]:
from numpy import log2
nom = 10/3435
denom = (30/3435) * (45/3435)
mpi = log2(nom/denom)
mpi

4.6692787866546315

To rank collocations by their PMI scores, we use the `.from_words()` method to the `nltk_corpus` (or any list of tokens). The result of this operation is stored in `finder` which we can subsequently use for printing collocations. Note that the results below look somewhat strange, these aren't very meaningful collocates.

In [109]:
finder = BigramCollocationFinder.from_words(nltk_corpus)
finder.nbest(bigram_measures.pmi, 10) 

[('aeration', 'cycle'),
 ('albicans', 'vincents'),
 ('altogether', 'buckingham'),
 ('amplified', 'music'),
 ('anyone', 'demonstrating'),
 ('appendicitis', 'intestinal'),
 ('appreciable', 'drop'),
 ('arranges', 'placement'),
 ('artesia', 'adreno'),
 ('ashpits', 'dustbins')]

These results are rather spurious. If, for example `a` and `b` both appear only once and next to each other, the PMI score will be very high, but this is not necessarily a very meaningful collocation, more a rare artefact of the data.

We filter by ngram frequency, removing in our case all bigrams that appear less than 3 time with `.apply_freq_filter()` function.

In [111]:
help(finder.apply_freq_filter)

Help on method apply_freq_filter in module nltk.collocations:

apply_freq_filter(min_freq) method of nltk.collocations.BigramCollocationFinder instance
    Removes candidate ngrams which have frequency less than min_freq.



In [112]:
finder.apply_freq_filter(3)
finder.nbest(bigram_measures.pmi, 10)

[('aeration', 'cycle'),
 ('albicans', 'vincents'),
 ('altogether', 'buckingham'),
 ('amplified', 'music'),
 ('anyone', 'demonstrating'),
 ('appendicitis', 'intestinal'),
 ('appreciable', 'drop'),
 ('arranges', 'placement'),
 ('artesia', 'adreno'),
 ('ashpits', 'dustbins')]

In [113]:
finder.apply_freq_filter(10)
finder.nbest(bigram_measures.pmi, 10)

[('aeration', 'cycle'),
 ('albicans', 'vincents'),
 ('altogether', 'buckingham'),
 ('amplified', 'music'),
 ('anyone', 'demonstrating'),
 ('appendicitis', 'intestinal'),
 ('appreciable', 'drop'),
 ('arranges', 'placement'),
 ('artesia', 'adreno'),
 ('ashpits', 'dustbins')]

It is also possible to change the window size, but the larger the window size the longer the computation takes

In [115]:
finder = BigramCollocationFinder.from_words(nltk_corpus, window_size = 5)
finder.apply_freq_filter(10)
finder.nbest(bigram_measures.pmi, 10)

[('abs', 'hrs'),
 ('abuse', 'passers'),
 ('accent', 'preservation'),
 ('accum', 'buckeburg'),
 ('accumulation', 'salts'),
 ('acutely', 'incontinent'),
 ('adapt', 'outmoded'),
 ('adapt', 'wooden'),
 ('adheres', 'brick'),
 ('adjusting', 'reality')]

Lastly you can focus on collocations that contains a specific token, i.e. for example get all collocations with the token "poor".

In [117]:
#def token_filter(*w):
#     return 'poor' not in w

token_filter = lambda *w: 'poor' not in w

finder = BigramCollocationFinder.from_words(nltk_corpus)
finder.apply_ngram_filter(token_filter)
finder.nbest(bigram_measures.pmi, 10)

[('poor', 'impression'),
 ('poor', 'experience'),
 ('poor', 'conditions'),
 ('or', 'poor'),
 ('a', 'poor'),
 ('the', 'poor')]

### 6.3 Feature selection

The last section of this Notebook takes aims at contrasting corpora and find tokens (or word patterns) that distinguish on set of documents from another. This may help us discovering that is particular about the language of specific group (such as a political party) or period. We continue with the example of the MOsH reports, but compare the language of different boroughs, the affluent Westminster with the industrial, and considerable poorer Poplar.

The code below should look familiar but we made a few changes.



In [121]:
corpus = [] # save corpus here
labels = [] # save labels here


for r in moh_reports: # iterate over documents
    with open(r) as in_doc: # open document (also take care close it later)
        if 'westminster' in r.name.lower(): # check if westeminster appear in the file name
            labels.append(1) # if so, append 1 to labels
        else: # if not
            labels.append(0) # append 0 to labels

        corpus.append(in_doc.read().lower()) # append the lowercase document to corpus
  

HBox(children=(FloatProgress(value=0.0, max=159.0), HTML(value='')))




check number of labels and documents are equal

In [122]:
print(len(labels),len(corpus))

159 159


In [None]:
process text: lemmatize keep only adj and noun

In [None]:
install external library

In [165]:
!pip install TextFeatureSelection

Collecting TextFeatureSelection
  Downloading https://files.pythonhosted.org/packages/42/3d/351dcabf4198218a4b7421e6f6069eb089af6f5642e8fdd5d95f11904726/TextFeatureSelection-0.0.12-py3-none-any.whl
Installing collected packages: TextFeatureSelection
Successfully installed TextFeatureSelection-0.0.12


In [None]:
apply library

In [168]:
from TextFeatureSelection import TextFeatureSelection
fsOBJ=TextFeatureSelection(target=labels,input_doc_list=corpus)
result_df=fsOBJ.getScore()
result_df

Unnamed: 0,word list,word occurence count,Proportional Difference,Mutual Information,Chi Square,Information Gain
0,00,103,-0.009709,0.094959,2.463282,0.004326
1,000,149,0.073826,0.008605,0.150191,0.000266
2,000000,1,-1.000000,0.778445,1.185538,0.001507
3,0001,3,1.000000,-inf,2.595483,0.000000
4,000163,1,1.000000,-inf,0.854210,0.000000
...,...,...,...,...,...,...
42232,¾gallons,1,-1.000000,0.778445,1.185538,0.001507
42233,¾ths,1,-1.000000,0.778445,1.185538,0.001507
42234,ægis,1,1.000000,-inf,0.854210,0.000000
42235,æration,1,-1.000000,0.778445,1.185538,0.001507


In [None]:
inspect results

In [173]:
result_df[result_df['word occurence count'] > 5].sort_values('Chi Square',ascending=False)[:20]

Unnamed: 0,word list,word occurence count,Proportional Difference,Mutual Information,Chi Square,Information Gain
30606,pop,59,-1.0,0.778445,110.51589,0.184282
9432,bow,89,-0.640449,0.580268,106.152339,0.0
21070,horseferry,71,0.971831,-3.484235,102.313762,0.23972
8788,bessborough,67,1.0,-inf,98.289813,0.0
42219,zymotic,94,-0.553191,0.525609,93.326942,0.0
26433,millbank,62,1.0,-inf,86.266363,0.0
15282,dock,66,-0.787879,0.666327,85.911216,0.149069
41176,wes,64,0.96875,-3.380438,84.840438,0.205713
22037,india,67,-0.761194,0.65129,82.833472,0.144441
30188,pimlico,63,0.968254,-3.36469,82.552451,0.201071


In [170]:
help(result_df.sort_values)

Help on method sort_values in module pandas.core.frame:

sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last') method of pandas.core.frame.DataFrame instance
    Sort by the values along either axis.
    
    Parameters
    ----------
            by : str or list of str
                Name or list of names to sort by.
    
                - if `axis` is 0 or `'index'` then `by` may contain index
                  levels and/or column labels
                - if `axis` is 1 or `'columns'` then `by` may contain column
                  levels and/or index labels
    
                .. versionchanged:: 0.23.0
                   Allow specifying index or column level names.
    axis : {0 or 'index', 1 or 'columns'}, default 0
         Axis to be sorted.
    ascending : bool or list of bool, default True
         Sort ascending vs. descending. Specify list for multiple sort
         orders.  If this is a list of bools, must match the length of
      

## Fin.

### Appendix With Sklearn

In [151]:
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
from sklearn.feature_extraction.text import CountVectorizer 

In [156]:
vectorizer = CountVectorizer(min_df=5)
X = vectorizer.fit_transform(corpus)
feature_names = vectorizer.get_feature_names()


In [159]:
ch2 = SelectKBest(chi2, k=10)
X = ch2.fit_transform(X, labels)


In [160]:
selected = [(feature_names[i],ch2.scores_[i])for i
                    in ch2.get_support(indices=True)]
selected

[('borough', 6827.175533272762),
 ('bow', 6681.346216439548),
 ('bromley', 6861.134136366376),
 ('city', 4592.729181914567),
 ('east', 1499.0376786761663),
 ('poplar', 11888.857471790638),
 ('road', 8510.875738951223),
 ('see', 2314.6724275246893),
 ('street', 4330.436649540313),
 ('westminster', 5105.364636488248)]