A Jupyter notebook contains "cells" that can be "run" by pressing `shift`+`enter`.
You can run all cells by going to `Run` -> `Run All Cells`.
It's important that you run the first code cell (immediately below) at the start of each session: it imports the packages and functions that the rest of the notebook relies on, so nothing else will work until it has been run.
To set up a sentence, call the `nlp` function, pass it a string (the sentence), and assign the result to a variable (see the third cell for an example).
You can then pass this variable to the `visualise` function, which displays the dependency graph along with a table of each token's dependency relation, head and dependencies.
`visualise` also takes an optional second argument that sets the size of the dependency graph (the default is 100; a smaller number squishes the graph and a larger number widens it).

In [1]:
import spacy
import scispacy
from process_sentence import visualise
nlp = spacy.load("en_core_sci_scibert")

In [2]:
# Here's an example of how to use it
sentence = nlp("The function of the protein is to regulate the liver.")
visualise(sentence)
# you can alter the size of the dependency graph by passing a second parameter to visualise
sentence2 = nlp("This is a longer sentence that illustrates why you might want to reduce the size of the dependency graph.")
visualise(sentence2, 80) # to make it smaller if you want to fit a long sentence on the screen without having to scroll

TOKEN               DEPENDENCY RELATION           HEAD                DEPENDENCIES                            

The                 determiner                    function            []
function            nominal subject               regulate            [The, protein]
of                  case marking                  protein             []
the                 determiner                    protein             []
protein             modifier of nominal           function            [of, the]
is                  copula                        regulate            []
to                  marker                        regulate            []
regulate            None                          regulate            [function, is, to, liver, .]
the                 determiner                    liver               []
liver               direct object                 regulate            [the]
.                   punctuation                   regulate            []


TOKEN               DEPENDENCY RELATION           HEAD                DEPENDENCIES                            

This                nominal subject               sentence            []
is                  copula                        sentence            []
a                   determiner                    sentence            []
longer              adjectival modifier           sentence            []
sentence            None                          sentence            [This, is, a, longer, illustrates, .]
that                nominal subject               illustrates         []
illustrates         None                          sentence            [that, want]
why                 adverbial modifier            want                []
you                 nominal subject               want                []
might               auxiliary                     want                []
want                clausal complement            illustrates         [why, you, might, reduce]
to               

# Flowchart
The root/free morpheme is "function".
The verb "function" derives from the noun "function"; however, a sentence can be unpacked into either the noun form ("the function of *x* is to *y*") or the verb form ("*x* functions to *y*").

To simplify comparison of syntactically diverse constructions, we will use word conversion (specifically morphological derivation and inflection) to convert derivations of "function" to its root form. Morphological derivation converts between parts of speech (e.g. adjective -> noun), and inflection converts between number, tense and aspect (e.g. the plural "functions" -> "function"). The key consideration during conversion is that we maintain (and, where necessary, convert) the dependencies between "function" and the other parts of the sentence.
What I describe here is therefore a method for converting derivations of the word function to its base form (the verb or noun "function") while keeping track of, and converting where necessary, the form of the dependencies of "function". The idea is to rework the sentence so that semantic relations can be cast into a standard form.

Let's start with a (possibly non-exhaustive) list of derivative forms of "function":

- function (noun)
- functions (noun, plural)
- function (verb)
- functions (verb, third-person singular present)
- functional (adj)
- functionality (noun)
- functioning (gerund, acts as a noun)
- functioning (present participle, acts as an adj (maybe also an adv?))
- functioned (past participle, simple past)
- functionally (adverb)

(There is also the distinction between tense and aspect, but ultimately what matters for my purposes is whether the word acts as a noun, adjective, etc. For example, I am not going to distinguish between gerunds and present participles.)
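One way to operationalise this decision is a small lookup from spaCy's coarse part-of-speech tag (`token.pos_`) and dependency label (`token.dep_`) to the grammatical role the flowchart cares about. This is a hypothetical sketch: `function_role` is not part of `process_sentence`, and the set of labels treated as nominal positions is an assumption rather than a complete list.

```python
from typing import Optional

# Dependency labels treated as nominal positions (an assumption, not exhaustive).
NOMINAL_DEPS = {"nsubj", "nsubjpass", "dobj", "pobj"}


def function_role(text: str, pos: str, dep: str) -> Optional[str]:
    """Map a derivative of "function" to the role it plays in the sentence.

    `pos` and `dep` are spaCy-style coarse POS tags (token.pos_) and
    dependency labels (token.dep_). Returns 'noun', 'verb', 'adjective',
    'adverb', or None if `text` is not a form of "function".
    """
    word = text.lower()
    if not word.startswith("function"):
        return None
    if pos in ("NOUN", "PROPN"):
        return "noun"           # function, functions, functionality
    if pos == "ADJ":
        return "adjective"      # functional
    if pos == "ADV":
        return "adverb"         # functionally
    if pos == "VERB":
        # functioning/functioned: decide by attachment, ignoring tense/aspect
        if dep in ("amod", "acl"):
            return "adjective"  # participle modifying a noun
        if word == "functioning" and dep in NOMINAL_DEPS:
            return "noun"       # gerund in a nominal position
        return "verb"           # function, functions, functioned used as verbs
    return None
```

Deciding participles by how they are attached, rather than by their morphology, is what lets the flowchart ignore the gerund/participle distinction: "functioning" in "a functioning clock" would come out as an adjective if the parser attaches it via `amod`.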

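If I refer to function without quotes, then I am referring to all the derivatives of function that we might wish to analyze. Note that a word (word1) is a dependent of another word (word2) if the dependency arrow starts (its tail) at word2 and points (its arrowhead) at word1, i.e. word2 -> word1. Beware that the "head" of the arrow is the opposite of the syntactic head: in spaCy's terms, word2 is the syntactic head of word1 (word1's `.head` is word2).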
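To make the direction convention concrete, the sketch below encodes the parse of "The function of the protein is to regulate the liver." with dependency labels transcribed from the `visualise` output for this sentence further down the notebook ("nominal subject" = `nsubj`, "modifier of nominal" = `nmod`, and so on). `Tok` and `build` are hypothetical stand-ins so the example runs without the scispaCy model; on a real spaCy `Doc` the same lookups are `token.head` and `token.children`, and the root token is its own head.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Tok:
    """Stand-in for a spaCy Token (token.text, token.dep_, token.head, token.children)."""
    text: str
    dep: str
    head: Optional["Tok"] = field(default=None, repr=False)
    children: List["Tok"] = field(default_factory=list, repr=False)


def build(rows):
    """rows: (text, dep, head_index). The arrow head_index -> i makes token i
    a dependent of token head_index; the root is its own head, as in spaCy."""
    toks = [Tok(text, dep) for text, dep, _ in rows]
    for tok, (_, _, h) in zip(toks, rows):
        tok.head = toks[h]
        if tok.head is not tok:
            tok.head.children.append(tok)
    return toks


toks = build([("The", "det", 1), ("function", "nsubj", 5), ("of", "case", 4),
              ("the", "det", 4), ("protein", "nmod", 1), ("is", "ROOT", 5),
              ("to", "mark", 7), ("regulate", "xcomp", 5), ("the", "det", 9),
              ("liver", "dobj", 7), (".", "punct", 5)])
function = toks[1]
print(function.head.text)                    # is  (arrow: is -> function)
print([c.text for c in function.children])   # ['The', 'protein']
```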

- 1. Non-biological or technical senses of function
   - 1.1 Is function used in a biological context? If yes, go to step 1.2; if no, classify as either a mathematical, non-biological role or purpose, performing non-biological role or purpose, relational, social event, or programming sense of function[<sup>1</sup>](#fn1) and EXIT.
   - 1.2 Is function used in a technical sense?[<sup>2</sup>](#fn2) If yes, classify as **technical use**; if no, go to step 2.1.
- 2. Identify the biological item (*x*)
   - 2.1 Is function an adjective that modifies a noun (i.e. its dependency relation is [amod](https://universaldependencies.org/u/dep/amod.html); e.g. functional gene), a noun with an [nmod](https://universaldependencies.org/u/dep/nmod.html) dependency (e.g. functionality of the gene), a gerund or present/past participle (functioning/functioned, tagged as VERB in spaCy) acting as a noun or adjective (or an adverb modifying an adjective), or an adverb with an [advmod](https://universaldependencies.org/u/dep/advmod.html) dependency that modifies an adjective (e.g. functionally important gene)? If yes, go to step 2.3; if no, go to step 2.2.
   - 2.2 Is function a noun[<sup>3</sup>](#fn3)? If yes, go to step 2.4; if no, go to step 2.6.
   - 2.3 Follow the dependency arrow (head to tail) from function to its head noun. (In some cases, you may need to traverse multiple arrows.) Make a note of the head noun and go to step 2.5.
   - 2.4 Can you identify a direct nominal dependent of function (follow a dependency arrow from tail to head starting from function)? If yes, make a note of this noun and go to step 2.5[<sup>5</sup>](#fn5).
   - 2.5 Start from the previously identified noun. Is it a concrete noun that satisfies the "biological item condition"[<sup>6</sup>](#fn6),[<sup>7</sup>](#fn7)? If not, follow the dependency arrows from tail to head[<sup>4</sup>](#fn4) to see if you can identify a noun that does. If yes, record this noun (or noun phrase) as *x* and go to step 3.1; if no, classify as **unidentifiable item** and EXIT.
   - 2.6 Is function a (finite) verb? If yes, identify the direct subject of the verb (via `nsubj` dependency) and go to step 2.5; if no, go to step 2.7.
   - 2.7 Is function an adverb that modifies a verb (e.g. functionally regulates)? If yes, first identify the verb of which function is a dependent (via `advmod` dependency), then identify the direct subject of this verb (via `nsubj` dependency), and go to step 2.5; if no, classify as **unidentifiable item** and EXIT.
- 3. Identify the biological effect (*y*)[<sup>8</sup>](#fn8)
   - 3.1 Did you previously identify function as an adjective, noun, gerund/participle or adverb modifying an adjective? If yes, go to step 3.2; if no (function is a verb or an adverb that modifies a verb), go to step 3.4.
   - 3.2 Follow the dependency arrow (head to tail) from the head noun (either function or, if function is not the head noun, the head of which function is a dependent) to the finite verb. Is the dependency via `dobj`[<sup>9</sup>](#fn9)? If yes, go to step 5.1; if no, go to step 3.3.
   - 3.3 Is the verb that you identified by following the `nsubj` relationship transitive (i.e. it has a dependent via relation `dobj`)? If yes, record the finite verb and the phrase containing its `dobj` as *y* and go to step 4.1. If no, go to step 3.4.  
   - 3.4 Starting from the finite verb[<sup>10</sup>](#fn10), can you follow another dependency arrow (tail to head this time) via dependency relation `xcomp` to an infinitive, transitive verb? If yes, record the verb and the phrase containing its `dobj` as *y* and go to step 4.1; if no, go to step 5.1.
- 4. Candidates for biological activity/role/advantage/selected effect
   - 4.1 Can it be made to fit the following form[<sup>11</sup>](#fn11): the function of *x* is to *y* such that doing *y* in the past caused *x* to be selected for or maintained in a population (relative to an actual or counterfactual historical alternative or set of alternatives to *x*)? If yes, classify as **function as selected effect** and EXIT; if no, go to step 4.2.
   - 4.2 Can you identify a complex containing system *z* of the biological item *x*? There are a few clues that can help in identifying *z*. Can you answer the question "how does *x* do *y*?" or "for what is *x* doing *y* used?" Syntactically, you can look for an `acl` dependent of the direct object that, together with the verb, makes up *y*. You can also look for an `advcl` modifier of the verb. (Both of these provide extra information about *y*.) If yes, go to step 4.3; if no, go to step 4.5.
   - 4.3. Can it be unpacked into the following form: the function of *x* is to *y* in *z*, where *x* is a component of a complex system *z*, *z* is a Darwinian individual *i* (or a complex system within *i*), and *x* doing *y* in *z* is advantageous for *i* (relative to an explicitly identified or implied alternative to *x*)? If yes, classify as **function as biological advantage** and EXIT. If no, go to step 4.4.
   - 4.4. Can it be unpacked into the following form: the function of *x* is to *y* in *z*, where *x* is a component of a complex system *z*, and *z* is a Darwinian individual *i* (or a complex system within *i*)? If yes, classify as **function as biological role** and EXIT. If no, go to step 4.5.
   - 4.5. Can it be unpacked into the following form: the function of *x* is to *y*, where *x* is a component (or subcomponent) of a Darwinian individual *i*? If yes, classify as **function as biological activity** and EXIT. If no, classify as **unidentifiable meaning: effect specified** and EXIT.
- 5. Candidates for function as performing or producing activity/role/advantage/selected effects
   - 5.1. Substitute "*x* function" with "how well *x* performs (or works)", or "*x* works", in the raw sentence (not the unpacked sentence). If there is loss of meaning or ambiguity, classify as **unidentifiable meaning: effect unspecified** and EXIT; if there is no loss of meaning and no ambiguity, go to step 5.2.
   - 5.2. Does the sentence's language indicate that the biological role has been shaped by selection? If so, and if *x* is a component of a Darwinian individual *i*, classify as **function as performing selected effect** and EXIT. If not, go to step 5.3.
   - 5.3. Can you identify *z* (complex containing system) in the unpacked sentence (see step 4.2)? If yes, go to step 5.4; if not, and if *x* is a component of a Darwinian individual *i*, classify as **function as producing biological activity** and EXIT.
   - 5.4. Is the performance of the biological role associated with how well the complex system (or the organism containing the complex system) performs (e.g. "Liver functioning affects human health")? If yes, and if *x* is a component of a Darwinian individual *i* and *z* is a Darwinian individual *i* or a complex system within *i*, classify as **function as performing biological advantage**. If no (e.g. "Liver functioning is important for the hepatic system"), and if *x* is a component of a Darwinian individual *i* and *z* is an *i* or a complex system within *i*, classify as **function as producing biological role**. EXIT.
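The purely syntactic parts of steps 2.1 to 2.7 can be sketched as a function that returns the noun to be checked against the biological item condition in step 2.5 (that check, like step 1, stays a human judgement). This is a minimal sketch under stated assumptions: `Tok`, `build` and the helper names are hypothetical, `Tok` stands in for a spaCy `Token` so the code runs without the scispaCy model, the POS tags on the example parse were added by hand (the dependency labels come from the `visualise` output for this sentence further down the notebook), and gerunds acting as nouns and indirect nominal dependents (fn 4) are not handled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Tok:
    """Stand-in for a spaCy Token (token.text, token.pos_, token.dep_, ...)."""
    text: str
    pos: str
    dep: str
    head: Optional["Tok"] = field(default=None, repr=False)
    children: List["Tok"] = field(default_factory=list, repr=False)


def build(rows):
    """rows: (text, pos, dep, head_index); the root is its own head, as in spaCy."""
    toks = [Tok(text, pos, dep) for text, pos, dep, _ in rows]
    for tok, (_, _, _, h) in zip(toks, rows):
        tok.head = toks[h]
        if tok.head is not tok:
            tok.head.children.append(tok)
    return toks


NOUNS = ("NOUN", "PROPN")
NOMINAL_DEPS = ("nmod", "compound", "poss")  # assumed label set, cf. fn 4


def child_with(tok: Tok, deps) -> Optional[Tok]:
    """First dependent of `tok` attached by one of `deps`, if any."""
    return next((c for c in tok.children if c.dep in deps), None)


def head_noun(tok: Tok, max_steps: int = 5) -> Optional[Tok]:
    """Step 2.3: climb from `tok` through its heads until a noun is reached."""
    for _ in range(max_steps):
        if tok.head is tok:          # reached the root without finding a noun
            return None
        tok = tok.head
        if tok.pos in NOUNS:
            return tok
    return None


def candidate_item(fn: Tok) -> Optional[Tok]:
    """Return the noun to test against the biological item condition (step 2.5)."""
    if fn.dep in ("amod", "acl") or (fn.pos == "ADV" and fn.head.pos == "ADJ"):
        return head_noun(fn)                       # 2.1 -> 2.3: modifier
    if fn.pos in NOUNS:
        return child_with(fn, NOMINAL_DEPS)        # 2.2 -> 2.4: noun
    if fn.pos == "VERB":
        return child_with(fn, ("nsubj",))          # 2.6: verb
    if fn.pos == "ADV" and fn.head.pos == "VERB":
        return child_with(fn.head, ("nsubj",))     # 2.7: adverb on a verb
    return None                                    # unidentifiable item


rows = [("The", "DET", "det", 1), ("function", "NOUN", "nsubj", 5),
        ("of", "ADP", "case", 4), ("the", "DET", "det", 4),
        ("protein", "NOUN", "nmod", 1), ("is", "AUX", "ROOT", 5),
        ("to", "PART", "mark", 7), ("regulate", "VERB", "xcomp", 5),
        ("the", "DET", "det", 9), ("liver", "NOUN", "dobj", 7),
        (".", "PUNCT", "punct", 5)]
toks = build(rows)
print(candidate_item(toks[1]).text)   # protein
```

The `max_steps` guard reflects the note in step 2.3 that you may need to traverse several arrows; the limit of 5 is arbitrary.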

<span id="fn1">fn 1:</span> I am not going to provide guidelines on how to classify the non-biological senses: (i) they are less ambiguous (for humans at least); (ii) if one uses a sufficiently narrow corpus, they will rarely be encountered. The two exceptions, which would need to be dealt with in a fully automated system, are function as programming routine/method and function as mathematical relation, as these will be encountered in the biological literature.
    
<span id="fn2">fn 2:</span> This classification covers uses such as "functional ecology", "functional biology", "functional connectivity", etc.; it's clearly wrong to unpack these as "ecology functions to do..." and so on. They have effectively become compound nouns with their own meaning (influenced heavily by historical and technical considerations). Clues that you are dealing with a technical use include: repeated use of the same phrase within the same paper (although this is a poor indicator for sentence classification, as the annotator will be dealing with isolated sentences); the phrase being defined in the paper (or used as undefined jargon that the reader is expected to know); and definitions or articles on Wikipedia or in Google search results (obviously vague, but perhaps the most useful guideline, as this is a very good indicator that a phrase refers to a technical concept within a particular subfield).
    
<span id="fn3">fn 3:</span> Function might be used as a noun by itself (e.g. "The function of the gene...") or it might be the head of a nominal phrase (e.g. "Protein function is..."). (I do not believe it can be used as a nominal modifier in a nominal phrase without being the head, i.e. I don't think it can play the grammatical role that "protein" plays in "protein function is important"; let me know if you encounter an example of this.) The noun (or nominal phrase) might be the subject (dependency `nsubj`) (e.g. "Protein function is..."), but it might also be in the predicate (in the traditional-grammar sense where there is only a subject and a predicate), e.g. as an object in "The promoter regulates the functional response...".

<span id="fn4">fn 4:</span> More specifically, can you identify a concrete (see [<sup>6</sup>](#fn6)) nominal dependent of function? This might be a nominal modifier (via [nmod](https://universaldependencies.org/u/dep/nmod.html); e.g. "protein" in "the function of the protein") or a compound noun (via [compound](http://universaldependencies.org/docs/u/dep/compound.html)). It might be a direct dependent (i.e. follow one dependency arrow from function to its nominal dependent) or an indirect one (transitively follow multiple arrows from function to its nominal dependent, possibly passing through other parts of speech such as a verb, e.g. via `acl`, in which a verb phrase acts like an adjective).

<span id="fn5">fn 5:</span> Note that there might be edge cases here, such as the phrase "liver protein function", in which function has two `compound` dependents (and instead of just "protein" we want "liver protein"). Likewise, we might find something like "important protein function", in which "important" is an `amod` dependent of function; there might be some instances in which we want to hang on to such modifiers (though not in this case).
    
<span id="fn6">fn 6:</span> In order to be a biological item *x*, the noun must be something that can qualify as a "character" or "trait" (variant of a character) of a Darwinian individual. This includes obvious concrete nouns like "heart", which refer to a physical item, but also nouns that cannot be identified by the senses but refer to a concrete concept (e.g. "gene", in the sense of "beanbag genetics", is used to refer to the concept of an atomic unit of DNA that has a well-defined phenotypic effect; this is a sufficiently concrete concept to qualify as a character even if it does not exactly map onto a physical piece of DNA). As a heuristic, if it makes sense to say "The function of *x* is...", then *x* is a candidate for a biological item.
    
<span id="fn7">fn 7:</span> Once you have identified the concrete head noun, follow its dependency arrow(s) from tail to head (if they exist) to build up the complete noun phrase of *x*. Identify modifiers (e.g. `compound`, `nmod`) that more completely describe the item. For example, in "the gene's DNA sequence", sequence is the head noun, DNA is a `compound` dependent of sequence, and gene is an `nmod:poss` dependent of sequence. Here *x* is best identified as "gene's DNA sequence". Likewise, in "the DNA sequence of the gene", DNA is again a `compound` dependent and gene is an `nmod` dependent, so again we identify *x* as "DNA sequence of the gene" (note that I consider "gene's DNA sequence" and "DNA sequence of the gene" to be semantically identical).
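The phrase-building described in fn 7 can be sketched as collecting the subtrees of the head noun's `compound`, `nmod` and possessive dependents and reading them off in sentence order. In this hypothetical sketch, `Tok`, `build` and `item_phrase` are invented names, `Tok` stands in for a spaCy `Token` (spaCy offers `token.i`, `token.whitespace_` and `token.subtree` for the same purpose), the two parses are hand-built from the labels described in the footnote rather than produced by the model, and dropping a phrase-initial determiner is an assumption chosen so that both examples yield the forms given above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Tok:
    """Stand-in for a spaCy Token: i, text, dep_, whitespace_, head, children."""
    i: int
    text: str
    dep: str
    ws: str = " "                      # trailing whitespace, like token.whitespace_
    head: Optional["Tok"] = field(default=None, repr=False)
    children: List["Tok"] = field(default_factory=list, repr=False)


def build(rows):
    """rows: (text, dep, head_index[, trailing_ws]); the root is its own head."""
    toks = [Tok(i, r[0], r[1], *r[3:]) for i, r in enumerate(rows)]
    for tok, row in zip(toks, rows):
        tok.head = toks[row[2]]
        if tok.head is not tok:
            tok.head.children.append(tok)
    return toks


# Assumed label set: fn 7 uses UD's nmod:poss; spaCy's English models use poss.
MODIFIER_DEPS = {"compound", "nmod", "nmod:poss", "poss"}


def subtree(tok):
    """Yield `tok` and everything that depends on it."""
    yield tok
    for child in tok.children:
        yield from subtree(child)


def item_phrase(noun: Tok) -> str:
    """Expand the head noun of x with its compound/nmod/possessive modifiers."""
    toks = [noun]
    for child in noun.children:
        if child.dep in MODIFIER_DEPS:
            toks.extend(subtree(child))
    toks.sort(key=lambda t: t.i)
    if toks[0].dep == "det":           # drop a phrase-initial article
        toks = toks[1:]
    return "".join(t.text + t.ws for t in toks).strip()


a = build([("the", "det", 1), ("gene", "poss", 4, ""), ("'s", "case", 1),
           ("DNA", "compound", 4), ("sequence", "ROOT", 4)])
b = build([("the", "det", 2), ("DNA", "compound", 2), ("sequence", "ROOT", 2),
           ("of", "case", 5), ("the", "det", 5), ("gene", "nmod", 2)])
print(item_phrase(a[4]))   # gene's DNA sequence
print(item_phrase(b[2]))   # DNA sequence of the gene
```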
    
<span id="fn8">fn 8:</span> Some notes about *y* (mainly for my own working out). We unpack either as "The function of *x* is to *y* in *z*" or as "*x* functions to *y* in *z*". In both cases, it seems clear that *y* should be a verb or an adjective formed from a verb (a present participle). Examples of the former are "The gene functions to regulate..." (where *y* is the infinitive verb "regulate"), "The functional gene regulates..." and "Protein function regulates..." (where *y* is the finite verb "regulates"). An example of the latter is "The transcribing gene functions to..." (where *y* is the present participle "transcribing", which modifies *x* (gene)). spaCy classifies present participles as `VERB`, so I think this makes it fairly simple. If function is an `nsubj` or an `amod` of an `nsubj`, then I think we want either the finite verb (e.g. "the functional gene **regulates**") or the infinitive verb via `xcomp` (e.g. "the functional gene acts **to regulate**"). If function is the verb, then I think we want either the `xcomp` dependent (e.g. "the gene functions **to regulate**") or the verb acting as an adjective that modifies *x* (e.g. "the **transcribing** gene functions to"). There are no doubt other cases, but I think these are the main ones (and I probably just need to get Stefan to compile a list of exceptions). A question here is whether the `dobj` of the verb that we assign to *y* is part of *y* or is the containing system *z*. Take, for example, "the gene functions to regulate the liver". Is *y* "regulate" and *z* "the liver", or is *y* "regulate the liver"? The example that I used in the definition, "The function of the genetic sequence is to produce an siRNA transcript, which in turn acts to degrade mRNA.", indicates that the `dobj` is part of *y*.
The function of the genetic sequence is not simply to "produce" but to "produce an siRNA transcript"; likewise, the function of the protein is not simply to "regulate" but to "regulate the liver". This seems right. So I can simply extend *y* to include the `dobj` of the verb. This does not resolve how to identify *z*, though; I think this will be based around `acl` modifiers of the `dobj` or `advcl` modifiers of the verb, as these provide extra information about *y*, which might include an answer to "how/where does *x* do *y*?" and so identify *z*.
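Steps 3.2 to 3.4, together with the extension of *y* to the `dobj` phrase and the `acl`/`advcl` clues for *z* discussed here, can be sketched as follows. As in the earlier sketches, `Tok`, `build` and all helper names are hypothetical, only the `nsubj` route of step 3.2 is covered, and the two parses are transcribed from the two `visualise` outputs for "The function of the protein is to regulate the liver." in this notebook, which attach `regulate` differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Tok:
    """Stand-in for a spaCy Token: i, text, dep_, head, children."""
    i: int
    text: str
    dep: str
    head: Optional["Tok"] = field(default=None, repr=False)
    children: List["Tok"] = field(default_factory=list, repr=False)


def build(rows):
    """rows: (text, dep, head_index); the root is its own head, as in spaCy."""
    toks = [Tok(i, text, dep) for i, (text, dep, _) in enumerate(rows)]
    for tok, (_, _, h) in zip(toks, rows):
        tok.head = toks[h]
        if tok.head is not tok:
            tok.head.children.append(tok)
    return toks


def child_with(tok, deps):
    return next((c for c in tok.children if c.dep in deps), None)


def subtree(tok):
    yield tok
    for c in tok.children:
        yield from subtree(c)


def finite_verb(noun: Tok) -> Optional[Tok]:
    """Step 3.2: from the head noun to its verb; None if not an nsubj (e.g. a dobj, fn 9)."""
    return noun.head if noun.dep in ("nsubj", "nsubjpass") else None


def find_y(verb: Optional[Tok]) -> Optional[Tok]:
    """Steps 3.3-3.4: return the verb that heads y, or None (go to step 5.1)."""
    if verb is None:
        return None
    if child_with(verb, ("dobj",)) is not None:
        return verb                                  # 3.3: transitive finite verb
    xcomp = child_with(verb, ("xcomp",))
    if xcomp is not None and child_with(xcomp, ("dobj",)) is not None:
        return xcomp                                 # 3.4: transitive infinitive
    return None


def y_text(verb: Tok) -> str:
    """y = the verb plus the phrase headed by its dobj (fn 8)."""
    obj = child_with(verb, ("dobj",))
    return " ".join(t.text for t in sorted([verb, *subtree(obj)], key=lambda t: t.i))


def z_clues(verb: Tok) -> List[str]:
    """Step 4.2: acl dependents of the dobj and advcl dependents of the verb."""
    obj = child_with(verb, ("dobj",))
    clues = [c for c in obj.children if c.dep == "acl"] if obj else []
    clues += [c for c in verb.children if c.dep == "advcl"]
    return [" ".join(t.text for t in sorted(subtree(c), key=lambda t: t.i)) for c in clues]


parse2 = build([("The", "det", 1), ("function", "nsubj", 7), ("of", "case", 4),
                ("the", "det", 4), ("protein", "nmod", 1), ("is", "cop", 7),
                ("to", "mark", 7), ("regulate", "ROOT", 7), ("the", "det", 9),
                ("liver", "dobj", 7), (".", "punct", 7)])
parse3 = build([("The", "det", 1), ("function", "nsubj", 5), ("of", "case", 4),
                ("the", "det", 4), ("protein", "nmod", 1), ("is", "ROOT", 5),
                ("to", "mark", 7), ("regulate", "xcomp", 5), ("the", "det", 9),
                ("liver", "dobj", 7), (".", "punct", 5)])
for toks in (parse2, parse3):
    y_verb = find_y(finite_verb(toks[1]))
    print(y_text(y_verb), z_clues(y_verb))   # regulate the liver []
```

Both parses yield the same *y* ("regulate the liver"), once through step 3.3 and once through the `xcomp` route of step 3.4, and neither offers an `acl` or `advcl` clue for *z*.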
   
<span id="fn9">fn 9:</span> Since *x* is the subject, *y* needs to be something that *x* does (as opposed to something that is done to *x*). If *x* is related to the finite verb by `dobj`, this implies that *x* receives the action *y* rather than performs it. Thus, I am working on the hypothesis that when *x* is a `dobj`, *y* cannot be identified.
   
<span id="fn10">fn 10:</span> To clarify, this is the finite verb of the independent clause in which function is present (not necessarily the main verb of the sentence). If function is in the nominal subject (or a modifier thereof), then this is the finite verb that was identified in step 3.2. Otherwise, function should be a verb or an adverb that modifies a verb.

<span id="fn11">fn 11:</span> Selected effect is much less clear than biological effect/role/advantage and we will need to come up with some guidelines that are more field-specific and technical than grammatical.

In [3]:
# you can enter your own examples below (probably best to use one cell per example)
# make sure you save before shutting down (File -> Shut Down); you can then use Ctrl-C in the terminal to stop the docker container
sentence = nlp("The function of the protein is to regulate the liver.")
visualise(sentence,80)

TOKEN               DEPENDENCY RELATION           HEAD                DEPENDENCIES                            

The                 determiner                    function            []
function            nominal subject               is                  [The, protein]
of                  case marking                  protein             []
the                 determiner                    protein             []
protein             modifier of nominal           function            [of, the]
is                  None                          is                  [function, regulate, .]
to                  marker                        regulate            []
regulate            open clausal complement       is                  [to, liver]
the                 determiner                    liver               []
liver               direct object                 regulate            [the]
.                   punctuation                   is                  []


In [7]:
# Below is a selection of sentences (numbered by ID) that were flagged as 'interesting cases' in the 
# original coding exercise (Dec 2019, original list of 16 function definitions). The initial 
# classifications and the new classification using the flowchart above are given.

# Causal role (SG,ZW); Biological advantage (JC)
# unidentifiable item
# sentence_6 = nlp("Here, we show that local changes in WDR function in Arabidopsis family members can be attributed to mutations of key residues followed by positive selection.")

# Selected effects (SG,ZW); Causal role (JC) 
# 1. biological activity (IF 'night wakings' is replaced by, say, 'proteins'. Otherwise 'night wakings function' is wrongly parsed as a compound phrase)
# 2. unidentifiable item
# sentence_200 = nlp("My comments on Haig's updating of Blurton Jones and da Costa's hypothesis on the effects of infant night wakings (i.e. that night wakings function to prolong inter-birth intervals through nursing induced suppression of ovulation) will focus on its implications for infant sleep architecture and sleep function in both infants and adults more generally.")

# Biological advantage (SG); N/A (JC); Causal role (ZW)
# unidentifiable item
# sentence_207 = nlp("However, as IncRNAs are predominantly defined by exclusion criteria, the set of genes annotated as IncRNAs includes many distinct subgroups, exemplifying diverse structural and, presumably, functional characteristics.")

# Biological advantage (SG); Causal role (JC,ZW)
# unidentifiable item
# sentence_211 = nlp("Accordingly, as long as their levels are properly maintained, transcribing these incRNAs from a different genomic location or supplanting them into the system should not interfere with their function (that is, their loss of function can be rescued by their expression from exogenous locations).")

# Biological advantage (SG); Causal role (JC); function, purpose, role, use (ZW)
# unidentifiable item
# sentence_213 = nlp("In addition, the fairly low levels at which IncRNAs are generally expressed, oftentimes just a few molecules per cell, naturally favours a cis mechanism of action, as diffusion or transport to other cellular compartments would render these transcripts too diluted to mediate a plausible function.")

# Biological advantage (SG), Causal role (JC), N/A (ZW)
# unidentifiable item
# sentence_272 = nlp("Evidence is surveyed from studies of rats, nonhuman primates, and humans to suggest that prefrontal dopamine has specific functions in attentional control and working memory, mediated mainly through the D1 receptor, whereas manipulations of serotonin are shown by contrast to affect reversal learning in monkeys and human volunteers and measures of impulsivity in rats.")

# Biological advantage (SG), Causal role (JC,ZW)
# unidentifiable item
# sentence_274 = nlp("Prolonged or severe periods of stress and fatigue also lead to inflexible or unfocused cognitive function.")

# visualise(sentence_274,150)

TOKEN               DEPENDENCY RELATION           HEAD                DEPENDENCIES                            

Prolonged           adjectival modifier           periods             [or, severe]
or                  coordinating conjunction      Prolonged           []
severe              conjunct                      Prolonged           []
periods             nominal subject               lead                [Prolonged, stress]
of                  case marking                  stress              []
stress              modifier of nominal           periods             [of, and, fatigue]
and                 coordinating conjunction      stress              []
fatigue             conjunct                      stress              []
also                adverbial modifier            lead                []
lead                None                          lead                [periods, also, to, .]
to                  prepositional modifier        lead                [function]
inflexible    

In [3]:
# Comparison with a selection of sentences from the spreadsheet 'Adv + SE' from Feb 2019.

# Biological advantage
# Performing biological advantage
# sentence_4 = nlp("Siderophores (yellow-shaded zones) of pathogenic bacteria (green cells) can function as virulence factors and damage host tissue and promote pathogen growth (left lung).")

# Selected effect
# Performing selected effect
# Notes: 
# 2.3. Dependency arrow followed from tail to head not head to tail.
# 3.2. Unable to follow as directed due to no 'nsubj' dependency. ?
# sentence_6 = nlp("While mechanistic knowledge on siderophores grew immensely, evolutionary questions fell behind, probably because of the obvious physiological function of siderophores, having evolved as a means for bacteria to obtain iron for metabolism.")

# Biological advantage
# unidentifiable item
# sentence_10 = nlp("In the present study, we demonstrate that the spines in D. bipectinata promote copulatory success, and not insemination or fertilization, contrary to the post-insemination sexual selection hypothesis for genital trait function and evolution.")

# Producing biological activity
# unidentifiable item
# sentence_11 = nlp("Although the fraction of lncRNAs that are functional that is, confer any type of fitness advantage - is not yet known, even the most modest estimates place this number at hundreds of transcripts.")

# Biological advantage
# unidentifiable item
# sentence_12 = nlp("Indeed, genotype-driven delineation of MCs has rekindled an emphasis on the need for deep-phenotyping in families if we are to achieve the goal of understanding genome function and more importantly, its links to human disease.")

# Biological role
# Note: Unable to follow as directed due to no 'nsubj' dependency. ?
# sentence_13 = nlp("Collectively, these integrative analyses provide further evidence that DNA methylation events by CamA may directly and/or indirectly affect the expression of multiple genes involved in the in vivo colonization and biofilm formation of C. difficile and inspire future studies to elucidate the mechanisms that underlie the functional roles of CAAAAA methylation in the pathogenicity of C. difficile.")

# Selected effect
# unidentifiable item
# Note: spaCy identifies 'function' as a NOUN, though it appears to be a verb
# sentence_15 = nlp("If properties and functions of AS/REM are better explained as due to the influence of genes of paternal origin (rather than overall protection of the organism), then AS/REM sleep in the infant should, according to genomic conflict theory, function to extract resources from the mother consistent with getting paternal line genes into the next generation.")

# visualise(sentence_15,150)

TOKEN               DEPENDENCY RELATION           HEAD                DEPENDENCIES                            

If                  marker                        explained           []
properties          nominal subject (passive)     explained           [and, functions]
and                 coordinating conjunction      properties          []
functions           conjunct                      properties          [AS/REM]
of                  case marking                  AS/REM              []
AS/REM              modifier of nominal           functions           [of]
are                 auxiliary (passive)           explained           []
better              adverbial modifier            explained           []
explained           None                          explained           [If, properties, are, better, influence, ), ,, sleep, should, ,, according, ,, function, .]
as                  case marking                  influence           [due]
due                 None                    

In [8]:
# Some arbitrary sentences from the original corpus
# sentence_1 = nlp("A central theory in evolutionary developmental biology is that functional novelty arises through changes to the regulation and expression, both spatially and temporally, of otherwise well-conserved proteins.")
# sentence_2 = nlp("Here, we investigate the WD-repeat (WDR) protein family to understand how new functions can arise in highly conserved families of transcriptional regulators.")
# sentence_3 = nlp("In the MBW complex, the WDR protein functions as a scaffold, on which the DNA-binding MYB and bHLH proteins interact to generate the transcriptional complex.")
# sentence_4 = nlp("The proteins encoded by these genes function in the transcriptional regulation of the central circadian clock component CIRCADIAN CLOCK ASSOCIATED 1 (CCA1).")
# sentence_5 = nlp("Their sequence similarity to TTG1 could suggest that these very different functions in the plant are the result of changes to the regulation of these WDR genes, rather than functional changes to the proteins they encode.")
# sentence_7 = nlp("These results shed new light on protein functional evolution through small changes, and point to a much more significant role than was previously suspected for this particular protein family in circadian regulation.")
# sentence_8 = nlp("Protein function diverges in the TTG1-like WDR protein clade.")
# sentence_9 = nlp("LWD1 and LWD2 have been described to have a very different function to TTG1, acting as scaffolds for the transcriptional regulators functioning in the circadian oscillator.")
# sentence_10 = nlp("We were interested in determining whether this different function of proteins with such high similarity depends on differential regulation.")

# sentence_12 = nlp("To test this hypothesis, we first investigated whether these two proteins have a similar function to TTG1 through a transgenic rescue test by ectopic expression in the ttg1-1 mutant.")
# sentence_65 = nlp("T-tests were used to explore associations between exercise and function.")
# sentence_83 = nlp("This transgene is functional and forms a Polycomb domain.")
# sentence_101 = nlp(" Here, we describe a laser ablation technique for high-precision manipulation of microscale body parts of insects, and employ it to discern the adaptive function of a rapidly evolving and taxonomically important genital trait: the intromittent claw-like genital spines of male Drosophila bipectinata Duda.")
# sentence_206 = nlp("Cis-acting lncRNAs, which constitute a substantial fraction of lncRNAs with an attributed function, regulate gene expression in a manner dependent on the location of their own sites of transcription, at varying distances from their targets in the linear genome.")

# sentence_298 = nlp("High-throughput transcriptomics and metabolomics show that the liver has independent circadian functions specific for metabolic processes such as the NAD+ salvage pathway and glycogen turnover.")
# sentence_299 = nlp("Hence, full circadian function in the liver depends on signals emanating from other clocks, and light contributes to tissue-autonomous clock function.")
# sentence_300 = nlp("Nevertheless, some cyclic transcripts persist in the absence of a functioning clock, suggesting that alternative mechanisms contribute to fluctuations in the liver.")
# sentence_301 = nlp("Tissue-specific clock ablation has been instrumental in identifying the functions of peripheral clocks; however, they have not allowed assessment of their degree of autonomy.")
# sentence_302 = nlp("We demonstrate that the liver is intrinsically capable of clock function even in absence of functioning clocks in all other tissues.")
# sentence_303 = nlp("Last, lack of circadian rhythms in liver-RE mice maintained under constant darkness reveals a critical role of the light-dark (LD) cycle on tissue-autonomous function.")
# sentence_304 = nlp("Thus, system-wide functional clocks are not required to direct liver BMAL1 expression.")
# sentence_305 = nlp("The other 81% of WT metabolites not oscillating in Liver-RE were central to liver function, including xenobiotic detoxification, a vast array of lipids for synthesis, oxidation and membranous function, purine and pyrimidine nucleotides, cofactors, and vitamin metabolites.")
# sentence_306 = nlp("To determine functional coherence of metabolome and transcriptome, we performed integrated pathway enrichment analysis using IMPaLA.")
# sentence_307 = nlp("Strikingly, 2py and 4py are normalized to WT levels in Liver-RE, signifying proper salvage pathway function and recycling of NAM to NAD+ (Figure 5A).")

# visualise(sentence_5,150)

TOKEN               DEPENDENCY RELATION           HEAD                DEPENDENCIES                            

Their               None                          similarity          []
sequence            compound                      similarity          []
similarity          nominal subject               suggest             [Their, sequence, TTG1]
to                  case marking                  TTG1                []
TTG1                modifier of nominal           similarity          [to]
could               auxiliary                     suggest             []
suggest             None                          suggest             [similarity, could, encode, .]
that                marker                        are                 []
these               determiner                    functions           []
very                adverbial modifier            different           []
different           adjectival modifier           functions           [very]
functions           nominal su
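The tables printed by `visualise` are whitespace-aligned, with at least two spaces between columns, so a saved output can be turned back into records (for example, to compare parses across runs). A minimal sketch, assuming that layout holds; a token or label containing two consecutive spaces would break it. `parse_visualise_table` is a hypothetical helper, not part of `process_sentence`. Rows cut off by the notebook export, like the last row above, are skipped rather than guessed.

```python
import re


def parse_visualise_table(text):
    """Parse a printed visualise() table into (token, relation, head, deps) tuples.

    Assumes columns are separated by runs of two or more spaces.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the header row.
        if not line or line.startswith("TOKEN"):
            continue
        parts = re.split(r"\s{2,}", line)
        # Skip rows truncated by the export (missing columns or an unclosed list).
        if len(parts) != 4 or not parts[3].endswith("]"):
            continue
        token, relation, head, deps = parts
        inner = deps[1:-1]  # drop the surrounding brackets
        dependents = inner.split(", ") if inner else []
        rows.append((token, None if relation == "None" else relation, head, dependents))
    return rows


sample = """
TOKEN               DEPENDENCY RELATION           HEAD                DEPENDENCIES

Their               None                          similarity          []
sequence            compound                      similarity          []
similarity          nominal subject               suggest             [Their, sequence, TTG1]
functions           nominal su
"""

for row in parse_visualise_table(sample):
    print(row)
```

The truncated `functions` row is dropped, so only the three complete rows are returned.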

In [3]:
# Simple exemplar sentences
# sentence_1 = nlp("The function of the gene is to produce an RNA transcript.")
# sentence_2 = nlp("The gene functions to produce an RNA transcript.")
# sentence_3 = nlp("The function of the allele is to produce an RNA transcript, which produces a protein that acts to improve metabolic capacity.")
# sentence_4 = nlp("The allele functions to produce an RNA transcript, which produces a protein that acts to improve metabolic capacity.")
# sentence_5 = nlp("The function of zebra stripes is to deter biting insects.")
# sentence_6 = nlp("The zebra's stripes function to deter biting insects.")
# visualise(sentence_3)

TOKEN               DEPENDENCY RELATION           HEAD                DEPENDENCIES                            

The                 determiner                    function            []
function            nominal subject               is                  [The, allele]
of                  case marking                  allele              []
the                 determiner                    allele              []
allele              modifier of nominal           function            [of, the]
is                  None                          is                  [function, produce, .]
to                  marker                        produce             []
produce             open clausal complement       is                  [to, transcript]
an                  determiner                    transcript          []
RNA                 compound                      transcript          []
transcript          direct object                 produce             [an, RNA, ,, produces]
,            
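The output above shows how the nominal construction "The function of X is to Y" is parsed: `function` is the nominal subject of `is`, the item (`allele`) attaches to `function` as a modifier of nominal, and the activity (`produce`) is an open clausal complement of `is`. The parser does not always produce this shape, though: in the `regulate` example at the top of the notebook, `is` is tagged as a copula and `regulate` is the root. A minimal sketch that reads the (item, activity) frame off either shape, assuming the rows are available as (token, relation, head) tuples copied from the tables. It matches heads by their text, which is only safe when each word occurs once; with the spaCy `Doc` itself you would follow `token.head` and `token.children` instead. `nominal_function_frame` is a hypothetical helper.

```python
# (token, relation, head) rows copied from the sentence_3 output above.
ALLELE_ROWS = [
    ("The", "determiner", "function"),
    ("function", "nominal subject", "is"),
    ("of", "case marking", "allele"),
    ("the", "determiner", "allele"),
    ("allele", "modifier of nominal", "function"),
    ("is", None, "is"),
    ("to", "marker", "produce"),
    ("produce", "open clausal complement", "is"),
    ("an", "determiner", "transcript"),
    ("RNA", "compound", "transcript"),
    ("transcript", "direct object", "produce"),
]

# Rows from the "regulate the liver" example at the top of the notebook.
PROTEIN_ROWS = [
    ("The", "determiner", "function"),
    ("function", "nominal subject", "regulate"),
    ("of", "case marking", "protein"),
    ("the", "determiner", "protein"),
    ("protein", "modifier of nominal", "function"),
    ("is", "copula", "regulate"),
    ("to", "marker", "regulate"),
    ("regulate", None, "regulate"),
]


def dependents(rows, head, relation):
    """Tokens attached to `head` (matched by text) with the given relation."""
    return [tok for tok, rel, hd in rows if hd == head and rel == relation]


def nominal_function_frame(rows):
    """Return (item, activity) for 'The function of X is to Y', else None."""
    for token, relation, head in rows:
        if token.lower() == "function" and relation == "nominal subject":
            items = dependents(rows, token, "modifier of nominal")
            # Shape 1: 'is' is the root and the activity is its open clausal complement.
            activities = dependents(rows, head, "open clausal complement")
            # Shape 2: 'is' is a copula, so the activity verb is the head itself.
            if not activities and dependents(rows, head, "copula"):
                activities = [head]
            if items and activities:
                return items[0], activities[0]
    return None


print(nominal_function_frame(ALLELE_ROWS))   # ('allele', 'produce')
print(nominal_function_frame(PROTEIN_ROWS))  # ('protein', 'regulate')
```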

In [5]:
# Producing biological role

# Reclassify as 'biological role'
# producing_biological_role = nlp("We demonstrate experimentally and unambiguously that the genital spines of this species function to mechanically couple the genitalia together.")


# Zach's Case Studies
# -------------------
# 1. BIOLOGICAL ACTIVITY

# Reclassify as 'unidentifiable item':
# biological_activity_1 = nlp("NMD genes are relatively depleted for protein truncating variants that are predicted to escape nonsense-mediated decay due to their location near the 3' end of the gene and are potential candidate genes that may cause disease via gain of function.")
# biological_activity_2_fn1 = nlp("The hypothesis that PFC DA has important functions in attentional function may easily be reconciled with classical evidence of an involvement in working memory function by assuming that the occupation of D1 receptors prevents behavioral distraction, which is likely to occur in the delays occurring between the sample and retention.")
# biological_activity_3 = nlp("PFC DA has an effect on attentional function.")

# Reclassify as 'unidentifiable meaning: effect unspecified' (?)
# biological_activity_4_fn1 = nlp("Thus, understanding PTM site stoichiometry helps in interpreting the functional relevance of a modification site and its mechanism for affecting protein function.")


# 2. BIOLOGICAL ROLE

# Reclassify as 'producing biological activity'
# biological_role_1 = nlp("Collectively, these integrative analyses provide further evidence that DNA methylation events by CamA may directly and/or indirectly affect the expression of multiple genes involved in the in vivo colonization and biofilm formation of C. difficile and inspire future studies to elucidate the mechanisms that underlie the functional roles of CAAAAA methylation in the pathogenicity of C. difficile.")

# Parsing error ('gas exchange function' is parsed as a compound, which removes the nmod dependency from 'function'.)
# biological_role_2 = nlp("Details of the root architecture are thus important for this gas exchange function and in crops such differences can be exploited.")

# Parsing error ('function' is parsed as a noun with an nmod dependency on 'are')
# biological_role_4 = nlp("Indeed, introns are the playground for the development of new proteins as a function of alternative splicing.")

# Reclassify as 'biological advantage'
# biological_role_3_fn1 = nlp("Morphology and function of endocrine tissues are presented in the context of their responsibility for coordinating organ functions throughout the body.")


# 3. BIOLOGICAL ADVANTAGE

# Non-trivial
# biological_advantage_1 = nlp("In the present study, we demonstrate that the spines in D. bipectinata promote copulatory success, and not insemination or fertilization, contrary to the post-insemination sexual selection hypothesis for genital trait function and evolution.")

# Reclassify as 'performing biological advantage'
# biological_advantage_2 = nlp("Siderophores (yellow-shaded zones) of pathogenic bacteria (green cells) can function as virulence factors and damage host tissue and promote pathogen growth (left lung).")

# 4. SELECTED EFFECT

# Reclassify as 'unidentifiable item' (according to the Flowchart, although this does not seem right)
# selected_effect_1_fn2 = nlp("If properties and functions of AS/REM are better explained as due to the influence of genes of paternal origin (rather than overall protection of the organism), then AS/REM sleep in the infant should, according to genomic conflict theory, function to extract resources from the mother consistent with getting paternal line genes into the next generation.")

# Reclassify as 'unidentifiable meaning: effect unspecified' 
selected_effect_2 = nlp("While mechanistic knowledge on siderophores grew immensely, evolutionary questions fell behind, probably because of the obvious physiological function of siderophores, having evolved as a means for bacteria to obtain iron for metabolism.")

# 5. ECOLOGICAL ACTIVITY

# Leaving aside ecological functions...
# ecological_activity = nlp("A related concept is that the functions provided by diversity should aid the persistence of species-rich vegetation.")

# 6. ECOLOGICAL ROLE
# 7. ECOLOGICAL ADVANTAGE
# 8. ECOLOGICAL SELECTED EFFECT

# 9. PRODUCING BIOLOGICAL ACTIVITY
producing_biological_activity_1 = nlp("To confirm that the loss of CamA leads to a decrease in the number of cells that produce functional spores, we compared the ability of ∆camA to form heat-resistant spores that are capable of germinating and outgrowing using a heat-resistance assay.")
producing_biological_activity_2_fn2 = nlp("The hypothesis that PFC DA has important functions in attentional function may easily be reconciled with classical evidence of an involvement in working memory function by assuming that the occupation of D1 receptors prevents behavioral distraction, which is likely to occur in the delays occurring between the sample and retention.")
producing_biological_activity_3_fn3 = nlp("The hypothesis that PFC DA has important functions in attentional function may easily be reconciled with classical evidence of an involvement in working memory function by assuming that the occupation of D1 receptors prevents behavioral distraction, which is likely to occur in the delays occurring between the sample and retention.")
producing_biological_activity_4 = nlp("Although the fraction of lncRNAs that are functional - that is, confer any type of fitness advantage - is not yet known, even the most modest estimates place this number at hundreds of transcripts.")
producing_biological_activity_5_fn2 = nlp("Thus, understanding PTM site stoichiometry helps in interpreting the functional relevance of a modification site and its mechanism for affecting protein function.")

# 10. PERFORMING BIOLOGICAL ROLE
performing_biological_role_1 = nlp("It is therefore possible that circadian rhythmicity in circulating melatonin concentrations in humans may affect islet function.")
performing_biological_role_2 = nlp("Available studies strongly support the idea that ovarian hormones (estrogen, progesterone) and placental factors (IFNT, CSH1, and GH1) activate and maintain uterine gland morphogenesis and secretory function during pregnancy in sheep.")
performing_biological_role_3 = nlp("In human islets, coupling of melatonin receptors to GI inhibition of adenylate cyclase is either absent or nonfunctional and the Gq coupling to PLC is the main pathway activated by melatonin.")
performing_biological_role_4 = nlp("Studies in mouse models and cell lines also implicate Chd2 in neuronal dysfunction: perturbations of Chd2 affect neurogenesis in the mouse developing the cerebral cortex.")
performing_biological_role_5 = nlp("Watanable et al found two-fold differences in methane emission between rice cultivars; these differences were not correlated with the amount of roots of the cultivar, but apparently to differences in functioning of the air channels and/or effects on rhizosphere organisms.")

# 11. PRODUCING BIOLOGICAL ADVANTAGE
# 12. PERFORMING SELECTED EFFECT

# 13. PRODUCING ECOLOGICAL ACTIVITY
producing_ecological_activity_1 = nlp("Exactly that happened in nine years, but (we predict) without losing function, because the site retained the three highest-performing species.")
producing_ecological_activity_2_fn2 = nlp("Plant assemblages with more species or functional groups typically have enhanced ecosystem functioning relative to monotypes.")

# 14. PERFORMING ECOLOGICAL ROLE
# 15. PRODUCING ECOLOGICAL ADVANTAGE

# 16. PERFORMING ECOLOGICAL SELECTED EFFECT
performing_ecological_selected_effect_1 = nlp("Patterns of above-and below ground resource availability determine the functionality (ultimate value) of the plant's strategy, as it emerges as a realization of genetically determined phenotypic plasticity and the actual environment during its development (proximate factors).")
performing_ecological_selected_effect_2 = nlp("Where much of current research is focused on an experimental approach to test the phenotypic response of current genotypes to the likely environment of the next century, we will try to assess the ultimate driving forces which determine the functionality of these strategies to explore possible directions of future change in plant strategies if, e.g.")

# 17. TECHNICAL
technical_1 = nlp("Know how assemblages perform made biodiversity-ecosystem function theory both explanatory and predictive.")
technical_2 = nlp("The biomechanics of cranial kinesis from fishes to reptiles and birds is particularly useful as an introduction to functional morphology.")

# 18. UNIDENTIFIABLE ITEM
unidentifiable_item_1 = nlp("The crystallographic B-factor, which describes the attenuation of X-ray scattering caused by thermal motion, has been previously used in the prediction of functionally damaging variation.")
unidentifiable_item_2 = nlp("From an evolutionary point of view, it is not always clear whether these additional functions reflect mere by-products of iron chelation or adaptations in their own right.")

# 19. UNIDENTIFIABLE MEANING: EFFECT SPECIFIED

# 20. UNIDENTIFIABLE MEANING: EFFECT UNSPECIFIED
unidentifiable_meaning_effect_unspecified_1_fn1 = nlp("If properties and functions of AS/REM are better explained as due to the influence of genes of paternal origin (rather than overall protection of the organism), then AS/REM sleep in the infant should, according to genomic conflict theory, function to extract resources from the mother consistent with getting paternal line genes into the next generation.")
unidentifiable_meaning_effect_unspecified_2 = nlp("Whereas male spine length variation correlates positively with male competitive fertilization success, as predicted by the post-insemination sexual selection hypothesis, experimental studies involving the manipulation of the spines in C. maculatus are lacking, precluding definitive conclusions about their proximate function.")
unidentifiable_meaning_effect_unspecified_3 = nlp("Indeed, genotype-driven delineation of MCs has rekindled an emphasis on the need for deep-phenotyping in families if we are to achieve the goal of understanding genome function and more importantly, its links to human disease.")
unidentifiable_meaning_effect_unspecified_4 = nlp("For these scenarios, evolutionary theory predicts that siderophores can function as competitive agents against other species.")
unidentifiable_meaning_effect_unspecified_5 = nlp("The kidney is given detailed treatment, including structure and function across taxa and mechanisms of water balance in freshwater, marine, and terrestrial environments.")
unidentifiable_meaning_effect_unspecified_6_fn2 = nlp("Morphology and function of endocrine tissues are presented in the context of their responsibility for coordinating organ functions throughout the body.")

# 21. MATHEMATICS
mathematics = nlp("The slope of the rate constant for the fast phase as a function of DNA concentration gives a bimolecular rate constant of 5.")

# 22. NON-BIOLOGICAL ROLE OR PURPOSE
# 23. PERFORMING NON-BIOLOGICAL ROLE OR PURPOSE
# 24. RELATIONAL
# 25. SOCIAL EVENT

# 26. PROGRAMMING
programming = nlp("To assess coverage, sequence depths were computed using the genomeCov function of BEDTOOLS v.")

# 27. N/A

visualise(selected_effect_2,200)


TOKEN               DEPENDENCY RELATION           HEAD                DEPENDENCIES                            

While               marker                        grew                []
mechanistic         adjectival modifier           knowledge           []
knowledge           nominal subject               grew                [mechanistic, siderophores]
on                  case marking                  siderophores        []
siderophores        modifier of nominal           knowledge           [on]
grew                adverbial clause modifier     fell                [While, knowledge, immensely]
immensely           adverbial modifier            grew                []
,                   punctuation                   fell                []
evolutionary        adjectival modifier           questions           []
questions           nominal subject               fell                [evolutionary]
fell                None                          fell                [grew, ,, questions, b
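Only the sentence named in the final `visualise(...)` call is displayed each time this cell runs, so comparing several case studies means editing that line and re-running the cell (which re-parses every sentence in it). A small loop can display several in one pass. This is a sketch: `visualise_all` is a hypothetical helper, not part of `process_sentence`. The parser and the display function are passed in as arguments, so in the notebook you would pass `nlp` and `visualise` after running the first cell.

```python
def visualise_all(examples, parse, show, size=100):
    """Parse and display each labelled example sentence in turn.

    examples: mapping of label -> sentence string
    parse:    callable turning a string into a parsed document (e.g. nlp)
    show:     callable displaying a parsed document at a size (e.g. visualise)
    Returns the parsed documents keyed by label.
    """
    parsed = {}
    for label, text in examples.items():
        print(f"=== {label} ===")
        doc = parse(text)
        show(doc, size)
        parsed[label] = doc
    return parsed


# Usage in the notebook (assumes the first cell has been run):
# visualise_all(
#     {
#         "mathematics": "The slope of the rate constant for the fast phase as a function of DNA concentration gives a bimolecular rate constant of 5.",
#         "programming": "To assess coverage, sequence depths were computed using the genomeCov function of BEDTOOLS v.",
#     },
#     nlp,
#     visualise,
#     200,
# )
```

The default `size` of 100 mirrors the default described for `visualise`; passing the callables in keeps the loop independent of which model is loaded.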

In [4]:
# sentence_29 = nlp("The capability of TTG1 proteins to regulate the circadian clock is a remnant of an ancient function.")
sentence_102 = nlp("We demonstrate experimentally and unambiguously that the genital spines of this species function to mechanically couple the genitalia together.")

# sentence_110 = nlp("Arguably the most popular hypothesis for the evolution of male genital complexity is cryptic female choice, according to which genitalia function as internal courtship devices, with those males best able to stimulate the female siring a disproportionate fraction of offspring.")
# 'genitalia function' is incorrectly parsed as a compound noun, instead of 'function' being parsed as a verb.

# visualise(sentence_102,100)

TOKEN               DEPENDENCY RELATION           HEAD                DEPENDENCIES                            

We                  nominal subject               demonstrate         []
demonstrate         None                          demonstrate         [We, and, function, .]
experimentally      adverbial modifier            and                 []
and                 adverbial modifier            demonstrate         [experimentally, unambiguously]
unambiguously       adverbial modifier            and                 []
that                marker                        function            []
the                 determiner                    spines              []
genital             adjectival modifier           spines              []
spines              nominal subject               function            [the, genital, species]
of                  case marking                  species             []
this                determiner                    species             []
species        
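The comment in this cell, like the 'gas exchange function' note in the case studies, describes one failure: the noun before `function` is attached to it as a compound, so `function` is read as the head of a noun phrase rather than as a verb. A minimal sketch for flagging such parses, assuming (token, relation, head) rows as in the tables. `flag_compound_function` is a hypothetical helper. The unflagged rows are copied from the output above; the flagged row is constructed from the comment's description, not copied from an actual parse.

```python
FUNCTION_FORMS = {"function", "functions"}


def flag_compound_function(rows):
    """Return tokens attached to a form of 'function' as a compound.

    A non-empty result suggests that a verbal use ("X function as/to ...")
    may have been parsed as a noun compound ("X function").
    """
    return [
        token
        for token, relation, head in rows
        if relation == "compound" and head.lower() in FUNCTION_FORMS
    ]


# Rows from the sentence_102 output above: 'spines' is the subject of 'function'.
VERB_ROWS = [
    ("that", "marker", "function"),
    ("the", "determiner", "spines"),
    ("genital", "adjectival modifier", "spines"),
    ("spines", "nominal subject", "function"),
]

# Hypothetical row: the attachment the comment describes for sentence_110.
COMPOUND_ROWS = [("genitalia", "compound", "function")]

print(flag_compound_function(VERB_ROWS))      # []
print(flag_compound_function(COMPOUND_ROWS))  # ['genitalia']
```

A flag is only a prompt to inspect the sentence by hand: some noun compounds ending in "function" are correct parses.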

In [2]:
# sentence = nlp("We demonstrate experimentally and unambiguously that the genital spines of this species function to mechanically couple the genitalia together.")
# sentence = nlp("Untranslated regions function to mediate export of transcripts in the brain.")
# sentence = nlp("The other active state (CTCF) has high CTCF binding and includes sequences that function as insulators in a transfection assay.")

# activity = nlp("The primary function of the HPA axis activation is to prime the body for the 'fight or flight' response.") # gut-brain

role = nlp("We demonstrate experimentally and unambiguously that the genital spines of this species function to mechanically couple the genitalia together.")
# perf_role_1 = nlp("Both of these receptors are known to influence CRF release from the hypothalamus, and changes in their expression may contribute to altered HPA function in GF animals.") # gut-brain
# perf_role_2 = nlp("Faecalibacterium prausnitzii (ATCC 27766) may function as a promising psychobiotic where it recently demonstrated an anxiolytic and antidepressant-like phenotype in rats, probably via increasing cecal SCFA and plasma IL-10 levels while reducing corticosterone and IL-6 levels.")
# dysfunction = nlp("For example, atypical antipsychotics are known for weight gain and metabolic dysfunction as prominent side effects.")
visualise(role,150)
# visualise(perf_role_1,150)
# visualise(dysfunction,150)

TOKEN               DEPENDENCY RELATION           HEAD                DEPENDENCIES                            

We                  nominal subject               demonstrate         []
demonstrate         None                          demonstrate         [We, and, function, .]
experimentally      adverbial modifier            and                 []
and                 adverbial modifier            demonstrate         [experimentally, unambiguously]
unambiguously       adverbial modifier            and                 []
that                marker                        function            []
the                 determiner                    spines              []
genital             adjectival modifier           spines              []
spines              nominal subject               function            [the, genital, species]
of                  case marking                  species             []
this                determiner                    species             []
species        
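The exemplar sentences in the notebook contrast the noun use ("The function of X is to Y") with the verb use ("X functions to Y"). In the parse above, `function` has its own nominal subject (`spines`), which marks it as a verb; in the noun use, `function` is itself the nominal subject of another word. A minimal sketch of that test, assuming (token, relation, head) rows and matching heads by text. `function_usage` is a hypothetical helper, and a misparse such as the compound case noted for sentence_110 would defeat it.

```python
FUNCTION_FORMS = {"function", "functions", "functioned", "functioning"}


def function_usage(rows):
    """Return 'verb' or 'noun' for the first nominal-subject row that involves
    a form of 'function', or None if there is none."""
    for token, relation, head in rows:
        if relation != "nominal subject":
            continue
        if head.lower() in FUNCTION_FORMS:
            return "verb"  # something is the subject of 'function'
        if token.lower() in FUNCTION_FORMS:
            return "noun"  # 'function' is the subject of another word
    return None


# Rows from the sentence_102 output above.
VERB_ROWS = [
    ("We", "nominal subject", "demonstrate"),
    ("that", "marker", "function"),
    ("the", "determiner", "spines"),
    ("genital", "adjectival modifier", "spines"),
    ("spines", "nominal subject", "function"),
]

# Rows from the sentence_3 output earlier in the notebook.
NOUN_ROWS = [
    ("The", "determiner", "function"),
    ("function", "nominal subject", "is"),
    ("allele", "modifier of nominal", "function"),
]

print(function_usage(VERB_ROWS))  # verb
print(function_usage(NOUN_ROWS))  # noun
```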