I will use this notebook as a workspace to go through my notes on the function handbook, 
translating the scheme into a dependency grammar format (using Universal Dependencies,
https://universaldependencies.org/, which is the parsing scheme adopted by spaCy).

In [1]:
import spacy
import scispacy
from spacy import displacy

In [2]:
# an interesting first note is that the web language models from spacy don't do a great job of parsing example sentences
# Take a simple example
nlp_web_sm = spacy.load("en_core_web_sm")
doc = nlp_web_sm("The gene functions as an important regulator.")
displacy.render(doc, style="dep", jupyter=True, options={'distance' : 140})

In [3]:
# en_core_web_sm doesn't actually have word embeddings but en_core_web_md does and the problem also exists here
nlp_web_md = spacy.load("en_core_web_md")
doc = nlp_web_md("The gene functions as an important regulator.")
displacy.render(doc, style="dep", jupyter=True, options={'distance' : 140})

In [4]:
# The obvious problem here is that the POS for functions is NOUN.
# Interestingly it is assigned as the root of the sentence though.
# Incidentally, the same issue arises even if the structure is kept similar but less technical words are used
doc = nlp_web_md("The couch functions as a seating device.")
displacy.render(doc, style="dep", jupyter=True, options={'distance' : 140})

In [5]:
# But if we import a statistical model from scispacy then functions is tagged as a verb
nlp_sci_md = spacy.load("en_core_sci_md")
doc1 = nlp_sci_md("The gene functions as an important regulator.")
displacy.render(doc1, style="dep", jupyter=True, options={'distance' : 140})
# With functions being correctly tagged as a verb, the dependency structure wrt functions also changes.
# Instead of having "gene" and "the" as dependents of "functions" we now just have "gene".
# (It seems like it was previously being parsed as the noun phrase "gene functions" with "functions" as head.)
# We now also have the noun "regulator" as a direct dependent of "functions" (whereas before it was "as")



In [6]:
# It's also useful to show the descriptions of the POS and dependencies in a table
def print_token_pos_dep(doc, tok_width=30, pos_width=30, dep_width=30):
    """
    prints token, pos description, and dep description of a doc
    args:
        doc: spacy doc
        tok_width: width of col 1
        pos_width: width of col 2
        dep_width: width of col 3
    returns:
        Nothing (prints output to screen)
    """
    print("{:<{}}{:<{}}{:<{}}\n".format("TOKEN", tok_width, "POS", pos_width, "DEPENDENCY", dep_width))
    for token in doc:
        print("{:<{}}{:<{}}{:<{}}".format(
            token.text, tok_width, str(spacy.explain(token.pos_)), pos_width, 
            str(spacy.explain(token.dep_)), dep_width))
        
print_token_pos_dep(doc1)

TOKEN                         POS                           DEPENDENCY                    

The                           determiner                    determiner                    
gene                          noun                          nominal subject               
functions                     verb                          None                          
as                            subordinating conjunction     case marking                  
an                            determiner                    determiner                    
important                     adjective                     adjectival modifier           
regulator                     noun                          modifier of nominal           
.                             punctuation                   punctuation                   


In [7]:
# I will use the biggest statistical model in scispacy from now on
nlp = spacy.load('en_core_sci_lg')



In [8]:
    
# Perhaps the most interesting thing to note is that "functions" is an intransitive verb (i.e.
# it doesn't take a direct object). Since this sentence is already in the "unpacked form", this means that if we
# unpack into the standard form with function as the verb "functions" (or "function" if plural), function will 
# never take a direct object.

# Note that regulator is marked as "nmod" (modifier of nominal). This seems to be a UD v1 designation?
# So perhaps spaCy hasn't implemented UD v2, as UD v2 seems to mark adjunct nominal modifiers of non-nouns as "obl"
# (see https://universaldependencies.org/u/dep/nmod.html). In either case, it's marking it as a non-essential modifier.

# In this example, I think "as" is perhaps the next most interesting thing to discuss.
# "As" is used as a subordinating conjunction (connecting the subordinate clause "as an important regulator"
# to the independent clause "The gene functions").
# "As" is being used in the sense of (Merriam-Webster)
"""
2 : in or to the same degree in which
soft as silk
—usually used as a correlative after an adjective or adverb modified by adverbial as or so
as cool as a cucumber
"""

# This perhaps isn't the ideal definition for our purposes, but basically "as" and the subordinating phrase that
# follows it helps us answer the question "how does it function?" or "what does it function to do?".
# More generally we will see a number of other (syntactically different but semantically related) examples of
# function being modified in a similar manner. An example off the top of my head is function as subject being 
# linked to a subject complement via the copula "is". For example, consider the sentence below:

doc2 = nlp("The gene's function is regulation")
displacy.render(doc2, style="dep", jupyter=True, options={'distance' : 140})
print_token_pos_dep(doc2)

# This is another instance of function being in a sense "endowed" with additional meaning.
# Semantically, I would say that "The gene's function is regulation" is equivalent to
# "The gene functions as a regulator". Other equivalent (or nearly so) examples include "The gene functions to regulate",
# "The function of the gene is to regulate", "The function of the gene is regulation", "The function of the
# gene as a regulator...", "The gene has a function to regulate", and so on.

# Although slightly different, I would argue that a clause like "Gene function regulates..." is semantically
# nearly identical. For example, compare "The promoter's function is to regulate gene expression." with
# "Promoter function acts to regulate gene expression." (The latter phrasing seems a bit unnatural, so perhaps
# something like "Promoter function plays an important role in regulating gene expression" is more common.)


TOKEN                         POS                           DEPENDENCY                    

The                           determiner                    determiner                    
gene                          noun                          possession modifier           
's                            particle                      case marking                  
function                      noun                          nominal subject               
is                            auxiliary                     copula                        
regulation                    noun                          None                          


In [9]:
# The examples above highlight a couple of key concepts.
# First, if function is the nominal subject the "item" to which the function should be attached will often be
# found as a component of the subject (e.g. if the subject is a compound noun like "protein function") or as
# a prepositional phrase that postmodifies function. I will consider these to be equivalent and for simplicity
# will unpack "function of the <item>", "function possessed by the <item>", "function that the <item> has",
# "<item> function" and so on as "<item> function".

# Consider the following examples:

doc3 = nlp("The function of the protein is important for regulating homeostasis.")
displacy.render(doc3, style="dep", jupyter=True, options={'distance' : 90})
print_token_pos_dep(doc3)
doc4 = nlp("Protein function is important for regulating homeostasis.")
displacy.render(doc4, style="dep", jupyter=True, options={'distance' : 120})
print_token_pos_dep(doc4)
doc5 = nlp("The function possessed by the protein is important for regulating homeostasis.")
displacy.render(doc5, style="dep", jupyter=True, options={'distance' : 90})
print_token_pos_dep(doc5)

# Note that in UD the nonverbal predicate ("important") of the copula "is" is assigned ROOT.
# In all cases, "function" is the nominal subject.
# In doc3 and doc4, "protein" is a dependency of "function" (in doc3 protein is a modifier of nominal;
# in doc4, protein is probably best described as a noun adjunct that modifies the head noun function).
# In doc5, it's less straightforward, but "protein" is a dependent of "possessed" which is itself a dependent of
# "function" (so it's sort of a second order or indirect dependency--not sure of the technical terminology)
# I think the important thing here is you can follow the acyclic path from function -> protein even in doc5
# So one possible rule might be that if function is the nominal subject then the "item" can be identified (if it
# exists) by following the arrows and preferring closer dependencies (if there were multiple candidates)

TOKEN                         POS                           DEPENDENCY                    

The                           determiner                    determiner                    
function                      noun                          nominal subject               
of                            adposition                    case marking                  
the                           determiner                    determiner                    
protein                       noun                          modifier of nominal           
is                            auxiliary                     copula                        
important                     adjective                     None                          
for                           adposition                    marker                        
regulating                    verb                          adverbial clause modifier     
homeostasis                   noun                          direct object                

TOKEN                         POS                           DEPENDENCY                    

Protein                       noun                          compound                      
function                      noun                          nominal subject               
is                            auxiliary                     copula                        
important                     adjective                     None                          
for                           adposition                    marker                        
regulating                    verb                          adverbial clause modifier     
homeostasis                   noun                          direct object                 
.                             punctuation                   punctuation                   


TOKEN                         POS                           DEPENDENCY                    

The                           determiner                    determiner                    
function                      noun                          nominal subject               
possessed                     verb                          clausal modifier of noun (adjectival clause)
by                            adposition                    case marking                  
the                           determiner                    determiner                    
protein                       noun                          modifier of nominal           
is                            auxiliary                     copula                        
important                     adjective                     None                          
for                           adposition                    marker                        
regulating                    verb                          adverbial claus

In [10]:
for token in doc5:
    print(token.text, token.dep_, token.head.text, token.head.pos_,
            [child for child in token.children])

The det function NOUN []
function nsubj important ADJ [The, possessed]
possessed acl function NOUN [protein]
by case protein NOUN []
the det protein NOUN []
protein nmod possessed VERB [by, the]
is cop important ADJ []
important ROOT important ADJ [function, is, regulating, .]
for mark regulating VERB []
regulating advcl important ADJ [for, homeostasis]
homeostasis dobj regulating VERB []
. punct important ADJ []


In [11]:
# test out cosine similarity
# doc6/7 from one paper and doc8/9 from another
# using scispacy
doc6 = nlp("A central theory in evolutionary developmental biology is that functional novelty arises through changes to the regulation and expression, both spatially and temporally, of otherwise well-conserved proteins.")
doc7 = nlp("It is not possible to rule out rhythmic post-transcriptional modification of these proteins, which might also influence their function.")
doc8 = nlp("Unfortunately, these individuals are at greater risk for other cancers, cardiovascular disease, osteoporosis, diabetes, and functional decline.")
doc9 = nlp("We explored associations between lifestyle factors and physical functioning among elderly cancer survivors who were screened for Project LEAD (Leading the Way in Exercise And Diet), a home-based, diet and exercise intervention trial.")

print(doc6.similarity(doc7))
print(doc6.similarity(doc8))
print(doc6.similarity(doc9))
print('\n')
print(doc8.similarity(doc6))
print(doc8.similarity(doc7))
print(doc8.similarity(doc9))
# using web medium
doc6 = nlp_web_md("A central theory in evolutionary developmental biology is that functional novelty arises through changes to the regulation and expression, both spatially and temporally, of otherwise well-conserved proteins.")
doc7 = nlp_web_md("It is not possible to rule out rhythmic post-transcriptional modification of these proteins, which might also influence their function.")
doc8 = nlp_web_md("Unfortunately, these individuals are at greater risk for other cancers, cardiovascular disease, osteoporosis, diabetes, and functional decline.")
doc9 = nlp_web_md("We explored associations between lifestyle factors and physical functioning among elderly cancer survivors who were screened for Project LEAD (Leading the Way in Exercise And Diet), a home-based, diet and exercise intervention trial.")
print('\n')
print(doc6.similarity(doc7))
print(doc6.similarity(doc8))
print(doc6.similarity(doc9))
print('\n')
print(doc8.similarity(doc6))
print(doc8.similarity(doc7))
print(doc8.similarity(doc9))

# Obviously need to test this a lot more before drawing any conclusions, but the first set looks intriguing.
# doc6 and doc9 should intuitively not be that similar, which is recovered using scispacy lg but not at all from web_md.
# The sci_lg model seems to space out comparisons a lot better (which is not necessarily surprising, as of course two sentences
# from scientific papers will seem very similar for a model trained on a corpus including wikipedia, news, etc.).
# Note that of course these comparisons have nothing to do with semantics but rather similarity in the sense of the
# text they were drawn from. Again, need to run far more tests, but definitely a point in favour of scispacy models.

0.8172883017760809
0.5959907654013872
0.44515275327939186


0.5959907654013872
0.5954728541798259
0.6549284890486828


0.9223619423189712
0.8163925168136106
0.8640749153788936


0.8163925168136106
0.8170797185152258
0.908630643529469
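As a reminder of what these numbers are: by default, spaCy's `Doc.similarity` is the cosine similarity between the two document vectors, and for static-vector models the document vector is the average of the token vectors. Below is a minimal plain-Python sketch of that calculation. The three-dimensional "token vectors" are made up for illustration and are not taken from any model.

```python
import math

def mean_vector(vectors):
    """Average a list of equal-length vectors component by component."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # similarity is undefined for a zero vector; return 0 by convention here
    return dot / (norm_u * norm_v)

# Toy token vectors (assumed values, purely illustrative)
sentence_a = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
sentence_b = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]

sim_ab = cosine_similarity(mean_vector(sentence_a), mean_vector(sentence_b))
sim_ba = cosine_similarity(mean_vector(sentence_b), mean_vector(sentence_a))
print(sim_ab, sim_ba)
```

The measure is symmetric, which is why `doc6.similarity(doc8)` and `doc8.similarity(doc6)` print the same value in the output above.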


In [12]:
doc10 = nlp("The protein functions to regulate homeostasis")
displacy.render(doc10, style="dep", jupyter=True, options={'distance' : 150})
print_token_pos_dep(doc10)
print('\n')
def print_children(doc, col1_width=20, col2_width=20, col3_width=20, col4_width=20):
    for token in doc:
        print("{:<{}}{:<{}}{:<{}}{:<{}}{}".format(token.text, col1_width, token.dep_, col2_width,
                                                    token.head.text, col3_width, token.head.pos_,
                                                    col4_width, [child for child in token.children]))

print_children(doc10)

TOKEN                         POS                           DEPENDENCY                    

The                           determiner                    determiner                    
protein                       noun                          nominal subject               
functions                     verb                          None                          
to                            particle                      marker                        
regulate                      verb                          open clausal complement       
homeostasis                   noun                          direct object                 


The                 det                 protein             NOUN                []
protein             nsubj               functions           VERB                [The]
functions           ROOT                functions           VERB                [protein, regulate]
to                  mark                regulate            VERB                []
regulate

I will now sketch out a way to translate my earlier notes into a dependency grammar framework.
 
I will start with "case 1", which is the easiest one to deal with.
  
Function as nominal subject (or head of nominal phrase acting as subject) or finite verb: start from "function" and follow the arrows (from the tail of each arrow to its arrowhead, i.e. from a word to its dependent) to identify the item (*x* in my earlier terminology).
The item *x* might be a direct dependent (e.g. like "protein functions" above) or it might be indirect and you have to follow multiple arrows (I believe that this is related to "catena" in dependency grammar but need to investigate).
In any case, it is basically a transitivity relationship (if we have a directed graph with edges 1 -> 2 and 2 -> 3, then the transitive closure adds the edge 1 -> 3).
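A rough sketch of the "follow the arrows" idea, using plain Python rather than spaCy so that it runs without a model. The parse is copied by hand from the In [10] output for doc5 (the POS tags for the last three tokens are taken from the equivalent rows of the doc3/doc4 tables, since the doc5 table is cut off). The helper names `children_of` and `nominal_descendants` are my own, and the breadth-first order is just one way of encoding "prefer closer dependencies".

```python
from collections import deque

# (text, coarse POS, dependency label, index of head); ROOT points at itself
DOC5_PARSE = [
    ("The", "DET", "det", 1),
    ("function", "NOUN", "nsubj", 7),
    ("possessed", "VERB", "acl", 1),
    ("by", "ADP", "case", 5),
    ("the", "DET", "det", 5),
    ("protein", "NOUN", "nmod", 2),
    ("is", "AUX", "cop", 7),
    ("important", "ADJ", "ROOT", 7),
    ("for", "ADP", "mark", 9),
    ("regulating", "VERB", "advcl", 7),
    ("homeostasis", "NOUN", "dobj", 9),
    (".", "PUNCT", "punct", 7),
]

def children_of(parse, index):
    """Indices of the direct dependents of the token at `index`."""
    return [j for j, token in enumerate(parse) if token[3] == index and j != index]

def nominal_descendants(parse, start):
    """Breadth-first walk from `start`, returning (noun, depth) pairs, nearest first."""
    found = []
    queue = deque((child, 1) for child in children_of(parse, start))
    while queue:
        index, depth = queue.popleft()
        if parse[index][1] == "NOUN":
            found.append((parse[index][0], depth))
        queue.extend((child, depth + 1) for child in children_of(parse, index))
    return found

candidates = nominal_descendants(DOC5_PARSE, start=1)  # start at "function"
print(candidates)  # [('protein', 2)]
```

With a real spaCy `Doc`, the same walk could presumably be written over `token.children` instead of the hand-built list.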

(In theory, this could be easily automated, but the tricky part would be identifying *x*, which needs to be (1) a concrete noun that is (2) a biological component of a Darwinian organism (or the ecological equivalent). This could certainly be accomplished with a look-up table containing an exhaustive list of permissible *x*s, but clearly that would be difficult or impossible to compile, so this might also be a place for an ML approach. That might be a bit optimistic given the technical specification, though: it's dubious that even state-of-the-art transformer models would "identify" similarities between usages of words related to such a narrow technical consideration. In any case, if I wanted to examine how one might code this (for the paper), I could just create a lookup table that contains the items within the corpus. That is obviously "cheating", but I would make it extremely clear that it's only for illustrative purposes and is a best-case scenario, i.e. one where we somehow obtain an exhaustive list of allowable "items".)
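To make the "cheating" lookup-table idea concrete, here is a small illustrative sketch. Everything in it is an assumption: `ALLOWED_ITEMS` is a hypothetical table, and the candidate depths are assigned by hand for the "gene's DNA sequence" sentence in the next cell (following the path function -> possessed -> sequence -> gene/DNA), not produced by a parser.

```python
# Hypothetical lookup table of permissible items (illustrative only)
ALLOWED_ITEMS = {"gene", "protein", "promoter", "dna", "sequence"}

def nearest_allowed(candidates, allowed):
    """Return the permitted candidate(s) closest to "function" in the tree.

    candidates: list of (text, depth) pairs, where depth is the number of
    dependency arrows followed from "function" to reach the word.
    """
    permitted = [(text, depth) for text, depth in candidates if text.lower() in allowed]
    if not permitted:
        return []
    best_depth = min(depth for _, depth in permitted)
    return [text for text, depth in permitted if depth == best_depth]

# Hand-assigned candidates for "The function possessed by the gene's DNA sequence ..."
candidates = [("sequence", 2), ("gene", 3), ("DNA", 3)]

print(nearest_allowed(candidates, ALLOWED_ITEMS))                 # ['sequence']
print(nearest_allowed(candidates, ALLOWED_ITEMS - {"sequence"}))  # ['gene', 'DNA']
```

The second call shows why distance alone is not enough: if "sequence" is not a permitted item, "gene" and "DNA" tie, so some additional rule (e.g. one based on possession) would still be needed.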

The main thing to note here---because it distinguishes case 1 from some of the ones below---is that

In [13]:
# consider some examples that would be unpacked as "the gene functions to..."

case1a = nlp("The function of the gene is to be transcribed.")
# here we can follow the arrow from function directly to its dependent gene
displacy.render(case1a, style="dep", jupyter=True, options={'distance' : 150})

case1b = nlp("The function possessed by the gene is to be transcribed.")
# here we get from function to gene via function -> possessed -> gene
displacy.render(case1b, style="dep", jupyter=True, options={'distance' : 130})

case1c = nlp("The function possessed by the gene's DNA sequence is to be transcribed.")
# this is an incredibly ugly sentence but highlights that we can still get from function to gene via the transitive function -> possessed -> sequence -> gene
# (we can also get to "DNA", which technically could be an item /x/, so there might need to be a rule about why you choose gene as /x/ over DNA;
# it's obvious in context which of the two should be the item /x/ but we might also add a syntactical element (possession in this particular case))
# actually arguably we should be identifying "sequence" as the item here
displacy.render(case1c, style="dep", jupyter=True, options={'distance' : 100})

Cases 3 and 4 (I decided that case 2 wasn't possible; I am just using the original numbering while I work through all of this)  

case 3: functional as adjectival premodifier of concrete head of nominal subject  
case 4: functional as adjectival postmodifier of concrete head of nominal subject (e.g. linked by a copula)

This one is interesting. In my original notes I wanted to unpack both of these as "the promoter functions...".
That's currently still the plan but they do have different dependency relationships.
In both cases, there is a direct link between promoter and functional but the direction is reversed.
This is (at least in part) due to the fact that UD sets the nonverbal predicate linked by the copula as the `ROOT` (so there is 
technically no way that the arrow could go into functional in case4 as occurs in case3).
I suppose (but only off the top of my head, need to verify this) that if one put `ROOT` on "is" in case4 then
you would get is -> functional and promoter <- is, which not only removes the direct link but also doesn't give a transitive link between promoter and functional.
Assuming I decide to keep this unpacking, the rule for case4 might need to be something like: an adjective that has a copula (`cop`) as a dependent and *x* as an `nsubj` dependent.
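A minimal sketch of how the two rules might look side by side. The parses below are written by hand in the UD style that I expect for these two sentences. They are assumptions, not model output (in particular the `acl:relcl` label), and `find_item_for_functional` is a hypothetical helper name.

```python
# (text, dependency label, index of head); ROOT points at itself
CASE3_PARSE = [  # "The functional promoter transcribes the gene."
    ("The", "det", 2),
    ("functional", "amod", 2),
    ("promoter", "nsubj", 3),
    ("transcribes", "ROOT", 3),
    ("the", "det", 5),
    ("gene", "dobj", 3),
    (".", "punct", 3),
]
CASE4_PARSE = [  # "The promoter that transcribes the gene is functional."
    ("The", "det", 1),
    ("promoter", "nsubj", 7),
    ("that", "nsubj", 3),
    ("transcribes", "acl:relcl", 1),
    ("the", "det", 5),
    ("gene", "dobj", 3),
    ("is", "cop", 7),
    ("functional", "ROOT", 7),
    (".", "punct", 7),
]

def find_item_for_functional(parse):
    """Return the candidate item x linked to "functional", or None."""
    for index, (text, dep, head) in enumerate(parse):
        if text.lower() != "functional":
            continue
        if dep == "amod":
            # case 3: the arrow goes from x into "functional"
            return parse[head][0]
        # case 4: "functional" has both a cop and an nsubj dependent
        dependents = {d: t for j, (t, d, h) in enumerate(parse) if h == index and j != index}
        if "cop" in dependents and "nsubj" in dependents:
            return dependents["nsubj"]
    return None

print(find_item_for_functional(CASE3_PARSE))  # promoter
print(find_item_for_functional(CASE4_PARSE))  # promoter
```

Both constructions resolve to the same item even though the arrow between promoter and functional points in opposite directions.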

Copulas seem to be quite an issue in UD, especially with regard to consistency across languages.
The key point is ([from the UD page on copulas](https://universaldependencies.org/v2/copula.html)):

> The `cop` relation is restricted to function words (verbal or nonverbal) whose sole function is to link a nonverbal predicate to its subject and which do not add any meaning other than grammaticalized TAME categories (only one word in most languages).

My interpretation of this is that a copula essentially links a subject to its complement.
For our purposes, I see no difference between functional as an adjectival modifier of the subject and functional as an adjectival complement of the subject (linked by a copula). 
(Also, a fairly nice illustration of the ambiguity of "function" displayed here!) 

In [14]:
case3 = nlp("The functional promoter transcribes the gene.")
displacy.render(case3, style="dep", jupyter=True, options={'distance' : 150})
case4 = nlp("The promoter that transcribes the gene is functional.")
displacy.render(case4, style="dep", jupyter=True, options={'distance' : 150})

Case 5: adjective (functional) as premodifier of noun phrase (with abstract head) that complements or describes subject (linked either by a copula or stative verb).

Let's look at the example of "The gene is a functional component…". Here, the subject "gene" is connected to the subject complement "functional component" by the copula "is", implying an unpacking of "the functional gene…" (because component refers to gene). Functional is a premodifier in a noun phrase with a concrete head but because it's a subject complement it's a stand-in for gene.

Now let's consider the example, "The gene has a functional role…". Now functional is a premodifier in a noun phrase with an abstract head. Although not a subject complement linked by a copula, the noun phrase is linked by "has" acting as a stative verb (this is more semantic than syntactic, but I interpret this sentence fragment to mean that the gene has a characteristic/state). Likewise, "The gene plays a functional role" indicates a state of being rather than an action. Since the head of the nominal phrase "functional role" is abstract, we can throw it away and we can unpack as "the functional gene…". It's worthwhile to think through whether it's actually possible for the nominal phrase to have a concrete head. I don't think that would work, as it would imply possession, no? For example, something like "The gene has a functional TATA box..." implies possession, does it not? It is now not akin to "The gene is functional" but rather "The gene contains a functional TATA box". It's also pretty clear that not just any abstract head will work, but we might not encounter these in practice. Consider something like "The gene has a functional disregard for...". This example is obviously nonsensical, but nevertheless it probably wouldn't be accurate to unpack as "the functional gene". So perhaps I should write a note that we are looking for abstract nouns that relate to the use of the item *x* or consequences thereof (e.g. "role", "importance", "impact", etc.).
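A toy sketch of the stative-verb part of this rule ("The gene has/plays a functional <head>"). The set `ABSTRACT_USE_HEADS` contains only the three nouns named above and is a hypothetical, certainly incomplete list. The function name is also mine, and it does not cover the copula variant ("The gene is a functional component").

```python
# Abstract head nouns relating to the use of x or its consequences (assumed, incomplete)
ABSTRACT_USE_HEADS = {"role", "importance", "impact"}

def unpack_stative_case5(subject, head_noun):
    """Unpack '<subject> has a functional <head_noun>' into 'the functional <subject>'.

    Returns None when the head noun is not one of the permitted abstract nouns,
    e.g. a concrete head (possession reading) or an unrelated abstract noun.
    """
    if head_noun.lower() in ABSTRACT_USE_HEADS:
        return f"the functional {subject}"
    return None

print(unpack_stative_case5("gene", "role"))       # the functional gene
print(unpack_stative_case5("gene", "TATA box"))   # None
print(unpack_stative_case5("gene", "disregard"))  # None
```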

**some old stuff below that I still need to work through** 

Take a real-world example like "However, a major limitation of these technologies is that proteins are often irreversibly modified to a complete stoichiometry that could overestimate the functional impact of low-abundant PTM sites."

Noun phrase "functional impact" used in predicative form. It's being postmodified by the prepositional phrase "of low-abundant PTM sites" of which the head is PTM (post-translational modification) sites. First we want to unpack this part of the sentence. Start with "functional impact", throw away abstract head to get "functional" (maybe we convert these cases to noun function?). It's being postmodified by the head PTM sites, so we get "functional PTM sites" (or if we convert to a noun, "function of PTM sites"). PTM sites is concrete, so (either way) we unpack as "PTM sites function to…". We ask "what do PTM sites do?" to which we cannot answer, so y is missing.

Cases 3-5: functional as premodifier of noun phrase used as subject (3), functional used in a predicative manner as lone adjective (4), functional used in a predicative manner as a premodifier (5). If the subject is concrete, recast as "functional <subject>"; if the subject is abstract and related to activity, importance, role, etc. (e.g. "the role is functional"), try to unpack by replacing the abstract subject with "function" (i.e. "the role is functional for the gene…" -> "the function for the gene…"). Using the above process, 5 can be manipulated into 4, which can be manipulated into 3, which can be unpacked. 

In [15]:
case5a = nlp('The gene has a functional role.')
displacy.render(case5a, style="dep", jupyter=True, options={'distance' : 150})
case5b = nlp('The gene is a functional component.')
displacy.render(case5b, style="dep", jupyter=True, options={'distance' : 150})

Case 8 I think is similar to 3-5 (e.g. the linking of the subject to a state of being via a stative verb; e.g. "The gene has functionality")

So actually I think a lot of the cases above can be unified under "stative verb".
The copula "is" is a stative verb, as are other verbs that indicate a state of being.
"Have" can certainly function this way, for example in "The gene has functionality", "has" refers to the character of the gene not to its possessions.
(I had hinted at this in my earlier notes but wasn't sure of the terminology.)

With this we can unite cases 3, 4, 5, parts of 6 (-ing as adjective or gerund), part of 7 (-ally modifying an adjective), and 8.

Case 1 is straightforward, as is the remaining part of 6 (-ing as present participle, which can just be reworked into the verb "functions").
Case 2 ("function" as modifier of head of compound nominal subject) seems syntactically invalid (e.g. it would have to be something like "The function gene..." or "The function role..."; I can't think of any example that isn't nonsensical).

I still need to deal with the remaining part of 7 (-ally modifying a verb). I also need to finish 5, 6, and 7 with regard to concrete/abstract nouns.

Once I have done all the above, I need to write some clear rules for how to identify the relevant clause/dependencies (while the above rules will unpack into a standard form, it doesn't fully address how function relates to the parts of speech (e.g. whether function is unpacked into the subject, verb, or object has implications for downstream classification)).

More generally, I'm hoping that the flowchart as currently written is broadly correct (still needs some testing before we can say this with confidence). What is missing regardless, however, is that *x*, *y*, and *z* need to be formulated as POS/DEP relations and the method for identifying these parts to be cast in the context of navigating the dependency tree.

In particular, I need specific instructions for identifying *x*, *y*, and *z* for steps:

- 2.2 (this will depend on whether *x* is a noun or verb, but it will either be a dependency or transitive dependency of *x* if *x* is a noun or the `nsubj` that is dependent on function if *x* is a finite verb)
- 3.3
- 3.6
- 3.7
(also 4s)

And I need to rewrite the instructions with formal terminology and dependency relations for steps:

- 3.1
- 3.2
- 3.3
- 3.4
- 3.5
- 3.6
(also 4s)

As a random note, while some of these distinctions are syntactic, some are also semantic (even before we get to steps 4 and 5). For example, the concrete/abstract noun distinction and even stative vs action verb distinction is partly semantic (although I think there are some syntactic differences; e.g. stative verbs not being able to be used in the progressive aspect).

The root/free morpheme is "function".
Although the verb function derives from the noun function, we will "unpack" into the verb form (I don't think this is of much consequence, however, as we will show how to convert between the different parts of speech).

To simplify comparison of syntactically-diverse constructions, we will use word conversion to convert from derivations of "function" to its root form function (specifically morphological derivation and inflection). We will use morphological derivation to convert between parts of speech (e.g. adjective -> noun) and inflection to convert between number, tense and aspect (e.g. the plural functions to function). The key consideration during conversion is that we maintain (and convert if necessary) the dependencies between "function" and other parts of the sentence.
What I will describe here is a method for converting these derivations of the word function to its base form (the verb "function"), while at the same time, keeping track of and converting the form (where necessary) of the dependencies of "function". The idea is to rework the sentence such that semantic relations can be cast into a standard form.

Let's start with a list (possibly non-exhaustive) of derivative forms of "function"

- function (noun)
- functions (noun, plural)
- function (verb)
- functions (verb, third-person singular)
- functional (adj)
- functionality (noun)
- functioning (gerund, acts as a noun)
- functioning (present participle, acts as an adj (maybe also an adv?))
- functioned (past participle, simple past)
- functionally (adv)

(will come back to this, gets a bit complicated with the distinction between tense/aspect. ultimately I think what matters for my purposes is whether it acts as a noun, adjective, etc.)
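As a first pass, the list above could be kept as a simple lookup from surface form to the roles it can play, so that any variant can be recognised and mapped back to the root "function". This is only a sketch: `FUNCTION_FORMS` and `to_root` are hypothetical names, and the role labels are informal descriptions rather than tagger output.

```python
# Surface forms of "function" and the roles they can play (informal labels)
FUNCTION_FORMS = {
    "function": ["noun", "verb"],
    "functions": ["noun (plural)", "verb"],
    "functional": ["adjective"],
    "functionality": ["noun"],
    "functioning": ["gerund (acts as a noun)", "present participle (acts as an adjective)"],
    "functioned": ["past participle", "simple past"],
    "functionally": ["adverb"],
}

def to_root(word):
    """Map a variant of "function" to (root, possible roles); None for other words."""
    roles = FUNCTION_FORMS.get(word.lower())
    if roles is None:
        return None
    return ("function", roles)

print(to_root("Functional"))  # ('function', ['adjective'])
print(to_root("promoter"))    # None
```

Forms with more than one role (e.g. "functioning") would still need the POS tag from the parse to be disambiguated.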



If I refer to function without quotes then I am referring to all the derivatives of function that we might wish to analyze. Note that a word (word1) is a dependency of another word (word2) if the dependency arrow originates at word2 and goes into word1 (i.e. word2 -> word1).

- 1. Non-biological or technical senses of function
   - 1.1 Is function used in a biological context? If yes, go to step 1.2; if no classify as either a mathematical, non-biological role or purpose, performing non-biological role or purpose, relational, social event, or programming sense of function[<sup>1</sup>](#fn1) and EXIT.
   - 1.2 Is function used in a technical sense?[<sup>2</sup>](#fn2) If yes, classify as **technical use**; if no, go to step 2.1.
- 2. Identify the part of speech
   - 2.1. Is function an [nsubj](https://universaldependencies.org/u/dep/nsubj.html) (nominal subject)[<sup>3</sup>](#fn3)? If yes, go to step 2.2; if no, go to step **???**(to split finite verb vs unidentifiable item)
   - 2.2. Can you identify a nominal dependent of function? This might be a subject complement (via [nmod](https://universaldependencies.org/u/dep/nmod.html)) or a compound noun (via [compound](http://universaldependencies.org/docs/u/dep/compound.html)). It might be a direct dependent (i.e. follow one dependency arrow from function to its nominal dependent) or an indirect one (transitively follow multiple arrows from function to its nominal dependent, in which case one possible dependency structure is `nsubj` (function) -> [acl](https://universaldependencies.org/u/dep/acl.html) -> `nmod`). **Even though this should get us to one of the candidates for *x*, there might be multiple options. So we need to differentiate between concrete/abstract, but also between different concrete options (e.g. the gene's DNA sequence example above).**
   - 2.3. 
   
   
   Can you identify *x*?[<sup>3</sup>](#fn3) If no, classify as unidentifiable item and EXIT. If yes, go to step 2.3.
    - 2.3. Is *x* an abstract noun? If yes, go to step 3.6. If no, go to step 3.7.
- 3. Detailed unpacking of function (or variant)<sup>13</sup>
    - 3.1. Is function being modified by another part of speech (e.g. another noun as in "protein function" or an adjective as in "transcriptional function")? If yes, go to step 3.2; if no (it modifies another part of speech such as in "functional spores"), go to step 3.4.
    - 3.2. Is the part of speech that modifies function sufficiently concrete? If yes (e.g. "transcriptional function"), go to step 3.3; if no (e.g. "important function"), treat function as a simple noun and go to step 2.2.
    - 3.3. Is the part of speech that modifies function the best candidate for x (e.g. "protein function", which could be unpacked as "The protein functions to…")? If yes, go to step 3.7 (noting x). If no, go to step 3.6.
    - 3.4. Identify the noun/adjective/adverb that the variant of function modifies and rearrange into the form "`<modified noun>` functions to …"; go to step 3.5.
    - 3.5. Is the `<modified noun>` sufficiently concrete (e.g. gene, liver) or is it abstract or an adjective/adverb (e.g. effects, important)? If abstract, go to step 3.6; if concrete, go to step 3.7.
    - 3.6. Can you identify a sufficiently concrete noun (candidate for x) that the `<function variant + modified part of speech>` clause modifies (e.g. in "The gene is functionally important", the adverb "functionally" modifies the adjective "important" and "functionally important" modifies the concrete noun "gene")? If no, classify as unidentifiable item and EXIT; if yes, go to step 3.7 (noting x and treating `<function variant + abstract modified part of speech>` as a simple verb (e.g. unpacking "The gene is functionally important" as "The gene functions to…")).
    - 3.7. Manipulate the sentence into the unpacked form "x functions to do y in z…". Can y be identified<sup>14</sup>? If yes, go to step 4.1. If no, or if the variant of function being used is directly evaluative (e.g. dysfunctional, non-functional, loss of function, malfunction, etc.), go to step 5.1.
- 4. Candidates for biological activity/role/advantage/selected effect
    - 4.1. Can it be unpacked into the following form: x functions to do y such that doing y in the past caused x to be selected for or maintained in a population (relative to an actual or counterfactual historical alternative or set of alternatives to x)? If yes and x is a component of a Darwinian individual i, classify as function as selected effect and EXIT; if yes and x is an i or an assemblage of interacting *i*s (group, population, species, ecosystem, etc.), classify as function as ecological selected effect and EXIT; if no, go to step 4.2.
    - 4.2. Can it be unpacked into the following form: x functions to do y in z, where x is a component of a complex system z, z is a Darwinian individual i or a complex system within i, and x's performance of y in z is advantageous for i (relative to an explicitly-identified or implied alternative to x)? If yes, classify as function as biological advantage and EXIT. Can it be unpacked into the following form: x functions to do y in z, where x is an i or an assemblage of interacting *i*s<sup>15</sup>, z is a containing system of x (an assemblage of interacting *i*s such as a group, population, species, ecosystem, etc.), and x's performance of y is advantageous for z<sup>16</sup> (relative to an explicitly-identified or implied alternative to x)? If yes, classify as function as ecological advantage and EXIT. If no, go to step 4.3.
    - 4.3. Can it be unpacked into the following form: x functions to do y in z, where x is a component of a complex system z, and z is a Darwinian individual i or a complex system within i? If yes, classify as function as biological role and EXIT. Can it be unpacked into the following form: x functions to do y in z, where x is an i or an assemblage of interacting *i*s, and z is a containing system of x (an assemblage of interacting *i*s such as a group, population, species, ecosystem, etc.)? If yes, classify as function as ecological role and EXIT. If no, go to step 4.4.
    - 4.4. Can it be unpacked into the following form: x functions to do y, where x is a component (or subcomponent) of a Darwinian individual i? If yes, classify as function as biological activity and EXIT. Can it be unpacked into the following form: x functions to do y, where x is an i or an assemblage of interacting *i*s (group, population, species, ecosystem, etc.)? If yes, classify as function as ecological activity and EXIT. If no, classify as unidentifiable meaning: effect specified and EXIT.
- 5. Candidates for function as performing or producing activity/role/advantage/selected effects
     - 5.1. Confirm that y cannot be identified in the unpacked form (if it can, either you have made a mistake or there is a mistake with the flowchart). Assuming y cannot be identified, go to step 5.2.
     - 5.2. Substitute "x function" with "how well x performs (or works)" (or "x works", depending on the particular syntactic construction/variant of function) into the raw sentence (not the unpacked sentence). If there is loss of meaning or ambiguity, classify as unidentifiable meaning: effect unspecified and EXIT; if there is no loss of meaning and no ambiguity, go to step 5.3.
    - 5.3. Does the sentence's language indicate that the biological role has been shaped by selection? If so, and x is a component of a Darwinian individual i, classify as function as performing selected effect and EXIT. If so, and x is an i or an interacting assemblage of *i*s, classify as function as performing ecological selected effect and EXIT. If not, go to step 5.4.
    - 5.4. Can you identify z (complex containing system) in the unpacked sentence? If yes, go to step 5.5; if not, and x is a component of a Darwinian individual i, classify as function as producing biological activity<sup>17</sup> and EXIT. If not, and x is a Darwinian individual i or an interacting assemblage of *i*s, classify as function as producing ecological activity and EXIT.
    - 5.5. Is the performance of the biological role associated with how well the complex system (or organism containing the complex system) performs (e.g. "Liver functioning affects human health")? If yes, and if x is a component of a Darwinian individual i and z is a Darwinian individual i or a complex system within i, classify as function as performing biological advantage. If yes, and if x is an i, or an assemblage of interacting *i*s, and z is an interacting assemblage of *i*s (group, population, species, ecosystem, etc.), classify as function as performing ecological advantage. If no (e.g. "Liver functioning is important for the hepatic system"), and if x is a component of a Darwinian individual i and z is an i or a complex system within i, classify as function as producing biological role. If no, and x is an i or an assemblage of interacting *i*s and z is an interacting assemblage of *i*s (group, population, species, ecosystem, etc.), classify as function as producing ecological role. EXIT.
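
Steps 2.1 and 2.2 above are the ones stated directly in dependency terms, so they can be sketched as a small tree search. The code below is only a sketch under stated assumptions: arcs are hand-written (head index, relation, dependent index) triples rather than parser output, the example sentences are my own, and the search follows `acl` arrows as the only "bridge" to an indirect nominal dependent, since that is the one indirect structure mentioned in step 2.2. The names `is_nsubj` and `candidates_for_x` are hypothetical.

```python
from collections import deque


def is_nsubj(index, arcs):
    """Step 2.1: does an nsubj arrow point into this token?"""
    return any(rel == "nsubj" and dep == index for _, rel, dep in arcs)


def candidates_for_x(start, arcs, nominal=("nmod", "compound"), bridges=("acl",)):
    """Step 2.2: collect nominal dependents of `start`, directly or via bridges.

    Returns a list of (token index, list of relations followed from `start`).
    """
    found = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        for head, rel, dep in arcs:
            if head != node:
                continue
            if rel in nominal:
                found.append((dep, path + [rel]))
            elif rel in bridges:
                queue.append((dep, path + [rel]))
    return found


# Hand-written, illustrative parses (not model output).
DIRECT_TOKENS = ["The", "function", "of", "the", "gene", "is", "unknown"]
DIRECT_ARCS = [
    (6, "nsubj", 1), (1, "det", 0), (1, "nmod", 4),
    (4, "case", 2), (4, "det", 3), (6, "cop", 5),
]
INDIRECT_TOKENS = ["The", "function", "performed", "by", "the", "protein", "is", "unknown"]
INDIRECT_ARCS = [
    (7, "nsubj", 1), (1, "det", 0), (1, "acl", 2), (2, "nmod", 5),
    (5, "case", 3), (5, "det", 4), (7, "cop", 6),
]

for tokens, arcs in ((DIRECT_TOKENS, DIRECT_ARCS), (INDIRECT_TOKENS, INDIRECT_ARCS)):
    if is_nsubj(1, arcs):  # token 1 is "function" in both sentences
        for dep, path in candidates_for_x(1, arcs):
            print(tokens[dep], " -> ".join(["nsubj"] + path))
```

For the two sentences this prints `gene nsubj -> nmod` and `protein nsubj -> acl -> nmod`. The search returns every candidate it finds, so choosing between several candidates (concrete vs. abstract, or between concrete options) is still left open, as noted in step 2.2.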

<span id="fn1">fn 1:</span> Not going to provide guidelines as to how to classify the non-biological senses: (i) they are less ambiguous (for humans at least); (ii) if one uses a sufficiently narrow corpus, these will be rarely encountered. The two exceptions, which would need to be dealt with in a fully-automated system, are function as programming routine/method and function as mathematical relation, as these will be encountered in the biological literature.
    
<span id="fn2">fn 2:</span> This classification covers uses such as "functional ecology", "functional biology", "functional connectivity", etc. It is clearly wrong to unpack these as "ecology functions to do..." and so on: they have effectively become compound nouns with their own meaning (influenced heavily by historical and technical considerations). Clues that you are dealing with a technical use include: repeated use of the same phrase within the same paper (although this is a poor indicator for sentence classification, as the annotator will be dealing with isolated sentences); the phrase being defined in the paper (or used as undefined jargon that the reader is expected to know); and definitions or articles turning up on Wikipedia or in Google searches (obviously vague, but perhaps the most useful guideline, as this is a very good indicator that a phrase refers to a technical concept within a particular subfield).
    
<span id="fn3">fn 3:</span> Or, if a nominal phrase is the subject, function is the head of that nominal phrase (the two cases are equivalent in the UD scheme, as the `nsubj` relation points to the head of the nominal phrase acting as subject).

<span id="fn4">fn 4:</span>