## Ways to construct ECG schemas and constructions from FN data

In [6]:
from framenet.builder import build

### Schemas

Converting FN frames to ECG schemas:

- General info:
  - 'Forbidden' terms (FN terms that clash with ECG keywords, e.g. 'construction', 'map', 'form') have "_fn" appended
  - Selected semantic types are converted to Frames. Currently, these are:
      - 'Physical_object': "Entity", 'Artifact': "Artifact", 'Living_thing': "Biological_entity"
- **Relations** -- Frame relations are converted to schema relations: 
  - parents (inherits from) are listed as subcases ('subcase of')
  - cause (is causative of) and Uses are listed as evoked schemas ('evokes')
  - [**For 'is causative of', do we also want to evoke a Causation schema and, e.g., identify the 'is causative of' frame with 
       the Causation.affectedProcess role? And maybe try to figure out which role should be bound to affectedEntity?**]
- **Roles**:
   - FEs are converted to ECG roles, and are listed in the 'roles' section
   - If an FE has one of the selected semantic types, the relevant schema is added as a type constraint to that role
   - [**add description of how inherited roles are handled -- might consider changing from current method. 
        Note that in FN, a subframe does not necessarily inherit all of the superframe's roles (i.e., they are not necessarily 
        listed as FEs in the subframe). And even when they are, they can be either core or non-core.**]
   - [**should core, non-core, and peripheral roles be marked or somehow handled differently?**]
- **Bindings**
   - FE relations are listed as bindings in the 'constraints' section [**need to add a description of how this is done**]
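
The conversion rules above can be sketched as follows. This is only an illustration under stated assumptions: `ToyFrame`, `ecg_name` and `frame_to_schema` are hypothetical names and not part of the framenet package, the forbidden-term list contains only the three examples given above, and joining several parents with commas is a guess.

```python
from dataclasses import dataclass, field

# Terms named above as clashing with ECG (the real list may be longer).
FORBIDDEN = {"construction", "map", "form"}

# Semantic types that are converted to schemas, as listed above.
SEMTYPE_TO_SCHEMA = {
    "Physical_object": "Entity",
    "Artifact": "Artifact",
    "Living_thing": "Biological_entity",
}


@dataclass
class ToyFrame:
    """Hypothetical stand-in for a FrameNet frame; not the real Frame class."""
    name: str
    parents: list = field(default_factory=list)  # 'inherits from' relations
    evoked: list = field(default_factory=list)   # 'is causative of' / 'uses'
    fes: dict = field(default_factory=dict)      # FE name -> semantic type or None


def ecg_name(term):
    """Append '_fn' to terms that clash with ECG keywords."""
    return term + "_fn" if term.lower() in FORBIDDEN else term


def frame_to_schema(frame):
    """Render a toy frame in the schema layout printed in this notebook."""
    lines = ["schema " + ecg_name(frame.name)]
    if frame.parents:
        lines.append("    subcase of " + ", ".join(ecg_name(p) for p in frame.parents))
    for evoked in frame.evoked:
        name = ecg_name(evoked)
        lines.append("    evokes {} as {}".format(name, name.lower()))
    lines.append("    roles")
    for fe, semtype in frame.fes.items():
        role = ecg_name(fe)
        constraint = SEMTYPE_TO_SCHEMA.get(semtype)
        lines.append("       " + (role + ": " + constraint if constraint else role))
    return "\n".join(lines)


frame = ToyFrame(
    name="Abounding_with",
    parents=["Locative_relation"],
    evoked=["Abundance"],
    fes={"Theme": "Physical_object", "Location": None, "Degree": None},
)
print(frame_to_schema(frame))
```

Only the Theme role receives a type constraint here, because it is the only FE in the toy frame with one of the selected semantic types.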


In [7]:
help(ecg_demo1)

Help on function ecg_demo1 in module ecg_demo:

ecg_demo1(fn)
    Returns list of ECG schemas from FrameNet frames.



In [8]:
all_schemas = ecg_demo1(fn)

In [10]:
print(all_schemas)

schema Abandonment 
    subcase of Intentionally_affect 
    roles 
       Agent 
       Theme 
       Place 
       Time 
       Manner 
       Duration 
       Explanation 
       Depictive 
       Degree 
       Means 
       Purpose 
       Event_description 
       Event 
       Instrument 
       Patient 


schema Abounding_with 
    subcase of Locative_relation 
    evokes Abundance as abundance 
    roles 
       Theme: Entity 
       Location 
       Degree 
       Depictive 
       Time 
       Ground 
       Figure 
       Distance 
       Direction 
       Figures 
       Deixis 
       Accessibility 
       Directness 
       Temporal_profile 
       Region_quantification 
       Profiled_region 


schema Absorb_heat 
    subcase of Becoming 
    roles 
       Entity 
       Container 
       Heat_source 
       Place 
       Medium 
       Manner 
       Time 
       Explanation 
       Temperature 
       Duration 
       Circumstances 
       Purpose 
       Depictive 


## Constructions

### Choose the frame for which you want to make constructions 

In [11]:
all_cxns = ecg_demo2(fn, fnb, frame="Cause_motion")

**ecg_demo2**
- Takes in:
    - FrameNet object (fn) -- keeps track of all the data
    - FrameNetBuilder object (fnb) -- creates the FrameNet object from XML; defined in Builder.py
    - frame, e.g. "Motion" (frame)
    - role and pos (optional; per the help output below, these default to 'Manner' and 'V')

- Returns: the result of build_cxns_for_frame(frame, fn, fnb, role, pos)


**build_cxns_for_frame(frame_name, fn, fnb, role_name, pos, filter_value=False)**:
According to the documentation for this function:
- Takes in:
    - frame_name, e.g. "Motion" 
    - FrameNet object (fn)
    - FrameNetBuilder object (fnb)
    - role_name: role to modify in types/tokens
    - pos: lexical unit POS to create tokens for (e.g., "V")
    - "filter_value" boolean: determines if you want to filter valence patterns
    
- Returns:
    - tokens
    - types
    - VP valences (non-collapsed) = **valence_patterns**
    - VP valences (collapsed) = **collapsed_valences**
    - VP constructions (non-collapsed) = **cxns_all**
    - VP constructions (collapsed) = **cxns_collapsed**
    
 **BUT** the actual code indicates that there are also three additional returned items, having to do with prepositions and prepositional phrases (see last 3 dictionary keys):
 
             returned = dict(tokens=tokens,
                    types=types,
                    valence_patterns=valence_patterns,
                    collapsed_valences=collapsed_valences,
                    cxns_all=cxns_all,
                    cxns_collapsed=cxns_collapsed,
                    pp=pp,
                    prep_types=prep_types,
                    prepositions=prepositions)

In [12]:
help(ecg_demo2)

Help on function ecg_demo2 in module ecg_demo:

ecg_demo2(fn, fnb, frame='Motion', role='Manner', pos='V')
    Returns dictionary of types/tokens, valence cxns, and prepositions for a frame.



The ecg_demo2 function produces a dictionary with several different keys:

In [13]:
all_cxns.keys()

dict_keys(['tokens', 'prep_types', 'collapsed_valences', 'valence_patterns', 'pp', 'types', 'cxns_all', 'prepositions', 'cxns_collapsed'])

### Valence patterns

#### Description:
The **valence_patterns** are the set of group realizations for the current frame. 

Each LU is associated with its own set of valence patterns. For a given LU, if two or more annotated sentences share **all** the same elements (e.g. the same number and kind of FE/PT/GF constituents), they are treated as instances of a single valence pattern. If the LUs are different, however, this function produces separate valence patterns, even if all other aspects of the two patterns are the same. **[need to verify this!]** Consequently, one obvious way to generalize over valence patterns is to collapse them in these cases (i.e. generalize over the verbs in a given frame).

The valence patterns are based on the FE **group_realizations** field of the Frame object. This field captures data about which frame elements show up together in which constructional patterns.
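
To make the per-LU grouping concrete, here is a small sketch with made-up annotations (not FrameNet data). It assumes a valence pattern can be keyed by its sorted (FE, PT, GF) triples; the `pattern` helper and the toy sentences are hypothetical. Counting with the LU in the key mirrors the behaviour described above, and dropping the LU from the key corresponds to collapsing over the verbs of the frame.

```python
from collections import Counter


def pattern(*units):
    """Order-independent key that keeps repeated (FE, PT, GF) units."""
    return tuple(sorted(units))


agent = ("Agent", "NP", "Ext")
theme = ("Theme", "NP", "Obj")
goal = ("Goal", "PP[into]", "Dep")

# One entry per annotated sentence: (LU, valence pattern).
annotations = [
    ("throw.v", pattern(agent, theme)),
    ("throw.v", pattern(theme, agent)),  # same elements, different order
    ("toss.v", pattern(agent, theme)),
    ("toss.v", pattern(agent, theme, goal)),
]

# Patterns are kept apart when the LU differs.
per_lu = Counter(annotations)

# Generalizing over LUs merges patterns that differ only in the verb.
collapsed = Counter(p for _lu, p in annotations)

print(len(per_lu))     # 3
print(len(collapsed))  # 2
```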
 

#### Code:

As defined in build_cxns_for_frame: **valence_patterns** = get_valence_patterns(frame)
 
    def get_valence_patterns(frame):
        patterns = []
        for re in frame.group_realizations:
            patterns += re.valencePatterns
        return patterns
 
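 A list comprehension can do the same thing, as long as it flattens the result (a plain [re.valencePatterns for re in frame.group_realizations] would return a list of lists instead):
 
    def get_valence_patterns(frame):
        return [vp for re in frame.group_realizations for vp in re.valencePatterns]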
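
A quick check with stand-in objects shows how the loop and two comprehension styles behave. The `SimpleNamespace` objects are hypothetical placeholders for the real frame and group-realization objects, and the function names are made up for this comparison.

```python
from types import SimpleNamespace

# Stand-ins for a frame with two FE group realizations.
frame = SimpleNamespace(group_realizations=[
    SimpleNamespace(valencePatterns=["vp1", "vp2"]),
    SimpleNamespace(valencePatterns=["vp3"]),
])


def loop_version(frame):
    patterns = []
    for gr in frame.group_realizations:
        patterns += gr.valencePatterns  # extends the flat list
    return patterns


def flat_comprehension(frame):
    # Two 'for' clauses: iterate realizations, then their patterns.
    return [vp for gr in frame.group_realizations for vp in gr.valencePatterns]


def nested_comprehension(frame):
    # One 'for' clause: yields one sub-list per realization.
    return [gr.valencePatterns for gr in frame.group_realizations]


print(loop_version(frame))          # ['vp1', 'vp2', 'vp3']
print(flat_comprehension(frame))    # ['vp1', 'vp2', 'vp3']
print(nested_comprehension(frame))  # [['vp1', 'vp2'], ['vp3']]
```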
    

In [14]:
print("Total number of valence patterns: ", len(all_cxns['valence_patterns']))
print(all_cxns['valence_patterns'])

Total number of valence patterns:  598
[Total: 1
Valences:[Frame: Cause_motion, GF: Ext, PT: NP, FE: Agent, total: 8
, Frame: Cause_motion, GF: Ext, PT: NP, FE: Agent, total: 8
, Frame: Cause_motion, GF: Dep, PT: PP[into], FE: Goal, total: 7
, Frame: Cause_motion, GF: Obj, PT: NP, FE: Theme, total: 8
]
LU: catapult.v, Total: 1
Valences:[Frame: Cause_motion, GF: Ext, PT: NP, FE: Agent, total: 8
, Frame: Cause_motion, GF: Ext, PT: NP, FE: Agent, total: 8
, Frame: Cause_motion, GF: Dep, PT: PP[to], FE: Goal, total: 2
, Frame: Cause_motion, GF: Obj, PT: NP, FE: Theme, total: 8
]
LU: catapult.v, Total: 1
Valences:[Frame: Cause_motion, GF: Ext, PT: NP, FE: Agent, total: 8
, Frame: Cause_motion, GF: Dep, PT: PP[over], FE: Distance, total: 1
, Frame: Cause_motion, GF: Dep, PT: PP[into], FE: Goal, total: 7
, Frame: Cause_motion, GF: Obj, PT: NP, FE: Theme, total: 8
]
LU: catapult.v, Total: 1
Valences:[Frame: Cause_motion, GF: , PT: CNI, FE: Agent, total: 8
, Frame: Cause_motion, GF: Dep, PT: PP

### Collapsed valence patterns

#### Description:
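The **collapsed_valences** are valence patterns that generalize over the verb LUs in the frame (hence the "LU: None" in the output below). **[need to add a fuller description]**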


#### Code:
  In scripts.py, the function **build_cxns_for_frame** includes:
  - collapsed_valences = collapse_valences_to_cxns(frame)
  
The relevant code for the **collapse_valences_to_cxns** function is in Hypothesized_constructions.py:

        
**Note**: if filter is True, this function collapses the specific PTs for prepositional phrases into a reduced set of PTs that are named after the relevant FE. For instance, a case where PT: PP[in] and FE: Goal would be converted to the more general PT: Goal-PP
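
The renaming step in the note can be sketched like this. The real `filter_by_pp` is not shown in this notebook and may do more than this; `generalize_pt` is a hypothetical helper that covers only the PT rewrite.

```python
import re

# Matches phrase types of the form PP[<preposition>], e.g. "PP[in]".
PP_WITH_PREP = re.compile(r"^PP\[[^\]]+\]$")


def generalize_pt(pt, fe):
    """Map e.g. ('PP[in]', 'Goal') to 'Goal-PP'; leave other PTs unchanged."""
    return fe + "-PP" if PP_WITH_PREP.match(pt) else pt


print(generalize_pt("PP[in]", "Goal"))    # Goal-PP
print(generalize_pt("PP[into]", "Goal"))  # Goal-PP
print(generalize_pt("NP", "Theme"))       # NP
```

Under this rewrite, PP[in] and PP[into] valences that both realize Goal end up with the same PT, which is what allows them to be counted together.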

        
        def collapse_valences_to_cxns(frame, filter=True):
            all_patterns=[]
            s = [valence for valence in frame.individual_valences if valence.lexeme.split(".")[1] == "v"]
            
            # if filter is True, return a reduced list with each PP valence PT changed to a more general PT, e.g. "Area-PP"
            if filter:
                s = filter_by_pp(s)
            
            # 'sorted' creates a new list that is sorted in descending order (reverse=True)
            # based on the total count of each valence (key=lambda valence: valence.total)
            by_total = sorted(s, key=lambda valence: valence.total, reverse=True)
            
            
            #ValencePattern is an object defined in lexical_units.py
            # """ Contains a list of valenceUnits (Valence objects), as well as associated
            #  annotations.This corresponds to a given valence pattern for an FEGroupRealization."""
            
            # Each valence unit in 'by_total' seeds a new ValencePattern in turn.
            # 'continue' skips the rest of the loop body, so units whose phrase type is a
            # null instantiation (INI, DNI, CNI) are never used as seeds for a pattern.
            for i in by_total:
                initial_pattern = ValencePattern(frame.name, 0, None)
                if i.pt in ['INI', 'DNI', 'CNI']:
                    continue
                initial_pattern.add_valenceUnit(i)
                
                # collapse_with_seed is in hypothesize_constructions.py
                # I think its purpose is to: (1) filter certain valence patterns out of 
                # 'by_total', e.g. select only core FEs, and (2) compare the initial_pattern with
                # 'by_total', with the objective of adding certain 'by_total' items to 'initial_pattern'
                # BUT, it's unclear to me what sorts of comparisons are actually being made
                
                all_patterns.append(collapse_with_seed(initial_pattern, by_total, frame))
            
            #'filter_collapsed_patterns' removes duplicate patterns from 'all_patterns' 
            return filter_collapsed_patterns(all_patterns) 
            
   **TO DO:  add comments to the above code so it's easier to tell what it's supposed to be doing!!**
   
   **Also: what do the numbers in the patterns this produces mean??**

In [15]:
print("Total number of collapsed valence patterns: ", len(all_cxns['collapsed_valences']))
print(all_cxns['collapsed_valences'])

Total number of collapsed valence patterns:  54
[Total: 1349
Valences:[Frame: Cause_motion, GF: Obj, PT: NP, FE: Theme, total: 618
, Frame: Cause_motion, GF: Dep, PT: Goal-PP, FE: Goal, total: 372
, Frame: Cause_motion, GF: Dep, PT: Path-PP, FE: Path, total: 208
, Frame: Cause_motion, GF: Dep, PT: Source-PP, FE: Source, total: 110
, Frame: Cause_motion, GF: Dep, PT: Agent-PP, FE: Agent, total: 23
, Frame: Cause_motion, GF: Dep, PT: AJP, FE: Result, total: 15
, Frame: Cause_motion, GF: Dep, PT: Initial_state-PP, FE: Initial_state, total: 3
]
LU: None, Total: 1281
Valences:[Frame: Cause_motion, GF: Ext, PT: NP, FE: Agent, total: 568
, Frame: Cause_motion, GF: Dep, PT: Goal-PP, FE: Goal, total: 372
, Frame: Cause_motion, GF: Dep, PT: Path-PP, FE: Path, total: 208
, Frame: Cause_motion, GF: Dep, PT: Source-PP, FE: Source, total: 110
, Frame: Cause_motion, GF: Dep, PT: AJP, FE: Result, total: 15
, Frame: Cause_motion, GF: Gen, PT: Poss, FE: Theme, total: 5
, Frame: Cause_motion, GF: Dep, PT
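
One possible reading of these numbers, checked only against the first pattern printed above: the pattern-level Total (1349) is the sum of the valence-unit totals, and dividing each unit total by that sum reproduces the first bracketed value shown for the same constituent in the cxns_collapsed output later in this notebook (e.g. NP [0.458, 0.9]). The second bracketed value (0.9) is not explained by this arithmetic. The totals below are copied from the output; the variable names are mine.

```python
# Valence-unit totals from the first collapsed pattern printed above.
unit_totals = {
    "NP": 618,
    "Goal-PP": 372,
    "Path-PP": 208,
    "Source-PP": 110,
    "Agent-PP": 23,
    "AJP": 15,
    "Initial_state-PP": 3,
}

# The pattern-level "Total" equals the sum of its unit totals.
pattern_total = sum(unit_totals.values())
print(pattern_total)  # 1349

# Each unit's share of the pattern total, rounded to three places.
shares = {pt: round(n / pattern_total, 3) for pt, n in unit_totals.items()}
for pt, share in shares.items():
    print(pt, share)
# NP 0.458
# Goal-PP 0.276
# Path-PP 0.154
# Source-PP 0.082
# Agent-PP 0.017
# AJP 0.011
# Initial_state-PP 0.002
```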

### Preposition Constructions -- not confined to local frame

- A separate construction is defined for each preposition that is an LU in any FrameNet frame. 
- Constructional meaning is identified with the frame evoked by that LU.
- These constructions are available for all frames -- they are not unique to (and may not even occur in) the current frame
- **NOTE**: I need to look into how the subcase relations are currently determined
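
A sketch of how one such construction could be rendered, inferred only from the printed output and not from the actual builder code. `preposition_cxn` is a hypothetical helper. The assumption that the parents are the FEs a preposition's PP realizes in the current frame comes from comparing the 'subcase of' line for "in" with the FE list for PP[in] in all_cxns['types']; it still needs to be verified.

```python
def preposition_cxn(prep, frame, fes):
    """Render one preposition construction in the layout printed below.

    prep  -- orthographic form, e.g. "along"
    frame -- name of the frame evoked by the preposition LU, or None
    fes   -- FEs used to name the parent constructions (an assumption)
    """
    parents = ", ".join(fe + "-Preposition" for fe in fes)
    lines = [
        "construction {}-Preposition-{}".format(prep, frame),
        "    subcase of " + parents,
        "    form",
        "      constraints",
        '        self.f.orth <-- "{}"'.format(prep),
    ]
    # Prepositions without an evoked frame get no meaning line.
    if frame is not None:
        lines.append("     meaning: " + frame)
    return "\n".join(lines)


print(preposition_cxn("along", "Locative_relation", ["Path"]))
print()
print(preposition_cxn("onto", None, ["Goal"]))
```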

In [18]:
print(all_cxns['prepositions'])

construction along-Preposition-Locative_relation
    subcase of Path-Preposition
    form
      constraints
        self.f.orth <-- "along"
     meaning: Locative_relation

construction along-Preposition-Non-gradable_proximity
    subcase of Path-Preposition
    form
      constraints
        self.f.orth <-- "along"
     meaning: Non-gradable_proximity

construction along-Preposition-Locative_relation
    subcase of Path-Preposition
    form
      constraints
        self.f.orth <-- "along"
     meaning: Locative_relation

construction along-Preposition-Non-gradable_proximity
    subcase of Path-Preposition
    form
      constraints
        self.f.orth <-- "along"
     meaning: Non-gradable_proximity

construction onto-Preposition-None
    subcase of Goal-Preposition
    form
      constraints
        self.f.orth <-- "onto"


construction in-Preposition-Expected_location_of_person
    subcase of Goal-Preposition, Depictive-Preposition, Manner-Preposition, Path-Preposition, Place-Prepo

### Prepositions that occur in annotation for the current frame(??)

**NOTE**: I need to determine how these are defined

In [28]:
print(all_cxns['prep_types'])

general construction Distance-Preposition
	 subcase of Preposition


general construction Goal-Preposition
	 subcase of Preposition


general construction Goal-Preposition
	 subcase of Preposition


general construction Goal-Preposition
	 subcase of Preposition


general construction Goal-Preposition
	 subcase of Preposition


general construction Path-Preposition
	 subcase of Preposition


general construction Source-Preposition
	 subcase of Preposition


general construction Source-Preposition
	 subcase of Preposition


general construction Goal-Preposition
	 subcase of Preposition


general construction Goal-Preposition
	 subcase of Preposition


general construction Goal-Preposition
	 subcase of Preposition


general construction Goal-Preposition
	 subcase of Preposition


general construction Goal-Preposition
	 subcase of Preposition


general construction Path-Preposition
	 subcase of Preposition


general construction Path-Preposition
	 subcase of Preposition


general construct

### Prepositional Phrase constructions 

- These seem to be based on the **prep_types** constructions shown above 
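
A sketch of how a PP construction could be derived from an FE name, again inferred from the printed output only. `pp_cxn` is a hypothetical helper. Because the output below repeats the same construction several times, the sketch also shows one way to drop repeats while keeping first-seen order.

```python
def pp_cxn(fe):
    """Render the PP construction that pairs with '<fe>-Preposition'."""
    return "\n".join([
        "construction {}-PP".format(fe),
        "  subcase of PP",
        "  constructional",
        "    constituents",
        "      prep: {}-Preposition".format(fe),
    ])


# FE names in the order the first few constructions appear in the output.
fes = ["Distance", "Goal", "Goal", "Goal", "Goal", "Path", "Source", "Source", "Goal"]

# dict.fromkeys keeps the first occurrence of each key, in order.
unique_fes = list(dict.fromkeys(fes))
print(unique_fes)  # ['Distance', 'Goal', 'Path', 'Source']

print("\n\n".join(pp_cxn(fe) for fe in unique_fes))
```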

In [20]:
print(all_cxns['pp'])

construction Distance-PP
  subcase of PP
  constructional
    constituents
      prep: Distance-Preposition

construction Goal-PP
  subcase of PP
  constructional
    constituents
      prep: Goal-Preposition

construction Goal-PP
  subcase of PP
  constructional
    constituents
      prep: Goal-Preposition

construction Goal-PP
  subcase of PP
  constructional
    constituents
      prep: Goal-Preposition

construction Goal-PP
  subcase of PP
  constructional
    constituents
      prep: Goal-Preposition

construction Path-PP
  subcase of PP
  constructional
    constituents
      prep: Path-Preposition

construction Source-PP
  subcase of PP
  constructional
    constituents
      prep: Source-Preposition

construction Source-PP
  subcase of PP
  constructional
    constituents
      prep: Source-Preposition

construction Goal-PP
  subcase of PP
  constructional
    constituents
      prep: Goal-Preposition

construction Goal-PP
  subcase of PP
  constructional
    constituents
    

### Types 

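I assume that this is supposed to create a type construction that the tokens listed in **'tokens'** can then refer to.

**But this does not seem to be working correctly right now**: the visible part of the output below is a dictionary that maps PP phrase types to the FEs they realize, not a type construction.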

In build_cxns_for_frame:
- types = utils.generate_types(frame, fn, role_name, pos_to_type[pos])
- pos_to_type = dict(V="LexicalVerbType", N="NounType")
                       


In [22]:
print(all_cxns['types'])

{'PP[along]': ['Path'], 'PP[onto]': ['Goal'], 'PP[in]': ['Goal', 'Depictive', 'Manner', 'Path', 'Place', 'Area', 'Time'], 'PP[upon]': ['Goal'], 'PP[round]': ['Area'], 'PP[below]': ['Goal'], 'PP[against]': ['Goal'], 'PP[above]': ['Goal', 'Path'], 'PP[with]': ['Manner', 'Means', 'Instrument'], 'PP[from]': ['Source', 'Path', 'Result', 'Initial_state'], 'PP[within]': ['Goal'], 'PP[past]': ['Path'], 'PP[by]': ['Agent', 'Cause', 'Handle', 'Instrument'], 'PP[off]': ['Source'], 'PP[over]': ['Distance', 'Path', 'Goal', 'Result'], 'PP[at]': ['Goal', 'Path', 'Manner'], 'PP[behind]': ['Goal'], 'PP[of]': ['Source'], 'PP[under]': ['Goal'], 'PP[up]': ['Goal', 'Path'], 'PP[around]': ['Area', 'Goal', 'Path'], 'PP[outside]': ['Goal'], 'PP[across]': ['Path', 'Goal'], 'PP[on]': ['Goal', 'Result', 'Time'], 'PP[after]': ['Path', 'Time'], 'PP[between]': ['Goal', 'Path'], 'PP[amongst]': ['Goal'], 'PP[down]': ['Path', 'Goal'], 'PP[out]': ['Source', 'Path', 'Goal', 'Result'], 'PP[into]': ['Goal', 'Result'], 'PP

### Tokens

A token is created for each LU in the frame that has the specified POS
- The pos can be specified as the **pos** argument in the ecg_demo2 function. 
- If not specified, this function will use the default, which is currently set as "V"

The name of the 'Type' cxn for these tokens is the current **frame** name, with 'Type' appended (e.g. MotionType).

The token string (e.g. "move") is used as the value assigned to the specified schema role (e.g. Manner).
- This role can be specified as the **role_name** argument in the ecg_demo2 function. 
- If not specified, this function will use the default, which is currently set as "Manner"

In build_cxns_for_frame:
tokens = utils.generate_tokens(frame, fn, role_name, pos)
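
The token format described above can be sketched as follows. This is not the code of utils.generate_tokens; `token_line` and `tokens_for` are hypothetical helpers, and the sketch assumes LU names have the form "lemma.pos" (e.g. "catapult.v"), as in the valence pattern output.

```python
def token_line(lu_name, frame, role="Manner"):
    """Format one token entry, e.g. for LU 'catapult.v' in Cause_motion."""
    lemma = lu_name.rsplit(".", 1)[0]  # strip the POS suffix
    return '{0} :: {1}Type :: self.m.{2} <-- "{0}"'.format(lemma, frame, role)


def tokens_for(lu_names, frame, role="Manner", pos="V"):
    """Keep only LUs with the requested POS and format a token for each."""
    suffix = "." + pos.lower()
    return [token_line(lu, frame, role) for lu in lu_names if lu.endswith(suffix)]


lus = ["catapult.v", "catapult.n", "throw.v"]
for line in tokens_for(lus, "Cause_motion"):
    print(line)
# catapult :: Cause_motionType :: self.m.Manner <-- "catapult"
# throw :: Cause_motionType :: self.m.Manner <-- "throw"
```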


In [23]:
print(all_cxns['tokens'])

cast :: Cause_motionType :: self.m.Manner <-- "cast"
catapult :: Cause_motionType :: self.m.Manner <-- "catapult"
chuck :: Cause_motionType :: self.m.Manner <-- "chuck"
drag :: Cause_motionType :: self.m.Manner <-- "drag"
fling :: Cause_motionType :: self.m.Manner <-- "fling"
hurl :: Cause_motionType :: self.m.Manner <-- "hurl"
nudge :: Cause_motionType :: self.m.Manner <-- "nudge"
pitch :: Cause_motionType :: self.m.Manner <-- "pitch"
press :: Cause_motionType :: self.m.Manner <-- "press"
push :: Cause_motionType :: self.m.Manner <-- "push"
shove :: Cause_motionType :: self.m.Manner <-- "shove"
throw :: Cause_motionType :: self.m.Manner <-- "throw"
thrust :: Cause_motionType :: self.m.Manner <-- "thrust"
toss :: Cause_motionType :: self.m.Manner <-- "toss"
tug :: Cause_motionType :: self.m.Manner <-- "tug"
yank :: Cause_motionType :: self.m.Manner <-- "yank"
scoot :: Cause_motionType :: self.m.Manner <-- "scoot"
draw :: Cause_motionType :: self.m.Manner <-- "draw"
run :: Cause_motionT

### Create constructions
The following command creates a construction for each FE group realization pattern (I think), including the relevant example sentence(s)

In [25]:
print("Total number of constructions: ", len(all_cxns['cxns_all']))

Total number of constructions:  335183
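
This count looks suspicious: it is far larger than the 598 valence patterns reported earlier. If all_cxns['cxns_all'] is a single string of ECG text (an assumption, suggested by the way it prints), then len returns the number of characters rather than the number of constructions. A sketch of counting construction headers instead, using a made-up sample string and a hypothetical `count_constructions` helper:

```python
import re


def count_constructions(ecg_text):
    """Count lines that start a construction definition."""
    return len(re.findall(r"^construction\s+\S+", ecg_text, flags=re.MULTILINE))


sample = (
    "/* [example sentence] */\n"
    "construction Cause_motion_pattern_1\n"
    "     subcase of ArgumentStructure\n"
    "     constructional\n"
    "\n"
    "construction Cause_motion_pattern_2\n"
    "     subcase of ArgumentStructure\n"
)

print(len(sample) > 2)              # True: len counts characters
print(count_constructions(sample))  # 2
```

The same check would apply to the collapsed constructions, whose reported count is also much larger than the number of collapsed valence patterns.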


In [26]:
print(all_cxns['cxns_all'])

/* [Although official records of dioxins date from the middle of last century , it was an explosion at an Italian chemical plant in Seveso in 1976 that catapulted them into the public arena . 
] */
construction Cause_motion_pattern_1
     subcase of ArgumentStructure
     constructional
      constituents
        v: Verb
        pp-into: PP-into [1.0, 0.9]
        np: NP [1.0, 0.9]
      meaning: Cause_motion
       constraints
         self.m <--> v.m
         ed.profiledParticipant <--> self.m.Agent
         ed.profiledParticipant <--> self.m.Agent
         self.m.Goal <--> pp-into.m
         self.m.Theme <--> np.m


] */
construction Cause_motion_pattern_2
     subcase of ArgumentStructure
     constructional
      constituents
        v: Verb
        pp-to: PP-to [1.0, 0.9]
        np: NP [1.0, 0.9]
      meaning: Cause_motion
       constraints
         self.m <--> v.m
         ed.profiledParticipant <--> self.m.Agent
         ed.profiledParticipant <--> self.m.Agent
         self

The **cxns_collapsed** entry collapses some of the constructions above into more general constructions.
Different methods could be written to collapse on the basis of different features/dimensions.

**TO DO: what features/dimensions might these be, and how would such methods be written?**
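
As a starting point for this TO DO, one option is a generic grouping function that takes the collapsing dimension as a key function. Everything below is hypothetical: patterns are modelled as (LU, tuple of (FE, PT, GF) units) with made-up data, and none of these functions exist in the current code.

```python
from collections import defaultdict


def collapse_by(patterns, key):
    """Group patterns that share the same key; each group is one collapsed pattern."""
    groups = defaultdict(list)
    for p in patterns:
        groups[key(p)].append(p)
    return dict(groups)


def ignore_lu(p):
    """Collapse over verbs: keep the full (FE, PT, GF) units."""
    return p[1]


def generic_pp(p):
    """Also collapse over preposition identity: PP[into] and PP[onto] become PP."""
    return tuple((fe, "PP" if pt.startswith("PP[") else pt, gf) for fe, pt, gf in p[1])


def fe_only(p):
    """Collapse to the set of FEs expressed, ignoring PT and GF."""
    return tuple(sorted(fe for fe, _pt, _gf in p[1]))


agent, theme = ("Agent", "NP", "Ext"), ("Theme", "NP", "Obj")
patterns = [
    ("throw.v", (agent, ("Goal", "PP[into]", "Dep"), theme)),
    ("toss.v", (agent, ("Goal", "PP[into]", "Dep"), theme)),
    ("toss.v", (agent, ("Goal", "PP[onto]", "Dep"), theme)),
    ("push.v", (agent, theme)),
]

print(len(collapse_by(patterns, ignore_lu)))   # 3
print(len(collapse_by(patterns, generic_pp)))  # 2
print(len(collapse_by(patterns, fe_only)))     # 2
```

Each key function corresponds to one candidate dimension (verb identity, preposition identity, phrase type and grammatical function), so further dimensions could be tried by writing another key function.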

In [27]:
print("Total number of collapsed constructions: ", len(all_cxns['cxns_collapsed']))
print(all_cxns['cxns_collapsed'])



Total number of collapsed constructions:  41130
/* [] */
construction Cause_motion_pattern_1
     subcase of ArgumentStructure
     constructional
      constituents
        v: Verb
        np: NP [0.458, 0.9]
        goal-pp: Goal-PP [0.276, 0.9]
        path-pp: Path-PP [0.154, 0.9]
        source-pp: Source-PP [0.082, 0.9]
        agent-pp: Agent-PP [0.017, 0.9]
        ajp: AJP [0.011, 0.9]
        initial_state-pp: Initial_state-PP [0.002, 0.9]
      meaning: Cause_motion
       constraints
         self.m <--> v.m
         self.m.Theme <--> np.m
         self.m.Goal <--> goal-pp.m
         self.m.Path <--> path-pp.m
         self.m.Source <--> source-pp.m
         self.m.Agent <--> agent-pp.m
         self.m.Result <--> ajp.m
         self.m.Initial_state <--> initial_state-pp.m


/* [] */
construction Cause_motion_pattern_2
     subcase of ArgumentStructure
     constructional
      constituents
        v: Verb
        goal-pp: Goal-PP [0.29, 0.9]
        path-pp: Path-PP [0.162