# Project Status and Work Log

## Initial Approach and Thoughts
* My initial approach focused on training a fast word-based tagging model. I ran a full round of feature extraction and hyper parameter tuning.
  * However, the results were unsatisfactory (see the non-RNN work), so I switched to an RNN model
  * Also note that the mongo collections have been renamed to differentiate this fix: metrics_coref_word_tagger holds the original word-based tagging model work, which is valid
* Switching to the RNN model, I made some initial mistakes:
 1. I didn't have it predict the Anaphora tag; instead it predicted the most common tag (which isn't always the Anaphora tag)
 2. There seem to be some issues with the way the initial RNNs were trained, e.g. compare the numbers under Anaphora.data_points. These are correct in the word tagging model and the later RNN tagger (see "_fixed") but not in the initial RNN tagger work
* To remedy this I switched to newer code, copied from different notebooks, and the results of that are seen under the "coref_new_fixed" collection
* I also moved to having the models spit out tagged data points, and then re-computing the metrics directly from that data. This then allows us to interrogate those predictions at a later date as needed, not just the raw numbers
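
As a rough illustration of re-computing metrics from stored tagged data points, here is a minimal sketch. It assumes each word's tags are stored as a set of strings; the function name `tag_metrics` and the output keys (other than the `data_points` count mentioned above) are hypothetical and not taken from the actual notebooks.

```python
from typing import Dict, Sequence, Set


def tag_metrics(
    actual: Sequence[Set[str]], predicted: Sequence[Set[str]], tag: str
) -> Dict[str, float]:
    """Recompute per-tag metrics from stored word-level predictions.

    Each element of `actual` / `predicted` is the set of tags for one word.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must cover the same words")

    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(actual, predicted):
        is_gold, is_pred = tag in gold_tags, tag in pred_tags
        if is_gold and is_pred:
            tp += 1
        elif is_pred:
            fp += 1
        elif is_gold:
            fn += 1

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "data_points": len(actual),  # one data point per word
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
    }


# Toy data for illustration only, not real predictions.
actual = [{"Anaphora"}, set(), {"Anaphora"}, set()]
predicted = [{"Anaphora"}, {"Anaphora"}, set(), set()]
metrics = tag_metrics(actual, predicted, "Anaphora")
```

Because the per-word predictions are kept, the same data can be re-scored later with a different metric, or checked against the number of data points, without re-training.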

## Word Tagger Work (Notes)
- Initially I trained a word tagging model, as it is much faster and had similar accuracy to the RNN (though not quite as good). This allowed faster iteration; however, the results weren't great.
- I did this in two phases:
 1. Did initial feat sel and hyper parameter tuning to determine the optimal feats (win size, etc) and parameters for training a word tagging model to tag anaphora tags
   - mongo - metrics_coref_word_tagger
   - py scripts
     - windowbasedtagger_most_common_tag_multiclass_feat_seln.py
     - windowbasedtagger_most_common_tag_multiclass_hyper_param_tuning.py
 2. Using results from 1, I then did a sort of feat selection on which co-ref tags to use to replace the concept codes
   - **<span style='color:red'>I think this is invalid as BrattEssay was not updated yet - re-use code though?</span>**
   - mongo - metrics_coref_word_tagger_coref_feats
   - py script - windowbasedtagger_most_common_tag_multiclass_hyper_param_tuning.py

## RNN CC Tagger Work (Notes)

- Noticed that the CC tag predictions were not based on those from the optimal model
 - See "CC Tagger Multiclass..." Notebooks - logic re-ran to fix this
- **NOTE:** - These predictions are now stored in **metrics_codes** mongo coll with the prefix **'STORE\_ RESULTS\_'**

## RNN Ana Tagger Work (Notes)

- Initially made some mistakes (invalid logic - had wrong number of data points), and trained multi class RNN on most common tag
 - Stored in mongo collection called **metrics_coref_broken**, now deleted (see below)
- Then I figured out the issue with the predicted Concept Code tags not being from the best model, and corrected it via logic in "CB - CC Tagger MULTICLASS - Train Save CV Word Predictions - NO EXPLICIT.ipynb"
- With the fixed anaphora training logic, 
 - the data is now stored in **metrics_coref_rnn** 
 - This contains two types of collection based on two steps of work:
   1. Ran the initial RNN tagger model without hyper parameter tuning, and stored the predictions so that metrics can be calculated directly from them
     - NB "CB - Anaphora Tagger BINARY - FIXED - Train Save CV Word Predictions -NO EXPLICIT.ipynb"
     - Mongo - metrics_coref_rnn.CB_TAGGING_TD_RNN_BINARY_FIXED and similar
     - Predictions - stored in Bi-LSTM-4-Anaphora_Tags-Binary-Fixed folder - metrics match mongo 100%
   2. Decided I needed to do hyper parameter tuning on the RNN model:
     - NB - "CB - Anaphora Tagger BINARY - FIXED - Hyper Parameter Tuning.ipynb"
     - Mongo - metrics_coref_rnn.CB_TAGGING_TD_RNN_BINARY_HYPERPARAM_TUNING and similar
  - The predictions are stored in "Predictions/Bi-LSTM-4-Anaphora_Tags-Binary-Fixed/"
    - The predictions in "Predictions/Bi-LSTM-4-Anaphora_Tags-Binary/" are worse and represent the broken predictions

## Mongo Collection Naming / History 

### RNN Work
-  9/15/2018 Deleted - metrics_coref_broken
  - Initial RNN tagger work with non-optimal model / broken code (num data points incorrect, did multi-class) 
- 9/15/2018 - Consolidated - metrics_coref_new_fixed and metrics_coref_rnn_fixed into metrics_coref_rnn
 - within this, the CB_TAGGING_TD_RNN_BINARY_FIXED and similar colls reflect the initial output from the CB - Anaphora Tagger BINARY - FIXED - Train Save CV Word Predictions -NO EXPLICIT.ipynb notebook (does not hyper param tune, does dump prediction files)
  - I then decided to do hyper parameter tuning (but not to persist preds to disk...) - coll named CB_TAGGING_TD_RNN_BINARY_HYPERPARAM_TUNING and similar, refers to work done under CB - Anaphora Tagger BINARY - FIXED - Hyper Parameter Tuning.ipynb
   
   
### Word Tagging Work
- metrics_coref_new renamed to metrics_coref_old
- metrics_coref_old (originally metrics_coref_new) then renamed to metrics_coref_word_tagger_coref_feats to better reflect what it's used for


## Adjusting the Bratt Essay Parsing Logic to Resolve Anaphora Tags
* Next I adjusted the BrattEssay file to resolve the anaphora tags with their antecedents when provided
* These get resolved as Anaphora:[{code}] where {code} is one of the 13 or 9 concept codes, e.g. Anaphora:[50]
* Analysis of how anaphora tags are initially tagged -  see 'Examine at How Anaphora Tags are Tagged to Inform Essay Parser Changes' 
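
A minimal sketch of the `Anaphora:[{code}]` tag format described above. The helper names and the regular expression are assumptions for illustration; they are not the actual BrattEssay parsing code.

```python
import re
from typing import Optional

# Matches tags of the form Anaphora:[50]; the code is whatever sits inside the brackets.
ANAPHORA_PATTERN = re.compile(r"^Anaphora:\[(?P<code>[^\[\]]+)\]$")


def make_anaphora_tag(code: str) -> str:
    """Build the resolved tag for an anaphor whose antecedent has this concept code."""
    return f"Anaphora:[{code}]"


def parse_anaphora_tag(tag: str) -> Optional[str]:
    """Return the antecedent's concept code, or None if the tag is not a resolved anaphora tag."""
    match = ANAPHORA_PATTERN.match(tag)
    return match.group("code") if match else None


resolved = make_anaphora_tag("50")
code = parse_anaphora_tag(resolved)
```

A plain concept code such as `50`, or an unresolved `Anaphora` tag with no antecedent, returns `None` here, so the two cases can be told apart downstream.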

## Hyper Parameter Tuning

- The code to do this was re-written in order to persist the predictions to disk

## <span style="color:red">For some reason the Skin Cancer Metrics in Mongo do not match those coming from the database. CB matches OK</span>
- Subsequently I decided to re-run the SC train and test runs

## Merging CoRef Files with Annotated Essays

### Notes on CoRef Datastructure
- Dictionary of essays, keyed by name
- Each essay is a list of sentences
- Each sentence is a list of words
- Words are mapped to a tag dict
  - tag dict - contains
    - NER tag (most are O - none)
    - POS tag
    - If a Co-Reference such as an anaphor (mostly pronouns)
      - COREF_PHRASE - phrase referred to by coref
      - COREF_REF - Id of referenced phrase
    - else if it is a phrase that is referenced:
      - COREF_ID - id of the co-reference, referenced in the COREF_REF tag
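
The nesting above can be sketched as follows. This is an assumed shape only: representing each word as a `(word, tag_dict)` tuple, the `NER` and `POS` key names, and the example essay are all hypothetical; only the `COREF_PHRASE`, `COREF_REF` and `COREF_ID` keys come from the notes.

```python
from typing import Dict, Iterator, List, Tuple

TagDict = Dict[str, str]
Word = Tuple[str, TagDict]   # assumption: a word paired with its tag dict
Sentence = List[Word]
Essay = List[Sentence]

# Made-up essay for illustration; not taken from the dataset.
essays: Dict[str, Essay] = {
    "example_essay": [
        [
            ("the", {"NER": "O", "POS": "DT", "COREF_ID": "1"}),
            ("coral", {"NER": "O", "POS": "NN", "COREF_ID": "1"}),
            ("dies", {"NER": "O", "POS": "VBZ"}),
        ],
        [
            ("it", {"NER": "O", "POS": "PRP",
                    "COREF_REF": "1", "COREF_PHRASE": "the coral"}),
            ("bleaches", {"NER": "O", "POS": "VBZ"}),
        ],
    ]
}


def find_anaphors(essay: Essay) -> Iterator[Tuple[int, int, str, str]]:
    """Yield (sentence index, word index, word, referenced phrase) for each co-reference."""
    for sent_ix, sentence in enumerate(essay):
        for word_ix, (word, tags) in enumerate(sentence):
            if "COREF_REF" in tags:
                yield sent_ix, word_ix, word, tags.get("COREF_PHRASE", "")


anaphors = list(find_anaphors(essays["example_essay"]))
```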
      
### Notes on CoRef Output from Stanford
- Co-references can be in either order - the canonical reference can be **before** or **after** the mention, so it's really just a grouping of phrases that mean the same thing.
 - e.g. essay EBA1415_SEAL_34_CB_ES-04796 in '/Users/simon.hughes/Google Drive/PhD/Data/CoralBleaching/Thesis_Dataset/CoReference/Training'
 - Mention COREF_REF = 5 comes before the coreference COREF_ID = 5, whereas for COREF_ID = 4, the coreference (id) comes before the mention  (coref)
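
Since the order is not guaranteed, one option is to group word positions by co-reference id and only then check which side comes first. The sketch below assumes the words of an essay are flattened into one list of `(word, tag_dict)` tuples; the function names and the toy words are hypothetical, and the ids 4 and 5 simply mirror the ordering pattern described above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Word = Tuple[str, Dict[str, str]]


def group_coref_positions(words: List[Word]) -> Dict[str, Dict[str, List[int]]]:
    """Group word positions by co-reference id, irrespective of their order."""
    groups = defaultdict(lambda: {"referent": [], "mention": []})
    for position, (_, tags) in enumerate(words):
        if "COREF_ID" in tags:
            groups[tags["COREF_ID"]]["referent"].append(position)
        if "COREF_REF" in tags:
            groups[tags["COREF_REF"]]["mention"].append(position)
    return dict(groups)


def mention_comes_first(group: Dict[str, List[int]]) -> bool:
    """True when the earliest mention precedes the earliest referenced phrase."""
    if not group["referent"] or not group["mention"]:
        return False
    return min(group["mention"]) < min(group["referent"])


# Toy words for illustration only.
words: List[Word] = [
    ("corals", {"COREF_ID": "4"}),
    ("bleach", {}),
    ("they", {"COREF_REF": "4"}),
    ("it", {"COREF_REF": "5"}),
    ("kills", {}),
    ("algae", {"COREF_ID": "5"}),
]
groups = group_coref_positions(words)
```

Treating each id as an unordered group avoids baking in the assumption that the referenced phrase always precedes the mention.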

## Next Steps
* ~~Hyper parameter tune the fixed, binary RNNs~~
 * ~~Validate the bratt parser logic~~
   * ~~See "Test Bratt Essay Changes to include Anaphora Tags"~~
 * ~~Match the cc tags with the anaphora tags~~
   * ~~See CB/SC - Load AND Eval CC and Ana Tagged Essays - Then MERGE Essays.ipynb~~
* Load the predictions and reconcile with the co-ref parser output and CC tag predictions
  - ~~I found an issue with the way I am processing the parser output - I am using a dictionary but there can be more than one coreference tag per word, so the last one is overwriting the others when multiple are present~~
  - ~~Switched to the neural parser for the CoRefs (as was unsure of whether it is the default one)~~
  - Initial work in 
   - "Match Predicted Anaphora Tags to CoRef Output.ipynb"
   - and script "MatchCoRefTagsToEssays_old.py"
   - and script "MatchCoRefTagsToEssays_new.py"
   - See also - "CB - Load AND Evaluate CC and Anaphora Tagged Essays" for logic that merges the different predictions (aside from the coref tagged predictions)
* Evaluate 2 things:
    1. Taking the predicted tags, look for intersections in co-ref output, and evaluate accuracy of the resolved anaphora concept codes
    2. Use the Stanford co-reference parser alone to implement this logic
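
The dictionary overwrite issue noted above (more than one coreference tag per word, with the last one winning) can be illustrated with a small sketch. The `(word index, tag name, tag value)` annotation format and both function names are assumptions for illustration; they are not the code in the MatchCoRefTagsToEssays scripts.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Annotation = Tuple[int, str, str]  # (word index, tag name, tag value)


def collect_last_wins(annotations: Iterable[Annotation]) -> Dict[int, Dict[str, str]]:
    """Problematic version: a second tag with the same name overwrites the first."""
    tags: Dict[int, Dict[str, str]] = {}
    for word_ix, name, value in annotations:
        tags.setdefault(word_ix, {})[name] = value
    return tags


def collect_all(annotations: Iterable[Annotation]) -> Dict[int, Dict[str, List[str]]]:
    """Keeps every value by storing a list per tag name."""
    tags = defaultdict(lambda: defaultdict(list))
    for word_ix, name, value in annotations:
        tags[word_ix][name].append(value)
    return {word_ix: dict(by_name) for word_ix, by_name in tags.items()}


# Toy annotations: word 3 takes part in two co-reference chains.
annotations = [(3, "COREF_REF", "4"), (3, "COREF_REF", "7"), (5, "COREF_ID", "7")]
overwritten = collect_last_wins(annotations)
kept = collect_all(annotations)
```

With the list-valued version, word 3 keeps both references, so later matching can consider every chain the word belongs to.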

## <span style="color:red">For Next Week</span>
- Test the coref matching logic that I built end of day Sunday
- Match the POS and NER codes up also so I can play with filters on those ones

## Questions 
- Are the Co-references only present for concept codes? 
 - No (and see ans to next qu)
 - Also, some anaphora tags have no antecedent
- Are they only present for concept codes that form part of a causal relation?
 - No, not all anaphora tags have causal relations

##  Measure of Success - Considerations
  * Accuracy at detecting anaphora tags
  * Accuracy at detecting anaphora tags and correctly resolving the associated concept(s)
      1. Using the ML model's predictions to filter
      2. Using the Stanford output alone
  * Impact on other metrics when incorporated into a single solution?
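
One possible way to score the two resolution strategies above is to compare sets of (word position, concept code) pairs. This is a sketch under assumptions: the pair-based scoring, the function names, and the toy positions and codes are all hypothetical, not the evaluation actually used.

```python
from typing import Dict, Optional, Set, Tuple

Pair = Tuple[int, str]  # (word position, resolved concept code)


def resolved_pairs(
    coref_codes: Dict[int, Set[str]], predicted_anaphora: Optional[Set[int]] = None
) -> Set[Pair]:
    """Codes resolved via co-reference output, optionally filtered by the tagger.

    If `predicted_anaphora` is given, only positions the ML model tagged as
    anaphora are kept (strategy 1); otherwise all co-reference output is used
    (strategy 2).
    """
    pairs: Set[Pair] = set()
    for position, codes in coref_codes.items():
        if predicted_anaphora is not None and position not in predicted_anaphora:
            continue
        pairs.update((position, code) for code in codes)
    return pairs


def precision_recall(gold: Set[Pair], predicted: Set[Pair]) -> Tuple[float, float]:
    """Precision and recall over (position, code) pairs."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


# Toy data for illustration only.
gold = {(2, "50"), (7, "3")}
coref_codes = {2: {"50"}, 5: {"1"}, 7: {"3"}}
predicted_anaphora = {2, 5}

filtered = precision_recall(gold, resolved_pairs(coref_codes, predicted_anaphora))
coref_only = precision_recall(gold, resolved_pairs(coref_codes))
```

Scoring both strategies against the same gold pairs keeps the comparison like for like.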

## Remaining Tasks / Project Plan - (Chapter 6)
- Match co-ref tags to anaphora tags
  - Merge the two essay sets
  - Cross-reference anaphora tags
  - TARGET **1 day** - Sep/30
- Compute Word Tagging Accuracy Metrics
 - Anaphora cross-referenced labels (using predicted anaphora tags)
 - **1/2 day** - Oct/7
- Compute Causal Relation Tagging Accuracy
 - **1/2 day** - Oct/7
- Compute Accuracy of using co-reference detection directly
 - **Unsure - can we use existing work?**
 - **~1-2 days** - high risk
 - Oct/14 ?
- **BREAK** In Galena weekend of Oct 20/21
 - Should I take extra Monday off the following week (before folks arrive?)
- Write up results
 - **2-3 days** initial
 - Oct/28, Nov 2, Nov 9
 - **2 days** with revisions
 - End of Nov
- TARGET DATE - **End of Nov**