# work log - ground truth - evaluate disagreements

# Table of Contents

- [Setup](#Setup)

    - [Setup - Imports](#Setup---Imports)
    - [Setup - Initialize Django](#Setup---Initialize-Django)
    - [Setup - Tools](#Setup---Tools)

        - [Tool - copy `Article_Data` to user `ground_truth`](#Tool---copy-Article_Data-to-user-ground_truth)
        - [Tool - delete `Article_Data`](#Tool---delete-Article_Data)
        - [Tool - rebuild `Reliability_Names` for an article](#Tool---rebuild-Reliability_Names-for-an-article)

            - [Delete existing `Reliability_Names` for article](#Delete-existing-Reliability_Names-for-article)
            - [Make new `Reliability_Names`](#Make-new-Reliability_Names)

- [Evaluate disagreements](#Evaluate-disagreements)

    - [Tag disagreements as TODO](#Tag-disagreements-as-TODO)
    - [View disagreements](#View-disagreements)
    
        - [Disagreement evaluation](#Disagreement-evaluation)
        - [Disagreement resolution](#Disagreement-resolution)
        - [Resolution logs](#Resolution-logs)
        
            - [Evaluation log](#Evaluation-log)
            - [Ground-truth coding fixed](#Ground-truth-coding-fixed)
            - [`Reliability_Names` records merged](#Reliability_Names-records-merged)
            - [Deleted `Reliability_Names` records](#Deleted-Reliability_Names-records)

- [Notes](#Notes)

    - [Notes and questions](#Notes-and-questions)
    - [Errors](#Errors)

- [TODO](#TODO)

    - [Coding to look into](#Coding-to-look-into)
    - [Debugging](#Debugging)

- [DONE](#DONE)

    - [quotes that contain paragraph break](#quotes-that-contain-paragraph-break)

- [NEXT](#NEXT)

# Setup

- Back to [Table of Contents](#Table-of-Contents)

## Setup - Imports

- Back to [Table of Contents](#Table-of-Contents)

In [1]:
import datetime
import json
import six

print( "packages imported at " + str( datetime.datetime.now() ) )

packages imported at 2017-07-03 12:29:25.218024


In [2]:
%pwd

'/home/jonathanmorgan/work/sourcenet/django/research/work/msu_phd_work'

## Setup - Initialize Django

- Back to [Table of Contents](#Table-of-Contents)

First, initialize my dev django project, so I can run code in this notebook that references my django models and can talk to the database using my project's settings.

You need to have installed your virtualenv with django as a kernel, then select that kernel for this notebook.

In [3]:
%run django_init.py

django initialized at 2017-07-03 16:29:29.739968


Import any `sourcenet` or `sourcenet_analysis` models or classes.

In [4]:
# django imports
from django.contrib.auth.models import User

# sourcenet shared
from sourcenet.shared.person_details import PersonDetails

# sourcenet models.
from sourcenet.models import Article
from sourcenet.models import Article_Data
from sourcenet.models import Article_Subject
from sourcenet.models import Person
from sourcenet.shared.sourcenet_base import SourcenetBase
from sourcenet.tests.models.test_Article_Data_model import Article_Data_Copy_Tester

# sourcenet article_coding
from sourcenet.article_coding.article_coding import ArticleCoder
from sourcenet.article_coding.manual_coding.manual_article_coder import ManualArticleCoder

# sourcenet_analysis models.
from sourcenet_analysis.models import Reliability_Names
from sourcenet_analysis.reliability.reliability_names_builder import ReliabilityNamesBuilder

print( "sourcenet and sourcenet_analysis packages imported at " + str( datetime.datetime.now() ) )

sourcenet and sourcenet_analysis packages imported at 2017-07-03 16:29:31.530068


## Setup - Tools

- Back to [Table of Contents](#Table-of-Contents)

### Tool - copy Article_Data to user ground_truth

- Back to [Table of Contents](#Table-of-Contents)

Retrieve the ground truth user, then make a deep copy of an Article_Data record, assigning it to the ground truth user.

In [5]:
def copy_to_ground_truth_user( source_article_data_id_IN ):

    '''
    Accepts ID of Article_Data instance to copy to ground_truth user,
        for correcting coding error made by human coder.  Performs a deep
        copy of Article_Data instance, then assigns it to the ground_truth
        user.  Prints any validation errors, returns the new Article_Data.
    '''
    
    # return reference
    new_article_data_instance_OUT = -1
    
    # declare variables
    ground_truth_user = None
    ground_truth_user_id = -1
    id_of_article_data_to_copy = -1
    new_article_data = None
    new_article_data_id = -1
    validation_error_list = None
    validation_error_count = -1
    validation_error = None

    # set ID of article data we want to copy.
    id_of_article_data_to_copy = source_article_data_id_IN

    # get the ground_truth user's ID.
    ground_truth_user = SourcenetBase.get_ground_truth_coding_user()
    ground_truth_user_id = ground_truth_user.id

    # make the copy
    new_article_data = Article_Data.make_deep_copy( id_of_article_data_to_copy,
                                                    new_coder_user_id_IN = ground_truth_user_id )
    new_article_data_id = new_article_data.id

    # validate it.
    validation_error_list = Article_Data_Copy_Tester.validate_article_data_deep_copy( original_article_data_id_IN = id_of_article_data_to_copy,
                                                                                      copy_article_data_id_IN = new_article_data_id,
                                                                                      copy_coder_user_id_IN = ground_truth_user_id )

    # get error count:
    validation_error_count = len( validation_error_list )
    if ( validation_error_count > 0 ):

        # loop and output messages
        for validation_error in validation_error_list:

            print( "- Validation error: " + str( validation_error ) )

        #-- END loop over validation errors. --#

    else:

        # no errors - success!
        print( "Record copy a success (as far as we know)!" )

    #-- END check to see if validation errors --#

    print( "copied Article_Data id " + str( id_of_article_data_to_copy ) + " INTO Article_Data id " + str( new_article_data_id ) + " at " + str( datetime.datetime.now() ) )
    
    new_article_data_instance_OUT = new_article_data
    
    return new_article_data_instance_OUT

#-- END function copy_to_ground_truth_user() --#

print( "function copy_to_ground_truth_user() defined at " + str( datetime.datetime.now() ) )

function copy_to_ground_truth_user() defined at 2017-07-03 16:29:35.933269


In [None]:
# Example: set ID of article data we want to copy.
#copy_to_ground_truth_user( 2342 )

### Tool - delete Article_Data

- Back to [Table of Contents](#Table-of-Contents)

Delete the Article_Data whose ID you specify (intended only for when you accidentally create a "`ground_truth`" copy).

In [6]:
def delete_article_data( article_data_id_IN ):

    # declare variables
    article_data_id = -1
    article_data = None
    do_delete = False

    # set ID.
    article_data_id = article_data_id_IN

    # get model instance
    article_data = Article_Data.objects.get( id = article_data_id )

    # got something?
    if ( article_data is not None ):

        # yes.  Delete?
        if ( do_delete == True ):

            # delete.
            print( "Deleting Article_Data: " + str( article_data ) )
            article_data.delete()

        else:

            # no delete.
            print( "Found Article_Data: " + str( article_data ) + ", but not deleting." )

        #-- END check to see if we delete --#

    #-- END check to see if Article_Data match. --#
    
#-- END function delete_article_data() --#

print( "function delete_article_data() defined at " + str( datetime.datetime.now() ) )

function delete_article_data() defined at 2017-07-03 16:29:40.284354
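
Note that `do_delete` in the function above is hard-coded to `False`, so as written it only reports what it finds.  Below is a minimal sketch of the same dry-run-by-default pattern with the switch exposed as a parameter.  It is an illustration only: `delete_record()` and the dict `fake_article_data` are hypothetical stand-ins, not part of sourcenet, and the sketch does not touch the database.

```python
def delete_record( store_IN, record_id_IN, do_delete_IN = False ):

    '''
    Dry run by default: reports what would be deleted, and only deletes
        when do_delete_IN is True.  Returns True if a record was deleted.
    '''

    was_deleted = False

    # look up the record (None if no match).
    record = store_IN.get( record_id_IN, None )

    if ( record is None ):

        print( "No record found for ID " + str( record_id_IN ) )

    elif ( do_delete_IN == True ):

        print( "Deleting: " + str( record ) )
        del store_IN[ record_id_IN ]
        was_deleted = True

    else:

        print( "Found: " + str( record ) + ", but not deleting." )

    #-- END check to see what we do with record --#

    return was_deleted

#-- END function delete_record() --#

# hypothetical stand-in for the Article_Data table.
fake_article_data = { 3340 : "Article_Data 3340 - ground_truth" }

dry_run_result = delete_record( fake_article_data, 3340 )
real_result = delete_record( fake_article_data, 3340, do_delete_IN = True )
```

One difference to keep in mind with the real model: django's `objects.get()` raises `DoesNotExist` when nothing matches rather than returning `None`, so a "not found" branch there would need a `try`/`except`.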


### Tool - rebuild Reliability_Names for an article

- Back to [Table of Contents](#Table-of-Contents)

Steps:

- retrieve the Reliability_Names row(s) for article with a particular ID, and filter on label if one provided.
- delete the selected Reliability_Names row(s).
- set up a call to the Reliability_Names program that just generates data for:

    - the article in question
    - users in a desired order.
    - etc.

#### Delete existing Reliability_Names for article

- Back to [Table of Contents](#Table-of-Contents)

In [7]:
def delete_reliability_names_for_article( article_id_IN ):

    # declare variables
    article_id = -1
    label = ""
    do_delete = False
    row_string_list = None

    # first, get existing Reliability_Names rows for article and label.
    article_id = article_id_IN
    label = "prelim_month"
    do_delete = True

    # Do the delete
    row_string_list = Reliability_Names.delete_reliabilty_names_for_article( article_id,
                                                                             label_IN = label,
                                                                             do_delete_IN = do_delete )

    # print the strings.
    for row_string in row_string_list:

        # print it.
        print( row_string )

    #-- END loop over row strings --#

#-- END function delete_reliability_names_for_article() --#

print( "function delete_reliability_names_for_article() defined at " + str( datetime.datetime.now() ) )

function delete_reliability_names_for_article() defined at 2017-07-03 16:29:43.120263


#### Make new Reliability_Names

- Back to [Table of Contents](#Table-of-Contents)

In [8]:
def rebuild_reliability_names_for_article( article_id_IN, delete_existing_first_IN = True ):
    
    '''
    Remove existing Reliability_Names records for article, then rebuild them
        from related Article_Data that matches any specified criteria.
        
    Detailed logic:
    - remove old Reliability_Names for that article ( [Delete existing `Reliability_Names` for article](#Delete-existing-Reliability_Names-for-article) ).  Make sure to specify both label and Article ID, so you don't delete more than you intend.
    - re-run Reliability_Names creation for the article ( [Make new `Reliability_Names`](#Make-new-Reliability_Names) ).  Specify:

        - Article ID list (just put the ID of the article you want to reprocess in the list).
        - label: make sure this is the same as the label of the rest of your Reliability_Names records ("prelim_month").
        - Tag list: If you want to make even more certain that you don't do something unexpected, also specify the article tags that make up your current data set, so if you accidentally specify the ID of an article not in your data set, it won't process.  Current tag is "grp_month".
        - Coders to assign to which index in the Reliability_Names record, and in what priority.  You can assign multiple coders to a given index, for example, when multiple coders coded subsets of a data set, and you want their combined coding to be used as "coder 1" or "coder 2", for example.  See the cell for an example.
        - Automated coder type: You can specify the particular automated coding type you want for automated coder, to filter out coding done by other automated methods.  See the cell for an example for "OpenCalais v2".
    '''
    
    # django imports
    #from django.contrib.auth.models import User

    # sourcenet imports
    #from sourcenet.shared.sourcenet_base import SourcenetBase

    # sourcenet_analysis imports
    #from sourcenet_analysis.reliability.reliability_names_builder import ReliabilityNamesBuilder

    # declare variables
    my_reliability_instance = None
    tag_in_list = []
    article_id_in_list = []
    label = ""

    # declare variables - user setup
    current_coder = None
    current_coder_id = -1
    current_index = -1

    # declare variables - Article_Data filtering.
    coder_type = ""

    # delete old Reliability_Names?
    if ( delete_existing_first_IN == True ):
        
        # delete first
        delete_reliability_names_for_article( article_id_IN )
        
    #-- END check to see if we delete first --#
    
    # make reliability instance
    my_reliability_instance = ReliabilityNamesBuilder()

    #===============================================================================
    # configure
    #===============================================================================

    # list of tags of articles we want to process.
    tag_in_list = [ "grp_month", ]

    # list of IDs of articles we want to process:
    article_id_in_list = [ article_id_IN, ]

    # label to associate with results, for subsequent lookup.
    label = "prelim_month"

    # ! ====> map coders to indices

    # set it up so that...

    # ...the ground truth user has highest priority (4) for index 1...
    current_coder = SourcenetBase.get_ground_truth_coding_user()
    current_coder_id = current_coder.id
    current_index = 1
    current_priority = 4
    my_reliability_instance.add_coder_at_index( current_coder_id, current_index, priority_IN = current_priority )

    # ...coder ID 8 is priority 3 for index 1...
    current_coder_id = 8
    current_index = 1
    current_priority = 3
    my_reliability_instance.add_coder_at_index( current_coder_id, current_index, priority_IN = current_priority )

    # ...coder ID 9 is priority 2 for index 1...
    current_coder_id = 9
    current_index = 1
    current_priority = 2
    my_reliability_instance.add_coder_at_index( current_coder_id, current_index, priority_IN = current_priority )

    # ...coder ID 10 is priority 1 for index 1...
    current_coder_id = 10
    current_index = 1
    current_priority = 1
    my_reliability_instance.add_coder_at_index( current_coder_id, current_index, priority_IN = current_priority )

    # ...and automated coder (2) is index 2
    current_coder = SourcenetBase.get_automated_coding_user()
    current_coder_id = current_coder.id
    current_index = 2
    current_priority = 1
    my_reliability_instance.add_coder_at_index( current_coder_id, current_index, priority_IN = current_priority )

    # and only look at coding by those users.  And...

    # configure so that it limits to automated coder_type of OpenCalais_REST_API_v2.
    coder_type = "OpenCalais_REST_API_v2"
    #my_reliability_instance.limit_to_automated_coder_type = "OpenCalais_REST_API_v2"
    my_reliability_instance.automated_coder_type_include_list.append( coder_type )

    # output debug JSON to file
    #my_reliability_instance.debug_output_json_file_path = "/home/jonathanmorgan/" + label + ".json"

    #===============================================================================
    # process
    #===============================================================================

    # process articles
    my_reliability_instance.process_articles( tag_in_list,
                                              article_id_in_list_IN = article_id_in_list )

    # output to database.
    my_reliability_instance.output_reliability_data( label )

#-- END function rebuild_reliability_names_for_article() --#

print( "function rebuild_reliability_names_for_article() defined at " + str( datetime.datetime.now() ) )

function rebuild_reliability_names_for_article() defined at 2017-07-03 16:29:46.087910
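
To make the coder priority idea concrete, here is a minimal sketch in plain Python.  It is not the `ReliabilityNamesBuilder` implementation.  It assumes that, for each index, the coding used is that of the highest-priority coder who actually coded the article (larger number wins, as in the configuration above).  The function `pick_coder_for_index()`, the data shapes, and the ground truth user ID 99 are all made up for illustration.

```python
def pick_coder_for_index( index_to_coders_IN, coders_with_coding_IN ):

    '''
    index_to_coders_IN: dict that maps index to list of ( coder_id, priority ).
    coders_with_coding_IN: set of IDs of coders who coded the article.
    Returns dict that maps index to chosen coder ID (None if nobody coded).
    '''

    chosen_OUT = {}

    for index, coder_list in index_to_coders_IN.items():

        # keep only coders who actually coded this article.
        candidates = [ pair for pair in coder_list if pair[ 0 ] in coders_with_coding_IN ]

        if ( len( candidates ) > 0 ):

            # assumption: highest priority wins.
            chosen_OUT[ index ] = max( candidates, key = lambda pair: pair[ 1 ] )[ 0 ]

        else:

            chosen_OUT[ index ] = None

        #-- END check to see if any candidates --#

    #-- END loop over indexes --#

    return chosen_OUT

#-- END function pick_coder_for_index() --#

GROUND_TRUTH_ID = 99  # made-up ID, for illustration only.
AUTOMATED_ID = 2

index_to_coders = {
    1 : [ ( GROUND_TRUTH_ID, 4 ), ( 8, 3 ), ( 9, 2 ), ( 10, 1 ) ],
    2 : [ ( AUTOMATED_ID, 1 ) ],
}

# only coder 8 and the automated coder have coded the article.
before_fix = pick_coder_for_index( index_to_coders, { 8, AUTOMATED_ID } )

# after coder 8's Article_Data is copied to ground_truth.
after_fix = pick_coder_for_index( index_to_coders, { 8, AUTOMATED_ID, GROUND_TRUTH_ID } )

print( before_fix )
print( after_fix )
```

Under that assumption, this is why copying an `Article_Data` to `ground_truth` and then rebuilding is enough to swap the corrected coding into index 1 without touching the original coder's record.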


# Evaluate disagreements

- Back to [Table of Contents](#Table-of-Contents)

Need to go through each disagreement and make sure that the ground truth is correct.  To compute accuracy, precision, and recall, my human coding serves as the ground truth that the computer's coding is compared against.  So, will look at all the disagreements and make sure that the human coding is right.  This isn't perfect.  Cases where human and computer agree but are both wrong are still unaddressed, and catching them would effectively require me to re-code all the articles (which I could do...).  But, better than not checking.
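
A small worked example of why ground truth errors matter for these measures.  This is a sketch with made-up people ("A" through "E"), not numbers from the data set; `precision_recall()` is a hypothetical helper that treats the human coding as ground truth.

```python
def precision_recall( ground_truth_people_IN, detected_people_IN ):

    '''
    Returns ( precision, recall ) for one set of detected people compared
        against a set of ground truth people.
    '''

    true_positive_count = len( ground_truth_people_IN & detected_people_IN )

    precision = 0.0
    if ( len( detected_people_IN ) > 0 ):
        precision = true_positive_count / len( detected_people_IN )

    recall = 0.0
    if ( len( ground_truth_people_IN ) > 0 ):
        recall = true_positive_count / len( ground_truth_people_IN )

    return ( precision, recall )

#-- END function precision_recall() --#

# made-up coding: the human coder missed person "E", the computer missed "D".
human_coding = { "A", "B", "C", "D" }
computer_coding = { "A", "B", "C", "E" }

# with the uncorrected human coding, "E" counts against the computer.
uncorrected = precision_recall( human_coding, computer_coding )

# once the human miss is fixed in ground truth, "E" is a correct detection.
corrected = precision_recall( human_coding | { "E" }, computer_coding )

print( uncorrected )
print( corrected )
```

In the uncorrected case the human miss is scored as a computer false positive, which is the kind of error that checking disagreements can catch.  A person that both coders missed never shows up as a disagreement, so this check can't find it.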

## Tag disagreements as TODO

- Back to [Table of Contents](#Table-of-Contents)

First, assign "TODO" tag to all disagreements using the "View reliability name information" screen:

- [http://research.local/sourcenet/sourcenet/analysis/reliability/names/disagreement/view](http://research.local/sourcenet/sourcenet/analysis/reliability/names/disagreement/view).

To do this:

- First, enter the following in the fields there:

    - **Label:** "prelim_month"
    - **Coders to compare (1 through ==>):** 2
    - **Reliability names filter type:** Select "Disagree (only rows with disagreement between coders)"
    
- Click the "**Submit Query**" button.  This should load all the disagreement rows (424 after removing single-word names).
- Click the "**(all)**" link in the "**select**" column header to check the checkbox next to all of the records.
- In the "**Reliability names action:**" field, select "Add tag(s) to selected".
- In the "**Tag(s) - (comma-delimited):**" field, enter "`TODO`" (without the quotes).
- Click the "**Do Action**" button.
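
What the steps above amount to, as a sketch in plain Python.  The rows are dicts, not the django `Reliability_Names` model, and the field names (`coder1_detected`, `coder2_detected`, `tags`) are invented for this illustration.  It also assumes a disagreement is simply a difference in the "detected" flag between the two coders, which is a simplification of what the screen's "Disagree" filter checks.

```python
# hypothetical stand-ins for Reliability_Names rows.
rows = [
    { "id" : 10239, "name" : "Chris Brown", "coder1_detected" : 1, "coder2_detected" : 1, "tags" : set() },
    { "id" : 10236, "name" : "Chris Tinney", "coder1_detected" : 0, "coder2_detected" : 1, "tags" : set() },
]

def is_disagreement( row_IN ):

    # simplification: coders disagree if only one of them detected the person.
    return row_IN[ "coder1_detected" ] != row_IN[ "coder2_detected" ]

#-- END function is_disagreement() --#

def add_tag_to_disagreements( row_list_IN, tag_IN = "TODO" ):

    '''
    Adds tag to each row with a disagreement.  Returns list of IDs tagged.
    '''

    tagged_id_list_OUT = []

    for row in row_list_IN:

        if ( is_disagreement( row ) == True ):

            row[ "tags" ].add( tag_IN )
            tagged_id_list_OUT.append( row[ "id" ] )

        #-- END check to see if disagreement --#

    #-- END loop over rows --#

    return tagged_id_list_OUT

#-- END function add_tag_to_disagreements() --#

tagged_ids = add_tag_to_disagreements( rows )

# rows that still carry the tag are the ones left to evaluate.
todo_rows = [ row for row in rows if "TODO" in row[ "tags" ] ]

print( tagged_ids )
```

The tag works as a work queue: removing "TODO" from a row once it is evaluated shrinks `todo_rows` until nothing is left.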

## View disagreements

- Back to [Table of Contents](#Table-of-Contents)

Evaluate disagreements using the "View reliability name information" screen:

- [http://research.local/sourcenet/sourcenet/analysis/reliability/names/disagreement/view](http://research.local/sourcenet/sourcenet/analysis/reliability/names/disagreement/view)

To start, enter the following in fields there:

- **Label:** "prelim_month"
- **Coders to compare (1 through ==>):** 2
- **Reliability names filter type:** Select "Lookup"
- **[Lookup] - Reliability_Names tags (comma-delimited):** Enter "`TODO`" (without the quotes).

Then click the "**Submit Query**" button.

You should see all the records with disagreements that still need to be evaluated (we remove "TODO" from records as we go to keep track of which we have evaluated).  To start, the same 424 rows that had disagreements after removing single-word names should have the "TODO" tag.

### Disagreement evaluation

- Back to [Table of Contents](#Table-of-Contents)

Need to look at each instance where there is a disagreement and make sure the human coding is correct.

Most are probably instances where the computer screwed up, but since we are calling this human coding "ground truth", want to winnow out as much human error as possible.

For each disagreement, to check for coder error (like capturing only part of the name of a person whose full name was in the story), click the link in the "Article ID" column. It will take you to a view of the article that includes the coding from everyone who coded the article, with each detection of a mention or quotation displayed next to the paragraph where the person was first detected.

If the disagreement involves mentions only (the person was not quoted and should not have been), it is OK to skip fixing a human coder error, since mentions are not included in this work.  It is also OK to fix it if you want.

### Disagreement resolution

For each disagreement, click on the article ID link in the row to go to the article and check to see if the human coding for the disagreement in question is correct ( [http://research.local/sourcenet/sourcenet/article/article_data/view_with_text/](http://research.local/sourcenet/sourcenet/article/article_data/view_with_text/) ).

#### Human coder error

If human coder did not detect person or made some other kind of error:

- Setup (set variable values, then run the cell):

In [9]:
# Setup variables of interest.
resolve_article_id = 21350
human_article_data_id = 2365

- use the function "`copy_to_ground_truth_user()`" defined in section [Tool - copy Article_Data to user ground_truth](#Tool---copy-Article_Data-to-user-ground_truth) to create a copy of the person's `Article_Data` and assign it to coder "`ground_truth`".  Make a code cell and set up a call to "`copy_to_ground_truth_user()`", passing it the ID of the `Article_Data` you want to copy to `ground_truth`.  Example:
    
        # copy Article_Data 12345 to ground_truth user.
        copy_to_ground_truth_user( 12345 )

In [10]:
# copy Article_Data to ground_truth user.
copy_to_ground_truth_user( human_article_data_id )

Original Article_Data ID = 2365; Copy Article_Data ID = 3340


Article_Data_Notes ( count = 1 ):
- 2118 - Person Store JSON (likely from manual coding via article-code view). of type "json" for article_data: 2365 - minnesota1 - no coder_type -- Article: 21350 - Dec 06, 2009, Lakeshore ( N2 ), UID: 12C7A9A4DF21E4F8 - Former fire chief embraces photography ( Grand Rapids Press, The )

Article_Author ( count = 1 ):

- 2440 (AA) - Chandler, Greg ( id = 302; type = staff; capture_method = OpenCalais_REST_API_v2 )

----> Alternate_Author_Match ( count = 0 ):

Article_Subject ( count = 3 ):

- 8213 (AS) - Brown, Chris ( id = 327; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual)

----> Alternate_Subject_Match ( count = 0 ):

----> Article_Subject_Mention ( count = 1 ):
----> - 21736 -  ( graf: 11; from word: 233; to word: 234; index: 1403 ) - 8213 (AS) - Brown, Chris ( id = 327; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) : Chris Brown

----> Article_Subject_Qu

<Article_Data: 3340 - ground_truth - no coder_type -- Article: 21350 - Dec 06, 2009, Lakeshore ( N2 ), UID: 12C7A9A4DF21E4F8 - Former fire chief embraces photography ( Grand Rapids Press, The )>

- fix the coding in `ground_truth`'s coding record:

    - If you want to stay logged in as your normal user while processing an error, do the following in a separate browser (I like Opera).
    - if this is the first time you've used the "`ground_truth`" user, log into the django admin ( [http://research.local/sourcenet/admin/](http://research.local/sourcenet/admin/) ) and:

        - set or reset the "`ground_truth`" user's password.
        - give it "staff status".

    - log in to the coding tool ( [http://research.local/sourcenet/sourcenet/article/code/](http://research.local/sourcenet/sourcenet/article/code/) ) as the "`ground_truth`" user and fix the coding for the article in question, then save.

- In the Reliability_Names disagreement view ( [http://research.local/sourcenet/sourcenet/analysis/reliability/names/disagreement/view](http://research.local/sourcenet/sourcenet/analysis/reliability/names/disagreement/view) ), remove the "`TODO`" tag from any items related to this disagreement and save:

    - Click the checkbox in the "**select**" column next to the record whose evaluation is complete.
    - In the "**Reliability names action:**" field, select "_Remove tag(s) from selected_".
    - In the "**Tag(s) - (comma-delimited):**" field, enter "_`TODO`_" (without the quotes).
    - Click the "**Do Action**" button.
    
    - This will also place information on the `Reliability_Names` record into a `Reliability_Names_Evaluation` record in the database.  The message that results from this action completing will include a link to the record (the first number in the output).  Click the link to open the record and update it with additional details.  Specifically:
    
        - check the "`is_ground_truth_fixed`" checkbox.
        - if problems caused by automated coder error, click the "`is_automated_error`" checkbox.
        - Since we had to update ground truth, set "`status`" to "ERROR".
        - update the "`status_message`" so it contains a brief description of what exactly happened (should have been mentioned, should have been quoted, missed the person entirely, etc.).
        - update "`Notes`" with more details.
        - add "`Tags`" if appropriate (for sports articles, for example, add "sports" tag).

- rebuild Reliability_Names for just that article.

    - make a code cell and call "`rebuild_reliability_names_for_article()`", passing it the ID of the article whose Reliability_Names records you want to rebuild.  It will automatically delete existing and then rebuild, using all the right parameters.  Example:

            # rebuild Reliability_Names for article 12345
            rebuild_reliability_names_for_article( 12345 )

In [11]:
# rebuild Reliability_Names for article
rebuild_reliability_names_for_article( resolve_article_id )

Found 5 records.
- delete()-ing: 10235 - label: prelim_month - article ID: 21350 - Greg Chandler ( 302 ) - coders: 12 ====> 1 - 8; 1; 302 ====> 2 - 2; 1; 302
- delete()-ing: 10239 - label: prelim_month - article ID: 21350 - Chris Brown ( 327 ) - coders: 12 ====> 1 - 8; 1; 327 ====> 2 - 2; 1; 327
- delete()-ing: 10238 - label: prelim_month - article ID: 21350 - Dan Henderson ( 326 ) - coders: 12 ====> 1 - 8; 1; 326 ====> 2 - 2; 1; 326
- delete()-ing: 10237 - label: prelim_month - article ID: 21350 - Bill Schwab ( 860 ) - coders: 12 ====> 1 - 8; 1; 860 ====> 2 - 2; 1; 860
- delete()-ing: 10236 - label: prelim_month - article ID: 21350 - Chris Tinney ( 762 ) - coders: 12 ====> 1 - 8; 0; 0 ====> 2 - 2; 1; 762
- Article ID: 21350; Article_Data count: 3
---- - Article Data Info: 2365 - minnesota1 - no coder_type -- Article: 21350 - Dec 06, 2009, Lakeshore ( N2 ), UID: 12C7A9A4DF21E4F8 - Former fire chief embraces photography ( Grand Rapids Press, The )
---- - Article Data Info: 1642 - automa

- Then, you'll need to re-fix any other problems with the article. Specifically:

    - load just the Reliability_Names for this article - [http://research.local/sourcenet/sourcenet/analysis/reliability/names/disagreement/view](http://research.local/sourcenet/sourcenet/analysis/reliability/names/disagreement/view):

        - **Label:** "prelim_month"
        - **Coders to compare (1 through ==>):** 2
        - **Reliability names filter type:** Select "Lookup"
        - **[Lookup] - Associated Article IDs (comma-delimited):** Enter "`<article_id>`," (without the quotes).

    - check for single names, either to remove, or to tie an erroneously parsed name to the correct person (forgot to capture first name, for example).

        - If two people that should be tied together are not, you'll need to merge the two rows.  To merge two rows:

            - In the "**select**" column, click the checkbox next to the erroneous entry that you want to merge into the correct entry.
            - In the "**merge INTO**" column, click the checkbox next to the entry INTO WHICH you want to merge.
            - In the "**Reliability names action:**" field, select "Merge Coding --> FROM 'select' TO 'merge INTO'".
            - Click the "**Do Action**" button.

    - add again the "TODO" tag to any rows with disagreement, or if no disagreements, to the row that initiated this work.

        - Click the checkbox in the "**select**" column next to any records that are either disagreements or the person who initiated this work.
        - In the "**Reliability names action:**" field, select "_Add tag(s) to selected_".
        - In the "**Tag(s) - (comma-delimited):**" field, enter "_`TODO`_" (without the quotes).
        - Click the "**Do Action**" button.
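
A rough picture of the merge step described above, in plain Python.  This is an assumption about what the "Merge Coding" action does, not the tool's code: it treats each row as holding one "detected" slot per coder index, and copies a slot FROM the 'select' row INTO the 'merge INTO' row only where the target slot is empty.  The function `merge_coding()`, the row layout, and the names are all hypothetical.

```python
import copy

def merge_coding( from_row_IN, into_row_IN ):

    '''
    Copies coder slots where from_row_IN has a detection and into_row_IN
        does not.  Returns a new merged row; the inputs are left unchanged.
    '''

    merged_OUT = copy.deepcopy( into_row_IN )

    for index, from_slot in from_row_IN[ "coders" ].items():

        into_slot = merged_OUT[ "coders" ].get( index, { "detected" : 0 } )

        if ( ( from_slot[ "detected" ] == 1 ) and ( into_slot[ "detected" ] == 0 ) ):

            # target slot is empty, so take the detection from the FROM row.
            merged_OUT[ "coders" ][ index ] = dict( from_slot )

        #-- END check to see if we copy slot --#

    #-- END loop over coder slots --#

    return merged_OUT

#-- END function merge_coding() --#

# human (index 1) captured the full name, computer (index 2) only a single name.
full_name_row = { "name" : "Pat Example", "coders" : { 1 : { "detected" : 1 }, 2 : { "detected" : 0 } } }
single_name_row = { "name" : "Example", "coders" : { 1 : { "detected" : 0 }, 2 : { "detected" : 1 } } }

merged_row = merge_coding( single_name_row, full_name_row )

print( merged_row )
```

After a merge like this, the single-name row is a duplicate and would be the one to delete.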

If there is a problem where human and computer coding of same person are so different they split into different rows, merge the computer row into the human row, then remove the computer row.

- TK

Once you've evaluated and verified the human coding, remove the "`TODO`" tag from the current record (either from the single-article view above if you've removed all disagreements, or from the disagreement view if not):

- Click the checkbox in the "**select**" column next to the record whose evaluation is complete.
- In the "**Reliability names action:**" field, select "_Remove tag(s) from selected_".
- In the "**Tag(s) - (comma-delimited):**" field, enter "_`TODO`_" (without the quotes).
- Click the "**Do Action**" button.
- This will also place information on the `Reliability_Names` record into a `Reliability_Names_Evaluation` record in the database.  The message that results from this action completing will include a link to the record (the first number in the output).  Click the link to open the record and update it with additional details.  Specifically:

    - if problems caused by automated coder error, click the "`is_automated_error`" checkbox.
    - update the "`status_message`" so it contains a brief description of what exactly happened (should have been mentioned, should have been quoted, missed the person entirely, etc.).
    - update "`Notes`" with more details.
    - add "`Tags`" if appropriate (for sports articles, for example, add "sports" tag).

### Resolution logs

- Back to [Table of Contents](#Table-of-Contents)

Table of Reliability_Names records with disagreements, then separate tables of those where:

- human coding had to be fixed.
- records for the same person needed to be merged together.
- coding had to be deleted.

#### Evaluation log

- Back to [Table of Contents](#Table-of-Contents)

Track each Reliability_Names that we evaluate:

- Moved to `Reliability_Names_Evaluation` table in django: [http://research.local/sourcenet/admin/sourcenet_analysis/reliability_names_evaluation/?label=prelim_month&o=-1.7.8.3.5](http://research.local/sourcenet/admin/sourcenet_analysis/reliability_names_evaluation/?label=prelim_month&o=-1.7.8.3.5)

#### Ground truth coding fixed

- Back to [Table of Contents](#Table-of-Contents)

For some, the error will be on the part of the human coder.  For human error, we create a new "`ground_truth`" record that we will correct, so we preserve original coding (and evidence of errors) in case we want or need that information later.  Below, we have a table of the articles where we had to fix ground truth.  To find the original coding, click the Article link.

- Denoted by records with "`is_ground_truth_fixed`" set to True in the `Reliability_Names_Evaluation` table in django:  [http://research.local/sourcenet/admin/sourcenet_analysis/reliability_names_evaluation/?is_ground_truth_fixed__exact=1&label=prelim_month&o=-1.7.8.3.5](http://research.local/sourcenet/admin/sourcenet_analysis/reliability_names_evaluation/?is_ground_truth_fixed__exact=1&label=prelim_month&o=-1.7.8.3.5)

#### Reliability_Names records merged

- Back to [Table of Contents](#Table-of-Contents)

For some, need to merge a single-name detection by Calais with the full-name detection by ground_truth (an OpenCalais error, where it did not detect the full name, combined with a lookup error, where it didn't look up the right person since it missed part of his or her name).  Will still have subsequently deleted one or more duplicate rows.

- Denoted by records with "`event_type`" set to "merge" in the Reliability_Names_Evaluation table in django: [http://research.local/sourcenet/admin/sourcenet_analysis/reliability_names_evaluation/?event_type__exact=merge&label=prelim_month&o=-1.7.8.3.5](http://research.local/sourcenet/admin/sourcenet_analysis/reliability_names_evaluation/?event_type__exact=merge&label=prelim_month&o=-1.7.8.3.5)

#### Deleted Reliability_Names records

- Back to [Table of Contents](#Table-of-Contents)

Some records are just broken and need to be deleted.

- Denoted by records with "`event_type`" set to "delete" in the `Reliability_Names_Evaluation` table in django: [http://research.local/sourcenet/admin/sourcenet_analysis/reliability_names_evaluation/?event_type__exact=delete&label=prelim_month&o=-1.7.8.3.5](http://research.local/sourcenet/admin/sourcenet_analysis/reliability_names_evaluation/?event_type__exact=delete&label=prelim_month&o=-1.7.8.3.5)
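
The admin links in these logs differ only in their query-string filter.  A minimal sketch of building them with the standard library, so the filter and the link stay in sync.  The base URL, `label`, and `o` ordering values are copied from the links above; `build_evaluation_url()` itself is a hypothetical helper.

```python
from urllib.parse import urlencode

ADMIN_BASE_URL = "http://research.local/sourcenet/admin/sourcenet_analysis/reliability_names_evaluation/"

def build_evaluation_url( extra_filters_IN = None, label_IN = "prelim_month", ordering_IN = "-1.7.8.3.5" ):

    '''
    Returns URL of the Reliability_Names_Evaluation admin list, filtered on
        any extra filters passed in, plus label and sort order.
    '''

    # extra filters first, then label and ordering, to match existing links.
    params = {}
    if ( extra_filters_IN is not None ):
        params.update( extra_filters_IN )
    params[ "label" ] = label_IN
    params[ "o" ] = ordering_IN

    return ADMIN_BASE_URL + "?" + urlencode( params )

#-- END function build_evaluation_url() --#

all_url = build_evaluation_url()
fixed_url = build_evaluation_url( { "is_ground_truth_fixed__exact" : 1 } )
merged_url = build_evaluation_url( { "event_type__exact" : "merge" } )
deleted_url = build_evaluation_url( { "event_type__exact" : "delete" } )

print( deleted_url )
```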

# Notes

- Back to [Table of Contents](#Table-of-Contents)

## Notes and questions

- Back to [Table of Contents](#Table-of-Contents)

Notes and questions:

- TK

## Errors

- Back to [Table of Contents](#Table-of-Contents)

Errors of note in automated coding:

- Denoted by records with "`is_automated_error`" set to True in the `Reliability_Names_Evaluation` table in django:  [http://research.local/sourcenet/admin/sourcenet_analysis/reliability_names_evaluation/?is_automated_error__exact=1&label=prelim_month&o=-1.7.8.3.5](http://research.local/sourcenet/admin/sourcenet_analysis/reliability_names_evaluation/?is_automated_error__exact=1&label=prelim_month&o=-1.7.8.3.5)

# TODO

- Back to [Table of Contents](#Table-of-Contents)

TODO:

- Update sections of code that output table markdown to also just insert that information into the database in Reliability_Names_Evaluation.

    - // debug admin pages.
    - import all of the existing rows from pipe delimited string.
    
        - // base list
        - // fixed ground truth
        - deleted
        - merged
        
    - update the places where it outputs the pipe-delimited lists to write also to the database.

- Use keywords for Lakeshore section stories to try to filter out sports stories ("Basketball").  Maybe try this for all articles in the month?
- Want a way to limit to disagreements where quoted?  Might not - this is a start to assessing erroneous agreement.  If yes, 1 < coding time < 4 hours.

    - problem - `Reliability_Names.person_type` only has three values - "author", "subject", "source" - might need a row-level measure of "`has_mention`", "`has_quote`" to more readily capture rows where disagreement is over quoted-or-not.

## Coding to look into

- Back to [Table of Contents](#Table-of-Contents)

Coding decisions to look at more closely:

- TK

## Debugging

- Back to [Table of Contents](#Table-of-Contents)

Issues to debug:

- TK

# DONE

- Back to [Table of Contents](#Table-of-Contents)

## quotes that contain paragraph break

- Back to [Table of Contents](#Table-of-Contents)

Quotes with newlines in them (not sure how that is captured on the way to the server, in the database, etc.) break the article coder: [http://research.local/sourcenet/sourcenet/article/code/](http://research.local/sourcenet/sourcenet/article/code/).

When you load JSON that contains quote text that spans lines, the newlines within the text cause the JSON parsing to break.  Looks like it is read and parsed correctly when submitted to server (except for the graf number - evaluates to -1 - so that is a bug, too, since there are no newlines in any of the text we are looking at, just paragraph breaks).

How to fix?:

- First try stripping out any stretches of multiple white space characters and substituting a space.  This should work with all of the rest of the code on the server.  Can implement in javascript, and for sanity check also in Python that processes received JSON.
- If rest of code doesn't play nice with reformatting, then maybe figure out how to escape the carriage returns and line feeds, and might need to update the "find in text" functions, too.
- turns out that fixing this in cases when the quotation spans paragraphs might then break things when there are extra spaces within a paragraph.  So, leaving it as is for now, need to fix that paragraph in the article.
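
A minimal sketch of the problem and of the two candidate fixes above, using only the standard library.  The quote text is made up, and `collapse_whitespace()` is a hypothetical helper, not code from the article coder.  It shows that a raw newline inside a JSON string makes `json.loads()` fail, that collapsing whitespace avoids the failure, and that letting `json.dumps()` do the escaping keeps the newlines intact.

```python
import json
import re

def collapse_whitespace( text_IN ):

    '''
    Replaces each run of white space (including newlines) with one space.
    '''

    return re.sub( r"\s+", " ", text_IN ).strip()

#-- END function collapse_whitespace() --#

# made-up quote that spans a paragraph break.
quote_text = "First paragraph of the quote.\n\nSecond paragraph."

# JSON built by hand with raw newlines inside the string: invalid JSON.
broken_json = '{ "quote_text": "' + quote_text + '" }'

parse_failed = False
try:

    json.loads( broken_json )

except json.JSONDecodeError:

    # raw control characters are not allowed inside a JSON string.
    parse_failed = True

#-- END try to parse broken JSON --#

# fix 1: collapse white space before building the JSON.
cleaned_json = '{ "quote_text": "' + collapse_whitespace( quote_text ) + '" }'
cleaned = json.loads( cleaned_json )

# fix 2: let json.dumps() escape the newlines, so they survive a round trip.
escaped = json.loads( json.dumps( { "quote_text" : quote_text } ) )

print( parse_failed )
print( cleaned[ "quote_text" ] )
```

Fix 1 changes the text, which is the catch noted above: the cleaned quote no longer matches article text that contains extra spaces, so any "find in text" lookup would need the same normalization applied to the article.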

Examples:

- Article 21001: [http://research.local/sourcenet/sourcenet/article/article_data/view_with_text/?article_id=21001](http://research.local/sourcenet/sourcenet/article/article_data/view_with_text/?article_id=21001)
    
    - user minnesota1, article 21001
    - user ground_truth, article 21001 (copied from minnesota1).

# NEXT

- Back to [Table of Contents](#Table-of-Contents)