# 2016.12.04 - work log

# Table of Contents

- [Setup](#Setup)

    - [Setup - Imports](#Setup---Imports)
    - [Setup - virtualenv jupyter kernel](#Setup---virtualenv-jupyter-kernel)
    - [Setup - Initialize Django](#Setup---Initialize-Django)

- [Reliability data creation - `prelim_month`](#Reliability-data-creation---prelim_month)
- [Database backup - `sourcenet-2016.12.04.pgsql.gz`](#Database-backup---sourcenet-2016.12.04.pgsql.gz)
- [Data cleanup](#Data-cleanup)

    - [Remove single-name reliability data](#Remove-single-name-reliability-data)
    
        - [Single-name data assessment](#Single-name-data-assessment)
        - [Delete selected single-name data](#Delete-selected-single-name-data)


# Setup

- Back to [Table of Contents](#Table-of-Contents)

## Setup - Imports

- Back to [Table of Contents](#Table-of-Contents)

In [None]:
import datetime

print( "packages imported at " + str( datetime.datetime.now() ) )

## Setup - virtualenv jupyter kernel

- Back to [Table of Contents](#Table-of-Contents)

If you are using a virtualenv, make sure that you:

- have installed your virtualenv as a kernel.
- choose the kernel for your virtualenv as the kernel for your notebook (Kernel --> Change kernel).

Since I use a virtualenv, I need to activate it inside this notebook.  One option is to run `../dev/wsgi.py` in this notebook to configure the Python environment manually, as if you had activated the `sourcenet` virtualenv.  To do this, make a code cell that contains:

    %run ../dev/wsgi.py
    
This is sketchy, however, because it changes the Python environment of whatever kernel is currently running.  I'd worry about collisions with the actual Python 3 kernel.  A better option is to install the virtualenv as a separate kernel.  Steps:

- activate your virtualenv:

        workon sourcenet

- in your virtualenv, install the package `ipykernel`.

        pip install ipykernel

- use the `ipykernel` module to install the current environment as a kernel:

        python -m ipykernel install --user --name <env_name> --display-name "<display_name>"
        
    `sourcenet` example:
    
        python -m ipykernel install --user --name sourcenet --display-name "sourcenet (Python 3)"
        
More details: [http://ipython.readthedocs.io/en/stable/install/kernel_install.html](http://ipython.readthedocs.io/en/stable/install/kernel_install.html)
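
Once the kernel is installed and selected, a quick sanity check is to confirm that the notebook's interpreter really lives inside the virtualenv. The sketch below is one way to do that with only the standard library; the helper name `in_virtualenv` is my own and is not part of `ipykernel`.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Environments made by the stdlib venv module (and recent virtualenv
    # releases) point sys.base_prefix at the base installation.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


print("interpreter: " + sys.executable)
print("in virtualenv: " + str(in_virtualenv()))
```

If the interpreter path does not point into the `sourcenet` virtualenv, the notebook is probably still using the default kernel.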

In [None]:
%pwd

## Setup - Initialize Django

- Back to [Table of Contents](#Table-of-Contents)

First, initialize my dev Django project so that code in this notebook can reference my Django models and talk to the database using my project's settings.

In [None]:
%run django_init.py
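
The contents of `django_init.py` are not shown in this log. As a rough sketch of what a script like it typically needs to do, assuming a standard Django project layout: put the project directory on `sys.path`, set `DJANGO_SETTINGS_MODULE`, then call `django.setup()`. The function names, the project path, and the settings module below are placeholders, not the actual values used here.

```python
import os
import sys


def configure_django_env(project_dir, settings_module):
    """Make a Django project importable and point Django at its settings."""
    project_dir = os.path.abspath(project_dir)
    if project_dir not in sys.path:
        sys.path.insert(0, project_dir)
    # setdefault() leaves an existing DJANGO_SETTINGS_MODULE untouched.
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", settings_module)
    return project_dir


def init_django(project_dir, settings_module):
    """Configure the environment, then load Django's app registry."""
    configure_django_env(project_dir, settings_module)
    import django  # imported late so the settings variable is set first
    django.setup()
```

A call would look like `init_django( "/path/to/project", "example_project.settings" )`, with both arguments replaced by the real project's values.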

# Reliability data creation - `prelim_month`

- Back to [Table of Contents](#Table-of-Contents)

Create the data.

In [None]:
from __future__ import unicode_literals

# django imports
from django.contrib.auth.models import User

# sourcenet imports
from sourcenet.shared.sourcenet_base import SourcenetBase

# sourcenet_analysis imports
from sourcenet_analysis.reliability.reliability_names_builder import ReliabilityNamesBuilder

# declare variables
my_reliability_instance = None
tag_list = None
label = ""

# declare variables - user setup
current_coder = None
current_coder_id = -1
current_index = -1
current_priority = -1

# declare variables - Article_Data filtering.
coder_type = ""

# make reliability instance
my_reliability_instance = ReliabilityNamesBuilder()

#===============================================================================
# configure
#===============================================================================

# list of tags of articles we want to process.
tag_list = [ "grp_month", ]

# label to associate with results, for subsequent lookup.
label = "prelim_month"

# ! ====> map coders to indices

# set it up so that...

# ...the ground truth user has highest priority (4) for index 1...
current_coder = SourcenetBase.get_ground_truth_coding_user()
current_coder_id = current_coder.id
current_index = 1
current_priority = 4
my_reliability_instance.add_coder_at_index( current_coder_id, current_index, priority_IN = current_priority )

# ...coder ID 8 is priority 3 for index 1...
current_coder_id = 8
current_index = 1
current_priority = 3
my_reliability_instance.add_coder_at_index( current_coder_id, current_index, priority_IN = current_priority )

# ...coder ID 9 is priority 2 for index 1...
current_coder_id = 9
current_index = 1
current_priority = 2
my_reliability_instance.add_coder_at_index( current_coder_id, current_index, priority_IN = current_priority )

# ...coder ID 10 is priority 1 for index 1...
current_coder_id = 10
current_index = 1
current_priority = 1
my_reliability_instance.add_coder_at_index( current_coder_id, current_index, priority_IN = current_priority )

# ...and automated coder (2) is index 2
current_coder = SourcenetBase.get_automated_coding_user()
current_coder_id = current_coder.id
current_index = 2
current_priority = 1
my_reliability_instance.add_coder_at_index( current_coder_id, current_index, priority_IN = current_priority )

# and only look at coding by those users.  And...

# configure so that it limits to automated coder_type of OpenCalais_REST_API_v2.
coder_type = "OpenCalais_REST_API_v2"
#my_reliability_instance.limit_to_automated_coder_type = "OpenCalais_REST_API_v2"
my_reliability_instance.automated_coder_type_include_list.append( coder_type )

# output debug JSON to file
#my_reliability_instance.debug_output_json_file_path = "/home/jonathanmorgan/" + label + ".json"

#===============================================================================
# process
#===============================================================================

# process articles
#my_reliability_instance.process_articles( tag_list )

# output to database.
#my_reliability_instance.output_reliability_data( label )

print( "reliability data created at " + str( datetime.datetime.now() ) )
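
The cell above maps several coders to index 1 with different priorities. My reading is that, for each article, the coder with the highest priority who actually coded that article fills the index. That is an assumption about how `ReliabilityNamesBuilder` resolves priorities, not something verified here. The sketch below illustrates that rule in plain Python; `pick_coder`, `GROUND_TRUTH_ID`, and `AUTOMATED_ID` are illustrative names and placeholder values, since the real IDs come from `SourcenetBase`.

```python
# index -> list of (coder_id, priority); mirrors the configuration above.
# Both IDs below are placeholders; the real values come from
# SourcenetBase.get_ground_truth_coding_user() and
# SourcenetBase.get_automated_coding_user().
GROUND_TRUTH_ID = 100
AUTOMATED_ID = 2

index_to_coders = {
    1: [(GROUND_TRUTH_ID, 4), (8, 3), (9, 2), (10, 1)],
    2: [(AUTOMATED_ID, 1)],
}


def pick_coder(index, coders_with_data):
    """Return the highest-priority coder at `index` that coded the article, or None."""
    candidates = [
        (priority, coder_id)
        for coder_id, priority in index_to_coders.get(index, [])
        if coder_id in coders_with_data
    ]
    if not candidates:
        return None
    # max() on (priority, coder_id) tuples compares priority first.
    return max(candidates)[1]


# Article coded by coders 9 and 10 plus the automated coder:
print(pick_coder(1, {9, 10, AUTOMATED_ID}))  # prints 9
print(pick_coder(2, {9, 10, AUTOMATED_ID}))  # prints 2
```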

# Database backup - `sourcenet-2016.12.04.pgsql.gz`

- Back to [Table of Contents](#Table-of-Contents)

First, make a backup of the database.

- File name: `sourcenet-2016.12.04.pgsql.gz`
- All articles in tag "grp_month" are coded by OpenCalais.
- Reliability data generated with label "prelim_month", no cleanup done yet.
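
The log does not record the command used for the backup. One way to produce a file with this naming pattern, assuming a PostgreSQL database reachable through `pg_dump` with default connection settings, is sketched below. The database name `sourcenet` and both function names are assumptions.

```python
import datetime
import gzip
import shutil
import subprocess


def backup_file_name(db_name, backup_date):
    """Build a name like sourcenet-2016.12.04.pgsql.gz."""
    return "{}-{}.pgsql.gz".format(db_name, backup_date.strftime("%Y.%m.%d"))


def backup_database(db_name, backup_date=None):
    """Dump `db_name` with pg_dump and gzip the output; return the file name."""
    backup_date = backup_date or datetime.date.today()
    file_name = backup_file_name(db_name, backup_date)
    # pg_dump writes a plain-text SQL dump to stdout by default.
    dump = subprocess.Popen(["pg_dump", db_name], stdout=subprocess.PIPE)
    with gzip.open(file_name, "wb") as out_file:
        shutil.copyfileobj(dump.stdout, out_file)
    dump.stdout.close()
    if dump.wait() != 0:
        raise RuntimeError("pg_dump exited with status " + str(dump.returncode))
    return file_name


print(backup_file_name("sourcenet", datetime.date(2016, 12, 4)))
```

Calling `backup_database( "sourcenet" )` would write the dump into the current directory, so run it from wherever backups are kept.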

# Data cleanup

- Back to [Table of Contents](#Table-of-Contents)

## Remove single-name reliability data

- Back to [Table of Contents](#Table-of-Contents)

Next, remove all reliability data that refers to a single name using the "View reliability name information" screen:

- [https://data.jrn.cas.msu.edu/sourcenet-dev/sourcenet/analysis/reliability/names/disagreement/view](https://data.jrn.cas.msu.edu/sourcenet-dev/sourcenet/analysis/reliability/names/disagreement/view)

To start, enter the following in the fields there:

- Label: "prelim_month"
- Coders to compare (1 through ==>): 2
- Reliability names filter type: select "Lookup"
- [Lookup] Person has first name, no other name parts: CHECK the checkbox

You should see lots of entries where the automated coder detected people who were mentioned only by their first name.
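
The "first name, no other name parts" lookup amounts to a simple test on a person's name fields. The sketch below shows that test on plain dictionaries. The field names (`first_name`, `middle_name`, `last_name`) and the sample records are made up for illustration and may not match the actual sourcenet `Person` model.

```python
def is_first_name_only(name_parts):
    """True if the first name is set and every other name part is empty."""
    first = (name_parts.get("first_name") or "").strip()
    others = [
        (value or "").strip()
        for key, value in name_parts.items()
        if key != "first_name"
    ]
    return bool(first) and not any(others)


# Made-up sample records, not real data.
people = [
    {"first_name": "Maria", "middle_name": "", "last_name": ""},
    {"first_name": "John", "middle_name": None, "last_name": "Smith"},
]
single_name_people = [person for person in people if is_first_name_only(person)]
print(single_name_people)
```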

### Single-name data assessment

- Back to [Table of Contents](#Table-of-Contents)

See [2016.12.09-work_log.ipynb](2016.12.09-work_log.ipynb)

### Delete selected single-name data

- Back to [Table of Contents](#Table-of-Contents)

See [2016.12.09-work_log.ipynb](2016.12.09-work_log.ipynb)