# Deletion/replacement/stable chunks that are common to all branch beginnings

## 1 Setup

Flags

In [1]:
SAVE_FIGURES = False

Set up the database

In [2]:
import os, sys
sys.path.insert(1, os.path.abspath('../..'))
import analysis

FIG = os.path.join(os.path.abspath(os.path.curdir), '{}.png')
DB_NAME = 'spreadr_' + os.path.split(os.path.abspath(os.path.curdir))[1]
analysis.setup(DB_NAME)
print('Database:', DB_NAME)

Database: spreadr_exp_3


Imports for the analysis

In [3]:
import itertools

import pandas as pd
from progressbar import ProgressBar

from gists.models import Sentence, Tree

from analysis.utils import contiguous_chunks

## 2 Chunks common to all branch beginnings

We first need a two-way mapping between chunk types and integer codes

In [4]:
chunk_int_type = {
    0: 'del',
    1: 'ins',
    2: 'rpl_parent',
    3: 'rpl_child',
    4: 'stb_parent'
}
chunk_type_int = {v: k for k, v in chunk_int_type.items()}
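
The `contiguous_chunks` helper imported from `analysis.utils` is not shown in this notebook. As a rough guide to what it is used for further down, here is a minimal sketch of a function that groups token indices into runs of consecutive integers. The name `contiguous_chunks_sketch` and its behaviour are assumptions made for illustration; the real helper may differ.

```python
def contiguous_chunks_sketch(indices):
    """Group integer indices into runs of consecutive values.

    Hypothetical stand-in for analysis.utils.contiguous_chunks.
    """
    chunks = []
    for idx in sorted(indices):
        if chunks and idx == chunks[-1][-1] + 1:
            # idx directly follows the last index of the current run
            chunks[-1].append(idx)
        else:
            # Gap (or first index): start a new run
            chunks.append([idx])
    return chunks


print(contiguous_chunks_sketch({0, 1, 2, 7, 11, 12}))
# → [[0, 1, 2], [7], [11, 12]]
```

Sorting first means the input can be an unordered set, which is how the index collections are built in the next cell.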

Which chunks of a tree root are deleted in every one of its children (the branch beginnings)? And which are never deleted, i.e. are kept or replaced in all of them?

In [5]:
data = []
# Wrap the iterable in ProgressBar(max_value=Tree.objects.experiment.count()) to show progress
for tree in Tree.objects.experiment:
    
    heads_del_ids = []
    heads_stb_ids = []
    
    for head in tree.root.children.kept:
        del_ids, _, rpl_pairs, stb_pairs = tree.root.consensus_relationships(head)
        #dis_ids = set(dis_ids).union(disrpl_ids)
        if len(del_ids) > 0:
            # Ignore when there were no disappearances;
            # otherwise this would crush our intersection sets to zero.
            heads_del_ids.append(set(del_ids))
        # Parent-side token ids that were kept, either unchanged or replaced
        heads_stb_ids.append({p[0] for p in itertools.chain(stb_pairs, rpl_pairs)})
    
    # Intersect all the heads of the tree
    # (fall back to an empty set so both variables always have the same type)
    root_del_ids = set.intersection(*heads_del_ids) if len(heads_del_ids) > 0 else set()
    root_stb_ids = set.intersection(*heads_stb_ids) if len(heads_stb_ids) > 0 else set()
    
    # Save the data to a DataFrame
    for del_idx in root_del_ids:
        data.append({
            'tree_id': tree.id,
            'tipe': chunk_type_int['del'],
            'token_idx': del_idx
        })
    for stb_idx in root_stb_ids:
        data.append({
            'tree_id': tree.id,
            'tipe': chunk_type_int['stb_parent'],
            'token_idx': stb_idx
        })
    
    # Print what we found to have a look
    print('Tree {}'.format(tree.id))
    print('-------')
    print(tree.root.text)
    print('Disappearances')
    print('^^^^^^^^^^^^^^')
    for cchunk in contiguous_chunks(root_del_ids):
        print([tree.root.tokens[i] for i in cchunk])
    print('Stabilities')
    print('^^^^^^^^^^^')
    for cchunk in contiguous_chunks(root_stb_ids):
        print([tree.root.tokens[i] for i in cchunk])
    print()

data = pd.DataFrame(data)

Tree 4
-------
At Dover, the finale of the bailiffs' convention. Their duties, said a speaker, are "delicate, dangerous, and insufficiently compensated."
Disappearances
^^^^^^^^^^^^^^
Stabilities
^^^^^^^^^^^
[At, Dover, the]
[Their]
[dangerous, and]

Tree 5
-------
Three bears driven down from the heights of the Pyrenees by snow have been decimating the sheep of the valley.
Disappearances
^^^^^^^^^^^^^^
Stabilities
^^^^^^^^^^^
[Three, bears, driven, down, from]
[the, Pyrenees]

Tree 6
-------
A dozen hawkers who had been announcing news of a nonexistent anarchist bombing at King's Cross have been arrested.
Disappearances
^^^^^^^^^^^^^^
Stabilities
^^^^^^^^^^^
[have, been, arrested]

Tree 7
-------
Due to their ardour during audits and polls, some congregants and a voter have been sentenced, in Derby and Nottingham.
Disappearances
^^^^^^^^^^^^^^
[a]
Stabilities
^^^^^^^^^^^
[Due, to, their]
[during]
[and, polls]
[have, been, sentenced]

Tree 8
-------
The charge of embezzlement against t

Tree 37
-------
The fever, of military origin, that is raging in Lincoln, Nebraska, is getting worse and spreading. Preventative measures have been taken.
Disappearances
^^^^^^^^^^^^^^
Stabilities
^^^^^^^^^^^
[The, fever]

Tree 38
-------
A dishwasher from Bristol, Vital Gray, who had just come back from Lourdes cured forever of tuberculosis, died Sunday by mistake.
Disappearances
^^^^^^^^^^^^^^
Stabilities
^^^^^^^^^^^
[A, dishwasher, from]
[Vital, Gray]
[cured]
[tuberculosis, died]

Tree 39
-------
M. Webb denied to the commission that the new tax plan was a scheme to make the budget's ends meet.
Disappearances
^^^^^^^^^^^^^^
Stabilities
^^^^^^^^^^^
[Webb, denied]
[the, commission, that, the]

Tree 40
-------
Despite a 20-year penitentiary sentence, in his absence, M. Patel, a Glasgow architect, lived quietly in Plymouth. He was arrested there.
Disappearances
^^^^^^^^^^^^^^
Stabilities
^^^^^^^^^^^
[Despite]
[penitentiary, sentence]

Tree 41
-------
On the bowling lawn a stroke levelle
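
The per-tree intersection used above can be illustrated on toy data. The sketch below uses invented index sets (they do not come from the database) and a hypothetical helper `common_ids`. It shows why heads without any deletion are skipped: a single empty set would reduce the whole intersection to nothing.

```python
# Toy data: token indices deleted / kept in each head (child of the root).
# These values are invented for illustration only.
heads_del = [{3, 4, 9}, set(), {3, 9}]
heads_stb = [{0, 1, 2, 5}, {0, 1, 5, 6}, {0, 1, 2, 5}]


def common_ids(id_sets, skip_empty=False):
    """Intersect a list of sets, optionally ignoring the empty ones."""
    if skip_empty:
        id_sets = [s for s in id_sets if s]
    return set.intersection(*id_sets) if id_sets else set()


root_del = common_ids(heads_del, skip_empty=True)
root_stb = common_ids(heads_stb)
print(sorted(root_del), sorted(root_stb))
# → [3, 9] [0, 1, 5]
```

Without `skip_empty=True`, the second head would make `root_del` empty even though the other two heads agree on tokens 3 and 9.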

To make sure this result does not come from too many deep alignments that disagree with each other, here is the number of deep alignments for each child of a tree root, for each tree. Almost all are 1, and a few are 2.

In [6]:
# Wrap the iterable in ProgressBar(max_value=Tree.objects.experiment.count()) to show progress
for tree in Tree.objects.experiment:

    da_counts = [len(tree.root.align_deep_lemmas(head)) for head in tree.root.children.kept]
    print('Tree {}, da counts: {}'.format(tree.id, da_counts))

Tree 4, da counts: [1, 1, 2, 1, 1, 1, 1]
Tree 5, da counts: [1, 1, 1, 1, 2, 1, 1]
Tree 6, da counts: [1, 1, 1, 1, 1, 1, 1]
Tree 7, da counts: [1, 1, 1, 1, 1, 1, 1]
Tree 8, da counts: [1, 1, 2, 2, 2, 2, 2]
Tree 9, da counts: [1, 2, 2, 1, 1, 1, 1]
Tree 10, da counts: [1, 1, 1, 1, 1, 1, 1]
Tree 11, da counts: [1, 1, 1, 1, 1, 1, 2]
Tree 12, da counts: [1, 1, 1, 1, 1, 1, 1]
Tree 13, da counts: [1, 1, 1, 1, 1, 1, 1]
Tree 14, da counts: [1, 1, 1, 1, 1, 1, 1]
Tree 15, da counts: [1, 1, 1, 1, 1, 1, 1]
Tree 16, da counts: [1, 1, 1, 1, 1, 1, 1]
Tree 17, da counts: [1, 1, 1, 1, 1, 1, 1]
Tree 18, da counts: [1, 1, 1, 1, 1, 1, 1]
Tree 19, da counts: [1, 1, 1, 1, 1, 1, 1]
Tree 20, da counts: [1, 1, 1, 1, 1, 1, 1]
Tree 21, da counts: [1, 1, 1, 1, 1, 1, 1]
Tree 22, da counts: [1, 1, 1, 1, 1, 1, 1]
Tree 23, da counts: [1, 1, 1, 1, 1, 1, 1]
Tree 24, da counts: [1, 1, 2, 1, 2, 1, 1]
Tree 25, da counts: [2, 1, 1, 1, 1, 1, 1]
Tree 26, da counts: [1, 1, 2, 1, 2, 2, 1]
Tree 27, da counts: [1, 1, 1, 1, 1, 1, 1
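
To put a number on "almost all 1", the counts can be tallied. The sketch below does this for three rows copied from the output above (trees 4, 6 and 8), so the resulting share only describes that small sample and not the whole experiment. The variable names are illustrative.

```python
from collections import Counter

# Three rows copied from the printed output above
da_counts_by_tree = {
    4: [1, 1, 2, 1, 1, 1, 1],
    6: [1, 1, 1, 1, 1, 1, 1],
    8: [1, 1, 2, 2, 2, 2, 2],
}

# Count how often each number of deep alignments occurs
tally = Counter(c for counts in da_counts_by_tree.values() for c in counts)
share_single = tally[1] / sum(tally.values())
print(dict(tally), round(share_single, 2))
# → {1: 15, 2: 6} 0.71
```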