## PSS Relevance for Information Retrieval

In this notebook we look at how a PSS model influences the ranking of a retrieval system: does a good document segmentation help search?

The first step is simple: we take the best-performing model (late ensembling) and compare its output with the gold standard. For each boundary in the gold standard, we measure how close the nearest boundary in the predicted segmentation is, and then make a histogram of these distances.
Note for Ruben: in essence this measures recall, which would be very high if you simply placed ones everywhere. A comparison of segment sizes or a penalty might be needed to get a better-informed picture, although the late ensembling model is already quite good, so for this specific model it should be fine.
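The distance computation described above could look roughly like the sketch below. It assumes each stream is a binary list in which a 1 marks the first page of a document; the helper name `nearest_boundary_distances` and the toy streams are hypothetical and not part of `metricutils.py`.

```python
import numpy as np

def nearest_boundary_distances(gold, pred):
    """For each gold boundary, return the distance (in pages) to the closest predicted boundary."""
    gold_idx = np.flatnonzero(np.asarray(gold))
    pred_idx = np.flatnonzero(np.asarray(pred))
    if len(pred_idx) == 0:
        # No predicted boundaries at all: every distance is undefined, use infinity
        return np.full(len(gold_idx), np.inf)
    # Pairwise absolute differences, then the minimum over predicted boundaries
    return np.abs(gold_idx[:, None] - pred_idx[None, :]).min(axis=1)

# Toy streams (assumed format: 1 = first page of a document)
gold = [1, 0, 0, 1, 0, 1, 0, 0]
pred = [1, 0, 1, 0, 0, 1, 0, 0]

distances = nearest_boundary_distances(gold, pred)
print(distances)  # [0 1 0]

# Placing ones everywhere trivially gives distance 0 for every gold boundary
all_ones = nearest_boundary_distances(gold, [1] * len(gold))
print(all_ones)  # [0 0 0]
```

Collecting these distances over all streams and passing them to `plt.hist` would give the histogram. The second call shows why the measure alone rewards over-segmentation, which is the concern raised in the note.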

In [1]:
import json
import numpy as np
import pandas as pd
from tqdm import tqdm
import matplotlib.pyplot as plt
from collections import Counter
from typing import Union
import seaborn as sns
from collections import defaultdict

# import metricutils file
%run ../../metricutils.py

In [2]:
# load in gold standard and predictions
with open('../../experiment_notebooks/experiment_results/GOLD_STANDARD/D1/gold_standard.json', 'r') as f:
    gold_standard = json.load(f)
with open('../../experiment_notebooks/experiment_results/LATE_FUSION_EFFICIENTNET/D1_D1/predictions.json', 'r') as f:
    predictions = json.load(f)
    
# Force the first page of every stream to be a boundary (start of a document)
for key, value in predictions.items():
    value[0] = 1

We start by calculating all the scores obtained with the late ensembling approach, to see what they mean in practice for our search engine performance.

In [3]:
evaluation_report(gold_standard, predictions)

| Metric | Precision | Recall | F1 | Support | CI Precision | CI Recall | CI F1 |
|---|---|---|---|---|---|---|---|
| Accuracy | 0.92 | 0.92 | 0.92 | 6347 | 0.91-0.93 | 0.91-0.93 | 0.91-0.93 |
| Boundary | 0.85 | 0.88 | 0.83 | 6347 | 0.84-0.86 | 0.87-0.89 | 0.82-0.84 |
| WindowDiff | 0.15 | 0.15 | 0.15 | 6347 | 0.14-0.16 | 0.14-0.16 | 0.14-0.16 |
| PQ | 0.79 | 0.77 | 0.79 | 6347 | 0.78-0.8 | 0.76-0.78 | 0.78-0.8 |
| SQ | 0.96 | 0.96 | 0.96 | 6347 | 0.96-0.96 | 0.96-0.96 | 0.96-0.96 |
| RQ | 0.82 | 0.82 | 0.8 | 6347 | 0.81-0.83 | 0.81-0.83 | 0.79-0.81 |


In [4]:
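def get_document_sets(length_list_input: list):
    """Convert a list of document lengths into a list of sets of page indices."""
    docs = []
    # All page indices of the stream, in order
    pages = list(np.arange(sum(length_list_input)))
    for block_length in length_list_input:
        # The next block_length pages form one document
        block = pages[:block_length]
        pages = pages[block_length:]
        docs.append(set(block))
    return docs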
    

Next we calculate some results about the document-level precision of the model. In particular: when the model produces false positives, how do they occur?

In [5]:
total_FP = 0
intersection_fp_size = []

for stream_id, prediction in predictions.items():
    ll_prediction = bin_to_length_list(prediction)
    ll_gold_standard = bin_to_length_list(gold_standard[stream_id])
    IOUS, TP, FP, FN = align(ll_gold_standard, ll_prediction)
    real_documents = get_document_sets(ll_gold_standard)
    for false_positive in FP:
        # Count how many gold standard documents this false positive overlaps
        number_of_intersections = 0
        for real_doc in real_documents:
            intersec = false_positive.intersection(real_doc)
            if len(intersec):
                number_of_intersections += 1
        intersection_fp_size.append(number_of_intersections)

    total_FP+=len(FP)

In [6]:
FP_stats = pd.Series(intersection_fp_size)

In [7]:
(FP_stats > 2).mean()

0.0912621359223301
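
To make the intersection counts above concrete, here is a small self-contained sketch on a toy stream. The helpers `length_list_to_sets` and `count_overlapping_docs` are hypothetical stand-ins, and a predicted document is treated as unmatched when it is not identical to any gold document, which is a simplification of whatever matching rule `align` applies.

```python
def length_list_to_sets(lengths):
    """Turn document lengths into sets of page indices, e.g. [2, 3] -> [{0, 1}, {2, 3, 4}]."""
    docs, start = [], 0
    for n in lengths:
        docs.append(set(range(start, start + n)))
        start += n
    return docs

def count_overlapping_docs(doc, other_docs):
    """Number of documents in other_docs that share at least one page with doc."""
    return sum(1 for other in other_docs if doc & other)

gold_docs = length_list_to_sets([2, 3, 1])  # {0, 1}, {2, 3, 4}, {5}
pred_docs = length_list_to_sets([1, 4, 1])  # {0}, {1, 2, 3, 4}, {5}

# Simplifying assumption: a prediction is unmatched if it equals no gold document
unmatched = [p for p in pred_docs if p not in gold_docs]
counts = [count_overlapping_docs(p, gold_docs) for p in unmatched]
print(counts)  # [1, 2]
```

Under this reading, a count of 1 means the prediction is a fragment of a single gold document, while a count of 2 or more means it spans at least one gold boundary.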

Next up, we do the same thing but now for the false negatives, which impact the recall.

In [8]:
total_FN = 0
intersection_fn_size = []

for stream_id, prediction in predictions.items():
    ll_prediction = bin_to_length_list(prediction)
    ll_gold_standard = bin_to_length_list(gold_standard[stream_id])
    IOUS, TP, FP, FN = align(ll_gold_standard, ll_prediction)
    
    pred_documents = get_document_sets(ll_prediction)
    for false_negative in FN:
        # Count how many predicted documents this false negative overlaps
        number_of_intersections = 0
        for pred_doc in pred_documents:
            intersec = false_negative.intersection(pred_doc)
            if len(intersec):
                number_of_intersections += 1
        intersection_fn_size.append(number_of_intersections)

    total_FN+=len(FN)

In [9]:
FN_stats = pd.Series(intersection_fn_size)

In [10]:
FN_stats.shape

(724,)

In [11]:
(FN_stats == 2).mean()

0.04143646408839779

Finally, we investigate the segmentation quality by going through the true positive pairs and measuring how many pages do not overlap, for different lengths of ground truth documents.
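
As a minimal illustration of the per-pair quantities one could compute for this, the sketch below takes a toy pair of page sets (values chosen for illustration) and computes the number of non-overlapping pages as the size of the symmetric difference, alongside the intersection over union.

```python
# Toy true positive pair: T is the gold document, H the matched prediction
T = frozenset({12, 13, 14, 15})
H = frozenset({12, 13, 14})

# Pages that belong to exactly one of the two documents
non_overlap_pages = len(T.symmetric_difference(H))

# Intersection over union (Jaccard index) of the two page sets
iou = len(T & H) / len(T | H)

print(non_overlap_pages)  # 1
print(iou)                # 0.75
```

The symmetric difference grows with document length for the same relative error, whereas the intersection over union is normalised, so looking at both per length bucket gives a more complete picture.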

In [18]:
ground_truth_length = []
non_overlap = []
overlap = []

for stream_id, prediction in predictions.items():
    ll_prediction = bin_to_length_list(prediction)
    ll_gold_standard = bin_to_length_list(gold_standard[stream_id])
    IOUS, TP, FP, FN = align(ll_gold_standard, ll_prediction)
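    for T, H in TP:
        # T is the gold standard document, H the predicted document matched to it
        ground_truth_length.append(len(T))
        # Pages that belong to exactly one of the two documents
        non_overlap.append(len(T.symmetric_difference(H)))
        # Intersection over union (Jaccard index) of the two page sets
        overlap.append(len(T & H)/len(T | H))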
        

frozenset({12, 13, 14, 15}) frozenset({12, 13, 14})
frozenset({7}) frozenset({7})
frozenset({1, 2, 3, 4}) frozenset({1, 2, 3, 4})
frozenset({16, 17, 18}) frozenset({16, 17})
...

In [19]:
non_overlap_df = pd.DataFrame({'true': ground_truth_length, 'non_overlap': non_overlap, 'overlap': overlap})

In [20]:
non_overlap_df[(3 <= non_overlap_df['true']) & ( non_overlap_df['true'] <= 10 )].non_overlap.mean()

0.18858560794044665

In [21]:
non_overlap_df[(3 <= non_overlap_df['true']) & ( non_overlap_df['true'] <= 10 )].overlap.mean()

0.9629897037392293

In [15]:
non_overlap_df[(10 <= non_overlap_df['true']) & ( non_overlap_df['true'] <= 50)].non_overlap.mean()

1.6181818181818182

In [23]:
non_overlap_df[(10 <= non_overlap_df['true']) & ( non_overlap_df['true'] <= 50)].overlap.mean()

0.9253565496361068

In [16]:
non_overlap_df[( non_overlap_df['true'] > 50)].non_overlap.mean()

4.634920634920635

In [24]:
non_overlap_df[( non_overlap_df['true'] > 50)].overlap.mean()

0.9629442824508234
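
The per-range means above could also be computed in one pass by bucketing the ground truth lengths. The sketch below uses `pd.cut` on a toy frame with made-up values; the bin edges and labels are assumptions. Note that the filters above include length 10 in both of the first two ranges, whereas this sketch assigns it to the first bucket only, so the numbers would differ slightly.

```python
import pandas as pd

# Toy data with the same columns as non_overlap_df (values are made up)
toy = pd.DataFrame({
    "true": [3, 5, 10, 12, 40, 60],
    "non_overlap": [0, 1, 0, 2, 4, 6],
    "overlap": [1.0, 0.8, 1.0, 0.85, 0.9, 0.9],
})

# Right-closed bins: (2, 10], (10, 50], (50, inf)
buckets = pd.cut(
    toy["true"],
    bins=[2, 10, 50, float("inf")],
    labels=["3-10", "11-50", ">50"],
)

# Mean number of non-overlapping pages and mean IoU per length bucket
summary = toy.groupby(buckets, observed=True)[["non_overlap", "overlap"]].mean()
print(summary)
```

Documents shorter than 3 pages fall outside the bins and are dropped from the summary, matching the ranges used in the cells above.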