# Bibliography Categorization: 'BibCat'
## Tutorial: Estimating performance of classifiers in bibcat.



---


## Introduction.

In this tutorial, we will use bibcat to estimate the performance of classifiers on sets of texts.


---

## User Workflow: Estimating the performance of classifiers.


The `Performance` class provides user-friendly methods for estimating the performance of given classifiers and for outputting that performance as, e.g., confusion matrices.  The code blocks below give an overview of how these methods can be run.

For this tutorial, we assume that the user has already run the trainML tutorial, and so has generated and saved a machine learning model.

In [1]:
#Import external packages
import re
import os
import sys
import json
import numpy as np


In [2]:
# Set up for fetching necessary bibcat modules for the tutorial
# Check work directories: src/ is where all source python scripts are available. 
current_dir = os.getcwd() # Notebooks do not define __file__, so use the working directory (expected to be docs/)
_parent = os.path.dirname(current_dir)
src_dir = os.path.join(_parent, "src")

print(f'Current Directory: {current_dir}')
print(f'Source directory: {src_dir}')

# move to the ../src/ directory to import necessary modules. 
os.chdir(src_dir)

Current Directory: /Users/jamila.pegues/Documents/STScI_Fellowship/Functional/Library/BibTracking/repo_stsci/bibcat/docs
Source directory: /Users/jamila.pegues/Documents/STScI_Fellowship/Functional/Library/BibTracking/repo_stsci/bibcat/src


In [3]:
#Import bibcat packages
import bibcat_classes as bibcat
import bibcat_config as config
import bibcat_parameters as params #Temporary file until contents moved elsewhere

Root directory =/Users/jamila.pegues/Documents/STScI_Fellowship/Functional/Library/BibTracking/repo_stsci/bibcat/src, parent directory=/Users/jamila.pegues/Documents/STScI_Fellowship/Functional/Library/BibTracking/repo_stsci/bibcat
/Users/jamila.pegues/Documents/STScI_Fellowship/Functional/Library/BibTracking/repo_stsci/bibcat/src/models folder already exists.
/Users/jamila.pegues/Documents/STScI_Fellowship/Functional/Library/BibTracking/repo_stsci/bibcat/output folder already exists.


In [4]:
#Set parameters for each operator and its internal classifier
#Global parameters
do_verify_truematch = True #A very important parameter - discuss with J.P. first!!!  Set it to either True or False.
do_raise_innererror = False #If True, will stop if exception encountered; if False, will print error and continue
do_reuse_run = True
#
do_include_trainML_unused_bibcodes_in_testset = False #If True, will include the bibcodes from the trainML tutorial that were skipped (e.g., because they did not involve target missions)
#
list_threshold_arrays = [np.arange(0.35, 0.95+0.05, 0.05)]*2 #For uncertainty test
class_mapper = params.map_papertypes #Mapper for class types; None for no mapper
fileroot_evaluation = "test_eval" #Root name of the file within which to store the performance evaluation output
threshold = 0.7 #0.9

#For operator 1
mapper_1 = class_mapper #Mapper to mask classifications; None if no masking
threshold_1 = threshold #Uncertainty threshold for this classifier
buffer_1 = 0

#For operator 2
mapper_2 = class_mapper #Mapper to mask classifications; None if no masking
threshold_2 = threshold #Uncertainty threshold for this classifier
buffer_2 = 0

#Gather parameters into lists
list_mappers = [mapper_1, mapper_2]
list_thresholds = [threshold_1, threshold_2]
list_buffers = [buffer_1, buffer_2]

In [5]:
#Set some overarching global variables
seed_test = 20 #Random seed for shuffling text dataset
np.random.seed(seed_test)
do_shuffle = True #Whether or not to shuffle the text dataset
do_real_testdata = True #If True, will use real papers to test performance; if False, will use fake texts below
#
max_tests = 500 #None #100 #Number of text entries to test the performance for; None for all tests available
mode_modif = "none" #"skim_anon" #"skim_trim_anon" #None #We are using preprocessed data in this tutorial, so we do not need a processing mode at all
target_classifs_basic = ["science", "mention", "datainfluenced"]
target_classifs_uncertainty = ["science", "mention", "datainfluenced", "other", "zlowprob"]
minmax_exclude_classifs = ([item.lower().replace("_","") for item in config.list_other_verdicts] + ["other"])

#
#Prepare some Keyword objects
all_kobjs = params.all_kobjs
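
The seed and `max_tests` settings above control which texts end up in the test subset. As a rough sketch of that idea, the hypothetical helper below shuffles a list reproducibly and keeps the first few items. It uses the standard library's `random.Random` in place of the tutorial's `np.random.seed` and `np.random.shuffle`, so the resulting order will differ from the notebook's, and the bibcode strings are placeholders.

```python
import random


def select_subset(items, seed, max_items=None, do_shuffle=True):
    """Return a reproducibly shuffled copy of items, cut to max_items.

    Hypothetical helper for illustration; not part of bibcat.
    """
    selected = list(items)  # Copy so that the input list is left untouched
    if do_shuffle:
        random.Random(seed).shuffle(selected)  # Local generator with a fixed seed
    if max_items is not None:
        selected = selected[:max_items]
    return selected


# Placeholder identifiers standing in for real bibcodes
bibcodes = ["bibcode_{0}".format(ii) for ii in range(10)]

first_run = select_subset(bibcodes, seed=20, max_items=3)
second_run = select_subset(bibcodes, seed=20, max_items=3)
print(first_run == second_run)
```

Because the seed is fixed, repeated runs select the same subset, which keeps performance numbers comparable between runs.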

In [6]:
#Fetch filepaths for model and data
name_model = config.name_model
filepath_json = config.path_json
dir_model = os.path.join(config.dir_allmodels, name_model)
#
#Set filepath for unused bibcodes from trainML, if so requested
if do_include_trainML_unused_bibcodes_in_testset:
    filesave_unused_bibcodes = os.path.join(dir_model,
                              "{0}_bibcodes_unused_during_trainML.npy".format(name_model)) #File holding the bibcodes left unused during trainML
#
#Set and create (as needed) directories for storing performance output
filepath_output = os.path.join(dir_model, "output") #Where to store performance output, such as confusion matrices
if (not os.path.exists(filepath_output)):
    os.makedirs(filepath_output)
    print("Output folder created at: {0}".format(filepath_output))
print("Output will be saved to: {0}".format(filepath_output))
#
#Set directories for fetching text
dir_info = dir_model
folder_test = config.folders_TVT["test"]
dir_test = os.path.join(dir_model, folder_test)

Output will be saved to: /Users/jamila.pegues/Documents/STScI_Fellowship/Functional/Library/BibTracking/repo_stsci/bibcat/src/models/test_run_rule/output


Let's build a set of classifiers for which we'd like to test the performance.  We'll then feed each of those classifiers into an instance of the Operator class to handle them.

In [7]:
#Create a list of classifiers
#This can be modified to use whatever classifiers you'd like.
#Load a previously trained ML model
filepath_model = os.path.join(dir_model, (name_model+".npy"))
fileloc_ML = os.path.join(dir_model, (config.tfoutput_prefix+name_model))
# !!! classifier_ML = bibcat.Classifier_ML(filepath_model=filepath_model, fileloc_ML=fileloc_ML, do_verbose=True)
#

#Load a rule-based classifier
classifier_rules = bibcat.Classifier_Rules()
#

In [8]:
#Load models into instances of the Operator class
# !!! operator_1 = bibcat.Operator(classifier=classifier_ML, mode=mode_modif, keyword_objs=all_kobjs,
#                            name="Operator_ML", do_verbose=True, load_check_truematch=True, do_verbose_deep=False)
operator_2 = bibcat.Operator(classifier=classifier_rules,
                            name="Operator_RB", mode=mode_modif, keyword_objs=all_kobjs,
                            do_verbose=True, do_verbose_deep=False)
list_operators = [operator_2] #[operator_1, operator_2] #Feel free to add more/less operators here.
#

Instance of Operator successfully initialized!
Keyword objects:
0: Keyword Object:
Name: Hubble
Keywords: ['Hubble Space Telescope', 'Hubble Telescope', 'Hubble']
Acronyms: ['HST', 'HT']
Banned Overlap: ['Hubble Legacy Archive']

1: Keyword Object:
Name: Webb Telescope
Keywords: ['James Webb Space Telescope', 'Webb Space Telescope', 'James Webb Telescope', 'Webb Telescope']
Acronyms: ['JWST', 'JST', 'JT']
Banned Overlap: []

2: Keyword Object:
Name: Transiting Exoplanet Survey Satellite
Keywords: ['Transiting Exoplanet Survey Satellite']
Acronyms: ['TESS']
Banned Overlap: []

3: Keyword Object:
Name: Kepler
Keywords: ['Kepler']
Acronyms: []
Banned Overlap: []

4: Keyword Object:
Name: Pan-STARRS
Keywords: ['Panoramic Survey Telescope and Rapid Response System', 'Pan-STARRS1', 'Pan-STARRS']
Acronyms: ['PanSTARRS1', 'PanSTARRS', 'PS1']
Banned Overlap: []

5: Keyword Object:
Name: Galaxy Evolution Explorer
Keywords: ['Galaxy Evolution Explorer']
Acronyms: ['GALEX']
Banned Overlap: []

6: 

Now, let's fetch some text for our classifiers to classify. For this tutorial, we'll load previously processed texts from the directory containing the test set for the ML classifier.

In [9]:
#For use of real papers from test dataset to test on
if (do_real_testdata and ((not do_reuse_run) or (not os.path.exists(os.path.join(filepath_output, (fileroot_evaluation+".npy")))))):
    #Load information for processed bibcodes reserved for testing
    dict_TVTinfo = np.load(os.path.join(dir_info, "dict_TVTinfo.npy"), allow_pickle=True).item()
    list_test_bibcodes = [key for key in dict_TVTinfo if (dict_TVTinfo[key]["folder_TVT"] == folder_test)]
    
    #Load the original data
    with open(filepath_json, 'r') as openfile:
        dataset = json.load(openfile)
    #
    
    #Extract text information for the bibcodes reserved for testing
    list_test_indanddata_raw = [(ii, dataset[ii]) for ii in range(0, len(dataset))
                                if (dataset[ii]["bibcode"] in list_test_bibcodes)] #Data for test set
    #
    #Add in unused bibcodes from trainML tutorial, if so requested
    if do_include_trainML_unused_bibcodes_in_testset:
        tmp_dict = np.load(filesave_unused_bibcodes, allow_pickle=True).item()
        list_test_indanddata_raw += [(tmp_dict[key], dataset[tmp_dict[key]]) for key in tmp_dict]
    #
    
    #Shuffle, if requested
    if do_shuffle:
        np.random.shuffle(list_test_indanddata_raw)
    #
    
    #Extract target number of test papers from the test bibcodes
    if (max_tests is not None): #Fetch subset of tests
        list_test_indanddata = list_test_indanddata_raw[0:max_tests]
    else: #Use all tests
        list_test_indanddata = list_test_indanddata_raw
    #
    
    #Process the text input into dictionary format for inputting into the codebase
    dict_texts = {} #To hold formatted text entries
    for ii in range(0, len(list_test_indanddata)):
        curr_ind = list_test_indanddata[ii][0]
        curr_data = list_test_indanddata[ii][1]
        #
        #Convert this data entry into dictionary with: key:text,id,bibcode,mission structure
        curr_info = {"text":curr_data["body"], "id":str(curr_ind), "bibcode":curr_data["bibcode"],
                    "missions":{}}
        
        #Initialize all mission entries as non-matches
        for curr_kobj in all_kobjs: #Iterate through declared Keyword objects
            curr_name = curr_kobj.get_name()
            curr_info["missions"][curr_name] = {"mission":curr_name, "class":config.verdict_rejection}                    
            
        #If using unused bibcodes and no class_missions, store as is
        if (do_include_trainML_unused_bibcodes_in_testset and ("class_missions" not in curr_data)):
            #Store this data entry and skip ahead
            dict_texts[str(curr_ind)] = curr_info
            continue
        
        #Iterate through missions
        for curr_mission in curr_data["class_missions"]: #Iterate through missions for this paper
            for curr_kobj in all_kobjs: #Iterate through declared Keyword objects
                curr_name = curr_kobj.get_name()
                #Store mission data under keyword name, if applicable
                if (curr_kobj.identify_keyword(curr_mission)["bool"]):
                    curr_info["missions"][curr_name] = {"mission":curr_name,
                                                    "class":curr_data["class_missions"][curr_mission]["papertype"]}
                #
                #Otherwise, store that this mission was not detected for this text
                #else:
                #    curr_info["missions"][curr_name] = {"mission":curr_name, "class":config.verdict_rejection}                    
            #
        #
        #Store this data entry
        dict_texts[str(curr_ind)] = curr_info
    #
    
    #Print some notes about the testing data
    print("Number of texts in text set: {0}".format(len(dict_texts)))
    """
    print("")
    for key in dict_texts:
        print("Entry {0}:".format(key))
        print("ID: {0}".format(dict_texts[key]["id"]))
        print("Bibcode: {0}".format(dict_texts[key]["bibcode"]))
        print("Missions: {0}".format(dict_texts[key]["missions"]))
        print("Start of text:\n{0}".format(dict_texts[key]["text"][0:500]))
        print("-\n")
    #"""
#
else:
    dict_texts = None

Number of texts in text set: 500
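
Each entry of `dict_texts` built in the cell above has the same nested shape: the text, an id, a bibcode, and one sub-dictionary per mission. The sketch below rebuilds that shape for a single made-up entry. The helper `make_text_entry`, the bibcode string, and the default rejection label `znotmatch` are assumptions for illustration; the tutorial itself takes the rejection label from `config.verdict_rejection`.

```python
def make_text_entry(index, body, bibcode, mission_names, known_classes=None,
                    rejection_verdict="znotmatch"):
    """Build one entry in the same layout as the tutorial's dict_texts.

    Hypothetical helper for illustration; rejection_verdict is an assumed
    stand-in for config.verdict_rejection.
    """
    known_classes = known_classes or {}
    # Every mission starts as a non-match unless a class is known for it
    missions = {name: {"mission": name,
                       "class": known_classes.get(name, rejection_verdict)}
                for name in mission_names}
    return {"text": body, "id": str(index), "bibcode": bibcode,
            "missions": missions}


entry = make_text_entry(index=12,
                        body="We present HST observations in Figure 4.",
                        bibcode="hypothetical_bibcode",
                        mission_names=["Hubble", "Kepler"],
                        known_classes={"Hubble": "science"})
dict_texts_example = {entry["id"]: entry}
print(dict_texts_example["12"]["missions"])
```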


In [10]:
#For use of fake, made-up data entries to test on
if (not do_real_testdata):
    print("Using fake test data for testing.")
    #Make some fake data
    dict_texts_raw = {"science":["We present HST observations in Figure 4.",
                        "The HST stars are listed in Table 3b.",
                        "Despite our efforts to smooth the data, there are still rings in the HST images.",
                        "See Section 8c for more discussion of the Hubble images.",
                        "The supernovae detected with HST tend to be brighter than initially predicted.",
                        "Our spectra from HST fit well to the standard trend first published in Someone et al. 1990.",
                        "We use the Hubble Space Telescope to build an ultraviolet database of the target stars.",
                        "The blue points (HST) exhibit more scatter than the red points (JWST).",
                        "The benefit, then, is the far higher S/N we achieved in our HST observations.",
                        "Here we employ the Hubble Telescope to observe the edge of the photon-dominated region.",
                        "The black line shows that the region targeted with Hubble has an extreme UV signature."],
                 "datainfluenced":["The simulated Hubble data is plotted in Figure 4.",
                       "Compared to the HST observations in Someone et al., our JWST follow-up reached higher S/N.",
                       "We were able to reproduce the luminosities from Hubble using our latest models.",
                       "We overplot Hubble-observed stars from Someone et al. in Figure 3b.",
                       "We built the spectral templates using UV data in the Hubble archive.",
                       "We simulate what our future HST observations will look like to predict the S/N.",
                       "Our work here with JWST is inspired by our earlier HST study published in 2010.",
                       "We therefore use the Hubble statistics from Author et al. to guide our stellar predictions.",
                       "The stars in Figure 3 were plotted based on the HST-fitted trend line in Person et al.",
                       "The final step is to use the HST exposure tool to put our modeled images in context."],
                 "mention":["Person et al. used HST to measure the Hubble constant.",
                        "We will present new HST observations in a future work.",
                        "HST is a fantastic instrument that has revolutionized our view of space.",
                        "The Hubble Space Telescope (HST) has its mission center at the STScI.",
                        "We can use HST to power a variety of science in the ultraviolet regime.",
                        "It is not clear when the star will be observable with HST.",
                        "More data can be found and downloaded from the Hubble archive.",
                        "We note that HST can be used to observe the stars as well, at higher S/N.",
                        "However, we ended up using the JWST rather than HST observations in this work.",
                        "We push the analysis of the Hubble component of the dataset to a future study.",
                        "We expect the HST observations to be released in the fall.",
                        "We look forward to any follow-up studies with, e.g., the Hubble Telescope."]}
    #
    #Convert into dictionary with: key:text,class,id,mission structure
    i_track = 0
    dict_texts = {}
    #Store subheadings by mission, to avoid duplicating and processing the same text across different missions
    #Use the first active operator, since operator_1 is commented out above
    mission = list_operators[0]._fetch_keyword_object(lookup="HST")._get_info("name")
    for key in dict_texts_raw:
        curr_set = dict_texts_raw[key]
        for ii in range(0, len(curr_set)):
            dict_texts[str(i_track)] = {"text":curr_set[ii], "id":"{0}_{1}".format(key, ii), "bibcode":str(i_track),
                                        "missions":{mission:{"mission":mission, "class":key}}}
            i_track += 1
    #
    print("Mission: {0}".format(mission))
    print("Number of texts in text set: {0}".format(len(dict_texts)))
    print("")
    for key in dict_texts:
        print(dict_texts[key])
        print("-")
    #
#

Next, let's assign the texts to each operator and its internal classifier.  The other per-operator settings (the mappers, uncertainty thresholds, and buffers) were already gathered into lists in cell [4].

In [11]:
#Store texts for each operator and its internal classifier
#For operator 1
dict_texts_1 = dict_texts #Dictionary of texts to classify

#For operator 2
dict_texts_2 = dict_texts #Dictionary of texts to classify

#Gather into list
list_dict_texts = [dict_texts_1, dict_texts_2]

Now, let's evaluate the performance of these classifiers in different ways.  We will consider these performance tests:
* Basic: We generate confusion matrices for the set of Operators (containing the different classifiers).
* Uncertainty: We plot performance as a function of uncertainty level for the set of Operators.
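
As a rough picture of what a confusion matrix records, the sketch below counts how often each actual class is paired with each measured class and prints the counts in an "Actual X vs Measured Y" style. The labels are made up and the helper `count_confusions` is hypothetical; it is not how the `Performance` class is implemented.

```python
from collections import Counter


def count_confusions(actual_labels, measured_labels):
    """Count (actual, measured) label pairs; hypothetical illustration only."""
    if len(actual_labels) != len(measured_labels):
        raise ValueError("Both label lists must have the same length.")
    return Counter(zip(actual_labels, measured_labels))


# Made-up labels for six texts
actual = ["science", "science", "mention", "mention", "mention", "datainfluenced"]
measured = ["science", "mention", "mention", "mention", "zlowprob", "science"]

counts = count_confusions(actual, measured)
for actual_class in sorted(set(actual)):
    total = sum(num for (act, _), num in counts.items() if act == actual_class)
    print("Actual {0} total: {1}".format(actual_class, total))
    for measured_class in sorted(set(measured)):
        print("Actual {0} vs Measured {1}: {2}".format(
            actual_class, measured_class, counts[(actual_class, measured_class)]))
```

The diagonal pairs (same actual and measured class) are the correct classifications; everything else is a misclassification or a rejected verdict.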

In [12]:
#Create an instance of the Performance class
performer = bibcat.Performance()

The Basic evaluation:

In [13]:
#Parameters for this evaluation
filename_root = "performance_confmatr_basic_{0}".format(name_model)
for ii in range(0, len(list_operators)):
    if (list_thresholds[ii] is not None):
        filename_root += "_unc{0}of{1:.2f}".format((ii+1), list_thresholds[ii]).replace(".","p")
fileroot_misclassif = "test_misclassif_basic" #Root name of the file within which to store misclassified text information
figsize = (20, 12)

#Run the pipeline for a basic evaluation of model performance
performer.evaluate_performance_basic(operators=list_operators, dicts_texts=list_dict_texts, mappers=list_mappers,
                                     thresholds=list_thresholds, buffers=list_buffers, is_text_processed=False,
                                     do_reuse_run=do_reuse_run, filename_root=filename_root,
                                     do_verify_truematch=do_verify_truematch, do_raise_innererror=do_raise_innererror,
                                     do_save_evaluation=True, do_save_misclassif=True, filepath_output=filepath_output,
                                     fileroot_evaluation=fileroot_evaluation, fileroot_misclassif=fileroot_misclassif,
                                     print_freq=1, do_verbose=True, do_verbose_deep=False, figsize=figsize,
                                     target_classifs=target_classifs_basic,
                                     minmax_exclude_classifs=minmax_exclude_classifs)


> Running evaluate_performance_basic()!
Generating classifications for the given operators...

> Running _generate_classifications()!
Iterating through Operators to classify each set of text...
Classifying with Operator #0...

> Running classify_set()!
Classification for text #1 of 500 complete...
Classification for text #2 of 500 complete...
Classification for text #3 of 500 complete...
Classification for text #4 of 500 complete...
Classification for text #5 of 500 complete...
Classification for text #6 of 500 complete...
Classification for text #7 of 500 complete...
Classification for text #8 of 500 complete...
Classification for text #9 of 500 complete...
Classification for text #10 of 500 complete...
Classification for text #11 of 500 complete...
Classification for text #12 of 500 complete...
Classification for text #13 of 500 complete...
Classification for text #14 of 500 complete...
-
The following err. was encountered in operate:
NotImplementedError('Err: Unrecognized ambig. ph

Classification for text #62 of 500 complete...
Classification for text #63 of 500 complete...
Classification for text #64 of 500 complete...
Classification for text #65 of 500 complete...
Classification for text #66 of 500 complete...
Classification for text #67 of 500 complete...
-
The following err. was encountered in operate:
NotImplementedError('Err: Unrecognized ambig. phrase:\nKepler detector\nTaken from this text snippet:\nThe ability of the Kepler detector to resolve multi-star systems is limited due to it having a relatively large pixel size (approximately 000′′ on sky. 000 000 Characteristics of the Kepler space telescope: websitewebsite.)')
Error was noted. Returning error as verdict.
-
Classification for text #68 of 500 complete...
Classification for text #69 of 500 complete...
Classification for text #70 of 500 complete...
Classification for text #71 of 500 complete...
Classification for text #72 of 500 complete...
Classification for text #73 of 500 complete...
-
The follo

Classification for text #134 of 500 complete...
Classification for text #135 of 500 complete...
Classification for text #136 of 500 complete...
Classification for text #137 of 500 complete...
Classification for text #138 of 500 complete...
Classification for text #139 of 500 complete...
Classification for text #140 of 500 complete...
Classification for text #141 of 500 complete...
Classification for text #142 of 500 complete...
Classification for text #143 of 500 complete...
Classification for text #144 of 500 complete...
Classification for text #145 of 500 complete...
Classification for text #146 of 500 complete...
Classification for text #147 of 500 complete...
Classification for text #148 of 500 complete...
Classification for text #149 of 500 complete...
Classification for text #150 of 500 complete...
Classification for text #151 of 500 complete...
Classification for text #152 of 500 complete...
Classification for text #153 of 500 complete...
Classification for text #154 of 500 comp

Classification for text #215 of 500 complete...
Classification for text #216 of 500 complete...
-
The following err. was encountered in operate:
NotImplementedError('Err: Unrecognized ambig. phrase:\nHubble flow\nTaken from this text snippet:\n000 λsrc The source is assumed to be at rest with respect to the Hubble flow—the coordinate frame moving away from the observer with the recession speed c da(t) v= a0 dt Zz dz ′, H(z ′) 000 000 which is ≈ H0 D for small z, where D is distance (in Mpc) in the expanding Universe; a(t)/a0 is the normalised scale factor; and H(z ′) = 000 + ΩM [000 + z ′)000 − 000])−000/000.')
Error was noted. Returning error as verdict.
-
Classification for text #217 of 500 complete...
Classification for text #218 of 500 complete...
Classification for text #219 of 500 complete...
Classification for text #220 of 500 complete...
Classification for text #221 of 500 complete...
Classification for text #222 of 500 complete...
-
The following err. was encountered in operat

Classification for text #285 of 500 complete...
Classification for text #286 of 500 complete...
Classification for text #287 of 500 complete...
Classification for text #288 of 500 complete...
Classification for text #289 of 500 complete...
Classification for text #290 of 500 complete...
Classification for text #291 of 500 complete...
Classification for text #292 of 500 complete...
-
The following err. was encountered in operate:
NotImplementedError('Err: Unrecognized ambig. phrase:\nK2 eclipse\nTaken from this text snippet:\nTo check for possible period changes, we fitted for a set of eclipse times over 000 yr segments of the ASAS and ASAS-SN light curves, using the K2 eclipse as a template.')
Error was noted. Returning error as verdict.
-
Classification for text #293 of 500 complete...
Classification for text #294 of 500 complete...
Classification for text #295 of 500 complete...
-
The following err. was encountered in operate:
NotImplementedError('Err: Unrecognized ambig. phrase:\nHu

Classification for text #356 of 500 complete...
Classification for text #357 of 500 complete...
Classification for text #358 of 500 complete...
Classification for text #359 of 500 complete...
Classification for text #360 of 500 complete...
Classification for text #361 of 500 complete...
Classification for text #362 of 500 complete...
Classification for text #363 of 500 complete...
Classification for text #364 of 500 complete...
Classification for text #365 of 500 complete...
Classification for text #366 of 500 complete...
Classification for text #367 of 500 complete...
Classification for text #368 of 500 complete...
Classification for text #369 of 500 complete...
Classification for text #370 of 500 complete...
Classification for text #371 of 500 complete...
Classification for text #372 of 500 complete...
Classification for text #373 of 500 complete...
Classification for text #374 of 500 complete...
Classification for text #375 of 500 complete...
Classification for text #376 of 500 comp

-
The following err. was encountered in operate:
NotImplementedError('Err: Unrecognized ambig. phrase:\nKepler USPs\nTaken from this text snippet:\nThe histograms are the metallicity distribution of all Kepler USPs (blue) and hot Jupiters (orange).')
Error was noted. Returning error as verdict.
-
Classification for text #452 of 500 complete...
Classification for text #453 of 500 complete...
Classification for text #454 of 500 complete...
-
The following err. was encountered in operate:
NotImplementedError('Err: Unrecognized ambig. phrase:\nplanet Kepler-7b\nTaken from this text snippet:\nInterestingly, for the planet Kepler-7b, similar to HAT-P-000 b in terms of surface gravity and equilibrium temperature, an albedo A g ~ 000.000 was derived in the Kepler band and interpreted as reflection on a silicate cloud deck.')
Error was noted. Returning error as verdict.
-
Classification for text #455 of 500 complete...
Classification for text #456 of 500 complete...
Classification for text #457


> Running _combine_performance_across_evaluations()!
Combining evaluations across these operators: ['Operator_RB']
All possible operator combinations: []
Run of _combine_performance_across_evaluations() complete!
Confusion matrices have been plotted at:
/Users/jamila.pegues/Documents/STScI_Fellowship/Functional/Library/BibTracking/repo_stsci/bibcat/src/models/test_run_rule/output

Run of evaluate_performance_basic() complete!
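
The `filename_root` used for the saved confusion matrices encodes each operator's threshold. The sketch below repeats the string formatting from the cell above inside a hypothetical helper, `build_filename_root`, to show the resulting name. The model name `test_run_rule` is inferred from the output path printed earlier and may differ in your setup.

```python
def build_filename_root(base, name_model, thresholds):
    """Append one '_unc<N>of<threshold>' suffix per operator threshold.

    Hypothetical helper that mirrors the formatting used in the tutorial cell.
    """
    root = "{0}_{1}".format(base, name_model)
    for ii, threshold in enumerate(thresholds):
        if threshold is not None:
            # Only the suffix has its decimal point replaced with 'p'
            root += "_unc{0}of{1:.2f}".format(ii + 1, threshold).replace(".", "p")
    return root


example_root = build_filename_root("performance_confmatr_basic",
                                   "test_run_rule", [0.7])
print(example_root)
```

Replacing the decimal point keeps the threshold readable in the name without adding an extra period before the file extension.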


The Uncertainty evaluation:

In [14]:
#Parameters for this evaluation
filename_root = "performance_grid_uncertainty_{0}".format(name_model)
figsize = (40, 12)
colors = ["tomato", "dodgerblue", "silver", "purple", "dimgray", "darkgoldenrod", "darkgreen", "green", "cyan"]
linestyles = ["-", "-", "--", "-", "--", "--", ":", ":", ":"]

#Run the pipeline for an evaluation of model performance as a function of uncertainty
performer.evaluate_performance_uncertainty(operators=list_operators, dicts_texts=list_dict_texts, mappers=list_mappers,
                                     threshold_arrays=list_threshold_arrays, buffers=list_buffers,
                                     is_text_processed=False, do_reuse_run=do_reuse_run,
                                     do_verify_truematch=do_verify_truematch, do_raise_innererror=do_raise_innererror,
                                     do_save_evaluation=True, filepath_output=filepath_output,
                                     fileroot_evaluation=fileroot_evaluation,
                                     filename_root=filename_root,
                                     print_freq=25, do_verbose=True, do_verbose_deep=False, figsize=figsize,
                                     target_classifs=target_classifs_uncertainty,
                                     colors=colors, linestyles=linestyles)


> Running evaluate_performance_uncertainty()!
Generating classifications for operators...

> Running _generate_classifications()!
Iterating through Operators to classify each set of text...
Previous evaluation exists at /Users/jamila.pegues/Documents/STScI_Fellowship/Functional/Library/BibTracking/repo_stsci/bibcat/src/models/test_run_rule/output/test_eval.npy
Loading that eval...

Classifications generated.
Evaluating classifications...
Threshold #1 of 13:

> Running _generate_performance_counter() for: Operator_RB
Accumulating performance over 500 texts.
Actual class names: ['datainfluenced', 'mention', 'science', 'other', 'znotmatch']
Measured class names: ['datainfluenced', 'mention', 'science', 'other', 'zerror', 'zlowprob', 'znotmatch']

-
Performance counter generated:
Actual datainfluenced total: 8
Actual datainfluenced vs Measured datainfluenced: 1
Actual datainfluenced vs Measured mention: 1
Actual datainfluenced vs Measured science: 3
Actual datainfluenced vs Measured other


-
Performance counter generated:
Actual datainfluenced total: 8
Actual datainfluenced vs Measured datainfluenced: 0
Actual datainfluenced vs Measured mention: 0
Actual datainfluenced vs Measured science: 1
Actual datainfluenced vs Measured other: 0
Actual datainfluenced vs Measured zerror: 3
Actual datainfluenced vs Measured zlowprob: 4
Actual datainfluenced vs Measured znotmatch: 0
Actual datainfluenced vs Measured _total: 8
Actual mention total: 457
Actual mention vs Measured datainfluenced: 0
Actual mention vs Measured mention: 68
Actual mention vs Measured science: 5
Actual mention vs Measured other: 0
Actual mention vs Measured zerror: 44
Actual mention vs Measured zlowprob: 331
Actual mention vs Measured znotmatch: 9
Actual mention vs Measured _total: 457
Actual science total: 237
Actual science vs Measured datainfluenced: 0
Actual science vs Measured mention: 0
Actual science vs Measured science: 8
Actual science vs Measured other: 0
Actual science vs Measured zerror: 27
Actual

And with that, you should have new confusion matrices summarizing the basic performance of these classifiers, along with the output of the uncertainty evaluation, saved in your requested directory (`filepath_output`)!
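
To make the uncertainty evaluation more concrete, the sketch below sweeps a grid of thresholds over a handful of made-up predictions and tallies, for each threshold, how many texts are classified correctly, incorrectly, or rejected as low-probability. The helper `sweep_thresholds` and the predictions are hypothetical, and this is only an assumed picture of the idea rather than the logic inside `evaluate_performance_uncertainty`. The grid matches the 13 thresholds from 0.35 to 0.95 used in this tutorial.

```python
def sweep_thresholds(predictions, thresholds, low_label="zlowprob"):
    """Tally verdicts per threshold; hypothetical illustration only.

    predictions: list of (actual_class, predicted_class, probability) tuples.
    """
    rows = []
    for threshold in thresholds:
        n_correct = n_wrong = n_low = 0
        for actual, predicted, probability in predictions:
            if probability < threshold:
                n_low += 1  # Rejected as too uncertain
            elif predicted == actual:
                n_correct += 1
            else:
                n_wrong += 1
        rows.append({"threshold": threshold, "correct": n_correct,
                     "wrong": n_wrong, low_label: n_low})
    return rows


# Made-up predictions: (actual class, predicted class, probability)
predictions = [("science", "science", 0.92), ("mention", "mention", 0.74),
               ("mention", "science", 0.57), ("datainfluenced", "mention", 0.42)]
thresholds = [round(0.35 + (0.05 * ii), 2) for ii in range(13)]  # 0.35 to 0.95

rows = sweep_thresholds(predictions, thresholds)
for row in rows:
    print(row)
```

Plotting these tallies against the threshold gives the kind of performance-versus-uncertainty curve that the uncertainty evaluation is meant to show.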

---

In [15]:
#Set end marker for this tutorial.
print("This tutorial completed successfully.")

This tutorial completed successfully.
