As the final part of poetic's main workflow, the post-processing of prediction results
consists of diagnostics, summary, and file output. The package's Diagnostics class provides
all of this functionality through a few simple methods, which are documented below.
Both the predict() and predict_file() methods of the Predictor class return an instance
of the Predictions class:
import poetic
pred = poetic.Predictor()
score = pred.predict("Is this poetic?")
In the example above, score is a Predictions object, which can then call methods to run
diagnostics and save results.
The Predictions class inherits from the Diagnostics class, so all methods are inherited;
the only difference is the constructor. The advantage of using a subclass instead of the
Diagnostics class directly is that the preprocessing of keras predictions can occur
separately. Thus, the Predictions class serves as an internal interface that distinguishes
predictor output from manually created instances of the Diagnostics class.
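The relationship described above can be sketched with two stand-in classes. This is only an illustration under stated assumptions: the real constructor signatures and the actual keras preprocessing are not shown in this document, so the raw_results parameter and the flattening step below are hypothetical.

```python
from typing import List, Optional


class Diagnostics:
    """Stand-in base class: stores scores that are already processed."""

    def __init__(self, predictions: List[float],
                 sentences: Optional[List[str]] = None) -> None:
        self.predictions = predictions
        self.sentences = sentences
        self.diagnostics = None

    def __len__(self) -> int:
        return len(self.predictions)


class Predictions(Diagnostics):
    """Stand-in subclass: only the constructor differs from the base class."""

    def __init__(self, raw_results: List[List[float]],
                 sentences: Optional[List[str]] = None) -> None:
        # Hypothetical preprocessing: flatten one-element rows of model output.
        flattened = [float(row[0]) for row in raw_results]
        super().__init__(predictions=flattened, sentences=sentences)


score = Predictions([[0.25], [0.75]], sentences=["Hi.", "I am poetic"])
print(isinstance(score, Diagnostics), len(score))
```

Because the subclass only changes how the data gets in, every method defined on the base class keeps working on the resulting object.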
To use the toolchain and methods separately, use the Diagnostics class instead. All
methods of Predictions are documented with the Diagnostics class unless they are
overridden.
As the base class for Predictions, the Diagnostics class provides a more generalized
framework for working with any prediction results. In the future, more abstractions may be
added to make it more versatile when used independently.
A typical workflow will involve making predictions, running diagnostics, and saving the results to a file:
import poetic
pred = poetic.Predictor()
score = pred.predict("Is this poetic?")
score.run_diagnostics()
print(score.generate_report())
score.to_file(path="<PATH>")
To use the Diagnostics class directly, only the predictions argument is required, given
as a list of floats:
import poetic
results = poetic.Diagnostics(predictions = [1, 0, 0.5])
# OR: with sentences
sentences = ["Hi.", "I am poetic", "How about you?"]
results_sentences = poetic.Diagnostics(predictions = [1, 0, 0.5], sentences=sentences)
The sentences argument is optional. If provided, the sentences corresponding to the
predictions are stored in the sentences attribute of the object; otherwise, the attribute
is None. All other methods still work, and only the contents of their outputs differ.
As of now, the Diagnostics class supports a five-number summary of the predictions. As
part of the workflow, it is computed automatically by the run_diagnostics() method, and the
results are stored in the diagnostics attribute of the object. As an example:
import poetic
results = poetic.Diagnostics(predictions = [1, 0, 0.5])
results.run_diagnostics()
# Get the diagnostic results
print(results.diagnostics)
The diagnostics attribute is a dictionary with three keys: "Sentence_count",
"Five_num", and "Predictions". The corresponding values are the following:
- "Sentence_count": An int giving the number of entries.
- "Five_num": The five-number summary, stored as a dictionary.
- "Predictions": A list of floats from the predictions attribute.
The five-number summary can also be obtained separately with the classmethod
five_number(), which is essentially a utility function that can be used for any array-like
object compatible with numpy:
import poetic
results = poetic.Diagnostics(predictions = [1, 0, 0.5])
poetic.Diagnostics.five_number(results.predictions)
# As a stand-alone method:
poetic.Diagnostics.five_number([1, 0, 0.5])
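To make the summary itself concrete, here is a minimal standard-library sketch of the five statistics that appear in the text report (minimum, mean, median, maximum, standard deviation). It is not the package's implementation: the function name and the dictionary keys are assumptions, and the population standard deviation is used because it is consistent with the value in the sample report for [2/3, 7/11].

```python
import statistics
from typing import Dict, Sequence


def five_number_sketch(values: Sequence[float]) -> Dict[str, float]:
    """Summarize values with the five statistics listed in the report."""
    return {
        "Minimum": min(values),
        "Mean": statistics.mean(values),
        "Median": statistics.median(values),
        "Maximum": max(values),
        # Population standard deviation (divides by n, not n - 1).
        "Standard Deviation": statistics.pstdev(values),
    }


summary = five_number_sketch([1, 0, 0.5])
print(summary)
```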
A diagnostic report is a plain-text (string) summary of the object with its diagnostic
statistics. To obtain the report, the run_diagnostics() method must first be called on the
object. Otherwise, a TypeError will be raised because the diagnostics attribute will be
None.
An example usage of the method:
import poetic
results = poetic.Diagnostics(predictions = [1, 0, 0.5])
results.run_diagnostics()
print(results.generate_report())
The contents of the report will be identical to the text file output as documented below.
The results and diagnostics can be saved to either a .txt or a .csv file. The former
writes the diagnostic report as plain text, while the latter saves the actual values
separated by commas. The usage is essentially identical to using the -o option on the
command line.
To save results to a text file:
import poetic
results = poetic.Diagnostics(predictions = [2/3, 7/11])
results.run_diagnostics()
results.to_file("<PATH>")
The output format is the following:
Poetic
Version: 1.0.2
For latest updates: www.github.com/kevin931/Poetic
Diagnostics Report
Model: Lexical Model
Number of Sentences: 2
~~~Five Number Summary~~~
Minimum: 0.6363636363636364
Mean: 0.6515151515151515
Median: 0.6515151515151515
Maximum: 0.6666666666666666
Standard Deviation: 0.015151515151515138
~~~All Scores~~~
Sentence #1: 0.6666666666666666
Sentence #2: 0.6363636363636364
The to_file() method does not enforce a file extension for text files. The exception is a
path ending in .csv, in which case it automatically calls the to_csv() method. The text
file output is meant as a quick summary rather than a way to store data, and its format
may change with updates. If an object needs to be restored or the data will be processed
further, use the csv format instead.
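The extension rule above amounts to a simple dispatch on the path suffix. The helper below is a hypothetical illustration of that rule only; it does not write any file, and whether the package matches the suffix case-sensitively is not stated in this document (the sketch assumes an exact ".csv" match).

```python
from pathlib import Path


def choose_writer(path: str) -> str:
    """Return which (hypothetical) writer a path would be routed to."""
    # Only a ".csv" suffix selects the CSV writer; anything else is plain text.
    if Path(path).suffix == ".csv":
        return "csv"
    return "text"


for name in ("results.csv", "results.txt", "results"):
    print(name, "->", choose_writer(name))
```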
When a file path ending in .csv is encountered or the to_csv() method of the Diagnostics
class is explicitly called, the results are written as three comma-separated columns, with
Sentence_num, Sentence, and Score as the header in the first row. Each sentence and its
prediction occupy one row, which follows the Tidy Data format for optimal compatibility.
The Sentence_num column can be treated as the index.
To save to a csv file, for example:
import poetic
results = poetic.Diagnostics(predictions = [2/3, 7/11])
results.run_diagnostics()
results.to_csv("<PATH>")
Or, let to_file() handle it automatically:
import poetic
results = poetic.Diagnostics(predictions = [2/3, 7/11])
results.run_diagnostics()
results.to_file("<PATH>.csv")
The raw csv file looks like the following:
Sentence_num,Sentence,Score
1,Hi.,0.6666666666666666
2,This is poetic.,0.6363636363636364
If formatted as a table (for example, when opened in Excel or a similar program), this will be the result:
Sentence_num | Sentence | Score |
---|---|---|
1 | Hi. | 0.6666666666666666 |
2 | This is poetic. | 0.6363636363636364 |
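Since the layout is one row per sentence under a single header row, it can be produced and read back with nothing more than the standard csv module. The sketch below writes the two example rows to an in-memory buffer and then indexes them by Sentence_num. It only mirrors the column layout described here and is not the package's to_csv() implementation.

```python
import csv
import io

rows = [
    {"Sentence_num": 1, "Sentence": "Hi.", "Score": 2 / 3},
    {"Sentence_num": 2, "Sentence": "This is poetic.", "Score": 7 / 11},
]

# Write one row per sentence under a single header row.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Sentence_num", "Sentence", "Score"])
writer.writeheader()
writer.writerows(rows)

# Read the text back and index each record by Sentence_num.
buffer.seek(0)
by_index = {int(rec["Sentence_num"]): rec for rec in csv.DictReader(buffer)}
print(by_index[2]["Sentence"], float(by_index[2]["Score"]))
```

Every value comes back as a string, so scores need an explicit float() conversion after reading.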
The str() method returns a short summary of the object. It truncates the output
to 14 characters after the description:
'Diagnostics object for the following predictions: [0.66666666666...'
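The truncation can be pictured as slicing the string form of the predictions list. The function below is a hypothetical reconstruction from the example output, not the package's code; in particular, appending the ellipsis only when something was cut off is an assumption.

```python
from typing import List


def short_summary(predictions: List[float], width: int = 14) -> str:
    """Build a one-line summary that keeps only the first `width` characters."""
    prefix = "Diagnostics object for the following predictions: "
    text = str(predictions)
    # Append an ellipsis only when something was actually cut off.
    shown = text if len(text) <= width else text[:width] + "..."
    return prefix + shown


print(short_summary([2 / 3, 7 / 11]))
```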
The repr() method returns a dictionary cast into a string with the predictions,
sentences, and diagnostics attributes. It does not truncate any results, which makes it
more appropriate for a full representation of the object. It has the following format:
"{'Predictions': [0.6666666666666666, 0.6363636363636364], 'Sentences': None, 'Diagnostics': None}"
The len() method returns the length of the predictions attribute of the object,
which is the number of entries in the predictions list. Since the lengths of predictions
and sentences are intended to match, the returned value logically represents the length
of the object.
The Diagnostics class currently supports the following four operators: >, >=, <, and <=.
They compare the mean values of the predictions attributes of the compared objects.
When the predictions are not normally distributed, for example when they are skewed, the mean values may not be meaningful. In these cases, manual comparisons are necessary.
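A small illustration of why skew matters: in the sketch below, two made-up score lists are compared first by mean, which is what the operators use, and then by median as one possible manual comparison. The numbers are invented for the example, and the median is only one of several reasonable alternatives.

```python
import statistics

# Two hypothetical score lists; the first is strongly skewed by one outlier.
skewed = [0.05, 0.05, 0.1, 0.1, 0.95]
even = [0.2, 0.22, 0.24, 0.26, 0.28]

# What a mean-based ">" would conclude.
mean_says_greater = statistics.mean(skewed) > statistics.mean(even)

# A manual comparison on medians reaches the opposite conclusion.
median_says_greater = statistics.median(skewed) > statistics.median(even)

print(mean_says_greater, median_says_greater)
```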
Given that the predictions attribute is a list of floats, the == and != operators
are currently not implemented.
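The document does not spell out the reasoning, but a likely concern is that exact equality on floats is fragile. When an equality-style check is needed, a tolerance-based comparison can be done by hand. The sketch below uses plain lists rather than Diagnostics objects, and the tolerance is an arbitrary choice.

```python
import math
import statistics

first = [0.1 + 0.2, 0.5]
second = [0.3, 0.5]

# Exact equality fails because 0.1 + 0.2 is not exactly 0.3 in binary floats.
exactly_equal = first == second

# A tolerance-based check on the means is one manual alternative.
close_enough = math.isclose(
    statistics.mean(first), statistics.mean(second), rel_tol=1e-9
)

print(exactly_equal, close_enough)
```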
Both the + and += operators are supported to concatenate two Diagnostics objects.
Concatenation is applied to each attribute where applicable: the Diagnostics.predictions
attributes are concatenated directly since they are mandatory for initialization, while the
Diagnostics.sentences and Diagnostics.diagnostics attributes behave differently depending
on whether they are None for each object. Mathematical addition is undefined and
nonsensical for either predictions or diagnostics. Therefore, both operators concatenate
objects, which is useful for making multiple predictions and subsequently analyzing them
together.
When Diagnostics.sentences is None for both objects, the resulting attribute will also be
None. When both objects have sentences, the sentences are simply concatenated as lists.
When one object has a list of sentences and the other object's sentences attribute is
None, the resulting object's Diagnostics.sentences will be a list containing either the
sentence itself or None for each entry of Diagnostics.predictions.
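The three cases can be written down as a small helper. This is a sketch of the rule as described in this section, not the package's code; the function name and its parameters (including passing the prediction counts explicitly) are hypothetical.

```python
from typing import List, Optional


def merge_sentences(left: Optional[List[str]], n_left: int,
                    right: Optional[List[str]], n_right: int
                    ) -> Optional[List[Optional[str]]]:
    """Combine two sentence lists following the three cases described above."""
    if left is None and right is None:
        return None
    # Pad a missing side with one None per prediction so lengths still match.
    left_part = left if left is not None else [None] * n_left
    right_part = right if right is not None else [None] * n_right
    return list(left_part) + list(right_part)


print(merge_sentences(["Hi."], 1, None, 2))
```

Padding with None keeps each sentence aligned with its prediction, so the length of the merged list still equals the number of predictions.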
The behavior for Diagnostics.diagnostics also depends on each object. When both are None,
the run_diagnostics() method will not be called on the resulting object. Otherwise, the
operator automatically calls run_diagnostics() to update the results. This strategy avoids
a situation in which the diagnostics and predictions are mismatched, which would lead to
unintentionally wrong diagnostics.
The + operator returns a new object, which means that existing objects are left unmodified.
The += operator modifies the left-hand-side object in place, as intended.
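The difference between the two operators follows the usual Python pattern of __add__ versus __iadd__. The class below is a hypothetical stand-in that carries only a predictions list; it is meant to show the copy-versus-in-place behavior and leaves out the sentences and diagnostics handling described earlier.

```python
from typing import List


class Scores:
    """Hypothetical container used only to contrast + with +=."""

    def __init__(self, predictions: List[float]) -> None:
        self.predictions = list(predictions)

    def __add__(self, other: "Scores") -> "Scores":
        # Build a fresh object; neither operand is modified.
        return Scores(self.predictions + other.predictions)

    def __iadd__(self, other: "Scores") -> "Scores":
        # Extend in place and return self so the name stays bound to it.
        self.predictions.extend(other.predictions)
        return self


a, b = Scores([0.1]), Scores([0.9])
c = a + b      # new object, a and b untouched
alias = a
a += b         # same object as before, now extended
print(c is a, alias is a, a.predictions)
```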