Update end-to-end evaluation
kermitt2 committed Sep 22, 2018
1 parent df5f92b commit 688ff81
Showing 1 changed file with 327 additions and 0 deletions.
327 changes: 327 additions & 0 deletions grobid-trainer/doc/PMC_sample_1943.results.grobid-0.5.2-22.09.2018
@@ -0,0 +1,327 @@
GROBID failed on 0 PDF
1943 PDF files processed in 3316.847 seconds, 1.7070751415337109 seconds per PDF file.
warning: no evaluation (gold) XML data file found under /home/lopez/biblio/PMC_sample_1943/__README__
warning: no evaluation (gold) XML data file found under /home/lopez/biblio/PMC_sample_1943/__README__
warning: no evaluation (gold) XML data file found under /home/lopez/biblio/PMC_sample_1943/__README__
Evaluation metrics produced in 688.203 seconds

======= Header metadata =======

Evaluation on 1943 random PDF files out of 1942 PDF (ratio 1.0).

======= Strict Matching ======= (exact matches)

===== Field-level results =====

label               accuracy  precision  recall  f1

abstract            81.69     13.99      12.87   13.41
authors             97.08     86.69      86.24   86.47
first_author        99.03     96.16      95.41   95.78
keywords            92.78     65.74      52.97   58.67
title               95.32     78.92      78.02   78.47

all fields          93.18     69.59      66.03   67.76   (micro average)
                    93.18     68.3       65.11   66.56   (macro average)
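
The f1 column and the macro-average row can be cross-checked by hand from the per-field figures. The sketch below assumes f1 is the harmonic mean of precision and recall and that the macro average is the unweighted mean over fields. The names `STRICT_HEADER`, `f1` and `macro_average` are illustrative and are not taken from GROBID. The micro average cannot be recomputed this way, because it needs the raw per-field counts, which the log does not print.

```python
# Per-field (precision, recall, f1) copied from the strict header table above.
STRICT_HEADER = {
    "abstract": (13.99, 12.87, 13.41),
    "authors": (86.69, 86.24, 86.47),
    "first_author": (96.16, 95.41, 95.78),
    "keywords": (65.74, 52.97, 58.67),
    "title": (78.92, 78.02, 78.47),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def macro_average(rows):
    """Unweighted mean of each column (precision, recall, f1) across fields."""
    n = len(rows)
    return tuple(sum(row[i] for row in rows.values()) / n for i in range(3))


macro_p, macro_r, macro_f1 = macro_average(STRICT_HEADER)
print(f"macro precision {macro_p:.2f}, recall {macro_r:.2f}, f1 {macro_f1:.2f}")
```

Because the inputs are already rounded to two decimals, a recomputed value can differ from the logged one in the last digit.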


======== Soft Matching ======== (ignoring punctuation, case and space characters mismatches)

===== Field-level results =====

label               accuracy  precision  recall  f1

abstract            88.23     47.92      44.11   45.94
authors             97.15     87.05      86.6    86.83
first_author        99.05     96.26      95.52   95.89
keywords            94.01     75.81      61.09   67.66
title               96.99     86.83      85.85   86.34

all fields          95.09     79.65      75.57   77.56   (micro average)
                    95.09     78.78      74.63   76.53   (macro average)
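
Soft matching is described in the log as ignoring punctuation, case and space characters. A minimal sketch of such a comparison follows, assuming ASCII punctuation only. `soft_normalize` and `soft_match` are hypothetical helpers, not GROBID's implementation, which may normalize differently (for example for Unicode punctuation).

```python
import string

# Characters dropped before comparison: ASCII punctuation and whitespace.
_DROP = set(string.punctuation) | set(string.whitespace)


def soft_normalize(text):
    """Lower-case the text and remove punctuation and whitespace."""
    return "".join(ch for ch in text.lower() if ch not in _DROP)


def soft_match(expected, extracted):
    """True when both strings are equal after soft normalization."""
    return soft_normalize(expected) == soft_normalize(extracted)


print(soft_match("Deep Learning: A Review.", "deep learning a review"))
```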


==== Levenshtein Matching ===== (Minimum Levenshtein distance at 0.8)

===== Field-level results =====

label               accuracy  precision  recall  f1

abstract            94.61     81.07      74.62   77.71
authors             98.59     93.84      93.35   93.6
first_author        99.09     96.47      95.72   96.1
keywords            95.58     88.67      71.45   79.13
title               97.77     90.53      89.5    90.01

all fields          97.13     90.43      85.79   88.05   (micro average)
                    97.13     90.11      84.93   87.31   (macro average)
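
The 0.8 threshold reads most naturally as a minimum normalized similarity rather than a raw edit distance. The sketch below assumes the similarity is 1 minus the edit distance divided by the length of the longer string. That normalization is an assumption, not something the log states, and the function names are illustrative.

```python
def levenshtein(a, b):
    """Edit distance (insert, delete, substitute) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a, b):
    """1.0 for identical strings, 0.0 when every character differs."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def levenshtein_match(expected, extracted, threshold=0.8):
    """True when the normalized similarity reaches the threshold."""
    return levenshtein_similarity(expected, extracted) >= threshold


print(levenshtein_match("Grobid evaluation", "Grobid evaluatiom"))
```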


= Ratcliff/Obershelp Matching = (Minimum Ratcliff/Obershelp similarity at 0.95)

===== Field-level results =====

label               accuracy  precision  recall  f1

abstract            93.61     75.84      69.81   72.7
authors             97.76     89.9       89.44   89.67
first_author        99.03     96.16      95.41   95.78
keywords            95.04     84.26      67.9    75.2
title               97.55     89.48      88.47   88.98

all fields          96.59     87.62      83.13   85.31   (micro average)
                    96.59     87.13      82.21   84.47   (macro average)
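
For a feel of what a 0.95 Ratcliff/Obershelp threshold accepts, Python's standard `difflib.SequenceMatcher` offers a closely related "gestalt pattern matching" ratio (twice the number of matched characters divided by the total length of both strings). This is only an approximation for exploration: GROBID's Java implementation may score some pairs differently, and the wrapper names below are hypothetical.

```python
from difflib import SequenceMatcher


def ratcliff_obershelp_similarity(a, b):
    """Similarity in [0, 1] based on recursively matched common substrings."""
    # autojunk=False turns off difflib's heuristic that ignores very
    # frequent characters in long sequences.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def ratcliff_obershelp_match(expected, extracted, threshold=0.95):
    """True when the similarity reaches the threshold."""
    return ratcliff_obershelp_similarity(expected, extracted) >= threshold


print(ratcliff_obershelp_similarity("abcd", "abxd"))
```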

===== Instance-level results =====

Total expected instances: 1943
Total correct instances: 168 (strict)
Total correct instances: 578 (soft)
Total correct instances: 1073 (Levenshtein)
Total correct instances: 959 (ObservedRatcliffObershelp)

Instance-level recall: 8.65 (strict)
Instance-level recall: 29.75 (soft)
Instance-level recall: 55.22 (Levenshtein)
Instance-level recall: 49.36 (RatcliffObershelp)

======= Citation metadata =======

Evaluation on 1943 random PDF files out of 1942 PDF (ratio 1.0).

======= Strict Matching ======= (exact matches)

===== Field-level results =====

label               accuracy  precision  recall  f1

authors             97.45     82.57      71.47   76.62
date                98.92     92.8       79.23   85.48
first_author        98.5      90.19      77.97   83.64
inTitle             96.01     72.13      68.17   70.09
issue               99.56     89.08      81.02   84.86
page                98.61     93.79      80.82   86.82
title               96.95     78.38      70.59   74.28
volume              99.2      94.84      84.94   89.62

all fields          98.15     86.31      76.3    81      (micro average)
                    98.15     86.72      76.78   81.43   (macro average)


======== Soft Matching ======== (ignoring punctuation, case and space characters mismatches)

===== Field-level results =====

label               accuracy  precision  recall  f1

authors             97.53     83.11      71.95   77.13
date                98.92     92.8       79.23   85.48
first_author        98.52     90.32      78.08   83.76
inTitle             97.55     82.9       78.35   80.56
issue               99.56     89.08      81.02   84.86
page                98.61     93.79      80.82   86.82
title               98.48     89.67      80.76   84.98
volume              99.2      94.84      84.94   89.62

all fields          98.55     89.52      79.14   84.01   (micro average)
                    98.55     89.56      79.39   84.15   (macro average)


==== Levenshtein Matching ===== (Minimum Levenshtein distance at 0.8)

===== Field-level results =====

label               accuracy  precision  recall  f1

authors             98.27     88.45      76.57   82.09
date                98.92     92.8       79.23   85.48
first_author        98.54     90.46      78.21   83.89
inTitle             97.69     83.83      79.24   81.47
issue               99.56     89.08      81.02   84.86
page                98.61     93.79      80.82   86.82
title               98.9      92.75      83.54   87.91
volume              99.2      94.84      84.94   89.62

all fields          98.71     90.86      80.31   85.26   (micro average)
                    98.71     90.75      80.44   85.27   (macro average)


= Ratcliff/Obershelp Matching = (Minimum Ratcliff/Obershelp similarity at 0.95)

===== Field-level results =====

label               accuracy  precision  recall  f1

authors             97.82     85.21      73.76   79.07
date                98.92     92.8       79.23   85.48
first_author        98.51     90.21      77.98   83.65
inTitle             97.35     81.49      77.02   79.19
issue               99.56     89.08      81.02   84.86
page                98.61     93.79      80.82   86.82
title               98.76     91.76      82.65   86.96
volume              99.2      94.84      84.94   89.62

all fields          98.59     89.89      79.46   84.35   (micro average)
                    98.59     89.9       79.68   84.46   (macro average)

===== Instance-level results =====

Total expected instances: 90125
Total extracted instances: 86949
Total correct instances: 36689 (strict)
Total correct instances: 47600 (soft)
Total correct instances: 51892 (Levenshtein)
Total correct instances: 48714 (RatcliffObershelp)

Instance-level precision: 42.2 (strict)
Instance-level precision: 54.74 (soft)
Instance-level precision: 59.68 (Levenshtein)
Instance-level precision: 56.03 (RatcliffObershelp)

Instance-level recall: 40.71 (strict)
Instance-level recall: 52.82 (soft)
Instance-level recall: 57.58 (Levenshtein)
Instance-level recall: 54.05 (RatcliffObershelp)

Instance-level f-score: 41.44 (strict)
Instance-level f-score: 53.76 (soft)
Instance-level f-score: 58.61 (Levenshtein)
Instance-level f-score: 55.02 (RatcliffObershelp)
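
The instance-level citation figures follow directly from the three counts printed above: precision is correct over extracted, recall is correct over expected, and the f-score is their harmonic mean. The sketch below recomputes them. `instance_scores` and the constant names are illustrative, not GROBID code. The header section reports recall only, and the same ratio reproduces it (168 correct out of 1943 expected gives 8.65).

```python
def instance_scores(correct, extracted, expected):
    """Return (precision, recall, f_score) as fractions in [0, 1]."""
    precision = correct / extracted if extracted else 0.0
    recall = correct / expected if expected else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Counts copied from the citation instance-level results above.
EXPECTED = 90125
EXTRACTED = 86949
CORRECT = {
    "strict": 36689,
    "soft": 47600,
    "Levenshtein": 51892,
    "RatcliffObershelp": 48714,
}

for name, correct in CORRECT.items():
    p, r, f = instance_scores(correct, EXTRACTED, EXPECTED)
    print(f"{name}: precision {100 * p:.2f}, "
          f"recall {100 * r:.2f}, f-score {100 * f:.2f}")
```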

Matching 1 : 63720

Matching 2 : 3896

Matching 3 : 2717

Matching 4 : 666

Total matches : 70999

======= Fulltext structures =======

Evaluation on 1943 random PDF files out of 1942 PDF (ratio 1.0).

======= Strict Matching ======= (exact matches)

===== Field-level results =====

label               accuracy  precision  recall  f1

figure_title        96.67     30.07      23.21   26.2
reference_citation  57        55.95      52.45   54.14
reference_figure    94.56     60.98      60.96   60.97
reference_table     99.09     83.06      82.28   82.67
section_title       94.4      74.48      66.52   70.28
table_title         97.42     7.34       7.64    7.48

all fields          89.86     58.19      54.44   56.25   (micro average)
                    89.86     51.98      48.84   50.29   (macro average)


======== Soft Matching ======== (ignoring punctuation, case and space characters mismatches)

===== Field-level results =====

label               accuracy  precision  recall  f1

figure_title        98.27     72.35      55.84   63.03
reference_citation  59.49     60.18      56.42   58.24
reference_figure    94.51     61.97      61.96   61.96
reference_table     99.08     83.57      82.79   83.18
section_title       95.02     78.78      70.36   74.33
table_title         97.55     14.6       15.2    14.89

all fields          90.65     63.13      59.06   61.03   (micro average)
                    90.65     61.91      57.09   59.27   (macro average)


************************************************************************************
COUNTER: org.grobid.core.engines.counters.ReferenceMarkerMatcherCounters
************************************************************************************
------------------------------------------------------------------------------------
UNMATCHED_REF_MARKERS: 11149
MATCHED_REF_MARKERS_AFTER_POST_FILTERING: 2176
STYLE_AUTHORS: 35414
STYLE_NUMBERED: 48321
MANY_CANDIDATES: 3600
MANY_CANDIDATES_AFTER_POST_FILTERING: 383
NO_CANDIDATES: 20471
INPUT_REF_STRINGS_CNT: 88311
MATCHED_REF_MARKERS: 106694
NO_CANDIDATES_AFTER_POST_FILTERING: 967
STYLE_OTHER: 4576
====================================================================================

************************************************************************************
COUNTER: org.grobid.core.engines.counters.TableRejectionCounters
************************************************************************************
------------------------------------------------------------------------------------
CANNOT_PARSE_LABEL_TO_INT: 238
CONTENT_SIZE_TOO_SMALL: 132
CONTENT_WIDTH_TOO_SMALL: 14
FEW_TOKENS_IN_CONTENT: 1
EMPTY_LABEL_OR_HEADER_OR_CONTENT: 2131
HEADER_NOT_STARTS_WITH_TABLE_WORD: 301
HEADER_NOT_CONSECUTIVE: 198
HEADER_AND_CONTENT_DIFFERENT_PAGES: 5
HEADER_AND_CONTENT_INTERSECT: 675
====================================================================================

************************************************************************************
COUNTER: org.grobid.core.engines.label.TaggingLabelImpl
************************************************************************************
------------------------------------------------------------------------------------
CITATION_TITLE: 82937
NAME-HEADER_MIDDLENAME: 4313
TABLE_FIGDESC: 321
NAME-HEADER_SURNAME: 11157
NAME-CITATION_OTHER: 412428
CITATION_BOOKTITLE: 3935
CITATION_NOTE: 11438
FULLTEXT_CITATION_MARKER: 176254
FULLTEXT_TABLE_MARKER: 14605
CITATION_WEB: 1379
TABLE_LABEL: 3673
FULLTEXT_SECTION: 51259
NAME-HEADER_FORENAME: 11344
TABLE_CONTENT: 5175
CITATION_COLLABORATION: 150
CITATION_ISSUE: 17257
CITATION_JOURNAL: 77253
NAME-CITATION_SURNAME: 314040
TABLE_FIGURE_HEAD: 7420
FULLTEXT_EQUATION_MARKER: 1721
CITATION_OTHER: 429858
FULLTEXT_FIGURE_MARKER: 38906
CITATION_TECH: 250
FIGURE_CONTENT: 2585
FIGURE_LABEL: 5364
FULLTEXT_EQUATION_LABEL: 1819
FULLTEXT_EQUATION: 3923
CITATION_DATE: 85348
FULLTEXT_FIGURE: 14735
CITATION_AUTHOR: 85131
FULLTEXT_TABLE: 11213
CITATION_EDITOR: 2484
FULLTEXT_OTHER: 355
NAME-HEADER_OTHER: 12784
FIGURE_FIGDESC: 6109
NAME-HEADER_SUFFIX: 11
CITATION_VOLUME: 75367
CITATION_LOCATION: 7028
NAME-CITATION_SUFFIX: 557
NAME-HEADER_TITLE: 496
CITATION_INSTITUTION: 914
CITATION_PAGES: 78627
NAME-HEADER_MARKER: 7428
NAME-CITATION_FORENAME: 306140
CITATION_PUBLISHER: 4576
NAME-CITATION_MIDDLENAME: 60118
CITATION_PUBNUM: 3016
FULLTEXT_PARAGRAPH: 370985
FIGURE_FIGURE_HEAD: 9220
====================================================================================
====================================================================================

