
Preparation release 0.8.1 #1123

Draft · wants to merge 9 commits into master from release-0.8.1
Conversation

lfoppiano
Collaborator

This PR contains the updates for release 0.8.1.

@coveralls

Coverage Status

coverage: 40.787%. remained the same
when pulling b6a2a20 on release-0.8.1
into 694f0ed on master.

@lfoppiano lfoppiano added this to the 0.8.1 milestone Jun 10, 2024
@coveralls

Coverage Status

coverage: 40.799% (+0.01%) from 40.787%
when pulling 4675511 on release-0.8.1
into 694f0ed on master.

@coveralls

Coverage Status

coverage: 40.787%. remained the same
when pulling f1d703c on release-0.8.1
into 694f0ed on master.

@coveralls

Coverage Status

coverage: 40.787%. remained the same
when pulling c408076 on release-0.8.1
into 694f0ed on master.

@lfoppiano
Collaborator Author

lfoppiano commented Jun 22, 2024

I've run the evaluation with a partial glutton (around 80-90M records).

Since I don't have a GPU machine I can log into, I:

  1. first ran the extraction using the client + an instance on GPU + the partial glutton,
  2. renamed the files .grobid.tei.xml to .fulltext.tei.xml, and
  3. ran the evaluation without regenerating the grobid extraction.
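Step 2 above can be sketched as a small shell loop. This is only an illustration of the renaming; the out/ directory is a hypothetical placeholder for wherever the client wrote its TEI results:

```shell
# Rename *.grobid.tei.xml (client output) to *.fulltext.tei.xml so the
# end-to-end evaluation reuses the files without re-running extraction.
# "out" is a placeholder path, not the actual directory used.
for f in out/*.grobid.tei.xml; do
  [ -e "$f" ] || continue   # skip if the glob matched nothing
  mv "$f" "${f%.grobid.tei.xml}.fulltext.tei.xml"
done
```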

Since I did not use the standard method, this should be taken with a pinch of salt.

TL;DR: Header metadata and citation context performance has decreased; the rest has increased.

======= Header metadata ======= 

Evaluation on 1943 random PDF files out of 1941 PDF (ratio 1.0).

======= Strict Matching ======= (exact matches)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

abstract             82.45        16.78        16.48        16.63        1911   
authors              95.68        79.94        79.65        79.79        1941   
first_author         98.93        95.29        94.95        95.12        1941   
keywords             94.22        64.99        63.62        64.3         1380   
title                95.65        80.39        79.52        79.95        1943   

all (micro avg.)     93.39        67.94        67.21        67.57        9116   
all (macro avg.)     93.39        67.48        66.84        67.16        9116   


======== Soft Matching ======== (ignoring punctuation, case and space characters mismatches)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

abstract             92.1         63.83        62.69        63.25        1911   
authors              95.97        81.28        80.99        81.14        1941   
first_author         99.01        95.66        95.31        95.48        1941   
keywords             95.5         73.65        72.1         72.87        1380   
title                97.43        88.87        87.91        88.38        1943   

all (micro avg.)     96           81.2         80.33        80.77        9116   
all (macro avg.)     96           80.66        79.8         80.22        9116   


==== Levenshtein Matching ===== (Minimum Levenshtein distance at 0.8)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

abstract             97.68        91.05        89.43        90.23        1911   
authors              97.18        87.02        86.71        86.86        1941   
first_author         99.1         96.12        95.78        95.95        1941   
keywords             97.05        84.16        82.39        83.27        1380   
title                98.55        94.17        93.15        93.66        1943   

all (micro avg.)     97.91        90.91        89.93        90.42        9116   
all (macro avg.)     97.91        90.51        89.49        89.99        9116   


= Ratcliff/Obershelp Matching = (Minimum Ratcliff/Obershelp similarity at 0.95)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

abstract             96.88        87.11        85.56        86.33        1911   
authors              96.36        83.14        82.84        82.99        1941   
first_author         98.93        95.29        94.95        95.12        1941   
keywords             96.35        79.42        77.75        78.58        1380   
title                98.15        92.3         91.3         91.8         1943   

all (micro avg.)     97.33        87.97        87.02        87.49        9116   
all (macro avg.)     97.33        87.45        86.48        86.96        9116   

===== Instance-level results =====

Total expected instances:       1943
Total correct instances:        195 (strict) 
Total correct instances:        786 (soft) 
Total correct instances:        1274 (Levenshtein) 
Total correct instances:        1121 (ObservedRatcliffObershelp) 

Instance-level recall:  10.04   (strict) 
Instance-level recall:  40.45   (soft) 
Instance-level recall:  65.57   (Levenshtein) 
Instance-level recall:  57.69   (RatcliffObershelp) 

======= Citation metadata ======= 

Evaluation on 1943 random PDF files out of 1941 PDF (ratio 1.0).

======= Strict Matching ======= (exact matches)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

authors              97.58        83.04        76.32        79.54        85778  
date                 99.23        94.61        84.26        89.13        87067  
first_author         98.53        89.78        82.5         85.99        85778  
inTitle              96.19        73.23        71.88        72.55        81007  
issue                99.68        91.11        87.76        89.41        16635  
page                 98.61        94.57        83.7         88.81        80501  
title                97.21        79.67        75.31        77.43        80736  
volume               99.44        96.02        89.83        92.82        80067  

all (micro avg.)     98.31        87.22        80.75        83.86        597569 
all (macro avg.)     98.31        87.76        81.44        84.46        597569 


======== Soft Matching ======== (ignoring punctuation, case and space characters mismatches)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

authors              97.65        83.51        76.76        79.99        85778  
date                 99.23        94.61        84.26        89.13        87067  
first_author         98.55        89.95        82.66        86.15        85778  
inTitle              97.85        84.92        83.35        84.13        81007  
issue                99.68        91.11        87.76        89.41        16635  
page                 98.61        94.57        83.7         88.81        80501  
title                98.82        91.44        86.43        88.87        80736  
volume               99.44        96.02        89.83        92.82        80067  

all (micro avg.)     98.73        90.62        83.89        87.13        597569 
all (macro avg.)     98.73        90.77        84.34        87.41        597569 


==== Levenshtein Matching ===== (Minimum Levenshtein distance at 0.8)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

authors              98.45        89.22        82           85.46        85778  
date                 99.23        94.61        84.26        89.13        87067  
first_author         98.58        90.16        82.85        86.35        85778  
inTitle              98.03        86.18        84.59        85.37        81007  
issue                99.68        91.11        87.76        89.41        16635  
page                 98.61        94.57        83.7         88.81        80501  
title                99.14        93.81        88.66        91.16        80736  
volume               99.44        96.02        89.83        92.82        80067  

all (micro avg.)     98.9         91.97        85.14        88.42        597569 
all (macro avg.)     98.9         91.96        85.46        88.56        597569 


= Ratcliff/Obershelp Matching = (Minimum Ratcliff/Obershelp similarity at 0.95)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

authors              98           85.98        79.03        82.36        85778  
date                 99.23        94.61        84.26        89.13        87067  
first_author         98.53        89.8         82.52        86           85778  
inTitle              97.65        83.5         81.95        82.72        81007  
issue                99.68        91.11        87.76        89.41        16635  
page                 98.61        94.57        83.7         88.81        80501  
title                99.08        93.4         88.28        90.77        80736  
volume               99.44        96.02        89.83        92.82        80067  

all (micro avg.)     98.78        91.02        84.26        87.51        597569 
all (macro avg.)     98.78        91.13        84.67        87.75        597569 

===== Instance-level results =====

Total expected instances:               90125
Total extracted instances:              85898
Total correct instances:                38759 (strict) 
Total correct instances:                50899 (soft) 
Total correct instances:                55786 (Levenshtein) 
Total correct instances:                52324 (RatcliffObershelp) 

Instance-level precision:       45.12 (strict) 
Instance-level precision:       59.26 (soft) 
Instance-level precision:       64.94 (Levenshtein) 
Instance-level precision:       60.91 (RatcliffObershelp) 

Instance-level recall:  43.01   (strict) 
Instance-level recall:  56.48   (soft) 
Instance-level recall:  61.9    (Levenshtein) 
Instance-level recall:  58.06   (RatcliffObershelp) 

Instance-level f-score: 44.04 (strict) 
Instance-level f-score: 57.83 (soft) 
Instance-level f-score: 63.38 (Levenshtein) 
Instance-level f-score: 59.45 (RatcliffObershelp) 

Matching 1 :    68335

Matching 2 :    4155

Matching 3 :    1859

Matching 4 :    662

Total matches : 75011

======= Citation context resolution ======= 

Total expected references:       90125 - 46.38 references per article
Total predicted references:      85898 - 44.21 references per article

Total expected citation contexts:        139835 - 71.97 citation contexts per article
Total predicted citation contexts:       115386 - 59.39 citation contexts per article

Total correct predicted citation contexts:       97290 - 50.07 citation contexts per article
Total wrong predicted citation contexts:         18096 (wrong callout matching, callout missing in NLM, or matching with a bib. ref. not aligned with a bib.ref. in NLM)

Precision citation contexts:     84.32
Recall citation contexts:        69.57
fscore citation contexts:        76.24

======= Fulltext structures ======= 

Evaluation on 1943 random PDF files out of 1941 PDF (ratio 1.0).

======= Strict Matching ======= (exact matches)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

figure_title         96.63        31.47        24.64        27.64        7281   
reference_citation   59.15        57.42        58.68        58.05        134196 
reference_figure     94.74        61.21        65.9         63.47        19330  
reference_table      99.22        83.01        88.39        85.62        7327   
section_title        94.73        76.39        67.76        71.82        27619  
table_title          98.76        57.29        50.29        53.56        3971   

all (micro avg.)     90.54        60.41        60.32        60.36        199724 
all (macro avg.)     90.54        61.13        59.28        60.02        199724 


======== Soft Matching ======== (ignoring punctuation, case and space characters mismatches)

===== Field-level results =====

label                accuracy     precision    recall       f1           support

figure_title         98.52        78.72        61.63        69.13        7281   
reference_citation   61.86        61.68        63.03        62.34        134196 
reference_figure     94.6         61.69        66.41        63.97        19330  
reference_table      99.2         83.19        88.58        85.8         7327   
section_title        95.43        81.25        72.07        76.38        27619  
table_title          99.35        81.87        71.87        76.55        3971   

all (micro avg.)     91.49        65.76        65.67        65.72        199724 
all (macro avg.)     91.49        74.73        70.6         72.36        199724 


====================================================================================

@lfoppiano
Collaborator Author

I'm attaching all the results as files for completeness.

@kermitt2
Owner

kermitt2 commented Jul 2, 2024

Hi Luca! I think there is a major issue with the JVM version indicated by the Kotlin jvmToolchain:

    kotlin {
        jvmToolchain(17)
    }

The classes and jar become incompatible with JVMs lower than 17, so it's no longer possible to run Grobid with a JVM 11:

Error: LinkageError occurred while loading main class org.grobid.trainer.NameAddressTrainer
        java.lang.UnsupportedClassVersionError: org/grobid/trainer/NameAddressTrainer has been compiled by a more recent version of the Java Runtime (class file version 61.0), this version of the Java Runtime only recognizes class file versions up to 55.0
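The class file versions in that error map directly to JDK releases (55 = JDK 11, 61 = JDK 17). As an illustration only (the `ClassVersion` class below is hypothetical, not part of Grobid), the major version can be read straight from the first bytes of a compiled .class file:

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class ClassVersion {
    // Every .class file starts with the magic number 0xCAFEBABE,
    // followed by a 2-byte minor and a 2-byte major version.
    // Major 55 = JDK 11, 61 = JDK 17 - exactly the two versions
    // the UnsupportedClassVersionError above reports.
    public static int majorVersion(String path) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IOException("not a class file: " + path);
            }
            in.readUnsignedShort();        // minor version (skipped)
            return in.readUnsignedShort(); // major version
        }
    }
}
```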

In addition, it has blocking consequences for other modules and libraries using Grobid that can't run with a JVM 17.

The solution seems to be simply to target Java 11 everywhere:

    kotlin {
        jvmToolchain(11)
    }

although setting source compatibility to Java 11 does not work:

    sourceCompatibility = 1.11
    targetCompatibility = 1.11

gives

lopez@smallbook:~/grobid$ ./gradlew clean install

FAILURE: Build failed with an exception.

* Where:
Build file '/home/lopez/grobid/build.gradle' line: 268

* What went wrong:
Could not determine the dependencies of task ':grobid-core:shadowJar'.
> The new Java toolchain feature cannot be used at the project level in combination with source and/or target compatibility
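As the Gradle error says, a project-level toolchain cannot be combined with explicit sourceCompatibility/targetCompatibility settings. A sketch of a toolchain-only configuration, assuming the compatibility lines are dropped from build.gradle, might look like:

```groovy
// build.gradle (Groovy DSL) - sketch only: let the toolchain alone drive
// the Java version, and remove the sourceCompatibility/targetCompatibility
// lines that Gradle refuses to combine with a project-level toolchain.
java {
    toolchain {
        languageVersion = JavaLanguageVersion.of(11)
    }
}

kotlin {
    jvmToolchain(11)
}
```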

@kermitt2
Owner

kermitt2 commented Jul 2, 2024

It seems the Java 11 compatibility is broken by the recent changes in FundingAcknowledgementParser:

./gradlew clean install

> Task :grobid-core:compileJava
/home/lopez/grobid/grobid-core/src/main/java/org/grobid/core/engines/FundingAcknowledgementParser.java:193: error: cannot find symbol
                List<OffsetPosition> annotationsPositionTokens = annotations.stream().map(AnnotatedXMLElement::getOffsetPosition).toList();
                                                                                                                                 ^
  symbol:   method toList()
  location: interface Stream<OffsetPosition>
/home/lopez/grobid/grobid-core/src/main/java/org/grobid/core/engines/FundingAcknowledgementParser.java:253: error: cannot find symbol
            .map(AnnotatedXMLElement::getOffsetPosition).toList());
                                                        ^
  symbol:   method toList()
  location: interface Stream<OffsetPosition>
/home/lopez/grobid/grobid-core/src/main/java/org/grobid/core/engines/FundingAcknowledgementParser.java:259: error: cannot find symbol
                .toList();
                ^
  symbol:   method toList()
  location: interface Stream<OffsetPosition>
/home/lopez/grobid/grobid-core/src/main/java/org/grobid/core/engines/FundingAcknowledgementParser.java:266: error: cannot find symbol
                    .toList();
                    ^
  symbol:   method toList()
  location: interface Stream<Integer>
/home/lopez/grobid/grobid-core/src/main/java/org/grobid/core/engines/FundingAcknowledgementParser.java:294: error: cannot find symbol
                            .toList());
                            ^
  symbol:   method toList()
  location: interface Stream<BoundingBox>
/home/lopez/grobid/grobid-core/src/main/java/org/grobid/core/engines/FundingAcknowledgementParser.java:304: error: cannot find symbol
                        String coordsAsString = String.join(";", postMergeBoxes.stream().map(BoundingBox::toString).toList());
                                                                                                                   ^
  symbol:   method toList()
  location: interface Stream<String>
/home/lopez/grobid/grobid-core/src/main/java/org/grobid/core/engines/FundingAcknowledgementParser.java:372: error: cannot find symbol
                    .toList();
                    ^
  symbol:   method toList()
  location: interface Stream<AnnotatedXMLElement>
/home/lopez/grobid/grobid-core/src/main/java/org/grobid/core/engines/FundingAcknowledgementParser.java:410: error: cannot find symbol
                        .toList();
                        ^
  symbol:   method toList()
  location: interface Stream<AnnotatedXMLElement>

@lfoppiano
Collaborator Author

lfoppiano commented Jul 2, 2024

Hi @kermitt2,
I was going to do it afterwards, with the idea of upgrading to 17 whatever needs to be upgraded.

I checked grobid-quantities, software-mentions, and datastet, and they seem to be compatible with JDK 17. I would say that the old modules may stay with an older version.
In any case, I can help you with updating and testing them. Let me know what I can do.

For the second problem, if you want to keep JDK 11 compatibility, you can replace toList() with collect(Collectors.toList()).
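A minimal sketch of that replacement (the class and data below are illustrative, not from the Grobid codebase): Stream.toList() was only added in JDK 16, while collect(Collectors.toList()) works on JDK 8 and later, which is why the latter fixes the compileJava errors above.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ToListCompat {
    // Stream.toList() only exists since JDK 16; on JDK 11 the
    // equivalent spelling is collect(Collectors.toList()).
    // Note the two are not strictly identical: Stream.toList()
    // returns an unmodifiable list, Collectors.toList() does not
    // guarantee immutability.
    static List<String> upperCased(List<String> words) {
        return words.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList()); // JDK 11-compatible
    }

    public static void main(String[] args) {
        System.out.println(upperCased(List.of("grobid", "tei"))); // prints [GROBID, TEI]
    }
}
```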

@kermitt2
Owner

kermitt2 commented Jul 3, 2024

I think it's good to move to JDK 17 in general, but we need to update the other modules first, otherwise this is blocking for users. This is also a general issue for everything that depends on Grobid and for existing production environments where Grobid runs. For example, I am currently stuck: I failed to upgrade entity-fishing from JDK 8 to JDK 11, and this is very annoying for the users.

I think it's better to ensure JDK 11 compatibility for this release - moving to 17 would be a breaking change for version 0.9.0, especially given that the move to 17 is more for our comfort than for any actual advantage.

@lfoppiano
Collaborator Author

> I think it's good to move to JDK 17 in general, but we need to update the other modules first, otherwise this is blocking for users. This is also a general issue for everything that depends on Grobid and for existing production environment where Grobid runs. For example I am currently stuck and failed to upgrade entity-fishing from JDK 8 to JDK 11 and this is very annoying for the users.

OK, no problem. I might be too optimistic in thinking that people would have migrated to Docker by now.

Let me help you with entity-fishing. Could you commit and push everything you've done so far to a branch of the project? I will have a look ASAP 😉
If there are other modules that need to be updated, please do let me know.

> I think it's better to ensure JDK 11 compatibility for this release - 17 would be a breaking change for version 0.9.0, especially given that the move to 17 is more for our comfort than providing really actual advantages?

Sure. 👍

@lfoppiano
Collaborator Author

@kermitt2 I tested the latest commit 56d351c and it works with JDK 11 on my Apple M2.

@kermitt2
Owner

kermitt2 commented Jul 4, 2024

Thank you very much @lfoppiano, it now works for me too with JDK 11 on Linux (like you, I usually run JDK 17, which is why I only saw the issue recently).

About entity-fishing, master has the latest commits if I am not wrong, and running with Grobid 0.8.0 and JDK 11 fails because the current version uses an incubator module that disappeared after JDK 1.8. I have not analyzed further which dependency uses this module or whether there is a possible replacement in JDK 11.

@coveralls

Coverage Status

coverage: 40.769% (-0.04%) from 40.804%
when pulling 99f653a on release-0.8.1
into e04048d on master.
