Merge pull request #556 from koppor/add-changelog.md

Create CHANGELOG.md

Committed by kermitt2 on Mar 11, 2020 (commit 6385eb3, parents adeca65 + 4836505). The commit changes 2 files (217 additions, 123 deletions); the new file CHANGELOG.md accounts for 195 additions and 0 deletions:

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [Unreleased]

### Added

+ Support for `application/x-bibtex` at `/api/processReferences` and `/api/processCitation`
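As a rough illustration of the new BibTeX support, the Python sketch below builds (without sending) a request to `/api/processCitation` that asks for BibTeX output. It assumes that the output format is selected through the HTTP `Accept` header, that the raw citation string is sent in a form field named `citations`, and that the service listens on `http://localhost:8070`; the helper name `bibtex_citation_request` is hypothetical. Check the REST API documentation before relying on these details.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a local GROBID service on its default port.
GROBID_URL = "http://localhost:8070"


def bibtex_citation_request(citation: str, base_url: str = GROBID_URL) -> Request:
    """Build (but do not send) a POST asking /api/processCitation for BibTeX."""
    # Assumption: the raw citation string goes in the `citations` form field.
    data = urlencode({"citations": citation}).encode("utf-8")
    return Request(
        f"{base_url}/api/processCitation",
        data=data,
        headers={
            # Assumption: the output format is negotiated via the Accept header.
            "Accept": "application/x-bibtex",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

With a service running, `urllib.request.urlopen(req).read().decode("utf-8")` would return the response body; keeping request construction separate from sending makes the helper easy to inspect.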

### Changed

+ Documentation improvements

### Fixed

+ Fixed pdf2xml flags in the `Dockerfile`

## [0.5.6] – 2019-10-16

### Changed

+ Better abstract structuring (with citation contexts)
+ n-fold cross-evaluation and better evaluation reports (thanks to @lfoppiano)
+ Improved PMC ID and PMID recognition
+ Improved subscript/superscript and font style recognition (via [pdfalto](https://github.com/kermitt2/pdfalto))
+ Improved JEP integration (support for Python virtual environments when using the DeLFT deep learning library; thanks @de-code and @lfoppiano)
+ Improved dehyphenization (thanks to @lfoppiano)

### Fixed

+ Several bug fixes (thanks @de-code, @bnewbold, @Vitaliy-1 and @lfoppiano)

## [0.5.5] – 2019-05-29

### Added

+ Improved and fully reviewed integration of consolidation services, supporting [biblio-glutton](https://github.com/kermitt2/biblio-glutton) (additional identifiers and Open Access links) and the [Crossref REST API](https://github.com/CrossRef/rest-api-doc) (specific user agent, email and token for Crossref Metadata Plus)

### Changed

+ Using [pdfalto](https://github.com/kermitt2/pdfalto) instead of pdf2xml for the first PDF parsing stage, with many improvements in robustness, ICU support and unknown glyph/font normalization (thanks in particular to @aazhar)
+ Updated lexicon #396

### Fixed

+ Fixed bounding box issues for some PDFs #330

## [0.5.4] – 2019-02-12

### Added

+ Support for [biblio-glutton](https://github.com/kermitt2/biblio-glutton) as a DOI/metadata matching service, as an alternative to the Crossref REST API

### Changed

+ Transparent use of [DeLFT](https://github.com/kermitt2/delft) deep learning models (BidLSTM-CRF/ELMo) instead of Wapiti CRF models, with native integration via [JEP](https://github.com/ninia/jep)
+ Improved citation context identification and matching (+9% recall with similar precision on the PMC sample of 1943 articles, from 43.35 to 49.98 correct citation contexts per article)
+ Citation callouts are now identified in abstracts and in figure and table captions
+ Structured abstracts (including an update of the TEI schema)

### Fixed

+ Bug fixes and new parameters: training now uses all available threads by default (thanks [@de-code](https://github.com/de-code)), and models can be loaded when the service starts

## [0.5.3] – 2018-11-25

### Added

+ Proxy support for calling Crossref with Apache HttpClient

### Changed

+ Improved consolidation options and processing (better handling of the CrossRef API, but the best is coming soon ;)
+ Better recall for figure and table identification (thanks to @detonator413)

### Fixed

+ Minor bug fixes

## [0.5.2] – 2018-10-17

### Added

+ Added [Grobid clients](https://grobid.readthedocs.io/en/latest/Grobid-service/#clients-for-grobid-web-services) for Java, Python and NodeJS
+ Added metrics to the REST entry point (accessible via <http://localhost:8071>)
+ Added counters for consolidation tasks and consolidation results
+ Added a case-sensitivity option to the lexicon/FastMatcher

### Changed

+ Updated documentation

### Fixed

+ Restored the status code returned by the REST API when no engine is available: 503 once again tells the client to wait (it had been removed by mistake in versions 0.5.0 and 0.5.1, for the PDF processing services only; see the REST API documentation)
+ Bug fixes: #339, #322, #300 and others
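Because a 503 again means "no engine available, try later", a client can simply retry. The Python sketch below (hypothetical helper `call_with_retry`) wraps any function that performs one request and returns a `(status, body)` pair, retrying with exponential backoff while the service answers 503. The attempt count and delays are arbitrary illustrative defaults, not values recommended by GROBID.

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, bytes]  # (HTTP status code, response body)


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send` until the service stops answering 503 or attempts run out.

    `send` is any function performing one request (for example posting a PDF
    to a GROBID service) and returning (status, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 503:
            # Success or a non-retryable error: hand it back to the caller.
            return status, body
        if attempt < max_attempts - 1:
            # 503: no engine is free yet, so wait and retry
            # (1 s, 2 s, 4 s, ... with the default base_delay).
            sleep(base_delay * (2 ** attempt))
    # Still 503 after the last attempt: return it so the caller can decide.
    return status, body
```

In a real client, `send` would wrap the actual HTTP call; injecting `sleep` keeps the backoff logic testable without waiting.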

## [0.5.1] – 2018-01-29

### Fixed

+ Various bug fixes

## [0.5.0] – 2017-11-09

### Changed

+ Migrated from Maven to Gradle for faster, more flexible and more stable builds and releases
+ Web services now use Dropwizard
+ Moved the GROBID service manual to [Read the Docs](http://grobid.readthedocs.io/en/latest/Grobid-service/)
+ Thanks to @detonator413 and @lfoppiano for this release! Future work in versions 0.5.* will again focus on improving PDF parsing and structuring accuracy.

## [0.4.4] – 2017-10-13

### Fixed

+ Fixed an issue that broke the release build

## [0.4.3] – 2017-10-07

### Added

+ New models: f-score improvement on the PubMed Central sample (bibliographical references +2.5%, header +7%)
+ New training data and features for bibliographical references, in particular covering the HEP domain (INSPIRE), arXiv identifiers, DOIs and URLs (thanks @iorala and @michamos!)
+ Support for the CrossRef REST API (instead of the slow OpenURL-style API, which requires a CrossRef account), in particular for multithreaded usage (thanks @Vi-dot)
+ Unicode normalisation and more robust body extraction (thanks @aoboturov)

### Changed

+ Improved training data generation and documentation (thanks @jfix)
+ Updated the pdf2xml fork for Windows (thanks @lfoppiano)

### Fixed

+ Fixes, tests and documentation

## [0.4.2] – 2017-08-05

### Added

+ Identification of equations (with PDF coordinates)
+ End-to-end evaluation with Pub2TEI conversions

### Changed

+ f-score improvement for the PubMed Central sample: fulltext +10-14%, header +0.5%, citations +0.5%
+ More robust PDF parsing

### Fixed

+ Many fixes and refactoring

## [0.4.1] – 2016-10-02

### Added

+ Support for Windows thanks to the contributions of Christopher Boumenot!
+ Support for Docker.
+ New web services for PDF annotation and updated web console application.

### Changed

+ Some improvements to figure/table extraction, which is still experimental at this stage (work in progress, like the whole full text model).

### Fixed

+ Fixes and refactoring.

## [0.4.0] – 2016-10-02

### Changed

+ Improved recognition of citations thanks to refined CRF features (+4% f-score on the PubMed Central sample).
+ Improved full text model, with new features and two additional models for figures and tables.
+ More robust synchronization of CRF sequence with PDF areas, resulting in improved bounding box calculations for locating annotations in the PDF documents.
+ Improved general robustness thanks to better token alignments.

[Unreleased]: https://github.com/kermitt2/grobid/compare/0.5.6...HEAD
[0.5.6]: https://github.com/kermitt2/grobid/compare/0.5.5...0.5.6
[0.5.5]: https://github.com/kermitt2/grobid/compare/0.5.4...0.5.5
[0.5.4]: https://github.com/kermitt2/grobid/compare/0.5.3...0.5.4
[0.5.3]: https://github.com/kermitt2/grobid/compare/0.5.2...0.5.3
[0.5.2]: https://github.com/kermitt2/grobid/compare/0.5.1...0.5.2
[0.5.1]: https://github.com/kermitt2/grobid/compare/0.5.0...0.5.1
[0.5.0]: https://github.com/kermitt2/grobid/compare/grobid-parent-0.4.4...0.5.0
[0.4.4]: https://github.com/kermitt2/grobid/compare/grobid-parent-0.4.3...grobid-parent-0.4.4
[0.4.3]: https://github.com/kermitt2/grobid/compare/grobid-parent-0.4.2...grobid-parent-0.4.3
[0.4.2]: https://github.com/kermitt2/grobid/compare/grobid-parent-0.4.1...grobid-parent-0.4.2
[0.4.1]: https://github.com/kermitt2/grobid/compare/grobid-parent-0.4.0...grobid-parent-0.4.1
[0.4.0]: https://github.com/kermitt2/grobid/compare/grobid-parent-0.3.9...grobid-parent-0.4.0

<!-- markdownlint-disable-file MD024 MD033 -->
