
Commit

deposits: fix PDF date extraction
* Fixes an error with the date property when the PDF contains a host document.
* Closes #603.

Co-Authored-by: Sébastien Délèze <sebastien.deleze@rero.ch>
Sébastien Délèze committed Jul 21, 2021
1 parent dc8eb1a commit 2f96691
Showing 2 changed files with 5 additions and 1 deletion.
5 changes: 4 additions & 1 deletion sonar/modules/pdf_extractor/utils.py
@@ -157,7 +157,10 @@ def format_extracted_data(data):
         '@from'] + '-' + item['@to']

 if monogr['imprint'].get('date').get('@when'):
-    publication['year'] = monogr['imprint']['date']['@when']
+    match = re.search(r'^([0-9]{4}).*$',
+                      monogr['imprint']['date']['@when'])
+    if match:
+        formatted_data['documentDate'] = match.group(1)

 if publication:
     formatted_data['publication'] = publication
1 change: 1 addition & 0 deletions tests/ui/pdf_extractor/test_pdf_extractor_utils.py
@@ -74,3 +74,4 @@ def test_format_extracted_data(app):
 assert formatted_data['publication'][
     'publishedIn'] == 'Frontiers in Earth Science'
 assert formatted_data['publication']['volume'] == '7'
+assert formatted_data['documentDate'] == '2019'
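
The year extraction introduced in `format_extracted_data` can be tried in isolation. The following is a minimal sketch, not the project's code: the helper name `extract_year` and the sample date strings are assumptions made for illustration, while the regular expression is the one used in the change above.

```python
import re


def extract_year(when):
    """Return the leading four-digit year of a date string, or None.

    Hypothetical helper: it isolates the regex from the commit so that
    its behavior can be checked without running the PDF extractor.
    """
    # Same pattern as in the commit: four digits anchored at the start,
    # followed by anything (for example '-03-12').
    match = re.search(r'^([0-9]{4}).*$', when)
    return match.group(1) if match else None


# Sample values are assumptions, not data taken from the test fixtures.
print(extract_year('2019-03-12'))
print(extract_year('2019'))
print(extract_year('March 2019'))
```

Because the pattern is anchored at the start of the string, a value that does not begin with four digits yields no match, and in the committed code `documentDate` would then simply not be set.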
