-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate reference citation keys for ingested references #114
Generate reference citation keys for ingested references #114
Conversation
@@ -4,5 +4,4 @@ | |||
yarn check | |||
yarn test | |||
|
|||
yarn pylint:check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noticed that someone added this to yarn check
(thank you!), so we don't need to call it twice here
def update(self, target: typing.Reference): | ||
""" | ||
Update a Reference in storage with the target reference. | ||
This is used when the client has updated the reference in the UI. | ||
""" | ||
for i, source in enumerate(self.references): | ||
if source.filename_md5 == target.filename_md5: | ||
logger.info(f"Updating reference {source.source_filename} ({source.filename_md5})") | ||
self.references[i] = target | ||
break | ||
else: | ||
logger.info(f"Unable to find reference {target.source_filename} ({target.filename_md5})") | ||
return | ||
self.save() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alternatively, we could pass in the filename hash (or we should probably use a UUID) and dict, making the function signature something like:
def update(reference_id: UUID, updates: dict[string, Any])
where dict[string, Any]
=> field_name_to_update: value_to_set
. would need to think through how to do this for subtypes like Author though. maybe individual methods for update_reference
and update_author
to make things clear and specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is ok, also bc this is JsonStorage file. Maybe we need to expose to the frontend more granular operations (ex: rename citation). So this is ok for now.
given_name: str | None = None | ||
surname: str | None = None | ||
email: str | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grobid won't always have these fields, but from what I can tell full_name
will always be present if it can parse any authors.
We'll be able to get a lot more consistent here once we start work on further augmenting our references with #140
@@ -7,7 +7,7 @@ function parsePdfIngestionResponse(response: IngestResponse): ReferenceItem[] { | |||
return response.references.map((reference) => ({ | |||
id: reference.filename_md5, | |||
title: reference.title ?? reference.source_filename.replace('.pdf', ''), | |||
authors: (reference.authors ?? []).map((author) => ({ fullName: author.full_name, surname: author.surname })), | |||
authors: (reference.authors ?? []).map((author) => ({ fullName: author.full_name })), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def update(self, target: typing.Reference): | ||
""" | ||
Update a Reference in storage with the target reference. | ||
This is used when the client has updated the reference in the UI. | ||
""" | ||
for i, source in enumerate(self.references): | ||
if source.filename_md5 == target.filename_md5: | ||
logger.info(f"Updating reference {source.source_filename} ({source.filename_md5})") | ||
self.references[i] = target | ||
break | ||
else: | ||
logger.info(f"Unable to find reference {target.source_filename} ({target.filename_md5})") | ||
return | ||
self.save() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is ok, also bc this is JsonStorage file. Maybe we need to expose to the frontend more granular operations (ex: rename citation). So this is ok for now.
@gjreda just found an error (date parsing) in the ingestion for the Machine Learning at Scale.pdf. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a pending error when parsing some PDF files (check my latest comment).
Codecov Report
@@ Coverage Diff @@
## main #114 +/- ##
=======================================
Coverage 36.73% 36.73%
=======================================
Files 52 52
Lines 2488 2488
Branches 114 114
=======================================
Hits 914 914
Misses 1551 1551
Partials 23 23
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
fixes #112
Adds
citation_key
andpublished_date
to the Reference response.Depending on whether the "uniqueness" postfix of
a, b, c ... 1, 2, 3
should be handled by the frontend or sidecar, this PR might require more work. See discussion here.