Fix for OAI-PMH to report eprints that have been live but aren't in the 'archive' dataset as deleted #324

Merged: 2 commits merged into 3.3 on Aug 16, 2017

@jesusbagpuss jesusbagpuss commented Jun 17, 2015

If something is moved from archive to review, inbox, or dark_archive (if one is configured), the record is now reported as deleted in the OAI-PMH interface.

Anything in archive = metadata available
Anything not in archive = deleted
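
The rule above can be sketched roughly as follows. This is a Python illustration only, not the Perl that lives in EPrints: the function name `oai_status` and the `has_datestamp` flag are hypothetical, and it assumes that a record carrying a datestamp is one that has been live at some point.

```python
from typing import Optional


def oai_status(eprint_status: str, has_datestamp: bool) -> Optional[str]:
    """Return how a record would be exposed over OAI-PMH under the rule above.

    "available" -> metadata is served
    "deleted"   -> the record is reported as deleted
    None        -> the record was never live, so it is not exposed at all
    """
    # Hypothetical helper; the dataset names are passed in as plain strings.
    if eprint_status == "archive":
        return "available"
    if has_datestamp:
        # Assumption: a datestamp means the record was once in 'archive'.
        return "deleted"
    return None
```

For example, `oai_status("buffer", True)` gives `"deleted"`, while an item that has only ever sat in the inbox (`oai_status("inbox", False)`) is not reported at all.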

@patrickmcsweeney patrickmcsweeney commented Jun 19, 2015

I am not entirely sure that that is incorrect, John. Looking at the OAI-PMH
spec: "If a record is no longer available then it is said to be deleted."
https://www.openarchives.org/OAI/openarchivesprotocol.html There is some
other guff in there about qualifying the status of a delete, which may be
relevant here, but it needs someone with more motivation to investigate.

Patrick

Commit Summary

  • Report anything that has a datestamp via OAI-PMH
  • Update OpenArchives.pm


@jesusbagpuss jesusbagpuss commented Jun 19, 2015

Hi Patrick,
I investigated :o) But am happy to get ‘proper’ validation of this approach from an expert too.
From https://www.openarchives.org/OAI/openarchivesprotocol.html#DeletedRecords
“If a record is no longer available then it is said to be deleted”

As EPrints uses ‘persistent’ support of deleted records, I think that anything that was live (publicly available), but now isn’t, should be represented as ‘deleted’.

This is the scenario:

  • a record that is live.
  • the record is later removed (put into the DarkArchive - because someone made a boo-boo).
  • the record is no longer available over OAI-PMH (it is not in the archive – IMO therefore OAI-PMH deleted).

Any harvester that has links to the record will now have broken links.
I believe the correct behaviour in this situation is to mark the no-longer-available record as deleted. This ‘deleted’ is in the scope of the OAI-PMH interface: the record has been deleted from there, not necessarily from the repository.
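
As a rough illustration of what a harvester sees in this scenario: in OAI-PMH 2.0 a deleted record is a header carrying `status="deleted"`, with no metadata part. The sketch below shows how a harvester might detect that. The identifier and datestamp in `SAMPLE` are made up, and `is_deleted` is a hypothetical helper, not part of EPrints or any harvester.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Made-up example record: header only, no <metadata> element.
SAMPLE = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header status="deleted">
    <identifier>oai:example.org:123</identifier>
    <datestamp>2015-06-17T10:57:00Z</datestamp>
  </header>
</record>"""


def is_deleted(record_xml: str) -> bool:
    """Return True if the record's header is flagged status="deleted"."""
    record = ET.fromstring(record_xml)
    header = record.find(f"{OAI_NS}header")
    return header is not None and header.get("status") == "deleted"
```

A harvester that receives such a header can drop or flag its local copy instead of being left with a broken link.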

I sought advice on oai-pmh@googlegroups.com about harvester behaviour when a record such as this reappears.

Shall I seek more clarification on that list from the experts?

Cheers,
John

PS EPrints is broken if someone destroys (rather than retires) a once-live record – as this breaks the persistence model.

@patrickmcsweeney patrickmcsweeney commented Jun 20, 2015

Sorry John I think I misread your original mail. I thought you were
reporting a bug not pushing a fix. Apologies for the confusion.
Patrick

@jesusbagpuss changed the title from "OAI-PMH report eprints that have been live but aren't in the 'archive' dataset as deleted" to "Fix for OAI-PMH to report eprints that have been live but aren't in the 'archive' dataset as deleted" on Jun 22, 2015
@jiadiyao jiadiyao merged commit fdf0c95 into 3.3 Aug 16, 2017
@jesusbagpuss jesusbagpuss deleted the oai-pmh-update branch Sep 8, 2017

mpbraendle commented on e3e0852 Jul 24, 2018

Eprints that have been moved back from archive to buffer or inbox already have a datestamp. However, they should not be visible in the OAI-PMH result set. Therefore I'm not sure whether this change is appropriate.

jesusbagpuss commented on e3e0852 Jul 24, 2018

@mpbraendle - they should be flagged as 'deleted' in the OAI-PMH set (meaning the metadata has been removed from the OAI interface, not that the item has been deleted from the repository).

To flag them as being deleted in OAI, they need to be returned - hence this change.
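
A minimal sketch of why the records have to be returned, assuming each eprint is a plain dict with hypothetical `id`, `status` and `datestamp` keys (this is not the EPrints data model, and both function names are invented for illustration):

```python
from typing import Dict, List, Optional, Tuple

EPrint = Dict[str, Optional[str]]


def select_live_only(eprints: List[EPrint]) -> List[Optional[str]]:
    """Only 'archive' records are returned, so a withdrawn record
    silently vanishes and a harvester never learns it has gone."""
    return [e["id"] for e in eprints if e["status"] == "archive"]


def select_with_deleted(eprints: List[EPrint]) -> List[Tuple[Optional[str], bool]]:
    """Every record with a datestamp is returned as (id, deleted), where
    deleted is True when the record is no longer in 'archive'."""
    return [
        (e["id"], e["status"] != "archive")
        for e in eprints
        if e["datestamp"] is not None
    ]


eprints: List[EPrint] = [
    {"id": "1", "status": "archive", "datestamp": "2015-06-01"},
    {"id": "2", "status": "buffer", "datestamp": "2015-06-02"},
    {"id": "3", "status": "inbox", "datestamp": None},
]
```

With this sample data the first selection returns only record 1, while the second returns records 1 and 2, with record 2 flagged as deleted; record 3 has never been live and appears in neither.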

The original pull request is here: #324
but it looks like the update to EPrints::OpenArchives was somehow not merged in?
6b19298

This is obviously a critical part of the fix!

jesusbagpuss commented on e3e0852 Jul 24, 2018

mpbraendle commented on e3e0852 Jul 24, 2018

Thanks. That makes it all clear. We have adopted the patch in OpenArchives.pm, so that is fine.

jesusbagpuss commented on e3e0852 Jul 24, 2018

Excellent - I should have made it as a single commit - sorry!
