Look at search options #9

Open
blahah opened this issue Apr 16, 2017 · 16 comments

@blahah

blahah commented Apr 16, 2017

Would be cool to build a nice search / stats interface around the data.

The whole dataset is small enough that we could keep a small index in browser memory and call out to the API for full records.

Alternatively, Algolia has a free community search offering, and apparently it's really nice.

We should evaluate the options.
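A minimal sketch of the first option, assuming the index only needs title keywords mapped to DOIs. The records below are made up for illustration (the second DOI is a placeholder), the helper names are hypothetical, and the URL pattern is the one used in the curl example later in this thread. The same idea would carry over to JavaScript in the browser.

```python
import re
from collections import defaultdict

# Illustrative slim records: just enough to search on.
# The second entry is a made-up placeholder, not real data.
RECORDS = [
    {"doi": "10.1002/job.1787", "title": "Cognitive and affective identification"},
    {"doi": "10.1000/example.1", "title": "Affective computing in practice"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each title token to the set of DOIs whose title contains it."""
    index = defaultdict(set)
    for rec in records:
        for tok in tokenize(rec["title"]):
            index[tok].add(rec["doi"])
    return index


def search(index, query):
    """Return the DOIs whose titles contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    hits = [index.get(tok, set()) for tok in tokens]
    return set.intersection(*hits)


def record_url(doi):
    """Build the URL of the full record, to be fetched only on demand."""
    return f"http://openretractions.com/api/doi/{doi}/data.json"


if __name__ == "__main__":
    idx = build_index(RECORDS)
    for doi in sorted(search(idx, "affective identification")):
        print(doi, record_url(doi))
```

Only the small token-to-DOI map has to live in memory; the full record is requested from the API when a result is opened.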

@brandonStell

A new version of PubPeer will be released in a few weeks and will incorporate Elasticsearch (Algolia is built with the same tech). We would be interested in incorporating these data into PubPeer. Providing context for the retraction would be an obvious advantage. Perhaps retraction dates could even be inserted into the PubPeer timeline for each article.

@blahah
Author

blahah commented Apr 16, 2017

@brandonStell cool, a big motivation for making this is that information about retractions and other updates should be able to be displayed in context wherever papers are displayed. We'll incorporate it into sciencefair, and will probably make some plugins for reference managers.

Let us know what we can do to support your use in PubPeer (which, as I think you know, we ❤️).

@brandonStell

Your API is great. We could implement it immediately if there were retraction dates and links to retraction notices. Also, why limit to retractions? Could expressions of concern, etc. be returned?

Maybe something like this is possible:

{
  "retracted": true,
  "retraction_notice": {
    "date": "1492336605",
    "url": "http://doi.org/10.7860/JCDR/2013/4833.2724"
  },
  "expression_of_concern": false,
  "other_events": false
}

@blahah
Author

blahah commented Apr 16, 2017

Yup, that's coming today - we made the whole thing yesterday so we started with the simplest thing. Today we'll add all updates from this list, and we made a command-line tool for getting them from crossref.

I think we'll go with retracted: bool and then update: obj where the object describes the type of update, links to the DOI of the update, and has the date. Some of the coding of update types is overloaded and misused by the publishers.

@blahah
Author

blahah commented Apr 16, 2017

@brandonStell how's this for the metadata format:

{
  "retracted": false,
  "update": {
    "timestamp": 1361836800000,
    "doi": "10.1002/job.1858",
    "type": "correction"
  },
  "doi": "10.1002/job.1787",
  "journal": "Journal of Organizational Behavior",
  "publisher": "Wiley-Blackwell",
  "title": "Erratum: Cognitive and affective identification: Exploring the links between different forms of social identification and personality with work attitudes and behavior"
}
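A small sketch of how a consumer might read this format, assuming the `timestamp` is milliseconds since the Unix epoch (its 13 digits suggest so, but the thread does not state it). The function name `update_date` is hypothetical, and it returns `None` when the field is absent.

```python
import json
from datetime import datetime, timezone

# The example record from the comment above, verbatim.
RAW = """
{
  "retracted": false,
  "update": {
    "timestamp": 1361836800000,
    "doi": "10.1002/job.1858",
    "type": "correction"
  },
  "doi": "10.1002/job.1787",
  "journal": "Journal of Organizational Behavior",
  "publisher": "Wiley-Blackwell",
  "title": "Erratum: Cognitive and affective identification: Exploring the links between different forms of social identification and personality with work attitudes and behavior"
}
"""


def update_date(record):
    """Return the update time as a UTC datetime, or None if it is missing.

    Assumption: the timestamp is in milliseconds since the Unix epoch.
    """
    update = record.get("update") or {}
    ts = update.get("timestamp")
    if ts is None:
        return None
    return datetime.fromtimestamp(ts / 1000, tz=timezone.utc)


record = json.loads(RAW)

if __name__ == "__main__":
    when = update_date(record)
    print(record["doi"], record["update"]["type"], when.date().isoformat())
```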

@brandonStell

Great for us; looks like everything we would need.

@blahah
Author

blahah commented Apr 16, 2017

@brandonStell awesome! for future reference, here's the jq command to generate this from a CrossRef entry: https://jqplay.org/s/KxQS_Rx0rL

jq '{ retracted: (."update-to"[0].type == "retraction"), update: { timestamp: ."update-to"[0].updated.timestamp, doi: .DOI, type: ."update-to"[0].type }, doi: ."update-to"[0].DOI, journal: ."container-title"[0], publisher: .publisher, title: .title[0] }'
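For readers who would rather not use jq, here is a rough Python equivalent of the filter above. The function name `to_open_retractions` is hypothetical, and the sample entry is a hand-built, heavily trimmed stand-in for a CrossRef update record that reuses the values from the example metadata; real entries carry many more fields.

```python
def first(seq):
    """Return the first element of a list, or None if it is empty or missing."""
    return seq[0] if seq else None


def to_open_retractions(entry):
    """Mirror the jq filter: reshape a CrossRef update record.

    The record's own DOI becomes update.doi, and the DOI it points at
    through "update-to" becomes the top-level doi.
    """
    update = first(entry.get("update-to")) or {}
    return {
        "retracted": update.get("type") == "retraction",
        "update": {
            "timestamp": (update.get("updated") or {}).get("timestamp"),
            "doi": entry.get("DOI"),
            "type": update.get("type"),
        },
        "doi": update.get("DOI"),
        "journal": first(entry.get("container-title")),
        "publisher": entry.get("publisher"),
        "title": first(entry.get("title")),
    }


# Hand-built sample with only the fields the filter reads (illustrative).
SAMPLE_ENTRY = {
    "DOI": "10.1002/job.1858",
    "update-to": [
        {
            "DOI": "10.1002/job.1787",
            "type": "correction",
            "updated": {"timestamp": 1361836800000},
        }
    ],
    "container-title": ["Journal of Organizational Behavior"],
    "publisher": "Wiley-Blackwell",
    "title": ["Erratum: Cognitive and affective identification"],
}

if __name__ == "__main__":
    print(to_open_retractions(SAMPLE_ENTRY))
```

Like the jq version, this reports `retracted: false` and leaves the other fields empty when a record has no `update-to` entry.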

@brandonStell

You didn't get the metadata from here?

curl -sL -H "Accept: application/vnd.citationstyles.csl+json" http://dx.doi.org/10.1002/job.1787 | jq '{ retracted: (."update-to"[0].type == "retraction"), update: { timestamp: ."update-to"[0].updated.timestamp, doi: .DOI, type: ."update-to"[0].type }, doi: ."update-to"[0].DOI, journal: ."container-title"[0], publisher: .publisher, title: .title[0] }'

@blahah
Author

blahah commented Apr 16, 2017

No - that's the original DOI record. We are harvesting just the update DOI records and then reconstructing the metadata from those. We get them en masse using the API with deep cursor paging.
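A hedged sketch of what cursor paging over update records could look like. It assumes the CrossRef REST API conventions of starting with `cursor=*` and reading `next-cursor` from each response's `message`; the `is-update:true` filter and the page size are assumptions, not something confirmed in this thread, and the function names are hypothetical. The paging loop takes the fetch function as a parameter so it can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator


def iter_updates(fetch_page: Callable[[str], dict], start_cursor: str = "*") -> Iterator[dict]:
    """Yield every item, following the cursor until a page comes back empty."""
    cursor = start_cursor
    while True:
        message = fetch_page(cursor)["message"]
        items = message.get("items", [])
        if not items:
            return
        yield from items
        cursor = message.get("next-cursor")
        if not cursor:
            return


def crossref_fetch(cursor: str) -> dict:
    """Fetch one page of works from CrossRef.

    Assumption: "is-update:true" selects update records; adjust the
    filter to whatever the harvesting tool actually uses.
    """
    params = urllib.parse.urlencode(
        {"filter": "is-update:true", "rows": 1000, "cursor": cursor}
    )
    url = f"https://api.crossref.org/works?{params}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Network access required; prints the DOIs of the first few records.
    for i, item in enumerate(iter_updates(crossref_fetch)):
        print(item.get("DOI"))
        if i >= 4:
            break
```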

@brandonStell

OK.

Shouldn't this produce the example metadata you show above?

curl http://openretractions.com/api/doi/10.1002/job.1787/data.json | jq .

@blahah
Author

blahah commented Apr 17, 2017

@brandonStell yes, just got held up in releasing the new format metadata by rolling power cuts here in Kenya :)

working on releasing it now, I'll ping you once it's up

@joshed-io

@brandonStell Minor correction: Algolia is its own search engine (more info). Because we do some really strong latency optimizations, Algolia often feels like it's client side, but you still get the full power of a full search engine (weighting fields differently, prefix matching, handling typos, not having to download the data set to each client, etc).

instantsearch.js or the React version could be used to create a nice single-field or faceted search experience for Open Retractions pretty quickly. If the data goes above 10k records we're happy to up the limit; all we ask is that you keep a "search by Algolia" icon by the search or the results. If you decide to go this way, we'd be happy to have you share the project on our community forum to get more visibility for it.

@blahah
Author

blahah commented Apr 21, 2017

@brandonStell sorry, lost track of where this discussion was. New format has been up for a few days. ~45k records up now, and a few thousand of those are missing the timestamp and/or DOI for the update. I'll be working on getting all those fields populated ASAP, but in its current state the data and API should be pretty useful. :)

@blahah
Author

blahah commented Apr 21, 2017

@dzello thanks for this info - definitely interested in exploring it. My main concern is that I want to balance it with our fundamental commitment to open source and open data. In general we try to make reusable components and modules that aren't tied too deeply to any framework. Reading through your linked posts and browsing the repos, it looks to me like you're being unusually open for a company whose core business is a secret algorithm - and I appreciate that what you're offering here is free resources. Would you be interested in a chat to talk about these things?

@brandonStell

looks great. we're going to try to get it into the new site when it launches.

@joshed-io

@blahah very happy to chat. We rely on open source and want to give back where we can. We also want to support people who are finding ways to use search to improve their lives or the world, sites like Grantmakers.io. Send me a note and we'll set something up? josh at algolia
