Look at search options #9

blahah · 2017-04-16T08:28:53Z

Would be cool to build a nice search / stats interface around the data.

The whole dataset is small enough that we could have a small index in browser memory and call out to the API for full records.
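As a rough illustration of the small-index idea, here is a minimal sketch of a title index that resolves a query to DOIs, which could then be looked up through the API. It is written in Python only to show the data structure; a browser version would be JavaScript. The function names and the two sample records are hypothetical and not part of the project.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each title token to the set of DOIs whose title contains it."""
    index = defaultdict(set)
    for record in records:
        for token in tokenize(record["title"]):
            index[token].add(record["doi"])
    return index


def search(index, query):
    """Return the DOIs whose titles contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    matches = [index.get(token, set()) for token in tokens]
    return set.intersection(*matches)


# Hypothetical records, for illustration only.
records = [
    {"doi": "10.1234/example.1", "title": "Retraction: A study of widgets"},
    {"doi": "10.1234/example.2", "title": "Erratum: Widgets revisited"},
]
index = build_index(records)
print(search(index, "erratum widgets"))
```

Only the token-to-DOI map has to live in memory; the full record for each matching DOI would still come from the API.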

Alternatively, Algolia has a free community search offering, and apparently it's really nice.

We should evaluate the options.

brandonStell · 2017-04-16T09:37:06Z

A new version of PubPeer will be released in a few weeks and will incorporate Elasticsearch (Algolia is built with the same tech). We would be interested in incorporating these data in PubPeer. Providing context to the retraction would be an obvious advantage. Perhaps retraction dates could even be inserted into the PubPeer timeline for each article.

blahah · 2017-04-16T09:43:32Z

@brandonStell cool, a big motivation for making this is that information about retractions and other updates should be able to be displayed in context wherever papers are displayed. We'll incorporate it into sciencefair, and will probably make some plugins for reference managers.

Let us know what we can do to support your use in PubPeer (which, as I think you know, we ❤️).

brandonStell · 2017-04-16T09:59:31Z

Your API is great. We could implement it immediately if there were retraction dates and links to retraction notices. Also, why limit to retractions? Could expressions of concern, etc. be returned?

maybe something like this is possible:

{
  "retracted": true,
  "retraction_notice": {
    "date": "1492336605",
    "url": "http://doi.org/10.7860/JCDR/2013/4833.2724"
  },
  "expression_of_concern": false,
  "other_events": false
}

blahah · 2017-04-16T10:06:07Z

Yup, that's coming today - we made the whole thing yesterday so we started with the simplest thing. Today we'll add all updates from this list, and we made a command-line tool for getting them from crossref.

I think we'll go with retracted: bool and then update: obj where the object describes the type of update, links to the DOI of the update, and has the date. Some of the coding of update types is overloaded and misused by the publishers.

blahah · 2017-04-16T16:47:49Z

@brandonStell how's this for the metadata format:

{
  "retracted": false,
  "update": {
    "timestamp": 1361836800000,
    "doi": "10.1002/job.1858",
    "type": "correction"
  },
  "doi": "10.1002/job.1787",
  "journal": "Journal of Organizational Behavior",
  "publisher": "Wiley-Blackwell",
  "title": "Erratum: Cognitive and affective identification: Exploring the links between different forms of social identification and personality with work attitudes and behavior"
}
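A consumer could read a record in this format roughly as follows. This is a minimal sketch, assuming the timestamp is milliseconds since the Unix epoch (it has 13 digits); `update_date` is a hypothetical helper, not something the API provides, and the record is trimmed to the fields used here.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the example record above.
record_json = """
{
  "retracted": false,
  "update": {
    "timestamp": 1361836800000,
    "doi": "10.1002/job.1858",
    "type": "correction"
  },
  "doi": "10.1002/job.1787"
}
"""


def update_date(record):
    """Return the update date as an ISO 8601 string (UTC), or None if absent."""
    # Assumption: the timestamp is in milliseconds since the Unix epoch.
    timestamp_ms = (record.get("update") or {}).get("timestamp")
    if timestamp_ms is None:
        return None
    moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return moment.date().isoformat()


record = json.loads(record_json)
print(record["doi"], record["update"]["type"], update_date(record))
```

Returning `None` for a missing timestamp keeps the helper usable on records where the update details have not been filled in.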

brandonStell · 2017-04-16T16:55:26Z

Great for us; looks like everything we would need.

blahah · 2017-04-16T16:57:44Z

@brandonStell awesome! for future reference, here's the jq command to generate this from a CrossRef entry: https://jqplay.org/s/KxQS_Rx0rL

jq '{ retracted: (."update-to"[0].type == "retraction"), update: { timestamp: ."update-to"[0].updated.timestamp, doi: .DOI, type: ."update-to"[0].type }, doi: ."update-to"[0].DOI, journal: ."container-title"[0], publisher: .publisher, title: .title[0] }'
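For anyone not using jq, the same reshaping can be sketched in Python. The function name `to_update_record` is hypothetical, and the sample entry is a trimmed stand-in assembled from the example values in this thread rather than a real CrossRef response. Unlike jq, which yields null for missing keys, this version raises `KeyError` when a field is absent.

```python
def to_update_record(entry):
    """Reshape a CrossRef update-notice entry into the flat metadata format."""
    target = entry["update-to"][0]  # the work that this notice updates
    return {
        "retracted": target["type"] == "retraction",
        "update": {
            "timestamp": target["updated"]["timestamp"],
            "doi": entry["DOI"],  # DOI of the update notice itself
            "type": target["type"],
        },
        "doi": target["DOI"],  # DOI of the original article
        "journal": entry["container-title"][0],
        "publisher": entry["publisher"],
        "title": entry["title"][0],
    }


# Hypothetical, trimmed entry built from the example values in this thread.
entry = {
    "DOI": "10.1002/job.1858",
    "update-to": [
        {
            "type": "correction",
            "DOI": "10.1002/job.1787",
            "updated": {"timestamp": 1361836800000},
        }
    ],
    "container-title": ["Journal of Organizational Behavior"],
    "publisher": "Wiley-Blackwell",
    "title": ["Erratum: Cognitive and affective identification: Exploring the links between different forms of social identification and personality with work attitudes and behavior"],
}
print(to_update_record(entry))
```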

brandonStell · 2017-04-16T17:16:19Z

You didn't get the metadata from here?

curl -L -H "Accept: application/vnd.citationstyles.csl+json" http://dx.doi.org/10.1002/job.1787 | jq '{ retracted: (."update-to"[0].type == "retraction"), update: { timestamp: ."update-to"[0].updated.timestamp, doi: .DOI, type: ."update-to"[0].type }, doi: ."update-to"[0].DOI, journal: ."container-title"[0], publisher: .publisher, title: .title[0] }'

blahah · 2017-04-16T17:40:32Z

No - that's the original DOI record. We are harvesting just the update DOI records and then reconstructing the metadata from those. We get them en masse using the API with deep cursor paging.
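Roughly, the cursor paging loop looks like the sketch below. It is not the actual harvester: it assumes the CrossRef REST API `works` endpoint with the `is-update:true` filter, starts from `cursor=*`, and follows `next-cursor` until a page comes back empty. `fetch_page` and `harvest` are hypothetical names, and the fetch function is passed in so the loop can be exercised without touching the network.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.crossref.org/works"


def fetch_page(cursor, rows=1000):
    """Fetch one page of update records and return the 'message' object."""
    # Assumption: 'is-update:true' selects the update notices of interest.
    query = urllib.parse.urlencode(
        {"filter": "is-update:true", "rows": rows, "cursor": cursor}
    )
    with urllib.request.urlopen(f"{API_URL}?{query}") as response:
        return json.load(response)["message"]


def harvest(fetch=fetch_page):
    """Yield every item, following next-cursor until a page is empty."""
    cursor = "*"  # marker that starts a new deep-paging session
    while True:
        message = fetch(cursor)
        items = message.get("items", [])
        if not items:
            break
        yield from items
        cursor = message.get("next-cursor")
        if cursor is None:
            break
```

Calling `harvest()` with no arguments would hit the live API; passing a stub as `fetch` makes it easy to check the loop logic offline.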

brandonStell · 2017-04-17T14:14:42Z

OK.

Shouldn't this produce the example metadata you show above?

curl http://openretractions.com/api/doi/10.1002/job.1787/data.json | jq .

blahah · 2017-04-17T14:20:34Z

@brandonStell yes, just got held up in releasing the new format metadata by rolling power cuts here in Kenya :)

working on releasing it now, I'll ping you once it's up

joshed-io · 2017-04-17T15:39:00Z

@brandonStell Minor correction, Algolia is its own search engine (more info). Because we do some really strong latency optimizations, Algolia often feels like it's client side, but you still get the full power of a full search engine (weighting fields differently, prefix matching, handling typos, not having to download the data set to each client, etc).

instantsearch.js or the React version could be used to create a nice single-field or faceted search experience for Open Retractions pretty quickly. If the data goes above 10k records we're happy to up the limit; all we ask is that you keep a "search by Algolia" icon by the search or the results. If you decide to go this way, we'd be happy to have you share the project on our community forum to get more visibility for it.

blahah · 2017-04-21T00:37:22Z

@brandonStell sorry, lost track of where this discussion was. New format has been up for a few days. ~45k records up now, and a few thousand of those are missing the timestamp and/or DOI for the update. I'll be working on getting all those fields populated ASAP, but in its current state the data and API should be pretty useful. :)

blahah · 2017-04-21T00:44:10Z

@dzello thanks for this info - definitely interested in exploring it. My main concern is that I want to balance it with our fundamental commitment to open source and open data. In general we try to make reusable components and modules that aren't tied too deeply to any framework. Reading through your linked posts and browsing the repos, it looks to me like you're being unusually open for a company whose core business is a secret algorithm - and I appreciate that what you're offering here is free resources. Would you be interested in a chat to talk about these things?

brandonStell · 2017-04-21T07:19:32Z

looks great. we're going to try to get it into the new site when it launches.

joshed-io · 2017-04-21T12:51:54Z

@blahah very happy to chat. We rely on open source and want to give back where we can. We also want to support people who are finding ways to use search to improve their lives or the world, sites like Grantmakers.io. Send me a note and we'll set something up? josh at algolia
