Data model: full support for references #430

kaplun · 2015-10-20T12:29:48Z

References should be lists of lists (because there might be errata/ibid on the same reference).
What was once a pubnote in $s should be split up into journal title, volume, issue, start page.
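
A minimal sketch of the two points above, assuming the legacy pubnote is a comma-separated string of the form journal,volume,page. That format is an assumption, as are the function name `split_pubnote`, the output keys, and the sample values; the issue is left empty because the assumed format does not carry one.

```python
def split_pubnote(pubnote):
    """Split a legacy pubnote string into separate fields.

    Assumes the form "journal,volume,page"; anything else is returned
    untouched under a "raw" key so that no information is lost.
    """
    parts = [part.strip() for part in pubnote.split(",")]
    if len(parts) != 3 or not all(parts):
        return {"raw": pubnote}
    journal_title, volume, page = parts
    # Keep only the start page if a range such as "678-679" is given.
    page_start = page.split("-")[0]
    return {
        "journal_title": journal_title,
        "journal_volume": volume,
        "journal_issue": None,  # not present in the assumed format
        "page_start": page_start,
    }


# Lists of lists: the outer list holds the numbered references, the inner
# list holds every publication cited under that number (e.g. an erratum).
# The values are made up for illustration.
references = [
    [split_pubnote("J.Example,A1,10"), split_pubnote("J.Example,A2,678-679")],
]
```

Returning the unparsed string under `raw` is one way to avoid losing references that do not fit the assumed pattern.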

kaplun · 2015-10-20T12:30:16Z

cc: @jalavik, @annetteholtkamp

annetteholtkamp · 2015-10-20T12:39:45Z

We would also need an object for “reportnr, page” in case a report contains several contributions; see e.g. CERN Yellow Reports.
And also “conf acronym, page/article id”, e.g. for JACoW conferences.

Annette

kaplun · 2015-10-20T12:53:02Z

Well, if we split up $s into its components, then we would naturally have separate page and report number fields, which would support your Yellow Report use case above. Regarding conference acronyms, do we have any today in 999C5?

annetteholtkamp · 2015-10-20T13:08:38Z

Not yet, but we already have some in 773. As soon as we expose that in our bibliographic data, we should be able to recognise the corresponding references as well.

Annette


aw-bib · 2015-10-20T14:23:14Z

Just to ask:

a reference is (usually) pointing to an existing record
thus it basically has the very same structure as a record
it probably even needs this complex structure to model all the different publication types etc.

Thus, isn't the link(tm) enough, perhaps pulling in some data for display and getting expansions for indexing?

I.e. for me references sound a bit like "just the same as the gigantic workflow, except its children live in HEP space".

The exceptions are references that are not in INSPIRE, i.e. records that usually do not get curated. Thus they will end up as free-form text anyway.

I wonder if such an approach would not simplify the model a lot.

kaplun · 2015-10-20T19:27:42Z

@aw-bib we have to store the whole reference (possibly already structured) because we don't know whether:

it matches any record at the time the record is ingested
it will possibly match a future record still to arrive
it is currently matching a record but this is a mistake, in which case the reference structure is used to check this.

So the link is not enough.

But indeed it is a good point that the reference could basically be structured almost as a whole record. That opens up quite a few points for reflection...

In fact it all boils down to how much information we are able to match from publishers or guess from PDFs via refextract/Grobid.
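
The three cases listed above can be sketched roughly as follows. This is only an illustration: the `Reference` class, the `rematch` function, the field names, and the exact-equality matching rule are all hypothetical, and real matching would be far fuzzier.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Reference:
    raw: str                                               # text as extracted
    fields: Dict[str, str] = field(default_factory=dict)   # structured parts
    record_id: Optional[int] = None                        # link, if known


def rematch(reference, records):
    """Recompute the link from the stored structured fields.

    `records` maps a record id to that record's fields. A record matches
    when it agrees with every structured field of the reference.
    """
    reference.record_id = None
    if not reference.fields:
        return reference
    for record_id, record_fields in records.items():
        if all(record_fields.get(k) == v for k, v in reference.fields.items()):
            reference.record_id = record_id
            break
    return reference


# Made-up values; the initial link to record 99 is deliberately wrong.
ref = Reference(
    raw="J. Example A1 (2015) 10",
    fields={"journal_title": "J.Example", "page_start": "10"},
    record_id=99,
)
wrong = {99: {"journal_title": "Other J.", "page_start": "10"}}
rematch(ref, wrong)  # the mistaken link is dropped
later = dict(wrong)
later[7] = {"journal_title": "J.Example", "page_start": "10"}
rematch(ref, later)  # a record that arrived later now matches
```

Because the structured fields are stored with the reference, the link can be recomputed or checked at any time, which a bare link would not allow.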

aw-bib · 2015-10-21T05:05:25Z

I see your point, but wouldn't storing the raw string be enough in this case?

If not, why not use the extracted data to create a (stub) record and use that for linking? Then you can be sure to cover every need. If the real record comes in later, just brush up the stub through the usual merging. I think this simplifies the overall data structure.

So the question is whether it is worth the effort to add a nested record structure, with all its complications.
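
A rough sketch of the stub-record idea described above. The helper names `make_stub` and `merge_into_stub`, the `stub` flag, and the "real record wins" rule are assumptions made for illustration; they stand in for whatever the usual merging would actually do.

```python
def make_stub(record_id, extracted):
    """Create a placeholder record from the fields extracted from a reference."""
    stub = {"id": record_id, "stub": True}
    stub.update(extracted)
    return stub


def merge_into_stub(stub, real_record):
    """Upgrade a stub once the real record arrives.

    Values from the real record win; extracted values are kept only where
    the real record has nothing to say. The stub's id is preserved so that
    links from citing records stay valid.
    """
    merged = dict(stub)
    merged.update(real_record)
    merged["id"] = stub["id"]
    merged["stub"] = False
    return merged


# Made-up values for illustration.
stub = make_stub(1, {"title": "bla", "page_start": "10"})
full = merge_into_stub(stub, {"id": 500, "title": "blah", "authors": ["A. Author"]})
```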

kaplun · 2016-05-03T12:52:10Z

Moving discussion to dedicated issue #1099

kaplun · 2016-05-03T12:55:59Z

@bittirousku can you take care of the above points:

References should be lists of lists (because there might be errata/ibid on the same reference).
What was once a pubnote in $s should be split up into journal title, volume, issue, start page.

bittirousku · 2016-05-03T13:03:32Z

Sure, I can do that.

eamonnmag · 2016-07-19T08:49:26Z

List of dicts is always better. Otherwise one has to iterate over all items and check properties to discover which reference should be displayed.


kaplun · 2016-07-19T08:54:28Z

Uh! @eamonnmag what are you referring to?

eamonnmag · 2016-07-19T09:01:49Z

References should be lists of lists (because there might be errata/ibid on the same reference).
What was once a pubnote in $s should be split up into journal title, volume, issue, start page.

Unless I've misunderstood what you're storing in the references block and it's different from say, holdingpen, then if they are lists of lists you have this:

[
    [{'type': 'erratum', 'title': 'bla'},
     {'type': 'correct', 'title': 'blah'}]
]

Now to display the references, I will have to loop over the array and get the 'correct' type in this case. If instead we have this:

[
    {
        'correct': {'title': 'blah'},
        'erratum': {'title': 'bla'}
    }
]

I can just iterate over each reference and get 'correct'.

Obviously the keys are rubbish in this case, but something along these lines. This would be especially convenient when a reference has more than two entries.
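
To make the comparison above concrete, here is a small sketch that reads the display title from both shapes. The two data structures are the ones from this comment; the function names are made up for illustration.

```python
# Shape 1: list of lists; each inner list groups the entries of one reference.
list_of_lists = [
    [{'type': 'erratum', 'title': 'bla'},
     {'type': 'correct', 'title': 'blah'}],
]

# Shape 2: list of dicts; each reference keys its entries by role.
list_of_dicts = [
    {'correct': {'title': 'blah'},
     'erratum': {'title': 'bla'}},
]


def titles_from_lists(references):
    """Scan every inner list for the entry whose type is 'correct'."""
    titles = []
    for group in references:
        for entry in group:
            if entry['type'] == 'correct':
                titles.append(entry['title'])
                break
    return titles


def titles_from_dicts(references):
    """Direct key lookup per reference, with no inner scan."""
    return [reference['correct']['title'] for reference in references]
```

Both functions return the same titles; the second avoids the inner loop and the property check.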

jacquerie · 2016-07-19T09:08:29Z

Currently references are still a list of dicts, but each reference has a list of raw_references inside.

mihaibivol · 2016-07-19T09:31:10Z

The correct / erratum use case should only arise when adding info between versions, and has to do with the way versioning is done. It will always be v0 = [{'title': 'bla'}] v1 = [{'titles': 'blah'}] -- use magic --> correct = [{'title': 'bla'}], and you will only display that title.

The problem with lists of lists was with references that share the same number. @kaplun You had a PDF example. So far, refextract has made a mess of legacy references. Lists of lists are generally bad for versioning and merging, since you have to match a list of reference-like things denoting a single reference against another list of reference-like things. The goal is to always keep the correct version on top, and to keep previous versions of raw_reference only so that curators have fast access to them when fixing things.
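
A minimal sketch of that goal, assuming one dict per reference with the current fields at the top level and a `raw_references` list underneath, as mentioned earlier in the thread. The `update_reference` helper, the choice to store raw strings, and the sample values are hypothetical.

```python
def update_reference(reference, corrected_fields, raw_text):
    """Return a new reference with the corrected fields on top.

    The raw text that led to the correction is appended to
    `raw_references`, so earlier raw forms stay available to curators.
    The input reference is left unchanged.
    """
    updated = dict(reference)
    updated.update(corrected_fields)
    updated["raw_references"] = list(reference.get("raw_references", [])) + [raw_text]
    return updated


# Made-up values for illustration.
v0 = {"title": "old title", "raw_references": ["old title, J.Example A1 (2015) 10"]}
v1 = update_reference(v0, {"title": "new title"}, "new title, J.Example A1 (2015) 10")
references = [v1]  # still a flat list of dicts, one per reference
```

Display code only needs the top-level fields, while the raw history travels with the reference.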

jacquerie · 2017-04-23T16:26:30Z

Everything that had to be decided about this has already been decided in inspirehep/inspire-schemas#130.

kaplun added the roadmap label Oct 20, 2015

kaplun added this to the Citation machinery on Labs milestone Oct 20, 2015

kaplun mentioned this issue May 3, 2016: Reference handling revolution #1099 (Open, 10 tasks)

kaplun assigned bittirousku May 3, 2016

This was referenced Jun 28, 2016: Merger Config inveniosoftware-contrib/json-merger#18 (Closed); simplify references #1273 (Closed)

mihaibivol mentioned this issue Jul 6, 2016: jsonschema: pumped up references #1279 (Merged, 4 tasks)

mihaibivol assigned mihaibivol and unassigned bittirousku Jul 6, 2016

jacquerie closed this as completed in #1279 Jul 19, 2016

jacquerie reopened this Jul 19, 2016

kaplun removed the Type: RFC label Apr 20, 2017

kaplun unassigned mihaibivol Apr 20, 2017

kaplun added Type: RFC and removed roadmap labels Apr 20, 2017

jacquerie closed this as completed Apr 23, 2017

ghost removed the Status: RFC label Apr 23, 2017
