jsonschema: pumped up references #1279

Merged
merged 9 commits into from
Jul 19, 2016

mihaibivol
Contributor

@mihaibivol mihaibivol commented Jun 30, 2016

Provide a hep-like reference field inside hep references

Closes #430

  • Add reference new schema (similar to hep)
  • Fix mapping (it was already fixed)
  • Amend dojson to work with this new schema
  • Amend hepcrawl to work with this new schema (moved to next bullet)
  • Amend holdingpen to save this new schema (refs are not used in holdingpen)
  • Amend HEP Detail View to work with this new schema
  • Create follow-up issue for merging good ideas into the hep schema

Follow-up + extra findings

@mihaibivol mihaibivol self-assigned this Jun 30, 2016
@mihaibivol mihaibivol added the WIP label Jun 30, 2016
@mihaibivol mihaibivol mentioned this pull request Jun 30, 2016
17 tasks
"type": "array",
"title": "Document type"
},
"urls": {
Contributor Author

999C5u

@mihaibivol
Contributor Author

mihaibivol commented Jul 1, 2016

We are missing

  • 999C5m (misc)
  • 999C5[12] - won't need them
  • 999C5e (might merge them into authors and put a role in there); @kaplun, maybe we want this to be compatible with the first draft for unifying people
  • hdl

We have extra fields with no MARC counterpart:

  • imprint -> may come from publishers
  • book_series -> may come from publishers
  • arxiv_eprints - @kaplun where to populate these from (at least in the MARC case)
  • persistent_identifiers (we can keep them though, even if they don't actually have a source in the schema) @kaplun do we add source?

@mihaibivol
Contributor Author

We are missing

  • 999C5[12] - won't need them

We added back

  • 999C5m - misc will be here for a while
  • 999C5e - merged into authors + role 'ed.'
  • hdl ids will go into persistent identifiers

We have extra fields with no MARC counterpart:

  • imprint -> may come from publishers
  • book_series -> may come from publishers
  • arxiv_eprints - @kaplun where to populate these from (at least in the MARC case)
  • persistent_identifiers (we can keep them though, even if they don't actually have a source in the schema) @kaplun do we add source?
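
As a rough illustration of the "merged into authors + role 'ed.'" decision above, here is a minimal sketch of how the 999C5 `h` and `e` subfields could be folded into one `authors` list. The helper name `build_authors` and the sample names are hypothetical and are not taken from the PR; only the `full_name` and `role` keys and the `'ed.'` value come from the code under review.

```python
def build_authors(subfields):
    """Merge 999C5 $h (authors) and $e (editors) into one list of dicts.

    Hypothetical helper: editors are kept as authors carrying role 'ed.'.
    """
    def as_list(value):
        # MARC subfields may hold a single string or several values.
        if value is None:
            return []
        return list(value) if isinstance(value, (list, tuple)) else [value]

    authors = [{'full_name': name} for name in as_list(subfields.get('h'))]
    authors += [
        {'full_name': name, 'role': 'ed.'}
        for name in as_list(subfields.get('e'))
    ]
    return authors


example = build_authors({'h': ['Doe, J.', 'Roe, R.'], 'e': 'Smith, A.'})
```

Keeping editors in the same list means the reverse mapping only has to filter on `role`, which is what the export rule later in this PR does.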

@mihaibivol
Contributor Author

cc @annetteholtkamp
Here's the proposed file for porting 999C5: https://github.com/mihaibivol/inspire-next/blob/06d034b60a910a3545fe2a076870c48fdba73fe2/inspirehep/modules/records/jsonschemas/records/elements/reference.json

I also commented in this issue on how the fields are ported from the old data model to the current one.



RE_VALID_PUBNOTE = re.compile(".*,.*,.*(,.*)?")
RE_VALID_PUBNOTE = re.compile(r'.*,.*,.*(,.*)?')
RE_VALID_ARXIV_REP_NO = re.compile(r'(arxiv:)?\d{4}.\d{4,5}|\w+-\w+/\d+|\w+/\d+r')
Contributor

Wow, I am surprised. Where did you take these from?

Contributor Author

They were in the hep schema. For the report number, I just saw that there are a lot of report numbers with an arXiv prefix.
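
To make the behaviour of the two patterns concrete, here is a small sketch that applies them with `re.match`. The patterns are copied verbatim from the diff; the wrapper function names and the sample strings are assumptions made up for illustration, and how the PR actually applies these regexes is not shown in this conversation.

```python
import re

# Patterns copied verbatim from the diff under review.
RE_VALID_PUBNOTE = re.compile(r'.*,.*,.*(,.*)?')
RE_VALID_ARXIV_REP_NO = re.compile(
    r'(arxiv:)?\d{4}.\d{4,5}|\w+-\w+/\d+|\w+/\d+r')


def looks_like_pubnote(value):
    # Hypothetical wrapper: needs at least two commas in the string.
    return RE_VALID_PUBNOTE.match(value) is not None


def looks_like_arxiv_report_number(value):
    # Hypothetical wrapper: re.match anchors at the start only.
    return RE_VALID_ARXIV_REP_NO.match(value) is not None


pubnote_ok = looks_like_pubnote('Phys.Rev.,D12,345')
pubnote_bad = looks_like_pubnote('Phys.Rev. D12 345')
new_style = looks_like_arxiv_report_number('arxiv:1607.01234')
old_style = looks_like_arxiv_report_number('hep-th/9901001')
any_separator = looks_like_arxiv_report_number('1607x01234')
```

One thing the sketch makes visible: the `.` between the two digit groups is unescaped as quoted, so it accepts any character (`1607x01234` matches). Whether that is intended is not stated in the thread.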

texkey = force_single_element(value.get('1'))

# Publication info specific.
cnum = force_single_element(value.get('b'))
Contributor

@kaplun kaplun Jul 6, 2016

CNUMs have a regexp to check them: C\d\d-\d\d-\d\d(\.\d+)? (I think)
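
A minimal sketch of the suggested check, assuming the pattern quoted in the comment (which the reviewer himself marks as uncertain) and using `fullmatch` so that trailing characters are rejected. The names `RE_VALID_CNUM` and `is_valid_cnum` and the sample values are hypothetical.

```python
import re

# Pattern as quoted in the review comment ("I think"), so treat it as
# an assumption rather than the authoritative CNUM format.
RE_VALID_CNUM = re.compile(r'C\d\d-\d\d-\d\d(\.\d+)?')


def is_valid_cnum(value):
    """Return True if the whole string has the shape CYY-MM-DD[.N]."""
    return RE_VALID_CNUM.fullmatch(value) is not None


plain = is_valid_cnum('C16-07-19')
with_suffix = is_valid_cnum('C16-07-19.2')
four_digit_year = is_valid_cnum('C2016-07-19')
```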

@kaplun
Contributor

kaplun commented Jul 12, 2016

BTW, as part of your branch, drop the WIP from my commit :)

mihaibivol and others added 3 commits July 18, 2016 16:04
Signed-off-by: Mihai Bivol <mm.bivol@gmail.com>
Signed-off-by: Samuele Kaplun <samuele.kaplun@cern.ch>
* Adds rules for building the new reference schema.

Signed-off-by: Mihai Bivol <mihai.bivol@cern.ch>
@mihaibivol mihaibivol force-pushed the references-schema branch 2 times, most recently from 35b396a to 9cccf7f Compare July 18, 2016 14:07
@mihaibivol mihaibivol removed the WIP label Jul 18, 2016
@mihaibivol mihaibivol changed the title WIP jsonschema: pumped up references jsonschema: pumped up references Jul 18, 2016
{% endif %}
{% if reference['isbn'] %}
<span class="reference-detail">{{ reference['isbn'] }}</span>
{% if reference['misc'] %}
Contributor

Better to use dotted notation consistently?

@jmartinm
Contributor

Just went through it and did not see anything to change at first sight 👍

},
"serialization": {
"type": "string",
"description": "E.g. refextract, text, JATS, Elsevier, BibTeX..."
Contributor

I don't understand what a serialization is ._.

Contributor

Maybe we should call it source as in other parts of the schema?

Contributor Author

cc @kaplun

Contributor Author

Elsevier references come directly in JSON format, while others come as text. I think the examples are not particularly good. We could actually have both source and format (text, json, bibtex, xml).
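
One way to picture the "both source and format" idea is the schema fragment below, written as a Python dict so it can be dumped to JSON. This is only a sketch: the property names, descriptions, and the use of an `enum` are assumptions, and the schema that was actually merged may differ.

```python
import json

# Hypothetical replacement for the single "serialization" property:
# "source" says where the reference came from, "format" how it is encoded.
REFERENCE_PROVENANCE_PROPERTIES = {
    "source": {
        "type": "string",
        "description": "Origin of the reference, e.g. refextract, Elsevier",
    },
    "format": {
        "type": "string",
        "enum": ["text", "json", "bibtex", "xml"],
        "description": "Encoding of the raw reference",
    },
}

schema_fragment = json.dumps(
    REFERENCE_PROVENANCE_PROPERTIES, indent=4, sort_keys=True)
```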

Signed-off-by: Mihai Bivol <mihai.bivol@cern.ch>
Signed-off-by: Mihai Bivol <mihai.bivol@cern.ch>
Signed-off-by: Mihai Bivol <mihai.bivol@cern.ch>
* Uses reference macros for Reference datatables.
* Fixes small error in `title` field handling for references.
* Adds prepend_text optional param for publication_info macro.

Signed-off-by: Mihai Bivol <mihai.bivol@cern.ch>
Signed-off-by: Mihai Bivol <mihai.bivol@cern.ch>
* Stands as an example for changing a schema using builders.

Signed-off-by: Mihai Bivol <mihai.bivol@cern.ch>
'e': [a.get('full_name') for a in value.get('authors', [])
if a.get('role') == 'ed.'],
'h': [a.get('full_name') for a in value.get('authors', [])
if a.get('role') != 'ed.'],
Contributor

I'm almost tempted to add a condition lambda to get_value : )
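
For illustration, here is what a helper with a condition callable might look like. It is a standalone, hypothetical function (`get_values_where`), not the project's `get_value`, whose real signature is not shown in this conversation; the sample record is made up.

```python
def get_values_where(items, key, condition=lambda item: True):
    """Return item[key] for every dict in items that satisfies condition."""
    return [item.get(key) for item in items if condition(item)]


# Sample record, invented for the example.
value = {
    'authors': [
        {'full_name': 'Doe, J.'},
        {'full_name': 'Smith, A.', 'role': 'ed.'},
    ],
}
authors = value.get('authors', [])

subfields = {
    # Editors go to $e, everyone else to $h, as in the diff above.
    'e': get_values_where(authors, 'full_name',
                          lambda a: a.get('role') == 'ed.'),
    'h': get_values_where(authors, 'full_name',
                          lambda a: a.get('role') != 'ed.'),
}
```

This replaces the two list comprehensions with one call each, at the cost of a small extra helper.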

@jacquerie
Contributor

LGTM 🚢

@jacquerie jacquerie merged commit 104d520 into inspirehep:master Jul 19, 2016
@mihaibivol mihaibivol deleted the references-schema branch July 19, 2016 09:08
Successfully merging this pull request may close these issues.

Data model: full support for references
4 participants