Parse BIPM Metrologia data from rawdata-bipm-metrologia #28
Comments
We need to act on this issue ASAP at BIPM's request. The corresponding data sync work has been done by @CAMOBAP at: |
@ronaldtse there are two date types in the source:
<pub-date pub-type="ppub">
<day>01</day>
<month>1</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="epub">
<day>18</day>
<month>2</month>
<year>2022</year>
</pub-date>
Nick's suggestion is to treat the "epub" type as a relation:
<relation type="hasManifestation">
<bibitem>
<title>(same)</title>
<date>2022-02-18</date>
<medium><carrier>online resource</carrier></medium>
</bibitem>
</relation> |
This is a good point. We should take the earliest of the two dates. I think that even the original "ppub" date (which stands for "print publication", according to JATS) should also be encoded as a new manifestation:
<relation type="hasManifestation">
<bibitem>
<title>(same)</title>
<date>2022-01-01</date>
<medium><carrier>print</carrier></medium>
</bibitem>
</relation>
<relation type="hasManifestation">
<bibitem>
<title>(same)</title>
<date>2022-02-18</date>
<medium><carrier>traditional</carrier></medium>
</bibitem>
</relation> |
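A minimal Python sketch of the mapping discussed above: it reads both `<pub-date>` elements, takes the earliest date, and builds one `hasManifestation` relation per date. The `<article-meta>` wrapper, the `CARRIERS` mapping (taken from the "print" and "online resource" examples in this thread), the function names, and the fallback to 1 for a missing day or month are all assumptions for illustration, not the actual relaton-bipm implementation.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Sample input; the <article-meta> wrapper is assumed for the sketch.
SAMPLE = """
<article-meta>
  <pub-date pub-type="ppub"><day>01</day><month>1</month><year>2022</year></pub-date>
  <pub-date pub-type="epub"><day>18</day><month>2</month><year>2022</year></pub-date>
</article-meta>
"""

# Assumed mapping from JATS pub-type to carrier, based on the thread's examples.
CARRIERS = {"ppub": "print", "epub": "online resource"}


def parse_pub_dates(meta):
    """Return {pub-type: date} for every <pub-date> child of meta."""
    dates = {}
    for pub_date in meta.findall("pub-date"):
        year = int(pub_date.findtext("year"))
        # Assumption: a missing month or day falls back to 1.
        month = int(pub_date.findtext("month", default="1"))
        day = int(pub_date.findtext("day", default="1"))
        dates[pub_date.get("pub-type")] = date(year, month, day)
    return dates


def manifestation(title, pub_type, when):
    """Build a <relation type="hasManifestation"> element for one date."""
    relation = ET.Element("relation", type="hasManifestation")
    bibitem = ET.SubElement(relation, "bibitem")
    ET.SubElement(bibitem, "title").text = title
    ET.SubElement(bibitem, "date").text = when.isoformat()
    medium = ET.SubElement(bibitem, "medium")
    ET.SubElement(medium, "carrier").text = CARRIERS.get(pub_type, pub_type)
    return relation


meta = ET.fromstring(SAMPLE)
dates = parse_pub_dates(meta)
earliest = min(dates.values())
relations = [manifestation("(same)", t, d) for t, d in dates.items()]

print(earliest.isoformat())
for relation in relations:
    print(ET.tostring(relation, encoding="unicode"))
```

Month and day values are converted through `int`, so the unpadded `<month>1</month>` in the source still yields a zero-padded ISO date.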
@ronaldtse it seems the data source doesn't provide URLs. |
Then we don't need to provide a URL. We do have DOIs, so that is sufficient. |
@ronaldtse yes, we do have DOIs for articles. But we also need to create issue documents with article relations, volume documents with issue relations, and root "Metrologia" documents with volume relations. Can we have these documents without URLs? |
I think so for the moment. Let me ask BIPM/IOPP to provide URLs for these entries. |
I have asked BIPM for URLs. For the moment, let's continue without URLs and file a ticket to keep track. |
BIPM's Janet Miles says we should use the DOI as the URL for articles. For volumes and issues, there are no DOIs. Let's use these URLs instead:
|
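For the article case, a DOI can be turned into a URL through the public `https://doi.org/` resolver. The helper below is a hypothetical sketch (the name `doi_url` and the prefix handling are assumptions); the example DOI is the one from the IOPscience article link quoted in this thread.

```python
def doi_url(doi: str) -> str:
    """Build a resolver URL for a DOI, tolerating an already-prefixed value."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi


print(doi_url("10.1088/0026-1394/37/5/68"))
```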
@ronaldtse the source file rawdata-bipm-metrologia/2022-04-05T10_55_52_content/0026-1394/0026-1394_37/0026-1394_37_5/0026-1394_37_5_68/me0568.xml is missing its page (article) number. Its title is "Index of Contributors", so it should have page 68: https://iopscience.iop.org/article/10.1088/0026-1394/37/5/68. Is it BIPM's mistake? Update: the same applies to: |
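Since the article number also appears in the directory name, one possible fallback is to read it from the path. This sketch assumes directory and file names follow the pattern `<ISSN>_<volume>_<issue>_<article>`, inferred only from the path above and from the file name `0026-1394_59_1A_08005.xml`; the regular expression and the `locate` helper are hypothetical, and the layout may differ elsewhere in the dataset.

```python
import re
from pathlib import PurePosixPath

# Assumed naming pattern: <ISSN>_<volume>_<issue>_<article>, optionally ending in ".xml".
ARTICLE_NAME = re.compile(
    r"^(?P<issn>\d{4}-\d{3}[\dX])_(?P<volume>[^_]+)_(?P<issue>[^_]+)"
    r"_(?P<article>[^_.]+)(?:\.xml)?$"
)


def locate(path):
    """Return ISSN, volume, issue and article taken from the first matching path part."""
    for part in PurePosixPath(path).parts:
        match = ARTICLE_NAME.match(part)
        if match:
            return match.groupdict()
    return None


path = (
    "rawdata-bipm-metrologia/2022-04-05T10_55_52_content/0026-1394/"
    "0026-1394_37/0026-1394_37_5/0026-1394_37_5_68/me0568.xml"
)
print(locate(path))
print(locate("0026-1394_59_1A_08005.xml"))
```

All values are kept as strings, because issue identifiers such as `1A` are not numeric and article numbers such as `08005` carry leading zeros.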
@andrew2net have you re-pulled from this repo? The data path is different now. I can see in the first file:
<article-id pub-id-type="manuscript">
68
</article-id>
<title-group>
<article-title xml:lang="en">
Index of Contributors
</article-title>
</title-group>
The number
In the second file:
<article-id pub-id-type="manuscript">
001
</article-id>
<title-group>
<article-title xml:lang="en">
Editorial
</article-title>
</title-group>
The |
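A short Python sketch of reading the values shown in these two excerpts. The `<article-meta>` wrapper and the helper names are assumptions; whether the `manuscript` identifier should be used as the article number is the question under discussion here, not something this sketch settles.

```python
import xml.etree.ElementTree as ET

# Sample modelled on the first file quoted above; the wrapper element is assumed.
SAMPLE = """
<article-meta>
  <article-id pub-id-type="manuscript">
    68
  </article-id>
  <title-group>
    <article-title xml:lang="en">
      Index of Contributors
    </article-title>
  </title-group>
</article-meta>
"""


def manuscript_id(meta):
    """Return the stripped text of <article-id pub-id-type="manuscript">, or None."""
    for node in meta.findall("article-id"):
        if node.get("pub-id-type") == "manuscript":
            return (node.text or "").strip() or None
    return None


def article_title(meta):
    """Return the article title with surrounding and internal whitespace collapsed."""
    text = meta.findtext("title-group/article-title", default="")
    return " ".join(text.split())


meta = ET.fromstring(SAMPLE)
print(manuscript_id(meta), "|", article_title(meta))
```

The identifier is returned as a string so that a value such as `001` from the second file keeps its leading zeros.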
@ronaldtse indeed. You are right about these documents, but most documents have an
has |
It seems so. What a strange encoding. |
Can you document this strange behavior in the README? Thanks. |
@ronaldtse if we use |
@ronaldtse here are duplicates in the source dataset:
|
@andrew2net sorry to get back late here. For these source duplications:
Thanks! |
In cases 1, 5, and 6, the docs differ in their contributors: one doc has extra contributors. In case 2 the docs look identical, but one of them has
...
</front>
<back>
<ref-list content-type="numerical">
<title>References</title>
<ref id="metac7687bib1">
<label>1</label>
<element-citation publication-type="journal" xlink:type="simple">
<person-group person-group-type="author">
<name name-style="western">
<surname>Petit</surname>
<given-names>G</given-names>
</name>
<name name-style="western">
<surname>Jiang</surname>
<given-names>Z</given-names>
</name>
</person-group>
<year>2008</year>
<source>Int. J. Navig. Obs.</source>
<volume>2008</volume>
<fpage>1</fpage>
<lpage>8</lpage>
<page-range>1–8</page-range>
<pub-id pub-id-type="doi">10.1155/2008/562878</pub-id>
</element-citation>
</ref>
<ref id="metac7687bib2">
<label>2</label>
...
It looks like relations. Shouldn't we parse the relations? In cases 3 and 4 the docs look identical.
I think we should merge them.
In these cases the dates are identical. |
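If the reference list were parsed, a first step could look like the Python sketch below, which collects the id, DOI, authors, and year of each `<element-citation>`. The sample is a shortened copy of the excerpt above, wrapped in a `<back>` element that declares the `xlink` namespace so it parses on its own; the function name and the output shape are assumptions, and how such entries would map onto Relaton relations is left open.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the quoted reference list; the xlink namespace declaration is added
# so that the fragment is well-formed on its own.
SAMPLE = """
<back xmlns:xlink="http://www.w3.org/1999/xlink">
  <ref-list content-type="numerical">
    <title>References</title>
    <ref id="metac7687bib1">
      <label>1</label>
      <element-citation publication-type="journal" xlink:type="simple">
        <person-group person-group-type="author">
          <name name-style="western"><surname>Petit</surname><given-names>G</given-names></name>
          <name name-style="western"><surname>Jiang</surname><given-names>Z</given-names></name>
        </person-group>
        <year>2008</year>
        <source>Int. J. Navig. Obs.</source>
        <pub-id pub-id-type="doi">10.1155/2008/562878</pub-id>
      </element-citation>
    </ref>
  </ref-list>
</back>
"""


def cited_works(back):
    """Collect one dict per <ref> that contains an <element-citation>."""
    works = []
    for ref in back.iterfind("ref-list/ref"):
        citation = ref.find("element-citation")
        if citation is None:
            continue
        doi = None
        for pub_id in citation.findall("pub-id"):
            if pub_id.get("pub-id-type") == "doi":
                doi = (pub_id.text or "").strip()
        authors = [
            f"{n.findtext('surname', default='')} {n.findtext('given-names', default='')}".strip()
            for n in citation.iterfind("person-group/name")
        ]
        works.append({
            "ref_id": ref.get("id"),
            "doi": doi,
            "authors": authors,
            "year": citation.findtext("year"),
        })
    return works


works = cited_works(ET.fromstring(SAMPLE))
print(works)
```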
As described in relaton/relaton-data-bipm#17.
This task supersedes #2, which implemented support for retrieving Metrologia bibliographic data from IOP but was unsatisfactory due to remote performance issues.
BIPM has now provided the full bibliographic data set of Metrologia, and we have an agreement in place with IOP Publishing, the publisher. The dataset is now at https://github.com/relaton/rawdata-bipm-metrologia (private access).
The work here is to parse that dataset into the relaton-data-bipm Relaton repository.
(The following information is also provided in README.adoc of the repository but included here for clarity)
The full set of bibliographic data comes in a zipped format in the following structure:
Subsequent updates will also be provided in the archived format.
The update archives have the same structure:
We need to parse this archive into a Relaton dataset.
Notice in the folder/file structure:
Contents of metv1i1p1.xml:
Contents of 0026-1394_59_1A_08005.xml: