Add the official NISO STS XML sample conversion #407

Closed
Intelligent2013 opened this issue Mar 4, 2024 · 7 comments
Labels: enhancement (New feature or request)

Intelligent2013 (Contributor) commented Mar 4, 2024

Add the official NISO STS XML sample https://www.niso-sts.org/downloadables/samples/NISO-STS-Standard-1-0.XML (from the page https://www.niso-sts.org/Samples.html) to the JUnit tests and the Makefile.

Intelligent2013 (Contributor, Author) commented Mar 4, 2024

Metanorma Adoc: NISO-STS-Standard-1-0.adoc.zip

The issues:

  • include::sections/08-ansi/niso.adoc[] - the / in the generated file name should be converted to _.
  • :date: release Approved: October 6, 2017 - this format isn't supported by Metanorma
    Source XML:
<release-date date-type="approved" iso-8601-date="2017-10-06">Approved: October 6, 2017</release-date>

I.e. release-date contains presentation text instead of the plain date 2017-10-06 (https://www.niso-sts.org/TagLibrary/niso-sts-TL-1-2-html/element/release-date.html). I don't want to parse this text in XSLT; the date needs to be adapted manually in the adoc after conversion.
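Note that the source element also carries the normalized value in its iso-8601-date attribute, so the date could be recovered without parsing the text at all. A minimal Python sketch of that alternative, for illustration only (it is not mnconvert's XSLT; the function name release_date_attr is hypothetical, and the "release" date type simply mirrors the generated adoc line above):

```python
import xml.etree.ElementTree as ET


def release_date_attr(xml_fragment: str) -> str:
    """Build a ':date:' line from an STS <release-date>, preferring iso-8601-date."""
    elem = ET.fromstring(xml_fragment)
    # The attribute holds the normalized date; the element text is presentation data.
    value = elem.get("iso-8601-date") or (elem.text or "").strip()
    return f":date: release {value}"


SRC = ('<release-date date-type="approved" iso-8601-date="2017-10-06">'
       'Approved: October 6, 2017</release-date>')
print(release_date_attr(SRC))
```

When the attribute is missing, the sketch falls back to the element text, which would still need the manual fix.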

  • STS: Standards Tag Suite from the XML
<title-wrap>
<main-title-wrap>
<main>STS: Standards Tag Suite</main>
</main-title-wrap>
</title-wrap>
  • the sequence of sections in the resulting adoc:
include::sections/04-application.adoc[]

include::sections/02-normrefs.adoc[]

include::sections/06-references.adoc[]

needs to be adapted manually after conversion, according to Metanorma requirements.

  • from 07-definitions.adoc:
[[sec_7]]
== Definitions

ANSIAmerican National Standards Institute. ANSI is a private nonprofit organization that oversees the development of voluntary consensus standards in the U.S.

ASCIIASCII is a character-encoding scheme based on the English alphabet. It defines the encoding for 128 characters including A-Z, a-z, 0-9, some symbols, and some control characters.

from the XML:

		<sec id="sec_7" sec-type="definitions">
			<label>7</label>
			<title>Definitions</title>
			<term-sec>
				<term-display>
					<term>ANSI</term>
					<def>
						<p>American National Standards Institute. ANSI is a private nonprofit organization that oversees the development of voluntary consensus standards in the U.S.</p>
					</def>
				</term-display>
			</term-sec>
			<term-sec>
				<term-display>
					<term>ASCII</term>
					<def>
						<p>ASCII is a character-encoding scheme based on the English alphabet. It defines the encoding for 128 characters including A-Z, a-z, 0-9, some symbols, and some control characters.</p>
						<p>It is sometimes used to refer to &#x201C;plain text,&#x201D; i.e., text without special characters or equations.</p>
					</def>
				</term-display>
			</term-sec>

mnconvert doesn't know about this practice of using term-sec/term-display/term|def. I'll add the conversion rules.
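As a rough illustration of what such a rule has to do, here is a Python sketch (not the actual mnconvert XSLT) that walks term-sec/term-display and emits one heading per term followed by each def/p paragraph, so the term and its definition no longer run together. Rendering each term as a level-3 heading is an assumption about the desired Metanorma output, the function name terms_to_adoc is hypothetical, and the sample definitions are shortened:

```python
import xml.etree.ElementTree as ET


def terms_to_adoc(sec_xml: str) -> str:
    """Convert STS term-sec/term-display entries into AsciiDoc term clauses.

    Sketch only: each <term> becomes a level-3 heading (assumed target shape)
    and each <def>/<p> becomes its own paragraph.
    """
    sec = ET.fromstring(sec_xml)
    lines = []
    for display in sec.iter("term-display"):
        term = "".join(display.find("term").itertext()).strip()
        lines += [f"=== {term}", ""]
        definition = display.find("def")
        if definition is not None:
            for para in definition.findall("p"):
                # itertext() keeps text from inline children such as <italic>
                lines += ["".join(para.itertext()).strip(), ""]
    return "\n".join(lines).rstrip() + "\n"


SAMPLE = """<sec id="sec_7" sec-type="definitions">
  <term-sec><term-display><term>ANSI</term>
    <def><p>American National Standards Institute.</p></def>
  </term-display></term-sec>
  <term-sec><term-display><term>ASCII</term>
    <def><p>A character-encoding scheme.</p><p>Sometimes means plain text.</p></def>
  </term-display></term-sec>
</sec>"""

print(terms_to_adoc(SAMPLE))
```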

Metanorma XML: NISO-STS-Standard-1-0.mn.zip

The issues:

  • missing bibdata:
<?xml version="1.0" encoding="UTF-8"?><iso-standard xmlns="https://www.metanorma.org/ns/iso" type="presentation">
<preface>
<foreword type="foreword" displayorder="1">

Notes:

  • currently, the mnconvert tool is intended to convert ISO, IEC and BSI documents; the conversion rules were developed from the available input XMLs and some documentation (ISO/IEC)
  • NISO STS XML allows the same data to be represented in many different XML structures, so the conversion rules need to be changed or added for each concrete source data organization.
  • reverse conversion from Metanorma XML to NISO STS XML is also a big task, and depends on the concrete organization's practice.

Intelligent2013 (Contributor, Author) commented Mar 29, 2024

  • the source NISO STS XML contains new elements (unknown to mnconvert):
<front>
	<std-doc-meta>
		...
		<std-ident>
			...
			<std-id-group std-relationship-type="std-as-published">
				<std-id std-id-type="undated">ANSI/NISO Z39.102</std-id>
				<std-id std-id-type="dated">ANSI/NISO Z39.102-2017</std-id>
			</std-id-group>
			<isbn publication-format="HTML">978-1-937522-77-3</isbn>
			<isbn publication-format="PDF">978-1-937522-78-0</isbn>
			<issn specific-use="National Information standards series">1041-5653</issn>
		</std-ident>
		<std-org-group>
			<std-org std-org-role="developer">
				<std-org-name>National Information Standards Organization</std-org-name>
				<std-org-abbrev>NISO</std-org-abbrev>
			</std-org>
			<std-org>
				<std-org-loc>
					<addr-line>NISO</addr-line>
					<addr-line>3600 Clipper Mill Road</addr-line>
					<addr-line>Suite 302</addr-line>
					<addr-line>Baltimore, MD 21211-1948</addr-line>
					<ext-link ext-link-type="uri" xlink:href="https://www.niso.org">www.niso.org</ext-link>
				</std-org-loc>
			</std-org>
		</std-org-group>
		<content-language>en</content-language>
		<std-ref>ANSI/NISO Z39.102-2017</std-ref>
		...
		<accrediting-organization accredit-acronym="ANSI">American National Standards Institute</accrediting-organization>
		<authorization authorize-acronym="ANS">An American National Standard</authorization>
		<permissions>
			<copyright-statement>Copyright &#x00A9; 2017 by the National Information Standards Organization</copyright-statement>
			<license>
				<license-p>All rights reserved under International and Pan-American Copyright Conventions. For noncommercial purposes only, this publication may be reproduced or transmitted in any form or by any means without prior permission in writing from the publisher, provided it is reproduced accurately, the source of the material is identified, and the NISO copyright status is acknowledged. All inquiries regarding translations into other languages or commercial reproduction or distribution should be addressed to: NISO, 3600 Clipper Mill Road, Suite 302, Baltimore, MD 21211-1948.</license-p>
			</license>
		</permissions>
		<abstract>
			<title>Abstract:</title>
			<p>The Standards Tag Suite (STS) provides a common XML format that developers, publishers, and distributors of standards, including national standards bodies, regional and international standards bodies, and standards development organizations can use to publish and exchange full-text content and metadata of standards. STS is based on ANSI/NISO Z39.96 (JATS). Structures are provided to encode both the normative and non-normative content of: standards, adoptions of standards, and standards-like documents that are produced by standards organizations.</p>
		</abstract>
		<meta-note content-type="cover-address">
			<p>Published by the National Information Standards Organization</p>
			<p>Baltimore, Maryland, U.S.A.</p>
		</meta-note>
		<meta-note content-type="title-page">
			<title>About NISO Standards</title>
			<p>NISO standards are developed by Working Groups of the National Information Standards Organization under the oversight of a Topic Committee. The development process is a strenuous one that includes a rigorous peer review of proposed standards open to each NISO Voting Member and any other interested party. Final approval of the standard involves verification by the American National Standards Institute that its requirements for due process, consensus, and other approval criteria have been met by NISO. Once verified and approved, NISO Standards also become American National Standards.</p>
			<p>These standards may be revised or withdrawn at any time. For current information on the status of this standard contact the NISO office or visit the NISO website at: <ext-link ext-link-type="uri" xlink:href="https://www.niso.org">https://www.niso.org</ext-link>
			</p>
		</meta-note>
	</std-doc-meta>
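A hedged sketch of how the std-ident and std-org-group values could be collected before mapping them to Metanorma document attributes. It uses Python's xml.etree.ElementTree on a trimmed copy of the fragment above; the function name and the dictionary keys are hypothetical placeholders, and the choice of matching Metanorma attribute names is left open:

```python
import xml.etree.ElementTree as ET


def std_ident_metadata(std_doc_meta_xml: str) -> dict:
    """Collect identifier and publisher values from an STS <std-doc-meta>.

    Sketch only: the returned keys are hypothetical placeholders.
    """
    meta = ET.fromstring(std_doc_meta_xml)
    result = {}
    for std_id in meta.iterfind("std-ident/std-id-group/std-id"):
        # std-id-type distinguishes "undated" from "dated" identifiers
        result[f"std-id-{std_id.get('std-id-type')}"] = std_id.text.strip()
    for isbn in meta.iterfind("std-ident/isbn"):
        fmt = isbn.get("publication-format", "").lower()
        result[f"isbn-{fmt}"] = isbn.text.strip()
    issn = meta.find("std-ident/issn")
    if issn is not None:
        result["issn"] = issn.text.strip()
    developer = meta.find("std-org-group/std-org[@std-org-role='developer']")
    if developer is not None:
        result["publisher"] = developer.findtext("std-org-abbrev", "").strip()
    return result


SAMPLE = """<std-doc-meta>
  <std-ident>
    <std-id-group std-relationship-type="std-as-published">
      <std-id std-id-type="undated">ANSI/NISO Z39.102</std-id>
      <std-id std-id-type="dated">ANSI/NISO Z39.102-2017</std-id>
    </std-id-group>
    <isbn publication-format="HTML">978-1-937522-77-3</isbn>
    <isbn publication-format="PDF">978-1-937522-78-0</isbn>
    <issn specific-use="National Information standards series">1041-5653</issn>
  </std-ident>
  <std-org-group>
    <std-org std-org-role="developer">
      <std-org-name>National Information Standards Organization</std-org-name>
      <std-org-abbrev>NISO</std-org-abbrev>
    </std-org>
  </std-org-group>
</std-doc-meta>"""

for key, value in std_ident_metadata(SAMPLE).items():
    print(key, "=", value)
```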

Intelligent2013 (Contributor, Author) commented Mar 29, 2024

@ronaldtse what is the main goal of converting the 'NISO STS XML tagged version of the standard' into Metanorma Adoc and XML? Do we need a round-trip conversion?

ronaldtse (Contributor):

@Intelligent2013 we want to be able to handle the elements produced in the sample in Metanorma. We don't need a round trip.

Intelligent2013 (Contributor, Author):

The attribute :docnumber: Z39.102 causes the error:

bundle exec metanorma -t iso -x presentation test.adoc
Fatal Error: Failed to match sequence (stage:'Fpr'? 'WD/'? (type:GUIDE_PREFIX SPACE)? (stage:STAGE SPACE)? (stage:TYPED_STAGE SPACE)? (ORIGINATOR (SPACE / '/'))? (TC_DOCUMENT_BODY / STD_DOCUMENT_BODY / DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?)) at line 1 char 1.
cause: Failed to match sequence (stage:'Fpr'? 'WD/'? (type:GUIDE_PREFIX SPACE)? (stage:STAGE SPACE)? (stage:TYPED_STAGE SPACE)? (ORIGINATOR (SPACE / '/'))? (TC_DOCUMENT_BODY / STD_DOCUMENT_BODY / DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?)) at line 1 char 1.
`- Expected one of [TC_DOCUMENT_BODY, STD_DOCUMENT_BODY, DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?] at line 1 char 1.
   |- Failed to match sequence ((tctype:TCTYPE '/'?){0, } SPACE tcnumber:DIGITS ('/' ((sctype:SCTYPE SPACE scnumber:DIGITS '/')? wgtype:WGTYPE SPACE wgnumber:DIGITS / sctype:SCTYPE (SPACE / '/' wgtype:WGTYPE SPACE) scnumber:DIGITS))? SPACE 'N' SPACE? number:DIGITS) at line 1 char 1.
   |  `- Expected " ", but got "Z" at line 1 char 1.
   |- Failed to match sequence ((TYPE / stage:STAGE iteration:DIGITS?)? SPACE? ((stage:STAGE / stage:TYPED_STAGE / TYPE) SPACE)? number:DIGITS ('|' joint_document:(publisher:'IDF' SPACE number:DIGITS))? PART? ITERATION? (SPACE? (':' / DASH) YEAR)? SUPPLEMENT? EXTRACT? ADDENDUM? EDITION? LANGUAGE?) at line 1 char 1.
   |  `- Expected at least 1 of \\d at line 1 char 1.
   |     `- Failed to match \\d at line 1 char 1.
   `- Failed to match sequence (DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?) at line 1 char 1.
      `- Extra input after last repetition at line 1 char 1.
         `- Failed to match sequence (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY)) at line 1 char 1.
            `- Expected " + ", but got "Z39" at line 1 char 1.
C:/tools/ruby31/lib/ruby/gems/3.1.0/gems/pubid-core-1.12.5/lib/pubid/core/identifier/base.rb:166:in `rescue in parse': Failed to match sequence (stage:'Fpr'? 'WD/'? (type:GUIDE_PREFIX SPACE)? (stage:STAGE SPACE)? (stage:TYPED_STAGE SPACE)? (ORIGINATOR (SPACE / '/'))? (TC_DOCUMENT_BODY / STD_DOCUMENT_BODY / DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?)) at line 1 char 1. (Pubid::Core::Errors::ParseError)
        from C:/tools/ruby31/lib/ruby/gems/3.1.0/gems/pubid-core-1.12.5/lib/pubid/core/identifier/base.rb:161:in `parse'
        from C:/tools/ruby31/lib/ruby/gems/3.1.0/gems/metanorma-iso-2.7.6/lib/metanorma/iso/front_id.rb:60:in `orig_id_parse'
        from C:/tools/ruby31/lib/ruby/gems/3.1.0/gems/metanorma-iso-2.7.6/lib/metanorma/iso/front_id.rb:47:in `iso_id_params'
        from C:/tools/ruby31/lib/ruby/gems/3.1.0/gems/metanorma-iso-2.7.6/lib/metanorma/iso/front_id.rb:37:in `iso_id'
        from C:/tools/ruby31/lib/ruby/gems/3.1.0/gems/metanorma-iso-2.7.6/lib/metanorma/iso/front_id.rb:13:in `metadata_id'
        from C:/tools/ruby31/lib/ruby/gems/3.1.0/gems/metanorma-standoc-2.8.7/lib/metanorma/standoc/front.rb:168:in `metadata'
        from C:/tools/ruby31/lib/ruby/gems/3.1.0/gems/metanorma-standoc-2.8.7/lib/metanorma/standoc/base.rb:117:in `block in front'
...
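In the cause tree above, the STD_DOCUMENT_BODY branch of the pubid-core grammar expects number:DIGITS at the first character, but Z39.102 starts with a letter. The following Python pre-check is a deliberately simplified approximation of that one rule (it is not the pubid-core grammar, and the names are hypothetical); it could flag non-ISO document numbers before running metanorma -t iso:

```python
import re

# Simplified approximation (assumption): an optional uppercase originator such
# as "ISO " or "ISO/IEC ", then the document number, which must start with digits.
ISO_NUMBER = re.compile(r"(?:[A-Z]+(?:/[A-Z]+)*\s)?\d+")


def looks_like_iso_docnumber(docnumber: str) -> bool:
    """Return True if docnumber starts the way the ISO grammar expects."""
    # re.match only matches at the start of the string, like "line 1 char 1"
    return ISO_NUMBER.match(docnumber) is not None


for candidate in ["Z39.102", "ISO 17301", "17301"]:
    print(candidate, looks_like_iso_docnumber(candidate))
```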

Intelligent2013 (Contributor, Author) commented Apr 13, 2024

  • wrong named-content conversion from
<list-item>
	<p>
		<named-content content-type="organization">American Library Association (ALA)</named-content>
	</p>
	<p>
		<named-content content-type="committee-member-name">Jill Emery</named-content>
	</p>
</list-item>

to

* {{American-Library-Association--ALA-,American Library Association (ALA)}}
+
--
{{Jill-Emery,Jill Emery}}
--
  • 00-01-std-doc-meta.adoc contains redundant data mixed in with the document attributes:
:title-main-en: STS: Standards Tag Suite
= Z39.102
National Information Standards Organization:publisher: NISO
:pub-address: NISO + \
3600 Clipper Mill Road + \
Suite 302 + \
Baltimore, MD 21211-1948 + \
www.niso.org
enANSI/NISO Z39.102-20172017-10-06:semantic-metadata-accrediting-organization: American National Standards Institute
:authorizer: An American National Standard, ANS
:semantic-metadata-copyright-statement: Copyright © 2017 by the National Information Standards Organization
:semantic-metadata-license: All rights reserved ...

[.preface,type=cover-address]
== {blank}

Published by the National Information Standards Organization
  • wrong bibitem markup in '06-references.adoc':
[[ref_6]]
[%bibitem]
=== _Access License and Indicators_, 5 January 2015. https://www.niso.org/apps/group_public/download.php/14226/rp-22-2015_ALI.pdf[https://www.niso.org/apps/group_public/download.php/14226/rp-22-2015_ALI.pdf]
docid::
id::: NISO RP-22-2015
type:: standard
  • missing [bibliography] in '06-references.adoc'
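For comparison, Metanorma's usual plain form for bibliography entries is a list item under a [bibliography] section, with the anchor and document identifier in triple brackets. A small Python sketch that emits that shape, offered as one possible target rather than mnconvert's actual output (the function name and the "References" title are hypothetical; the entry text is copied from the output above):

```python
def bibliography_section(title: str, entries: list[tuple[str, str, str]]) -> str:
    """Render a Metanorma [bibliography] section from (anchor, docid, text) tuples."""
    lines = ["[bibliography]", f"== {title}", ""]
    for anchor, docid, text in entries:
        # [[[anchor,docid]]] carries both the cross-reference anchor and the identifier
        lines += [f"* [[[{anchor},{docid}]]], {text}", ""]
    return "\n".join(lines).rstrip() + "\n"


ENTRY = (
    "ref_6",
    "NISO RP-22-2015",
    "_Access License and Indicators_, 5 January 2015. "
    "https://www.niso.org/apps/group_public/download.php/14226/rp-22-2015_ALI.pdf",
)
print(bibliography_section("References", [ENTRY]))
```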

Intelligent2013 (Contributor, Author) commented Apr 13, 2024

  • named-content, for instance:
<named-content content-type="element-name">&lt;contrib&gt;</named-content>

should be converted to semantic spans span:category[text] (https://www.metanorma.org/author/topics/inline_markup/text_formatting/#semantic-spans), i.e.:

span:element-name[&lt;contrib&gt;]
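A minimal Python sketch of this rule, assuming the span category is taken verbatim from the content-type attribute (the function name is hypothetical, and this is not mnconvert's XSLT). Markup characters are re-escaped so that the element-name example produces exactly the span shown above; the same rule would also cover the organization and committee-member-name values from the earlier list-item example:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def named_content_to_span(xml_fragment: str) -> str:
    """Map STS <named-content content-type="X">text</named-content> to span:X[text]."""
    elem = ET.fromstring(xml_fragment)
    category = elem.get("content-type", "")
    # itertext() flattens inline children; escape() re-encodes &, < and >
    text = escape("".join(elem.itertext()))
    # Assumed AsciiDoc rule: a literal "]" must be backslash-escaped inside the macro
    text = text.replace("]", "\\]")
    return f"span:{category}[{text}]"


print(named_content_to_span('<named-content content-type="element-name">&lt;contrib&gt;</named-content>'))
print(named_content_to_span('<named-content content-type="organization">American Library Association (ALA)</named-content>'))
```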

Intelligent2013 added a commit that referenced this issue Apr 18, 2024
official NISO STS XML sample conversion added to Makefile, #407