Bibliography: IEEE standards rendered with duplicated designators with/without TM #1114

Closed
ronaldtse opened this issue Feb 28, 2024 · 24 comments

@ronaldtse
Contributor

ronaldtse commented Feb 28, 2024

If there is "TM", then ONLY the "TM" version should be shown.

* [[[IEEE_2413_2019,IEEE Std 2413-2019]]]
* [[[IEEE_7010_2020,IEEE Std 7010-2020]]]
[Image: Screenshot 2024-02-28 at 3 18 37 PM]
@opoudjis
Contributor

Filtering is now done in relaton-render, but the identifier codes used as biblio-tag are still being extracted separately in isodoc's pref_ref_code(). The identifier should be rendered only once.

Need to refactor pref_ref_code() so that it calls relaton-render and uses relaton-render's filter (which in turn needs to be enriched from pref_ref_code()). This in turn requires refactoring so that relaton-render becomes available to the base isodoc class as well as to the Presentation XML branch.

@opoudjis
Contributor

opoudjis commented Mar 2, 2024

This is surfacing that our parsing of authoritative identifiers was much stricter in metanorma-standoc than in relaton-render, but only through a shortcut: if there were no primary identifiers in the bibitem, we picked just the first identifier instead of doing proper filtering. We are generalising the filtering and moving it to relaton-render, so that it is applied consistently:

  • If there are languages, take them into account in relaton-render filtering.
  • If there are identifiers of the same type but different scope, then ignore the scoped identifier. Per this ticket, IEEE, with scope = trademark, is the one exception: trademark scope is preferred.
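
The scope rule in the second bullet can be sketched as follows. This is a minimal illustration, not the relaton-render implementation: the `DocId` struct and `filter_by_scope` are hypothetical names, the sample identifiers are made up, and the language rule from the first bullet is left out.

```ruby
# Hypothetical identifier record; the real bibitem model is richer.
DocId = Struct.new(:type, :id, :scope, keyword_init: true)

# Within each identifier type, ignore scoped identifiers.
# Assumed exception, per this ticket: for IEEE, a trademark-scoped
# identifier wins over the unscoped one when both are present.
def filter_by_scope(ids)
  ids.group_by(&:type).flat_map do |type, group|
    unscoped = group.select { |i| i.scope.nil? }
    next unscoped unless type == "IEEE"

    trademark = group.select { |i| i.scope == "trademark" }
    trademark.empty? ? unscoped : trademark
  end
end

ids = [
  DocId.new(type: "IEEE", id: "IEEE Std 2413-2019"),
  DocId.new(type: "IEEE", id: "IEEE Std 2413™-2019", scope: "trademark"),
  DocId.new(type: "ISO", id: "ISO 1234"),
  DocId.new(type: "ISO", id: "ISO 1234 (scoped)", scope: "part"),
]
puts filter_by_scope(ids).map(&:id)
```

With both IEEE designators present, only the trademark one survives, which is the behaviour requested at the top of this issue.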

@opoudjis
Contributor

opoudjis commented Mar 2, 2024

Do not raise "cannot access URI" error on citation URIs in relaton-render that are not in fact online urls but file references.

@opoudjis opoudjis closed this as completed Mar 2, 2024
opoudjis added a commit that referenced this issue Mar 4, 2024
opoudjis added a commit to metanorma/metanorma-ieee that referenced this issue Mar 4, 2024
opoudjis added a commit to metanorma/metanorma-itu that referenced this issue Mar 4, 2024
opoudjis added a commit to metanorma/metanorma-ogc that referenced this issue Mar 4, 2024
opoudjis added a commit to metanorma/metanorma-iec that referenced this issue Mar 4, 2024
opoudjis added a commit to metanorma/metanorma-iho that referenced this issue Mar 4, 2024
opoudjis added a commit to metanorma/metanorma-bipm that referenced this issue Mar 4, 2024
@opoudjis
Contributor

opoudjis commented Mar 5, 2024

Somehow (again, not sure how), the intended URI checking is now actually happening as a result of the relaton-render refactor, rather than being silently skipped. The URI checking is needed to insert an automated "last accessed" statement into the reference, which citations often expect.

As a result, we are now getting more timeout errors in GHA, and more demand for VCR cassettes in Presentation XML generation, to capture the URIs being looked up. May need to look into the method being used to check URIs (generic HEAD request on URI, accounting for redirects). May also make the lookup conditional on whether access date is required at all in the current template, since it is costly.
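
For reference, a HEAD check that accounts for redirects might look roughly like this. It is a sketch only: `reachable?`, `DEFAULT_HEAD` and the redirect limit of 5 are assumptions, not relaton-render code, and timeouts and error handling are deliberately left out. The `head:` parameter exists so the redirect logic can be exercised without touching the network.

```ruby
require "net/http"
require "uri"

# Default HEAD request against a parsed URI; returns a Net::HTTPResponse.
DEFAULT_HEAD = lambda do |uri|
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request_head(uri.request_uri)
  end
end

# Follow at most `limit` hops; true only if a 2xx response is reached.
def reachable?(url, limit: 5, head: DEFAULT_HEAD)
  uri = URI(url)
  limit.times do
    res = head.call(uri)
    return true if res.is_a?(Net::HTTPSuccess)
    return false unless res.is_a?(Net::HTTPRedirection) && res["location"]

    uri = uri + res["location"] # also resolves relative Location headers
  end
  false # too many redirects
end
```

Something like `reachable?("https://csrc.nist.gov/pubs/fips/201-2/final")` would then follow the redirect chain instead of stopping at the first 3xx response.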

@opoudjis opoudjis reopened this Mar 5, 2024
@opoudjis
Contributor

opoudjis commented Mar 5, 2024

Strangely, the URI check timeouts are only in Mac NIST samples; Ubuntu, Windows NIST samples run fine. https://github.com/metanorma/metanorma-cli/actions/runs/8147985377/job/22274376291

@opoudjis
Contributor

opoudjis commented Mar 5, 2024

And NIST is giving relaton-render a lot of trouble with URI checking, possibly because of redirects:

BIBLIOGRAPHY WARNING: cannot access https://csrc.nist.gov/pubs/fips/201-2/final (but I can see it fine); https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-116r1.pdf (from a redirect, failing to access in relaton-render, but I can also see it fine).

@opoudjis
Contributor

opoudjis commented Mar 5, 2024

I've replicated the error on my Mac.

  • Failed to open TCP connection to csrc.nist.gov:443 (execution expired)

We really don't want such errors propagating, so I need to rescue in relaton-render, if that's what's going on. Release still on hold.

@opoudjis
Contributor

opoudjis commented Mar 5, 2024

mn-samples-nist/sources/800-53r4[main] $ bundle exec metanorma compile document.adoc
BIBLIOGRAPHY WARNING: cannot access https://csrc.nist.gov/pubs/fips/186-3/final

Then a timeout crash, which is NOT on fips/186-3/final, but on the next URI being tested.

/Users/nickn/.rbenv/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1271:in `initialize'
/Users/nickn/.rbenv/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1271:in `open'
/Users/nickn/.rbenv/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1271:in `block in connect'
/Users/nickn/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/timeout-0.4.1/lib/timeout.rb:186:in `block in timeout'
/Users/nickn/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/timeout-0.4.1/lib/timeout.rb:193:in `timeout'
/Users/nickn/.rbenv/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1269:in `connect'
/Users/nickn/.rbenv/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1248:in `do_start'
/Users/nickn/.rbenv/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1237:in `start'
/Users/nickn/.rbenv/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1817:in `request'
/Users/nickn/.rbenv/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1741:in `request_head'
/Users/nickn/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/relaton-render-0.7.1/lib/relaton/render/general/render.rb:242:in `access_url'

def access_url(url)
  req = Net::HTTP.new(url.host, url.port)
  req.use_ssl = (url.scheme == "https")
  path = url.path or return false
  path.empty? and path = "/"
  req.request_head(path)
end
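
One way the method above could be guarded, so that a slow or unreachable host yields `false` instead of a crash, is sketched here. `safe_access_url` is a hypothetical name and the list of rescued exceptions is an assumption; the 2-second default follows the threshold mentioned later in this thread, but this is not the actual hotfix.

```ruby
require "net/http"
require "openssl"
require "uri"

# Hypothetical guarded variant of access_url: waits are bounded, and
# network failures are reported as false rather than raised.
def safe_access_url(url, timeout: 2)
  path = url.path or return false
  path = "/" if path.empty?
  http = Net::HTTP.new(url.host, url.port)
  http.use_ssl = (url.scheme == "https")
  http.open_timeout = timeout # seconds allowed for opening the connection
  http.read_timeout = timeout # seconds allowed for reading the response
  http.request_head(path)
rescue Timeout::Error, SocketError, SystemCallError, OpenSSL::SSL::SSLError
  false
end
```

`Net::OpenTimeout` and `Net::ReadTimeout` are both subclasses of `Timeout::Error`, so the "execution expired" failure in the trace above would be caught by the first entry in the rescue list.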

@opoudjis
Contributor

opoudjis commented Mar 5, 2024

> May also make the lookup conditional on whether access date is required at all in the current template, since it is costly.

That's already in place.

@opoudjis
Contributor

opoudjis commented Mar 5, 2024

Cache URL accesses in relaton-render, as they are so expensive. NIST website is having acknowledged issues right now, but this is going to be an ongoing concern.

@ronaldtse
Contributor Author

@opoudjis this must be a bug because NIST data is managed under Relaton, there should not be any remote fetches at all.

Ping @andrew2net .

@opoudjis
Contributor

opoudjis commented Mar 5, 2024

No, this is not relaton-nist, this is relaton-render validating URIs. Separate issue.

@andrew2net
Contributor

> @opoudjis this must be a bug because NIST data is managed under Relaton, there should not be any remote fetches at all.
>
> Ping @andrew2net .

That's not always true. Some documents are parsed from the local file pubs-export.json, but the IEEE Std 2413-2019 and IEEE Std 7010-2020 documents are not in that file, so they are fetched from the relaton-data-nist repo.
But, as I understand it, @opoudjis needs to check whether the documents' URLs on https://csrc.nist.gov are available, not the URLs that Relaton fetches the documents from.

@opoudjis
Contributor

opoudjis commented Mar 5, 2024

Yes, and csrc.nist.gov is having lots of flakiness this week; the website warns about HTTP 503 errors (and I've got some even in the browser).

I've done a hotfix to relaton-render, so I think the NIST samples will parse OK on Mac GHA. I am now setting a 2 sec threshold for timeout; and if you manually supply a span:date.accessed[ ] in the bibliography, there will be no lookup and nothing will be added automatically. (Need to document that.) But ideally the URI validation should be more streamlined. I do pass bibliographic data as a list to relaton-render, for efficiency, so I should be able to do this check in preprocessing within relaton-render, with an async HTTP lookup.

@opoudjis
Contributor

opoudjis commented Mar 5, 2024

Unpleasant discovery: the group processing of citations, render_all, which I am hoping to use async fetching on to optimise retrieval, is not adding accessed dates (calling enhance_data() ) on citations at all. The only reason fetches are happening now is because I am also calling bibrender in main stream of isodoc, to retrieve the reference tag separately. That's not great: dates accessed are not being attached as expected in Metanorma bibliographies, and URIs are being inefficiently processed too late.

What is needed is to do the async fetching of URIs when I am processing citations as a group; cache the URIs so retrieved; and play them back when I query records individually for their reference tags. (Ideally, in fact, I'd be caching the entire response.)
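
The fetch-as-a-group, cache, and play-back plan could be sketched with plain threads as below. `UrlCheckCache` and its `checker` argument are hypothetical, and the thread does not say which async mechanism relaton-render ended up using; this only illustrates the shape of the idea.

```ruby
# Checks many URLs concurrently, then replays the results on later lookups.
class UrlCheckCache
  # `checker` is any callable taking a URL string and returning true/false.
  def initialize(checker)
    @checker = checker
    @results = {}
    @lock = Mutex.new
  end

  # Group pass: check every distinct URL in its own thread.
  def prefetch(urls)
    threads = urls.uniq.map do |url|
      Thread.new do
        ok = begin
          @checker.call(url)
        rescue StandardError
          false # a failed lookup is recorded as "not accessible"
        end
        @lock.synchronize { @results[url] = ok }
      end
    end
    threads.each(&:join)
    self
  end

  # Individual pass: replay the stored result; look up only if never seen.
  def accessible?(url)
    @lock.synchronize { return @results[url] if @results.key?(url) }
    prefetch([url])
    @lock.synchronize { @results[url] }
  end
end
```

The cache only pays off if the same instance is used for both the group pass and the individual biblio-tag queries.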

@opoudjis
Contributor

opoudjis commented Mar 6, 2024

URI asynchronous fetching implemented; now need to ensure that the same renderer class instance is used to process citations and biblio tags, so that the bibliography and URLs are not processed twice.

@opoudjis
Contributor

opoudjis commented Mar 6, 2024

This is heading towards a bibrenderer singleton, shared between isodoc main and Presentation XML...

... but in reality, this is just the result of a partial refactor. The real solution is to preempt the separate extraction and generation of biblio-tag in HTML, so it is already in place in Presentation XML. I'll store the retrieved authoritative identifiers from relaton-render as /docidentifier[@scope = 'biblio-tag'] in the Presentation XML, so I don't need to reparse them; it can have multiple values.

@opoudjis
Contributor

opoudjis commented Mar 6, 2024

Done. All my Presentation XML specs change, so next release will be tedious, but this is going to pay off.

In NIST document: parsing NIST doc in isolation, date accessed appears as full date, but when I compile NIST 800-53r5, the "viewed" is given as only the year. Why?

... I think it's because I'm truncating dates to years in general, and I need to tell NIST not to for date accessed. But why is that not happening in sample doc?

@ronaldtse
Contributor Author

ronaldtse commented Mar 6, 2024

I want to raise a question: do we always want to update last access date, or the viewed date, at every compile?

There are cases where a link was available at an earlier time but is no longer available.

Forcing everyone to use this functionality, which always accesses the URL when one is given, means there’s no more leeway.

@opoudjis
Contributor

opoudjis commented Mar 6, 2024

The workaround, which I'm about to document, is that they should be including span:date.accessed[...] in their bibliography. If they do, then I'm not adding anything, and I've been doing that check all along.

If they (a) do not provide a date last accessed, and (b) the standard requires a date last accessed, then (c) I confirm the link and provide a date. If I can't access the link, I don't say anything.

There is a workaround, which is that the document author does what they were supposed to do all along, and provides the date accessed themselves. That's plenty of leeway.
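
The decision described in (a) to (c) above can be written out as a small function. This is a sketch under assumptions: `date_accessed`, the hash shape of `bib`, and the `checker` callable are all hypothetical and do not reflect the relaton-render API.

```ruby
require "date"

# bib: Hash with optional :uri and :accessed keys (hypothetical shape).
# template_needs_accessed: whether the citation style prints a date accessed.
# checker: callable returning true when the URI can be reached.
def date_accessed(bib, template_needs_accessed:, checker:, today: Date.today)
  return bib[:accessed] if bib[:accessed]   # author supplied it: no lookup
  return nil unless template_needs_accessed # style never prints it: no lookup
  return nil unless bib[:uri]               # nothing to look up

  # Link confirmed: supply today's date. Link unreachable: say nothing.
  checker.call(bib[:uri]) ? today.iso8601 : nil
end
```

An author-supplied date short-circuits everything else, which is the leeway being described: a link that has since gone dead keeps the date the author recorded.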

opoudjis added a commit to metanorma/metanorma.org that referenced this issue Mar 6, 2024
@opoudjis
Contributor

opoudjis commented Mar 6, 2024

Well, we have two issues, which are forcing yet another code change.

First, I have a uniform date renderer, and part of that rendering is that it can remove days and months as part of rendering. But that criterion should not be applied indiscriminately to all dates: we need information about context, including the type of date being rendered, and (to be safe) the entire bibliographic record.
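
As an illustration of the first point, a date renderer that takes the date type into account might look like this. `render_date` and the `FULL_PRECISION_TYPES` list are assumptions for the sketch; it passes only the date type, not the entire bibliographic record mentioned above.

```ruby
require "date"

# Date types that keep day and month (assumed list, for illustration only).
FULL_PRECISION_TYPES = %w[accessed].freeze

# Truncate to the year by default, but keep the full date for types that
# need it, such as a date accessed.
def render_date(iso_date, type:)
  date = Date.iso8601(iso_date)
  FULL_PRECISION_TYPES.include?(type) ? date.iso8601 : date.year.to_s
end

puts render_date("2024-03-06", type: "published")
puts render_date("2024-03-06", type: "accessed")
```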

Second, perplexingly, the instance running in mn-samples-nist is invoking OGC's date rendering module, not NIST's.

@opoudjis
Contributor

opoudjis commented Mar 6, 2024

OGC date formatter modified to consider date type --- although OGC does not use date accessed, so this will never be invoked.

OGC's date renderer was not subclassed properly, which is why it was overwriting NIST's.

And that was why:

> ... I think it's because I'm truncating dates to years in general, and I need to tell NIST not to for date accessed. But why is that not happening in sample doc?

Because that sample doc was processed in the metanorma-nist directory, not with all of metanorma-cli, so it wasn't being contaminated by metanorma-ogc.
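
The thread does not show how the OGC subclassing went wrong. One way this kind of cross-flavour contamination can arise in Ruby is when a gem reopens a shared class instead of subclassing it, so the change only appears once that gem is loaded. The class names below are hypothetical and this is only a guess at the general mechanism.

```ruby
# Shared base class, as it might be defined in a common gem.
class DateRenderer
  def render(date)
    date
  end
end

# One flavour subclasses it and inherits the full-date behaviour.
class NistDateRenderer < DateRenderer; end

before = NistDateRenderer.new.render("2024-03-06")

# Another flavour, loaded later, reopens the base class instead of
# defining its own subclass. Every subclass is now affected.
class DateRenderer
  def render(date)
    date[0, 4] # year only
  end
end

after = NistDateRenderer.new.render("2024-03-06")
puts before
puts after
```

Run with only the first two classes loaded, the full date comes back; once the third definition is loaded, the same call returns only the year, which matches the difference between compiling in isolation and compiling under all of metanorma-cli.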

@opoudjis
Contributor

opoudjis commented Mar 6, 2024

Another oversight: relaton-render does not know about the rules for omit_docid_prefix, which make certain SDO prefixes on docids omissible; e.g. no ITU ITU-R 3, or BSI BSI 3, or IETF RFC 3.

That's because I was caching the newly added docidentifier[@scope = 'biblio-tag'] on the first run of xrefs, but xrefs is rerun after references processing; so the caching needs to happen the second time round.

The real solution is to run docid_prefixes before the first run of xrefs processing.
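
The prefix rule mentioned at the start of this comment can be sketched as below. `render_docid` and the `OMIT_PREFIX_TYPES` list are hypothetical; the list contains only the three types named above and is not the actual omit_docid_prefix list in isodoc.

```ruby
# Identifier types whose prefix is never prepended (assumed, partial list).
OMIT_PREFIX_TYPES = %w[ITU BSI IETF].freeze

# Prepend the identifier type unless it is omissible or already present.
def render_docid(type, id)
  return id if type.nil? || OMIT_PREFIX_TYPES.include?(type)

  id.start_with?("#{type} ") ? id : "#{type} #{id}"
end

puts render_docid("ITU", "ITU-R 3")             # not "ITU ITU-R 3"
puts render_docid("IETF", "RFC 3")              # not "IETF RFC 3"
puts render_docid("ISBN", "978-3-16-148410-0")  # prefix is added here
```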

opoudjis added a commit to relaton/relaton-render that referenced this issue Mar 6, 2024
opoudjis added a commit to relaton/relaton-render that referenced this issue Mar 6, 2024
@opoudjis opoudjis closed this as completed Mar 6, 2024
opoudjis added a commit to metanorma/isodoc that referenced this issue Mar 6, 2024
… URI fetches are cached between classes of isodoc; store authoritative identifiers retrieved from relaton-render in Presentation XML as docidentifier[scope = 'biblio-tag']: metanorma/metanorma-iso#1114
@opoudjis opoudjis reopened this Mar 6, 2024
ronaldtse pushed a commit to metanorma/metanorma.org that referenced this issue Mar 6, 2024
@opoudjis opoudjis closed this as completed Mar 6, 2024
@opoudjis
Contributor

opoudjis commented Mar 7, 2024

Must filter on scope more aggressively: if there is no match on a type + scope (including scope = nil), return empty. We are adding type = nil scope = biblio-tag, and we were including that identifier as the only authoritative identifier of type nil; that's a mistake.
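
A strict match of this kind, with no fallback when nothing matches, could look like the following. `Identifier` and `ids_for` are hypothetical names used only for this sketch.

```ruby
# Hypothetical identifier record for illustration.
Identifier = Struct.new(:type, :id, :scope, keyword_init: true)

# Exact match on both type and scope (nil counts as a value to match).
# If nothing matches, the result is empty; there is no fallback.
def ids_for(ids, type:, scope: nil)
  ids.select { |i| i.type == type && i.scope == scope }
end

ids = [
  Identifier.new(type: "IEEE", id: "IEEE Std 2413-2019"),
  Identifier.new(type: nil, id: "IEEE Std 2413™-2019", scope: "biblio-tag"),
]
p ids_for(ids, type: nil).map(&:id)
p ids_for(ids, type: nil, scope: "biblio-tag").map(&:id)
```

The first query returns an empty list, so the cached biblio-tag entry is no longer mistaken for an authoritative identifier of type nil.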

opoudjis added a commit to metanorma/metanorma-itu that referenced this issue Mar 7, 2024
opoudjis added a commit to metanorma/metanorma-ogc that referenced this issue Mar 8, 2024
3 participants