
Documenting/improving ways to quickly check docbook xml syntax/correctness on editor save #72

Open
TysonAndre opened this issue Oct 21, 2022 · 2 comments


@TysonAndre
Contributor

Motivation

Rendering part of the DocBook manual takes around 10 seconds for me, even with a partial build, and it requires prerequisite steps (e.g. generating doc-base/.manual.xml) first:

time phd --docbook doc-base/.manual.xml --package PHP --partial en/reference/simdjson --format xhtml

Some editors (e.g. Vim) don't have XML validation built in and rely on plugins that call external programs such as xmllint (from libxml2-utils). Documenting ways to set up XML validation would save contributors time.

Related to php/doc-en#1148

Feature Request

Add example scripts and editor configurations to doc-base/scripts for quickly checking the validity of individual XML files.

This could be extended by hardcoding known entities and warning about unknown entities, XML tag names, etc. (or by configuring the proper DTD files when run in the doc-base folder).
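A minimal sketch of the hardcoded-entity idea, assuming the caller supplies the list of known entity names (the `find_unknown_entities()` helper and the example names are hypothetical, not part of doc-base). It finds named entity references with a regular expression, so references inside comments or CDATA sections would also be reported:

```php
<?php
// The five entities every XML parser knows; they never need a declaration.
const XML_PREDEFINED_ENTITIES = ['amp', 'lt', 'gt', 'quot', 'apos'];

/**
 * Hypothetical helper: report &name; references that are not in $known.
 *
 * @param string[] $known Entity names treated as defined (hardcoded list).
 * @return list<array{name: string, line: int}>
 */
function find_unknown_entities(string $xml, array $known): array
{
    $allowed = array_flip(array_merge(XML_PREDEFINED_ENTITIES, $known));
    // Named references only; character references like &#38; never match
    // because '#' is not allowed as the first character of a name.
    preg_match_all('/&([A-Za-z_:][A-Za-z0-9._:-]*);/', $xml, $matches, PREG_OFFSET_CAPTURE);
    $unknown = [];
    foreach ($matches[1] as [$name, $offset]) {
        if (!isset($allowed[$name])) {
            // Line number = newlines before the match + 1.
            $line = substr_count(substr($xml, 0, $offset), "\n") + 1;
            $unknown[] = ['name' => $name, 'line' => $line];
        }
    }
    return $unknown;
}
```

Each result could be printed as `Unknown entity 'name' in file on line N`, so it is picked up by the same `%m in %f on line %l` errorformat as the rest of the checker's output.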

(Other alternatives exist, but they usually require external programs, e.g. https://github.com/vim-syntastic/syntastic/blob/master/syntax_checkers/xml/xmllint.vim. A PHP script avoids that, since PHP documentation contributors can be assumed to have PHP installed.)

" Example additions to vimrc to check xml tags match up
function! XMLsynCHK()
  let winnum = winnr()  " remember the current window number
  silent make %
  cwindow 4  " open the quickfix window (4 lines high) only if there are errors
  " return to the original window; :make has already put the cursor on the first error (if any)
  execute winnum . "wincmd w"
  redraw!
endfunction

augroup docbook_xmllint
  " clear this group first so re-sourcing the vimrc does not duplicate the autocommands
  autocmd!
  autocmd BufWritePost *.xml call XMLsynCHK()
  autocmd FileType xml,docbk setlocal makeprg=/path/to/doc-base/scripts/xmllint.php
  autocmd FileType xml,docbk setlocal errorformat=%m\ in\ %f\ on\ line\ %l
augroup END
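
The makeprg above points at this script, saved as doc-base/scripts/xmllint.php and made executable (chmod +x):

#!/usr/bin/env php
<?php // xmllint.php: report well-formedness errors in a single DocBook XML file

/** @return never */
function print_usage_and_exit() {
    global $argv;
    fprintf(STDERR, "Usage: %s path/to/file.xml\n", $argv[0]);
    exit(1);
}

call_user_func(function () {
    error_reporting(E_ALL);
    // display_errors expects a value such as '1' or 'stderr', not an error level;
    // 'stderr' keeps PHP notices out of the output parsed by the editor.
    ini_set('display_errors', 'stderr');
    global $argv;
    if (count($argv) !== 2) {
        print_usage_and_exit();
    }
    $file = $argv[1];
    if (!is_readable($file)) {
        fprintf(STDERR, "%s is not readable\n", var_export($file, true));
        print_usage_and_exit();
    }
    $contents = file_get_contents($file);
    if (!is_string($contents)) {
        fprintf(STDERR, "Could not read %s\n", var_export($file, true));
        print_usage_and_exit();
    }
    if ($contents === '') {
        // loadXML('') throws a ValueError in PHP 8, so report empty files directly.
        printf("Document is empty in %s on line 1\n", $file);
        exit(1);
    }
    // Collect libxml errors instead of emitting PHP warnings.
    libxml_use_internal_errors(true);
    (new DOMDocument())->loadXML($contents, LIBXML_PARSEHUGE | LIBXML_COMPACT);
    $status = 0;
    foreach (libxml_get_errors() as $error) {
        $message = trim($error->message);
        // Each file is only a fragment of .manual.xml, so entities defined
        // elsewhere are expected to be missing here; skip those errors.
        if (preg_match('/^Entity.*not defined$/', $message)) {
            continue;
        }
        // Matches the Vim errorformat "%m in %f on line %l".
        printf("%s in %s on line %d\n", $message, $file, $error->line);
        $status = 1;
    }
    libxml_clear_errors();
    exit($status);
});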

Brainstorming other ideas

  • For DOMDocument::schemaValidate: I see that https://docbook.org/ns/docbook has no official schema. doc-base has RFC/schema with a proposed schema, but the commit from 2010 notes "PhD doesn't use any of this".
  • I'm not familiar with the implementation of the tools. Currently, it seems we have to generate the entire .manual.xml, containing the whole manual, just to generate the HTML for a single page (see the process described at http://doc.php.net/tutorial/local-setup.php).
  • I haven't yet looked into whether phd or configure.php could be changed to run in an error-tolerant way on a single file without building the full .manual.xml with every single page, or to use a faster lookup method than parsing an entire XML file (e.g. storing all the definitions once in SQLite, caching that, and later querying only the rows that are needed, on manual request).
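A rough sketch of the first step of that last idea: collecting the declared entity names from DTD/.ent text once, so later checks can look names up instead of parsing the whole .manual.xml. It keeps the result in a plain PHP array rather than SQLite, `parse_entity_names()` is a hypothetical helper, and it does not handle declarations inside comments or conditional sections:

```php
<?php
/**
 * Hypothetical helper: collect general entity names declared in DTD/.ent text.
 * Parameter entities (<!ENTITY % name ...>) are skipped because they cannot
 * be referenced as &name; in document content.
 *
 * @return array<string, true> Set of declared entity names.
 */
function parse_entity_names(string $dtd): array
{
    $names = [];
    // Captures the name in both <!ENTITY name "value"> and <!ENTITY name SYSTEM "file">.
    preg_match_all('/<!ENTITY\s+(%\s+)?([^\s"\'>]+)/', $dtd, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        if ($m[1] === '') {  // no '%', so this is a general entity
            $names[$m[2]] = true;
        }
    }
    return $names;
}
```

Using the names as array keys makes each later lookup a constant-time `isset()`; the same set could be stored in SQLite or any other cache.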
@cmb69
Member

cmb69 commented Oct 21, 2022

> I see https://docbook.org/ns/docbook has no official schema. doc-base has RFC/schema for a proposed schema but the commit from 2010 notes "PhD doesn't use any of this"

DocBook switched from DTDs to RELAX NG schemas quite a while ago. We're still stuck with our patched DocBook (4.5?) DTD schemas, which are not really relevant for PhD but are checked by configure.

> Add example scripts and editorconfigs to quickly check validity of individual xml files to doc-base/scripts.

I don't think this is feasible, since a single XML file is not valid according to the DocBook DTD on its own; it's just one part of what configure produces as doc-base/.manual.xml, and only that file can be valid according to the DocBook DTD. It might be possible to introduce more caching, but then we might face one of the two hard problems in computer science, namely cache invalidation.
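One simple invalidation strategy, as a sketch: fingerprint every source file by modification time and size, and recompute whenever the fingerprint changes. `cached_by_mtime()` and its JSON cache file are hypothetical. The sketch also shows why invalidation is hard: an edit that keeps the same size within the same mtime tick (one second on some filesystems) goes unnoticed, and a new .ent file is only seen if the caller includes it in `$sources`.

```php
<?php
/**
 * Hypothetical helper: return $compute($sources), cached in $cacheFile as JSON
 * and recomputed whenever any source file's mtime or size changes.
 *
 * @param string[] $sources Files the cached data is derived from (must exist).
 * @param callable $compute Receives $sources and returns JSON-serializable data.
 */
function cached_by_mtime(array $sources, string $cacheFile, callable $compute): array
{
    $fingerprint = [];
    foreach ($sources as $path) {
        clearstatcache(true, $path);  // PHP caches stat() results; force a fresh read
        $fingerprint[$path] = [filemtime($path), filesize($path)];
    }
    if (is_readable($cacheFile)) {
        $cached = json_decode((string) file_get_contents($cacheFile), true);
        // Strict comparison: same files, same order, same mtimes and sizes.
        if (is_array($cached) && ($cached['fingerprint'] ?? null) === $fingerprint) {
            return $cached['data'];
        }
    }
    $data = $compute($sources);
    file_put_contents($cacheFile, json_encode(['fingerprint' => $fingerprint, 'data' => $data]));
    return $data;
}
```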

Possibly related: php/doc-en#877

@TysonAndre
Contributor Author

TysonAndre commented Oct 21, 2022

php/web-php#556: also noting that the slowness of the render step is more noticeable on slow connections (or with a Wi-Fi issue, in my case); see php/phd#70.

Using curl would allow connection reuse when downloading the CSS for the first time.
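A sketch of what that could look like with PHP's curl extension (ext-curl required): reusing one handle for several transfers lets libcurl keep the connection open between requests to the same host. `download_all()` is hypothetical and not taken from PhD.

```php
<?php
/**
 * Hypothetical helper: download several URLs with a single curl handle so
 * libcurl can reuse the connection when consecutive URLs share a host.
 *
 * @param string[] $urls
 * @return array<string, string> Response bodies keyed by URL.
 */
function download_all(array $urls): array
{
    $ch = curl_init();
    if ($ch === false) {
        throw new RuntimeException('curl_init() failed');
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_FAILONERROR => true,     // treat HTTP status >= 400 as an error
    ]);
    $bodies = [];
    foreach ($urls as $url) {
        curl_setopt($ch, CURLOPT_URL, $url);  // same handle, new URL
        $body = curl_exec($ch);
        if (!is_string($body)) {
            throw new RuntimeException("Download of $url failed: " . curl_error($ch));
        }
        $bodies[$url] = $body;
    }
    // The handle and its cached connections are released when $ch goes out of scope.
    return $bodies;
}
```

Failures throw instead of returning partial results, so a stylesheet that fails to download is not silently skipped.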
