Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create_manpage_completions.py: introduce Type5ManParser which is capable of parsing scdoc manpages; add 'return' statements to other parsers; cleanup #7187

Merged
merged 8 commits into from Jul 11, 2020

Conversation

MaxVerevkin
Copy link
Contributor

@MaxVerevkin MaxVerevkin commented Jul 9, 2020

Description

Some man pages (in my case swappy(1)) that was created with scdoc had issues with autocompletion.

Before

image

After

image

Affects

In my case man pages for those programms are now correctly parsed by Type5ManParser instead of TypeDeroffManParser:
grim, mako, slurp, sway, swappy, swaymsg, swaynag, speedtest-cli and some more

Bugs

Some autocompletions are not generated any more because previous logic was broken

@MaxVerevkin
Copy link
Contributor Author

Bug: poor performance, will fix soon

@MaxVerevkin
Copy link
Contributor Author

MaxVerevkin commented Jul 9, 2020

Fixed
Edit: but some options are now ignored, will fix
Edit2: Fixed

@MaxVerevkin
Copy link
Contributor Author

Now fish_update_completions is twice faster!

@krobelus
Copy link
Member

krobelus commented Jul 9, 2020

Indeed twice as fast! Judging from a quick profile, the replaced TypeDeroffManParser seems to be the bottleneck.

The manpage for runc-checkpoint does not yet parse correctly. Possible fix:

diff --git a/share/tools/create_manpage_completions.py b/share/tools/create_manpage_completions.py
index ab5692f91..20237d628 100755
--- a/share/tools/create_manpage_completions.py
+++ b/share/tools/create_manpage_completions.py
@@ -519,3 +519,3 @@ class Type5ManParser(ManParser):
-        options_section_regex = re.compile("\.SH OPTIONS(.*?)\.SH", re.DOTALL)
+        options_section_regex = re.compile("\.SH OPTIONS(.*?)(\.SH|$)", re.DOTALL)
@@ -1017,3 +1017,3 @@ def get_paths_from_man_locations():
runc-checkpoint(8)
.nh
.TH runc\-checkpoint "8"

.SH NAME
.PP
runc checkpoint \- checkpoint a running container


.SH SYNOPSIS
.PP
runc checkpoint [command options] \fB\fC<container\-id>\fR

.PP
Where "\fB\fC<container\-id>\fR" is the name for the instance of the container to be
checkpointed.


.SH DESCRIPTION
.PP
The checkpoint command saves the state of the container instance.


.SH OPTIONS
.PP
.RS

.nf
\-\-image\-path value           path for saving criu image files
\-\-work\-path value            path for saving work files and logs
\-\-parent\-path value          path for previous criu image files in pre\-dump
\-\-leave\-running              leave the process running after checkpointing
\-\-tcp\-established            allow open tcp connections
\-\-ext\-unix\-sk                allow external unix sockets
\-\-shell\-job                  allow shell jobs
\-\-lazy\-pages                 use userfaultfd to lazily restore memory pages
\-\-status\-fd value            criu writes \\0 to this FD once lazy\-pages is ready
\-\-page\-server value          ADDRESS:PORT of the page server
\-\-file\-locks                 handle file locks, for safety
\-\-pre\-dump                   dump container's memory information only, leave the container running after this
\-\-manage\-cgroups\-mode value  cgroups mode: 'soft' (default), 'full' and 'strict'
\-\-empty\-ns value             create a namespace, but don't restore its properties
\-\-auto\-dedup                 enable auto deduplication of memory images

.fi
.RE

Also a quick diff -r with old and new generated completions shows that some completions are missing, for example in this man page it only generates one completion when there should be two:

411toppm(1)
\
.\" This man page was generated by the Netpbm tool 'makeman' from HTML source.
.\" Do not hand-hack it!  If you have bug fixes or improvements, please find
.\" the corresponding HTML page on the Netpbm website, generate a patch
.\" against that, and send it to the Netpbm maintainer.
.TH "411toppm User Manual" 0 "03 March 2001" "netpbm documentation"

.UN ixAAB
.UN lbAB
.SH NAME
411toppm - convert Sony Mavica .411 image to PPM
.UN lbAC
.SH SYNOPSIS

\fB411toppm\fP
[\fB-width \fP\fIwidth\fP]
[\fB-height \fP\fIheight\fP]
[\fI411file\fP]

.UN lbAD
.SH DESCRIPTION
.PP
This program is part of
.BR Netpbm (1)
.
.PP
 \fB411toppm\fP reads a .411 file, such as from a Sony Mavic
camera, and converts it to a PPM image as output.
.PP
Output is to Standard Output.
.PP
The originator of this program and decipherer of the .411 format,
Steve Allen
<\fIsla@alumni.caltech.edu\fP>,
has this to say about the
utility of this program: 'There's so little image in a 64x48 thumbnail
(especially when you have the full size JPG file) that the only point
in doing this was to answer the implicit challenge posed by the manual
stating that only the camera can use these files.'

.UN lbAE
.SH OPTIONS
.PP
All options may be abbreviated to the shortest unique prefix.


.TP
\fB-width\fP
The width (number of columns) of the input image.  Default is 64.
.TP
\fB-height\fP
The height (number of rows) of the input image.  Default is 48.


.UN lbAF
.SH SEE ALSO
.BR ppm (5)

Another thing: the -s/--stdout option no longer seems to work.

python share/tools/create_manpage_completions.py /usr/share/man/man1/man.1.gz -s
Traceback (most recent call last):
  File "share/tools/create_manpage_completions.py", line 942, in parse_and_output_man_pages
    if parse_manpage_at_path(manpage_path, output_directory):
  File "share/tools/create_manpage_completions.py", line 867, in parse_manpage_at_path
    fullpath = os.path.join(output_directory, CMDNAME + ".fish")
  File "/usr/lib/python3.8/posixpath.py", line 76, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

@MaxVerevkin
Copy link
Contributor Author

MaxVerevkin commented Jul 9, 2020

@krobelus

The manpage for runc-checkpoint does not yet parse correctly. Possible fix:

runc-checkpoint man page has very different syntax compared to what 'Type5ManParser' can handle. I've added 'None' check so it will be parsed by TypeDeroffManParser (As it is now, right?)

411toppm is fixed

Another thing: the -s/--stdout option no longer seems to work.

I'll look why it happens.

@MaxVerevkin
Copy link
Contributor Author

@krobelus

Another thing: the -s/--stdout option no longer seems to work.

It seems to work on my machine.

Can you tell me on which man page does it happen?

@krobelus
Copy link
Member

krobelus commented Jul 9, 2020

Another thing: the -s/--stdout option no longer seems to work.

It seems to work on my machine.

Ah sorry that was my bad, it failed because some of my local modifications.

@MaxVerevkin
Copy link
Contributor Author

MaxVerevkin commented Jul 9, 2020

@krobelus Any other man pages that this PR breaks?

@krobelus
Copy link
Member

krobelus commented Jul 9, 2020

Yeah, we are still losing some completions:

$ rm -rf completions.old; mkdir completions.old; python share/tools/create_manpage_completions.py --manpath -d completions.old
$ rm -rf completions.new; mkdir completions.new; python share/tools/create_manpage_completions.py --manpath -d completions.new
$ diff -ru completions.{old,new} | grep -v '^.# using' | grep ^- | wc -l
10896
$ diff -ru completions.{old,new} | grep -v '^.# using' | grep ^+ | wc -l
3894

For example the completion for ? for acyclic from Graphviz is missing, reproduce by running

$ python share/tools/create_manpage_completions.py -s /usr/share/man/man1/acyclic.1.gz

@MaxVerevkin
Copy link
Contributor Author

@krobelus Fixed, but I don't understand why it worked before. The only thing I've changed in Type2ManParser is return True after while loop, which is logical. I think because there was no return True Type2ManParser was considered as failed, so mostly all completions was parsed by TypeDeroffManParser and that's why it was so slow.

@krobelus
Copy link
Member

krobelus commented Jul 9, 2020

Well that's awkward - Type{1,2,3}ManParser were dead for all those years since they were considered failed.
Good thing you noticed!

I can confirm the acyclic completion is fixed. However there is still a net diff of almost 6000 completions missing on my system. (Measured with for p in - +; diff -ru completions.{old,new} | grep '^'$p'complete' | wc -l; end)
Most lost completions are cases where the last option is no longer recognized, just like for acyclic.

We could fix the revived parsers. If you don't want to do this in this PR, then it's probably a good idea to explicitly disable them. We can always fix them later if needed.

@MaxVerevkin
Copy link
Contributor Author

@krobelus

Most lost completions are cases where the last option is no longer recognized, just like for acyclic.

Yes, and most of them are Type2ManParser, so it shouldn't be hard to fix.

We could fix the revived parsers. If you don't want to do this in this PR, then it's probably a good idea to explicitly disable them. We can always fix them later if needed.

I think it's better to fix them now.

@MaxVerevkin
Copy link
Contributor Author

No, it is hard. I'll disable Type2ManParser for now, since it was 'disabled' for a long time and needs complex review.

@MaxVerevkin
Copy link
Contributor Author

I enabled using ... line in current implementation and counted how many man pages was generated by each parser:

Type1: 135
Type2: 0
Type3: 0
Type4: 0
TypeDarwin: 39
TypeDeroff: 1556

And new implementation (with Type2 and Type3 disabled):

Type1: 461
Type4: 0
Type5: 21
TypeDarwin: 39
TypeDeroff: 1215

Counted with grep -r 'using {Type}' {old,new} | wc -l

All of those 21 man pages are parsed correctly except hunspell and dirmngr, but dirmngr has complicated logic (it has commands and options separated) and I think it's better to ignore it and write completions manually in share/completions/

@krobelus
Copy link
Member

Alright the type 5 ones that are made with scdoc work nicely. However, it breaks the generated completions for less, curl, mosh, qemu-img and tar. Can we somehow detect whether the manpage was generated with scdoc, or be a bit more strict about when to use the type 5 format?

It looks like removing type 2 somehow breaks the completions for alsa_in from jack, even though the deroff parser is used either way. Also the generated completions for fallocate break. The type 2 parser must be mutating some state.

...
-complete -c fallocate -s c -l collapse-range -d 'Removes a byte range from a file, without leaving a hole'
-complete -c fallocate -s d -l dig-holes -d 'Detect and dig holes'
-complete -c fallocate -s i -l insert-range -d 'Insert a hole of  length bytes from R offset , shifting existing data'
+complete -c fallocate -l collapse-range -l dig-holes -l punch-hole -d and
...

Sadly, man pages are not standardized so trying to change the parsing is like opening a can of worms. I agree that the parser need not handle complicated formats but we should not generate worse completions for common man page formats. I'm sure we can find a solution that is an overall improvement.

@MaxVerevkin
Copy link
Contributor Author

Yes, we can identify if man page was created by scdoc and It is better to make Type5 handle only those.

It would be so much better if man pages was standardized, and not just troff(?) documents.

@MaxVerevkin
Copy link
Contributor Author

Now Type5 parses only 10 pages and it seems that all of them are correct.

krobelus and others added 8 commits July 11, 2020 15:46
…rser

This improves some generated completions, for example:

	diff -u completions.old/g3topbm.fish completions.new/g3topbm.fish
	+complete -c g3topbm -o stop_error -d 'This option tells g3topbm to fail when it finds a problem in the input'
	-complete -c g3topbm -o stop_error
…capable of parsing scdoc manpages

This greatly improves generated completions for scdoc man pages, see #7187.
@krobelus
Copy link
Member

I reshuffled the commits to separate refactoring commits from behavior changes.

Net diff below, most importantly:

  • do not return True in parse_man_page because it makes many generated completions worse
  • use the correct capture group to avoid including section headers

Merging, thanks for this contribution!

$ git diff -U0 3c52c7 aff2e7 -- share/tools/
diff --git a/share/tools/create_manpage_completions.py b/share/tools/create_manpage_completions.py
index e90cefb59..97e48e49a 100755
--- a/share/tools/create_manpage_completions.py
+++ b/share/tools/create_manpage_completions.py
@@ -281 +281 @@ class Type1ManParser(ManParser):
-        options_section = re.search(options_section_regex, manpage).group(0)
+        options_section = re.search(options_section_regex, manpage).group(1)
@@ -320 +319,0 @@ class Type1ManParser(ManParser):
-        return True
@@ -394 +393 @@ class Type2ManParser(ManParser):
-        options_section = re.search(options_section_regex, manpage).group(0)
+        options_section = re.search(options_section_regex, manpage).group(1)
@@ -425,2 +423,0 @@ class Type2ManParser(ManParser):
-        return True
-
@@ -433 +430 @@ class Type3ManParser(ManParser):
-        options_section = re.search(options_section_regex, manpage).group(0)
+        options_section = re.search(options_section_regex, manpage).group(1)
@@ -467,2 +463,0 @@ class Type3ManParser(ManParser):
-        return True
-
@@ -476 +471 @@ class Type4ManParser(ManParser):
-        options_section = re.search(options_section_regex, manpage).group(0)
+        options_section = re.search(options_section_regex, manpage).group(1)
@@ -510,4 +505 @@ class Type4ManParser(ManParser):
-        return True
-
-# This parser works with and only with scdoc man pages
-class Type5ManParser(ManParser):
+class TypeScdocManParser(ManParser):
@@ -841 +833 @@ def parse_manpage_at_path(manpage_path, output_directory):
-            Type5ManParser(),
+            TypeScdocManParser(),
@@ -880,2 +872,2 @@ def parse_manpage_at_path(manpage_path, output_directory):
-                "\n# Autogenerated from man page " + manpage_path +
-                "\n# using " + parser.__class__.__name__)
+                "\n# Autogenerated from man page " + manpage_path)
+        # built_command_output.insert(2, "# using " + parser.__class__.__name__) # XXX MISATTRIBUTES THE CULPABLE PARSER! Was really using Type2 but reporting TypeDeroffManParser

@krobelus krobelus merged commit 2083ace into fish-shell:master Jul 11, 2020
@MaxVerevkin
Copy link
Contributor Author

MaxVerevkin commented Jul 11, 2020

@krobelus

do not return True in parse_man_page because it makes many generated completions worse

So, if I get it right, without returning True many parses parse one man page and it does makes sense, except for parser in parsersToTry: loop, we can explicitly run every parser and it would be more clear and straight forward.

I mean without

# Make sure empty files aren't reported as success
        if not built_command_output:
            success = False
        if success:
            PARSER_INFO.setdefault(parser.__class__.__name__, []).append(CMDNAME)
            break

@MaxVerevkin
Copy link
Contributor Author

Anyway, it all works now. Thanks for merging!

@krobelus
Copy link
Member

The scdoc parser is meant to succeed, in that case we do want to break (though I did not check if it makes a difference). I'll add a comment because this is just too confusing. There are probably some more ways to improve this script.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants