Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider trailing whitespace a part of the prompt, making copy/paste more straight forward #1645

merged 2 commits into from
Dec 28, 2020


Copy link

Consider white space between prompt and command to be part of the prompt. this usually makes copy/paste more straight forward.

For example, if highlighting this shell session:

user@host:~$ ls --many-parameters-here

Before the patch, attempting to select the command with the mouse for copy & paste will include the space between $ and ls. With the patch, any number of spaces will be considered part of the prompt is now not selected any more. This is convenient because the space will cause (at least) bash to not add the command to the history of commands.

According to man (1) bash, the default PS1 is \s-\v\$ and Debian sets ${debian_chroot:+($debian_chroot)}\u@\h:\w\$ . I believe the vast majority of real world prompts will include a space.

An alternative implementation could make spaces an extra special token that cannot be selected. but I don't know the right class for that. Another variation could be to really consider only the first space as part of the prompt. Please let me know if you prefer any of that.

Regarding tests: I don't quite get why the first element in test_end_of_line_nums changes to Token.Name.Builtin, but the output at least for HTML otherwise looks fine. Please let me know if this is somehow a problem (and I appreciate pointers on how to fix this ;-))

Copy link

Anteru commented Dec 25, 2020

Would it help to classify it as Whitespace instead of Text? That seems ... more appropriate actually, but I don't know how that fixes the copy/paste issue for you.

Copy link
Contributor Author

I'm not to deep into pygments code of course, so I hope I get you right. I've tried to create subclass lexer that makes the whitespace a pygments.token.Whitespace, and ended up with this HTML:

<span class="gp">user@host:~/ca/$</span><span class="w"> </span>

The whitespace is still selected, so at least this doesn't help, I'm still able to select the whitespace.

Copy link

Anteru commented Dec 25, 2020

Ok, just to understand this properly: If you change it to be of part of the prompt, it doesn't get selected, but if it's whitespace it does? I'm not sure I'm following what exactly you're doing. If I run this through the Bash shell session, I get:

<span></span><span style="color: #000080; font-weight: bold">user@host:~$</span> ls --many-parameters-here

If I double-click on ls, I don't get the whitespace ahead of ls selected. This is using Firefox. Can you please help me reproduce the problem that you're trying to solve? I would have though the issue was that there was a single span which caused your browser to combine the and the ls into one "selection", but that doesn't seem to be the case given in your second example (which looks as expected) doesn't seem to work for you.

Copy link
Contributor Author

Ok, just to understand this properly: If you change it to be of part of the prompt, it doesn't get selected, but if it's whitespace it does?

Exactly. That's because the .gp CSS class has user-select: none, while .w doesn't (and really shouldn't I guess, if anything, there should be a new "whitespace-but-not-selectable" class).

If I double-click on ls, I don't get the whitespace ahead of ls selected.

Double click only only selects the word (command in this case) without any surrounding whitespace. But try a "triple click" (which is supposed to select the whole line) and you get the starting space:

Screenshot from 2020-12-25 16-23-21

The same thing happens e.g. when you have for example multiple lines of commands (and don't show output) and the user wants to select multiple lines. Take this Sphinx doc as an example:

New developers: just run these commands to set up your PC to get started:

.. code-block:: console

   user@host:~$ apt update
   user@host:~$ apt install -y python3 python3-pip
   user@host:~$ mkdir -p ~/.config/pip/
   user@host:~$ echo -e "[install]\nupdate-strategy = eager" > ~/.config/pip/pip.conf

From the rendered HTML I'd expect that users can copy/paste the entire code block, paste it in the terminal, and all commands run through (assuming all commands are non-interactive). This is properly assured by the .gp CSS class (so the prompt itself is not copied). But since the leading whitespace will be included in the copied text, at the very least the commands do not end up in the history. And it's not pretty 😃:

Screenshot from 2020-12-25 16-35-15

For comparison, here is the same selection with the patch applied (notice how the space after the $ is not selected):

Screenshot from 2020-12-25 16-36-10

Copy link

Anteru commented Dec 25, 2020

Thanks for the detailed explanation, that makes sense! Would you mind taking another look at the test_end_of_line_nums tests? I'm feeling a bit uncomfortable about the switch to built-in there.

Copy link
Contributor Author

mathiasertl commented Dec 26, 2020

I too feel a bit uncomfortable about this, I agree. I looked into this in a bit more detail, and believe I found what's causing it. But I think the answer is... not pretty. Not sure you want to know ;-). In any case I believe my patch only shows a problem, and does not cause it.

But here's what's happening: In both cases (with/without patch), the first Text/Builtin element is yielded by - the first yield in that function.

I have modified the code a bit to get some text output to see what's happening. Here is the modified code snippet (note that I have also added a print in front of every other yield to make sure that this is really the first one, but I've trimmed that here for brevity):

diff --git a/pygments/lexers/ b/pygments/lexers/
index daef1f39..3d97b2ad 100644
--- a/pygments/lexers/
+++ b/pygments/lexers/
@@ -204,8 +204,12 @@ class ShellSessionBaseLexer(Lexer):
                 backslash_continuation = curcode.endswith('\\\n')
                 if insertions:
-                    toks = innerlexer.get_tokens_unprocessed(curcode)
+                    print('insertions:', insertions)
+                    print('   curcode:', repr(curcode))
+                    toks = list(innerlexer.get_tokens_unprocessed(curcode))
+                    print('      toks:', toks)
                     for i, t, v in do_insertions(insertions, toks):
+                        print('     yield:', 1, pos+i, t, repr(v))  # proof of what we yield
                         yield pos+i, t, v
                 yield match.start(), Generic.Output, line
                 insertions = []

Here is what I get without my patch:

insertions: [(0, [(0, Token.Generic.Prompt, '$')])]
   curcode: ' echo \\\nhi\n'
      toks: [(0, Token.Text, ' '), (1, Token.Name.Builtin, 'echo'), (5, Token.Text, ' '), (6, Token.Literal.String.Escape, '\\\n'), (8, Token.Text, 'hi'), (10, Token.Text, '\n')]
     yield: 1 0 Token.Text ''
     yield: 1 0 Token.Generic.Prompt '$'
     yield: 1 1 Token.Text ' '
     yield: 1 2 Token.Name.Builtin 'echo'

And here's what I get with my patch:

insertions: [(0, [(0, Token.Generic.Prompt, '$ ')])]
   curcode: 'echo \\\nhi\n'
      toks: [(0, Token.Name.Builtin, 'echo'), (4, Token.Text, ' '), (5, Token.Literal.String.Escape, '\\\n'), (7, Token.Text, 'hi'), (9, Token.Text, '\n')]
     yield: 1 0 Token.Name.Builtin ''
     yield: 1 0 Token.Generic.Prompt '$ '
     yield: 1 2 Token.Name.Builtin 'echo'

Bottom line seems to be that the class of the first yielded empty text token - before the prompt - is determined by the element that comes after the prompt.

I have also attached two more test cases to illustrate and dissect this even further. Note that the test test_comment_after_prompt passes both with and without this PR applied.

Copy link

Anteru commented Dec 26, 2020

Thanks for the analysis ... now I'm even more confused. @birkenfeld, would you mind taking a look as to what is going on here?

Copy link

It's an artifact of do_insertions I would guess. I thought something like that was fixed recently, maybe there is another instance.

@birkenfeld birkenfeld merged commit 00a31bc into pygments:master Dec 28, 2020
Copy link

Ok, the artifact has been fixed, and I've merged the PR removing the now-unneeded empty tokens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
None yet

Successfully merging this pull request may close these issues.

None yet

3 participants