Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] ANSI-escape codes for colourful HTML mails #1015

Merged
merged 4 commits into from
Jan 21, 2020

Conversation

pazz
Copy link
Owner

@pazz pazz commented Jan 28, 2017

This is a rebase of an old branch "0.3.6-feature-color-wip", which I will retire now, and ultimately passes on a proof-of-concept implementation of TextWidgets that interprets ANSI escape codes.

The aim is to let the html renderer interpret colours from html mails and translate them to ANSI escapes. For instance, elinks can do this:

text/html; elinks -force-html -dump -dump-color-mode 3 -dump-charset utf8 -eval 'set document.codepage.assume = "%{charset}"' %s; copiousoutput

This feature can be enabled simply by configuring your mailcap as above and should not have any effect
if the browser does not produce these characters.
The boolean config setting can be used to disable interpretation of a subset of escape codes,
those which change the background colour of text (on top of say, changing foreground or character weights).

This is intended as a RFC and is not at all ready to go live!

I'll retire the related issue #612 because that PR is outdated.

@pazz pazz changed the title This is a rebase of an old branch "0.3.6-feature-color-wip", which I will retire now, and ultimately passes on a proof-of-concept implementation of TextWidgets that interprets [ANSI escape codes](https://en.wikipedia.org/wiki/ANSI_escape_code#Colors). ANSI-escape codes for colourful HTML mails Jan 28, 2017
@pazz pazz mentioned this pull request Jan 28, 2017
@pazz pazz force-pushed the feature-ANSI-colour-interpretation branch from 4b66809 to cdf30c2 Compare January 28, 2017 16:13
Copy link
Collaborator

@dcbaker dcbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all is pretty straight forward and looks mostly good. I have a few comments, but they're mostly nits.

corresponding Urwid Attributes object, it also returns a dictionary which
maps all attributes applied here to focused attribute.
"""
ECODES = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For performance reasons I would not put this dict in the function, if we put it at the module level it will be initialized once, inside the function it will be re-initialized each time the function is called.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, only that the value for '0' should be set to the function parameters.
I could define this dict at module level, then .copy() it and redefine the value for '0' in the function. But that's equally inefficient I suppose.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, that is an interesting problem. I guess we could special case '0' like we do a couple of other values?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever you decide to do here is fine. If it becomes a perforance bottleneck we can figure it out later.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I hardcoded 0 now.

esc_code, esc_substr = part.split('m', 1)
esc_code = esc_code.split(';')

if len(esc_code) == 0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not esc_code avoids uneeded calls.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

attr.update(ECODES[code])
# 256 codes
elif code == '38':
attr.update(fg='h'+esc_code[i+2])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's be consistant and use spaces around the + operators.

@@ -0,0 +1,113 @@
# Copyright (C) 2011-2012 Patrick Totzke <patricktotzke@gmail.com>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should probably update this to 2017


# If there is no string in esc_substr we skip it, the above
# attributes will accumulate to the next escapes.
if esc_substr != '':
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just if esc_substr: is good here.

urwid_fg = attr['fg']
urwid_bg = default_attr.background
if attr['bold']:
urwid_fg += ',bold'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of str concatenation isn't recomended by the python devs for perforamance reasons. There's some clever hacks in CPython to make this fast, but it's not guaranteed to be. The usual idiom is to make a list and then return a ','.join(list_).

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really? even for "lists" of length 2?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may be right, for len <= 3 concatenation may be fine.

I know that i've run into cases in pypy where string concatenation is significantly slower than it is in cpython (granted with several hundred concatenation operations), and there is a comment in the documentation somewhere that says string concatenation isn't guaranteed to be efficient/fast.

@pazz pazz force-pushed the feature-ANSI-colour-interpretation branch from cdf30c2 to 7a5e9eb Compare January 30, 2017 19:33
@pazz pazz force-pushed the feature-ANSI-colour-interpretation branch 2 times, most recently from 46d8652 to a10f73f Compare January 30, 2017 19:45
@pazz
Copy link
Owner Author

pazz commented Jan 30, 2017

I've just tested that this actually works :) it does, and e.g. github mails look kind of cool.
Now I just need a html renderer that enriches the plaintext and does not enforce a fixed line width.

I suppose one could also use some colouring scripts from ranger, to colorize text attachments like patches..

@dcbaker
Copy link
Collaborator

dcbaker commented Jan 30, 2017

I've never seen ranger, and it's remarkably hard to goolge for. Is it something like pygments?

@pazz
Copy link
Owner Author

pazz commented Jan 30, 2017

sorry, here you go: http://ranger.nongnu.org/
(It is a console file manager)
Maybe more related: http://www.andre-simon.de/doku/highlight/en/highlight.php#ch1

@dcbaker
Copy link
Collaborator

dcbaker commented Jun 16, 2017

What's the status of this branch?

@pazz
Copy link
Owner Author

pazz commented Jun 17, 2017

This seems to work as expected. But this being said, I'm not using it myself simply because it's difficult to find the right input:
In this branch, all ANSI codes are interpreted if they appear in what alot computes as "body text" of a mail. One application is html parts, which get piped through an interpreter that enriches the text with ANSI codes. I've only found one (elinks) with the mailcap entry below

text/html; elinks -force-html -dump -dump-color-mode 3 -dump-charset utf8 -eval 'set document.codepage.assume = "%{charset}"' %s; copiousoutput

... which does that, but elinks will hardcode linebreaks at a given line width. I find this ugly, so I don't use it.

Nevertheless, the widget could later be used nicely to inline display some text-based attachments, or other text-interpreted stuff (pdfs?) directly in alot. This relates to #528 and #894, which are low prio unfortunately. Such a feature should also be via dedicated toggle commands and config settings for the default behaviour I think.

At the very least, ANSI code interpretation as is in this PR should get a disable-switch in the config.
So: This could be a rather cheap new feature merge. The question is just if it's useful to anyone..

@lucc
Copy link
Collaborator

lucc commented Jun 21, 2017

@pazz what do you use to not render html in fixed width?

I'm also not if this is an essential feature. If you are looking for arguments you can look at mutt. They have the ability to highlight text in the body via regex and if I remember correctly I saw a discussion about security once. It was about the gpg notification being intermixed and highlighted with the displayed body text so if the sender adds escape sequences to the mail they could fool you into beliving that it was signed/encrypted.

@pazz
Copy link
Owner Author

pazz commented Jun 21, 2017 via email

@pazz pazz changed the title ANSI-escape codes for colourful HTML mails [RFC] ANSI-escape codes for colourful HTML mails Jul 29, 2018
@lypanov
Copy link

lypanov commented Nov 14, 2019

Any further interest in this feature? I'd love to see alot support https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda

Background: I have tried to move to an elinks+url_scan workflow for "clicking" on links in emails but I'm still having issues getting the numbering to be accurate between the two of them no matter how many hacky regexps I apply.

@pazz
Copy link
Owner Author

pazz commented Nov 14, 2019

Yes, I'm definitely still interested in this feature. Perhaps we should simply rebase and merge it...
There is also this branch, which may be related: #1441

I'm not sure that your feature request is related at all @lypanov :
This PR would introduce a new text widget that interprets escape codes for colourings, i.e., removes them from the text and wraps the text in urwid-styled subwidgets.
It would not interpret any other non-printable escape codes such as the url one. Therefore, the displayed text would simply still contain that code. If your terminal interprets these, it should work independently from alot. I'm surprized it doesn't for you.

@lypanov
Copy link

lypanov commented Nov 14, 2019

Sorry for the delay in response. I've finally got gnome-terminal running locally to test this but (maybe due to confusion on my end) am unable to get the direct pass through working as expected.

Specifically, I have a .sh which wraps the printf test line (from https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda). I've verified that the .sh works from within the terminal, however, when used in alot via the the following mailcap entry:
text/html; /home/lypanov/.local/bin/urltest.sh; copiousoutput

I see the following within alot when navigating to an email with HTML content:
?]8;;http://example.com?\This is a link?]8;;?\

Is there something magical extra needed in the mailcap entry?

Thank you!

@pazz
Copy link
Owner Author

pazz commented Nov 20, 2019

I'm not sure how exactly these escape codes work, meaning where they are interpreted, but was under the impression that the terminal translates them whenever they occur on screen. So there should be no difference between the shell and urwid/alot.

Perhaps it's worth asking on the urwid list.

@lypanov
Copy link

lypanov commented Nov 29, 2019

Dug a lot deeper on this this morning. URWIDs support for colors looks like it could be extended to support hyperlinks also as it's not using something like curses but rather a raw escape sequence based method for rendering. Supporting the OSC 8 hyperlink extension would just be a case of adding this support to URWID, and then extending your color patch.

I'll try this branch and look at the next steps as soon as I have a working terminal emulator (right now even that has a bunch of issues).

@pazz
Copy link
Owner Author

pazz commented Dec 1, 2019

@lypanov: maybe, but you'd also need to find a html2txt renderer that produces those sequences, unless they are already part of a plaintext email. I'm not sure if elinks as above already does that.

In any case, the patch still looks good to me and should be harmless if one doesn't use escape sequences.
@lucc: should we simply merge this into master?

@lypanov
Copy link

lypanov commented Jan 4, 2020

@lypanov: maybe, but you'd also need to find a html2txt renderer that produces those sequences, unless they are already part of a plaintext email. I'm not sure if elinks as above already does that.

Finally on a "coding vacation" and was able to make a postprocess hack in ruby for elinks to merge it's link numbering output back into the text in the hyperlink format needed. Works well enough. Able to use it via a pipeto hack right now but would be amazing to have it inline. Would like to look into the URWID / alot side of things next. Any progress on this ticket?

@lypanov
Copy link

lypanov commented Jan 8, 2020

Got this working locally (in a very messy form for now). Have patches to urwid and alot (~20 lines) on top of this branch. I'll harden it for a few days and try to get the mess on github in some form. I'll first try to get the branch up to date with master as as of right now I'm blocked by it being such an old version of alot.

@lypanov
Copy link

lypanov commented Jan 9, 2020

Just in case it's useful here is the conflict resolved branch without my changes: https://github.com/lypanov/alot/tree/feature-ANSI-colour-interpretation

@pazz
Copy link
Owner Author

pazz commented Jan 10, 2020

@lypanov: I don't understand. Your last commit is a merge and is not easy to see what the diff to this branch is. Do you want to add to this branch? Then please rebase.
I'm tempted to merge this branch into master as is.

edit: sorry, not I got it; you were rebasing this against master. I've updated the branch here. Let me know if you need to add anything, otherwise i'll merge.

this is based on #612
and can be used interpret colour hints extracted by a html renderer
directly within alots message widgets.

pep8
The new setting "interpret_ansi_background" (boolean, defaults to
False), allows to disable interpretation of ANSI escape characters that
change the background colour of text.
@pazz pazz force-pushed the feature-ANSI-colour-interpretation branch from a10f73f to fa381b3 Compare January 10, 2020 08:03
@lypanov
Copy link

lypanov commented Jan 10, 2020

@pazz Sorry if the message wasn't clear. The branch above is just the merging of master into your branch + resolving of the conflicts. Still not 100% sure of a rebase workflow sorry. (Ah in the meantime see that you've anyway resolved them :) )

My changes aren't quite ready for a PR (and a PR is worthless until my urwid branch is merged anyway). See https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda#gistcomment-3132951 for the various branches / scripts.

@pazz
Copy link
Owner Author

pazz commented Jan 10, 2020

ok cool. thanks for working on this :)

@lypanov
Copy link

lypanov commented Jan 10, 2020

My understanding is that you're using w3m right @pazz? I might look into getting an option into w3m rather than using a postprocess mechanism. At least w3m looks relatively active. elinks / links are far from.

@pazz
Copy link
Owner Author

pazz commented Jan 10, 2020 via email

@pazz pazz merged commit 88f9596 into master Jan 21, 2020
@pazz pazz deleted the feature-ANSI-colour-interpretation branch January 21, 2020 21:25
ryneeverett added a commit to ryneeverett/alot that referenced this pull request Mar 13, 2020
ryneeverett added a commit to ryneeverett/alot that referenced this pull request Mar 13, 2020
ryneeverett added a commit to ryneeverett/alot that referenced this pull request Mar 13, 2020
ryneeverett added a commit to ryneeverett/alot that referenced this pull request Mar 23, 2020
pazz pushed a commit that referenced this pull request May 6, 2020
GuillaumeSeren pushed a commit to GuillaumeSeren/alot that referenced this pull request Oct 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants